title: BookCrossing parent: Data Model
The BookCrossing data set consists of user-provided ratings — both implicit and explicit — of books.
If you use this data, cite:
Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, and Georg Lausen. 2005. Improving Recommendation Lists Through Topic Diversification. Proceedings of the 14th International World Wide Web Conference (WWW '05), May 10-14, 2005, Chiba, Japan.
Imported data lives in the `bx` schema. The source data files are automatically downloaded and unpacked by
the provided scripts and DVC stages.
The import is controlled by the following DVC steps:
: Unpack the BookCrossing zip file.
: Download the BookCrossing zip file.
: Run `bx-schema.sql` to set up the base schema.
: Import raw BookCrossing ratings from
: Run `bx-index.sql` to index the rating data and integrate with book data.
The raw rating data, with invalid characters cleaned up, is in the `bx.raw_ratings` table, with the following columns:
user_id : The user identifier (numeric).
isbn : The book ISBN (text).
rating : The book rating. The ratings are on a 1-10 scale, with 0 indicating an implicit-feedback record.
We extract the following tables for BookCrossing ratings:
: The explicit ratings (`rating > 0`) from the raw ratings table.
: Records of users adding books, either by rating or through an implicit feedback action,
without rating values.
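
As a rough illustration of how the two extracted tables relate to the raw ratings, the following Python sketch splits raw records into explicit ratings and add actions. It is not the import code itself: the `RawRating` type, the `split_ratings` helper, and the sample ISBN strings are hypothetical names and values chosen for the example, and clustering is ignored here.

```python
from typing import NamedTuple


class RawRating(NamedTuple):
    """One raw record, mirroring the user_id / isbn / rating columns."""
    user_id: int
    isbn: str
    rating: int  # 0 = implicit feedback, 1-10 = explicit rating


def split_ratings(rows):
    """Return (explicit, actions) derived from raw rating rows.

    Hypothetical helper: explicit keeps rows with rating > 0, while
    actions keeps every row but drops the rating value.
    """
    explicit = [r for r in rows if r.rating > 0]
    actions = [(r.user_id, r.isbn) for r in rows]
    return explicit, actions


# Made-up sample data, not taken from the BookCrossing files.
rows = [
    RawRating(1, "0000000001", 0),   # implicit feedback only
    RawRating(1, "0000000002", 8),   # explicit rating
    RawRating(2, "0000000001", 5),   # explicit rating
]
explicit, actions = split_ratings(rows)
```

With this sample input, `explicit` holds the two rows with nonzero ratings, and `actions` holds all three user/ISBN pairs.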
Both of these tables are pre-clustered, so the book IDs refer to book clusters and not individual ISBNs or editions. They have the following columns:
: The user ID.
: The book code for this book; the cluster identifier if available, or the
ISBN-based book code if this book is not in a cluster.
: The rating value; if the user has rated multiple books in a cluster, the median value is reported.
This field is only on the `rating` table.
: The number of book actions this user performed on this book. Equivalent to the number of books in
the cluster that the user has added or rated.
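
The cluster-level aggregation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual SQL used by the import: the `ISBN_CLUSTER` mapping, the `book_code` and `aggregate` helpers, the `"isbn:"` prefix standing in for the ISBN-based book code, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical ISBN -> cluster ID mapping; ISBNs missing from it are
# treated as not belonging to any cluster.
ISBN_CLUSTER = {"isbn-a": 100, "isbn-b": 100}


def book_code(isbn):
    """Cluster ID if available, otherwise a stand-in ISBN-based code."""
    return ISBN_CLUSTER.get(isbn, f"isbn:{isbn}")


def aggregate(rows):
    """Group explicit ratings by (user, book code).

    Returns {(user_id, book_code): (median_rating, action_count)}.
    """
    groups = defaultdict(list)
    for user_id, isbn, rating in rows:
        groups[(user_id, book_code(isbn))].append(rating)
    return {key: (median(vals), len(vals)) for key, vals in groups.items()}


# Made-up explicit ratings: user 1 rated two editions in cluster 100
# and one book that is not in any cluster.
rows = [(1, "isbn-a", 6), (1, "isbn-b", 9), (1, "isbn-c", 4)]
result = aggregate(rows)
# result[(1, 100)] == (7.5, 2): median of 6 and 9, from two actions
```

Counting records per group matches the "number of book actions" description only if each user has at most one record per ISBN, which this sketch assumes.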