Michael Ekstrand 83f1aec7fa freeze 2 weeks ago
loc-books
loc-names
.gitignore f27edd8757 Use Curl instead of wget/aria2 for downloads 4 months ago
BX-Book-Ratings.csv
BX-Books.csv
BX-CSV-Dump.zip
BX-CSV-Dump.zip.dvc 621efa70d0 Bump DVC version 4 months ago
BX-Users.csv
BX.dvc 621efa70d0 Bump DVC version 4 months ago
README.md 96e1a3d4f9 Document data download 2 years ago
goodreads_book_authors.json.gz
goodreads_book_authors.json.gz.dvc 621efa70d0 Bump DVC version 4 months ago
goodreads_book_genres_initial.json.gz
goodreads_book_genres_initial.json.gz.dvc 621efa70d0 Bump DVC version 4 months ago
goodreads_book_works.json.gz
goodreads_book_works.json.gz.dvc 621efa70d0 Bump DVC version 4 months ago
goodreads_books.json.gz
goodreads_books.json.gz.dvc 621efa70d0 Bump DVC version 4 months ago
goodreads_interactions.json.gz
goodreads_interactions.json.gz.dvc 621efa70d0 Bump DVC version 4 months ago
id-graph.gt
loc-books.dvc 83f1aec7fa freeze 2 weeks ago
loc-names.dvc 83f1aec7fa freeze 2 weeks ago
ol_dump_authors.txt.gz
ol_dump_authors.txt.gz.dvc 621efa70d0 Bump DVC version 4 months ago
ol_dump_editions.txt.gz
ol_dump_editions.txt.gz.dvc 621efa70d0 Bump DVC version 4 months ago
ol_dump_works.txt.gz
ol_dump_works.txt.gz.dvc 621efa70d0 Bump DVC version 4 months ago
ratings_Books.csv
ratings_Books.csv.dvc 621efa70d0 Bump DVC version 4 months ago
viaf-clusters-marc21.xml.gz
viaf-clusters-marc21.xml.gz.dvc 621efa70d0 Bump DVC version 4 months ago

README.md

Data files go here.

Library of Congress

https://www.loc.gov/cds/products/MDSConnect-books_all.html

Download all 42 MARC-XML files into a subdirectory called LOC.
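
One way to script this step is sketched below, as an illustration rather than the project's own download tooling. It saves a list of URLs into the LOC subdirectory and skips files that are already present. The function name `download_all` and its parameters are hypothetical. The sketch does not know the file URLs, so they have to be copied from the Library of Congress page above.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def download_all(urls, dest_dir="LOC", fetch=urlretrieve):
    """Download each URL into dest_dir, skipping files that already exist.

    `fetch` is any callable taking (url, target_path); it defaults to
    urllib's urlretrieve and can be swapped out for testing.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        # Name the local file after the last path component of the URL.
        name = Path(urlparse(url).path).name
        target = dest / name
        if not target.exists():
            fetch(url, target)
        saved.append(target)
    return saved
```

Calling `download_all(urls)` with the 42 file links from the page should leave one file per link in `LOC/`. Running it again only fetches files that are still missing.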

OpenLibrary

https://openlibrary.org/developers/dumps

BookCrossing

http://www2.informatik.uni-freiburg.de/~cziegler/BX/

Amazon ratings

http://jmcauley.ucsd.edu/data/amazon/

Download the ratings-only file for Books (ratings_Books.csv).
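
After the downloads, a quick check can confirm the files landed in this directory. The following is a minimal sketch: the helper `missing_files` and the `EXPECTED` list are hypothetical, and the file names are taken from the directory listing above. They are assumed to match what the download pages provide.

```python
from pathlib import Path

# File names as they appear in this directory's listing (assumed, not
# verified against the download pages).
EXPECTED = [
    "BX-CSV-Dump.zip",
    "ol_dump_authors.txt.gz",
    "ol_dump_editions.txt.gz",
    "ol_dump_works.txt.gz",
    "ratings_Books.csv",
]


def missing_files(data_dir=".", expected=EXPECTED):
    """Return the expected file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_file()]
```

An empty result from `missing_files()` means every listed file is present. Otherwise the returned names show which downloads still need to be fetched.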