This repository contains the code to import and integrate the book and rating data that we work with.
The psql executable must be installed on the machine where the import scripts will run.

All scripts read database connection info from the standard PostgreSQL client environment variables:
PGDATABASE
PGHOST
PGUSER
PGPASSWORD
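A pre-flight check can catch a missing variable before a long import starts. The sketch below is not part of this repository: the function name check_pg_env is hypothetical, and it only tests that each variable is set and non-empty, not that a connection actually succeeds.

```shell
# Hypothetical helper (not part of this repository): report any of the
# standard PostgreSQL client variables that are unset or empty.
check_pg_env() {
    missing=0
    for name in PGDATABASE PGHOST PGUSER PGPASSWORD; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    # Returns 0 when everything is set, 1 otherwise.
    return "$missing"
}
```

If you authenticate through a password file or connect over a local socket, some of these variables may legitimately be unset; in that case, remove the corresponding names from the loop.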
The -schema files contain the base schemas for the data to import:
common-schema.sql: common tables
loc-schema.sql: Library of Congress catalog tables
ol-schema.sql: OpenLibrary book data
viaf-schema.sql: VIAF tables
az-schema.sql: Amazon rating schema
bx-schema.sql: BookCrossing rating data schema

The importer is run with Gulp.
npm install
npx gulp importOpenLib
npx gulp importLOC
npx gulp importVIAF
npx gulp importBX
npx gulp importAmazon
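Because each import task can run for a long time, it may help to launch them from one script that stops at the first failure. This is only a sketch: the run_imports function and the RUNNER variable are hypothetical names, the task order simply follows the commands above, and nothing here is confirmed as the project's intended workflow.

```shell
# Hypothetical wrapper: run each Gulp import task in the order listed
# above, stopping at the first task that fails.
# RUNNER is an assumed override hook; it defaults to "npx gulp".
run_imports() {
    for task in importOpenLib importLOC importVIAF importBX importAmazon; do
        echo "starting $task"
        # RUNNER is left unquoted on purpose so "npx gulp" splits into two words.
        ${RUNNER:-npx gulp} "$task" || {
            echo "$task failed" >&2
            return 1
        }
    done
}
```

Setting RUNNER=echo gives a dry run that prints the task names without starting any import.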
The full import takes 1–3 days.
Start tying the data together:
psql <viaf-index.sql
psql <loc-index.sql
psql <ol-index.sql
Clustering is done by the ClusterISBNs.r script:
Rscript ClusterISBNs.r
psql <load-clusters.sql
With the clusters in place, we're ready to index the rating data:
psql <az-index.sql
psql <bx-index.sql
And finally, compute author information for ISBN clusters:
psql <author-info.sql
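The post-import steps above can be collected into a single script. This is a sketch under assumptions: the run_sql and integrate function names are hypothetical, and the PSQL and RSCRIPT variables are assumed override hooks for dry runs. It sets psql's ON_ERROR_STOP variable so that a failing SQL statement aborts the run instead of letting later steps proceed on incomplete data, and it passes each file with -f rather than redirecting standard input.

```shell
# Hypothetical helper: feed one SQL file to psql, aborting on the first error.
run_sql() {
    echo "running $1"
    ${PSQL:-psql} -v ON_ERROR_STOP=1 -f "$1"
}

# Run the integration steps in the order given above:
# source indexes, clustering, cluster load, rating indexes, author info.
integrate() {
    for f in viaf-index.sql loc-index.sql ol-index.sql; do
        run_sql "$f" || return 1
    done
    ${RSCRIPT:-Rscript} ClusterISBNs.r || return 1
    for f in load-clusters.sql az-index.sql bx-index.sql author-info.sql; do
        run_sql "$f" || return 1
    done
}
```

Without ON_ERROR_STOP, psql keeps executing the rest of a file after an SQL error, so a wrapper that only checks the exit status could miss a failed statement.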