
This repository contains the code to import and integrate the book and rating data that we work with.

Requirements

  • PostgreSQL 10 (9.x may also work)
  • Node.js (tested on Carbon, the 8.x LTS line)
  • R with the Tidyverse and RPostgreSQL
  • psql executable on the machine where the import scripts will run
  • 300GB disk space for the database
  • 20-30GB disk for data files

All scripts read database connection info from the standard PostgreSQL client environment variables:

  • PGDATABASE
  • PGHOST
  • PGUSER
  • PGPASSWORD
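
For example, a session might export these variables before running any of the scripts. All of the values below are placeholders; substitute your own database name, host, and credentials:

```shell
# Placeholder connection settings; replace with your own values.
export PGDATABASE=bookdata
export PGHOST=localhost
export PGUSER=books
export PGPASSWORD=secret   # a ~/.pgpass file is an alternative to this env var

# Sanity-check the connection before starting the import:
# psql -c 'SELECT 1;'
```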

Setting Up Schemas

The *-schema.sql files contain the base schemas for the data to import:

  • common-schema.sql — common tables
  • loc-schema.sql — Library of Congress catalog tables
  • ol-schema.sql — OpenLibrary book data
  • viaf-schema.sql — VIAF tables
  • az-schema.sql — Amazon rating schema
  • bx-schema.sql — BookCrossing rating data schema
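
Assuming psql picks up the standard connection variables, the schema files can be loaded directly. The order below is a sketch, not prescribed by the repository; common-schema.sql is loaded first on the assumption that it defines tables the other schemas build on:

```shell
# Load the base schemas; common-schema.sql first, since the
# source-specific schemas may reference its shared tables.
psql -f common-schema.sql
psql -f loc-schema.sql
psql -f ol-schema.sql
psql -f viaf-schema.sql
psql -f az-schema.sql
psql -f bx-schema.sql
```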

Importing Data

The importer is run with Gulp.

npm install
npx gulp importOpenLib
npx gulp importLOC
npx gulp importVIAF
npx gulp importBX
npx gulp importAmazon

The full import takes 1–3 days.

Indexing and Integrating

Start tying the data together:

psql <viaf-index.sql
psql <loc-index.sql
psql <ol-index.sql

Clustering is done by the ClusterISBNs.r script:

Rscript ClusterISBNs.r
psql <load-clusters.sql

With the clusters in place, we're ready to index the rating data:

psql <az-index.sql
psql <bx-index.sql

And finally, compute author information for ISBN clusters:

psql <author-info.sql
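
Taken together, the steps above can be wrapped in a single driver script. This is only a sketch, not part of the repository: it assumes the PG* variables are already set, that there is enough disk space, and that the gulp tasks can be listed in one invocation.

```shell
#!/bin/sh
# Sketch of an end-to-end run; stops at the first failing step.
set -e

# Import the raw data (takes 1-3 days)
npm install
npx gulp importOpenLib importLOC importVIAF importBX importAmazon

# Index the source catalogs
psql <viaf-index.sql
psql <loc-index.sql
psql <ol-index.sql

# Cluster ISBNs and load the results
Rscript ClusterISBNs.r
psql <load-clusters.sql

# Index the rating data against the clusters
psql <az-index.sql
psql <bx-index.sql

# Compute author information for the ISBN clusters
psql <author-info.sql
```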