README.md

This repository contains the code to import and integrate the book and rating data that we work with. It imports and integrates data from several sources in a single PostgreSQL database; import scripts are primarily in Python, with Rust code for high-throughput processing of raw data files.

If you use these scripts in any published research, cite our paper (PDF):

Michael D. Ekstrand and Daniel Kluver. 2021. Exploring Author Gender in Book Rating and Recommendation. User Modeling and User-Adapted Interaction (February 2021) DOI:10.1007/s11257-020-09284-2.

We also ask that you contact Michael Ekstrand to let us know about your use of the data, so we can include your paper in our list of relying publications.

Note: the limitations section of the paper contains important information about the limitations of the data these scripts compile. Do not use the gender information in this data or these tools without understanding those limitations. In particular, VIAF's gender information is incomplete and, in a number of cases, incorrect.

In addition, several of the data sets integrated by this project come from other sources with their own publications. If you use any of the rating or interaction data, cite the appropriate original source paper. For each data set below, we have provided a link to the page that describes the data and its appropriate citation.

See the documentation site for details on using and extending these tools.

Running Everything

You can run the entire import process with:

dvc repro
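If you would rather drive the import from a script than from the shell, the command above can be wrapped in a few lines of Python. This is only a minimal sketch, not part of the repository: the helper names `build_repro_command` and `run_repro` are hypothetical, and it assumes `dvc` is on your `PATH` and that you run it from the repository root. It relies on two features of `dvc repro`: optional target arguments and the `--dry` flag, which prints the commands that would run without executing them.

```python
import subprocess
from typing import List, Sequence


def build_repro_command(targets: Sequence[str] = (), dry_run: bool = False) -> List[str]:
    """Build the argument list for `dvc repro` (hypothetical helper)."""
    cmd = ["dvc", "repro"]
    if dry_run:
        # --dry shows what would be executed without running any stage.
        cmd.append("--dry")
    # With no targets, DVC reproduces the whole pipeline.
    cmd.extend(targets)
    return cmd


def run_repro(targets: Sequence[str] = (), dry_run: bool = False) -> int:
    """Run `dvc repro` in the current directory and return its exit code."""
    result = subprocess.run(build_repro_command(targets, dry_run))
    return result.returncode
```

Calling `run_repro(dry_run=True)` first is a cheap way to see which stages are out of date before committing to a full import; `run_repro()` with no arguments is equivalent to the plain `dvc repro` shown above.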

Copyright and Acknowledgements

Copyright © 2020 Boise State University. Distributed under the MIT License; see LICENSE.txt. This material is based upon work supported by the National Science Foundation under Grant No. IIS 17-51278. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.