title | parent | nav_order |
---|---|---|
OpenLibrary | Data Model | 3 |
{: .no_toc}
We also source book data from OpenLibrary, as downloaded from their developer dumps.
The DVC control files automatically download the appropriate version. The version can be
updated by modifying the `data/ol_dump_*.txt.gz.dvc` files.
Imported data lives in the `ol` schema.
The import is controlled by the following DVC steps:
- `schemas/ol-schema.dvc` runs `ol-schema.sql` to set up the base schema.
- `import/ol-works.dvc` imports `data/ol_dump_works.txt.gz`.
- `import/ol-editions.dvc` imports `data/ol_dump_editions.txt.gz`.
- `import/ol-authors.dvc` imports `data/ol_dump_authors.txt.gz`.
- `index/ol-index.dvc` runs `ol-index.sql` to index the book data and extract tables.
- `index/ol-book-info.dvc` runs `ol-book-info.sql` to extract additional book data into tables.

OpenLibrary provides its data as JSON. It is imported as-is into a JSONB column in three tables:
- `ol.author`
- `ol.work`
- `ol.edition`

Each of these has columns that include the record's OpenLibrary key (e.g. `/books/3180A3`).

We use PostgreSQL's JSON operators and functions to extract the data from these tables for the rest of the OpenLibrary data model.
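
As a rough illustration of what "imported as-is" means, the sketch below splits one dump line into its OpenLibrary key and its JSON payload. It assumes the dump files are tab-separated with five fields (type, key, revision, last-modified timestamp, JSON record). The names `RawRecord` and `parse_dump_line` and the sample line are hypothetical, and this is not the project's actual import code.

```python
import json
from typing import NamedTuple


class RawRecord(NamedTuple):
    key: str    # OpenLibrary key, e.g. "/books/3180A3"
    data: dict  # JSON payload, kept unchanged


def parse_dump_line(line: str) -> RawRecord:
    """Split one dump line into its key and JSON payload.

    Assumes five tab-separated fields: type, key, revision,
    last-modified timestamp, and the JSON record.
    """
    _type, key, _rev, _modified, payload = line.rstrip("\n").split("\t", 4)
    return RawRecord(key=key, data=json.loads(payload))


# Hypothetical sample line, made up for illustration.
sample = "\t".join([
    "/type/edition",
    "/books/3180A3",
    "1",
    "2010-01-01T00:00:00",
    json.dumps({"key": "/books/3180A3", "title": "Example"}),
])
record = parse_dump_line(sample)
print(record.key, record.data["title"])
```

In the database, the payload would land in the JSONB column next to the key, and later steps read fields out of it with PostgreSQL operators such as `->` and `->>`.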
We extract the following tables from OpenLibrary editions:
- `edition_author` links `edition` and `author` to record an edition's authors.
- `edition_first_author` links `edition` and `author` to record an edition's first author.
- `edition_work` links each `edition` to its `work`(s).
- `edition_isbn` holds the ISBNs for each `edition` (not ISBN IDs).
- `isbn_link`
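
The edition tables above can be pictured as flattening arrays inside each edition's JSON. The sketch below does this in Python rather than SQL, purely for illustration. It assumes edition records carry `authors` and `works` as lists of `{"key": ...}` objects and `isbn_10` / `isbn_13` as lists of strings. The function name `edition_links` and the sample record are hypothetical, and OpenLibrary keys stand in for whatever identifiers the real tables store.

```python
def edition_links(edition: dict) -> dict:
    """Flatten the link arrays of one edition record (illustrative only)."""
    # Assumed shape: "authors" and "works" are lists of {"key": ...} objects.
    authors = [a["key"] for a in edition.get("authors", []) if "key" in a]
    works = [w["key"] for w in edition.get("works", []) if "key" in w]
    # Assumed shape: "isbn_10" and "isbn_13" are lists of ISBN strings.
    isbns = edition.get("isbn_10", []) + edition.get("isbn_13", [])
    return {
        "edition_author": authors,
        "edition_first_author": authors[:1],  # first listed author, if any
        "edition_work": works,
        "edition_isbn": isbns,
    }


# Hypothetical sample record, made up for illustration.
sample_edition = {
    "key": "/books/3180A3",
    "authors": [{"key": "/authors/A1"}, {"key": "/authors/A2"}],
    "works": [{"key": "/works/W1"}],
    "isbn_13": ["9780000000002"],
}
links = edition_links(sample_edition)
print(links["edition_first_author"])
```

An edition with no `authors` array simply yields empty lists, which corresponds to contributing no rows to the author link tables.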
We extract the following tables from OpenLibrary works:
- `work_author` links `work` and `author` to record a work's authors.
- `work_first_author` links `work` and `author` to record a work's first author.
- `work_subject` holds the `subjects` entries for each work.
- `author_name`
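
The work tables can be sketched the same way. The example below assumes that a work's `authors` array nests each author key one level deeper than editions do (`{"author": {"key": ...}}`) and that `subjects` is a list of strings. Both shapes are assumptions about the OpenLibrary JSON, and the function name `work_links` and the sample record are hypothetical.

```python
def work_links(work: dict) -> dict:
    """Flatten the author and subject arrays of one work record (illustrative only)."""
    # Assumed shape: each entry looks like {"author": {"key": "/authors/..."}}.
    authors = [
        entry["author"]["key"]
        for entry in work.get("authors", [])
        if isinstance(entry.get("author"), dict) and "key" in entry["author"]
    ]
    # Assumed shape: "subjects" is a list of plain strings.
    subjects = list(work.get("subjects", []))
    return {
        "work_author": authors,
        "work_first_author": authors[:1],  # first listed author, if any
        "work_subject": subjects,
    }


# Hypothetical sample record, made up for illustration.
sample_work = {
    "key": "/works/W1",
    "authors": [{"author": {"key": "/authors/A1"}}, {"role": "editor"}],
    "subjects": ["Fiction", "Adventure"],
}
work_rows = work_links(sample_work)
print(work_rows["work_subject"])
```

Entries without a usable author key, like the second one in the sample, are skipped rather than raising an error.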