title: OpenLibrary
parent: Data Model
The DVC control files automatically download the appropriate version. The version can be
updated by modifying the
Imported data lives in the
The import is controlled by the following DVC steps:
Run ol-schema.sql to set up the base schema.
: Import raw OpenLibrary works from
: Import raw OpenLibrary editions from
: Import raw OpenLibrary authors from
Run ol-index.sql to index the book data and extract tables.
Run ol-book-info.sql to extract additional book data into tables.
OpenLibrary provides its data as JSON. It is imported as-is into a JSONB column in three tables:
Each of these has the following columns:
type_id : A numeric record identifier generated at import.
: The OpenLibrary identifier key (e.g.
type_data : The raw JSON data containing the record.
We use PostgreSQL's JSON operators and functions to extract the data from these tables for the rest of the OpenLibrary data model.
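
As an illustration of the kind of extraction involved, the following Python sketch pulls the author and work keys out of a single edition record. It is not the project's SQL; it only mirrors in plain Python what PostgreSQL's `->>` operator and `jsonb_array_elements` function do on a JSONB column. The sample record and its field layout (`authors` and `works`, each a list of objects with a `key`) are assumptions modeled on the public OpenLibrary dump format, and `extract_keys` is a hypothetical helper.

```python
import json

# Sample edition record. The layout is an assumption modeled on OpenLibrary dumps.
RAW_EDITION = """
{
  "key": "/books/OL1M",
  "title": "Example Edition",
  "authors": [{"key": "/authors/OL1A"}, {"key": "/authors/OL2A"}],
  "works": [{"key": "/works/OL1W"}]
}
"""


def extract_keys(record, field):
    """Return the `key` of every object in the array stored under `field`.

    A missing field yields an empty list, much as expanding a missing
    JSONB array in SQL yields no rows.
    """
    return [
        item["key"]
        for item in record.get(field, [])
        if isinstance(item, dict) and "key" in item
    ]


edition = json.loads(RAW_EDITION)
edition_key = edition["key"]                    # comparable to ->> 'key' in SQL
author_keys = extract_keys(edition, "authors")  # one entry per listed author
work_keys = extract_keys(edition, "works")      # one entry per linked work
first_author = author_keys[0] if author_keys else None

print(edition_key, author_keys, work_keys, first_author)
```

Keeping the raw JSON intact and deriving link tables from it means the extraction can be revised and re-run without re-importing the dump.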
We extract the following tables from OpenLibrary editions:
author to record an edition's authors.
author to record an edition's first author.
: Links each edition to its
: The raw ISBNs for each edition (not ISBN IDs).
: Links ISBNs, editions, and works, along with the book code derived from an edition's work and edition IDs. If an edition belongs to multiple works, it appears multiple times here. This table violates fourth normal form (4NF).
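
To see why an edition can appear more than once in the linking table, the sketch below builds one row per (ISBN, work) pair for a single edition. The record layout (`isbn_10`, `isbn_13`, `works`) is an assumption modeled on OpenLibrary dumps, `link_rows` is a hypothetical helper, and the book code is left out because its derivation is not given here.

```python
from itertools import product


def link_rows(edition):
    """Return (isbn, edition_key, work_key) rows for one edition record."""
    isbns = edition.get("isbn_10", []) + edition.get("isbn_13", [])
    # Assumption for this sketch: an edition without a work still gets rows,
    # with work_key set to None.
    works = [w["key"] for w in edition.get("works", [])] or [None]
    return [(isbn, edition["key"], work) for isbn, work in product(isbns, works)]


# Hypothetical edition with two ISBNs that belongs to two works.
edition = {
    "key": "/books/OL1M",
    "isbn_10": ["0000000000"],
    "isbn_13": ["9780000000002"],
    "works": [{"key": "/works/OL1W"}, {"key": "/works/OL2W"}],
}

rows = link_rows(edition)
for row in rows:
    print(row)
```

Two ISBNs and two works produce four rows. An edition's ISBNs and its works are independent multi-valued facts, and storing their combinations in one table is what puts it outside 4NF.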
We extract the following tables from OpenLibrary works:
author to record a work's authors.
author to record a work's first author.
subjects entries for each work.
: The names for each author. An author may have more than one listed name; this extracts
all of them.
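
A minimal sketch of collecting every listed name for one author follows. The field names (`name`, `personal_name`, `alternate_names`) are assumptions modeled on OpenLibrary author records, `author_names` is a hypothetical helper, and dropping repeated names is a choice made for this sketch rather than documented behavior of the pipeline.

```python
def author_names(author):
    """Collect each distinct listed name for one author record, in order."""
    candidates = [author.get("name"), author.get("personal_name")]
    candidates += author.get("alternate_names", [])
    names = []
    for name in candidates:
        # Skip missing or empty values and names already collected.
        if name and name not in names:
            names.append(name)
    return names


# Hypothetical author record with a repeated name and two alternates.
author = {
    "key": "/authors/OL1A",
    "name": "Jane Q. Example",
    "personal_name": "Jane Q. Example",
    "alternate_names": ["J. Q. Example", "Jane Example"],
}

names = author_names(author)
print(names)
```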