One of our sources of book data is the Library of Congress MDSConnect Books bibliography records.
We download and import the XML versions of these files.
Imported data lives under the
The import is controlled by the following DVC steps:
loc-mds-schema.sql to set up the base schema.
: Import raw MARC data from
: Parse ISBNs from LOC ISBN records.
loc-mds-index-books.sql to index the book data and extract tables.
loc-mds-book-info.sql to extract additional book data into tables.
The `locmds.book_marc_fields` table contains the raw data imported from the MARC files, as MARC fields. The LOC book data follows the MARC 21 Bibliographic Data format; the various tags, field codes, and indicators are defined there. This table is not very useful on its own, but it is the source from which the other tables are derived.
It has the following columns:
: The record identifier (generated at import)
: The field number. This corresponds to a single MARC field entry; rows in this table
containing data from MARC subfields will share a `fld_no` with their containing field.
: The MARC tag; either a three-digit number, or
LDR for the MARC leader.
: MARC indicators. Their meanings are defined in the MARC specification.
: MARC subfield code.
: The raw textual content of the MARC field or subfield.
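To make the row layout concrete, here is a minimal Python sketch of how one MARC record might flatten into rows of this shape. Only `fld_no` is a column name taken from the description above; `rec_id`, `tag`, `ind1`, `ind2`, `sf_code`, and `contents` are hypothetical names, the numbering from 1 is an assumption, and the sample record is invented for illustration. It is not the importer's actual code.

```python
from typing import NamedTuple, Optional


class MarcRow(NamedTuple):
    rec_id: int              # hypothetical name: record identifier
    fld_no: int              # field number within the record
    tag: str                 # three-digit tag, or "LDR" for the leader
    ind1: Optional[str]      # hypothetical name: first indicator
    ind2: Optional[str]      # hypothetical name: second indicator
    sf_code: Optional[str]   # hypothetical name: subfield code
    contents: str            # hypothetical name: raw text


def flatten_record(rec_id, fields):
    """Flatten (tag, ind1, ind2, data) entries into one row per value.

    `data` is a plain string for the leader and control fields, or a
    list of (subfield code, text) pairs for data fields.
    """
    rows = []
    # Assumption: field numbers start at 1 within each record.
    for fld_no, (tag, ind1, ind2, data) in enumerate(fields, start=1):
        if isinstance(data, str):
            rows.append(MarcRow(rec_id, fld_no, tag, ind1, ind2, None, data))
        else:
            # Every subfield row shares the fld_no of its containing field.
            for code, text in data:
                rows.append(MarcRow(rec_id, fld_no, tag, ind1, ind2, code, text))
    return rows


# Invented sample: a leader plus one title field with two subfields.
sample = [
    ("LDR", None, None, "00000nam a2200000 a 4500"),
    ("245", "1", "0", [("a", "An example title /"), ("c", "A. Author.")]),
]
rows = flatten_record(1, sample)
```

The two subfields of the 245 field become two rows with the same `fld_no`, which is what lets later queries regroup subfields into their original field.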
We then extract a number of tables and views from this MARC data. These tables include:
: Code information for each book record.
- MARC Control Number
- Library of Congress Control Number (LCCN)
- Record status
- Record type
- Bibliographic level

More information about the last three is in the [leader specification](https://www.loc.gov/marc/bibliographic/bdleader.html).
: A subset of
book_record_info intended to capture the actual books in the collection,
as opposed to other types of materials. We consider a book to be anything that has MARC record type ‘a’ or ‘t’ (language material), and is not also classified as a government record in MARC field 008.
: Textual ISBNs as extracted from LOC records. The actual ISBN strings (tag 020 subfield ‘a’) are
quite messy; the Rust program `parse-isbns` parses out ISBNs, along with additional tags or descriptors, from the ISBN strings using a number of best-effort heuristics. This table contains the results of that process.
: Map book records to their ISBNs.
: Author names for book records. This only extracts the primary author name (MARC field 100
: Book publication year (MARC field 260 subfield ‘c’).
: Book title (MARC field 245 subfield ‘a’).
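The record codes and the book rule described above can be sketched in Python as follows. The leader positions (05 record status, 06 record type, 07 bibliographic level) come from the MARC 21 leader specification. Reading the government flag from position 28 of field 008, and treating a blank or `|` there as "not a government publication", is an assumption about how the rule is applied; the actual extraction is done in SQL and may differ. The function names are hypothetical.

```python
def leader_codes(leader: str) -> dict:
    """Pull three codes from the leader by their MARC 21 character positions."""
    return {
        "status": leader[5],     # Leader/05: record status
        "rec_type": leader[6],   # Leader/06: type of record
        "bib_level": leader[7],  # Leader/07: bibliographic level
    }


def is_book(leader: str, field_008: str) -> bool:
    """Book rule: record type 'a' or 't', and not a government publication."""
    if leader_codes(leader)["rec_type"] not in ("a", "t"):
        return False
    # 008/28 is the government publication code for book records.
    # Assumption: blank and '|' (no attempt to code) both mean "not government".
    gov_code = field_008[28] if len(field_008) > 28 else " "
    return gov_code in (" ", "|")


# Invented sample values for illustration.
book_leader = "00000nam a2200000 a 4500"
music_leader = "00000ncm a2200000 a 4500"
plain_008 = " " * 40
gov_008 = " " * 28 + "f" + " " * 11

codes = leader_codes(book_leader)
```

With these samples, `is_book(book_leader, plain_008)` holds, while the same leader paired with `gov_008`, or the music leader paired with either 008 value, is rejected.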
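As an illustration of why the ISBN strings in tag 020 subfield ‘a’ need heuristic parsing, the sketch below splits a raw string into a normalized ISBN and a trailing descriptor. It is written in Python for brevity and is not the logic of the Rust `parse-isbns` program; the regular expression, the function name, and the sample strings are all assumptions made for this example, and no checksum validation is attempted.

```python
import re

# A leading run of digits, hyphens, and spaces ending in a digit or X,
# followed by whatever descriptor text remains.
ISBN_RE = re.compile(r"^\s*([0-9][0-9\- ]{8,16}[0-9Xx])\s*(.*)$")


def parse_isbn(raw: str):
    """Return (isbn, descriptor) for a raw 020$a string, or None if no ISBN is found."""
    m = ISBN_RE.match(raw)
    if m is None:
        return None
    # Drop hyphens and spaces, and normalize a trailing check character 'x'.
    isbn = re.sub(r"[\- ]", "", m.group(1)).upper()
    if len(isbn) not in (10, 13):
        return None
    # Trim surrounding parentheses and punctuation from the descriptor.
    descriptor = m.group(2).strip(" ():;.") or None
    return isbn, descriptor


examples = [
    "0306406152 (pbk.)",
    "978-0-306-40615-7 (hardcover : alk. paper)",
    "080442957X",
    "not an isbn",
]
parsed = [parse_isbn(s) for s in examples]
```

Real records contain many more variations than these (multiple ISBNs in one string, prices, stray punctuation), which is why the actual parser relies on a larger set of best-effort heuristics.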