title | parent | nav_order |
---|---|---|
VIAF | Data Model | 4 |
{: .no_toc}
We source author data from the Virtual International Authority File (VIAF), as downloaded from their data dumps. This file is slow and error-prone to download, and is *not* auto-downloaded.
Imported data lives in the `viaf` schema.
The import is controlled by the following DVC steps:

- `schemas/viaf-schema.dvc` runs `viaf-schema.sql` to set up the base schema.
- `import/viaf.dvc` imports `data/viaf-clusters-marc21.xml.gz`.
- `index/viaf-index.dvc` runs `viaf-index.sql` to index the MARC data and extract tables.

VIAF data is in MARC 21 Authority Record format. The raw MARC data is imported into the `marc_field` table with the same format as LOC.
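
As a rough illustration of what a field-level table holds, the sketch below flattens one MARC XML record into one row per control field or subfield. The row layout (`rec_id`, `fld_no`, `tag`, `ind1`, `ind2`, `sf_code`, `contents`), the function name `flatten_record`, and the sample record are assumptions made for illustration; they are not taken from `viaf-schema.sql` or from the actual importer.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARC XML ("MARC 21 slim") documents.
MARC_NS = "{http://www.loc.gov/MARC21/slim}"

# Hypothetical sample record, not real VIAF data.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">example0001</controlfield>
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Doe, Jane</subfield>
    <subfield code="d">1900-1980</subfield>
  </datafield>
</record>"""


def flatten_record(rec_id, xml_text):
    """Return one row per control field or subfield.

    Assumed row layout: (rec_id, fld_no, tag, ind1, ind2, sf_code, contents).
    """
    rows = []
    fld_no = 0
    for field in ET.fromstring(xml_text):
        if field.tag == MARC_NS + "controlfield":
            fld_no += 1
            # Control fields have no indicators or subfield codes.
            rows.append((rec_id, fld_no, field.get("tag"), None, None, None, field.text))
        elif field.tag == MARC_NS + "datafield":
            fld_no += 1
            for sf in field.findall(MARC_NS + "subfield"):
                rows.append((rec_id, fld_no, field.get("tag"),
                             field.get("ind1"), field.get("ind2"),
                             sf.get("code"), sf.text))
    return rows


for row in flatten_record(1, SAMPLE):
    print(row)
```

Storing one row per subfield keeps the raw data queryable by tag and subfield code, so later steps can extract derived tables without re-parsing the XML.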
We extract the following tables for VIAF authors:

- `author_name`
- `author_gender`
The MARC gender field is defined as the author's gender identity. It allows identities from an open vocabulary, along with start and end dates for the validity of each identity.
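
In the MARC 21 authority format this is field 375, with the gender term in subfield `a` and the start and end of its validity period in subfields `s` and `t`. The sketch below pulls those values out of a MARC XML record. The function name `gender_entries` and the sample record are hypothetical, and the sketch makes no claim about how `viaf-index.sql` performs the extraction.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARC XML ("MARC 21 slim") documents.
MARC_NS = "{http://www.loc.gov/MARC21/slim}"

# Hypothetical sample record, not real VIAF data.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="375" ind1=" " ind2=" ">
    <subfield code="a">female</subfield>
    <subfield code="s">1990</subfield>
  </datafield>
  <datafield tag="375" ind1=" " ind2=" ">
    <subfield code="a">unknown</subfield>
  </datafield>
</record>"""


def gender_entries(xml_text):
    """Collect gender term, start, and end from every 375 field in a record."""
    entries = []
    root = ET.fromstring(xml_text)
    for field in root.findall(MARC_NS + "datafield"):
        if field.get("tag") != "375":
            continue
        # If a subfield code repeats within a field, the last value wins.
        subs = {sf.get("code"): sf.text
                for sf in field.findall(MARC_NS + "subfield")}
        entries.append({
            "gender": subs.get("a"),
            "start": subs.get("s"),  # None when no start date is recorded
            "end": subs.get("t"),    # None when no end date is recorded
        })
    return entries


for entry in gender_entries(SAMPLE):
    print(entry)
```

Because the field is repeatable, a single record can yield several entries, which is one reason a per-author gender value needs care to interpret.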
The Program for Cooperative Cataloging Task Group on Gender in Name Authority Records produced a report with recommendations for how to record this field. Many libraries contributing to the Library of Congress file, from which many VIAF records are sourced, follow these recommendations, but it is not safe to assume they are universally followed by all VIAF contributors.
Further, as near as we can tell, VIAF removes all non-binary gender identities or converts them to ‘unknown’.
This data should only be used with great care. We discuss these limitations in the extended preprint.