Install DagsHub:
pip install dagshub
To stream this data directly from DagsHub:
from dagshub.streaming import DagsHubFilesystem
fs = DagsHubFilesystem(".", repo_url="https://dagshub.com/DagsHub-Datasets/pdb-3d-structural-biology-data-dataset")
fs.listdir("s3://pdbsnapshots")
Description
The “Protein Data Bank (PDB) archive” was established in 1971 as the first open-access digital data archive in biology. It is a collection of three-dimensional (3D) atomic-level structures of biological macromolecules (i.e., proteins, DNA, and RNA) and their complexes with one another and with various small-molecule ligands (e.g., US FDA-approved drugs, enzyme cofactors).
For each PDB entry (unique identifier: 1abc or PDB_0000001abc), multiple data files contain the 3D atomic coordinates, the sequences of the biological macromolecules, information about any small molecules/ligands present in the entry, details about the structure-determination experiment, author and publication information, experimental data, and the wwPDB validation report. Additional content stored in the archive includes documentation, summary reports, and software (among others).
The PDB is a jointly managed core archive of the Worldwide Protein Data Bank (wwPDB) partnership [RCSB Protein Data Bank (RCSB PDB, rcsb.org); Protein Data Bank in Europe (PDBe, pdbe.org); Protein Data Bank Japan (PDBj, pdbj.org); Electron Microscopy Data Bank (EMDB, emdb-empiar.org); and Biological Magnetic Resonance Bank (BMRB, bmrb.io)]. RCSB PDB serves as the wwPDB-designated Archive Keeper for the Protein Data Bank. The additional wwPDB Core Archives are the Electron Microscopy Data Bank (wwPDB-designated Archive Keeper: EMDB) and the Biological Magnetic Resonance Bank (wwPDB-designated Archive Keeper: BMRB).
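Because an entry may be referenced by either identifier form, it can help to normalize IDs before building file paths. The following is a minimal sketch, not an official wwPDB routine. It assumes that the short form is one digit followed by three alphanumeric characters, and that the long form is "PDB_" followed by the short form left-padded with zeros, as in the example above. The helper name to_classic_id is hypothetical.

```python
import re

# Assumption: a short PDB ID is one digit followed by three alphanumerics.
_CLASSIC = re.compile(r"[0-9][a-z0-9]{3}")


def to_classic_id(identifier: str) -> str:
    """Return the lowercase four-character form of a PDB identifier.

    Accepts either the short form ("1abc") or the prefixed form
    ("PDB_0000001abc"). Hypothetical helper, written for illustration only.
    """
    text = identifier.strip().lower()
    if text.startswith("pdb_"):
        padded = text[len("pdb_"):]
        # Assumption: everything before the last four characters is zero padding.
        padding, text = padded[:-4], padded[-4:]
        if padding.strip("0"):
            raise ValueError(f"unexpected padding in identifier: {identifier!r}")
    if not _CLASSIC.fullmatch(text):
        raise ValueError(f"not a recognizable PDB identifier: {identifier!r}")
    return text
```

For example, to_classic_id("PDB_0000001abc") and to_classic_id("1ABC") both return "1abc", while malformed input raises ValueError rather than being silently truncated.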
Additional information
Documentation
Update frequency
New and updated data files are published weekly, with each release going live on Wednesdays at 00:00 UTC.
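A script that mirrors the archive may want to know when the next weekly release is due. The sketch below computes the first Wednesday at 00:00 UTC strictly after a given moment. The function name next_release is hypothetical, and the code assumes the schedule stated above holds without exceptions (for example, no holiday shifts).

```python
from datetime import datetime, timedelta, timezone

WEDNESDAY = 2  # datetime.weekday() counts Monday as 0


def next_release(now: datetime) -> datetime:
    """Return the first Wednesday 00:00 UTC strictly after `now`.

    `now` must be timezone-aware; it is converted to UTC first.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    # Days from today's UTC midnight to the coming Wednesday (0 if today).
    days_ahead = (WEDNESDAY - midnight.weekday()) % 7
    candidate = midnight + timedelta(days=days_ahead)
    # If this Wednesday's midnight has already passed, move one week on.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate
```

Calling next_release(datetime.now(timezone.utc)) gives the upcoming release time; subtracting the current time from it yields how long a sync job could sleep.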
Managed by
wwpdb.org