OpenProteinSet

Stream data with DagsHub Direct Data Access (DDA):

from dagshub.streaming import DagsHubFilesystem

fs = DagsHubFilesystem(".", repo_url="https://dagshub.com/DagsHub-Datasets/openfold-dataset")

fs.listdir("s3://openfold")

Description:

Multiple sequence alignments (MSAs) for 132,000 unique Protein Data Bank (PDB) chains, covering 640,000 PDB chains in total, and for 4,850,000 UniClust30 clusters. Template hits are also provided for the PDB chains and for 270,000 UniClust30 clusters chosen for maximal diversity and MSA depth. MSAs were generated with HHBlits (-n3) and JackHMMER against MGnify, BFD, UniRef90, and UniClust30, while templates were identified from PDB70 with HHSearch, all following the procedures outlined in the supplement to the AlphaFold 2 Nature paper (Jumper et al., 2021). We expect the database to be broadly useful to structural biologists training or validating deep learning models for protein structure prediction and related tasks.

Update Frequency:

Never

Managed By:

OpenFold

Resources:

  1. resource:
    • Description: A repository of MSAs and template hits.
    • ARN: arn:aws:s3:::openfold
    • Region: us-east-1
    • Type: S3 Bucket
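The resource entry gives only the bucket ARN and region. Here is a minimal sketch of listing the bucket's top-level layout directly with the AWS SDK for Python instead of DDA. It assumes that `boto3` is installed and that the bucket permits anonymous (unsigned) reads, which is typical for Registry of Open Data buckets but is not stated here. The helper names `bucket_from_arn` and `list_top_level` are hypothetical and are not part of any DagsHub or OpenFold API.

```python
"""Sketch: list top-level prefixes of the public openfold S3 bucket.

Assumptions (not confirmed by the dataset page): anonymous (unsigned) read
access is allowed, and boto3 is installed. Helper names are hypothetical.
"""


def bucket_from_arn(arn: str) -> str:
    """Extract the bucket name from an S3 ARN in the standard 'aws' partition,
    e.g. 'arn:aws:s3:::openfold' -> 'openfold'."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[:3] != ["arn", "aws", "s3"]:
        raise ValueError(f"not an S3 ARN: {arn!r}")
    # The resource part is '<bucket>' or '<bucket>/<key>'; keep only the bucket.
    bucket = parts[5].split("/", 1)[0]
    if not bucket:
        raise ValueError(f"no bucket name in ARN: {arn!r}")
    return bucket


def list_top_level(arn: str, region: str = "us-east-1", max_keys: int = 100) -> list[str]:
    """Return first-level prefixes of the bucket (a single page of results only)."""
    # Imported lazily so bucket_from_arn can be used without boto3 installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    # Unsigned requests: no AWS credentials are sent (assumes public read access).
    s3 = boto3.client("s3", region_name=region, config=Config(signature_version=UNSIGNED))
    resp = s3.list_objects_v2(
        Bucket=bucket_from_arn(arn), Delimiter="/", MaxKeys=max_keys
    )
    # With Delimiter="/", top-level "directories" are returned as CommonPrefixes.
    return [p["Prefix"] for p in resp.get("CommonPrefixes", [])]
```

Calling `list_top_level("arn:aws:s3:::openfold")` issues one unauthenticated `ListObjectsV2` request and returns at most `max_keys` prefixes. Listing the full bucket would require following continuation tokens, for example with a boto3 paginator.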

Tags:

openfold, msa, protein, protein template, protein folding, alphafold, open source software, life sciences, aws-pds

Publication:

  1. publication:
    • Title: OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization
    • URL: https://biorxiv.org/content/10.1101/2022.11.20.517210
    • AuthorName: Ahdritz, Gustaf; Bouatta, Nazim; Kadyan, Sachin; Xia, Qinghui; Gerecke, William; O'Donnell, Timothy J; et al.
About

openfold-dataset originates from the Registry of Open Data on AWS.