
ClinVar - Data Lakehouse Ready

Stream data with DagsHub Direct Data Access (DDA):

from dagshub.streaming import DagsHubFilesystem

fs = DagsHubFilesystem(".", repo_url="https://dagshub.com/DagsHub-Datasets/clinvar-dataset")

fs.listdir("s3://aws-roda-hcls-datalake/clinvar_summary_variants/")

Description:

ClinVar is a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. ClinVar thus facilitates access to and communication about the relationships asserted between human variation and observed health status, and the history of that interpretation. ClinVar processes submissions reporting variants found in patient samples, assertions made regarding their clinical significance, information about the submitter, and other supporting data. The alleles described in submissions are mapped to reference sequences, and reported according to the HGVS standard. ClinVar then presents the data for interactive users as well as those wishing to use ClinVar in daily workflows and other local applications. ClinVar works in collaboration with interested organizations to meet the needs of the medical genetics community as efficiently and effectively as possible. This representation of ClinVar is stored in Parquet format and most easily utilized through Amazon Athena. Follow the documentation link for install instructions (< 2 minute install).
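Because this copy of ClinVar is stored as Parquet, it can in principle also be read directly into a DataFrame instead of going through Athena. The following is a minimal sketch, not an official access method: it assumes `pandas`, `pyarrow`, and `s3fs` are installed and that the bucket permits anonymous reads (not confirmed here). The helper names `clinvar_uri` and `load_clinvar` are hypothetical; the bucket and prefix are taken from the Resources section of this README.

```python
# Bucket and prefix as listed under "Resources" in this README.
BUCKET = "aws-roda-hcls-datalake"
PREFIX = "clinvar_summary_variants/"


def clinvar_uri(bucket=BUCKET, prefix=PREFIX):
    """Build the s3:// URI for the ClinVar Parquet prefix."""
    return f"s3://{bucket}/{prefix}"


def load_clinvar(columns=None):
    """Read the Parquet files under the prefix into a pandas DataFrame.

    Assumption: anonymous access is allowed. If it is not, drop
    storage_options and rely on your configured AWS credentials.
    """
    import pandas as pd  # imported lazily so the URI helper has no dependencies

    return pd.read_parquet(
        clinvar_uri(),
        columns=columns,  # None reads every column
        storage_options={"anon": True},
    )
```

Calling `load_clinvar()` downloads the whole table, so passing a `columns` list is worth considering once the schema is known.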

Update Frequency:

Every Sunday at 1AM UTC
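For pipelines that cache this dataset, the schedule above can be turned into a "next refresh" timestamp. This is a small sketch using only the standard library; `next_refresh` is a hypothetical helper, and it assumes the update starts exactly at 01:00 UTC on Sundays as stated.

```python
from datetime import datetime, timedelta, timezone


def next_refresh(now):
    """Return the first Sunday 01:00 UTC strictly after `now`.

    `now` must be a timezone-aware datetime.
    """
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=1, minute=0, second=0, microsecond=0)
    # datetime.weekday(): Monday is 0, Sunday is 6.
    candidate += timedelta(days=(6 - candidate.weekday()) % 7)
    if candidate <= now:
        # Already at or past this week's slot, so move to next Sunday.
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    print(next_refresh(datetime.now(timezone.utc)))
```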

Managed By:

https://aws.amazon.com/

Resources:

  1. resource:
    • Description: ClinVar
    • ARN: arn:aws:s3:::aws-roda-hcls-datalake/clinvar_summary_variants/
    • Region: us-east-1
    • Type: S3 Bucket
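Most tools expect an `s3://` URI rather than an ARN. As an illustration, the ARN listed above can be split into bucket and key prefix with a few lines of plain Python. The function names `parse_s3_arn` and `s3_uri` are hypothetical, and the sketch only handles the simple `arn:aws:s3:::bucket/key` form used here.

```python
S3_ARN_PREFIX = "arn:aws:s3:::"


def parse_s3_arn(arn):
    """Split an S3 ARN of the form arn:aws:s3:::bucket/key into (bucket, key)."""
    if not arn.startswith(S3_ARN_PREFIX):
        raise ValueError(f"not an S3 ARN: {arn!r}")
    resource = arn[len(S3_ARN_PREFIX):]
    # Everything before the first "/" is the bucket; the rest is the key prefix.
    bucket, _, key = resource.partition("/")
    if not bucket:
        raise ValueError(f"ARN has no bucket name: {arn!r}")
    return bucket, key


def s3_uri(arn):
    """Convert an S3 ARN into the equivalent s3:// URI."""
    bucket, key = parse_s3_arn(arn)
    return f"s3://{bucket}/{key}"


if __name__ == "__main__":
    arn = "arn:aws:s3:::aws-roda-hcls-datalake/clinvar_summary_variants/"
    print(s3_uri(arn))  # s3://aws-roda-hcls-datalake/clinvar_summary_variants/
```

The resulting URI matches the path passed to `fs.listdir` in the streaming example at the top of this README.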

Tags:

chemistry, genetic, genomic, life sciences, biotech blueprint, parquet

About

clinvar-dataset originates from the Registry of Open Data on AWS
