Open Targets - Data Lakehouse Ready

Stream data with DDA:

from dagshub.streaming import DagsHubFilesystem

fs = DagsHubFilesystem(".", repo_url="https://dagshub.com/DagsHub-Datasets/opentargets-dataset")

fs.listdir("s3://aws-roda-hcls-datalake/opentargets_latest/")

Description:

This is a Parquet representation of the Open Targets Platform's latest export. The Open Targets Platform integrates evidence from genetics, genomics, transcriptomics, drugs, animal models and scientific literature to score and rank target-disease associations for drug target identification. The Open Targets Platform (https://www.targetvalidation.org) is a freely available resource for the integration of genetics, genomics, and chemical data to aid systematic drug target identification and prioritisation. This dataset is 'Lakehouse Ready', meaning you can query the data in place, straight out of the Registry of Open Data S3 bucket. Deploy this dataset's corresponding CloudFormation template to create the AWS Glue catalog entries in your account in about 30 seconds. That one step enables you to write SQL with AWS Athena, build dashboards and charts with Amazon QuickSight, perform HPC with AWS EMR, or join the data into your AWS Redshift clusters. More detail is available in the documentation: https://github.com/aws-samples/data-lake-as-code/blob/roda/README.md
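As a rough illustration of the "write SQL with AWS Athena" step, the sketch below assembles the arguments for Athena's StartQueryExecution call and, optionally, submits them with boto3. The database name, table name, and results bucket are hypothetical placeholders, not names taken from this dataset; the real ones depend on what the CloudFormation template creates in your account.

```python
from typing import Any, Dict


def build_athena_request(sql: str, database: str, output_location: str) -> Dict[str, Any]:
    """Assemble keyword arguments for Athena's StartQueryExecution call."""
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def submit_query(request: Dict[str, Any], region: str = "us-east-1") -> str:
    """Submit the query and return its execution ID.

    Requires boto3 and AWS credentials; not called in this sketch.
    """
    import boto3  # imported lazily so the request builder works without boto3

    client = boto3.client("athena", region_name=region)
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical names: replace them with the Glue database and table that the
# CloudFormation template registers, and with a results bucket you own.
request = build_athena_request(
    sql='SELECT * FROM "example_table" LIMIT 10',
    database="example_opentargets_db",
    output_location="s3://example-athena-results/opentargets/",
)
```

Keeping the request builder separate from the submission step means the query parameters can be inspected or logged before any AWS call is made.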

Update Frequency:

Within two weeks of new Open Targets releases

Managed By:

https://aws.amazon.com/

Resources:

  1. resource:

    • Description: Latest Open Targets release. Updated within two weeks of each new Open Targets version. Information on Open Targets releases can be found here.
    • ARN: arn:aws:s3:::aws-roda-hcls-datalake/opentargets_latest/
    • Region: us-east-1
    • Type: S3 Bucket
  2. resource:

    • Description: Open Targets v20.06. Does not update.
    • ARN: arn:aws:s3:::aws-roda-hcls-datalake/opentargets_20_06/
    • Region: us-east-1
    • Type: S3 Bucket
  3. resource:

    • Description: Open Targets v19.11. Does not update.
    • ARN: arn:aws:s3:::aws-roda-hcls-datalake/opentargets_1911/
    • Region: us-east-1
    • Type: S3 Bucket
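The ARNs listed above can be turned into the `s3://` form used in the streaming snippet. The following is a minimal sketch using only the standard library; the helper name `arn_to_s3_uri` and the release labels used as dictionary keys are illustrative choices, not part of any DagsHub or AWS API.

```python
ARN_PREFIX = "arn:aws:s3:::"

# ARNs copied from the resource list; the keys are informal labels.
RESOURCE_ARNS = {
    "latest": "arn:aws:s3:::aws-roda-hcls-datalake/opentargets_latest/",
    "20.06": "arn:aws:s3:::aws-roda-hcls-datalake/opentargets_20_06/",
    "19.11": "arn:aws:s3:::aws-roda-hcls-datalake/opentargets_1911/",
}


def arn_to_s3_uri(arn: str) -> str:
    """Convert an S3 ARN (arn:aws:s3:::bucket/key) into an s3://bucket/key URI."""
    if not arn.startswith(ARN_PREFIX):
        raise ValueError(f"not an S3 ARN: {arn!r}")
    return "s3://" + arn[len(ARN_PREFIX):]


S3_URIS = {release: arn_to_s3_uri(arn) for release, arn in RESOURCE_ARNS.items()}

print(S3_URIS["latest"])  # s3://aws-roda-hcls-datalake/opentargets_latest/
```

Any of the resulting URIs could then be passed to `fs.listdir(...)` in place of the `opentargets_latest` path to browse a pinned release instead of the rolling one.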

Tags:

chemistry, genetic, genomic, molecule, life sciences, biotech blueprint, parquet

About

opentargets-dataset originates from the Registry of Open Data on AWS.
