
Variant Effect Predictor (VEP) and the Loss-Of-Function Transcript Effect Estimator (LOFTEE) Plugin Dataset for Machine Learning

Install DagsHub:

pip install dagshub

To stream this data directly from DagsHub:

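from dagshub.streaming import DagsHubFilesystem

# Attach the dataset repository to the current directory; files are fetched on demand.
fs = DagsHubFilesystem(".", repo_url="https://dagshub.com/DagsHub-Datasets/hail-vep-pipeline-dataset")

# List the top-level entries of the bucket that backs this dataset.
fs.listdir("s3://hail-vep-pipeline")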
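
The listing call above returns raw entry names, so a small wrapper can make the output easier to scan. The following is a minimal sketch, not part of the DagsHub API: `list_entries` and `filter_entries` are hypothetical helper names, and the only thing assumed about `fs` is the `listdir` method already used in the snippet above.

```python
from typing import Iterable, List, Protocol


class SupportsListdir(Protocol):
    """Anything with a listdir(path) method, such as the `fs` object above."""

    def listdir(self, path: str) -> Iterable[str]: ...


# Bucket path taken from the streaming snippet above.
BUCKET_PATH = "s3://hail-vep-pipeline"


def list_entries(fs: SupportsListdir, path: str = BUCKET_PATH) -> List[str]:
    """Return the entry names under `path`, sorted for stable output."""
    return sorted(str(name) for name in fs.listdir(path))


def filter_entries(entries: Iterable[str], keyword: str) -> List[str]:
    """Keep entries whose name contains `keyword` (case-insensitive)."""
    needle = keyword.lower()
    return [entry for entry in entries if needle in entry.lower()]
```

With the `fs` object created earlier, `list_entries(fs)` would return the sorted listing, and `filter_entries(...)` can narrow it by a substring. The entry names in the bucket are not documented on this page, so any keyword passed to `filter_entries` is a guess until the listing has been inspected.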

Description

VEP determines the effect of genetic variants (SNPs, insertions, deletions, CNVs, or structural variants) on genes, transcripts, and protein sequences, as well as on regulatory regions. The European Bioinformatics Institute produces the VEP tool and database and releases updates every 1 to 6 months. The latest release contains 267 genomes from 232 species, with 5,567,663 protein-coding genes. This dataset hosts the last 5 releases for human, rat, and zebrafish. It also hosts the reference files required by the Loss-Of-Function Transcript Effect Estimator (LOFTEE) plugin, which is commonly used with VEP.

Additional information

Update frequency

New packages are added as soon as they are available and confirmed to work with recent versions of Hail.

License

VEP use is governed by the Apache 2.0 license, and LOFTEE use is governed by the MIT license.

Related datasets

Allen Brain Observatory – Visual Coding AWS Public Data Set

Allen Cell Imaging Collections

Biological and Physical Sciences (BPS) Microscopy Benchmark Training Dataset

Cancer Cell Line Encyclopedia (CCLE)

