
UK Biobank Pan-Ancestry Summary Statistics Dataset for Machine Learning

Install DagsHub:

pip install dagshub

To stream this data directly from DagsHub:

from dagshub.streaming import DagsHubFilesystem

fs = DagsHubFilesystem(".", repo_url="https://dagshub.com/DagsHub-Datasets/broad-pan-ukb-dataset")

fs.listdir("s3://pan-ukb-us-east-1")
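Once you have a per-phenotype file, you need to parse it row by row. The sketch below assumes the file is gzip- or bgzip-compressed (bgzip output is a chain of gzip members, so Python's `gzip` module reads it) and that missing values appear as `NA`. Both are assumptions, so check them against the actual files. The helper names `iter_summary_stats` and `to_float` and the column names in the sample are hypothetical. It accepts any binary file object, such as a local copy opened with `open(path, "rb")`. If the `fs` object above can also open a file from the listed bucket (an assumption; check the DagsHub streaming docs), you could pass that handle in the same way.

```python
import csv
import gzip
import io
from typing import BinaryIO, Dict, Iterator, Optional


def iter_summary_stats(raw: BinaryIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per variant from a gzip- or bgzip-compressed TSV.

    bgzip (BGZF) output is a series of concatenated gzip members, which
    Python's gzip module decompresses transparently.
    """
    # Text mode decodes the bytes; newline="" is what the csv module expects.
    with gzip.open(raw, mode="rt", encoding="utf-8", newline="") as text:
        for row in csv.DictReader(text, delimiter="\t"):
            yield row


def to_float(value: str) -> Optional[float]:
    """Convert a TSV cell to float, treating empty or 'NA' cells as missing."""
    return None if value in ("", "NA") else float(value)


# Usage with an in-memory stand-in file; the column names are hypothetical.
sample = gzip.compress(
    b"chr\tpos\tref\talt\tbeta_meta\tse_meta\n"
    b"1\t12345\tA\tG\t0.02\tNA\n"
)
rows = list(iter_summary_stats(io.BytesIO(sample)))
print(rows[0]["pos"], to_float(rows[0]["beta_meta"]), to_float(rows[0]["se_meta"]))
```

Streaming rows through a generator keeps memory flat. This matters for summary-statistics files, which can hold millions of variants.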

Description

A multi-ancestry analysis of 7,221 phenotypes using a generalized mixed model association testing framework, spanning 16,119 genome-wide association studies. We provide a standard meta-analysis across all populations and, for each trait, a leave-one-population-out meta-analysis. The data are provided as TSV files (one per phenotype) and as a Hail MatrixTable (all phenotypes and variants). Metadata is provided in phenotype and variant manifests.
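The description does not say which meta-analysis method is used. To illustrate what "across all populations" versus "leave-one-population-out" means, here is a minimal sketch that assumes a fixed-effect inverse-variance weighted (IVW) meta-analysis, a common choice for combining per-population GWAS estimates. The function names, population labels, and effect sizes are hypothetical and do not come from the dataset.

```python
import math
from typing import Dict, Optional, Tuple

Estimate = Tuple[float, float]  # (beta, standard error) for one population


def ivw_meta(estimates: Dict[str, Estimate]) -> Optional[Estimate]:
    """Fixed-effect inverse-variance weighted meta-analysis.

    Each population gets weight w = 1 / se**2; the pooled effect is
    sum(w * beta) / sum(w) and its standard error is sqrt(1 / sum(w)).
    Returns None when there is nothing to pool.
    """
    if not estimates:
        return None
    weights = {pop: 1.0 / se ** 2 for pop, (_, se) in estimates.items()}
    total = sum(weights.values())
    beta = sum(weights[pop] * b for pop, (b, _) in estimates.items()) / total
    return beta, math.sqrt(1.0 / total)


def leave_one_population_out(
    estimates: Dict[str, Estimate],
) -> Dict[str, Optional[Estimate]]:
    """Re-run the meta-analysis once per population, excluding that population."""
    return {
        left_out: ivw_meta(
            {pop: est for pop, est in estimates.items() if pop != left_out}
        )
        for left_out in estimates
    }


def neglog10_p(beta: float, se: float) -> float:
    """Two-sided -log10 p-value of a Wald z-test: p = erfc(|z| / sqrt(2))."""
    z = abs(beta / se)
    return -math.log10(math.erfc(z / math.sqrt(2)))


# Illustrative numbers only, not values from the dataset.
per_population = {"AFR": (0.2, 0.1), "EUR": (0.4, 0.1)}
print(ivw_meta(per_population))
print(leave_one_population_out(per_population))
```

Comparing each leave-one-out estimate with the all-population estimate shows how strongly a single population drives the pooled signal. Reporting p-values on the -log10 scale avoids underflow for very small values.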

Additional information

Update frequency

Occasional

Managed by

Analytic and Translational Genetics Unit, Massachusetts General Hospital and the Broad Institute

License

CC BY 4.0 (usage may be restricted by UK Biobank; see the “Downloads page” for details)

Related datasets

Allen Brain Observatory – Visual Coding AWS Public Data Set

Allen Cell Imaging Collections

Biological and Physical Sciences (BPS) Microscopy Benchmark Training Dataset

Cancer Cell Line Encyclopedia (CCLE)
