1000 Genomes
Binding DB – Data Lakehouse Ready
IBL Behavioral Data on AWS
OpenCell on AWS
1000 Genomes Phase 3 Reanalysis with DRAGEN 3.5 – Data Lakehouse Ready
Allen Ivy Glioblastoma Atlas
Encyclopedia of DNA Elements (ENCODE)
Allen Brain Observatory – Visual Coding AWS Public Data Set
Tabula Sapiens
SiPeCaM (Sitios Permanentes de la Calibración y Monitoreo de la Biodiversidad)
Variant Effect Predictor (VEP) and the Loss-Of-Function Transcript Effect Estimator (LOFTEE) Plugin
COVID-19 Genome Sequence Dataset
Oxford Nanopore Technologies Benchmark Datasets
PubSeq – Public Sequence Resource
DNAStack COVID19 SRA Data
AWS iGenomes
stdpopsim species resources
Hecatomb Databases
Cloud Indexes for Bowtie, Kraken, HISAT, and Centrifuge
International Neuroimaging Data-Sharing Initiative (INDI)
Cell Organelle Segmentation in Electron Microscopy (COSEM) on AWS
Toxicant Exposures and Responses by Genomic and Epigenomic Regulators of Transcription (TaRGET)
UCSC Genome Browser Sequence and Annotations
Natural Scenes Dataset
Pacific Ocean Sound Recordings
Cancer Cell Line Encyclopedia (CCLE)
Allen Cell Imaging Collections
Distributed Archives for Neurophysiology Data Integration (DANDI)
COVID-19 Data Lake
Ohio State Cardiac MRI Raw Data (OCMR)
NOAA Water-Column Sonar Data Archive
REDASA COVID-19 Open Data
Physionet
CoMMpass from the Multiple Myeloma Research Foundation
GATK Test Data
Human Cancer Models Initiative (HCMI) Cancer Model Development Center
National Cancer Institute Center for Cancer Research – Diffuse Large B Cell Lymphoma (DLBCL) Genomics and Expression
4D Nucleome (4DN)
recount3
OpenProteinSet
BossDB Open Neuroimagery Datasets
Cell Painting Image Collection
Sea Around Us Global Fisheries Catch Data
Tabula Muris Senis
IBL Neuropixels Reproducible Ephys Data on AWS
COVID-19 Open Research Dataset (CORD-19)
1000 Genomes Phase 3 Reanalysis with DRAGEN 3.5 and 3.7
GATK Structural Variation (SV) Data
InRad COVID-19 X-Ray and CT Scans
UK Biobank Pan-Ancestry Summary Statistics
Clinical Trial Sequencing Project – Diffuse Large B-Cell Lymphoma
CAncer MEtastases in LYmph nOdes challeNge (CAMELYON) Dataset
Biological and Physical Sciences (BPS) Microscopy Benchmark Training Dataset
3000 Rice Genomes Project
Open Targets – Data Lakehouse Ready
Genome in a Bottle on AWS
ZINC Database
The Genome Modeling System
Human PanGenomics Project
Sounds of Central African landscapes
Oregon Health & Science University Chronic Neutrophilic Leukemia Dataset
IBL Neuropixels Brainwide Map on AWS
iNaturalist Licensed Observation Images
Open NeuroData
Genome Aggregation Database (gnomAD)
Refgenie reference genome assets
University of British Columbia Sunflower Genome Dataset
CMS 2008-2010 Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF) in OMOP Common Data Model
Australasian Genomes
Medical Segmentation Decathlon
OpenCRAVAT
Africa Soil Information Service (AfSIS) Soil Chemistry
STOIC2021 Training
iSDAsoil
Folding@home COVID-19 Datasets
Synthea synthetic patient generator data in OMOP Common Data Model
Foundation Medicine Adult Cancer Clinical Dataset (FM-AD)
OpenNeuro
COVID-19 Harmonized Data
Allen Mouse Brain Atlas
Mouse Brain Anatomy: MouseLight Imagery
Serratus: Ultra-deep Search for Novel Viruses – Versioned Data Release
Nanopore Reference Human Genome
Protein Data Bank 3D Structural Biology Data
Google Brain Genomics Sequencing Dataset for Benchmarking and Development
Conformational Space of Short Peptides
Seattle Alzheimer’s Disease Brain Cell Atlas (SEA-AD)
Cell Painting Gallery
The Singapore Nanopore Expression Data Set
NIH NCBI PubMed Central (PMC) Article Datasets – Full-Text Biomedical and Life Sciences Journal Articles on AWS
ChEMBL – Data Lakehouse Ready
Genome Aggregation Database (gnomAD) – Data Lakehouse Ready
National Herbarium of NSW
Broad Genome References
COVID-19 Molecular Structure and Therapeutics Hub
Tabula Muris
Basic Local Alignment Search Tool (BLAST) Databases
Clinical Proteomic Tumor Analysis Consortium 2 (CPTAC-2)
Orcasound – bioacoustic data for marine conservation
Clinical Proteomic Tumor Analysis Consortium 3 (CPTAC-3)
Fly Brain Anatomy: FlyLight Gen1 and Split-GAL4 Imagery
The Human Microbiome Project
ClinVar – Data Lakehouse Ready
QIIME 2 User Tutorial Datasets
UniProt
Open Bioinformatics Reference Data for Galaxy
CIViC (Clinical Interpretation of Variants in Cancer)
Biological and Physical Sciences (BPS) RNA Sequencing Benchmark Training Dataset
TIGER Training
Therapeutically Applicable Research to Generate Effective Treatments (TARGET)
Genome Ark