|
- ## Impute-First Data Files
- # Benchmarking_Files/
- * HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz - GIAB HG001 Benchmark VCF
- * HG001_GRCh38_1_22_v4.2.1_benchmark.bed - GIAB HG001 Benchmark BED (High-confidence regions)
- * GRCh38_mrg_full_gene.bed - GIAB GRCh38 CMRG regions
- * GRCh38_stratifications/extract_common_HG001_benchmark_sites.sh - Script that identifies the regions shared by the HG001 benchmark and the GRCh38 stratifications. BED files with a 'confident_' prefix are outputs of this script.
- * GRCh38_stratifications/BED files:
- * [GRCh38_MHC.bed,
- GRCh38_allOtherDifficultregions.bed,
- GRCh38_alldifficultregions.bed,
- GRCh38_alllowmapandsegdupregions.bed]
- * [confident_GRCh38_MHC.bed,
- confident_GRCh38_allOtherDifficultregions.bed,
- confident_GRCh38_alldifficultregions.bed,
- confident_GRCh38_alllowmapandsegdupregions.bed]
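
The relationship between a stratification BED and its 'confident_' counterpart can be illustrated with a small interval intersection. This is only a sketch of the general idea, not the contents of extract_common_HG001_benchmark_sites.sh, which is not reproduced here. The function names (`read_bed`, `intersect`, `confident_regions`) and the toy coordinates are hypothetical, and the sweep assumes each BED file is sorted with no overlapping intervals within a chromosome.

```python
from collections import defaultdict


def read_bed(lines):
    """Parse BED lines into {chrom: sorted list of (start, end)} (hypothetical helper)."""
    by_chrom = defaultdict(list)
    for line in lines:
        # Skip blank lines and common BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))
    return {chrom: sorted(ivs) for chrom, ivs in by_chrom.items()}


def intersect(a, b):
    """Two-pointer sweep over two sorted interval lists.

    Assumes intervals within each list do not overlap one another.
    """
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:  # half-open BED intervals overlap
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def confident_regions(strat_lines, bench_lines):
    """Keep only the parts of a stratification that lie inside benchmark regions."""
    strat, bench = read_bed(strat_lines), read_bed(bench_lines)
    return {c: intersect(strat[c], bench[c]) for c in strat if c in bench}


# Toy coordinates, not taken from the real files.
strat = ["chr6\t100\t200", "chr6\t300\t400"]
bench = ["chr6\t150\t350", "chr7\t0\t10"]
result = confident_regions(strat, bench)
print(result)
```

With real data, the same idea would be applied once per stratification BED against HG001_GRCh38_1_22_v4.2.1_benchmark.bed.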
- # Personalized_HG001_calls/ - Personalized variant callsets for HG001 generated by Impute-first (IF).
- * [IF_bbbc5.vcf.gz,
- IF_bbgc5.vcf.gz,
- IF_rgc1.vcf.gz,
- IF_rgc5.vcf.gz]
- # Workflow_GATK-HC_outputs/ - Downstream variant calling outputs generated with GATK-HC for the workflows discussed in the Impute-first framework.
- * [BWAMEM_HG001.vcf.gz,
- bbbc5_HG001.vcf.gz,
- bbgc5_HG001.vcf.gz,
- pangenome_HG001.vcf.gz,
- rgc1_HG001.vcf.gz,
- rgc5_HG001.vcf.gz,
- vglinear_HG001.vcf.gz]
- # Supplementary_Tables/
- * Table_S1.csv: Alignment score comparison between the rgc1-imputed diploid personalized reference and the standard linear GRCh38 reference.
- * Table_S2.csv: Variant calling performance metrics for HG001 real donor reads, stratified by SNVs, indels, and overall variants, across different reference combinations within GIAB HG001 high-confidence regions.
- * Table_S3.csv: Variant calling performance metrics for HG001 real donor reads for overall variants, across different reference combinations within GIAB GRCh38 Complex Medically Relevant Gene (CMRG) regions.
- * Table_S4.csv: Variant calling performance metrics for HG001 real donor reads for overall variants, across different reference combinations within GIAB GRCh38 stratifications.
- * Table_S5.csv: Time and memory used by each VG Giraffe-based workflow during the personalization, downstream indexing, and downstream alignment steps.
- * Table_S6.csv: Index and graph size measurements for the VG Giraffe graphs generated in various workflows.
|