README.txt

##Impute-First Data Files
#Benchmarking_Files/
* HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz - GIAB HG001 Benchmark VCF
* HG001_GRCh38_1_22_v4.2.1_benchmark.bed - GIAB HG001 Benchmark BED (High-confidence regions)
* GRCh38_mrg_full_gene.bed - GIAB GRCh38 CMRG regions
* GRCh38_stratifications/extract_common_HG001_benchmark_sites.sh - Script that identifies regions common to the HG001 benchmark and the GRCh38 stratifications. BED files with a 'confident_' prefix are outputs of this script.
* GRCh38_stratifications/BED files:
* [GRCh38_MHC.bed,
GRCh38_allOtherDifficultregions.bed,
GRCh38_alldifficultregions.bed,
GRCh38_alllowmapandsegdupregions.bed]
* [confident_GRCh38_MHC.bed,
confident_GRCh38_allOtherDifficultregions.bed,
confident_GRCh38_alldifficultregions.bed,
confident_GRCh38_alllowmapandsegdupregions.bed]
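
The 'confident_' BED files are described above as the regions shared by the HG001 benchmark and each stratification. The contents of extract_common_HG001_benchmark_sites.sh are not reproduced in this README, so the following is only a minimal sketch of that kind of interval intersection in plain Python. It assumes standard BED coordinates (0-based, half-open); the function names are hypothetical and are not taken from the script.

```python
from collections import defaultdict


def parse_bed(lines):
    """Parse BED text lines into {chrom: [(start, end), ...]}, sorted by start."""
    intervals = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals[chrom].append((int(start), int(end)))
    for chrom in intervals:
        intervals[chrom].sort()
    return intervals


def merge(sorted_intervals):
    """Merge overlapping or adjacent intervals in a start-sorted list."""
    merged = []
    for start, end in sorted_intervals:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a, b):
    """Two-pointer sweep over two merged, start-sorted interval lists."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:  # half-open intervals: equal bounds mean no overlap
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def confident_regions(benchmark_lines, stratification_lines):
    """Return (chrom, start, end) tuples present in both BED inputs."""
    bench = parse_bed(benchmark_lines)
    strat = parse_bed(stratification_lines)
    result = []
    for chrom in bench:
        if chrom in strat:
            for start, end in intersect(merge(bench[chrom]), merge(strat[chrom])):
                result.append((chrom, start, end))
    return result
```

For example, passing the lines of HG001_GRCh38_1_22_v4.2.1_benchmark.bed and GRCh38_MHC.bed would yield intervals comparable in spirit to confident_GRCh38_MHC.bed, although the actual script may differ in details such as sorting or merging.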
#Personalized_HG001_calls/ - Impute-first (IF) generated personalized variant callsets for HG001.
* [IF_bbbc5.vcf.gz,
IF_bbgc5.vcf.gz,
IF_rgc1.vcf.gz,
IF_rgc5.vcf.gz]
#Workflow_GATK-HC_outputs/ - Downstream variant calling outputs generated with GATK-HC for the workflows discussed in the Impute-first framework.
* [BWAMEM_HG001.vcf.gz,
bbbc5_HG001.vcf.gz,
bbgc5_HG001.vcf.gz,
pangenome_HG001.vcf.gz,
rgc1_HG001.vcf.gz,
rgc5_HG001.vcf.gz,
vglinear_HG001.vcf.gz]
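
As a quick sanity check on any of the VCF files listed above, the sketch below counts SNV and indel alleles in a gzip-compressed VCF using only the Python standard library. The classification rule (single-base REF and ALT is an SNV, differing lengths is an indel) is a common simplification and an assumption here, not the definition used to produce the supplementary tables. The function names are hypothetical.

```python
import gzip


def classify_allele(ref, alt):
    """Classify one REF/ALT pair with a simple length-based rule."""
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "other"  # symbolic or breakend alleles
    if len(ref) == 1 and len(alt) == 1:
        return "snv"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. multi-nucleotide substitutions


def count_variant_types(vcf_gz_path):
    """Count SNV, indel, and other ALT alleles in a gzip-compressed VCF."""
    counts = {"snv": 0, "indel": 0, "other": 0}
    with gzip.open(vcf_gz_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            ref = fields[3]
            for alt in fields[4].split(","):
                if alt in (".", "*"):
                    continue  # missing allele or spanning deletion
                counts[classify_allele(ref, alt)] += 1
    return counts
```

Calling count_variant_types("rgc1_HG001.vcf.gz") would return a dictionary of allele counts; multi-allelic records contribute one count per ALT allele.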
#Supplementary_Tables/
* Table_S1.csv: Alignment score comparison between the rgc1-imputed diploid personalized reference and the standard linear GRCh38 reference.
* Table_S2.csv: Variant calling performance metrics for HG001 real donor reads, stratified by SNVs, indels, and overall variants, across different reference combinations within GIAB HG001 high-confidence regions.
* Table_S3.csv: Variant calling performance metrics for HG001 real donor reads for overall variants, across different reference combinations within GIAB GRCh38 Complex Medically Relevant Gene (CMRG) regions.
* Table_S4.csv: Variant calling performance metrics for HG001 real donor reads for overall variants, across different reference combinations within GIAB GRCh38 stratifications.
* Table_S5.csv: Time and memory taken by each VG Giraffe-based workflow in the personalization, downstream indexing, and downstream alignment steps.
* Table_S6.csv: Index and graph size measurements for the VG Giraffe graphs generated in various workflows.
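
This README does not state which performance metrics Tables S2 to S4 report. Benchmarking against GIAB truth sets is commonly summarized with precision, recall, and F1, so the sketch below shows how those three values follow from true-positive, false-positive, and false-negative counts. Treat the choice of metrics and the function name as assumptions; check the CSV headers for the actual columns.

```python
def performance_metrics(tp, fp, fn):
    """Compute precision, recall, and F1 from variant comparison counts.

    tp: calls matching the benchmark, fp: calls absent from the benchmark,
    fn: benchmark variants that were not called.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For instance, performance_metrics(90, 10, 30) gives a precision of 0.9 and a recall of 0.75.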