pichuan/force_calling_v1.1.0.md

## force_calling_v1.1.0.md

      
    Raw
  

              force_calling_v1.1.0.md
            
          
    Tutorial: Force calling with DeepVariant v1.1.0

Using --variant_caller=vcf_candidate_importer in make_examples, we can use DeepVariant to act like a genotyper to force call candidates in the VCF file specified in --proposed_variants.
Context

Before reading this guide, read the WES Case Study and metrics:

https://github.com/google/deepvariant/blob/r1.1/docs/deepvariant-exome-case-study.md
https://github.com/google/deepvariant/blob/r1.1/docs/metrics.md

The instructions below will be similar to:
https://raw.githubusercontent.com/google/deepvariant/r1.1/scripts/run_wes_case_study_doc
ker.sh
But we’ll provide new flags for you to run in force calling mode.
Machine

I got a GCE instance following this command:
https://github.com/google/deepvariant/blob/r1.1/docs/deepvariant-details.md#command-for-a-cpu-only-machine-on-google-cloud-platform
I used ubuntu-1804-lts instead of 1604. (Ubuntu 16.04 should work too. I just picked one for this tutorial.)
Install necessary tools

sudo apt -y update && \
  sudo apt -y install docker.io && \
  sudo apt -y install bedtools && \
  sudo apt -y install tabix

Use run_deepvariant.py with customized args

With the files from the WES Case Study, run:
bedtools intersect -header \
  -a benchmark/HG003_GRCh38_1_22_v4.2_benchmark.vcf.gz \
  -b benchmark/HG003_GRCh38_1_22_v4.2_benchmark.bed \
  | bgzip > input/force_calling.candidates.vcf.gz
  tabix -p vcf input/force_calling.candidates.vcf.gz

Prepare a VCF file with proposed sites (that we can do force calling on):
Replace the run_deepvariant section in the WES Case Study with this:
mkdir -p output
mkdir -p output/intermediate_results_dir

BIN_VERSION="1.1.0"
MAKE_EXAMPLE_ARGS="variant_caller=vcf_candidate_importer,proposed_variants=/input/force_calling.candidates.vcf.gz,vsc_min_count_snps=0,vsc_min_count_indels=0,vsc_min_fraction_snps=0,vsc_min_fraction_indels=0"

sudo docker run \
  -v "${PWD}/input":"/input" \
  -v "${PWD}/output":"/output" \
  -v "${PWD}/reference":"/reference" \
  google/deepvariant:"${BIN_VERSION}" \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type WES \
  --ref /reference/GRCh38_no_alt_analysis_set.fasta \
  --reads /input/HG003.novaseq.wes_idt.100x.dedup.bam \
  --regions /input/idt_capture_novogene.grch38.bed \
  --output_vcf /output/HG003.output.vcf.gz \
  --output_gvcf /output/HG003.output.g.vcf.gz \
  --num_shards $(nproc) \
  --make_examples_extra_args="${MAKE_EXAMPLE_ARGS}" \
  --intermediate_results_dir /output/intermediate_results_dir

Callset is generated at output/HG003.output.vcf.gz.
Runtime

make_examples step took:
real    7m34.109s
user    446m27.685s   
sys     9m19.618s

call_variants step took:
real    0m59.444s
user    25m49.371s
sys     1m25.558s

postprocess_variants step took:
real    0m44.978s
user    0m43.506s
sys     0m8.865s

Evaluate with hap.py

Baseline (WES metrics):


Type
# TP
# FN
# FP
Recall
Precision
F1_Score


Indel
1025
28
19
0.973409
0.982126
0.977748


SNP
25005
319
169
0.987403
0.993287
0.990337


Force calling on NIST v4.2 truth:


Type
# TP
# FN
# FP
Recall
Precision
F1_Score


Indel
1026
27
7
0.974359
0.993340
0.983758


SNP
25022
302
48
0.988075
0.998085
0.993055


hap.py output:
Benchmarking Summary:
  Type Filter  TRUTH.TOTAL  TRUTH.TP  TRUTH.FN  QUERY.TOTAL  QUERY.FP  QUERY.UNK  FP.gt  METRIC.Recall  METRIC.Precision  METRIC.Frac_NA  METRIC.F1_Score  TRUTH.TOTAL.TiTv_ratio  QUERY.TOTAL.TiTv_ratio  TRUTH.TOTAL.het_hom_ratio  QUERY.TOTAL.het_hom_ratio
 INDEL    ALL         1053      1026        27         1051         7          0      7       0.974359          0.993340               0         0.983758                     NaN                     NaN                   1.752717                   1.833333
 INDEL   PASS         1053      1026        27         1051         7          0      7       0.974359          0.993340               0         0.983758                     NaN                     NaN                   1.752717                   1.833333
   SNP    ALL        25324     25022       302        25071        48          0     48       0.988075          0.998085               0         0.993055                2.856273                2.860508                   1.625246                   1.625773
   SNP   PASS        25324     25022       302        25071        48          0     48       0.988075          0.998085               0         0.993055                2.856273                2.860508                   1.625246                   1.625773
Type	# TP	# FN	# FP	Recall	Precision	F1_Score
Indel	1025	28	19	0.973409	0.982126	0.977748
SNP	25005	319	169	0.987403	0.993287	0.990337
Type	# TP	# FN	# FP	Recall	Precision	F1_Score
Indel	1026	27	7	0.974359	0.993340	0.983758
SNP	25022	302	48	0.988075	0.998085	0.993055