- Sample 1 g.vcf
- Sample 2 g.vcf
- Human reference genome (gs://genomics-public-data/references/hg38/v0/Homo_sapiens_assembly38.fasta)
The g.vcf files and the databases produced from them are available on Google Drive: https://drive.google.com/file/d/1rq5p5YpDYY6n0OqTL05IaDAl8vlMhxNv/view?usp=sharing
Creating, updating and reading a local database works:
SAMPLE1=gvcfs/sample-1-chr20.g.vcf.gz
SAMPLE2=gvcfs/sample-2-chr20.g.vcf.gz
DB=my-local-database
GATK=../gatk-4.1.9.0/gatk
$GATK --java-options "-Xmx10g -Xms5g" \
GenomicsDBImport \
--genomicsdb-workspace-path $DB \
-L chr20 \
-V $SAMPLE1
# [March 14, 2021 at 12:59:13 PM UTC] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.07 minutes.
# Runtime.totalMemory()=5372903424
# Tool returned:
# true
$GATK --java-options "-Xmx10g -Xms5g" \
GenomicsDBImport \
--genomicsdb-update-workspace-path $DB \
-L chr20 \
-V $SAMPLE2
# [March 14, 2021 at 12:59:42 PM UTC] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.07 minutes.
# Runtime.totalMemory()=5372903424
# Tool returned:
# true
$GATK --java-options "-Xmx10g -Xms5g" \
SelectVariants \
-R ../Homo_sapiens_assembly38.fasta \
-V gendb://$DB \
-L chr20 -O test.vcf.gz
# [March 14, 2021 at 1:00:20 PM UTC] org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants done. Elapsed time: 0.10 minutes.
# Runtime.totalMemory()=5372903424
The same steps against a GCS bucket hang at the final read:
SAMPLE1=gvcfs/sample-1-chr20.g.vcf.gz
SAMPLE2=gvcfs/sample-2-chr20.g.vcf.gz
DB=genomicsdb-test/my-gcs-database
export GOOGLE_APPLICATION_CREDENTIALS=SA-secret.json
$GATK --java-options "-Xmx10g -Xms5g" \
GenomicsDBImport \
--genomicsdb-workspace-path gs://$DB \
-L chr20 \
-V $SAMPLE1
# [March 14, 2021 at 12:42:51 PM UTC] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.60 minutes.
# Runtime.totalMemory()=5372903424
# Tool returned:
# true
$GATK --java-options "-Xmx10g -Xms5g" \
GenomicsDBImport \
--genomicsdb-update-workspace-path gs://$DB \
-L chr20 \
-V $SAMPLE2
# [March 14, 2021 at 12:44:24 PM UTC] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.55 minutes.
# Runtime.totalMemory()=5372903424
# Tool returned:
# true
$GATK --java-options "-Xmx10g -Xms5g" \
SelectVariants \
-R ../Homo_sapiens_assembly38.fasta \
-V gendb.gs://$DB \
-L chr20 -O test2.vcf.gz
# 12:46:02.138 INFO SelectVariants - Done initializing engine
# 12:46:02.306 INFO ProgressMeter - Starting traversal
# 12:46:02.307 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
# ---and hangs here
This has also been reported on the GATK forum.
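To make the hang easier to reproduce without blocking a terminal, the read step can be wrapped in a time limit. The following is only a sketch: `run_with_limit` is a hypothetical helper (not part of GATK), and it assumes GNU coreutils `timeout`, which exits with status 124 when the limit expires.

```shell
# Hypothetical helper: run a command under a time limit and flag a timeout.
# Assumes GNU coreutils `timeout` (exit status 124 means the limit expired).
run_with_limit() {
  limit="$1"; shift
  timeout "$limit" "$@"
  status=$?
  if [ "$status" -eq 124 ]; then
    echo "TIMED OUT after $limit: $*" >&2
  fi
  return "$status"
}

# Example (commented out; needs GATK and GCS access). The 10m limit is arbitrary.
# run_with_limit 10m $GATK --java-options "-Xmx10g -Xms5g" SelectVariants \
#   -R ../Homo_sapiens_assembly38.fasta -V gendb.gs://$DB -L chr20 -O test2.vcf.gz
```

A non-zero status other than 124 would point to an ordinary failure rather than the hang described above.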
Database sizes:
gsutil du -sh gs://genomicsdb-test/my-gcs-database
# 1.31 MiB gs://genomicsdb-test/my-gcs-database
du -sh my-local-database/
# 1.6M my-local-database/
If I copy my-gcs-database to the local filesystem, SelectVariants works:
gsutil cp -r gs://genomicsdb-test/my-gcs-database .
$GATK --java-options "-Xmx10g -Xms5g" SelectVariants \
-R ../Homo_sapiens_assembly38.fasta \
-V gendb://my-gcs-database -L chr20 \
-O test2.vcf.gz
# [March 14, 2021 at 1:02:36 PM UTC] org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants done. Elapsed time: 0.10 minutes.
# Runtime.totalMemory()=5372903424
du -sh my-local-database my-gcs-database
# 1.6M my-local-database
# 1.6M my-gcs-database
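Since `du -sh` rounds and counts disk blocks, it may be useful to compare the two directories by exact file count and byte total instead. This is a minimal sketch; `summarize_dir` is a hypothetical helper, and matching totals would only suggest, not prove, that the copied database is complete.

```shell
# Hypothetical helper: print the number of regular files and their total size in bytes.
summarize_dir() {
  files=$(find "$1" -type f | wc -l | tr -d ' ')
  bytes=$(find "$1" -type f -exec cat {} + | wc -c | tr -d ' ')
  echo "$1: $files files, $bytes bytes"
}

# Example (commented out; needs the two database directories from above).
# summarize_dir my-local-database
# summarize_dir my-gcs-database
```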