

@lmtani
Last active March 14, 2021 13:20

Inputs

  • Sample 1 g.vcf
  • Sample 2 g.vcf
  • Human reference genome (gs://genomics-public-data/references/hg38/v0/Homo_sapiens_assembly38.fasta)

The g.vcf files and the databases produced below are available on Google Drive: https://drive.google.com/file/d/1rq5p5YpDYY6n0OqTL05IaDAl8vlMhxNv/view?usp=sharing
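SelectVariants below is run with `-R ../Homo_sapiens_assembly38.fasta`, and GATK tools expect a FASTA index (`.fai`) and a sequence dictionary (`.dict`) next to the reference. A minimal sketch of a hypothetical helper, `reference_companions` (not part of GATK), that lists those companion paths for a FASTA path or gs:// URI so they can be fetched together with the reference. It assumes the companions are stored under the conventional names alongside the FASTA:

```shell
# Hypothetical helper: print the companion files GATK expects next to a
# reference FASTA: "<ref>.fai" and "<ref without .fasta/.fa>.dict".
reference_companions() {
  local fasta="$1" base
  case "$fasta" in
    *.fasta) base="${fasta%.fasta}" ;;
    *.fa) base="${fasta%.fa}" ;;
    *) base="$fasta" ;;
  esac
  printf '%s\n' "$fasta.fai" "$base.dict"
}

# Usage (assumes gsutil is installed; copies into the parent directory used by -R):
# REF=gs://genomics-public-data/references/hg38/v0/Homo_sapiens_assembly38.fasta
# gsutil cp "$REF" $(reference_companions "$REF") ..
```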

Using a local workspace (works)

SAMPLE1=gvcfs/sample-1-chr20.g.vcf.gz
SAMPLE2=gvcfs/sample-2-chr20.g.vcf.gz
DB=my-local-database
GATK=../gatk-4.1.9.0/gatk

$GATK --java-options "-Xmx10g -Xms5g" \
    GenomicsDBImport \
    --genomicsdb-workspace-path $DB \
    -L chr20 \
    -V $SAMPLE1
    
# [March 14, 2021 at 12:59:13 PM UTC] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.07 minutes.
# Runtime.totalMemory()=5372903424
# Tool returned:
# true
    
$GATK --java-options "-Xmx10g -Xms5g" \
    GenomicsDBImport \
    --genomicsdb-update-workspace-path $DB \
    -L chr20 \
    -V $SAMPLE2
    
# [March 14, 2021 at 12:59:42 PM UTC] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.07 minutes.
# Runtime.totalMemory()=5372903424
# Tool returned:
# true
    
$GATK --java-options "-Xmx10g -Xms5g" \
    SelectVariants \
    -R ../Homo_sapiens_assembly38.fasta \
    -V gendb://$DB \
    -L chr20 -O test.vcf.gz
    
# [March 14, 2021 at 1:00:20 PM UTC] org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants done. Elapsed time: 0.10 minutes.
# Runtime.totalMemory()=5372903424

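The local run creates the workspace with `--genomicsdb-workspace-path` for the first sample and appends the second sample with `--genomicsdb-update-workspace-path`. A minimal sketch of a hypothetical helper, `genomicsdb_import_flag` (not a GATK option), that picks the right flag for a local workspace. It assumes that a directory which exists and is non-empty is an already-initialized workspace:

```shell
# Hypothetical helper, not part of GATK: choose the GenomicsDBImport flag for a
# local workspace. Assumption: a directory that exists and is non-empty is an
# already-initialized workspace; anything else still needs to be created.
genomicsdb_import_flag() {
  local ws="$1"
  if [ -d "$ws" ] && [ -n "$(ls -A "$ws")" ]; then
    printf '%s\n' "--genomicsdb-update-workspace-path"
  else
    printf '%s\n' "--genomicsdb-workspace-path"
  fi
}

# Usage, reusing SAMPLE1, SAMPLE2, DB and GATK from the commands above:
# for vcf in "$SAMPLE1" "$SAMPLE2"; do
#   $GATK --java-options "-Xmx10g -Xms5g" GenomicsDBImport \
#     "$(genomicsdb_import_flag "$DB")" "$DB" -L chr20 -V "$vcf"
# done
```

This only inspects the local filesystem, so it does not apply as-is to a gs:// workspace.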
Using a Google Cloud Storage workspace (still not working)

GATK=../gatk-4.1.9.0/gatk
SAMPLE1=gvcfs/sample-1-chr20.g.vcf.gz
SAMPLE2=gvcfs/sample-2-chr20.g.vcf.gz
DB=genomicsdb-test/my-gcs-database
export GOOGLE_APPLICATION_CREDENTIALS=SA-secret.json

$GATK --java-options "-Xmx10g -Xms5g" \
    GenomicsDBImport \
    --genomicsdb-workspace-path gs://$DB \
    -L chr20 \
    -V $SAMPLE1
    
# [March 14, 2021 at 12:42:51 PM UTC] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.60 minutes.
# Runtime.totalMemory()=5372903424
# Tool returned:
# true
    
$GATK --java-options "-Xmx10g -Xms5g" \
    GenomicsDBImport \
    --genomicsdb-update-workspace-path gs://$DB \
    -L chr20 \
    -V $SAMPLE2
    
# [March 14, 2021 at 12:44:24 PM UTC] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.55 minutes.
# Runtime.totalMemory()=5372903424
# Tool returned:
# true
    
$GATK --java-options "-Xmx10g -Xms5g" \
    SelectVariants \
    -R ../Homo_sapiens_assembly38.fasta \
    -V gendb.gs://$DB \
    -L chr20 -O test2.vcf.gz
    
# 12:46:02.138 INFO  SelectVariants - Done initializing engine
# 12:46:02.306 INFO  ProgressMeter - Starting traversal
# 12:46:02.307 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
# ... and the command hangs here, with no further output

Also reported on the GATK forum.
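The import commands take a plain workspace path (`$DB` locally, `gs://$DB` on GCS), while SelectVariants reads the same workspace through a `gendb://` URI locally and a `gendb.gs://` URI on GCS. A minimal sketch of a hypothetical helper, `gendb_uri`, that derives the SelectVariants form from the import form so both commands can share one variable. It only rewrites strings and does not check that the workspace exists:

```shell
# Hypothetical helper: map the path given to GenomicsDBImport to the -V value
# used by SelectVariants in the commands above.
#   gs://bucket/path -> gendb.gs://bucket/path
#   local/path       -> gendb://local/path
# Values that already carry a gendb scheme are returned unchanged.
gendb_uri() {
  case "$1" in
    gs://*) printf 'gendb.gs://%s\n' "${1#gs://}" ;;
    gendb://*|gendb.*://*) printf '%s\n' "$1" ;;
    *) printf 'gendb://%s\n' "$1" ;;
  esac
}

# Usage:
# $GATK --java-options "-Xmx10g -Xms5g" SelectVariants \
#   -R ../Homo_sapiens_assembly38.fasta -V "$(gendb_uri "gs://$DB")" \
#   -L chr20 -O test2.vcf.gz
```

For `gs://$DB` this yields `gendb.gs://genomicsdb-test/my-gcs-database`, the same value the hanging command already passes.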

Database sizes:

gsutil du -sh gs://genomicsdb-test/my-gcs-database
# 1.31 MiB     gs://genomicsdb-test/my-gcs-database

du -sh my-local-database/
# 1.6M	my-local-database/

If I copy my-gcs-database to the local filesystem, SelectVariants works:

gsutil cp -r gs://genomicsdb-test/my-gcs-database .
$GATK --java-options "-Xmx10g -Xms5g" SelectVariants \
  -R ../Homo_sapiens_assembly38.fasta \
  -V gendb://my-gcs-database -L chr20 \
  -O test2.vcf.gz
  
# [March 14, 2021 at 1:02:36 PM UTC] org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants done. Elapsed time: 0.10 minutes.
# Runtime.totalMemory()=5372903424
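The copy-then-query workaround above can be wrapped in a small function. A minimal sketch of a hypothetical helper, `stage_gcs_workspace`, that copies a gs:// workspace into a fresh temporary directory with `gsutil cp -r` and prints the `gendb://` URI of the copy. It assumes `gsutil` is installed and authenticated, and it leaves removal of the temporary directory to the caller:

```shell
# Hypothetical helper: copy a GCS GenomicsDB workspace to a new local temporary
# directory and print the gendb:// URI of the local copy on stdout.
# Assumes gsutil is on PATH and already authenticated.
stage_gcs_workspace() {
  local src="$1" dest_parent
  dest_parent=$(mktemp -d) || return 1
  # Any gsutil output is sent to stderr so stdout carries only the URI.
  gsutil cp -r "$src" "$dest_parent/" >&2 || return 1
  printf 'gendb://%s/%s\n' "$dest_parent" "$(basename "$src")"
}

# Usage:
# uri=$(stage_gcs_workspace "gs://$DB") && \
#   $GATK --java-options "-Xmx10g -Xms5g" SelectVariants \
#     -R ../Homo_sapiens_assembly38.fasta -V "$uri" -L chr20 -O test2.vcf.gz
```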

du -sh my-local-database my-gcs-database
# 1.6M	my-local-database
# 1.6M	my-gcs-database
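The 1.31 MiB vs 1.6M figures are likely not a real discrepancy: `du` reports allocated disk blocks (rounded up with `-h`), while `gsutil du` reports object bytes. A minimal sketch of a byte-level check that the local copy is complete, using two hypothetical helpers, `dir_bytes` and `same_bytes_as_gcs`. It assumes `gsutil du -s` prints the total byte count as the first field:

```shell
# Hypothetical helper: total size in bytes of all regular files under a
# directory (the "apparent" size, unlike the block-based `du -sh`).
dir_bytes() {
  find "$1" -type f -exec cat {} + | wc -c | tr -d ' '
}

# Hypothetical check: succeed when a local directory holds the same number of
# bytes as a GCS prefix. Assumes `gsutil du -s URL` prints "<bytes> <url>".
same_bytes_as_gcs() {
  local local_dir="$1" gcs_url="$2" remote
  remote=$(gsutil du -s "$gcs_url" | awk '{print $1}')
  [ -n "$remote" ] || return 2
  [ "$(dir_bytes "$local_dir")" = "$remote" ]
}

# Usage:
# same_bytes_as_gcs my-gcs-database gs://genomicsdb-test/my-gcs-database \
#   && echo "local copy matches the bucket byte count"
```

Equal byte totals do not prove identical contents, but a mismatch would point to an incomplete copy.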