@JamesKane
Created June 17, 2018 23:57
Collect gVCF files and add chrY to a GenomicsDB using GATK4.
# Very basic Ruby script that collects all the gVCFs in a directory and imports them
# into a GenomicsDB workspace for later genotyping. The batch size is limited to 200 files
# at a time because memory usage is demanding: this currently consumes 18GB of RAM on a
# Fedora 28 workstation. The number of reader threads does not appear to have a
# significant impact.
# TODO: Parameterize the contig, since GenomicsDBImport doesn't support multiple
# chromosomes at present.
command = "gatk --java-options \"-Xmx32g -Xms32g\" GenomicsDBImport \\\n"
command += "-R /mnt/genomics/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa \\\n"
Dir.glob('*.vcf.gz') do |file|
  command += "--variant #{file} \\\n"
end
command += "--genomicsdb-workspace-path /mnt/genomics/chrY \\\n"
command += "-L chrY --reader-threads=8 --batch-size 200"
exec command
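
One way the TODO above could be addressed is sketched below. This is not the author's implementation. It assumes the contig is passed as the first command-line argument and that each contig gets its own workspace directory under `/mnt/genomics`, as `chrY` does in the script. The helper name `genomicsdb_import_args` and its keyword parameters are hypothetical. It builds the arguments as an array and shell-escapes them with `Shellwords`, so file names containing spaces stay intact, and it prints the command instead of running it.

```ruby
require 'shellwords'

# Hypothetical helper (not part of the original gist): builds the
# GenomicsDBImport argument list for a single contig.
def genomicsdb_import_args(contig:, gvcfs:, reference:, workspace_root:,
                           batch_size: 200, reader_threads: 8)
  args = ['gatk', '--java-options', '-Xmx32g -Xms32g', 'GenomicsDBImport',
          '-R', reference]
  # One --variant argument per gVCF, as in the original script.
  gvcfs.each { |file| args.push('--variant', file) }
  # Assumption: one workspace directory per contig under workspace_root.
  args.push('--genomicsdb-workspace-path', File.join(workspace_root, contig),
            '-L', contig,
            '--reader-threads', reader_threads.to_s,
            '--batch-size', batch_size.to_s)
  args
end

if __FILE__ == $PROGRAM_NAME
  # Assumption: the contig comes from the first CLI argument, defaulting to chrY.
  contig = ARGV.fetch(0, 'chrY')
  args = genomicsdb_import_args(
    contig: contig,
    gvcfs: Dir.glob('*.vcf.gz').sort,
    reference: '/mnt/genomics/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa',
    workspace_root: '/mnt/genomics'
  )
  # Print the escaped command for inspection rather than running it.
  puts Shellwords.join(args)
end
```

To actually launch the import, the final `puts` could be replaced with `exec(*args)`, which passes the array straight to the process without going through a shell.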