@obenshaindw
Last active April 6, 2023 09:45
Stream a VCF file from AWS S3, then sort, bgzip, index, and subset it to a specific region
#!/usr/bin/bash
#
# make_gz.sh
#
# Call this script once per VCF file name found under an S3 location, e.g.:
# aws --profile NDAR s3 ls s3://S3_URL/ | awk '{print $4}' | xargs -n1 -P4 sh make_gz.sh
# xargs -n1 -P4 passes one file name per invocation and runs up to 4 processes in parallel
#
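# Create a named pipe for this file
mkfifo "${1}_pipe"
# Stream the VCF from S3 and sort it into the pipe (runs in the background)
aws --profile NDAR s3 cp "s3://S3_URL/$1" - | /usr/bin/vcftools/vcftools_0.1.11/perl/vcf-sort -c > "${1}_pipe" &
# Read the sorted stream from the pipe and write a bgzip-compressed file
/usr/bin/htslib/htslib/bgzip -c "${1}_pipe" > "$1.gz"
# Index the bgzip-compressed file (bcftools writes a .csi index by default)
/usr/bin/htslib/bcftools/bcftools index "$1.gz"
# Remove the named pipe
rm "${1}_pipe"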
# Extract the UBE3A gene region (15:25337244-25439042), then bgzip and index the subset
/usr/bin/htslib/bcftools/bcftools view --regions 15:25337244-25439042 "$1.gz" | /usr/bin/htslib/htslib/bgzip -c > "$1.query.gz"
/usr/bin/htslib/bcftools/bcftools index "$1.query.gz"
# Remove the full sorted VCF and its index, keeping only the region subset
rm "$1.gz"
rm "$1.gz.csi"
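
The named-pipe pattern used in the script (a background writer feeding a FIFO, a foreground reader compressing from it, then cleanup) can be tried without AWS credentials or the VCF tools. The following is only a minimal sketch under stated assumptions: `cat`, `sort`, and `gzip` stand in for `aws s3 cp ... -`, `vcf-sort -c`, and `bgzip -c`, and the helper name `stream_sorted_gz` is hypothetical, not part of the original gist.

```shell
#!/bin/sh
# Sketch only: cat, sort and gzip are stand-ins (assumptions) for
# `aws s3 cp ... -`, `vcf-sort -c` and `bgzip -c` from the script above.
# stream_sorted_gz is a hypothetical helper name.

stream_sorted_gz() {
    src=$1                 # local file standing in for the S3 object
    out=$2                 # path of the compressed output
    fifo="${out}_pipe"     # same naming scheme as the script: <name>_pipe

    mkfifo "$fifo" || return 1

    # Writer: runs in the background; opening the FIFO for writing
    # blocks until the reader below opens it for reading.
    cat "$src" | sort > "$fifo" &
    writer=$!

    # Reader: drains the FIFO and compresses the sorted stream.
    gzip -c < "$fifo" > "$out"
    status=$?

    # Pick up the writer's exit status, then always remove the FIFO.
    wait "$writer" || status=$?
    rm -f "$fifo"
    return "$status"
}

# Usage with a throwaway input file:
demo_dir=$(mktemp -d)
printf '2\t200\n1\t100\n' > "$demo_dir/in.txt"
stream_sorted_gz "$demo_dir/in.txt" "$demo_dir/out.gz" && gzip -dc "$demo_dir/out.gz"
rm -rf "$demo_dir"
```

Waiting on the background writer before removing the FIFO is a design choice of this sketch: it surfaces a failed download or sort as a non-zero exit status, which the original script does not check.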