@liangyy
Created February 28, 2020 20:33
Download and clean up dependent data for lab 7 (HG471 @ UChicago)
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/dbsnp_138.b37.vcf.gz.md5
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/dbsnp_138.b37.vcf.gz
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.gz
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.gz.md5
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_phase1.indels.b37.vcf.gz
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_phase1.indels.b37.vcf.gz.md5
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_phase1.snps.high_confidence.b37.vcf.gz
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_phase1.snps.high_confidence.b37.vcf.gz.md5
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/hapmap_3.3.b37.vcf.gz
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/hapmap_3.3.b37.vcf.gz.md5
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_omni2.5.b37.vcf.gz
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_omni2.5.b37.vcf.gz.md5
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_phase3_v4_20130502.sites.vcf.gz
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_phase3_v4_20130502.sites.vcf.gz.md5
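The `.md5` files fetched above are only useful if they are compared against the archives. A minimal sketch of that check follows; `verify_md5` is a hypothetical helper, not part of the original script, and it assumes that the first whitespace-separated field of each `.md5` file is the hex digest and that GNU `md5sum` is available.

```shell
# Hypothetical helper: compare a file's MD5 digest with the first field
# of its .md5 sidecar (assumed to hold the expected hex digest).
verify_md5 () {
    md5_target=$1
    md5_expected=$(awk '{print $1; exit}' "$md5_target.md5")
    md5_actual=$(md5sum "$md5_target" | awk '{print $1}')
    if [ "$md5_expected" = "$md5_actual" ]; then
        echo "OK       $md5_target"
    else
        echo "MISMATCH $md5_target" >&2
        return 1
    fi
}

# Check every archive in the current directory that has a sidecar.
for md5_file in *.vcf.gz.md5; do
    [ -e "$md5_file" ] || continue   # the glob matched nothing
    verify_md5 "${md5_file%.md5}"
done
```

A reported mismatch may simply mean an interrupted transfer; rerunning the matching `wget -c` line resumes the download.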
wget ftp://ftp.1000genomes.ebi.ac.uk//vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
module load samtools
module load picard
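# Keep only the lines of hs37d5.fa.gz that hold chromosome 22
# (the last 855078 of the first 48017254 lines).
zcat hs37d5.fa.gz | head -n 48017254 | tail -n 855078 > hs37d5_chr22.fa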
# Build the FASTA index (.fai) and the sequence dictionary (.dict) for the chr22 reference.
samtools faidx hs37d5_chr22.fa
java -jar "$PICARD" CreateSequenceDictionary REFERENCE=hs37d5_chr22.fa OUTPUT=hs37d5_chr22.dict
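The `head`/`tail` extraction depends on hard-coded line numbers, which silently break if the FASTA is wrapped differently. As an alternative sketch, the same record can be selected by name; `extract_contig` is a hypothetical helper, and it assumes the chromosome 22 header in `hs37d5.fa.gz` begins with `>22`.

```shell
# Hypothetical helper: print one FASTA record (header plus sequence lines)
# from a gzipped FASTA, selected by contig name.
# $1: gzipped FASTA file, $2: contig name (first word of the header, without ">")
extract_contig () {
    zcat "$1" | awk -v name="$2" '
        /^>/ {
            if (found) exit                    # next record starts: stop reading
            found = (substr($1, 2) == name)    # does this header match?
        }
        found { print }
    '
}

# Intended use (not run here):
#   extract_contig hs37d5.fa.gz 22 > hs37d5_chr22.fa
```

Matching on the header keeps the command valid for any line width, at the cost of scanning the file up to the requested contig.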
module load htslib
# Decompress one bundle VCF inside the lab 7 data directory, keeping the .gz copy.
unzipVCF () {
    filename=$1
    echo "Processing $filename"
    zcat "/project2/hgen47100/data/lab7/$filename.gz" > "/project2/hgen47100/data/lab7/$filename"
    # tabix vcf /project2/hgen47100/data/lab7/$filename
    # mv /project2/hgen47100/data/lab7/$filename.tmp /project2/hgen47100/data/lab7/$filename
    # mv /project2/hgen47100/data/lab7/$filename.tmp.tbi /project2/hgen47100/data/lab7/$filename.tbi
}
unzipVCF "hapmap_3.3.b37.vcf"
unzipVCF "1000G_omni2.5.b37.vcf"
unzipVCF "1000G_phase1.snps.high_confidence.b37.vcf"
unzipVCF "dbsnp_138.b37.vcf"
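After decompression, a quick sanity check can catch truncated or corrupt output before the lab uses the files. The sketch below is an optional addition rather than part of the original script; `check_vcf` is a hypothetical helper that relies only on the VCF rule that the first line is the `##fileformat` meta-line.

```shell
# Hypothetical helper: confirm a decompressed VCF looks intact and report its size.
check_vcf () {
    vcf=$1
    # A valid VCF starts with the ##fileformat meta-line.
    if ! head -n 1 "$vcf" | grep -q '^##fileformat=VCF'; then
        echo "Not a VCF: $vcf" >&2
        return 1
    fi
    # Count data records, i.e. lines that are not header lines.
    records=$(awk '!/^#/ { n++ } END { print n + 0 }' "$vcf")
    echo "$vcf: $records records"
}

# Intended use (not run here):
#   check_vcf /project2/hgen47100/data/lab7/hapmap_3.3.b37.vcf
```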