Skip to content

Instantly share code, notes, and snippets.

@brwnj
Last active November 25, 2020 23:51
Show Gist options
  • Save brwnj/a7a83f76d3be57276fee62937dbd13ee to your computer and use it in GitHub Desktop.
Save brwnj/a7a83f76d3be57276fee62937dbd13ee to your computer and use it in GitHub Desktop.

Setting up a small-ish example

Grab 5 bams and their indexes from 1000G to represent our alignments.

mkdir data && cd data
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00096/alignment/HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00096/alignment/HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam.bai
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00097/alignment/HG00097.chrom20.SOLID.bfast.GBR.low_coverage.20101123.bam
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00097/alignment/HG00097.chrom20.SOLID.bfast.GBR.low_coverage.20101123.bam.bai
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00182/alignment/HG00182.chrom20.ILLUMINA.bwa.FIN.low_coverage.20101123.bam
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00182/alignment/HG00182.chrom20.ILLUMINA.bwa.FIN.low_coverage.20101123.bam.bai
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00100/alignment/HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00100/alignment/HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam.bai
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00183/alignment/HG00183.chrom20.ILLUMINA.bwa.FIN.low_coverage.20101123.bam
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00183/alignment/HG00183.chrom20.ILLUMINA.bwa.FIN.low_coverage.20101123.bam.bai
echo done

We see from the header that 1000G uses GRCh37. Using ggd search we can find our reference:

ggd search -g GRCh37 reference genome

Among the listings, we see the reference we need with install instructions:

----------------------------------------------------------------------------------------------------

	grch37-reference-genome-1000g-v1
	================================

	Summary: GRCh37 reference genome from 1000 genomes

	Species: Homo_sapiens

	Genome Build: GRCh37

	Keywords: ref, reference, fasta-file

	Data Version: phase2_reference

	Prefix Install WARNING: This package has not been set up to use the --prefix flag when running ggd install. Once installed, this package will work with other ggd tools that use the --prefix flag.


	To install run:
		ggd install grch37-reference-genome-1000g-v1

      ----------------------------------------------------------------------------------------------------

We install our reference:

ggd install grch37-reference-genome-1000g-v1
# activate environmental variables
source activate base

Now we have everything to QC our reads using a Nextflow workflow for seqcover.

GENES="MYL9,TLDC2,NNAT,ADIG,FAM83D,PTPRT,SGK2,HNF4A"
nextflow run brwnj/seqcover-nf -revision main -profile docker \
    --reference $ggd_grch37_reference_genome_1000g_v1_file \
    --crams 'data/*.bam' \
    --genes $GENES --hg19

The Nextflow output gives:

N E X T F L O W  ~  version 20.10.0
Launching `brwnj/seqcover-nf` [pedantic_jennings] - revision: 8bba84f42f [main]
executor >  local (7)
[bb/2fddf4] process > mosdepth (3)        [100%] 5 of 5 ✔
[72/2ea7b3] process > seqcover_background [100%] 1 of 1 ✔
[e5/8d2432] process > seqcover_report     [100%] 1 of 1 ✔
Completed at: 25-Nov-2020 16:30:28
Duration    : 12m 50s
CPU hours   : 0.6
Succeeded   : 7

And we have our results in ./results/seqcover_report.html.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment