James Kane JamesKane

JamesKane /
Last active Jan 21, 2020
Take paired FASTQ files and create a CRAM file containing chrY and chrM reads with their pairs.
# USAGE: sh [Sample]
# This simple script automates aligning and filtering samples from the ENA. Given FASTQ files that follow the naming
# convention [Sample]_1.fastq.gz and [Sample]_2.fastq.gz, the workflow creates a name-sorted BAM using all available threads.
# The BAM is then filtered to keep only reads on chrY and chrM, together with their pairs. Finally, control passes to a
# modified version of GATK's Best Practices workflow to create a suitable gVCF and collect some metrics.
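The preview stops before the script body, so the following is only a minimal sketch of the input handling the comment describes, not the author's code. The helper names `fastq_pair` and `thread_count` are hypothetical, as is the assumption that both FASTQ files sit in a single directory. `getconf _NPROCESSORS_ONLN` is widely available on Linux and macOS but is not guaranteed everywhere, hence the fallback.

```shell
# Hypothetical helper: resolve the paired FASTQ files for a sample using the
# [Sample]_1.fastq.gz / [Sample]_2.fastq.gz naming convention.
fastq_pair() {
    sample=$1
    dir=${2:-.}    # assumption: both FASTQs live in the same directory
    r1="$dir/${sample}_1.fastq.gz"
    r2="$dir/${sample}_2.fastq.gz"
    for f in "$r1" "$r2"; do
        if [ ! -r "$f" ]; then
            echo "missing FASTQ: $f" >&2
            return 1
        fi
    done
    # Print read 1 and read 2 paths, one per line.
    printf '%s\n%s\n' "$r1" "$r2"
}

# Hypothetical helper: number of online processors, falling back to 1 when
# getconf does not know _NPROCESSORS_ONLN.
thread_count() {
    getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}
```

Calling `fastq_pair MySample /data/fastq` would print the two expected paths, or fail with a message naming the first file that cannot be read. The sample name and directory here are placeholders.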
JamesKane / build_cohort.rb
Created Jun 17, 2018
Collect gVCF files and add chrY to a GenomicsDB using GATK4.
# A very basic Ruby script that collects all the gVCFs in a directory and imports them
# into a GenomicsDB for later genotyping. The batch size is limited to 200 files at a time
# because memory usage is quite demanding. This currently consumes 18GB of RAM on a Fedora 28
# workstation. The number of reader threads does not appear to have a significant impact.
# TODO: Parameterize the contig, since GenomicsDBImport doesn't support multiple
# chromosomes at present.
command = "gatk --java-options \"-Xmx32g -Xms32g\" GenomicsDBImport \\\n"
command += "-R /mnt/genomics/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa \\\n"
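The preview ends before the batching logic, so it is not visible whether the script chunks the file list itself or relies on GenomicsDBImport's own `--batch-size` option. As a rough sketch of the first approach, written in shell to match the other scripts on this page, the hypothetical function `batch_gvcfs` below groups gVCF paths into batches of at most 200. It assumes files end in `.g.vcf.gz` and that paths contain no whitespace.

```shell
# Hypothetical helper: print the gVCF paths found in a directory in batches of
# at most $2 files (default 200), one batch per output line, space-separated.
# Assumptions: files end in .g.vcf.gz and paths contain no whitespace.
batch_gvcfs() {
    dir=$1
    size=${2:-200}
    count=0
    line=
    for f in "$dir"/*.g.vcf.gz; do
        [ -e "$f" ] || continue    # the glob matched nothing
        line="${line:+$line }$f"
        count=$((count + 1))
        if [ "$count" -eq "$size" ]; then
            printf '%s\n' "$line"
            count=0
            line=
        fi
    done
    # Emit the final, possibly shorter, batch.
    if [ -n "$line" ]; then
        printf '%s\n' "$line"
    fi
}
```

Each output line could then be expanded into one `-V` argument per file for a single import call. Keeping the batch size as a parameter makes it easy to lower the limit on a machine with less memory.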
JamesKane /
Last active May 28, 2018
Use GATK to create an unaligned BAM from FASTQ data
# USAGE: sh <fastq1> <fastq2> <sample_name> <read_group> <platform_unit>
$gatk --java-options "-Xmx8G" FastqToSam \
-FASTQ=$1 \
-FASTQ2=$2 \
-OUTPUT=$3.unmapped.bam \
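The USAGE line names five positional arguments, and the command is cut off in this preview. A guard along the following lines could catch a missing argument before GATK starts. This is a sketch under assumptions: the function name `check_fastq_to_sam_args` and the `<script>` placeholder in the message are hypothetical and do not come from the original script.

```shell
# Hypothetical guard for the five positional arguments in the USAGE line:
# <fastq1> <fastq2> <sample_name> <read_group> <platform_unit>
check_fastq_to_sam_args() {
    if [ "$#" -ne 5 ]; then
        echo "USAGE: sh <script> <fastq1> <fastq2> <sample_name> <read_group> <platform_unit>" >&2
        return 2
    fi
    # The first two arguments must be readable FASTQ files.
    for f in "$1" "$2"; do
        if [ ! -r "$f" ]; then
            echo "cannot read FASTQ: $f" >&2
            return 1
        fi
    done
    return 0
}
```

Placed at the top of the script, it would be called as `check_fastq_to_sam_args "$@" || exit $?`, so a wrong invocation fails fast with a usage message instead of a GATK stack trace.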
JamesKane /
Last active May 25, 2018
Use GATK to mark optical duplicates, apply base recalibration, and call a clean BAM
# USAGE: sh <sample name>
# CONFIG VARIABLES: Update to match environment
$gatk --java-options "-Xmx4G" \
MarkDuplicates -I=$1.bwa.clean.bam -O=$1.dedup.bam -METRICS_FILE=metrics.txt
JamesKane /
Last active May 25, 2018
Prepare the Clean BAM for an Illumina Sample with GATK
# USAGE: sh <sample name>
# Based on
# CONFIG VARIABLES: Update to match environment
# Mark the Illumina adapters, if present. (The sequencing lab should have removed them
# prior to delivering the results.)
$gatk --java-options "-Xmx8G" MarkIlluminaAdapters \
JamesKane /
Last active May 28, 2018
Use GATK to revert an aligned BAM to an unaligned BAM
# USAGE: sh <sample name>
# Assumes GATK is on the path. Based on
gatk RevertSam \
-I=$1.bam \
-SANITIZE=true \