Last active
November 20, 2024 11:35
-
-
Save JamesKane/1f6fa0c37dedf36cea8760d1d3787d95 to your computer and use it in GitHub Desktop.
Realign a BAM to the Telomere-to-telomere reference
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#!/bin/bash | |
# USAGE: realign.sh [Source BAM/CRAM] | |
# ./realign.sh source.GRCh38.bam | |
# | |
# The script produces a new BAM aligned on the reference specified in the variable. Once complete it will apply CallableLoci | |
# for some quick QC. The script assumes that the working directory name matches the Sample e.g. | |
# /mnt/md0/B6564/source.GRCh38.bam | |
# | |
# There's a generation of Big Y 500 which do not have the pairs marked correctly. This results in treating the reads as SE. | |
# bbmap's repair will fix them. | |
# https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/repair-guide/ | |
# Future versions of this script should detect the problem and run repair first. | |
reference=/mnt/genomics/Reference/CHM13/chm13v2.0.fa.gz | |
contig=chrY | |
sample=${PWD##*/} | |
source=${1} | |
target=${sample}.chm13.cram | |
# Exit when any command fails | |
set -e | |
# Collate the read pairs from the source BAM. The prefix causes the intermediate BAMs to be output into the work directory | |
# instead of /tmp, which can be overrun with very large WGS sources depending on system configuration. The results are converted | |
# into a FASTQ stream retaining the RG and BC tags. BWA-MEM2 is used to align the stream onto the reference with the RG lines from | |
# the original BAM. Fixmate is applied to ensure they are setup for marking duplicates. Finally, the stream is sorted and duplicates | |
# marked into a CRAM file for space reduction. | |
# | |
# WARNING! When using experimental reference sequences you must retain a copy to get the original reads back with this method | |
samtools collate -Oun128 ${source} rnd | samtools fastq -OT RG,BC - \ | |
| bwa-mem2 mem -pt$(nproc) -CH <(samtools view -H ${1}|grep ^@RG) ${reference} - \ | |
| samtools fixmate -u -m - - \ | |
| samtools sort -@2 -m4g - \ | |
| samtools markdup -@8 --reference ${reference} - ${target} | |
samtools index ${target} | |
# Create the callable loci report with GATK3. Used for the creation of coverage histograms. | |
java -jar /mnt/genomics/Applications/GenomeAnalysisTK.jar -T CallableLoci \ | |
-R $reference \ | |
-I ${target} -summary table.txt -o callable_status.bed \ | |
-L ${contig} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment