Skip to content

Instantly share code, notes, and snippets.

@dyndna
Created January 28, 2016 23:36
Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save dyndna/4aba7aa6f5840a078227 to your computer and use it in GitHub Desktop.
Save dyndna/4aba7aa6f5840a078227 to your computer and use it in GitHub Desktop.
Merge bam files using samtools merge, add @rg and @co tags in merged header file.

merge bams file and add @RG and @CO

Let's say you have two bam files from two different runs you need to merge.

  1. bam1_batch1.bam
  2. bam1_batch2.bam

Using samtools view -H <bamfile>, get @RG information for each of two bams. I usually get ID, DT and PL info. from bam header and then add SM (sample tag) tag using following script. You can run this script before samtools merge command in your PBS script.

#!/bin/bash

cat > rg.txt << EOF
@RG	ID:HWI-ST208:532:C72EYACXX	DT:2015-08-21	PL:ILLUMINA	PU:C72EYACXX	SM:$1
@RG	ID:HWI-ST1120:363:C72GNACXX	DT:2015-09-29	PL:ILLUMINA	PU:C72GNACXX	SM:$2
EOF

Run this script:

make_rg.sh bam1_batch1.bam bam1_batch2.bam 

This will create file rg.txt in a working direcotry, unique for every pair of bams you want to merge.

To merge bam files, run following:

samtools merge -h rg.txt tmp_<merged_bam_name>.bam bam1_batch1.bam bam1_batch2.bam

Following command from Picard tools will add comments @CO tag with steps you did to add @RG above. This steps essentially rewrites entire bam to edit a header. So, larger the bam size, more time it takes to rewrite header info.

java -jar /apps/picard/default/AddCommentsToBam.jar INPUT=tmp_<merged_bam_name>.bam OUTPUT=<merged_bam_name>.bam COMMENT=\"ID:samtools	VN:1.2.1	CL: "$1" COMMENT="Picard AddCommentsToBam.jar command was used to add this @CO line and make bam index." COMMENT="Batch 1 bam is from "$2" run; Batch2 bam is from "$3" run" CREATE_INDEX=true CREATE_MD5_FILE=true VALIDATION_STRINGENCY=LENIENT TMP_DIR=/dev/shm

View updated header:

samtools view -H <merged_bam_name>.bam
@RG ID:HWI-ST208:532:C72EYACXX  PL:ILLUMINA PU:C72EYACXX    DT:2015-08-20T19:00:00-0500 SM:bam1_batch1.bam
@RG ID:HWI-ST1120:363:C72GNACXX PL:ILLUMINA PU:C72GNACXX    DT:2015-09-28T19:00:00-0500 SM:bam1_batch2.bam
...
@CO ID:samtools VN:1.2.1    CL: make_rg.sh bam1_batch1.bam bam1_batch2.bam && sleep 5 && samtools merge -h rg.txt tmp_<merged_bam_name>.bam bam1_batch1.bam bam1_batch2.bam
@CO Picard AddCommentsToBam.jar command was used to add this @CO line and make bam index.
@CO Batch 1 bam is from 150821_SN208_0532_AC72EYACXX_27Aug15 run; Batch2 bam is from 150929_SN1120_0363_AC72GNACXX run.

END

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment