@armintoepfer
Created December 21, 2022 10:47
Split subreads.bam file into pieces
# Extract all hole numbers / ZMW ids
pbindexdump test.bam.pbi --format cpp | grep basicData.holeNumber_ | sed 's/basicData.holeNumber_ = {//;s/};//' | tr ',' '\n' | uniq > zmws.list
# Count them
wc -l zmws.list
# Split into two lists; XXXX and YYYY are placeholders and must add up to the count above
head -n XXXX zmws.list > part1.list
tail -n YYYY zmws.list > part2.list
# Create one BAM file per list
zmwfilter --include part1.list test.bam part1.bam
zmwfilter --include part2.list test.bam part2.bam
# Sanity check: the merged file must contain the same ZMWs, in the same order, as the input
pbmerge -o merged.bam part1.bam part2.bam
pbindexdump merged.bam.pbi --format cpp | grep basicData.holeNumber_ | sed 's/basicData.holeNumber_ = {//;s/};//' | tr ',' '\n' | uniq > merged.list
diff zmws.list merged.list
# Sanity check: the merged file must contain the same records as the input (samtools view omits the header)
samtools view test.bam > test.sam
samtools view merged.bam > merged.sam
diff test.sam merged.sam
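
The head/tail step needs XXXX and YYYY picked by hand. Below is a minimal sketch, assuming bash and GNU coreutils `split` (for `-d` and `--additional-suffix`). It splits zmws.list into at most a chosen number of near-equal chunks and keeps the original ZMW order. It then runs the same `zmwfilter --include` call as the recipe once per chunk. `split_zmw_list` and `filter_parts` are hypothetical helper names, not pbtk tools.

```shell
# Sketch (bash + GNU coreutils): split a ZMW list into at most NPARTS
# contiguous, order-preserving chunks so head/tail counts need not be
# chosen by hand. split_zmw_list and filter_parts are hypothetical helpers.
split_zmw_list() {
  local list=$1 nparts=$2 prefix=$3
  local total per_part
  total=$(wc -l < "$list")
  # An empty list would make per_part 0, which split rejects.
  if [ "$total" -eq 0 ]; then
    echo "no ZMWs in $list" >&2
    return 1
  fi
  # Ceiling division: each chunk gets per_part lines, the last may get fewer.
  per_part=$(( (total + nparts - 1) / nparts ))
  # -d gives numeric suffixes (00, 01, ...), so the chunks sort in file order.
  split -l "$per_part" -d --additional-suffix=.list "$list" "$prefix"
}

# Write one BAM per chunk list with zmwfilter, as in the two-way recipe.
filter_parts() {
  local bam=$1 prefix=$2 list
  for list in "$prefix"[0-9]*.list; do
    zmwfilter --include "$list" "$bam" "${list%.list}.bam"
  done
}
```

For example, `split_zmw_list zmws.list 4 chunk` followed by `filter_parts test.bam chunk` writes up to four lists (chunk00.list, chunk01.list and so on) and one BAM per list. Because the split uses ceiling division, a very small list can yield fewer chunks than requested. A prefix other than `part` avoids picking up part1.list and part2.list from the manual split.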
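
The diff checks in the recipe are order-sensitive and only run after the BAMs have been written and merged. A complementary check runs on the lists alone, before any BAM is created. It fails if a ZMW lands in more than one part, or if the parts together do not cover exactly the ZMWs in zmws.list. This is a sketch that assumes bash, because it uses process substitution. `check_partition` is a hypothetical helper name.

```shell
# Sketch (bash): check that part lists partition the full ZMW list, i.e.
# every ZMW appears in exactly one part. check_partition is hypothetical.
check_partition() {
  local full=$1 dups
  shift
  # Any ZMW id seen more than once across the parts is duplicated.
  dups=$(cat "$@" | sort | uniq -d)
  if [ -n "$dups" ]; then
    printf 'ZMWs in more than one part:\n%s\n' "$dups" >&2
    return 1
  fi
  # Order-insensitive coverage: the sorted union of the parts must equal
  # the sorted full list.
  if ! diff <(sort "$full") <(cat "$@" | sort) > /dev/null; then
    echo "parts do not cover exactly the ZMWs in $full" >&2
    return 1
  fi
}
```

For the two-way split, `check_partition zmws.list part1.list part2.list` should succeed silently. If XXXX and YYYY overlap or leave a gap, it reports the problem and returns non-zero.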