@jvhaarst
Created May 8, 2019 12:48
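Simple script to split a PacBio BAM file into one file per hole number, each holding that hole's subreads. It reads text records from stdin, so the BAM has to be converted to SAM-formatted lines first.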
import gzip
import os
import sys

# Set to "gzip" for compressed .gz output; any other value writes plain .txt files.
filetype = "txt"


def append_line(outfile, line):
    # Append, so every subread of a hole ends up in the same file.
    if filetype == "gzip":
        f = gzip.open(outfile, mode='at', compresslevel=1)
    else:
        f = open(outfile, mode='at')
    with f:
        f.write(line)


for line in sys.stdin:
    # {movieName}/{holeNumber}/{qStart}_{qEnd} according to https://pacbiofileformats.readthedocs.io/en/3.0/BAM.html
    (movieName, holeNumber, subread) = line.split()[0].split('/')
    # Group holes by the first three characters of the hole number to limit files per directory.
    directory = movieName + '/' + holeNumber[:3]
    if filetype == "gzip":
        outfile = directory + "/" + holeNumber + '.gz'
    else:
        outfile = directory + "/" + holeNumber + '.txt'
    try:
        append_line(outfile, line)
    except FileNotFoundError:
        # First record for this directory: create it, then retry the write.
        os.makedirs(directory, exist_ok=True)
        append_line(outfile, line)
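To see where a given record lands, the path logic of the script can be pulled out into a small function. This is only a sketch: the helper name `outfile_for` and the sample record (including its movie name) are made up for illustration and are not part of the original gist.

```python
def outfile_for(record, filetype="txt"):
    """Return (directory, outfile) for one SAM-formatted record line."""
    # The first whitespace-separated field is the read name:
    # {movieName}/{holeNumber}/{qStart}_{qEnd}
    qname = record.split()[0]
    movie_name, hole_number, _span = qname.split('/')
    # Same layout as the script: <movie>/<first 3 chars of hole>/<hole>.<ext>
    directory = movie_name + '/' + hole_number[:3]
    extension = '.gz' if filetype == "gzip" else '.txt'
    return directory, directory + '/' + hole_number + extension


# Hypothetical record, invented for this example only.
sample = "m54006_160504_020705/4391137/0_100\t4\t*\t0\t255\t*\t*\t0\t0\tACGT\t!!!!"
print(outfile_for(sample))
# prints ('m54006_160504_020705/439', 'm54006_160504_020705/439/4391137.txt')
```

All subreads of hole 4391137 are appended to the same file, and holes that share the prefix 439 share a directory. The records themselves would typically come from piping something like `samtools view` output into the script, since it expects one text record per line on stdin.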