@konrad
Created September 28, 2011 18:03
A short script that generates a test SAM file containing a single long read
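# generate_sam_test.py: write a minimal SAM file with one long read to stdout.
import sys

base_seq = "ACGT"
# Repeat the base sequence up to the requested read length
# (rounded down to a multiple of len(base_seq)).
seq = base_seq * (int(sys.argv[1]) // len(base_seq))
spacer_length = 10000
# SAM header and alignment fields must be tab-separated.
header = "@HD\tVN:1.0\n@SQ\tSN:Mock\tLN:%s" % (len(seq) + spacer_length)
cigar_string = "%sM" % len(seq)
# The read matches over its full length, so MD:Z carries the read length.
genome_line = "\t".join([
    "62DJLAAXX_8:1:17056:1190", "0", "Mock", "1", "255", cigar_string,
    "*", "0", "0", seq, "*", "NM:i:0", "MD:Z:%s" % len(seq), "NH:i:1"])
print("\n".join([header, genome_line]))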
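# Generate, convert, sort and index a test SAM file for each read length.
for LEN in 400000 800000 1600000
do
    python generate_sam_test.py "${LEN}" > "sam_test_${LEN}_nt.sam"
    samtools view -bS "sam_test_${LEN}_nt.sam" > "sam_test_${LEN}_nt.bam"
    # Legacy (pre-1.0) sort syntax: the second argument is an output prefix.
    # With samtools 1.x, use: samtools sort -o <out.bam> <in.bam>
    samtools sort "sam_test_${LEN}_nt.bam" "sam_test_${LEN}_nt_sorted"
    rm -f "sam_test_${LEN}_nt.bam"
    samtools index "sam_test_${LEN}_nt_sorted.bam"
done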
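Before handing the generated files to samtools, it can help to sanity-check them. The following is only a minimal sketch and not part of the original gist: the helper names `cigar_length` and `check_sam_text` are hypothetical, and the checks cover just three things (tab-separated fields, CIGAR length versus SEQ length, and whether the read fits inside the reference length declared in `@SQ`).

```python
import re

def cigar_length(cigar, ops):
    """Sum the lengths of the CIGAR operations whose code is in `ops`."""
    return sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
               if op in ops)

def check_sam_text(sam_text):
    """Return a list of problems found in a small SAM text (hypothetical helper)."""
    problems = []
    ref_lengths = {}
    for line in sam_text.splitlines():
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            # Remember the declared length of each reference sequence.
            if fields[0] == "@SQ":
                tags = dict(f.split(":", 1) for f in fields[1:])
                ref_lengths[tags["SN"]] = int(tags["LN"])
            continue
        if len(fields) < 11:
            problems.append("alignment line has fewer than 11 tab-separated fields")
            continue
        rname, pos, cigar, seq = fields[2], int(fields[3]), fields[5], fields[9]
        # M, I, S, = and X consume bases of the read.
        if cigar_length(cigar, "MIS=X") != len(seq):
            problems.append("CIGAR length does not match SEQ length")
        # M, D, N, = and X consume bases of the reference.
        end = pos + cigar_length(cigar, "MDN=X") - 1
        if rname in ref_lengths and end > ref_lengths[rname]:
            problems.append("read extends past the end of the reference")
    return problems

# Tiny stand-in for the generator output: an 8 nt read on an 18 nt reference.
example = ("@HD\tVN:1.0\n@SQ\tSN:Mock\tLN:18\n"
           "r1\t0\tMock\t1\t255\t8M\t*\t0\t0\tACGTACGT\t*\tNM:i:0")
print(check_sam_text(example))
```

To check a generated file, read it with `open("sam_test_400000_nt.sam").read()` and pass the text to `check_sam_text`; an empty list means none of the three checks failed.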