Skip to content

Instantly share code, notes, and snippets.

@sinamajidian
Last active March 31, 2022 22:33
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save sinamajidian/542f2f7a29557b113601c9d05bdc6f26 to your computer and use it in GitHub Desktop.
Save sinamajidian/542f2f7a29557b113601c9d05bdc6f26 to your computer and use it in GitHub Desktop.
Extracting the high coverage region (as bed file) from a BAM file

A simpe code for extracting the regions with high coverage (>depth_treshold) from a bed file. The input of the code is a bed file containing rows of regions with its depth as following format

21	9424000	9424500	69.00
21	9424500	9425000	69.00
21	9425000	9425500	67.00
21	9425500	9426000	66.00
21	9426000	9426500	65.00
21	9426500	9427000	66.00

which can be generated by following from a BAM file.

chr="21"
bam_file_add="21_hifi.bam"
mosdepth --threads 3 --no-per-base --by 500 -c ${chr} -m depth ${bam_file_add}

The code has two outputs, the combined versio will be as follows in this case.

21 9424000 9427000

bed_address="depth.regions.bed"
bed_address_out="depth.regions.masked.bed"
depth_treshold= 48.2+3*9.1
bed_file=open(bed_address,'r');
selected_lines=[]
for line in bed_file:
line_strip=line.strip()
line_split=line_strip.split("\t")
depth=float(line_split[3])
if depth> depth_treshold:
selected_line= "\t".join(line_split)
selected_lines.append(selected_line)
bed_file.close()
bed_file_out=open(bed_address_out,'w');
for line_out in selected_lines:
bed_file_out.write(line_out+"\n")
bed_file_out.close()
#Combine small regions
first=True
for selected_line in selected_lines:
line_split= selected_line.split("\t")
start=int(line_split[1])
end=int(line_split[2])
if first:
first=False
combined_records=[line_split[:3]] # no need to report the depth of record
else:
last_combined=combined_records[-1]
end_last_combined= int(last_combined[2])
if end_last_combined==start:
combined_records[-1][2]=str(end)
else:
combined_records.append(line_split[:3])
bed_file_out=open(bed_address_out[:-3]+"combined.bed",'w');
for records in combined_records:
line_out= "\t".join(records)
bed_file_out.write(line_out+"\n")
bed_file_out.close()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment