@finswimmer
Created June 6, 2018 05:23
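import sys

# Input: tab-separated file with a header line and the columns
# gene, chr, start, end (path given as the first command-line argument).
with open(sys.argv[1], "r") as gtf:
    header = next(gtf)  # skip the header line

    # Previous gene seen; nothing has been read yet.
    last_gene = {
        "gene": None,
        "chr": None,
        "end": None,
    }

    for line in gtf:
        gene, chr, start, end = line.strip().split("\t")[:4]

        if chr != last_gene["chr"]:
            # First gene on a new chromosome: remember it, print nothing.
            last_gene["gene"] = gene
            last_gene["chr"] = chr
            last_gene["end"] = end
        else:
            # Same chromosome: print the region between the previous gene's
            # end and this gene's start, labelled with both gene names.
            print(
                last_gene["chr"],
                last_gene["end"],
                start,
                last_gene["gene"]+"_"+gene,
                sep="\t"
            )

            last_gene = {
                "gene": gene,
                "chr": chr,
                "end": end
            }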
@finswimmer
Author

Answer to the Biostars question: https://www.biostars.org/p/318790
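
For anyone who wants to test the logic without a file on disk, here is a minimal sketch of the same idea as a generator. It assumes the same column order as the script (gene, chr, start, end, tab-separated) and that records for each chromosome are already grouped together and sorted by position. The function name `gaps_between_genes` and the sample records are hypothetical and used only for illustration.

```python
import io


def gaps_between_genes(lines):
    """Yield (chr, prev_end, next_start, "geneA_geneB") for each pair of
    consecutive records that share a chromosome.

    Assumes tab-separated columns gene, chr, start, end with no header line.
    """
    prev = None  # (gene, chrom, end) of the previous record, if any
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        gene, chrom, start, end = line.rstrip("\n").split("\t")[:4]
        if prev is not None and prev[1] == chrom:
            yield chrom, prev[2], start, prev[0] + "_" + gene
        prev = (gene, chrom, end)


# Hypothetical sample input, made up for this example.
sample = io.StringIO(
    "gene\tchr\tstart\tend\n"
    "A\tchr1\t100\t200\n"
    "B\tchr1\t300\t400\n"
    "C\tchr2\t50\t80\n"
    "D\tchr2\t120\t150\n"
)
next(sample)  # skip the header line, as the script does

rows = list(gaps_between_genes(sample))
for row in rows:
    print(*row, sep="\t")
```

Coordinates are kept as strings, as in the script, so no arithmetic or overlap checking is done on them.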
