@ypandit
Created April 27, 2012 21:32
Solution for pseudogene issue in GFF3 for Artemis
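import os
import sys

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print('You seem to have forgotten to provide the input GFF3 file.')
        sys.exit(1)

    found = False  # becomes True once a pseudogene feature has been seen
    END = outofbound = issues = 0
    # The corrected copy is written to the current directory.
    outname = os.path.basename(sys.argv[1]).replace(".gff", "_corrected.gff")
    with open(sys.argv[1], 'r') as infile, open(outname, 'w') as outfile:
        for line in infile:
            if line.startswith('##sequence-region'):
                # GFF3 pragma layout: ##sequence-region seqid start end
                END = int(line.split()[3])
            try:
                row = line.split('\t')
                # Drop exon rows that follow a pseudogene. Note that the flag
                # is never reset, so every later exon in the file is dropped.
                if found and row[2] == 'exon':
                    issues += 1
                    continue
                if row[2] == 'pseudogene':
                    found = True
                # Drop gaps that start beyond the end of the sequence region;
                # compare as integers, not strings.
                if row[2] == 'gap' and int(row[3]) > END:
                    outofbound += 1
                    continue
                outfile.write(line)
            except (IndexError, ValueError):
                # Comments, pragmas and malformed rows pass through unchanged.
                outfile.write(line)
    # print("# of issues = %d" % issues)
    # print("# Out of bound gaps = %d" % outofbound)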
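
To try the filtering rules without a real annotation file, they can be sketched as a function over in-memory lines. This is only an illustration: the name `filter_gff3_lines` and the sample records are hypothetical, and the sketch assumes the `##sequence-region` pragma follows the GFF3 layout `seqid start end` and that gap coordinates should be compared as integers.

```python
def filter_gff3_lines(lines):
    """Return (kept_lines, dropped_exons, dropped_gaps) for an iterable of GFF3 lines."""
    kept = []
    seen_pseudogene = False
    region_end = None
    dropped_exons = dropped_gaps = 0
    for line in lines:
        if line.startswith('##sequence-region'):
            # Assumed pragma layout: ##sequence-region seqid start end
            region_end = int(line.split()[3])
        row = line.split('\t')
        if len(row) < 4:
            kept.append(line)  # comments, pragmas and blank lines pass through
            continue
        feature_type = row[2]
        # Exons are dropped once any pseudogene has been seen (flag never resets).
        if seen_pseudogene and feature_type == 'exon':
            dropped_exons += 1
            continue
        if feature_type == 'pseudogene':
            seen_pseudogene = True
        # Gaps starting beyond the declared region end are dropped.
        if feature_type == 'gap' and region_end is not None and int(row[3]) > region_end:
            dropped_gaps += 1
            continue
        kept.append(line)
    return kept, dropped_exons, dropped_gaps


# Hypothetical sample records, made up for illustration only.
sample = [
    '##gff-version 3\n',
    '##sequence-region chr1 1 1000\n',
    'chr1\t.\tgene\t10\t200\t.\t+\t.\tID=g1\n',
    'chr1\t.\texon\t10\t200\t.\t+\t.\tParent=g1\n',
    'chr1\t.\tpseudogene\t300\t400\t.\t+\t.\tID=p1\n',
    'chr1\t.\texon\t300\t400\t.\t+\t.\tParent=p1\n',
    'chr1\t.\tgap\t1500\t1600\t.\t.\t.\tID=gap1\n',
    'chr1\t.\tgap\t500\t600\t.\t.\t.\tID=gap2\n',
]
kept, dropped_exons, dropped_gaps = filter_gff3_lines(sample)
```

Here the exon under `p1` and the gap starting at 1500 are filtered out, while the exon that precedes the pseudogene is kept. Because the flag is never reset, a real gene's exons that appear after the first pseudogene would also be dropped, which may or may not suit a given file.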