Skip to content

Instantly share code, notes, and snippets.

@rmzelle
Created January 20, 2015 21:38
Show Gist options
  • Save rmzelle/dbb504d2e64030023b2f to your computer and use it in GitHub Desktop.
Save rmzelle/dbb504d2e64030023b2f to your computer and use it in GitHub Desktop.
##gff-version 3
scaffold_28 prediction gene 1 402 0 + . ID=545184;Name=545184
scaffold_28 prediction gene 805 981 0 - . ID=93782;Name=93782
scaffold_28 prediction gene 2030 2721 0 + . ID=545205;Name=545205
scaffold_28 prediction gene 3273 3545 0 - . Name=YOL159C-A;Synteny=no_synteny;SystematicGeneName=YOL159C-A;ID=38792
scaffold_28 prediction gene 5318 5833 0 - . Name=YOL159C;Synteny=no_synteny;SystematicGeneName=YOL159C;ID=38793
scaffold_28 prediction gene 6780 8600 0 - . Name=ENB1;Synteny=no_synteny;SystematicGeneName=YOL158C;StandardGeneName=ENB1;ID=38794
scaffold_28 prediction gene 9698 11467 0 - . Name=IMA4;Synteny=no_synteny;SystematicGeneName=YJL221C;StandardGeneName=IMA4;ID=38795
import gffutils
import os
def updateGenome():
genomeGFF="genome-minimal-fail.gff"
genomeGFFdb = genomeGFF.replace(".gff", ".db")
#create database for genome
if os.path.isfile(genomeGFFdb):
db = gffutils.FeatureDB(genomeGFFdb)
else:
db = gffutils.create_db(genomeGFF, dbfn=genomeGFFdb, force=True, merge_strategy="merge")
#shift coordinates of features that follow the original integration site by deletion size
downstreamGenomeFeatures = db.region('scaffold_28:6001-11467', completely_within=True)
for feature in downstreamGenomeFeatures:
print(feature["Name"][0])
feature.start += 10000
feature.end += 10000
print(feature.end)
db.update(downstreamGenomeFeatures, merge_strategy="replace")
return
updateGenome()
@daler
Copy link

daler commented Jan 22, 2015

I can't figure out how to submit a web-based pull request on a gist, so I'll just describe here:

downstreamGenomeFeatures on line 15 is a generator. Like all generators (and just like reading a file object), iterating through it will consume it. You can't rewind. So on line 17 when you iterate over it, it consumes everything in the generator. Then when you call db.update and give it downstreamGenomeFeatures, by that time it's empty. So you get the error, "ValueError: No lines parsed -- was an empty file provided?" because no features were provided.

One solution is to make your own generator that consumes downstreamGenomeFeatures but yields the modified feature. Then give that generator to db.update. Like this:

    def gen():
        for feature in downstreamGenomeFeatures:
            print(feature["Name"][0])
            feature.start += 10000
            feature.end += 10000
            print(feature.end)
        yield feature

    db.update(gen(), merge_strategy="replace")

@rmzelle
Copy link
Author

rmzelle commented Jan 22, 2015

Thanks a lot. Could you please check my new example, though, at https://gist.github.com/rmzelle/22fc9c537ddf8f40de4b

I'm trying to shift the last two features of the GFF file downstream by 10000 bp. While the generator shows that the coordinates change, somehow the changes don't make it into the database after running db.update().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment