@radaniba
Created March 22, 2013 14:10
When reading in sequences, you may want to index them in some way rather than collect them in one big list. Biopython's SeqIO has a useful function for this: SeqIO.to_dict returns a dictionary whose values are the SeqRecords and whose keys are derived from those records (by default, each record's ID).
from Bio import SeqIO

# Default: the keys are the sequence IDs, e.g. record_dict["gi:12345678"].
# The "rU" mode is dropped because Python 3.11+ rejects it; plain text mode is enough.
with open("example.fasta") as handle:
    record_dict = SeqIO.to_dict(SeqIO.parse(handle, "fasta"))

# You can index in other ways with "key_function".
# For example, to index by the description of each sequence:
with open("example.fasta") as handle:
    record_dict = SeqIO.to_dict(
        SeqIO.parse(handle, "fasta"),
        key_function=lambda rec: rec.description,
    )
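To make the role of the key function concrete, here is a rough pure-Python sketch of the same idea. It is not Biopython's implementation: `Record`, `parse_fasta`, and this `to_dict` are hypothetical stand-ins written for illustration, and the FASTA text is made up. It assumes, as Biopython does for FASTA, that the ID is the first word of the header and the description is the whole header line, and that a repeated key is an error rather than a silent overwrite.

```python
from collections import namedtuple

# Hypothetical stand-in for a SeqRecord, holding only the fields used here.
Record = namedtuple("Record", ["id", "description", "seq"])


def make_record(header, chunks):
    """Build a Record from a header line (without '>') and its sequence lines."""
    words = header.split()
    record_id = words[0] if words else ""
    return Record(record_id, header, "".join(chunks))


def parse_fasta(lines):
    """Yield Record objects from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield make_record(header, chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield make_record(header, chunks)


def to_dict(records, key_function=None):
    """Map key_function(record) -> record, rejecting duplicate keys."""
    if key_function is None:
        key_function = lambda rec: rec.id  # default: index by ID
    result = {}
    for rec in records:
        key = key_function(rec)
        if key in result:
            raise ValueError(f"Duplicate key '{key}'")
        result[key] = rec
    return result


# Made-up example data.
fasta_text = """>seq1 first example
ACGT
TTGA
>seq2 second example
GGCC
"""

by_id = to_dict(parse_fasta(fasta_text.splitlines()))
by_description = to_dict(
    parse_fasta(fasta_text.splitlines()),
    key_function=lambda rec: rec.description,
)
```

Whatever the key function returns has to be unique across the file, so a description or any other derived field only works as a key if no two records share it.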