@alexholehouse
Created February 26, 2017 21:43
Simple code for reading in a FASTA file and accessing each sequence for analysis in localCIDER
from localcider.sequenceParameters import SequenceParameters
from pyfasta import Fasta
#
# This assumes you have previously installed localCIDER and pyfasta (both
# are available via pip)
#
# read in the FASTA file using pyfasta
F = Fasta('swissprot_human_proteome.fasta')
# get all the header lines associated with each sequence
# (a header line is the line that starts with a ">" and
# generally contains identifying information about the
# protein). These headers are the dictionary keys we're going
# to use
all_fasta_keys = F.keys()
# for each header we use this to extract the full amino acid
# sequence
header_to_sequence = {}
for k in all_fasta_keys:
    # this assigns the value in the dictionary to
    # the sequence
    header_to_sequence[k] = str(F[k][:])
# now header_to_sequence is a dictionary where each key-value pair
# is a FASTA file header line and the associated amino acid sequence
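
With header_to_sequence built, each sequence can be passed to localCIDER in turn. The sketch below is one possible way to do that, not part of the original gist. The function name summarize_sequences and its params_factory argument are hypothetical. It assumes the factory returns an object offering get_FCR() and get_NCPR(), as localCIDER's SequenceParameters does. It also assumes that any sequence the analysis rejects (for example, one containing a non-standard residue) should be skipped rather than stop the run.

```python
def summarize_sequences(header_to_sequence, params_factory):
    """Compute a few per-sequence parameters.

    params_factory is any callable that takes a sequence string and
    returns an object with get_FCR() and get_NCPR() methods, for
    example localCIDER's SequenceParameters class.
    """
    results = {}
    skipped = []
    for header, sequence in header_to_sequence.items():
        try:
            params = params_factory(sequence)
            results[header] = {
                "length": len(sequence),
                "FCR": params.get_FCR(),    # fraction of charged residues
                "NCPR": params.get_NCPR(),  # net charge per residue
            }
        except Exception:
            # assumption: a sequence the analysis cannot handle is
            # recorded and skipped instead of aborting the whole loop
            skipped.append(header)
    return results, skipped
```

In the script above this might be called as results, skipped = summarize_sequences(header_to_sequence, SequenceParameters), after which results[header]["FCR"] holds the value for a given header and skipped lists the headers that could not be analysed.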