Sequence example file
"""
Sequence
--------
String object representing biological sequences with alphabets.
"""
cdef class Sequence(object):
    '''
    The Sequence class is a general container for storing a long string of
    characters and for making its manipulation easy and efficient.

    Like standard Python strings, the basic sequence object is immutable, so
    item assignment such as seq[5] = 'A' is not supported. In return,
    Sequence objects can be used as dictionary keys.

    The Sequence object provides a number of string-like methods (such as
    count, find, split and strip), which apply to any biological sequence.

    For the more biology-oriented purpose of storing DNA, RNA or a sequence
    of amino acids, see the subclasses DNASequence, RNASequence and
    AASequence. They all derive directly from the Sequence class.

    The Sequence object can store any single string based on a single-byte
    character set.

    Parameters
    ----------
    data : bytes
        The sequence of letters.
    alphabet : Alphabet, optional
        The expected symbols (letters). Default is None; for a plain
        Sequence the alphabet is None.
    '''
    cdef public bytes data
    cdef public object alphabet

    def __init__(self, bytes data, alphabet=None):
        self.data = data
        self.alphabet = alphabet

    def __repr__(self):
        '''
        Return a representation of the sequence for debugging.
        '''
        return '%d-letter "%s" instance\nseq: %s' % (
            len(self.data), self.__class__.__name__, repr(self.data))

    def __str__(self):
        '''
        Return the full sequence as a Python string; use str(sequence).
        '''
        # Note: self.data is bytes. Python 3 requires __str__ to return str,
        # so the data would need to be decoded there.
        return self.data

    def __len__(self):
        '''
        Return the length of the sequence.
        '''
        return len(self.data)

    def __getitem__(self, index):
        '''
        Return a subsequence (for a slice) or a single letter (for an
        integer index) as a new Sequence with the same alphabet.
        '''
        return Sequence(self.data[index], self.alphabet)
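
The docstring above describes slicing and dictionary-key use. The following is a minimal pure-Python sketch of that behavior, not the gist's Cython implementation. The class name PySequence is hypothetical, and the __eq__ and __hash__ methods are assumptions added for illustration; the excerpt above does not define them.

```python
class PySequence(object):
    """Hypothetical pure-Python stand-in for the Cython Sequence class."""

    def __init__(self, data, alphabet=None):
        self.data = data
        self.alphabet = alphabet

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        # Turn an integer index into a one-element slice so the result is
        # always bytes (on Python 3, bytes[int] would return an int).
        # In this sketch an out-of-range integer yields an empty sequence
        # rather than raising IndexError.
        if isinstance(index, int):
            index = slice(index, index + 1 or None)
        return PySequence(self.data[index], self.alphabet)

    # Assumption: equality and hashing are based on the sequence data, so
    # that two sequences with the same letters act as the same dictionary key.
    def __eq__(self, other):
        return isinstance(other, PySequence) and self.data == other.data

    def __hash__(self):
        return hash(self.data)


seq = PySequence(b"ACGTACGT")
prefix = seq[0:3]        # slicing returns a new PySequence
letter = seq[5]          # integer indexing returns a one-letter PySequence
counts = {seq: 1}        # usable as a dictionary key
```

Value-based hashing is one possible design choice; without it, Python falls back to identity-based hashing, where two separately built sequences with the same letters would be distinct keys.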