@HelBorn
Created August 17, 2012 15:35
import re
# The mRNA sequence to test. In a real script this would be read from the user
# (e.g. with input(), or raw_input() on Python 2), but for this demo it is hard-coded.
sequence = "AUGCAAGGUACUUUCAGUUGACAAUAG" # Valid protein
#sequence = "AUGCAAGGUACUUUCAGUUGACAACAA" # Invalid protein
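# Performs the search. The first argument is the regular expression (regex)
# and the second argument is the sequence (as seen above).
# The pattern requires a start codon (AUG), then at least three bases,
# then a stop codon (UAA, UAG or UGA) at the very end of the sequence.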
m = re.search(r'^(AUG([ACGU]{3,})(UAA|UAG|UGA))$', sequence)
# "re.search" returns None if no matching pattern could be found.
if m is None:
    print("This is not a protein")
else:  # A match was found, so the start and stop codons are in place.
    # Check that the sequence length is a multiple of 3 (whole codons only).
    if len(sequence) % 3 == 0:
        print("This is a possible protein")
    else:
        print("This is not a protein")
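
The regex above only inspects the two ends of the sequence and its overall length. As a hedged sketch of a stricter variant (not part of the original gist), the check can be done codon by codon so that a stop codon appearing in frame before the end is also rejected. The names `STOP_CODONS`, `split_codons` and `is_possible_orf` are made up for this illustration, and the rule "no in-frame stop before the last codon" is an added assumption about what should count as a possible protein.

```python
import re

# Hypothetical helper names; the stop codons are the same three as in the regex above.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_codons(seq):
    """Split a sequence into consecutive 3-base chunks, starting at position 0."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def is_possible_orf(seq):
    """Return True if seq passes the original checks plus the in-frame stop check."""
    # Only RNA bases are allowed, and the length must be a whole number of codons.
    if re.fullmatch(r"[ACGU]+", seq) is None or len(seq) % 3 != 0:
        return False
    codons = split_codons(seq)
    # Start codon, at least one codon in between, and a final stop codon,
    # matching what the regex above requires.
    if len(codons) < 3:
        return False
    if codons[0] != "AUG" or codons[-1] not in STOP_CODONS:
        return False
    # Assumption added here: no stop codon may appear in frame before the end.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


print(is_possible_orf("AUGCAAGGUUAG"))
print(is_possible_orf("AUGCAAGGUACUUUCAGUUGACAAUAG"))
```

Under this stricter rule the demo sequence from the script is rejected: read in frame, its seventh codon is UGA, so translation would stop before the final UAG.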