@dacarlin
Last active August 29, 2015 14:01
The story of parse.py

I'm a fan of simplifying code. Yesterday, I wrote a parser that takes a submitted list of single amino acid substitutions and outputs a Python list with one entry per mutation. It accepted input like

A123F, F343G, H7A

or

A123F+F343G+H7A

with handling for whitespace and different separators, and printed the result. It ran like this:

$ python3 parse.py 'A123F + F343G + H7A'
['A123F', 'F343G', 'H7A']

It looked like this:

from sys import argv
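mutations = argv[1].strip()

# len('A1000A'), AKA biggest single mutant, is 6
# len('A1A A1A'), AKA smallest double mutant, is 7
if len(mutations) < 7:
  # too short to hold two mutations, so treat it as a single mutant
  mutations = [mutations]
else:
  # split on the first separator found; elif keeps us from calling
  # .split() on the list produced by an earlier branch
  if "+" in mutations:
    mutations = mutations.split("+")
  elif "," in mutations:
    mutations = mutations.split(",")
  elif " " in mutations:
    mutations = mutations.split(" ")
  else:
    mutations = [mutations]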

mutations = [ mutation.strip() for mutation in mutations ]
print(mutations)

Today, I rewrote it in one line (OK, three, counting the imports) using a regular expression:

from sys import argv
from re import findall
print(findall(r'\w+', argv[1]))
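
The reason this works is that \w+ matches every run of letters, digits, and underscores, so anything else (commas, plus signs, spaces) acts as a separator for free. Here is a minimal sketch of that behavior wrapped in functions so it can be tried without the command line. The names parse_mutations and parse_mutations_strict are hypothetical, and the stricter pattern (one uppercase letter, digits, one uppercase letter) is my own assumption about what a valid mutation looks like, not part of the original script.

```python
from re import findall

def parse_mutations(text):
    """Return every run of word characters (letters, digits, underscore)."""
    return findall(r'\w+', text)

def parse_mutations_strict(text):
    """Hypothetical stricter variant: uppercase letter, digits, uppercase letter.

    Assumes mutations are written with uppercase one-letter residue codes.
    """
    return findall(r'\b[A-Z]\d+[A-Z]\b', text)

# Any mix of separators and whitespace gives the same list.
print(parse_mutations('A123F, F343G, H7A'))
print(parse_mutations('A123F+F343G+H7A'))
print(parse_mutations('A123F + F343G,H7A'))

# \w+ keeps any word-like token, including ones that are not mutations.
print(parse_mutations('A123F and F343G'))

# The stricter pattern drops tokens that do not look like a mutation.
print(parse_mutations_strict('A123F and F343G'))
```

The trade-off is that \w+ does no validation at all: it is shorter precisely because it accepts anything word-like. If bad input matters, a stricter pattern along the lines of the second function is one option.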