I'm a fan of simplifying code. Yesterday, I wrote a parser that would take a submitted list of single amino acid switch mutations and output a Python object listing each one. It accepted input like
A123F, F343G, H7A
or
A123F+F343G+H7A
with handling for whitespace and different separators, and printed the output. It ran like this:
$ python3 parse.py 'A123F + F343G + H7A'
['A123F', 'F343G', 'H7A']
It looked like this:
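from sys import argv

# Take the mutation list from the first command-line argument
mutations = argv[1]
mutations = mutations.strip()
print("Stripped:", mutations)
# len('A1000A'), AKA biggest single mutant, is 6
# len('A1A A1A'), AKA smallest double mutant, is 7
if len(mutations) < 7:
    # Too short to hold more than one mutation, so wrap it in a list
    mutations = [mutations]
else:
    # Split on whichever separator is present
    if "+" in mutations:
        mutations = mutations.split("+")
    elif "," in mutations:
        mutations = mutations.split(",")
    elif " " in mutations:
        mutations = mutations.split(" ")
    else:
        mutations = [mutations]
    mutations = [ mutation.strip() for mutation in mutations ]
print(mutations)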
Today, I rewrote it in one line (OK, three with the imports) using a regular expression:
from sys import argv
from re import findall
print(findall(r'\w+', argv[1]))
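One caveat with \w+ is that it accepts any run of word characters, so a stray token like "hello" would come back as a mutation. If that matters, a stricter pattern is one option. The sketch below is only an illustration: the parse_mutations name and the AMINO_ACIDS constant are hypothetical, and it assumes that only the 20 standard one-letter residue codes, in upper case, should count.

```python
from re import findall

# One-letter codes for the 20 standard amino acids (an assumption about
# which residues should be accepted; extend it if you need others).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parse_mutations(text):
    """Return every token shaped like <residue><position><residue>."""
    pattern = rf"\b[{AMINO_ACIDS}]\d+[{AMINO_ACIDS}]\b"
    return findall(pattern, text)


print(parse_mutations("A123F, F343G+H7A"))     # ['A123F', 'F343G', 'H7A']
print(parse_mutations("A123F + hello + H7A"))  # ['A123F', 'H7A']
```

The trade-off is that anything not matching the pattern is dropped silently, so this suits input you already expect to be mostly clean.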