@battis
Last active December 17, 2015 13:28
{
"metadata": {
"name": "Parse Molecule"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "code",
"collapsed": false,
"input": "'''\n    Let's assume that you're receiving exactly what Robert and Damion _should_\n    be giving you, something like the following:\n'''\n\nsplitInput = [\n    ['C3H8', '3O2'],\n    ['4H2O', 'CO2']\n]\n\n'''\n    Let's further assume that you have a handy function like parseEquation()\n    that calls your parseMolecule() function, which does all the \"real\" work.\n\n    If you look at the Round 1 design and the modifications from May 14, we've\n    got a set of steps for how to parse a single molecule. So, for example,\n    if we call parseMolecule('4H2O'), it should return something like:\n\n        {'coefficient': 4, 'formula': {'H': 2, 'O': 1}}\n\n    It only needs to handle one molecule at a time (since parseEquation() will\n    call it for each molecule). We outlined one way of doing this in class that\n    relied on looking up chemical symbols in the periodicTable(), but you could\n    also (as Julie pointed out) use the pattern of uppercase and lowercase\n    letters to identify chemical symbols.\n\n    There are some handy built-in string methods out there...\n    http://docs.python.org/2/library/stdtypes.html#str.isalnum\n'''\n\ndef parseMolecule(text):\n    moleculeAsDictionary = {'coefficient': 1, 'formula': {}}\n    # magic goes here: fill in the coefficient and formula from text\n    return moleculeAsDictionary\n\ndef parseEquation(splitEquation):\n    # lists are mutable! splitEquation[:] would copy only the outer list,\n    # so copy each side as well to leave the caller's lists untouched\n    parsedEquation = [side[:] for side in splitEquation]\n    for side in parsedEquation:\n        for i in range(len(side)):\n            side[i] = parseMolecule(side[i])\n    return parsedEquation",
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": "You guys really have three priorities right now...\n\n- Make sure that once you \"parse\" something off the beginning of the text, you remove it from the text\n- Think about exactly how many different kinds of things you need to parse off the beginning of the text\n- Work out how to loop over those three things so that you can parse *everything* out of the text"
},
{
"cell_type": "code",
"collapsed": false,
"input": "text = '4Na2K3HBrO4'\n\n# Every loop checks i < len(text) before reading text[i], so it stops at the\n# end of the text instead of raising IndexError.\n\ni = 0\nwhile i < len(text) and text[i].isdigit():\n    i = i + 1\ncoefficient = text[0:i]\nif coefficient == '':\n    coefficient = '1'\ntext = text[i:]\n\ni = 1\nwhile i < len(text) and text[i].islower():\n    i = i + 1\nsymbol = text[0:i]\ntext = text[i:]\n\ni = 0\nwhile i < len(text) and text[i].isdigit():\n    i = i + 1\nsubscript = text[0:i]\nif subscript == '':\n    subscript = '1'\ntext = text[i:]\n\ni = 1\nwhile i < len(text) and text[i].islower():\n    i = i + 1\nsymbol2 = text[0:i]\ntext = text[i:]\n\ni = 0\nwhile i < len(text) and text[i].isdigit():\n    i = i + 1\nsubscript2 = text[0:i]\nif subscript2 == '':\n    subscript2 = '1'\ntext = text[i:]\n\ni = 1\nwhile i < len(text) and text[i].islower():\n    i = i + 1\nsymbol3 = text[0:i]\ntext = text[i:]\n\ni = 0\nwhile i < len(text) and text[i].isdigit():\n    i = i + 1\nsubscript3 = text[0:i]\nif subscript3 == '':\n    subscript3 = '1'\ntext = text[i:]\n\ni = 1\nwhile i < len(text) and text[i].islower():\n    i = i + 1\nsymbol4 = text[0:i]\ntext = text[i:]\n\ni = 0\nwhile i < len(text) and text[i].isdigit():\n    i = i + 1\nsubscript4 = text[0:i]\nif subscript4 == '':\n    subscript4 = '1'\ntext = text[i:]\n\ni = 1\nwhile i < len(text) and text[i].islower():\n    i = i + 1\nsymbol5 = text[0:i]\ntext = text[i:]\n\ni = 0\nwhile i < len(text) and text[i].isdigit():\n    i = i + 1\nsubscript5 = text[0:i]\nif subscript5 == '':\n    subscript5 = '1'\ntext = text[i:]\n\nprint(coefficient, symbol, subscript, symbol2, subscript2, symbol3, subscript3, symbol4, subscript4, symbol5, subscript5, text)\n",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 13
}
],
"metadata": {}
}
]
}
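
One way to fill in parseMolecule() is sketched below. It follows the uppercase/lowercase pattern rather than the periodicTable() lookup, and it assumes well-formed input: an optional coefficient followed by symbol/subscript pairs, with no parentheses, hydrates, or charges. When a symbol appears more than once (as in CH3COOH), this sketch adds the counts together; that is a choice made for the sketch, not something taken from the Round 1 design.

```python
def parseMolecule(text):
    """Parse one molecule, e.g. '4H2O' -> {'coefficient': 4, 'formula': {'H': 2, 'O': 1}}.

    Sketch only: assumes an optional coefficient followed by symbol/subscript
    pairs (no parentheses, hydrates, or charges).
    """
    i = 0
    # Leading digits are the coefficient; if there are none, it is 1.
    while i < len(text) and text[i].isdigit():
        i += 1
    coefficient = int(text[:i]) if i > 0 else 1

    formula = {}
    while i < len(text):
        # A chemical symbol starts with an uppercase letter...
        if not text[i].isupper():
            raise ValueError('expected a chemical symbol at index %d of %r' % (i, text))
        start = i
        i += 1
        # ...followed by zero or more lowercase letters.
        while i < len(text) and text[i].islower():
            i += 1
        symbol = text[start:i]

        # Digits after the symbol are its subscript; if there are none, it is 1.
        start = i
        while i < len(text) and text[i].isdigit():
            i += 1
        subscript = int(text[start:i]) if i > start else 1

        # Add rather than overwrite, so a repeated symbol (CH3COOH) is summed.
        formula[symbol] = formula.get(symbol, 0) + subscript

    return {'coefficient': coefficient, 'formula': formula}


print(parseMolecule('4H2O'))  # {'coefficient': 4, 'formula': {'H': 2, 'O': 1}}
```

For the splitInput example, parseMolecule('3O2') gives a coefficient of 3 and a formula of {'O': 2}, and parseMolecule('C3H8') gives a coefficient of 1 because there are no leading digits.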
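
parseEquation() points out that lists are mutable, and the catch is that slicing with [:] makes only a shallow copy: for a list of lists like splitInput, the new outer list still holds the very same inner lists, so replacing side[i] would also change the caller's data. The standard library's copy.deepcopy(), or copying each side with a list comprehension, avoids that. A minimal demonstration:

```python
import copy

splitInput = [
    ['C3H8', '3O2'],
    ['4H2O', 'CO2']
]

# A slice copies only the outer list; both lists share the same inner lists.
shallow = splitInput[:]
print(shallow[0] is splitInput[0])    # True

# deepcopy() also copies the inner lists, so changes stay local.
deep = copy.deepcopy(splitInput)
deep[0][0] = 'changed'
print(deep[0] is splitInput[0])       # False
print(splitInput[0][0])               # C3H8

# Copying each side by hand also works, since the molecules are immutable strings.
perSide = [side[:] for side in splitInput]
print(perSide[1] is splitInput[1])    # False
```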
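
The three priorities in the markdown cell can be combined into a single loop. The sketch below assumes a molecule is an optional coefficient followed by repeated symbol/subscript pairs, and it keeps every piece as a string, as the hand-unrolled cell does. splitPrefix() and tokenize() are hypothetical names used only for illustration: each call removes what it parses from the front of text, and the loop runs until nothing is left.

```python
def splitPrefix(text, test, start=0):
    """Hypothetical helper: split text into (prefix, rest).

    The prefix runs up to the first character at or after index `start`
    that fails `test`; the characters before `start` are always kept.
    """
    i = start
    while i < len(text) and test(text[i]):
        i += 1
    return text[:i], text[i:]


def tokenize(text):
    """Hypothetical: break a molecule into its coefficient and (symbol, subscript) pairs."""
    # Kind 1: the coefficient (leading digits), parsed once.
    coefficient, text = splitPrefix(text, str.isdigit)
    pairs = []
    while text:  # loop until everything has been parsed off the front
        # Kind 2: the symbol, i.e. the first character plus any lowercase letters.
        symbol, text = splitPrefix(text, str.islower, start=1)
        # Kind 3: the subscript (digits); an empty subscript means 1.
        subscript, text = splitPrefix(text, str.isdigit)
        pairs.append((symbol, subscript or '1'))
    return coefficient or '1', pairs


print(tokenize('4Na2K3HBrO4'))
# ('4', [('Na', '2'), ('K', '3'), ('H', '1'), ('Br', '1'), ('O', '4')])
```

Because each pass through the loop removes at least the first character of text, the loop always ends, however many elements the molecule has.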
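
A loop written as `while text[i].isdigit():` reads text[i] before checking that i is still inside the string, so once the last piece has been parsed off, it raises IndexError: string index out of range. Putting the length check first fixes this, because Python's `and` short-circuits and never evaluates text[i] when i == len(text). A small demonstration, starting from the '4' that is left of '4Na2K3HBrO4' just before the final subscript is parsed:

```python
text = '4'  # what remains of '4Na2K3HBrO4' before the last subscript is parsed

# Unguarded: after reading '4', i becomes 1 and text[1] does not exist.
i = 0
try:
    while text[i].isdigit():
        i = i + 1
except IndexError as error:
    print('unguarded loop:', error)    # unguarded loop: string index out of range

# Guarded: `and` short-circuits, so text[i] is never read once i == len(text).
i = 0
while i < len(text) and text[i].isdigit():
    i = i + 1
subscript = text[0:i]
text = text[i:]
print(repr(subscript), repr(text))     # '4' ''
```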