@fomightez
Last active December 20, 2015 08:39
compositioncalc2.py from Practical Computing for Biologists by Steven H. D. Haddock and Casey W. Dunn as a static IPython Notebook. Posted as a Gist by Wayne Decatur (fomightez) with full credit and reference to the original authors, and a note on where they freely share the code online. You can see an interactive IPython gist of this at https://www.py…
{
"metadata": {
"name": "compositioncalc2.ipynb"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"compositioncalc2.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
">code by Steven H. D. Haddock and Casey W. Dunn as described in: \n",
">Practical Computing for Biologists \n",
">by Steven H. D. Haddock and Casey W. Dunn \n",
">published in 2011 by Sinauer Associates. \n",
">ISBN 978-0-87893-391-4 \n",
">\n",
">[http://www.sinauer.com/practical-computing-for-biologists.html](http://www.sinauer.com/practical-computing-for-biologists.html) \n",
">see [practicalcomputing.org](http://practicalcomputing.org) \n",
">\n",
">Scripts are made freely available by the original authors at practicalcomputing.org \n",
">DIRECT LINK: [http://practicalcomputing.org/files/pcfb_examples.zip](http://practicalcomputing.org/files/pcfb_examples.zip) \n",
">#### Posted as a Gist and IPython Notebook by Wayne (fomightez at GitHub) with full credit and reference to the original code authors."
]
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"compositioncalc2.py calculates the percentage of each base or amino acid in a DNA or protein sequence. <br/>\n",
"It handles nonstandard bases or amino acids; in fact, it is general enough to calculate the percentage of every character in any string. <br/>\n",
"This improvement over the <a href=\"http://nbviewer.ipython.org/6077226\">compositioncalc1 script</a> is made possible by defining 'BaseList = list(set(DNASeq))', which builds the list of characters to count from the distinct characters actually present in the sequence.<br/>\n",
"The code:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
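"# Example DNA sequence; any string of characters works\n",
"DNASeq = \"ATGTCTCATTCAAAGCA\"\n",
"# float() ensures true division under Python 2\n",
"SeqLength = float(len(DNASeq))\n",
"\n",
"# set() keeps one copy of each distinct character in the sequence\n",
"BaseList = list(set(DNASeq))\n",
"for Base in BaseList:\n",
"    Percent = 100 * DNASeq.count(Base) / SeqLength\n",
"    print(\"%s: %4.1f\" % (Base, Percent))"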
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"A: 35.3\n",
"C: 23.5\n",
"T: 29.4\n",
"G: 11.8\n"
]
}
],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**See the code in action and explore it interactively [here](https://www.pythonanywhere.com/gists/6097538/compositioncalc2.py/ipython2/).** "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Obtain a copy of this entire IPython Notebook [here](https://gist.github.com/fomightez/6102154) in order to explore it interactively.**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"\n",
"\n",
" \n",
" \n",
" \n",
" "
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"<br/>\n",
"Additional aid and exploration below:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%whos"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Variable Type Data/Info\n",
"---------------------------------\n",
"Base str G\n",
"BaseList list n=4\n",
"DNASeq str ATGTCTCATTCAAAGCA\n",
"MySeq str TTGGGGGGCGAAAA\n",
"Percent float 11.7647058824\n",
"SeqLength float 17.0\n",
"calc function <function calc at 0x10b444050>\n"
]
}
],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
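"The `%whos` magic command above lists the variables and functions currently defined in the IPython session, along with their types and values. \n",
"(The listing includes `MySeq` and `calc`, which are defined further down; they were presumably already defined by an earlier run of those cells. For some reason, `%whos` does not list any of the initial variables in the interactive gist console.)"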
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" \n",
" \n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_We can go ahead and define a function that does this calculation:_"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def calc(MySeq):\n",
"    \"\"\"Print the percentage of each distinct character in MySeq.\"\"\"\n",
"    SeqLength = float(len(MySeq))\n",
"    ComponentList = list(set(MySeq))\n",
"    for Component in ComponentList:\n",
"        Percent = 100 * MySeq.count(Component) / SeqLength\n",
"        print(\"%s: %4.1f%%\" % (Component, Percent))"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 9
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_Then define a variable_"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"MySeq = \"TTGGGGGGCGAAAA\""
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_Then we feed that variable to the function:_"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"calc(MySeq)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"A: 28.6%\n",
"C: 7.1%\n",
"T: 14.3%\n",
"G: 50.0%\n"
]
}
],
"prompt_number": 11
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_In fact, we can even skip the variable and directly input the sequence into the function:_"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"calc(\"TGTTNNTCTCCCCAAAA\")"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"A: 23.5%\n",
"C: 29.4%\n",
"T: 29.4%\n",
"G: 5.9%\n",
"N: 11.8%\n"
]
}
],
"prompt_number": 12
}
],
"metadata": {}
}
]
}
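For readers on Python 3, here is a minimal sketch of the same calculation that returns the percentages in a dictionary instead of printing them, which makes the results easier to reuse or check. It swaps in `collections.Counter` from the standard library to tally every character in one pass, rather than calling `count()` once per distinct character as the original does; the function name `composition` is a hypothetical choice, not part of the book's scripts. Sorting the keys makes the output order reproducible, since iterating over a `set` (or the original `BaseList`) gives no guaranteed order.

```python
from collections import Counter


def composition(seq):
    """Return {character: percent of seq} for every distinct character in seq.

    Hypothetical helper sketching the compositioncalc2 idea; not from the book.
    Assumes Python 3, where / is true division.
    """
    if not seq:
        raise ValueError("sequence must not be empty")
    length = len(seq)
    # Counter tallies every character in one pass, including nonstandard ones like N
    counts = Counter(seq)
    return {char: 100 * n / length for char, n in counts.items()}


if __name__ == "__main__":
    # Same example sequence as the notebook; sorted() gives a stable output order
    for char, percent in sorted(composition("ATGTCTCATTCAAAGCA").items()):
        print("%s: %4.1f%%" % (char, percent))
```

Because the function returns data rather than printing it, the percentages can be compared, summed, or written to a file by the caller.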