@gregcaporaso
Last active October 12, 2015 07:47
IPython Notebook files used in Greg Caporaso's Fall 2012 BIO599 Computational Biology course. See the included README.md file for more details and licensing information.

IPython Notebook files used in Greg Caporaso's Fall 2012 BIO599 Computational Biology course.

These notebooks closely follow the Python Programming chapters of Practical Computing for Biologists. Many additional exercises can be found in Learn Python the Hard Way.

This work is licensed under the Creative Commons Attribution 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

Feel free to use or modify these notebooks, but please credit me by placing the following attribution information where you feel that it makes sense: Greg Caporaso, www.caporaso.us.

from sys import argv
usage = "Lecture25_example.py <name> <day>"
if len(argv) != 3:
    print "ERROR: Incorrect number of arguments passed."
    print "USAGE: " + usage
else:
    script_name, name, day = argv
    print "Hello " + name
    print "Today is " + day
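
The script above is written for Python 2, which the course used. As a rough sketch only, the same argument check could be written in Python 3 form as follows. The `greet` function is a hypothetical helper introduced here for illustration (it is not part of the original script); it returns the lines to print instead of printing them directly, which makes the logic easy to check.

```python
import sys


def greet(argv):
    """Return the lines the script would print for a given argument list.

    `greet` is a hypothetical helper, not part of the original script.
    """
    usage = "Lecture25_example.py <name> <day>"
    # argv holds the script name plus two arguments, so we expect 3 items.
    if len(argv) != 3:
        return ["ERROR: Incorrect number of arguments passed.",
                "USAGE: " + usage]
    script_name, name, day = argv
    return ["Hello " + name, "Today is " + day]


if __name__ == "__main__":
    # In Python 3, print is a function and needs parentheses.
    for line in greet(sys.argv):
        print(line)
```

Calling `greet(["Lecture25_example.py", "Greg", "Monday"])` should give the two greeting lines, while any other number of arguments gives the error and usage lines.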
{
"metadata": {
"name": "CaporasoLecture27"
},
"name": "CaporasoLecture27",
"nbformat": 2,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"source": "**Writing files**\n\nWe previously looked at how to read files using python, so now let's look briefly at how to write files. Again we use ``open``, but use either mode ``w`` (for write) or ``a`` (for append)."
},
{
"cell_type": "code",
"collapsed": true,
"input": "f = open('example.txt','w')",
"language": "python",
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": true,
"input": "f.write(\"Hello world!\")\nf.write(\"\\n\")\nf.write(\"Goodbye!\")\nf.write(\"\\n\")",
"language": "python",
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": true,
"input": "f.close()",
"language": "python",
"outputs": [],
"prompt_number": 5
},
{
"cell_type": "markdown",
"source": "Here I wrote a few lines to a file. We can view those from within the IPython Notebook using the ``bash`` command ``cat``. Remember that to run a ``bash`` command in the Notebook you preface it with an ``!``."
},
{
"cell_type": "code",
"collapsed": false,
"input": "!cat example.txt",
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "Hello world!\nGoodbye!"
}
],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": true,
"input": "f = open('example.txt','w')\nf.write(\"This is a test!\\n\")\nf.close()",
"language": "python",
"outputs": [],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": "!cat example.txt",
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "This is a test!"
}
],
"prompt_number": 8
},
{
"cell_type": "markdown",
"source": "Notice that we overwrote what was previously in that file. You need to be really careful with this: as soon as you call ``open`` in ``w`` mode, an existing file is truncated to zero length, so its previous contents are lost even before you write anything.\n\nTo append to a file, we can open it in ``a`` mode."
},
{
"cell_type": "code",
"collapsed": true,
"input": "f = open('example.txt','a')\nf.write(\"This is only a test!\\n\")\nf.close()",
"language": "python",
"outputs": [],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": "!cat example.txt",
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "This is a test!\nThis is only a test!"
}
],
"prompt_number": 11
},
{
"cell_type": "markdown",
"source": "Notice that we now started writing at the end of the existing file. This distinction between writing to and appending to files should be familiar from when we looked at the ``>`` and ``>>`` operators in ``bash``. If you don't remember how those worked, go back and review them.\n\nWe can of course now open this file in Python and read it line by line. Here's an example of how to do that, adding some annotation."
},
{
"cell_type": "code",
"collapsed": false,
"input": "f = open('example.txt','U')\nline_n = 1\nfor line in f:\n    line = line.strip()\n    print \"Line number %d: %s\" % (line_n, line)\n    line_n += 1\nf.close()",
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "Line number 1: This is a test!\nLine number 2: This is only a test!"
}
],
"prompt_number": 12
},
{
"cell_type": "markdown",
"source": "Notice that I'm using some different syntax for building strings here. This can be more convenient than concatenating strings with ``+`` when your strings start getting more complex, but this is personal preference. You can find some discussion and exercises on the *Learn Python The Hard Way* site [here](http://learnpythonthehardway.org/book/ex5.html) and [here](http://learnpythonthehardway.org/book/ex6.html). "
},
{
"cell_type": "markdown",
"source": "**Working with python's built-in modules**"
},
{
"cell_type": "markdown",
"source": "Next we'll jump to chapter 12 of *Practical Computing for Biologists* and look at some of the functions and objects that come with Python. These are organized into modules, which together make up Python's standard library.\n\nTo access code in an existing module we use the ``import`` statement. You can browse the built-in modules on the Python website in Sections 7 - 40 of [this page](http://docs.python.org/2/library/).\n\nFirst let's take a look at [the ``random`` module](http://docs.python.org/2/library/random.html), which is used for generating pseudo-random numbers. To access the functions in ``random`` we can do the following:"
},
{
"cell_type": "code",
"collapsed": true,
"input": "import random",
"language": "python",
"outputs": [],
"prompt_number": 14
},
{
"cell_type": "markdown",
"source": "We can now use ``dir`` to see what functions are available."
},
{
"cell_type": "code",
"collapsed": false,
"input": "dir(random)",
"language": "python",
"outputs": [
{
"output_type": "pyout",
"prompt_number": 15,
"text": "['BPF',\n 'LOG4',\n 'NV_MAGICCONST',\n 'RECIP_BPF',\n 'Random',\n 'SG_MAGICCONST',\n 'SystemRandom',\n 'TWOPI',\n 'WichmannHill',\n '_BuiltinMethodType',\n '_MethodType',\n '__all__',\n '__builtins__',\n '__doc__',\n '__file__',\n '__name__',\n '__package__',\n '_acos',\n '_ceil',\n '_cos',\n '_e',\n '_exp',\n '_hashlib',\n '_hexlify',\n '_inst',\n '_log',\n '_pi',\n '_random',\n '_sin',\n '_sqrt',\n '_test',\n '_test_generator',\n '_urandom',\n '_warn',\n 'betavariate',\n 'choice',\n 'division',\n 'expovariate',\n 'gammavariate',\n 'gauss',\n 'getrandbits',\n 'getstate',\n 'jumpahead',\n 'lognormvariate',\n 'normalvariate',\n 'paretovariate',\n 'randint',\n 'random',\n 'randrange',\n 'sample',\n 'seed',\n 'setstate',\n 'shuffle',\n 'triangular',\n 'uniform',\n 'vonmisesvariate',\n 'weibullvariate']"
}
],
"prompt_number": 15
},
{
"cell_type": "markdown",
"source": "The ones I use most here are ``choice``, ``random``, and ``shuffle``. After calling ``import random`` you can access these as ``random.choice``, ``random.random``, and ``random.shuffle``. Remember that you can call ``help`` to find out what a python function does - give that a try."
},
{
"cell_type": "code",
"collapsed": false,
"input": "help(random.shuffle)",
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "Help on method shuffle in module random:\n\nshuffle(self, x, random=None, int=<type 'int'>) method of random.Random instance\n x, random=random.random -> shuffle list x in place; return None.\n \n Optional arg random is a 0-argument function returning a random\n float in [0.0, 1.0); by default, the standard random.random.\n"
}
],
"prompt_number": 16
},
{
"cell_type": "markdown",
"source": "Let's try it out:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "l = range(100)\nprint l\nrandom.shuffle(l)\nprint l",
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]\n[25, 87, 62, 43, 91, 38, 58, 88, 22, 17, 89, 59, 57, 96, 46, 49, 72, 30, 24, 97, 10, 40, 32, 84, 8, 47, 35, 45, 53, 42, 77, 27, 0, 44, 86, 33, 6, 71, 4, 85, 37, 90, 39, 78, 3, 67, 52, 1, 11, 51, 65, 41, 50, 23, 70, 55, 9, 2, 56, 15, 92, 14, 81, 54, 94, 48, 83, 63, 19, 74, 28, 68, 98, 20, 26, 16, 5, 31, 36, 95, 61, 29, 80, 60, 18, 69, 66, 34, 75, 79, 12, 99, 13, 21, 73, 76, 82, 93, 64, 7]"
}
],
"prompt_number": 17
},
{
"cell_type": "markdown",
"source": "Notice that if you call this multiple times you'll get a different result each time."
},
{
"cell_type": "code",
"collapsed": false,
"input": "random.shuffle(l)\nprint l\nrandom.shuffle(l)\nprint l\nrandom.shuffle(l)\nprint l\nrandom.shuffle(l)\nprint l\nrandom.shuffle(l)\nprint l",
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "[75, 98, 84, 95, 33, 4, 39, 19, 48, 86, 27, 68, 79, 34, 44, 50, 51, 31, 92, 99, 85, 23, 17, 88, 12, 15, 38, 49, 29, 65, 9, 1, 20, 71, 42, 30, 59, 91, 0, 47, 41, 13, 14, 40, 63, 25, 36, 46, 66, 10, 82, 24, 37, 56, 94, 18, 61, 77, 58, 78, 93, 6, 53, 7, 57, 69, 81, 89, 74, 45, 16, 72, 64, 2, 11, 73, 35, 22, 28, 97, 21, 70, 83, 3, 62, 80, 8, 67, 96, 54, 76, 55, 60, 90, 5, 26, 87, 43, 52, 32]\n[23, 79, 91, 21, 68, 18, 58, 63, 78, 25, 76, 24, 56, 36, 8, 86, 65, 46, 1, 42, 38, 28, 53, 98, 40, 95, 88, 13, 3, 87, 73, 82, 81, 71, 97, 7, 43, 89, 75, 19, 92, 17, 15, 33, 70, 85, 84, 99, 14, 62, 83, 90, 69, 16, 41, 48, 94, 66, 2, 74, 4, 5, 51, 52, 49, 93, 22, 61, 12, 60, 55, 11, 6, 10, 77, 9, 0, 37, 45, 20, 47, 39, 72, 27, 32, 26, 64, 30, 59, 57, 29, 44, 50, 35, 80, 34, 67, 31, 54, 96]\n[72, 32, 70, 57, 90, 2, 48, 51, 66, 80, 11, 35, 59, 4, 6, 39, 64, 21, 43, 84, 30, 46, 14, 58, 24, 27, 88, 22, 82, 78, 15, 55, 69, 8, 45, 77, 33, 98, 44, 19, 41, 38, 67, 97, 29, 25, 26, 23, 40, 16, 10, 92, 5, 28, 86, 50, 13, 20, 9, 12, 49, 99, 53, 95, 54, 79, 94, 65, 89, 74, 3, 83, 71, 76, 7, 85, 56, 47, 73, 63, 61, 60, 37, 75, 42, 18, 34, 68, 93, 96, 0, 36, 91, 62, 81, 31, 87, 52, 1, 17]\n[20, 71, 31, 67, 97, 25, 8, 70, 88, 80, 99, 50, 55, 0, 22, 4, 56, 1, 10, 79, 15, 3, 45, 7, 69, 62, 82, 93, 61, 44, 91, 84, 90, 74, 24, 28, 77, 17, 72, 21, 27, 41, 19, 51, 37, 39, 85, 46, 76, 40, 35, 5, 59, 60, 16, 95, 29, 23, 58, 48, 98, 94, 38, 34, 57, 53, 43, 47, 52, 87, 26, 32, 9, 54, 33, 65, 14, 73, 66, 11, 63, 49, 68, 13, 2, 75, 30, 92, 64, 6, 81, 86, 18, 83, 89, 78, 12, 96, 36, 42]\n[7, 15, 32, 28, 64, 71, 66, 69, 50, 9, 97, 67, 78, 83, 39, 90, 56, 79, 65, 74, 61, 13, 81, 25, 99, 86, 1, 41, 27, 6, 29, 53, 8, 47, 43, 72, 73, 3, 62, 58, 40, 0, 89, 96, 33, 55, 49, 4, 98, 21, 2, 5, 35, 45, 18, 77, 84, 16, 17, 94, 30, 23, 38, 51, 76, 34, 46, 63, 37, 70, 87, 68, 54, 91, 20, 36, 44, 93, 11, 22, 19, 88, 75, 12, 10, 80, 57, 26, 60, 24, 59, 95, 82, 92, 14, 48, 52, 31, 85, 42]"
}
],
"prompt_number": 18
},
{
"cell_type": "markdown",
"source": "``random.choice`` is another extremely useful function. It returns an element chosen at random from a list without removing it, so it is useful for subsampling with replacement."
},
{
"cell_type": "code",
"collapsed": false,
"input": "print random.choice(l)\nprint random.choice(l)\nprint random.choice(l)\nprint random.choice(l)",
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "0\n9\n72\n50"
}
],
"prompt_number": 19
},
{
"cell_type": "markdown",
"source": "In general, you don't need every function in a module, only the ones you want to use. Importing just those names keeps your namespace small and makes it clear which functions your code depends on. (Python still loads the whole module either way, so this is about readability and avoiding name clashes rather than saving memory.) You can use ``from`` with ``import`` to import only specific functions. For example:"
},
{
"cell_type": "code",
"collapsed": true,
"input": "from random import choice",
"language": "python",
"outputs": [],
"prompt_number": 20
},
{
"cell_type": "markdown",
"source": "Or, to import several at once:"
},
{
"cell_type": "code",
"collapsed": true,
"input": "from random import (choice, shuffle)",
"language": "python",
"outputs": [],
"prompt_number": 21
},
{
"cell_type": "markdown",
"source": "When you import in this way, you access these functions only by their name, not prefixed with ``random.``"
},
{
"cell_type": "code",
"collapsed": false,
"input": "l = range(100)\nprint choice(l)\nprint choice(l)",
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "20\n82"
}
],
"prompt_number": 22
},
{
"cell_type": "code",
"collapsed": false,
"input": "shuffle(l)\nprint l\nshuffle(l)\nprint l",
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "[94, 63, 54, 95, 79, 92, 77, 90, 12, 40, 0, 35, 55, 56, 68, 89, 87, 78, 26, 34, 80, 8, 65, 97, 53, 50, 29, 61, 41, 48, 4, 59, 99, 69, 74, 27, 67, 33, 73, 9, 71, 1, 14, 66, 7, 88, 44, 24, 38, 32, 22, 39, 91, 28, 25, 98, 85, 83, 62, 19, 37, 11, 23, 31, 18, 36, 57, 3, 13, 21, 2, 46, 51, 70, 64, 16, 60, 86, 45, 84, 49, 15, 96, 30, 20, 10, 5, 17, 82, 42, 43, 52, 76, 75, 47, 81, 6, 93, 72, 58]\n[43, 79, 21, 91, 65, 51, 22, 70, 8, 61, 23, 46, 54, 42, 98, 15, 59, 10, 78, 55, 80, 29, 32, 67, 69, 93, 97, 25, 76, 27, 53, 44, 20, 90, 0, 72, 35, 71, 56, 41, 39, 99, 94, 45, 6, 88, 18, 86, 36, 11, 75, 24, 12, 74, 52, 19, 66, 49, 34, 50, 96, 57, 48, 63, 73, 47, 92, 38, 1, 85, 33, 31, 5, 87, 40, 16, 30, 28, 13, 68, 7, 9, 82, 4, 84, 26, 14, 17, 37, 95, 58, 62, 89, 83, 81, 3, 2, 64, 77, 60]"
}
],
"prompt_number": 23
},
{
"cell_type": "markdown",
"source": "Next let's look at some other modules. ``time`` is very useful for timing how long some code takes to execute. The function of interest for that is called ``time`` (so the function has the same name as the module)."
},
{
"cell_type": "code",
"collapsed": false,
"input": "from time import time\nhelp(time)",
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "Help on built-in function time in module time:\n\ntime(...)\n time() -> floating point number\n \n Return the current time in seconds since the Epoch.\n Fractions of a second may be present if the system clock provides them.\n"
}
],
"prompt_number": 25
},
{
"cell_type": "markdown",
"source": "This function returns the number of seconds since the [Epoch](http://en.wikipedia.org/wiki/Epoch_(reference_date)), which on most systems is January 1, 1970 (UTC)."
},
{
"cell_type": "code",
"collapsed": false,
"input": "time()",
"language": "python",
"outputs": [
{
"output_type": "pyout",
"prompt_number": 26,
"text": "1354200680.177434"
}
],
"prompt_number": 26
},
{
"cell_type": "code",
"collapsed": false,
"input": "time()",
"language": "python",
"outputs": [
{
"output_type": "pyout",
"prompt_number": 27,
"text": "1354200682.845203"
}
],
"prompt_number": 27
},
{
"cell_type": "code",
"collapsed": false,
"input": "time()",
"language": "python",
"outputs": [
{
"output_type": "pyout",
"prompt_number": 28,
"text": "1354200686.952125"
}
],
"prompt_number": 28
},
{
"cell_type": "markdown",
"source": "So how can you use this to see how long some piece of code takes to run? First, let's grab another function from ``time`` called ``sleep``. Figure out what ``sleep`` does."
},
{
"cell_type": "code",
"collapsed": false,
"input": "from time import sleep\nhelp(sleep)",
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "Help on built-in function sleep in module time:\n\nsleep(...)\n sleep(seconds)\n \n Delay execution for a given number of seconds. The argument may be\n a floating point number for subsecond precision.\n"
}
],
"prompt_number": 29
},
{
"cell_type": "code",
"collapsed": false,
"input": "start_time = time()\nsleep(5)\nend_time = time()\nrun_time = end_time - start_time\nprint \"The code took %1.3f seconds to run.\" % run_time",
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "The code took 5.003 seconds to run."
}
],
"prompt_number": 30
}
]
}
]
}
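
The notebook above opens and closes files by hand in Python 2. As a minimal sketch, assuming Python 3, the same write, append, and read-back steps can be written with ``with`` blocks, which close the file automatically. The `write_then_append` helper and the use of a temporary directory are assumptions added here for illustration, chosen so that no real `example.txt` gets overwritten.

```python
import os
import tempfile


def write_then_append(path):
    """Write one line in 'w' mode, add a second in 'a' mode, return all lines.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    # 'w' creates the file, or truncates it if it already exists.
    with open(path, "w") as f:
        f.write("This is a test!\n")
    # 'a' keeps the existing contents and writes at the end.
    with open(path, "a") as f:
        f.write("This is only a test!\n")
    # Read back line by line; enumerate numbers the lines starting at 1.
    with open(path) as f:
        return ["Line number %d: %s" % (n, line.strip())
                for n, line in enumerate(f, start=1)]


# Work in a temporary directory so nothing outside it is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    annotated = write_then_append(os.path.join(tmp_dir, "example.txt"))

for entry in annotated:
    print(entry)
```

Using ``with`` means the file is closed even if an error occurs partway through, so there is no separate ``f.close()`` call to forget.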