@jonathanmorgan
Last active March 24, 2017 16:50
Good programming habits
{
"metadata": {
"name": "",
"signature": "sha256:8275345da5ea1af02c4db165a29abfc95532a1e463df9582794f7384f8598113"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Week 5 - Good Programming Habits"
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Use libraries and packages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Part of the reason we are learning to design as well as to code is so that we can figure out early on which parts of our technical work might have been done before, and so might be made easier by using an existing library or application.\n",
"\n",
"In Python, external code libraries are usually distributed as packages, and can be easily installed and maintained using the programs `pip` (the standard package installer for any Python environment) or `conda` (the installer specific to the Anaconda environment). In the interests of not making you too dependent on Anaconda, we will cover both `conda` and `pip` below, but will focus on `pip`.\n",
"\n",
"To use `pip`, you run the `pip` command:\n",
"\n",
"    pip\n",
"\n",
"To use `conda`:\n",
"\n",
"    conda\n",
"\n",
"When you run the `pip` and `conda` that come with Anaconda, both install packages to the default site-packages folder in your Anaconda installation, so you can use them in any program you run via ipython or ipython notebook."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Finding and Installing packages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The easiest way to start looking for a package is to do an Internet search for the task you are trying to perform. Make sure to include the keyword \"python\" in your search, and if at first you don't find anything, try reading posts that look like they might be relevant to your work, to get ideas of keywords you can use to refine your search.\n",
"\n",
"Once you find the name of a package you want to install, when using `pip`, you can search the Python Package Index (also known as PyPI - \"pie-pea-eye\") to get more details, including the precise name you enter in the `pip` command to load the package: [https://pypi.python.org/pypi](https://pypi.python.org/pypi). In either `pip` or `conda`, you can also use the \"search\" command.\n",
"\n",
"    pip search \"requests\"\n",
"\n",
"OR\n",
"\n",
"    conda search requests\n",
"\n",
"If you look at the output, you'll see that the `pip` output is much more comprehensive. There are thousands of packages in PyPI, and `pip`'s search command does a text search for your query string through the information it knows about all of those projects. The package repository that `conda` references is more carefully vetted, but more limited. (PyPI has since turned off the service behind `pip search`, so if the command reports an error, use the search box on the PyPI web site instead.)"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Updating packages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In either `pip` or `conda`, you can use the \"list\" command to see which packages are installed:\n",
"\n",
"    pip list\n",
"\n",
"    conda list\n",
"\n",
"In `pip`, to see which installed packages can be updated, you can run the command:\n",
"\n",
"    pip list -o\n",
"\n",
"To update existing packages, `conda` has an \"update\" command, while `pip` uses its \"install\" command with the `--upgrade` flag:\n",
"\n",
"    pip install --upgrade <package-name>\n",
"\n",
"    conda update <package-name>\n",
"\n",
"When you update a package, the installer can also update packages on which it depends, if the new version requires it.\n",
"\n",
"In general, it is a good idea to keep packages up to date so you get security updates and bug fixes. It is also a good idea to be careful about updating packages, however, especially when you are in the middle of a project. Python package maintainers tend to be pretty good about backward compatibility (so changes don't break existing code), but you never know when a package update might cause a problem."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Importing and using packages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once you've installed a package, you then need to import it in your code before you can use it. There are a couple of ways you can import packages into python:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# bring everything defined in the package into the current name space\n",
"from requests import *\n",
"\n",
"# bring a certain thing defined by the package into the current name space\n",
"# - the get() function\n",
"from requests import get\n",
"\n",
"# add the package, but require that you use the package name to access\n",
"# objects or functions in the package.\n",
"import requests"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So, what do we think of these three options?\n",
"\n",
"- The first is not so good. Never pull everything defined in a package into your code. It might overwrite other things with the same name, and later, you have no idea what parts of the package you did or did not use.\n",
"\n",
"- The second is much better. You are specifically only importing what you are using. There is still the potential for name collisions (in this example, if your own code also defines a function named \"get\"), but this is much cleaner, and will help you figure out later what parts of the package you used.\n",
"\n",
"- The third is also good. In this case, whenever you want to reference something in the package, you have to prefix the reference with the package name ( `requests.get()` ), and so to find what you used in a given program, you can just search for the package name followed by a period. In the case of long package names, however, sometimes it is OK to just import the things you'll be using into the name space for readability."
]
},
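{
"cell_type": "markdown",
"metadata": {},
"source": [
"The name-collision risk described above can be seen with two standard-library modules. This is only a minimal sketch: `math` and `cmath` are stand-ins, chosen because both happen to define a function named `sqrt`, and they are not part of the `requests` example. The first `print` returns a complex number, because the second `import *` quietly replaced the `sqrt` that came from `math`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# math and cmath are stand-ins for any two packages that define the same name.\n",
"from math import *\n",
"from cmath import *\n",
"\n",
"# sqrt now refers to cmath.sqrt, because the second import replaced the first.\n",
"print( sqrt( 4 ) )\n",
"\n",
"# importing the modules themselves keeps the two functions distinct.\n",
"import math\n",
"import cmath\n",
"\n",
"print( math.sqrt( 4 ) )\n",
"print( cmath.sqrt( 4 ) )"
],
"language": "python",
"metadata": {},
"outputs": []
},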
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Example: the \"requests\" package"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`requests` is a good example of a package that takes something that can be potentially very complicated and makes it straightforward. `requests` is a package that implements a simple HTTP client (like a web browser, but usable in a python program). HTTP clients are used to interact with HTTP-based APIs like those used to collect tweets from twitter ([https://dev.twitter.com/rest/public](https://dev.twitter.com/rest/public) and [https://dev.twitter.com/streaming/overview](https://dev.twitter.com/streaming/overview)) and reddit ([http://www.reddit.com/dev/api](http://www.reddit.com/dev/api)).\n",
"\n",
"Brief explanation of HTTP:\n",
"\n",
"- HTTP is a request-response protocol. You send a request, the server sends you a response. Requests and responses are always paired.\n",
"- Both requests and responses are made up of headers and a body. Headers are name-value pairs, just like in a dictionary. The body can contain anything. For a request, it will generally either be empty or contain data for servicing the request (ever wonder where the data is stored when you submit a form? Often in the body of the request, though sometimes it is added to the URL). For a response, it will contain text - HTML, when you request a web page, for example.\n",
"\n",
"`requests` takes care of all the underlying details of what goes into making HTTP requests and receiving HTTP responses such that interacting with web resources becomes pretty easy with it (though it can still be pointy).\n",
"\n",
"To install requests:\n",
"\n",
"    pip install requests\n",
"\n",
"OR\n",
"\n",
"    conda install requests\n",
"\n",
"Then, to use it to connect to a URL:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# import requests\n",
"import requests\n",
"\n",
"# make a simple GET request\n",
"response = requests.get('https://api.github.com/user', auth=('user', 'pass'))\n",
"\n",
"# check the status code\n",
"print( \"Status code = \" + str( response.status_code ) )\n",
"\n",
"# Header - Content Type?\n",
"print( \"Content type = \" + response.headers['content-type'] )\n",
"\n",
"# Header - Encoding?\n",
"print( \"Encoding = \" + response.encoding )\n",
"\n",
"# Text contained in the body of the response\n",
"print( \"Contents of response body = \" + response.text )\n",
"\n",
"# that is JSON - convert to a JSON object.\n",
"print( \"As JSON object: \" + str(response.json() ) )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Status code = 401\n",
"Content type = application/json; charset=utf-8\n",
"Encoding = utf-8\n",
"Contents of response body = {\"message\":\"Bad credentials\",\"documentation_url\":\"https://developer.github.com/v3\"}\n",
"As JSON object: {u'documentation_url': u'https://developer.github.com/v3', u'message': u'Bad credentials'}\n"
]
}
],
"prompt_number": 5
},
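{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the headers-and-body description above concrete, here is a rough sketch that builds a request without sending it, so it runs without a network connection. It assumes Python 3 and uses the standard library's `urllib` rather than `requests`; the URL and the form fields are made up for illustration."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from urllib.parse import urlencode\n",
"from urllib.request import Request\n",
"\n",
"# hypothetical form data, encoded the way a browser encodes a submitted form.\n",
"form_data = urlencode( { \"first_name\" : \"Jonathan\", \"last_name\" : \"Morgan\" } )\n",
"\n",
"# build (but do not send) a request to a made-up URL.\n",
"request = Request(\n",
"    \"https://example.com/signup\",\n",
"    data = form_data.encode( \"utf-8\" ),\n",
"    headers = { \"Accept\" : \"text/html\" }\n",
")\n",
"\n",
"# because the request has a body, urllib treats it as a POST.\n",
"print( \"Method = \" + request.get_method() )\n",
"\n",
"# headers are name-value pairs.\n",
"for header_name, header_value in request.header_items():\n",
"\n",
"    print( \"Header: \" + header_name + \" = \" + header_value )\n",
"\n",
"#-- END loop over headers --#\n",
"\n",
"# the form data travels in the body of the request.\n",
"print( \"Body = \" + request.data.decode( \"utf-8\" ) )"
],
"language": "python",
"metadata": {},
"outputs": []
},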
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### More information:\n",
"\n",
"- `pip` documentation: [https://pip.pypa.io/en/latest/](https://pip.pypa.io/en/latest/)\n",
"- `conda` documentation: [http://conda.pydata.org/docs/](http://conda.pydata.org/docs/)\n",
"- `requests` documentation: [http://docs.python-requests.org/en/latest/](http://docs.python-requests.org/en/latest/)\n",
"\n",
"<hr />"
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Versioning and backups"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Versioning your code"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As your code gets more complicated, you'll want to make sure that you are versioning your code as you make changes. When exactly you commit and push changes is up to you, but you should do it frequently enough that your versioning provides a useful backup. If you version once a year, after 6 months, your old versions will be so old and out of date as to be worthless if you have to revert. A few options are committing changes at regular intervals, committing whenever you complete a task, or (my favorite) committing before big changes to hardware or software so you have a backup in case your changes destroy your computer. Some people aim to never check broken code into a version control system. I personally am looking for a more fine-grained change record, so I check in more frequently, even if something might be broken and needs to be tested.\n",
"\n",
"To check in using the git command line:\n",
"\n",
"- First, go into the directory of the git repository in which you are working.\n",
"- Run git status to see what changes have been made:\n",
"\n",
"        git status\n",
"\n",
"- Add any files or directories that are new or have been changed:\n",
"\n",
"        git add <file_name>\n",
"\n",
"        git add <directory_name>\n",
"\n",
"        git add README.md\n",
"\n",
"        git add *.py # you can use wild cards\n",
"\n",
"- Once you've added all the files, commit.\n",
"\n",
"        git commit\n",
"\n",
"- As part of commit, it will ask you to enter a commit message. On Unix and Mac, this will open up your default shell text editor.\n",
"\n",
"- After commit, you sync with the github remote repository.\n",
"\n",
"        # first pull, to receive changes that are on the server, not on your computer.\n",
"        git pull\n",
"\n",
"        # then, push your changes to github\n",
"        git push\n",
"\n",
"When you are collaborating with a team of developers, pulls sometimes force you to manually reconcile changes made to the same bits of code. If it is just you working alone in a repository, however, chances are your pull won't result in any changes or merges. It will just tell you there aren't any changes.\n",
"\n",
"When you push, depending on how you cloned your repository, you will likely have to log in to github. GitHub no longer accepts your account password for command-line pushes over HTTPS: use a personal access token in place of the password, or clone over SSH and authenticate with an SSH key. If you need help with this, let me know.\n",
"\n",
"Some examples:\n",
"\n",
"- what it looks like to have access to versions - GitHub for Mac and GitHub for Windows GUI\n",
"- editing README.md and committing changes"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Backing up an sqlite database"
]
},
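{
"cell_type": "markdown",
"metadata": {},
"source": [
"To back up an SQLite database, just copy the \\*.sqlite file that holds your data and paste it somewhere else. Make the copy while no program is writing to the database, so you don't capture a half-finished update.\n",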
"\n",
"<hr />"
]
},
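{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you would rather make the backup from a program than by copying the file by hand, one option is the `backup()` method on `sqlite3` connections. The following is a sketch under assumptions, not the only way to do it: it needs Python 3.7 or later, and the file names and the `note` table are hypothetical, created in a temporary directory so the cell runs on its own."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import os\n",
"import sqlite3\n",
"import tempfile\n",
"\n",
"# hypothetical file names, created in a temporary directory for this sketch.\n",
"work_dir = tempfile.mkdtemp()\n",
"source_path = os.path.join( work_dir, \"research.sqlite\" )\n",
"backup_path = os.path.join( work_dir, \"research_backup.sqlite\" )\n",
"\n",
"# make a small database to back up.\n",
"source_db = sqlite3.connect( source_path )\n",
"source_db.execute( \"CREATE TABLE note ( id INTEGER PRIMARY KEY, body TEXT )\" )\n",
"source_db.execute( \"INSERT INTO note ( body ) VALUES ( 'first note' )\" )\n",
"source_db.commit()\n",
"\n",
"# copy it into a second database file using the backup API (Python 3.7+).\n",
"backup_db = sqlite3.connect( backup_path )\n",
"source_db.backup( backup_db )\n",
"\n",
"# check that the copy holds the same number of rows as the source.\n",
"row_count = backup_db.execute( \"SELECT COUNT(*) FROM note\" ).fetchone()[ 0 ]\n",
"print( \"Rows in backup = \" + str( row_count ) )\n",
"\n",
"# clean up.\n",
"backup_db.close()\n",
"source_db.close()"
],
"language": "python",
"metadata": {},
"outputs": []
},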
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Style (PEP-8, and more)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The most important thing about coding style is that you have some, are consistent about it, and are able to explain why you do something if asked. It is good to build up a style over time, based on what you see and like and don't like, and to expose yourself to different styles. It can also be good to adopt the style of those with whom you are working. There is no one right style, however, so be careful when judging others' style lest you be judged yourself."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"PEP-8 - standard Python style guidelines"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The bible for Python style is PEP-8 ( [https://www.python.org/dev/peps/pep-0008/](https://www.python.org/dev/peps/pep-0008/) ). PEPs are \"Python Enhancement Proposals\". Usually PEPs are requests for new functionality or changes to the way basic features of python work. PEP-8 is different: it is the style guide for people who write Python code, kept in the same proposal system that the people who implement the enhancements use.\n",
"\n",
"Some highlights:\n",
"\n",
"- always use spaces for indentation.\n",
"- use four spaces per indentation level.\n",
"- be kind to people interacting with a computer via a terminal:\n",
"\n",
"    - never make a line of code longer than 79 characters.\n",
"    - for comments and docstrings, keep lines to 72 characters.\n",
"\n",
"- put two blank lines between top-level functions and classes in a file.\n",
"- put one blank line between method definitions inside a class.\n",
"- only import one package per line.\n",
"- avoid extraneous white space. Example:\n",
"\n",
"        Yes: spam(ham[1], {eggs: 2})\n",
"        No:  spam( ham[ 1 ], { eggs: 2 } )\n",
"\n",
"When someone tells you that something is or is not pythonic, they are usually referring at least in part to PEP-8. Even though the level of detail is a little intimidating, it is worth reading PEP-8 just to get an idea of how detailed style guides can be, and how others will look at the code you write.\n",
"\n",
"If someone refers to pythonic, they might also be referring to PEP-20 - the Zen of Python [https://www.python.org/dev/peps/pep-0020/](https://www.python.org/dev/peps/pep-0020/):\n",
"\n",
"    Beautiful is better than ugly.\n",
"    Explicit is better than implicit.\n",
"    Simple is better than complex.\n",
"    Complex is better than complicated.\n",
"    Flat is better than nested.\n",
"    Sparse is better than dense.\n",
"    Readability counts.\n",
"    Special cases aren't special enough to break the rules.\n",
"    Although practicality beats purity.\n",
"    Errors should never pass silently.\n",
"    Unless explicitly silenced.\n",
"    In the face of ambiguity, refuse the temptation to guess.\n",
"    There should be one-- and preferably only one --obvious way to do it.\n",
"    Although that way may not be obvious at first unless you're Dutch.\n",
"    Now is better than never.\n",
"    Although never is often better than *right* now.\n",
"    If the implementation is hard to explain, it's a bad idea.\n",
"    If the implementation is easy to explain, it may be a good idea.\n",
"    Namespaces are one honking great idea -- let's do more of those!\n",
"\n",
"Be aware that some of the Python community feels very strongly about this, even as you might notice that some of it is difficult to really pin down. In programming, the culture and context of a language should be taken into account when working with and seeking help on a language. It is to the Python community's credit that they at least write some of this context down for others to see."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Jon's (sometimes non-Pythonic) style suggestions:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- be consistent.\n",
"- Only do one thing on a line.\n",
"\n",
"    - Quote from Brian Kernighan (of Kernighan and Ritchie fame - authors of the book that specified the original C language): \"Everyone knows that debugging is twice as hard as writing a program in the first place. So if you\u2019re as clever as you can be when you write it, how will you ever debug it?\"\n",
"\n",
"    - example:\n",
"\n",
"            # what does this do?\n",
"            b = [i for i in a if i > 4]\n",
"\n",
"- Explicitly check for specific values when you expect them (like True, False, None) rather than just putting a variable name in an \"if\". Example:\n",
"\n",
"        # If I'm expecting a boolean (True or False), I prefer:\n",
"        if boolean_value == True:\n",
"\n",
"        # rather than\n",
"        if boolean_value:\n",
"\n",
"        # same with None\n",
"        if object_instance is not None:\n",
"\n",
"        # rather than\n",
"        if object_instance:\n",
"\n",
"- Comment a lot.\n",
"- use human-readable, descriptive variable names.\n",
"\n",
"        # so, for a dictionary that maps first names to last names:\n",
"        first_to_last_name_dict = { 'Jonathan' : 'Morgan', 'Cliff' : 'Lampe' }\n",
"\n",
"        # rather than\n",
"        n = { 'Jonathan' : 'Morgan', 'Cliff' : 'Lampe' }\n",
"\n",
"        # OR (better, but still not specific enough for me)\n",
"        names_dict = { 'Jonathan' : 'Morgan', 'Cliff' : 'Lampe' }\n",
"\n",
"- put spaces around everything (in the extraneous white space example above, I prefer the \"No\" style):\n",
"\n",
"    - before and after every operator.\n",
"    - before and after parentheses, square brackets, and curly brackets.\n",
"    - etc.\n",
"\n",
"- always use parentheses to explicitly define order of operations in math and in conditional logic.\n",
"\n",
"        # I prefer:\n",
"        int_holder = ( 1 + ( 2 * 3 ) ) / 12\n",
"\n",
"        # to something like:\n",
"        int_holder = ( 1 + 2 * 3 ) / 12\n",
"\n",
"        # same with booleans:\n",
"        if ( ( x == 12 ) and ( is_cool == True ) ):\n",
"\n",
"        # better than:\n",
"        if x == 12 and is_cool == True:\n",
"\n",
"- in general, whenever possible, be clear and explicit: for example, always use parentheses in math or boolean logic to explicitly state the order you want.\n",
"- When you do conditionals or loops, if you only have one statement inside the loop or conditional branch, still place it on a separate line and indent it. Don't place the conditional check and the logic all on the same line.\n",
"\n",
"<hr />"
]
},
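{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an illustration of the \"only do one thing on a line\" suggestion, here is the list comprehension from the example above next to a version that takes one step per line. The input list `a` and the name `b_explicit` are made up for this sketch; both versions build the same list, and which one reads better is a matter of the style you choose."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# hypothetical input list.\n",
"a = [ 1, 5, 3, 8, 4, 9 ]\n",
"\n",
"# the one-line list comprehension from the example above.\n",
"b = [ i for i in a if i > 4 ]\n",
"\n",
"# the same logic, one step per line.\n",
"b_explicit = []\n",
"for i in a:\n",
"\n",
"    # only keep values greater than 4.\n",
"    if ( i > 4 ):\n",
"\n",
"        b_explicit.append( i )\n",
"\n",
"    #-- END check to see if value is greater than 4 --#\n",
"\n",
"#-- END loop over values in a --#\n",
"\n",
"# both approaches produce the same list.\n",
"print( b_explicit == b )"
],
"language": "python",
"metadata": {},
"outputs": []
},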
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Don't Repeat Yourself (DRY) and function basics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_Based in part on information in [http://introtopython.org/introducing_functions.html](http://introtopython.org/introducing_functions.html)_\n",
"\n",
"DRY stands for \"Don't Repeat Yourself\". The idea here is that if in your program you repeat the exact same code more than once, you should abstract it out so you always call the same code everywhere you use that logic, so you are consistent, and can easily change it everywhere you use it by making changes once.\n",
"\n",
"Code re-use (breaking code out into functions, or perhaps objects so it can be re-used) is a way to make your programs more reliable and easier to maintain. It does have the sometimes unwanted side-effect of breaking more than one thing if you break a heavily shared function or object, but having reusable code that is widely used also means that you will be more likely to catch problems sooner than you would be if the code wasn't re-used.\n",
"\n",
"Here is what a function definition looks like:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def <function_name>( <param_list> ):\n",
"\n",
"    '''\n",
"    Documentation string that explains what the function does.\n",
"    '''\n",
"\n",
"    # return reference\n",
"    status_OUT = \"Success!\"\n",
"\n",
"    # indented stuff that is part of the function. If there is an error,\n",
"    # change status_OUT to an error message.\n",
"\n",
"    return status_OUT\n",
"\n",
"#-- END function <function_name> --#"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Where:\n",
"\n",
"- `<function_name>` is the name you'll reference when you call your function. Usually a function name is all lower case with underscores between words, just like a variable name. It should be precise and specific, and will often start with a verb and end with the object of the verb. Here, again, \"x\" is not a good function name. Something like \"get_dictionary_value_as_int()\" is better.\n",
"- `<param_list>` can either be empty (so no parameters defined here), or could be one or more names of parameters you want passed to your function in a comma-delimited list.\n",
"    - you can make parameters optional as well, by setting a default value in the function declaration.\n",
"- you should always place a documentation string just after the declaration, inside the function. This is a multi-line string (surrounded by `'''`) that explains what the function does.\n",
"- if your function returns something, you should define it at the top, and return it at the bottom. Don't fall out of the function in the middle.\n",
"- for variables you declare and use inside a function, declare them at the top of the function and initialize them to their empty state.\n",
"\n",
"Example:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# define function that retrieves a dictionary value as an integer\n",
"def get_dictionary_value_as_int( dict_IN, name_IN, default_IN = -1 ):\n",
"\n",
"    '''\n",
"    Accepts a dictionary, a name, and a default value to be used if the\n",
"    name is not in the dictionary. Returns value for the name, after\n",
"    casting it to an integer. If the value can't be converted to an\n",
"    integer, raises an exception.\n",
"    '''\n",
"\n",
"    # return reference\n",
"    value_OUT = -1\n",
"\n",
"    # is the name present in the dictionary?\n",
"    if ( name_IN in dict_IN ):\n",
"\n",
"        # retrieve the value\n",
"        value_OUT = dict_IN.get( name_IN, default_IN )\n",
"\n",
"        # convert to integer\n",
"        value_OUT = int( value_OUT )\n",
"\n",
"    else:\n",
"\n",
"        # not present. Return default.\n",
"        value_OUT = default_IN\n",
"\n",
"    #-- END check to see if name is in dictionary --#\n",
"\n",
"    # return the result.\n",
"    return value_OUT\n",
"\n",
"#-- END function get_dictionary_value_as_int() --#\n",
"\n",
"# make a dictionary\n",
"test_dictionary = { \"name1\" : \"value1\", \"name2\" : \"2\", \"name3\" : \"value3\" }\n",
"\n",
"# get the value for name2 as an integer\n",
"int_value = get_dictionary_value_as_int( test_dictionary, \"name2\", -1 )\n",
"\n",
"# print the value\n",
"print( \"Int value returned = \" + str( int_value ) )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Int value returned = 2\n"
]
}
],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A slightly more involved example, and introduction to objects:\n",
"\n",
"- My python_utilities: [https://github.com/jonathanmorgan/python_utilities](https://github.com/jonathanmorgan/python_utilities)\n",
"\n",
"    - README (home page on github) gives an idea of the types of things one can abstract.\n",
"    - classes, functions - many different ways to make reusable things.\n",
"    - Example: python_utilities/dictionaries/dict_helper.py\n",
"\n",
"        - there is one function here, get_dict_value(), that shows how you'd interact with the class to call class methods, and then a Python class that just serves as a namespace for many more convenience functions."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Detecting opportunities for re-use"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the following code, see what looks like it might be a candidate for turning into a function:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"students = ['bernice', 'aaron', 'cody']\n",
"\n",
"# Put students in alphabetical order.\n",
"students.sort()\n",
"\n",
"# Display the list in its current order.\n",
"print(\"Our students are currently in alphabetical order.\")\n",
"for student in students:\n",
"    print(student.title())\n",
"\n",
"# Put students in reverse alphabetical order.\n",
"students.sort(reverse=True)\n",
"\n",
"# Display the list in its current order.\n",
"print(\"\\nOur students are now in reverse alphabetical order.\")\n",
"for student in students:\n",
"    print(student.title())"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Our students are currently in alphabetical order.\n",
"Aaron\n",
"Bernice\n",
"Cody\n",
"\n",
"Our students are now in reverse alphabetical order.\n",
"Cody\n",
"Bernice\n",
"Aaron\n"
]
}
],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The answer - printing out the list of students! It is exactly the same each time. So, to make it into a function:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def show_students(students, message):\n",
"\n",
"    '''\n",
"    Print out a message, and then the list of students\n",
"    '''\n",
"\n",
"    print(message)\n",
"\n",
"    # loop over list of students, printing each.\n",
"    for student in students:\n",
"\n",
"        print( student.title() )\n",
"\n",
"    #-- END loop over students. --#\n",
"\n",
"#-- END function show_students() --#\n",
"\n",
"students = ['bernice', 'aaron', 'cody']\n",
"\n",
"# Put students in alphabetical order.\n",
"students.sort()\n",
"show_students( students, \"Our students are currently in alphabetical order.\" )\n",
"\n",
"# Put students in reverse alphabetical order.\n",
"students.sort( reverse=True )\n",
"show_students( students, \"\\nOur students are now in reverse alphabetical order.\" )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Our students are currently in alphabetical order.\n",
"Aaron\n",
"Bernice\n",
"Cody\n",
"\n",
"Our students are now in reverse alphabetical order.\n",
"Cody\n",
"Bernice\n",
"Aaron\n"
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"More information:\n",
"\n",
"- on functions: [http://introtopython.org/introducing_functions.html](http://introtopython.org/introducing_functions.html)\n",
"- more on functions: [http://introtopython.org/more_functions.html](http://introtopython.org/more_functions.html)\n",
"- Intro. to objects: [http://introtopython.org/classes.html](http://introtopython.org/classes.html)\n",
"\n",
"<hr />"
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Comments (again)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Review:\n",
"\n",
"#### What makes a good comment?\n",
"\n",
"- It is short and to the point, but a complete thought. Most comments should be written in complete sentences.\n",
"- It explains your thinking, so that when you return to the code later you will understand how you were approaching the problem.\n",
"- It explains your thinking, so that others who work with your code will understand your overall approach to a problem.\n",
"- It explains particularly difficult sections of code in detail.\n",
"\n",
"#### When should you write a comment?\n",
"\n",
"- When you have to think about code before writing it.\n",
"- When you are likely to forget later exactly how you were approaching a problem.\n",
"- When there is more than one way to solve a problem.\n",
"- When others are unlikely to anticipate your way of thinking about a problem.\n",
"\n",
"#### More considerations:\n",
"\n",
"- At the least, you should have an explanation of a given chunk of code (method, class, function) as the first thing inside it, surrounded by the multi-line comment - `'''` before and after. Describe what the code does, any assumptions, etc.\n",
"\n",
"- Consider not just putting an explanation, but outlining purpose, preconditions, and postconditions:\n",
"\n",
"    - purpose: What the program does, what arguments it accepts, what it returns, how it handles error conditions, and anything else you think is important about the program.\n",
"    - preconditions: Anything that has to happen before you can successfully use the code, and anything a user of the code should understand before using it.\n",
"    - postconditions: Anything that you should know about the state of a program after the code completes, including side-effects like updates to a database or changes to objects that were passed in as parameters.\n",
"    - Example:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def process_article( self, article_IN, coding_user_IN = None, *args, **kwargs ):\n",
"\n",
"    '''\n",
"    purpose: After the ArticleCoder is initialized, this method accepts one\n",
"        article instance and codes it for sourcing. With regard to articles,\n",
"        this class is stateless, so you can process many articles with a\n",
"        single instance of this object without having to reconfigure each\n",
"        time.\n",
"    preconditions: load_config_properties() should have been invoked before\n",
"        running this method.\n",
"    postconditions: article passed in is coded, which means an Article_Data\n",
"        instance is created for it and populated to the extent the child\n",
"        class is capable of coding the article.\n",
"    '''\n",
"\n",
"    pass\n",
"\n",
"#-- END method process_article() --#"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Always use \"#\" for comments, except for the documentation string at the top of a method or function. That leaves `'''` free, so you can wrap it around big chunks of code to comment them out.\n",
"- Consider a comment at the end of each indented block to anchor you, like the `#-- END ... --#` lines in the examples above (not pythonic).\n",
"- Also, seriously, comment. For you, and for others.\n",
"\n",
"<hr />"
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Testing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You should always test your code, especially as you make changes over time. While there are many sophisticated testing strategies that software professionals use, for this class, we are just going to cover a basic philosophy of testing. If you decide you want to learn formal methods, go for it.\n",
"\n",
"- Most importantly, test frequently, so that each round of testing covers a few small changes rather than a complex knot of changes that built up over time (similar to the idea of not doing more than one thing on a given line, try not to test too many changes at once).\n",
"\n",
"- Aim to make sure that you have tested every line of code at least once.\n",
"\n",
"- If you have a conditional, it is not enough to just test the true condition. You should also test to make sure the false branch works as expected.\n",
"\n",
"- If you have a loop, you should test your code in situations where the criteria for even entering the loop are never met, where you only go through the loop one time, where you loop a small number of times, and where you loop a HUGE number of times.\n",
"\n",
"- If you are dealing with numbers, always test with the lowest and highest numbers you expect might be present as values, and then test with something squarely in the middle.\n",
"\n",
"- If you are testing database logic, first test everything you can that isn't responsible for a write to the database before you actually update the database, and then always make a backup before you start updating the database, so you can revert if you make a huge mistake.\n",
"\n",
"- Also take a little time to think of things you'd expect to break your code, and see if they do. Always be on the lookout for interesting data (whatever it may be) that might have a chance of breaking your code.\n",
"\n",
"- If you have a particular chunk of code that changes often, consider writing a few reusable test routines to periodically check whether that chunk of code still works or not."
]
},
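{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the checklist above. The function under test, `sum_positive()`, is made up for illustration. The reusable test routine exercises both branches of the conditional, the loop with zero, one, a few, and a huge number of items, and the boundary value 0, using plain `assert` statements:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# hypothetical function under test: adds up the positive numbers in a list\n",
"def sum_positive( number_list_IN ):\n",
"    \n",
"    # return reference\n",
"    total_OUT = 0\n",
"    \n",
"    # loop over the numbers, adding only those greater than 0.\n",
"    for current_number in number_list_IN:\n",
"        \n",
"        if ( current_number > 0 ):\n",
"            \n",
"            total_OUT = total_OUT + current_number\n",
"            \n",
"        #-- END check to see if number is positive --#\n",
"        \n",
"    #-- END loop over numbers --#\n",
"    \n",
"    return total_OUT\n",
"\n",
"#-- END function sum_positive() --#\n",
"\n",
"# reusable test routine - re-run it after every change to sum_positive()\n",
"def test_sum_positive():\n",
"    \n",
"    # loop is never entered\n",
"    assert sum_positive( [] ) == 0\n",
"    \n",
"    # loop runs once - conditional true, then conditional false\n",
"    assert sum_positive( [ 5 ] ) == 5\n",
"    assert sum_positive( [ -5 ] ) == 0\n",
"    \n",
"    # boundary value - 0 is not positive\n",
"    assert sum_positive( [ 0 ] ) == 0\n",
"    \n",
"    # loop runs a small number of times\n",
"    assert sum_positive( [ 1, -2, 3 ] ) == 4\n",
"    \n",
"    # loop runs a HUGE number of times (1 + 2 + ... + 100000)\n",
"    assert sum_positive( range( 1, 100001 ) ) == 5000050000\n",
"    \n",
"    return True\n",
"\n",
"#-- END function test_sum_positive() --#\n",
"\n",
"test_sum_positive()\n",
"print( \"sum_positive() tests passed\" )"
],
"language": "python",
"metadata": {},
"outputs": []
},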
{
"cell_type": "markdown",
"metadata": {},
"source": [
"More information:\n",
"\n",
"- [http://docs.python-guide.org/en/latest/writing/tests/](http://docs.python-guide.org/en/latest/writing/tests/)\n",
"\n",
"<hr />"
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Debugging"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_Based on information in [https://scipy-lectures.github.io/advanced/debugging/index.html](https://scipy-lectures.github.io/advanced/debugging/index.html)_\n",
"\n",
"So, you have a bug. Perhaps it causes an exception that looks something like this:\n",
"\n",
" Traceback (most recent call last):\n",
" File \"/home/jonathanmorgan/work/sourcenet/django/research/sourcenet/article_coding/article_coding.py\", line 258, in code_article_data\n",
" current_status = article_coder.code_article( current_article )\n",
" File \"/home/jonathanmorgan/work/sourcenet/django/research/sourcenet/article_coding/open_calais_article_coder.py\", line 170, in code_article\n",
" requests_response = my_http_helper.load_url_requests( self.OPEN_CALAIS_REST_API_URL, data_IN = article_body_html )\n",
" File \"/home/jonathanmorgan/work/sourcenet/django/research/python_utilities/network/http_helper.py\", line 638, in load_url_requests\n",
" response_OUT = requests.post( url_IN, headers = headers, data = data_IN )\n",
" File \"/home/jonathanmorgan/.virtualenvs/sourcenet/local/lib/python2.7/site-packages/requests/api.py\", line 99, in post\n",
" return request('post', url, data=data, json=json, **kwargs)\n",
" File \"/home/jonathanmorgan/.virtualenvs/sourcenet/local/lib/python2.7/site-packages/requests/api.py\", line 49, in request\n",
" response = session.request(method=method, url=url, **kwargs)\n",
" File \"/home/jonathanmorgan/.virtualenvs/sourcenet/local/lib/python2.7/site-packages/requests/sessions.py\", line 461, in request\n",
" resp = self.send(prep, **send_kwargs)\n",
" File \"/home/jonathanmorgan/.virtualenvs/sourcenet/local/lib/python2.7/site-packages/requests/sessions.py\", line 573, in send\n",
" r = adapter.send(request, **kwargs)\n",
" File \"/home/jonathanmorgan/.virtualenvs/sourcenet/local/lib/python2.7/site-packages/requests/adapters.py\", line 370, in send\n",
" timeout=timeout\n",
" File \"/home/jonathanmorgan/.virtualenvs/sourcenet/local/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py\", line 518, in urlopen\n",
" body=body, headers=headers)\n",
" File \"/home/jonathanmorgan/.virtualenvs/sourcenet/local/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py\", line 330, in _make_request\n",
" conn.request(method, url, **httplib_request_kw)\n",
" File \"/usr/lib/python2.7/httplib.py\", line 1001, in request\n",
" self._send_request(method, url, body, headers)\n",
" File \"/usr/lib/python2.7/httplib.py\", line 1035, in _send_request\n",
" self.endheaders(body)\n",
" File \"/usr/lib/python2.7/httplib.py\", line 997, in endheaders\n",
" self._send_output(message_body)\n",
" File \"/usr/lib/python2.7/httplib.py\", line 854, in _send_output\n",
" self.send(message_body)\n",
" File \"/usr/lib/python2.7/httplib.py\", line 826, in send\n",
" self.sock.sendall(data)\n",
" File \"/usr/lib/python2.7/socket.py\", line 224, in meth\n",
" return getattr(self._sock,name)(*args)\n",
" UnicodeEncodeError: 'ascii' codec can't encode character u'\\u2014' in position 98: ordinal not in range(128)\n",
" \n",
"What do you do?\n",
"\n",
"First, take a deep breath and remember that we all make buggy code. Debugging is part of the job. You'll get a better and better idea of what is broken and how you fix it the more programming you do, and the more bugs you see and resolve, but you will always run into unexpected errors, and you won't always know exactly what caused these errors (this was a tricky one for me).\n",
"\n",
"So, to start, let's look in more detail at that exception. Parts of an exception:\n",
"\n",
"- Exception name\n",
"- message\n",
"- stack trace - log of where the python interpreter thought it was in the hierarchy of function calls that make up a procedure when the error occurred. If you are lucky, it will tell you exactly where the problem is. If not, like last week when we saw what happened if you forgot a right parenthesis on a statement, you will at least know that there was a problem and you'll have to use the other information plus what the Internet tells you to try to figure it ll out.\n",
"\n",
"So in this case, the exception is a UnicodeEncodeError, the messages states that it is having rrouble converting a unicode character (\\u2014) to ASCII, and the stack trace lets me start at the original subroutine call and work my way through what happened to the point where the actual exception ocurred. In this case, the chain of calls started in my code, but then pretty quickly went into libraries that were not my code, suggesting that there was a problem in a library I was using (requests!). Ugh. Ends up I had to encode a string I was passing to requests a certain way, else it would break. "
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"General debugging steps"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, some general debugging steps from [https://scipy-lectures.github.io/advanced/debugging/index.html (https://scipy-lectures.github.io/advanced/debugging/index.html):\n",
"\n",
"If you do have a non trivial bug, this is when debugging strategies kick in. There is no silver bullet. Yet, strategies help:\n",
"\n",
"- For debugging a given problem, the favorable situation is when the problem is isolated in a small number of lines of code, outside framework or application code, with short modify-run-fail cycles\n",
"\n",
"- Make it fail reliably. Find a test case that makes the code fail every time.\n",
"\n",
"- \\* Divide and Conquer. Once you have a failing test case, isolate the failing code.\n",
"\n",
" - Which module.\n",
" - Which function.\n",
" - Which line of code.\n",
"\n",
"- isolate a small reproducible failure: a test case\n",
"- Change one thing at a time and re-run the failing test case.\n",
"- Use the debugger or print() statements to understand what is going wrong.\n",
"- Take notes and be patient. It may take a while.\n",
"- If all else fails and you have a complex file of code and you can't figure out what is causing a problem, strip the entire program out into another file, then gradually paste pieces back in until your problem occurs again. Then at least you'll have a better idea of what/where the problem is."
]
},
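{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch of the \"isolate a small reproducible failure\" step, the `UnicodeEncodeError` from the stack trace above can be reproduced in a few lines, with no requests library and no network connection. The test string is made up, and encoding to UTF-8 is an assumption about the fix (the text above only says the string had to be encoded \"a certain way\"):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# made-up test string that contains the character from the error message\n",
"test_string = u\"before \\u2014 after\"\n",
"\n",
"# flag that records whether we managed to reproduce the failure\n",
"failure_reproduced = False\n",
"\n",
"# reproduce the failure: the ASCII codec can't represent this character.\n",
"try:\n",
"    \n",
"    test_string.encode( \"ascii\" )\n",
"    print( \"No error - failure NOT reproduced.\" )\n",
"    \n",
"except UnicodeEncodeError as exception_instance:\n",
"    \n",
"    failure_reproduced = True\n",
"    print( \"Reproduced: \" + str( exception_instance ) )\n",
"    \n",
"#-- END try-except block --#\n",
"\n",
"# one possible fix (an assumption): explicitly encode to UTF-8 before\n",
"#    handing the string to code that expects bytes.\n",
"encoded_string = test_string.encode( \"utf-8\" )\n",
"print( \"Encoded without error: \" + repr( encoded_string ) )"
],
"language": "python",
"metadata": {},
"outputs": []
},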
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exception handling"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you are getting an exception and you want to handle it more gracefully than having your whole program explode in a cascade of stack trace, you can use the `try` and `except` statements to trap the exception statement and do some debugging or make your program fail gracefully.\n",
"\n",
"To use a try-except block to trap exceptions, simply place `try:` on a line before you do the things that causes an exception, indent the code you are having trouble with so it is nested inside the `try`. After this code of interest, un-indent back out to the same level as `try:`, then place `except:` on a line and nest your exception handling code inside the `except` block.\n",
"\n",
"Example:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# define function that retrieves a dictionary value as an integer\n",
"def get_dictionary_value_as_int( dict_IN, name_IN, default_IN = -1 ):\n",
" \n",
" '''\n",
" Accepts a dictionary, a name, and a default value to be used if the\n",
" name is not in the dictionary. Returns value for the name, after\n",
" casting it to an integer. If the value can't be converted to an\n",
" integer, returns the default.\n",
" '''\n",
" \n",
" # return reference\n",
" value_OUT = -1\n",
" \n",
" # is the name present in the dictionary?\n",
" if ( name_IN in dict_IN ):\n",
" \n",
" # retrieve the value\n",
" value_OUT = dict_IN.get( name_IN, default_IN )\n",
" \n",
" # exception handling, in case the value isn't a number.\n",
" try:\n",
" \n",
" # convert to integer\n",
" value_OUT = int( value_OUT )\n",
" \n",
" except:\n",
" \n",
" # not a number. Return the default.\n",
" value_OUT = default_IN\n",
" \n",
" #-- END try-except block --#\n",
" \n",
" else:\n",
" \n",
" # not present. Return default.\n",
" value_OUT = default_IN\n",
" \n",
" #-- END check to see if name is in dictionary --#\n",
" \n",
" # return the result.\n",
" return value_OUT\n",
"\n",
"#-- END function get_dictionary_value_as_int() --#\n",
"\n",
"# make a dictionary\n",
"test_dictionary = { \"name1\" : \"value1\", \"name2\" : \"2\", \"name3\" : \"value3\" }\n",
"\n",
"# get the value for name2 as an integer\n",
"int_value = get_dictionary_value_as_int( test_dictionary, \"name3\", -1 )\n",
"\n",
"# print the value\n",
"print( \"Int value returned = \" + str( int_value ) )"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Int value returned = -1\n"
]
}
],
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are more options for exception handling. You can catch a specific type of exception:\n",
"\n",
" except ValueError:\n",
"\n",
"You can store the exception as a variable so that you can reference it in your `except` clause:\n",
"\n",
" except ValueError as exception_instance:\n",
"\n",
"And you can have multiple `except` clauses that deal with different exception scenarios, including a base `except` at the end to handle unexpected exceptions:\n",
"\n",
" except ValueError as exception_instance:\n",
" \n",
" print( \"Ack! a value error: \" + str( exception_instance ) )\n",
" \n",
" except UnicodeEncodeError:\n",
" \n",
" print( \"Cursed emoji!\" )\n",
" \n",
" except:\n",
" \n",
" print( \"I don't know what to say...\" )\n",
" \n",
" #-- END try-except block --#\n",
"\n",
"More information:\n",
"\n",
"- on debugging: [https://scipy-lectures.github.io/advanced/debugging/index.html](https://scipy-lectures.github.io/advanced/debugging/index.html)\n",
"- python exception tutorial: [https://docs.python.org/2/tutorial/errors.html](https://docs.python.org/2/tutorial/errors.html)\n",
"- more on exceptions: [http://doughellmann.com/2009/06/19/python-exception-handling-techniques.html](http://doughellmann.com/2009/06/19/python-exception-handling-techniques.html)\n",
"\n",
"<hr />"
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Long-running processes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When you have a long-running data collection or analysis process, where long-running is days, weeks, or even months, there are a few things you should be aware of that make it more likely your long-running process will actually run as long as you want/need it to.\n",
"\n",
"- 1) Power management settings - If you are running this program on your laptop or home computer, remember to change your power management settings so that your computer doesn't go to sleep or hibernate while it is supposed to be running.\n",
"\n",
"- 2) If you are on a Unix machine, you can install and use a program called screen to create a background session on the server that survives even if your connection to the computer breaks (especially useful if you are running the program on a remote computer). To create a screen session, at the unix prompt, type `screen`.\n",
"\n",
" This should open up another shell. To leave the screen session, press:\n",
" \n",
" Ctrl-A + D\n",
" \n",
" To rejoin a screen session:\n",
" \n",
" screen -R\n",
"\n",
"- 3) learn how to use python to send emails (you can find this information in my python_utilities package, in /python_utilities/email/email_helper.py). When you have a long-running process going, it can be a real relief to be able to add code to the process that emails you if the process encounters a problem (or an amazing miracle), or to just have the process email you at intervals to let you know it is still running.\n",
"\n",
"- 4) get a real server. Eventually, you'll want to both work with \"big data\" and be able to close your laptop when you leave the house, for example. At this point, it might be worth considering getting access to a unix/linux server and setting everything up there. There will be a learning curve when you switch to having to work in a command line most of the time, but nothing beats a well-implemented and well-maintained unix/linux server for big data tasks."
]
},
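{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of item 3 above, using only the standard library (`smtplib` and `email.mime.text`) rather than the python_utilities helper. The function names, the example.com addresses, and the SMTP host `localhost` are all assumptions, so substitute your own mail server settings. Nothing is actually sent until you call `send_status_email()`:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import smtplib\n",
"from email.mime.text import MIMEText\n",
"\n",
"# hypothetical helper: builds a plain-text status email\n",
"def build_status_email( subject_IN, body_IN, from_IN, to_IN ):\n",
"    \n",
"    # return reference\n",
"    message_OUT = MIMEText( body_IN )\n",
"    \n",
"    # set the headers\n",
"    message_OUT[ \"Subject\" ] = subject_IN\n",
"    message_OUT[ \"From\" ] = from_IN\n",
"    message_OUT[ \"To\" ] = to_IN\n",
"    \n",
"    return message_OUT\n",
"\n",
"#-- END function build_status_email() --#\n",
"\n",
"# hypothetical helper: sends the message through an SMTP server.\n",
"#    \"localhost\" is an assumption - use your own mail server's host name.\n",
"def send_status_email( message_IN, smtp_host_IN = \"localhost\" ):\n",
"    \n",
"    smtp_server = smtplib.SMTP( smtp_host_IN )\n",
"    \n",
"    try:\n",
"        \n",
"        smtp_server.sendmail( message_IN[ \"From\" ], [ message_IN[ \"To\" ] ], message_IN.as_string() )\n",
"        \n",
"    finally:\n",
"        \n",
"        # always close the connection, even if the send fails.\n",
"        smtp_server.quit()\n",
"        \n",
"    #-- END try-finally block --#\n",
"\n",
"#-- END function send_status_email() --#\n",
"\n",
"# build (but do not send) a heartbeat message with made-up addresses\n",
"status_message = build_status_email( \"still running\", \"Processed 1000 records.\", \"me@example.com\", \"me@example.com\" )\n",
"print( status_message[ \"Subject\" ] )\n",
"\n",
"# inside your long-running loop, you might then call:\n",
"# send_status_email( status_message )"
],
"language": "python",
"metadata": {},
"outputs": []
},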
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}