@jonathanmorgan
Last active September 1, 2015 07:49
Big Data Basics - Good Coding Habits
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Good Coding Habits"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Table of Contents"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- [1. Comments again](#1.-Comments-again)\n",
"- [2. Testing](#2.-Testing)\n",
"- [3. Debugging and exception handling](#3.-Debugging-and-exception-handling)\n",
"- [4. Style - PEP-8 and more](#4.-Style---PEP-8-and-more)\n",
"- [5. The software development process](#5.-The-software-development-process)\n",
"- [6. Use libraries and packages](#6.-Use-libraries-and-packages)\n",
"- [7. Versioning and backups](#7.-Versioning-and-backups)\n",
"- [8. Long-running processes](#8.-Long-running-processes)\n",
"- [9. DRY and function basics](#9.-DRY-and-function-basics)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Comments again"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- [Back to Table of Contents](#Table-of-Contents)\n",
"\n",
"Review:\n",
"\n",
"#### What makes a good comment?\n",
"\n",
"- It is short and to the point, but a complete thought. Most comments should be written in complete sentences.\n",
"- It explains your thinking, so that when you return to the code later you will understand how you were approaching the problem.\n",
"- It explains your thinking, so that others who work with your code will understand your overall approach to a problem.\n",
"- It explains particularly difficult sections of code in detail.\n",
"\n",
"#### When should you write a comment?\n",
"\n",
"- When you have to think about code before writing it.\n",
"- When you are likely to forget later exactly how you were approaching a problem.\n",
"- When there is more than one way to solve a problem.\n",
"- When others are unlikely to anticipate your way of thinking about a problem.\n",
"\n",
"#### More considerations:\n",
"\n",
"- At the least, you should have an explanation of a given chunk of code (method, class, function) as the first thing inside it, surrounded by the multi-line comment - `'''` before and after. Describe what the code does, any assumptions, etc.\n",
"\n",
"- Consider not just putting an explanation, but outlining purpose, preconditions, and postconditions:\n",
"\n",
" - purpose: What the program does, what arguments it accepts, what it returns, how it handles error conditions, and anything else you think is important about the program.\n",
" - preconditions: Anything that has to happen before you can successfully use the code, and anything a user of the code should understand before using it.\n",
" - postconditions: Anything that you should know about the state of a program after the code completes, including side-effects like updates to database or changes to objects that were passed in as parameters.\n",
" - Example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def process_article( self, article_IN, coding_user_IN = None, *args, **kwargs ):\n",
"\n",
"    '''\n",
"    purpose: After the ArticleCoder is initialized, this method accepts one\n",
"        article instance and codes it for sourcing. In regards to articles,\n",
"        this class is stateless, so you can process many articles with a\n",
"        single instance of this object without having to reconfigure each\n",
"        time.\n",
"    preconditions: load_config_properties() should have been invoked before\n",
"        running this method.\n",
"    postconditions: article passed in is coded, which means an Article_Data\n",
"        instance is created for it and populated to the extent the child\n",
"        class is capable of coding the article.\n",
"    '''\n",
"\n",
"    pass\n",
"\n",
"#-- END method process_article() --#"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
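"- Use `#` for every comment except the description at the top of a function, method, or class. That leaves `'''` block comments free for temporarily commenting out big chunks of code.\n",
"- Consider adding a comment at the end of each indented block (like the `#-- END ... --#` comments in the examples here) to anchor you. This is not Pythonic, but it helps when blocks get long.\n",
"- Also, seriously, comment. For you, and for others.\n",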
"\n",
"<hr />"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Testing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- [Back to Table of Contents](#Table-of-Contents)\n",
"\n",
"You should test your code thoroughly. While there are many sophisticated testing strategies that software professionals use, for this class, we are just going to cover a basic philosophy of testing, and if you decide you want to learn formal methods, go for it.\n",
"\n",
"- Most importantly, test frequently, so that each round of testing covers a few small changes rather than a complex knot of changes that has built up over time. This is similar to the idea of not doing more than one thing on a given line: try not to test too many changes at once.\n",
"\n",
"- Aim to make sure that you have tested every line of code at least once.\n",
"\n",
"- If you have a conditional, it is not enough to just test the true condition. You should also test to make sure the false branch works as expected.\n",
"\n",
"- If you have a loop, test your code in four situations: the criteria for entering the loop are never met, the loop runs exactly once, the loop runs a small number of times, and the loop runs a HUGE number of times.\n",
"\n",
"- If you are dealing with numbers, always test with the lowest and highest numbers you expect might be present as values, and then test with something squarely in the middle.\n",
"\n",
"- If you are testing database logic, first test everything you can that isn't responsible for a write to the database before you actually update the database, and then always make a backup before you start updating the database, so you can revert if you make a huge mistake.\n",
"\n",
"- Also take a little time to think of things you'd expect to break your code, and see if they do. Always be on the lookout for interesting data (whatever it may be) that might have a chance of breaking your code.\n",
"\n",
"- If you have a particular chunk of code that changes often, consider writing a few reusable test routines to periodically check whether that chunk of code still works or not.\n",
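"\n",
"The checklist above can be turned into a small reusable test routine. This is only a sketch: `count_large_values()` and `test_count_large_values()` are hypothetical names made up for illustration, not part of any library or of this class's code.\n",
"\n",
"```python\n",
"def count_large_values( values_IN, threshold_IN ):\n",
"\n",
"    '''\n",
"    Hypothetical function under test: counts how many numbers in\n",
"        values_IN are greater than threshold_IN.\n",
"    '''\n",
"\n",
"    count_OUT = 0\n",
"\n",
"    for value in values_IN:\n",
"\n",
"        if ( value > threshold_IN ):\n",
"\n",
"            count_OUT = count_OUT + 1\n",
"\n",
"    return count_OUT\n",
"\n",
"\n",
"def test_count_large_values():\n",
"\n",
"    # loop is never entered\n",
"    assert count_large_values( [], 4 ) == 0\n",
"\n",
"    # loop runs once, conditional is true\n",
"    assert count_large_values( [ 5 ], 4 ) == 1\n",
"\n",
"    # loop runs once, conditional is false (value right at the boundary)\n",
"    assert count_large_values( [ 4 ], 4 ) == 0\n",
"\n",
"    # small number of iterations, both branches taken\n",
"    assert count_large_values( [ 1, 5, 9 ], 4 ) == 2\n",
"\n",
"    # HUGE number of iterations: 0 through 999999, of which 5 are not > 4\n",
"    assert count_large_values( range( 1000000 ), 4 ) == 999995\n",
"\n",
"    return True\n",
"\n",
"\n",
"# re-run this whenever count_large_values() changes\n",
"test_count_large_values()\n",
"```\n",
"\n",
"Each `assert` stops the program with an `AssertionError` if its check fails, so a silent run means every check passed.\n",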
"\n",
"More information:\n",
"\n",
"- [http://docs.python-guide.org/en/latest/writing/tests/](http://docs.python-guide.org/en/latest/writing/tests/)\n",
"\n",
"<hr />"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Debugging and exception handling"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- [Back to Table of Contents](#Table-of-Contents)\n",
"\n",
"_Based on information in [https://scipy-lectures.github.io/advanced/debugging/index.html](https://scipy-lectures.github.io/advanced/debugging/index.html)_\n",
"\n",
"So, you have a bug. Perhaps it causes an exception that looks something like this:\n",
"\n",
" Traceback (most recent call last):\n",
" File \"/home/jonathanmorgan/work/sourcenet/django/research/sourcenet/article_coding/article_coding.py\", line 258, in code_article_data\n",
" current_status = article_coder.code_article( current_article )\n",
" File \"/home/jonathanmorgan/work/sourcenet/django/research/sourcenet/article_coding/open_calais_article_coder.py\", line 170, in code_article\n",
" requests_response = my_http_helper.load_url_requests( self.OPEN_CALAIS_REST_API_URL, data_IN = article_body_html )\n",
" File \"/home/jonathanmorgan/work/sourcenet/django/research/python_utilities/network/http_helper.py\", line 638, in load_url_requests\n",
" response_OUT = requests.post( url_IN, headers = headers, data = data_IN )\n",
" File \"/home/jonathanmorgan/.virtualenvs/sourcenet/local/lib/python2.7/site-packages/requests/api.py\", line 99, in post\n",
" return request('post', url, data=data, json=json, **kwargs)\n",
" File \"/home/jonathanmorgan/.virtualenvs/sourcenet/local/lib/python2.7/site-packages/requests/api.py\", line 49, in request\n",
" response = session.request(method=method, url=url, **kwargs)\n",
" File \"/home/jonathanmorgan/.virtualenvs/sourcenet/local/lib/python2.7/site-packages/requests/sessions.py\", line 461, in request\n",
" resp = self.send(prep, **send_kwargs)\n",
" File \"/home/jonathanmorgan/.virtualenvs/sourcenet/local/lib/python2.7/site-packages/requests/sessions.py\", line 573, in send\n",
" r = adapter.send(request, **kwargs)\n",
" File \"/home/jonathanmorgan/.virtualenvs/sourcenet/local/lib/python2.7/site-packages/requests/adapters.py\", line 370, in send\n",
" timeout=timeout\n",
" File \"/home/jonathanmorgan/.virtualenvs/sourcenet/local/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py\", line 518, in urlopen\n",
" body=body, headers=headers)\n",
" File \"/home/jonathanmorgan/.virtualenvs/sourcenet/local/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py\", line 330, in _make_request\n",
" conn.request(method, url, **httplib_request_kw)\n",
" File \"/usr/lib/python2.7/httplib.py\", line 1001, in request\n",
" self._send_request(method, url, body, headers)\n",
" File \"/usr/lib/python2.7/httplib.py\", line 1035, in _send_request\n",
" self.endheaders(body)\n",
" File \"/usr/lib/python2.7/httplib.py\", line 997, in endheaders\n",
" self._send_output(message_body)\n",
" File \"/usr/lib/python2.7/httplib.py\", line 854, in _send_output\n",
" self.send(message_body)\n",
" File \"/usr/lib/python2.7/httplib.py\", line 826, in send\n",
" self.sock.sendall(data)\n",
" File \"/usr/lib/python2.7/socket.py\", line 224, in meth\n",
" return getattr(self._sock,name)(*args)\n",
" UnicodeEncodeError: 'ascii' codec can't encode character u'\\u2014' in position 98: ordinal not in range(128)\n",
" \n",
"What do you do?\n",
"\n",
"First, take a deep breath and remember that we all make buggy code. Debugging is part of the job. You'll get a better and better idea of what is broken and how you fix it the more programming you do, and the more bugs you see and resolve, but you will always run into unexpected errors, and you won't always know exactly what caused these errors (this was a tricky one for me).\n",
"\n",
"So, to start, let's look in more detail at that exception. Parts of an exception:\n",
"\n",
"- Exception name\n",
"- message\n",
"- stack trace - log of where the Python interpreter thought it was in the hierarchy of function calls that make up a procedure when the error occurred. If you are lucky, it will tell you exactly where the problem is. If not, like last week when we saw what happened if you forgot a right parenthesis on a statement, you will at least know that there was a problem and you'll have to use the other information plus what the Internet tells you to try to figure it all out.\n",
"\n",
"So in this case, the exception is a `UnicodeEncodeError`, the message states that it is having trouble converting a Unicode character (`\\u2014`, an em dash) to ASCII, and the stack trace lets me start at the original subroutine call and work my way through what happened to the point where the actual exception occurred. In this case, the chain of calls started in my code, but then pretty quickly went into libraries that were not my code, suggesting that there was a problem in a library I was using (requests!). Ugh. It turns out I had to encode a string I was passing to requests a certain way, or else it would break."
]
},
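{
"cell_type": "markdown",
"metadata": {},
"source": [
"The failure above can be reproduced in a few lines. This is a minimal sketch in Python 3 syntax, not the original sourcenet code: `article_body` is a made-up string, and encoding to UTF-8 is an assumption about the kind of fix involved, since the text above only says the string had to be encoded a certain way.\n",
"\n",
"```python\n",
"# hypothetical article text containing an em dash (Unicode 2014)\n",
"em_dash = chr( 0x2014 )\n",
"article_body = 'Sources say ' + em_dash + ' off the record ' + em_dash + ' that it broke.'\n",
"\n",
"# try the encoding that failed in the stack trace\n",
"encode_error = None\n",
"try:\n",
"\n",
"    article_body.encode( 'ascii' )\n",
"\n",
"except UnicodeEncodeError as exception_instance:\n",
"\n",
"    # ASCII has no em dash, so we end up here\n",
"    encode_error = exception_instance\n",
"    print( 'ASCII failed: ' + str( exception_instance ) )\n",
"\n",
"#-- END try-except block --#\n",
"\n",
"# UTF-8 can represent the em dash, so this succeeds\n",
"utf8_bytes = article_body.encode( 'utf-8' )\n",
"print( utf8_bytes )\n",
"```\n",
"\n",
"Reproducing a bug in a tiny, isolated snippet like this is often the fastest way to confirm what the stack trace is telling you."
]
},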
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### General debugging steps"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, some general debugging steps from [https://scipy-lectures.github.io/advanced/debugging/index.html](https://scipy-lectures.github.io/advanced/debugging/index.html):\n",
"\n",
"If you do have a non trivial bug, this is when debugging strategies kick in. There is no silver bullet. Yet, strategies help:\n",
"\n",
"- For debugging a given problem, the favorable situation is when the problem is isolated in a small number of lines of code, outside framework or application code, with short modify-run-fail cycles\n",
"\n",
"- Make it fail reliably. Find a test case that makes the code fail every time.\n",
"\n",
"- Divide and conquer. Once you have a failing test case, isolate the failing code.\n",
"\n",
"    - Which module.\n",
"    - Which function.\n",
"    - Which line of code.\n",
"\n",
"- Isolate a small reproducible failure: a test case.\n",
"- Change one thing at a time and re-run the failing test case.\n",
"- Use the debugger or print() statements to understand what is going wrong.\n",
"- Take notes and be patient. It may take a while.\n",
"- If all else fails and you have a complex file of code and you can't figure out what is causing a problem, strip the entire program out into another file, then gradually paste pieces back in until your problem occurs again. Then at least you'll have a better idea of what/where the problem is."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exception handling"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you are getting an exception and you want to handle it more gracefully than having your whole program explode in a cascade of stack trace, you can use the `try` and `except` statements to trap the exception and do some debugging or make your program fail gracefully.\n",
"\n",
"To use a try-except block to trap exceptions, place `try:` on a line before the code that causes the exception, and indent that code so it is nested inside the `try`. After the code of interest, un-indent back out to the same level as `try:`, place `except:` on a line, and nest your exception handling code inside the `except` block.\n",
"\n",
"Example:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Int value returned = -1\n"
]
}
],
"source": [
"# define function that retrieves a dictionary value as an integer\n",
"def get_dictionary_value_as_int( dict_IN, name_IN, default_IN = -1 ):\n",
"\n",
"    '''\n",
"    Accepts a dictionary, a name, and a default value to be used if the\n",
"        name is not in the dictionary. Returns value for the name, after\n",
"        casting it to an integer. If the value can't be converted to an\n",
"        integer, returns the default.\n",
"    '''\n",
"\n",
"    # return reference\n",
"    value_OUT = -1\n",
"\n",
"    # is the name present in the dictionary?\n",
"    if ( name_IN in dict_IN ):\n",
"\n",
"        # retrieve the value\n",
"        value_OUT = dict_IN.get( name_IN, default_IN )\n",
"\n",
"        # exception handling, in case the value isn't a number.\n",
"        try:\n",
"\n",
"            # convert to integer\n",
"            value_OUT = int( value_OUT )\n",
"\n",
"        except:\n",
"\n",
"            # not a number. Return the default.\n",
"            value_OUT = default_IN\n",
"\n",
"        #-- END try-except block --#\n",
"\n",
"    else:\n",
"\n",
"        # not present. Return default.\n",
"        value_OUT = default_IN\n",
"\n",
"    #-- END check to see if name is in dictionary --#\n",
"\n",
"    # return the result.\n",
"    return value_OUT\n",
"\n",
"#-- END function get_dictionary_value_as_int() --#\n",
"\n",
"# make a dictionary\n",
"test_dictionary = { \"name1\" : \"value1\", \"name2\" : \"2\", \"name3\" : \"value3\" }\n",
"\n",
"# get the value for name3 as an integer (\"value3\" is not a number, so\n",
"# the default comes back)\n",
"int_value = get_dictionary_value_as_int( test_dictionary, \"name3\", -1 )\n",
"\n",
"# print the value\n",
"print( \"Int value returned = \" + str( int_value ) )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are more options for exception handling. You can catch a specific type of exception:\n",
"\n",
"    except ValueError:\n",
"\n",
"You can store the exception as a variable so that you can reference it in your `except` clause:\n",
"\n",
"    except ValueError as exception_instance:\n",
"\n",
"And you can have multiple `except` clauses that deal with different exception scenarios, including a base `except` at the end to handle unexpected exceptions. Clauses are checked in order, and `UnicodeEncodeError` is a kind of `ValueError`, so the more specific clause has to come first:\n",
"\n",
"    except UnicodeEncodeError:\n",
"\n",
"        print( \"Cursed emoji!\" )\n",
"\n",
"    except ValueError as exception_instance:\n",
"\n",
"        print( \"Ack! a value error: \" + str( exception_instance ) )\n",
"\n",
"    except:\n",
"\n",
"        print( \"I don't know what to say...\" )\n",
"\n",
"    #-- END try-except block --#\n",
"\n",
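"The clauses above only make sense attached to a `try`, so here is a complete sketch that can be run as-is. `divide_as_float()` is a hypothetical helper invented for this example. It also swaps the bare `except:` for `except Exception`, which still catches unexpected errors but lets Ctrl-C (`KeyboardInterrupt`) through.\n",
"\n",
"```python\n",
"def divide_as_float( numerator_IN, denominator_IN ):\n",
"\n",
"    '''\n",
"    Hypothetical helper: converts both arguments to float and divides.\n",
"        Returns None if a conversion or the division fails.\n",
"    '''\n",
"\n",
"    result_OUT = None\n",
"\n",
"    try:\n",
"\n",
"        result_OUT = float( numerator_IN ) / float( denominator_IN )\n",
"\n",
"    except ValueError as exception_instance:\n",
"\n",
"        # a string that does not look like a number, such as 'abc'\n",
"        print( 'Not a number: ' + str( exception_instance ) )\n",
"\n",
"    except ZeroDivisionError:\n",
"\n",
"        print( 'Cannot divide by zero.' )\n",
"\n",
"    except Exception as exception_instance:\n",
"\n",
"        # anything unexpected, such as the TypeError float( None ) raises\n",
"        print( 'Unexpected: ' + repr( exception_instance ) )\n",
"\n",
"    #-- END try-except block --#\n",
"\n",
"    return result_OUT\n",
"\n",
"#-- END function divide_as_float() --#\n",
"\n",
"print( divide_as_float( '10', '4' ) )\n",
"print( divide_as_float( 'abc', '4' ) )\n",
"print( divide_as_float( '10', '0' ) )\n",
"print( divide_as_float( None, '4' ) )\n",
"```\n",
"\n",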
"More information:\n",
"\n",
"- on debugging: [https://scipy-lectures.github.io/advanced/debugging/index.html](https://scipy-lectures.github.io/advanced/debugging/index.html)\n",
"- python exception tutorial: [https://docs.python.org/2/tutorial/errors.html](https://docs.python.org/2/tutorial/errors.html)\n",
"- more on exceptions: [http://doughellmann.com/2009/06/19/python-exception-handling-techniques.html](http://doughellmann.com/2009/06/19/python-exception-handling-techniques.html)\n",
"\n",
"<hr />"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Style - PEP-8 and more"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- [Back to Table of Contents](#Table-of-Contents)\n",
"\n",
"The most important thing about coding style is that you have some, are consistent about it, and are able to explain why you do something if asked. It is good to build up a style over time, based on what you see and like and don't like, and to expose yourself to different styles. It can also be good to adopt the style of those with whom you are working. There is no one right style, however, so be careful when judging others' style lest you be judged yourself."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### PEP-8 - standard Python style guidelines"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The bible for Python style is PEP-8 ( [https://www.python.org/dev/peps/pep-0008/](https://www.python.org/dev/peps/pep-0008/) ). PEPs are \"Python Enhancement Proposals\". Usually PEPs are requests for new functionality or changes to the way basic features of Python work. PEP-8 is the style guide for people who write Python, included in the enhancement system for those who implement the enhancements.\n",
"\n",
"Some highlights:\n",
"\n",
"- always use spaces for indentation.\n",
"- use four spaces per indentation level.\n",
"- be kind to people interacting with a computer via a terminal:\n",
"\n",
"    - never make a line of code longer than 79 characters.\n",
"    - for comments and docstrings, keep lines to 72 characters.\n",
"\n",
"- put two blank lines between top-level functions and classes in a file.\n",
"- put one blank line between method definitions inside a class.\n",
"- only import one package per line.\n",
"- avoid extraneous white space. Example:\n",
"\n",
" Yes: spam(ham[1], {eggs: 2})\n",
" No: spam( ham[ 1 ], { eggs: 2 } )\n",
"\n",
"When someone tells you that something is or is not pythonic, they are usually referring at least in part to PEP-8. Even though the level of detail is a little intimidating, it is worth reading PEP-8 just to get an idea of how detailed style guides can be, and how others will look at the code you write.\n",
"\n",
"If someone refers to pythonic, they might also be referring to PEP-20 - the Zen of Python ( [https://www.python.org/dev/peps/pep-0020/](https://www.python.org/dev/peps/pep-0020/) ):\n",
"\n",
" Beautiful is better than ugly.\n",
" Explicit is better than implicit.\n",
" Simple is better than complex.\n",
" Complex is better than complicated.\n",
" Flat is better than nested.\n",
" Sparse is better than dense.\n",
" Readability counts.\n",
" Special cases aren't special enough to break the rules.\n",
" Although practicality beats purity.\n",
" Errors should never pass silently.\n",
" Unless explicitly silenced.\n",
" In the face of ambiguity, refuse the temptation to guess.\n",
" There should be one-- and preferably only one --obvious way to do it.\n",
" Although that way may not be obvious at first unless you're Dutch.\n",
" Now is better than never.\n",
" Although never is often better than *right* now.\n",
" If the implementation is hard to explain, it's a bad idea.\n",
" If the implementation is easy to explain, it may be a good idea.\n",
" Namespaces are one honking great idea -- let's do more of those!\n",
"\n",
"Be aware that some of the Python community feels very strongly about this, even as you might notice that some of it is difficult to really pin down. In programming, the culture and context of a language should be taken into account when working with and seeking help on a language. It is to the Python community's credit that they at least write some of this context down for others to see."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Jon's (sometimes non-Pythonic) style suggestions:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- be consistent.\n",
"- Only do one thing on a line.\n",
"\n",
" - Quote from Brian Kernighan (of Kernighan and Ritchie fame - authors of the book that specified the original C language): \"Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?\"\n",
" \n",
" - example:\n",
" \n",
" # what does this do?\n",
" b = [i for i in a if i > 4] \n",
" \n",
"- Explicitly check for specific values when you expect them (like True, False, None) rather than just putting a variable name in an \"if\". Example:\n",
"\n",
"        # If I'm expecting a boolean (True or False), I prefer:\n",
"        if boolean_value == True:\n",
"\n",
"        # rather than\n",
"        if boolean_value:\n",
"\n",
"        # same with None\n",
"        if object_instance is None:\n",
"\n",
"        # rather than\n",
"        if not object_instance:\n",
"\n",
"- Comment a lot.\n",
"- use human-readable, descriptive variable names.\n",
"\n",
" # so, for a dictionary that maps first names to last names:\n",
" first_to_last_name_dict = { 'Jonathan' : 'Morgan', 'Cliff' : 'Lampe' }\n",
" \n",
" # rather than\n",
" n = { 'Jonathan' : 'Morgan', 'Cliff' : 'Lampe' }\n",
" \n",
" # OR (better, but still not specific enough for me)\n",
" names_dict = { 'Jonathan' : 'Morgan', 'Cliff' : 'Lampe' }\n",
"\n",
"- put spaces around everything (in extraneous white space example above, I prefer the \"No\" style):\n",
"\n",
" - before and after every operator.\n",
" - before and after parentheses, square brackets, and curly brackets.\n",
" - etc.\n",
"\n",
"- always use parentheses to explicitly define order of operations in math and in conditional logic.\n",
"\n",
" # I prefer:\n",
" int_holder = ( 1 + ( 2 * 3 ) ) / 12\n",
" \n",
" # to something like:\n",
" int_holder = ( 1 + 2 * 3 ) / 12\n",
" \n",
" # same with booleans:\n",
" if ( ( x == 12 ) and ( is_cool == True ) ):\n",
" \n",
" # better than:\n",
" if x == 12 and is_cool == True:\n",
"\n",
"- In general, whenever possible, be clear and explicit: for example, always use parentheses in math or boolean logic to state the order you want.\n",
"- When you do conditionals or loops, if you only have one statement inside the loop or conditional branch, still place it on a separate line and indent it. Don't place the conditional check and the logic all on the same line.\n",
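"\n",
"To see the \"one thing on a line\" suggestion in action, here is the dense example from above next to a spelled-out version that does the same work. This is a sketch: the contents of `a` are made up, and `b_explicit` is a name introduced only for the comparison.\n",
"\n",
"```python\n",
"# made-up input list\n",
"a = [ 2, 4, 6, 8 ]\n",
"\n",
"# the dense version: loop, test, and list-building all on one line\n",
"b = [ i for i in a if i > 4 ]\n",
"\n",
"# the same logic, one thing per line\n",
"b_explicit = []\n",
"for i in a:\n",
"\n",
"    # keep only values greater than 4\n",
"    if ( i > 4 ):\n",
"\n",
"        b_explicit.append( i )\n",
"\n",
"    #-- END check to see if value is greater than 4 --#\n",
"\n",
"#-- END loop over values in a --#\n",
"\n",
"# both approaches build the same list\n",
"print( b == b_explicit )\n",
"```\n",
"\n",
"The longer version takes more lines, but each line can be read, commented, and debugged on its own.\n",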
"\n",
"<hr />"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. The software development process"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- [Back to Table of Contents](#Table-of-Contents)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Basic steps in software development"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- **project statements, goals, and requirements** - briefly describe the project, including goals for the project, technical and otherwise, and requirements that the project must meet.\n",
"- **high-level design** - outline all the high level parts of a system, the things that have to do work, what they do (described in sentences), and how they interact.\n",
"- **low-level design and pseudocode** - starts to get down into the details of implementation - identify tasks that need to get done, then arrange them in the order you want them done, then in a separate document or graph, illustrate how all the things work together.\n",
"- **incremental implementation and testing** - for each step in your low-level design:\n",
" - create code\n",
" - test\n",
" - integrate into overall project\n",
" - test overall project to make sure you haven’t broken anything from previous steps.\n",
" - repeat\n",
"- **end-to-end testing**\n",
"\n",
"Notes:\n",
"\n",
"- recursive and iterative process - always can (and should) loop back to a previous step as understanding of project changes, to see how new information affects everything. Also be on the lookout for unknowns and keep a running list of questions you need answered to be able to finish the project.\n",
"- everyone has their own process, and there are lots of methodologies.\n",
" - extreme programming\n",
" - agile development\n",
" - etc.\n",
" - many of these are marketed as revolutions, but most are evolutionary, still trains that are dispatched or routed differently on the tracks of the overall process outlined above.\n",
"\n",
"- most important thing is to have a process other than “work real hard until it is done”.\n",
"\n",
"_More information:_ good wikipedia links for more information on software development methodologies (and to see how easy it is to get lost in the weeds here):\n",
"- [https://en.wikipedia.org/wiki/Software_development_process](https://en.wikipedia.org/wiki/Software_development_process)\n",
"- [https://en.wikipedia.org/wiki/Agile_software_development](https://en.wikipedia.org/wiki/Agile_software_development)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Enough process to make learning easier"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The formal software development methodology is a lot of work. Here is a pared back plan that can help you manage complexity while you are learning technology, or just working on small projects. Make an implementation plan that contains:\n",
"\n",
"- project description, goals and requirements\n",
"- high level design that makes a list of technical tasks you'll need to do and includes some pseudo code so you start to capture the order in which you'll have to do the tasks to gather your data.\n",
"- as you discover unknowns, holes in your plan, or things you don't know how to implement, make a note of them in your plan and keep a separate list of questions that need to be answered.\n",
"\n",
"When doing pseudocode, there are two basic ways to control the flow of a program you should use:\n",
"\n",
"- **loops** - looping through lists of data or lines in a file.\n",
"- **conditionals** - `if <some criteria>`, do one thing; `else`, do another thing.\n",
"\n",
"Also, as we design, we'll keep an eye out for tasks that look like they might have been common enough that someone would have created a program, package, or library we can use rather than implementing everything ourselves."
]
},
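{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two control-flow building blocks named above, loops and conditionals, translate almost word for word from pseudocode into Python. A minimal sketch, assuming a made-up list of lines standing in for the contents of a file:\n",
"\n",
"```python\n",
"# hypothetical input: lines as they might be read from a file\n",
"lines = [ 'first line', '', 'second line', '   ', 'third line' ]\n",
"\n",
"non_blank_count = 0\n",
"blank_count = 0\n",
"\n",
"# loop: visit each line in turn\n",
"for line in lines:\n",
"\n",
"    # conditional: does the line contain anything besides white space?\n",
"    if ( line.strip() != '' ):\n",
"\n",
"        non_blank_count = non_blank_count + 1\n",
"\n",
"    else:\n",
"\n",
"        blank_count = blank_count + 1\n",
"\n",
"    #-- END check to see if line is blank --#\n",
"\n",
"#-- END loop over lines --#\n",
"\n",
"print( 'non-blank lines: ' + str( non_blank_count ) )\n",
"print( 'blank lines: ' + str( blank_count ) )\n",
"```\n",
"\n",
"Writing the comments first (\"loop over lines\", \"is the line blank?\") and then filling in code beneath each one is a simple way to get from pseudocode to a working program."
]
},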
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Use libraries and packages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- [Back to Table of Contents](#Table-of-Contents)\n",
"\n",
"Part of the reason we are learning to design as well as to code is so that we can figure out early on which parts of the technical work we do might have been done before, and so might be made easier by using an existing library or application.\n",
"\n",
"In python, external code libraries are usually distributed as packages, and can be easily installed and maintained using `pip`, a standard package installer for any python environment.\n",
"\n",
"To use `pip`, you run the `pip` command:\n",
"\n",
" pip\n",
" \n",
"You can use the `pip list` command to see which packages are installed:\n",
"\n",
" pip list\n",
"\n",
"On our server, the python packages are managed by the system administrator. If you need a package and it is not installed on the server, please let Jon know and we'll see what it will take to get it installed (not all packages play nice with each other)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Finding packages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The easiest way to start looking for a package is to do an Internet search for the task you are trying to perform. Make sure to include the keyword \"python\" in your search, and if at first you don't find anything, try reading posts that look like they might be relevant to your work, to get ideas of keywords you can use to refine your search.\n",
"\n",
"Once you find the name of a package you want to install, you can search the Python Package Index (also known as PyPI - \"pie-pie\") to get more details, including the precise name you enter in the `pip` command to load the package: [https://pypi.python.org/pypi](https://pypi.python.org/pypi). You can also try the `pip` \"search\" command, though PyPI has since disabled the search API it relies on, so against the main index it now just reports an error:\n",
"\n",
" pip search \"requests\"\n",
" \n",
"Again, once you find a package you want or need installed on the server, let Jon know and he will work on getting it installed for you."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Installing and updating packages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you were in control of packages, to install a package, you'd run:\n",
"\n",
" pip install <package-name>\n",
"\n",
"To see which installed packages can be updated, you can run the command:\n",
"\n",
" pip list -o\n",
" \n",
"To update existing packages, `pip` has an \"--upgrade\" flag you pass to the install command:\n",
"\n",
" pip install --upgrade <package-name>\n",
" \n",
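"When you run the `pip install --upgrade` command, you update the package in question and, depending on your version of `pip`, the packages on which it depends: older versions upgrade every dependency, while `pip` 10 and later upgrade a dependency only when the installed version no longer satisfies the package's requirements.\n",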
"\n",
"In general, it is a good idea to keep packages up to date so you get security updates and bug fixes. It is also a good idea to be careful about updating packages, however, especially when you are in the middle of a project. Python package maintainers tend to be pretty good about backward compatibility (so changes don't break existing code), but you never know when a package update might cause a problem."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Importing and using packages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once you've installed a package, you then need to import it in your code before you can use it. There are a couple of ways you can import packages into python:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# bring everything defined in the package into the current name space\n",
"from requests import *\n",
"\n",
"# bring a certain thing defined by the package into the current name space\n",
"# - the get() function\n",
"from requests import get\n",
"\n",
"# add the package, but require that you use the package name to access\n",
"# objects or functions in the package.\n",
"import requests"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So, what do we think of these three options?\n",
"\n",
"- The first is not so good. Never pull everything defined in a package into your code. It might overwrite other things with the same name, and later, you have no idea what parts of the package you did or did not use.\n",
"\n",
"- The second is much better. You are specifically only importing what you are using. There is still the potential for name collisions (in this example, if your own code also defines a function named `get`), but this is much cleaner, and will help you figure out later what parts of the package you used.\n",
"\n",
"- The third is also good. In this case, whenever you want to reference something in the package, you have to prefix the reference with the package name ( `requests.get()` ), so to find what you used in a given program, you can just search for the package name followed by a period. In the case of long package names, however, sometimes it is OK to just import the things you'll be using into the name space for readability."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Example: the \"requests\" package"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`requests` is a good example of a package that takes something that can be potentially very complicated and makes it straightforward. `requests` is a package that implements a simple HTTP client (like a web browser, but usable in a python program). HTTP clients are used to interact with HTTP-based APIs like those used to collect tweets from twitter ([https://dev.twitter.com/rest/public](https://dev.twitter.com/rest/public) and [https://dev.twitter.com/streaming/overview](https://dev.twitter.com/streaming/overview)) and reddit ([http://www.reddit.com/dev/api](http://www.reddit.com/dev/api)).\n",
"\n",
"Brief explanation of HTTP:\n",
"\n",
"- HTTP is a request-response protocol. You send a request, the server send you a response. Requests and responses are always paired.\n",
"- both requests and responses are made up of headers and a body. Headers are name-value pairs, just like in a dictionary. The body can contain anything, but generally for a request, it will either be empty or contain data for servicing a request (ever wonder where the data is stored when you submit a form? Often in the body of the request, though sometimes it is added to the URL.), and for a response, it will contain text - HTML, when you request a web page, for example.\n",
"\n",
"`requests` takes care of all the underlying details of what goes into making HTTP requests and receiving HTTP responses such that interacting with web resources becomes pretty easy with it (though it can still be pointy).\n",
"\n",
"To install requests:\n",
"\n",
" pip install requests\n",
" \n",
"Then, to use it to connect to a URL:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Status code = 401\n",
"Content type = application/json; charset=utf-8\n",
"Encoding = utf-8\n",
"Contents of response body = {\"message\":\"Bad credentials\",\"documentation_url\":\"https://developer.github.com/v3\"}\n",
"As JSON object: {'message': 'Bad credentials', 'documentation_url': 'https://developer.github.com/v3'}\n"
]
}
],
"source": [
"# import requests\n",
"import requests\n",
"\n",
"# make a simple GET request\n",
"response = requests.get('https://api.github.com/user', auth=('user', 'pass'))\n",
"\n",
"# check the status code\n",
"print( \"Status code = \" + str( response.status_code ) )\n",
"\n",
"# Header - Content Type?\n",
"print( \"Content type = \" + response.headers['content-type'] )\n",
"\n",
"# Header - Encoding?\n",
"print( \"Encoding = \" + response.encoding )\n",
"\n",
"# Text contained in the body of the response\n",
"print( \"Contents of response body = \" + response.text )\n",
"\n",
"# that is JSON - convert to a JSON object.\n",
"print( \"As JSON object: \" + str(response.json() ) )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### More information:\n",
"\n",
"- `pip` documentation: [https://pip.pypa.io/en/latest/](https://pip.pypa.io/en/latest/)\n",
"- `conda` documentation: [http://conda.pydata.org/docs/](http://conda.pydata.org/docs/)\n",
"- `requests` documentation: [http://docs.python-requests.org/en/latest/](http://docs.python-requests.org/en/latest/)\n",
"\n",
"<hr />"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 7. Versioning and backups"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- [Back to Table of Contents](#Table-of-Contents)\n",
"\n",
"As your code gets more complicated, you'll want to make sure that you are versioning your code as you make changes. When exactly you commit and push changes is up to you, but you should do it frequently enough that your versioning provides a useful backup. If you version once a year, after 6 months, your old versions will be so old and out of date as to be worthless if you have to revert. A few options are committing changes at regular intervals, committing whenver you complete a task, or (my favorite) committing before big changes to hardware or software so you have a backup in case your changes destroy your computer. Some people aim to never check broken code into a version control system. I personally am looking for a more fine-grained change record, so I check in more frequently, even if something might be broken and needs to be tested.\n",
"\n",
"Our class server is backed up periodically. If you are interested in using version control for your code on our server, please let Jon know and he can help you get something set up."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 8. Long-running processes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- [Back to Table of Contents](#Table-of-Contents)\n",
"\n",
"When you have a long-running data collection or analysis process, where long-running is days, weeks, or even months, there are a few things you should be aware of that make it more likely your long-running process will actually run as long as you want/need it to.\n",
"\n",
"- 1) Power management settings - If you are running this program on your laptop or home computer, remember to change your power management settings so that your computer doesn't go to sleep or hibernate while it is supposed to be running.\n",
"\n",
"- 2) If you are on a Unix machine, you can install and use a program called screen to create a background session on the server that survives even if your connection to the computer breaks (especially useful if you are running the program on a remote computer). To create a screen session, at the unix prompt, type `screen`.\n",
"\n",
" This should open up another shell. To leave the screen session, press:\n",
" \n",
" Ctrl-A + D\n",
" \n",
" To rejoin a screen session:\n",
" \n",
" screen -R\n",
"\n",
"- 3) learn how to use python to send emails (you can find this information in my python_utilities package, in /python_utilities/email/email_helper.py). When you have a long-running process going, it can be a real relief to be able to add code to the process that emails you if the process encounters a problem (or an amazing miracle), or to just have the process email you at intervals to let you know it is still running.\n",
"\n",
"- 4) get a real server. Eventually, you'll want to both work with \"big data\" and be able to close your laptop when you leave the house, for example. At this point, it might be worth considering getting access to a unix/linux server and setting everything up there. There will be a learning curve when you switch to having to work in a command line most of the time, but nothing beats a well-implemented and well-maintained unix/linux server for big data tasks."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 9. DRY and function basics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- [Back to Table of Contents](#Table-of-Contents)\n",
"\n",
"_Based in part on information in [http://introtopython.org/introducing_functions.html](http://introtopython.org/introducing_functions.html)_\n",
"\n",
"DRY stands for \"Don't Repeat Yourself\". The idea here is that if in your program you repeat the exact same code more than once, you should abstract it out so you always call the same code everywhere you use that logic, so you are consistent, and can easily change it everywhere you use it by making changes once.\n",
"\n",
"Code re-use (breaking code out into functions, or perhaps objects so it can be re-used) is a way to make your programs more reliable and easier to maintain. It does have the sometimes unwanted side-effect of breaking more than one thing if you break a heavily shared function or object, but having reusable code that is widely used also means that you will be more likely to catch problems sooner than you would be if the code wasn't re-used.\n",
"\n",
"Here is what a function definition looks like:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def <function_name>( <param_list> ):\n",
"\n",
" '''\n",
" Documentation string that explains what the function does.\n",
" '''\n",
"\n",
" # return reference\n",
" status_OUT = \"Success!\"\n",
"\n",
" # indented stuff that is part of the function. If there is an error, change status_OUT to an error message.\n",
"\n",
" return status_OUT\n",
"\n",
"#-- END function <function_name> --#"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Where:\n",
"\n",
"- `<function_name>` is the name you'll reference when you call your function. Usually a function name is all lower case with underscores between words, just like a variable name. It should be precise and specific, and will often start with a verb and end with the object of the verb. Here, again, \"x\" is not a good function name. Something like \"get_dictionary_value_as_int()\" is better.\n",
"- `<param_list>` can either be empty (so no parameters defined here), or could be one or more names of parameters you want passed to your function in a comma delimited list.\n",
" - you can make parameters optional, as well, but setting a default value in the function declaration.\n",
"- you should always place a documentation string just after the declaration, inside the function. This is a multi-lined comment (surrounded by `'''`) that explains what the function does.\n",
"- if your function returns something, you should define it at the top, and return it at the bottom. Don't fall out of the function in the middle.\n",
"- for variables you declare and use inside a function, declare them at the top of the function and initialize them to their empty state.\n",
"\n",
"Example:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Int value returned = 2\n"
]
}
],
"source": [
"# define function that retrieves a dictionary value as an integer\n",
"def get_dictionary_value_as_int( dict_IN, name_IN, default_IN = -1 ):\n",
" \n",
" '''\n",
" Accepts a dictionary, a name, and a default value to be used if the\n",
" name is not in the dictionary. Returns value for the name, after\n",
" casting it to an integer. If the value can't be converted to an\n",
" integer, throws an exception.\n",
" '''\n",
" \n",
" # return reference\n",
" value_OUT = -1\n",
" \n",
" # is the name present in the dictionary?\n",
" if ( name_IN in dict_IN ):\n",
" \n",
" # retrieve the value\n",
" value_OUT = dict_IN.get( name_IN, default_IN )\n",
" \n",
" # convert to integer\n",
" value_OUT = int( value_OUT )\n",
" \n",
" else:\n",
" \n",
" # not present. Return default.\n",
" value_OUT = default_IN\n",
" \n",
" #-- END check to see if name is in dictionary --#\n",
" \n",
" # return the result.\n",
" return value_OUT\n",
"\n",
"#-- END function get_dictionary_value_as_int() --#\n",
"\n",
"# make a dictionary\n",
"test_dictionary = { \"name1\" : \"value1\", \"name2\" : \"2\", \"name3\" : \"value3\" }\n",
"\n",
"# get the value for name2 as an integer\n",
"int_value = get_dictionary_value_as_int( test_dictionary, \"name2\", -1 )\n",
"\n",
"# print the value\n",
"print( \"Int value returned = \" + str( int_value ) )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Detecting opportunities for re-use"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the following code, see what looks like it might be a candidate for turning into a function:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Our students are currently in alphabetical order.\n",
"Aaron\n",
"Bernice\n",
"Cody\n",
"\n",
"Our students are now in reverse alphabetical order.\n",
"Cody\n",
"Bernice\n",
"Aaron\n"
]
}
],
"source": [
"students = ['bernice', 'aaron', 'cody']\n",
"\n",
"# Put students in alphabetical order.\n",
"students.sort()\n",
"\n",
"# Display the list in its current order.\n",
"print(\"Our students are currently in alphabetical order.\")\n",
"for student in students:\n",
" print(student.title())\n",
"\n",
"# Put students in reverse alphabetical order.\n",
"students.sort(reverse=True)\n",
"\n",
"# Display the list in its current order.\n",
"print(\"\\nOur students are now in reverse alphabetical order.\")\n",
"for student in students:\n",
" print(student.title())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The answer - printing out the list of students! It is exactly the same each time. So, to make it into a function:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Our students are currently in alphabetical order.\n",
"Aaron\n",
"Bernice\n",
"Cody\n",
"\n",
"Our students are now in reverse alphabetical order.\n",
"Cody\n",
"Bernice\n",
"Aaron\n"
]
}
],
"source": [
"def show_students(students, message):\n",
" \n",
" '''\n",
" Print out a message, and then the list of students\n",
" '''\n",
"\n",
" print(message)\n",
" \n",
" # loop over list of students, printing each.\n",
" for student in students:\n",
"\n",
" print( student.title() )\n",
" \n",
" #-- END loop over students. --#\n",
" \n",
"#-- END function show_students() --#\n",
"\n",
"students = ['bernice', 'aaron', 'cody']\n",
"\n",
"# Put students in alphabetical order.\n",
"students.sort()\n",
"show_students( students, \"Our students are currently in alphabetical order.\" )\n",
"\n",
"#Put students in reverse alphabetical order.\n",
"students.sort( reverse=True )\n",
"show_students( students, \"\\nOur students are now in reverse alphabetical order.\" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"More information:\n",
"\n",
"- on functions: [http://introtopython.org/introducing_functions.html](http://introtopython.org/introducing_functions.html)\n",
"- more on functions: [http://introtopython.org/more_functions.html](http://introtopython.org/more_functions.html)\n",
"- Intro. to objects: [http://introtopython.org/classes.html](http://introtopython.org/classes.html)\n",
"\n",
"<hr />"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}