@ifosch
Last active April 21, 2017 16:42
Basic TDD, Python, and py.test introduction dojo - Weather Kata
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Basic TDD, Python, and py.test introduction dojo - Weather Kata"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## About this session\n",
"\n",
"### For beginners\n",
"### Probably not the best approach\n",
"### Show up and practice TDD techniques, basic Python and py.test\n",
"### Avoid going further, no prize!\n",
"### Omit lines starting with `%%`"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"application/javascript": [
"\n",
"IPython.keyboard_manager.command_shortcuts.add_shortcut('9', {\n",
" help: 'Clear all output', // This text will show up on the help page (CTRL-M h or ESC h)\n",
" handler: function (event) { // Function that gets invoked\n",
" if (IPython.notebook.mode == 'command') {\n",
" IPython.notebook.clear_all_output();\n",
" return false;\n",
" }\n",
" return true;\n",
" }\n",
"});"
],
"text/plain": [
"<IPython.core.display.Javascript object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%javascript\n",
"\n",
"IPython.keyboard_manager.command_shortcuts.add_shortcut('9', {\n",
" help: 'Clear all output', // This text will show up on the help page (CTRL-M h or ESC h)\n",
" handler: function (event) { // Function that gets invoked\n",
" if (IPython.notebook.mode == 'command') {\n",
" IPython.notebook.clear_all_output();\n",
" return false;\n",
" }\n",
" return true;\n",
" }\n",
"});"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"rm: *.py*: No such file or directory\n"
]
}
],
"source": [
"%%bash\n",
"rm *.py*"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Software testing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Technique to ensure software meets requirements and expectations\n",
"### Bug reproducibility and future detection\n",
"### Part of the documentation"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Tests must not solve the requirement\n",
"```python\n",
"def addition(number1, number2):\n",
"    return number1 + number2\n",
"\n",
"def bad_test_addition():\n",
"    assert addition(2, 3) == 2 + 3  # Bad test: it repeats the actual working code\n",
"\n",
"def test_addition():\n",
"    assert addition(2, 3) == 5\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Fixtures\n",
"### Flaky tests\n",
"### Slow tests\n",
"### Readability\n",
"### Avoid calculations as much as possible\n",
"### AAA Pattern: Arrange, Act, Assert"
]
},
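{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the Arrange, Act, Assert pattern. The `average` function and its test are hypothetical, used only for illustration; they are not part of the kata:\n",
"\n",
"```python\n",
"def average(numbers):\n",
"    # Hypothetical function under test, used only for this illustration.\n",
"    return sum(numbers) / len(numbers)\n",
"\n",
"\n",
"def test_average():\n",
"    # Arrange: prepare the input and the expected value as literals.\n",
"    temperatures = [59, 63, 55]\n",
"    expected = 59.0\n",
"\n",
"    # Act: call the code under test once.\n",
"    result = average(temperatures)\n",
"\n",
"    # Assert: compare with the literal instead of recomputing it.\n",
"    assert result == expected\n",
"```"
]
},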
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Test Driven Development"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tests before code\n",
"### Useful for documentation\n",
"### Ensures tests existence\n",
"### Helps reduce complexity\n",
"### Red, Green, Refactor cycle"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Dojos"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Coding session with a challenge using techniques or tools\n",
"### Problem is trivial, again there is no prize!\n",
"### Many different kinds, focuses, and constraints"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Have fun\n",
"### Learn something new\n",
"### Practice something different\n",
"### Share knowledge"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Time boxed iterations\n",
"### Discussion\n",
"### Baby steps\n",
"### Pair programming and Red, Green, Refactor"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Final retrospective\n",
"### Kata"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Weather data kata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The goal of this exercise is to write a program that reads the `weather.dat` file and prints the day number and the minimum temperature of the day with the lowest minimum temperature in the month covered by the file.\n",
"The program should work like this:\n",
"\n",
" python weather.py\n",
" 9 32"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `weather.dat` file contains space-separated tabular data with the weather measurements for one month in one place.\n",
"The file has a header line, followed by an empty line, one line of data per day of the month, and a last line with the month's mean values for some of the columns.\n",
"Each data line contains the day of the month in the first column and the minimum temperature for that day in the third column."
]
},
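{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch of the column layout described above, a data line can be split on whitespace to reach the day and the minimum temperature. The sample line is copied from day 9 of `weather.dat`; the variable names are illustrative, and this is only one possible way to read the columns, not the solution of the kata:\n",
"\n",
"```python\n",
"# One data line copied from weather.dat (day 9).\n",
"sample = ' 9 86 32* 59 6 61.5 0.00 240 7.6 220 12 6.0 78 46 1018.6'\n",
"\n",
"# split() with no argument splits on any run of whitespace.\n",
"fields = sample.split()\n",
"\n",
"day = int(fields[0])\n",
"# Some values carry a trailing asterisk, so it is removed before converting.\n",
"min_temp = int(fields[2].rstrip('*'))\n",
"\n",
"print(day, min_temp)\n",
"```"
]
},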
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The contents look like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# %load weather.dat\n",
" Dy MxT MnT AvT HDDay AvDP 1HrP TPcpn WxType PDir AvSp Dir MxS SkyC MxR MnR AvSLP\n",
"\n",
" 1 88 59 74 53.8 0.00 F 280 9.6 270 17 1.6 93 23 1004.5\n",
" 2 79 63 71 46.5 0.00 330 8.7 340 23 3.3 70 28 1004.5\n",
" 3 77 55 66 39.6 0.00 350 5.0 350 9 2.8 59 24 1016.8\n",
" 4 77 59 68 51.1 0.00 110 9.1 130 12 8.6 62 40 1021.1\n",
" 5 90 66 78 68.3 0.00 TFH 220 8.3 260 12 6.9 84 55 1014.4\n",
" 6 81 61 71 63.7 0.00 RFH 030 6.2 030 13 9.7 93 60 1012.7\n",
" 7 73 57 65 53.0 0.00 RF 050 9.5 050 17 5.3 90 48 1021.8\n",
" 8 75 54 65 50.0 0.00 FH 160 4.2 150 10 2.6 93 41 1026.3\n",
" 9 86 32* 59 6 61.5 0.00 240 7.6 220 12 6.0 78 46 1018.6\n",
" 10 84 64 74 57.5 0.00 F 210 6.6 050 9 3.4 84 40 1019.0\n",
" 11 91 59 75 66.3 0.00 H 250 7.1 230 12 2.5 93 45 1012.6\n",
" 12 88 73 81 68.7 0.00 RTH 250 8.1 270 21 7.9 94 51 1007.0\n",
" 13 70 59 65 55.0 0.00 H 150 3.0 150 8 10.0 83 59 1012.6\n",
" 14 61 59 60 5 55.9 0.00 RF 060 6.7 080 9 10.0 93 87 1008.6\n",
" 15 64 55 60 5 54.9 0.00 F 040 4.3 200 7 9.6 96 70 1006.1\n",
" 16 79 59 69 56.7 0.00 F 250 7.6 240 21 7.8 87 44 1007.0\n",
" 17 81 57 69 51.7 0.00 T 260 9.1 270 29* 5.2 90 34 1012.5\n",
" 18 82 52 67 52.6 0.00 230 4.0 190 12 5.0 93 34 1021.3\n",
" 19 81 61 71 58.9 0.00 H 250 5.2 230 12 5.3 87 44 1028.5\n",
" 20 84 57 71 58.9 0.00 FH 150 6.3 160 13 3.6 90 43 1032.5\n",
" 21 86 59 73 57.7 0.00 F 240 6.1 250 12 1.0 87 35 1030.7\n",
" 22 90 64 77 61.1 0.00 H 250 6.4 230 9 0.2 78 38 1026.4\n",
" 23 90 68 79 63.1 0.00 H 240 8.3 230 12 0.2 68 42 1021.3\n",
" 24 90 77 84 67.5 0.00 H 350 8.5 010 14 6.9 74 48 1018.2\n",
" 25 90 72 81 61.3 0.00 190 4.9 230 9 5.6 81 29 1019.6\n",
" 26 97* 64 81 70.4 0.00 H 050 5.1 200 12 4.0 107 45 1014.9\n",
" 27 91 72 82 69.7 0.00 RTH 250 12.1 230 17 7.1 90 47 1009.0\n",
" 28 84 68 76 65.6 0.00 RTFH 280 7.6 340 16 7.0 100 51 1011.0\n",
" 29 88 66 77 59.7 0.00 040 5.4 020 9 5.3 84 33 1020.6\n",
" 30 90 45 68 63.6 0.00 H 240 6.0 220 17 4.8 200 41 1022.7\n",
" mo 82.9 60.5 71.7 16 58.8 0.00 6.9 5.3\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Introduction to py.test"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Most powerful and idiomatic unit testing Python library\n",
"### Not included in standard library\n",
"### Uses `assert` statement\n",
"### Runs other unit testing libraries\n",
"### Goodies and plugins"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Writing test_demo.py\n"
]
}
],
"source": [
"%%writefile test_demo.py\n",
"def multiply(number1, number2):\n",
" return number1 + number2\n",
"\n",
"def test_multiply():\n",
" result = multiply(5,6)\n",
" assert result == 30"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============================================================================================== test session starts ===============================================================================================\n",
"platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
"benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
"rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
"plugins: benchmark-3.0.0\n",
"collected 1 items\n",
"\n",
"test_demo.py F\n",
"\n",
"==================================================================================================== FAILURES =====================================================================================================\n",
"__________________________________________________________________________________________________ test_multiply __________________________________________________________________________________________________\n",
"\n",
" def test_multiply():\n",
" result = multiply(5,6)\n",
"> assert result == 30\n",
"E assert 11 == 30\n",
"\n",
"test_demo.py:6: AssertionError\n",
"============================================================================================ 1 failed in 0.01 seconds =============================================================================================\n"
]
}
],
"source": [
"%%bash\n",
"py.test ."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"In this output, py.test shows that it collected the file `test_demo.py` and ran the function `test_multiply`, because its name starts with `test_`. It prints an `F` next to the file name, indicating that the file contained a single test function and that it failed.\n",
"It also shows the failing assertion, with the value obtained in the Act part of the test and the value it was expected to be.\n",
"It also reports an `AssertionError`, meaning the `assert` statement did not hold.\n",
"If the test run had failed for any other reason, e.g. any kind of error or exception, it would be reported there as well."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting test_demo.py\n"
]
}
],
"source": [
"%%writefile test_demo.py\n",
"def multiply(number1, number2):\n",
" return number1 * number2\n",
"\n",
"def test_multiply():\n",
" result = multiply(5,6)\n",
" assert result == 30"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============================================================================================== test session starts ===============================================================================================\n",
"platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
"benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
"rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
"plugins: benchmark-3.0.0\n",
"collected 1 items\n",
"\n",
"test_demo.py .\n",
"\n",
"============================================================================================ 1 passed in 0.01 seconds =============================================================================================\n"
]
}
],
"source": [
"%%bash\n",
"py.test ."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"%%bash\n",
"rm test_demo.py"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Bootstrapping the solution with py.test"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"```python\n",
"import weather\n",
"\n",
"def test_process_weather():\n",
" weather.process()\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Writing test_weather.py\n"
]
}
],
"source": [
"%%writefile test_weather.py\n",
"import weather\n",
"\n",
"def test_process_weather():\n",
" weather.process()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============================================================================================== test session starts ===============================================================================================\n",
"platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
"benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
"rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
"plugins: benchmark-3.0.0\n",
"collected 0 items / 1 errors\n",
"\n",
"===================================================================================================== ERRORS ======================================================================================================\n",
"________________________________________________________________________________________ ERROR collecting test_weather.py _________________________________________________________________________________________\n",
"test_weather.py:1: in <module>\n",
" import weather\n",
"E ImportError: No module named 'weather'\n",
"============================================================================================= 1 error in 0.01 seconds =============================================================================================\n"
]
}
],
"source": [
"%%bash\n",
"py.test test_weather.py"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"```python\n",
"def process():\n",
" pass\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Writing weather.py\n"
]
}
],
"source": [
"%%writefile weather.py\n",
"def process():\n",
" pass"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============================================================================================== test session starts ===============================================================================================\n",
"platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
"benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
"rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
"plugins: benchmark-3.0.0\n",
"collected 1 items\n",
"\n",
"test_weather.py .\n",
"\n",
"============================================================================================ 1 passed in 0.01 seconds =============================================================================================\n"
]
}
],
"source": [
"%%bash\n",
"py.test test_weather.py"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Files with Python (I)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `open(filename[, mode[, buffering]])` returns a file object\n",
"### `file.close()` must be called manually"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```python\n",
" myfile = open('myfile.dat')\n",
" ...\n",
" myfile.close()\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Read all lines in file with `file.readlines()`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```python\n",
" myfile = open('myfile.dat')\n",
" mylines = myfile.readlines()\n",
" myfile.close()\n",
"```"
]
},
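{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since `close()` has to be called manually, a `try`/`finally` block is one way to make sure it still runs when reading fails. The sketch below first writes a throwaway file so that it can run on its own; the file name `demo.dat` and its contents are made up for the example:\n",
"\n",
"```python\n",
"import os\n",
"import tempfile\n",
"\n",
"# Write a small throwaway file so the example is self-contained.\n",
"path = os.path.join(tempfile.mkdtemp(), 'demo.dat')\n",
"demo_file = open(path, 'w')\n",
"demo_file.write('first\\nsecond\\n')\n",
"demo_file.close()\n",
"\n",
"# Read it back, closing the file even if readlines() raises.\n",
"demo_file = open(path)\n",
"try:\n",
"    lines = demo_file.readlines()\n",
"finally:\n",
"    demo_file.close()\n",
"\n",
"print(lines)  # each line keeps its trailing newline\n",
"print(demo_file.closed)\n",
"```"
]
},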
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Printing strings with Python"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Hello\n"
]
}
],
"source": [
"print(\"Hello\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Take care with the hidden new line"
]
},
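{
"cell_type": "markdown",
"metadata": {},
"source": [
"The hidden new line is the `\\n` that `print` appends after its arguments; the `end` parameter changes it. The sketch below captures standard output with `io.StringIO` and `contextlib.redirect_stdout`, both from the standard library, to make the difference visible:\n",
"\n",
"```python\n",
"import contextlib\n",
"import io\n",
"\n",
"buffer = io.StringIO()\n",
"with contextlib.redirect_stdout(buffer):\n",
"    print('Hello')          # print appends a newline by default\n",
"    print('Hello', end='')  # end='' suppresses it\n",
"\n",
"captured = buffer.getvalue()\n",
"print(repr(captured))\n",
"```"
]
},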
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Use `string.format()` for string formatting"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Hello pythonist\n",
"Hello nice pythonist\n"
]
}
],
"source": [
"name = 'pythonist'\n",
"print(\"Hello {}\".format(name))\n",
"adjective = \"nice\"\n",
"print(\"Hello {} {}\".format(adjective, name))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Capturing output with py.test"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `capsys` fixture"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```python\n",
"def test_demo(capsys):\n",
"    print(\"Hello\")\n",
"    out, err = capsys.readouterr()\n",
"    assert out == \"Hello\\n\"\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Lists with Python"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[]\n",
"[1, 2, 3]\n"
]
}
],
"source": [
"a = []\n",
"b = [1, 2, 3,]\n",
"print(a)\n",
"print(b)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0\n",
"3\n"
]
}
],
"source": [
"print(len(a))\n",
"print(len(b))"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1\n",
"[2]\n",
"[2, 3]\n",
"[1, 3]\n",
"[3, 2, 1]\n"
]
}
],
"source": [
"print(b[0])\n",
"print(b[1:2])\n",
"print(b[1:])\n",
"print(b[::2])\n",
"print(b[::-1])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Strings with Python (I)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Hello\n",
"0\n",
"5\n",
"Hlo\n"
]
}
],
"source": [
"a = \"\"\n",
"b = \"Hello\"\n",
"print(a)\n",
"print(b)\n",
"print(len(a))\n",
"print(len(b))\n",
"print(b[::2])"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Hello\n"
]
}
],
"source": [
"if b.startswith(\"H\"):\n",
" print(b)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['1', '2', '3']\n",
"['Hello', 'Bye']\n"
]
}
],
"source": [
"print(\"1-2-3\".split(\"-\"))\n",
"print(\"Hello123Bye\".split(\"123\"))"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'1-2-3'"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"\"-\".join([\"1\", \"2\", \"3\"])"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is a line starting with a space\n",
" line \n",
"line\n"
]
}
],
"source": [
"print(\" This is a line starting with a space\\n\".strip())\n",
"print(\"New line New\".strip(\"New\"))\n",
"print(\"New line New\".strip(\"New\").strip())"
]
},
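{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the argument of `strip()` is treated as a set of characters to remove from both ends, not as a prefix or suffix. A short sketch with a made-up string, including one way to drop only an exact prefix using `startswith` and slicing:\n",
"\n",
"```python\n",
"text = 'Newest line'\n",
"\n",
"# strip('New') removes any leading or trailing 'N', 'e' or 'w' characters.\n",
"stripped = text.strip('New')\n",
"print(repr(stripped))\n",
"\n",
"# To drop only the exact prefix, check for it and slice it off.\n",
"prefix = 'New'\n",
"without_prefix = text[len(prefix):] if text.startswith(prefix) else text\n",
"print(repr(without_prefix))\n",
"```"
]
},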
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# First iteration"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first iteration will focus on loading the data lines from the file.\n",
"The approach chosen is pretty simple: just read the lines from the file and print them all.\n",
"\n",
"**This should take no more than 40 minutes.**"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"So, the first test checks that the function, when called, prints the stripped content of the file.\n",
"To check the output of the script with py.test, the `capsys` fixture is used. It lets the test environment keep standard output and standard error in memory, so the test can check them afterwards:"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"```python\n",
"import weather\n",
"\n",
"def test_process_weather(capsys):\n",
" weather.process()\n",
" out, err = capsys.readouterr()\n",
" assert out.startswith(\"Dy\")\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting test_weather.py\n"
]
}
],
"source": [
"%%writefile test_weather.py\n",
"import weather\n",
"\n",
"def test_process_weather(capsys):\n",
" weather.process()\n",
" out, err = capsys.readouterr()\n",
" assert out.startswith(\"Dy\")"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============================================================================================== test session starts ===============================================================================================\n",
"platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
"benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
"rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
"plugins: benchmark-3.0.0\n",
"collected 1 items\n",
"\n",
"test_weather.py F\n",
"\n",
"==================================================================================================== FAILURES =====================================================================================================\n",
"______________________________________________________________________________________________ test_process_weather _______________________________________________________________________________________________\n",
"\n",
"capsys = <_pytest.capture.CaptureFixture object at 0x104fca710>\n",
"\n",
" def test_process_weather(capsys):\n",
" weather.process()\n",
" out, err = capsys.readouterr()\n",
"> assert out.startswith(\"Dy\")\n",
"E assert <built-in method startswith of str object at 0x1003acab0>('Dy')\n",
"E + where <built-in method startswith of str object at 0x1003acab0> = ''.startswith\n",
"\n",
"test_weather.py:6: AssertionError\n",
"============================================================================================ 1 failed in 0.01 seconds =============================================================================================\n"
]
}
],
"source": [
"%%bash\n",
"py.test test_weather.py"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"```python\n",
"def process():\n",
" print(\"Dy\")\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting weather.py\n"
]
}
],
"source": [
"%%writefile weather.py\n",
"def process():\n",
" print(\"Dy\")"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============================================================================================== test session starts ===============================================================================================\n",
"platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
"benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
"rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
"plugins: benchmark-3.0.0\n",
"collected 1 items\n",
"\n",
"test_weather.py .\n",
"\n",
"============================================================================================ 1 passed in 0.01 seconds =============================================================================================\n"
]
}
],
"source": [
"%%bash\n",
"py.test test_weather.py"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"```python\n",
"import weather\n",
"\n",
"def test_process_weather(capsys):\n",
" weather.process()\n",
" out, err = capsys.readouterr()\n",
" assert out.startswith(\"Dy\")\n",
" assert len(out.split(\"\\n\")) > 2\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting test_weather.py\n"
]
}
],
"source": [
"%%writefile test_weather.py\n",
"import weather\n",
"\n",
"def test_process_weather(capsys):\n",
" weather.process()\n",
" out, err = capsys.readouterr()\n",
" assert out.startswith(\"Dy\")\n",
" assert len(out.split(\"\\n\")) > 2"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============================================================================================== test session starts ===============================================================================================\n",
"platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
"benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
"rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
"plugins: benchmark-3.0.0\n",
"collected 1 items\n",
"\n",
"test_weather.py F\n",
"\n",
"==================================================================================================== FAILURES =====================================================================================================\n",
"______________________________________________________________________________________________ test_process_weather _______________________________________________________________________________________________\n",
"\n",
"capsys = <_pytest.capture.CaptureFixture object at 0x10597e438>\n",
"\n",
" def test_process_weather(capsys):\n",
" weather.process()\n",
" out, err = capsys.readouterr()\n",
" assert out.startswith(\"Dy\")\n",
"> assert len(out.split(\"\\n\")) > 2\n",
"E assert 2 > 2\n",
"E + where 2 = len(['Dy', ''])\n",
"E + where ['Dy', ''] = <built-in method split of str object at 0x105977ab0>('\\n')\n",
"E + where <built-in method split of str object at 0x105977ab0> = 'Dy\\n'.split\n",
"\n",
"test_weather.py:7: AssertionError\n",
"============================================================================================ 1 failed in 0.01 seconds =============================================================================================\n"
]
}
],
"source": [
"%%bash\n",
"py.test test_weather.py"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Some baby steps later"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"```python\n",
"def process():\n",
" data_file = open('weather.dat')\n",
" print(\"\".join(data_file.readlines()).strip())\n",
" data_file.close()\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting weather.py\n"
]
}
],
"source": [
"%%writefile weather.py\n",
"def process():\n",
" data_file = open('weather.dat')\n",
" print(\"\".join(data_file.readlines()).strip())\n",
" data_file.close()"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============================================================================================== test session starts ===============================================================================================\n",
"platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
"benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
"rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
"plugins: benchmark-3.0.0\n",
"collected 1 items\n",
"\n",
"test_weather.py .\n",
"\n",
"============================================================================================ 1 passed in 0.01 seconds =============================================================================================\n"
]
}
],
"source": [
"%%bash\n",
"py.test test_weather.py"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Regular expressions in Python (I)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Regular expressions: a technique for pattern matching and data extraction\n",
"### Automata\n",
"### Regular expression as pattern string"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### `[a-z]`\n",
"### `[a-zA-Z]*`\n",
"### `[0-9]+ [a-z]*`\n",
"### `.*`"
]
},
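{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"A minimal sketch of what each of the patterns above matches, using `re.findall` and `re.match` from the standard library. The sample strings and variable names are made up for illustration.\n",
"\n",
"```python\n",
"import re\n",
"\n",
"# [a-z] matches exactly one lowercase letter.\n",
"single_letters = re.findall(r'[a-z]', 'a1B2c')\n",
"\n",
"# [a-zA-Z]* matches a run of zero or more letters.\n",
"letters = re.match(r'[a-zA-Z]*', 'Hello42').group()\n",
"\n",
"# [0-9]+ [a-z]* matches digits, one space, then optional lowercase letters.\n",
"number_and_word = re.match(r'[0-9]+ [a-z]*', '30 days').group()\n",
"\n",
"# .* matches any characters up to the end of the line.\n",
"everything = re.match(r'.*', 'anything at all').group()\n",
"\n",
"print(single_letters, letters, number_and_word, everything)\n",
"```"
]
},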
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### In Python, regular expression strings are differentiated by enclosing them with `r\"\"`, like in `r\"[a-z]*\"`.\n",
"### `re.match(pattern, string)` and `re.search(pattern, string)`"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"It matches with search!\n"
]
}
],
"source": [
"import re\n",
"\n",
"a = \"this is 1 2 3 4 500\"\n",
"regexp = r\"[0-9 ]+\"\n",
"\n",
"match = re.search(regexp, a)\n",
"if match:\n",
" print(\"It matches with search!\")\n",
"match = re.match(regexp, a)\n",
"if match:\n",
" print(\"It matches with match!\")"
]
},
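{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Only the `search` message is printed above because `re.match` tries the pattern at the start of the string only, while `re.search` scans for the first position where it matches. A small sketch of the difference, reusing the same string and pattern (the variable names `at_start` and `anywhere` are illustrative):\n",
"\n",
"```python\n",
"import re\n",
"\n",
"a = 'this is 1 2 3 4 500'\n",
"regexp = r'[0-9 ]+'\n",
"\n",
"# match() only tries at position 0, where 't' is neither a digit nor a space.\n",
"at_start = re.match(regexp, a)\n",
"\n",
"# search() scans forward and stops at the first character the class accepts.\n",
"anywhere = re.search(regexp, a)\n",
"\n",
"print(at_start)          # None\n",
"print(anywhere.start())  # 4\n",
"```\n",
"\n",
"Note that the first match is just the single space after `this`, since the character class accepts spaces as well as digits."
]
},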
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Second iteration"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this second iteration, the program should be able to skip header and empty lines, printing only data lines.\n",
"\n",
"**This should be easily completed in 20 minutes.**"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"As last requirement for this iteration, no headers or empty lines should be printed:"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"```python\n",
"import weather\n",
"\n",
"def test_process_weather(capsys):\n",
" weather.process()\n",
" out, err = capsys.readouterr()\n",
" assert out.startswith(\"1\")\n",
" assert len(out.split(\"\\n\")) > 2\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting test_weather.py\n"
]
}
],
"source": [
"%%writefile test_weather.py\n",
"import weather\n",
"\n",
"def test_process_weather(capsys):\n",
" weather.process()\n",
" out, err = capsys.readouterr()\n",
" assert out.startswith(\"1\")\n",
" assert len(out.split(\"\\n\")) > 2"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============================================================================================== test session starts ===============================================================================================\n",
"platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
"benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
"rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
"plugins: benchmark-3.0.0\n",
"collected 1 items\n",
"\n",
"test_weather.py F\n",
"\n",
"==================================================================================================== FAILURES =====================================================================================================\n",
"______________________________________________________________________________________________ test_process_weather _______________________________________________________________________________________________\n",
"\n",
"capsys = <_pytest.capture.CaptureFixture object at 0x105952908>\n",
"\n",
" def test_process_weather(capsys):\n",
" weather.process()\n",
" out, err = capsys.readouterr()\n",
"> assert out.startswith(\"1\")\n",
"E assert <built-in method startswith of str object at 0x104890c00>('1')\n",
"E + where <built-in method startswith of str object at 0x104890c00> = 'Dy MxT MnT AvT HDDay AvDP 1HrP TPcpn WxType PDir AvSp Dir MxS SkyC MxR MnR AvSLP\\n\\n 1 88 59 74 ... 240 6.0 220 17 4.8 200 41 1022.7\\n mo 82.9 60.5 71.7 16 58.8 0.00 6.9 5.3\\n'.startswith\n",
"\n",
"test_weather.py:6: AssertionError\n",
"============================================================================================ 1 failed in 0.01 seconds =============================================================================================\n"
]
}
],
"source": [
"%%bash\n",
"py.test test_weather.py"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"```python\n",
"import re\n",
"\n",
"def process():\n",
" data_file = open('weather.dat')\n",
" pattern = r\"[0-9]+.*\"\n",
" for line in data_file.readlines():\n",
" match = re.match(pattern, line.strip())\n",
" if match:\n",
" print(line.strip())\n",
" data_file.close()\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting weather.py\n"
]
}
],
"source": [
"%%writefile weather.py\n",
"import re\n",
"\n",
"def process():\n",
" data_file = open('weather.dat')\n",
" pattern = r\"[0-9]+.*\"\n",
" for line in data_file.readlines():\n",
" match = re.match(pattern, line.strip())\n",
" if match:\n",
" print(line.strip())\n",
" data_file.close()"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false,
"scrolled": true,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============================================================================================== test session starts ===============================================================================================\n",
"platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
"benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
"rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
"plugins: benchmark-3.0.0\n",
"collected 1 items\n",
"\n",
"test_weather.py .\n",
"\n",
"============================================================================================ 1 passed in 0.00 seconds =============================================================================================\n"
]
}
],
"source": [
"%%bash\n",
"py.test test_weather.py"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Regular expressions in Python (II)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"import re"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Grouping: `()`\n",
"### `(?P<id>any_regex)`\n",
"### `\\s`"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Guido from Netherlands was born in 1956\n"
]
}
],
"source": [
"a = \"Name: Guido, Country: Netherlands, Year of birth: 1956\"\n",
"pattern = r\"Name: (?P<name>[a-zA-Z]+), \"\n",
"pattern += r\"Country: (?P<country>[a-zA-Z]+), \"\n",
"pattern += r\"Year of birth: (?P<year>[0-9]+)\"\n",
"match = re.match(pattern, a)\n",
"if match:\n",
" print(\"{} from {} was born in {}\".format(\n",
" match.group('name'),\n",
" match.group('country'),\n",
" match.group('year')))"
]
},
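{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Named groups can also be read all at once with `match.groupdict()`, and `\\s+` is handy for columns separated by a variable number of spaces. A minimal sketch; the sample line is a shortened, made-up row in the style of `weather.dat`:\n",
"\n",
"```python\n",
"import re\n",
"\n",
"line = '   9  86  32*  59'\n",
"pattern = r'\\s*(?P<day>[0-9]+)\\s+(?P<max>[0-9]+)\\s+(?P<min>[0-9]+)'\n",
"\n",
"match = re.match(pattern, line)\n",
"if match:\n",
"    # groupdict() returns every named group as a dictionary of strings.\n",
"    fields = match.groupdict()\n",
"    print(fields['day'], fields['min'])\n",
"```\n",
"\n",
"The captured values are still strings, so they need `int()` before being compared as numbers."
]
},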
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Third iteration"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The third iteration should allow to get data using grouping in regular expressions, and finishing by reducing the numer of lines to the correct output, i.e. day and minimum temperature for the day with minimum temperature.\n",
"\n",
"**This shouldn't take more than 15 minutes**"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"```python\n",
"import weather\n",
"\n",
"def test_process_weather(capsys):\n",
" weather.process()\n",
" out, err = capsys.readouterr()\n",
" assert out.startswith(\"1\")\n",
" output_lines = out.split(\"\\n\")\n",
" assert len(output_lines) > 2\n",
" assert output_lines[0] == \"1 59\"\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting test_weather.py\n"
]
}
],
"source": [
"%%writefile test_weather.py\n",
"import weather\n",
"\n",
"def test_process_weather(capsys):\n",
" weather.process()\n",
" out, err = capsys.readouterr()\n",
" assert out.startswith(\"1\")\n",
" output_lines = out.split(\"\\n\")\n",
" assert len(output_lines) > 2\n",
" assert output_lines[0] == \"1 59\""
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============================================================================================== test session starts ===============================================================================================\n",
"platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
"benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
"rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
"plugins: benchmark-3.0.0\n",
"collected 1 items\n",
"\n",
"test_weather.py F\n",
"\n",
"==================================================================================================== FAILURES =====================================================================================================\n",
"______________________________________________________________________________________________ test_process_weather _______________________________________________________________________________________________\n",
"\n",
"capsys = <_pytest.capture.CaptureFixture object at 0x105fec080>\n",
"\n",
" def test_process_weather(capsys):\n",
" weather.process()\n",
" out, err = capsys.readouterr()\n",
" assert out.startswith(\"1\")\n",
" output_lines = out.split(\"\\n\")\n",
" assert len(output_lines) > 2\n",
"> assert output_lines[0] == \"1 59\"\n",
"E assert '1 88 59 ... 93 23 1004.5' == '1 59'\n",
"E - 1 88 59 74 53.8 0.00 F 280 9.6 270 17 1.6 93 23 1004.5\n",
"E + 1 59\n",
"\n",
"test_weather.py:9: AssertionError\n",
"============================================================================================ 1 failed in 0.02 seconds =============================================================================================\n"
]
}
],
"source": [
"%%bash\n",
"py.test test_weather.py"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"```python\n",
"import re\n",
"\n",
"def process():\n",
" data_file = open('weather.dat')\n",
" pattern = r\"(?P<day>[0-9]+)\\s+[0-9]+\\s+(?P<min>[0-9]+).*\"\n",
" for line in data_file.readlines():\n",
" match = re.match(pattern, line.strip())\n",
" if match:\n",
" print(\"{} {}\".format(match.group('day'), match.group('min')))\n",
" data_file.close()\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting weather.py\n"
]
}
],
"source": [
"%%writefile weather.py\n",
"import re\n",
"\n",
"def process():\n",
" data_file = open('weather.dat')\n",
" pattern = r\"(?P<day>[0-9]+)\\s+[0-9]+\\s+(?P<min>[0-9]+).*\"\n",
" for line in data_file.readlines():\n",
" match = re.match(pattern, line.strip())\n",
" if match:\n",
" print(\"{} {}\".format(match.group('day'), match.group('min')))\n",
" data_file.close()"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============================================================================================== test session starts ===============================================================================================\n",
"platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
"benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
"rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
"plugins: benchmark-3.0.0\n",
"collected 1 items\n",
"\n",
"test_weather.py .\n",
"\n",
"============================================================================================ 1 passed in 0.01 seconds =============================================================================================\n"
]
}
],
"source": [
"%%bash\n",
"py.test test_weather.py"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"```python\n",
"import weather\n",
"\n",
"def test_process_weather(capsys):\n",
" weather.process()\n",
" out, err = capsys.readouterr()\n",
" assert out == \"9 32\\n\"\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting test_weather.py\n"
]
}
],
"source": [
"%%writefile test_weather.py\n",
"import weather\n",
"\n",
"def test_process_weather(capsys):\n",
" weather.process()\n",
" out, err = capsys.readouterr()\n",
" assert out == \"9 32\\n\""
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============================================================================================== test session starts ===============================================================================================\n",
"platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
"benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
"rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
"plugins: benchmark-3.0.0\n",
"collected 1 items\n",
"\n",
"test_weather.py F\n",
"\n",
"==================================================================================================== FAILURES =====================================================================================================\n",
"______________________________________________________________________________________________ test_process_weather _______________________________________________________________________________________________\n",
"\n",
"capsys = <_pytest.capture.CaptureFixture object at 0x10578a940>\n",
"\n",
" def test_process_weather(capsys):\n",
" weather.process()\n",
" out, err = capsys.readouterr()\n",
"> assert out == \"9 32\\n\"\n",
"E assert '1 59\\n2 63\\n...9 66\\n30 45\\n' == '9 32\\n'\n",
"E - 1 59\n",
"E - 2 63\n",
"E - 3 55\n",
"E - 4 59\n",
"E - 5 66\n",
"E - 6 61\n",
"E - 7 57\n",
"E - 8 54\n",
"E 9 32\n",
"E - 10 64\n",
"E - 11 59\n",
"E - 12 73\n",
"E - 13 59\n",
"E - 14 59\n",
"E - 15 55\n",
"E - 16 59\n",
"E - 17 57\n",
"E - 18 52\n",
"E - 19 61\n",
"E - 20 57\n",
"E - 21 59\n",
"E - 22 64\n",
"E - 23 68\n",
"E - 24 77\n",
"E - 25 72\n",
"E - 27 72\n",
"E - 28 68\n",
"E - 29 66\n",
"E - 30 45\n",
"\n",
"test_weather.py:6: AssertionError\n",
"============================================================================================ 1 failed in 0.01 seconds =============================================================================================\n"
]
}
],
"source": [
"%%bash\n",
"py.test test_weather.py"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"```python\n",
"import re\n",
"\n",
"def process():\n",
" data_file = open('weather.dat')\n",
" pattern = r\"(?P<day>[0-9]+)\\s+[0-9]+\\s+(?P<min>[0-9]+).*\"\n",
" day = 0\n",
" temp = 1000\n",
" for line in data_file.readlines():\n",
" match = re.match(pattern, line.strip())\n",
" if match:\n",
" if int(match.group('min')) < temp:\n",
" day = match.group('day')\n",
" temp = int(match.group('min'))\n",
" print(\"{} {}\".format(day, temp))\n",
" data_file.close()\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting weather.py\n"
]
}
],
"source": [
"%%writefile weather.py\n",
"import re\n",
"\n",
"def process():\n",
" data_file = open('weather.dat')\n",
" pattern = r\"(?P<day>[0-9]+)\\s+[0-9]+\\s+(?P<min>[0-9]+).*\"\n",
" day = 0\n",
" temp = 1000\n",
" for line in data_file.readlines():\n",
" match = re.match(pattern, line.strip())\n",
" if match:\n",
" if int(match.group('min')) < temp:\n",
" day = match.group('day')\n",
" temp = int(match.group('min'))\n",
" print(\"{} {}\".format(day, temp))\n",
" data_file.close()"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============================================================================================== test session starts ===============================================================================================\n",
"platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
"benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
"rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
"plugins: benchmark-3.0.0\n",
"collected 1 items\n",
"\n",
"test_weather.py .\n",
"\n",
"============================================================================================ 1 passed in 0.01 seconds =============================================================================================\n"
]
}
],
"source": [
"%%bash\n",
"py.test test_weather.py"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Files with Python (II)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Better way for opening and closing files\n",
"```python\n",
" with open('myfile') as data_file:\n",
" ...\n",
"```"
]
},
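{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"A short sketch of why `with` is preferable: the file is closed as soon as the block ends, even if an exception is raised inside it. The temporary file and its contents are made up so the example runs on its own.\n",
"\n",
"```python\n",
"import os\n",
"import tempfile\n",
"\n",
"# Create a throwaway file so the example does not depend on weather.dat.\n",
"fd, path = tempfile.mkstemp(suffix='.dat')\n",
"with os.fdopen(fd, 'w') as tmp:\n",
"    tmp.write('1 88 59\\n2 79 63\\n')\n",
"\n",
"with open(path) as data_file:\n",
"    first_line = data_file.readline().strip()\n",
"\n",
"# Outside the block the file is already closed; no close() call is needed.\n",
"is_closed = data_file.closed\n",
"print(first_line, is_closed)\n",
"\n",
"os.remove(path)\n",
"```"
]
},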
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# First refactor"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the last iteration, the code should be easily modified to be more idiomatic and efficient when opening and closing the file, without causing the tests to fail.\n",
"\n",
"**This should be accomplished in 10 minutes.**"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"```python\n",
"import re\n",
"\n",
"def process():\n",
" pattern = r\"(?P<day>[0-9]+)\\s+[0-9]+\\s+(?P<min>[0-9]+).*\"\n",
" day = 0\n",
" temp = 1000\n",
" with open('weather.dat') as data_file:\n",
" for data_file in data_file.readlines():\n",
" match = re.match(pattern, line.strip())\n",
" if match:\n",
" if int(match.group('min')) < temp:\n",
" day = match.group('day')\n",
" temp = int(match.group('min'))\n",
" print(\"{} {}\".format(day, temp))\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting weather.py\n"
]
}
],
"source": [
"%%writefile weather.py\n",
"import re\n",
"\n",
"def process():\n",
" pattern = r\"(?P<day>[0-9]+)\\s+[0-9]+\\s+(?P<min>[0-9]+).*\"\n",
" day = 0\n",
" temp = 1000\n",
" with open('weather.dat') as data_file:\n",
" for line in data_file.readlines():\n",
" match = re.match(pattern, line.strip())\n",
" if match:\n",
" if int(match.group('min')) < temp:\n",
" day = match.group('day')\n",
" temp = int(match.group('min'))\n",
" print(\"{} {}\".format(day, temp))"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============================================================================================== test session starts ===============================================================================================\n",
"platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
"benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
"rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
"plugins: benchmark-3.0.0\n",
"collected 1 items\n",
"\n",
"test_weather.py .\n",
"\n",
"============================================================================================ 1 passed in 0.01 seconds =============================================================================================\n"
]
}
],
"source": [
"%%bash\n",
"py.test test_weather.py"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Files with Python (III)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Use the file object to iterate over lines, instead of `readlines()`\n",
"```python\n",
" for line in data_file:\n",
" ...\n",
"```"
]
},
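{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Iterating over the file object yields one line at a time, whereas `readlines()` builds a list with every line first. A sketch with a made-up temporary file, showing that both approaches produce the same lines:\n",
"\n",
"```python\n",
"import os\n",
"import tempfile\n",
"\n",
"fd, path = tempfile.mkstemp(suffix='.dat')\n",
"with os.fdopen(fd, 'w') as tmp:\n",
"    tmp.write('1 88 59\\n2 79 63\\n3 77 55\\n')\n",
"\n",
"# readlines() loads the whole file into a list before the loop starts.\n",
"with open(path) as data_file:\n",
"    all_at_once = data_file.readlines()\n",
"\n",
"# Iterating the file object reads lazily, one line per loop step.\n",
"one_by_one = []\n",
"with open(path) as data_file:\n",
"    for line in data_file:\n",
"        one_by_one.append(line)\n",
"\n",
"print(all_at_once == one_by_one)\n",
"os.remove(path)\n",
"```\n",
"\n",
"For a file as small as `weather.dat` the difference is negligible; the habit pays off with large files."
]
},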
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Second refactor"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The way the lines are being read must be refactored to ensure it is memory efficient and faster.\n",
"\n",
"**This should not take more than 10 minutes.**"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"```python\n",
"import re\n",
"\n",
"def process():\n",
" pattern = r\"(?P<day>[0-9]+)\\s+[0-9]+\\s+(?P<min>[0-9]+).*\"\n",
" day = 0\n",
" temp = 1000\n",
" with open('weather.dat') as file:\n",
" for line in file:\n",
" match = re.match(pattern, line.strip())\n",
" if match:\n",
" if int(match.group('min')) < temp:\n",
" day = match.group('day')\n",
" temp = int(match.group('min'))\n",
" print(\"{} {}\".format(day, temp))\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting weather.py\n"
]
}
],
"source": [
"%%writefile weather.py\n",
"import re\n",
"\n",
"def process():\n",
" pattern = r\"(?P<day>[0-9]+)\\s+[0-9]+\\s+(?P<min>[0-9]+).*\"\n",
" day = 0\n",
" temp = 1000\n",
" with open('weather.dat') as file:\n",
" for line in file:\n",
" match = re.match(pattern, line.strip())\n",
" if match:\n",
" if int(match.group('min')) < temp:\n",
" day = match.group('day')\n",
" temp = int(match.group('min'))\n",
" print(\"{} {}\".format(day, temp))"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============================================================================================== test session starts ===============================================================================================\n",
"platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
"benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
"rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
"plugins: benchmark-3.0.0\n",
"collected 1 items\n",
"\n",
"test_weather.py .\n",
"\n",
"============================================================================================ 1 passed in 0.01 seconds =============================================================================================\n"
]
}
],
"source": [
"%%bash\n",
"py.test test_weather.py"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Strings with Python (II)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'Replace strings'"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"\"REplace strings---\".replace(\"E\", \"e\").replace(\"-\", \"\")"
]
},
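{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"`replace` returns a new string and can be chained, as above. A sketch of how it might be used to clean a value flagged with `*`, together with `split()`, which breaks a line on runs of whitespace. The sample line is made up:\n",
"\n",
"```python\n",
"line = '   9  86    32*   59  '\n",
"\n",
"# split() with no arguments drops leading/trailing blanks and splits on\n",
"# any run of whitespace.\n",
"columns = line.split()\n",
"\n",
"# replace() returns a new string; the original is left unchanged.\n",
"min_temp = columns[2].replace('*', '')\n",
"\n",
"print(columns)\n",
"print(min_temp)\n",
"```"
]
},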
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Exceptions in Python"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `try ... except` construct:"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Operation is not valid\n"
]
}
],
"source": [
"try:\n",
" a = 1 / 0\n",
"except ZeroDivisionError:\n",
" print(\"Operation is not valid\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### `ValueError`"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"10\n"
]
},
{
"ename": "ValueError",
"evalue": "invalid literal for int() with base 10: 'Hello'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-49-eda733313bc1>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mmy_string\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"Hello\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmy_integer_string\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmy_string\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mValueError\u001b[0m: invalid literal for int() with base 10: 'Hello'"
]
}
],
"source": [
"my_integer_string = \"10\"\n",
"my_string = \"Hello\"\n",
"print(int(my_integer_string))\n",
"print(int(my_string))"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"my_string doesn't represent an integer\n"
]
}
],
"source": [
"try:\n",
" print(int(my_string))\n",
"except ValueError:\n",
" print(\"my_string doesn't represent an integer\")"
]
},
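{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The same construct can turn a failed conversion into a value the caller can test for. A minimal sketch; `to_int_or_none` is a hypothetical helper name, not part of the kata:\n",
"\n",
"```python\n",
"def to_int_or_none(text):\n",
"    \"\"\"Return text as an int, or None when it is not a valid integer.\"\"\"\n",
"    try:\n",
"        return int(text)\n",
"    except ValueError:\n",
"        return None\n",
"\n",
"\n",
"print(to_int_or_none('10'))     # 10\n",
"print(to_int_or_none('Hello'))  # None\n",
"print(to_int_or_none('32*'))    # None\n",
"```\n",
"\n",
"Catching only `ValueError` keeps unrelated errors, such as passing `None`, visible instead of silently swallowing them."
]
},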
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Test benchmarking"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Benchmarking is measuring and metricking software"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### `benchmark` fixture from pytest-benchmark package"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Writing test_benchmark.py\n"
]
}
],
"source": [
"%%writefile test_benchmark.py\n",
"import time\n",
"\n",
"def variable_time_function(seconds=0.001):\n",
" time.sleep(seconds)\n",
" return 123\n",
"\n",
"def test_variable_time_function(benchmark):\n",
" result = benchmark(variable_time_function)\n",
" assert result == 123"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============================================================================================== test session starts ===============================================================================================\n",
"platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
"benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
"rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
"plugins: benchmark-3.0.0\n",
"collected 1 items\n",
"\n",
"test_benchmark.py .\n",
"\n",
"\n",
"\r",
"Computing stats ...\r",
"Computing stats ... group 1/1\r",
"Computing stats ... group 1/1: min\r",
"Computing stats ... group 1/1: min (1/1)\r",
"Computing stats ... group 1/1: min (1/1)\r",
"Computing stats ... group 1/1: max\r",
"Computing stats ... group 1/1: max (1/1)\r",
"Computing stats ... group 1/1: max (1/1)\r",
"Computing stats ... group 1/1: mean\r",
"Computing stats ... group 1/1: mean (1/1)\r",
"Computing stats ... group 1/1: mean (1/1)\r",
"Computing stats ... group 1/1: median\r",
"Computing stats ... group 1/1: median (1/1)\r",
"Computing stats ... group 1/1: median (1/1)\r",
"Computing stats ... group 1/1: iqr\r",
"Computing stats ... group 1/1: iqr (1/1)\r",
"Computing stats ... group 1/1: iqr (1/1)\r",
"Computing stats ... group 1/1: stddev\r",
"Computing stats ... group 1/1: stddev (1/1)\r",
"Computing stats ... group 1/1: stddev (1/1)\r",
"Computing stats ... group 1/1: stddev: outliers\r",
"Computing stats ... group 1/1: stddev: outliers (1/1)\r",
"Computing stats ... group 1/1: stddev: rounds\r",
"Computing stats ... group 1/1: stddev: rounds (1/1)\r",
"Computing stats ... group 1/1: stddev: iterations\r",
"Computing stats ... group 1/1: stddev: iterations (1/1)\r",
"---------------------------------------------- benchmark: 1 tests ---------------------------------------------\n",
"Name (time in ms) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations\n",
"---------------------------------------------------------------------------------------------------------------\n",
"test_variable_time_function 1.0096 2.3916 1.2002 0.1285 1.2021 0.1932 252;5 782 1\n",
"---------------------------------------------------------------------------------------------------------------\n",
"\n",
"(*) Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.\n",
"============================================================================================ 1 passed in 1.97 seconds =============================================================================================\n"
]
}
],
"source": [
"%%bash\n",
"py.test test_benchmark.py"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### `benchmark` fixture doesn't mix well with capsys!!!!"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"%%bash\n",
"rm test_benchmark.py"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Third refactor"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using the `re` module might be overkill for splitting each line into columns, which is all this case requires.\n",
"In this refactor, the py.test benchmark plugin helps to check whether the change actually saves time.\n",
"\n",
"Some pitfalls this change might involve:\n",
"* Some numbers in the columns are marked with an `*`.\n",
"* Watch out for the header and empty lines.\n",
"\n",
"**This should take no more than 5 minutes.**"
]
},
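{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of how `str.replace` plus `str.split` behave on the kinds of lines listed in the pitfalls above. The sample lines and the `parse` helper are hypothetical and made up for illustration; the real `weather.dat` may be laid out differently, and this is not the kata's solution.\n",
"\n",
"```python\n",
"# Hypothetical sample lines, assumed to resemble the data file.\n",
"sample_lines = [\n",
"    '  Dy MxT   MnT   AvT',   # header: text columns\n",
"    '',                        # empty line\n",
"    '   9  86    32*   59',   # data row with a starred number\n",
"]\n",
"\n",
"def parse(line):\n",
"    # Return (day, min_temp), or None when the line is not a data row.\n",
"    columns = line.replace('*', '').split()\n",
"    try:\n",
"        return columns[0], int(columns[2])\n",
"    except (IndexError, ValueError):\n",
"        # IndexError: too few columns (empty line).\n",
"        # ValueError: a column that is not an integer (header).\n",
"        return None\n",
"\n",
"for line in sample_lines:\n",
"    print(parse(line))\n",
"```\n",
"\n",
"Only the last sample line yields a tuple, `('9', 32)`; the header and the empty line both yield `None`."
]
},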
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"```python\n",
"import weather\n",
"\n",
"def test_process_weather(capsys):\n",
" weather.process()\n",
" out, err = capsys.readouterr()\n",
" assert out == \"9 32\\n\"\n",
"\n",
"def test_benchmark_process_weather(benchmark):\n",
" benchmark(weather.process)\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting test_weather.py\n"
]
}
],
"source": [
"%%writefile test_weather.py\n",
"import weather\n",
"\n",
"def test_process_weather(capsys):\n",
" weather.process()\n",
" out, err = capsys.readouterr()\n",
" assert out == \"9 32\\n\"\n",
"\n",
"def test_benchmark_process_weather(benchmark):\n",
" benchmark(weather.process)"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============================================================================================== test session starts ===============================================================================================\n",
"platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
"benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
"rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
"plugins: benchmark-3.0.0\n",
"collected 2 items\n",
"\n",
"test_weather.py ..\n",
"\n",
"\n",
"\r",
"Computing stats ...\r",
"Computing stats ... group 1/1\r",
"Computing stats ... group 1/1: min\r",
"Computing stats ... group 1/1: min (1/1)\r",
"Computing stats ... group 1/1: min (1/1)\r",
"Computing stats ... group 1/1: max\r",
"Computing stats ... group 1/1: max (1/1)\r",
"Computing stats ... group 1/1: max (1/1)\r",
"Computing stats ... group 1/1: mean\r",
"Computing stats ... group 1/1: mean (1/1)\r",
"Computing stats ... group 1/1: mean (1/1)\r",
"Computing stats ... group 1/1: median\r",
"Computing stats ... group 1/1: median (1/1)\r",
"Computing stats ... group 1/1: median (1/1)\r",
"Computing stats ... group 1/1: iqr\r",
"Computing stats ... group 1/1: iqr (1/1)\r",
"Computing stats ... group 1/1: iqr (1/1)\r",
"Computing stats ... group 1/1: stddev\r",
"Computing stats ... group 1/1: stddev (1/1)\r",
"Computing stats ... group 1/1: stddev (1/1)\r",
"Computing stats ... group 1/1: stddev: outliers\r",
"Computing stats ... group 1/1: stddev: outliers (1/1)\r",
"Computing stats ... group 1/1: stddev: rounds\r",
"Computing stats ... group 1/1: stddev: rounds (1/1)\r",
"Computing stats ... group 1/1: stddev: iterations\r",
"Computing stats ... group 1/1: stddev: iterations (1/1)\r",
"---------------------------------------------------- benchmark: 1 tests ---------------------------------------------------\n",
"Name (time in us) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations\n",
"---------------------------------------------------------------------------------------------------------------------------\n",
"test_benchmark_process_weather 98.1162 841.9789 127.9027 47.7033 106.5750 31.2938 312;301 2824 1\n",
"---------------------------------------------------------------------------------------------------------------------------\n",
"\n",
"(*) Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.\n",
"============================================================================================ 2 passed in 1.40 seconds =============================================================================================\n"
]
}
],
"source": [
"%%bash\n",
"py.test test_weather.py"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"```python\n",
"def process():\n",
" day = 0\n",
" temp = 1000\n",
" with open('weather.dat') as file:\n",
" for line in file:\n",
" columns = line.replace(\"*\", \"\").split()\n",
" try:\n",
" if len(columns) > 0 and int(columns[2]) < temp:\n",
" day = columns[0]\n",
" temp = int(columns[2])\n",
" except ValueError:\n",
" pass\n",
" print(\"{} {}\".format(day, temp))\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Some pitfalls\n",
"### Some numbers are marked with an `*`\n",
"### Check that `columns` is not an empty list\n",
"### The match object accessor is replaced by column positions\n",
"### `ValueError` needs to be caught for lines with text columns, such as the header"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting weather.py\n"
]
}
],
"source": [
"%%writefile weather.py\n",
"def process():\n",
" day = 0\n",
" temp = 1000\n",
" with open('weather.dat') as file:\n",
" for line in file:\n",
" columns = line.replace(\"*\", \"\").split()\n",
" try:\n",
" if len(columns) > 0 and int(columns[2]) < temp:\n",
" day = columns[0]\n",
" temp = int(columns[2])\n",
" except ValueError:\n",
" pass\n",
" print(\"{} {}\".format(day, temp))"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============================================================================================== test session starts ===============================================================================================\n",
"platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
"benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
"rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
"plugins: benchmark-3.0.0\n",
"collected 2 items\n",
"\n",
"test_weather.py ..\n",
"\n",
"\n",
"\r",
"Computing stats ...\r",
"Computing stats ... group 1/1\r",
"Computing stats ... group 1/1: min\r",
"Computing stats ... group 1/1: min (1/1)\r",
"Computing stats ... group 1/1: min (1/1)\r",
"Computing stats ... group 1/1: max\r",
"Computing stats ... group 1/1: max (1/1)\r",
"Computing stats ... group 1/1: max (1/1)\r",
"Computing stats ... group 1/1: mean\r",
"Computing stats ... group 1/1: mean (1/1)\r",
"Computing stats ... group 1/1: mean (1/1)\r",
"Computing stats ... group 1/1: median\r",
"Computing stats ... group 1/1: median (1/1)\r",
"Computing stats ... group 1/1: median (1/1)\r",
"Computing stats ... group 1/1: iqr\r",
"Computing stats ... group 1/1: iqr (1/1)\r",
"Computing stats ... group 1/1: iqr (1/1)\r",
"Computing stats ... group 1/1: stddev\r",
"Computing stats ... group 1/1: stddev (1/1)\r",
"Computing stats ... group 1/1: stddev (1/1)\r",
"Computing stats ... group 1/1: stddev: outliers\r",
"Computing stats ... group 1/1: stddev: outliers (1/1)\r",
"Computing stats ... group 1/1: stddev: rounds\r",
"Computing stats ... group 1/1: stddev: rounds (1/1)\r",
"Computing stats ... group 1/1: stddev: iterations\r",
"Computing stats ... group 1/1: stddev: iterations (1/1)\r",
"---------------------------------------------------- benchmark: 1 tests ---------------------------------------------------\n",
"Name (time in us) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations\n",
"---------------------------------------------------------------------------------------------------------------------------\n",
"test_benchmark_process_weather 98.2578 590.2671 126.3224 38.2697 111.1568 29.9662 444;350 4184 1\n",
"---------------------------------------------------------------------------------------------------------------------------\n",
"\n",
"(*) Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.\n",
"============================================================================================ 2 passed in 1.58 seconds =============================================================================================\n"
]
}
],
"source": [
"%%bash\n",
"py.test test_weather.py"
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 0
}