ifosch/1. Weather Solved - Slides.ipynb

## 1. Weather Solved - Slides.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Basic TDD, Python, and py.test introduction dojo - Weather Kata"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## About this session\n",
    "\n",
    "### For beginners\n",
    "### Probably not the best approach\n",
    "### Show up and practice TDD techniques, basic Python and py.test\n",
    "### Avoid racing ahead: there is no prize!\n",
    "### Omit lines starting with `%%`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "data": {
      "application/javascript": [
       "\n",
       "IPython.keyboard_manager.command_shortcuts.add_shortcut('9', {\n",
       "    help: 'Clear all output',               // This text will show up on the help page (CTRL-M h or ESC h)\n",
       "    handler: function (event) {             // Function that gets invoked\n",
       "        if (IPython.notebook.mode == 'command') {\n",
       "            IPython.notebook.clear_all_output();\n",
       "            return false;\n",
       "        }\n",
       "        return true;\n",
       "    }\n",
       "});"
      ],
      "text/plain": [
       "<IPython.core.display.Javascript object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%%javascript\n",
    "\n",
    "IPython.keyboard_manager.command_shortcuts.add_shortcut('9', {\n",
    "    help: 'Clear all output',               // This text will show up on the help page (CTRL-M h or ESC h)\n",
    "    handler: function (event) {             // Function that gets invoked\n",
    "        if (IPython.notebook.mode == 'command') {\n",
    "            IPython.notebook.clear_all_output();\n",
    "            return false;\n",
    "        }\n",
    "        return true;\n",
    "    }\n",
    "});"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "rm: *.py*: No such file or directory\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "rm *.py*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Software testing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Technique to ensure software meets requirements and expectations\n",
    "### Bug reproducibility and future detection\n",
    "### Part of the documentation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Tests must not solve the requirement\n",
    "```python\n",
    "def addition(number1, number2):\n",
    "    return number1 + number2\n",
    "\n",
    "def bad_test_addition():\n",
    "    assert addition(2, 3) == 2 + 3  # Bad test: it repeats the implementation it should be checking\n",
    "\n",
    "def test_addition():\n",
    "    assert addition(2, 3) == 5\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Fixtures\n",
    "### Flaky tests\n",
    "### Slow tests\n",
    "### Readability\n",
    "### Avoid calculations as much as possible\n",
    "### AAA Pattern: Arrange, Act, Assert"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Test Driven Development"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Tests before code\n",
    "### Useful for documentation\n",
    "### Ensures tests exist\n",
    "### Helps reduce complexity\n",
    "### Red, Green, Refactor cycle"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Dojos"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Coding session with a challenge using techniques or tools\n",
    "### The problem is trivial; again, there is no prize!\n",
    "### Many different kinds, focuses, and constraints"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Have fun\n",
    "### Learn something new\n",
    "### Practice something different\n",
    "### Share knowledge"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Time boxed iterations\n",
    "### Discussion\n",
    "### Baby steps\n",
    "### Pair programming and Red, Green, Refactor"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Final retrospective\n",
    "### Kata"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Weather data kata"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The goal of this exercise is to write a program that reads the `weather.dat` file and prints the day number and the minimum temperature for the day with the lowest minimum temperature in the month covered by the file.\n",
    "The program should work like this:\n",
    "\n",
    "    python weather.py\n",
    "    9 32"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `weather.dat` file contains tabular, space-separated weather measurements for one month at one location.\n",
    "The file has a header line, followed by an empty line, one line of data per day of the month, and a final line with the monthly mean values for some of the columns.\n",
    "Each data line contains the day of the month in the first column and the minimum temperature for that day in the third column."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "The contents look like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# %load weather.dat\n",
    "  Dy MxT   MnT   AvT   HDDay  AvDP 1HrP TPcpn WxType PDir AvSp Dir MxS SkyC MxR MnR AvSLP\n",
    "\n",
    "   1  88    59    74          53.8       0.00 F       280  9.6 270  17  1.6  93 23 1004.5\n",
    "   2  79    63    71          46.5       0.00         330  8.7 340  23  3.3  70 28 1004.5\n",
    "   3  77    55    66          39.6       0.00         350  5.0 350   9  2.8  59 24 1016.8\n",
    "   4  77    59    68          51.1       0.00         110  9.1 130  12  8.6  62 40 1021.1\n",
    "   5  90    66    78          68.3       0.00 TFH     220  8.3 260  12  6.9  84 55 1014.4\n",
    "   6  81    61    71          63.7       0.00 RFH     030  6.2 030  13  9.7  93 60 1012.7\n",
    "   7  73    57    65          53.0       0.00 RF      050  9.5 050  17  5.3  90 48 1021.8\n",
    "   8  75    54    65          50.0       0.00 FH      160  4.2 150  10  2.6  93 41 1026.3\n",
    "   9  86    32*   59       6  61.5       0.00         240  7.6 220  12  6.0  78 46 1018.6\n",
    "  10  84    64    74          57.5       0.00 F       210  6.6 050   9  3.4  84 40 1019.0\n",
    "  11  91    59    75          66.3       0.00 H       250  7.1 230  12  2.5  93 45 1012.6\n",
    "  12  88    73    81          68.7       0.00 RTH     250  8.1 270  21  7.9  94 51 1007.0\n",
    "  13  70    59    65          55.0       0.00 H       150  3.0 150   8 10.0  83 59 1012.6\n",
    "  14  61    59    60       5  55.9       0.00 RF      060  6.7 080   9 10.0  93 87 1008.6\n",
    "  15  64    55    60       5  54.9       0.00 F       040  4.3 200   7  9.6  96 70 1006.1\n",
    "  16  79    59    69          56.7       0.00 F       250  7.6 240  21  7.8  87 44 1007.0\n",
    "  17  81    57    69          51.7       0.00 T       260  9.1 270  29* 5.2  90 34 1012.5\n",
    "  18  82    52    67          52.6       0.00         230  4.0 190  12  5.0  93 34 1021.3\n",
    "  19  81    61    71          58.9       0.00 H       250  5.2 230  12  5.3  87 44 1028.5\n",
    "  20  84    57    71          58.9       0.00 FH      150  6.3 160  13  3.6  90 43 1032.5\n",
    "  21  86    59    73          57.7       0.00 F       240  6.1 250  12  1.0  87 35 1030.7\n",
    "  22  90    64    77          61.1       0.00 H       250  6.4 230   9  0.2  78 38 1026.4\n",
    "  23  90    68    79          63.1       0.00 H       240  8.3 230  12  0.2  68 42 1021.3\n",
    "  24  90    77    84          67.5       0.00 H       350  8.5 010  14  6.9  74 48 1018.2\n",
    "  25  90    72    81          61.3       0.00         190  4.9 230   9  5.6  81 29 1019.6\n",
    "  26  97*   64    81          70.4       0.00 H       050  5.1 200  12  4.0 107 45 1014.9\n",
    "  27  91    72    82          69.7       0.00 RTH     250 12.1 230  17  7.1  90 47 1009.0\n",
    "  28  84    68    76          65.6       0.00 RTFH    280  7.6 340  16  7.0 100 51 1011.0\n",
    "  29  88    66    77          59.7       0.00         040  5.4 020   9  5.3  84 33 1020.6\n",
    "  30  90    45    68          63.6       0.00 H       240  6.0 220  17  4.8 200 41 1022.7\n",
    "  mo  82.9  60.5  71.7    16  58.8       0.00              6.9          5.3\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Introduction to py.test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Powerful, idiomatic unit testing library for Python\n",
    "### Not included in standard library\n",
    "### Uses `assert` statement\n",
    "### Can run tests written for other unit testing libraries\n",
    "### Goodies and plugins"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Writing test_demo.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile test_demo.py\n",
    "def multiply(number1, number2):\n",
    "    return number1 + number2\n",
    "\n",
    "def test_multiply():\n",
    "    result = multiply(5,6)\n",
    "    assert result == 30"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=============================================================================================== test session starts ===============================================================================================\n",
      "platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
      "benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
      "rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
      "plugins: benchmark-3.0.0\n",
      "collected 1 items\n",
      "\n",
      "test_demo.py F\n",
      "\n",
      "==================================================================================================== FAILURES =====================================================================================================\n",
      "__________________________________________________________________________________________________ test_multiply __________________________________________________________________________________________________\n",
      "\n",
      "    def test_multiply():\n",
      "        result = multiply(5,6)\n",
      ">       assert result == 30\n",
      "E       assert 11 == 30\n",
      "\n",
      "test_demo.py:6: AssertionError\n",
      "============================================================================================ 1 failed in 0.01 seconds =============================================================================================\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "py.test ."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "In this output, py.test shows that it collected the file `test_demo.py` and ran the function `test_multiply`, since its name starts with `test_`. It prints an `F` next to the file name, indicating that the file contained a single test function, which failed.\n",
    "It also shows the failing assertion, with the value obtained in the Act part of the test (`11`) and the expected value (`30`).\n",
    "It also reports an `AssertionError`, meaning the `assert` statement did not hold.\n",
    "If the test run had failed for any other reason, e.g. any other kind of error or exception, that would be reported here as well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting test_demo.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile test_demo.py\n",
    "def multiply(number1, number2):\n",
    "    return number1 * number2\n",
    "\n",
    "def test_multiply():\n",
    "    result = multiply(5,6)\n",
    "    assert result == 30"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=============================================================================================== test session starts ===============================================================================================\n",
      "platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
      "benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
      "rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
      "plugins: benchmark-3.0.0\n",
      "collected 1 items\n",
      "\n",
      "test_demo.py .\n",
      "\n",
      "============================================================================================ 1 passed in 0.01 seconds =============================================================================================\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "py.test ."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "%%bash\n",
    "rm test_demo.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Bootstrapping the solution with py.test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "```python\n",
    "import weather\n",
    "\n",
    "def test_process_weather():\n",
    "    weather.process()\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Writing test_weather.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile test_weather.py\n",
    "import weather\n",
    "\n",
    "def test_process_weather():\n",
    "    weather.process()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=============================================================================================== test session starts ===============================================================================================\n",
      "platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
      "benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
      "rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
      "plugins: benchmark-3.0.0\n",
      "collected 0 items / 1 errors\n",
      "\n",
      "===================================================================================================== ERRORS ======================================================================================================\n",
      "________________________________________________________________________________________ ERROR collecting test_weather.py _________________________________________________________________________________________\n",
      "test_weather.py:1: in <module>\n",
      "    import weather\n",
      "E   ImportError: No module named 'weather'\n",
      "============================================================================================= 1 error in 0.01 seconds =============================================================================================\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "py.test test_weather.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "```python\n",
    "def process():\n",
    "    pass\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Writing weather.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile weather.py\n",
    "def process():\n",
    "    pass"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=============================================================================================== test session starts ===============================================================================================\n",
      "platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
      "benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
      "rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
      "plugins: benchmark-3.0.0\n",
      "collected 1 items\n",
      "\n",
      "test_weather.py .\n",
      "\n",
      "============================================================================================ 1 passed in 0.01 seconds =============================================================================================\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "py.test test_weather.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Files with Python (I)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `open(filename[, mode[, buffering]])` returns a file object\n",
    "### `file.close()` must be called manually"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```python\n",
    "    myfile = open('myfile.dat')\n",
    "    ...\n",
    "    myfile.close()\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Read all lines in file with `file.readlines()`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```python\n",
    "    myfile = open('myfile.dat')\n",
    "    mylines = myfile.readlines()\n",
    "    myfile.close()\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Printing strings with Python"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello\n"
     ]
    }
   ],
   "source": [
    "print(\"Hello\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Mind the trailing newline that `print` adds"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Use `str.format()` for string formatting"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello pythonist\n",
      "Hello nice pythonist\n"
     ]
    }
   ],
   "source": [
    "name = 'pythonist'\n",
    "print(\"Hello {}\".format(name))\n",
    "adjective = \"nice\"\n",
    "print(\"Hello {} {}\".format(adjective, name))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Capturing output with py.test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `capsys` fixture"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```python\n",
    "    def test_demo(capsys):\n",
    "        print(\"Hello\")\n",
    "        out, err = capsys.readouterr()\n",
    "        assert out == \"Hello\\n\"\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Lists with Python"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[]\n",
      "[1, 2, 3]\n"
     ]
    }
   ],
   "source": [
    "a = []\n",
    "b = [1, 2, 3,]\n",
    "print(a)\n",
    "print(b)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0\n",
      "3\n"
     ]
    }
   ],
   "source": [
    "print(len(a))\n",
    "print(len(b))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1\n",
      "[2]\n",
      "[2, 3]\n",
      "[1, 3]\n",
      "[3, 2, 1]\n"
     ]
    }
   ],
   "source": [
    "print(b[0])\n",
    "print(b[1:2])\n",
    "print(b[1:])\n",
    "print(b[::2])\n",
    "print(b[::-1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Strings with Python (I)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Hello\n",
      "0\n",
      "5\n",
      "Hlo\n"
     ]
    }
   ],
   "source": [
    "a = \"\"\n",
    "b = \"Hello\"\n",
    "print(a)\n",
    "print(b)\n",
    "print(len(a))\n",
    "print(len(b))\n",
    "print(b[::2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello\n"
     ]
    }
   ],
   "source": [
    "if b.startswith(\"H\"):\n",
    "    print(b)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['1', '2', '3']\n",
      "['Hello', 'Bye']\n"
     ]
    }
   ],
   "source": [
    "print(\"1-2-3\".split(\"-\"))\n",
    "print(\"Hello123Bye\".split(\"123\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'1-2-3'"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"-\".join([\"1\", \"2\", \"3\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "This is a line starting with a space\n",
      " line \n",
      "line\n"
     ]
    }
   ],
   "source": [
    "print(\" This is a line starting with a space\\n\".strip())\n",
    "print(\"New line New\".strip(\"New\"))\n",
    "print(\"New line New\".strip(\"New\").strip())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# First iteration"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first iteration focuses on loading the data lines from the file.\n",
    "The chosen approach is simple: read the lines from the file and print them all.\n",
    "\n",
    "**This should take no more than 40 minutes.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "So the first test checks that the function, when called, prints the contents of the file with each line stripped.\n",
    "To check the script's output with py.test, the test uses the `capsys` fixture, which captures standard output and standard error in memory so that the test can inspect them afterwards:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "```python\n",
    "import weather\n",
    "\n",
    "def test_process_weather(capsys):\n",
    "    weather.process()\n",
    "    out, err = capsys.readouterr()\n",
    "    assert out.startswith(\"Dy\")\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting test_weather.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile test_weather.py\n",
    "import weather\n",
    "\n",
    "def test_process_weather(capsys):\n",
    "    weather.process()\n",
    "    out, err = capsys.readouterr()\n",
    "    assert out.startswith(\"Dy\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=============================================================================================== test session starts ===============================================================================================\n",
      "platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
      "benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
      "rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
      "plugins: benchmark-3.0.0\n",
      "collected 1 items\n",
      "\n",
      "test_weather.py F\n",
      "\n",
      "==================================================================================================== FAILURES =====================================================================================================\n",
      "______________________________________________________________________________________________ test_process_weather _______________________________________________________________________________________________\n",
      "\n",
      "capsys = <_pytest.capture.CaptureFixture object at 0x104fca710>\n",
      "\n",
      "    def test_process_weather(capsys):\n",
      "        weather.process()\n",
      "        out, err = capsys.readouterr()\n",
      ">       assert out.startswith(\"Dy\")\n",
      "E       assert <built-in method startswith of str object at 0x1003acab0>('Dy')\n",
      "E        +  where <built-in method startswith of str object at 0x1003acab0> = ''.startswith\n",
      "\n",
      "test_weather.py:6: AssertionError\n",
      "============================================================================================ 1 failed in 0.01 seconds =============================================================================================\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "py.test test_weather.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "```python\n",
    "def process():\n",
    "    print(\"Dy\")\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting weather.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile weather.py\n",
    "def process():\n",
    "    print(\"Dy\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=============================================================================================== test session starts ===============================================================================================\n",
      "platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
      "benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
      "rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
      "plugins: benchmark-3.0.0\n",
      "collected 1 items\n",
      "\n",
      "test_weather.py .\n",
      "\n",
      "============================================================================================ 1 passed in 0.01 seconds =============================================================================================\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "py.test test_weather.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "```python\n",
    "import weather\n",
    "\n",
    "def test_process_weather(capsys):\n",
    "    weather.process()\n",
    "    out, err = capsys.readouterr()\n",
    "    assert out.startswith(\"Dy\")\n",
    "    assert len(out.split(\"\\n\")) > 2\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting test_weather.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile test_weather.py\n",
    "import weather\n",
    "\n",
    "def test_process_weather(capsys):\n",
    "    weather.process()\n",
    "    out, err = capsys.readouterr()\n",
    "    assert out.startswith(\"Dy\")\n",
    "    assert len(out.split(\"\\n\")) > 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=============================================================================================== test session starts ===============================================================================================\n",
      "platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
      "benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
      "rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
      "plugins: benchmark-3.0.0\n",
      "collected 1 items\n",
      "\n",
      "test_weather.py F\n",
      "\n",
      "==================================================================================================== FAILURES =====================================================================================================\n",
      "______________________________________________________________________________________________ test_process_weather _______________________________________________________________________________________________\n",
      "\n",
      "capsys = <_pytest.capture.CaptureFixture object at 0x10597e438>\n",
      "\n",
      "    def test_process_weather(capsys):\n",
      "        weather.process()\n",
      "        out, err = capsys.readouterr()\n",
      "        assert out.startswith(\"Dy\")\n",
      ">       assert len(out.split(\"\\n\")) > 2\n",
      "E       assert 2 > 2\n",
      "E        +  where 2 = len(['Dy', ''])\n",
      "E        +    where ['Dy', ''] = <built-in method split of str object at 0x105977ab0>('\\n')\n",
      "E        +      where <built-in method split of str object at 0x105977ab0> = 'Dy\\n'.split\n",
      "\n",
      "test_weather.py:7: AssertionError\n",
      "============================================================================================ 1 failed in 0.01 seconds =============================================================================================\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "py.test test_weather.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### After some baby steps"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "```python\n",
    "def process():\n",
    "    data_file = open('weather.dat')\n",
    "    print(\"\".join(data_file.readlines()).strip())\n",
    "    data_file.close()\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting weather.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile weather.py\n",
    "def process():\n",
    "    data_file = open('weather.dat')\n",
    "    print(\"\".join(data_file.readlines()).strip())\n",
    "    data_file.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=============================================================================================== test session starts ===============================================================================================\n",
      "platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
      "benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
      "rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
      "plugins: benchmark-3.0.0\n",
      "collected 1 items\n",
      "\n",
      "test_weather.py .\n",
      "\n",
      "============================================================================================ 1 passed in 0.01 seconds =============================================================================================\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "py.test test_weather.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Regular expressions in Python (I)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Regular expressions: a technique for pattern matching and data extraction\n",
    "### Automata\n",
    "### A regular expression is written as a pattern string"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### `[a-z]`\n",
    "### `[a-zA-Z]*`\n",
    "### `[0-9]+ [a-z]*`\n",
    "### `.*`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
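   },
   "source": [
    "### In Python, regular expression patterns are usually written as raw strings, `r\"\"`, as in `r\"[a-z]*\"`, so that backslashes reach the regex engine unchanged.\n",
    "### `re.match(pattern, string)` matches only at the start of the string; `re.search(pattern, string)` looks for a match anywhere in it."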
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "It matches with search!\n"
     ]
    }
   ],
   "source": [
    "import re\n",
    "\n",
    "a = \"this is 1 2 3 4 500\"\n",
    "regexp = r\"[0-9 ]+\"\n",
    "\n",
    "match = re.search(regexp, a)\n",
    "if match:\n",
    "    print(\"It matches with search!\")\n",
    "match = re.match(regexp, a)\n",
    "if match:\n",
    "    print(\"It matches with match!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Second iteration"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this second iteration, the program should be able to skip header and empty lines, printing only data lines.\n",
    "\n",
    "**This should be easily completed in 20 minutes.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "As the last requirement for this iteration, no headers or empty lines should be printed:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "```python\n",
    "import weather\n",
    "\n",
    "def test_process_weather(capsys):\n",
    "    weather.process()\n",
    "    out, err = capsys.readouterr()\n",
    "    assert out.startswith(\"1\")\n",
    "    assert len(out.split(\"\\n\")) > 2\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting test_weather.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile test_weather.py\n",
    "import weather\n",
    "\n",
    "def test_process_weather(capsys):\n",
    "    weather.process()\n",
    "    out, err = capsys.readouterr()\n",
    "    assert out.startswith(\"1\")\n",
    "    assert len(out.split(\"\\n\")) > 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=============================================================================================== test session starts ===============================================================================================\n",
      "platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
      "benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
      "rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
      "plugins: benchmark-3.0.0\n",
      "collected 1 items\n",
      "\n",
      "test_weather.py F\n",
      "\n",
      "==================================================================================================== FAILURES =====================================================================================================\n",
      "______________________________________________________________________________________________ test_process_weather _______________________________________________________________________________________________\n",
      "\n",
      "capsys = <_pytest.capture.CaptureFixture object at 0x105952908>\n",
      "\n",
      "    def test_process_weather(capsys):\n",
      "        weather.process()\n",
      "        out, err = capsys.readouterr()\n",
      ">       assert out.startswith(\"1\")\n",
      "E       assert <built-in method startswith of str object at 0x104890c00>('1')\n",
      "E        +  where <built-in method startswith of str object at 0x104890c00> = 'Dy MxT   MnT   AvT   HDDay  AvDP 1HrP TPcpn WxType PDir AvSp Dir MxS SkyC MxR MnR AvSLP\\n\\n   1  88    59    74      ...    240  6.0 220  17  4.8 200 41 1022.7\\n  mo  82.9  60.5  71.7    16  58.8       0.00              6.9          5.3\\n'.startswith\n",
      "\n",
      "test_weather.py:6: AssertionError\n",
      "============================================================================================ 1 failed in 0.01 seconds =============================================================================================\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "py.test test_weather.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "```python\n",
    "import re\n",
    "\n",
    "def process():\n",
    "    data_file = open('weather.dat')\n",
    "    pattern = r\"[0-9]+.*\"\n",
    "    for line in data_file.readlines():\n",
    "        match = re.match(pattern, line.strip())\n",
    "        if match:\n",
    "            print(line.strip())\n",
    "    data_file.close()\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting weather.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile weather.py\n",
    "import re\n",
    "\n",
    "def process():\n",
    "    data_file = open('weather.dat')\n",
    "    pattern = r\"[0-9]+.*\"\n",
    "    for line in data_file.readlines():\n",
    "        match = re.match(pattern, line.strip())\n",
    "        if match:\n",
    "            print(line.strip())\n",
    "    data_file.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "collapsed": false,
    "scrolled": true,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=============================================================================================== test session starts ===============================================================================================\n",
      "platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
      "benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
      "rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
      "plugins: benchmark-3.0.0\n",
      "collected 1 items\n",
      "\n",
      "test_weather.py .\n",
      "\n",
      "============================================================================================ 1 passed in 0.00 seconds =============================================================================================\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "py.test test_weather.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Regular expressions in Python (II)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "import re"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Grouping: `()`\n",
    "### `(?P<id>any_regex)`\n",
    "### `\\s`"
   ]
  },
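  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "A minimal sketch of how `\\s` and named groups can work together. The sample line and the group names `day`, `max` and `min` are assumptions made for illustration, not the contents of `weather.dat`.\n",
    "\n",
    "```python\n",
    "import re\n",
    "\n",
    "# Hypothetical line: columns separated by a varying number of spaces\n",
    "line = \"   9  86    32    59\"\n",
    "pattern = r\"\\s*(?P<day>[0-9]+)\\s+(?P<max>[0-9]+)\\s+(?P<min>[0-9]+)\"\n",
    "match = re.match(pattern, line)\n",
    "if match:\n",
    "    # Each named group holds the text matched inside its parentheses\n",
    "    print(match.group('day'), match.group('min'))\n",
    "```\n",
    "\n",
    "Here `\\s*` and `\\s+` absorb the whitespace, so the groups capture only the digits."
   ]
  },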
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Guido from Netherlands was born in 1956\n"
     ]
    }
   ],
   "source": [
    "a = \"Name: Guido, Country: Netherlands, Year of birth: 1956\"\n",
    "pattern = r\"Name: (?P<name>[a-zA-Z]+), \"\n",
    "pattern += r\"Country: (?P<country>[a-zA-Z]+), \"\n",
    "pattern += r\"Year of birth: (?P<year>[0-9]+)\"\n",
    "match = re.match(pattern, a)\n",
    "if match:\n",
    "    print(\"{} from {} was born in {}\".format(\n",
    "            match.group('name'),\n",
    "            match.group('country'),\n",
    "            match.group('year')))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Third iteration"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The third iteration should extract the data using grouping in regular expressions, and finish by reducing the number of lines to the correct output, i.e. the day and minimum temperature for the day with the lowest minimum temperature.\n",
    "\n",
    "**This shouldn't take more than 15 minutes.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "```python\n",
    "import weather\n",
    "\n",
    "def test_process_weather(capsys):\n",
    "    weather.process()\n",
    "    out, err = capsys.readouterr()\n",
    "    assert out.startswith(\"1\")\n",
    "    output_lines = out.split(\"\\n\")\n",
    "    assert len(output_lines) > 2\n",
    "    assert output_lines[0] == \"1 59\"\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting test_weather.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile test_weather.py\n",
    "import weather\n",
    "\n",
    "def test_process_weather(capsys):\n",
    "    weather.process()\n",
    "    out, err = capsys.readouterr()\n",
    "    assert out.startswith(\"1\")\n",
    "    output_lines = out.split(\"\\n\")\n",
    "    assert len(output_lines) > 2\n",
    "    assert output_lines[0] == \"1 59\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=============================================================================================== test session starts ===============================================================================================\n",
      "platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
      "benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
      "rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
      "plugins: benchmark-3.0.0\n",
      "collected 1 items\n",
      "\n",
      "test_weather.py F\n",
      "\n",
      "==================================================================================================== FAILURES =====================================================================================================\n",
      "______________________________________________________________________________________________ test_process_weather _______________________________________________________________________________________________\n",
      "\n",
      "capsys = <_pytest.capture.CaptureFixture object at 0x105fec080>\n",
      "\n",
      "    def test_process_weather(capsys):\n",
      "        weather.process()\n",
      "        out, err = capsys.readouterr()\n",
      "        assert out.startswith(\"1\")\n",
      "        output_lines = out.split(\"\\n\")\n",
      "        assert len(output_lines) > 2\n",
      ">       assert output_lines[0] == \"1 59\"\n",
      "E       assert '1  88    59 ... 93 23 1004.5' == '1 59'\n",
      "E         - 1  88    59    74          53.8       0.00 F       280  9.6 270  17  1.6  93 23 1004.5\n",
      "E         + 1 59\n",
      "\n",
      "test_weather.py:9: AssertionError\n",
      "============================================================================================ 1 failed in 0.02 seconds =============================================================================================\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "py.test test_weather.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "```python\n",
    "import re\n",
    "\n",
    "def process():\n",
    "    data_file = open('weather.dat')\n",
    "    pattern = r\"(?P<day>[0-9]+)\\s+[0-9]+\\s+(?P<min>[0-9]+).*\"\n",
    "    for line in data_file.readlines():\n",
    "        match = re.match(pattern, line.strip())\n",
    "        if match:\n",
    "            print(\"{} {}\".format(match.group('day'), match.group('min')))\n",
    "    data_file.close()\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting weather.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile weather.py\n",
    "import re\n",
    "\n",
    "def process():\n",
    "    data_file = open('weather.dat')\n",
    "    pattern = r\"(?P<day>[0-9]+)\\s+[0-9]+\\s+(?P<min>[0-9]+).*\"\n",
    "    for line in data_file.readlines():\n",
    "        match = re.match(pattern, line.strip())\n",
    "        if match:\n",
    "            print(\"{} {}\".format(match.group('day'), match.group('min')))\n",
    "    data_file.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=============================================================================================== test session starts ===============================================================================================\n",
      "platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
      "benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
      "rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
      "plugins: benchmark-3.0.0\n",
      "collected 1 items\n",
      "\n",
      "test_weather.py .\n",
      "\n",
      "============================================================================================ 1 passed in 0.01 seconds =============================================================================================\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "py.test test_weather.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "```python\n",
    "import weather\n",
    "\n",
    "def test_process_weather(capsys):\n",
    "    weather.process()\n",
    "    out, err = capsys.readouterr()\n",
    "    assert out == \"9 32\\n\"\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting test_weather.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile test_weather.py\n",
    "import weather\n",
    "\n",
    "def test_process_weather(capsys):\n",
    "    weather.process()\n",
    "    out, err = capsys.readouterr()\n",
    "    assert out == \"9 32\\n\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=============================================================================================== test session starts ===============================================================================================\n",
      "platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
      "benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
      "rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
      "plugins: benchmark-3.0.0\n",
      "collected 1 items\n",
      "\n",
      "test_weather.py F\n",
      "\n",
      "==================================================================================================== FAILURES =====================================================================================================\n",
      "______________________________________________________________________________________________ test_process_weather _______________________________________________________________________________________________\n",
      "\n",
      "capsys = <_pytest.capture.CaptureFixture object at 0x10578a940>\n",
      "\n",
      "    def test_process_weather(capsys):\n",
      "        weather.process()\n",
      "        out, err = capsys.readouterr()\n",
      ">       assert out == \"9 32\\n\"\n",
      "E       assert '1 59\\n2 63\\n...9 66\\n30 45\\n' == '9 32\\n'\n",
      "E         - 1 59\n",
      "E         - 2 63\n",
      "E         - 3 55\n",
      "E         - 4 59\n",
      "E         - 5 66\n",
      "E         - 6 61\n",
      "E         - 7 57\n",
      "E         - 8 54\n",
      "E           9 32\n",
      "E         - 10 64\n",
      "E         - 11 59\n",
      "E         - 12 73\n",
      "E         - 13 59\n",
      "E         - 14 59\n",
      "E         - 15 55\n",
      "E         - 16 59\n",
      "E         - 17 57\n",
      "E         - 18 52\n",
      "E         - 19 61\n",
      "E         - 20 57\n",
      "E         - 21 59\n",
      "E         - 22 64\n",
      "E         - 23 68\n",
      "E         - 24 77\n",
      "E         - 25 72\n",
      "E         - 27 72\n",
      "E         - 28 68\n",
      "E         - 29 66\n",
      "E         - 30 45\n",
      "\n",
      "test_weather.py:6: AssertionError\n",
      "============================================================================================ 1 failed in 0.01 seconds =============================================================================================\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "py.test test_weather.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "```python\n",
    "import re\n",
    "\n",
    "def process():\n",
    "    data_file = open('weather.dat')\n",
    "    pattern = r\"(?P<day>[0-9]+)\\s+[0-9]+\\s+(?P<min>[0-9]+).*\"\n",
    "    day = 0\n",
    "    temp = 1000\n",
    "    for line in data_file.readlines():\n",
    "        match = re.match(pattern, line.strip())\n",
    "        if match:\n",
    "            if int(match.group('min')) < temp:\n",
    "                day = match.group('day')\n",
    "                temp = int(match.group('min'))\n",
    "    print(\"{} {}\".format(day, temp))\n",
    "    data_file.close()\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting weather.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile weather.py\n",
    "import re\n",
    "\n",
    "def process():\n",
    "    data_file = open('weather.dat')\n",
    "    pattern = r\"(?P<day>[0-9]+)\\s+[0-9]+\\s+(?P<min>[0-9]+).*\"\n",
    "    day = 0\n",
    "    temp = 1000\n",
    "    for line in data_file.readlines():\n",
    "        match = re.match(pattern, line.strip())\n",
    "        if match:\n",
    "            if int(match.group('min')) < temp:\n",
    "                day = match.group('day')\n",
    "                temp = int(match.group('min'))\n",
    "    print(\"{} {}\".format(day, temp))\n",
    "    data_file.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=============================================================================================== test session starts ===============================================================================================\n",
      "platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
      "benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
      "rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
      "plugins: benchmark-3.0.0\n",
      "collected 1 items\n",
      "\n",
      "test_weather.py .\n",
      "\n",
      "============================================================================================ 1 passed in 0.01 seconds =============================================================================================\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "py.test test_weather.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Files with Python (II)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### A better way to open and close files\n",
    "```python\n",
    "    with open('myfile') as data_file:\n",
    "        ...\n",
    "```"
   ]
  },
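  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "A small sketch of what `with` provides: the file is closed when the block ends, without an explicit `close()`. The scratch file and its contents are assumptions, used only so the example does not depend on `weather.dat`.\n",
    "\n",
    "```python\n",
    "import os\n",
    "import tempfile\n",
    "\n",
    "# Hypothetical scratch file created just for this sketch\n",
    "path = os.path.join(tempfile.mkdtemp(), 'myfile')\n",
    "with open(path, 'w') as data_file:\n",
    "    data_file.write(\"Dy MxT MnT\\n\")\n",
    "\n",
    "with open(path) as data_file:\n",
    "    first_line = data_file.readline()\n",
    "\n",
    "# Outside the block the file has already been closed\n",
    "print(data_file.closed)\n",
    "print(first_line.strip())\n",
    "```\n",
    "\n",
    "The file is also closed if an exception is raised inside the block, which a manual `close()` at the end of the function would miss."
   ]
  },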
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# First refactor"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The code from the last iteration should be easy to modify so that it opens and closes the file in a more idiomatic and reliable way, without causing the tests to fail.\n",
    "\n",
    "**This should be accomplished in 10 minutes.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "```python\n",
    "import re\n",
    "\n",
    "def process():\n",
    "    pattern = r\"(?P<day>[0-9]+)\\s+[0-9]+\\s+(?P<min>[0-9]+).*\"\n",
    "    day = 0\n",
    "    temp = 1000\n",
    "    with open('weather.dat') as data_file:\n",
    "        for line in data_file.readlines():\n",
    "            match = re.match(pattern, line.strip())\n",
    "            if match:\n",
    "                if int(match.group('min')) < temp:\n",
    "                    day = match.group('day')\n",
    "                    temp = int(match.group('min'))\n",
    "    print(\"{} {}\".format(day, temp))\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting weather.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile weather.py\n",
    "import re\n",
    "\n",
    "def process():\n",
    "    pattern = r\"(?P<day>[0-9]+)\\s+[0-9]+\\s+(?P<min>[0-9]+).*\"\n",
    "    day = 0\n",
    "    temp = 1000\n",
    "    with open('weather.dat') as data_file:\n",
    "        for line in data_file.readlines():\n",
    "            match = re.match(pattern, line.strip())\n",
    "            if match:\n",
    "                if int(match.group('min')) < temp:\n",
    "                    day = match.group('day')\n",
    "                    temp = int(match.group('min'))\n",
    "    print(\"{} {}\".format(day, temp))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=============================================================================================== test session starts ===============================================================================================\n",
      "platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
      "benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
      "rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
      "plugins: benchmark-3.0.0\n",
      "collected 1 items\n",
      "\n",
      "test_weather.py .\n",
      "\n",
      "============================================================================================ 1 passed in 0.01 seconds =============================================================================================\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "py.test test_weather.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Files with Python (III)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Use the file object to iterate over lines, instead of `readlines()`\n",
    "```python\n",
    "    for line in data_file:\n",
    "        ...\n",
    "```"
   ]
  },
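  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the difference, assuming an in-memory `io.StringIO` object with invented sample lines as a stand-in for `weather.dat`. `readlines()` builds a list of every line up front, while looping over the file object yields one line at a time.\n",
    "\n",
    "```python\n",
    "import io\n",
    "\n",
    "# Hypothetical stand-in for open('weather.dat'); the sample data is invented.\n",
    "data_file = io.StringIO(\"1 88 59\\n2 79 63\\n3 77 55\\n\")\n",
    "\n",
    "# readlines() returns a list that holds every line at once.\n",
    "all_lines = data_file.readlines()\n",
    "print(len(all_lines))\n",
    "\n",
    "# Rewind, then iterate over the file object itself, one line per step.\n",
    "data_file.seek(0)\n",
    "count = 0\n",
    "for line in data_file:\n",
    "    count += 1\n",
    "print(count)\n",
    "```\n",
    "\n",
    "Both loops see the same lines; only the second avoids keeping them all in memory."
   ]
  },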
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Second refactor"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Refactor how the lines are read: iterate over the file object directly instead of calling `readlines()`, so the whole file is not held in memory at once.\n",
    "\n",
    "**This should not take more than 10 minutes.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "```python\n",
    "import re\n",
    "\n",
    "def process():\n",
    "    pattern = r\"(?P<day>[0-9]+)\\s+[0-9]+\\s+(?P<min>[0-9]+).*\"\n",
    "    day = 0\n",
    "    temp = 1000\n",
    "    with open('weather.dat') as file:\n",
    "        for line in file:\n",
    "            match = re.match(pattern, line.strip())\n",
    "            if match:\n",
    "                if int(match.group('min')) < temp:\n",
    "                    day = match.group('day')\n",
    "                    temp = int(match.group('min'))\n",
    "    print(\"{} {}\".format(day, temp))\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting weather.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile weather.py\n",
    "import re\n",
    "\n",
    "def process():\n",
    "    pattern = r\"(?P<day>[0-9]+)\\s+[0-9]+\\s+(?P<min>[0-9]+).*\"\n",
    "    day = 0\n",
    "    temp = 1000\n",
    "    with open('weather.dat') as file:\n",
    "        for line in file:\n",
    "            match = re.match(pattern, line.strip())\n",
    "            if match:\n",
    "                if int(match.group('min')) < temp:\n",
    "                    day = match.group('day')\n",
    "                    temp = int(match.group('min'))\n",
    "    print(\"{} {}\".format(day, temp))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=============================================================================================== test session starts ===============================================================================================\n",
      "platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
      "benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
      "rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
      "plugins: benchmark-3.0.0\n",
      "collected 1 items\n",
      "\n",
      "test_weather.py .\n",
      "\n",
      "============================================================================================ 1 passed in 0.01 seconds =============================================================================================\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "py.test test_weather.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Strings with Python (II)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Replace strings'"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"REplace strings---\".replace(\"E\", \"e\").replace(\"-\", \"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Exceptions in Python"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `try ... except` construct:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Operation is not valid\n"
     ]
    }
   ],
   "source": [
    "try:\n",
    "    a = 1 / 0\n",
    "except ZeroDivisionError:\n",
    "    print(\"Operation is not valid\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### `ValueError`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "10\n"
     ]
    },
    {
     "ename": "ValueError",
     "evalue": "invalid literal for int() with base 10: 'Hello'",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mValueError\u001b[0m                                Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-49-eda733313bc1>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m      2\u001b[0m \u001b[0mmy_string\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"Hello\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      3\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmy_integer_string\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmy_string\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mValueError\u001b[0m: invalid literal for int() with base 10: 'Hello'"
     ]
    }
   ],
   "source": [
    "my_integer_string = \"10\"\n",
    "my_string = \"Hello\"\n",
    "print(int(my_integer_string))\n",
    "print(int(my_string))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "my_string doesn't represent an integer\n"
     ]
    }
   ],
   "source": [
    "try:\n",
    "    print(int(my_string))\n",
    "except ValueError:\n",
    "    print(\"my_string doesn't represent an integer\")"
   ]
  },
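  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same pattern can be wrapped in a small helper. This is only a sketch: `to_int` is a hypothetical name, not part of the kata, and returning `None` for bad input is one possible design choice.\n",
    "\n",
    "```python\n",
    "def to_int(text):\n",
    "    # Return the integer value of text, or None if it is not an integer literal.\n",
    "    try:\n",
    "        return int(text)\n",
    "    except ValueError:\n",
    "        return None\n",
    "\n",
    "print(to_int(\"10\"))\n",
    "print(to_int(\"Hello\"))\n",
    "print(to_int(\"60.5\"))  # int() also rejects strings holding decimals\n",
    "```"
   ]
  },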
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Test benchmarking"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Benchmarking means measuring software performance and collecting metrics"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### `benchmark` fixture from pytest-benchmark package"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Writing test_benchmark.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile test_benchmark.py\n",
    "import time\n",
    "\n",
    "def variable_time_function(seconds=0.001):\n",
    "    time.sleep(seconds)\n",
    "    return 123\n",
    "\n",
    "def test_variable_time_function(benchmark):\n",
    "    result = benchmark(variable_time_function)\n",
    "    assert result == 123"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=============================================================================================== test session starts ===============================================================================================\n",
      "platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
      "benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
      "rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
      "plugins: benchmark-3.0.0\n",
      "collected 1 items\n",
      "\n",
      "test_benchmark.py .\n",
      "\n",
      "\n",
      "---------------------------------------------- benchmark: 1 tests ---------------------------------------------\n",
      "Name (time in ms)                  Min     Max    Mean  StdDev  Median     IQR  Outliers(*)  Rounds  Iterations\n",
      "---------------------------------------------------------------------------------------------------------------\n",
      "test_variable_time_function     1.0096  2.3916  1.2002  0.1285  1.2021  0.1932        252;5     782           1\n",
      "---------------------------------------------------------------------------------------------------------------\n",
      "\n",
      "(*) Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.\n",
      "============================================================================================ 1 passed in 1.97 seconds =============================================================================================\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "py.test test_benchmark.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### The `benchmark` fixture doesn't mix well with `capsys`!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {
    "collapsed": true,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "%%bash\n",
    "rm test_benchmark.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Third refactor"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `re` module might be overkill for splitting each line into the columns needed here.\n",
    "During this refactor, the py.test benchmark plugin can help reveal any time savings.\n",
    "\n",
    "Some pitfalls this change might involve:\n",
    "* Some numbers in the columns are marked with an `*`.\n",
    "* Watch out for the header and empty lines.\n",
    "\n",
    "**This should not take more than 5 minutes.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "```python\n",
    "import weather\n",
    "\n",
    "def test_process_weather(capsys):\n",
    "    weather.process()\n",
    "    out, err = capsys.readouterr()\n",
    "    assert out == \"9 32\\n\"\n",
    "\n",
    "def test_benchmark_process_weather(benchmark):\n",
    "    benchmark(weather.process)\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting test_weather.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile test_weather.py\n",
    "import weather\n",
    "\n",
    "def test_process_weather(capsys):\n",
    "    weather.process()\n",
    "    out, err = capsys.readouterr()\n",
    "    assert out == \"9 32\\n\"\n",
    "\n",
    "def test_benchmark_process_weather(benchmark):\n",
    "    benchmark(weather.process)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=============================================================================================== test session starts ===============================================================================================\n",
      "platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
      "benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
      "rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
      "plugins: benchmark-3.0.0\n",
      "collected 2 items\n",
      "\n",
      "test_weather.py ..\n",
      "\n",
      "\n",
      "---------------------------------------------------- benchmark: 1 tests ---------------------------------------------------\n",
      "Name (time in us)                      Min       Max      Mean   StdDev    Median      IQR  Outliers(*)  Rounds  Iterations\n",
      "---------------------------------------------------------------------------------------------------------------------------\n",
      "test_benchmark_process_weather     98.1162  841.9789  127.9027  47.7033  106.5750  31.2938      312;301    2824           1\n",
      "---------------------------------------------------------------------------------------------------------------------------\n",
      "\n",
      "(*) Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.\n",
      "============================================================================================ 2 passed in 1.40 seconds =============================================================================================\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "py.test test_weather.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "```python\n",
    "def process():\n",
    "    day = 0\n",
    "    temp = 1000\n",
    "    with open('weather.dat') as file:\n",
    "        for line in file:\n",
    "            columns = line.replace(\"*\", \"\").split()\n",
    "            try:\n",
    "                if len(columns) > 0 and int(columns[2]) < temp:\n",
    "                    day = columns[0]\n",
    "                    temp = int(columns[2])\n",
    "            except ValueError:\n",
    "                pass\n",
    "    print(\"{} {}\".format(day, temp))\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Some pitfalls\n",
    "### Some numbers are marked with an `*`\n",
    "### Check that `columns` is not an empty list\n",
    "### Named groups from the match object are replaced by column positions\n",
    "### `ValueError` must be caught for lines with non-numeric columns, such as the header"
   ]
  },
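  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The pitfalls above can be tried in isolation. This is a sketch under assumptions: `parse_line` is a hypothetical helper, and the sample lines are invented to resemble the layout of `weather.dat`. It checks for at least three columns, which is slightly stricter than the solution shown earlier.\n",
    "\n",
    "```python\n",
    "def parse_line(line):\n",
    "    # Return (day, min_temp) for a data line, or None for any other line.\n",
    "    columns = line.replace(\"*\", \"\").split()\n",
    "    if len(columns) < 3:  # empty or too-short line\n",
    "        return None\n",
    "    try:\n",
    "        return columns[0], int(columns[2])\n",
    "    except ValueError:  # header or other non-numeric line\n",
    "        return None\n",
    "\n",
    "print(parse_line(\"   9  86    32*   59\"))  # data line with a marked number\n",
    "print(parse_line(\"  Dy MxT   MnT   AvT\"))  # header line\n",
    "print(parse_line(\"\\n\"))                     # empty line\n",
    "```"
   ]
  },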
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting weather.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile weather.py\n",
    "def process():\n",
    "    day = 0\n",
    "    temp = 1000\n",
    "    with open('weather.dat') as file:\n",
    "        for line in file:\n",
    "            columns = line.replace(\"*\", \"\").split()\n",
    "            try:\n",
    "                if len(columns) > 0 and int(columns[2]) < temp:\n",
    "                    day = columns[0]\n",
    "                    temp = int(columns[2])\n",
    "            except ValueError:\n",
    "                pass\n",
    "    print(\"{} {}\".format(day, temp))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=============================================================================================== test session starts ===============================================================================================\n",
      "platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1\n",
      "benchmark: 3.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=5.00us max_time=1.00s calibration_precision=10 warmup=False warmup_iterations=100000)\n",
      "rootdir: /Users/ifosch/src/github.com/BCNDojos/pyDojos/factor-out, inifile: \n",
      "plugins: benchmark-3.0.0\n",
      "collected 2 items\n",
      "\n",
      "test_weather.py ..\n",
      "\n",
      "\n",
      "---------------------------------------------------- benchmark: 1 tests ---------------------------------------------------\n",
      "Name (time in us)                      Min       Max      Mean   StdDev    Median      IQR  Outliers(*)  Rounds  Iterations\n",
      "---------------------------------------------------------------------------------------------------------------------------\n",
      "test_benchmark_process_weather     98.2578  590.2671  126.3224  38.2697  111.1568  29.9662      444;350    4184           1\n",
      "---------------------------------------------------------------------------------------------------------------------------\n",
      "\n",
      "(*) Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.\n",
      "============================================================================================ 2 passed in 1.58 seconds =============================================================================================\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "py.test test_weather.py"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}