pronojitsaha/Avazu CTR.ipynb

## Avazu CTR.ipynb
{
 "metadata": {
  "name": "",
  "signature": "sha256:95a42d89f795a9a268747330dd89410b1a831b1c6bd7ffad3c46bb9361be57eb"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "[AVAZU CTR PREDICTION](https://www.kaggle.com/c/avazu-ctr-prediction)"
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "1. Background"
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "The Problem"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Predict whether a mobile ad will be clicked.\n",
      "\n",
      "In online advertising, click-through rate (CTR) is a very important metric for evaluating ad performance. As a result, click prediction systems are essential and widely used for sponsored search and real-time bidding.\n",
      "\n",
      "For this competition, we are provided 11 days worth of Avazu data to build and test prediction models. The challenge was to find a strategy that beats standard classification algorithms."
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "The Data"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The data was provided in two files as follows:  \n",
      "1. train - Training set. 10 days of click-through data, ordered chronologically. Non-clicks and clicks are subsampled according to different strategies.\n",
      "\n",
      "2. test - Test set. 1 day of ads to for testing your model predictions. \n",
      "\n",
      "3. sampleSubmission.csv - Sample submission file in the correct format, corresponds to the All-0.5 Benchmark.\n",
      "\n",
      "The train & test file contained the following **data fields/features**:  \n",
      "id: ad identifier  \n",
      "click: 0/1 for non-click/click  \n",
      "hour: format is YYMMDDHH, so 14091123 means 23:00 on Sept. 11, 2014 UTC.  \n",
      "C1 -- anonymized categorical variable  \n",
      "banner_pos  \n",
      "site_id  \n",
      "site_domain  \n",
      "site_category  \n",
      "app_id  \n",
      "app_domain  \n",
      "app_category  \n",
      "device_id  \n",
      "device_ip  \n",
      "device_model  \n",
      "device_type  \n",
      "device_conn_type  \n",
      "C14-C21 -- anonymized categorical variables  "
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Evaluation Metric"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Submissions are evaluated using the [Logarithmic Loss](https://www.kaggle.com/wiki/LogarithmicLoss) (smaller is better)."
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "2. Data Exploration"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "A simple unix commmand of ```wc -l``` tells us that there are around 40 million rows in the train data file. "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!wc -l data/train"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        " 40428968 data/train\r\n"
       ]
      }
     ],
     "prompt_number": 4
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Lets now examine the first few rows and get an idea of the structure of the file. "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!head -n 5 data/train"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "id,click,hour,C1,banner_pos,site_id,site_domain,site_category,app_id,app_domain,app_category,device_id,device_ip,device_model,device_type,device_conn_type,C14,C15,C16,C17,C18,C19,C20,C21\r",
        "\r\n",
        "1000009418151094273,0,14102100,1005,0,1fbe01fe,f3845767,28905ebd,ecad2386,7801e8d9,07d7df22,a99f214a,ddd2926e,44956a24,1,2,15706,320,50,1722,0,35,-1,79\r",
        "\r\n",
        "10000169349117863715,0,14102100,1005,0,1fbe01fe,f3845767,28905ebd,ecad2386,7801e8d9,07d7df22,a99f214a,96809ac8,711ee120,1,0,15704,320,50,1722,0,35,100084,79\r",
        "\r\n",
        "10000371904215119486,0,14102100,1005,0,1fbe01fe,f3845767,28905ebd,ecad2386,7801e8d9,07d7df22,a99f214a,b3cf8def,8a4875bd,1,0,15704,320,50,1722,0,35,100084,79\r",
        "\r\n",
        "10000640724480838376,0,14102100,1005,0,1fbe01fe,f3845767,28905ebd,ecad2386,7801e8d9,07d7df22,a99f214a,e8275b8f,6332421a,1,0,15706,320,50,1722,0,35,100084,79\r",
        "\r\n"
       ]
      }
     ],
     "prompt_number": 5
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As we can see, the data is completely annonymized and values for most of the columns are enrypted too. \n",
      "\n",
      "Lets now find out the same details about the test set. As we can see the test set has around 4 million rows. "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!wc -l data/test"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        " 4577465 data/test\r\n"
       ]
      }
     ],
     "prompt_number": 1
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!head -n 5 data/test"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "id,hour,C1,banner_pos,site_id,site_domain,site_category,app_id,app_domain,app_category,device_id,device_ip,device_model,device_type,device_conn_type,C14,C15,C16,C17,C18,C19,C20,C21\r",
        "\r\n",
        "10000174058809263569,14103100,1005,0,235ba823,f6ebf28e,f028772b,ecad2386,7801e8d9,07d7df22,a99f214a,69f45779,0eb711ec,1,0,8330,320,50,761,3,175,100075,23\r",
        "\r\n",
        "10000182526920855428,14103100,1005,0,1fbe01fe,f3845767,28905ebd,ecad2386,7801e8d9,07d7df22,a99f214a,e8d44657,ecb851b2,1,0,22676,320,50,2616,0,35,100083,51\r",
        "\r\n",
        "10000554139829213984,14103100,1005,0,1fbe01fe,f3845767,28905ebd,ecad2386,7801e8d9,07d7df22,a99f214a,10fb085b,1f0bc64f,1,0,22676,320,50,2616,0,35,100083,51\r",
        "\r\n",
        "10001094637809798845,14103100,1005,0,85f751fd,c4e18dd6,50e219e0,51cedd4e,aefc06bd,0f2161f8,a99f214a,422d257a,542422a7,1,0,18648,320,50,1092,3,809,100156,61\r",
        "\r\n"
       ]
      }
     ],
     "prompt_number": 2
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We now find out the percentage of ads that were clicked in the training data set using the following command. "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!cat data/train | awk -F ',' '$2==\"1\"{sum+=$2;} END { printf \"%.2f\\n\", sum }'"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "6865066.00\r\n"
       ]
      }
     ],
     "prompt_number": 3
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Thus the percentage of clicks is (6865066/40428967)= 17% (approx.). The data is highly skewed as it has only 17% click rate. We may want to negate the effect of this later on during our analysis. "
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "3. Data Analysis"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The problem at hand is one of classification. Now we can employ logistic regression, naive bayes, or support vector machines. However, we need to remember that this is big data and as such one particular tool that really does well in classification of big data is the recetly developed online learning tool named [Vowpal Wabbit](http://hunch.net/~vw).\n",
      "\n",
      "Vowpal Wabbit or VW is a machine learning algorithm developed by [John Langford](http://research.microsoft.com/en-us/people/jcl/). VW is the essence of speed in machine learning, able to learn from terafeature datasets with ease. Via parallel learning, it can exceed the throughput of any single machine network interface when doing linear learning, a first amongst learning algorithms. For a deeper understanding of large scale online machine learning watch this fine [video tutorial](http://techtalks.tv/talks/online-linear-learning-part-1/57924/) with John Langford. To install and get started follow this [tutorial](https://github.com/JohnLangford/vowpal_wabbit/wiki/Tutorial).\n",
      "\n",
      "As mentioned, it is particulary suited for terafeature datasets as its [hashing trick](http://en.wikipedia.org/wiki/Feature_hashing#Feature_vectorization_using_the_hashing_trick) reduces the feature space to a number of bits, greatly reducing the computing time."
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "3.1 Data Conversion"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As with any exercise with VW, we first convert the data into a format as required by vowpal wabbit. Basically VW requires each row of the dataset in the follwoing format:\n",
      "    \n",
      "    100 |n var1:20 var2:20 var3:2 |c 1.0 2.0 0.0 1.0\n",
      "    \n",
      "The first number 100 is the dependent variable, the one we want to predict (in our case the ridership value). |n defines a namespace which denotes the beginning of the numerical features of our dataset. So this example dataset has 3 numerical features, both the name of the feature and value are required. It is then followed by |c which indicates the categorical features of the dataset. Here only the values are required.\n",
      "\n",
      "We can generate this format from our raw dataset by writing a function 'csv_to_vw' as follows."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from datetime import datetime\n",
      "from csv import DictReader\n",
      "import math\n",
      "\n",
      "def csv_to_vw(loc_csv, loc_output, train=True):\n",
      "    \"\"\"\n",
      "    Munges a CSV file (loc_csv) to a VW file (loc_output). Set \"train\"\n",
      "    to False when munging a test set.\n",
      "    \"\"\"\n",
      "    start = datetime.now()\n",
      "    print(\"\\nTurning %s into %s. Is_train_set? %s\"%(loc_csv,loc_output,train))\n",
      "\n",
      "    with open(loc_output,\"wb\") as outfile:\n",
      "        for e, row in enumerate( DictReader(open(loc_csv)) ):\n",
      "\n",
      "          #Creating the features\n",
      "            numerical_features = \"\"\n",
      "            categorical_features = \"\"\n",
      "            for k,v in row.items():\n",
      "                if k == 'hour':\n",
      "                    new_date= datetime(int(\"20\"+v[0:2]),int(v[2:4]),int(v[4:6]))\n",
      "                    hour= v[6:8]\n",
      "                    sinHour = math.sin(2*math.pi*int(hour)/23)\n",
      "                    cosHour = math.cos(2*math.pi*int(hour)/23)\n",
      "                    day = new_date.strftime(\"%w\")\n",
      "                    if day not in [0,6]:\n",
      "                        weekend = 0\n",
      "                    else:\n",
      "                        weekend = 1\n",
      "                elif k not in [\"id\",\"click\",\"hour\"]:\n",
      "                    if len(str(v)) > 0:\n",
      "                        categorical_features += \"%s \" % v\n",
      "            categorical_features += \" |hr %s\" % hour\n",
      "            categorical_features += \" |day %s\" % day\n",
      "            categorical_features += \" |sinhr %s\" % sinHour\n",
      "            categorical_features += \" |coshr %s\" % cosHour\n",
      "            categorical_features += \" |weekend %s\" % weekend\n",
      "\n",
      "          #Creating the labels\t\t  \n",
      "            if train: #we care about labels\n",
      "                if row['click'] == \"1\":\n",
      "                  label = 1\n",
      "                else:\n",
      "                  label = -1 #we set negative label to -1\n",
      "                outfile.write( \"%s '%s |c %s\\n\" % (label,row['id'],categorical_features))\n",
      "\n",
      "            else: #we dont care about labels\n",
      "                outfile.write( \"1 '%s |c %s\\n\" % (row['id'],categorical_features))\n",
      "\n",
      "          #Reporting progress\n",
      "            if e % 100000 == 0:\n",
      "                print(\"%s\\t%s\"%(e, str(datetime.now() - start)))\n",
      "\n",
      "    print(\"\\n %s Task execution time:\\n\\t%s\"%(e, str(datetime.now() - start)))\n",
      "\n",
      "csv_to_vw(\"data/train\", \"train_ful.vw\", train=True)\n",
      "csv_to_vw(\"data/test\", \"test.vw\", train=False)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "In the above conversion code we have grouped all the existing categorical features in the data into a single namespace '|c'. We had even experimented with creating separate namespaces for each categorical variables, however that did not give better results in our analysis (explained later) for some reason. "
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "3.2 Feature Engineering"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As one would notice, we have also done some feature engineering within the conversion code itself. Primarily we have extracted the hour and day data from the date feature which should be one of the strongest determinants of the CTR. We have also added a weekend variable which is 1 if the day was a weekend else 0. The hypothesis is that the CTR would be different on a weekend than a weekday, with the former seeing higher CTR rates.  \n",
      "\n",
      "Much of the observed structure to the data is periodic in nature (hourly). With that in mind we construct some features to reflect this a priori symmetry in our data. For variables with inherent symmetry present (hours:24) we transformed to polar coordinates via a (sin,cos) pair of variables to seek solutions of given periodicity.\n",
      "\n",
      "Each of the above newly created features was given a separate namespace of their own. We save this function to a file named csv2vw.py."
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Truly Big Data"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This data set is truly Big in terms of number of records. As we saw earlier, there are in total close to 40 million rows and the size of the train data file is more than 1GB. Such high cardinality data will require a substantial amount of time to analyse if done with a standard python implementation. \n",
      "\n",
      "Enter pypy! [PyPy](http://pypy.org/) is a fast, compliant alternative implementation of the Python language (2.7.8 and 3.2.5). It has several advantages and distinct features, the most important being significant speed improvements and  less memory usage. One can install pypy by simply typing the following at the command line:     \n",
      "```\n",
      "pip install pypy\n",
      "```\n",
      "After installing you need to add the line ```'export PATH=$PATH:/Users/your user name/pypy-2.4/bin'``` to your .bash_profile file (on mac) to call the command from anywhere. \n",
      "\n",
      "We then call our function csv2vw.py with ```pypy``` to speed up the processing. "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": true,
     "input": [
      "!pypy csv2vw.py"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\r\n",
        "Turning data/train into train_full.vw. Is_train_set? True\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0\t0:00:00.047804\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "100000\t0:00:03.125274\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "200000\t0:00:05.414076\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "300000\t0:00:08.025433\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "400000\t0:00:10.525358\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "500000\t0:00:12.765795\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "600000\t0:00:15.077024\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "700000\t0:00:17.251730\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "800000\t0:00:19.431892\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "900000\t0:00:21.726936\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1000000\t0:00:24.056752\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1100000\t0:00:26.388141\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1200000\t0:00:28.863936\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1300000\t0:00:31.373522\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1400000\t0:00:33.753087\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1500000\t0:00:36.001352\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1600000\t0:00:38.616340\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1700000\t0:00:40.691579\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1800000\t0:00:42.745740\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1900000\t0:00:44.822003\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "2000000\t0:00:46.998355\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "2100000\t0:00:49.067117\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "2200000\t0:00:51.157181\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "2300000\t0:00:53.253797\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "2400000\t0:00:55.457481\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "2500000\t0:00:57.537969\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "2600000\t0:00:59.718608\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "2700000\t0:01:01.924935\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "2800000\t0:01:04.281188\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "2900000\t0:01:06.639845\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "3000000\t0:01:09.176489\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "3100000\t0:01:12.498500\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "3200000\t0:01:15.050708\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "3300000\t0:01:17.990987\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "3400000\t0:01:22.997781\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "3500000\t0:01:25.711598\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "3600000\t0:01:28.331162\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "3700000\t0:01:30.560905\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "3800000\t0:01:32.831209\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "3900000\t0:01:35.274403\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "4000000\t0:01:41.019958\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "4100000\t0:01:43.629878\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "4200000\t0:01:45.981372\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "4300000\t0:01:48.290610\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "4400000\t0:01:50.974897\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "4500000\t0:01:53.139334\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "4600000\t0:01:55.269406\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "4700000\t0:01:57.911296\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "4800000\t0:02:00.230052\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "4900000\t0:02:02.473404\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "5000000\t0:02:04.839360\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "5100000\t0:02:06.988160\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "5200000\t0:02:09.876421\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "5300000\t0:02:12.170417\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "5400000\t0:02:14.841060\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "5500000\t0:02:17.359877\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "5600000\t0:02:19.517009\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "5700000\t0:02:21.762051\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "5800000\t0:02:23.825424\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "5900000\t0:02:26.054605\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "6000000\t0:02:28.178697\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "6100000\t0:02:30.327634\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "6200000\t0:02:32.406893\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "6300000\t0:02:34.476356\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "6400000\t0:02:36.573601\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "6500000\t0:02:39.035409\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "6600000\t0:02:41.096041\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "6700000\t0:02:43.149530\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "6800000\t0:02:45.204246\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "6900000\t0:02:47.296942\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "7000000\t0:02:49.440160\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "7100000\t0:02:51.631061\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "7200000\t0:02:53.879069\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "7300000\t0:02:56.323656\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "7400000\t0:02:58.598483\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "7500000\t0:03:00.853745\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "7600000\t0:03:02.962542\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "7700000\t0:03:05.311579\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "7800000\t0:03:08.046281\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "7900000\t0:03:10.171453\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "8000000\t0:03:12.299833\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "8100000\t0:03:14.468313\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "8200000\t0:03:16.663510\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "8300000\t0:03:18.774629\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "8400000\t0:03:21.038572\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "8500000\t0:03:23.140971\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "8600000\t0:03:25.254869\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "8700000\t0:03:27.382594\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "8800000\t0:03:29.543040\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "8900000\t0:03:31.668171\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "9000000\t0:03:33.750707\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "9100000\t0:03:35.814012\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "9200000\t0:03:38.205865\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "9300000\t0:03:40.295267\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "9400000\t0:03:42.520617\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "9500000\t0:03:44.700667\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "9600000\t0:03:46.789793\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "9700000\t0:03:49.058558\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "9800000\t0:03:51.142138\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "9900000\t0:03:53.320217\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "10000000\t0:03:55.456283\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "10100000\t0:03:57.658168\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "10200000\t0:03:59.684915\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "10300000\t0:04:02.034920\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "10400000\t0:04:05.218467\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "10500000\t0:04:10.986226\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "10600000\t0:04:14.079148\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "10700000\t0:04:16.790339\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "10800000\t0:04:18.908982\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "10900000\t0:04:21.020127\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "11000000\t0:04:23.230544\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "11100000\t0:04:25.359970\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "11200000\t0:04:27.471399\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "11300000\t0:04:29.554756\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "11400000\t0:04:31.670109\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "11500000\t0:04:33.769131\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "11600000\t0:04:36.174343\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "11700000\t0:04:38.445656\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "11800000\t0:04:40.532657\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "11900000\t0:04:42.601505\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "12000000\t0:04:44.720613\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "12100000\t0:04:46.873163\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "12200000\t0:04:48.995944\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "12300000\t0:04:51.163283\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "12400000\t0:04:53.280281\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "12500000\t0:04:55.339392\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "12600000\t0:04:57.434778\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "12700000\t0:04:59.585351\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "12800000\t0:05:01.782611\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "12900000\t0:05:03.862962\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "13000000\t0:05:06.038647\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "13100000\t0:05:08.382670\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "13200000\t0:05:10.547036\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "13300000\t0:05:12.818244\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "13400000\t0:05:14.994002\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "13500000\t0:05:17.074602\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "13600000\t0:05:19.162240\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "13700000\t0:05:21.210896\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "13800000\t0:05:23.471346\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "13900000\t0:05:25.555641\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "14000000\t0:05:27.659380\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "14100000\t0:05:29.739968\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "14200000\t0:05:31.748673\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "14300000\t0:05:33.873070\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "14400000\t0:05:36.278676\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "14500000\t0:05:38.662028\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "14600000\t0:05:40.752361\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "14700000\t0:05:42.883450\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "14800000\t0:05:45.004206\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "14900000\t0:05:47.068354\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "15000000\t0:05:49.274582\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "15100000\t0:05:51.396624\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "15200000\t0:05:53.622257\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "15300000\t0:05:55.699988\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "15400000\t0:05:57.799065\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "15500000\t0:05:59.987981\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "15600000\t0:06:02.163151\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "15700000\t0:06:04.231077\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "15800000\t0:06:06.470009\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "15900000\t0:06:08.926133\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "16000000\t0:06:11.165976\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "16100000\t0:06:13.316600\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "16200000\t0:06:15.408495\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "16300000\t0:06:17.636273\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "16400000\t0:06:19.846414\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "16500000\t0:06:21.995364\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "16600000\t0:06:24.111667\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "16700000\t0:06:26.198068\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "16800000\t0:06:28.376070\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "16900000\t0:06:30.472536\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "17000000\t0:06:32.637075\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "17100000\t0:06:34.999189\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "17200000\t0:06:37.729723\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "17300000\t0:06:40.758164\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "17400000\t0:06:43.169266\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "17500000\t0:06:45.348597\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "17600000\t0:06:47.484247\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "17700000\t0:06:49.639634\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "17800000\t0:06:51.913760\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "17900000\t0:06:54.071131\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "18000000\t0:06:56.270082\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "18100000\t0:06:58.845183\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "18200000\t0:07:01.480167\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "18300000\t0:07:03.593919\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "18400000\t0:07:05.841567\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "18500000\t0:07:08.361925\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "18600000\t0:07:10.448564\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "18700000\t0:07:12.579954\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "18800000\t0:07:14.744024\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "18900000\t0:07:16.977304\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "19000000\t0:07:19.169949\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "19100000\t0:07:21.591992\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "19200000\t0:07:23.728840\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "19300000\t0:07:25.878847\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "19400000\t0:07:28.002798\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "19500000\t0:07:30.293610\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "19600000\t0:07:32.365683\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "19700000\t0:07:34.479270\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "19800000\t0:07:36.592220\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "19900000\t0:07:39.081007\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "20000000\t0:07:41.247057\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "20100000\t0:07:44.109167\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "20200000\t0:07:46.959910\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "20300000\t0:07:49.443149\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "20400000\t0:07:51.901375\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "20500000\t0:07:54.449487\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "20600000\t0:07:56.619092\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "20700000\t0:07:58.728153\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "20800000\t0:08:01.068127\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "20900000\t0:08:04.557182\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "21000000\t0:08:10.240958\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "21100000\t0:08:14.173843\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "21200000\t0:08:16.467218\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "21300000\t0:08:18.643082\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "21400000\t0:08:20.993410\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "21500000\t0:08:23.204275\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "21600000\t0:08:25.892382\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "21700000\t0:08:28.153642\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "21800000\t0:08:30.378103\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "21900000\t0:08:32.675462\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "22000000\t0:08:35.094677\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "22100000\t0:08:37.220887\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "22200000\t0:08:39.701447\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "22300000\t0:08:41.825569\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "22400000\t0:08:43.962336\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "22500000\t0:08:46.184523\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "22600000\t0:08:48.385941\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "22700000\t0:08:50.748595\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "22800000\t0:08:52.948933\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "22900000\t0:08:55.210258\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "23000000\t0:08:57.682410\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "23100000\t0:09:00.421876\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "23200000\t0:09:02.584910\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "23300000\t0:09:04.789942\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "23400000\t0:09:06.979292\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "23500000\t0:09:09.388666\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "23600000\t0:09:11.440115\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "23700000\t0:09:13.581463\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "23800000\t0:09:15.803057\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "23900000\t0:09:17.986015\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "24000000\t0:09:20.113843\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "24100000\t0:09:22.284587\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "24200000\t0:09:24.346999\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "24300000\t0:09:26.502958\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "24400000\t0:09:28.734019\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "24500000\t0:09:30.892801\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "24600000\t0:09:32.933431\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "24700000\t0:09:35.089428\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "24800000\t0:09:37.269586\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "24900000\t0:09:39.622227\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "25000000\t0:09:41.805330\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "25100000\t0:09:43.945053\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "25200000\t0:09:46.102319\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "25300000\t0:09:48.200972\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "25400000\t0:09:50.315932\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "25500000\t0:09:52.456558\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "25600000\t0:09:54.574408\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "25700000\t0:09:57.867183\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "25800000\t0:10:00.347779\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "25900000\t0:10:02.677836\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "26000000\t0:10:04.970916\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "26100000\t0:10:07.476955\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "26200000\t0:10:09.751777\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "26300000\t0:10:12.509111\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "26400000\t0:10:15.137179\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "26500000\t0:10:17.482610\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "26600000\t0:10:19.709332\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "26700000\t0:10:21.955176\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "26800000\t0:10:24.239558\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "26900000\t0:10:26.518958\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "27000000\t0:10:28.834057\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "27100000\t0:10:31.195673\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "27200000\t0:10:33.478614\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "27300000\t0:10:35.818797\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "27400000\t0:10:38.473092\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "27500000\t0:10:40.969430\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "27600000\t0:10:43.386918\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "27700000\t0:10:45.699665\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "27800000\t0:10:48.002200\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "27900000\t0:10:50.243782\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "28000000\t0:10:52.489478\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "28100000\t0:10:54.737367\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "28200000\t0:10:57.011131\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "28300000\t0:10:59.239614\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "28400000\t0:11:01.491052\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "28500000\t0:11:03.744568\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "28600000\t0:11:06.030529\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "28700000\t0:11:08.464698\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "28800000\t0:11:10.965787\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "28900000\t0:11:13.306159\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "29000000\t0:11:15.550609\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "29100000\t0:11:17.888817\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "29200000\t0:11:20.153300\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "29300000\t0:11:22.452839\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "29400000\t0:11:24.743960\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "29500000\t0:11:26.987991\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "29600000\t0:11:29.191326\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "29700000\t0:11:31.448352\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "29800000\t0:11:33.730769\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "29900000\t0:11:36.294338\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "30000000\t0:11:39.416061\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "30100000\t0:11:42.023527\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "30200000\t0:11:44.368512\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "30300000\t0:11:46.792241\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "30400000\t0:11:49.053756\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "30500000\t0:11:51.285333\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "30600000\t0:11:53.681092\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "30700000\t0:11:56.011474\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "30800000\t0:11:58.439315\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "30900000\t0:12:00.679832\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "31000000\t0:12:02.958133\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "31100000\t0:12:05.285705\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "31200000\t0:12:08.030956\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "31300000\t0:12:10.367000\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "31400000\t0:12:12.966223\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "31500000\t0:12:15.217474\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "31600000\t0:12:17.702077\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "31700000\t0:12:19.955318\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "31800000\t0:12:22.237921\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "31900000\t0:12:24.605581\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "32000000\t0:12:27.235702\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "32100000\t0:12:29.649983\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "32200000\t0:12:32.138495\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "32300000\t0:12:34.551207\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "32400000\t0:12:36.862203\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "32500000\t0:12:39.277660\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "32600000\t0:12:42.548220\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "32700000\t0:12:46.698360\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "32800000\t0:12:49.674839\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "32900000\t0:12:51.819920\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "33000000\t0:12:53.944133\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "33100000\t0:12:56.143414\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "33200000\t0:12:58.639184\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "33300000\t0:13:01.118545\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "33400000\t0:13:03.396758\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "33500000\t0:13:05.725015\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "33600000\t0:13:08.223470\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "33700000\t0:13:10.523782\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "33800000\t0:13:12.792982\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "33900000\t0:13:15.048618\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "34000000\t0:13:17.264889\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "34100000\t0:13:19.690845\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "34200000\t0:13:21.944722\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "34300000\t0:13:24.272920\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "34400000\t0:13:26.505669\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "34500000\t0:13:28.775863\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "34600000\t0:13:31.054234\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "34700000\t0:13:33.294992\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "34800000\t0:13:35.552007\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "34900000\t0:13:37.931610\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "35000000\t0:13:40.179241\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "35100000\t0:13:42.414772\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "35200000\t0:13:44.642122\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "35300000\t0:13:46.889752\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "35400000\t0:13:49.271323\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "35500000\t0:13:51.471765\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "35600000\t0:13:53.738911\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "35700000\t0:13:55.990057\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "35800000\t0:13:58.242554\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "35900000\t0:14:00.554165\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "36000000\t0:14:02.890782\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "36100000\t0:14:05.134838\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "36200000\t0:14:07.681641\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "36300000\t0:14:09.914644\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "36400000\t0:14:11.997960\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "36500000\t0:14:14.090380\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "36600000\t0:14:16.415742\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "36700000\t0:14:18.897523\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "36800000\t0:14:21.099668\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "36900000\t0:14:23.333953\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "37000000\t0:14:25.535937\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "37100000\t0:14:27.706679\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "37200000\t0:14:29.895577\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "37300000\t0:14:32.012866\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "37400000\t0:14:34.134062\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "37500000\t0:14:36.260045\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "37600000\t0:14:38.731040\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "37700000\t0:14:40.892148\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "37800000\t0:14:43.001161\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "37900000\t0:14:45.093239\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "38000000\t0:14:47.208387\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "38100000\t0:14:49.307836\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "38200000\t0:14:51.392479\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "38300000\t0:14:53.560565\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "38400000\t0:14:55.711804\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "38500000\t0:14:57.844363\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "38600000\t0:14:59.930705\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "38700000\t0:15:02.030321\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "38800000\t0:15:04.048183\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "38900000\t0:15:06.122848\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "39000000\t0:15:08.455613\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "39100000\t0:15:10.722343\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "39200000\t0:15:12.890226\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "39300000\t0:15:15.047808\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "39400000\t0:15:17.176741\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "39500000\t0:15:19.430118\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "39600000\t0:15:21.579657\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "39700000\t0:15:23.675786\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "39800000\t0:15:25.807229\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "39900000\t0:15:27.981126\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "40000000\t0:15:30.175297\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "40100000\t0:15:32.267476\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "40200000\t0:15:34.343716\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "40300000\t0:15:36.452628\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "40400000\t0:15:38.886291\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\r\n",
        " 40428966 Task execution time:\r\n",
        "\t0:15:39.800758\r\n",
        "\r\n",
        "Turning data/test into test.vw. Is_train_set? False\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0\t0:00:00.420749\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "100000\t0:00:04.398498\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "200000\t0:00:06.563869\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "300000\t0:00:08.668418\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "400000\t0:00:10.700820\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "500000\t0:00:12.780113\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "600000\t0:00:14.860785\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "700000\t0:00:17.529860\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "800000\t0:00:19.821141\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "900000\t0:00:21.860520\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1000000\t0:00:23.938180\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1100000\t0:00:26.108977\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1200000\t0:00:28.694526\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1300000\t0:00:30.923212\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1400000\t0:00:33.088179\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1500000\t0:00:35.215851\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1600000\t0:00:37.332083\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1700000\t0:00:39.578230\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1800000\t0:00:41.679309\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1900000\t0:00:43.915111\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "2000000\t0:00:45.968854\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "2100000\t0:00:48.003002\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "2200000\t0:00:50.052432\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "2300000\t0:00:52.205396\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "2400000\t0:00:54.192727\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "2500000\t0:00:56.245027\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "2600000\t0:00:58.596864\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "2700000\t0:01:00.725468\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "2800000\t0:01:03.128903\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "2900000\t0:01:05.409398\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "3000000\t0:01:07.975637\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "3100000\t0:01:10.506805\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "3200000\t0:01:12.726931\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "3300000\t0:01:14.791400\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "3400000\t0:01:16.888496\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "3500000\t0:01:19.004787\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "3600000\t0:01:21.060767\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "3700000\t0:01:23.184298\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "3800000\t0:01:25.283394\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "3900000\t0:01:27.391460\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "4000000\t0:01:29.752674\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "4100000\t0:01:31.856220\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "4200000\t0:01:34.229628\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "4300000\t0:01:36.470578\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "4400000\t0:01:38.747250\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "4500000\t0:01:40.836265\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\r\n",
        " 4577463 Task execution time:\r\n",
        "\t0:01:42.397042\r\n"
       ]
      }
     ],
     "prompt_number": 4
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Lets have a look at the train and test files we have created. "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!head -n 1 train_full.vw"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "-1 '1000009418151094273 |c 79 ddd2926e 1fbe01fe ecad2386 35 0 1 1722 320 15706 50 2 1005 07d7df22 28905ebd 7801e8d9 f3845767 0 a99f214a -1 44956a24  |hr 00 |day 2 |sinhr 0.0 |coshr 1.0 |weekend 0\r\n"
       ]
      }
     ],
     "prompt_number": 5
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!head -n 1 test.vw"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1 '10000174058809263569 |c 23 69f45779 235ba823 ecad2386 175 3 1 761 50 320 8330 0 1005 07d7df22 f028772b 7801e8d9 f6ebf28e 0 a99f214a 100075 0eb711ec  |hr 00 |day 5 |sinhr 0.0 |coshr 1.0 |weekend 0\r\n"
       ]
      }
     ],
     "prompt_number": 6
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "A note on cross validation"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "With any machine learning problem, we are served well to split the training data into training set and a validation set so as to get a better estimate of the test set error. However, since we are using VW there is no requirement of the same. This is because VW itself does the cross validation on the training data while learning and reports the cross validation error as 'average loss' in the output. Here average loss computes the [progressive validation loss](http://hunch.net/~jl/projects/prediction_bounds/progressive_validation/coltfinal.pdf). The critical thing to understand here is that progressive validation loss deviates like a test set, and hence is a reliable indicator of success on the first pass over any data-set."
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "3.3 Analysis using VW"
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Picking a loss function"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "One of the most important aspects of using vowpal wabbit successfully is to pick the right loss function over which the algorithm optimizes to learn. Online machine learning with VW learns from samples one at a time. When our model is trained it iterates through the train dataset and optimizes this function.\n",
      "\n",
      "Vowpal Wabbit has five loss functions:\n",
      "1. Squared loss. Useful for regression problems, when minimizing expectation. For example: Expected return on a stock.\n",
      "2. Classic loss. Vanilla squared loss (without the importance weight aware update).\n",
      "3. Quantile loss. Useful for regression problems, for example: predicting house pricing.\n",
      "4. Hinge loss. Useful for classification problems, minimizing the yes/no question (closest 0-1 approximation). For example: Keyword_tag or not.\n",
      "5. Log loss. Useful for classification problems, minimizer = probability, for example: Probability of click on ad.\n",
      "\n",
      "\n",
      "As we are predicting the probability of click on ads, so  Log loss is more suitable for our purpose. Also as stated earlier the competition evaluation is also based on log loss and hence this serves our purpose very well.  "
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Model 1"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We now train a simple vanilla model to generate the initial set of predictions by the following command. "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!vw train_full.vw -f avazu.model.vw --loss_function logistic"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "final_regressor = avazu.model.vw\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Num weight bits = 18\r\n",
        "learning rate = 0.5\r\n",
        "initial_t = 0\r\n",
        "power_t = 0.5\r\n",
        "using no cache\r\n",
        "Reading datafile = train_full.vw\r\n",
        "num sources = 1\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "average    since         example     example  current  current  current\r\n",
        "loss       last          counter      weight    label  predict features\r\n",
        "0.693147   0.693147            1         1.0  -1.0000   0.0000       27\r\n",
        "0.458050   0.222953            2         2.0  -1.0000  -1.3872       27\r\n",
        "0.274633   0.091217            4         4.0  -1.0000  -2.5609       27\r\n",
        "0.180545   0.086458            8         8.0  -1.0000  -2.5604       27\r\n",
        "0.358718   0.536891           16        16.0  -1.0000  -2.7296       27\r\n",
        "0.400848   0.442978           32        32.0  -1.0000  -1.7490       27\r\n",
        "0.530634   0.660419           64        64.0  -1.0000  -0.8129       27\r\n",
        "0.536752   0.542870          128       128.0  -1.0000  -1.4927       27\r\n",
        "0.490036   0.443320          256       256.0   1.0000  -1.8038       27\r\n",
        "0.460721   0.431407          512       512.0  -1.0000  -2.7424       27\r\n",
        "0.435663   0.410605         1024      1024.0  -1.0000  -1.4675       27\r\n",
        "0.431234   0.426805         2048      2048.0  -1.0000  -2.9783       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.433299   0.435365         4096      4096.0  -1.0000  -2.2066       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.425758   0.418216         8192      8192.0  -1.0000  -2.2592       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.416503   0.407248        16384     16384.0  -1.0000  -2.8503       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.417421   0.418338        32768     32768.0   1.0000  -1.4954       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.411354   0.405288        65536     65536.0  -1.0000  -2.8381       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.406810   0.402266       131072    131072.0  -1.0000  -4.1793       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.401592   0.396373       262144    262144.0  -1.0000  -2.7953       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.384311   0.367031       524288    524288.0  -1.0000  -1.9057       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.376373   0.368435      1048576   1048576.0  -1.0000  -1.2119       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.378080   0.379787      2097152   2097152.0  -1.0000  -3.3367       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.399552   0.421024      4194304   4194304.0  -1.0000  -3.6283       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.382552   0.365553      8388608   8388608.0  -1.0000  -0.6560       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.397118   0.411683     16777216  16777216.0  -1.0000  -0.8555       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.395913   0.394709     33554432  33554432.0   1.0000  -0.7646       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\r\n",
        "finished run\r\n",
        "number of examples per pass = 40428967\r\n",
        "passes used = 1\r\n",
        "weighted example sum = 4.0429e+07\r\n",
        "weighted label sum = -2.66988e+07\r\n",
        "average loss = 0.394479\r\n",
        "best constant = -0.660389\r\n",
        "total feature number = 1091582109\r\n"
       ]
      }
     ],
     "prompt_number": 7
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The -f option in the command saves our learned model in avazu.model.vw to be used later for predictions. --loss_function logistic tells the model to use logistic i.e. log loss for learning purpose. \n",
      "\n",
      "The output of the training process is as above. The detailed explanation of various numbers can be found at the VW tutorial link shared in the begining. "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We then predict the test set output with the following command from the terminal. "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!vw test.vw -t -i avazu.model.vw --link logistic -p avazu.preds.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "only testing\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Num weight bits = 18\r\n",
        "learning rate = 10\r\n",
        "initial_t = 1\r\n",
        "power_t = 0.5\r\n",
        "predictions = avazu.preds.txt\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "using no cache\r\n",
        "Reading datafile = test.vw\r\n",
        "num sources = 1\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "average    since         example     example  current  current  current\r\n",
        "loss       last          counter      weight    label  predict features\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "8.694021   8.694021            1         1.0   1.0000   0.1247       27\r\n",
        "8.727754   8.761487            2         2.0   1.0000   0.1235       27\r\n",
        "9.583062   10.438371           4         4.0   1.0000   0.0615       27\r\n",
        "10.341092  11.099122           8         8.0   1.0000   0.0180       27\r\n",
        "9.568561   8.796030           16        16.0   1.0000   0.1548       27\r\n",
        "9.822393   10.076225          32        32.0   1.0000   0.0604       27\r\n",
        "9.959246   10.096099          64        64.0   1.0000   0.1584       27\r\n",
        "10.545743  11.132239         128       128.0   1.0000   0.0723       27\r\n",
        "11.866919  13.188095         256       256.0   1.0000   0.0439       27\r\n",
        "12.152296  12.437672         512       512.0   1.0000   0.1457       27\r\n",
        "12.365687  12.579078        1024      1024.0   1.0000   0.0570       27\r\n",
        "12.336797  12.307907        2048      2048.0   1.0000   0.0174       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "12.388471  12.440146        4096      4096.0   1.0000   0.0488       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "12.471634  12.554796        8192      8192.0   1.0000   0.1106       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "12.458044  12.444454       16384     16384.0   1.0000   0.1175       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "12.532114  12.606185       32768     32768.0   1.0000   0.0135       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "12.491542  12.450970       65536     65536.0   1.0000   0.2037       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "12.514672  12.537803      131072    131072.0   1.0000   0.0645       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "13.046750  13.578828      262144    262144.0   1.0000   0.1653       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "12.012632  10.978514      524288    524288.0   1.0000   0.1626       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "10.884075  9.755517      1048576   1048576.0   1.0000   0.0444       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "9.990137   9.096199      2097152   2097152.0   1.0000   0.0551       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "9.872667   9.755197      4194304   4194304.0   1.0000   0.0991       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\r\n",
        "finished run\r\n",
        "number of examples per pass = 4577464\r\n",
        "passes used = 1\r\n",
        "weighted example sum = 4.57746e+06\r\n",
        "weighted label sum = 4.57746e+06\r\n",
        "average loss = 10.0096\r\n",
        "best constant = 1\r\n",
        "total feature number = 123591528\r\n"
       ]
      }
     ],
     "prompt_number": 8
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The -t says to test only and not train. -i avazu.model.vw says to use the model that we learned from the training process. -p saves our predictions to avazu.preds.txt. --link logistic automatically gives a prediction in the range of 0 to 1 (i.e. does the sigmoid transformation) as required by our submission format for the competition."
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Convert Data to Kaggle submission format"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We now convert our predictions obatined in the avazu.pred.txt file to kaggle submission format with the following code. "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import sys\n",
      "\n",
      "with open(\"submission.csv\",\"wb\") as outfile:\n",
      "\toutfile.write(\"id,click\\n\")\n",
      "\tfor line in open(sys.argv[1]):\n",
      "\t\trow = line.strip().split(\" \")\n",
      "\t\toutfile.write(\"%s,%f\\n\"%(row[1],float(row[0])))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We save the above to a file named vw2kaggle.py and then execute it with ```pypy``` as follows."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!pypy vw2kaggle.py avazu.preds.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 9
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Lets have a look at the submission file to check its content.  "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!head -n 10 submission.csv"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "id,click\r\n",
        "10000174058809263569,0.124710\r\n",
        "10000182526920855428,0.123469\r\n",
        "10000554139829213984,0.161742\r\n",
        "10001094637809798845,0.061488\r\n",
        "10001377041558670745,0.260012\r\n",
        "10001521204153353724,0.194398\r\n",
        "10001911056707023378,0.113664\r\n",
        "10001982898844213216,0.017951\r\n",
        "10002000217531288531,0.050727\r\n"
       ]
      }
     ],
     "prompt_number": 10
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Upon submission this gave a leaderboard score of 0.3991901 which is pretty close to the average loss reported by our learner in the training phase i.e. 0.394479, hence confirming our earlier approximation of using average loss as a guide for validation loss. This submission also put me around the 50% mark of the leaderboard. "
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "3.4 Fine Tuning the Learning Model in VW"
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "3.4.1 Number of bits (-b option)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can see from the training phase of Model 1 that the total number of features used by our model is 970295208. \n",
      "\n",
      "Now VW hashes feature names into a 2^b dimensional space. By default it uses 18 bits, so that\u2019s about 262k possible features. If we have more than that, they will collide, meaning that the software won\u2019t be able to distinguish between some of them. Fortunately we can increase the number of bits used for hashing so that we can get millions of features. Since our total number of features is a nine digit figure (970295208) so the ideal value of b for us would be 28 (2^28 is a nine digit number) so that we avoid feature collisions. We chose b=30 to add a bit of buffer.  "
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "3.4.2 Tuning the learning rate (-l option)"
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Model 2"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "vw-hypersearch is a simple wrapper to vw to help in finding lowest-loss hyper-parameters (argmin).\n",
      "\n",
      "For example: to find the lowest average loss for --l1 (L1-norm regularization) on a train-set called train.dat:\n",
      "    \n",
      "    $ vw-hypersearch  1e-10  5e-4  vw  --l1 %  train.dat\n",
      "vw-hypersearch will train multiple times (but in a efficient way) until it finds the --l1 value resulting in the lowest average training loss.\n",
      "\n",
      "Explanation of the example:\n",
      "the % character is a placeholder for the (argmin) parameter we are looking for.\n",
      "1e-10 is the lower-bound for the search range\n",
      "5e-4 is the upper-bound of the search range\n",
      "The lower & upper bounds are arguments to vw-hypersearch. Anything from vw on, are normal vw arguments exactly as one would use in training. The only change one must apply to the training command is to use % instead of the value of the parameter we trying to optimize on.\n",
      "\n",
      "In order to find the best learning rate for our model which gives the lowest average loss we run the following VW command. It took around 2 hours to run on my MacBook 2012 with 4GB RAM. "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!vw-hypersearch 0.0 1 vw -b 30 --loss_function logistic --learning_rate % train_full.vw"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "trying 0.618033988749895 "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "......."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".........."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "..."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "..."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "....."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        " 0.387706 (best)\r\n",
        "trying 0.381966011250105 "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "......."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".........."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "...."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "......."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        " 0.389596\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "trying 0.76393202250021 "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "......."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "...."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "......."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "........"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        " 0.386971 (best)\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "trying 0.854101966249684 "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "....."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "........."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "........."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        " 0.386623 (best)\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "trying 0.909830056250526 "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "......."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "........."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "........."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        " 0.386441 (best)\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "trying 0.944271909999159 "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "......."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "..........."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "........."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        " 0.38634 (best)\r\n",
        "trying 0.965558146251367 "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "......."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".........."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "......."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        " 0.386281 (best)\r\n",
        "trying 0.978713763747792 "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "......."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".........."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "........."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        " 0.386246 (best)\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "trying 0.986844382503575 "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "......."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "........."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "......"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "..."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        " 0.386225 (best)\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "trying 0.991869381244217 "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "......."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "........."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "......."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        " 0.386213 (best)\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "trying 0.994975001259358 "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "......."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "........."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "....."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        " 0.386205 (best)\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "trying 0.996894379984858 "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "......."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "..........."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "...."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "....."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        " 0.3862 (best)\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "trying 0.9980806212745 "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "......."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".........."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "........."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        " 0.386197 (best)\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "trying 0.998813758710358 "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "......."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "........."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "......."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        " 0.386195 (best)\r\n",
        "trying 0.999266862564143 "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "....."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "......."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "...."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "....."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        " 0.386194 (best)\r\n",
        "trying 0.999546896146215 "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "......."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "........."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "........."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        " 0.386194 (best)\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "vw-hypersearch: loss(0.999547) == loss(0.999267): 0.386194\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "trying 0.999406879355179 "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "......."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "......"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "......."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "........."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        " 0.386194 (best)\r\n",
        "0.999407\t0.386194\r\n"
       ]
      }
     ],
     "prompt_number": 11
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "At the end of it it tells us that 0.999407 gives the lowest average loss. "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We now run our model with the updated value of learning rate as follows."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!vw train_full.vw -b 30 -l 0.999407 -f model-b30-l.vw --loss_function logistic"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "final_regressor = model-b30-l.vw\r\n",
        "Num weight bits = 30\r\n",
        "learning rate = 0.999407\r\n",
        "initial_t = 0\r\n",
        "power_t = 0.5\r\n",
        "using no cache\r\n",
        "Reading datafile = train_full.vw\r\n",
        "num sources = 1\r\n",
        "average    since         example     example  current  current  current\r\n",
        "loss       last          counter      weight    label  predict features\r\n",
        "0.693147   0.693147            1         1.0  -1.0000   0.0000       27\r\n",
        "0.409970   0.126793            2         2.0  -1.0000  -2.0011       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.227020   0.044070            4         4.0  -1.0000  -3.3033       27\r\n",
        "0.138481   0.049943            8         8.0  -1.0000  -3.1471       27\r\n",
        "0.392604   0.646727           16        16.0  -1.0000  -2.8508       27\r\n",
        "0.443221   0.493838           32        32.0  -1.0000  -1.5859       27\r\n",
        "0.569091   0.694960           64        64.0  -1.0000  -0.9992       27\r\n",
        "0.579374   0.589657          128       128.0  -1.0000  -1.6149       27\r\n",
        "0.514951   0.450528          256       256.0   1.0000  -1.8114       27\r\n",
        "0.480296   0.445641          512       512.0  -1.0000  -3.6789       27\r\n",
        "0.453998   0.427700         1024      1024.0  -1.0000  -1.7688       27\r\n",
        "0.447397   0.440796         2048      2048.0  -1.0000  -3.4554       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.444261   0.441125         4096      4096.0  -1.0000  -1.9968       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.434646   0.425031         8192      8192.0  -1.0000  -2.0707       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.422058   0.409470        16384     16384.0  -1.0000  -2.6599       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.420672   0.419287        32768     32768.0   1.0000  -1.5586       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.412430   0.404188        65536     65536.0  -1.0000  -3.2506       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.406065   0.399699       131072    131072.0  -1.0000  -4.6054       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.399577   0.393090       262144    262144.0  -1.0000  -2.7182       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.382006   0.364434       524288    524288.0  -1.0000  -2.0122       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.373160   0.364315      1048576   1048576.0  -1.0000  -1.0706       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.373796   0.374431      2097152   2097152.0  -1.0000  -3.3859       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.394055   0.414315      4194304   4194304.0  -1.0000  -3.7731       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.375953   0.357851      8388608   8388608.0  -1.0000  -0.6535       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.389097   0.402241     16777216  16777216.0  -1.0000  -0.6822       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.387660   0.386223     33554432  33554432.0   1.0000  -0.9213       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\r\n",
        "finished run\r\n",
        "number of examples per pass = "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "40428967\r\n",
        "passes used = 1\r\n",
        "weighted example sum = 4.0429e+07\r\n",
        "weighted label sum = -2.66988e+07\r\n",
        "average loss = 0.386194\r\n",
        "best constant = -0.660389\r\n",
        "total feature number = 1091582109\r\n"
       ]
      }
     ],
     "prompt_number": 4
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Here we get an average loss of 0.386194 which is better than our plain vanilla Model 1 (0.394479) tried out earlier. \n",
      "\n",
      "We can then predict on the test set and generate the submission file using the following two commands.  "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!vw test.vw -t -i model-b30-l.vw --link logistic -p preds-b30-l.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "only testing\r\n",
        "Num weight bits = 30\r\n",
        "learning rate = 10\r\n",
        "initial_t = 1\r\n",
        "power_t = 0.5\r\n",
        "predictions = preds-b30-l.txt\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "using no cache\r\n",
        "Reading datafile = test.vw\r\n",
        "num sources = 1\r\n",
        "average    since         example     example  current  current  current\r\n",
        "loss       last          counter      weight    label  predict features\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "12.273828  12.273828           1         1.0   1.0000   0.0756       27\r\n",
        "11.649576  11.025324           2         2.0   1.0000   0.0894       27\r\n",
        "15.738128  19.826681           4         4.0   1.0000   0.0100       27\r\n",
        "15.530900  15.323672           8         8.0   1.0000   0.0073       27\r\n",
        "14.530898  13.530895          16        16.0   1.0000   0.0688       27\r\n",
        "13.962127  13.393357          32        32.0   1.0000   0.0381       27\r\n",
        "14.330523  14.698918          64        64.0   1.0000   0.0956       27\r\n",
        "14.210387  14.090252         128       128.0   1.0000   0.0352       27\r\n",
        "16.509169  18.807951         256       256.0   1.0000   0.0651       27\r\n",
        "16.354505  16.199841         512       512.0   1.0000   0.1227       27\r\n",
        "16.653517  16.952528        1024      1024.0   1.0000   0.0366       27\r\n",
        "16.548226  16.442935        2048      2048.0   1.0000   0.0103       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "16.662457  16.776689        4096      4096.0   1.0000   0.0439       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "16.723325  16.784193        8192      8192.0   1.0000   0.0546       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "16.675594  16.627863       16384     16384.0   1.0000   0.0684       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "16.739274  16.802954       32768     32768.0   1.0000   0.0116       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "16.690641  16.642009       65536     65536.0   1.0000   0.0864       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "16.875157  17.059673      131072    131072.0   1.0000   0.0402       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "17.628174  18.381192      262144    262144.0   1.0000   0.1118       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "15.692613  13.757051      524288    524288.0   1.0000   0.1239       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "12.597286  9.501959      1048576   1048576.0   1.0000   0.1167       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "10.301880  8.006474      2097152   2097152.0   1.0000   0.0771       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "9.699953   9.098027      4194304   4194304.0   1.0000   0.1192       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\r\n",
        "finished run\r\n",
        "number of examples per pass = 4577464\r\n",
        "passes used = 1\r\n",
        "weighted example sum = 4.57746e+06\r\n",
        "weighted label sum = 4.57746e+06\r\n",
        "average loss = 9.93796\r\n",
        "best constant = 1\r\n",
        "total feature number = 123591528\r\n"
       ]
      }
     ],
     "prompt_number": 5
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!pypy vw2kaggle.py preds-b30-l.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 6
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Interestingly, this submission gave a leaderboard score of 0.4074371 which is worse than the plain vanilla model 1. "
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "3.4.3 Applying Regularization (-l1/l2)"
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Model 3"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As the resultant number of features is very large (of the order of 10^7, as can be seen from the output of learning models), hence we incorporate bit of regularization so that our model does not overfit the training data. We find the best possible value of -l1 using vw -hypersearch as before. "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!vw-hypersearch 0.000000001 1 vw -b 30 --loss_function logistic --l1 % train_full.vw"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "vw-hypersearch: you may get better results with -L (log-space search)\r\n",
        "\t\twhen any of --l1/hinge-loss/small-param-values are used\r\n",
        "trying 0.618033989131861 "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "...................."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "..."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "......."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        " 0.693147 (best)\r\n",
        "trying 0.381966011868139 "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "..................."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "......."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        " 0.693147 (best)\r\n",
        "vw-hypersearch: loss(0.618034) == loss(0.381966): 0.693147\r\n",
        "trying 0.5000000005 "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "...."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "...."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "..........."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        ".."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "..."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "...."
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        " 0.693147 (best)\r\n",
        "0.5\t0.693147\r\n"
       ]
      }
     ],
     "prompt_number": 2
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "For some reason, it gives the same average loss for different values of l1 parameter implying regularization does not add any value to the model. \n",
      "\n",
      "However, in a model with large number of features (like ours) it is always advisable to add a bit of regularization to avoid overfitting, hence we add a very small -l1 value of 0.00000001 to our model. We now run our model with the updated value of -l1 as follows."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!vw train_full.vw -b 30 -l 0.999407 --l1 0.00000001 -f model-b30-l-l1.vw --loss_function logistic  "
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "using l1 regularization = 1e-08\r\n",
        "final_regressor = model-b30-l-l1.vw\r\n",
        "Num weight bits = 30\r\n",
        "learning rate = 0.999407\r\n",
        "initial_t = 0\r\n",
        "power_t = 0.5\r\n",
        "using no cache\r\n",
        "Reading datafile = train_full.vw\r\n",
        "num sources = 1\r\n",
        "average    since         example     example  current  current  current\r\n",
        "loss       last          counter      weight    label  predict features\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.693147   0.693147            1         1.0  -1.0000   0.0000       27\r\n",
        "0.409970   0.126793            2         2.0  -1.0000  -2.0011       27\r\n",
        "0.227020   0.044070            4         4.0  -1.0000  -3.3033       27\r\n",
        "0.138481   0.049943            8         8.0  -1.0000  -3.1471       27\r\n",
        "0.392604   0.646727           16        16.0  -1.0000  -2.8508       27\r\n",
        "0.443221   0.493838           32        32.0  -1.0000  -1.5859       27\r\n",
        "0.569091   0.694960           64        64.0  -1.0000  -0.9992       27\r\n",
        "0.579374   0.589656          128       128.0  -1.0000  -1.6149       27\r\n",
        "0.514951   0.450528          256       256.0   1.0000  -1.8115       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.480296   0.445641          512       512.0  -1.0000  -3.6788       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.453997   0.427698         1024      1024.0  -1.0000  -1.7688       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.447395   0.440792         2048      2048.0  -1.0000  -3.4553       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.444256   0.441117         4096      4096.0  -1.0000  -1.9970       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.434639   0.425023         8192      8192.0  -1.0000  -2.0707       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.422050   0.409461        16384     16384.0  -1.0000  -2.6602       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.420667   0.419285        32768     32768.0   1.0000  -1.5584       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.412426   0.404185        65536     65536.0  -1.0000  -3.2491       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.406063   0.399700       131072    131072.0  -1.0000  -4.6018       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.399582   0.393100       262144    262144.0  -1.0000  -2.7171       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.382007   0.364432       524288    524288.0  -1.0000  -2.0076       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.373174   0.364341      1048576   1048576.0  -1.0000  -1.0797       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.373839   0.374505      2097152   2097152.0  -1.0000  -3.3702       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.394181   0.414523      4194304   4194304.0  -1.0000  -3.7544       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.376235   0.358288      8388608   8388608.0  -1.0000  -0.6966       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.389913   0.403591     16777216  16777216.0  -1.0000  -0.7033       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.389299   0.388685     33554432  33554432.0   1.0000  -0.9818       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\r\n",
        "finished run\r\n",
        "number of examples per pass = "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "40428967\r\n",
        "passes used = 1\r\n",
        "weighted example sum = 4.0429e+07\r\n",
        "weighted label sum = -2.66988e+07\r\n",
        "average loss = 0.388106\r\n",
        "best constant = -0.660389\r\n",
        "total feature number = 1091582109\r\n"
       ]
      }
     ],
     "prompt_number": 8
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We will also try one model with -l2 regularization as below and infact find that l2 regularization works better here than l1."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!vw train_full.vw -b 30 -l 0.999407 --l2 0.00000001 -f model-b30-l-l2.vw --loss_function logistic  "
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "using l2 regularization = 1e-08\r\n",
        "final_regressor = model-b30-l-l2.vw\r\n",
        "Num weight bits = 30\r\n",
        "learning rate = 0.999407\r\n",
        "initial_t = 0\r\n",
        "power_t = 0.5\r\n",
        "using no cache\r\n",
        "Reading datafile = train_full.vw\r\n",
        "num sources = 1\r\n",
        "average    since         example     example  current  current  current\r\n",
        "loss       last          counter      weight    label  predict features\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.693147   0.693147            1         1.0  -1.0000   0.0000       27\r\n",
        "0.409970   0.126793            2         2.0  -1.0000  -2.0011       27\r\n",
        "0.227020   0.044070            4         4.0  -1.0000  -3.3033       27\r\n",
        "0.138481   0.049943            8         8.0  -1.0000  -3.1471       27\r\n",
        "0.392604   0.646727           16        16.0  -1.0000  -2.8508       27\r\n",
        "0.443221   0.493838           32        32.0  -1.0000  -1.5859       27\r\n",
        "0.569091   0.694960           64        64.0  -1.0000  -0.9992       27\r\n",
        "0.579374   0.589657          128       128.0  -1.0000  -1.6149       27\r\n",
        "0.514951   0.450528          256       256.0   1.0000  -1.8115       27\r\n",
        "0.480296   0.445641          512       512.0  -1.0000  -3.6789       27\r\n",
        "0.453998   0.427700         1024      1024.0  -1.0000  -1.7687       27\r\n",
        "0.447397   0.440796         2048      2048.0  -1.0000  -3.4554       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.444260   0.441124         4096      4096.0  -1.0000  -1.9968       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.434645   0.425030         8192      8192.0  -1.0000  -2.0707       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.422057   0.409469        16384     16384.0  -1.0000  -2.6599       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.420672   0.419286        32768     32768.0   1.0000  -1.5586       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.412430   0.404187        65536     65536.0  -1.0000  -3.2504       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.406064   0.399699       131072    131072.0  -1.0000  -4.6042       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.399576   0.393088       262144    262144.0  -1.0000  -2.7179       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.382001   0.364426       524288    524288.0  -1.0000  -2.0115       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.373149   0.364297      1048576   1048576.0  -1.0000  -1.0726       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.373772   0.374396      2097152   2097152.0  -1.0000  -3.3835       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.394002   0.414231      4194304   4194304.0  -1.0000  -3.7604       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.375869   0.357736      8388608   8388608.0  -1.0000  -0.6617       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.388928   0.401987     16777216  16777216.0  -1.0000  -0.6949       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.387333   0.385739     33554432  33554432.0   1.0000  -0.9512       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\r\n",
        "finished run\r\n",
        "number of examples per pass = "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "4042"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "8967\r\n",
        "passes used = 1\r\n",
        "weighted example sum = 4.0429e+07\r\n",
        "weighted label sum = -2.66988e+07\r\n",
        "average loss = 0.385823\r\n",
        "best constant = -0.660389\r\n",
        "total feature number = 1091582109\r\n"
       ]
      }
     ],
     "prompt_number": 9
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As we can see this gives a lower average loss (0.385823) than the that with l1 regularization (0.388106) and one in Model 2 (0.386194)."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!vw test.vw -t -i model-b30-l-l2.vw --link logistic -p preds-b30-l-l2.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "only testing\r\n",
        "Num weight bits = "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "30\r\n",
        "learning rate = 10\r\n",
        "initial_t = 1\r\n",
        "power_t = 0.5\r\n",
        "predictions = preds-b30-l-l2.txt\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "using no cache\r\n",
        "Reading datafile = test.vw\r\n",
        "num sources = 1\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "average    since         example     example  current  current  current\r\n",
        "loss       last          counter      weight    label  predict features\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "11.425002  11.425002           1         1.0   1.0000   0.0847       27\r\n",
        "10.503748  9.582495            2         2.0   1.0000   0.1095       27\r\n",
        "14.356219  18.208689           4         4.0   1.0000   0.0121       27\r\n",
        "13.835917  13.315616           8         8.0   1.0000   0.0106       27\r\n",
        "12.818223  11.800529          16        16.0   1.0000   0.0894       27\r\n",
        "12.313324  11.808425          32        32.0   1.0000   0.0467       27\r\n",
        "12.641553  12.969782          64        64.0   1.0000   0.1170       27\r\n",
        "12.429421  12.217288         128       128.0   1.0000   0.0452       27\r\n",
        "14.433907  16.438394         256       256.0   1.0000   0.0815       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "14.300987  14.168067         512       512.0   1.0000   0.1429       27\r\n",
        "14.549103  14.797220        1024      1024.0   1.0000   0.0459       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "14.474095  14.399087        2048      2048.0   1.0000   0.0142       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "14.573010  14.671925        4096      4096.0   1.0000   0.0544       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "14.630715  14.688420        8192      8192.0   1.0000   0.0737       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "14.591898  14.553080       16384     16384.0   1.0000   0.0821       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "14.645670  14.699442       32768     32768.0   1.0000   0.0176       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "14.601456  14.557242       65536     65536.0   1.0000   0.1198       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "14.641540  14.681624      131072    131072.0   1.0000   0.0548       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "15.141720  15.641900      262144    262144.0   1.0000   0.1402       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "13.690059  12.238398      524288    524288.0   1.0000   0.1355       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "11.455350  9.220642      1048576   1048576.0   1.0000   0.0992       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "9.719304   7.983257      2097152   2097152.0   1.0000   0.0752       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "9.190696   8.662089      4194304   4194304.0   1.0000   0.1346       27\r\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\r\n",
        "finished run\r\n",
        "number of examples per pass = "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "4577464\r\n",
        "passes used = 1\r\n",
        "weighted example sum = 4.57746e+06\r\n",
        "weighted label sum = 4.57746e+06\r\n",
        "average loss = 9.34768\r\n",
        "best constant = 1\r\n",
        "total feature number = 123591528\r\n"
       ]
      }
     ],
     "prompt_number": 1
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!pypy vw2kaggle.py preds-b30-l-l2.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "3.4.4 Increase the number of passes (-p)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "By increasing the number of times the algorithm will cycle over the training data, we allow VW to better fit the model and thus reduce the average loss further. However, due to the limitation of my hardware I was not able to try it out.  "
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "3.4.5 Provide a higher weight to click data points"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As we had seen earlier, the data is highly skewed with only 17% clicks (6865066/40428967). So we decided to give more weightage to the data points reflecting a click behavior however that dint give a better result in our final analysis. Hence we will not go into the details of this. "
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "3.4.5 Incorporating quadratic terms"
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Model 4"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Often, we want to be able to include interaction features between sets of features. This is a useful way to model nonlinearities in the data while still being able to use a linear learner like VW.\n",
      "\n",
      "VW has support for both two-way (quadratic) and three-way (cubic) interactions across feature namespaces. Quadratic features create a new namespace where each feature is the concatenation of two features from each namespace. The value of this feature is the product of the values that make up the features its composed from. Quadratic features are specified with the -q options and the feature namespaces as arguments.\n",
      "\n",
      "We incorporate the quadratic terms in our model as follows. '-q ::' specifies the model to form interactions of each namespace in our data set with the other. Further using quadratic interactions also poses a danger of overfitting, so we apply --l2 regularization to counter it."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!vw train_full.vw -b 30 -l 0.999407 --l2 0.00000001 -q :: -f model-b30-l-l2-q.vw --loss_function logistic"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "raw",
     "metadata": {},
     "source": [
      "creating quadratic features for pairs: :: in pair creation\n",
      "\n",
      "final_regressor = avazu.model.vw\n",
      "Num weight bits = 25\n",
      "learning rate = 0.3945\n",
      "initial_t = 0\n",
      "power_t = 0.5\n",
      "using no cache\n",
      "Reading datafile = train_full.vw\n",
      "num sources = 1\n",
      "average    since         example     example  current  current  current\n",
      "loss       last          counter      weight    label  predict features\n",
      "0.693147   0.693147            1         1.0  -1.0000   0.0000      601\n",
      "0.394862   0.096578            2         2.0  -1.0000  -2.2887      601\n",
      "0.209225   0.023588            4         4.0  -1.0000  -3.8691      601\n",
      "0.144953   0.080681            8         8.0  -1.0000  -2.5628      601\n",
      "0.429961   0.714969           16        16.0  -1.0000  -3.3126      601\n",
      "0.474228   0.518494           32        32.0  -1.0000  -1.5024      601\n",
      "0.585054   0.695881           64        64.0  -1.0000  -1.0034      601\n",
      "0.622823   0.660591          128       128.0  -1.0000  -1.4885      601\n",
      "0.560085   0.497348          256       256.0   1.0000  -1.7599      601\n",
      "0.529467   0.498849          512       512.0  -1.0000  -4.3134      601\n",
      "0.505159   0.480852         1024      1024.0  -1.0000  -3.0823      601\n",
      "0.496035   0.486911         2048      2048.0  -1.0000  -4.3662      601\n",
      "0.482713   0.469391         4096      4096.0  -1.0000  -0.9873      601\n",
      "0.467771   0.452830         8192      8192.0  -1.0000  -1.2962      601\n",
      "0.447862   0.427952        16384     16384.0  -1.0000  -2.1017      601\n",
      "0.441907   0.435953        32768     32768.0   1.0000  -1.8154      601\n",
      "0.427817   0.413728        65536     65536.0  -1.0000  -4.1079      601\n",
      "0.416466   0.405114       131072    131072.0  -1.0000  -6.1644      601\n",
      "0.406305   0.396145       262144    262144.0  -1.0000  -1.8602      601\n",
      "0.387445   0.368586       524288    524288.0  -1.0000  -2.2565      601\n",
      "0.376486   0.365527      1048576   1048576.0  -1.0000  -1.6468      601\n",
      "0.375087   0.373687      2097152   2097152.0  -1.0000  -3.8969      601\n",
      "0.393947   0.412807      4194304   4194304.0  -1.0000  -4.2576      601\n",
      "0.374755   0.355562      8388608   8388608.0  -1.0000  -1.5612      601\n",
      "0.386943   0.399131     16777216  16777216.0  -1.0000  -0.8215      601\n",
      "0.385446   0.383950     33554432  33554432.0   1.0000  -0.9720      601\n",
      "\n",
      "finished run\n",
      "number of examples per pass = 40428967\n",
      "passes used = 1\n",
      "weighted example sum = 4.0429e+07\n",
      "weighted label sum = -2.66988e+07\n",
      "average loss = 0.383988\n",
      "best constant = -0.660389\n",
      "total feature number = 24297809167"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This gives us a average loss of 0.383988, which is better than that of Model 3 (0.385823) and Model 2 (0.386194).  "
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Submissions"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "For my submissions I chose the plain vanilla model optimized for learning rate and regularization i.e. Model 3 and the one with quadratic interactions i.e. Model 4 which gave me the best average loss among all my models. Out of the two my best submission came out to be Model 3, but it put me only around the 45% mark of the leader board. Having said that it was a great to further enhance my competence in VW and dive deep into its salient features. "
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Further scope of Work"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      " 1. The data is very non linear and hence one can also look into using neural nets to take care of it. \n",
      " 2. Clustering of the data sets as per device or domain ids could be looked into as one can expect to have some common behavior in these groups. "
     ]
    }
   ],
   "metadata": {}
  }
 ]
}