@pronojitsaha
Created April 30, 2015 12:44
{
"metadata": {
"name": "",
"signature": "sha256:95a42d89f795a9a268747330dd89410b1a831b1c6bd7ffad3c46bb9361be57eb"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"[AVAZU CTR PREDICTION](https://www.kaggle.com/c/avazu-ctr-prediction)"
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"1. Background"
]
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"The Problem"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Predict whether a mobile ad will be clicked.\n",
"\n",
"In online advertising, click-through rate (CTR) is a very important metric for evaluating ad performance. As a result, click prediction systems are essential and widely used for sponsored search and real-time bidding.\n",
"\n",
"For this competition, we are provided 11 days worth of Avazu data to build and test prediction models. The challenge was to find a strategy that beats standard classification algorithms."
]
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"The Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The data was provided in three files as follows: \n",
"1. train - Training set. 10 days of click-through data, ordered chronologically. Non-clicks and clicks are subsampled according to different strategies.\n",
"\n",
"2. test - Test set. 1 day of ads for testing your model predictions. \n",
"\n",
"3. sampleSubmission.csv - Sample submission file in the correct format, corresponds to the All-0.5 Benchmark.\n",
"\n",
"The train & test file contained the following **data fields/features**: \n",
"id: ad identifier \n",
"click: 0/1 for non-click/click \n",
"hour: format is YYMMDDHH, so 14091123 means 23:00 on Sept. 11, 2014 UTC. \n",
"C1 -- anonymized categorical variable \n",
"banner_pos \n",
"site_id \n",
"site_domain \n",
"site_category \n",
"app_id \n",
"app_domain \n",
"app_category \n",
"device_id \n",
"device_ip \n",
"device_model \n",
"device_type \n",
"device_conn_type \n",
"C14-C21 -- anonymized categorical variables "
]
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"Evaluation Metric"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Submissions are evaluated using the [Logarithmic Loss](https://www.kaggle.com/wiki/LogarithmicLoss) (smaller is better)."
]
},
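{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the metric (not the official Kaggle scorer), the binary logarithmic loss can be computed as below. The function name `log_loss` and the clipping constant `eps` are assumptions made for illustration."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import math\n",
"\n",
"def log_loss(y_true, y_pred, eps=1e-15):\n",
"    \"\"\"Mean binary log loss; y_true holds 0/1 labels, y_pred holds probabilities.\"\"\"\n",
"    total = 0.0\n",
"    for y, p in zip(y_true, y_pred):\n",
"        # Clip so that log(0) is never evaluated\n",
"        p = min(max(p, eps), 1 - eps)\n",
"        total += y * math.log(p) + (1 - y) * math.log(1 - p)\n",
"    return -total / len(y_true)\n",
"\n",
"# Predicting 0.5 for every ad (the All-0.5 benchmark) scores ln(2) whatever the labels are\n",
"print(log_loss([0, 1, 0, 0], [0.5, 0.5, 0.5, 0.5]))"
],
"language": "python",
"metadata": {},
"outputs": []
},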
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"2. Data Exploration"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A simple Unix command, ```wc -l```, tells us that there are around 40 million rows in the train data file. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!wc -l data/train"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 40428968 data/train\r\n"
]
}
],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's now examine the first few rows and get an idea of the structure of the file. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!head -n 5 data/train"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"id,click,hour,C1,banner_pos,site_id,site_domain,site_category,app_id,app_domain,app_category,device_id,device_ip,device_model,device_type,device_conn_type,C14,C15,C16,C17,C18,C19,C20,C21\r",
"\r\n",
"1000009418151094273,0,14102100,1005,0,1fbe01fe,f3845767,28905ebd,ecad2386,7801e8d9,07d7df22,a99f214a,ddd2926e,44956a24,1,2,15706,320,50,1722,0,35,-1,79\r",
"\r\n",
"10000169349117863715,0,14102100,1005,0,1fbe01fe,f3845767,28905ebd,ecad2386,7801e8d9,07d7df22,a99f214a,96809ac8,711ee120,1,0,15704,320,50,1722,0,35,100084,79\r",
"\r\n",
"10000371904215119486,0,14102100,1005,0,1fbe01fe,f3845767,28905ebd,ecad2386,7801e8d9,07d7df22,a99f214a,b3cf8def,8a4875bd,1,0,15704,320,50,1722,0,35,100084,79\r",
"\r\n",
"10000640724480838376,0,14102100,1005,0,1fbe01fe,f3845767,28905ebd,ecad2386,7801e8d9,07d7df22,a99f214a,e8275b8f,6332421a,1,0,15706,320,50,1722,0,35,100084,79\r",
"\r\n"
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we can see, the data is completely anonymized, and the values of most columns are obfuscated too (they appear as hexadecimal strings). \n",
"\n",
"Let's now find out the same details about the test set. As shown below, the test set has around 4.6 million rows. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!wc -l data/test"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 4577465 data/test\r\n"
]
}
],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!head -n 5 data/test"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"id,hour,C1,banner_pos,site_id,site_domain,site_category,app_id,app_domain,app_category,device_id,device_ip,device_model,device_type,device_conn_type,C14,C15,C16,C17,C18,C19,C20,C21\r",
"\r\n",
"10000174058809263569,14103100,1005,0,235ba823,f6ebf28e,f028772b,ecad2386,7801e8d9,07d7df22,a99f214a,69f45779,0eb711ec,1,0,8330,320,50,761,3,175,100075,23\r",
"\r\n",
"10000182526920855428,14103100,1005,0,1fbe01fe,f3845767,28905ebd,ecad2386,7801e8d9,07d7df22,a99f214a,e8d44657,ecb851b2,1,0,22676,320,50,2616,0,35,100083,51\r",
"\r\n",
"10000554139829213984,14103100,1005,0,1fbe01fe,f3845767,28905ebd,ecad2386,7801e8d9,07d7df22,a99f214a,10fb085b,1f0bc64f,1,0,22676,320,50,2616,0,35,100083,51\r",
"\r\n",
"10001094637809798845,14103100,1005,0,85f751fd,c4e18dd6,50e219e0,51cedd4e,aefc06bd,0f2161f8,a99f214a,422d257a,542422a7,1,0,18648,320,50,1092,3,809,100156,61\r",
"\r\n"
]
}
],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now count the ads that were clicked in the training data set using the following command; dividing this count by the number of rows gives the click percentage. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!cat data/train | awk -F ',' '$2==\"1\"{sum+=$2;} END { printf \"%.2f\\n\", sum }'"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"6865066.00\r\n"
]
}
],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Thus the click-through rate is 6865066/40428967, or approximately 17% (the denominator is the line count minus the header row). The classes are therefore imbalanced, with non-clicks outnumbering clicks by roughly five to one. We may want to compensate for this imbalance later in our analysis. "
]
},
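{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a worked version of this arithmetic, the sketch below recomputes the click rate from the two counts printed earlier. It also evaluates the log loss that a constant prediction equal to that rate would obtain on this same data (the binary entropy of the rate), which is a useful floor to compare models against. The variable names are chosen here for illustration."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import math\n",
"\n",
"clicks = 6865066      # count printed by the awk command\n",
"rows = 40428968 - 1   # wc -l output minus the header line\n",
"ctr = float(clicks) / rows\n",
"print(\"CTR: %.2f%%\" % (100 * ctr))\n",
"\n",
"# Log loss of always predicting the base rate on this same data\n",
"baseline = -(ctr * math.log(ctr) + (1 - ctr) * math.log(1 - ctr))\n",
"print(\"Constant-prediction log loss: %.4f\" % baseline)"
],
"language": "python",
"metadata": {},
"outputs": []
},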
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"3. Data Analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The problem at hand is one of binary classification, for which we could employ logistic regression, naive Bayes, or support vector machines. However, we need to remember that this is big data, and one tool that does particularly well at classifying big data is the recently developed online learning tool named [Vowpal Wabbit](http://hunch.net/~vw).\n",
"\n",
"Vowpal Wabbit or VW is a machine learning system developed by [John Langford](http://research.microsoft.com/en-us/people/jcl/). VW is the essence of speed in machine learning, able to learn from terafeature datasets with ease. Via parallel learning, it can exceed the throughput of any single machine network interface when doing linear learning, a first amongst learning algorithms. For a deeper understanding of large scale online machine learning watch this fine [video tutorial](http://techtalks.tv/talks/online-linear-learning-part-1/57924/) with John Langford. To install and get started follow this [tutorial](https://github.com/JohnLangford/vowpal_wabbit/wiki/Tutorial).\n",
"\n",
"As mentioned, it is particularly suited for terafeature datasets, as its [hashing trick](http://en.wikipedia.org/wiki/Feature_hashing#Feature_vectorization_using_the_hashing_trick) hashes every feature name into a weight table of fixed size (2^b entries for a chosen number of bits b), which bounds memory use and greatly reduces the computing time."
]
},
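{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate the hashing trick, here is a minimal sketch (not VW's actual implementation): each feature string is hashed and the hash is masked down to `bits` bits, which gives an index into a weight table of fixed size. `zlib.crc32` stands in for VW's own hash function, and the `column=value` token format, the function name `hash_features` and the value `bits=18` are assumptions made for illustration."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import zlib\n",
"\n",
"def hash_features(tokens, bits=18):\n",
"    \"\"\"Map string tokens to indices in a table of 2**bits weights.\"\"\"\n",
"    mask = (1 << bits) - 1\n",
"    # crc32 is a stand-in hash; & 0xffffffff keeps it unsigned on Python 2\n",
"    return [(zlib.crc32(t.encode(\"utf-8\")) & 0xffffffff) & mask for t in tokens]\n",
"\n",
"row = [\"site_id=1fbe01fe\", \"app_id=ecad2386\", \"device_model=44956a24\"]\n",
"indices = hash_features(row)\n",
"print(indices)\n",
"# Every index fits in the table, however many distinct values the raw columns hold\n",
"print(all(0 <= i < 2 ** 18 for i in indices))"
],
"language": "python",
"metadata": {},
"outputs": []
},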
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"3.1 Data Conversion"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As with any exercise with VW, we first convert the data into the format required by Vowpal Wabbit. Basically, VW requires each row of the dataset in the following format:\n",
"    \n",
"    100 |n var1:20 var2:20 var3:2 |c 1.0 2.0 0.0 1.0\n",
"    \n",
"The first number, 100, is the dependent variable, the one we want to predict (in our case, the click label). |n defines a namespace which denotes the beginning of the numerical features of our dataset. This example row has 3 numerical features; for these, both the name of the feature and its value are required. It is then followed by |c, which indicates the categorical features of the dataset. Here only the values are required.\n",
"\n",
"We can generate this format from our raw dataset by writing a function 'csv_to_vw' as follows."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from datetime import datetime\n",
"from csv import DictReader\n",
"import math\n",
"\n",
"def csv_to_vw(loc_csv, loc_output, train=True):\n",
"    \"\"\"\n",
"    Munges a CSV file (loc_csv) to a VW file (loc_output). Set \"train\"\n",
"    to False when munging a test set.\n",
"    \"\"\"\n",
"    start = datetime.now()\n",
"    print(\"\\nTurning %s into %s. Is_train_set? %s\"%(loc_csv,loc_output,train))\n",
"\n",
"    # Text mode so that the str writes below work on both Python 2 and Python 3\n",
"    with open(loc_csv) as infile, open(loc_output,\"w\") as outfile:\n",
"        for e, row in enumerate( DictReader(infile) ):\n",
"\n",
"            #Creating the features\n",
"            categorical_features = \"\"\n",
"            for k,v in row.items():\n",
"                if k == 'hour':\n",
"                    # hour is formatted as YYMMDDHH\n",
"                    new_date= datetime(int(\"20\"+v[0:2]),int(v[2:4]),int(v[4:6]))\n",
"                    hour= v[6:8]\n",
"                    # Period of 24 keeps hour 23 and hour 0 adjacent but distinct\n",
"                    sinHour = math.sin(2*math.pi*int(hour)/24)\n",
"                    cosHour = math.cos(2*math.pi*int(hour)/24)\n",
"                    # strftime(\"%w\") returns a string: \"0\" is Sunday, \"6\" is Saturday\n",
"                    day = new_date.strftime(\"%w\")\n",
"                    if day in (\"0\",\"6\"):\n",
"                        weekend = 1\n",
"                    else:\n",
"                        weekend = 0\n",
"                elif k not in [\"id\",\"click\",\"hour\"]:\n",
"                    if len(str(v)) > 0:\n",
"                        categorical_features += \"%s \" % v\n",
"\n",
"            #Engineered features, each in a namespace of its own\n",
"            categorical_features += \" |hr %s\" % hour\n",
"            categorical_features += \" |day %s\" % day\n",
"            categorical_features += \" |sinhr %s\" % sinHour\n",
"            categorical_features += \" |coshr %s\" % cosHour\n",
"            categorical_features += \" |weekend %s\" % weekend\n",
"\n",
"            #Creating the labels\n",
"            if train: #we care about labels\n",
"                if row['click'] == \"1\":\n",
"                    label = 1\n",
"                else:\n",
"                    label = -1 #we set negative label to -1\n",
"                outfile.write( \"%s '%s |c %s\\n\" % (label,row['id'],categorical_features))\n",
"\n",
"            else: #we dont care about labels\n",
"                outfile.write( \"1 '%s |c %s\\n\" % (row['id'],categorical_features))\n",
"\n",
"            #Reporting progress\n",
"            if e % 100000 == 0:\n",
"                print(\"%s\\t%s\"%(e, str(datetime.now() - start)))\n",
"\n",
"    print(\"\\n %s Task execution time:\\n\\t%s\"%(e, str(datetime.now() - start)))\n",
"\n",
"csv_to_vw(\"data/train\", \"train_full.vw\", train=True)\n",
"csv_to_vw(\"data/test\", \"test.vw\", train=False)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the above conversion code we have grouped all the existing categorical features in the data into a single namespace '|c'. We also experimented with a separate namespace for each categorical variable; however, that did not give better results in our analysis (explained later). "
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"3.2 Feature Engineering"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As one would notice, we have also done some feature engineering within the conversion code itself. Primarily, we have extracted the hour and the day of the week from the date feature, which should be among the strongest determinants of the CTR. We have also added a weekend variable, which is 1 if the day was a weekend and 0 otherwise. The hypothesis is that the CTR differs between weekends and weekdays, with weekends seeing the higher CTR. \n",
"\n",
"Much of the observed structure in the data is periodic: it repeats over the 24 hours of a day. With that in mind, we construct features that reflect this a priori symmetry. For the hour variable (period 24) we map each value to a point on the unit circle via a (sin, cos) pair, so that hours adjacent on the clock, such as 23:00 and 00:00, also receive adjacent values.\n",
"\n",
"Each of the above newly created features was given a namespace of its own. We save this function to a file named csv2vw.py."
]
},
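{
"cell_type": "markdown",
"metadata": {},
"source": [
"The effect of the (sin, cos) encoding can be checked with a short sketch, assuming a period of 24 hours. The helper names `encode_hour` and `distance` are made up for this illustration and are not part of csv2vw.py."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import math\n",
"\n",
"def encode_hour(hour, period=24):\n",
"    \"\"\"Return the (sin, cos) position of an hour on the unit circle.\"\"\"\n",
"    angle = 2 * math.pi * hour / period\n",
"    return math.sin(angle), math.cos(angle)\n",
"\n",
"def distance(a, b):\n",
"    \"\"\"Euclidean distance between two encoded hours.\"\"\"\n",
"    return math.hypot(a[0] - b[0], a[1] - b[1])\n",
"\n",
"# 23:00 and 00:00 are one hour apart, so they are as close as 00:00 and 01:00\n",
"print(distance(encode_hour(23), encode_hour(0)))\n",
"print(distance(encode_hour(0), encode_hour(1)))\n",
"# 00:00 and 12:00 sit on opposite sides of the circle\n",
"print(distance(encode_hour(0), encode_hour(12)))"
],
"language": "python",
"metadata": {},
"outputs": []
},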
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"Truly Big Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This data set is truly big in terms of number of records. As we saw earlier, there are close to 40 million rows in total, and the train data file is more than 1GB in size. A file of this size takes a substantial amount of time to process with the standard Python implementation. \n",
"\n",
"Enter PyPy! [PyPy](http://pypy.org/) is a fast, compliant alternative implementation of the Python language (2.7.8 and 3.2.5). It has several advantages and distinct features, the most important being significant speed improvements and lower memory usage. PyPy is a separate interpreter rather than a Python package, so it is not installed with pip; instead, download a prebuilt binary from the PyPy website and unpack it. \n",
"After unpacking, you need to add the line ```'export PATH=$PATH:/Users/your user name/pypy-2.4/bin'``` to your .bash_profile file (on mac) to call the command from anywhere. \n",
"\n",
"We then run our script csv2vw.py with ```pypy``` to speed up the processing. "
]
},
{
"cell_type": "code",
"collapsed": true,
"input": [
"!pypy csv2vw.py"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r\n",
"Turning data/train into train_full.vw. Is_train_set? True\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0\t0:00:00.047804\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"100000\t0:00:03.125274\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"200000\t0:00:05.414076\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"300000\t0:00:08.025433\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"400000\t0:00:10.525358\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"500000\t0:00:12.765795\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"600000\t0:00:15.077024\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"700000\t0:00:17.251730\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"800000\t0:00:19.431892\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"900000\t0:00:21.726936\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1000000\t0:00:24.056752\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1100000\t0:00:26.388141\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1200000\t0:00:28.863936\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1300000\t0:00:31.373522\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1400000\t0:00:33.753087\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1500000\t0:00:36.001352\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1600000\t0:00:38.616340\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1700000\t0:00:40.691579\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1800000\t0:00:42.745740\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1900000\t0:00:44.822003\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2000000\t0:00:46.998355\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2100000\t0:00:49.067117\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2200000\t0:00:51.157181\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2300000\t0:00:53.253797\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2400000\t0:00:55.457481\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2500000\t0:00:57.537969\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2600000\t0:00:59.718608\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2700000\t0:01:01.924935\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2800000\t0:01:04.281188\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2900000\t0:01:06.639845\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"3000000\t0:01:09.176489\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"3100000\t0:01:12.498500\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"3200000\t0:01:15.050708\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"3300000\t0:01:17.990987\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"3400000\t0:01:22.997781\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"3500000\t0:01:25.711598\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"3600000\t0:01:28.331162\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"3700000\t0:01:30.560905\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"3800000\t0:01:32.831209\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"3900000\t0:01:35.274403\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"4000000\t0:01:41.019958\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"4100000\t0:01:43.629878\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"4200000\t0:01:45.981372\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"4300000\t0:01:48.290610\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"4400000\t0:01:50.974897\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"4500000\t0:01:53.139334\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"4600000\t0:01:55.269406\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"4700000\t0:01:57.911296\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"4800000\t0:02:00.230052\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"4900000\t0:02:02.473404\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"5000000\t0:02:04.839360\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"5100000\t0:02:06.988160\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"5200000\t0:02:09.876421\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"5300000\t0:02:12.170417\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"5400000\t0:02:14.841060\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"5500000\t0:02:17.359877\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"5600000\t0:02:19.517009\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"5700000\t0:02:21.762051\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"5800000\t0:02:23.825424\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"5900000\t0:02:26.054605\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"6000000\t0:02:28.178697\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"6100000\t0:02:30.327634\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"6200000\t0:02:32.406893\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"6300000\t0:02:34.476356\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"6400000\t0:02:36.573601\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"6500000\t0:02:39.035409\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"6600000\t0:02:41.096041\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"6700000\t0:02:43.149530\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"6800000\t0:02:45.204246\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"6900000\t0:02:47.296942\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"7000000\t0:02:49.440160\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"7100000\t0:02:51.631061\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"7200000\t0:02:53.879069\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"7300000\t0:02:56.323656\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"7400000\t0:02:58.598483\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"7500000\t0:03:00.853745\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"7600000\t0:03:02.962542\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"7700000\t0:03:05.311579\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"7800000\t0:03:08.046281\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"7900000\t0:03:10.171453\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"8000000\t0:03:12.299833\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"8100000\t0:03:14.468313\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"8200000\t0:03:16.663510\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"8300000\t0:03:18.774629\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"8400000\t0:03:21.038572\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"8500000\t0:03:23.140971\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"8600000\t0:03:25.254869\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"8700000\t0:03:27.382594\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"8800000\t0:03:29.543040\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"8900000\t0:03:31.668171\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"9000000\t0:03:33.750707\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"9100000\t0:03:35.814012\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"9200000\t0:03:38.205865\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"9300000\t0:03:40.295267\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"9400000\t0:03:42.520617\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"9500000\t0:03:44.700667\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"9600000\t0:03:46.789793\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"9700000\t0:03:49.058558\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"9800000\t0:03:51.142138\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"9900000\t0:03:53.320217\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10000000\t0:03:55.456283\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10100000\t0:03:57.658168\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10200000\t0:03:59.684915\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10300000\t0:04:02.034920\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10400000\t0:04:05.218467\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10500000\t0:04:10.986226\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10600000\t0:04:14.079148\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10700000\t0:04:16.790339\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10800000\t0:04:18.908982\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10900000\t0:04:21.020127\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"11000000\t0:04:23.230544\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"11100000\t0:04:25.359970\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"11200000\t0:04:27.471399\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"11300000\t0:04:29.554756\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"11400000\t0:04:31.670109\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"11500000\t0:04:33.769131\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"11600000\t0:04:36.174343\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"11700000\t0:04:38.445656\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"11800000\t0:04:40.532657\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"11900000\t0:04:42.601505\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"12000000\t0:04:44.720613\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"12100000\t0:04:46.873163\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"12200000\t0:04:48.995944\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"12300000\t0:04:51.163283\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"12400000\t0:04:53.280281\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"12500000\t0:04:55.339392\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"12600000\t0:04:57.434778\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"12700000\t0:04:59.585351\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"12800000\t0:05:01.782611\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"12900000\t0:05:03.862962\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"13000000\t0:05:06.038647\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"13100000\t0:05:08.382670\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"13200000\t0:05:10.547036\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"13300000\t0:05:12.818244\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"13400000\t0:05:14.994002\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"13500000\t0:05:17.074602\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"13600000\t0:05:19.162240\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"13700000\t0:05:21.210896\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"13800000\t0:05:23.471346\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"13900000\t0:05:25.555641\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"14000000\t0:05:27.659380\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"14100000\t0:05:29.739968\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"14200000\t0:05:31.748673\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"14300000\t0:05:33.873070\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"14400000\t0:05:36.278676\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"14500000\t0:05:38.662028\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"14600000\t0:05:40.752361\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"14700000\t0:05:42.883450\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"14800000\t0:05:45.004206\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"14900000\t0:05:47.068354\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"15000000\t0:05:49.274582\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"15100000\t0:05:51.396624\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"15200000\t0:05:53.622257\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"15300000\t0:05:55.699988\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"15400000\t0:05:57.799065\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"15500000\t0:05:59.987981\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"15600000\t0:06:02.163151\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"15700000\t0:06:04.231077\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"15800000\t0:06:06.470009\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"15900000\t0:06:08.926133\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"16000000\t0:06:11.165976\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"16100000\t0:06:13.316600\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"16200000\t0:06:15.408495\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"16300000\t0:06:17.636273\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"16400000\t0:06:19.846414\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"16500000\t0:06:21.995364\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"16600000\t0:06:24.111667\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"16700000\t0:06:26.198068\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"16800000\t0:06:28.376070\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"16900000\t0:06:30.472536\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"17000000\t0:06:32.637075\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"17100000\t0:06:34.999189\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"17200000\t0:06:37.729723\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"17300000\t0:06:40.758164\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"17400000\t0:06:43.169266\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"17500000\t0:06:45.348597\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"17600000\t0:06:47.484247\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"17700000\t0:06:49.639634\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"17800000\t0:06:51.913760\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"17900000\t0:06:54.071131\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"18000000\t0:06:56.270082\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"18100000\t0:06:58.845183\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"18200000\t0:07:01.480167\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"18300000\t0:07:03.593919\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"18400000\t0:07:05.841567\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"18500000\t0:07:08.361925\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"18600000\t0:07:10.448564\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"18700000\t0:07:12.579954\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"18800000\t0:07:14.744024\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"18900000\t0:07:16.977304\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"19000000\t0:07:19.169949\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"19100000\t0:07:21.591992\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"19200000\t0:07:23.728840\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"19300000\t0:07:25.878847\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"19400000\t0:07:28.002798\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"19500000\t0:07:30.293610\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"19600000\t0:07:32.365683\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"19700000\t0:07:34.479270\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"19800000\t0:07:36.592220\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"19900000\t0:07:39.081007\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"20000000\t0:07:41.247057\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"20100000\t0:07:44.109167\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"20200000\t0:07:46.959910\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"20300000\t0:07:49.443149\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"20400000\t0:07:51.901375\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"20500000\t0:07:54.449487\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"20600000\t0:07:56.619092\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"20700000\t0:07:58.728153\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"20800000\t0:08:01.068127\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"20900000\t0:08:04.557182\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"21000000\t0:08:10.240958\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"21100000\t0:08:14.173843\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"21200000\t0:08:16.467218\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"21300000\t0:08:18.643082\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"21400000\t0:08:20.993410\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"21500000\t0:08:23.204275\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"21600000\t0:08:25.892382\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"21700000\t0:08:28.153642\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"21800000\t0:08:30.378103\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"21900000\t0:08:32.675462\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"22000000\t0:08:35.094677\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"22100000\t0:08:37.220887\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"22200000\t0:08:39.701447\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"22300000\t0:08:41.825569\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"22400000\t0:08:43.962336\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"22500000\t0:08:46.184523\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"22600000\t0:08:48.385941\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"22700000\t0:08:50.748595\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"22800000\t0:08:52.948933\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"22900000\t0:08:55.210258\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"23000000\t0:08:57.682410\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"23100000\t0:09:00.421876\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"23200000\t0:09:02.584910\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"23300000\t0:09:04.789942\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"23400000\t0:09:06.979292\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"23500000\t0:09:09.388666\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"23600000\t0:09:11.440115\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"23700000\t0:09:13.581463\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"23800000\t0:09:15.803057\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"23900000\t0:09:17.986015\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"24000000\t0:09:20.113843\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"24100000\t0:09:22.284587\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"24200000\t0:09:24.346999\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"24300000\t0:09:26.502958\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"24400000\t0:09:28.734019\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"24500000\t0:09:30.892801\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"24600000\t0:09:32.933431\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"24700000\t0:09:35.089428\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"24800000\t0:09:37.269586\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"24900000\t0:09:39.622227\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"25000000\t0:09:41.805330\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"25100000\t0:09:43.945053\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"25200000\t0:09:46.102319\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"25300000\t0:09:48.200972\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"25400000\t0:09:50.315932\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"25500000\t0:09:52.456558\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"25600000\t0:09:54.574408\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"25700000\t0:09:57.867183\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"25800000\t0:10:00.347779\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"25900000\t0:10:02.677836\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"26000000\t0:10:04.970916\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"26100000\t0:10:07.476955\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"26200000\t0:10:09.751777\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"26300000\t0:10:12.509111\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"26400000\t0:10:15.137179\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"26500000\t0:10:17.482610\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"26600000\t0:10:19.709332\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"26700000\t0:10:21.955176\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"26800000\t0:10:24.239558\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"26900000\t0:10:26.518958\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"27000000\t0:10:28.834057\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"27100000\t0:10:31.195673\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"27200000\t0:10:33.478614\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"27300000\t0:10:35.818797\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"27400000\t0:10:38.473092\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"27500000\t0:10:40.969430\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"27600000\t0:10:43.386918\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"27700000\t0:10:45.699665\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"27800000\t0:10:48.002200\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"27900000\t0:10:50.243782\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"28000000\t0:10:52.489478\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"28100000\t0:10:54.737367\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"28200000\t0:10:57.011131\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"28300000\t0:10:59.239614\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"28400000\t0:11:01.491052\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"28500000\t0:11:03.744568\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"28600000\t0:11:06.030529\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"28700000\t0:11:08.464698\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"28800000\t0:11:10.965787\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"28900000\t0:11:13.306159\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"29000000\t0:11:15.550609\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"29100000\t0:11:17.888817\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"29200000\t0:11:20.153300\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"29300000\t0:11:22.452839\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"29400000\t0:11:24.743960\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"29500000\t0:11:26.987991\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"29600000\t0:11:29.191326\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"29700000\t0:11:31.448352\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"29800000\t0:11:33.730769\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"29900000\t0:11:36.294338\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"30000000\t0:11:39.416061\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"30100000\t0:11:42.023527\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"30200000\t0:11:44.368512\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"30300000\t0:11:46.792241\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"30400000\t0:11:49.053756\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"30500000\t0:11:51.285333\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"30600000\t0:11:53.681092\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"30700000\t0:11:56.011474\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"30800000\t0:11:58.439315\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"30900000\t0:12:00.679832\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"31000000\t0:12:02.958133\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"31100000\t0:12:05.285705\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"31200000\t0:12:08.030956\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"31300000\t0:12:10.367000\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"31400000\t0:12:12.966223\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"31500000\t0:12:15.217474\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"31600000\t0:12:17.702077\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"31700000\t0:12:19.955318\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"31800000\t0:12:22.237921\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"31900000\t0:12:24.605581\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"32000000\t0:12:27.235702\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"32100000\t0:12:29.649983\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"32200000\t0:12:32.138495\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"32300000\t0:12:34.551207\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"32400000\t0:12:36.862203\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"32500000\t0:12:39.277660\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"32600000\t0:12:42.548220\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"32700000\t0:12:46.698360\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"32800000\t0:12:49.674839\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"32900000\t0:12:51.819920\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"33000000\t0:12:53.944133\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"33100000\t0:12:56.143414\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"33200000\t0:12:58.639184\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"33300000\t0:13:01.118545\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"33400000\t0:13:03.396758\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"33500000\t0:13:05.725015\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"33600000\t0:13:08.223470\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"33700000\t0:13:10.523782\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"33800000\t0:13:12.792982\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"33900000\t0:13:15.048618\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"34000000\t0:13:17.264889\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"34100000\t0:13:19.690845\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"34200000\t0:13:21.944722\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"34300000\t0:13:24.272920\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"34400000\t0:13:26.505669\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"34500000\t0:13:28.775863\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"34600000\t0:13:31.054234\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"34700000\t0:13:33.294992\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"34800000\t0:13:35.552007\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"34900000\t0:13:37.931610\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"35000000\t0:13:40.179241\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"35100000\t0:13:42.414772\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"35200000\t0:13:44.642122\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"35300000\t0:13:46.889752\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"35400000\t0:13:49.271323\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"35500000\t0:13:51.471765\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"35600000\t0:13:53.738911\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"35700000\t0:13:55.990057\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"35800000\t0:13:58.242554\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"35900000\t0:14:00.554165\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"36000000\t0:14:02.890782\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"36100000\t0:14:05.134838\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"36200000\t0:14:07.681641\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"36300000\t0:14:09.914644\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"36400000\t0:14:11.997960\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"36500000\t0:14:14.090380\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"36600000\t0:14:16.415742\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"36700000\t0:14:18.897523\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"36800000\t0:14:21.099668\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"36900000\t0:14:23.333953\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"37000000\t0:14:25.535937\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"37100000\t0:14:27.706679\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"37200000\t0:14:29.895577\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"37300000\t0:14:32.012866\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"37400000\t0:14:34.134062\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"37500000\t0:14:36.260045\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"37600000\t0:14:38.731040\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"37700000\t0:14:40.892148\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"37800000\t0:14:43.001161\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"37900000\t0:14:45.093239\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"38000000\t0:14:47.208387\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"38100000\t0:14:49.307836\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"38200000\t0:14:51.392479\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"38300000\t0:14:53.560565\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"38400000\t0:14:55.711804\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"38500000\t0:14:57.844363\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"38600000\t0:14:59.930705\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"38700000\t0:15:02.030321\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"38800000\t0:15:04.048183\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"38900000\t0:15:06.122848\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"39000000\t0:15:08.455613\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"39100000\t0:15:10.722343\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"39200000\t0:15:12.890226\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"39300000\t0:15:15.047808\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"39400000\t0:15:17.176741\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"39500000\t0:15:19.430118\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"39600000\t0:15:21.579657\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"39700000\t0:15:23.675786\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"39800000\t0:15:25.807229\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"39900000\t0:15:27.981126\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"40000000\t0:15:30.175297\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"40100000\t0:15:32.267476\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"40200000\t0:15:34.343716\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"40300000\t0:15:36.452628\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"40400000\t0:15:38.886291\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r\n",
" 40428966 Task execution time:\r\n",
"\t0:15:39.800758\r\n",
"\r\n",
"Turning data/test into test.vw. Is_train_set? False\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0\t0:00:00.420749\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"100000\t0:00:04.398498\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"200000\t0:00:06.563869\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"300000\t0:00:08.668418\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"400000\t0:00:10.700820\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"500000\t0:00:12.780113\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"600000\t0:00:14.860785\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"700000\t0:00:17.529860\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"800000\t0:00:19.821141\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"900000\t0:00:21.860520\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1000000\t0:00:23.938180\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1100000\t0:00:26.108977\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1200000\t0:00:28.694526\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1300000\t0:00:30.923212\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1400000\t0:00:33.088179\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1500000\t0:00:35.215851\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1600000\t0:00:37.332083\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1700000\t0:00:39.578230\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1800000\t0:00:41.679309\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1900000\t0:00:43.915111\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2000000\t0:00:45.968854\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2100000\t0:00:48.003002\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2200000\t0:00:50.052432\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2300000\t0:00:52.205396\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2400000\t0:00:54.192727\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2500000\t0:00:56.245027\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2600000\t0:00:58.596864\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2700000\t0:01:00.725468\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2800000\t0:01:03.128903\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2900000\t0:01:05.409398\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"3000000\t0:01:07.975637\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"3100000\t0:01:10.506805\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"3200000\t0:01:12.726931\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"3300000\t0:01:14.791400\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"3400000\t0:01:16.888496\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"3500000\t0:01:19.004787\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"3600000\t0:01:21.060767\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"3700000\t0:01:23.184298\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"3800000\t0:01:25.283394\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"3900000\t0:01:27.391460\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"4000000\t0:01:29.752674\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"4100000\t0:01:31.856220\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"4200000\t0:01:34.229628\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"4300000\t0:01:36.470578\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"4400000\t0:01:38.747250\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"4500000\t0:01:40.836265\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r\n",
" 4577463 Task execution time:\r\n",
"\t0:01:42.397042\r\n"
]
}
],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's have a look at the first line of the train and test files we have created. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!head -n 1 train_full.vw"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"-1 '1000009418151094273 |c 79 ddd2926e 1fbe01fe ecad2386 35 0 1 1722 320 15706 50 2 1005 07d7df22 28905ebd 7801e8d9 f3845767 0 a99f214a -1 44956a24 |hr 00 |day 2 |sinhr 0.0 |coshr 1.0 |weekend 0\r\n"
]
}
],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!head -n 1 test.vw"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1 '10000174058809263569 |c 23 69f45779 235ba823 ecad2386 175 3 1 761 50 320 8330 0 1005 07d7df22 f028772b 7801e8d9 f6ebf28e 0 a99f214a 100075 0eb711ec |hr 00 |day 5 |sinhr 0.0 |coshr 1.0 |weekend 0\r\n"
]
}
],
"prompt_number": 6
},
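{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each line has the form `label 'tag |namespace feature feature ...`. As an illustration only, the sketch below splits the first training line into its parts. The helper `parse_vw_line` is a hypothetical name introduced here and is not part of VW, and it assumes plain features without explicit `name:value` weights, as in the lines above. It finds 26 feature tokens, which, together with the constant (intercept) feature VW adds by default, is consistent with the 27 features VW reports per example during training."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def parse_vw_line(line):\n",
"    # Split the line into the header (label and tag) and the namespace sections\n",
"    parts = line.strip().split('|')\n",
"    header = parts[0].split()\n",
"    label = int(header[0])\n",
"    # The tag is the header token that starts with a single quote\n",
"    tag = header[1].lstrip(\"'\") if len(header) > 1 else None\n",
"    namespaces = {}\n",
"    for section in parts[1:]:\n",
"        tokens = section.split()\n",
"        # The first token names the namespace, the rest are its features\n",
"        namespaces[tokens[0]] = tokens[1:]\n",
"    return label, tag, namespaces\n",
"\n",
"# First line of train_full.vw, copied from the output above\n",
"line = \"-1 '1000009418151094273 |c 79 ddd2926e 1fbe01fe ecad2386 35 0 1 1722 320 15706 50 2 1005 07d7df22 28905ebd 7801e8d9 f3845767 0 a99f214a -1 44956a24 |hr 00 |day 2 |sinhr 0.0 |coshr 1.0 |weekend 0\"\n",
"label, tag, namespaces = parse_vw_line(line)\n",
"n_features = sum(len(features) for features in namespaces.values())\n",
"print('label=%d tag=%s feature tokens=%d' % (label, tag, n_features))"
],
"language": "python",
"metadata": {},
"outputs": []
},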
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"A note on cross validation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With most machine learning problems, it pays to split the training data into a training set and a validation set to get a better estimate of the test set error. With VW this is less of a requirement for a single pass over the data. VW does not run cross validation in the usual sense. Instead, it predicts each example before learning from it and reports the running average of those losses as 'average loss' in the output. This quantity is the [progressive validation loss](http://hunch.net/~jl/projects/prediction_bounds/progressive_validation/coltfinal.pdf). The critical thing to understand here is that progressive validation loss deviates like a test set loss, and hence is a reliable indicator of performance on the first pass over any data set. With multiple passes this no longer holds, because the model has already seen the examples it is scored on."
]
},
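{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the idea concrete, here is a minimal sketch of progressive validation on a small synthetic stream. It uses plain stochastic gradient descent on the logistic loss, which is an assumption made for illustration and not VW's actual update rule. The function name `progressive_validation`, the synthetic data and the learning rate are likewise made up for this example. The key point is the order of operations: each example is scored with the current weights first, and only then used for learning."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import math\n",
"import random\n",
"\n",
"def progressive_validation(examples, learning_rate=0.1):\n",
"    # Online logistic regression: score each example BEFORE learning from it\n",
"    weights = [0.0] * len(examples[0][0])\n",
"    total_loss = 0.0\n",
"    for features, label in examples:\n",
"        margin = sum(w * x for w, x in zip(weights, features))\n",
"        # The loss is measured with weights that have not yet seen this example\n",
"        total_loss += math.log(1.0 + math.exp(-label * margin))\n",
"        # Plain SGD step on the logistic loss (labels are -1 or +1)\n",
"        grad_scale = -label / (1.0 + math.exp(label * margin))\n",
"        weights = [w - learning_rate * grad_scale * x for w, x in zip(weights, features)]\n",
"    return total_loss / len(examples)\n",
"\n",
"rng = random.Random(0)\n",
"examples = []\n",
"for _ in range(5000):\n",
"    u = rng.uniform(-1.0, 1.0)\n",
"    # Synthetic stream: a bias feature plus one informative feature\n",
"    examples.append(([1.0, u], 1 if u > 0 else -1))\n",
"\n",
"avg_loss = progressive_validation(examples)\n",
"print('progressive validation loss: %.4f' % avg_loss)"
],
"language": "python",
"metadata": {},
"outputs": []
},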
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"3.3 Analysis using VW"
]
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"Picking a loss function"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One of the most important aspects of using Vowpal Wabbit successfully is picking the right loss function for the algorithm to optimize. VW is an online learner: it processes the training examples one at a time and updates the model after each one so as to reduce this loss.\n",
"\n",
"Vowpal Wabbit has five loss functions:\n",
"\n",
"1. Squared loss. Useful for regression problems, when minimizing expectation. For example: expected return on a stock.\n",
"2. Classic loss. Vanilla squared loss (without the importance weight aware update).\n",
"3. Quantile loss. Useful for regression problems where a quantile such as the median is the target. For example: predicting house prices.\n",
"4. Hinge loss. Useful for classification problems, answering a yes/no question (closest 0-1 approximation). For example: keyword_tag or not.\n",
"5. Log loss. Useful for classification problems where the minimizer is a probability. For example: probability of a click on an ad.\n",
"\n",
"\n",
"Since we are predicting the probability of a click on an ad, log loss is the most suitable choice. As stated earlier, the competition is also evaluated on log loss, so we train on the same metric we are scored on. "
]
},
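{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the losses listed above can be sketched in their textbook forms as follows. These are illustrative definitions written for this notebook, not VW's source code, and the function names are our own. Classification labels are assumed to be -1 or +1, as in the .vw files, and `prediction` is the raw score before any link function is applied."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import math\n",
"\n",
"def squared_loss(prediction, label):\n",
"    return (prediction - label) ** 2\n",
"\n",
"def quantile_loss(prediction, label, tau=0.5):\n",
"    # Penalizes under- and over-prediction asymmetrically; tau=0.5 targets the median\n",
"    error = label - prediction\n",
"    return tau * error if error > 0 else -(1.0 - tau) * error\n",
"\n",
"def hinge_loss(prediction, label):\n",
"    # Zero once the example is on the correct side with a margin of at least 1\n",
"    return max(0.0, 1.0 - label * prediction)\n",
"\n",
"def logistic_loss(prediction, label):\n",
"    # Log loss written in terms of the raw score and a -1/+1 label\n",
"    return math.log(1.0 + math.exp(-label * prediction))\n",
"\n",
"losses = [('squared', squared_loss), ('quantile', quantile_loss),\n",
"          ('hinge', hinge_loss), ('logistic', logistic_loss)]\n",
"for name, loss in losses:\n",
"    print('%-8s loss at prediction=0.0, label=-1: %.6f' % (name, loss(0.0, -1)))"
],
"language": "python",
"metadata": {},
"outputs": []
},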
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Model 1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We start by training a simple vanilla model, leaving VW's other settings at their defaults, to generate the initial set of predictions with the following command. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!vw train_full.vw -f avazu.model.vw --loss_function logistic"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"final_regressor = avazu.model.vw\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Num weight bits = 18\r\n",
"learning rate = 0.5\r\n",
"initial_t = 0\r\n",
"power_t = 0.5\r\n",
"using no cache\r\n",
"Reading datafile = train_full.vw\r\n",
"num sources = 1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"average since example example current current current\r\n",
"loss last counter weight label predict features\r\n",
"0.693147 0.693147 1 1.0 -1.0000 0.0000 27\r\n",
"0.458050 0.222953 2 2.0 -1.0000 -1.3872 27\r\n",
"0.274633 0.091217 4 4.0 -1.0000 -2.5609 27\r\n",
"0.180545 0.086458 8 8.0 -1.0000 -2.5604 27\r\n",
"0.358718 0.536891 16 16.0 -1.0000 -2.7296 27\r\n",
"0.400848 0.442978 32 32.0 -1.0000 -1.7490 27\r\n",
"0.530634 0.660419 64 64.0 -1.0000 -0.8129 27\r\n",
"0.536752 0.542870 128 128.0 -1.0000 -1.4927 27\r\n",
"0.490036 0.443320 256 256.0 1.0000 -1.8038 27\r\n",
"0.460721 0.431407 512 512.0 -1.0000 -2.7424 27\r\n",
"0.435663 0.410605 1024 1024.0 -1.0000 -1.4675 27\r\n",
"0.431234 0.426805 2048 2048.0 -1.0000 -2.9783 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.433299 0.435365 4096 4096.0 -1.0000 -2.2066 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.425758 0.418216 8192 8192.0 -1.0000 -2.2592 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.416503 0.407248 16384 16384.0 -1.0000 -2.8503 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.417421 0.418338 32768 32768.0 1.0000 -1.4954 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.411354 0.405288 65536 65536.0 -1.0000 -2.8381 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.406810 0.402266 131072 131072.0 -1.0000 -4.1793 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.401592 0.396373 262144 262144.0 -1.0000 -2.7953 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.384311 0.367031 524288 524288.0 -1.0000 -1.9057 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.376373 0.368435 1048576 1048576.0 -1.0000 -1.2119 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.378080 0.379787 2097152 2097152.0 -1.0000 -3.3367 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.399552 0.421024 4194304 4194304.0 -1.0000 -3.6283 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.382552 0.365553 8388608 8388608.0 -1.0000 -0.6560 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.397118 0.411683 16777216 16777216.0 -1.0000 -0.8555 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.395913 0.394709 33554432 33554432.0 1.0000 -0.7646 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r\n",
"finished run\r\n",
"number of examples per pass = 40428967\r\n",
"passes used = 1\r\n",
"weighted example sum = 4.0429e+07\r\n",
"weighted label sum = -2.66988e+07\r\n",
"average loss = 0.394479\r\n",
"best constant = -0.660389\r\n",
"total feature number = 1091582109\r\n"
]
}
],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The -f option saves the learned model to avazu.model.vw so that it can be used later for predictions. --loss_function logistic tells VW to optimize logistic loss, i.e. log loss. \n",
"\n",
"The output of the training process is shown above. The final average loss of 0.394479 is the progressive validation log loss over the single pass. A detailed explanation of the various numbers can be found in the VW tutorial linked at the beginning. "
]
},
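{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sanity check on how to read these numbers, the sketch below recomputes two of them by hand. It assumes that the 'current predict' column holds the raw score, which a sigmoid turns into a click probability, and that labels are -1 (no click) or +1 (click). Under those assumptions the second row of the log (label -1, prediction -1.3872) gives a log loss that agrees with the printed 0.222953 to about four decimal places, and the ratio of the weighted label sum to the weighted example sum matches the reported best constant. The helper name `sigmoid` is our own."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import math\n",
"\n",
"def sigmoid(score):\n",
"    # Maps a raw score to a probability between 0 and 1\n",
"    return 1.0 / (1.0 + math.exp(-score))\n",
"\n",
"# Second row of the training log above: label -1, raw prediction -1.3872\n",
"p_click = sigmoid(-1.3872)\n",
"row_loss = -math.log(1.0 - p_click)  # log loss for a non-click\n",
"print('P(click) = %.4f, log loss = %.4f' % (p_click, row_loss))\n",
"\n",
"# Summary block: weighted label sum / weighted example sum is the mean label\n",
"mean_label = -2.66988e+07 / 4.0429e+07\n",
"# With labels in {-1, +1}, mean label = 2 * click rate - 1\n",
"click_rate = (mean_label + 1.0) / 2.0\n",
"print('mean label = %.4f, implied click rate = %.4f' % (mean_label, click_rate))"
],
"language": "python",
"metadata": {},
"outputs": []
},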
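{
"cell_type": "markdown",
"metadata": {},
"source": [
"We then predict the test set output with the following command. The -t option runs VW in test-only mode (no learning), -i loads the saved model, --link logistic maps the raw scores to probabilities between 0 and 1, and -p writes the predictions to avazu.preds.txt. Note that the loss printed during this run is not meaningful, because the test file only carries a placeholder label of 1. "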
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!vw test.vw -t -i avazu.model.vw --link logistic -p avazu.preds.txt"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"only testing\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Num weight bits = 18\r\n",
"learning rate = 10\r\n",
"initial_t = 1\r\n",
"power_t = 0.5\r\n",
"predictions = avazu.preds.txt\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"using no cache\r\n",
"Reading datafile = test.vw\r\n",
"num sources = 1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"average since example example current current current\r\n",
"loss last counter weight label predict features\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"8.694021 8.694021 1 1.0 1.0000 0.1247 27\r\n",
"8.727754 8.761487 2 2.0 1.0000 0.1235 27\r\n",
"9.583062 10.438371 4 4.0 1.0000 0.0615 27\r\n",
"10.341092 11.099122 8 8.0 1.0000 0.0180 27\r\n",
"9.568561 8.796030 16 16.0 1.0000 0.1548 27\r\n",
"9.822393 10.076225 32 32.0 1.0000 0.0604 27\r\n",
"9.959246 10.096099 64 64.0 1.0000 0.1584 27\r\n",
"10.545743 11.132239 128 128.0 1.0000 0.0723 27\r\n",
"11.866919 13.188095 256 256.0 1.0000 0.0439 27\r\n",
"12.152296 12.437672 512 512.0 1.0000 0.1457 27\r\n",
"12.365687 12.579078 1024 1024.0 1.0000 0.0570 27\r\n",
"12.336797 12.307907 2048 2048.0 1.0000 0.0174 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"12.388471 12.440146 4096 4096.0 1.0000 0.0488 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"12.471634 12.554796 8192 8192.0 1.0000 0.1106 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"12.458044 12.444454 16384 16384.0 1.0000 0.1175 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"12.532114 12.606185 32768 32768.0 1.0000 0.0135 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"12.491542 12.450970 65536 65536.0 1.0000 0.2037 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"12.514672 12.537803 131072 131072.0 1.0000 0.0645 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"13.046750 13.578828 262144 262144.0 1.0000 0.1653 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"12.012632 10.978514 524288 524288.0 1.0000 0.1626 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10.884075 9.755517 1048576 1048576.0 1.0000 0.0444 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"9.990137 9.096199 2097152 2097152.0 1.0000 0.0551 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"9.872667 9.755197 4194304 4194304.0 1.0000 0.0991 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r\n",
"finished run\r\n",
"number of examples per pass = 4577464\r\n",
"passes used = 1\r\n",
"weighted example sum = 4.57746e+06\r\n",
"weighted label sum = 4.57746e+06\r\n",
"average loss = 10.0096\r\n",
"best constant = 1\r\n",
"total feature number = 123591528\r\n"
]
}
],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The -t option tells VW to test only and not train. -i avazu.model.vw loads the model that we learned during training. -p saves our predictions to avazu.preds.txt. --link logistic applies the sigmoid transformation so that each prediction lies in the range 0 to 1, as required by the submission format for the competition. Note that every label shown in the test output above is 1, i.e. a placeholder, so the average loss of 10.0096 printed there should not be read as a test score."
]
},
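{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the sigmoid transformation concrete, here is a small worked sketch. The symbols are assumptions made for illustration: $z$ stands for the raw score the model produces and $p$ for the value written to the predictions file.\n",
"\n",
"$$p = \\frac{1}{1 + e^{-z}}$$\n",
"\n",
"For example, a raw score of $z = -1.9057$ maps to $p = 1 / (1 + e^{1.9057}) \\approx 0.129$, while $z = 0$ maps to $p = 0.5$. Every output therefore lies strictly between 0 and 1 and can be read as a click probability."
]
},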
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"Convert Data to Kaggle submission format"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now convert the predictions obtained in the avazu.preds.txt file to the Kaggle submission format with the following code. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import sys\n",
"\n",
"# Each line of the VW predictions file has the form \"<prediction> <id>\".\n",
"with open(sys.argv[1]) as infile, open(\"submission.csv\", \"w\") as outfile:\n",
"\toutfile.write(\"id,click\\n\")\n",
"\tfor line in infile:\n",
"\t\trow = line.strip().split(\" \")\n",
"\t\toutfile.write(\"%s,%f\\n\" % (row[1], float(row[0])))"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We save the above to a file named vw2kaggle.py and then execute it with ```pypy``` as follows."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!pypy vw2kaggle.py avazu.preds.txt"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 9
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's have a look at the submission file to check its content. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!head -n 10 submission.csv"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"id,click\r\n",
"10000174058809263569,0.124710\r\n",
"10000182526920855428,0.123469\r\n",
"10000554139829213984,0.161742\r\n",
"10001094637809798845,0.061488\r\n",
"10001377041558670745,0.260012\r\n",
"10001521204153353724,0.194398\r\n",
"10001911056707023378,0.113664\r\n",
"10001982898844213216,0.017951\r\n",
"10002000217531288531,0.050727\r\n"
]
}
],
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Upon submission this gave a leaderboard score of 0.3991901, which is close to the average loss of 0.394479 reported by our learner in the training phase. This supports our earlier approximation of using the average training loss as a guide to the validation loss. This submission also put me around the 50% mark of the leaderboard. "
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"3.4 Fine Tuning the Learning Model in VW"
]
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"3.4.1 Number of bits (-b option)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The training output of Model 1 reports a total feature number of 1091582109. VW arrives at this figure by summing the features of every example (40428967 examples with 27 features each), so it is an upper bound on the number of distinct features rather than an exact count. \n",
"\n",
"VW hashes feature names into a 2^b dimensional space. By default it uses 18 bits, so that\u2019s about 262k possible features. If we have more than that, they will collide, meaning that the software won\u2019t be able to distinguish between some of them. Fortunately we can increase the number of bits used for hashing so that we can get millions of features. Since our upper bound is roughly 1.09 billion, a value of b in the high twenties is the right order of magnitude: 2^28 = 268435456, while 2^30 = 1073741824 is close to the upper bound itself. We chose b=30 to leave a buffer against feature collisions. "
]
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"3.4.2 Tuning the learning rate (-l option)"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Model 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"vw-hypersearch is a simple wrapper around vw that helps find the hyper-parameter value giving the lowest loss (argmin).\n",
"\n",
"For example, to find the lowest average loss for --l1 (L1-norm regularization) on a train-set called train.dat:\n",
"\n",
"    $ vw-hypersearch 1e-10 5e-4 vw --l1 % train.dat\n",
"\n",
"vw-hypersearch will train multiple times (but in an efficient way) until it finds the --l1 value resulting in the lowest average training loss.\n",
"\n",
"Explanation of the example:\n",
"\n",
"- The % character is a placeholder for the (argmin) parameter we are looking for.\n",
"- 1e-10 is the lower bound of the search range.\n",
"- 5e-4 is the upper bound of the search range.\n",
"\n",
"The lower and upper bounds are arguments to vw-hypersearch. Everything from vw onwards is a normal vw argument, exactly as one would use in training. The only change one must apply to the training command is to use % instead of the value of the parameter we are trying to optimize.\n",
"\n",
"In order to find the best learning rate for our model which gives the lowest average loss we run the following VW command. It took around 2 hours to run on my MacBook 2012 with 4GB RAM. "
]
},
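{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before running the search, a brief sketch of how it narrows the range may help. vw-hypersearch uses golden-section search. The symbols below are assumptions introduced for illustration: $a$ and $b$ are the lower and upper bounds, $r = (\\sqrt{5} - 1)/2 \\approx 0.618034$, and $u$ and $v$ are the first two trial values.\n",
"\n",
"$$u = b - r(b - a), \\qquad v = a + r(b - a)$$\n",
"\n",
"For a range of 0 to 1 these are $u \\approx 0.381966$ and $v \\approx 0.618034$. The part of the range beyond the worse trial is discarded and one new trial is placed in what remains. If $v$ has the lower loss, the range becomes $[u, b]$ and the new trial is $u + r(b - u) \\approx 0.381966 + 0.618034 \\times 0.618034 \\approx 0.763932$. Each step shrinks the range by the factor $r$ while needing only one new training run."
]
},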
{
"cell_type": "code",
"collapsed": false,
"input": [
"!vw-hypersearch 0.0 1 vw -b 30 --loss_function logistic --learning_rate % train_full.vw"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"trying 0.618033988749895 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"......."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".........."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"..."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"..."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"....."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 0.387706 (best)\r\n",
"trying 0.381966011250105 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"......."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".........."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"...."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"......."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 0.389596\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"trying 0.76393202250021 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"......."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"...."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"......."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"........"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 0.386971 (best)\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"trying 0.854101966249684 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"....."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"........."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"........."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 0.386623 (best)\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"trying 0.909830056250526 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"......."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"........."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"........."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 0.386441 (best)\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"trying 0.944271909999159 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"......."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"..........."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"........."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 0.38634 (best)\r\n",
"trying 0.965558146251367 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"......."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".........."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"......."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 0.386281 (best)\r\n",
"trying 0.978713763747792 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"......."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".........."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"........."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 0.386246 (best)\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"trying 0.986844382503575 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"......."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"........."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"......"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"..."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 0.386225 (best)\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"trying 0.991869381244217 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"......."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"........."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"......."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 0.386213 (best)\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"trying 0.994975001259358 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"......."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"........."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"....."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 0.386205 (best)\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"trying 0.996894379984858 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"......."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"..........."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"...."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"....."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 0.3862 (best)\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"trying 0.9980806212745 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"......."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".........."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"........."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 0.386197 (best)\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"trying 0.998813758710358 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"......."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"........."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"......."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 0.386195 (best)\r\n",
"trying 0.999266862564143 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"....."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"......."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"...."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"....."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 0.386194 (best)\r\n",
"trying 0.999546896146215 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"......."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"........."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"........."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 0.386194 (best)\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"vw-hypersearch: loss(0.999547) == loss(0.999267): 0.386194\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"trying 0.999406879355179 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"......."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"......"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"......."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"........."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 0.386194 (best)\r\n",
"0.999407\t0.386194\r\n"
]
}
],
"prompt_number": 11
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"At the end of the search, vw-hypersearch reports that a learning rate of 0.999407 gives the lowest average loss (0.386194). "
]
},
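{
"cell_type": "markdown",
"metadata": {},
"source": [
"The probe values in the hypersearch logs are consistent with a golden-section search: in the --l1 search later in this notebook, the first two probes of the range [1e-9, 1] are 0.618034 and 0.381966, the golden-ratio points of that interval. The sketch below is a minimal illustration of that idea in Python, assuming a loss with a single minimum in the range; it is not the actual vw-hypersearch code, and `golden_section_minimize` and `toy_loss` are hypothetical names.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"INV_PHI = (math.sqrt(5) - 1) / 2  # ~0.618, the golden-ratio conjugate\n",
"\n",
"\n",
"def golden_section_minimize(loss, lo, hi, tol=1e-6):\n",
"    '''Return an approximate minimiser of a unimodal loss on [lo, hi].'''\n",
"    a, b = lo, hi\n",
"    x1 = b - INV_PHI * (b - a)  # lower probe, ~0.382 of the way along\n",
"    x2 = a + INV_PHI * (b - a)  # upper probe, ~0.618 of the way along\n",
"    f1, f2 = loss(x1), loss(x2)\n",
"    while b - a > tol:\n",
"        if f1 < f2:\n",
"            # Minimum lies in [a, x2]; the old lower probe becomes the upper one.\n",
"            b, x2, f2 = x2, x1, f1\n",
"            x1 = b - INV_PHI * (b - a)\n",
"            f1 = loss(x1)\n",
"        else:\n",
"            # Minimum lies in [x1, b]; the old upper probe becomes the lower one.\n",
"            a, x1, f1 = x1, x2, f2\n",
"            x2 = a + INV_PHI * (b - a)\n",
"            f2 = loss(x2)\n",
"    return (a + b) / 2\n",
"\n",
"\n",
"# Toy stand-in for 'train VW with this learning rate and read the average loss'.\n",
"def toy_loss(rate):\n",
"    return (rate - 0.3) ** 2\n",
"\n",
"\n",
"best_rate = golden_section_minimize(toy_loss, 0.0, 1.0)\n",
"print(round(best_rate, 4))\n",
"```\n",
"\n",
"In the real search each call to the loss function is a full VW training run, which is why reusing one probe per iteration matters."
]
},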
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now train our model with this learning rate as follows."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!vw train_full.vw -b 30 -l 0.999407 -f model-b30-l.vw --loss_function logistic"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"final_regressor = model-b30-l.vw\r\n",
"Num weight bits = 30\r\n",
"learning rate = 0.999407\r\n",
"initial_t = 0\r\n",
"power_t = 0.5\r\n",
"using no cache\r\n",
"Reading datafile = train_full.vw\r\n",
"num sources = 1\r\n",
"average since example example current current current\r\n",
"loss last counter weight label predict features\r\n",
"0.693147 0.693147 1 1.0 -1.0000 0.0000 27\r\n",
"0.409970 0.126793 2 2.0 -1.0000 -2.0011 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.227020 0.044070 4 4.0 -1.0000 -3.3033 27\r\n",
"0.138481 0.049943 8 8.0 -1.0000 -3.1471 27\r\n",
"0.392604 0.646727 16 16.0 -1.0000 -2.8508 27\r\n",
"0.443221 0.493838 32 32.0 -1.0000 -1.5859 27\r\n",
"0.569091 0.694960 64 64.0 -1.0000 -0.9992 27\r\n",
"0.579374 0.589657 128 128.0 -1.0000 -1.6149 27\r\n",
"0.514951 0.450528 256 256.0 1.0000 -1.8114 27\r\n",
"0.480296 0.445641 512 512.0 -1.0000 -3.6789 27\r\n",
"0.453998 0.427700 1024 1024.0 -1.0000 -1.7688 27\r\n",
"0.447397 0.440796 2048 2048.0 -1.0000 -3.4554 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.444261 0.441125 4096 4096.0 -1.0000 -1.9968 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.434646 0.425031 8192 8192.0 -1.0000 -2.0707 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.422058 0.409470 16384 16384.0 -1.0000 -2.6599 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.420672 0.419287 32768 32768.0 1.0000 -1.5586 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.412430 0.404188 65536 65536.0 -1.0000 -3.2506 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.406065 0.399699 131072 131072.0 -1.0000 -4.6054 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.399577 0.393090 262144 262144.0 -1.0000 -2.7182 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.382006 0.364434 524288 524288.0 -1.0000 -2.0122 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.373160 0.364315 1048576 1048576.0 -1.0000 -1.0706 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.373796 0.374431 2097152 2097152.0 -1.0000 -3.3859 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.394055 0.414315 4194304 4194304.0 -1.0000 -3.7731 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.375953 0.357851 8388608 8388608.0 -1.0000 -0.6535 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.389097 0.402241 16777216 16777216.0 -1.0000 -0.6822 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.387660 0.386223 33554432 33554432.0 1.0000 -0.9213 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r\n",
"finished run\r\n",
"number of examples per pass = "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"40428967\r\n",
"passes used = 1\r\n",
"weighted example sum = 4.0429e+07\r\n",
"weighted label sum = -2.66988e+07\r\n",
"average loss = 0.386194\r\n",
"best constant = -0.660389\r\n",
"total feature number = 1091582109\r\n"
]
}
],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we get an average loss of 0.386194, which is better than that of our plain vanilla Model 1 (0.394479) tried earlier. \n",
"\n",
"We can then predict on the test set and generate the submission file using the following two commands. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!vw test.vw -t -i model-b30-l.vw --link logistic -p preds-b30-l.txt"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"only testing\r\n",
"Num weight bits = 30\r\n",
"learning rate = 10\r\n",
"initial_t = 1\r\n",
"power_t = 0.5\r\n",
"predictions = preds-b30-l.txt\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"using no cache\r\n",
"Reading datafile = test.vw\r\n",
"num sources = 1\r\n",
"average since example example current current current\r\n",
"loss last counter weight label predict features\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"12.273828 12.273828 1 1.0 1.0000 0.0756 27\r\n",
"11.649576 11.025324 2 2.0 1.0000 0.0894 27\r\n",
"15.738128 19.826681 4 4.0 1.0000 0.0100 27\r\n",
"15.530900 15.323672 8 8.0 1.0000 0.0073 27\r\n",
"14.530898 13.530895 16 16.0 1.0000 0.0688 27\r\n",
"13.962127 13.393357 32 32.0 1.0000 0.0381 27\r\n",
"14.330523 14.698918 64 64.0 1.0000 0.0956 27\r\n",
"14.210387 14.090252 128 128.0 1.0000 0.0352 27\r\n",
"16.509169 18.807951 256 256.0 1.0000 0.0651 27\r\n",
"16.354505 16.199841 512 512.0 1.0000 0.1227 27\r\n",
"16.653517 16.952528 1024 1024.0 1.0000 0.0366 27\r\n",
"16.548226 16.442935 2048 2048.0 1.0000 0.0103 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"16.662457 16.776689 4096 4096.0 1.0000 0.0439 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"16.723325 16.784193 8192 8192.0 1.0000 0.0546 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"16.675594 16.627863 16384 16384.0 1.0000 0.0684 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"16.739274 16.802954 32768 32768.0 1.0000 0.0116 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"16.690641 16.642009 65536 65536.0 1.0000 0.0864 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"16.875157 17.059673 131072 131072.0 1.0000 0.0402 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"17.628174 18.381192 262144 262144.0 1.0000 0.1118 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"15.692613 13.757051 524288 524288.0 1.0000 0.1239 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"12.597286 9.501959 1048576 1048576.0 1.0000 0.1167 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10.301880 8.006474 2097152 2097152.0 1.0000 0.0771 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"9.699953 9.098027 4194304 4194304.0 1.0000 0.1192 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r\n",
"finished run\r\n",
"number of examples per pass = 4577464\r\n",
"passes used = 1\r\n",
"weighted example sum = 4.57746e+06\r\n",
"weighted label sum = 4.57746e+06\r\n",
"average loss = 9.93796\r\n",
"best constant = 1\r\n",
"total feature number = 123591528\r\n"
]
}
],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!pypy vw2kaggle.py preds-b30-l.txt"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 6
},
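{
"cell_type": "markdown",
"metadata": {},
"source": [
"A note on the numbers in these logs: during training VW prints raw scores and the logistic loss for labels coded as -1 and +1, while `--link logistic` at prediction time maps raw scores to probabilities. The sketch below uses the standard formulas (it is not VW source code, and the helper names are hypothetical) to reproduce the first entries of the training log above.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"\n",
"def logistic_loss(label, raw_score):\n",
"    '''Logistic loss for a label in {-1, +1} and a raw (pre-link) score.'''\n",
"    return math.log1p(math.exp(-label * raw_score))\n",
"\n",
"\n",
"def to_probability(raw_score):\n",
"    '''Logistic link: map a raw score to a probability in (0, 1).'''\n",
"    return 1.0 / (1.0 + math.exp(-raw_score))\n",
"\n",
"\n",
"# First two progress lines of the training log: label -1, raw scores 0.0 and -2.0011.\n",
"print(round(logistic_loss(-1, 0.0), 6))\n",
"print(round(logistic_loss(-1, -2.0011), 4))\n",
"# Last raw score shown in the training log, converted to a click probability.\n",
"print(round(to_probability(-0.9213), 4))\n",
"```\n",
"\n",
"The average loss printed in test mode is not meaningful here: the weighted label sum equals the weighted example sum, so every test example appears to carry a placeholder label of 1."
]
},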
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Interestingly, this submission gave a leaderboard score of 0.4074371, which is worse than that of the plain vanilla Model 1, even though the average loss reported during training improved. "
]
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"3.4.3 Applying Regularization (--l1/--l2)"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Model 3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because the resulting number of features is very large (of the order of 10^7, as can be seen from the output of the learning models), we add a bit of regularization so that our model does not overfit the training data. We look for the best value of --l1 using vw-hypersearch as before. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!vw-hypersearch 0.000000001 1 vw -b 30 --loss_function logistic --l1 % train_full.vw"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"vw-hypersearch: you may get better results with -L (log-space search)\r\n",
"\t\twhen any of --l1/hinge-loss/small-param-values are used\r\n",
"trying 0.618033989131861 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"...................."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"..."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"......."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 0.693147 (best)\r\n",
"trying 0.381966011868139 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"..................."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"......."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 0.693147 (best)\r\n",
"vw-hypersearch: loss(0.618034) == loss(0.381966): 0.693147\r\n",
"trying 0.5000000005 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"...."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"...."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"..........."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
".."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"..."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"...."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 0.693147 (best)\r\n",
"0.5\t0.693147\r\n"
]
}
],
"prompt_number": 2
},
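{
"cell_type": "markdown",
"metadata": {},
"source": [
"The search reports the same average loss, 0.693147, for every --l1 value it tried. That number equals ln 2, the logistic loss of a model that always predicts a raw score of 0 (it is also the loss on the very first training example in the logs above). This suggests that all the values probed (0.38 to 0.62) were strong enough to shrink the weights to zero, rather than that regularization adds no value. As the vw-hypersearch warning notes, a log-space search (-L) would probably explore the small-value end of the range better. \n",
"\n",
"In a model with a large number of features (like ours) it is generally advisable to add a bit of regularization to avoid overfitting, so we add a very small --l1 value of 0.00000001 to our model. We now run our model with this value of --l1 as follows."
]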
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!vw train_full.vw -b 30 -l 0.999407 --l1 0.00000001 -f model-b30-l-l1.vw --loss_function logistic "
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"using l1 regularization = 1e-08\r\n",
"final_regressor = model-b30-l-l1.vw\r\n",
"Num weight bits = 30\r\n",
"learning rate = 0.999407\r\n",
"initial_t = 0\r\n",
"power_t = 0.5\r\n",
"using no cache\r\n",
"Reading datafile = train_full.vw\r\n",
"num sources = 1\r\n",
"average since example example current current current\r\n",
"loss last counter weight label predict features\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.693147 0.693147 1 1.0 -1.0000 0.0000 27\r\n",
"0.409970 0.126793 2 2.0 -1.0000 -2.0011 27\r\n",
"0.227020 0.044070 4 4.0 -1.0000 -3.3033 27\r\n",
"0.138481 0.049943 8 8.0 -1.0000 -3.1471 27\r\n",
"0.392604 0.646727 16 16.0 -1.0000 -2.8508 27\r\n",
"0.443221 0.493838 32 32.0 -1.0000 -1.5859 27\r\n",
"0.569091 0.694960 64 64.0 -1.0000 -0.9992 27\r\n",
"0.579374 0.589656 128 128.0 -1.0000 -1.6149 27\r\n",
"0.514951 0.450528 256 256.0 1.0000 -1.8115 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.480296 0.445641 512 512.0 -1.0000 -3.6788 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.453997 0.427698 1024 1024.0 -1.0000 -1.7688 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.447395 0.440792 2048 2048.0 -1.0000 -3.4553 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.444256 0.441117 4096 4096.0 -1.0000 -1.9970 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.434639 0.425023 8192 8192.0 -1.0000 -2.0707 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.422050 0.409461 16384 16384.0 -1.0000 -2.6602 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.420667 0.419285 32768 32768.0 1.0000 -1.5584 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.412426 0.404185 65536 65536.0 -1.0000 -3.2491 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.406063 0.399700 131072 131072.0 -1.0000 -4.6018 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.399582 0.393100 262144 262144.0 -1.0000 -2.7171 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.382007 0.364432 524288 524288.0 -1.0000 -2.0076 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.373174 0.364341 1048576 1048576.0 -1.0000 -1.0797 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.373839 0.374505 2097152 2097152.0 -1.0000 -3.3702 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.394181 0.414523 4194304 4194304.0 -1.0000 -3.7544 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.376235 0.358288 8388608 8388608.0 -1.0000 -0.6966 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.389913 0.403591 16777216 16777216.0 -1.0000 -0.7033 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.389299 0.388685 33554432 33554432.0 1.0000 -0.9818 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r\n",
"finished run\r\n",
"number of examples per pass = "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"40428967\r\n",
"passes used = 1\r\n",
"weighted example sum = 4.0429e+07\r\n",
"weighted label sum = -2.66988e+07\r\n",
"average loss = 0.388106\r\n",
"best constant = -0.660389\r\n",
"total feature number = 1091582109\r\n"
]
}
],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We also try one model with --l2 regularization as below, and in fact find that l2 regularization works better here than l1."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!vw train_full.vw -b 30 -l 0.999407 --l2 0.00000001 -f model-b30-l-l2.vw --loss_function logistic "
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"using l2 regularization = 1e-08\r\n",
"final_regressor = model-b30-l-l2.vw\r\n",
"Num weight bits = 30\r\n",
"learning rate = 0.999407\r\n",
"initial_t = 0\r\n",
"power_t = 0.5\r\n",
"using no cache\r\n",
"Reading datafile = train_full.vw\r\n",
"num sources = 1\r\n",
"average since example example current current current\r\n",
"loss last counter weight label predict features\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.693147 0.693147 1 1.0 -1.0000 0.0000 27\r\n",
"0.409970 0.126793 2 2.0 -1.0000 -2.0011 27\r\n",
"0.227020 0.044070 4 4.0 -1.0000 -3.3033 27\r\n",
"0.138481 0.049943 8 8.0 -1.0000 -3.1471 27\r\n",
"0.392604 0.646727 16 16.0 -1.0000 -2.8508 27\r\n",
"0.443221 0.493838 32 32.0 -1.0000 -1.5859 27\r\n",
"0.569091 0.694960 64 64.0 -1.0000 -0.9992 27\r\n",
"0.579374 0.589657 128 128.0 -1.0000 -1.6149 27\r\n",
"0.514951 0.450528 256 256.0 1.0000 -1.8115 27\r\n",
"0.480296 0.445641 512 512.0 -1.0000 -3.6789 27\r\n",
"0.453998 0.427700 1024 1024.0 -1.0000 -1.7687 27\r\n",
"0.447397 0.440796 2048 2048.0 -1.0000 -3.4554 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.444260 0.441124 4096 4096.0 -1.0000 -1.9968 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.434645 0.425030 8192 8192.0 -1.0000 -2.0707 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.422057 0.409469 16384 16384.0 -1.0000 -2.6599 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.420672 0.419286 32768 32768.0 1.0000 -1.5586 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.412430 0.404187 65536 65536.0 -1.0000 -3.2504 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.406064 0.399699 131072 131072.0 -1.0000 -4.6042 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.399576 0.393088 262144 262144.0 -1.0000 -2.7179 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.382001 0.364426 524288 524288.0 -1.0000 -2.0115 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.373149 0.364297 1048576 1048576.0 -1.0000 -1.0726 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.373772 0.374396 2097152 2097152.0 -1.0000 -3.3835 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.394002 0.414231 4194304 4194304.0 -1.0000 -3.7604 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.375869 0.357736 8388608 8388608.0 -1.0000 -0.6617 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.388928 0.401987 16777216 16777216.0 -1.0000 -0.6949 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.387333 0.385739 33554432 33554432.0 1.0000 -0.9512 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r\n",
"finished run\r\n",
"number of examples per pass = "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"4042"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"8967\r\n",
"passes used = 1\r\n",
"weighted example sum = 4.0429e+07\r\n",
"weighted label sum = -2.66988e+07\r\n",
"average loss = 0.385823\r\n",
"best constant = -0.660389\r\n",
"total feature number = 1091582109\r\n"
]
}
],
"prompt_number": 9
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we can see, this gives a lower average loss (0.385823) than both the l1-regularized model (0.388106) and Model 2 (0.386194)."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!vw test.vw -t -i model-b30-l-l2.vw --link logistic -p preds-b30-l-l2.txt"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"only testing\r\n",
"Num weight bits = "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"30\r\n",
"learning rate = 10\r\n",
"initial_t = 1\r\n",
"power_t = 0.5\r\n",
"predictions = preds-b30-l-l2.txt\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"using no cache\r\n",
"Reading datafile = test.vw\r\n",
"num sources = 1\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"average since example example current current current\r\n",
"loss last counter weight label predict features\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"11.425002 11.425002 1 1.0 1.0000 0.0847 27\r\n",
"10.503748 9.582495 2 2.0 1.0000 0.1095 27\r\n",
"14.356219 18.208689 4 4.0 1.0000 0.0121 27\r\n",
"13.835917 13.315616 8 8.0 1.0000 0.0106 27\r\n",
"12.818223 11.800529 16 16.0 1.0000 0.0894 27\r\n",
"12.313324 11.808425 32 32.0 1.0000 0.0467 27\r\n",
"12.641553 12.969782 64 64.0 1.0000 0.1170 27\r\n",
"12.429421 12.217288 128 128.0 1.0000 0.0452 27\r\n",
"14.433907 16.438394 256 256.0 1.0000 0.0815 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"14.300987 14.168067 512 512.0 1.0000 0.1429 27\r\n",
"14.549103 14.797220 1024 1024.0 1.0000 0.0459 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"14.474095 14.399087 2048 2048.0 1.0000 0.0142 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"14.573010 14.671925 4096 4096.0 1.0000 0.0544 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"14.630715 14.688420 8192 8192.0 1.0000 0.0737 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"14.591898 14.553080 16384 16384.0 1.0000 0.0821 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"14.645670 14.699442 32768 32768.0 1.0000 0.0176 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"14.601456 14.557242 65536 65536.0 1.0000 0.1198 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"14.641540 14.681624 131072 131072.0 1.0000 0.0548 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"15.141720 15.641900 262144 262144.0 1.0000 0.1402 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"13.690059 12.238398 524288 524288.0 1.0000 0.1355 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"11.455350 9.220642 1048576 1048576.0 1.0000 0.0992 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"9.719304 7.983257 2097152 2097152.0 1.0000 0.0752 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"9.190696 8.662089 4194304 4194304.0 1.0000 0.1346 27\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r\n",
"finished run\r\n",
"number of examples per pass = "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"4577464\r\n",
"passes used = 1\r\n",
"weighted example sum = 4.57746e+06\r\n",
"weighted label sum = 4.57746e+06\r\n",
"average loss = 9.34768\r\n",
"best constant = 1\r\n",
"total feature number = 123591528\r\n"
]
}
],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!pypy vw2kaggle.py preds-b30-l-l2.txt"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"3.4.4 Increase the number of passes (--passes)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By increasing the number of times the algorithm will cycle over the training data, we allow VW to better fit the model and thus reduce the average loss further. However, due to the limitation of my hardware I was not able to try it out. "
]
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"3.4.5 Provide a higher weight to click data points"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we had seen earlier, the data is highly skewed, with only 17% clicks (6865066/40428967). So we decided to give more weight to the data points reflecting click behavior; however, that didn't give a better result in our final analysis. Hence we will not go into the details of this. "
]
},
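{
"cell_type": "markdown",
"metadata": {},
"source": [
"The 17% figure can be cross-checked against the VW training summaries above, which report `weighted label sum = -2.66988e+07` and `best constant = -0.660389`. A small sketch of the arithmetic, assuming clicks are labelled +1 and non-clicks -1 as the training logs suggest:\n",
"\n",
"```python\n",
"clicks = 6865066      # click count quoted in the text\n",
"examples = 40428967   # 'number of examples per pass' in the VW summary\n",
"\n",
"click_rate = clicks / examples\n",
"# With labels coded as +1 (click) and -1 (no click), the mean label is 2p - 1.\n",
"mean_label = 2 * click_rate - 1\n",
"# Sum of labels: +1 for each click, -1 for each non-click.\n",
"label_sum = clicks - (examples - clicks)\n",
"\n",
"print(round(click_rate, 4))\n",
"print(round(mean_label, 6))\n",
"print(label_sum)\n",
"```\n",
"\n",
"The label sum and mean label computed this way agree with the two values VW prints, so the class balance quoted here is consistent with what VW actually read."
]
},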
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"3.4.6 Incorporating quadratic terms"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Model 4"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Often, we want to be able to include interaction features between sets of features. This is a useful way to model nonlinearities in the data while still being able to use a linear learner like VW.\n",
"\n",
"VW has support for both two-way (quadratic) and three-way (cubic) interactions across feature namespaces. Quadratic features create a new namespace where each feature is the concatenation of two features, one from each namespace. The value of this feature is the product of the values of the features it is composed of. Quadratic features are specified with the -q option and the feature namespaces as arguments.\n",
"\n",
"We incorporate the quadratic terms in our model with the command below. '-q ::' tells VW to form interactions of each namespace in our data set with every other. Using quadratic interactions also poses a danger of overfitting, so we apply --l2 regularization to counter it."
]
},
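{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the construction concrete, here is a minimal sketch of how a quadratic interaction between two namespaces can be formed. The namespaces, feature names and values are made up for illustration, and `quadratic_features` is a hypothetical helper; VW hashes feature pairs internally rather than building strings, so the string join is only for readability.\n",
"\n",
"```python\n",
"from itertools import product\n",
"\n",
"\n",
"def quadratic_features(namespace_a, namespace_b):\n",
"    '''Cross every feature of one namespace with every feature of another.\n",
"\n",
"    Each namespace is a dict mapping feature name -> value. A crossed\n",
"    feature is named by joining the two names and valued by their product.\n",
"    '''\n",
"    return {\n",
"        name_a + '^' + name_b: value_a * value_b\n",
"        for (name_a, value_a), (name_b, value_b)\n",
"        in product(namespace_a.items(), namespace_b.items())\n",
"    }\n",
"\n",
"\n",
"# Hypothetical namespaces with two features each.\n",
"site = {'site_x': 1.0, 'site_y': 3.0}\n",
"device = {'device_u': 1.0, 'device_v': 2.0}\n",
"\n",
"crossed = quadratic_features(site, device)\n",
"for name, value in sorted(crossed.items()):\n",
"    print(name, value)\n",
"```\n",
"\n",
"Two namespaces with two features each yield four crossed features, which is why interactions inflate the per-example feature count so quickly."
]
},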
{
"cell_type": "code",
"collapsed": false,
"input": [
"!vw train_full.vw -b 30 -l 0.999407 --l2 0.00000001 -q :: -f model-b30-l-l2-q.vw --loss_function logistic"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"creating quadratic features for pairs: :: in pair creation\n",
"\n",
"final_regressor = avazu.model.vw\n",
"Num weight bits = 25\n",
"learning rate = 0.3945\n",
"initial_t = 0\n",
"power_t = 0.5\n",
"using no cache\n",
"Reading datafile = train_full.vw\n",
"num sources = 1\n",
"average since example example current current current\n",
"loss last counter weight label predict features\n",
"0.693147 0.693147 1 1.0 -1.0000 0.0000 601\n",
"0.394862 0.096578 2 2.0 -1.0000 -2.2887 601\n",
"0.209225 0.023588 4 4.0 -1.0000 -3.8691 601\n",
"0.144953 0.080681 8 8.0 -1.0000 -2.5628 601\n",
"0.429961 0.714969 16 16.0 -1.0000 -3.3126 601\n",
"0.474228 0.518494 32 32.0 -1.0000 -1.5024 601\n",
"0.585054 0.695881 64 64.0 -1.0000 -1.0034 601\n",
"0.622823 0.660591 128 128.0 -1.0000 -1.4885 601\n",
"0.560085 0.497348 256 256.0 1.0000 -1.7599 601\n",
"0.529467 0.498849 512 512.0 -1.0000 -4.3134 601\n",
"0.505159 0.480852 1024 1024.0 -1.0000 -3.0823 601\n",
"0.496035 0.486911 2048 2048.0 -1.0000 -4.3662 601\n",
"0.482713 0.469391 4096 4096.0 -1.0000 -0.9873 601\n",
"0.467771 0.452830 8192 8192.0 -1.0000 -1.2962 601\n",
"0.447862 0.427952 16384 16384.0 -1.0000 -2.1017 601\n",
"0.441907 0.435953 32768 32768.0 1.0000 -1.8154 601\n",
"0.427817 0.413728 65536 65536.0 -1.0000 -4.1079 601\n",
"0.416466 0.405114 131072 131072.0 -1.0000 -6.1644 601\n",
"0.406305 0.396145 262144 262144.0 -1.0000 -1.8602 601\n",
"0.387445 0.368586 524288 524288.0 -1.0000 -2.2565 601\n",
"0.376486 0.365527 1048576 1048576.0 -1.0000 -1.6468 601\n",
"0.375087 0.373687 2097152 2097152.0 -1.0000 -3.8969 601\n",
"0.393947 0.412807 4194304 4194304.0 -1.0000 -4.2576 601\n",
"0.374755 0.355562 8388608 8388608.0 -1.0000 -1.5612 601\n",
"0.386943 0.399131 16777216 16777216.0 -1.0000 -0.8215 601\n",
"0.385446 0.383950 33554432 33554432.0 1.0000 -0.9720 601\n",
"\n",
"finished run\n",
"number of examples per pass = 40428967\n",
"passes used = 1\n",
"weighted example sum = 4.0429e+07\n",
"weighted label sum = -2.66988e+07\n",
"average loss = 0.383988\n",
"best constant = -0.660389\n",
"total feature number = 24297809167"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This gives us an average loss of 0.383988, which is better than that of Model 3 (0.385823) and Model 2 (0.386194). "
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Submissions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For my submissions I chose the plain vanilla model optimized for learning rate and regularization, i.e. Model 3, and the one with quadratic interactions, i.e. Model 4, which gave me the best average loss among all my models. Of the two, my best submission turned out to be Model 3, but it put me only around the 45% mark of the leaderboard. Having said that, it was a great opportunity to further enhance my competence in VW and dive deep into its salient features. "
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Further scope of Work"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" 1. The data is very nonlinear, so one could also look into using neural nets to handle it. \n",
" 2. Clustering the data by device or domain IDs could be looked into, as one can expect some common behavior within these groups. "
]
}
],
"metadata": {}
}
]
}