@mmerce
Last active November 25, 2015 18:21
PyConES - BIGML command line tool, BigMLer
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
" # The Diabetes dataset: BigMLer example ![Python inside](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTSqpDdjXT-bX4POiYNUIj1gSfj9IjKnMklEWetncCQqsT2uWl6) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note**: Check the **quick start** section of [BigMLer's documentation](http://bigmler.readthedocs.org/en/latest/#quick-start) to know how to **install** and set your **credentials** before using BigMLer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Creating a prediction\n",
"The **prediction workflow** is just one line in BigMLer:\n",
"![prediction workflow](https://cloud.githubusercontent.com/assets/722203/11405481/5f8f4d00-93a8-11e5-93df-723b1eeb5abb.png)\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2015-11-24 19:44:08] Retrieving project info.\n",
"[2015-11-24 19:44:09] Creating source.\n",
"[2015-11-24 19:44:11] Source created: https://bigml.com/dashboard/source/5654aff93faa625ace0006b4\n",
"[2015-11-24 19:44:11] Creating dataset.\n",
"[2015-11-24 19:44:15] Dataset created: https://bigml.com/dashboard/dataset/5654affc3faa625ace0006b9\n",
"[2015-11-24 19:44:15] Creating model.\n",
"[2015-11-24 19:44:18] Model created: https://bigml.com/dashboard/model/5654afff8ea1a454e70008a7\n",
"[2015-11-24 19:44:18] Retrieving model. https://bigml.com/dashboard/model/5654afff8ea1a454e70008a7\n",
"[2015-11-24 19:44:18] Creating local predictions.\n",
"\n",
"Generated files:\n",
"\n",
" diabetes\n",
" ├─bigmler_sessions\n",
" ├─predictions.csv\n",
" ├─models\n",
" ├─dataset\n",
" └─source\n",
"\n"
]
}
],
"source": [
"!bigmler --train diabetes.csv --name \"Diabetes dataset\" \\\n",
" --test diabetes_test.csv \\\n",
" --tag \"PyConES\" --tag \"diabetes\" \\\n",
" --project \"BigMLer in PyConES\" \\\n",
" --prediction-info full \\\n",
" --prediction-header \\\n",
" --output-dir diabetes"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>pregnancies</th>\n",
" <th>plasma glucose</th>\n",
" <th>blood pressure</th>\n",
" <th>triceps skin thickness</th>\n",
" <th>insulin</th>\n",
" <th>bmi</th>\n",
" <th>diabetes pedigree</th>\n",
" <th>age</th>\n",
" <th>diabetes</th>\n",
" <th>confidence</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2</td>\n",
" <td>197</td>\n",
" <td>70</td>\n",
" <td>45</td>\n",
" <td>543</td>\n",
" <td>30.5</td>\n",
" <td>158</td>\n",
" <td>53</td>\n",
" <td>True</td>\n",
" <td>0.83182</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>8</td>\n",
" <td>125</td>\n",
" <td>96</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>232</td>\n",
" <td>54</td>\n",
" <td>False</td>\n",
" <td>0.43849</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>4</td>\n",
" <td>110</td>\n",
" <td>92</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>37.6</td>\n",
" <td>191</td>\n",
" <td>30</td>\n",
" <td>True</td>\n",
" <td>0.34237</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3</td>\n",
" <td>126</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>235</td>\n",
" <td>39.3</td>\n",
" <td>704</td>\n",
" <td>27</td>\n",
" <td>False</td>\n",
" <td>0.85688</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>8</td>\n",
" <td>99</td>\n",
" <td>84</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>35.4</td>\n",
" <td>388</td>\n",
" <td>50</td>\n",
" <td>True</td>\n",
" <td>0.56551</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>7</td>\n",
" <td>196</td>\n",
" <td>90</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>39.8</td>\n",
" <td>451</td>\n",
" <td>41</td>\n",
" <td>True</td>\n",
" <td>0.83182</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>9</td>\n",
" <td>119</td>\n",
" <td>80</td>\n",
" <td>35</td>\n",
" <td>0</td>\n",
" <td>29.0</td>\n",
" <td>263</td>\n",
" <td>29</td>\n",
" <td>True</td>\n",
" <td>0.34237</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>11</td>\n",
" <td>143</td>\n",
" <td>94</td>\n",
" <td>33</td>\n",
" <td>146</td>\n",
" <td>36.6</td>\n",
" <td>254</td>\n",
" <td>51</td>\n",
" <td>True</td>\n",
" <td>0.64566</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>10</td>\n",
" <td>125</td>\n",
" <td>70</td>\n",
" <td>26</td>\n",
" <td>115</td>\n",
" <td>31.1</td>\n",
" <td>205</td>\n",
" <td>41</td>\n",
" <td>True</td>\n",
" <td>0.75750</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>7</td>\n",
" <td>147</td>\n",
" <td>76</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>39.4</td>\n",
" <td>257</td>\n",
" <td>43</td>\n",
" <td>True</td>\n",
" <td>0.64566</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" pregnancies plasma glucose blood pressure triceps skin thickness \\\n",
"0 2 197 70 45 \n",
"1 8 125 96 NaN \n",
"2 4 110 92 NaN \n",
"3 3 126 NaN NaN \n",
"4 8 99 84 NaN \n",
"5 7 196 90 NaN \n",
"6 9 119 80 35 \n",
"7 11 143 94 33 \n",
"8 10 125 70 26 \n",
"9 7 147 76 0 \n",
"\n",
" insulin bmi diabetes pedigree age diabetes confidence \n",
"0 543 30.5 158 53 True 0.83182 \n",
"1 0 0.0 232 54 False 0.43849 \n",
"2 0 37.6 191 30 True 0.34237 \n",
"3 235 39.3 704 27 False 0.85688 \n",
"4 0 35.4 388 50 True 0.56551 \n",
"5 0 39.8 451 41 True 0.83182 \n",
"6 0 29.0 263 29 True 0.34237 \n",
"7 146 36.6 254 51 True 0.64566 \n",
"8 115 31.1 205 41 True 0.75750 \n",
"9 0 39.4 257 43 True 0.64566 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import pandas as pd\n",
"from IPython.display import display, IFrame, JSON\n",
"PREDICTIONS_FILE = 'diabetes/predictions.csv'\n",
"display(pd.read_csv(PREDICTIONS_FILE, nrows=10))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The files contained in the output folder store the IDs of the resources created in the BigMLer command."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"dataset/5654affc3faa625ace0006b9\r\n"
]
}
],
"source": [
"!cat ./diabetes/dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Creating an evaluation\n",
"BigMLer can use existing resources too. For instance, this is the **evaluation workflow**, where the original dataset is split in training/test datasets to evaluate your models.\n",
"![evaluation workflow](https://cloud.githubusercontent.com/assets/722203/11405488/6b1f334c-93a8-11e5-9771-058cfc9d4305.png)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2015-11-24 19:44:28] Retrieving dataset. https://bigml.com/dashboard/dataset/5654affc3faa625ace0006b9\n",
"[2015-11-24 19:44:29] Creating dataset.\n",
"[2015-11-24 19:44:32] Dataset created: https://bigml.com/dashboard/dataset/5654b00d8ea1a45af90003a7\n",
"[2015-11-24 19:44:32] Creating dataset.\n",
"[2015-11-24 19:44:35] Dataset created: https://bigml.com/dashboard/dataset/5654b0108ea1a454e70008af\n",
"[2015-11-24 19:44:35] Creating model.\n",
"[2015-11-24 19:44:40] Model created: https://bigml.com/dashboard/model/5654b0133faa62548800033a\n",
"[2015-11-24 19:44:40] Creating evaluations.\n",
"[2015-11-24 19:44:44] Evaluation created: https://bigml.com/dashboard/evaluation/5654b01a8ea1a45af90003b2\n",
"[2015-11-24 19:44:44] Retrieving evaluation. https://bigml.com/dashboard/evaluation/5654b01a8ea1a45af90003b2\n",
"\n",
"Generated files:\n",
"\n",
" diabetes_eval\n",
" ├─bigmler_sessions\n",
" ├─evaluation.txt\n",
" ├─dataset_train\n",
" ├─dataset_test\n",
" ├─evaluation.json\n",
" ├─models\n",
" └─evaluations\n",
"\n"
]
}
],
"source": [
"!bigmler --datasets diabetes/dataset \\\n",
" --name \"Diabetes split\" \\\n",
" --test-split 0.2 \\\n",
" --seed \"PyConES 2015\" \\\n",
" --tag \"PyConES\" --tag \"diabetes\" \\\n",
" --output-dir diabetes_eval \\\n",
" --evaluate"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The evaluation results are stored in human-readable format in the **evaluation.txt** file and in JSON in **evaluation.json**. The evaluation file shows the metrics for the *model* predictions, but also for the *mode* prediction or a *random* prediction to help comparing them."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{ 'class_names': ['false', 'true'],\r\n",
" 'mode': { 'accuracy': 0.6,\r\n",
" 'average_f_measure': 0.375,\r\n",
" 'average_phi': 0,\r\n",
" 'average_precision': 0.3,\r\n",
" 'average_recall': 0.5,\r\n",
" 'confusion_matrix': [[24, 0], [16, 0]],\r\n",
" 'per_class_statistics': [ { 'accuracy': 0.6,\r\n",
" 'class_name': 'false',\r\n",
" 'f_measure': 0.7499999999999999,\r\n",
" 'phi_coefficient': 0,\r\n",
" 'precision': 0.6,\r\n",
" 'present_in_test_data': True,\r\n",
" 'recall': 1.0},\r\n",
" { 'accuracy': 0.6,\r\n",
" 'class_name': 'true',\r\n",
" 'f_measure': 0,\r\n",
" 'phi_coefficient': 0,\r\n",
" 'precision': 0,\r\n",
" 'present_in_test_data': True,\r\n",
" 'recall': 0.0}]},\r\n",
" 'model': { 'accuracy': 0.625,\r\n",
" 'average_f_measure': 0.58071,\r\n",
" 'average_phi': 0.18286,\r\n",
" 'average_precision': 0.60031,\r\n",
" 'average_recall': 0.58333,\r\n",
" 'confusion_matrix': [[19, 5], [10, 6]],\r\n",
" 'per_class_statistics': [ { 'accuracy': 0.625,\r\n",
" 'class_name': 'false',\r\n",
" 'f_measure': 0.7169811320754716,\r\n",
" 'phi_coefficient': 0.18286012835299778,\r\n",
" 'precision': 0.6551724137931034,\r\n",
" 'present_in_test_data': True,\r\n",
" 'recall': 0.7916666666666666},\r\n",
" { 'accuracy': 0.625,\r\n",
" 'class_name': 'true',\r\n",
" 'f_measure': 0.4444444444444444,\r\n",
" 'phi_coefficient': 0.18286012835299778,\r\n",
" 'precision': 0.5454545454545454,\r\n",
" 'present_in_test_data': True,\r\n",
" 'recall': 0.375}]},\r\n",
" 'random': { 'accuracy': 0.6,\r\n",
" 'average_f_measure': 0.59596,\r\n",
" 'average_phi': 0.20412,\r\n",
" 'average_precision': 0.6,\r\n",
" 'average_recall': 0.60417,\r\n",
" 'confusion_matrix': [[14, 10], [6, 10]],\r\n",
" 'per_class_statistics': [ { 'accuracy': 0.6,\r\n",
" 'class_name': 'false',\r\n",
" 'f_measure': 0.6363636363636365,\r\n",
" 'phi_coefficient': 0.2041241452319315,\r\n",
" 'precision': 0.7,\r\n",
" 'present_in_test_data': True,\r\n",
" 'recall': 0.5833333333333334},\r\n",
" { 'accuracy': 0.6,\r\n",
" 'class_name': 'true',\r\n",
" 'f_measure': 0.5555555555555556,\r\n",
" 'phi_coefficient': 0.2041241452319315,\r\n",
" 'precision': 0.5,\r\n",
" 'present_in_test_data': True,\r\n",
" 'recall': 0.625}]}}\r\n"
]
}
],
"source": [
"!cat ./diabetes_eval/evaluation.txt"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"#### Creating a k-fold cross-validation\n",
"The higher level of abstraction allows the user to build more complex calls. For instance, asking for **k-fold cross-validation.**\n",
"![k-fold cross-validation workflow](https://cloud.githubusercontent.com/assets/722203/11405551/bd315f2a-93a8-11e5-93e7-b9186b2f697e.png)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Creating the kfold datasets............\n",
"[2015-11-24 19:44:52] Retrieving dataset. https://bigml.com/dashboard/dataset/5654affc3faa625ace0006b9\n",
"[2015-11-24 19:44:53] Creating dataset.\n",
"[2015-11-24 19:44:56] Dataset created: https://bigml.com/dashboard/dataset/5654b0253faa6254d30005e3\n",
"\n",
"Generated files:\n",
"\n",
" test\n",
" ├─bigmler_sessions\n",
" ├─dataset_5654affc3faa625ace0006b9\n",
" ├─dataset_5654b0253faa6254d30005e3\n",
" └─dataset_gen\n",
"\n",
"[2015-11-24 19:44:56] Retrieving dataset. https://bigml.com/dashboard/dataset/5654affc3faa625ace0006b9\n",
"[2015-11-24 19:44:56] Creating dataset.\n",
"[2015-11-24 19:44:59] Dataset created: https://bigml.com/dashboard/dataset/5654b0293faa6254d30005e7\n",
"\n",
"Generated files:\n",
"\n",
" test\n",
" ├─bigmler_sessions\n",
" ├─dataset_5654affc3faa625ace0006b9\n",
" ├─dataset_5654b0253faa6254d30005e3\n",
" ├─dataset_5654b0293faa6254d30005e7\n",
" └─dataset_gen\n",
"\n",
"[2015-11-24 19:44:59] Retrieving dataset. https://bigml.com/dashboard/dataset/5654affc3faa625ace0006b9\n",
"[2015-11-24 19:44:59] Creating dataset.\n",
"[2015-11-24 19:45:02] Dataset created: https://bigml.com/dashboard/dataset/5654b02c3faa625ace0006c7\n",
"\n",
"Generated files:\n",
"\n",
" test\n",
" ├─bigmler_sessions\n",
" ├─dataset_5654affc3faa625ace0006b9\n",
" ├─dataset_5654b02c3faa625ace0006c7\n",
" ├─dataset_5654b0253faa6254d30005e3\n",
" ├─dataset_5654b0293faa6254d30005e7\n",
" └─dataset_gen\n",
"\n",
"[2015-11-24 19:45:02] Retrieving dataset. https://bigml.com/dashboard/dataset/5654affc3faa625ace0006b9\n",
"[2015-11-24 19:45:03] Creating dataset.\n",
"[2015-11-24 19:45:06] Dataset created: https://bigml.com/dashboard/dataset/5654b02f3faa6252120002f2\n",
"\n",
"Generated files:\n",
"\n",
" test\n",
" ├─bigmler_sessions\n",
" ├─dataset_5654affc3faa625ace0006b9\n",
" ├─dataset_5654b02f3faa6252120002f2\n",
" ├─dataset_5654b02c3faa625ace0006c7\n",
" ├─dataset_5654b0253faa6254d30005e3\n",
" ├─dataset_5654b0293faa6254d30005e7\n",
" └─dataset_gen\n",
"\n",
"[2015-11-24 19:45:06] Retrieving dataset. https://bigml.com/dashboard/dataset/5654affc3faa625ace0006b9\n",
"[2015-11-24 19:45:07] Creating dataset.\n",
"[2015-11-24 19:45:10] Dataset created: https://bigml.com/dashboard/dataset/5654b0343faa62548800033f\n",
"\n",
"Generated files:\n",
"\n",
" test\n",
" ├─bigmler_sessions\n",
" ├─dataset_5654affc3faa625ace0006b9\n",
" ├─dataset_5654b02f3faa6252120002f2\n",
" ├─dataset_5654b0343faa62548800033f\n",
" ├─dataset_5654b02c3faa625ace0006c7\n",
" ├─dataset_5654b0253faa6254d30005e3\n",
" ├─dataset_5654b0293faa6254d30005e7\n",
" └─dataset_gen\n",
"\n",
"Creating the kfold evaluations.........\n",
"[2015-11-24 19:45:10] Creating models.\n",
"[2015-11-24 19:45:47] Creating evaluations.\n",
"[2015-11-24 19:46:01] Retrieving evaluation. https://bigml.com/dashboard/evaluation/5654b05d3faa625488000346\n",
"[2015-11-24 19:46:03] Retrieving evaluation. https://bigml.com/dashboard/evaluation/5654b0623faa625ace0006e6\n",
"[2015-11-24 19:46:05] Retrieving evaluation. https://bigml.com/dashboard/evaluation/5654b0648ea1a454e70008c4\n",
"[2015-11-24 19:46:07] Retrieving evaluation. https://bigml.com/dashboard/evaluation/5654b0678ea1a451410002f8\n",
"[2015-11-24 19:46:09] Retrieving evaluation. https://bigml.com/dashboard/evaluation/5654b0698ea1a4549d000338\n",
"\n",
"Generated files:\n",
"\n",
" k_fold0\n",
" ├─bigmler_sessions\n",
" ├─evaluation__5654b05d3faa625488000346.txt\n",
" ├─evaluation.txt\n",
" ├─evaluation__5654b05d3faa625488000346.json\n",
" ├─evaluation__5654b0623faa625ace0006e6.txt\n",
" ├─evaluation__5654b0623faa625ace0006e6.json\n",
" ├─evaluation__5654b0698ea1a4549d000338.txt\n",
" ├─evaluation__5654b0648ea1a454e70008c4.txt\n",
" ├─evaluation.json\n",
" ├─evaluation__5654b0648ea1a454e70008c4.json\n",
" ├─models\n",
" ├─evaluation__5654b0678ea1a451410002f8.txt\n",
" ├─evaluations\n",
" ├─evaluation__5654b0678ea1a451410002f8.json\n",
" └─evaluation__5654b0698ea1a4549d000338.json\n",
"\n"
]
}
],
"source": [
"!bigmler analyze --cross-validation \\\n",
" --dataset $(cat diabetes/dataset) \\\n",
" --k-folds 5 \\\n",
" --output-dir diabetes_cv"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{ 'class_names': ['false', 'true'],\r\n",
" 'mode': { 'accuracy_standard_deviation': 0.07314369419163898,\r\n",
" 'average_accuracy': 0.595,\r\n",
" 'average_f_measure': 0.371684,\r\n",
" 'average_f_measure_standard_deviation': 0.029637730412432065,\r\n",
" 'average_phi': 0.0,\r\n",
" 'average_phi_standard_deviation': 0.0,\r\n",
" 'average_precision': 0.2975,\r\n",
" 'average_precision_standard_deviation': 0.03657184709581949,\r\n",
" 'average_recall': 0.5,\r\n",
" 'average_recall_standard_deviation': 0.0,\r\n",
" 'confusion_matrix': [[119, 0], [81, 0]],\r\n",
" 'per_class_statistics': [ { 'accuracy_standard_deviation': 0.07314369419163898,\r\n",
" 'average_accuracy': 0.595,\r\n",
" 'average_f_measure': 0.7433649844656594,\r\n",
" 'average_phi_coefficient': 0.0,\r\n",
" 'average_precision': 0.595,\r\n",
" 'average_recall': 1.0,\r\n",
" 'class_name': 'false',\r\n",
" 'f_measure_standard_deviation': 0.059270072942528985,\r\n",
" 'occurrences': 5,\r\n",
" 'phi_coefficient_standard_deviation': 0.0,\r\n",
" 'precision_standard_deviation': 0.07314369419163898,\r\n",
" 'present_in_test_data': True,\r\n",
" 'recall_standard_deviation': 0.0},\r\n",
" { 'accuracy_standard_deviation': 0.07314369419163898,\r\n",
" 'average_accuracy': 0.595,\r\n",
" 'average_f_measure': 0.0,\r\n",
" 'average_phi_coefficient': 0.0,\r\n",
" 'average_precision': 0.0,\r\n",
" 'average_recall': 0.0,\r\n",
" 'class_name': 'true',\r\n",
" 'f_measure_standard_deviation': 0.0,\r\n",
" 'occurrences': 5,\r\n",
" 'phi_coefficient_standard_deviation': 0.0,\r\n",
" 'precision_standard_deviation': 0.0,\r\n",
" 'present_in_test_data': True,\r\n",
" 'recall_standard_deviation': 0.0}]},\r\n",
" 'model': { 'accuracy_standard_deviation': 0.04636809247747853,\r\n",
" 'average_accuracy': 0.635,\r\n",
" 'average_f_measure': 0.6105100000000001,\r\n",
" 'average_f_measure_standard_deviation': 0.0631546026826232,\r\n",
" 'average_phi': 0.23418199999999997,\r\n",
" 'average_phi_standard_deviation': 0.12006536759615571,\r\n",
" 'average_precision': 0.618002,\r\n",
" 'average_precision_standard_deviation': 0.05829306799268675,\r\n",
" 'average_recall': 0.616562,\r\n",
" 'average_recall_standard_deviation': 0.06214006771801911,\r\n",
" 'confusion_matrix': [[84, 35], [38, 43]],\r\n",
" 'per_class_statistics': [ { 'accuracy_standard_deviation': 0.04636809247747853,\r\n",
" 'average_accuracy': 0.635,\r\n",
" 'average_f_measure': 0.6964194386027269,\r\n",
" 'average_phi_coefficient': 0.23418084455531116,\r\n",
" 'average_precision': 0.692809364548495,\r\n",
" 'average_recall': 0.7085810330020856,\r\n",
" 'class_name': 'false',\r\n",
" 'f_measure_standard_deviation': 0.028546184570343625,\r\n",
" 'occurrences': 5,\r\n",
" 'phi_coefficient_standard_deviation': 0.12006661387806686,\r\n",
" 'precision_standard_deviation': 0.06427446293868262,\r\n",
" 'present_in_test_data': True,\r\n",
" 'recall_standard_deviation': 0.05914282309287293},\r\n",
" { 'accuracy_standard_deviation': 0.04636809247747853,\r\n",
" 'average_accuracy': 0.635,\r\n",
" 'average_f_measure': 0.5245995320936312,\r\n",
" 'average_phi_coefficient': 0.23418084455531116,\r\n",
" 'average_precision': 0.5431932773109244,\r\n",
" 'average_recall': 0.5245421245421245,\r\n",
" 'class_name': 'true',\r\n",
" 'f_measure_standard_deviation': 0.1168140122980512,\r\n",
" 'occurrences': 5,\r\n",
" 'phi_coefficient_standard_deviation': 0.12006661387806686,\r\n",
" 'precision_standard_deviation': 0.11809351063640704,\r\n",
" 'present_in_test_data': True,\r\n",
" 'recall_standard_deviation': 0.14052573019337755}]},\r\n",
" 'random': { 'accuracy_standard_deviation': 0.09082951062292476,\r\n",
" 'average_accuracy': 0.525,\r\n",
" 'average_f_measure': 0.517402,\r\n",
" 'average_f_measure_standard_deviation': 0.09204940877593945,\r\n",
" 'average_phi': 0.068816,\r\n",
" 'average_phi_standard_deviation': 0.18460109995338597,\r\n",
" 'average_precision': 0.535818,\r\n",
" 'average_precision_standard_deviation': 0.09407348157690348,\r\n",
" 'average_recall': 0.533046,\r\n",
" 'average_recall_standard_deviation': 0.09061168436796657,\r\n",
" 'confusion_matrix': [[60, 59], [36, 45]],\r\n",
" 'per_class_statistics': [ { 'accuracy_standard_deviation': 0.09082951062292476,\r\n",
" 'average_accuracy': 0.525,\r\n",
" 'average_f_measure': 0.557698781954101,\r\n",
" 'average_phi_coefficient': 0.06881400385377819,\r\n",
" 'average_precision': 0.6344761904761905,\r\n",
" 'average_recall': 0.5136367180577708,\r\n",
" 'class_name': 'false',\r\n",
" 'f_measure_standard_deviation': 0.08451214683326931,\r\n",
" 'occurrences': 5,\r\n",
" 'phi_coefficient_standard_deviation': 0.1846005188541411,\r\n",
" 'precision_standard_deviation': 0.10513250466632557,\r\n",
" 'present_in_test_data': True,\r\n",
" 'recall_standard_deviation': 0.11645837783449035},\r\n",
" { 'accuracy_standard_deviation': 0.09082951062292476,\r\n",
" 'average_accuracy': 0.525,\r\n",
" 'average_f_measure': 0.47711064129668784,\r\n",
" 'average_phi_coefficient': 0.06881400385377819,\r\n",
" 'average_precision': 0.4371578947368421,\r\n",
" 'average_recall': 0.5524542124542124,\r\n",
" 'class_name': 'true',\r\n",
" 'f_measure_standard_deviation': 0.11898723769246959,\r\n",
" 'occurrences': 5,\r\n",
" 'phi_coefficient_standard_deviation': 0.1846005188541411,\r\n",
" 'precision_standard_deviation': 0.13348248352200454,\r\n",
" 'present_in_test_data': True,\r\n",
" 'recall_standard_deviation': 0.148423968263044}]}}\r\n"
]
}
],
"source": [
"!cat ./diabetes_cv/test/k_fold0/evaluation.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Removing anomalies\n",
"We can also get rid of outliers by detecting the anomalies in the dataset to generate an **anomaly-free dataset**."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2015-11-24 19:46:25] Retrieving dataset. https://bigml.com/dashboard/dataset/5654affc3faa625ace0006b9\n",
"[2015-11-24 19:46:26] Creating anomaly detector.\n",
"[2015-11-24 19:46:48] Anomaly created: https://bigml.com/dashboard/anomaly/5654b0823faa625ace0006fb\n",
"[2015-11-24 19:46:48] Creating dataset.\n",
"[2015-11-24 19:46:52] Dataset created: https://bigml.com/dashboard/dataset/5654b0998ea1a407ae005f09\n",
"\n",
"Generated files:\n",
"\n",
" diabetes_anomaly\n",
" ├─bigmler_sessions\n",
" ├─dataset_gen\n",
" └─anomalies\n",
"\n"
]
}
],
"source": [
"!bigmler anomaly --datasets diabetes/dataset \\\n",
" --name \"Diabetes anomaly\" \\\n",
" --seed \"PyConES 2015\" \\\n",
" --tag \"PyConES\" --tag \"diabetes\" \\\n",
" --top-n 1 \\\n",
" --forest-size 50 \\\n",
" --anomalies-dataset out \\\n",
" --output-dir diabetes_anomaly"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"dataset/5654b0998ea1a407ae005f09\r\n"
]
}
],
"source": [
"!cat diabetes_anomaly/dataset_gen"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The model based in the anomaly-free dataset performs better than the original one."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Creating the kfold datasets............\n",
"[2015-11-24 19:47:02] Retrieving dataset. https://bigml.com/dashboard/dataset/5654b0998ea1a407ae005f09\n",
"[2015-11-24 19:47:02] Creating dataset.\n",
"[2015-11-24 19:47:05] Dataset created: https://bigml.com/dashboard/dataset/5654b0a73faa6254d3000605\n",
"\n",
"Generated files:\n",
"\n",
" test\n",
" ├─bigmler_sessions\n",
" ├─dataset_5654b0998ea1a407ae005f09\n",
" ├─dataset_gen\n",
" └─dataset_5654b0a73faa6254d3000605\n",
"\n",
"[2015-11-24 19:47:05] Retrieving dataset. https://bigml.com/dashboard/dataset/5654b0998ea1a407ae005f09\n",
"[2015-11-24 19:47:06] Creating dataset.\n",
"[2015-11-24 19:48:52] Dataset created: https://bigml.com/dashboard/dataset/5654b0aa8ea1a407ae005f12\n",
"\n",
"Generated files:\n",
"\n",
" test\n",
" ├─bigmler_sessions\n",
" ├─dataset_5654b0aa8ea1a407ae005f12\n",
" ├─dataset_5654b0998ea1a407ae005f09\n",
" ├─dataset_gen\n",
" └─dataset_5654b0a73faa6254d3000605\n",
"\n",
"[2015-11-24 19:48:52] Retrieving dataset. https://bigml.com/dashboard/dataset/5654b0998ea1a407ae005f09\n",
"[2015-11-24 19:48:53] Creating dataset.\n",
"[2015-11-24 19:48:56] Dataset created: https://bigml.com/dashboard/dataset/5654b1158ea1a454e70008e5\n",
"\n",
"Generated files:\n",
"\n",
" test\n",
" ├─bigmler_sessions\n",
" ├─dataset_5654b0aa8ea1a407ae005f12\n",
" ├─dataset_5654b0998ea1a407ae005f09\n",
" ├─dataset_5654b1158ea1a454e70008e5\n",
" ├─dataset_gen\n",
" └─dataset_5654b0a73faa6254d3000605\n",
"\n",
"[2015-11-24 19:48:56] Retrieving dataset. https://bigml.com/dashboard/dataset/5654b0998ea1a407ae005f09\n",
"[2015-11-24 19:48:56] Creating dataset.\n",
"[2015-11-24 19:49:00] Dataset created: https://bigml.com/dashboard/dataset/5654b1193faa625488000358\n",
"\n",
"Generated files:\n",
"\n",
" test\n",
" ├─bigmler_sessions\n",
" ├─dataset_5654b0aa8ea1a407ae005f12\n",
" ├─dataset_5654b0998ea1a407ae005f09\n",
" ├─dataset_5654b1158ea1a454e70008e5\n",
" ├─dataset_5654b1193faa625488000358\n",
" ├─dataset_gen\n",
" └─dataset_5654b0a73faa6254d3000605\n",
"\n",
"[2015-11-24 19:49:00] Retrieving dataset. https://bigml.com/dashboard/dataset/5654b0998ea1a407ae005f09\n",
"[2015-11-24 19:49:00] Creating dataset.\n",
"[2015-11-24 19:49:03] Dataset created: https://bigml.com/dashboard/dataset/5654b11d8ea1a45af90003d4\n",
"\n",
"Generated files:\n",
"\n",
" test\n",
" ├─bigmler_sessions\n",
" ├─dataset_5654b0aa8ea1a407ae005f12\n",
" ├─dataset_5654b0998ea1a407ae005f09\n",
" ├─dataset_5654b11d8ea1a45af90003d4\n",
" ├─dataset_5654b1158ea1a454e70008e5\n",
" ├─dataset_5654b1193faa625488000358\n",
" ├─dataset_gen\n",
" └─dataset_5654b0a73faa6254d3000605\n",
"\n",
"Creating the kfold evaluations.........\n",
"[2015-11-24 19:49:03] Creating models.\n",
"[2015-11-24 19:49:45] Creating evaluations.\n",
"[2015-11-24 19:50:04] Retrieving evaluation. https://bigml.com/dashboard/evaluation/5654b14a8ea1a45141000318\n",
"[2015-11-24 19:50:06] Retrieving evaluation. https://bigml.com/dashboard/evaluation/5654b14f3faa625212000317\n",
"[2015-11-24 19:50:08] Retrieving evaluation. https://bigml.com/dashboard/evaluation/5654b1548ea1a454e70008fa\n",
"[2015-11-24 19:50:10] Retrieving evaluation. https://bigml.com/dashboard/evaluation/5654b1568ea1a45af90003ed\n",
"[2015-11-24 19:50:12] Retrieving evaluation. https://bigml.com/dashboard/evaluation/5654b15b8ea1a454e70008ff\n",
"\n",
"Generated files:\n",
"\n",
" k_fold0\n",
" ├─bigmler_sessions\n",
" ├─evaluation.txt\n",
" ├─evaluation__5654b1568ea1a45af90003ed.txt\n",
" ├─evaluation__5654b14f3faa625212000317.txt\n",
" ├─evaluation__5654b14a8ea1a45141000318.json\n",
" ├─evaluation__5654b1548ea1a454e70008fa.txt\n",
" ├─evaluation__5654b15b8ea1a454e70008ff.txt\n",
" ├─evaluation.json\n",
" ├─evaluation__5654b14f3faa625212000317.json\n",
" ├─models\n",
" ├─evaluation__5654b1548ea1a454e70008fa.json\n",
" ├─evaluations\n",
" ├─evaluation__5654b1568ea1a45af90003ed.json\n",
" ├─evaluation__5654b15b8ea1a454e70008ff.json\n",
" └─evaluation__5654b14a8ea1a45141000318.txt\n",
"\n"
]
}
],
"source": [
"!bigmler analyze --cross-validation \\\n",
" --dataset $(cat diabetes_anomaly/dataset_gen) \\\n",
" --k-folds 5 \\\n",
" --output-dir diabetes_cv_anomaly_free"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[0m0.6688459999999998\u001b[0m\r\n"
]
}
],
"source": [
"!cat ./diabetes_cv_anomaly_free/test/k_fold0/evaluation.json | jq .model.average_accuracy"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[0m0.635\u001b[0m\r\n"
]
}
],
"source": [
"!cat ./diabetes_cv/test/k_fold0/evaluation.json | jq .model.average_accuracy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Clustering your data\n",
"We can cluster together similar data points and show the cluster-associated datasets."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2015-11-24 20:16:57] Retrieving dataset. https://bigml.com/dashboard/dataset/5654affc3faa625ace0006b9\n",
"[2015-11-24 20:16:57] Creating cluster.\n",
"[2015-11-24 20:17:15] Cluster created: https://bigml.com/dashboard/cluster/5654b7aa8ea1a4549d000398\n",
"[2015-11-24 20:17:15] Creating dataset.\n",
"[2015-11-24 20:17:19] Dataset created: https://bigml.com/dashboard/dataset/5654b7bc3faa6254d30006b3\n",
"[2015-11-24 20:17:19] Creating dataset.\n",
"[2015-11-24 20:17:22] Dataset created: https://bigml.com/dashboard/dataset/5654b7c08ea1a454e70009cf\n",
"\n",
"Generated files:\n",
"\n",
" diabetes_cluster\n",
" ├─bigmler_sessions\n",
" ├─dataset_cluster\n",
" └─clusters\n",
"\n"
]
}
],
"source": [
"!bigmler cluster --datasets diabetes/dataset \\\n",
" --name \"Diabetes cluster\" \\\n",
" --seed \"PyConES 2015\" \\\n",
" --tag \"PyConES\" --tag \"diabetes\" \\\n",
" --cluster-datasets \\\n",
" --output-dir diabetes_cluster"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Reproducible resources\n",
"We can reproduce in a script how the dataset was created"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Starting reification for dataset/5654b3a73faa625ace0007cd\n",
"\n",
"[2015-11-24 20:17:25] Analyzing dataset/5654b3a73faa625ace0007cd.\n",
"[2015-11-24 20:17:27] New origin found for dataset/5654b3a73faa625ace0007cd: cluster/5654b3978ea1a454e700098d\n",
"[2015-11-24 20:17:27] Analyzing cluster/5654b3978ea1a454e700098d.\n",
"[2015-11-24 20:17:27] New origin found for cluster/5654b3978ea1a454e700098d: dataset/5654affc3faa625ace0006b9\n",
"[2015-11-24 20:17:27] Analyzing dataset/5654affc3faa625ace0006b9.\n",
"[2015-11-24 20:17:28] New origin found for dataset/5654affc3faa625ace0006b9: source/5654aff93faa625ace0006b4\n",
"[2015-11-24 20:17:28] Analyzing source/5654aff93faa625ace0006b4.\n",
"[2015-11-24 20:17:28] New origin found for source/5654aff93faa625ace0006b4: diabetes.csv\n",
"\n",
"Reification complete. See the results in diabetes_dataset_reified/reify.py\n",
"\n",
"\n",
"Generated files:\n",
"\n",
" diabetes_dataset_reified\n",
" ├─bigmler_sessions\n",
" └─reify.py\n",
"\n"
]
}
],
"source": [
"!bigmler reify --id $(head -n 1 diabetes_cluster/dataset_cluster) \\\n",
" --output-dir diabetes_dataset_reified"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The python code that is needed to reproduce the dataset can be found in the reify.py file"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\"\"\"Python code to reify dataset/5654b3a73faa625ace0007cd\r\n",
"\r\n",
"\"\"\"\r\n",
"\r\n",
"from bigml.api import BigML\r\n",
"api = BigML()\r\n",
"\r\n",
"source1 = api.create_source(\"diabetes.csv\", \\\r\n",
" {'category': 12,\r\n",
" 'description': 'Created using BigMLer',\r\n",
" 'name': 'Diabetes dataset',\r\n",
" 'project': 'project/56468dd2c6c19b6cf3004e1e',\r\n",
" 'tags': ['PyConES', 'diabetes', 'BigMLer', 'BigMLer_TueNov2415_194408']})\r\n",
"api.ok(source1)\r\n",
"\r\n",
"dataset1 = api.create_dataset(source1, \\\r\n",
" {'name': 'Diabetes dataset',\r\n",
" 'tags': ['PyConES', 'diabetes', 'BigMLer', 'BigMLer_TueNov2415_194408']})\r\n",
"api.ok(dataset1)\r\n",
"\r\n",
"cluster1 = api.create_cluster(dataset1, \\\r\n",
" {'cluster_seed': 'BigML, Machine Learning made easy',\r\n",
" 'name': 'Diabetes cluster',\r\n",
" 'seed': 'PyConES 2015',\r\n",
" 'tags': ['PyConES', 'diabetes', 'BigMLer', 'BigMLer_TueNov2415_195934']})\r\n",
"api.ok(cluster1)\r\n",
"\r\n",
"dataset2 = api.create_dataset(cluster1, \\\r\n",
" {'centroid': '000001',\r\n",
" 'tags': ['PyConES', 'diabetes', 'BigMLer', 'BigMLer_TueNov2415_195934']})\r\n",
"api.ok(dataset2)\r\n",
"\r\n"
]
}
],
"source": [
"!cat diabetes_dataset_reified/reify.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Model optimization: number of nodes\n",
"We can also improve our models by finding the **number of nodes** that optimizes a chosen evaluation metric."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Creating the kfold datasets............\n",
"[2015-11-24 20:53:24] Retrieving dataset. https://bigml.com/dashboard/dataset/5654affc3faa625ace0006b9\n",
"[2015-11-24 20:53:25] Creating dataset.\n",
"[2015-11-24 20:53:28] Dataset created: https://bigml.com/dashboard/dataset/5654c0353faa625ace00089d\n",
"\n",
"Generated files:\n",
"\n",
" test\n",
" ├─bigmler_sessions\n",
" ├─dataset_5654affc3faa625ace0006b9\n",
" ├─dataset_5654c0353faa625ace00089d\n",
" └─dataset_gen\n",
"\n",
"[2015-11-24 20:53:28] Retrieving dataset. https://bigml.com/dashboard/dataset/5654affc3faa625ace0006b9\n",
"[2015-11-24 20:53:28] Creating dataset.\n",
"[2015-11-24 20:53:31] Dataset created: https://bigml.com/dashboard/dataset/5654c0393faa625ace0008a3\n",
"\n",
"Generated files:\n",
"\n",
" test\n",
" ├─dataset_5654c0393faa625ace0008a3\n",
" ├─bigmler_sessions\n",
" ├─dataset_5654affc3faa625ace0006b9\n",
" ├─dataset_5654c0353faa625ace00089d\n",
" └─dataset_gen\n",
"\n",
"Creating the node threshold set..........\n",
"[2015-11-24 20:53:31] Retrieving dataset. https://bigml.com/dashboard/dataset/5654c0353faa625ace00089d\n",
"[2015-11-24 20:53:31] Creating models.\n",
"[2015-11-24 20:53:39] Creating evaluations.\n",
"[2015-11-24 20:53:43] Retrieving evaluation. https://bigml.com/dashboard/evaluation/5654c0443faa62521200038d\n",
"[2015-11-24 20:53:45] Retrieving evaluation. https://bigml.com/dashboard/evaluation/5654c0468ea1a454e7000aa3\n",
"\n",
"Generated files:\n",
"\n",
" node_th3\n",
" ├─bigmler_sessions\n",
" ├─evaluation.txt\n",
" ├─evaluation__5654c0468ea1a454e7000aa3.json\n",
" ├─evaluation__5654c0468ea1a454e7000aa3.txt\n",
" ├─evaluation__5654c0443faa62521200038d.json\n",
" ├─evaluation.json\n",
" ├─models\n",
" ├─evaluations\n",
" └─evaluation__5654c0443faa62521200038d.txt\n",
"\n",
"New best node threshold: 3\n",
"Accuracy = 68.00% (score = 0.6799999999999999)\n",
"[2015-11-24 20:53:47] Retrieving dataset. https://bigml.com/dashboard/dataset/5654c0353faa625ace00089d\n",
"[2015-11-24 20:53:47] Creating models.\n",
"[2015-11-24 20:53:57] Creating evaluations.\n",
"[2015-11-24 20:54:01] Retrieving evaluation. https://bigml.com/dashboard/evaluation/5654c0568ea1a45af90004a9\n",
"[2015-11-24 20:54:03] Retrieving evaluation. https://bigml.com/dashboard/evaluation/5654c0598ea1a454e7000ab0\n",
"\n",
"Generated files:\n",
"\n",
" node_th13\n",
" ├─evaluation__5654c0598ea1a454e7000ab0.txt\n",
" ├─bigmler_sessions\n",
" ├─evaluation.txt\n",
" ├─evaluation__5654c0568ea1a45af90004a9.json\n",
" ├─evaluation__5654c0568ea1a45af90004a9.txt\n",
" ├─evaluation.json\n",
" ├─models\n",
" ├─evaluations\n",
" └─evaluation__5654c0598ea1a454e7000ab0.json\n",
"\n",
"New best node threshold: 13\n",
"Accuracy = 69.00% (score = 0.69)\n",
"[2015-11-24 20:54:05] Retrieving dataset. https://bigml.com/dashboard/dataset/5654c0353faa625ace00089d\n",
"[2015-11-24 20:54:05] Creating models.\n",
"[2015-11-24 20:54:13] Creating evaluations.\n",
"[2015-11-24 20:54:17] Retrieving evaluation. https://bigml.com/dashboard/evaluation/5654c0668ea1a407ae006007\n",
"[2015-11-24 20:54:19] Retrieving evaluation. https://bigml.com/dashboard/evaluation/5654c0693faa625ace0008be\n",
"\n",
"Generated files:\n",
"\n",
" node_th23\n",
" ├─bigmler_sessions\n",
" ├─evaluation__5654c0693faa625ace0008be.json\n",
" ├─evaluation.txt\n",
" ├─evaluation__5654c0668ea1a407ae006007.txt\n",
" ├─evaluation__5654c0693faa625ace0008be.txt\n",
" ├─evaluation__5654c0668ea1a407ae006007.json\n",
" ├─evaluation.json\n",
" ├─models\n",
" └─evaluations\n",
"\n",
"[2015-11-24 20:54:21] Retrieving dataset. https://bigml.com/dashboard/dataset/5654c0353faa625ace00089d\n",
"[2015-11-24 20:54:22] Creating models.\n",
"[2015-11-24 20:54:58] Creating evaluations.\n",
"[2015-11-24 20:55:02] Retrieving evaluation. https://bigml.com/dashboard/evaluation/5654c0943faa625ace0008cb\n",
"[2015-11-24 20:55:05] Retrieving evaluation. https://bigml.com/dashboard/evaluation/5654c0963faa625ace0008d1\n",
"\n",
"Generated files:\n",
"\n",
" node_th33\n",
" ├─bigmler_sessions\n",
" ├─evaluation__5654c0963faa625ace0008d1.json\n",
" ├─evaluation.txt\n",
" ├─evaluation__5654c0963faa625ace0008d1.txt\n",
" ├─evaluation.json\n",
" ├─evaluation__5654c0943faa625ace0008cb.json\n",
" ├─models\n",
" ├─evaluation__5654c0943faa625ace0008cb.txt\n",
" └─evaluations\n",
"\n",
"[2015-11-24 20:55:07] Retrieving dataset. https://bigml.com/dashboard/dataset/5654c0353faa625ace00089d\n",
"[2015-11-24 20:55:07] Creating models.\n",
"[2015-11-24 20:55:15] Creating evaluations.\n",
"[2015-11-24 20:55:19] Retrieving evaluation. https://bigml.com/dashboard/evaluation/5654c0a58ea1a45af90004b3\n",
"[2015-11-24 20:55:21] Retrieving evaluation. https://bigml.com/dashboard/evaluation/5654c0a78ea1a454e7000adf\n",
"\n",
"Generated files:\n",
"\n",
" node_th43\n",
" ├─bigmler_sessions\n",
" ├─evaluation.txt\n",
" ├─evaluation__5654c0a78ea1a454e7000adf.txt\n",
" ├─evaluation__5654c0a58ea1a45af90004b3.txt\n",
" ├─evaluation.json\n",
" ├─evaluation__5654c0a78ea1a454e7000adf.json\n",
" ├─models\n",
" ├─evaluations\n",
" └─evaluation__5654c0a58ea1a45af90004b3.json\n",
"\n",
"[2015-11-24 20:55:23] Retrieving dataset. https://bigml.com/dashboard/dataset/5654c0353faa625ace00089d\n",
"[2015-11-24 20:55:23] Creating models.\n",
"[2015-11-24 20:55:38] Creating evaluations.\n",
"[2015-11-24 20:55:45] Retrieving evaluation. https://bigml.com/dashboard/evaluation/5654c0bc8ea1a454e7000ae6\n",
"[2015-11-24 20:55:47] Retrieving evaluation. https://bigml.com/dashboard/evaluation/5654c0c03faa6252120003a7\n",
"\n",
"Generated files:\n",
"\n",
" node_th53\n",
" ├─bigmler_sessions\n",
" ├─evaluation__5654c0c03faa6252120003a7.json\n",
" ├─evaluation.txt\n",
" ├─evaluation__5654c0c03faa6252120003a7.txt\n",
" ├─evaluation__5654c0bc8ea1a454e7000ae6.json\n",
" ├─evaluation.json\n",
" ├─models\n",
" ├─evaluation__5654c0bc8ea1a454e7000ae6.txt\n",
" └─evaluations\n",
"\n",
"[2015-11-24 20:55:49] Retrieving dataset. https://bigml.com/dashboard/dataset/5654c0353faa625ace00089d\n",
"[2015-11-24 20:55:49] Creating models.\n",
"[2015-11-24 20:55:57] Creating evaluations.\n",
"[2015-11-24 20:56:01] Retrieving evaluation. https://bigml.com/dashboard/evaluation/5654c0ce3faa6251d9000566\n",
"[2015-11-24 20:56:02] Retrieving evaluation. https://bigml.com/dashboard/evaluation/5654c0d08ea1a451410003bf\n",
"\n",
"Generated files:\n",
"\n",
" node_th63\n",
" ├─bigmler_sessions\n",
" ├─evaluation.txt\n",
" ├─evaluation__5654c0d08ea1a451410003bf.txt\n",
" ├─evaluation__5654c0ce3faa6251d9000566.txt\n",
" ├─evaluation.json\n",
" ├─evaluation__5654c0ce3faa6251d9000566.json\n",
" ├─models\n",
" ├─evaluations\n",
" └─evaluation__5654c0d08ea1a451410003bf.json\n",
"\n",
"The best node threshold is: 13 \n",
"Accuracy = 69.00%\n",
"To create the final model with the entire dataset using the selected maximum number of nodes use:\n",
"bigmler --dataset dataset/5654affc3faa625ace0006b9 --node-threshold 13\n",
"\n",
"To delete all the resources generated by this analyze subcommand use:\n",
"bigmler delete --from-dir diabetes_nodes_selection\n",
"\n"
]
}
],
"source": [
"!bigmler analyze --nodes \\\n",
" --dataset $(cat diabetes/dataset) \\\n",
" --k-folds 2 \\\n",
" --optimize accuracy \\\n",
" --max-nodes 100 \\\n",
" --min-nodes 3 \\\n",
" --nodes-step 10 \\\n",
" --output-dir diabetes_nodes_selection"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can then show the ROC-curve distribution of the generated models in a report."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"!bigmler report --from-dir diabetes_nodes_selection"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Model optimization: smart feature selection\n",
"Alternatively, we can search for the **set of features** that produces the model that best optimizes a chosen evaluation metric."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2015-11-24 20:58:34] Retrieving project info.\n",
"[2015-11-24 20:58:34] Creating project.\n",
"[2015-11-24 20:58:35] Project \"PyConES 4-featured diabetes\" has been created.\n",
"[2015-11-24 20:58:35] Creating source.\n",
"[2015-11-24 20:58:37] Source created: https://bigml.com/dashboard/source/5654c16b8ea1a407ae00602d\n",
"[2015-11-24 20:58:37] Creating dataset.\n",
"[2015-11-24 20:58:40] Dataset created: https://bigml.com/dashboard/dataset/5654c16e8ea1a407ae006034\n",
"\n",
"Generated files:\n",
"\n",
" diabetes_4\n",
" ├─project\n",
" ├─bigmler_sessions\n",
" ├─dataset\n",
" └─source\n",
"\n"
]
}
],
"source": [
"!bigmler --train data/diabetes.csv \\\n",
" --no-model \\\n",
" --name \"4-featured diabetes\" \\\n",
" --dataset-fields \\\n",
" \"plasma glucose,insulin,diabetes pedigree,diabetes\" \\\n",
" --output-dir diabetes_4 \\\n",
" --project \"PyConES 4-featured diabetes\""
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Creating the kfold datasets............\n",
"[2015-11-19 20:18:56] Retrieving dataset. https://bigml.com/dashboard/dataset/564e207ac6c19b02530002e8\n",
"[2015-11-19 20:18:56] Creating dataset.\n",
"[2015-11-19 20:18:59] Dataset created: https://bigml.com/dashboard/dataset/564e20a1c6c19b02530002ec\n",
"\n",
"Generated files:\n",
"\n",
" test\n",
" ├─bigmler_sessions\n",
" ├─dataset_gen\n",
" ├─dataset_564e207ac6c19b02530002e8\n",
" └─dataset_564e20a1c6c19b02530002ec\n",
"\n",
"[2015-11-19 20:18:59] Retrieving dataset. https://bigml.com/dashboard/dataset/564e207ac6c19b02530002e8\n",
"[2015-11-19 20:19:00] Creating dataset.\n",
"[2015-11-19 20:19:03] Dataset created: https://bigml.com/dashboard/dataset/564e20a4636e1c1c0a0001a8\n",
"\n",
"Generated files:\n",
"\n",
" test\n",
" ├─bigmler_sessions\n",
" ├─dataset_564e20a4636e1c1c0a0001a8\n",
" ├─dataset_gen\n",
" ├─dataset_564e207ac6c19b02530002e8\n",
" └─dataset_564e20a1c6c19b02530002ec\n",
"\n",
"Creating the best features set..........\n",
"[2015-11-19 20:19:03] Creating models.\n",
"[2015-11-19 20:19:11] Creating evaluations.\n",
"[2015-11-19 20:19:15] Retrieving evaluation. https://bigml.com/dashboard/evaluation/564e20b1636e1c1c07000254\n",
"[2015-11-19 20:19:17] Retrieving evaluation. https://bigml.com/dashboard/evaluation/564e20b3636e1c1c1200043f\n",
"\n",
"Generated files:\n",
"\n",
" kfold1\n",
" ├─bigmler_sessions\n",
" ├─evaluation.txt\n",
" ├─evaluation__564e20b1636e1c1c07000254.txt\n",
" ├─evaluation__564e20b1636e1c1c07000254.json\n",
" ├─evaluation__564e20b3636e1c1c1200043f.json\n",
" ├─evaluation__564e20b3636e1c1c1200043f.txt\n",
" ├─evaluation.json\n",
" ├─models\n",
" └─evaluations\n",
"\n",
"[2015-11-19 20:19:19] Creating models.\n",
"[2015-11-19 20:19:28] Creating evaluations.\n",
"[2015-11-19 20:19:32] Retrieving evaluation. https://bigml.com/dashboard/evaluation/564e20c1c6c19b0242000242\n",
"[2015-11-19 20:19:34] Retrieving evaluation. https://bigml.com/dashboard/evaluation/564e20c3636e1c1c12000446\n",
"\n",
"Generated files:\n",
"\n",
" kfold2\n",
" ├─bigmler_sessions\n",
" ├─evaluation.txt\n",
" ├─evaluation__564e20c3636e1c1c12000446.json\n",
" ├─evaluation__564e20c3636e1c1c12000446.txt\n",
" ├─evaluation.json\n",
" ├─models\n",
" ├─evaluation__564e20c1c6c19b0242000242.json\n",
" ├─evaluations\n",
" └─evaluation__564e20c1c6c19b0242000242.txt\n",
"\n",
"[2015-11-19 20:19:36] Creating models.\n",
"[2015-11-19 20:19:44] Creating evaluations.\n",
"[2015-11-19 20:19:47] Retrieving evaluation. https://bigml.com/dashboard/evaluation/564e20d1c6c19b02530002f5\n",
"[2015-11-19 20:19:49] Retrieving evaluation. https://bigml.com/dashboard/evaluation/564e20d3c6c19b02530002fa\n",
"\n",
"Generated files:\n",
"\n",
" kfold3\n",
" ├─bigmler_sessions\n",
" ├─evaluation.txt\n",
" ├─evaluation__564e20d1c6c19b02530002f5.txt\n",
" ├─evaluation__564e20d3c6c19b02530002fa.json\n",
" ├─evaluation__564e20d3c6c19b02530002fa.txt\n",
" ├─evaluation__564e20d1c6c19b02530002f5.json\n",
" ├─evaluation.json\n",
" ├─models\n",
" └─evaluations\n",
"\n",
"New best state: ['plasma glucose']\n",
"Precision = 61.20% (score = -0.3879999999999999)\n",
"[2015-11-19 20:19:52] Creating models.\n",
"[2015-11-19 20:19:59] Creating evaluations.\n",
"[2015-11-19 20:20:04] Retrieving evaluation. https://bigml.com/dashboard/evaluation/564e20e1636e1c1c0d0001a3\n",
"[2015-11-19 20:20:06] Retrieving evaluation. https://bigml.com/dashboard/evaluation/564e20e3c6c19b0253000313\n",
"\n",
"Generated files:\n",
"\n",
" kfold4\n",
" ├─bigmler_sessions\n",
" ├─evaluation.txt\n",
" ├─evaluation__564e20e3c6c19b0253000313.json\n",
" ├─evaluation__564e20e1636e1c1c0d0001a3.txt\n",
" ├─evaluation.json\n",
" ├─models\n",
" ├─evaluation__564e20e3c6c19b0253000313.txt\n",
" ├─evaluation__564e20e1636e1c1c0d0001a3.json\n",
" └─evaluations\n",
"\n",
"[2015-11-19 20:20:08] Creating models.\n",
"[2015-11-19 20:20:16] Creating evaluations.\n",
"[2015-11-19 20:20:19] Retrieving evaluation. https://bigml.com/dashboard/evaluation/564e20f1c6c19b0253000325\n",
"[2015-11-19 20:20:21] Retrieving evaluation. https://bigml.com/dashboard/evaluation/564e20f3636e1c1c0700026d\n",
"\n",
"Generated files:\n",
"\n",
" kfold5\n",
" ├─bigmler_sessions\n",
" ├─evaluation__564e20f3636e1c1c0700026d.txt\n",
" ├─evaluation.txt\n",
" ├─evaluation__564e20f1c6c19b0253000325.txt\n",
" ├─evaluation__564e20f1c6c19b0253000325.json\n",
" ├─evaluation.json\n",
" ├─models\n",
" ├─evaluations\n",
" └─evaluation__564e20f3636e1c1c0700026d.json\n",
"\n",
"[2015-11-19 20:20:23] Creating models.\n",
"[2015-11-19 20:20:34] Creating evaluations.\n",
"[2015-11-19 20:20:39] Retrieving evaluation. https://bigml.com/dashboard/evaluation/564e2104c6c19b02440001f6\n",
"[2015-11-19 20:20:41] Retrieving evaluation. https://bigml.com/dashboard/evaluation/564e2106636e1c1c0d0001ae\n",
"\n",
"Generated files:\n",
"\n",
" kfold6\n",
" ├─bigmler_sessions\n",
" ├─evaluation.txt\n",
" ├─evaluation__564e2106636e1c1c0d0001ae.txt\n",
" ├─evaluation__564e2104c6c19b02440001f6.json\n",
" ├─evaluation__564e2106636e1c1c0d0001ae.json\n",
" ├─evaluation.json\n",
" ├─models\n",
" ├─evaluation__564e2104c6c19b02440001f6.txt\n",
" └─evaluations\n",
"\n",
"The best feature subset is: plasma glucose \n",
"Precision = 61.20%\n",
"Evaluated 6/7 feature subsets\n",
"\n",
"To create the final model with the entire dataset using the selected feature subset use:\n",
"bigmler --dataset dataset/564e207ac6c19b02530002e8 --model-fields=\"plasma glucose\"\n",
"\n",
"To delete all the resources generated by this analyze subcommand use:\n",
"bigmler delete --from-dir diabetes_features_selection\n",
"\n"
]
}
],
"source": [
"!bigmler analyze --features \\\n",
" --dataset $(cat diabetes_4/dataset) \\\n",
" --k-folds 2 \\\n",
" --staleness 2 \\\n",
" --optimize precision \\\n",
" --penalty 1 \\\n",
" --output-dir diabetes_features_selection"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.4.0"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
pregnancies plasma glucose blood pressure triceps skin thickness insulin bmi diabetes pedigree age diabetes
6 148 72 35 0 33.6 0.627 50 true
1 85 66 29 0 26.6 0.351 31 false
8 183 64 0 0 23.3 0.672 32 true
1 89 66 23 94 28.1 0.167 21 false
0 137 40 35 168 43.1 2.288 33 true
5 116 74 0 0 25.6 0.201 30 false
3 78 50 32 88 31.0 0.248 26 true
10 115 0 0 0 35.3 0.134 29 false
2 197 70 45 543 30.5 0.158 53 true
8 125 96 0 0 0.0 0.232 54 true
4 110 92 0 0 37.6 0.191 30 false
10 168 74 0 0 38.0 0.537 34 true
10 139 80 0 0 27.1 1.441 57 false
1 189 60 23 846 30.1 0.398 59 true
5 166 72 19 175 25.8 0.587 51 true
7 100 0 0 0 30.0 0.484 32 true
0 118 84 47 230 45.8 0.551 31 true
7 107 74 0 0 29.6 0.254 31 true
1 103 30 38 83 43.3 0.183 33 false
1 115 70 30 96 34.6 0.529 32 true
3 126 88 41 235 39.3 0.704 27 false
8 99 84 0 0 35.4 0.388 50 false
7 196 90 0 0 39.8 0.451 41 true
9 119 80 35 0 29.0 0.263 29 true
11 143 94 33 146 36.6 0.254 51 true
10 125 70 26 115 31.1 0.205 41 true
7 147 76 0 0 39.4 0.257 43 true
1 97 66 15 140 23.2 0.487 22 false
13 145 82 19 110 22.2 0.245 57 false
5 117 92 0 0 34.1 0.337 38 false
5 109 75 26 0 36.0 0.546 60 false
3 158 76 36 245 31.6 0.851 28 true
3 88 58 11 54 24.8 0.267 22 false
6 92 92 0 0 19.9 0.188 28 false
10 122 78 31 0 27.6 0.512 45 false
4 103 60 33 192 24.0 0.966 33 false
11 138 76 0 0 33.2 0.420 35 false
9 102 76 37 0 32.9 0.665 46 true
2 90 68 42 0 38.2 0.503 27 true
4 111 72 47 207 37.1 1.390 56 true
3 180 64 25 70 34.0 0.271 26 false
7 133 84 0 0 40.2 0.696 37 false
7 106 92 18 0 22.7 0.235 48 false
9 171 110 24 240 45.4 0.721 54 true
7 159 64 0 0 27.4 0.294 40 false
0 180 66 39 0 42.0 1.893 25 true
1 146 56 0 0 29.7 0.564 29 false
2 71 70 27 0 28.0 0.586 22 false
7 103 66 32 0 39.1 0.344 31 true
7 105 0 0 0 0.0 0.305 24 false
1 103 80 11 82 19.4 0.491 22 false
1 101 50 15 36 24.2 0.526 26 false
5 88 66 21 23 24.4 0.342 30 false
8 176 90 34 300 33.7 0.467 58 true
7 150 66 42 342 34.7 0.718 42 false
1 73 50 10 0 23.0 0.248 21 false
7 187 68 39 304 37.7 0.254 41 true
0 100 88 60 110 46.8 0.962 31 false
0 146 82 0 0 40.5 1.781 44 false
0 105 64 41 142 41.5 0.173 22 false
2 84 0 0 0 0.0 0.304 21 false
8 133 72 0 0 32.9 0.270 39 true
5 44 62 0 0 25.0 0.587 36 false
2 141 58 34 128 25.4 0.699 24 false
7 114 66 0 0 32.8 0.258 42 true
5 99 74 27 0 29.0 0.203 32 false
0 109 88 30 0 32.5 0.855 38 true
2 109 92 0 0 42.7 0.845 54 false
1 95 66 13 38 19.6 0.334 25 false
4 146 85 27 100 28.9 0.189 27 false
2 100 66 20 90 32.9 0.867 28 true
5 139 64 35 140 28.6 0.411 26 false
13 126 90 0 0 43.4 0.583 42 true
4 129 86 20 270 35.1 0.231 23 false
1 79 75 30 0 32.0 0.396 22 false
1 0 48 20 0 24.7 0.140 22 false
7 62 78 0 0 32.6 0.391 41 false
5 95 72 33 0 37.7 0.370 27 false
0 131 0 0 0 43.2 0.270 26 true
2 112 66 22 0 25.0 0.307 24 false
11 127 106 0 0 39.0 0.190 51 false
3 80 82 31 70 34.2 1.292 27 true
10 162 84 0 0 27.7 0.182 54 false
1 199 76 43 0 42.9 1.394 22 true
8 167 106 46 231 37.6 0.165 43 true
9 145 80 46 130 37.9 0.637 40 true
6 115 60 39 0 33.7 0.245 40 true
1 112 80 45 132 34.8 0.217 24 false
4 145 82 18 0 32.5 0.235 70 true
10 111 70 27 0 27.5 0.141 40 true
6 98 58 33 190 34.0 0.430 43 false
9 154 78 30 100 30.9 0.164 45 false
6 165 68 26 168 33.6 0.631 49 false
1 99 58 10 0 25.4 0.551 21 false
10 68 106 23 49 35.5 0.285 47 false
3 123 100 35 240 57.3 0.880 22 false
8 91 82 0 0 35.6 0.587 68 false
6 195 70 0 0 30.9 0.328 31 true
9 156 86 0 0 24.8 0.230 53 true
0 93 60 0 0 35.3 0.263 25 false
3 121 52 0 0 36.0 0.127 25 true
2 101 58 17 265 24.2 0.614 23 false
2 56 56 28 45 24.2 0.332 22 false
0 162 76 36 0 49.6 0.364 26 true
0 95 64 39 105 44.6 0.366 22 false
4 125 80 0 0 32.3 0.536 27 true
5 136 82 0 0 0.0 0.640 69 false
2 129 74 26 205 33.2 0.591 25 false
3 130 64 0 0 23.1 0.314 22 false
1 107 50 19 0 28.3 0.181 29 false
1 140 74 26 180 24.1 0.828 23 false
1 144 82 46 180 46.1 0.335 46 true
8 107 80 0 0 24.6 0.856 34 false
13 158 114 0 0 42.3 0.257 44 true
2 121 70 32 95 39.1 0.886 23 false
7 129 68 49 125 38.5 0.439 43 true
2 90 60 0 0 23.5 0.191 25 false
7 142 90 24 480 30.4 0.128 43 true
3 169 74 19 125 29.9 0.268 31 true
0 99 0 0 0 25.0 0.253 22 false
4 127 88 11 155 34.5 0.598 28 false
4 118 70 0 0 44.5 0.904 26 false
2 122 76 27 200 35.9 0.483 26 false
6 125 78 31 0 27.6 0.565 49 true
1 168 88 29 0 35.0 0.905 52 true
2 129 0 0 0 38.5 0.304 41 false
4 110 76 20 100 28.4 0.118 27 false
6 80 80 36 0 39.8 0.177 28 false
10 115 0 0 0 0.0 0.261 30 true
2 127 46 21 335 34.4 0.176 22 false
9 164 78 0 0 32.8 0.148 45 true
2 93 64 32 160 38.0 0.674 23 true
3 158 64 13 387 31.2 0.295 24 false
5 126 78 27 22 29.6 0.439 40 false
10 129 62 36 0 41.2 0.441 38 true
0 134 58 20 291 26.4 0.352 21 false
3 102 74 0 0 29.5 0.121 32 false
7 187 50 33 392 33.9 0.826 34 true
3 173 78 39 185 33.8 0.970 31 true
10 94 72 18 0 23.1 0.595 56 false
1 108 60 46 178 35.5 0.415 24 false
5 97 76 27 0 35.6 0.378 52 true
4 83 86 19 0 29.3 0.317 34 false
1 114 66 36 200 38.1 0.289 21 false
1 149 68 29 127 29.3 0.349 42 true
5 117 86 30 105 39.1 0.251 42 false
1 111 94 0 0 32.8 0.265 45 false
4 112 78 40 0 39.4 0.236 38 false
1 116 78 29 180 36.1 0.496 25 false
0 141 84 26 0 32.4 0.433 22 false
2 175 88 0 0 22.9 0.326 22 false
2 92 52 0 0 30.1 0.141 22 false
3 130 78 23 79 28.4 0.323 34 true
8 120 86 0 0 28.4 0.259 22 true
2 174 88 37 120 44.5 0.646 24 true
2 106 56 27 165 29.0 0.426 22 false
2 105 75 0 0 23.3 0.560 53 false
4 95 60 32 0 35.4 0.284 28 false
0 126 86 27 120 27.4 0.515 21 false
8 65 72 23 0 32.0 0.600 42 false
2 99 60 17 160 36.6 0.453 21 false
1 102 74 0 0 39.5 0.293 42 true
11 120 80 37 150 42.3 0.785 48 true
3 102 44 20 94 30.8 0.400 26 false
1 109 58 18 116 28.5 0.219 22 false
9 140 94 0 0 32.7 0.734 45 true
13 153 88 37 140 40.6 1.174 39 false
12 100 84 33 105 30.0 0.488 46 false
1 147 94 41 0 49.3 0.358 27 true
1 81 74 41 57 46.3 1.096 32 false
3 187 70 22 200 36.4 0.408 36 true
6 162 62 0 0 24.3 0.178 50 true
4 136 70 0 0 31.2 1.182 22 true
1 121 78 39 74 39.0 0.261 28 false
3 108 62 24 0 26.0 0.223 25 false
0 181 88 44 510 43.3 0.222 26 true
8 154 78 32 0 32.4 0.443 45 true
1 128 88 39 110 36.5 1.057 37 true
7 137 90 41 0 32.0 0.391 39 false
0 123 72 0 0 36.3 0.258 52 true
1 106 76 0 0 37.5 0.197 26 false
6 190 92 0 0 35.5 0.278 66 true
2 88 58 26 16 28.4 0.766 22 false
9 170 74 31 0 44.0 0.403 43 true
9 89 62 0 0 22.5 0.142 33 false
10 101 76 48 180 32.9 0.171 63 false
2 122 70 27 0 36.8 0.340 27 false
5 121 72 23 112 26.2 0.245 30 false
1 126 60 0 0 30.1 0.349 47 true
2 157 74 35 440 39.4 0.134 30 false
1 167 74 17 144 23.4 0.447 33 true
0 179 50 36 159 37.8 0.455 22 true
11 136 84 35 130 28.3 0.260 42 true
0 107 60 25 0 26.4 0.133 23 false
1 91 54 25 100 25.2 0.234 23 false
1 117 60 23 106 33.8 0.466 27 false
5 123 74 40 77 34.1 0.269 28 false
2 120 54 0 0 26.8 0.455 27 false
1 106 70 28 135 34.2 0.142 22 false
2 155 52 27 540 38.7 0.240 25 true
pregnancies plasma glucose blood pressure triceps skin thickness insulin bmi diabetes pedigree age
2 197 70 45 543 30.5 158 53
8 125 96 0 0.0 232 54
4 110 92 0 37.6 191 30
3 126 235 39.3 704 27
8 99 84 0 35.4 388 50
7 196 90 0 39.8 451 41
9 119 80 35 0 29.0 263 29
11 143 94 33 146 36.6 254 51
10 125 70 26 115 31.1 205 41
7 147 76 0 0 39.4 257 43
103 192 24.0 966 33
138 33.2 420 35
102 37 32.9 665 46
90 42 38.2 503 27
103 19.4 491 22
1 101 15 24.2 526 26
5 88 66 23 24.4 342 30
8 176 90 300 33.7 467 58
7 114 66 0 32.8 258 42
5 99 74 0 29.0 203 32
0 109 88 0 32.5 855 38
2 109 92 42.7 845 54
2 112 66 25.0 307 24
11 127 106 0 39.0 190 51
3 80 82 31 34.2 1292 27
10 162 84 0 0 27.7 182 54
1 199 76 43 0 42.9 1394 22
8 167 106 46 231 37.6 165 43