@doctorlove
Last active February 6, 2020 16:55
Decision tree demo
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Decision Trees\n",
"## Wine: red, white or pink?\n",
"### Frances Buontempo"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.tree import DecisionTreeClassifier, export_graphviz\n",
"from sklearn import tree\n",
"from sklearn.datasets import load_wine\n",
"from IPython.display import SVG, display\n",
"from graphviz import Source\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"data = load_wine()\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"X = data.data\n",
"y = data.target\n",
"labels = data.feature_names"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars.\n",
"The analysis determined the quantities of 13 constituents found in each of the three types of wines.\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
".. _wine_dataset:\n",
"\n",
"Wine recognition dataset\n",
"------------------------\n",
"\n",
"**Data Set Characteristics:**\n",
"\n",
" :Number of Instances: 178 (50 in each of three classes)\n",
" :Number of Attributes: 13 numeric, predictive attributes and the class\n",
" :Attribute Information:\n",
" \t\t- Alcohol\n",
" \t\t- Malic acid\n",
" \t\t- Ash\n",
"\t\t- Alcalinity of ash \n",
" \t\t- Magnesium\n",
"\t\t- Total phenols\n",
" \t\t- Flavanoids\n",
" \t\t- Nonflavanoid phenols\n",
" \t\t- Proanthocyanins\n",
"\t\t- Color intensity\n",
" \t\t- Hue\n",
" \t\t- OD280/OD315 of diluted wines\n",
" \t\t- Proline\n",
"\n",
" - class:\n",
" - class_0\n",
" - class_1\n",
" - class_2\n",
"\t\t\n",
" :Summary Statistics:\n",
" \n",
" ============================= ==== ===== ======= =====\n",
" Min Max Mean SD\n",
" ============================= ==== ===== ======= =====\n",
" Alcohol: 11.0 14.8 13.0 0.8\n",
" Malic Acid: 0.74 5.80 2.34 1.12\n",
" Ash: 1.36 3.23 2.36 0.27\n",
" Alcalinity of Ash: 10.6 30.0 19.5 3.3\n",
" Magnesium: 70.0 162.0 99.7 14.3\n",
" Total Phenols: 0.98 3.88 2.29 0.63\n",
" Flavanoids: 0.34 5.08 2.03 1.00\n",
" Nonflavanoid Phenols: 0.13 0.66 0.36 0.12\n",
" Proanthocyanins: 0.41 3.58 1.59 0.57\n",
" Colour Intensity: 1.3 13.0 5.1 2.3\n",
" Hue: 0.48 1.71 0.96 0.23\n",
" OD280/OD315 of diluted wines: 1.27 4.00 2.61 0.71\n",
" Proline: 278 1680 746 315\n",
" ============================= ==== ===== ======= =====\n",
"\n",
" :Missing Attribute Values: None\n",
" :Class Distribution: class_0 (59), class_1 (71), class_2 (48)\n",
" :Creator: R.A. Fisher\n",
" :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)\n",
" :Date: July, 1988\n",
"\n",
"This is a copy of UCI ML Wine recognition datasets.\n",
"https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data\n",
"\n",
"The data is the results of a chemical analysis of wines grown in the same\n",
"region in Italy by three different cultivators. There are thirteen different\n",
"measurements taken for different constituents found in the three types of\n",
"wine.\n",
"\n",
"Original Owners: \n",
"\n",
"Forina, M. et al, PARVUS - \n",
"An Extendible Package for Data Exploration, Classification and Correlation. \n",
"Institute of Pharmaceutical and Food Analysis and Technologies,\n",
"Via Brigata Salerno, 16147 Genoa, Italy.\n",
"\n",
"Citation:\n",
"\n",
"Lichman, M. (2013). UCI Machine Learning Repository\n",
"[http://archive.ics.uci.edu/ml]. Irvine, CA: University of California,\n",
"School of Information and Computer Science. \n",
"\n",
".. topic:: References\n",
"\n",
" (1) S. Aeberhard, D. Coomans and O. de Vel, \n",
" Comparison of Classifiers in High Dimensional Settings, \n",
" Tech. Rep. no. 92-02, (1992), Dept. of Computer Science and Dept. of \n",
" Mathematics and Statistics, James Cook University of North Queensland. \n",
" (Also submitted to Technometrics). \n",
"\n",
" The data was used with many others for comparing various \n",
" classifiers. The classes are separable, though only RDA \n",
" has achieved 100% correct classification. \n",
" (RDA : 100%, QDA 99.4%, LDA 98.9%, 1NN 96.1% (z-transformed data)) \n",
" (All results using the leave-one-out technique) \n",
"\n",
" (2) S. Aeberhard, D. Coomans and O. de Vel, \n",
" \"THE CLASSIFICATION PERFORMANCE OF RDA\" \n",
" Tech. Rep. no. 92-01, (1992), Dept. of Computer Science and Dept. of \n",
" Mathematics and Statistics, James Cook University of North Queensland. \n",
" (Also submitted to Journal of Chemometrics).\n",
"\n"
]
}
],
"source": [
"print(data.DESCR)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,\n",
" max_features=None, max_leaf_nodes=None,\n",
" min_impurity_decrease=0.0, min_impurity_split=None,\n",
" min_samples_leaf=1, min_samples_split=2,\n",
" min_weight_fraction_leaf=0.0, presort=False, random_state=None,\n",
" splitter='best')"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"estimator = DecisionTreeClassifier()\n",
"estimator.fit(X, y)\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<svg height=\"671pt\" viewBox=\"0.00 0.00 1175.00 671.00\" width=\"1175pt\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g class=\"graph\" id=\"graph0\" transform=\"scale(1 1) rotate(0) translate(4 667)\">\n",
"<title>Tree</title>\n",
"<polygon fill=\"white\" points=\"-4,4 -4,-667 1171,-667 1171,4 -4,4\" stroke=\"none\"/>\n",
"<!-- 0 -->\n",
"<g class=\"node\" id=\"node1\"><title>0</title>\n",
"<polygon fill=\"#39e581\" fill-opacity=\"0.101961\" points=\"696.5,-663 571.5,-663 571.5,-580 696.5,-580 696.5,-663\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"634\" y=\"-647.8\">proline &lt;= 755.0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"634\" y=\"-632.8\">gini = 0.658</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"634\" y=\"-617.8\">samples = 178</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"634\" y=\"-602.8\">value = [59, 71, 48]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"634\" y=\"-587.8\">class = 1</text>\n",
"</g>\n",
"<!-- 1 -->\n",
"<g class=\"node\" id=\"node2\"><title>1</title>\n",
"<polygon fill=\"#39e581\" fill-opacity=\"0.360784\" points=\"628.5,-544 379.5,-544 379.5,-461 628.5,-461 628.5,-544\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"504\" y=\"-528.8\">od280/od315_of_diluted_wines &lt;= 2.115</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"504\" y=\"-513.8\">gini = 0.492</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"504\" y=\"-498.8\">samples = 111</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"504\" y=\"-483.8\">value = [2, 67, 42]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"504\" y=\"-468.8\">class = 1</text>\n",
"</g>\n",
"<!-- 0&#45;&gt;1 -->\n",
"<g class=\"edge\" id=\"edge1\"><title>0-&gt;1</title>\n",
"<path d=\"M588.897,-579.907C578.511,-570.56 567.375,-560.538 556.699,-550.929\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"558.797,-548.109 549.023,-544.021 554.115,-553.312 558.797,-548.109\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"550.337\" y=\"-565.286\">True</text>\n",
"</g>\n",
"<!-- 16 -->\n",
"<g class=\"node\" id=\"node17\"><title>16</title>\n",
"<polygon fill=\"#e58139\" fill-opacity=\"0.835294\" points=\"851,-544 723,-544 723,-461 851,-461 851,-544\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"787\" y=\"-528.8\">flavanoids &lt;= 2.165</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"787\" y=\"-513.8\">gini = 0.265</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"787\" y=\"-498.8\">samples = 67</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"787\" y=\"-483.8\">value = [57, 4, 6]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"787\" y=\"-468.8\">class = 0</text>\n",
"</g>\n",
"<!-- 0&#45;&gt;16 -->\n",
"<g class=\"edge\" id=\"edge16\"><title>0-&gt;16</title>\n",
"<path d=\"M687.083,-579.907C699.543,-570.379 712.922,-560.148 725.709,-550.37\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"728.194,-552.876 734.011,-544.021 723.942,-547.315 728.194,-552.876\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"730.707\" y=\"-565.102\">False</text>\n",
"</g>\n",
"<!-- 2 -->\n",
"<g class=\"node\" id=\"node3\"><title>2</title>\n",
"<polygon fill=\"#8139e5\" fill-opacity=\"0.850980\" points=\"386,-425 274,-425 274,-342 386,-342 386,-425\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"330\" y=\"-409.8\">hue &lt;= 0.935</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"330\" y=\"-394.8\">gini = 0.227</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"330\" y=\"-379.8\">samples = 46</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"330\" y=\"-364.8\">value = [0, 6, 40]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"330\" y=\"-349.8\">class = 2</text>\n",
"</g>\n",
"<!-- 1&#45;&gt;2 -->\n",
"<g class=\"edge\" id=\"edge2\"><title>1-&gt;2</title>\n",
"<path d=\"M443.632,-460.907C427.75,-450.229 410.559,-438.669 394.471,-427.851\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"396.423,-424.946 386.172,-422.271 392.517,-430.755 396.423,-424.946\" stroke=\"black\"/>\n",
"</g>\n",
"<!-- 9 -->\n",
"<g class=\"node\" id=\"node10\"><title>9</title>\n",
"<polygon fill=\"#39e581\" fill-opacity=\"0.937255\" points=\"568,-425 440,-425 440,-342 568,-342 568,-425\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"504\" y=\"-409.8\">flavanoids &lt;= 0.795</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"504\" y=\"-394.8\">gini = 0.117</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"504\" y=\"-379.8\">samples = 65</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"504\" y=\"-364.8\">value = [2, 61, 2]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"504\" y=\"-349.8\">class = 1</text>\n",
"</g>\n",
"<!-- 1&#45;&gt;9 -->\n",
"<g class=\"edge\" id=\"edge9\"><title>1-&gt;9</title>\n",
"<path d=\"M504,-460.907C504,-452.649 504,-443.864 504,-435.302\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"507.5,-435.021 504,-425.021 500.5,-435.021 507.5,-435.021\" stroke=\"black\"/>\n",
"</g>\n",
"<!-- 3 -->\n",
"<g class=\"node\" id=\"node4\"><title>3</title>\n",
"<polygon fill=\"#8139e5\" fill-opacity=\"0.972549\" points=\"235,-306 113,-306 113,-223 235,-223 235,-306\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"174\" y=\"-290.8\">flavanoids &lt;= 1.58</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"174\" y=\"-275.8\">gini = 0.049</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"174\" y=\"-260.8\">samples = 40</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"174\" y=\"-245.8\">value = [0, 1, 39]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"174\" y=\"-230.8\">class = 2</text>\n",
"</g>\n",
"<!-- 2&#45;&gt;3 -->\n",
"<g class=\"edge\" id=\"edge3\"><title>2-&gt;3</title>\n",
"<path d=\"M275.877,-341.907C263.05,-332.288 249.271,-321.953 236.12,-312.09\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"238.128,-309.221 228.028,-306.021 233.928,-314.821 238.128,-309.221\" stroke=\"black\"/>\n",
"</g>\n",
"<!-- 6 -->\n",
"<g class=\"node\" id=\"node7\"><title>6</title>\n",
"<polygon fill=\"#39e581\" fill-opacity=\"0.800000\" points=\"407,-306 253,-306 253,-223 407,-223 407,-306\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"330\" y=\"-290.8\">color_intensity &lt;= 5.815</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"330\" y=\"-275.8\">gini = 0.278</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"330\" y=\"-260.8\">samples = 6</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"330\" y=\"-245.8\">value = [0, 5, 1]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"330\" y=\"-230.8\">class = 1</text>\n",
"</g>\n",
"<!-- 2&#45;&gt;6 -->\n",
"<g class=\"edge\" id=\"edge6\"><title>2-&gt;6</title>\n",
"<path d=\"M330,-341.907C330,-333.649 330,-324.864 330,-316.302\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"333.5,-316.021 330,-306.021 326.5,-316.021 333.5,-316.021\" stroke=\"black\"/>\n",
"</g>\n",
"<!-- 4 -->\n",
"<g class=\"node\" id=\"node5\"><title>4</title>\n",
"<polygon fill=\"#8139e5\" points=\"112,-179.5 7.10543e-015,-179.5 7.10543e-015,-111.5 112,-111.5 112,-179.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"56\" y=\"-164.3\">gini = 0.0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"56\" y=\"-149.3\">samples = 39</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"56\" y=\"-134.3\">value = [0, 0, 39]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"56\" y=\"-119.3\">class = 2</text>\n",
"</g>\n",
"<!-- 3&#45;&gt;4 -->\n",
"<g class=\"edge\" id=\"edge4\"><title>3-&gt;4</title>\n",
"<path d=\"M133.06,-222.907C121.264,-211.211 108.401,-198.457 96.624,-186.78\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"99.0159,-184.222 89.4504,-179.667 94.0872,-189.193 99.0159,-184.222\" stroke=\"black\"/>\n",
"</g>\n",
"<!-- 5 -->\n",
"<g class=\"node\" id=\"node6\"><title>5</title>\n",
"<polygon fill=\"#39e581\" points=\"235.5,-179.5 130.5,-179.5 130.5,-111.5 235.5,-111.5 235.5,-179.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"183\" y=\"-164.3\">gini = 0.0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"183\" y=\"-149.3\">samples = 1</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"183\" y=\"-134.3\">value = [0, 1, 0]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"183\" y=\"-119.3\">class = 1</text>\n",
"</g>\n",
"<!-- 3&#45;&gt;5 -->\n",
"<g class=\"edge\" id=\"edge5\"><title>3-&gt;5</title>\n",
"<path d=\"M177.123,-222.907C177.946,-212.204 178.837,-200.615 179.671,-189.776\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"183.171,-189.906 180.449,-179.667 176.192,-189.369 183.171,-189.906\" stroke=\"black\"/>\n",
"</g>\n",
"<!-- 7 -->\n",
"<g class=\"node\" id=\"node8\"><title>7</title>\n",
"<polygon fill=\"#39e581\" points=\"366.5,-179.5 261.5,-179.5 261.5,-111.5 366.5,-111.5 366.5,-179.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"314\" y=\"-164.3\">gini = 0.0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"314\" y=\"-149.3\">samples = 5</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"314\" y=\"-134.3\">value = [0, 5, 0]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"314\" y=\"-119.3\">class = 1</text>\n",
"</g>\n",
"<!-- 6&#45;&gt;7 -->\n",
"<g class=\"edge\" id=\"edge7\"><title>6-&gt;7</title>\n",
"<path d=\"M324.449,-222.907C322.985,-212.204 321.4,-200.615 319.918,-189.776\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"323.358,-189.1 318.536,-179.667 316.423,-190.049 323.358,-189.1\" stroke=\"black\"/>\n",
"</g>\n",
"<!-- 8 -->\n",
"<g class=\"node\" id=\"node9\"><title>8</title>\n",
"<polygon fill=\"#8139e5\" points=\"489.5,-179.5 384.5,-179.5 384.5,-111.5 489.5,-111.5 489.5,-179.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"437\" y=\"-164.3\">gini = 0.0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"437\" y=\"-149.3\">samples = 1</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"437\" y=\"-134.3\">value = [0, 0, 1]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"437\" y=\"-119.3\">class = 2</text>\n",
"</g>\n",
"<!-- 6&#45;&gt;8 -->\n",
"<g class=\"edge\" id=\"edge8\"><title>6-&gt;8</title>\n",
"<path d=\"M367.123,-222.907C377.719,-211.321 389.264,-198.698 399.86,-187.111\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"402.502,-189.408 406.668,-179.667 397.336,-184.684 402.502,-189.408\" stroke=\"black\"/>\n",
"</g>\n",
"<!-- 10 -->\n",
"<g class=\"node\" id=\"node11\"><title>10</title>\n",
"<polygon fill=\"#8139e5\" points=\"530.5,-298.5 425.5,-298.5 425.5,-230.5 530.5,-230.5 530.5,-298.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"478\" y=\"-283.3\">gini = 0.0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"478\" y=\"-268.3\">samples = 2</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"478\" y=\"-253.3\">value = [0, 0, 2]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"478\" y=\"-238.3\">class = 2</text>\n",
"</g>\n",
"<!-- 9&#45;&gt;10 -->\n",
"<g class=\"edge\" id=\"edge10\"><title>9-&gt;10</title>\n",
"<path d=\"M494.979,-341.907C492.576,-331.094 489.973,-319.376 487.542,-308.441\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"490.956,-307.67 485.37,-298.667 484.123,-309.188 490.956,-307.67\" stroke=\"black\"/>\n",
"</g>\n",
"<!-- 11 -->\n",
"<g class=\"node\" id=\"node12\"><title>11</title>\n",
"<polygon fill=\"#39e581\" fill-opacity=\"0.968627\" points=\"667.5,-306 548.5,-306 548.5,-223 667.5,-223 667.5,-306\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"608\" y=\"-290.8\">alcohol &lt;= 13.175</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"608\" y=\"-275.8\">gini = 0.061</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"608\" y=\"-260.8\">samples = 63</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"608\" y=\"-245.8\">value = [2, 61, 0]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"608\" y=\"-230.8\">class = 1</text>\n",
"</g>\n",
"<!-- 9&#45;&gt;11 -->\n",
"<g class=\"edge\" id=\"edge11\"><title>9-&gt;11</title>\n",
"<path d=\"M540.082,-341.907C548.149,-332.832 556.781,-323.121 565.094,-313.769\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"567.954,-315.82 571.982,-306.021 562.722,-311.17 567.954,-315.82\" stroke=\"black\"/>\n",
"</g>\n",
"<!-- 12 -->\n",
"<g class=\"node\" id=\"node13\"><title>12</title>\n",
"<polygon fill=\"#39e581\" points=\"642,-179.5 530,-179.5 530,-111.5 642,-111.5 642,-179.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"586\" y=\"-164.3\">gini = 0.0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"586\" y=\"-149.3\">samples = 58</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"586\" y=\"-134.3\">value = [0, 58, 0]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"586\" y=\"-119.3\">class = 1</text>\n",
"</g>\n",
"<!-- 11&#45;&gt;12 -->\n",
"<g class=\"edge\" id=\"edge12\"><title>11-&gt;12</title>\n",
"<path d=\"M600.367,-222.907C598.355,-212.204 596.176,-200.615 594.137,-189.776\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"597.524,-188.848 592.237,-179.667 590.645,-190.142 597.524,-188.848\" stroke=\"black\"/>\n",
"</g>\n",
"<!-- 13 -->\n",
"<g class=\"node\" id=\"node14\"><title>13</title>\n",
"<polygon fill=\"#39e581\" fill-opacity=\"0.333333\" points=\"779.5,-187 660.5,-187 660.5,-104 779.5,-104 779.5,-187\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"720\" y=\"-171.8\">alcohol &lt;= 13.365</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"720\" y=\"-156.8\">gini = 0.48</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"720\" y=\"-141.8\">samples = 5</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"720\" y=\"-126.8\">value = [2, 3, 0]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"720\" y=\"-111.8\">class = 1</text>\n",
"</g>\n",
"<!-- 11&#45;&gt;13 -->\n",
"<g class=\"edge\" id=\"edge13\"><title>11-&gt;13</title>\n",
"<path d=\"M646.858,-222.907C655.632,-213.742 665.028,-203.927 674.062,-194.489\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"676.824,-196.665 681.211,-187.021 671.767,-191.824 676.824,-196.665\" stroke=\"black\"/>\n",
"</g>\n",
"<!-- 14 -->\n",
"<g class=\"node\" id=\"node15\"><title>14</title>\n",
"<polygon fill=\"#e58139\" points=\"710.5,-68 605.5,-68 605.5,-0 710.5,-0 710.5,-68\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"658\" y=\"-52.8\">gini = 0.0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"658\" y=\"-37.8\">samples = 2</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"658\" y=\"-22.8\">value = [2, 0, 0]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"658\" y=\"-7.8\">class = 0</text>\n",
"</g>\n",
"<!-- 13&#45;&gt;14 -->\n",
"<g class=\"edge\" id=\"edge14\"><title>13-&gt;14</title>\n",
"<path d=\"M696.913,-103.726C692.007,-95.0615 686.818,-85.8962 681.883,-77.1802\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"684.827,-75.277 676.855,-68.2996 678.736,-78.726 684.827,-75.277\" stroke=\"black\"/>\n",
"</g>\n",
"<!-- 15 -->\n",
"<g class=\"node\" id=\"node16\"><title>15</title>\n",
"<polygon fill=\"#39e581\" points=\"833.5,-68 728.5,-68 728.5,-0 833.5,-0 833.5,-68\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"781\" y=\"-52.8\">gini = 0.0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"781\" y=\"-37.8\">samples = 3</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"781\" y=\"-22.8\">value = [0, 3, 0]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"781\" y=\"-7.8\">class = 1</text>\n",
"</g>\n",
"<!-- 13&#45;&gt;15 -->\n",
"<g class=\"edge\" id=\"edge15\"><title>13-&gt;15</title>\n",
"<path d=\"M742.714,-103.726C747.541,-95.0615 752.647,-85.8962 757.502,-77.1802\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"760.64,-78.7389 762.45,-68.2996 754.525,-75.3322 760.64,-78.7389\" stroke=\"black\"/>\n",
"</g>\n",
"<!-- 17 -->\n",
"<g class=\"node\" id=\"node18\"><title>17</title>\n",
"<polygon fill=\"#8139e5\" fill-opacity=\"0.666667\" points=\"839.5,-425 734.5,-425 734.5,-342 839.5,-342 839.5,-425\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"787\" y=\"-409.8\">hue &lt;= 0.803</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"787\" y=\"-394.8\">gini = 0.375</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"787\" y=\"-379.8\">samples = 8</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"787\" y=\"-364.8\">value = [0, 2, 6]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"787\" y=\"-349.8\">class = 2</text>\n",
"</g>\n",
"<!-- 16&#45;&gt;17 -->\n",
"<g class=\"edge\" id=\"edge17\"><title>16-&gt;17</title>\n",
"<path d=\"M787,-460.907C787,-452.649 787,-443.864 787,-435.302\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"790.5,-435.021 787,-425.021 783.5,-435.021 790.5,-435.021\" stroke=\"black\"/>\n",
"</g>\n",
"<!-- 20 -->\n",
"<g class=\"node\" id=\"node21\"><title>20</title>\n",
"<polygon fill=\"#e58139\" fill-opacity=\"0.964706\" points=\"1061,-425 907,-425 907,-342 1061,-342 1061,-425\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"984\" y=\"-409.8\">color_intensity &lt;= 3.435</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"984\" y=\"-394.8\">gini = 0.065</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"984\" y=\"-379.8\">samples = 59</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"984\" y=\"-364.8\">value = [57, 2, 0]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"984\" y=\"-349.8\">class = 0</text>\n",
"</g>\n",
"<!-- 16&#45;&gt;20 -->\n",
"<g class=\"edge\" id=\"edge20\"><title>16-&gt;20</title>\n",
"<path d=\"M851.233,-463.352C869.068,-452.759 888.616,-441.15 907.096,-430.174\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"908.975,-433.129 915.786,-425.013 905.4,-427.11 908.975,-433.129\" stroke=\"black\"/>\n",
"</g>\n",
"<!-- 18 -->\n",
"<g class=\"node\" id=\"node19\"><title>18</title>\n",
"<polygon fill=\"#8139e5\" points=\"790.5,-298.5 685.5,-298.5 685.5,-230.5 790.5,-230.5 790.5,-298.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"738\" y=\"-283.3\">gini = 0.0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"738\" y=\"-268.3\">samples = 6</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"738\" y=\"-253.3\">value = [0, 0, 6]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"738\" y=\"-238.3\">class = 2</text>\n",
"</g>\n",
"<!-- 17&#45;&gt;18 -->\n",
"<g class=\"edge\" id=\"edge18\"><title>17-&gt;18</title>\n",
"<path d=\"M770,-341.907C765.425,-330.983 760.463,-319.137 755.844,-308.107\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"758.982,-306.539 751.89,-298.667 752.525,-309.243 758.982,-306.539\" stroke=\"black\"/>\n",
"</g>\n",
"<!-- 19 -->\n",
"<g class=\"node\" id=\"node20\"><title>19</title>\n",
"<polygon fill=\"#39e581\" points=\"913.5,-298.5 808.5,-298.5 808.5,-230.5 913.5,-230.5 913.5,-298.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"861\" y=\"-283.3\">gini = 0.0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"861\" y=\"-268.3\">samples = 2</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"861\" y=\"-253.3\">value = [0, 2, 0]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"861\" y=\"-238.3\">class = 1</text>\n",
"</g>\n",
"<!-- 17&#45;&gt;19 -->\n",
"<g class=\"edge\" id=\"edge19\"><title>17-&gt;19</title>\n",
"<path d=\"M812.674,-341.907C819.723,-330.763 827.379,-318.658 834.474,-307.439\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"837.635,-308.989 840.023,-298.667 831.719,-305.248 837.635,-308.989\" stroke=\"black\"/>\n",
"</g>\n",
"<!-- 21 -->\n",
"<g class=\"node\" id=\"node22\"><title>21</title>\n",
"<polygon fill=\"#39e581\" points=\"1036.5,-298.5 931.5,-298.5 931.5,-230.5 1036.5,-230.5 1036.5,-298.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"984\" y=\"-283.3\">gini = 0.0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"984\" y=\"-268.3\">samples = 2</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"984\" y=\"-253.3\">value = [0, 2, 0]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"984\" y=\"-238.3\">class = 1</text>\n",
"</g>\n",
"<!-- 20&#45;&gt;21 -->\n",
"<g class=\"edge\" id=\"edge21\"><title>20-&gt;21</title>\n",
"<path d=\"M984,-341.907C984,-331.204 984,-319.615 984,-308.776\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"987.5,-308.667 984,-298.667 980.5,-308.667 987.5,-308.667\" stroke=\"black\"/>\n",
"</g>\n",
"<!-- 22 -->\n",
"<g class=\"node\" id=\"node23\"><title>22</title>\n",
"<polygon fill=\"#e58139\" points=\"1167,-298.5 1055,-298.5 1055,-230.5 1167,-230.5 1167,-298.5\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"1111\" y=\"-283.3\">gini = 0.0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"1111\" y=\"-268.3\">samples = 57</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"1111\" y=\"-253.3\">value = [57, 0, 0]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"1111\" y=\"-238.3\">class = 0</text>\n",
"</g>\n",
"<!-- 20&#45;&gt;22 -->\n",
"<g class=\"edge\" id=\"edge22\"><title>20-&gt;22</title>\n",
"<path d=\"M1028.06,-341.907C1040.88,-330.101 1054.86,-317.217 1067.64,-305.45\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"1070.02,-308.017 1075,-298.667 1065.27,-302.868 1070.02,-308.017\" stroke=\"black\"/>\n",
"</g>\n",
"</g>\n",
"</svg>"
],
"text/plain": [
"<IPython.core.display.SVG object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"graph = Source(tree.export_graphviz(estimator, out_file=None\n",
" , feature_names=labels, class_names=['0', '1', '2'] \n",
" , filled = True))\n",
"display(SVG(graph.pipe(format='svg')))\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is too big to see! Let's limit the max depth"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=2,\n",
" max_features=None, max_leaf_nodes=None,\n",
" min_impurity_decrease=0.0, min_impurity_split=None,\n",
" min_samples_leaf=1, min_samples_split=2,\n",
" min_weight_fraction_leaf=0.0, presort=False, random_state=None,\n",
" splitter='best')"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"estimator = DecisionTreeClassifier(max_depth=2)\n",
"estimator.fit(X, y)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<svg height=\"314pt\" viewBox=\"0.00 0.00 544.00 314.00\" width=\"544pt\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g class=\"graph\" id=\"graph0\" transform=\"scale(1 1) rotate(0) translate(4 310)\">\n",
"<title>Tree</title>\n",
"<polygon fill=\"white\" points=\"-4,4 -4,-310 540,-310 540,4 -4,4\" stroke=\"none\"/>\n",
"<!-- 0 -->\n",
"<g class=\"node\" id=\"node1\"><title>0</title>\n",
"<polygon fill=\"#39e581\" fill-opacity=\"0.101961\" points=\"331.5,-306 206.5,-306 206.5,-223 331.5,-223 331.5,-306\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"269\" y=\"-290.8\">proline &lt;= 755.0</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"269\" y=\"-275.8\">gini = 0.658</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"269\" y=\"-260.8\">samples = 178</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"269\" y=\"-245.8\">value = [59, 71, 48]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"269\" y=\"-230.8\">class = 1</text>\n",
"</g>\n",
"<!-- 1 -->\n",
"<g class=\"node\" id=\"node2\"><title>1</title>\n",
"<polygon fill=\"#39e581\" fill-opacity=\"0.360784\" points=\"290.5,-187 41.5,-187 41.5,-104 290.5,-104 290.5,-187\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"166\" y=\"-171.8\">od280/od315_of_diluted_wines &lt;= 2.115</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"166\" y=\"-156.8\">gini = 0.492</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"166\" y=\"-141.8\">samples = 111</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"166\" y=\"-126.8\">value = [2, 67, 42]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"166\" y=\"-111.8\">class = 1</text>\n",
"</g>\n",
"<!-- 0&#45;&gt;1 -->\n",
"<g class=\"edge\" id=\"edge1\"><title>0-&gt;1</title>\n",
"<path d=\"M233.265,-222.907C225.275,-213.832 216.726,-204.121 208.494,-194.769\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"210.907,-192.214 201.672,-187.021 205.653,-196.839 210.907,-192.214\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"200.084\" y=\"-208.27\">True</text>\n",
"</g>\n",
"<!-- 4 -->\n",
"<g class=\"node\" id=\"node5\"><title>4</title>\n",
"<polygon fill=\"#e58139\" fill-opacity=\"0.835294\" points=\"437,-187 309,-187 309,-104 437,-104 437,-187\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"373\" y=\"-171.8\">flavanoids &lt;= 2.165</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"373\" y=\"-156.8\">gini = 0.265</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"373\" y=\"-141.8\">samples = 67</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"373\" y=\"-126.8\">value = [57, 4, 6]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"373\" y=\"-111.8\">class = 0</text>\n",
"</g>\n",
"<!-- 0&#45;&gt;4 -->\n",
"<g class=\"edge\" id=\"edge4\"><title>0-&gt;4</title>\n",
"<path d=\"M305.082,-222.907C313.149,-213.832 321.781,-204.121 330.094,-194.769\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"332.954,-196.82 336.982,-187.021 327.722,-192.17 332.954,-196.82\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"338.45\" y=\"-208.278\">False</text>\n",
"</g>\n",
"<!-- 2 -->\n",
"<g class=\"node\" id=\"node3\"><title>2</title>\n",
"<polygon fill=\"#8139e5\" fill-opacity=\"0.850980\" points=\"112,-68 7.10543e-015,-68 7.10543e-015,-0 112,-0 112,-68\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"56\" y=\"-52.8\">gini = 0.227</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"56\" y=\"-37.8\">samples = 46</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"56\" y=\"-22.8\">value = [0, 6, 40]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"56\" y=\"-7.8\">class = 2</text>\n",
"</g>\n",
"<!-- 1&#45;&gt;2 -->\n",
"<g class=\"edge\" id=\"edge2\"><title>1-&gt;2</title>\n",
"<path d=\"M125.04,-103.726C115.786,-94.5142 105.964,-84.7364 96.7194,-75.5343\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"99.0081,-72.8741 89.4517,-68.2996 94.0696,-77.8351 99.0081,-72.8741\" stroke=\"black\"/>\n",
"</g>\n",
"<!-- 3 -->\n",
"<g class=\"node\" id=\"node4\"><title>3</title>\n",
"<polygon fill=\"#39e581\" fill-opacity=\"0.937255\" points=\"242,-68 130,-68 130,-0 242,-0 242,-68\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"186\" y=\"-52.8\">gini = 0.117</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"186\" y=\"-37.8\">samples = 65</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"186\" y=\"-22.8\">value = [2, 61, 2]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"186\" y=\"-7.8\">class = 1</text>\n",
"</g>\n",
"<!-- 1&#45;&gt;3 -->\n",
"<g class=\"edge\" id=\"edge3\"><title>1-&gt;3</title>\n",
"<path d=\"M173.447,-103.726C174.963,-95.4263 176.563,-86.6671 178.094,-78.2834\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"181.564,-78.7658 179.918,-68.2996 174.678,-77.508 181.564,-78.7658\" stroke=\"black\"/>\n",
"</g>\n",
"<!-- 5 -->\n",
"<g class=\"node\" id=\"node6\"><title>5</title>\n",
"<polygon fill=\"#8139e5\" fill-opacity=\"0.666667\" points=\"405.5,-68 300.5,-68 300.5,-0 405.5,-0 405.5,-68\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"353\" y=\"-52.8\">gini = 0.375</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"353\" y=\"-37.8\">samples = 8</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"353\" y=\"-22.8\">value = [0, 2, 6]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"353\" y=\"-7.8\">class = 2</text>\n",
"</g>\n",
"<!-- 4&#45;&gt;5 -->\n",
"<g class=\"edge\" id=\"edge5\"><title>4-&gt;5</title>\n",
"<path d=\"M365.553,-103.726C364.037,-95.4263 362.437,-86.6671 360.906,-78.2834\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"364.322,-77.508 359.082,-68.2996 357.436,-78.7658 364.322,-77.508\" stroke=\"black\"/>\n",
"</g>\n",
"<!-- 6 -->\n",
"<g class=\"node\" id=\"node7\"><title>6</title>\n",
"<polygon fill=\"#e58139\" fill-opacity=\"0.964706\" points=\"536,-68 424,-68 424,-0 536,-0 536,-68\" stroke=\"black\"/>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"480\" y=\"-52.8\">gini = 0.065</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"480\" y=\"-37.8\">samples = 59</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"480\" y=\"-22.8\">value = [57, 2, 0]</text>\n",
"<text font-family=\"Times New Roman,serif\" font-size=\"14.00\" text-anchor=\"middle\" x=\"480\" y=\"-7.8\">class = 0</text>\n",
"</g>\n",
"<!-- 4&#45;&gt;6 -->\n",
"<g class=\"edge\" id=\"edge6\"><title>4-&gt;6</title>\n",
"<path d=\"M412.843,-103.726C421.845,-94.5142 431.399,-84.7364 440.391,-75.5343\" fill=\"none\" stroke=\"black\"/>\n",
"<polygon fill=\"black\" points=\"442.975,-77.898 447.461,-68.2996 437.968,-73.0057 442.975,-77.898\" stroke=\"black\"/>\n",
"</g>\n",
"</g>\n",
"</svg>"
],
"text/plain": [
"<IPython.core.display.SVG object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"graph_limit_depth = Source(tree.export_graphviz(estimator, out_file=None\n",
" , feature_names=labels, class_names=['0', '1', '2'] \n",
" , filled = True))\n",
"display(SVG(graph_limit_depth.pipe(format='svg')))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This may not be as accruate, but shows us significant features.\n",
"[Wikipedia](https://en.wikipedia.org/wiki/Phenolic_content_in_wine) says, \n",
" > In white wines the number of flavonoids is reduced due to the lesser contact with the skins \n",
" > that they receive during winemaking"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"flavinoid = [1500] * 13\n",
"flavinoid[6]= 0.\n",
"prediction = estimator.predict([flavinoid])\n",
"prediction[0]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}