@jwdink
Last active July 24, 2021 06:09
Decision Tree Algorithm in Python - A simple walkthrough of the ID3 algorithm for building decision trees (view at http://nbviewer.ipython.org/gist/jwdink/9715a1a30e8c7f50a572)
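The core of ID3 fits in a few dozen lines. At each node it picks the attribute with the highest information gain, splits on it, and recurses until every node is pure or no attributes remain. The sketch below is a minimal, self-contained illustration in plain Python and is not the notebook's own code. The function names (`entropy`, `information_gain`, `id3`), the list-of-dicts row format, and the nested-dict tree representation are all assumptions made for this example. The five toy rows copy the `class`, `cap-shape`, and `odor` values of rows 0 to 4 in the table shown in the notebook output.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(rows, attribute, target="class"):
    """Drop in entropy of `target` after splitting `rows` on `attribute`.

    Assumption: each row is a dict mapping column name -> nominal value.
    """
    base = entropy([r[target] for r in rows])
    n = len(rows)
    # Group the target labels by the value this row has for `attribute`
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    # Weighted average entropy of the child nodes
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return base - remainder


def id3(rows, attributes, target="class"):
    """Build a tree as nested dicts: {attribute: {value: subtree_or_label}}."""
    labels = [r[target] for r in rows]
    # Pure node: every row has the same label, so make a leaf
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left to split on: fall back to the majority label
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: split on the attribute with the highest information gain
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        tree[best][value] = id3(subset, remaining, target)
    return tree


# Toy data: rows 0-4 of the mushroom table, restricted to three columns
rows = [
    {"class": "poisonous", "cap-shape": "convex", "odor": "pungent"},
    {"class": "edible", "cap-shape": "convex", "odor": "almond"},
    {"class": "edible", "cap-shape": "bell", "odor": "anise"},
    {"class": "poisonous", "cap-shape": "convex", "odor": "pungent"},
    {"class": "edible", "cap-shape": "convex", "odor": "none"},
]

tree = id3(rows, ["cap-shape", "odor"])
print(tree)
# {'odor': {'almond': 'edible', 'anise': 'edible', 'none': 'edible', 'pungent': 'poisonous'}}
```

On these five rows `odor` separates the classes perfectly, so its information gain equals the full label entropy (about 0.971 bits) versus about 0.171 bits for `cap-shape`, and it becomes the root. Like the notebook's learner, this sketch handles only nominal attributes and does no pruning.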
{
"metadata": {
"name": "",
"signature": "sha256:b48eb854cee7ab1a3958cf1186bb35bf11222f4dd9ec741b60efd40b1cb7fbcc"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Decision Trees"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Intro\n",
"\n",
"This is a walkthrough of some code for a simple decision-tree learner.\n",
"\n",
"We'll load a dataset in which each row is a mushroom and each column is an attribute of that mushroom. The goal is to predict whether a mushroom is poisonous from its other attributes.\n",
"\n",
"NOTE: This is not a robust version of the algorithm. For example, it handles only discrete (nominal) attributes and does nothing to prevent overfitting. The goal is to keep the code simple and easy to understand."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
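"import pandas as pd\n",
"\n",
"# DataFrame.from_csv was removed in pandas 1.0; pd.read_csv with index_col=0\n",
"# keeps from_csv's default of using the first CSV column as the row index.\n",
"df_shroom = pd.read_csv('../datasets/mushroom_data.csv', index_col=0)\n",
"df_shroom"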
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>class</th>\n",
" <th>cap-shape</th>\n",
" <th>cap-surface</th>\n",
" <th>cap-color</th>\n",
" <th>bruises?</th>\n",
" <th>odor</th>\n",
" <th>gill-attachment</th>\n",
" <th>gill-spacing</th>\n",
" <th>gill-size</th>\n",
" <th>gill-color</th>\n",
" <th>...</th>\n",
" <th>stalk-surface-below-ring</th>\n",
" <th>stalk-color-above-ring</th>\n",
" <th>stalk-color-below-ring</th>\n",
" <th>veil-type</th>\n",
" <th>veil-color</th>\n",
" <th>ring-number</th>\n",
" <th>ring-type</th>\n",
" <th>spore-print-color</th>\n",
" <th>population</th>\n",
" <th>habitat</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0 </th>\n",
" <td> poisonous</td>\n",
" <td> convex</td>\n",
" <td> smooth</td>\n",
" <td> brown</td>\n",
" <td> bruises</td>\n",
" <td> pungent</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> narrow</td>\n",
" <td> black</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> black</td>\n",
" <td> scattered</td>\n",
" <td> urban</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1 </th>\n",
" <td> edible</td>\n",
" <td> convex</td>\n",
" <td> smooth</td>\n",
" <td> yellow</td>\n",
" <td> bruises</td>\n",
" <td> almond</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> black</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> brown</td>\n",
" <td> numerous</td>\n",
" <td> grasses</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2 </th>\n",
" <td> edible</td>\n",
" <td> bell</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> bruises</td>\n",
" <td> anise</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> brown</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> brown</td>\n",
" <td> numerous</td>\n",
" <td> meadows</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3 </th>\n",
" <td> poisonous</td>\n",
" <td> convex</td>\n",
" <td> scaly</td>\n",
" <td> white</td>\n",
" <td> bruises</td>\n",
" <td> pungent</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> narrow</td>\n",
" <td> brown</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> black</td>\n",
" <td> scattered</td>\n",
" <td> urban</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4 </th>\n",
" <td> edible</td>\n",
" <td> convex</td>\n",
" <td> smooth</td>\n",
" <td> gray</td>\n",
" <td> no</td>\n",
" <td> none</td>\n",
" <td> free</td>\n",
" <td> crowded</td>\n",
" <td> broad</td>\n",
" <td> black</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> evanescent</td>\n",
" <td> brown</td>\n",
" <td> abundant</td>\n",
" <td> grasses</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5 </th>\n",
" <td> edible</td>\n",
" <td> convex</td>\n",
" <td> scaly</td>\n",
" <td> yellow</td>\n",
" <td> bruises</td>\n",
" <td> almond</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> brown</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> black</td>\n",
" <td> numerous</td>\n",
" <td> grasses</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6 </th>\n",
" <td> edible</td>\n",
" <td> bell</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> bruises</td>\n",
" <td> almond</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> gray</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> black</td>\n",
" <td> numerous</td>\n",
" <td> meadows</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7 </th>\n",
" <td> edible</td>\n",
" <td> bell</td>\n",
" <td> scaly</td>\n",
" <td> white</td>\n",
" <td> bruises</td>\n",
" <td> anise</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> brown</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> brown</td>\n",
" <td> scattered</td>\n",
" <td> meadows</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8 </th>\n",
" <td> poisonous</td>\n",
" <td> convex</td>\n",
" <td> scaly</td>\n",
" <td> white</td>\n",
" <td> bruises</td>\n",
" <td> pungent</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> narrow</td>\n",
" <td> pink</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> black</td>\n",
" <td> several</td>\n",
" <td> grasses</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9 </th>\n",
" <td> edible</td>\n",
" <td> bell</td>\n",
" <td> smooth</td>\n",
" <td> yellow</td>\n",
" <td> bruises</td>\n",
" <td> almond</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> gray</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> black</td>\n",
" <td> scattered</td>\n",
" <td> meadows</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10 </th>\n",
" <td> edible</td>\n",
" <td> convex</td>\n",
" <td> scaly</td>\n",
" <td> yellow</td>\n",
" <td> bruises</td>\n",
" <td> anise</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> gray</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> brown</td>\n",
" <td> numerous</td>\n",
" <td> grasses</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11 </th>\n",
" <td> edible</td>\n",
" <td> convex</td>\n",
" <td> scaly</td>\n",
" <td> yellow</td>\n",
" <td> bruises</td>\n",
" <td> almond</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> brown</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> black</td>\n",
" <td> scattered</td>\n",
" <td> meadows</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12 </th>\n",
" <td> edible</td>\n",
" <td> bell</td>\n",
" <td> smooth</td>\n",
" <td> yellow</td>\n",
" <td> bruises</td>\n",
" <td> almond</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> brown</td>\n",
" <td> scattered</td>\n",
" <td> grasses</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13 </th>\n",
" <td> poisonous</td>\n",
" <td> convex</td>\n",
" <td> scaly</td>\n",
" <td> white</td>\n",
" <td> bruises</td>\n",
" <td> pungent</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> narrow</td>\n",
" <td> black</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> brown</td>\n",
" <td> several</td>\n",
" <td> urban</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14 </th>\n",
" <td> edible</td>\n",
" <td> convex</td>\n",
" <td> fibrous</td>\n",
" <td> brown</td>\n",
" <td> no</td>\n",
" <td> none</td>\n",
" <td> free</td>\n",
" <td> crowded</td>\n",
" <td> broad</td>\n",
" <td> brown</td>\n",
" <td>...</td>\n",
" <td> fibrous</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> evanescent</td>\n",
" <td> black</td>\n",
" <td> abundant</td>\n",
" <td> grasses</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15 </th>\n",
" <td> edible</td>\n",
" <td> sunken</td>\n",
" <td> fibrous</td>\n",
" <td> gray</td>\n",
" <td> no</td>\n",
" <td> none</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> narrow</td>\n",
" <td> black</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> brown</td>\n",
" <td> solitary</td>\n",
" <td> urban</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16 </th>\n",
" <td> edible</td>\n",
" <td> flat</td>\n",
" <td> fibrous</td>\n",
" <td> white</td>\n",
" <td> no</td>\n",
" <td> none</td>\n",
" <td> free</td>\n",
" <td> crowded</td>\n",
" <td> broad</td>\n",
" <td> black</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> evanescent</td>\n",
" <td> brown</td>\n",
" <td> abundant</td>\n",
" <td> grasses</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17 </th>\n",
" <td> poisonous</td>\n",
" <td> convex</td>\n",
" <td> smooth</td>\n",
" <td> brown</td>\n",
" <td> bruises</td>\n",
" <td> pungent</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> narrow</td>\n",
" <td> brown</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> black</td>\n",
" <td> scattered</td>\n",
" <td> grasses</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18 </th>\n",
" <td> poisonous</td>\n",
" <td> convex</td>\n",
" <td> scaly</td>\n",
" <td> white</td>\n",
" <td> bruises</td>\n",
" <td> pungent</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> narrow</td>\n",
" <td> brown</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> brown</td>\n",
" <td> scattered</td>\n",
" <td> urban</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19 </th>\n",
" <td> poisonous</td>\n",
" <td> convex</td>\n",
" <td> smooth</td>\n",
" <td> brown</td>\n",
" <td> bruises</td>\n",
" <td> pungent</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> narrow</td>\n",
" <td> black</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> brown</td>\n",
" <td> scattered</td>\n",
" <td> urban</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20 </th>\n",
" <td> edible</td>\n",
" <td> bell</td>\n",
" <td> smooth</td>\n",
" <td> yellow</td>\n",
" <td> bruises</td>\n",
" <td> almond</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> black</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> brown</td>\n",
" <td> scattered</td>\n",
" <td> meadows</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21 </th>\n",
" <td> poisonous</td>\n",
" <td> convex</td>\n",
" <td> scaly</td>\n",
" <td> brown</td>\n",
" <td> bruises</td>\n",
" <td> pungent</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> narrow</td>\n",
" <td> brown</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> brown</td>\n",
" <td> several</td>\n",
" <td> grasses</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22 </th>\n",
" <td> edible</td>\n",
" <td> bell</td>\n",
" <td> scaly</td>\n",
" <td> yellow</td>\n",
" <td> bruises</td>\n",
" <td> anise</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> black</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> brown</td>\n",
" <td> scattered</td>\n",
" <td> meadows</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23 </th>\n",
" <td> edible</td>\n",
" <td> bell</td>\n",
" <td> scaly</td>\n",
" <td> white</td>\n",
" <td> bruises</td>\n",
" <td> almond</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> brown</td>\n",
" <td> numerous</td>\n",
" <td> meadows</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24 </th>\n",
" <td> edible</td>\n",
" <td> bell</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> bruises</td>\n",
" <td> anise</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> gray</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> black</td>\n",
" <td> scattered</td>\n",
" <td> meadows</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25 </th>\n",
" <td> poisonous</td>\n",
" <td> flat</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> bruises</td>\n",
" <td> pungent</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> narrow</td>\n",
" <td> brown</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> brown</td>\n",
" <td> several</td>\n",
" <td> grasses</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26 </th>\n",
" <td> edible</td>\n",
" <td> convex</td>\n",
" <td> scaly</td>\n",
" <td> yellow</td>\n",
" <td> bruises</td>\n",
" <td> almond</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> brown</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> brown</td>\n",
" <td> numerous</td>\n",
" <td> meadows</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27 </th>\n",
" <td> edible</td>\n",
" <td> convex</td>\n",
" <td> scaly</td>\n",
" <td> white</td>\n",
" <td> bruises</td>\n",
" <td> anise</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> brown</td>\n",
" <td> numerous</td>\n",
" <td> meadows</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28 </th>\n",
" <td> edible</td>\n",
" <td> flat</td>\n",
" <td> fibrous</td>\n",
" <td> brown</td>\n",
" <td> no</td>\n",
" <td> none</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> narrow</td>\n",
" <td> black</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> black</td>\n",
" <td> solitary</td>\n",
" <td> urban</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29 </th>\n",
" <td> edible</td>\n",
" <td> convex</td>\n",
" <td> smooth</td>\n",
" <td> yellow</td>\n",
" <td> bruises</td>\n",
" <td> almond</td>\n",
" <td> free</td>\n",
" <td> crowded</td>\n",
" <td> narrow</td>\n",
" <td> brown</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> one</td>\n",
" <td> pendant</td>\n",
" <td> brown</td>\n",
" <td> several</td>\n",
" <td> woods</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5614</th>\n",
" <td> edible</td>\n",
" <td> flat</td>\n",
" <td> smooth</td>\n",
" <td> gray</td>\n",
" <td> bruises</td>\n",
" <td> none</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> two</td>\n",
" <td> pendant</td>\n",
" <td> white</td>\n",
" <td> solitary</td>\n",
" <td> paths</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5615</th>\n",
" <td> edible</td>\n",
" <td> convex</td>\n",
" <td> smooth</td>\n",
" <td> pink</td>\n",
" <td> bruises</td>\n",
" <td> none</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> two</td>\n",
" <td> pendant</td>\n",
" <td> white</td>\n",
" <td> several</td>\n",
" <td> paths</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5616</th>\n",
" <td> poisonous</td>\n",
" <td> conical</td>\n",
" <td> scaly</td>\n",
" <td> yellow</td>\n",
" <td> no</td>\n",
" <td> none</td>\n",
" <td> free</td>\n",
" <td> crowded</td>\n",
" <td> narrow</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> scaly</td>\n",
" <td> yellow</td>\n",
" <td> yellow</td>\n",
" <td> partial</td>\n",
" <td> yellow</td>\n",
" <td> one</td>\n",
" <td> evanescent</td>\n",
" <td> white</td>\n",
" <td> clustered</td>\n",
" <td> leaves</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5617</th>\n",
" <td> poisonous</td>\n",
" <td> knobbed</td>\n",
" <td> scaly</td>\n",
" <td> red</td>\n",
" <td> no</td>\n",
" <td> musty</td>\n",
" <td> attached</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> yellow</td>\n",
" <td>...</td>\n",
" <td> scaly</td>\n",
" <td> cinnamon</td>\n",
" <td> cinnamon</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> none</td>\n",
" <td> none</td>\n",
" <td> white</td>\n",
" <td> clustered</td>\n",
" <td> woods</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5618</th>\n",
" <td> poisonous</td>\n",
" <td> convex</td>\n",
" <td> scaly</td>\n",
" <td> cinnamon</td>\n",
" <td> no</td>\n",
" <td> musty</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> scaly</td>\n",
" <td> cinnamon</td>\n",
" <td> cinnamon</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> none</td>\n",
" <td> none</td>\n",
" <td> white</td>\n",
" <td> clustered</td>\n",
" <td> woods</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5619</th>\n",
" <td> edible</td>\n",
" <td> bell</td>\n",
" <td> scaly</td>\n",
" <td> brown</td>\n",
" <td> no</td>\n",
" <td> none</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> scaly</td>\n",
" <td> brown</td>\n",
" <td> brown</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> two</td>\n",
" <td> pendant</td>\n",
" <td> white</td>\n",
" <td> solitary</td>\n",
" <td> woods</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5620</th>\n",
" <td> edible</td>\n",
" <td> convex</td>\n",
" <td> scaly</td>\n",
" <td> brown</td>\n",
" <td> bruises</td>\n",
" <td> none</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> two</td>\n",
" <td> pendant</td>\n",
" <td> white</td>\n",
" <td> several</td>\n",
" <td> paths</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5621</th>\n",
" <td> poisonous</td>\n",
" <td> convex</td>\n",
" <td> scaly</td>\n",
" <td> brown</td>\n",
" <td> no</td>\n",
" <td> musty</td>\n",
" <td> attached</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> yellow</td>\n",
" <td>...</td>\n",
" <td> scaly</td>\n",
" <td> cinnamon</td>\n",
" <td> cinnamon</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> none</td>\n",
" <td> none</td>\n",
" <td> white</td>\n",
" <td> clustered</td>\n",
" <td> woods</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5622</th>\n",
" <td> poisonous</td>\n",
" <td> knobbed</td>\n",
" <td> scaly</td>\n",
" <td> yellow</td>\n",
" <td> no</td>\n",
" <td> none</td>\n",
" <td> free</td>\n",
" <td> crowded</td>\n",
" <td> narrow</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> scaly</td>\n",
" <td> yellow</td>\n",
" <td> yellow</td>\n",
" <td> partial</td>\n",
" <td> yellow</td>\n",
" <td> one</td>\n",
" <td> evanescent</td>\n",
" <td> white</td>\n",
" <td> clustered</td>\n",
" <td> leaves</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5623</th>\n",
" <td> edible</td>\n",
" <td> knobbed</td>\n",
" <td> smooth</td>\n",
" <td> brown</td>\n",
" <td> no</td>\n",
" <td> none</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> scaly</td>\n",
" <td> brown</td>\n",
" <td> brown</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> two</td>\n",
" <td> pendant</td>\n",
" <td> white</td>\n",
" <td> solitary</td>\n",
" <td> woods</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5624</th>\n",
" <td> poisonous</td>\n",
" <td> flat</td>\n",
" <td> scaly</td>\n",
" <td> cinnamon</td>\n",
" <td> no</td>\n",
" <td> musty</td>\n",
" <td> attached</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> scaly</td>\n",
" <td> cinnamon</td>\n",
" <td> cinnamon</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> none</td>\n",
" <td> none</td>\n",
" <td> white</td>\n",
" <td> clustered</td>\n",
" <td> woods</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5625</th>\n",
" <td> poisonous</td>\n",
" <td> convex</td>\n",
" <td> scaly</td>\n",
" <td> brown</td>\n",
" <td> no</td>\n",
" <td> musty</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> scaly</td>\n",
" <td> cinnamon</td>\n",
" <td> cinnamon</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> none</td>\n",
" <td> none</td>\n",
" <td> white</td>\n",
" <td> clustered</td>\n",
" <td> woods</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5626</th>\n",
" <td> poisonous</td>\n",
" <td> knobbed</td>\n",
" <td> scaly</td>\n",
" <td> red</td>\n",
" <td> no</td>\n",
" <td> musty</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> scaly</td>\n",
" <td> cinnamon</td>\n",
" <td> cinnamon</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> none</td>\n",
" <td> none</td>\n",
" <td> white</td>\n",
" <td> clustered</td>\n",
" <td> woods</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5627</th>\n",
" <td> edible</td>\n",
" <td> convex</td>\n",
" <td> scaly</td>\n",
" <td> brown</td>\n",
" <td> no</td>\n",
" <td> none</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> scaly</td>\n",
" <td> brown</td>\n",
" <td> brown</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> two</td>\n",
" <td> pendant</td>\n",
" <td> white</td>\n",
" <td> solitary</td>\n",
" <td> woods</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5628</th>\n",
" <td> edible</td>\n",
" <td> flat</td>\n",
" <td> scaly</td>\n",
" <td> gray</td>\n",
" <td> bruises</td>\n",
" <td> none</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> two</td>\n",
" <td> pendant</td>\n",
" <td> white</td>\n",
" <td> several</td>\n",
" <td> paths</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5629</th>\n",
" <td> poisonous</td>\n",
" <td> flat</td>\n",
" <td> scaly</td>\n",
" <td> brown</td>\n",
" <td> no</td>\n",
" <td> musty</td>\n",
" <td> attached</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> yellow</td>\n",
" <td>...</td>\n",
" <td> scaly</td>\n",
" <td> cinnamon</td>\n",
" <td> cinnamon</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> none</td>\n",
" <td> none</td>\n",
" <td> white</td>\n",
" <td> clustered</td>\n",
" <td> woods</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5630</th>\n",
" <td> edible</td>\n",
" <td> flat</td>\n",
" <td> scaly</td>\n",
" <td> pink</td>\n",
" <td> bruises</td>\n",
" <td> none</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> two</td>\n",
" <td> pendant</td>\n",
" <td> white</td>\n",
" <td> several</td>\n",
" <td> paths</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5631</th>\n",
" <td> edible</td>\n",
" <td> flat</td>\n",
" <td> smooth</td>\n",
" <td> brown</td>\n",
" <td> no</td>\n",
" <td> none</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> scaly</td>\n",
" <td> brown</td>\n",
" <td> brown</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> two</td>\n",
" <td> pendant</td>\n",
" <td> white</td>\n",
" <td> solitary</td>\n",
" <td> paths</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5632</th>\n",
" <td> edible</td>\n",
" <td> flat</td>\n",
" <td> scaly</td>\n",
" <td> gray</td>\n",
" <td> bruises</td>\n",
" <td> none</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> two</td>\n",
" <td> pendant</td>\n",
" <td> white</td>\n",
" <td> solitary</td>\n",
" <td> paths</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5633</th>\n",
" <td> poisonous</td>\n",
" <td> knobbed</td>\n",
" <td> scaly</td>\n",
" <td> red</td>\n",
" <td> no</td>\n",
" <td> musty</td>\n",
" <td> attached</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> scaly</td>\n",
" <td> cinnamon</td>\n",
" <td> cinnamon</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> none</td>\n",
" <td> none</td>\n",
" <td> white</td>\n",
" <td> clustered</td>\n",
" <td> woods</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5634</th>\n",
" <td> edible</td>\n",
" <td> convex</td>\n",
" <td> scaly</td>\n",
" <td> cinnamon</td>\n",
" <td> bruises</td>\n",
" <td> none</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> two</td>\n",
" <td> pendant</td>\n",
" <td> white</td>\n",
" <td> several</td>\n",
" <td> paths</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5635</th>\n",
" <td> edible</td>\n",
" <td> flat</td>\n",
" <td> smooth</td>\n",
" <td> cinnamon</td>\n",
" <td> bruises</td>\n",
" <td> none</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> two</td>\n",
" <td> pendant</td>\n",
" <td> white</td>\n",
" <td> several</td>\n",
" <td> paths</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5636</th>\n",
" <td> edible</td>\n",
" <td> convex</td>\n",
" <td> scaly</td>\n",
" <td> brown</td>\n",
" <td> bruises</td>\n",
" <td> none</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> two</td>\n",
" <td> pendant</td>\n",
" <td> white</td>\n",
" <td> solitary</td>\n",
" <td> paths</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5637</th>\n",
" <td> poisonous</td>\n",
" <td> knobbed</td>\n",
" <td> scaly</td>\n",
" <td> cinnamon</td>\n",
" <td> no</td>\n",
" <td> musty</td>\n",
" <td> attached</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> yellow</td>\n",
" <td>...</td>\n",
" <td> scaly</td>\n",
" <td> cinnamon</td>\n",
" <td> cinnamon</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> none</td>\n",
" <td> none</td>\n",
" <td> white</td>\n",
" <td> clustered</td>\n",
" <td> woods</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5638</th>\n",
" <td> edible</td>\n",
" <td> flat</td>\n",
" <td> smooth</td>\n",
" <td> brown</td>\n",
" <td> no</td>\n",
" <td> none</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> scaly</td>\n",
" <td> brown</td>\n",
" <td> brown</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> two</td>\n",
" <td> pendant</td>\n",
" <td> white</td>\n",
" <td> solitary</td>\n",
" <td> woods</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5639</th>\n",
" <td> edible</td>\n",
" <td> bell</td>\n",
" <td> scaly</td>\n",
" <td> brown</td>\n",
" <td> no</td>\n",
" <td> none</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> scaly</td>\n",
" <td> brown</td>\n",
" <td> brown</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> two</td>\n",
" <td> pendant</td>\n",
" <td> white</td>\n",
" <td> solitary</td>\n",
" <td> paths</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5640</th>\n",
" <td> edible</td>\n",
" <td> convex</td>\n",
" <td> scaly</td>\n",
" <td> brown</td>\n",
" <td> no</td>\n",
" <td> none</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> scaly</td>\n",
" <td> brown</td>\n",
" <td> brown</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> two</td>\n",
" <td> pendant</td>\n",
" <td> white</td>\n",
" <td> solitary</td>\n",
" <td> paths</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5641</th>\n",
" <td> edible</td>\n",
" <td> convex</td>\n",
" <td> scaly</td>\n",
" <td> gray</td>\n",
" <td> bruises</td>\n",
" <td> none</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> white</td>\n",
" <td>...</td>\n",
" <td> smooth</td>\n",
" <td> white</td>\n",
" <td> white</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> two</td>\n",
" <td> pendant</td>\n",
" <td> white</td>\n",
" <td> solitary</td>\n",
" <td> paths</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5642</th>\n",
" <td> poisonous</td>\n",
" <td> convex</td>\n",
" <td> scaly</td>\n",
" <td> cinnamon</td>\n",
" <td> no</td>\n",
" <td> musty</td>\n",
" <td> free</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> yellow</td>\n",
" <td>...</td>\n",
" <td> scaly</td>\n",
" <td> cinnamon</td>\n",
" <td> cinnamon</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> none</td>\n",
" <td> none</td>\n",
" <td> white</td>\n",
" <td> clustered</td>\n",
" <td> woods</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5643</th>\n",
" <td> poisonous</td>\n",
" <td> flat</td>\n",
" <td> scaly</td>\n",
" <td> cinnamon</td>\n",
" <td> no</td>\n",
" <td> musty</td>\n",
" <td> attached</td>\n",
" <td> close</td>\n",
" <td> broad</td>\n",
" <td> yellow</td>\n",
" <td>...</td>\n",
" <td> scaly</td>\n",
" <td> cinnamon</td>\n",
" <td> cinnamon</td>\n",
" <td> partial</td>\n",
" <td> white</td>\n",
" <td> none</td>\n",
" <td> none</td>\n",
" <td> white</td>\n",
" <td> clustered</td>\n",
" <td> woods</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5644 rows \u00d7 23 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 1,
"text": [
" class cap-shape cap-surface cap-color bruises? odor \\\n",
"0 poisonous convex smooth brown bruises pungent \n",
"1 edible convex smooth yellow bruises almond \n",
"2 edible bell smooth white bruises anise \n",
"3 poisonous convex scaly white bruises pungent \n",
"4 edible convex smooth gray no none \n",
"5 edible convex scaly yellow bruises almond \n",
"6 edible bell smooth white bruises almond \n",
"7 edible bell scaly white bruises anise \n",
"8 poisonous convex scaly white bruises pungent \n",
"9 edible bell smooth yellow bruises almond \n",
"10 edible convex scaly yellow bruises anise \n",
"11 edible convex scaly yellow bruises almond \n",
"12 edible bell smooth yellow bruises almond \n",
"13 poisonous convex scaly white bruises pungent \n",
"14 edible convex fibrous brown no none \n",
"15 edible sunken fibrous gray no none \n",
"16 edible flat fibrous white no none \n",
"17 poisonous convex smooth brown bruises pungent \n",
"18 poisonous convex scaly white bruises pungent \n",
"19 poisonous convex smooth brown bruises pungent \n",
"20 edible bell smooth yellow bruises almond \n",
"21 poisonous convex scaly brown bruises pungent \n",
"22 edible bell scaly yellow bruises anise \n",
"23 edible bell scaly white bruises almond \n",
"24 edible bell smooth white bruises anise \n",
"25 poisonous flat smooth white bruises pungent \n",
"26 edible convex scaly yellow bruises almond \n",
"27 edible convex scaly white bruises anise \n",
"28 edible flat fibrous brown no none \n",
"29 edible convex smooth yellow bruises almond \n",
"... ... ... ... ... ... ... \n",
"5614 edible flat smooth gray bruises none \n",
"5615 edible convex smooth pink bruises none \n",
"5616 poisonous conical scaly yellow no none \n",
"5617 poisonous knobbed scaly red no musty \n",
"5618 poisonous convex scaly cinnamon no musty \n",
"5619 edible bell scaly brown no none \n",
"5620 edible convex scaly brown bruises none \n",
"5621 poisonous convex scaly brown no musty \n",
"5622 poisonous knobbed scaly yellow no none \n",
"5623 edible knobbed smooth brown no none \n",
"5624 poisonous flat scaly cinnamon no musty \n",
"5625 poisonous convex scaly brown no musty \n",
"5626 poisonous knobbed scaly red no musty \n",
"5627 edible convex scaly brown no none \n",
"5628 edible flat scaly gray bruises none \n",
"5629 poisonous flat scaly brown no musty \n",
"5630 edible flat scaly pink bruises none \n",
"5631 edible flat smooth brown no none \n",
"5632 edible flat scaly gray bruises none \n",
"5633 poisonous knobbed scaly red no musty \n",
"5634 edible convex scaly cinnamon bruises none \n",
"5635 edible flat smooth cinnamon bruises none \n",
"5636 edible convex scaly brown bruises none \n",
"5637 poisonous knobbed scaly cinnamon no musty \n",
"5638 edible flat smooth brown no none \n",
"5639 edible bell scaly brown no none \n",
"5640 edible convex scaly brown no none \n",
"5641 edible convex scaly gray bruises none \n",
"5642 poisonous convex scaly cinnamon no musty \n",
"5643 poisonous flat scaly cinnamon no musty \n",
"\n",
" gill-attachment gill-spacing gill-size gill-color ... \\\n",
"0 free close narrow black ... \n",
"1 free close broad black ... \n",
"2 free close broad brown ... \n",
"3 free close narrow brown ... \n",
"4 free crowded broad black ... \n",
"5 free close broad brown ... \n",
"6 free close broad gray ... \n",
"7 free close broad brown ... \n",
"8 free close narrow pink ... \n",
"9 free close broad gray ... \n",
"10 free close broad gray ... \n",
"11 free close broad brown ... \n",
"12 free close broad white ... \n",
"13 free close narrow black ... \n",
"14 free crowded broad brown ... \n",
"15 free close narrow black ... \n",
"16 free crowded broad black ... \n",
"17 free close narrow brown ... \n",
"18 free close narrow brown ... \n",
"19 free close narrow black ... \n",
"20 free close broad black ... \n",
"21 free close narrow brown ... \n",
"22 free close broad black ... \n",
"23 free close broad white ... \n",
"24 free close broad gray ... \n",
"25 free close narrow brown ... \n",
"26 free close broad brown ... \n",
"27 free close broad white ... \n",
"28 free close narrow black ... \n",
"29 free crowded narrow brown ... \n",
"... ... ... ... ... ... \n",
"5614 free close broad white ... \n",
"5615 free close broad white ... \n",
"5616 free crowded narrow white ... \n",
"5617 attached close broad yellow ... \n",
"5618 free close broad white ... \n",
"5619 free close broad white ... \n",
"5620 free close broad white ... \n",
"5621 attached close broad yellow ... \n",
"5622 free crowded narrow white ... \n",
"5623 free close broad white ... \n",
"5624 attached close broad white ... \n",
"5625 free close broad white ... \n",
"5626 free close broad white ... \n",
"5627 free close broad white ... \n",
"5628 free close broad white ... \n",
"5629 attached close broad yellow ... \n",
"5630 free close broad white ... \n",
"5631 free close broad white ... \n",
"5632 free close broad white ... \n",
"5633 attached close broad white ... \n",
"5634 free close broad white ... \n",
"5635 free close broad white ... \n",
"5636 free close broad white ... \n",
"5637 attached close broad yellow ... \n",
"5638 free close broad white ... \n",
"5639 free close broad white ... \n",
"5640 free close broad white ... \n",
"5641 free close broad white ... \n",
"5642 free close broad yellow ... \n",
"5643 attached close broad yellow ... \n",
"\n",
" stalk-surface-below-ring stalk-color-above-ring stalk-color-below-ring \\\n",
"0 smooth white white \n",
"1 smooth white white \n",
"2 smooth white white \n",
"3 smooth white white \n",
"4 smooth white white \n",
"5 smooth white white \n",
"6 smooth white white \n",
"7 smooth white white \n",
"8 smooth white white \n",
"9 smooth white white \n",
"10 smooth white white \n",
"11 smooth white white \n",
"12 smooth white white \n",
"13 smooth white white \n",
"14 fibrous white white \n",
"15 smooth white white \n",
"16 smooth white white \n",
"17 smooth white white \n",
"18 smooth white white \n",
"19 smooth white white \n",
"20 smooth white white \n",
"21 smooth white white \n",
"22 smooth white white \n",
"23 smooth white white \n",
"24 smooth white white \n",
"25 smooth white white \n",
"26 smooth white white \n",
"27 smooth white white \n",
"28 smooth white white \n",
"29 smooth white white \n",
"... ... ... ... \n",
"5614 smooth white white \n",
"5615 smooth white white \n",
"5616 scaly yellow yellow \n",
"5617 scaly cinnamon cinnamon \n",
"5618 scaly cinnamon cinnamon \n",
"5619 scaly brown brown \n",
"5620 smooth white white \n",
"5621 scaly cinnamon cinnamon \n",
"5622 scaly yellow yellow \n",
"5623 scaly brown brown \n",
"5624 scaly cinnamon cinnamon \n",
"5625 scaly cinnamon cinnamon \n",
"5626 scaly cinnamon cinnamon \n",
"5627 scaly brown brown \n",
"5628 smooth white white \n",
"5629 scaly cinnamon cinnamon \n",
"5630 smooth white white \n",
"5631 scaly brown brown \n",
"5632 smooth white white \n",
"5633 scaly cinnamon cinnamon \n",
"5634 smooth white white \n",
"5635 smooth white white \n",
"5636 smooth white white \n",
"5637 scaly cinnamon cinnamon \n",
"5638 scaly brown brown \n",
"5639 scaly brown brown \n",
"5640 scaly brown brown \n",
"5641 smooth white white \n",
"5642 scaly cinnamon cinnamon \n",
"5643 scaly cinnamon cinnamon \n",
"\n",
" veil-type veil-color ring-number ring-type spore-print-color \\\n",
"0 partial white one pendant black \n",
"1 partial white one pendant brown \n",
"2 partial white one pendant brown \n",
"3 partial white one pendant black \n",
"4 partial white one evanescent brown \n",
"5 partial white one pendant black \n",
"6 partial white one pendant black \n",
"7 partial white one pendant brown \n",
"8 partial white one pendant black \n",
"9 partial white one pendant black \n",
"10 partial white one pendant brown \n",
"11 partial white one pendant black \n",
"12 partial white one pendant brown \n",
"13 partial white one pendant brown \n",
"14 partial white one evanescent black \n",
"15 partial white one pendant brown \n",
"16 partial white one evanescent brown \n",
"17 partial white one pendant black \n",
"18 partial white one pendant brown \n",
"19 partial white one pendant brown \n",
"20 partial white one pendant brown \n",
"21 partial white one pendant brown \n",
"22 partial white one pendant brown \n",
"23 partial white one pendant brown \n",
"24 partial white one pendant black \n",
"25 partial white one pendant brown \n",
"26 partial white one pendant brown \n",
"27 partial white one pendant brown \n",
"28 partial white one pendant black \n",
"29 partial white one pendant brown \n",
"... ... ... ... ... ... \n",
"5614 partial white two pendant white \n",
"5615 partial white two pendant white \n",
"5616 partial yellow one evanescent white \n",
"5617 partial white none none white \n",
"5618 partial white none none white \n",
"5619 partial white two pendant white \n",
"5620 partial white two pendant white \n",
"5621 partial white none none white \n",
"5622 partial yellow one evanescent white \n",
"5623 partial white two pendant white \n",
"5624 partial white none none white \n",
"5625 partial white none none white \n",
"5626 partial white none none white \n",
"5627 partial white two pendant white \n",
"5628 partial white two pendant white \n",
"5629 partial white none none white \n",
"5630 partial white two pendant white \n",
"5631 partial white two pendant white \n",
"5632 partial white two pendant white \n",
"5633 partial white none none white \n",
"5634 partial white two pendant white \n",
"5635 partial white two pendant white \n",
"5636 partial white two pendant white \n",
"5637 partial white none none white \n",
"5638 partial white two pendant white \n",
"5639 partial white two pendant white \n",
"5640 partial white two pendant white \n",
"5641 partial white two pendant white \n",
"5642 partial white none none white \n",
"5643 partial white none none white \n",
"\n",
" population habitat \n",
"0 scattered urban \n",
"1 numerous grasses \n",
"2 numerous meadows \n",
"3 scattered urban \n",
"4 abundant grasses \n",
"5 numerous grasses \n",
"6 numerous meadows \n",
"7 scattered meadows \n",
"8 several grasses \n",
"9 scattered meadows \n",
"10 numerous grasses \n",
"11 scattered meadows \n",
"12 scattered grasses \n",
"13 several urban \n",
"14 abundant grasses \n",
"15 solitary urban \n",
"16 abundant grasses \n",
"17 scattered grasses \n",
"18 scattered urban \n",
"19 scattered urban \n",
"20 scattered meadows \n",
"21 several grasses \n",
"22 scattered meadows \n",
"23 numerous meadows \n",
"24 scattered meadows \n",
"25 several grasses \n",
"26 numerous meadows \n",
"27 numerous meadows \n",
"28 solitary urban \n",
"29 several woods \n",
"... ... ... \n",
"5614 solitary paths \n",
"5615 several paths \n",
"5616 clustered leaves \n",
"5617 clustered woods \n",
"5618 clustered woods \n",
"5619 solitary woods \n",
"5620 several paths \n",
"5621 clustered woods \n",
"5622 clustered leaves \n",
"5623 solitary woods \n",
"5624 clustered woods \n",
"5625 clustered woods \n",
"5626 clustered woods \n",
"5627 solitary woods \n",
"5628 several paths \n",
"5629 clustered woods \n",
"5630 several paths \n",
"5631 solitary paths \n",
"5632 solitary paths \n",
"5633 clustered woods \n",
"5634 several paths \n",
"5635 several paths \n",
"5636 solitary paths \n",
"5637 clustered woods \n",
"5638 solitary woods \n",
"5639 solitary paths \n",
"5640 solitary paths \n",
"5641 solitary paths \n",
"5642 clustered woods \n",
"5643 clustered woods \n",
"\n",
"[5644 rows x 23 columns]"
]
}
],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Entropy\n",
"\n",
"The decision tree algorithm works by looking at all the attributes, picking the 'best' one to split the data on, and then running recursively on each part of the split data.\n",
"\n",
"To pick the 'best' attribute, we choose the one whose split decreases 'entropy' the most.\n",
"\n",
"What's entropy? Recall that we're trying to classify mushrooms as poisonous or not. Entropy is a value that is low if a group of mushrooms mostly has the same class (all poisonous or all edible) and high if the group is mixed (half poisonous and half edible). If a proportion $p_i$ of the group belongs to class $i$, the entropy in bits is $H = -\\sum_i p_i \\log_2 p_i$, which is 0 for a pure group and 1 for a 50/50 mix of two classes. So a split that produces low-entropy groups is a split that does a good job of separating the classes."
]
},
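{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the low-vs-high intuition concrete, the standalone sketch below computes entropy straight from class counts for a pure group, a 3:1 mix, and a 50/50 mix. The helper `entropy_from_counts` is our own illustrative name, not part of the algorithm code that follows. It skips zero counts, following the usual convention that $0 \\log_2 0 = 0$. The pure group scores 0 bits, the 3:1 mix roughly 0.81 bits, and the 50/50 mix 1 bit, the maximum for two classes."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import math\n",
"\n",
"\n",
"def entropy_from_counts(counts):\n",
"    '''Entropy in bits of a class distribution given as raw counts (hypothetical helper).'''\n",
"    total = float(sum(counts))\n",
"    # Skip zero counts: by convention 0 * log2(0) contributes nothing.\n",
"    return sum(-(c / total) * math.log(c / total, 2) for c in counts if c > 0)\n",
"\n",
"\n",
"pure = entropy_from_counts([8, 0])    # all poisonous\n",
"skewed = entropy_from_counts([6, 2])  # 3:1 mix\n",
"even = entropy_from_counts([4, 4])    # 50/50 mix\n",
"print(pure, skewed, even)"
],
"language": "python",
"metadata": {},
"outputs": []
},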
{
"cell_type": "code",
"collapsed": false,
"input": [
"import math\n",
"from collections import Counter\n",
"\n",
"\n",
"def entropy(probs):\n",
"    '''\n",
"    Takes a list of probabilities and returns their entropy in bits:\n",
"    H = -sum(p * log2(p)). Assumes every probability is > 0.\n",
"    '''\n",
"    return sum(-prob * math.log(prob, 2) for prob in probs)\n",
"\n",
"\n",
"def entropy_of_list(a_list):\n",
"    '''\n",
"    Takes a list of items with discrete values (e.g., poisonous, edible)\n",
"    and returns the entropy of their class distribution.\n",
"    '''\n",
"    # Tally up how often each value occurs:\n",
"    cnt = Counter(a_list)\n",
"\n",
"    # Convert counts to proportions (float division, also under Python 2):\n",
"    num_instances = len(a_list) * 1.0\n",
"    probs = [x / num_instances for x in cnt.values()]\n",
"\n",
"    # Calculate entropy:\n",
"    return entropy(probs)\n",
"\n",
"\n",
"# The initial entropy of the poisonous/edible class label for our dataset.\n",
"total_entropy = entropy_of_list(df_shroom['class'])\n",
"print(total_entropy)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.959441337353\n"
]
}
],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To decide which attribute to split on, we want to quantify how much each attribute decreases the entropy.\n",
"\n",
"We do this by splitting the dataset by the possible values of an attribute, computing the entropy of each resulting subset, and taking a weighted sum of those entropies, where each subset is weighted by the proportion of observations it contains. The drop from the original entropy to this weighted sum is the attribute's *information gain*.\n",
"\n",
"We'll write a function that quantifies this decrease in entropy, or equivalently, the gain in information."
]
},
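{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hand-checkable illustration of the weighted-sum idea, the sketch below computes information gain in plain Python (no pandas) on a made-up table of eight mushrooms. The data and the helper names `entropy_of_labels` and `info_gain` are hypothetical. Both attributes split the table into two groups of four. `smell` separates the classes perfectly, so it recovers the full 1 bit of starting entropy. `shape` leaves each group as a 3:1 mix (about 0.81 bits each), so its gain is only about 1 - 0.81 = 0.19 bits."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import math\n",
"from collections import Counter\n",
"\n",
"\n",
"def entropy_of_labels(labels):\n",
"    '''Entropy in bits of a list of class labels.'''\n",
"    n = float(len(labels))\n",
"    return sum(-(c / n) * math.log(c / n, 2) for c in Counter(labels).values())\n",
"\n",
"\n",
"def info_gain(rows, split_attr, target_attr):\n",
"    '''Entropy before the split minus the size-weighted entropy after it.'''\n",
"    groups = {}\n",
"    for row in rows:\n",
"        groups.setdefault(row[split_attr], []).append(row[target_attr])\n",
"    n = float(len(rows))\n",
"    weighted = sum(len(g) / n * entropy_of_labels(g) for g in groups.values())\n",
"    return entropy_of_labels([row[target_attr] for row in rows]) - weighted\n",
"\n",
"\n",
"# Made-up data: 'smell' separates the classes perfectly, 'shape' only partly.\n",
"rows = (\n",
"    [{'smell': 'foul', 'shape': 'flat', 'class': 'poisonous'}] * 3\n",
"    + [{'smell': 'foul', 'shape': 'bell', 'class': 'poisonous'}]\n",
"    + [{'smell': 'none', 'shape': 'bell', 'class': 'edible'}] * 3\n",
"    + [{'smell': 'none', 'shape': 'flat', 'class': 'edible'}]\n",
")\n",
"\n",
"gain_smell = info_gain(rows, 'smell', 'class')\n",
"gain_shape = info_gain(rows, 'shape', 'class')\n",
"print(gain_smell, gain_shape)"
],
"language": "python",
"metadata": {},
"outputs": []
},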
{
"cell_type": "code",
"collapsed": false,
"input": [
"def information_gain(df, split_attribute_name, target_attribute_name, trace=0):\n",
"    '''\n",
"    Takes a DataFrame of attributes and returns the information gain from\n",
"    splitting on split_attribute_name: the entropy of the target attribute\n",
"    before the split minus the weighted entropy after it.\n",
"    '''\n",
"    # Split data by the possible values of the attribute:\n",
"    df_split = df.groupby(split_attribute_name)\n",
"\n",
"    # For each data-split, calculate the entropy of the target attribute\n",
"    # and the proportion of observations that fall into that split:\n",
"    nobs = len(df.index) * 1.0\n",
"    df_agg_ent = df_split.agg({target_attribute_name: [entropy_of_list, lambda x: len(x) / nobs]})[target_attribute_name]\n",
"    df_agg_ent.columns = ['Entropy', 'PropObservations']\n",
"    if trace:  # helps understand what the function is doing\n",
"        print(df_agg_ent)\n",
"\n",
"    # Information gain = old entropy - weighted sum of new entropies:\n",
"    new_entropy = sum(df_agg_ent['Entropy'] * df_agg_ent['PropObservations'])\n",
"    old_entropy = entropy_of_list(df[target_attribute_name])\n",
"    return old_entropy - new_entropy\n",
"\n",
"\n",
"print('\\nExample: Info-gain for best attribute is ' + str(information_gain(df_shroom, 'odor', 'class')))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Example: Info-gain for best attribute is 0.859670435885\n"
]
}
],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ID3 Decision Tree Algorithm\n",
"\n",
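"Now we'll write the decision tree algorithm itself, which is called \"ID3\" (Iterative Dichotomiser 3, due to Ross Quinlan). It is recursive: if the current subset of the data is homogeneous or cannot be split any further, it returns a class label; otherwise it splits on the attribute with the highest information gain and calls itself on each resulting subset."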
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from collections import Counter\n",
"\n",
"\n",
"def id3(df, target_attribute_name, attribute_names, default_class=None):\n",
"    '''\n",
"    Recursively builds a decision tree, returned as nested dicts of the form\n",
"    {attribute: {value: subtree_or_label}}.\n",
"    '''\n",
"    ## Tally target attribute:\n",
"    cnt = Counter(df[target_attribute_name])\n",
"\n",
"    ## First check: Is this split of the dataset homogeneous?\n",
"    # (e.g., all mushrooms in this set are poisonous)\n",
"    # If yes, return that homogeneous label (e.g., 'poisonous').\n",
"    if len(cnt) == 1:\n",
"        return next(iter(cnt))\n",
"\n",
"    ## Second check: Is this split of the dataset empty?\n",
"    # If yes, return the default value passed down from the parent.\n",
"    elif df.empty:\n",
"        return default_class\n",
"\n",
"    ## Third check: Have we run out of attributes to split on?\n",
"    # If yes, return the most common label in this split.\n",
"    elif not attribute_names:\n",
"        return cnt.most_common(1)[0][0]\n",
"\n",
"    ## Otherwise: This dataset is ready to be divvied up!\n",
"    else:\n",
"        # Default value for the next recursive call: the most common\n",
"        # value of the target attribute in this dataset.\n",
"        default_class = cnt.most_common(1)[0][0]\n",
"\n",
"        # Choose the best attribute to split on:\n",
"        gainz = [information_gain(df, attr, target_attribute_name) for attr in attribute_names]\n",
"        index_of_max = gainz.index(max(gainz))\n",
"        best_attr = attribute_names[index_of_max]\n",
"\n",
"        # Create an empty tree, to be populated in a moment\n",
"        tree = {best_attr: {}}\n",
"        remaining_attribute_names = [i for i in attribute_names if i != best_attr]\n",
"\n",
"        # Split the dataset. On each split, recursively call this algorithm\n",
"        # and populate the empty tree with the resulting subtrees.\n",
"        for attr_val, data_subset in df.groupby(best_attr):\n",
"            subtree = id3(data_subset,\n",
"                          target_attribute_name,\n",
"                          remaining_attribute_names,\n",
"                          default_class)\n",
"            tree[best_attr][attr_val] = subtree\n",
"        return tree"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Get Predictor Names (all but 'class')\n",
"attribute_names = list(df_shroom.columns)\n",
"attribute_names.remove('class')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Run Algorithm:\n",
"from pprint import pprint\n",
"tree = id3(df_shroom, 'class', attribute_names)\n",
"pprint(tree)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"{'odor': {'almond': 'edible',\n",
" 'anise': 'edible',\n",
" 'creosote': 'poisonous',\n",
" 'foul': 'poisonous',\n",
" 'musty': 'poisonous',\n",
" 'none': {'spore-print-color': {'black': 'edible',\n",
" 'brown': 'edible',\n",
" 'green': 'poisonous',\n",
" 'white': {'ring-type': {'evanescent': {'stalk-surface-above-ring': {'fibrous': 'edible',\n",
" 'scaly': 'poisonous',\n",
" 'smooth': 'edible'}},\n",
" 'pendant': {'stalk-surface-above-ring': {'scaly': 'edible',\n",
" 'smooth': {'stalk-surface-below-ring': {'smooth': {'stalk-color-above-ring': {'white': {'stalk-color-below-ring': {'white': {'stalk-shape': {'enlarging': {'gill-color': {'white': {'cap-color': {'brown': 'edible',\n",
" 'cinnamon': 'edible',\n",
" 'gray': 'edible',\n",
" 'pink': 'edible',\n",
" 'white': 'poisonous'}}}}}}}}}}}}}}}}}},\n",
" 'pungent': 'poisonous'}}\n"
]
}
],
"prompt_number": 7
},
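{
"cell_type": "markdown",
"metadata": {},
"source": [
"The printed tree is a nested dict: each internal node has the form `{attribute: {value: subtree}}` and each leaf is a class label. A pair of small recursive helpers (hypothetical names `count_leaves` and `tree_depth`) can summarize such a tree. The sketch below runs them on a made-up toy tree, but they accept the `tree` returned by `id3` above in the same way."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def count_leaves(tree):\n",
"    '''Number of leaf labels in a nested-dict decision tree.'''\n",
"    if not isinstance(tree, dict):\n",
"        return 1\n",
"    attribute = next(iter(tree))\n",
"    return sum(count_leaves(sub) for sub in tree[attribute].values())\n",
"\n",
"\n",
"def tree_depth(tree):\n",
"    '''Number of attribute tests on the longest root-to-leaf path.'''\n",
"    if not isinstance(tree, dict):\n",
"        return 0\n",
"    attribute = next(iter(tree))\n",
"    return 1 + max(tree_depth(sub) for sub in tree[attribute].values())\n",
"\n",
"\n",
"# Made-up toy tree in the same format that id3() returns.\n",
"toy_tree = {'odor': {'almond': 'edible',\n",
"                     'foul': 'poisonous',\n",
"                     'none': {'cap-color': {'brown': 'edible',\n",
"                                            'white': 'poisonous'}}}}\n",
"print(count_leaves(toy_tree), tree_depth(toy_tree))"
],
"language": "python",
"metadata": {},
"outputs": []
},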
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Classification Accuracy\n",
"\n",
"As a sanity check, let's see how well the resulting tree predicts the class from the features, on the same data it was built from.\n",
"\n",
"Below is a `classify` function that takes an instance and classifies it by walking down the tree."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
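"def classify(instance, tree, default=None):\n",
"    '''\n",
"    Walks the tree for one instance (a row/Series) and returns a class label,\n",
"    or `default` if the tree has no branch for one of the instance's values.\n",
"    '''\n",
"    attribute = next(iter(tree))  # each (sub)tree has a single root attribute\n",
"    if instance[attribute] in tree[attribute]:\n",
"        result = tree[attribute][instance[attribute]]\n",
"        if isinstance(result, dict):  # this is a subtree, delve deeper\n",
"            return classify(instance, result, default)\n",
"        else:\n",
"            return result  # this is a label\n",
"    else:\n",
"        return default"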
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 8
},
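{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `default` label is only useful if it is passed along at every level of the recursion; otherwise an unseen value deep in the tree yields `None`. The standalone sketch below uses a made-up toy tree and instances, and `classify_sketch` is a hypothetical stand-alone version of the walk above so that the cell runs on its own. An instance with an unseen value in a *nested* subtree still gets the default."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def classify_sketch(instance, tree, default=None):\n",
"    '''Walk a nested-dict tree; fall back to `default` on unseen values.'''\n",
"    attribute = next(iter(tree))\n",
"    branches = tree[attribute]\n",
"    if instance[attribute] not in branches:\n",
"        return default\n",
"    result = branches[instance[attribute]]\n",
"    if isinstance(result, dict):\n",
"        # Pass `default` down so nested lookups can fall back to it too.\n",
"        return classify_sketch(instance, result, default)\n",
"    return result\n",
"\n",
"\n",
"# Made-up toy tree and instances.\n",
"toy_tree = {'odor': {'almond': 'edible',\n",
"                     'none': {'cap-color': {'brown': 'edible'}}}}\n",
"\n",
"known = classify_sketch({'odor': 'none', 'cap-color': 'brown'}, toy_tree, 'poisonous')\n",
"unseen_top = classify_sketch({'odor': 'spicy', 'cap-color': 'brown'}, toy_tree, 'poisonous')\n",
"unseen_nested = classify_sketch({'odor': 'none', 'cap-color': 'purple'}, toy_tree, 'poisonous')\n",
"print(known, unseen_top, unseen_nested)"
],
"language": "python",
"metadata": {},
"outputs": []
},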
{
"cell_type": "code",
"collapsed": false,
"input": [
"# classify() takes a default argument: when the tree has no answer for a particular\n",
"# combination of attribute values, we use 'poisonous' as the default guess (better safe than sorry!)\n",
"df_shroom['predicted'] = df_shroom.apply(classify, axis=1, args=(tree, 'poisonous'))\n",
"\n",
"print('Accuracy is ' + str(sum(df_shroom['class'] == df_shroom['predicted']) / (1.0 * len(df_shroom.index))))\n",
"\n",
"df_shroom[['class', 'predicted']]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Accuracy is 1.0\n"
]
},
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>class</th>\n",
" <th>predicted</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0 </th>\n",
" <td> poisonous</td>\n",
" <td> poisonous</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1 </th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2 </th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3 </th>\n",
" <td> poisonous</td>\n",
" <td> poisonous</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4 </th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5 </th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6 </th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7 </th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8 </th>\n",
" <td> poisonous</td>\n",
" <td> poisonous</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9 </th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10 </th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11 </th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12 </th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13 </th>\n",
" <td> poisonous</td>\n",
" <td> poisonous</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14 </th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15 </th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16 </th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17 </th>\n",
" <td> poisonous</td>\n",
" <td> poisonous</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18 </th>\n",
" <td> poisonous</td>\n",
" <td> poisonous</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19 </th>\n",
" <td> poisonous</td>\n",
" <td> poisonous</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20 </th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21 </th>\n",
" <td> poisonous</td>\n",
" <td> poisonous</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22 </th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23 </th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24 </th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25 </th>\n",
" <td> poisonous</td>\n",
" <td> poisonous</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26 </th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27 </th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28 </th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29 </th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5614</th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5615</th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5616</th>\n",
" <td> poisonous</td>\n",
" <td> poisonous</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5617</th>\n",
" <td> poisonous</td>\n",
" <td> poisonous</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5618</th>\n",
" <td> poisonous</td>\n",
" <td> poisonous</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5619</th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5620</th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5621</th>\n",
" <td> poisonous</td>\n",
" <td> poisonous</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5622</th>\n",
" <td> poisonous</td>\n",
" <td> poisonous</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5623</th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5624</th>\n",
" <td> poisonous</td>\n",
" <td> poisonous</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5625</th>\n",
" <td> poisonous</td>\n",
" <td> poisonous</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5626</th>\n",
" <td> poisonous</td>\n",
" <td> poisonous</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5627</th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5628</th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5629</th>\n",
" <td> poisonous</td>\n",
" <td> poisonous</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5630</th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5631</th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5632</th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5633</th>\n",
" <td> poisonous</td>\n",
" <td> poisonous</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5634</th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5635</th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5636</th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5637</th>\n",
" <td> poisonous</td>\n",
" <td> poisonous</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5638</th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5639</th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5640</th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5641</th>\n",
" <td> edible</td>\n",
" <td> edible</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5642</th>\n",
" <td> poisonous</td>\n",
" <td> poisonous</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5643</th>\n",
" <td> poisonous</td>\n",
" <td> poisonous</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5644 rows \u00d7 2 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 20,
"text": [
" class predicted\n",
"0 poisonous poisonous\n",
"1 edible edible\n",
"2 edible edible\n",
"3 poisonous poisonous\n",
"4 edible edible\n",
"5 edible edible\n",
"6 edible edible\n",
"7 edible edible\n",
"8 poisonous poisonous\n",
"9 edible edible\n",
"10 edible edible\n",
"11 edible edible\n",
"12 edible edible\n",
"13 poisonous poisonous\n",
"14 edible edible\n",
"15 edible edible\n",
"16 edible edible\n",
"17 poisonous poisonous\n",
"18 poisonous poisonous\n",
"19 poisonous poisonous\n",
"20 edible edible\n",
"21 poisonous poisonous\n",
"22 edible edible\n",
"23 edible edible\n",
"24 edible edible\n",
"25 poisonous poisonous\n",
"26 edible edible\n",
"27 edible edible\n",
"28 edible edible\n",
"29 edible edible\n",
"... ... ...\n",
"5614 edible edible\n",
"5615 edible edible\n",
"5616 poisonous poisonous\n",
"5617 poisonous poisonous\n",
"5618 poisonous poisonous\n",
"5619 edible edible\n",
"5620 edible edible\n",
"5621 poisonous poisonous\n",
"5622 poisonous poisonous\n",
"5623 edible edible\n",
"5624 poisonous poisonous\n",
"5625 poisonous poisonous\n",
"5626 poisonous poisonous\n",
"5627 edible edible\n",
"5628 edible edible\n",
"5629 poisonous poisonous\n",
"5630 edible edible\n",
"5631 edible edible\n",
"5632 edible edible\n",
"5633 poisonous poisonous\n",
"5634 edible edible\n",
"5635 edible edible\n",
"5636 edible edible\n",
"5637 poisonous poisonous\n",
"5638 edible edible\n",
"5639 edible edible\n",
"5640 edible edible\n",
"5641 edible edible\n",
"5642 poisonous poisonous\n",
"5643 poisonous poisonous\n",
"\n",
"[5644 rows x 2 columns]"
]
}
],
"prompt_number": 20
},
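{
"cell_type": "markdown",
"metadata": {},
"source": [
"Overall accuracy treats both kinds of mistakes alike, but for mushrooms, calling a poisonous one edible is far worse than the reverse. A cross-tabulation of actual vs. predicted classes separates the two error types. The sketch below applies `pandas.crosstab` to a small set of made-up labels; in this notebook you could pass `df_shroom['class']` and `df_shroom['predicted']` instead."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"\n",
"# Made-up labels standing in for the 'class' and 'predicted' columns.\n",
"actual = pd.Series(['poisonous', 'poisonous', 'edible', 'edible', 'edible'], name='class')\n",
"predicted = pd.Series(['poisonous', 'edible', 'edible', 'edible', 'poisonous'], name='predicted')\n",
"\n",
"# Rows are actual classes, columns are predicted classes.\n",
"confusion = pd.crosstab(actual, predicted)\n",
"print(confusion)\n",
"\n",
"# The dangerous error: poisonous mushrooms predicted to be edible.\n",
"dangerous = confusion.loc['poisonous', 'edible']\n",
"print('Poisonous mushrooms predicted edible: ' + str(dangerous))"
],
"language": "python",
"metadata": {},
"outputs": []
},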
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Classification Accuracy: Training/Testing Set\n",
"\n",
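"Of course, accuracy on the training data is optimistic: ID3 keeps splitting until each subset is homogeneous, so it can fit the data it was built from (nearly) perfectly. A more realistic assessment is to build the tree on one subset of the data, then test it on a different subset that it never saw."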
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"training_data = df_shroom.iloc[:-1000]  # all but the last thousand instances\n",
"test_data = df_shroom.iloc[-1000:].copy()  # just the last thousand (a copy, so we can add a column)\n",
"train_tree = id3(training_data, 'class', attribute_names)\n",
"\n",
"test_data['predicted2'] = test_data.apply(  # <---- test_data source\n",
"    classify,\n",
"    axis=1,\n",
"    args=(train_tree, 'poisonous'))  # <---- train_data tree\n",
"\n",
"print('Accuracy is ' + str(sum(test_data['class'] == test_data['predicted2']) / (1.0 * len(test_data.index))))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Accuracy is 0.944\n"
]
}
],
"prompt_number": 29
},
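{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that taking the last thousand rows is not a random sample. The rows displayed earlier suggest the file is ordered (the tail is dominated by a few odors and habitats), so the test set may not resemble the training set. A common alternative is a random split. Here is a sketch using `DataFrame.sample` with a fixed `random_state` for reproducibility, shown on a small made-up DataFrame; with this notebook's data you would pass `df_shroom`. The helper name `random_train_test_split`, the 20% test fraction, and the seed are arbitrary choices, and `drop(test.index)` assumes the index has no duplicate labels."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"\n",
"\n",
"def random_train_test_split(df, test_frac=0.2, seed=0):\n",
"    '''Randomly split df into (train, test) without overlap (hypothetical helper).'''\n",
"    test = df.sample(frac=test_frac, random_state=seed)\n",
"    train = df.drop(test.index)  # everything not sampled into the test set\n",
"    return train, test\n",
"\n",
"\n",
"# Made-up stand-in for df_shroom: 10 rows, ordered by class like a sorted file.\n",
"toy = pd.DataFrame({'odor': ['foul'] * 5 + ['none'] * 5,\n",
"                    'class': ['poisonous'] * 5 + ['edible'] * 5})\n",
"\n",
"train, test = random_train_test_split(toy, test_frac=0.2, seed=42)\n",
"print(len(train), len(test))"
],
"language": "python",
"metadata": {},
"outputs": []
},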
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}