@pohzipohzi
Last active January 23, 2018 16:42
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# BC2407 Seminar 2 - Severe Injury Rules\n",
"\n",
"In this project we aim to identify three useful association rules in the [severe injuries Kaggle dataset](https://www.kaggle.com/jboysen/injured-workers)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ID</th>\n",
" <th>UPA</th>\n",
" <th>EventDate</th>\n",
" <th>Employer</th>\n",
" <th>Address1</th>\n",
" <th>Address2</th>\n",
" <th>City</th>\n",
" <th>State</th>\n",
" <th>Zip</th>\n",
" <th>Latitude</th>\n",
" <th>...</th>\n",
" <th>Nature</th>\n",
" <th>NatureTitle</th>\n",
" <th>Part of Body</th>\n",
" <th>Part of Body Title</th>\n",
" <th>Event</th>\n",
" <th>EventTitle</th>\n",
" <th>Source</th>\n",
" <th>SourceTitle</th>\n",
" <th>Secondary Source</th>\n",
" <th>Secondary Source Title</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2015010015</td>\n",
" <td>931176</td>\n",
" <td>1/1/2015</td>\n",
" <td>FCI Otisville Federal Correctional Institution</td>\n",
" <td>Two Mile Drive</td>\n",
" <td>NaN</td>\n",
" <td>OTISVILLE</td>\n",
" <td>NEW YORK</td>\n",
" <td>10963.0</td>\n",
" <td>41.46</td>\n",
" <td>...</td>\n",
" <td>111</td>\n",
" <td>Fractures</td>\n",
" <td>513</td>\n",
" <td>Lower leg(s)</td>\n",
" <td>1214</td>\n",
" <td>Injured by physical contact with person while ...</td>\n",
" <td>5721</td>\n",
" <td>Co-worker</td>\n",
" <td>5772.0</td>\n",
" <td>Inmate or detainee in custody</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2015010016</td>\n",
" <td>930267</td>\n",
" <td>1/1/2015</td>\n",
" <td>Kalahari Manufacturing LLC</td>\n",
" <td>171 Progress Drive</td>\n",
" <td>NaN</td>\n",
" <td>LAKE DELTON</td>\n",
" <td>WISCONSIN</td>\n",
" <td>53940.0</td>\n",
" <td>43.59</td>\n",
" <td>...</td>\n",
" <td>1522</td>\n",
" <td>Second degree heat (thermal) burns</td>\n",
" <td>519</td>\n",
" <td>Leg(s), n.e.c.</td>\n",
" <td>317</td>\n",
" <td>Ignition of vapors, gases, or liquids</td>\n",
" <td>7261</td>\n",
" <td>Welding, cutting, and blow torches</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2015010018</td>\n",
" <td>929823</td>\n",
" <td>1/1/2015</td>\n",
" <td>Schneider National Bulk Carrier</td>\n",
" <td>420 CORAOPOLIS ROAD</td>\n",
" <td>NaN</td>\n",
" <td>CORAOPOLIS</td>\n",
" <td>PENNSYLVANIA</td>\n",
" <td>15108.0</td>\n",
" <td>40.49</td>\n",
" <td>...</td>\n",
" <td>10</td>\n",
" <td>Traumatic injuries and disorders, unspecified</td>\n",
" <td>9999</td>\n",
" <td>Nonclassifiable</td>\n",
" <td>4331</td>\n",
" <td>Other fall to lower level less than 6 feet</td>\n",
" <td>8421</td>\n",
" <td>Semi, tractor-trailer, tanker truck</td>\n",
" <td>741.0</td>\n",
" <td>Ladders-fixed</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2015010019</td>\n",
" <td>929711</td>\n",
" <td>1/1/2015</td>\n",
" <td>PEPSI BOTTLING GROUP INC.</td>\n",
" <td>4541 HOUSTON AVE.</td>\n",
" <td>NaN</td>\n",
" <td>MACON</td>\n",
" <td>GEORGIA</td>\n",
" <td>31206.0</td>\n",
" <td>32.77</td>\n",
" <td>...</td>\n",
" <td>1972</td>\n",
" <td>Soreness, pain, hurt-nonspecified injury</td>\n",
" <td>510</td>\n",
" <td>Leg(s), unspecified</td>\n",
" <td>640</td>\n",
" <td>Caught in or compressed by equipment or object...</td>\n",
" <td>8623</td>\n",
" <td>Pallet jack-powered</td>\n",
" <td>8420.0</td>\n",
" <td>Truck-motorized freight hauling and utility, u...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2015010020</td>\n",
" <td>929642</td>\n",
" <td>1/1/2015</td>\n",
" <td>North American Pipe Corporation</td>\n",
" <td>210 South Arch Street</td>\n",
" <td>NaN</td>\n",
" <td>JANESVILLE</td>\n",
" <td>WISCONSIN</td>\n",
" <td>53545.0</td>\n",
" <td>42.67</td>\n",
" <td>...</td>\n",
" <td>111</td>\n",
" <td>Fractures</td>\n",
" <td>4429</td>\n",
" <td>Finger(s), fingernail(s), n.e.c.</td>\n",
" <td>6411</td>\n",
" <td>Caught in running equipment or machinery durin...</td>\n",
" <td>350</td>\n",
" <td>Metal, woodworking, and special material machi...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 26 columns</p>\n",
"</div>"
],
"text/plain": [
" ID UPA EventDate \\\n",
"0 2015010015 931176 1/1/2015 \n",
"1 2015010016 930267 1/1/2015 \n",
"2 2015010018 929823 1/1/2015 \n",
"3 2015010019 929711 1/1/2015 \n",
"4 2015010020 929642 1/1/2015 \n",
"\n",
" Employer Address1 \\\n",
"0 FCI Otisville Federal Correctional Institution Two Mile Drive \n",
"1 Kalahari Manufacturing LLC 171 Progress Drive \n",
"2 Schneider National Bulk Carrier 420 CORAOPOLIS ROAD \n",
"3 PEPSI BOTTLING GROUP INC. 4541 HOUSTON AVE. \n",
"4 North American Pipe Corporation 210 South Arch Street \n",
"\n",
" Address2 City State Zip Latitude \\\n",
"0 NaN OTISVILLE NEW YORK 10963.0 41.46 \n",
"1 NaN LAKE DELTON WISCONSIN 53940.0 43.59 \n",
"2 NaN CORAOPOLIS PENNSYLVANIA 15108.0 40.49 \n",
"3 NaN MACON GEORGIA 31206.0 32.77 \n",
"4 NaN JANESVILLE WISCONSIN 53545.0 42.67 \n",
"\n",
" ... Nature \\\n",
"0 ... 111 \n",
"1 ... 1522 \n",
"2 ... 10 \n",
"3 ... 1972 \n",
"4 ... 111 \n",
"\n",
" NatureTitle Part of Body \\\n",
"0 Fractures 513 \n",
"1 Second degree heat (thermal) burns 519 \n",
"2 Traumatic injuries and disorders, unspecified 9999 \n",
"3 Soreness, pain, hurt-nonspecified injury 510 \n",
"4 Fractures 4429 \n",
"\n",
" Part of Body Title Event \\\n",
"0 Lower leg(s) 1214 \n",
"1 Leg(s), n.e.c. 317 \n",
"2 Nonclassifiable 4331 \n",
"3 Leg(s), unspecified 640 \n",
"4 Finger(s), fingernail(s), n.e.c. 6411 \n",
"\n",
" EventTitle Source \\\n",
"0 Injured by physical contact with person while ... 5721 \n",
"1 Ignition of vapors, gases, or liquids 7261 \n",
"2 Other fall to lower level less than 6 feet 8421 \n",
"3 Caught in or compressed by equipment or object... 8623 \n",
"4 Caught in running equipment or machinery durin... 350 \n",
"\n",
" SourceTitle Secondary Source \\\n",
"0 Co-worker 5772.0 \n",
"1 Welding, cutting, and blow torches NaN \n",
"2 Semi, tractor-trailer, tanker truck 741.0 \n",
"3 Pallet jack-powered 8420.0 \n",
"4 Metal, woodworking, and special material machi... NaN \n",
"\n",
" Secondary Source Title \n",
"0 Inmate or detainee in custody \n",
"1 NaN \n",
"2 Ladders-fixed \n",
"3 Truck-motorized freight hauling and utility, u... \n",
"4 NaN \n",
"\n",
"[5 rows x 26 columns]"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.read_csv(\"injured-workers/severeinjury.csv\", encoding=\"latin-1\")\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 21578 entries, 0 to 21577\n",
"Data columns (total 8 columns):\n",
"EventDate 21578 non-null object\n",
"State 21578 non-null object\n",
"Primary NAICS 21576 non-null object\n",
"Hospitalized 21578 non-null float64\n",
"NatureTitle 21578 non-null object\n",
"Part of Body Title 21578 non-null object\n",
"EventTitle 21578 non-null object\n",
"SourceTitle 21578 non-null object\n",
"dtypes: float64(1), object(7)\n",
"memory usage: 1.3+ MB\n"
]
}
],
"source": [
"# keep only the columns we are interested in\n",
"# we omit the 'Amputation' column: it is highly negatively correlated with 'Hospitalized', and amputations are already recorded under 'NatureTitle'\n",
"# .copy() makes df an independent frame, so the in-place edits in later cells never act on a slice of the original\n",
"df = df[['EventDate','State','Primary NAICS','Hospitalized','NatureTitle','Part of Body Title','EventTitle','SourceTitle']].copy()\n",
"\n",
"df.info(verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>EventDate</th>\n",
" <th>State</th>\n",
" <th>Primary NAICS</th>\n",
" <th>Hospitalized</th>\n",
" <th>NatureTitle</th>\n",
" <th>Part of Body Title</th>\n",
" <th>EventTitle</th>\n",
" <th>SourceTitle</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Jan</td>\n",
" <td>NEW YORK</td>\n",
" <td>92</td>\n",
" <td>1</td>\n",
" <td>Fractures</td>\n",
" <td>Lower leg(s)</td>\n",
" <td>Injured by physical contact with person while ...</td>\n",
" <td>Co-worker</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jan</td>\n",
" <td>WISCONSIN</td>\n",
" <td>33</td>\n",
" <td>1</td>\n",
" <td>Second degree heat (thermal) burns</td>\n",
" <td>Leg(s), n.e.c.</td>\n",
" <td>Ignition of vapors, gases, or liquids</td>\n",
" <td>Welding, cutting, and blow torches</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Jan</td>\n",
" <td>PENNSYLVANIA</td>\n",
" <td>48</td>\n",
" <td>1</td>\n",
" <td>Traumatic injuries and disorders, unspecified</td>\n",
" <td>Nonclassifiable</td>\n",
" <td>Other fall to lower level less than 6 feet</td>\n",
" <td>Semi, tractor-trailer, tanker truck</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Jan</td>\n",
" <td>GEORGIA</td>\n",
" <td>42</td>\n",
" <td>1</td>\n",
" <td>Soreness, pain, hurt-nonspecified injury</td>\n",
" <td>Leg(s), unspecified</td>\n",
" <td>Caught in or compressed by equipment or object...</td>\n",
" <td>Pallet jack-powered</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Jan</td>\n",
" <td>WISCONSIN</td>\n",
" <td>32</td>\n",
" <td>1</td>\n",
" <td>Fractures</td>\n",
" <td>Finger(s), fingernail(s), n.e.c.</td>\n",
" <td>Caught in running equipment or machinery durin...</td>\n",
" <td>Metal, woodworking, and special material machi...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" EventDate State Primary NAICS Hospitalized \\\n",
"0 Jan NEW YORK 92 1 \n",
"1 Jan WISCONSIN 33 1 \n",
"2 Jan PENNSYLVANIA 48 1 \n",
"3 Jan GEORGIA 42 1 \n",
"4 Jan WISCONSIN 32 1 \n",
"\n",
" NatureTitle \\\n",
"0 Fractures \n",
"1 Second degree heat (thermal) burns \n",
"2 Traumatic injuries and disorders, unspecified \n",
"3 Soreness, pain, hurt-nonspecified injury \n",
"4 Fractures \n",
"\n",
" Part of Body Title \\\n",
"0 Lower leg(s) \n",
"1 Leg(s), n.e.c. \n",
"2 Nonclassifiable \n",
"3 Leg(s), unspecified \n",
"4 Finger(s), fingernail(s), n.e.c. \n",
"\n",
" EventTitle \\\n",
"0 Injured by physical contact with person while ... \n",
"1 Ignition of vapors, gases, or liquids \n",
"2 Other fall to lower level less than 6 feet \n",
"3 Caught in or compressed by equipment or object... \n",
"4 Caught in running equipment or machinery durin... \n",
"\n",
" SourceTitle \n",
"0 Co-worker \n",
"1 Welding, cutting, and blow torches \n",
"2 Semi, tractor-trailer, tanker truck \n",
"3 Pallet jack-powered \n",
"4 Metal, woodworking, and special material machi... "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# transform fields\n",
"\n",
"from datetime import datetime\n",
"\n",
"# only a few rows contain NA values (df.info() above shows 2 missing 'Primary NAICS' entries), so we simply drop those rows\n",
"df.dropna(inplace=True)\n",
"\n",
"# keep only the abbreviated month (e.g. 'Jan') of EventDate, so that seasonal patterns can surface in the rules\n",
"df['EventDate'] = df['EventDate'].apply(lambda x: datetime.strftime(datetime.strptime(x,\"%m/%d/%Y\"),\"%b\"))\n",
"\n",
"# keep only the first 2 digits of the NAICS code, which identify the industry sector: see https://www.naics.com/search/\n",
"df['Primary NAICS'] = df['Primary NAICS'].apply(lambda x: int(str(x)[:2]))\n",
"\n",
"# it is unclear what Hospitalized > 1 means (e.g. Hospitalized == 3 at ID=2015031117)\n",
"# the Kaggle data page does not say, so for the time being we collapse the column to 0/1\n",
"df['Hospitalized'] = df['Hospitalized'].apply(lambda x: 0 if x==0 else 1)\n",
"\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# convert df for the apriori API: one-hot encode every column except 'Hospitalized' into 0/1 indicator columns\n",
"def maketrans(df, col):\n",
"    # add one indicator column per distinct value of col, then drop col itself\n",
"    # note: a value that occurs in several source columns shares one indicator column (the last column processed wins)\n",
"    for e in set(df[col]):\n",
"        df[e] = (df[col] == e).astype(int)\n",
"    df.drop(col, axis=1, inplace=True)\n",
"\n",
"for col in [c for c in df.columns if c != \"Hospitalized\"]:\n",
"    maketrans(df, col)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>support</th>\n",
" <th>itemsets</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.800797</td>\n",
" <td>[Hospitalized]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.071376</td>\n",
" <td>[Nov]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.071005</td>\n",
" <td>[May]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.068919</td>\n",
" <td>[Dec]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0.072164</td>\n",
" <td>[Apr]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>0.108917</td>\n",
" <td>[Feb]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>0.088710</td>\n",
" <td>[Aug]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>0.087690</td>\n",
" <td>[Jul]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>0.074481</td>\n",
" <td>[Mar]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>0.082638</td>\n",
" <td>[Sep]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>0.076706</td>\n",
" <td>[Oct]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>0.116194</td>\n",
" <td>[Jan]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>0.081201</td>\n",
" <td>[Jun]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>0.063126</td>\n",
" <td>[ILLINOIS]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>0.100111</td>\n",
" <td>[FLORIDA]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>0.076845</td>\n",
" <td>[OHIO]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>0.053578</td>\n",
" <td>[GEORGIA]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>0.079857</td>\n",
" <td>[PENNSYLVANIA]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>0.161290</td>\n",
" <td>[TEXAS]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>0.177836</td>\n",
" <td>[23]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>0.071468</td>\n",
" <td>[31]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>0.110215</td>\n",
" <td>[32]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>0.158509</td>\n",
" <td>[33]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>0.054273</td>\n",
" <td>[42]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>0.053207</td>\n",
" <td>[56]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>0.098628</td>\n",
" <td>[Soreness, pain, hurt-nonspecified injury]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>0.278875</td>\n",
" <td>[Fractures]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>0.270347</td>\n",
" <td>[Amputations]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>0.112811</td>\n",
" <td>[Fingertip(s)]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>0.054922</td>\n",
" <td>[Multiple body parts, n.e.c.]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>0.050519</td>\n",
" <td>[BODY SYSTEMS]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>0.052466</td>\n",
" <td>[Finger(s), fingernail(s), unspecified]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>32</th>\n",
" <td>0.129310</td>\n",
" <td>[Finger(s), fingernail(s), n.e.c.]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>33</th>\n",
" <td>0.087736</td>\n",
" <td>[Caught in running equipment or machinery during regular operation]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34</th>\n",
" <td>0.055525</td>\n",
" <td>[Other fall to lower level, unspecified]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>35</th>\n",
" <td>0.061596</td>\n",
" <td>[Caught in running equipment or machinery during maintenance, cleaning ]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>36</th>\n",
" <td>0.068734</td>\n",
" <td>[Compressed or pinched by shifting objects or equipment]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>37</th>\n",
" <td>0.059047</td>\n",
" <td>[Floor, n.e.c.]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>38</th>\n",
" <td>0.056822</td>\n",
" <td>[Hospitalized, Nov]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>39</th>\n",
" <td>0.056405</td>\n",
" <td>[Hospitalized, May]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40</th>\n",
" <td>0.054412</td>\n",
" <td>[Hospitalized, Dec]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41</th>\n",
" <td>0.056915</td>\n",
" <td>[Hospitalized, Apr]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>42</th>\n",
" <td>0.087690</td>\n",
" <td>[Hospitalized, Feb]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>43</th>\n",
" <td>0.071654</td>\n",
" <td>[Hospitalized, Aug]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>44</th>\n",
" <td>0.072164</td>\n",
" <td>[Hospitalized, Jul]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>45</th>\n",
" <td>0.059093</td>\n",
" <td>[Hospitalized, Mar]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>46</th>\n",
" <td>0.065999</td>\n",
" <td>[Hospitalized, Sep]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>47</th>\n",
" <td>0.059789</td>\n",
" <td>[Hospitalized, Oct]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>48</th>\n",
" <td>0.095198</td>\n",
" <td>[Hospitalized, Jan]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49</th>\n",
" <td>0.064655</td>\n",
" <td>[Hospitalized, Jun]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50</th>\n",
" <td>0.050519</td>\n",
" <td>[Hospitalized, ILLINOIS]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>51</th>\n",
" <td>0.088524</td>\n",
" <td>[Hospitalized, FLORIDA]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>52</th>\n",
" <td>0.054181</td>\n",
" <td>[Hospitalized, OHIO]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>53</th>\n",
" <td>0.064238</td>\n",
" <td>[Hospitalized, PENNSYLVANIA]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>54</th>\n",
" <td>0.132555</td>\n",
" <td>[Hospitalized, TEXAS]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>55</th>\n",
" <td>0.158741</td>\n",
" <td>[Hospitalized, 23]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>56</th>\n",
" <td>0.074991</td>\n",
" <td>[Hospitalized, 32]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>57</th>\n",
" <td>0.105627</td>\n",
" <td>[Hospitalized, 33]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>58</th>\n",
" <td>0.097840</td>\n",
" <td>[Hospitalized, Soreness, pain, hurt-nonspecified injury]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>59</th>\n",
" <td>0.278365</td>\n",
" <td>[Hospitalized, Fractures]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>60</th>\n",
" <td>0.075083</td>\n",
" <td>[Hospitalized, Amputations]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>61</th>\n",
" <td>0.054690</td>\n",
" <td>[Hospitalized, Multiple body parts, n.e.c.]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>62</th>\n",
" <td>0.050334</td>\n",
" <td>[Hospitalized, BODY SYSTEMS]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>63</th>\n",
" <td>0.054922</td>\n",
" <td>[Hospitalized, Other fall to lower level, unspecified]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>64</th>\n",
" <td>0.058120</td>\n",
" <td>[Hospitalized, Floor, n.e.c.]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>65</th>\n",
" <td>0.059372</td>\n",
" <td>[23, Fractures]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>66</th>\n",
" <td>0.070819</td>\n",
" <td>[33, Amputations]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>67</th>\n",
" <td>0.110771</td>\n",
" <td>[Amputations, Fingertip(s)]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>68</th>\n",
" <td>0.116009</td>\n",
" <td>[Amputations, Finger(s), fingernail(s), n.e.c.]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>69</th>\n",
" <td>0.062940</td>\n",
" <td>[Amputations, Caught in running equipment or machinery during regular operation]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>70</th>\n",
" <td>0.051678</td>\n",
" <td>[Amputations, Compressed or pinched by shifting objects or equipment]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>71</th>\n",
" <td>0.059279</td>\n",
" <td>[Hospitalized, 23, Fractures]</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" support \\\n",
"0 0.800797 \n",
"1 0.071376 \n",
"2 0.071005 \n",
"3 0.068919 \n",
"4 0.072164 \n",
"5 0.108917 \n",
"6 0.088710 \n",
"7 0.087690 \n",
"8 0.074481 \n",
"9 0.082638 \n",
"10 0.076706 \n",
"11 0.116194 \n",
"12 0.081201 \n",
"13 0.063126 \n",
"14 0.100111 \n",
"15 0.076845 \n",
"16 0.053578 \n",
"17 0.079857 \n",
"18 0.161290 \n",
"19 0.177836 \n",
"20 0.071468 \n",
"21 0.110215 \n",
"22 0.158509 \n",
"23 0.054273 \n",
"24 0.053207 \n",
"25 0.098628 \n",
"26 0.278875 \n",
"27 0.270347 \n",
"28 0.112811 \n",
"29 0.054922 \n",
"30 0.050519 \n",
"31 0.052466 \n",
"32 0.129310 \n",
"33 0.087736 \n",
"34 0.055525 \n",
"35 0.061596 \n",
"36 0.068734 \n",
"37 0.059047 \n",
"38 0.056822 \n",
"39 0.056405 \n",
"40 0.054412 \n",
"41 0.056915 \n",
"42 0.087690 \n",
"43 0.071654 \n",
"44 0.072164 \n",
"45 0.059093 \n",
"46 0.065999 \n",
"47 0.059789 \n",
"48 0.095198 \n",
"49 0.064655 \n",
"50 0.050519 \n",
"51 0.088524 \n",
"52 0.054181 \n",
"53 0.064238 \n",
"54 0.132555 \n",
"55 0.158741 \n",
"56 0.074991 \n",
"57 0.105627 \n",
"58 0.097840 \n",
"59 0.278365 \n",
"60 0.075083 \n",
"61 0.054690 \n",
"62 0.050334 \n",
"63 0.054922 \n",
"64 0.058120 \n",
"65 0.059372 \n",
"66 0.070819 \n",
"67 0.110771 \n",
"68 0.116009 \n",
"69 0.062940 \n",
"70 0.051678 \n",
"71 0.059279 \n",
"\n",
" itemsets \n",
"0 [Hospitalized] \n",
"1 [Nov] \n",
"2 [May] \n",
"3 [Dec] \n",
"4 [Apr] \n",
"5 [Feb] \n",
"6 [Aug] \n",
"7 [Jul] \n",
"8 [Mar] \n",
"9 [Sep] \n",
"10 [Oct] \n",
"11 [Jan] \n",
"12 [Jun] \n",
"13 [ILLINOIS] \n",
"14 [FLORIDA] \n",
"15 [OHIO] \n",
"16 [GEORGIA] \n",
"17 [PENNSYLVANIA] \n",
"18 [TEXAS] \n",
"19 [23] \n",
"20 [31] \n",
"21 [32] \n",
"22 [33] \n",
"23 [42] \n",
"24 [56] \n",
"25 [Soreness, pain, hurt-nonspecified injury] \n",
"26 [Fractures] \n",
"27 [Amputations] \n",
"28 [Fingertip(s)] \n",
"29 [Multiple body parts, n.e.c.] \n",
"30 [BODY SYSTEMS] \n",
"31 [Finger(s), fingernail(s), unspecified] \n",
"32 [Finger(s), fingernail(s), n.e.c.] \n",
"33 [Caught in running equipment or machinery during regular operation] \n",
"34 [Other fall to lower level, unspecified] \n",
"35 [Caught in running equipment or machinery during maintenance, cleaning ] \n",
"36 [Compressed or pinched by shifting objects or equipment] \n",
"37 [Floor, n.e.c.] \n",
"38 [Hospitalized, Nov] \n",
"39 [Hospitalized, May] \n",
"40 [Hospitalized, Dec] \n",
"41 [Hospitalized, Apr] \n",
"42 [Hospitalized, Feb] \n",
"43 [Hospitalized, Aug] \n",
"44 [Hospitalized, Jul] \n",
"45 [Hospitalized, Mar] \n",
"46 [Hospitalized, Sep] \n",
"47 [Hospitalized, Oct] \n",
"48 [Hospitalized, Jan] \n",
"49 [Hospitalized, Jun] \n",
"50 [Hospitalized, ILLINOIS] \n",
"51 [Hospitalized, FLORIDA] \n",
"52 [Hospitalized, OHIO] \n",
"53 [Hospitalized, PENNSYLVANIA] \n",
"54 [Hospitalized, TEXAS] \n",
"55 [Hospitalized, 23] \n",
"56 [Hospitalized, 32] \n",
"57 [Hospitalized, 33] \n",
"58 [Hospitalized, Soreness, pain, hurt-nonspecified injury] \n",
"59 [Hospitalized, Fractures] \n",
"60 [Hospitalized, Amputations] \n",
"61 [Hospitalized, Multiple body parts, n.e.c.] \n",
"62 [Hospitalized, BODY SYSTEMS] \n",
"63 [Hospitalized, Other fall to lower level, unspecified] \n",
"64 [Hospitalized, Floor, n.e.c.] \n",
"65 [23, Fractures] \n",
"66 [33, Amputations] \n",
"67 [Amputations, Fingertip(s)] \n",
"68 [Amputations, Finger(s), fingernail(s), n.e.c.] \n",
"69 [Amputations, Caught in running equipment or machinery during regular operation] \n",
"70 [Amputations, Compressed or pinched by shifting objects or equipment] \n",
"71 [Hospitalized, 23, Fractures] "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# compute frequent itemsets and their support with mlxtend's apriori implementation (see https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/)\n",
"from mlxtend.frequent_patterns import apriori\n",
"\n",
"def print_full(x):\n",
"    pd.set_option('display.max_rows', len(x))\n",
"    pd.set_option('display.max_colwidth', 100)\n",
"    display(x)\n",
"    pd.reset_option('display.max_rows')\n",
"\n",
"a = apriori(df, min_support=0.05, use_colnames=True, max_len=3)\n",
"print_full(a)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>antecedants</th>\n",
" <th>consequents</th>\n",
" <th>support</th>\n",
" <th>confidence</th>\n",
" <th>lift</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>(Fingertip(s))</td>\n",
" <td>(Amputations)</td>\n",
" <td>0.112811</td>\n",
" <td>0.981923</td>\n",
" <td>3.632087</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>(Amputations)</td>\n",
" <td>(Fingertip(s))</td>\n",
" <td>0.270347</td>\n",
" <td>0.409738</td>\n",
" <td>3.632087</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>(Finger(s), fingernail(s), n.e.c.)</td>\n",
" <td>(Amputations)</td>\n",
" <td>0.129310</td>\n",
" <td>0.897133</td>\n",
" <td>3.318452</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>(Amputations)</td>\n",
" <td>(Finger(s), fingernail(s), n.e.c.)</td>\n",
" <td>0.270347</td>\n",
" <td>0.429110</td>\n",
" <td>3.318452</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>(Amputations)</td>\n",
" <td>(Compressed or pinched by shifting objects or equipment)</td>\n",
" <td>0.270347</td>\n",
" <td>0.191154</td>\n",
" <td>2.781075</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>(Compressed or pinched by shifting objects or equipment)</td>\n",
" <td>(Amputations)</td>\n",
" <td>0.068734</td>\n",
" <td>0.751854</td>\n",
" <td>2.781075</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>(Caught in running equipment or machinery during regular operation)</td>\n",
" <td>(Amputations)</td>\n",
" <td>0.087736</td>\n",
" <td>0.717380</td>\n",
" <td>2.653555</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>(Amputations)</td>\n",
" <td>(Caught in running equipment or machinery during regular operation)</td>\n",
" <td>0.270347</td>\n",
" <td>0.232813</td>\n",
" <td>2.653555</td>\n",
" </tr>\n",
" <tr>\n",
" <th>51</th>\n",
" <td>(Amputations)</td>\n",
" <td>(33)</td>\n",
" <td>0.270347</td>\n",
" <td>0.261958</td>\n",
" <td>1.652632</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50</th>\n",
" <td>(33)</td>\n",
" <td>(Amputations)</td>\n",
" <td>0.158509</td>\n",
" <td>0.446784</td>\n",
" <td>1.652632</td>\n",
" </tr>\n",
" <tr>\n",
" <th>46</th>\n",
" <td>(Fractures)</td>\n",
" <td>(Hospitalized, 23)</td>\n",
" <td>0.278875</td>\n",
" <td>0.212564</td>\n",
" <td>1.339063</td>\n",
" </tr>\n",
" <tr>\n",
" <th>45</th>\n",
" <td>(Hospitalized, 23)</td>\n",
" <td>(Fractures)</td>\n",
" <td>0.158741</td>\n",
" <td>0.373431</td>\n",
" <td>1.339063</td>\n",
" </tr>\n",
" <tr>\n",
" <th>44</th>\n",
" <td>(Fractures, 23)</td>\n",
" <td>(Hospitalized)</td>\n",
" <td>0.059372</td>\n",
" <td>0.998439</td>\n",
" <td>1.246806</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>(Fractures)</td>\n",
" <td>(Hospitalized)</td>\n",
" <td>0.278875</td>\n",
" <td>0.998172</td>\n",
" <td>1.246473</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>(Hospitalized)</td>\n",
" <td>(Fractures)</td>\n",
" <td>0.800797</td>\n",
" <td>0.347610</td>\n",
" <td>1.246473</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>(BODY SYSTEMS)</td>\n",
" <td>(Hospitalized)</td>\n",
" <td>0.050519</td>\n",
" <td>0.996330</td>\n",
" <td>1.244173</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>(Multiple body parts, n.e.c.)</td>\n",
" <td>(Hospitalized)</td>\n",
" <td>0.054922</td>\n",
" <td>0.995781</td>\n",
" <td>1.243487</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>(Soreness, pain, hurt-nonspecified injury)</td>\n",
" <td>(Hospitalized)</td>\n",
" <td>0.098628</td>\n",
" <td>0.992011</td>\n",
" <td>1.238780</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>(Hospitalized)</td>\n",
" <td>(Soreness, pain, hurt-nonspecified injury)</td>\n",
" <td>0.800797</td>\n",
" <td>0.122178</td>\n",
" <td>1.238780</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>(Other fall to lower level, unspecified)</td>\n",
" <td>(Hospitalized)</td>\n",
" <td>0.055525</td>\n",
" <td>0.989149</td>\n",
" <td>1.235205</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>(Floor, n.e.c.)</td>\n",
" <td>(Hospitalized)</td>\n",
" <td>0.059047</td>\n",
" <td>0.984301</td>\n",
" <td>1.229152</td>\n",
" </tr>\n",
" <tr>\n",
" <th>43</th>\n",
" <td>(Fractures, Hospitalized)</td>\n",
" <td>(23)</td>\n",
" <td>0.278365</td>\n",
" <td>0.212954</td>\n",
" <td>1.197469</td>\n",
" </tr>\n",
" <tr>\n",
" <th>47</th>\n",
" <td>(23)</td>\n",
" <td>(Fractures, Hospitalized)</td>\n",
" <td>0.177836</td>\n",
" <td>0.333333</td>\n",
" <td>1.197469</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>(23)</td>\n",
" <td>(Fractures)</td>\n",
" <td>0.177836</td>\n",
" <td>0.333855</td>\n",
" <td>1.197149</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>(Fractures)</td>\n",
" <td>(23)</td>\n",
" <td>0.278875</td>\n",
" <td>0.212897</td>\n",
" <td>1.197149</td>\n",
" </tr>\n",
" <tr>\n",
" <th>36</th>\n",
" <td>(23)</td>\n",
" <td>(Hospitalized)</td>\n",
" <td>0.177836</td>\n",
" <td>0.892624</td>\n",
" <td>1.114670</td>\n",
" </tr>\n",
" <tr>\n",
" <th>35</th>\n",
" <td>(Hospitalized)</td>\n",
" <td>(23)</td>\n",
" <td>0.800797</td>\n",
" <td>0.198229</td>\n",
" <td>1.114670</td>\n",
" </tr>\n",
" <tr>\n",
" <th>33</th>\n",
" <td>(FLORIDA)</td>\n",
" <td>(Hospitalized)</td>\n",
" <td>0.100111</td>\n",
" <td>0.884259</td>\n",
" <td>1.104224</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34</th>\n",
" <td>(Hospitalized)</td>\n",
" <td>(FLORIDA)</td>\n",
" <td>0.800797</td>\n",
" <td>0.110545</td>\n",
" <td>1.104224</td>\n",
" </tr>\n",
" <tr>\n",
" <th>39</th>\n",
" <td>(Jul)</td>\n",
" <td>(Hospitalized)</td>\n",
" <td>0.087690</td>\n",
" <td>0.822939</td>\n",
" <td>1.027649</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41</th>\n",
" <td>(Hospitalized)</td>\n",
" <td>(TEXAS)</td>\n",
" <td>0.800797</td>\n",
" <td>0.165528</td>\n",
" <td>1.026276</td>\n",
" </tr>\n",
" <tr>\n",
" <th>42</th>\n",
" <td>(TEXAS)</td>\n",
" <td>(Hospitalized)</td>\n",
" <td>0.161290</td>\n",
" <td>0.821839</td>\n",
" <td>1.026276</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>(Jan)</td>\n",
" <td>(Hospitalized)</td>\n",
" <td>0.116194</td>\n",
" <td>0.819306</td>\n",
" <td>1.023113</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>(Hospitalized)</td>\n",
" <td>(Jan)</td>\n",
" <td>0.800797</td>\n",
" <td>0.118879</td>\n",
" <td>1.023113</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>(Aug)</td>\n",
" <td>(Hospitalized)</td>\n",
" <td>0.088710</td>\n",
" <td>0.807732</td>\n",
" <td>1.008661</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49</th>\n",
" <td>(Hospitalized)</td>\n",
" <td>(Feb)</td>\n",
" <td>0.800797</td>\n",
" <td>0.109503</td>\n",
" <td>1.005381</td>\n",
" </tr>\n",
" <tr>\n",
" <th>48</th>\n",
" <td>(Feb)</td>\n",
" <td>(Hospitalized)</td>\n",
" <td>0.108917</td>\n",
" <td>0.805106</td>\n",
" <td>1.005381</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>(PENNSYLVANIA)</td>\n",
" <td>(Hospitalized)</td>\n",
" <td>0.079857</td>\n",
" <td>0.804411</td>\n",
" <td>1.004513</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" antecedants \\\n",
"19 (Fingertip(s)) \n",
"20 (Amputations) \n",
"4 (Finger(s), fingernail(s), n.e.c.) \n",
"5 (Amputations) \n",
"24 (Amputations) \n",
"23 (Compressed or pinched by shifting objects or equipment) \n",
"12 (Caught in running equipment or machinery during regular operation) \n",
"13 (Amputations) \n",
"51 (Amputations) \n",
"50 (33) \n",
"46 (Fractures) \n",
"45 (Hospitalized, 23) \n",
"44 (Fractures, 23) \n",
"30 (Fractures) \n",
"31 (Hospitalized) \n",
"26 (BODY SYSTEMS) \n",
"17 (Multiple body parts, n.e.c.) \n",
"6 (Soreness, pain, hurt-nonspecified injury) \n",
"7 (Hospitalized) \n",
"0 (Other fall to lower level, unspecified) \n",
"22 (Floor, n.e.c.) \n",
"43 (Fractures, Hospitalized) \n",
"47 (23) \n",
"15 (23) \n",
"14 (Fractures) \n",
"36 (23) \n",
"35 (Hospitalized) \n",
"33 (FLORIDA) \n",
"34 (Hospitalized) \n",
"39 (Jul) \n",
"41 (Hospitalized) \n",
"42 (TEXAS) \n",
"3 (Jan) \n",
"2 (Hospitalized) \n",
"29 (Aug) \n",
"49 (Hospitalized) \n",
"48 (Feb) \n",
"27 (PENNSYLVANIA) \n",
"\n",
" consequents \\\n",
"19 (Amputations) \n",
"20 (Fingertip(s)) \n",
"4 (Amputations) \n",
"5 (Finger(s), fingernail(s), n.e.c.) \n",
"24 (Compressed or pinched by shifting objects or equipment) \n",
"23 (Amputations) \n",
"12 (Amputations) \n",
"13 (Caught in running equipment or machinery during regular operation) \n",
"51 (33) \n",
"50 (Amputations) \n",
"46 (Hospitalized, 23) \n",
"45 (Fractures) \n",
"44 (Hospitalized) \n",
"30 (Hospitalized) \n",
"31 (Fractures) \n",
"26 (Hospitalized) \n",
"17 (Hospitalized) \n",
"6 (Hospitalized) \n",
"7 (Soreness, pain, hurt-nonspecified injury) \n",
"0 (Hospitalized) \n",
"22 (Hospitalized) \n",
"43 (23) \n",
"47 (Fractures, Hospitalized) \n",
"15 (Fractures) \n",
"14 (23) \n",
"36 (Hospitalized) \n",
"35 (23) \n",
"33 (Hospitalized) \n",
"34 (FLORIDA) \n",
"39 (Hospitalized) \n",
"41 (TEXAS) \n",
"42 (Hospitalized) \n",
"3 (Hospitalized) \n",
"2 (Jan) \n",
"29 (Hospitalized) \n",
"49 (Feb) \n",
"48 (Hospitalized) \n",
"27 (Hospitalized) \n",
"\n",
" support confidence lift \n",
"19 0.112811 0.981923 3.632087 \n",
"20 0.270347 0.409738 3.632087 \n",
"4 0.129310 0.897133 3.318452 \n",
"5 0.270347 0.429110 3.318452 \n",
"24 0.270347 0.191154 2.781075 \n",
"23 0.068734 0.751854 2.781075 \n",
"12 0.087736 0.717380 2.653555 \n",
"13 0.270347 0.232813 2.653555 \n",
"51 0.270347 0.261958 1.652632 \n",
"50 0.158509 0.446784 1.652632 \n",
"46 0.278875 0.212564 1.339063 \n",
"45 0.158741 0.373431 1.339063 \n",
"44 0.059372 0.998439 1.246806 \n",
"30 0.278875 0.998172 1.246473 \n",
"31 0.800797 0.347610 1.246473 \n",
"26 0.050519 0.996330 1.244173 \n",
"17 0.054922 0.995781 1.243487 \n",
"6 0.098628 0.992011 1.238780 \n",
"7 0.800797 0.122178 1.238780 \n",
"0 0.055525 0.989149 1.235205 \n",
"22 0.059047 0.984301 1.229152 \n",
"43 0.278365 0.212954 1.197469 \n",
"47 0.177836 0.333333 1.197469 \n",
"15 0.177836 0.333855 1.197149 \n",
"14 0.278875 0.212897 1.197149 \n",
"36 0.177836 0.892624 1.114670 \n",
"35 0.800797 0.198229 1.114670 \n",
"33 0.100111 0.884259 1.104224 \n",
"34 0.800797 0.110545 1.104224 \n",
"39 0.087690 0.822939 1.027649 \n",
"41 0.800797 0.165528 1.026276 \n",
"42 0.161290 0.821839 1.026276 \n",
"3 0.116194 0.819306 1.023113 \n",
"2 0.800797 0.118879 1.023113 \n",
"29 0.088710 0.807732 1.008661 \n",
"49 0.800797 0.109503 1.005381 \n",
"48 0.108917 0.805106 1.005381 \n",
"27 0.079857 0.804411 1.004513 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# get association rules with a confidence of at least 0.1\n",
"from mlxtend.frequent_patterns import association_rules\n",
"r = association_rules(a, metric=\"confidence\", min_threshold=0.1)\n",
"# keep only rules whose items co-occur more often than independence would predict (lift > 1), strongest first\n",
"r = r[r['lift'] > 1].sort_values(by='lift', ascending=False)\n",
"print_full(r)"
]
},
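{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Some possible association rules:\n",
"\n",
"1. Amputations are strongly associated with finger injuries: `Fingertip(s)` -> `Amputations` has confidence 0.98 and lift 3.63, and `Finger(s), fingernail(s), n.e.c.` -> `Amputations` has confidence 0.90 and lift 3.32\n",
"2. Amputations are more likely in industry 33 (manufacturing, particularly in metals): `33` -> `Amputations` has confidence 0.45 and lift 1.65\n",
"3. Fractures are somewhat more likely in industry 23 (construction): `23` -> `Fractures` has confidence 0.33 and lift 1.20\n",
"4. Injuries caused by being compressed or pinched by shifting objects or equipment are more likely to be amputations: this event -> `Amputations` has confidence 0.75 and lift 2.78\n"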
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
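The rules table in the notebook reports a confidence and a lift for each rule, and each antecedent/consequent pair shares one lift value in both directions (for example, rows 19 and 20 both show 3.632087). A minimal sketch of how these metrics are commonly defined is given below, using plain Python and a small made-up set of transactions. The transactions and the helper names `support` and `rule_metrics` are assumptions for illustration only; they are not part of the notebook or of mlxtend.

```python
# Toy transactions (hypothetical, NOT taken from the severe injuries dataset).
transactions = [
    {"Amputations", "Fingertip(s)"},
    {"Amputations", "Fingertip(s)"},
    {"Amputations", "33"},
    {"Fractures", "23"},
    {"Fractures", "Hospitalized"},
    {"Hospitalized", "23"},
    {"Hospitalized"},
    {"Hospitalized", "Fractures", "23"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Confidence and lift for the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    c = frozenset(consequent)
    s_a = support(a, transactions)
    s_c = support(c, transactions)
    if s_a == 0 or s_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    s_ac = support(a | c, transactions)
    confidence = s_ac / s_a   # estimate of P(consequent | antecedent)
    lift = confidence / s_c   # ratio of joint support to the independence baseline
    return {"antecedent_support": s_a, "confidence": confidence, "lift": lift}


forward = rule_metrics({"Fingertip(s)"}, {"Amputations"}, transactions)
reverse = rule_metrics({"Amputations"}, {"Fingertip(s)"}, transactions)
print(forward)
print(reverse)
```

Because lift equals the joint support divided by the product of the two individual supports, swapping antecedent and consequent changes the confidence but not the lift. This is why confidence, rather than lift, is what distinguishes the direction of a rule when reading the table.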