@cjauvin
Last active December 27, 2015 14:59
{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"scikit-learn's Random Forest with the Classic \"Adult\" UCI Dataset"
]
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"Christian Jauvin (cjauvin@gmail.com)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"My goal is to study the specifics of scikit-learn's RF implementation. We use the \"Adult\" dataset, a classic binary classification problem from the UCI Machine Learning Repository: predict whether an individual's income is above or below $50K a year, based on some of their census attributes."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Excerpt from the dataset's `adult.names` description file:\n",
"\n",
"    | This data was extracted from the census bureau database found at\n",
"    | http://www.census.gov/ftp/pub/DES/www/welcome.html\n",
"    | Donor: Ronny Kohavi and Barry Becker,\n",
"    |        Data Mining and Visualization\n",
"    |        Silicon Graphics.\n",
"    |        e-mail: ronnyk@sgi.com for questions.\n",
"    | Split into train-test using MLC++ GenCVFiles (2/3, 1/3 random).\n",
"    | 48842 instances, mix of continuous and discrete    (train=32561, test=16281)\n",
"    | 45222 if instances with unknown values are removed (train=30162, test=15060)\n",
"    | Duplicate or conflicting instances : 6\n",
"    | Class probabilities for adult.all file\n",
"    | Probability for the label '>50K'  : 23.93% / 24.78% (without unknowns)\n",
"    | Probability for the label '<=50K' : 76.07% / 75.22% (without unknowns)\n",
"    |\n",
"    | Extraction was done by Barry Becker from the 1994 Census database.  A set of\n",
"    |   reasonably clean records was extracted using the following conditions:\n",
"    |   ((AAGE>16) && (AGI>100) && (AFNLWGT>1)&& (HRSWK>0))\n",
"    |\n",
"    | Prediction task is to determine whether a person makes over 50K\n",
"    | a year."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"import numpy as np\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.cross_validation import cross_val_score # sklearn.model_selection in scikit-learn >= 0.18\n",
"from sklearn.preprocessing import LabelEncoder, OneHotEncoder\n",
"\n",
"wd = 'C:/Documents and Settings/e68321/Application Data/adult/data'\n",
"train = pd.read_csv(wd + '/adult.data')\n",
"test = pd.read_csv(wd + '/adult.test')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
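{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell above assumes both CSV files start with a header row naming the columns (the notebook later refers to columns such as `income` by name). As distributed by UCI, the files have no header row, and `adult.test` begins with an extra non-data line (`|1x3 Cross validator` in the copies we have seen, which is an assumption about your copy). A minimal sketch of reading the unmodified files with `pd.read_csv` options (`header=None`, `names=`, `skipinitialspace=True`, `skiprows=1`), using tiny in-memory sample rows in place of the real files; the names `COLUMNS`, `train_sample` and `test_sample` are illustrative only:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import io\n",
"import pandas as pd\n",
"\n",
"# Column names as printed further below (the raw UCI files have no header row)\n",
"COLUMNS = ['age', 'workclass', 'fnlwgt', 'education', 'education-num',\n",
"           'marital-status', 'occupation', 'relationship', 'race', 'sex',\n",
"           'capital-gain', 'capital-loss', 'hours-per-week', 'native-country',\n",
"           'income']\n",
"\n",
"# Tiny in-memory stand-ins for adult.data and adult.test (sample rows, not the real files)\n",
"raw_train = '''39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n",
"52, Self-emp-inc, 287927, HS-grad, 9, Married-civ-spouse, Exec-managerial, Wife, White, Female, 15024, 0, 40, United-States, >50K\n",
"'''\n",
"raw_test = '''|1x3 Cross validator\n",
"25, Private, 226802, 11th, 7, Never-married, Machine-op-inspct, Own-child, Black, Male, 0, 0, 40, United-States, <=50K.\n",
"'''\n",
"\n",
"# header=None + names=...: supply the column names ourselves\n",
"# skipinitialspace=True: drop the space that follows each comma\n",
"train_sample = pd.read_csv(io.StringIO(raw_train), header=None, names=COLUMNS,\n",
"                           skipinitialspace=True)\n",
"# skiprows=1: skip the non-data first line of the test file (assumed to be present)\n",
"test_sample = pd.read_csv(io.StringIO(raw_test), header=None, names=COLUMNS,\n",
"                          skipinitialspace=True, skiprows=1)\n",
"train_sample.shape, test_sample.shape"
],
"language": "python",
"metadata": {},
"outputs": []
},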
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Some light preprocessing is required: strip the whitespace surrounding the string values, and unify the train and test class labels (the test labels end with a period, e.g. `>50K.`). Although the original dataset comes with a predefined train/test split, we stack both parts and, from now on, only consider 3-fold cross-validation results over the full dataset."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for c in train.columns.tolist():\n",
"    if train[c].dtype == object:\n",
"        # string values in the raw files are preceded by a space\n",
"        train[c] = train[c].map(lambda s: s.strip())\n",
"        test[c] = test[c].map(lambda s: s.strip())\n",
"        if c == 'income':\n",
"            test[c] = test[c].map(lambda s: s[:-1]) # remove the trailing \".\" of the test labels\n",
"        if c == 'native-country': continue # the two splits don't contain exactly the same countries\n",
"        assert set(train[c]) == set(test[c])\n",
"\n",
"full = pd.concat((train, test))\n",
"print 'dataset size:', full.shape"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"dataset size: (48842, 15)\n"
]
}
],
"prompt_number": 3
},
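{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same cleanup can be written with pandas' vectorized string methods (`str.strip`, `str.rstrip`) instead of `map` with a `lambda`. A sketch on two made-up frames (`train_toy`, `test_toy` are hypothetical stand-ins, not the real data); the symmetric set difference at the end is one way to see which values occur in only one split, as for the `native-country` column that the loop above exempts from its check:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"\n",
"# Made-up rows mimicking the raw files: leading spaces, trailing '.' on test labels\n",
"train_toy = pd.DataFrame({'native-country': [' United-States', ' Holand-Netherlands'],\n",
"                          'income': [' <=50K', ' >50K']})\n",
"test_toy = pd.DataFrame({'native-country': [' United-States', ' Mexico'],\n",
"                         'income': [' <=50K.', ' >50K.']})\n",
"\n",
"for df in (train_toy, test_toy):\n",
"    for c in df.columns:  # every column of these toy frames holds strings\n",
"        df[c] = df[c].str.strip()\n",
"\n",
"test_toy['income'] = test_toy['income'].str.rstrip('.')  # '>50K.' -> '>50K'\n",
"\n",
"# Values present in only one of the two splits (symmetric set difference)\n",
"only_one_side = set(train_toy['native-country']) ^ set(test_toy['native-country'])\n",
"sorted(only_one_side)"
],
"language": "python",
"metadata": {},
"outputs": []
},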
{
"cell_type": "markdown",
"metadata": {},
"source": [
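"The features are a fairly even mix of 6 numeric and 8 categorical attributes (the `income` target is categorical too). Unknown values appear as `?` in `workclass`, `occupation` and `native-country`."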
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for c in full.columns.tolist():\n",
"    is_categorical = full[c].dtype == object\n",
"    print '%s (%s): %s' % (c, 'categorical' if is_categorical else 'numeric',\n",
"                           ', '.join(set(full[c])) if is_categorical else '%s to %s' % (min(full[c]), max(full[c])))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"age (numeric): 17 to 90\n",
"workclass (categorical): Self-emp-inc, State-gov, Without-pay, Private, Local-gov, Self-emp-not-inc, Federal-gov, Never-worked, ?\n",
"fnlwgt (numeric): 12285 to 1490400\n",
"education (categorical): Masters, Prof-school, 12th, Assoc-voc, 1st-4th, Assoc-acdm, HS-grad, Bachelors, 9th, 5th-6th, Some-college, 11th, 10th, Doctorate, Preschool, 7th-8th\n",
"education-num (numeric): 1 to 16\n",
"marital-status (categorical): Separated, Widowed, Divorced, Married-spouse-absent, Never-married, Married-AF-spouse, Married-civ-spouse\n",
"occupation (categorical): Farming-fishing, Armed-Forces, Craft-repair, Other-service, Transport-moving, Prof-specialty, Sales, Exec-managerial, Handlers-cleaners, ?, Adm-clerical, Protective-serv, Tech-support, Priv-house-serv, Machine-op-inspct\n",
"relationship (categorical): Own-child, Wife, Unmarried, Other-relative, Husband, Not-in-family"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"race (categorical): Asian-Pac-Islander, Amer-Indian-Eskimo, White, Other, Black\n",
"sex (categorical): Male, Female\n",
"capital-gain (numeric): 0 to 99999\n",
"capital-loss (numeric): 0 to 4356\n",
"hours-per-week (numeric): 1 to 99"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"native-country (categorical): Canada, Hong, Dominican-Republic, Italy, Ireland, Outlying-US(Guam-USVI-etc), Scotland, Cambodia, France, Peru, Laos, Ecuador, Iran, Cuba, Guatemala, Germany, Thailand, Haiti, Poland, ?, Holand-Netherlands, Philippines, Vietnam, Hungary, England, South, Jamaica, Honduras, Portugal, Mexico, El-Salvador, India, Puerto-Rico, China, Yugoslavia, United-States, Trinadad&Tobago, Greece, Japan, Taiwan, Nicaragua, Columbia\n",
"income (categorical): <=50K, >50K\n"
]
}
],
"prompt_number": 3
},
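{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch for splitting the columns by type and counting the `?` unknown markers per column, shown on a small made-up frame (`df` here is hypothetical, not the real data):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"\n",
"# Made-up frame; '?' marks an unknown value, as in the Adult files\n",
"df = pd.DataFrame({'age': [39, 50, 38],\n",
"                   'workclass': ['State-gov', '?', 'Private'],\n",
"                   'occupation': ['Adm-clerical', '?', '?']})\n",
"\n",
"numeric_cols = df.select_dtypes(include='number').columns.tolist()\n",
"categorical_cols = [c for c in df.columns if c not in numeric_cols]\n",
"missing_counts = (df[categorical_cols] == '?').sum()  # number of '?' per column\n",
"\n",
"print(numeric_cols)\n",
"print(categorical_cols)\n",
"print(missing_counts.to_dict())"
],
"language": "python",
"metadata": {},
"outputs": []
},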
{
"cell_type": "markdown",
"metadata": {},
"source": [
"scikit-learn's RF implementation only accepts numeric input, so the categorical features (and the `income` target) must first be encoded as integers."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for c in full.columns.tolist():\n",
"    if full[c].dtype == object:\n",
"        full[c] = LabelEncoder().fit_transform(full[c])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
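{
"cell_type": "markdown",
"metadata": {},
"source": [
"`LabelEncoder` assigns the integer codes in sorted order of the distinct values, so the `?` unknowns simply become one more category. For the target, `<=50K` becomes 0 and `>50K` becomes 1 (`<` sorts before `>`), which makes `>50K` the positive class for the `roc_auc` scoring used below. A small check on made-up values:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from sklearn.preprocessing import LabelEncoder\n",
"\n",
"le = LabelEncoder()\n",
"codes = le.fit_transform(['Private', '?', 'State-gov', 'Private'])\n",
"print(le.classes_.tolist())  # distinct values in sorted order: ['?', 'Private', 'State-gov']\n",
"print(codes.tolist())        # [1, 0, 2, 1]\n",
"\n",
"income_le = LabelEncoder().fit(['<=50K', '>50K', '<=50K'])\n",
"print(income_le.transform(['<=50K', '>50K']).tolist())  # [0, 1]"
],
"language": "python",
"metadata": {},
"outputs": []
},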
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A vanilla RF, with no hyperparameter tweaking whatsoever, easily matches the ~85% accuracy reported in the dataset documentation."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"X = full.iloc[:,:-1] # everything except last column\n",
"y = full.iloc[:,-1] # last column\n",
"\n",
"clf = RandomForestClassifier()\n",
"print 'accuracy:', np.mean(cross_val_score(clf, X, y, scoring='accuracy'))\n",
"print 'AUC:', np.mean(cross_val_score(clf, X, y, scoring='roc_auc'))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"accuracy: "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.850067533235\n",
"AUC: "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.882580780957\n"
]
}
],
"prompt_number": 4
},
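{
"cell_type": "markdown",
"metadata": {},
"source": [
"To put these numbers in perspective: according to the dataset description quoted above, about 76% of the records are labeled `<=50K`, so always predicting that class already reaches roughly 0.76 accuracy (and an AUC of 0.5). scikit-learn's `DummyClassifier` provides such a baseline; this sketch uses made-up labels with the same 76/24 balance rather than the actual data:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"from sklearn.dummy import DummyClassifier\n",
"\n",
"# Made-up labels with the class balance reported for the Adult data (76% / 24%)\n",
"y_toy = np.array([0] * 76 + [1] * 24)\n",
"X_toy = np.zeros((len(y_toy), 1))  # features are ignored by this baseline\n",
"\n",
"baseline = DummyClassifier(strategy='most_frequent').fit(X_toy, y_toy)\n",
"print(baseline.score(X_toy, y_toy))  # accuracy of always predicting class 0 -> 0.76"
],
"language": "python",
"metadata": {},
"outputs": []
},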
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Increasing the number of trees from the default (10 in the scikit-learn versions of the time) to 50 slightly improves performance, most visibly the AUC (from 0.883 to 0.901)."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"clf = RandomForestClassifier(n_estimators=50)\n",
"print 'accuracy:', np.mean(cross_val_score(clf, X, y, scoring='accuracy'))\n",
"print 'AUC:', np.mean(cross_val_score(clf, X, y, scoring='roc_auc'))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"accuracy: "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.854449031674\n",
"AUC: "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.900744507122\n"
]
}
],
"prompt_number": 65
},
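{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of sweeping `n_estimators` more systematically. It assumes a recent scikit-learn, where `cross_val_score` lives in `sklearn.model_selection` (the `sklearn.cross_validation` module used in this notebook was removed in later versions). It also uses a synthetic dataset from `make_classification` instead of the Adult data, so the scores it prints say nothing about the results above:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.model_selection import cross_val_score\n",
"\n",
"# Synthetic stand-in for the Adult data: 14 features, binary target\n",
"X_syn, y_syn = make_classification(n_samples=1000, n_features=14,\n",
"                                   n_informative=5, random_state=0)\n",
"\n",
"auc_by_n_trees = {}\n",
"for n in (10, 50, 100):\n",
"    model = RandomForestClassifier(n_estimators=n, random_state=0)\n",
"    scores = cross_val_score(model, X_syn, y_syn, cv=3, scoring='roc_auc')\n",
"    auc_by_n_trees[n] = np.mean(scores)\n",
"    print('%d trees: mean AUC %.3f' % (n, auc_by_n_trees[n]))"
],
"language": "python",
"metadata": {},
"outputs": []
},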
{
"cell_type": "markdown",
"metadata": {},
"source": [
"scikit-learn's decision trees (an optimized version of the CART algorithm, on which this RF implementation rests) treat every feature as numeric, so a label-encoded categorical variable is effectively handled as ordinal: for a feature $x \\in \\{a,b,c,d\\}$, for instance, the only possible binary splits are $\\{a\\}/\\{b,c,d\\}$, $\\{a,b\\}/\\{c,d\\}$ and $\\{a,b,c\\}/\\{d\\}$, because the values are implicitly ordered; a split such as $\\{a,c\\}/\\{b,d\\}$ cannot be made in a single step. In theory, one-hot encoding mitigates this, since each indicator column can isolate a single value, but in practice it doesn't seem to help here: both scores come out slightly lower (0.849 accuracy, 0.879 AUC)."
]
},
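{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make this concrete: $k$ distinct values can be split into two non-empty groups in $2^{k-1}-1$ ways, but a threshold on their integer codes reaches only $k-1$ of them. A small pure-Python enumeration for the $k=4$ example above (the letters stand in for label-encoded categories):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from itertools import combinations\n",
"\n",
"values = ['a', 'b', 'c', 'd']  # in the order of their integer codes\n",
"\n",
"# Splits reachable with a single threshold on the ordered codes\n",
"threshold_splits = [(values[:i], values[i:]) for i in range(1, len(values))]\n",
"\n",
"# All partitions into two non-empty groups; keeping values[0] on the left\n",
"# side ensures each partition is generated exactly once\n",
"all_splits = []\n",
"rest = values[1:]\n",
"for r in range(len(rest)):\n",
"    for chosen in combinations(rest, r):\n",
"        left = [values[0]] + list(chosen)\n",
"        right = [v for v in values if v not in left]\n",
"        all_splits.append((left, right))\n",
"\n",
"ac_bd = (['a', 'c'], ['b', 'd'])\n",
"print(len(threshold_splits))  # -> 3\n",
"print(len(all_splits))        # -> 7\n",
"print(ac_bd in all_splits)        # -> True\n",
"print(ac_bd in threshold_splits)  # -> False"
],
"language": "python",
"metadata": {},
"outputs": []
},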
{
"cell_type": "code",
"collapsed": false,
"input": [
"X = np.asarray(full)\n",
"ohe = OneHotEncoder(categorical_features=[i for i, c in enumerate(train.columns.tolist()[:-1]) if train[c].dtype == object])\n",
"X = ohe.fit_transform(X[:,:-1]).todense()\n",
"\n",
"print 'features:', X.shape\n",
"print 'accuracy:', np.mean(cross_val_score(clf, X, y, scoring='accuracy'))\n",
"print 'AUC:', np.mean(cross_val_score(clf, X, y, scoring='roc_auc'))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"features: (48842, 108)\n",
"accuracy: "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.849330518015\n",
"AUC: "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.878599687512\n"
]
}
],
"prompt_number": 64
},
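{
"cell_type": "markdown",
"metadata": {},
"source": [
"The 108 columns are the 6 numeric features kept as-is plus one indicator column per distinct value of the 8 categorical features (9 + 16 + 7 + 15 + 6 + 5 + 2 + 42 = 102, counting `?` as a value). `OneHotEncoder`'s `categorical_features` argument was deprecated and later removed from scikit-learn. `pandas.get_dummies` is one alternative that produces the same kind of expansion, sketched here on a made-up frame (`toy` is hypothetical):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"\n",
"# Made-up frame: one numeric column and two categorical ones\n",
"toy = pd.DataFrame({'age': [39, 50, 38, 53],\n",
"                    'sex': ['Male', 'Female', 'Male', 'Male'],\n",
"                    'workclass': ['State-gov', '?', 'Private', 'Private']})\n",
"\n",
"encoded = pd.get_dummies(toy, columns=['sex', 'workclass'])\n",
"\n",
"# 1 numeric column + 2 sex levels + 3 workclass levels = 6 columns\n",
"print(encoded.shape)  # -> (4, 6)\n",
"print(encoded.columns.tolist())"
],
"language": "python",
"metadata": {},
"outputs": []
},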
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A nice aspect of RF models is that they directly provide an estimate of each feature's importance (in terms of predictive power), exposed by scikit-learn as the `feature_importances_` attribute of a fitted forest."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import matplotlib.pyplot as plt\n",
"\n",
"clf = RandomForestClassifier()\n",
"X = full.iloc[:,:-1] # back to the 14 original features (i.e. not one-hot encoded)\n",
"clf.fit(X, y)\n",
"fi = clf.feature_importances_\n",
"fic = sorted(zip(fi, train.columns.tolist()[:-1])) # (importance, name) pairs, ascending\n",
"plt.rcParams['figure.figsize'] = 12, 8\n",
"plt.barh(range(len(fi)), [v for v, c in fic], align='center')\n",
"plt.yticks(range(len(fi)), [c for v, c in fic]);"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "display_data",
"text": [
"<matplotlib.figure.Figure at 0x714a550>"
]
}
],
"prompt_number": 19
},
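{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, a scikit-learn forest's `feature_importances_` is the average of its trees' impurity-based (mean decrease in impurity) importances, each of which is normalized to sum to 1. A quick check on synthetic data from `make_classification`, not the Adult data:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"\n",
"# Synthetic data as a stand-in; any fitted forest would do\n",
"X_syn, y_syn = make_classification(n_samples=500, n_features=6, random_state=0)\n",
"forest = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_syn, y_syn)\n",
"\n",
"# Average of the individual trees' (normalized) impurity-based importances\n",
"mean_tree_importance = np.mean([t.feature_importances_ for t in forest.estimators_], axis=0)\n",
"\n",
"print(np.allclose(forest.feature_importances_, mean_tree_importance))  # -> True\n",
"print(round(forest.feature_importances_.sum(), 6))  # -> 1.0"
],
"language": "python",
"metadata": {},
"outputs": []
},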
{
"cell_type": "markdown",
"metadata": {},
"source": [
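"We can check this ranking by comparing models trained on only the 4 least and the 4 most important features. The gap is smaller than one might expect: accuracy goes from 0.785 to 0.793 and AUC from 0.759 to 0.800, with both accuracies only a few points above the ~76% obtained by always predicting `<=50K`. One plausible reason is that impurity-based importances are known to favor features with many distinct values, such as the continuous `fnlwgt` (a census sampling weight), which can then rank high without being very predictive."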
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"clf = RandomForestClassifier(max_features=None) # the default is sqrt(n_features), here we want to use them all\n",
"worst = ['education', 'native-country', 'sex', 'race']\n",
"best = ['fnlwgt', 'age', 'capital-gain', 'relationship']\n",
"print 'accuracy (worst 4 features):', np.mean(cross_val_score(clf, X[worst], y, scoring='accuracy'))\n",
"print 'AUC (worst 4 features):', np.mean(cross_val_score(clf, X[worst], y, scoring='roc_auc'))\n",
"print 'accuracy (best 4 features):', np.mean(cross_val_score(clf, X[best], y, scoring='accuracy'))\n",
"print 'AUC (best 4 features):', np.mean(cross_val_score(clf, X[best], y, scoring='roc_auc'))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"accuracy (worst 4 features): "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.785123450754\n",
"AUC (worst 4 features): "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.758914590442\n",
"accuracy (best 4 features): "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.792535113887\n",
"AUC (best 4 features): "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.800357087358\n"
]
}
],
"prompt_number": 31
}
],
"metadata": {}
}
]
}