@oztalha
Last active April 22, 2017 03:46
B-Supporters.ipynb
{
"cells": [
{
"metadata": {
"run_control": {
"frozen": false,
"read_only": false
}
},
"cell_type": "markdown",
"source": "## USE CASE\nOn behalf of candidate ‘B’ we have taken a poll (n=651) and we need to find the 50,000 most supportive voters for B. We have provided features on all the voters in the region and we have provided the (up to) five closest social-graph connections to each voter. These are your feature sets for building your models.\n\n## DELIVERABLE\n\n- Your model’s evaluation scores, one for a model utilizing demographic data alone, another for a model including demographic and connection data.\n- A sorted list (CSV format) of the 50,000 voters most likely to support candidate B derived from your machine learning model containing demographic and connection features. The file must contain two columns: the uid of the voter and the probability of that voter supporting B.\n- A write up detailing your methods, procedures, and evaluations. This document must include your thoughts on what you could do with this data if you worked with it daily (as opposed to this brief time). Specifically include how you decided to to turn connection data into features.\n- All the code (Python preferred, R, Scala etc) you used in this assignment. 
Using what you have given us and the source data, we must be able substantively to replicate your results.\n\n## DATA SETS\n\n**voters.csv** This file contains a subset of the demographic, psychographic, and behavioral variables we have on voters.\nField names:\n- uid – The id of the voter\n- party – the party of the voter\n- demo.age_range – the age_range band of the voter\n- demo.gender – the gender of the voter (m,f)\n- demo.marital_status – marital status (m,s)\n- demo.ethnicity – a letter code denoting the ethnicity of the voter\n- demo.children – a letter code denoting whether the voter lives with minor children (y=‘yes’, anything else means unknown)\n- demo.religion – a letter code denoting the religion of the voter (e.g, p=protestant)\n- demo.homeowner – a letter code denoting whether the home is owned or rented\n- last_4_primary_history – an integer describing how many times the voter voted in the last four primary elections\n- last_4_general_history – an integer describing how many times the voter voted in the last four general elections\n\n**polled.csv** The file contains a list of poll results.\nField names:\n- uid – the id of the poll respondent\n- choice – the candidate chosen by the respondent\n\n**connections.csv** This file contains a list of ~1.5 million social-graph connections (edges). The social graph is a directed graph.\nField names:\n- source – the id of a voter\n- sink – the id of a voter"
},
{
"metadata": {
"run_control": {
"frozen": false,
"read_only": false
}
},
"cell_type": "markdown",
"source": "## Data Exploration\n\nBoth A and B are Republican candidates and the given region is also highly Republican."
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2017-04-22T01:54:21.131805Z",
"end_time": "2017-04-22T01:54:23.097352Z"
},
"run_control": {
"frozen": false,
"read_only": false
},
"trusted": true
},
"cell_type": "code",
"source": "#read the data\nimport pandas as pd\nimport re #to strip 'demo.' and '_history' off attribute names\n\nvoters = pd.read_csv('voters.csv',index_col=0,na_values=['unknown'])\nvoters.columns = [re.sub(r'(demo\\.)?(\\w+?)(_history)?',r'\\2',c) for c in voters.columns]\n\npolled = pd.read_csv('polled.csv',index_col=0)",
"execution_count": 2,
"outputs": []
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2017-04-21T15:28:49.882821Z",
"start_time": "2017-04-21T11:28:49.733273-04:00"
},
"run_control": {
"frozen": false,
"read_only": false
},
"trusted": true
},
"cell_type": "code",
"source": "#Identified party preferences\npd.DataFrame({'# of voters':voters.party.value_counts(), 'normalized':voters.party.value_counts(normalize=True)})",
"execution_count": 2,
"outputs": [
{
"data": {
"text/html": "<div>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th># of voters</th>\n <th>normalized</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>republican</th>\n <td>192709</td>\n <td>0.543949</td>\n </tr>\n <tr>\n <th>other</th>\n <td>95494</td>\n <td>0.269545</td>\n </tr>\n <tr>\n <th>democrat</th>\n <td>66075</td>\n <td>0.186506</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " # of voters normalized\nrepublican 192709 0.543949\nother 95494 0.269545\ndemocrat 66075 0.186506"
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2017-04-21T15:28:51.069322Z",
"start_time": "2017-04-21T11:28:49.887154-04:00"
},
"run_control": {
"frozen": false,
"read_only": false
},
"trusted": true
},
"cell_type": "code",
"source": "# Cross tabulate poll choice and voters party\nprint('Polled.choice - voters.party crosstab shows the party of the polled by their candidate choice.')\nprint('Result: Both A and B are Republicans, and P(B|republican): {:.2f}'.format(len(polled[polled.choice=='B'])/len(polled)))\npd.crosstab(polled.choice,voters.party)",
"execution_count": 3,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "Polled.choice - voters.party crosstab shows the party of the polled by their candidate choice.\nResult: Both A and B are Republicans, and P(B|republican): 0.48\n"
},
{
"data": {
"text/html": "<div>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>party</th>\n <th>republican</th>\n </tr>\n <tr>\n <th>choice</th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>A</th>\n <td>339</td>\n </tr>\n <tr>\n <th>B</th>\n <td>312</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": "party republican\nchoice \nA 339\nB 312"
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2017-04-22T01:56:20.677778Z",
"end_time": "2017-04-22T01:56:21.117551Z"
},
"run_control": {
"frozen": false,
"read_only": false
},
"trusted": true
},
"cell_type": "code",
"source": "#join the two tables\nfrom IPython.display import HTML\n\ndf = polled.join(voters, how='inner')\nif len(polled) == len(df): print('`voters` (dataset) includes everyone `polled`')\nHTML(df.head().to_html(index=False))",
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"text": "`voters` (dataset) includes everyone `polled`\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 3,
"data": {
"text/plain": "<IPython.core.display.HTML object>",
"text/html": "<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>choice</th>\n <th>party</th>\n <th>age_range</th>\n <th>gender</th>\n <th>marital_status</th>\n <th>ethnicity</th>\n <th>children</th>\n <th>religion</th>\n <th>homeowner</th>\n <th>last_4_primary</th>\n <th>last_4_general</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>A</td>\n <td>republican</td>\n <td>4</td>\n <td>f</td>\n <td>m</td>\n <td>w</td>\n <td>y</td>\n <td>p</td>\n <td>y</td>\n <td>2</td>\n <td>2</td>\n </tr>\n <tr>\n <td>A</td>\n <td>republican</td>\n <td>5</td>\n <td>m</td>\n <td>s</td>\n <td>j</td>\n <td>u</td>\n <td>j</td>\n <td>n</td>\n <td>3</td>\n <td>3</td>\n </tr>\n <tr>\n <td>B</td>\n <td>republican</td>\n <td>3</td>\n <td>m</td>\n <td>s</td>\n <td>w</td>\n <td>u</td>\n <td>c</td>\n <td>y</td>\n <td>4</td>\n <td>4</td>\n </tr>\n <tr>\n <td>A</td>\n <td>republican</td>\n <td>5</td>\n <td>f</td>\n <td>m</td>\n <td>w</td>\n <td>u</td>\n <td>p</td>\n <td>y</td>\n <td>4</td>\n <td>4</td>\n </tr>\n <tr>\n <td>A</td>\n <td>republican</td>\n <td>4</td>\n <td>m</td>\n <td>m</td>\n <td>w</td>\n <td>y</td>\n <td>p</td>\n <td>y</td>\n <td>2</td>\n <td>4</td>\n </tr>\n </tbody>\n</table>"
},
"metadata": {}
}
]
},
{
"metadata": {
"run_control": {
"frozen": false,
"read_only": false
}
},
"cell_type": "markdown",
"source": "## Demographic Data Alone\n\n### Preprocessing\nBefore running a regressor on our data, we may want...\n\n- to convert categorical variable into dummy/indicator variables\n- to use an imputation transformer for completing missing values\n- to standardize our dataset (center to the mean and component wise scale to unit variance)\n- to generate polynomial and interaction features"
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2017-04-22T03:35:17.626153Z",
"end_time": "2017-04-22T03:35:17.662677Z"
},
"run_control": {
"frozen": false,
"read_only": false
},
"trusted": true
},
"cell_type": "code",
"source": "#preprocess the data\ny = df.loc[:,'choice'].map({'A':0,'B':1})\nX_train = df.drop(['choice','party'],axis=1)\n\nfrom sklearn.preprocessing import Imputer,scale,PolynomialFeatures\n# from sklearn.decomposition import PCA\nX_train = pd.get_dummies(X_train)\nX = Imputer(strategy='most_frequent').fit_transform(X_train)\n# X = PolynomialFeatures().fit_transform(X) #not helped\n# X = scale(X) #not helped\n# X = PCA().fit_transform(X) #not helped",
"execution_count": 77,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Model evaluation\nLet's try different statistical models to see which one predicts the prices best.\nTo evaluate the models...\n\n- train and test on different sets: 3-fold cross validation (train using 2/3 of X, and test on the remaining 1/3)\n- use `roc_auc` (computes Area Under the Curve (AUC) from prediction scores) as the metric"
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2017-04-22T03:35:19.113789Z",
"end_time": "2017-04-22T03:35:20.337505Z"
},
"run_control": {
"frozen": false,
"read_only": false
},
"trusted": true
},
"cell_type": "code",
"source": "#evaluate different models with their default parameters\nfrom sklearn.linear_model import LogisticRegression, SGDClassifier\nfrom sklearn.naive_bayes import GaussianNB\nfrom sklearn.ensemble import RandomForestClassifier, VotingClassifier\nfrom sklearn.tree import DecisionTreeClassifier\nfrom sklearn.svm import SVC\nfrom sklearn.neighbors import KNeighborsClassifier\nfrom sklearn.neural_network import MLPClassifier\nfrom sklearn.model_selection import cross_val_score\n\nmodels = [GaussianNB(), LogisticRegression(), RandomForestClassifier(), SVC(probability=True),\n DecisionTreeClassifier(), SGDClassifier(loss='modified_huber'), KNeighborsClassifier(),\n MLPClassifier()]\nmodels.append(VotingClassifier(list(enumerate(models)),voting='soft'))\nprint('3-fold CV AUC Scores from probability estimates')\nprint('-----------------------------------------------')\nfor model in models:\n scores = cross_val_score(model, X, y, cv=3, scoring='roc_auc') #average_precision\n print(\"{:25s} : {:.3f} (+/- {:.2f})\".format(model.__class__.__name__, scores.mean(), scores.std() * 2))",
"execution_count": 78,
"outputs": [
{
"output_type": "stream",
"text": "3-fold CV AUC Scores from probability estimates\n-----------------------------------------------\nGaussianNB : 0.552 (+/- 0.08)\nLogisticRegression : 0.548 (+/- 0.08)\nRandomForestClassifier : 0.511 (+/- 0.08)\nSVC : 0.530 (+/- 0.07)\nDecisionTreeClassifier : 0.494 (+/- 0.03)\nSGDClassifier : 0.538 (+/- 0.09)\nKNeighborsClassifier : 0.509 (+/- 0.07)\nMLPClassifier : 0.512 (+/- 0.04)\nVotingClassifier : 0.519 (+/- 0.04)\n",
"name": "stdout"
}
]
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2017-04-22T03:35:25.116769Z",
"end_time": "2017-04-22T03:35:25.733462Z"
},
"run_control": {
"frozen": false,
"read_only": false
},
"trusted": true
},
"cell_type": "code",
"source": "#get the prediction probabilities for a decent estimator\nmodel = LogisticRegression()\nmodel = model.fit(X, y)\n#only include republicans\nX = voters[voters['party'] == 'republican'].drop(['party'],axis=1)\nX = pd.get_dummies(X)\nX = X.loc[:,X.columns.isin(X_train.columns)]\nX_test = Imputer(strategy='most_frequent').fit_transform(X)\nres = model.predict_proba(X_test)",
"execution_count": 79,
"outputs": []
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2017-04-22T03:35:51.368135Z",
"end_time": "2017-04-22T03:35:51.738100Z"
},
"run_control": {
"frozen": false,
"read_only": false
},
"trusted": true
},
"cell_type": "code",
"source": "#get the predicted and plot B_proba\n%matplotlib inline\npredicted = pd.Series(res[:,1], index=X.index, name='B_proba')\n#don't forget to add the known B voters from the polled dataset\nmask = predicted.index.isin(polled[polled.choice=='B'].index)\npredicted[mask] = 1.0 #we're 100% sure of 319\nmask = predicted.index.isin(polled[polled.choice=='A'].index)\npredicted[mask] = 0.0 #we're 100% sure of 332\npredicted.hist();",
"execution_count": 80,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "<matplotlib.figure.Figure at 0x14a7c35f8>",
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYcAAAD8CAYAAACcjGjIAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAF2xJREFUeJzt3X+QXeV93/H318jYimIsYcwOI6kRmWxcEzTGsAPKeCbd\nWKlYcAfxh+mIIZFg1G6HYk/SatrK7R9qIZ4h7VBqeRwnqlEleYgxpXWlMSKqRuaO2w5g5ECQgTBa\nYwJrqSixhMKasV053/5xn3Vu9NzV3l3t3qvdfb9m7txzvuc55zyP9kifPT/uVWQmkiS1elevOyBJ\nuvAYDpKkiuEgSaoYDpKkiuEgSaoYDpKkiuEgSaoYDpKkiuEgSaos6nUHpuuyyy7LVatWTWvdH/7w\nhyxZsmRmO3SBc8zz30IbLzjmqfr2t7/9l5n5wU7aztlwWLVqFYcPH57Wuo1Gg8HBwZnt0AXOMc9/\nC2284JinKiL+vNO2XlaSJFUMB0lSxXCQJFUMB0lSxXCQJFUMB0lSxXCQJFUMB0lSxXCQJFXm7Cek\npckc+f5p7tz6eNf3+9r9n+j6PqWZ5pmDJKliOEiSKoaDJKliOEiSKpOGQ0R8KCKeb3n9VUT8TkRc\nGhEHI+JoeV9W2kdEbI+IkYh4ISKubdnWptL+aERsaqlfFxFHyjrbIyJmZ7iSpE5MGg6Z+UpmXpOZ\n1wDXAe8AXwO2Aocysx84VOYBbgL6y2sY+CJARFwKbANuAK4Hto0HSmkz3LLe0IyMTpI0LVO9rLQW\n+G5m/jmwHthd6ruBW8v0emBPNj0NLI2IK4AbgYOZeTIzTwEHgaGy7JLMfCozE9jTsi1JUg9MNRw2\nAF8p032ZeRygvF9e6suBN1rWGS21c9VH29QlST3S8YfgIuJi4BbgM5M1bVPLadTb9WGY5uUn+vr6\naDQak3SlvbGxsWmvO1ctxDH3LYYtq890fb+9+nNeiD9jxzx7pvIJ6ZuAP8nMN8v8mxFxRWYeL5eG\nTpT6KLCyZb0VwLFSHzyr3ij1FW3aVzJzB7ADYGBgIKf7/6j6/84uDJ9/eC8PHOn+lwC8dsdg1/cJ\nC/Nn7Jhnz1QuK93O31xSAtgHjD9xtAnY21LfWJ5aWgOcLpedDgDrImJZuRG9DjhQlr0dEWvKU0ob\nW7YlSeqBjn6tioifA/4+8E9ayvcDj0bEZuB14LZS3w/cDIzQfLLpLoDMPBkR9wHPlnb3ZubJMn03\nsAtYDDxRXpKkHukoHDLzHeADZ9V+QPPppbPbJnDPBNvZCexsUz8MXN1JXyRJs89PSEuSKoaDJKli\nOEiSKoaDJKliOEiSKoaDJKliOEiSKoaDJKliOEiSKoaDJKliOEiSKoaDJKliOEiSKoaDJKliOEiS\nKoaDJKliOEiSKoaDJKliOEiSKh2FQ0QsjYjHIuLPIuLliPjViLg0Ig5GxNHyvqy0jYjYHhEjEfFC\nRFzbsp1Npf3RiNjUUr8uIo6UdbZHRMz8UCVJner0zOFzwB9n5t8FPgK8DGwFDmVmP3CozAPcBPSX\n1zDwRYCIuBTYBtwAXA9sGw+U0ma4Zb2h8xuWJOl8TBoOEXEJ8GvAQwCZ+ZPMfAtYD+wuzXYDt5bp\n9cCebHoaWBoRVwA3Agcz82RmngIOAkNl2SWZ+VRmJrCnZVuSpB7o5MzhF4G/AP5LRDwXEV+KiCVA\nX2YeByjvl5f2y4E3WtYfLbVz1Ufb1CVJPbKowzbXAp/OzGci4nP8zSWkdtrdL8hp1OsNRwzTvPxE\nX18fjUbjHN2Y2NjY2LTXnasW4pj7FsOW1We6vt9e/TkvxJ+xY549nYTDKDCamc+U+cdohsObEXFF\nZh4vl4ZOtLRf2bL+CuBYqQ+eVW+U+oo27SuZuQPYATAwMJCDg4Ptmk2q0Wgw3XXnqoU45s8/vJcH\njnRyiM+s1+4Y7Po+YWH+jB3z7Jn0slJm/l/g
jYj4UCmtBV4C9gHjTxxtAvaW6X3AxvLU0hrgdLns\ndABYFxHLyo3odcCBsuztiFhTnlLa2LItSVIPdPpr1aeBhyPiYuBV4C6awfJoRGwGXgduK233AzcD\nI8A7pS2ZeTIi7gOeLe3uzcyTZfpuYBewGHiivCRJPdJROGTm88BAm0Vr27RN4J4JtrMT2Nmmfhi4\nupO+SJJmn5+QliRVDAdJUqX7j3JoQVm19fGe7XvL6p7tWprzPHOQJFUMB0lSxXCQJFUMB0lSxXCQ\nJFUMB0lSxXCQJFUMB0lSxXCQJFUMB0lSxXCQJFUMB0lSxXCQJFUMB0lSxXCQJFUMB0lSxXCQJFU6\nCoeIeC0ijkTE8xFxuNQujYiDEXG0vC8r9YiI7RExEhEvRMS1LdvZVNofjYhNLfXryvZHyrox0wOV\nJHVuKmcOv56Z12TmQJnfChzKzH7gUJkHuAnoL69h4IvQDBNgG3ADcD2wbTxQSpvhlvWGpj0iSdJ5\nO5/LSuuB3WV6N3BrS31PNj0NLI2IK4AbgYOZeTIzTwEHgaGy7JLMfCozE9jTsi1JUg8s6rBdAv8z\nIhL4w8zcAfRl5nGAzDweEZeXtsuBN1rWHS21c9VH29QrETFM8wyDvr4+Go1Gh93/28bGxqa97lzV\nqzFvWX2m6/sc17e4N/vv1bHlcb0wdGvMnYbDxzLzWAmAgxHxZ+do2+5+QU6jXhebobQDYGBgIAcH\nB8/Z6Yk0Gg2mu+5c1asx37n18a7vc9yW1Wd44Einh/jMee2Owa7vEzyuF4pujbmjy0qZeay8nwC+\nRvOewZvlkhDl/URpPgqsbFl9BXBskvqKNnVJUo9MGg4RsSQi3jc+DawDvgPsA8afONoE7C3T+4CN\n5amlNcDpcvnpALAuIpaVG9HrgANl2dsRsaY8pbSxZVuSpB7o5Jy7D/haebp0EfBHmfnHEfEs8GhE\nbAZeB24r7fcDNwMjwDvAXQCZeTIi7gOeLe3uzcyTZfpuYBewGHiivCRJPTJpOGTmq8BH2tR/AKxt\nU0/gngm2tRPY2aZ+GLi6g/5KkrrAT0hLkiqGgySpYjhIkiqGgySpYjhIkiqGgySpYjhIkiqGgySp\nYjhIkiqGgySpYjhIkiqGgySpYjhIkiqGgySpYjhIkiqGgySpYjhIkiqGgySpYjhIkiodh0NEXBQR\nz0XE18v8lRHxTEQcjYivRsTFpf6eMj9Slq9q2cZnSv2ViLixpT5UaiMRsXXmhidJmo6pnDn8NvBy\ny/zvAQ9mZj9wCthc6puBU5n5S8CDpR0RcRWwAfgVYAj4/RI4FwFfAG4CrgJuL20lST3SUThExArg\nE8CXynwAHwceK012A7eW6fVlnrJ8bWm/HngkM3+cmd8DRoDry2skM1/NzJ8Aj5S2kqQe6fTM4T8B\n/xL46zL/AeCtzDxT5keB5WV6OfAGQFl+urT/Wf2sdSaqS5J6ZNFkDSLiHwAnMvPbETE4Xm7TNCdZ\nNlG9XUBlmxoRMQwMA/T19dFoNCbu+DmMjY1Ne925qldj3rL6zOSNZknf4t7sv1fHlsf1wtCtMU8a\nDsDHgFsi4mbgvcAlNM8klkbEonJ2sAI4VtqPAiuB0YhYBLwfONlSH9e6zkT1vyUzdwA7AAYGBnJw\ncLCD7tcajQbTXXeu6tWY79z6eNf3OW7L6jM8cKSTQ3xmvXbHYNf3CR7XC0W3xjzpZaXM/ExmrsjM\nVTRvKH8jM+8AngQ+WZptAvaW6X1lnrL8G5mZpb6hPM10JdAPfAt4FugvTz9dXPaxb0ZGJ0malvP5\ntepfAY9ExO8CzwEPlfpDwJcjYoTmGcMGgMx8MSIeBV4CzgD3ZOZPASLiU8AB4CJgZ2a+eB79kiSd\npymFQ2Y2gEaZfpXmk0Znt/kRcNsE638W+Gyb+n5g/1T6IkmaPX5CWpJUMRwkSRXDQZJUMRwkSRXD\nQZJUMRwk
SRXDQZJUMRwkSRXDQZJUMRwkSZXuf2WlNM+t6tE30e4aWtKT/Wp+8sxBklQxHCRJFcNB\nklQxHCRJFcNBklQxHCRJFcNBklQxHCRJlUnDISLeGxHfiog/jYgXI+LflfqVEfFMRByNiK9GxMWl\n/p4yP1KWr2rZ1mdK/ZWIuLGlPlRqIxGxdeaHKUmaik7OHH4MfDwzPwJcAwxFxBrg94AHM7MfOAVs\nLu03A6cy85eAB0s7IuIqYAPwK8AQ8PsRcVFEXAR8AbgJuAq4vbSVJPXIpOGQTWNl9t3llcDHgcdK\nfTdwa5leX+Ypy9dGRJT6I5n548z8HjACXF9eI5n5amb+BHiktJUk9UhH9xzKb/jPAyeAg8B3gbcy\n80xpMgosL9PLgTcAyvLTwAda62etM1FdktQjHX3xXmb+FLgmIpYCXwM+3K5ZeY8Jlk1UbxdQ2aZG\nRAwDwwB9fX00Go1zd3wCY2Nj0153rurVmLesPjN5o1nSt7i3++82j+uFoVtjntK3smbmWxHRANYA\nSyNiUTk7WAEcK81GgZXAaEQsAt4PnGypj2tdZ6L62fvfAewAGBgYyMHBwal0/2cajQbTXXeu6tWY\n7+zRN5RCMxgeOLJwvnh419ASj+sFoFtj7uRppQ+WMwYiYjHwG8DLwJPAJ0uzTcDeMr2vzFOWfyMz\ns9Q3lKeZrgT6gW8BzwL95emni2netN43E4OTJE1PJ79WXQHsLk8VvQt4NDO/HhEvAY9ExO8CzwEP\nlfYPAV+OiBGaZwwbADLzxYh4FHgJOAPcUy5XERGfAg4AFwE7M/PFGRuhJGnKJg2HzHwB+Gib+qs0\nnzQ6u/4j4LYJtvVZ4LNt6vuB/R30V5LUBX5CWpJUMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJU\nMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUMRwk\nSZVJwyEiVkbEkxHxckS8GBG/XeqXRsTBiDha3peVekTE9ogYiYgXIuLalm1tKu2PRsSmlvp1EXGk\nrLM9ImI2BitJ6kwnZw5ngC2Z+WFgDXBPRFwFbAUOZWY/cKjMA9wE9JfXMPBFaIYJsA24Abge2DYe\nKKXNcMt6Q+c/NEnSdE0aDpl5PDP/pEy/DbwMLAfWA7tLs93ArWV6PbAnm54GlkbEFcCNwMHMPJmZ\np4CDwFBZdklmPpWZCexp2ZYkqQcWTaVxRKwCPgo8A/Rl5nFoBkhEXF6aLQfeaFlttNTOVR9tU2+3\n/2GaZxj09fXRaDSm0v2fGRsbm/a6c1Wvxrxl9Zmu73Nc3+Le7r/bPK4Xhm6NueNwiIifB/4b8DuZ\n+VfnuC3QbkFOo14XM3cAOwAGBgZycHBwkl6312g0mO66c1Wvxnzn1se7vs9xW1af4YEjU/r9Z07b\nNbTE43oB6NaYO3paKSLeTTMYHs7M/17Kb5ZLQpT3E6U+CqxsWX0FcGyS+oo2dUlSj3TytFIADwEv\nZ+Z/bFm0Dxh/4mgTsLelvrE8tbQGOF0uPx0A1kXEsnIjeh1woCx7OyLWlH1tbNmWJKkHOjnn/hjw\nW8CRiHi+1P41cD/waERsBl4HbivL9gM3AyPAO8BdAJl5MiLuA54t7e7NzJNl+m5gF7AYeKK8JEk9\nMmk4ZOb/pv19AYC1bdoncM8E29oJ7GxTPwxcPVlfNH1Hvn+6p9f/Jc0tfkJaklQxHCRJFcNBklQx\nHCRJFcNBklQxHCRJFcNBklQxHCRJFcNBklQxHCRJFcNBklQxHCRJFcNBklQxHCRJFcNBklQxHCRJ\nFcNBklQxHCRJFcNBklSZNBwiYmdEnIiI77TULo2IgxFxtLwvK/WIiO0RMRIRL0TEtS3rbCrtj0bE\nppb6dRFxpKyzPSIm+v+qJUld0smZwy5g6KzaVuBQZvYDh8o8wE1Af3kNA1
+EZpgA24AbgOuBbeOB\nUtoMt6x39r4kSV02aThk5jeBk2eV1wO7y/Ru4NaW+p5sehpYGhFXADcCBzPzZGaeAg4CQ2XZJZn5\nVGYmsKdlW5KkHlk0zfX6MvM4QGYej4jLS3058EZLu9FSO1d9tE29rYgYpnmWQV9fH41GY1qdHxsb\nm/a6c1XfYtiy+kyvu9FVC23MC/G4dsyzZ7rhMJF29wtyGvW2MnMHsANgYGAgBwcHp9FFaDQaTHfd\nuerzD+/lgSMz/eO+sG1ZfWZBjXnX0JIFd1wvxL/L3RrzdJ9WerNcEqK8nyj1UWBlS7sVwLFJ6iva\n1CVJPTTdcNgHjD9xtAnY21LfWJ5aWgOcLpefDgDrImJZuRG9DjhQlr0dEWvKU0obW7YlSeqRSc+5\nI+IrwCBwWUSM0nzq6H7g0YjYDLwO3Faa7wduBkaAd4C7ADLzZETcBzxb2t2bmeM3ue+m+UTUYuCJ\n8pIk9dCk4ZCZt0+waG2btgncM8F2dgI729QPA1dP1g9JUvf4CWlJUsVwkCRVDAdJUsVwkCRVDAdJ\nUsVwkCRVDAdJUsVwkCRVFs63kknz3JHvn+bOrY/3ZN+v3f+JnuxXs8czB0lSxXCQJFUMB0lSxXCQ\nJFUMB0lSxXCQJFUMB0lSxXCQJFUMB0lSxU9Id9GqHn16FWDL6p7tWtIcdMGEQ0QMAZ8DLgK+lJn3\n97hLkjrUq198dg0t6cl+F4IL4rJSRFwEfAG4CbgKuD0iruptryRp4bogwgG4HhjJzFcz8yfAI8D6\nHvdJkhasC+Wy0nLgjZb5UeCG2dpZL7+9UtL8MN8vpUVmdmVH5+xExG3AjZn5j8r8bwHXZ+anz2o3\nDAyX2Q8Br0xzl5cBfznNdecqxzz/LbTxgmOeql/IzA920vBCOXMYBVa2zK8Ajp3dKDN3ADvOd2cR\ncTgzB853O3OJY57/Ftp4wTHPpgvlnsOzQH9EXBkRFwMbgH097pMkLVgXxJlDZp6JiE8BB2g+yroz\nM1/scbckacG6IMIBIDP3A/u7tLvzvjQ1Bznm+W+hjRcc86y5IG5IS5IuLBfKPQdJ0gVkXodDRAxF\nxCsRMRIRW9ssf09EfLUsfyYiVnW/lzOng/H+84h4KSJeiIhDEfELvejnTJpszC3tPhkRGRFz/smW\nTsYcEf+w/KxfjIg/6nYfZ1oHx/bfiYgnI+K5cnzf3It+zpSI2BkRJyLiOxMsj4jYXv48XoiIa2e8\nE5k5L180b2x/F/hF4GLgT4GrzmrzT4E/KNMbgK/2ut+zPN5fB36uTN89l8fb6ZhLu/cB3wSeBgZ6\n3e8u/Jz7geeAZWX+8l73uwtj3gHcXaavAl7rdb/Pc8y/BlwLfGeC5TcDTwABrAGemek+zOczh06+\nkmM9sLtMPwasjYjoYh9n0qTjzcwnM/OdMvs0zc+TzGWdfu3KfcC/B37Uzc7Nkk7G/I+BL2TmKYDM\nPNHlPs60TsacwCVl+v20+ZzUXJKZ3wROnqPJemBPNj0NLI2IK2ayD/M5HNp9Jcfyidpk5hngNPCB\nrvRu5nUy3labaf7mMZdNOuaI+CiwMjO/3s2OzaJOfs6/DPxyRPyfiHi6fOPxXNbJmP8t8JsRMUrz\nqcdPM79N9e/7lF0wj7LOgnZnAGc/mtVJm7mi47FExG8CA8Dfm9Uezb5zjjki3gU8CNzZrQ51QSc/\n50U0Ly0N0jw7/F8RcXVmvjXLfZstnYz5dmBXZj4QEb8KfLmM+a9nv3s9Mev/ds3nM4dOvpLjZ20i\nYhHN09FzncpdyDr6CpKI+A3g3wC3ZOaPu9S32TLZmN8HXA00IuI1mtdm983xm9KdHtd7M/P/Zeb3\naH4HWX+X+jcbOhnzZuBRgMx8Cngvze8gmq86+vt+PuZzOHTylRz7gE1l+pPAN7Lc7ZmDJh1vucTy\nhzSDYa5fh4ZJxpyZpzPzssxclZmraN
5nuSUzD/emuzOik+P6f9B8+ICIuIzmZaZXu9rLmdXJmF8H\n1gJExIdphsNfdLWX3bUP2FieWloDnM7M4zO5g3l7WSkn+EqOiLgXOJyZ+4CHaJ5+jtA8Y9jQux6f\nnw7H+x+Anwf+a7nv/npm3tKzTp+nDsc8r3Q45gPAuoh4Cfgp8C8y8we96/X56XDMW4D/HBH/jObl\nlTvn8C96RMRXaF4WvKzcR9kGvBsgM/+A5n2Vm4ER4B3grhnvwxz+85MkzZL5fFlJkjRNhoMkqWI4\nSJIqhoMkqWI4SJIqhoMkqWI4SJIqhoMkqfL/AcycIJtPHzYBAAAAAElFTkSuQmCC\n"
},
"metadata": {}
}
]
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2017-04-21T16:18:33.087543Z",
"start_time": "2017-04-21T12:18:32.754281-04:00"
},
"run_control": {
"frozen": false,
"read_only": false
},
"trusted": true
},
"cell_type": "code",
"source": "#save the top 50k as a CSV file\npredicted = predicted.sort_values(ascending=False)\npredicted[:50000].to_csv('demog_only.csv')",
"execution_count": 95,
"outputs": []
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2017-04-18T17:12:56.452219Z",
"start_time": "2017-04-18T13:12:56.373101-04:00"
},
"run_control": {
"frozen": false,
"read_only": false
}
},
"cell_type": "markdown",
"source": "## Demographic + Connections\n\nIn this section, I generate features from the connections and append them to the features created from demographic, hoping that these new features help models fit and predict better. 10 new features added:\n\n- in/out degrees of each node in the network (2)\n- num of democrats/republicans/others among sources of a node (3)\n- num of democrats/republicans/others among sinks of a node (3)\n- predicted avg for B support among sources/sinks of a node (2)\n"
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2017-04-22T02:00:52.013235Z",
"end_time": "2017-04-22T02:00:55.776343Z"
},
"run_control": {
"frozen": false,
"read_only": false
},
"trusted": true
},
"cell_type": "code",
"source": "con = pd.read_csv('connections.csv')\nnodes = pd.Series(con.sink.append(con.source).unique())\n\nprint('Only {:.2f}% of the `voters` are included in the `connections`'.format(100*len(voters[voters.index.isin(nodes)])/len(voters)))\nprint('Only {:.2f}% of the `polled` are included in the `connections`'.format(100*len(polled[polled.index.isin(nodes)])/len(polled)))",
"execution_count": 9,
"outputs": [
{
"output_type": "stream",
"text": "93.88% of the `voters` are included in the `connections`\n93.70% of the `polled` are included in the `connections`\n",
"name": "stdout"
}
]
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2017-04-22T02:38:16.173666Z",
"end_time": "2017-04-22T02:38:22.636075Z"
},
"collapsed": true,
"trusted": true,
"run_control": {
"read_only": false,
"frozen": false
}
},
"cell_type": "code",
"source": "features = {}\nfeatures['in_degree'] = con.groupby('sink',sort=False).size()\nfeatures['out_degree'] = con.groupby('source',sort=False).size()\nfor party in voters.party.unique():\n features['in_'+party] = con[con.source.map(voters.party) == party].groupby('sink',sort=False).size()\n features['out_'+party] = con[con.sink.map(voters.party) == party].groupby('source',sort=False).size()\nfeatures['in_B'] = con.groupby('sink',sort=False).apply(lambda g: g.source.map(predicted)).mean()\nfeatures['out_B'] = con.groupby('source',sort=False).apply(lambda g: g.sink.map(predicted)).mean()",
"execution_count": 29,
"outputs": []
},
{
"metadata": {
"trusted": true,
"collapsed": true
},
"cell_type": "code",
"source": "#get new attributes into a dataframe\nnew_attr = pd.DataFrame(features)",
"execution_count": null,
"outputs": []
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2017-04-22T03:36:14.065127Z",
"end_time": "2017-04-22T03:36:14.172926Z"
},
"run_control": {
"frozen": false,
"read_only": false
},
"trusted": true
},
"cell_type": "code",
"source": "#preprocess the data\ny = df.loc[:,'choice'].map({'A':0,'B':1})\nX_train = df.drop(['choice','party'],axis=1)\n\nX_train = X_train.join(new_attr)\n\nfrom sklearn.preprocessing import Imputer,scale,PolynomialFeatures\n# from sklearn.decomposition import PCA\n\nX_train = pd.get_dummies(X_train)\ntrain_cols = X_train.columns\n\nX_train = Imputer(strategy='most_frequent').fit_transform(X_train)\n# X = PolynomialFeatures().fit_transform(X) #not helped\n# X = scale(X) #not helped\n# X = PCA().fit_transform(X) #not helped",
"execution_count": 81,
"outputs": []
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2017-04-22T03:22:22.950108Z",
"end_time": "2017-04-22T03:22:24.392305Z"
},
"run_control": {
"frozen": false,
"read_only": false
},
"trusted": true
},
"cell_type": "code",
"source": "print('3-fold CV AUC Scores from probability estimates')\nprint('-----------------------------------------------')\nfor model in models:\n scores = cross_val_score(model, X_train, y, cv=3, scoring='roc_auc') #average_precision\n print(\"{:25s} : {:.3f} (+/- {:.2f})\".format(model.__class__.__name__, scores.mean(), scores.std() * 2))",
"execution_count": 63,
"outputs": [
{
"output_type": "stream",
"text": "3-fold CV AUC Scores from probability estimates\n-----------------------------------------------\nGaussianNB : 0.546 (+/- 0.09)\nLogisticRegression : 0.521 (+/- 0.05)\nRandomForestClassifier : 0.520 (+/- 0.07)\nSVC : 0.480 (+/- 0.04)\nDecisionTreeClassifier : 0.497 (+/- 0.02)\nSGDClassifier : 0.527 (+/- 0.11)\nKNeighborsClassifier : 0.504 (+/- 0.02)\nMLPClassifier : 0.523 (+/- 0.07)\nVotingClassifier : 0.537 (+/- 0.04)\n",
"name": "stdout"
}
]
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2017-04-22T03:33:09.004954Z",
"end_time": "2017-04-22T03:33:10.250334Z"
},
"trusted": true
},
"cell_type": "code",
"source": "#get the prediction probabilities for a decent estimator\nmodel = GaussianNB()\nmodel = model.fit(X_train, y)\n#only include republicans\nX_test = voters[voters['party'] == 'republican'].drop(['party'],axis=1)\ntest_inds = X_test.index\nX_test = X_test.join(new_attr)\nX_test = pd.get_dummies(X_test)\nX_test = Imputer(strategy='most_frequent').fit_transform(X_test[train_cols])\nres = model.predict_proba(X_test)",
"execution_count": 74,
"outputs": []
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2017-04-22T03:33:20.402602Z",
"end_time": "2017-04-22T03:33:20.812914Z"
},
"trusted": true
},
"cell_type": "code",
"source": "#get the predicted and plot B_proba\n%matplotlib inline\npredicted = pd.Series(res[:,1], index=test_inds, name='B_proba')\n#don't forget to add the known B voters from the polled dataset\nmask = predicted.index.isin(polled[polled.choice=='B'].index)\npredicted[mask] = 1.0 #we're 100% sure of 319\nmask = predicted.index.isin(polled[polled.choice=='A'].index)\npredicted[mask] = 0.0 #we're 100% sure of 332\npredicted.hist();",
"execution_count": 75,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "<matplotlib.figure.Figure at 0x116beb3c8>",
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAY0AAAD8CAYAAACLrvgBAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAF5tJREFUeJzt3X+w5XV93/HnK7vBEhMDSrzDAOnidM0EoSVyR+lkkt6E\nRBfacbGjKQwJqzLdaKDTtExHbDqDo3FGk6HOyChmLTssGcKPanR3zFrCEG9NO6JgICyYUC64kSs7\nUAHRDQl27bt/nM/aw3p274dz7r1n1/t8zHznfM/7+/l8v5/P3WVf9/vjHFJVSJLU40emPQBJ0rHD\n0JAkdTM0JEndDA1JUjdDQ5LUzdCQJHUzNCRJ3QwNSVI3Q0OS1G39tAew3E466aTasGHDWH3/9m//\nlpe+9KXLO6CjnHNeG5zz2jDJnL/yla98s6p+aql2P3ShsWHDBu65556x+s7PzzM3N7e8AzrKOee1\nwTmvDZPMOcnf9LTz8pQkqZuhIUnqZmhIkroZGpKkboaGJKmboSFJ6mZoSJK6GRqSpG6GhiSp2w/d\nJ8Insecbz/K2q/5kKsfe+8F/PpXjStKL4ZmGJKmboSFJ6mZoSJK6GRqSpG5LhkaS7UmeTPLAUO3W\nJPe1ZW+S+1p9Q5K/G9r28aE+5yTZk2QhyUeSpNVfnuSOJA+31xNbPa3dQpL7k7x2+acvSXoxes40\nbgA2DReq6l9V1dlVdTbwKeCPhzY/cnBbVb1zqH4dsBXY2JaD+7wKuLOqNgJ3tvcA5w+13dr6S5Km\naMnQqKovAE+P2tbOFn4NuPlI+0hyMvCyqvpiVRVwI3Bh27wZ2NHWdxxSv7EG7gJOaPuRJE3JpPc0\nfgF4oqoeHqqdnuTeJP89yS+02inA4lCbxVYDmKmqfQDt9ZVDfR47TB9J0hRM+uG+i3nhWcY+4Ker\n6qkk5wCfSfIaICP61hL77u6TZCuDS1jMzMwwPz+/1LhHmjkerjzrwFh9JzXumCe1f//+qR17Wpzz\n2uCcV8bYoZFkPfAvgXMO1qrqeeD5tv6VJI8Ar2ZwlnDqUPdTgcfb+hNJTq6qfe3y05Otvgicdpg+\nL1BV24BtALOzszXu/yP32pt2cs2e6XxIfu8lc1M5rv8f5bXBOa8NqzHnSS5P/Qrw11X1/ctOSX4q\nybq2/ioGN7EfbZedvpPk3HYf5FJgZ+u2C9jS1rccUr+0PUV1LvDswctYkqTp6Hnk9mbgi8DPJFlM\nclnbdBE/eAP8F4H7k/wl8EngnVV18Cb6u4D/AiwAjwCfa/UPAr+a5GHgV9t7gN3Ao639J4DfevHT\nkyQtpyWvxVTVxYepv21E7VMMHsEd1f4e4MwR9aeA80bUC7h8qfFJklaPnwiXJHUzNCRJ3QwNSVI3\nQ0OS1M3QkCR1MzQkSd0MDUlSN0NDktTN0JAkdTM0JEndDA1JUjdDQ5LUzdCQJHUzNCRJ3QwNSVI3\nQ0OS1M3QkCR1MzQkSd0MDUlStyVDI8n2JE8meWCo9t4k30hyX1suGNr2niQLSR5K8sah+qZWW0hy\n1VD99CRfSvJwkluTHNfqL2nvF9r2Dcs1aUnSeHrONG4ANo2of7iqzm7LboAkZwAXAa9pfT6WZF2S\ndcBHgfOBM4CLW1uAD7V9bQSeAS5r9cuAZ6rqHwEfbu0kSVO0ZGhU1ReApzv3txm4paqer6qvAQvA\n69qyUFWPVtV3gVuAzUkC/DLwydZ/B3Dh0L52tPVPAue19pKkKZnknsYVSe5vl69ObLVTgMeG2iy2\n2uHqrwC+VVUHDqm/YF9t+7OtvSRpStaP2e864P1AtddrgHcAo84EitHhVEdozxLbXiDJVmArwMzM\nDPPz80cY+uHNHA9XnnVg6YYrYNwxT2r//v1TO/a0OOe1wTmvjLFCo6qeOLie5BPAZ9vbReC0oaan\nAo+39VH1bwInJFnfziaG2x/c12KS9cBPcpjL
ZFW1DdgGMDs7W3Nzc+NMi2tv2sk1e8bN0cnsvWRu\nKsedn59n3J/Xsco5rw3OeWWMdXkqyclDb98MHHyyahdwUXvy6XRgI/Bl4G5gY3tS6jgGN8t3VVUB\nnwfe0vpvAXYO7WtLW38L8GetvSRpSpb8tTrJzcAccFKSReBqYC7J2QwuF+0FfhOgqh5MchvwVeAA\ncHlVfa/t5wrgdmAdsL2qHmyHeDdwS5LfBe4Frm/164E/TLLA4AzjoolnK0mayJKhUVUXjyhfP6J2\nsP0HgA+MqO8Gdo+oP8rg6apD638PvHWp8UmSVo+fCJckdTM0JEndDA1JUjdDQ5LUzdCQJHUzNCRJ\n3QwNSVI3Q0OS1M3QkCR1MzQkSd0MDUlSN0NDktTN0JAkdTM0JEndDA1JUjdDQ5LUzdCQJHUzNCRJ\n3QwNSVI3Q0OS1G3J0EiyPcmTSR4Yqv1+kr9Ocn+STyc5odU3JPm7JPe15eNDfc5JsifJQpKPJEmr\nvzzJHUkebq8ntnpau4V2nNcu//QlSS9Gz5nGDcCmQ2p3AGdW1T8G/hfwnqFtj1TV2W1551D9OmAr\nsLEtB/d5FXBnVW0E7mzvAc4faru19ZckTdGSoVFVXwCePqT2p1V1oL29Czj1SPtIcjLwsqr6YlUV\ncCNwYdu8GdjR1nccUr+xBu4CTmj7kSRNyfpl2Mc7gFuH3p+e5F7g28B/qqo/B04BFofaLLYawExV\n7QOoqn1JXtnqpwCPjeiz79ABJNnK4GyEmZkZ5ufnx5rIzPFw5VkHlm64AsYd86T2798/tWNPi3Ne\nG5zzypgoNJL8DnAAuKmV9gE/XVVPJTkH+EyS1wAZ0b2W2n1vn6raBmwDmJ2drbm5uY7R/6Brb9rJ\nNXuWI0dfvL2XzE3luPPz84z78zpWOee1wTmvjLH/hUyyBfgXwHntkhNV9TzwfFv/SpJHgFczOEsY\nvoR1KvB4W38iycntLONk4MlWXwROO0wfSdIUjPXIbZJNwLuBN1XVc0P1n0qyrq2/isFN7Efb5afv\nJDm3PTV1KbCzddsFbGnrWw6pX9qeojoXePbgZSxJ0nQseaaR5GZgDjgpySJwNYOnpV4C3NGenL2r\nPSn1i8D7khwAvge8s6oO3kR/F4MnsY4HPtcWgA8CtyW5DPg68NZW3w1cACwAzwFvn2SikqTJLRka\nVXXxiPL1h2n7KeBTh9l2D3DmiPpTwHkj6gVcvtT4JEmrx0+ES5K6GRqSpG6GhiSpm6EhSepmaEiS\nuhkakqRuhoYkqZuhIUnqZmhIkroZGpKkboaGJKmboSFJ6mZoSJK6GRqSpG6GhiSpm6EhSepmaEiS\nuhkakqRuhoYkqVtXaCTZnuTJJA8M1V6e5I4kD7fXE1s9ST6SZCHJ/UleO9RnS2v/cJItQ/Vzkuxp\nfT6SJEc6hiRpOnrPNG4ANh1Suwq4s6o2Ane29wDnAxvbshW4DgYBAFwNvB54HXD1UAhc19oe7Ldp\niWNIkqagKzSq6gvA04eUNwM72voO4MKh+o01cBdwQpKTgTcCd1TV01X1DHAHsKlte1lVfbGqCrjx\nkH2NOoYkaQomuacxU1X7ANrrK1v9FOCxoXaLrXak+uKI+pGOIUmagvUrsM+MqNUY9f4DJlsZXN5i\nZmaG+fn5F9P9+2aOhyvPOjBW30mNO+ZJ7d+/f2rHnhbnvDY455UxSWg8keTkqtrXLjE92eqLwGlD\n7U4FHm/1uUPq861+6oj2RzrGC1TVNmAbwOzsbM3NzY1qtqRrb9rJNXtWIkeXtveSuakcd35+nnF/\nXscq57w2OOeVMcnlqV3AwSegtgA7h+qXtqeozgWebZeWbgfekOTEdgP8DcDtbdt3kpzbnpq69JB9\njTqGJGkKun6tTnIzg7OEk5IsMngK6oPAbUkuA74OvLU13w1cACwAzwFvB6iqp5O8H7i7tXtfVR28\nuf4uBk9o
HQ98ri0c4RiSpCnoCo2quvgwm84b0baAyw+zn+3A9hH1e4AzR9SfGnUMSdJ0+IlwSVI3\nQ0OS1M3QkCR1MzQkSd0MDUlSN0NDktTN0JAkdTM0JEndDA1JUjdDQ5LUzdCQJHUzNCRJ3QwNSVI3\nQ0OS1M3QkCR1MzQkSd0MDUlSN0NDktTN0JAkdRs7NJL8TJL7hpZvJ/ntJO9N8o2h+gVDfd6TZCHJ\nQ0neOFTf1GoLSa4aqp+e5EtJHk5ya5Ljxp+qJGlSY4dGVT1UVWdX1dnAOcBzwKfb5g8f3FZVuwGS\nnAFcBLwG2AR8LMm6JOuAjwLnA2cAF7e2AB9q+9oIPANcNu54JUmTW67LU+cBj1TV3xyhzWbglqp6\nvqq+BiwAr2vLQlU9WlXfBW4BNicJ8MvAJ1v/HcCFyzReSdIYlis0LgJuHnp/RZL7k2xPcmKrnQI8\nNtRmsdUOV38F8K2qOnBIXZI0Jesn3UG7z/Am4D2tdB3wfqDa6zXAO4CM6F6MDq46QvtRY9gKbAWY\nmZlhfn6+fwJDZo6HK886sHTDFTDumCe1f//+qR17Wpzz2uCcV8bEocHgXsRfVNUTAAdfAZJ8Avhs\ne7sInDbU71Tg8bY+qv5N4IQk69vZxnD7F6iqbcA2gNnZ2ZqbmxtrItfetJNr9izHj+TF23vJ3FSO\nOz8/z7g/r2OVc14bnPPKWI7LUxczdGkqyclD294MPNDWdwEXJXlJktOBjcCXgbuBje1JqeMYXOra\nVVUFfB54S+u/Bdi5DOOVJI1pol+rk/wY8KvAbw6Vfy/J2QwuJe09uK2qHkxyG/BV4ABweVV9r+3n\nCuB2YB2wvaoebPt6N3BLkt8F7gWun2S8kqTJTBQaVfUcgxvWw7XfOEL7DwAfGFHfDeweUX+UwdNV\nkqSjgJ8IlyR1MzQkSd0MDUlSN0NDktTN0JAkdTM0JEndDA1JUjdDQ5LUzdCQJHUzNCRJ3QwNSVI3\nQ0OS1M3QkCR1MzQkSd0MDUlSN0NDktTN0JAkdTM0JEndDA1JUjdDQ5LUbeLQSLI3yZ4k9yW5p9Ve\nnuSOJA+31xNbPUk+kmQhyf1JXju0ny2t/cNJtgzVz2n7X2h9M+mYJUnjWa4zjV+qqrOrara9vwq4\ns6o2Ane29wDnAxvbshW4DgYhA1wNvB54HXD1waBpbbYO9du0TGOWJL1IK3V5ajOwo63vAC4cqt9Y\nA3cBJyQ5GXgjcEdVPV1VzwB3AJvatpdV1RerqoAbh/YlSVpl65dhHwX8aZIC/qCqtgEzVbUPoKr2\nJXlla3sK8NhQ38VWO1J9cUT9BZJsZXA2wszMDPPz82NNZOZ4uPKsA2P1ndS4Y57U/v37p3bsaXHO\na4NzXhnLERo/X1WPt2C4I8lfH6HtqPsRNUb9hYVBUG0DmJ2drbm5uSUHPcq1N+3kmj3L8SN58fZe\nMjeV487PzzPuz+tY5ZzXBue8Mia+PFVVj7fXJ4FPM7gn8US7tER7fbI1XwROG+p+KvD4EvVTR9Ql\nSVMwUWgkeWmSnzi4DrwBeADYBRx8AmoLsLOt7wIubU9RnQs82y5j3Q68IcmJ7Qb4G4Db27bvJDm3\nPTV16dC+JEmrbNJrMTPAp9tTsOuBP6qq/5bkbuC2JJcBXwfe2trvBi4AFoDngLcDVNXTSd4P3N3a\nva+qnm7r7wJuAI4HPtcWSdIUTBQaVfUo8E9G1J8CzhtRL+Dyw+xrO7B9RP0e4MxJxilJWh5+IlyS\n1M3QkCR1MzQkSd0MDUlSN0NDktTN0JAkdTM0JEndDA1JUjdDQ5LUzdCQJHUzNCRJ3QwNSVI3Q0OS\n1M3QkCR1MzQkSd0MDUlSN0NDktTN0JAkdTM0JEndxg6NJKcl+XySv0ryYJJ/2+rvTfKNJPe15YKh\nPu9JspDkoSRvHKpvarWFJFcN1U9P8qUkDye5Nclx445XkjS5Sc40DgBXVt
XPAucClyc5o237cFWd\n3ZbdAG3bRcBrgE3Ax5KsS7IO+ChwPnAGcPHQfj7U9rUReAa4bILxSpImNHZoVNW+qvqLtv4d4K+A\nU47QZTNwS1U9X1VfAxaA17VloaoerarvArcAm5ME+GXgk63/DuDCcccrSZrc+uXYSZINwM8BXwJ+\nHrgiyaXAPQzORp5hECh3DXVb5P+HzGOH1F8PvAL4VlUdGNH+0ONvBbYCzMzMMD8/P9Y8Zo6HK886\nsHTDFTDumCe1f//+qR17Wpzz2uCcV8bEoZHkx4FPAb9dVd9Och3wfqDa6zXAO4CM6F6MPtupI7T/\nwWLVNmAbwOzsbM3Nzb3IWQxce9NOrtmzLDn6ou29ZG4qx52fn2fcn9exyjmvDc55ZUz0L2SSH2UQ\nGDdV1R8DVNUTQ9s/AXy2vV0EThvqfirweFsfVf8mcEKS9e1sY7i9JGkKJnl6KsD1wF9V1X8eqp88\n1OzNwANtfRdwUZKXJDkd2Ah8Gbgb2NielDqOwc3yXVVVwOeBt7T+W4Cd445XkjS5Sc40fh74DWBP\nkvta7T8yePrpbAaXkvYCvwlQVQ8muQ34KoMnry6vqu8BJLkCuB1YB2yvqgfb/t4N3JLkd4F7GYSU\nJGlKxg6NqvofjL7vsPsIfT4AfGBEffeoflX1KIOnqyRJRwE/ES5J6mZoSJK6GRqSpG6GhiSpm6Eh\nSepmaEiSuhkakqRuhoYkqZuhIUnqZmhIkroZGpKkboaGJKmboSFJ6jad/02dJP2Q2nDVn0zt2Dds\neumKH8MzDUlSN0NDktTN0JAkdTM0JEndDA1JUrejPjSSbEryUJKFJFdNezyStJYd1aGRZB3wUeB8\n4Azg4iRnTHdUkrR2HdWhAbwOWKiqR6vqu8AtwOYpj0mS1qyjPTROAR4ber/YapKkKTjaPxGeEbX6\ngUbJVmBre7s/yUNjHu8k4Jtj9p1IPjSNowJTnPMUOee1Yc3N+Zc+NNGc/2FPo6M9NBaB04benwo8\nfmijqtoGbJv0YEnuqarZSfdzLHHOa4NzXhtWY85H++Wpu4GNSU5PchxwEbBrymOSpDXrqD7TqKoD\nSa4AbgfWAdur6sEpD0uS1qyjOjQAqmo3sHuVDjfxJa5jkHNeG5zz2rDic07VD9xXliRppKP9noYk\n6SiyJkNjqa8mSfKSJLe27V9KsmH1R7m8Oub875N8Ncn9Se5M0vX43dGs9ytokrwlSSU55p+06Zlz\nkl9rf9YPJvmj1R7jcuv4u/3TST6f5N729/uCaYxzuSTZnuTJJA8cZnuSfKT9PO5P8tplHUBVramF\nwQ31R4BXAccBfwmccUib3wI+3tYvAm6d9rhXYc6/BPxYW3/XWphza/cTwBeAu4DZaY97Ff6cNwL3\nAie296+c9rhXYc7bgHe19TOAvdMe94Rz/kXgtcADh9l+AfA5Bp9zOxf40nIefy2eafR8NclmYEdb\n/yRwXpJRHzQ8Viw556r6fFU9197exeAzMcey3q+geT/we8Dfr+bgVkjPnP818NGqegagqp5c5TEu\nt545F/Cytv6TjPis17Gkqr4APH2EJpuBG2vgLuCEJCcv1/HXYmj0fDXJ99tU1QHgWeAVqzK6lfFi\nv47lMga/qRzLlpxzkp8DTquqz67mwFZQz5/zq4FXJ/mfSe5KsmnVRrcyeub8XuDXkywyeBLz36zO\n0KZmRb9+6ah/5HYF9Hw1SdfXlxxDuueT5NeBWeCfreiIVt4R55zkR4APA29brQGtgp4/5/UMLlHN\nMTib/PMkZ1bVt1Z4bCulZ84XAzdU1TVJ/inwh23O/3flhzcVK/rv11o80+j5apLvt0mynsEp7ZFO\nB492XV/HkuRXgN8B3lRVz6/S2FbKUnP+CeBMYD7JXgbXfncd4zfDe/9u76yq/1NVXwMeYhAix6qe\nOV8G3AZQVV8E/gGD76X6YdX13/u41m
Jo9Hw1yS5gS1t/C/Bn1e4wHaOWnHO7VPMHDALjWL/ODUvM\nuaqeraqTqmpDVW1gcB/nTVV1z3SGuyx6/m5/hsFDDyQ5icHlqkdXdZTLq2fOXwfOA0jyswxC43+v\n6ihX1y7g0vYU1bnAs1W1b7l2vuYuT9VhvpokyfuAe6pqF3A9g1PYBQZnGBdNb8ST65zz7wM/DvzX\nds//61X1pqkNekKdc/6h0jnn24E3JPkq8D3gP1TVU9Mb9WQ653wl8Ikk/47BZZq3Hcu/BCa5mcHl\nxZPafZqrgR8FqKqPM7hvcwGwADwHvH1Zj38M/+wkSatsLV6ekiSNydCQJHUzNCRJ3QwNSVI3Q0OS\n1M3QkCR1MzQkSd0MDUlSt/8HTi2DHCk5xKMAAAAASUVORK5CYII=\n"
},
"metadata": {}
}
]
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2017-04-22T03:34:40.277377Z",
"end_time": "2017-04-22T03:34:40.662057Z"
},
"trusted": true,
"collapsed": true
},
"cell_type": "code",
"source": "#save the top 50k as a CSV file\npredicted = predicted.sort_values(ascending=False)\npredicted[:50000].to_csv('demog_conn.csv')",
"execution_count": 76,
"outputs": []
}
],
"metadata": {
"_draft": {
"nbviewer_url": "https://gist.github.com/2edc8d12d710291acb5037794e4fa62e"
},
"gist": {
"id": "2edc8d12d710291acb5037794e4fa62e",
"data": {
"description": "B-Supporters.ipynb",
"public": true
}
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3",
"language": "python"
},
"language_info": {
"name": "python",
"version": "3.6.1",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
}
},
"nbformat": 4,
"nbformat_minor": 2
}