@ajayteach
Created March 12, 2017 14:58
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd  # pandas provides the DataFrame API used throughout\n",
"import os"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"display.chop_threshold : float or None\n",
" if set to a float value, all float values smaller then the given threshold\n",
" will be displayed as exactly 0 by repr and friends.\n",
" [default: None] [currently: None]\n",
"\n",
"display.colheader_justify : 'left'/'right'\n",
" Controls the justification of column headers. used by DataFrameFormatter.\n",
" [default: right] [currently: right]\n",
"\n",
"display.column_space No description available.\n",
" [default: 12] [currently: 12]\n",
"\n",
"display.date_dayfirst : boolean\n",
" When True, prints and parses dates with the day first, eg 20/01/2005\n",
" [default: False] [currently: False]\n",
"\n",
"display.date_yearfirst : boolean\n",
" When True, prints and parses dates with the year first, eg 2005/01/20\n",
" [default: False] [currently: False]\n",
"\n",
"display.encoding : str/unicode\n",
" Defaults to the detected encoding of the console.\n",
" Specifies the encoding to be used for strings returned by to_string,\n",
" these are generally strings meant to be displayed on the console.\n",
" [default: UTF-8] [currently: UTF-8]\n",
"\n",
"display.expand_frame_repr : boolean\n",
" Whether to print out the full DataFrame repr for wide DataFrames across\n",
" multiple lines, `max_columns` is still respected, but the output will\n",
" wrap-around across multiple \"pages\" if its width exceeds `display.width`.\n",
" [default: True] [currently: True]\n",
"\n",
"display.float_format : callable\n",
" The callable should accept a floating point number and return\n",
" a string with the desired format of the number. This is used\n",
" in some places like SeriesFormatter.\n",
" See formats.format.EngFormatter for an example.\n",
" [default: None] [currently: None]\n",
"\n",
"display.height : int\n",
" Deprecated.\n",
" [default: 60] [currently: 60]\n",
" (Deprecated, use `display.max_rows` instead.)\n",
"\n",
"display.large_repr : 'truncate'/'info'\n",
" For DataFrames exceeding max_rows/max_cols, the repr (and HTML repr) can\n",
" show a truncated table (the default from 0.13), or switch to the view from\n",
" df.info() (the behaviour in earlier versions of pandas).\n",
" [default: truncate] [currently: truncate]\n",
"\n",
"display.latex.escape : bool\n",
" This specifies if the to_latex method of a Dataframe uses escapes special\n",
" characters.\n",
" method. Valid values: False,True\n",
" [default: True] [currently: True]\n",
"\n",
"display.latex.longtable :bool\n",
" This specifies if the to_latex method of a Dataframe uses the longtable\n",
" format.\n",
" method. Valid values: False,True\n",
" [default: False] [currently: False]\n",
"\n",
"display.latex.repr : boolean\n",
" Whether to produce a latex DataFrame representation for jupyter\n",
" environments that support it.\n",
" (default: False)\n",
" [default: False] [currently: False]\n",
"\n",
"display.line_width : int\n",
" Deprecated.\n",
" [default: 80] [currently: 80]\n",
" (Deprecated, use `display.width` instead.)\n",
"\n",
"display.max_categories : int\n",
" This sets the maximum number of categories pandas should output when\n",
" printing out a `Categorical` or a Series of dtype \"category\".\n",
" [default: 8] [currently: 8]\n",
"\n",
"display.max_columns : int\n",
" If max_cols is exceeded, switch to truncate view. Depending on\n",
" `large_repr`, objects are either centrally truncated or printed as\n",
" a summary view. 'None' value means unlimited.\n",
"\n",
" In case python/IPython is running in a terminal and `large_repr`\n",
" equals 'truncate' this can be set to 0 and pandas will auto-detect\n",
" the width of the terminal and print a truncated object which fits\n",
" the screen width. The IPython notebook, IPython qtconsole, or IDLE\n",
" do not run in a terminal and hence it is not possible to do\n",
" correct auto-detection.\n",
" [default: 20] [currently: 20]\n",
"\n",
"display.max_colwidth : int\n",
" The maximum width in characters of a column in the repr of\n",
" a pandas data structure. When the column overflows, a \"...\"\n",
" placeholder is embedded in the output.\n",
" [default: 50] [currently: 50]\n",
"\n",
"display.max_info_columns : int\n",
" max_info_columns is used in DataFrame.info method to decide if\n",
" per column information will be printed.\n",
" [default: 100] [currently: 100]\n",
"\n",
"display.max_info_rows : int or None\n",
" df.info() will usually show null-counts for each column.\n",
" For large frames this can be quite slow. max_info_rows and max_info_cols\n",
" limit this null check only to frames with smaller dimensions than\n",
" specified.\n",
" [default: 1690785] [currently: 1690785]\n",
"\n",
"display.max_rows : int\n",
" If max_rows is exceeded, switch to truncate view. Depending on\n",
" `large_repr`, objects are either centrally truncated or printed as\n",
" a summary view. 'None' value means unlimited.\n",
"\n",
" In case python/IPython is running in a terminal and `large_repr`\n",
" equals 'truncate' this can be set to 0 and pandas will auto-detect\n",
" the height of the terminal and print a truncated object which fits\n",
" the screen height. The IPython notebook, IPython qtconsole, or\n",
" IDLE do not run in a terminal and hence it is not possible to do\n",
" correct auto-detection.\n",
" [default: 60] [currently: 60]\n",
"\n",
"display.max_seq_items : int or None\n",
" when pretty-printing a long sequence, no more then `max_seq_items`\n",
" will be printed. If items are omitted, they will be denoted by the\n",
" addition of \"...\" to the resulting string.\n",
"\n",
" If set to None, the number of items to be printed is unlimited.\n",
" [default: 100] [currently: 100]\n",
"\n",
"display.memory_usage : bool, string or None\n",
" This specifies if the memory usage of a DataFrame should be displayed when\n",
" df.info() is called. Valid values True,False,'deep'\n",
" [default: True] [currently: True]\n",
"\n",
"display.mpl_style : bool\n",
" Setting this to 'default' will modify the rcParams used by matplotlib\n",
" to give plots a more pleasing visual style by default.\n",
" Setting this to None/False restores the values to their initial value.\n",
" [default: None] [currently: None]\n",
"\n",
"display.multi_sparse : boolean\n",
" \"sparsify\" MultiIndex display (don't display repeated\n",
" elements in outer levels within groups)\n",
" [default: True] [currently: True]\n",
"\n",
"display.notebook_repr_html : boolean\n",
" When True, IPython notebook will use html representation for\n",
" pandas objects (if it is available).\n",
" [default: True] [currently: True]\n",
"\n",
"display.pprint_nest_depth : int\n",
" Controls the number of nested levels to process when pretty-printing\n",
" [default: 3] [currently: 3]\n",
"\n",
"display.precision : int\n",
" Floating point output precision (number of significant digits). This is\n",
" only a suggestion\n",
" [default: 6] [currently: 6]\n",
"\n",
"display.show_dimensions : boolean or 'truncate'\n",
" Whether to print out dimensions at the end of DataFrame repr.\n",
" If 'truncate' is specified, only print out the dimensions if the\n",
" frame is truncated (e.g. not display all rows and/or columns)\n",
" [default: truncate] [currently: truncate]\n",
"\n",
"display.unicode.ambiguous_as_wide : boolean\n",
" Whether to use the Unicode East Asian Width to calculate the display text\n",
" width.\n",
" Enabling this may affect to the performance (default: False)\n",
" [default: False] [currently: False]\n",
"\n",
"display.unicode.east_asian_width : boolean\n",
" Whether to use the Unicode East Asian Width to calculate the display text\n",
" width.\n",
" Enabling this may affect to the performance (default: False)\n",
" [default: False] [currently: False]\n",
"\n",
"display.width : int\n",
" Width of the display in characters. In case python/IPython is running in\n",
" a terminal this can be set to None and pandas will correctly auto-detect\n",
" the width.\n",
" Note that the IPython notebook, IPython qtconsole, or IDLE do not run in a\n",
" terminal and hence it is not possible to correctly detect the width.\n",
" [default: 80] [currently: 80]\n",
"\n",
"io.excel.xls.writer : string\n",
" The default Excel writer engine for 'xls' files. Available options:\n",
" 'xlwt' (the default).\n",
" [default: xlwt] [currently: xlwt]\n",
"\n",
"io.excel.xlsm.writer : string\n",
" The default Excel writer engine for 'xlsm' files. Available options:\n",
" 'openpyxl' (the default).\n",
" [default: openpyxl] [currently: openpyxl]\n",
"\n",
"io.excel.xlsx.writer : string\n",
" The default Excel writer engine for 'xlsx' files. Available options:\n",
" 'xlsxwriter' (the default), 'openpyxl'.\n",
" [default: xlsxwriter] [currently: xlsxwriter]\n",
"\n",
"io.hdf.default_format : format\n",
" default format writing format, if None, then\n",
" put will default to 'fixed' and append will default to 'table'\n",
" [default: None] [currently: None]\n",
"\n",
"io.hdf.dropna_table : boolean\n",
" drop ALL nan rows when appending to a table\n",
" [default: False] [currently: False]\n",
"\n",
"mode.chained_assignment : string\n",
" Raise an exception, warn, or no action if trying to use chained assignment,\n",
" The default is warn\n",
" [default: warn] [currently: warn]\n",
"\n",
"mode.sim_interactive : boolean\n",
" Whether to simulate interactive mode for purposes of testing\n",
" [default: False] [currently: False]\n",
"\n",
"mode.use_inf_as_null : boolean\n",
" True means treat None, NaN, INF, -INF as null (old way),\n",
" False means None and NaN are null, but INF, -INF are not null\n",
" (new way).\n",
" [default: False] [currently: False]\n",
"\n",
"\n"
]
}
],
"source": [
"pd.describe_option()  # print every configurable pandas option with its description"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.get_option(\"display.memory_usage\")  # read the current value of one option"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"adult=pd.read_csv(\"https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data\",header=None)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" <th>3</th>\n",
" <th>4</th>\n",
" <th>5</th>\n",
" <th>6</th>\n",
" <th>7</th>\n",
" <th>8</th>\n",
" <th>9</th>\n",
" <th>10</th>\n",
" <th>11</th>\n",
" <th>12</th>\n",
" <th>13</th>\n",
" <th>14</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>39</td>\n",
" <td>State-gov</td>\n",
" <td>77516</td>\n",
" <td>Bachelors</td>\n",
" <td>13</td>\n",
" <td>Never-married</td>\n",
" <td>Adm-clerical</td>\n",
" <td>Not-in-family</td>\n",
" <td>White</td>\n",
" <td>Male</td>\n",
" <td>2174</td>\n",
" <td>0</td>\n",
" <td>40</td>\n",
" <td>United-States</td>\n",
" <td>&lt;=50K</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>50</td>\n",
" <td>Self-emp-not-inc</td>\n",
" <td>83311</td>\n",
" <td>Bachelors</td>\n",
" <td>13</td>\n",
" <td>Married-civ-spouse</td>\n",
" <td>Exec-managerial</td>\n",
" <td>Husband</td>\n",
" <td>White</td>\n",
" <td>Male</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>13</td>\n",
" <td>United-States</td>\n",
" <td>&lt;=50K</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>38</td>\n",
" <td>Private</td>\n",
" <td>215646</td>\n",
" <td>HS-grad</td>\n",
" <td>9</td>\n",
" <td>Divorced</td>\n",
" <td>Handlers-cleaners</td>\n",
" <td>Not-in-family</td>\n",
" <td>White</td>\n",
" <td>Male</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>40</td>\n",
" <td>United-States</td>\n",
" <td>&lt;=50K</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>53</td>\n",
" <td>Private</td>\n",
" <td>234721</td>\n",
" <td>11th</td>\n",
" <td>7</td>\n",
" <td>Married-civ-spouse</td>\n",
" <td>Handlers-cleaners</td>\n",
" <td>Husband</td>\n",
" <td>Black</td>\n",
" <td>Male</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>40</td>\n",
" <td>United-States</td>\n",
" <td>&lt;=50K</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>28</td>\n",
" <td>Private</td>\n",
" <td>338409</td>\n",
" <td>Bachelors</td>\n",
" <td>13</td>\n",
" <td>Married-civ-spouse</td>\n",
" <td>Prof-specialty</td>\n",
" <td>Wife</td>\n",
" <td>Black</td>\n",
" <td>Female</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>40</td>\n",
" <td>Cuba</td>\n",
" <td>&lt;=50K</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0 1 2 3 4 5 \\\n",
"0 39 State-gov 77516 Bachelors 13 Never-married \n",
"1 50 Self-emp-not-inc 83311 Bachelors 13 Married-civ-spouse \n",
"2 38 Private 215646 HS-grad 9 Divorced \n",
"3 53 Private 234721 11th 7 Married-civ-spouse \n",
"4 28 Private 338409 Bachelors 13 Married-civ-spouse \n",
"\n",
" 6 7 8 9 10 11 12 \\\n",
"0 Adm-clerical Not-in-family White Male 2174 0 40 \n",
"1 Exec-managerial Husband White Male 0 0 13 \n",
"2 Handlers-cleaners Not-in-family White Male 0 0 40 \n",
"3 Handlers-cleaners Husband Black Male 0 0 40 \n",
"4 Prof-specialty Wife Black Female 0 0 40 \n",
"\n",
" 13 14 \n",
"0 United-States <=50K \n",
"1 United-States <=50K \n",
"2 United-States <=50K \n",
"3 United-States <=50K \n",
"4 Cuba <=50K "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"adult.head()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"names2=[\"age\",\"workclass\",\"fnlwgt\",\"education\",\"education-num\",\"marital-status\",\"occupation\",\"relationship\",\"race\",\"sex\",\"capital-gain\",\"capital-loss\",\"hours-per-week\",\"native-country\",\"income\"]\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"adult.columns=names2"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>workclass</th>\n",
" <th>fnlwgt</th>\n",
" <th>education</th>\n",
" <th>education-num</th>\n",
" <th>marital-status</th>\n",
" <th>occupation</th>\n",
" <th>relationship</th>\n",
" <th>race</th>\n",
" <th>sex</th>\n",
" <th>capital-gain</th>\n",
" <th>capital-loss</th>\n",
" <th>hours-per-week</th>\n",
" <th>native-country</th>\n",
" <th>income</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>39</td>\n",
" <td>State-gov</td>\n",
" <td>77516</td>\n",
" <td>Bachelors</td>\n",
" <td>13</td>\n",
" <td>Never-married</td>\n",
" <td>Adm-clerical</td>\n",
" <td>Not-in-family</td>\n",
" <td>White</td>\n",
" <td>Male</td>\n",
" <td>2174</td>\n",
" <td>0</td>\n",
" <td>40</td>\n",
" <td>United-States</td>\n",
" <td>&lt;=50K</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>50</td>\n",
" <td>Self-emp-not-inc</td>\n",
" <td>83311</td>\n",
" <td>Bachelors</td>\n",
" <td>13</td>\n",
" <td>Married-civ-spouse</td>\n",
" <td>Exec-managerial</td>\n",
" <td>Husband</td>\n",
" <td>White</td>\n",
" <td>Male</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>13</td>\n",
" <td>United-States</td>\n",
" <td>&lt;=50K</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>38</td>\n",
" <td>Private</td>\n",
" <td>215646</td>\n",
" <td>HS-grad</td>\n",
" <td>9</td>\n",
" <td>Divorced</td>\n",
" <td>Handlers-cleaners</td>\n",
" <td>Not-in-family</td>\n",
" <td>White</td>\n",
" <td>Male</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>40</td>\n",
" <td>United-States</td>\n",
" <td>&lt;=50K</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>53</td>\n",
" <td>Private</td>\n",
" <td>234721</td>\n",
" <td>11th</td>\n",
" <td>7</td>\n",
" <td>Married-civ-spouse</td>\n",
" <td>Handlers-cleaners</td>\n",
" <td>Husband</td>\n",
" <td>Black</td>\n",
" <td>Male</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>40</td>\n",
" <td>United-States</td>\n",
" <td>&lt;=50K</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>28</td>\n",
" <td>Private</td>\n",
" <td>338409</td>\n",
" <td>Bachelors</td>\n",
" <td>13</td>\n",
" <td>Married-civ-spouse</td>\n",
" <td>Prof-specialty</td>\n",
" <td>Wife</td>\n",
" <td>Black</td>\n",
" <td>Female</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>40</td>\n",
" <td>Cuba</td>\n",
" <td>&lt;=50K</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age workclass fnlwgt education education-num \\\n",
"0 39 State-gov 77516 Bachelors 13 \n",
"1 50 Self-emp-not-inc 83311 Bachelors 13 \n",
"2 38 Private 215646 HS-grad 9 \n",
"3 53 Private 234721 11th 7 \n",
"4 28 Private 338409 Bachelors 13 \n",
"\n",
" marital-status occupation relationship race sex \\\n",
"0 Never-married Adm-clerical Not-in-family White Male \n",
"1 Married-civ-spouse Exec-managerial Husband White Male \n",
"2 Divorced Handlers-cleaners Not-in-family White Male \n",
"3 Married-civ-spouse Handlers-cleaners Husband Black Male \n",
"4 Married-civ-spouse Prof-specialty Wife Black Female \n",
"\n",
" capital-gain capital-loss hours-per-week native-country income \n",
"0 2174 0 40 United-States <=50K \n",
"1 0 0 13 United-States <=50K \n",
"2 0 0 40 United-States <=50K \n",
"3 0 0 40 United-States <=50K \n",
"4 0 0 40 Cuba <=50K "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"adult.head()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>fnlwgt</th>\n",
" <th>education-num</th>\n",
" <th>capital-gain</th>\n",
" <th>capital-loss</th>\n",
" <th>hours-per-week</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>32561.000000</td>\n",
" <td>3.256100e+04</td>\n",
" <td>32561.000000</td>\n",
" <td>32561.000000</td>\n",
" <td>32561.000000</td>\n",
" <td>32561.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>38.581647</td>\n",
" <td>1.897784e+05</td>\n",
" <td>10.080679</td>\n",
" <td>1077.648844</td>\n",
" <td>87.303830</td>\n",
" <td>40.437456</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>13.640433</td>\n",
" <td>1.055500e+05</td>\n",
" <td>2.572720</td>\n",
" <td>7385.292085</td>\n",
" <td>402.960219</td>\n",
" <td>12.347429</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>17.000000</td>\n",
" <td>1.228500e+04</td>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>28.000000</td>\n",
" <td>1.178270e+05</td>\n",
" <td>9.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>40.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>37.000000</td>\n",
" <td>1.783560e+05</td>\n",
" <td>10.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>40.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>48.000000</td>\n",
" <td>2.370510e+05</td>\n",
" <td>12.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>45.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>90.000000</td>\n",
" <td>1.484705e+06</td>\n",
" <td>16.000000</td>\n",
" <td>99999.000000</td>\n",
" <td>4356.000000</td>\n",
" <td>99.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age fnlwgt education-num capital-gain capital-loss \\\n",
"count 32561.000000 3.256100e+04 32561.000000 32561.000000 32561.000000 \n",
"mean 38.581647 1.897784e+05 10.080679 1077.648844 87.303830 \n",
"std 13.640433 1.055500e+05 2.572720 7385.292085 402.960219 \n",
"min 17.000000 1.228500e+04 1.000000 0.000000 0.000000 \n",
"25% 28.000000 1.178270e+05 9.000000 0.000000 0.000000 \n",
"50% 37.000000 1.783560e+05 10.000000 0.000000 0.000000 \n",
"75% 48.000000 2.370510e+05 12.000000 0.000000 0.000000 \n",
"max 90.000000 1.484705e+06 16.000000 99999.000000 4356.000000 \n",
"\n",
" hours-per-week \n",
"count 32561.000000 \n",
"mean 40.437456 \n",
"std 12.347429 \n",
"min 1.000000 \n",
"25% 40.000000 \n",
"50% 40.000000 \n",
"75% 45.000000 \n",
"max 99.000000 "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"adult.describe()"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>fnlwgt</th>\n",
" <th>education-num</th>\n",
" <th>capital-gain</th>\n",
" <th>capital-loss</th>\n",
" <th>hours-per-week</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0.1</th>\n",
" <td>22.0</td>\n",
" <td>65716.0</td>\n",
" <td>7.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>24.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0.5</th>\n",
" <td>37.0</td>\n",
" <td>178356.0</td>\n",
" <td>10.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>40.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age fnlwgt education-num capital-gain capital-loss hours-per-week\n",
"0.1 22.0 65716.0 7.0 0.0 0.0 24.0\n",
"0.5 37.0 178356.0 10.0 0.0 0.0 40.0"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"adult.quantile([.1, .5], numeric_only=True)  # 10th percentile and median of the numeric columns"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>fnlwgt</th>\n",
" <th>education-num</th>\n",
" <th>capital-gain</th>\n",
" <th>capital-loss</th>\n",
" <th>hours-per-week</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0.10</th>\n",
" <td>22.0</td>\n",
" <td>65716.0</td>\n",
" <td>7.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>24.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0.25</th>\n",
" <td>28.0</td>\n",
" <td>117827.0</td>\n",
" <td>9.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>40.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0.50</th>\n",
" <td>37.0</td>\n",
" <td>178356.0</td>\n",
" <td>10.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>40.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0.75</th>\n",
" <td>48.0</td>\n",
" <td>237051.0</td>\n",
" <td>12.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>45.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0.90</th>\n",
" <td>58.0</td>\n",
" <td>329054.0</td>\n",
" <td>13.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>55.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0.95</th>\n",
" <td>63.0</td>\n",
" <td>379682.0</td>\n",
" <td>14.0</td>\n",
" <td>5013.0</td>\n",
" <td>0.0</td>\n",
" <td>60.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0.99</th>\n",
" <td>74.0</td>\n",
" <td>510072.0</td>\n",
" <td>16.0</td>\n",
" <td>15024.0</td>\n",
" <td>1980.0</td>\n",
" <td>80.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age fnlwgt education-num capital-gain capital-loss \\\n",
"0.10 22.0 65716.0 7.0 0.0 0.0 \n",
"0.25 28.0 117827.0 9.0 0.0 0.0 \n",
"0.50 37.0 178356.0 10.0 0.0 0.0 \n",
"0.75 48.0 237051.0 12.0 0.0 0.0 \n",
"0.90 58.0 329054.0 13.0 0.0 0.0 \n",
"0.95 63.0 379682.0 14.0 5013.0 0.0 \n",
"0.99 74.0 510072.0 16.0 15024.0 1980.0 \n",
"\n",
" hours-per-week \n",
"0.10 24.0 \n",
"0.25 40.0 \n",
"0.50 40.0 \n",
"0.75 45.0 \n",
"0.90 55.0 \n",
"0.95 60.0 \n",
"0.99 80.0 "
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"adult.quantile([.10, .25, .50, .75, .90, .95, .99], numeric_only=True)  # each quantile listed once"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>fnlwgt</th>\n",
" <th>education-num</th>\n",
" <th>capital-gain</th>\n",
" <th>capital-loss</th>\n",
" <th>hours-per-week</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>age</th>\n",
" <td>1.000000</td>\n",
" <td>-0.076646</td>\n",
" <td>0.036527</td>\n",
" <td>0.077674</td>\n",
" <td>0.057775</td>\n",
" <td>0.068756</td>\n",
" </tr>\n",
" <tr>\n",
" <th>fnlwgt</th>\n",
" <td>-0.076646</td>\n",
" <td>1.000000</td>\n",
" <td>-0.043195</td>\n",
" <td>0.000432</td>\n",
" <td>-0.010252</td>\n",
" <td>-0.018768</td>\n",
" </tr>\n",
" <tr>\n",
" <th>education-num</th>\n",
" <td>0.036527</td>\n",
" <td>-0.043195</td>\n",
" <td>1.000000</td>\n",
" <td>0.122630</td>\n",
" <td>0.079923</td>\n",
" <td>0.148123</td>\n",
" </tr>\n",
" <tr>\n",
" <th>capital-gain</th>\n",
" <td>0.077674</td>\n",
" <td>0.000432</td>\n",
" <td>0.122630</td>\n",
" <td>1.000000</td>\n",
" <td>-0.031615</td>\n",
" <td>0.078409</td>\n",
" </tr>\n",
" <tr>\n",
" <th>capital-loss</th>\n",
" <td>0.057775</td>\n",
" <td>-0.010252</td>\n",
" <td>0.079923</td>\n",
" <td>-0.031615</td>\n",
" <td>1.000000</td>\n",
" <td>0.054256</td>\n",
" </tr>\n",
" <tr>\n",
" <th>hours-per-week</th>\n",
" <td>0.068756</td>\n",
" <td>-0.018768</td>\n",
" <td>0.148123</td>\n",
" <td>0.078409</td>\n",
" <td>0.054256</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age fnlwgt education-num capital-gain capital-loss \\\n",
"age 1.000000 -0.076646 0.036527 0.077674 0.057775 \n",
"fnlwgt -0.076646 1.000000 -0.043195 0.000432 -0.010252 \n",
"education-num 0.036527 -0.043195 1.000000 0.122630 0.079923 \n",
"capital-gain 0.077674 0.000432 0.122630 1.000000 -0.031615 \n",
"capital-loss 0.057775 -0.010252 0.079923 -0.031615 1.000000 \n",
"hours-per-week 0.068756 -0.018768 0.148123 0.078409 0.054256 \n",
"\n",
" hours-per-week \n",
"age 0.068756 \n",
"fnlwgt -0.018768 \n",
"education-num 0.148123 \n",
"capital-gain 0.078409 \n",
"capital-loss 0.054256 \n",
"hours-per-week 1.000000 "
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"adult.select_dtypes(include=[\"number\"]).corr()  # pairwise Pearson correlation of the numeric columns"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>fnlwgt</th>\n",
" <th>education-num</th>\n",
" <th>capital-gain</th>\n",
" <th>capital-loss</th>\n",
" <th>hours-per-week</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>age</th>\n",
" <td>1.000000</td>\n",
" <td>-0.076646</td>\n",
" <td>0.036527</td>\n",
" <td>0.077674</td>\n",
" <td>0.057775</td>\n",
" <td>0.068756</td>\n",
" </tr>\n",
" <tr>\n",
" <th>fnlwgt</th>\n",
" <td>-0.076646</td>\n",
" <td>1.000000</td>\n",
" <td>-0.043195</td>\n",
" <td>0.000432</td>\n",
" <td>-0.010252</td>\n",
" <td>-0.018768</td>\n",
" </tr>\n",
" <tr>\n",
" <th>education-num</th>\n",
" <td>0.036527</td>\n",
" <td>-0.043195</td>\n",
" <td>1.000000</td>\n",
" <td>0.122630</td>\n",
" <td>0.079923</td>\n",
" <td>0.148123</td>\n",
" </tr>\n",
" <tr>\n",
" <th>capital-gain</th>\n",
" <td>0.077674</td>\n",
" <td>0.000432</td>\n",
" <td>0.122630</td>\n",
" <td>1.000000</td>\n",
" <td>-0.031615</td>\n",
" <td>0.078409</td>\n",
" </tr>\n",
" <tr>\n",
" <th>capital-loss</th>\n",
" <td>0.057775</td>\n",
" <td>-0.010252</td>\n",
" <td>0.079923</td>\n",
" <td>-0.031615</td>\n",
" <td>1.000000</td>\n",
" <td>0.054256</td>\n",
" </tr>\n",
" <tr>\n",
" <th>hours-per-week</th>\n",
" <td>0.068756</td>\n",
" <td>-0.018768</td>\n",
" <td>0.148123</td>\n",
" <td>0.078409</td>\n",
" <td>0.054256</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age fnlwgt education-num capital-gain capital-loss \\\n",
"age 1.000000 -0.076646 0.036527 0.077674 0.057775 \n",
"fnlwgt -0.076646 1.000000 -0.043195 0.000432 -0.010252 \n",
"education-num 0.036527 -0.043195 1.000000 0.122630 0.079923 \n",
"capital-gain 0.077674 0.000432 0.122630 1.000000 -0.031615 \n",
"capital-loss 0.057775 -0.010252 0.079923 -0.031615 1.000000 \n",
"hours-per-week 0.068756 -0.018768 0.148123 0.078409 0.054256 \n",
"\n",
" hours-per-week \n",
"age 0.068756 \n",
"fnlwgt -0.018768 \n",
"education-num 0.148123 \n",
"capital-gain 0.078409 \n",
"capital-loss 0.054256 \n",
"hours-per-week 1.000000 "
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
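"adult.select_dtypes(include=[\"number\"]).corr(method='pearson', min_periods=1)  # explicit defaults, so the result matches the plain corr() call"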
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" White 27816\n",
" Black 3124\n",
" Asian-Pac-Islander 1039\n",
" Amer-Indian-Eskimo 311\n",
" Other 271\n",
"Name: race, dtype: int64"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"adult.race.value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" Male 21790\n",
" Female 10771\n",
"Name: sex, dtype: int64"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"adult.sex.value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>sex</th>\n",
" <th>Female</th>\n",
" <th>Male</th>\n",
" </tr>\n",
" <tr>\n",
" <th>race</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Amer-Indian-Eskimo</th>\n",
" <td>119</td>\n",
" <td>192</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Asian-Pac-Islander</th>\n",
" <td>346</td>\n",
" <td>693</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Black</th>\n",
" <td>1555</td>\n",
" <td>1569</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Other</th>\n",
" <td>109</td>\n",
" <td>162</td>\n",
" </tr>\n",
" <tr>\n",
" <th>White</th>\n",
" <td>8642</td>\n",
" <td>19174</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"sex Female Male\n",
"race \n",
" Amer-Indian-Eskimo 119 192\n",
" Asian-Pac-Islander 346 693\n",
" Black 1555 1569\n",
" Other 109 162\n",
" White 8642 19174"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.crosstab(adult.race,adult.sex)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>income</th>\n",
" <th>&lt;=50K</th>\n",
" <th>&gt;50K</th>\n",
" </tr>\n",
" <tr>\n",
" <th>race</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Amer-Indian-Eskimo</th>\n",
" <td>275</td>\n",
" <td>36</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Asian-Pac-Islander</th>\n",
" <td>763</td>\n",
" <td>276</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Black</th>\n",
" <td>2737</td>\n",
" <td>387</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Other</th>\n",
" <td>246</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>White</th>\n",
" <td>20699</td>\n",
" <td>7117</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"income <=50K >50K\n",
"race \n",
" Amer-Indian-Eskimo 275 36\n",
" Asian-Pac-Islander 763 276\n",
" Black 2737 387\n",
" Other 246 25\n",
" White 20699 7117"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.crosstab(adult.race, adult.income)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"workclass = adult.groupby(\"workclass\")  # group rows by employment class"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>fnlwgt</th>\n",
" <th>education</th>\n",
" <th>education-num</th>\n",
" <th>marital-status</th>\n",
" <th>occupation</th>\n",
" <th>relationship</th>\n",
" <th>race</th>\n",
" <th>sex</th>\n",
" <th>capital-gain</th>\n",
" <th>capital-loss</th>\n",
" <th>hours-per-week</th>\n",
" <th>native-country</th>\n",
" <th>income</th>\n",
" </tr>\n",
" <tr>\n",
" <th>workclass</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>?</th>\n",
" <td>1836</td>\n",
" <td>1836</td>\n",
" <td>1836</td>\n",
" <td>1836</td>\n",
" <td>1836</td>\n",
" <td>1836</td>\n",
" <td>1836</td>\n",
" <td>1836</td>\n",
" <td>1836</td>\n",
" <td>1836</td>\n",
" <td>1836</td>\n",
" <td>1836</td>\n",
" <td>1836</td>\n",
" <td>1836</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Federal-gov</th>\n",
" <td>960</td>\n",
" <td>960</td>\n",
" <td>960</td>\n",
" <td>960</td>\n",
" <td>960</td>\n",
" <td>960</td>\n",
" <td>960</td>\n",
" <td>960</td>\n",
" <td>960</td>\n",
" <td>960</td>\n",
" <td>960</td>\n",
" <td>960</td>\n",
" <td>960</td>\n",
" <td>960</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Local-gov</th>\n",
" <td>2093</td>\n",
" <td>2093</td>\n",
" <td>2093</td>\n",
" <td>2093</td>\n",
" <td>2093</td>\n",
" <td>2093</td>\n",
" <td>2093</td>\n",
" <td>2093</td>\n",
" <td>2093</td>\n",
" <td>2093</td>\n",
" <td>2093</td>\n",
" <td>2093</td>\n",
" <td>2093</td>\n",
" <td>2093</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Never-worked</th>\n",
" <td>7</td>\n",
" <td>7</td>\n",
" <td>7</td>\n",
" <td>7</td>\n",
" <td>7</td>\n",
" <td>7</td>\n",
" <td>7</td>\n",
" <td>7</td>\n",
" <td>7</td>\n",
" <td>7</td>\n",
" <td>7</td>\n",
" <td>7</td>\n",
" <td>7</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Private</th>\n",
" <td>22696</td>\n",
" <td>22696</td>\n",
" <td>22696</td>\n",
" <td>22696</td>\n",
" <td>22696</td>\n",
" <td>22696</td>\n",
" <td>22696</td>\n",
" <td>22696</td>\n",
" <td>22696</td>\n",
" <td>22696</td>\n",
" <td>22696</td>\n",
" <td>22696</td>\n",
" <td>22696</td>\n",
" <td>22696</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Self-emp-inc</th>\n",
" <td>1116</td>\n",
" <td>1116</td>\n",
" <td>1116</td>\n",
" <td>1116</td>\n",
" <td>1116</td>\n",
" <td>1116</td>\n",
" <td>1116</td>\n",
" <td>1116</td>\n",
" <td>1116</td>\n",
" <td>1116</td>\n",
" <td>1116</td>\n",
" <td>1116</td>\n",
" <td>1116</td>\n",
" <td>1116</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Self-emp-not-inc</th>\n",
" <td>2541</td>\n",
" <td>2541</td>\n",
" <td>2541</td>\n",
" <td>2541</td>\n",
" <td>2541</td>\n",
" <td>2541</td>\n",
" <td>2541</td>\n",
" <td>2541</td>\n",
" <td>2541</td>\n",
" <td>2541</td>\n",
" <td>2541</td>\n",
" <td>2541</td>\n",
" <td>2541</td>\n",
" <td>2541</td>\n",
" </tr>\n",
" <tr>\n",
" <th>State-gov</th>\n",
" <td>1298</td>\n",
" <td>1298</td>\n",
" <td>1298</td>\n",
" <td>1298</td>\n",
" <td>1298</td>\n",
" <td>1298</td>\n",
" <td>1298</td>\n",
" <td>1298</td>\n",
" <td>1298</td>\n",
" <td>1298</td>\n",
" <td>1298</td>\n",
" <td>1298</td>\n",
" <td>1298</td>\n",
" <td>1298</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Without-pay</th>\n",
" <td>14</td>\n",
" <td>14</td>\n",
" <td>14</td>\n",
" <td>14</td>\n",
" <td>14</td>\n",
" <td>14</td>\n",
" <td>14</td>\n",
" <td>14</td>\n",
" <td>14</td>\n",
" <td>14</td>\n",
" <td>14</td>\n",
" <td>14</td>\n",
" <td>14</td>\n",
" <td>14</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age fnlwgt education education-num marital-status \\\n",
"workclass \n",
" ? 1836 1836 1836 1836 1836 \n",
" Federal-gov 960 960 960 960 960 \n",
" Local-gov 2093 2093 2093 2093 2093 \n",
" Never-worked 7 7 7 7 7 \n",
" Private 22696 22696 22696 22696 22696 \n",
" Self-emp-inc 1116 1116 1116 1116 1116 \n",
" Self-emp-not-inc 2541 2541 2541 2541 2541 \n",
" State-gov 1298 1298 1298 1298 1298 \n",
" Without-pay 14 14 14 14 14 \n",
"\n",
" occupation relationship race sex capital-gain \\\n",
"workclass \n",
" ? 1836 1836 1836 1836 1836 \n",
" Federal-gov 960 960 960 960 960 \n",
" Local-gov 2093 2093 2093 2093 2093 \n",
" Never-worked 7 7 7 7 7 \n",
" Private 22696 22696 22696 22696 22696 \n",
" Self-emp-inc 1116 1116 1116 1116 1116 \n",
" Self-emp-not-inc 2541 2541 2541 2541 2541 \n",
" State-gov 1298 1298 1298 1298 1298 \n",
" Without-pay 14 14 14 14 14 \n",
"\n",
" capital-loss hours-per-week native-country income \n",
"workclass \n",
" ? 1836 1836 1836 1836 \n",
" Federal-gov 960 960 960 960 \n",
" Local-gov 2093 2093 2093 2093 \n",
" Never-worked 7 7 7 7 \n",
" Private 22696 22696 22696 22696 \n",
" Self-emp-inc 1116 1116 1116 1116 \n",
" Self-emp-not-inc 2541 2541 2541 2541 \n",
" State-gov 1298 1298 1298 1298 \n",
" Without-pay 14 14 14 14 "
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"workclass.count()  # non-null count of each column within every workclass group"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>fnlwgt</th>\n",
" <th>education-num</th>\n",
" <th>capital-gain</th>\n",
" <th>capital-loss</th>\n",
" <th>hours-per-week</th>\n",
" </tr>\n",
" <tr>\n",
" <th>workclass</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>?</th>\n",
" <td>40.960240</td>\n",
" <td>188516.338235</td>\n",
" <td>9.260349</td>\n",
" <td>606.795752</td>\n",
" <td>60.760349</td>\n",
" <td>31.919390</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Federal-gov</th>\n",
" <td>42.590625</td>\n",
" <td>185221.243750</td>\n",
" <td>10.973958</td>\n",
" <td>833.232292</td>\n",
" <td>112.268750</td>\n",
" <td>41.379167</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Local-gov</th>\n",
" <td>41.751075</td>\n",
" <td>188639.712852</td>\n",
" <td>11.042045</td>\n",
" <td>880.202580</td>\n",
" <td>109.854276</td>\n",
" <td>40.982800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Never-worked</th>\n",
" <td>20.571429</td>\n",
" <td>225989.571429</td>\n",
" <td>7.428571</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>28.428571</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Private</th>\n",
" <td>36.797585</td>\n",
" <td>192764.114734</td>\n",
" <td>9.879714</td>\n",
" <td>889.217792</td>\n",
" <td>80.008724</td>\n",
" <td>40.267096</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Self-emp-inc</th>\n",
" <td>46.017025</td>\n",
" <td>175981.344086</td>\n",
" <td>11.137097</td>\n",
" <td>4875.693548</td>\n",
" <td>155.138889</td>\n",
" <td>48.818100</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Self-emp-not-inc</th>\n",
" <td>44.969697</td>\n",
" <td>175608.641480</td>\n",
" <td>10.226289</td>\n",
" <td>1886.061787</td>\n",
" <td>116.631641</td>\n",
" <td>44.421881</td>\n",
" </tr>\n",
" <tr>\n",
" <th>State-gov</th>\n",
" <td>39.436055</td>\n",
" <td>184136.613251</td>\n",
" <td>11.375963</td>\n",
" <td>701.699538</td>\n",
" <td>83.256549</td>\n",
" <td>39.031587</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Without-pay</th>\n",
" <td>47.785714</td>\n",
" <td>174267.500000</td>\n",
" <td>9.071429</td>\n",
" <td>487.857143</td>\n",
" <td>0.000000</td>\n",
" <td>32.714286</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age fnlwgt education-num capital-gain \\\n",
"workclass \n",
" ? 40.960240 188516.338235 9.260349 606.795752 \n",
" Federal-gov 42.590625 185221.243750 10.973958 833.232292 \n",
" Local-gov 41.751075 188639.712852 11.042045 880.202580 \n",
" Never-worked 20.571429 225989.571429 7.428571 0.000000 \n",
" Private 36.797585 192764.114734 9.879714 889.217792 \n",
" Self-emp-inc 46.017025 175981.344086 11.137097 4875.693548 \n",
" Self-emp-not-inc 44.969697 175608.641480 10.226289 1886.061787 \n",
" State-gov 39.436055 184136.613251 11.375963 701.699538 \n",
" Without-pay 47.785714 174267.500000 9.071429 487.857143 \n",
"\n",
" capital-loss hours-per-week \n",
"workclass \n",
" ? 60.760349 31.919390 \n",
" Federal-gov 112.268750 41.379167 \n",
" Local-gov 109.854276 40.982800 \n",
" Never-worked 0.000000 28.428571 \n",
" Private 80.008724 40.267096 \n",
" Self-emp-inc 155.138889 48.818100 \n",
" Self-emp-not-inc 116.631641 44.421881 \n",
" State-gov 83.256549 39.031587 \n",
" Without-pay 0.000000 32.714286 "
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"workclass.mean()  # per-group means; only the numeric columns appear in the output"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" <th>3</th>\n",
" <th>4</th>\n",
" <th>5</th>\n",
" <th>6</th>\n",
" <th>7</th>\n",
" <th>8</th>\n",
" <th>9</th>\n",
" <th>...</th>\n",
" <th>32551</th>\n",
" <th>32552</th>\n",
" <th>32553</th>\n",
" <th>32554</th>\n",
" <th>32555</th>\n",
" <th>32556</th>\n",
" <th>32557</th>\n",
" <th>32558</th>\n",
" <th>32559</th>\n",
" <th>32560</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>age</th>\n",
" <td>39</td>\n",
" <td>50</td>\n",
" <td>38</td>\n",
" <td>53</td>\n",
" <td>28</td>\n",
" <td>37</td>\n",
" <td>49</td>\n",
" <td>52</td>\n",
" <td>31</td>\n",
" <td>42</td>\n",
" <td>...</td>\n",
" <td>32</td>\n",
" <td>43</td>\n",
" <td>32</td>\n",
" <td>53</td>\n",
" <td>22</td>\n",
" <td>27</td>\n",
" <td>40</td>\n",
" <td>58</td>\n",
" <td>22</td>\n",
" <td>52</td>\n",
" </tr>\n",
" <tr>\n",
" <th>workclass</th>\n",
" <td>State-gov</td>\n",
" <td>Self-emp-not-inc</td>\n",
" <td>Private</td>\n",
" <td>Private</td>\n",
" <td>Private</td>\n",
" <td>Private</td>\n",
" <td>Private</td>\n",
" <td>Self-emp-not-inc</td>\n",
" <td>Private</td>\n",
" <td>Private</td>\n",
" <td>...</td>\n",
" <td>Private</td>\n",
" <td>Private</td>\n",
" <td>Private</td>\n",
" <td>Private</td>\n",
" <td>Private</td>\n",
" <td>Private</td>\n",
" <td>Private</td>\n",
" <td>Private</td>\n",
" <td>Private</td>\n",
" <td>Self-emp-inc</td>\n",
" </tr>\n",
" <tr>\n",
" <th>fnlwgt</th>\n",
" <td>77516</td>\n",
" <td>83311</td>\n",
" <td>215646</td>\n",
" <td>234721</td>\n",
" <td>338409</td>\n",
" <td>284582</td>\n",
" <td>160187</td>\n",
" <td>209642</td>\n",
" <td>45781</td>\n",
" <td>159449</td>\n",
" <td>...</td>\n",
" <td>34066</td>\n",
" <td>84661</td>\n",
" <td>116138</td>\n",
" <td>321865</td>\n",
" <td>310152</td>\n",
" <td>257302</td>\n",
" <td>154374</td>\n",
" <td>151910</td>\n",
" <td>201490</td>\n",
" <td>287927</td>\n",
" </tr>\n",
" <tr>\n",
" <th>education</th>\n",
" <td>Bachelors</td>\n",
" <td>Bachelors</td>\n",
" <td>HS-grad</td>\n",
" <td>11th</td>\n",
" <td>Bachelors</td>\n",
" <td>Masters</td>\n",
" <td>9th</td>\n",
" <td>HS-grad</td>\n",
" <td>Masters</td>\n",
" <td>Bachelors</td>\n",
" <td>...</td>\n",
" <td>10th</td>\n",
" <td>Assoc-voc</td>\n",
" <td>Masters</td>\n",
" <td>Masters</td>\n",
" <td>Some-college</td>\n",
" <td>Assoc-acdm</td>\n",
" <td>HS-grad</td>\n",
" <td>HS-grad</td>\n",
" <td>HS-grad</td>\n",
" <td>HS-grad</td>\n",
" </tr>\n",
" <tr>\n",
" <th>education-num</th>\n",
" <td>13</td>\n",
" <td>13</td>\n",
" <td>9</td>\n",
" <td>7</td>\n",
" <td>13</td>\n",
" <td>14</td>\n",
" <td>5</td>\n",
" <td>9</td>\n",
" <td>14</td>\n",
" <td>13</td>\n",
" <td>...</td>\n",
" <td>6</td>\n",
" <td>11</td>\n",
" <td>14</td>\n",
" <td>14</td>\n",
" <td>10</td>\n",
" <td>12</td>\n",
" <td>9</td>\n",
" <td>9</td>\n",
" <td>9</td>\n",
" <td>9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>marital-status</th>\n",
" <td>Never-married</td>\n",
" <td>Married-civ-spouse</td>\n",
" <td>Divorced</td>\n",
" <td>Married-civ-spouse</td>\n",
" <td>Married-civ-spouse</td>\n",
" <td>Married-civ-spouse</td>\n",
" <td>Married-spouse-absent</td>\n",
" <td>Married-civ-spouse</td>\n",
" <td>Never-married</td>\n",
" <td>Married-civ-spouse</td>\n",
" <td>...</td>\n",
" <td>Married-civ-spouse</td>\n",
" <td>Married-civ-spouse</td>\n",
" <td>Never-married</td>\n",
" <td>Married-civ-spouse</td>\n",
" <td>Never-married</td>\n",
" <td>Married-civ-spouse</td>\n",
" <td>Married-civ-spouse</td>\n",
" <td>Widowed</td>\n",
" <td>Never-married</td>\n",
" <td>Married-civ-spouse</td>\n",
" </tr>\n",
" <tr>\n",
" <th>occupation</th>\n",
" <td>Adm-clerical</td>\n",
" <td>Exec-managerial</td>\n",
" <td>Handlers-cleaners</td>\n",
" <td>Handlers-cleaners</td>\n",
" <td>Prof-specialty</td>\n",
" <td>Exec-managerial</td>\n",
" <td>Other-service</td>\n",
" <td>Exec-managerial</td>\n",
" <td>Prof-specialty</td>\n",
" <td>Exec-managerial</td>\n",
" <td>...</td>\n",
" <td>Handlers-cleaners</td>\n",
" <td>Sales</td>\n",
" <td>Tech-support</td>\n",
" <td>Exec-managerial</td>\n",
" <td>Protective-serv</td>\n",
" <td>Tech-support</td>\n",
" <td>Machine-op-inspct</td>\n",
" <td>Adm-clerical</td>\n",
" <td>Adm-clerical</td>\n",
" <td>Exec-managerial</td>\n",
" </tr>\n",
" <tr>\n",
" <th>relationship</th>\n",
" <td>Not-in-family</td>\n",
" <td>Husband</td>\n",
" <td>Not-in-family</td>\n",
" <td>Husband</td>\n",
" <td>Wife</td>\n",
" <td>Wife</td>\n",
" <td>Not-in-family</td>\n",
" <td>Husband</td>\n",
" <td>Not-in-family</td>\n",
" <td>Husband</td>\n",
" <td>...</td>\n",
" <td>Husband</td>\n",
" <td>Husband</td>\n",
" <td>Not-in-family</td>\n",
" <td>Husband</td>\n",
" <td>Not-in-family</td>\n",
" <td>Wife</td>\n",
" <td>Husband</td>\n",
" <td>Unmarried</td>\n",
" <td>Own-child</td>\n",
" <td>Wife</td>\n",
" </tr>\n",
" <tr>\n",
" <th>race</th>\n",
" <td>White</td>\n",
" <td>White</td>\n",
" <td>White</td>\n",
" <td>Black</td>\n",
" <td>Black</td>\n",
" <td>White</td>\n",
" <td>Black</td>\n",
" <td>White</td>\n",
" <td>White</td>\n",
" <td>White</td>\n",
" <td>...</td>\n",
" <td>Amer-Indian-Eskimo</td>\n",
" <td>White</td>\n",
" <td>Asian-Pac-Islander</td>\n",
" <td>White</td>\n",
" <td>White</td>\n",
" <td>White</td>\n",
" <td>White</td>\n",
" <td>White</td>\n",
" <td>White</td>\n",
" <td>White</td>\n",
" </tr>\n",
" <tr>\n",
" <th>sex</th>\n",
" <td>Male</td>\n",
" <td>Male</td>\n",
" <td>Male</td>\n",
" <td>Male</td>\n",
" <td>Female</td>\n",
" <td>Female</td>\n",
" <td>Female</td>\n",
" <td>Male</td>\n",
" <td>Female</td>\n",
" <td>Male</td>\n",
" <td>...</td>\n",
" <td>Male</td>\n",
" <td>Male</td>\n",
" <td>Male</td>\n",
" <td>Male</td>\n",
" <td>Male</td>\n",
" <td>Female</td>\n",
" <td>Male</td>\n",
" <td>Female</td>\n",
" <td>Male</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>capital-gain</th>\n",
" <td>2174</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>14084</td>\n",
" <td>5178</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>15024</td>\n",
" </tr>\n",
" <tr>\n",
" <th>capital-loss</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>hours-per-week</th>\n",
" <td>40</td>\n",
" <td>13</td>\n",
" <td>40</td>\n",
" <td>40</td>\n",
" <td>40</td>\n",
" <td>40</td>\n",
" <td>16</td>\n",
" <td>45</td>\n",
" <td>50</td>\n",
" <td>40</td>\n",
" <td>...</td>\n",
" <td>40</td>\n",
" <td>45</td>\n",
" <td>11</td>\n",
" <td>40</td>\n",
" <td>40</td>\n",
" <td>38</td>\n",
" <td>40</td>\n",
" <td>40</td>\n",
" <td>20</td>\n",
" <td>40</td>\n",
" </tr>\n",
" <tr>\n",
" <th>native-country</th>\n",
" <td>United-States</td>\n",
" <td>United-States</td>\n",
" <td>United-States</td>\n",
" <td>United-States</td>\n",
" <td>Cuba</td>\n",
" <td>United-States</td>\n",
" <td>Jamaica</td>\n",
" <td>United-States</td>\n",
" <td>United-States</td>\n",
" <td>United-States</td>\n",
" <td>...</td>\n",
" <td>United-States</td>\n",
" <td>United-States</td>\n",
" <td>Taiwan</td>\n",
" <td>United-States</td>\n",
" <td>United-States</td>\n",
" <td>United-States</td>\n",
" <td>United-States</td>\n",
" <td>United-States</td>\n",
" <td>United-States</td>\n",
" <td>United-States</td>\n",
" </tr>\n",
" <tr>\n",
" <th>income</th>\n",
" <td>&lt;=50K</td>\n",
" <td>&lt;=50K</td>\n",
" <td>&lt;=50K</td>\n",
" <td>&lt;=50K</td>\n",
" <td>&lt;=50K</td>\n",
" <td>&lt;=50K</td>\n",
" <td>&lt;=50K</td>\n",
" <td>&gt;50K</td>\n",
" <td>&gt;50K</td>\n",
" <td>&gt;50K</td>\n",
" <td>...</td>\n",
" <td>&lt;=50K</td>\n",
" <td>&lt;=50K</td>\n",
" <td>&lt;=50K</td>\n",
" <td>&gt;50K</td>\n",
" <td>&lt;=50K</td>\n",
" <td>&lt;=50K</td>\n",
" <td>&gt;50K</td>\n",
" <td>&lt;=50K</td>\n",
" <td>&lt;=50K</td>\n",
" <td>&gt;50K</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>15 rows × 32561 columns</p>\n",
"</div>"
],
"text/plain": [
" 0 1 2 \\\n",
"age 39 50 38 \n",
"workclass State-gov Self-emp-not-inc Private \n",
"fnlwgt 77516 83311 215646 \n",
"education Bachelors Bachelors HS-grad \n",
"education-num 13 13 9 \n",
"marital-status Never-married Married-civ-spouse Divorced \n",
"occupation Adm-clerical Exec-managerial Handlers-cleaners \n",
"relationship Not-in-family Husband Not-in-family \n",
"race White White White \n",
"sex Male Male Male \n",
"capital-gain 2174 0 0 \n",
"capital-loss 0 0 0 \n",
"hours-per-week 40 13 40 \n",
"native-country United-States United-States United-States \n",
"income <=50K <=50K <=50K \n",
"\n",
" 3 4 5 \\\n",
"age 53 28 37 \n",
"workclass Private Private Private \n",
"fnlwgt 234721 338409 284582 \n",
"education 11th Bachelors Masters \n",
"education-num 7 13 14 \n",
"marital-status Married-civ-spouse Married-civ-spouse Married-civ-spouse \n",
"occupation Handlers-cleaners Prof-specialty Exec-managerial \n",
"relationship Husband Wife Wife \n",
"race Black Black White \n",
"sex Male Female Female \n",
"capital-gain 0 0 0 \n",
"capital-loss 0 0 0 \n",
"hours-per-week 40 40 40 \n",
"native-country United-States Cuba United-States \n",
"income <=50K <=50K <=50K \n",
"\n",
" 6 7 8 \\\n",
"age 49 52 31 \n",
"workclass Private Self-emp-not-inc Private \n",
"fnlwgt 160187 209642 45781 \n",
"education 9th HS-grad Masters \n",
"education-num 5 9 14 \n",
"marital-status Married-spouse-absent Married-civ-spouse Never-married \n",
"occupation Other-service Exec-managerial Prof-specialty \n",
"relationship Not-in-family Husband Not-in-family \n",
"race Black White White \n",
"sex Female Male Female \n",
"capital-gain 0 0 14084 \n",
"capital-loss 0 0 0 \n",
"hours-per-week 16 45 50 \n",
"native-country Jamaica United-States United-States \n",
"income <=50K >50K >50K \n",
"\n",
" 9 ... 32551 \\\n",
"age 42 ... 32 \n",
"workclass Private ... Private \n",
"fnlwgt 159449 ... 34066 \n",
"education Bachelors ... 10th \n",
"education-num 13 ... 6 \n",
"marital-status Married-civ-spouse ... Married-civ-spouse \n",
"occupation Exec-managerial ... Handlers-cleaners \n",
"relationship Husband ... Husband \n",
"race White ... Amer-Indian-Eskimo \n",
"sex Male ... Male \n",
"capital-gain 5178 ... 0 \n",
"capital-loss 0 ... 0 \n",
"hours-per-week 40 ... 40 \n",
"native-country United-States ... United-States \n",
"income >50K ... <=50K \n",
"\n",
" 32552 32553 32554 \\\n",
"age 43 32 53 \n",
"workclass Private Private Private \n",
"fnlwgt 84661 116138 321865 \n",
"education Assoc-voc Masters Masters \n",
"education-num 11 14 14 \n",
"marital-status Married-civ-spouse Never-married Married-civ-spouse \n",
"occupation Sales Tech-support Exec-managerial \n",
"relationship Husband Not-in-family Husband \n",
"race White Asian-Pac-Islander White \n",
"sex Male Male Male \n",
"capital-gain 0 0 0 \n",
"capital-loss 0 0 0 \n",
"hours-per-week 45 11 40 \n",
"native-country United-States Taiwan United-States \n",
"income <=50K <=50K >50K \n",
"\n",
" 32555 32556 32557 \\\n",
"age 22 27 40 \n",
"workclass Private Private Private \n",
"fnlwgt 310152 257302 154374 \n",
"education Some-college Assoc-acdm HS-grad \n",
"education-num 10 12 9 \n",
"marital-status Never-married Married-civ-spouse Married-civ-spouse \n",
"occupation Protective-serv Tech-support Machine-op-inspct \n",
"relationship Not-in-family Wife Husband \n",
"race White White White \n",
"sex Male Female Male \n",
"capital-gain 0 0 0 \n",
"capital-loss 0 0 0 \n",
"hours-per-week 40 38 40 \n",
"native-country United-States United-States United-States \n",
"income <=50K <=50K >50K \n",
"\n",
" 32558 32559 32560 \n",
"age 58 22 52 \n",
"workclass Private Private Self-emp-inc \n",
"fnlwgt 151910 201490 287927 \n",
"education HS-grad HS-grad HS-grad \n",
"education-num 9 9 9 \n",
"marital-status Widowed Never-married Married-civ-spouse \n",
"occupation Adm-clerical Adm-clerical Exec-managerial \n",
"relationship Unmarried Own-child Wife \n",
"race White White White \n",
"sex Female Male Female \n",
"capital-gain 0 0 15024 \n",
"capital-loss 0 0 0 \n",
"hours-per-week 40 20 40 \n",
"native-country United-States United-States United-States \n",
"income <=50K <=50K >50K \n",
"\n",
"[15 rows x 32561 columns]"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"adult.transpose()"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>race</th>\n",
" <th>Amer-Indian-Eskimo</th>\n",
" <th>Asian-Pac-Islander</th>\n",
" <th>Black</th>\n",
" <th>Other</th>\n",
" <th>White</th>\n",
" </tr>\n",
" <tr>\n",
" <th>sex</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Female</th>\n",
" <td>36</td>\n",
" <td>33</td>\n",
" <td>37</td>\n",
" <td>29</td>\n",
" <td>35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Male</th>\n",
" <td>35</td>\n",
" <td>37</td>\n",
" <td>36</td>\n",
" <td>32</td>\n",
" <td>38</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"race Amer-Indian-Eskimo Asian-Pac-Islander Black Other White\n",
"sex \n",
" Female 36 33 37 29 35\n",
" Male 35 37 36 32 38"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Median age for each (sex, race) pair, reshaped to one row per sex\n",
"e = adult.groupby(['sex', 'race']).age.median().reset_index()\n",
"e.pivot(index='sex', columns='race', values='age')"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandasql as pdsql"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"str1 = \"SELECT * FROM adult LIMIT 3;\""
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>workclass</th>\n",
" <th>fnlwgt</th>\n",
" <th>education</th>\n",
" <th>education-num</th>\n",
" <th>marital-status</th>\n",
" <th>occupation</th>\n",
" <th>relationship</th>\n",
" <th>race</th>\n",
" <th>sex</th>\n",
" <th>capital-gain</th>\n",
" <th>capital-loss</th>\n",
" <th>hours-per-week</th>\n",
" <th>native-country</th>\n",
" <th>income</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>39</td>\n",
" <td>State-gov</td>\n",
" <td>77516</td>\n",
" <td>Bachelors</td>\n",
" <td>13</td>\n",
" <td>Never-married</td>\n",
" <td>Adm-clerical</td>\n",
" <td>Not-in-family</td>\n",
" <td>White</td>\n",
" <td>Male</td>\n",
" <td>2174</td>\n",
" <td>0</td>\n",
" <td>40</td>\n",
" <td>United-States</td>\n",
" <td>&lt;=50K</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>50</td>\n",
" <td>Self-emp-not-inc</td>\n",
" <td>83311</td>\n",
" <td>Bachelors</td>\n",
" <td>13</td>\n",
" <td>Married-civ-spouse</td>\n",
" <td>Exec-managerial</td>\n",
" <td>Husband</td>\n",
" <td>White</td>\n",
" <td>Male</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>13</td>\n",
" <td>United-States</td>\n",
" <td>&lt;=50K</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>38</td>\n",
" <td>Private</td>\n",
" <td>215646</td>\n",
" <td>HS-grad</td>\n",
" <td>9</td>\n",
" <td>Divorced</td>\n",
" <td>Handlers-cleaners</td>\n",
" <td>Not-in-family</td>\n",
" <td>White</td>\n",
" <td>Male</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>40</td>\n",
" <td>United-States</td>\n",
" <td>&lt;=50K</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age workclass fnlwgt education education-num \\\n",
"0 39 State-gov 77516 Bachelors 13 \n",
"1 50 Self-emp-not-inc 83311 Bachelors 13 \n",
"2 38 Private 215646 HS-grad 9 \n",
"\n",
" marital-status occupation relationship race sex \\\n",
"0 Never-married Adm-clerical Not-in-family White Male \n",
"1 Married-civ-spouse Exec-managerial Husband White Male \n",
"2 Divorced Handlers-cleaners Not-in-family White Male \n",
"\n",
" capital-gain capital-loss hours-per-week native-country income \n",
"0 2174 0 40 United-States <=50K \n",
"1 0 0 13 United-States <=50K \n",
"2 0 0 40 United-States <=50K "
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Run the SQL query against the adult DataFrame and show the result\n",
"df1 = pdsql.sqldf(str1)\n",
"df1"
]
}
],
"metadata": {
"anaconda-cloud": {},
"celltoolbar": "Raw Cell Format",
"kernelspec": {
"display_name": "Python [conda root]",
"language": "python",
"name": "conda-root-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 1
}