
@syed
Created July 9, 2013 18:26
Example use of pandas
{
"metadata": {
"name": "TB_Model"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "code",
"collapsed": false,
"input": "import pandas as pd\n\n# Toy TB data: number of persons by HIV status, age group and TB type\ndf = pd.DataFrame({'hiv': [True]*2 + [False]*3,\n                   'age': [\"adult\"] + [\"child\"]*2 + [\"adult\"]*2,\n                   'tb_type': [\"pulm\"]*3 + [\"epulm\"]*2,\n                   'num_persons': [80, 32, 21, 46, 84]})",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 14
},
{
"cell_type": "code",
"collapsed": false,
"input": "df",
"language": "python",
"metadata": {},
"outputs": [
{
"html": "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>age</th>\n <th>hiv</th>\n <th>num_persons</th>\n <th>tb_type</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td> adult</td>\n <td> True</td>\n <td> 80</td>\n <td> pulm</td>\n </tr>\n <tr>\n <th>1</th>\n <td> child</td>\n <td> True</td>\n <td> 32</td>\n <td> pulm</td>\n </tr>\n <tr>\n <th>2</th>\n <td> child</td>\n <td> False</td>\n <td> 21</td>\n <td> pulm</td>\n </tr>\n <tr>\n <th>3</th>\n <td> adult</td>\n <td> False</td>\n <td> 46</td>\n <td> epulm</td>\n </tr>\n <tr>\n <th>4</th>\n <td> adult</td>\n <td> False</td>\n <td> 84</td>\n <td> epulm</td>\n </tr>\n </tbody>\n</table>\n</div>",
"output_type": "pyout",
"prompt_number": 15,
"text": " age hiv num_persons tb_type\n0 adult True 80 pulm\n1 child True 32 pulm\n2 child False 21 pulm\n3 adult False 46 epulm\n4 adult False 84 epulm"
}
],
"prompt_number": 15
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Selecting Stuff"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "### All Adults"
},
{
"cell_type": "code",
"collapsed": false,
"input": "df[df[\"age\"] == \"adult\"]",
"language": "python",
"metadata": {},
"outputs": [
{
"html": "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>age</th>\n <th>hiv</th>\n <th>num_persons</th>\n <th>tb_type</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td> adult</td>\n <td> True</td>\n <td> 80</td>\n <td> pulm</td>\n </tr>\n <tr>\n <th>3</th>\n <td> adult</td>\n <td> False</td>\n <td> 46</td>\n <td> epulm</td>\n </tr>\n <tr>\n <th>4</th>\n <td> adult</td>\n <td> False</td>\n <td> 84</td>\n <td> epulm</td>\n </tr>\n </tbody>\n</table>\n</div>",
"output_type": "pyout",
"prompt_number": 16,
"text": " age hiv num_persons tb_type\n0 adult True 80 pulm\n3 adult False 46 epulm\n4 adult False 84 epulm"
}
],
"prompt_number": 16
},
{
"cell_type": "markdown",
"metadata": {},
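"source": "### All adults with pulmonary TB\nThe `&` operator means *and* and the `|` operator means *or*; Python's `and`/`or` keywords do not work element-wise on columns. Wrap each comparison in parentheses, because `&` and `|` bind more tightly than `==`.<br>\nHere I find all adults who have pulmonary TB."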
},
{
"cell_type": "code",
"collapsed": false,
"input": "df[(df[\"age\"] == \"adult\") & (df[\"tb_type\"] == \"pulm\")]",
"language": "python",
"metadata": {},
"outputs": [
{
"html": "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>age</th>\n <th>hiv</th>\n <th>num_persons</th>\n <th>tb_type</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td> adult</td>\n <td> True</td>\n <td> 80</td>\n <td> pulm</td>\n </tr>\n </tbody>\n</table>\n</div>",
"output_type": "pyout",
"prompt_number": 18,
"text": " age hiv num_persons tb_type\n0 adult True 80 pulm"
}
],
"prompt_number": 18
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Now get the number of persons in this category. The filter returns a one-row DataFrame, so take the single value from its `num_persons` column."
},
{
"cell_type": "code",
"collapsed": false,
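"input": "x = df[(df[\"age\"] == \"adult\") & (df[\"tb_type\"] == \"pulm\")]\n# x has exactly one row; .item() returns that row's value as a Python scalar\n# (calling int() on a one-element Series is deprecated in recent pandas)\nx[\"num_persons\"].item()",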
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 22,
"text": "80"
}
],
"prompt_number": 22
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Reading from CSV"
},
{
"cell_type": "code",
"collapsed": false,
"input": "csv_data = pd.read_csv(\"data.csv\")\ncsv_data",
"language": "python",
"metadata": {},
"outputs": [
{
"html": "<pre>\n&ltclass 'pandas.core.frame.DataFrame'&gt\nInt64Index: 3 entries, 0 to 2\nData columns (total 36 columns):\nhivinc 3 non-null values\nbeta 3 non-null values\nlat prot 3 non-null values\nlat prot HIV 3 non-null values\nprim prog 3 non-null values\nprimproh HIV 3 non-null values\nbeta_decMDR 3 non-null values\n beta_decINH 3 non-null values\nbeta_decXDR 3 non-null values\nreact 3 non-null values\nreact HIV 3 non-null values\nnever dx 3 non-null values\ninf child 3 non-null values\ndx rate 3 non-null values\ndx rateF 3 non-null values\ndx rateDef 3 non-null values\ndx rate EPTB 3 non-null values\nprop_newdx new 3 non-null values\nprop_newdx recur 3 non-null values\nprop_newdx fail 3 non-null values\nprop_newdx EPTBdec, 3 non-null values\npropEPTB adults HIV neg 3 non-null values\npropEPTB child HIV neg 3 non-null values\n tb_mort 3 non-null values\nbl_mort adult 3 non-null values\nmort_TB_h 3 non-null values\nretreatdec 3 non-null values\nsens mol RIF 3 non-null values\nsens mol INH 3 non-null values\nsens mol XDR 3 non-null values\nsens mol PTB 3 non-null values\nsensstand PTB 3 non-null values\nsensstand EPTB 3 non-null values\nself cure 3 non-null values\nself cure HIV 3 non-null values\nbasecase 3 non-null values\ndtypes: float64(31), int64(5)\n</pre>",
"output_type": "pyout",
"prompt_number": 24,
"text": " hivinc beta lat prot lat prot HIV prim prog primproh HIV \\\n0 0.0010 9.135735 0.45 0.0 0.14 0.25 \n1 0.0007 8.600000 0.40 0.0 0.05 0.16 \n2 0.0017 9.600000 0.55 0.2 0.14 0.27 \n\n beta_decMDR beta_decINH beta_decXDR react react HIV never dx \\\n0 0.85 0.98746 0.6 0.00050 0.05 0.15 \n1 0.80 0.98000 0.5 0.00008 0.03 0.10 \n2 0.90 0.99500 0.7 0.00140 0.06 0.25 \n\n inf child dx rate dx rateF dx rateDef dx rate EPTB prop_newdx new \\\n0 0.20 2 4 2 0.5 0.15 \n1 0.16 1 2 1 0.5 0.15 \n2 0.28 3 6 3 1.0 0.50 \n\n prop_newdx recur prop_newdx fail prop_newdx EPTBdec, \\\n0 0.3 0.3 1 \n1 0.3 0.3 1 \n2 0.5 0.5 1 \n\n propEPTB adults HIV neg propEPTB child HIV neg tb_mort bl_mort adult \\\n0 0.18 0.7 0.15 0.022 \n1 0.15 0.6 0.10 0.020 \n2 0.25 0.8 0.22 0.025 \n\n mort_TB_h retreatdec sens mol RIF sens mol INH sens mol XDR \\\n0 0.3 0.9 0.94 0.88 0.84 \n1 0.2 0.8 0.90 0.70 0.60 \n2 0.5 1.0 0.96 0.95 0.90 \n\n sens mol PTB sensstand PTB sensstand EPTB self cure self cure HIV \\\n0 0.95 0.8 0.6 0.10 0.0 \n1 0.75 0.6 0.4 0.08 0.0 \n2 0.98 0.9 0.8 0.25 0.2 \n\n basecase \n0 0 \n1 0 \n2 0 "
}
],
"prompt_number": 24
},
{
"cell_type": "markdown",
"metadata": {},
"source": "### Reading the whole row"
},
{
"cell_type": "code",
"collapsed": false,
"input": "# .ix was removed in pandas 1.0; .iloc selects the first row by position\nfirst_row = csv_data.iloc[0, :]\nfirst_row",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 32,
"text": "hivinc 0.001000\nbeta 9.135735\nlat prot 0.450000\nlat prot HIV 0.000000\nprim prog 0.140000\nprimproh HIV 0.250000\nbeta_decMDR 0.850000\n beta_decINH 0.987460\nbeta_decXDR 0.600000\nreact 0.000500\nreact HIV 0.050000\nnever dx 0.150000\ninf child 0.200000\ndx rate 2.000000\ndx rateF 4.000000\ndx rateDef 2.000000\ndx rate EPTB 0.500000\nprop_newdx new 0.150000\nprop_newdx recur 0.300000\nprop_newdx fail 0.300000\nprop_newdx EPTBdec, 1.000000\npropEPTB adults HIV neg 0.180000\npropEPTB child HIV neg 0.700000\n tb_mort 0.150000\nbl_mort adult 0.022000\nmort_TB_h 0.300000\nretreatdec 0.900000\nsens mol RIF 0.940000\nsens mol INH 0.880000\nsens mol XDR 0.840000\nsens mol PTB 0.950000\nsensstand PTB 0.800000\nsensstand EPTB 0.600000\nself cure 0.100000\nself cure HIV 0.000000\nbasecase 0.000000\nName: 0, dtype: float64"
}
],
"prompt_number": 32
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Convert it into a NumPy array"
},
{
"cell_type": "code",
"collapsed": false,
"input": "import numpy as np\n\n# first_row.to_numpy() is the pandas-native equivalent (pandas >= 0.24)\nparams = np.array(first_row)\nparams",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 38,
"text": "array([ 1.00000000e-03, 9.13573458e+00, 4.50000000e-01,\n 0.00000000e+00, 1.40000000e-01, 2.50000000e-01,\n 8.50000000e-01, 9.87460000e-01, 6.00000000e-01,\n 5.00000000e-04, 5.00000000e-02, 1.50000000e-01,\n 2.00000000e-01, 2.00000000e+00, 4.00000000e+00,\n 2.00000000e+00, 5.00000000e-01, 1.50000000e-01,\n 3.00000000e-01, 3.00000000e-01, 1.00000000e+00,\n 1.80000000e-01, 7.00000000e-01, 1.50000000e-01,\n 2.20000000e-02, 3.00000000e-01, 9.00000000e-01,\n 9.40000000e-01, 8.80000000e-01, 8.40000000e-01,\n 9.50000000e-01, 8.00000000e-01, 6.00000000e-01,\n 1.00000000e-01, 0.00000000e+00, 0.00000000e+00])"
}
],
"prompt_number": 38
},
{
"cell_type": "code",
"collapsed": false,
"input": "",
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}
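The selections in the notebook above can be reproduced on a current pandas release roughly as follows. This is a minimal sketch, not part of the original notebook: it rebuilds the same toy `df`, uses `.loc` for the boolean filters, and adds an *or* (`|`) filter, a *not* (`~`) filter and a summed total purely as extra illustrations.

```python
import pandas as pd

# Same toy data as the notebook's first cell.
df = pd.DataFrame({
    "hiv": [True] * 2 + [False] * 3,
    "age": ["adult"] + ["child"] * 2 + ["adult"] * 2,
    "tb_type": ["pulm"] * 3 + ["epulm"] * 2,
    "num_persons": [80, 32, 21, 46, 84],
})

# Each comparison needs its own parentheses: & and | bind tighter than ==.
adults = df.loc[df["age"] == "adult"]
adult_pulm = df.loc[(df["age"] == "adult") & (df["tb_type"] == "pulm")]

# Extra illustrations (not in the notebook): | means "or", ~ means "not".
child_or_hiv = df.loc[(df["age"] == "child") | df["hiv"]]
hiv_negative = df.loc[~df["hiv"]]

# The adult/pulm filter leaves exactly one row, so .item() returns its value.
n_adult_pulm = adult_pulm["num_persons"].item()

# Summing the filtered column works however many rows match.
n_adults = df.loc[df["age"] == "adult", "num_persons"].sum()

print(n_adult_pulm)
print(n_adults)
```

`.item()` raises a `ValueError` when the filter matches anything other than exactly one row, which makes it a handy sanity check; `.sum()` is the safer choice when several rows can match a category.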
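The notebook reads `data.csv`, which is not included here, so the sketch below substitutes a tiny hypothetical in-memory CSV holding three of its columns (`hivinc`, `beta`, ` tb_mort`) with the values printed in the notebook output. It assumes pandas 1.0 or later, where `.ix` no longer exists, and shows the `.iloc`/`.loc` replacements plus converting a row straight to a NumPy array. Because the notebook's output shows some headers with a leading space (` beta_decINH`, ` tb_mort`), the sketch also strips header whitespace; that cleanup step is an addition, not something the original notebook does.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv: three of its columns, values copied
# from the notebook output. " tb_mort" keeps the leading space seen there.
csv_text = """hivinc,beta, tb_mort
0.0010,9.135735,0.15
0.0007,8.600000,0.10
0.0017,9.600000,0.22
"""

csv_data = pd.read_csv(io.StringIO(csv_text))

# Strip stray whitespace from the headers so csv_data["tb_mort"] works.
csv_data.columns = csv_data.columns.str.strip()

# .ix was removed in pandas 1.0: .iloc selects by position, .loc by label.
first_row = csv_data.iloc[0]        # whole first row as a Series
beta_col = csv_data.loc[:, "beta"]  # whole "beta" column as a Series

# Row -> 1-D float64 NumPy array, same values as np.array(first_row).
params = first_row.to_numpy(dtype=np.float64)
print(params)
```

Use `.iloc` when you mean "the n-th row" and `.loc` when you mean "the row labelled n"; with the default integer index both give the same row, which is why `.ix[0, :]` happened to work in the original.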