Skip to content

Instantly share code, notes, and snippets.

@adam704a
Created April 9, 2020 11:49
Show Gist options
  • Save adam704a/0352135ade89e008c2fb2ad10f720f2d to your computer and use it in GitHub Desktop.
Save adam704a/0352135ade89e008c2fb2ad10f720f2d to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Overview\n",
"This script reads in the formatted workbook that we recieved from Stop Palu + and produces an import file that can be used in the Import Wizard. The template for this import file is here: Cohort Data Import.xlsx"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1\n",
"Read in file to get the facility names and regions (they don't make this easy)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"cohortfiles = pd.read_excel('../../Other Forms/DONNEES SUIVI DE COHORTE_STOPPALU+_Janv-Juin 2019 vf(1).xlsx', index_col=0, header=None,sheet_name=None)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"region = cohortfiles['CS BINTIMODIA'].loc['DRS/DSVCO'][1] \n",
"district = cohortfiles['CS BINTIMODIA'].loc['DRS/DSVCO'][5] \n",
"facility = cohortfiles['CS BINTIMODIA'].loc['DRS/DSVCO'][10] "
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"# build a list of health facilities and their districts\n",
"regions = []\n",
"districts = []\n",
"facilities = []\n",
"\n",
"for tabname, df in cohortfiles.items():\n",
"\n",
" if \"total\" not in tabname.lower() and \"dps \" not in tabname.lower(): #don't include these\n",
" #print(tabname)\n",
" regions.append(df.loc['DRS/DSVCO'][1].title())\n",
" districts.append(df.loc['DRS/DSVCO'][5].title() )\n",
" facilities.append(df.loc['DRS/DSVCO'][10].title())\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"combined = zip(regions, districts,facilities)\n",
"facility_df = pd.DataFrame(combined, columns = ['region', 'district', 'facility'])\n",
"\n",
"# add period\n",
"facility_df[\"period\"] = \"2019Oct\""
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(161, 4)"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"facility_df.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2\n",
"Initialize the data"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"# define columns\n",
"columns = [\n",
"\"Cohorte - CPN 1 Cohorte de janvier\",\n",
"\"Cohorte - CPN 1 Cohorte de février\",\n",
"\"Cohorte - CPN 1 Cohorte de mars\",\n",
"\"Cohorte - CPN 1 Cohorte d'avril\",\n",
"\"Cohorte - CPN 1 Cohorte de mai\",\n",
"\"Cohorte - CPN 1 Cohorte de juin\",\n",
"\"Cohorte - CPN 1 Cohorte de juillet\",\n",
"\"Cohorte - CPN 1 Cohorte d'août\",\n",
"\"Cohorte - CPN 1 Cohorte de septembre\",\n",
"\"Cohorte - CPN 1 Cohorte d'octobre\",\n",
"\"Cohorte - CPN 1 Cohorte de novembre\",\n",
"\"Cohorte - CPN 1 Cohorte de décembre\",\n",
"\"Cohorte - CPN1 au 1er trimestre ≤ 12 semaines Cohorte de janvier\",\n",
"\"Cohorte - CPN1 au 1er trimestre ≤ 12 semaines Cohorte de février\",\n",
"\"Cohorte - CPN1 au 1er trimestre ≤ 12 semaines Cohorte de mars\",\n",
"\"Cohorte - CPN1 au 1er trimestre ≤ 12 semaines Cohorte d'avril\",\n",
"\"Cohorte - CPN1 au 1er trimestre ≤ 12 semaines Cohorte de mai\",\n",
"\"Cohorte - CPN1 au 1er trimestre ≤ 12 semaines Cohorte de juin\",\n",
"\"Cohorte - CPN1 au 1er trimestre ≤ 12 semaines Cohorte de juillet\",\n",
"\"Cohorte - CPN1 au 1er trimestre ≤ 12 semaines Cohorte d'août\",\n",
"\"Cohorte - CPN1 au 1er trimestre ≤ 12 semaines Cohorte de septembre\",\n",
"\"Cohorte - CPN1 au 1er trimestre ≤ 12 semaines Cohorte d'octobre\",\n",
"\"Cohorte - CPN1 au 1er trimestre ≤ 12 semaines Cohorte de novembre\",\n",
"\"Cohorte - CPN1 au 1er trimestre ≤ 12 semaines Cohorte de décembre\",\n",
"\"Cohorte - SP 1 Cohorte de janvier\",\n",
"\"Cohorte - SP 1 Cohorte de février\",\n",
"\"Cohorte - SP 1 Cohorte de mars\",\n",
"\"Cohorte - SP 1 Cohorte d'avril\",\n",
"\"Cohorte - SP 1 Cohorte de mai\",\n",
"\"Cohorte - SP 1 Cohorte de juin\",\n",
"\"Cohorte - SP 1 Cohorte de juillet\",\n",
"\"Cohorte - SP 1 Cohorte d'août\",\n",
"\"Cohorte - SP 1 Cohorte de septembre\",\n",
"\"Cohorte - SP 1 Cohorte d'octobre\",\n",
"\"Cohorte - SP 1 Cohorte de novembre\",\n",
"\"Cohorte - SP 1 Cohorte de décembre\",\n",
"\"Cohorte - SP 2 Cohorte de janvier\",\n",
"\"Cohorte - SP 2 Cohorte de février\",\n",
"\"Cohorte - SP 2 Cohorte de mars\",\n",
"\"Cohorte - SP 2 Cohorte d'avril\",\n",
"\"Cohorte - SP 2 Cohorte de mai\",\n",
"\"Cohorte - SP 2 Cohorte de juin\",\n",
"\"Cohorte - SP 2 Cohorte de juillet\",\n",
"\"Cohorte - SP 2 Cohorte d'août\",\n",
"\"Cohorte - SP 2 Cohorte de septembre\",\n",
"\"Cohorte - SP 2 Cohorte d'octobre\",\n",
"\"Cohorte - SP 2 Cohorte de novembre\",\n",
"\"Cohorte - SP 2 Cohorte de décembre\",\n",
"\"Cohorte - SP 3 Cohorte de janvier\",\n",
"\"Cohorte - SP 3 Cohorte de février\",\n",
"\"Cohorte - SP 3 Cohorte de mars\",\n",
"\"Cohorte - SP 3 Cohorte d'avril\",\n",
"\"Cohorte - SP 3 Cohorte de mai\",\n",
"\"Cohorte - SP 3 Cohorte de juin\",\n",
"\"Cohorte - SP 3 Cohorte de juillet\",\n",
"\"Cohorte - SP 3 Cohorte d'août\",\n",
"\"Cohorte - SP 3 Cohorte de septembre\",\n",
"\"Cohorte - SP 3 Cohorte d'octobre\",\n",
"\"Cohorte - SP 3 Cohorte de novembre\",\n",
"\"Cohorte - SP 3 Cohorte de décembre\",\n",
"\"Cohorte - SP 4+ Cohorte de janvier\",\n",
"\"Cohorte - SP 4+ Cohorte de février\",\n",
"\"Cohorte - SP 4+ Cohorte de mars\",\n",
"\"Cohorte - SP 4+ Cohorte d'avril\",\n",
"\"Cohorte - SP 4+ Cohorte de mai\",\n",
"\"Cohorte - SP 4+ Cohorte de juin\",\n",
"\"Cohorte - SP 4+ Cohorte de juillet\",\n",
"\"Cohorte - SP 4+ Cohorte d'août\",\n",
"\"Cohorte - SP 4+ Cohorte de septembre\",\n",
"\"Cohorte - SP 4+ Cohorte d'octobre\",\n",
"\"Cohorte - SP 4+ Cohorte de novembre\",\n",
"\"Cohorte - SP 4+ Cohorte de décembre\",\n",
"\"Cohorte - Accouch au CS Cohorte de janvier\",\n",
"\"Cohorte - Accouch au CS Cohorte de février\",\n",
"\"Cohorte - Accouch au CS Cohorte de mars\",\n",
"\"Cohorte - Accouch au CS Cohorte d'avril\",\n",
"\"Cohorte - Accouch au CS Cohorte de mai\",\n",
"\"Cohorte - Accouch au CS Cohorte de juin\",\n",
"\"Cohorte - Accouch au CS Cohorte de juillet\",\n",
"\"Cohorte - Accouch au CS Cohorte d'août\",\n",
"\"Cohorte - Accouch au CS Cohorte de septembre\",\n",
"\"Cohorte - Accouch au CS Cohorte d'octobre\",\n",
"\"Cohorte - Accouch au CS Cohorte de novembre\",\n",
"\"Cohorte - Accouch au CS Cohorte de décembre\"\n",
"]\n",
"\n",
"cohort_data = {}\n",
"for n in columns:\n",
" #print(n)\n",
" cohort_data[n]=[]"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"# whats in here again (optional)\n",
"#cohort_data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3\n",
"Read in file to get the cohort data"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"cohorts = pd.read_excel('../../Other Forms/DONNEES SUIVI DE COHORTE_STOPPALU+_Janv-Juin 2019 vf(1).xlsx', header=6, index_col=0, sheet_name=None)"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>CPN 1</th>\n",
" <th>CPN1 au 1er trimestre ≤ 12 semaines</th>\n",
" <th>SP 1</th>\n",
" <th>SP 1 à 13 SA</th>\n",
" <th>SP 2</th>\n",
" <th>SP 3</th>\n",
" <th>SP 4+</th>\n",
" <th>Accouch au CS</th>\n",
" <th>FE ayant reçu au moins 3 doses de SP avec respect d'1 mois d'interval entre 2 prises</th>\n",
" <th>Couverture en TPI-3</th>\n",
" <th>Couverture effective en TPI-3</th>\n",
" <th>% TPI-1 à 13 SA</th>\n",
" <th>%FE ayant accouché au CS</th>\n",
" <th>Taux d'abandon SP1-SP3</th>\n",
" </tr>\n",
" <tr>\n",
" <th>COHORTES</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2019-01-01 00:00:00</th>\n",
" <td>141.0</td>\n",
" <td>NaN</td>\n",
" <td>38.0</td>\n",
" <td>NaN</td>\n",
" <td>52.0</td>\n",
" <td>32.0</td>\n",
" <td>19.0</td>\n",
" <td>22.0</td>\n",
" <td>NaN</td>\n",
" <td>0.22695</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.156028</td>\n",
" <td>0.157895</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-02-01 00:00:00</th>\n",
" <td>138.0</td>\n",
" <td>NaN</td>\n",
" <td>43.0</td>\n",
" <td>NaN</td>\n",
" <td>39.0</td>\n",
" <td>34.0</td>\n",
" <td>22.0</td>\n",
" <td>26.0</td>\n",
" <td>NaN</td>\n",
" <td>0.246377</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.188406</td>\n",
" <td>0.209302</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-03-01 00:00:00</th>\n",
" <td>145.0</td>\n",
" <td>NaN</td>\n",
" <td>33.0</td>\n",
" <td>NaN</td>\n",
" <td>50.0</td>\n",
" <td>39.0</td>\n",
" <td>23.0</td>\n",
" <td>37.0</td>\n",
" <td>NaN</td>\n",
" <td>0.268966</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.255172</td>\n",
" <td>-0.181818</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-04-01 00:00:00</th>\n",
" <td>18.0</td>\n",
" <td>NaN</td>\n",
" <td>18.0</td>\n",
" <td>NaN</td>\n",
" <td>18.0</td>\n",
" <td>16.0</td>\n",
" <td>12.0</td>\n",
" <td>15.0</td>\n",
" <td>NaN</td>\n",
" <td>0.888889</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.833333</td>\n",
" <td>0.111111</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-05-01 00:00:00</th>\n",
" <td>26.0</td>\n",
" <td>NaN</td>\n",
" <td>26.0</td>\n",
" <td>NaN</td>\n",
" <td>22.0</td>\n",
" <td>14.0</td>\n",
" <td>9.0</td>\n",
" <td>5.0</td>\n",
" <td>NaN</td>\n",
" <td>0.538462</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.192308</td>\n",
" <td>0.461538</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" CPN 1 CPN1 au 1er trimestre ≤ 12 semaines SP 1 \\\n",
"COHORTES \n",
"2019-01-01 00:00:00 141.0 NaN 38.0 \n",
"2019-02-01 00:00:00 138.0 NaN 43.0 \n",
"2019-03-01 00:00:00 145.0 NaN 33.0 \n",
"2019-04-01 00:00:00 18.0 NaN 18.0 \n",
"2019-05-01 00:00:00 26.0 NaN 26.0 \n",
"\n",
" SP 1 à 13 SA SP 2 SP 3 SP 4+ Accouch au CS \\\n",
"COHORTES \n",
"2019-01-01 00:00:00 NaN 52.0 32.0 19.0 22.0 \n",
"2019-02-01 00:00:00 NaN 39.0 34.0 22.0 26.0 \n",
"2019-03-01 00:00:00 NaN 50.0 39.0 23.0 37.0 \n",
"2019-04-01 00:00:00 NaN 18.0 16.0 12.0 15.0 \n",
"2019-05-01 00:00:00 NaN 22.0 14.0 9.0 5.0 \n",
"\n",
" FE ayant reçu au moins 3 doses de SP avec respect d'1 mois d'interval entre 2 prises \\\n",
"COHORTES \n",
"2019-01-01 00:00:00 NaN \n",
"2019-02-01 00:00:00 NaN \n",
"2019-03-01 00:00:00 NaN \n",
"2019-04-01 00:00:00 NaN \n",
"2019-05-01 00:00:00 NaN \n",
"\n",
" Couverture en TPI-3 Couverture effective en TPI-3 \\\n",
"COHORTES \n",
"2019-01-01 00:00:00 0.22695 0 \n",
"2019-02-01 00:00:00 0.246377 0 \n",
"2019-03-01 00:00:00 0.268966 0 \n",
"2019-04-01 00:00:00 0.888889 0 \n",
"2019-05-01 00:00:00 0.538462 0 \n",
"\n",
" % TPI-1 à 13 SA %FE ayant accouché au CS \\\n",
"COHORTES \n",
"2019-01-01 00:00:00 0 0.156028 \n",
"2019-02-01 00:00:00 0 0.188406 \n",
"2019-03-01 00:00:00 0 0.255172 \n",
"2019-04-01 00:00:00 0 0.833333 \n",
"2019-05-01 00:00:00 0 0.192308 \n",
"\n",
" Taux d'abandon SP1-SP3 \n",
"COHORTES \n",
"2019-01-01 00:00:00 0.157895 \n",
"2019-02-01 00:00:00 0.209302 \n",
"2019-03-01 00:00:00 -0.181818 \n",
"2019-04-01 00:00:00 0.111111 \n",
"2019-05-01 00:00:00 0.461538 "
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cohorts['CS BINTIMODIA'].head()"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"# go through each tab and put data in the main dictionary\n",
"\n",
"for tabname, df in cohorts.items():\n",
" #print(\"processing \"+ tabname)\n",
" if \"total\" not in tabname.lower() and \"dps \" not in tabname.lower(): #don't include these\n",
"\n",
" df.reset_index(inplace=True) # the index needs to be numeric for this\n",
"\n",
" # load data into main dictionary by column\n",
" \n",
" for index, row in df['CPN 1'].items():\n",
" if index < 12: # don't include totals row\n",
" data = cohort_data[columns[index]]\n",
" data.append(row)\n",
"\n",
" # this indicator starts at row 12\n",
" for index, row in df['CPN1 au 1er trimestre ≤ 12 semaines'].items():\n",
" if index < 12: # don't include totals row\n",
" data = cohort_data[columns[index+12]]\n",
" data.append(row)\n",
" \n",
" # this indicator starts at row 24\n",
" for index, row in df['SP 1'].items():\n",
" if index < 12: # don't include totals row\n",
" data = cohort_data[columns[index+24]]\n",
" data.append(row)\n",
" \n",
" # this indicator starts at row 36\n",
" for index, row in df['SP 2'].items():\n",
" if index < 12: # don't include totals row\n",
" data = cohort_data[columns[index+36]]\n",
" data.append(row)\n",
" \n",
" # this indicator starts at row 48\n",
" for index, row in df['SP 3'].items():\n",
" if index < 12: # don't include totals row\n",
" data = cohort_data[columns[index+48]]\n",
" data.append(row)\n",
" \n",
" # this indicator starts at row 60\n",
" for index, row in df['SP 4+'].items():\n",
" if index < 12: # don't include totals row\n",
" data = cohort_data[columns[index+60]]\n",
" data.append(row)\n",
" \n",
" # this indicator starts at row 72\n",
" for index, row in df['Accouch au CS'].items():\n",
" if index < 12: # don't include totals row\n",
" data = cohort_data[columns[index+72]]\n",
" data.append(row)\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4\n",
"Put everything together"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"# probably loop here\n",
"main_df = facility_df\n",
"\n",
"\n",
"#for i in range(36): # onsey twosie, there are 12 * 7 = 84 of these\n",
"for i, n in enumerate(columns): #everything\n",
" main_df[columns[i]] = cohort_data[columns[i]]\n",
" #print(i)\n"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(161, 88)"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"main_df.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 5\n",
"Write out the file. This will be what you import"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"main_df.to_excel(\"cohort import.xlsx\", encoding='utf-8', index=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "rti",
"language": "python",
"name": "rti"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.10"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment