Skip to content

Instantly share code, notes, and snippets.

@guinslym
Last active October 20, 2023 14:58
Show Gist options
  • Save guinslym/62f05d5d5ad7211fb1553bcb16530701 to your computer and use it in GitHub Desktop.
Save guinslym/62f05d5d5ad7211fb1553bcb16530701 to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"id": "a4842ffb",
"metadata": {},
"source": [
"## Importing the neccessary python package"
]
},
{
"cell_type": "code",
"execution_count": 62,
"id": "1f480728",
"metadata": {},
"outputs": [],
"source": [
"import warnings\n",
"\n",
"# Suppress FutureWarning messages\n",
"warnings.simplefilter(action='ignore', category=FutureWarning)\n",
"\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"id": "2351c28b",
"metadata": {},
"source": [
"### Import the SAV file"
]
},
{
"cell_type": "code",
"execution_count": 51,
"id": "442d7664",
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_spss(\"NTS-5232-E-2021-visit_F1.sav\") "
]
},
{
"cell_type": "code",
"execution_count": 52,
"id": "ac2c8abd",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['PUMFID', 'REFYEAR', 'QUARTER', 'TRIPID', 'VISITID', 'VCNTRY', 'VPROV',\n",
" 'VTOUR', 'VCMA2', 'VSNIGHTS', 'FLAGACC2', 'VISITFL', 'VISRECFL',\n",
" 'AIRCODE', 'ACC_1', 'ACC_2', 'ACC_3', 'ACC_4', 'ACC_5', 'ACC_6',\n",
" 'VS_TRANS', 'VS_LOCAL', 'VS_ACCOM', 'VS_RENT', 'VS_OPER', 'VS_REST',\n",
" 'VS_STOR', 'VS_REC', 'VS_ENT', 'VS_CLOTH', 'VS_OTHER', 'VS_CRUIS',\n",
" 'VS_TOTAL'],\n",
" dtype='object')"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.columns"
]
},
{
"cell_type": "markdown",
"id": "35588250",
"metadata": {},
"source": [
"## Create a sub-datasets only with those variables"
]
},
{
"cell_type": "code",
"execution_count": 53,
"id": "ef4abaf5",
"metadata": {},
"outputs": [],
"source": [
"df = df[[\"VCNTRY\", \"ACC_1\",\"ACC_2\",\"ACC_3\",]]"
]
},
{
"cell_type": "code",
"execution_count": 54,
"id": "5b8eecf3",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>VCNTRY</th>\n",
" <th>ACC_1</th>\n",
" <th>ACC_2</th>\n",
" <th>ACC_3</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>NaN</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>NaN</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>NaN</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>NaN</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>NaN</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>116398</th>\n",
" <td>NaN</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>116399</th>\n",
" <td>NaN</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>116400</th>\n",
" <td>NaN</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>116401</th>\n",
" <td>Italy</td>\n",
" <td>No</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>116402</th>\n",
" <td>NaN</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>116403 rows × 4 columns</p>\n",
"</div>"
],
"text/plain": [
" VCNTRY ACC_1 ACC_2 ACC_3\n",
"0 NaN No No No\n",
"1 NaN No No No\n",
"2 NaN No No No\n",
"3 NaN No No No\n",
"4 NaN No No No\n",
"... ... ... ... ...\n",
"116398 NaN No No No\n",
"116399 NaN No No No\n",
"116400 NaN No No No\n",
"116401 Italy No Yes No\n",
"116402 NaN No No No\n",
"\n",
"[116403 rows x 4 columns]"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"id": "b298050a",
"metadata": {},
"source": [
"### Resizing the dataset by defining a list of country that I want to review"
]
},
{
"cell_type": "code",
"execution_count": 55,
"id": "51706a4a",
"metadata": {},
"outputs": [],
"source": [
"country = [\"Canada\", \"Albania\", \"Armenia\", \"Australia\", \"United States\", \"Cuba\", \"Mexico\", \"United Kingdom, England, Scotland, Wales\"]"
]
},
{
"cell_type": "markdown",
"id": "1d8ecc8d",
"metadata": {},
"source": [
"### Create a copy of the data with only those 8 countries"
]
},
{
"cell_type": "code",
"execution_count": 56,
"id": "cf865a61",
"metadata": {},
"outputs": [],
"source": [
"df = df[df.VCNTRY.isin(country)].copy()"
]
},
{
"cell_type": "markdown",
"id": "1e0573b0",
"metadata": {},
"source": [
"### Create the Cross-Tabulation"
]
},
{
"cell_type": "code",
"execution_count": 57,
"id": "72e46c42",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"VCNTRY Albania Armenia Australia Cuba Mexico \\\n",
"ACC_1 ACC_2 ACC_3 \n",
"No No No 0 1 1 3 54 \n",
" Yes 0 0 0 0 0 \n",
" Yes No 0 0 0 39 191 \n",
" Yes 0 0 0 0 0 \n",
"Yes No No 3 1 0 2 13 \n",
" Yes 0 0 0 0 0 \n",
" Yes No 0 0 0 0 0 \n",
"\n",
"VCNTRY United Kingdom, England, Scotland, Wales United States \n",
"ACC_1 ACC_2 ACC_3 \n",
"No No No 5 312 \n",
" Yes 0 29 \n",
" Yes No 9 459 \n",
" Yes 0 1 \n",
"Yes No No 19 331 \n",
" Yes 0 2 \n",
" Yes No 4 23 \n"
]
}
],
"source": [
"# Create a cross-tabulation\n",
"cross_tab = pd.crosstab(index=[df['ACC_1'], df['ACC_2'], df['ACC_3']], columns=df['VCNTRY'])\n",
"\n",
"print(cross_tab)"
]
},
{
"cell_type": "markdown",
"id": "4f3728b5",
"metadata": {},
"source": [
"### Export the cross-tabulation to Excel"
]
},
{
"cell_type": "code",
"execution_count": 58,
"id": "949b8175",
"metadata": {},
"outputs": [],
"source": [
"# Define the file name\n",
"excel_file = \"Cross_tabulation_request.xlsx\"\n",
"\n",
"# Export the cross-tabulation to Excel\n",
"cross_tab.to_excel(excel_file)"
]
},
{
"cell_type": "markdown",
"id": "dfcbbe1f",
"metadata": {},
"source": [
"### Reversing the cross-tab"
]
},
{
"cell_type": "code",
"execution_count": 60,
"id": "1739e024",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ACC_1 No Yes \n",
"ACC_2 No Yes No Yes\n",
"ACC_3 No Yes No Yes No Yes No\n",
"VCNTRY \n",
"Albania 0 0 0 0 3 0 0\n",
"Armenia 1 0 0 0 1 0 0\n",
"Australia 1 0 0 0 0 0 0\n",
"Cuba 3 0 39 0 2 0 0\n",
"Mexico 54 0 191 0 13 0 0\n",
"United Kingdom, England, Scotland, Wales 5 0 9 0 19 0 4\n",
"United States 312 29 459 1 331 2 23\n"
]
}
],
"source": [
"# Create a cross-tabulation\n",
"cross_tab = pd.crosstab(index=df['VCNTRY'], columns=[df['ACC_1'], df['ACC_2'], df['ACC_3']])\n",
"\n",
"print(cross_tab)"
]
},
{
"cell_type": "code",
"execution_count": 61,
"id": "e61569db",
"metadata": {},
"outputs": [],
"source": [
"# Define the file name\n",
"excel_file = \"reversed_crosstab.xlsx\"\n",
"\n",
"# Export the cross-tabulation to Excel\n",
"cross_tab.to_excel(excel_file)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment