careduz/clearedpermits_file_integration.ipynb

## clearedpermits_file_integration.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 0. Context\n",
    "The purpose of this script is to consolidate the separate cleared permits files into one for visualization and analysis. This requires three main actions:\n",
    "> 1. Importing the raw .CSV files into Python as tables\n",
    "> 2. Transforming data tables for visualization purposes\n",
    "> 3. Consolidating tables into a single one\n",
    "> 4. Writing the consolidated table into another file to visualize the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import os\n",
    "from datetime import datetime\n",
    "from tabulate import tabulate\n",
    "import re\n",
    "\n",
    "script_directory = \"./\"\n",
    "input_files_dir = \"input_files/\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1. Import input files\n",
    "## 1.1. Get file names from the *input_files* folder\n",
    "Will retrieve file names that end with _.csv_ and start with _clearedpermits_, which are default when the data is downloaded from the Open Data Portal."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "files = [ filename for filename in os.listdir(script_directory + input_files_dir) \\\n",
    "         if (filename.startswith( \"clearedpermits\" ) and filename.endswith(\".csv\"))]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.2. Define function to bring CSV files in-memory\n",
    "Created a function for loading CSV files, and relevant metadata, rather than repeating the code for each CSV file or looping it. I was interested in capturing the following data for each file:\n",
    "1. Permits raw data\n",
    "2. Distinct count of permits\n",
    "3. Year of permits file\n",
    "\n",
    "Item 2 can be calculated from the imported data, and item 3 is in the default file name when downloading from the Open Data Portal."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "def printLine(year, rows, columns, permits_count, filename=\"\"):\n",
    "    rows = \"{:,}\".format(rows)\n",
    "    permits_count = \"{:,}\".format(permits_count)\n",
    "    print_dict = {\"Year\": year,\n",
    "                  \"File\": filename,\n",
    "                  \"Rows\": rows,\n",
    "                  \"Columns\": columns,\n",
    "                  \"Permits\": permits_count\n",
    "                 }\n",
    "    print_list = \" | \".join([ \"{0}: {1}\".format(d, print_dict[d]) for d in [\"Rows\", \"Columns\", \"Permits\"]\\\n",
    "                             if len( str(print_dict[d]))  > 0 ])\n",
    "    if len(filename) > 0: print( \"   {0} | {1} | {2}\".format(year, filename, print_list) )\n",
    "    else: print( \"   {0} | {1}\".format(year, print_list) )\n",
    "    \n",
    "    \n",
    "def loadTable(filename, script_directory=script_directory, input_files_dir=input_files_dir):\n",
    "    df = pd.read_csv('{0}{1}'.format(script_directory + input_files_dir, filename), low_memory=False)\n",
    "    year = re.findall(r'\\d+', f)[0]\n",
    "    permits_count, rows, columns = len(df['PERMIT_NUM'].unique()), df.shape[0], df.shape[1]\n",
    "    printLine(year=year, filename=filename, rows=rows, columns=columns, permits_count=permits_count)\n",
    "    \n",
    "    return df, year, permits_count, rows\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.3. Load CSVs into individual dataframes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "LOADING INPUT FILES FROM EACH YEAR\n",
      "   2001 | clearedpermits2001.csv | Rows: 21,458 | Columns: 30 | Permits: 21,231\n",
      "   2002 | clearedpermits2002.csv | Rows: 25,060 | Columns: 30 | Permits: 24,528\n",
      "   2003 | clearedpermits2003.csv | Rows: 25,708 | Columns: 30 | Permits: 25,092\n",
      "   2004 | clearedpermits2004.csv | Rows: 25,184 | Columns: 30 | Permits: 24,393\n",
      "   2005 | clearedpermits2005.csv | Rows: 32,809 | Columns: 30 | Permits: 31,660\n",
      "   2006 | clearedpermits2006.csv | Rows: 28,777 | Columns: 30 | Permits: 27,419\n",
      "   2007 | clearedpermits2007.csv | Rows: 25,616 | Columns: 30 | Permits: 24,145\n",
      "   2008 | clearedpermits2008.csv | Rows: 26,073 | Columns: 30 | Permits: 24,608\n",
      "   2009 | clearedpermits2009.csv | Rows: 26,385 | Columns: 30 | Permits: 24,814\n",
      "   2010 | clearedpermits2010.csv | Rows: 26,461 | Columns: 30 | Permits: 24,790\n",
      "   2011 | clearedpermits2011.csv | Rows: 30,541 | Columns: 30 | Permits: 28,719\n",
      "   2012 | clearedpermits2012.csv | Rows: 31,525 | Columns: 30 | Permits: 29,140\n",
      "   2013 | clearedpermits2013.csv | Rows: 34,796 | Columns: 30 | Permits: 32,488\n",
      "   2014 | clearedpermits2014.csv | Rows: 37,096 | Columns: 30 | Permits: 34,732\n",
      "   2015 | clearedpermits2015.csv | Rows: 38,969 | Columns: 30 | Permits: 36,152\n",
      "   2016 | clearedpermits2016.csv | Rows: 41,056 | Columns: 30 | Permits: 37,685\n",
      "   2017 | clearedpermits2017.csv | Rows: 40,047 | Columns: 30 | Permits: 36,627\n",
      "in the files from 2001 to 2017 there were a total of 517,561 records and 488,223 permits\n"
     ]
    }
   ],
   "source": [
    "tables, rows, permits = {}, [], []\n",
    "\n",
    "print(\"loading input files from each year\".upper())\n",
    "for f in sorted(files):\n",
    "    table, year, permits_count, rows_count = loadTable(f)\n",
    "    tables[year] = table.reset_index(drop=True)\n",
    "    rows.append(rows_count)\n",
    "    permits.append(permits_count)\n",
    "    \n",
    "years, rows, permits = sorted(list(tables)), sum(rows), sum(permits)\n",
    "first_year, last_year = years[0], years[-1]\n",
    "print(\"in the files from {0} to {1} there were a total of {2} records and {3} permits\".format(first_year,last_year, \"{:,}\".format(rows), \"{:,}\".format(permits)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2. Consolidate data tables\n",
    "## 2.1. Append dataframes into one"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INTEGRATING PERMITS FROM ALL YEARS INTO ONE\n",
      "   2001 | Rows: 21,458 | Columns: 30 | Permits: 21,231\n",
      "   2001 | Rows: 46,518 | Columns: 30 | Permits: 45,693\n",
      "   2001 | Rows: 72,226 | Columns: 30 | Permits: 70,722\n",
      "   2001 | Rows: 97,410 | Columns: 30 | Permits: 95,020\n",
      "   2001 | Rows: 130,219 | Columns: 30 | Permits: 126,525\n",
      "   2001 | Rows: 158,996 | Columns: 30 | Permits: 153,819\n",
      "   2001 | Rows: 184,612 | Columns: 30 | Permits: 177,868\n",
      "   2001 | Rows: 210,685 | Columns: 30 | Permits: 202,399\n",
      "   2001 | Rows: 237,070 | Columns: 30 | Permits: 227,025\n",
      "   2001 | Rows: 263,531 | Columns: 30 | Permits: 251,759\n",
      "   2001 | Rows: 294,072 | Columns: 30 | Permits: 280,371\n",
      "   2001 | Rows: 325,597 | Columns: 30 | Permits: 309,416\n",
      "   2001 | Rows: 360,393 | Columns: 30 | Permits: 338,292\n",
      "   2001 | Rows: 397,489 | Columns: 30 | Permits: 372,671\n",
      "   2001 | Rows: 436,458 | Columns: 30 | Permits: 408,579\n",
      "   2001 | Rows: 477,514 | Columns: 30 | Permits: 446,002\n",
      "   2001 | Rows: 517,561 | Columns: 30 | Permits: 482,355\n"
     ]
    }
   ],
   "source": [
    "all_permits = tables[first_year]  # initiate data table to hold all permits, with earliest available year preloaded\n",
    "new_permits = len(all_permits['PERMIT_NUM'].unique())\n",
    "print(\"integrating permits from all years into one\".upper())\n",
    "printLine(year=first_year, rows=all_permits.shape[0], columns =all_permits.shape[1], permits_count=new_permits)\n",
    "\n",
    "for year in years[1:]:\n",
    "    all_permits = all_permits.append(tables[year])\n",
    "    new_permits = len(all_permits['PERMIT_NUM'].unique())\n",
    "    printLine(year=first_year, rows=all_permits.shape[0], columns =all_permits.shape[1], permits_count=new_permits)\n",
    "    \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As expected, the total number of rows matches in the integrated table matches the sum of number of rows across all years; however, note the difference in number of permits calculated before (across all the files)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  # rows in consolidated table    # of rows across all tables\n",
      "------------------------------  -----------------------------\n",
      "                        517561                         517561\n",
      "\n",
      "  # of permits ACROSS files    # of permits in INTEGRATED    difference\n",
      "---------------------------  ----------------------------  ------------\n",
      "                     488223                        482355          5868\n"
     ]
    }
   ],
   "source": [
    "print(tabulate([(all_permits.shape[0], rows)], [\"# rows in consolidated table\", \"# of rows across all tables\"]))\n",
    "print()\n",
    "print(tabulate([[permits, new_permits, permits-new_permits]], [\"# of permits ACROSS files\", \"# of permits in INTEGRATED\", \"difference\"]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is likely because permit numbers repeated across years, such as a renewal, so they can be counted more than once when summing the number of permits in each file - the accurate number is the new one\n",
    "\n",
    "# 3. Write dataframe to output file\n",
    "Opted for a .CSV file given it is a fairly standard format for tabular data and I had no issues with reading it in Tableau, the visualization software I used in this analysis. I mention his because it has happened before."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   \n",
      "CREATING .CSV FOR ALL BUILDING PERMITS...\n",
      "      output file created: ./clearedpermits_2001to2017.csv\n"
     ]
    }
   ],
   "source": [
    "while True:\n",
    "    write_to_csv = input(\"create a .CSV file with the {0} building permits from {1} to {2}? (yes / no): \".format(\"{:,}\".format(all_permits.shape[0]), first_year, last_year)).lower()\n",
    "   \n",
    "    if write_to_csv == \"yes\":\n",
    "        print(\"   \\ncreating .CSV for all building permits...\".upper())\n",
    "        created = datetime.now().strftime(\"%d%b%Y\")\n",
    "        output_file_name = \"clearedpermits_{0}to{1}.csv\".format(first_year, last_year,created)  # name of output file\n",
    "        all_permits.to_csv(\"{0}{1}\".format(script_directory, output_file_name), index=False)\n",
    "        print(\"      output file created: {0}{1}\".format(script_directory, output_file_name))\n",
    "        break\n",
    "    \n",
    "    elif write_to_csv == \"no\": print(\"finished (file not created)\".upper()); break\n",
    "        \n",
    "    else: print(\"try again\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
	{
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"# 0. Context\n",
	"The purpose of this script is to consolidate the separate cleared permits files into one for visualization and analysis. This requires three main actions:\n",
	"> 1. Importing the raw .CSV files into Python as tables\n",
	"> 2. Transforming data tables for visualization purposes\n",
	"> 3. Consolidating tables into a single one\n",
	"> 4. Writing the consolidated table into another file to visualize the data"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 1,
	"metadata": {},
	"outputs": [],
	"source": [
	"import pandas as pd\n",
	"import os\n",
	"from datetime import datetime\n",
	"from tabulate import tabulate\n",
	"import re\n",
	"\n",
	"script_directory = \"./\"\n",
	"input_files_dir = \"input_files/\""
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"# 1. Import input files\n",
	"## 1.1. Get file names from the input_files folder\n",
	"Will retrieve file names that end with _.csv_ and start with _clearedpermits_, which are default when the data is downloaded from the Open Data Portal."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 2,
	"metadata": {},
	"outputs": [],
	"source": [
	"files = [ filename for filename in os.listdir(script_directory + input_files_dir) \\\n",
	" if (filename.startswith( \"clearedpermits\" ) and filename.endswith(\".csv\"))]"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## 1.2. Define function to bring CSV files in-memory\n",
	"Created a function for loading CSV files, and relevant metadata, rather than repeating the code for each CSV file or looping it. I was interested in capturing the following data for each file:\n",
	"1. Permits raw data\n",
	"2. Distinct count of permits\n",
	"3. Year of permits file\n",
	"\n",
	"Item 2 can be calculated from the imported data, and item 3 is in the default file name when downloading from the Open Data Portal."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 3,
	"metadata": {},
	"outputs": [],
	"source": [
	"def printLine(year, rows, columns, permits_count, filename=\"\"):\n",
	" rows = \"{:,}\".format(rows)\n",
	" permits_count = \"{:,}\".format(permits_count)\n",
	" print_dict = {\"Year\": year,\n",
	" \"File\": filename,\n",
	" \"Rows\": rows,\n",
	" \"Columns\": columns,\n",
	" \"Permits\": permits_count\n",
	" }\n",
	" print_list = \" \| \".join([ \"{0}: {1}\".format(d, print_dict[d]) for d in [\"Rows\", \"Columns\", \"Permits\"]\\\n",
	" if len( str(print_dict[d])) > 0 ])\n",
	" if len(filename) > 0: print( \" {0} \| {1} \| {2}\".format(year, filename, print_list) )\n",
	" else: print( \" {0} \| {1}\".format(year, print_list) )\n",
	" \n",
	" \n",
	"def loadTable(filename, script_directory=script_directory, input_files_dir=input_files_dir):\n",
	" df = pd.read_csv('{0}{1}'.format(script_directory + input_files_dir, filename), low_memory=False)\n",
	" year = re.findall(r'\\d+', f)[0]\n",
	" permits_count, rows, columns = len(df['PERMIT_NUM'].unique()), df.shape[0], df.shape[1]\n",
	" printLine(year=year, filename=filename, rows=rows, columns=columns, permits_count=permits_count)\n",
	" \n",
	" return df, year, permits_count, rows\n",
	"\n"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## 1.3. Load CSVs into individual dataframes"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 4,
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"LOADING INPUT FILES FROM EACH YEAR\n",
	" 2001 \| clearedpermits2001.csv \| Rows: 21,458 \| Columns: 30 \| Permits: 21,231\n",
	" 2002 \| clearedpermits2002.csv \| Rows: 25,060 \| Columns: 30 \| Permits: 24,528\n",
	" 2003 \| clearedpermits2003.csv \| Rows: 25,708 \| Columns: 30 \| Permits: 25,092\n",
	" 2004 \| clearedpermits2004.csv \| Rows: 25,184 \| Columns: 30 \| Permits: 24,393\n",
	" 2005 \| clearedpermits2005.csv \| Rows: 32,809 \| Columns: 30 \| Permits: 31,660\n",
	" 2006 \| clearedpermits2006.csv \| Rows: 28,777 \| Columns: 30 \| Permits: 27,419\n",
	" 2007 \| clearedpermits2007.csv \| Rows: 25,616 \| Columns: 30 \| Permits: 24,145\n",
	" 2008 \| clearedpermits2008.csv \| Rows: 26,073 \| Columns: 30 \| Permits: 24,608\n",
	" 2009 \| clearedpermits2009.csv \| Rows: 26,385 \| Columns: 30 \| Permits: 24,814\n",
	" 2010 \| clearedpermits2010.csv \| Rows: 26,461 \| Columns: 30 \| Permits: 24,790\n",
	" 2011 \| clearedpermits2011.csv \| Rows: 30,541 \| Columns: 30 \| Permits: 28,719\n",
	" 2012 \| clearedpermits2012.csv \| Rows: 31,525 \| Columns: 30 \| Permits: 29,140\n",
	" 2013 \| clearedpermits2013.csv \| Rows: 34,796 \| Columns: 30 \| Permits: 32,488\n",
	" 2014 \| clearedpermits2014.csv \| Rows: 37,096 \| Columns: 30 \| Permits: 34,732\n",
	" 2015 \| clearedpermits2015.csv \| Rows: 38,969 \| Columns: 30 \| Permits: 36,152\n",
	" 2016 \| clearedpermits2016.csv \| Rows: 41,056 \| Columns: 30 \| Permits: 37,685\n",
	" 2017 \| clearedpermits2017.csv \| Rows: 40,047 \| Columns: 30 \| Permits: 36,627\n",
	"in the files from 2001 to 2017 there were a total of 517,561 records and 488,223 permits\n"
	]
	}
	],
	"source": [
	"tables, rows, permits = {}, [], []\n",
	"\n",
	"print(\"loading input files from each year\".upper())\n",
	"for f in sorted(files):\n",
	" table, year, permits_count, rows_count = loadTable(f)\n",
	" tables[year] = table.reset_index(drop=True)\n",
	" rows.append(rows_count)\n",
	" permits.append(permits_count)\n",
	" \n",
	"years, rows, permits = sorted(list(tables)), sum(rows), sum(permits)\n",
	"first_year, last_year = years[0], years[-1]\n",
	"print(\"in the files from {0} to {1} there were a total of {2} records and {3} permits\".format(first_year,last_year, \"{:,}\".format(rows), \"{:,}\".format(permits)))"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"# 2. Consolidate data tables\n",
	"## 2.1. Append dataframes into one"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 5,
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"INTEGRATING PERMITS FROM ALL YEARS INTO ONE\n",
	" 2001 \| Rows: 21,458 \| Columns: 30 \| Permits: 21,231\n",
	" 2001 \| Rows: 46,518 \| Columns: 30 \| Permits: 45,693\n",
	" 2001 \| Rows: 72,226 \| Columns: 30 \| Permits: 70,722\n",
	" 2001 \| Rows: 97,410 \| Columns: 30 \| Permits: 95,020\n",
	" 2001 \| Rows: 130,219 \| Columns: 30 \| Permits: 126,525\n",
	" 2001 \| Rows: 158,996 \| Columns: 30 \| Permits: 153,819\n",
	" 2001 \| Rows: 184,612 \| Columns: 30 \| Permits: 177,868\n",
	" 2001 \| Rows: 210,685 \| Columns: 30 \| Permits: 202,399\n",
	" 2001 \| Rows: 237,070 \| Columns: 30 \| Permits: 227,025\n",
	" 2001 \| Rows: 263,531 \| Columns: 30 \| Permits: 251,759\n",
	" 2001 \| Rows: 294,072 \| Columns: 30 \| Permits: 280,371\n",
	" 2001 \| Rows: 325,597 \| Columns: 30 \| Permits: 309,416\n",
	" 2001 \| Rows: 360,393 \| Columns: 30 \| Permits: 338,292\n",
	" 2001 \| Rows: 397,489 \| Columns: 30 \| Permits: 372,671\n",
	" 2001 \| Rows: 436,458 \| Columns: 30 \| Permits: 408,579\n",
	" 2001 \| Rows: 477,514 \| Columns: 30 \| Permits: 446,002\n",
	" 2001 \| Rows: 517,561 \| Columns: 30 \| Permits: 482,355\n"
	]
	}
	],
	"source": [
	"all_permits = tables[first_year] # initiate data table to hold all permits, with earliest available year preloaded\n",
	"new_permits = len(all_permits['PERMIT_NUM'].unique())\n",
	"print(\"integrating permits from all years into one\".upper())\n",
	"printLine(year=first_year, rows=all_permits.shape[0], columns =all_permits.shape[1], permits_count=new_permits)\n",
	"\n",
	"for year in years[1:]:\n",
	" all_permits = all_permits.append(tables[year])\n",
	" new_permits = len(all_permits['PERMIT_NUM'].unique())\n",
	" printLine(year=first_year, rows=all_permits.shape[0], columns =all_permits.shape[1], permits_count=new_permits)\n",
	" \n"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"As expected, the total number of rows matches in the integrated table matches the sum of number of rows across all years; however, note the difference in number of permits calculated before (across all the files)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 6,
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	" # rows in consolidated table # of rows across all tables\n",
	"------------------------------ -----------------------------\n",
	" 517561 517561\n",
	"\n",
	" # of permits ACROSS files # of permits in INTEGRATED difference\n",
	"--------------------------- ---------------------------- ------------\n",
	" 488223 482355 5868\n"
	]
	}
	],
	"source": [
	"print(tabulate([(all_permits.shape[0], rows)], [\"# rows in consolidated table\", \"# of rows across all tables\"]))\n",
	"print()\n",
	"print(tabulate([[permits, new_permits, permits-new_permits]], [\"# of permits ACROSS files\", \"# of permits in INTEGRATED\", \"difference\"]))"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"This is likely because permit numbers repeated across years, such as a renewal, so they can be counted more than once when summing the number of permits in each file - the accurate number is the new one\n",
	"\n",
	"# 3. Write dataframe to output file\n",
	"Opted for a .CSV file given it is a fairly standard format for tabular data and I had no issues with reading it in Tableau, the visualization software I used in this analysis. I mention his because it has happened before."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 7,
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	" \n",
	"CREATING .CSV FOR ALL BUILDING PERMITS...\n",
	" output file created: ./clearedpermits_2001to2017.csv\n"
	]
	}
	],
	"source": [
	"while True:\n",
	" write_to_csv = input(\"create a .CSV file with the {0} building permits from {1} to {2}? (yes / no): \".format(\"{:,}\".format(all_permits.shape[0]), first_year, last_year)).lower()\n",
	" \n",
	" if write_to_csv == \"yes\":\n",
	" print(\" \\ncreating .CSV for all building permits...\".upper())\n",
	" created = datetime.now().strftime(\"%d%b%Y\")\n",
	" output_file_name = \"clearedpermits_{0}to{1}.csv\".format(first_year, last_year,created) # name of output file\n",
	" all_permits.to_csv(\"{0}{1}\".format(script_directory, output_file_name), index=False)\n",
	" print(\" output file created: {0}{1}\".format(script_directory, output_file_name))\n",
	" break\n",
	" \n",
	" elif write_to_csv == \"no\": print(\"finished (file not created)\".upper()); break\n",
	" \n",
	" else: print(\"try again\")"
	]
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 3",
	"language": "python",
	"name": "python3"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython3",
	"version": "3.6.4"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 2
	}