Skip to content

Instantly share code, notes, and snippets.

@thomasthaddeus
Created February 8, 2023 13:39
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save thomasthaddeus/c2f0ce5d62910f6bc8c875bd7a17b678 to your computer and use it in GitHub Desktop.
Save thomasthaddeus/c2f0ce5d62910f6bc8c875bd7a17b678 to your computer and use it in GitHub Desktop.
SQLite3 in Python and ways to analyze it
"""analyzing_hotel_dbs_wpy.py"""
# %% [markdown]
# # Analyzing Hotel Databases with Python
#
# Sleep Away Inc., a hotel corporation, has just hired you as a data analyst.
#
# 1. They would like you to analyze their most recent data about:
# - customers who cancelled their booking and
# - customers who did not cancel.
# 2. It is your job to connect to Sleep Away’s SQLite database so you can edit and analyze the `hotel_booking.db` database.
# 3. The company wants to learn about:
# - the `BRA` customers (Brazilian customers) who cancelled their bookings
# - compared to the `BRA` customers who did not.
# 4. To do this, you will create a _new data table_ in their database that _only_ has the **_Brazilian customer hotel bookings_**.
# 5. Once you create this new table and fill it with the `BRA` data, you will analyze it and report back to **Sleep Away Inc**.
# 6. Lastly, don’t forget to commit these changes to the database and close the connection.
#
# The **hotel_booking.db** database has a data table called `booking_summary`, which consists of their booking information from the year *2017*.
# Here is a field description from the `booking_summary` data table:
#
# | Field | Description |
# | ------------------------- | :------------------------------------------------------------------------------------------------------------------ |
# | hotel | Type of hotel (Resort or City) |
# | is_cancelled | 1=cancelled, 0= not cancelled |
# | lead_time | Number of days that elapsed between the entering date of the booking and the arrival date |
# | arrival_date_year | Year they arrived |
# | arrival_date_month | Month they arrived |
# | arrival_date_day_of_month | Day of month they arrived |
# | adults | Number of adults |
# | children | Number of children |
# | adr | Average Daily Rate as defined by dividing the sum of all lodging transactions by the total number of staying nights |
# | special_requests | Number of special requests made by the customer (e.g. twin bed or high floor) |
#
# %% [markdown]
# ## Tasks
#
# ### Creating Connection and Cursor Objects
#
# 1. Connect to the `hotel_booking.db` database and name the connection object con.
# 2. Next, you must create your cursor object with `.cursor()`. Name your cursor object `cur` for convenience with the following tasks.
#
# ### Explore and Understand the Data
#
# 3. Now that you have connected to the database, let’s look at some of the actual rows of data.
# Using the `.fetchone()` method, view the first row from the `booking_summary` table. Be sure to print this line of code to view the output.
# 4. To better understand the data, pull more than one row with the `.fetchmany()` method. Be sure to specify how many rows to pull and print this line of code to view the output.
#
# ### Pull `BRA` Data and Save as SQLite Table
#
# 5. Now that you’ve looked at some records from booking_summary, pull all records with BRA (Brazil) as country of origin.
# Create a SQL statement with the `.fetchall()` method to retrieve the `BRA` records, and save this tuple as bra. You may print the first five rows of bra to double-check your work.
#
# 6. Create a new table named bra_customers that has the following field names and field types:
# `num INTEGER, hotel TEXT, is_cancelled INTEGER, lead_time INTEGER, arrival_date_year INTEGER, arrival_date_month TEXT, arrival_date_day_of_month INTEGER, adults INTEGER, children INTEGER, country TEXT, adr REAL, special_requests INTEGER`
# - **Note** use `CREATE TABLE IF NOT EXISTS` instead of `CREATE TABLE`. This way you won’t run into the following error:
# `sqlite3.OperationalError: table bra_customers already exists`
# In the next task, you will insert the BRA records into this table.
# 7. Insert the object BRA into the table bra_customers using .executemany() and use question marks as field placeholders. Remember, the object BRA is a tuple list of all the rows of data that have BRA for the field country.
# The following SQL command, '''INSERT INTO bra_customers VALUES (?,?,?,?,?,?,?,?,?,?,?,?)''', bra) will go inside the .executemany() method.
# 8. Great Job! To ensure you correctly inserted bra into bra_customers, print the first ten rows.
#
# ### Find the Average Lead Time
#
# 9. Sleep Away Inc. would like to compare the average lead time (number of days that elapsed between the entering date of the booking and the arrival date) of those who cancelled and those who did not cancel (from the BRA data).
# Retrieve the rows of the column lead_time where the customer cancelled their booking. Pull this data from the bra_customers table using the .fetchall() method. You will also use the SQL commands SELECT, FROM, and WHERE.
# Remember, the `field is_cancelled` is:
# - 1 if cancelled and
# - 0 if they did not cancel.
# Be sure to save these rows as a new object such as `lead_time_can`.
# Check out the Advanced Tip in the hint below to learn more about pulling and analyzing SQLite data!
# 10. Now that you retrieved the desired rows of data, create a for loop that iterates through and calculates the average lead time. This will be much easier if you saved the data retrieved in task #9 in a variable.
# 11. Create an object made up of the lead_time field, but this time for those who did NOT cancel their booking.
# 12. Calculate the average lead time of those who did NOT cancel their booking. You may use the same for loop as task #10 but make sure it iterates through the new data retrieved.
# After you find the average lead time of both those who cancelled and those who did not cancel, compare them. Which group has a higher average lead time, and by how much? How could this knowledge help Sleep Away to have fewer cancellations?
#
# ### Find the Total Special Requests
#
# 13. Sleep Away Inc. also wants you to calculate the total amount of special requests for cancelled vs. not cancelled.
# First, retrieve the special_requests field from the bra_customers data table for those who cancelled. You can use the `.fetchall()` method or a `for loop`.
# Be sure to save this data as an object.
# 14. With the data you retrieved, use a for loop to find the total amount of special requests for cancelled bookings.
# 15. Pull the total special requests column from the bra_customers data table for those who did NOT cancel. You can use the `.fetchall()` method or a `for loop`.
# Be sure to save this data to an object.
# 16. Now find the total amount of special requests for those who did not cancel. Compare this to the total amount of special requests for those who cancelled.
# Does it surprise you which group is higher? What can you tell Sleep Away to help them have fewer cancellations?
#
# ### Committing Changes and Closing Connection
#
# 17. Great Job! You have finished working in this database. Commit all the changes using `.commit()`. This ensures that any changes to the database are not lost.
# Once you have committed the changes, you may close the connection with `.close()`.
# Import module
import sqlite3
import pandas as pd
# Task 1: Create connection object
con = sqlite3.connect("hotel_booking.db")
# Task 2: Create cursor object
cur = con.cursor()
# Task 3: View first row of booking_summary
one = cur.execute('''SELECT * FROM booking_summary;''').fetchone()
print(f"{one}\n")
# Task 4: View first ten rows of booking_summary
ten = cur.execute('''SELECT * FROM booking_summary;''').fetchmany(10)
print(f"{ten}\n")
# Task 5: Create object bra and print first 5 rows to view data
bra = cur.execute('''SELECT * FROM booking_summary WHERE country = "BRA";''').fetchall()
print(f"{bra[:5]}\n")
# Task 6: Create new table called bra_customers
bra_customers = cur.execute('''CREATE TABLE IF NOT EXISTS bra_customers(num INTEGER,
hotel TEXT,
is_cancelled INTEGER,
lead_time INTEGER,
arrival_date_year INTEGER,
arrival_date_month TEXT,
arrival_date_day_of_month INTEGER,
adults INTEGER,
children INTEGER,
country TEXT,
adr REAL,
special_requests INTEGER);''')
# Task 7: Insert the object bra into the table bra_customers
cur.executemany('''INSERT INTO bra_customers VALUES (?,?,?,?,?,?,?,?,?,?,?,?);''', bra)
# Task 8: View the first 10 rows of bra_customers
b10 = cur.execute('''SELECT * FROM bra_customers;''').fetchmany(10)
print(b10)
df1 = pd.read_sql_query('''SELECT lead_time FROM bra_customers WHERE is_cancelled = 1;''', con)
# Task 9: Retrieve lead_time rows where the bookings were canceled
lead_time_can = cur.execute('''SELECT lead_time FROM bra_customers WHERE is_cancelled = 1;''').fetchall()
# Task 10: Find average lead time for those who canceled and print results
print(df1.mean())
# Task 11: Retrieve lead_time rows where the bookings were not canceled
df0 = pd.read_sql_query('''SELECT lead_time FROM bra_customers WHERE is_cancelled = 0;''', con)
# Task 12: Find average lead time for those who did not cancel and print results
print(df0.mean())
# Task 13: Retrieve special_requests rows where the bookings were canceled
spec_req = cur.execute('''SELECT special_requests FROM bra_customers WHERE is_cancelled = 1;''').fetchall()
# Task 14: Find total special requests for those who canceled and print results
num = 0
for i in spec_req:
num = num + i[0]
average = (num / len(spec_req))
print(average)
# Task 15: Retrieve special_requests rows where the bookings were not canceled
spec_req2 = cur.execute('''SELECT special_requests FROM bra_customers WHERE is_cancelled = 0;''').fetchall()
# Task 16: Find total special requests for those who did not cancel and print results
num2 = 0
for i in spec_req2:
num2 = num2 + i[0]
average = (num2 / len(spec_req2))
print(average)
# Task 17: Commit changes and close the connection
con.commit()
con.close()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment