@KamiBG
KamiBG / KPMG InsideSherpa.py
Last active June 8, 2021 13:47
Based on the available data about the current clients, recommend which clients should be targeted to drive most value
#!/usr/bin/env python
# coding: utf-8
# In[1]:
import pandas as pd
import numpy as np
@KamiBG
KamiBG / Healthcare in Different States
Created September 28, 2020 12:43
Boxplot practice on Codecademy
import pandas as pd
from matplotlib import pyplot as plt
healthcare = pd.read_csv("healthcare.csv")
print(healthcare.head())
#print(healthcare["DRG Definition"].unique())
chest_pain = healthcare[healthcare['DRG Definition'] == '313 - CHEST PAIN']
# boxplot for Alabama:
alabama_chest_pain = chest_pain[chest_pain['Provider State'] == 'AL']
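The preview stops after the Alabama filter. As a rough sketch of where that filtering could lead, the block below builds one boxplot per state from a small made-up table. The column `Average Covered Charges` and every value in the table are assumptions for illustration, not data from `healthcare.csv`.

```python
import pandas as pd
from matplotlib import pyplot as plt

# Made-up rows; "Average Covered Charges" is an assumed column name.
healthcare = pd.DataFrame({
    "DRG Definition": ["313 - CHEST PAIN"] * 4 + ["OTHER"],
    "Provider State": ["AL", "AL", "AK", "AK", "AL"],
    "Average Covered Charges": [16000.0, 22000.0, 30000.0, 34000.0, 9000.0],
})

chest_pain = healthcare[healthcare["DRG Definition"] == "313 - CHEST PAIN"]

# One array of charges per state, in order of first appearance.
states = chest_pain["Provider State"].unique()
datasets = [
    chest_pain[chest_pain["Provider State"] == state]["Average Covered Charges"].values
    for state in states
]

fig, ax = plt.subplots()
ax.boxplot(datasets)
ax.set_xticks(range(1, len(states) + 1))  # boxplot positions start at 1
ax.set_xticklabels(states)
ax.set_ylabel("Average Covered Charges")
```

Calling `plt.show()` afterwards displays the figure in an interactive session.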
@KamiBG
KamiBG / Life Expectancy By Country
Created September 25, 2020 13:08
Quantiles Practice
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("country_data.csv")
#print(data.head())
life_expectancy = data['Life Expectancy']
# Find the Quantiles:
life_expectancy_quartiles = np.quantile(life_expectancy, [0.25, 0.5, 0.75])
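To make the `np.quantile` call concrete, here is a minimal sketch on a small made-up array (the values are hypothetical and do not come from `country_data.csv`). With NumPy's default linear interpolation, the three requested quantiles are the first quartile, the median, and the third quartile.

```python
import numpy as np

# Hypothetical life expectancy values, already sorted for readability.
sample = np.array([50.0, 60.0, 70.0, 80.0, 90.0])

# 25th, 50th (median) and 75th percentiles.
q1, median, q3 = np.quantile(sample, [0.25, 0.5, 0.75])
print(q1, median, q3)  # 60.0 70.0 80.0
```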
@KamiBG
KamiBG / Twitch Part 1: Analyze Data with SQL
Last active September 5, 2020 09:46
Codecademy Project
SELECT * FROM stream
LIMIT 20;
SELECT * FROM chat
LIMIT 20;
-- What are the unique games in the stream table?
SELECT DISTINCT game FROM stream
WHERE game IS NOT NULL;
import codecademylib3_seaborn
from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns
df = pd.read_csv('WorldCupMatches.csv')
print(df.head())
# Create a column with the total number of goals scored in each match
df['Total Goals'] = df['Home Team Goals'] + df['Away Team Goals']
@KamiBG
KamiBG / Roller Coaster Data Visualisation Practice
Created August 30, 2020 11:25
Python Practice on Codecademy with open-ended requirements
import pandas as pd
import matplotlib.pyplot as plt
# load rankings data:
GTAW_wood = pd.read_csv('Golden_Ticket_Award_Winners_Wood.csv')
print(GTAW_wood.head())
GTAW_steel = pd.read_csv('Golden_Ticket_Award_Winners_Steel.csv')
print(GTAW_steel.head())
# write function to plot rankings over time for 1 roller coaster:
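The preview ends before the function itself. One possible shape for it is sketched below. The column names `Name`, `Park`, `Year of Rank` and `Rank`, the function name `plot_ranking_over_time`, and the demo rows are all assumptions, since the preview does not show the CSV layout.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_ranking_over_time(rankings, coaster_name, park_name):
    """Plot one coaster's rank per year (column names are assumed)."""
    one = rankings[
        (rankings["Name"] == coaster_name) & (rankings["Park"] == park_name)
    ].sort_values("Year of Rank")

    fig, ax = plt.subplots()
    ax.plot(one["Year of Rank"], one["Rank"], marker="o")
    ax.set_xticks(one["Year of Rank"])
    ax.invert_yaxis()  # rank 1 is best, so draw it at the top
    ax.set_xlabel("Year")
    ax.set_ylabel("Rank")
    ax.set_title(f"{coaster_name} ({park_name})")
    return ax


# Hypothetical rows standing in for the real rankings file.
demo = pd.DataFrame({
    "Name": ["Coaster A", "Coaster A", "Coaster A", "Coaster B"],
    "Park": ["Park X", "Park X", "Park X", "Park Y"],
    "Year of Rank": [2015, 2013, 2014, 2013],
    "Rank": [1, 3, 2, 5],
})
ax = plot_ranking_over_time(demo, "Coaster A", "Park X")
```

Filtering on both name and park is a deliberate choice here, on the assumption that two parks could each have a coaster with the same name.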
@KamiBG
KamiBG / Page Visits Funnel - Pandas Practice
Created August 25, 2020 00:45
Cool T-Shirts Inc. has asked you to analyze data on visits to their website. Your job is to build a funnel, which is a description of how many people continue to the next step of a multi-step process. In this case, our funnel is going to describe the following process: 1. A user visits CoolTShirts.com; 2. A user adds a t-shirt to their cart; 3. …
import pandas as pd
visits = pd.read_csv('visits.csv',
                     parse_dates=[1])
print(visits.head())
cart = pd.read_csv('cart.csv',
                   parse_dates=[1])
print(cart.head())
checkout = pd.read_csv('checkout.csv',
                       parse_dates=[1])
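The funnel described above boils down to left-joining each step onto the previous one and counting the rows where the later step is missing. A minimal sketch of the first step (visit to cart) follows. The column names `user_id`, `visit_time` and `cart_time` and all rows are assumed for illustration; the real CSV headers are not shown in the preview.

```python
import pandas as pd

# Hypothetical stand-ins for visits.csv and cart.csv.
visits = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4"],
    "visit_time": pd.to_datetime([
        "2017-01-24 10:00", "2017-01-24 11:00",
        "2017-01-25 09:30", "2017-01-26 14:15",
    ]),
})
cart = pd.DataFrame({
    "user_id": ["u1", "u3"],
    "cart_time": pd.to_datetime(["2017-01-24 10:05", "2017-01-25 09:45"]),
})

# A left join keeps every visitor; cart_time is missing (NaT) for users who left.
visits_cart = pd.merge(visits, cart, how="left", on="user_id")

no_cart = visits_cart["cart_time"].isnull().sum()
share_no_cart = no_cart / len(visits_cart)
print(f"{share_no_cart:.0%} of visitors never added a t-shirt to their cart")
```

The same pattern, repeated for the later steps, would give the drop-off at each stage of the funnel.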
@KamiBG
KamiBG / AB Testing for ShoeFly.com Pandas Practice
Created August 23, 2020 04:00
Codecademy Pandas Practice Project. Description: Our favorite online shoe store, ShoeFly.com is performing an A/B Test. They have two different versions of an ad, which they have placed in emails, as well as in banner ads on Facebook, Twitter, and Google. They want to know how the two ads are performing on each of the different platforms on each…
import pandas as pd
ad_clicks = pd.read_csv('ad_clicks.csv')
print(ad_clicks.head())
# Your manager wants to know which ad platform is getting you the most views. How many views (i.e., rows of the table) came from each utm_source?
views = (
    ad_clicks.groupby('utm_source').user_id.count()
    .reset_index()
    .rename(columns={'user_id': 'views'})
)
print(views)
# Create a new column called is_click, which is True if ad_click_timestamp is not null and False otherwise.
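The preview cuts off before the code for this step. A minimal sketch of one way to do it uses `Series.notnull()`, which returns `True` wherever a value is present. The rows below are made up; only the column names `user_id`, `utm_source` and `ad_click_timestamp` come from the snippet above.

```python
import pandas as pd

# Hypothetical rows; a missing timestamp means the ad was viewed but not clicked.
ad_clicks = pd.DataFrame({
    "user_id": ["a1", "b2", "c3", "d4"],
    "utm_source": ["google", "facebook", "google", "twitter"],
    "ad_click_timestamp": ["7:18", None, None, "9:01"],
})

# True where a click timestamp exists, False otherwise.
ad_clicks["is_click"] = ad_clicks["ad_click_timestamp"].notnull()
print(ad_clicks[["user_id", "is_click"]])
```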
@KamiBG
KamiBG / Attribution Queries SQL Practice
Created August 17, 2020 03:45
A Codecademy SQL project (first/last touch)
SELECT * FROM page_visits
LIMIT 20;
SELECT DISTINCT utm_campaign FROM page_visits;
SELECT DISTINCT utm_source FROM page_visits;
-- Which source is used for each campaign?
SELECT DISTINCT utm_campaign, utm_source FROM page_visits;
-- What pages are on the CoolTShirts website?
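The preview ends here, before any attribution query. As a hedged sketch of the first-touch idea named in the description, the block below runs a comparable query against an in-memory SQLite database from Python. The columns `user_id`, `timestamp` and `page_name`, the page labels, and all rows are assumptions; only `utm_source` and `utm_campaign` appear in the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page_visits (
    user_id INTEGER, timestamp TEXT, page_name TEXT,
    utm_source TEXT, utm_campaign TEXT
);
-- Hypothetical rows for illustration only.
INSERT INTO page_visits VALUES
    (1, '2018-01-01 10:00:00', '1 - landing_page', 'email',  'campaign-a'),
    (1, '2018-01-02 09:00:00', '3 - checkout',     'google', 'campaign-b'),
    (2, '2018-01-03 12:00:00', '1 - landing_page', 'google', 'campaign-b');
""")

# Distinct pages on the site.
pages = [row[0] for row in conn.execute(
    "SELECT DISTINCT page_name FROM page_visits ORDER BY page_name;"
)]

# First touch: each user's earliest visit, joined back to find its campaign.
first_touch_query = """
WITH first_touch AS (
    SELECT user_id, MIN(timestamp) AS first_touch_at
    FROM page_visits
    GROUP BY user_id
)
SELECT pv.utm_campaign, COUNT(*) AS first_touches
FROM first_touch AS ft
JOIN page_visits AS pv
    ON ft.user_id = pv.user_id
    AND ft.first_touch_at = pv.timestamp
GROUP BY pv.utm_campaign
ORDER BY pv.utm_campaign;
"""
first_touches = conn.execute(first_touch_query).fetchall()
print(pages)
print(first_touches)
```

Swapping `MIN(timestamp)` for `MAX(timestamp)` would give the last-touch version of the same query.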
@KamiBG
KamiBG / Calculating Churn Rates
Created August 14, 2020 11:08
SQL Practice from Codecademy
-- Four months into launching Codeflix, management asks you to look into subscription churn rates. It’s early on in the business and people are excited to know how the company is doing.
-- The marketing department is particularly interested in how the churn compares between two segments of users. They provide you with a dataset containing subscription data for users who were acquired through two distinct channels.
-- The dataset provided to you contains one SQL table, subscriptions. Within the table, there are 4 columns:
-- id - the subscription id
-- subscription_start - the start date of the subscription
-- subscription_end - the end date of the subscription
-- segment - this identifies which segment the subscription owner belongs to
-- Codeflix requires a minimum subscription length of 31 days, so a user can never start and end their subscription in the same month.
SELECT * FROM subscriptions
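The preview stops at this first query. To illustrate where the brief is heading, here is a minimal sketch of a one-month churn calculation per segment, run through Python's `sqlite3` module. It assumes churn rate means cancellations during the month divided by subscriptions active on the first day of the month. The segment labels, dates and rows are made up; only the table and column names come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subscriptions (
    id INTEGER, subscription_start TEXT, subscription_end TEXT, segment INTEGER
);
-- Hypothetical rows; a NULL subscription_end means still subscribed.
INSERT INTO subscriptions VALUES
    (1, '2016-12-01', '2017-01-15', 87),
    (2, '2016-12-05', NULL,         87),
    (3, '2016-12-10', '2017-02-20', 30),
    (4, '2016-12-20', NULL,         30),
    (5, '2017-01-10', NULL,         30);
""")

churn_query = """
SELECT segment,
       -- active: started before the month and not yet ended on its first day
       SUM(CASE WHEN subscription_start < :first_day
                 AND (subscription_end >= :first_day OR subscription_end IS NULL)
                THEN 1 ELSE 0 END) AS active,
       -- canceled: was active and ended within the month
       SUM(CASE WHEN subscription_start < :first_day
                 AND subscription_end BETWEEN :first_day AND :last_day
                THEN 1 ELSE 0 END) AS canceled
FROM subscriptions
GROUP BY segment
ORDER BY segment;
"""
rows = conn.execute(
    churn_query, {"first_day": "2017-01-01", "last_day": "2017-01-31"}
).fetchall()

# Assumes every segment has at least one active subscription in the month.
churn = {segment: canceled / active for segment, active, canceled in rows}
print(churn)  # {30: 0.0, 87: 0.5}
```

Because subscriptions last at least 31 days, one that starts during the month cannot also end in it, which is why only subscriptions started before `first_day` count as active.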