def load_dataset(split="trn_set", limit=None, ignore_categorical=False):
    sql = """
        SELECT o.*, f1.*, f2.*, f3.*, f4.*,
               EXTRACT(MONTH FROM o.dt) AS month
        FROM %s AS t
        JOIN Online AS o
          ON t.index = o.index
        JOIN features_group_1 AS f1
          ON t.index = f1.index
        JOIN features_group_2 AS f2
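The query selects one split table (`trn_set` or `tst_set`) and joins it to `Online` and the feature tables on the shared `index` column; the table name is substituted with `%s` because identifiers cannot be passed as bound query parameters. A minimal sketch of the same join pattern, assuming an in-memory SQLite database with tiny hypothetical tables, one feature table instead of four, and a hypothetical whitelist (`ALLOWED_SPLITS`) guarding the substituted table name. SQLite has no `EXTRACT`, so `strftime('%m', ...)` stands in for it.

```python
import sqlite3

# Hypothetical whitelist: only these split tables may be substituted into the SQL.
ALLOWED_SPLITS = {"trn_set", "tst_set"}


def build_query(split: str) -> str:
    """Return the join query for one split table (table names cannot be bound parameters)."""
    if split not in ALLOWED_SPLITS:
        raise ValueError(f"unknown split table: {split!r}")
    return f"""
        SELECT o."index", o.event2, f1.order_count,
               CAST(strftime('%m', o.dt) AS INTEGER) AS month
        FROM {split} AS t
        JOIN Online AS o ON t."index" = o."index"
        JOIN features_group_1 AS f1 ON t."index" = f1."index"
        ORDER BY o."index"
    """


def make_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with hypothetical rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Online ("index" INTEGER PRIMARY KEY, dt TEXT, event2 INTEGER);
        CREATE TABLE features_group_1 ("index" INTEGER PRIMARY KEY, order_count INTEGER);
        CREATE TABLE trn_set ("index" INTEGER PRIMARY KEY);
        CREATE TABLE tst_set ("index" INTEGER PRIMARY KEY);
        INSERT INTO Online VALUES (1, '2014-03-05 10:00:00', 7), (2, '2014-11-20 08:30:00', 3);
        INSERT INTO features_group_1 VALUES (1, 2), (2, 5);
        INSERT INTO trn_set VALUES (1);
        INSERT INTO tst_set VALUES (2);
    """)
    return conn


conn = make_demo_db()
rows = conn.execute(build_query("trn_set")).fetchall()
print(rows)  # [(1, 7, 2, 3)]
```

Rejecting unknown split names up front keeps the `%s` substitution from ever receiving arbitrary SQL.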

mysql> show tables;
+----------------------+
| Tables_in_shutterfly |
+----------------------+
| Online               |
| Purchase             |
| features_group_1     |
| features_group_2     |
| features_group_3     |
| features_group_4     |

USE Shutterfly;
DROP TABLE IF EXISTS features_group_1;
CREATE TABLE IF NOT EXISTS features_group_1
SELECT o.index
      ,LEFT(o.dt, 10) AS day
      ,COUNT(*) AS order_count
      ,SUM(p.revenue) AS revenue_sum
      ,MAX(p.revenue) AS revenue_max
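The feature table aggregates purchases per `index` and per calendar day, with `LEFT(o.dt, 10)` keeping the `YYYY-MM-DD` prefix of the timestamp. The snippet stops before its `FROM` and `GROUP BY` clauses, so the join between `Online` and `Purchase` is not shown. The sketch below assumes a single hypothetical `purchases` table that already carries `index`, `dt` and `revenue`, and reproduces the COUNT/SUM/MAX aggregation in SQLite, where `substr(dt, 1, 10)` plays the role of MySQL's `LEFT(dt, 10)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical purchase rows; the real schema of Purchase is not shown in the source.
conn.executescript("""
    CREATE TABLE purchases ("index" INTEGER, dt TEXT, revenue REAL);
    INSERT INTO purchases VALUES
        (1, '2014-03-05 10:00:00', 20.0),
        (1, '2014-03-05 18:45:00', 35.5),
        (1, '2014-03-07 09:10:00', 12.0),
        (2, '2014-03-05 11:00:00', 8.0);
""")

# Per-index, per-day aggregates; substr(dt, 1, 10) mirrors MySQL's LEFT(dt, 10).
sql = """
    SELECT "index",
           substr(dt, 1, 10) AS day,
           COUNT(*)          AS order_count,
           SUM(revenue)      AS revenue_sum,
           MAX(revenue)      AS revenue_max
    FROM purchases
    GROUP BY "index", day
    ORDER BY "index", day
"""
features = conn.execute(sql).fetchall()
for row in features:
    print(row)
```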

mysql> select count(*) from trn_set;
+----------+
| count(*) |
+----------+
|   859296 |
+----------+
1 row in set (0.61 sec)

mysql> select count(*) from tst_set;
+----------+

sql = "SELECT `index`, event2 FROM Online;"
df = pd.read_sql_query(sql, engine).set_index('index')
# shuffle dataset, preserving index
df = df.sample(frac=1)
train_frac = 0.9
test_frac = 1 - train_frac
trn_cutoff = int(len(df) * train_frac)
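The split shuffles every row, then takes the first 90% as training data and the remainder as test data, keeping the original `index` so each row can later be joined back to the feature tables. Since `df.sample(frac=1)` is called without a seed, rerunning it yields a different split. A dependency-free sketch of the same idea with a fixed seed; the function name `split_indices` and the seed value `42` are illustrative assumptions.

```python
import random


def split_indices(indices, train_frac=0.9, seed=42):
    """Shuffle a copy of `indices` with a fixed seed and cut it into train/test lists."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)         # reproducible shuffle
    trn_cutoff = int(len(shuffled) * train_frac)  # same cutoff rule as the snippet
    return shuffled[:trn_cutoff], shuffled[trn_cutoff:]


trn_idx, tst_idx = split_indices(range(1, 101))
print(len(trn_idx), len(tst_idx))  # 90 10
```

With pandas, passing `random_state` to `df.sample` gives the same reproducibility without leaving the DataFrame.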

mysql> use shutterfly;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show tables;
+----------------------+
| Tables_in_shutterfly |
+----------------------+

from sqlalchemy import create_engine
import pandas as pd

username = "root"
password = "1234567"
port = 3306
database = "Shutterfly"
engine = create_engine('mysql+mysqldb://%s:%s@localhost:%i/%s'
                       % (username, password, port, database))
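The connection string is assembled with `%` formatting, which produces a broken URL if the password contains characters such as `@`, `:` or `/`. A small sketch that percent-encodes the credentials with `urllib.parse.quote_plus` and reads the password from an environment variable instead of hardcoding it; the helper name `mysql_url` and the variable name `SHUTTERFLY_DB_PASSWORD` are hypothetical.

```python
import os
from urllib.parse import quote_plus


def mysql_url(username, password, host="localhost", port=3306, database="Shutterfly"):
    """Build a mysql+mysqldb URL with percent-encoded credentials."""
    return "mysql+mysqldb://%s:%s@%s:%i/%s" % (
        quote_plus(username), quote_plus(password), host, port, database)


# Hypothetical environment variable; falls back to an empty password if unset.
password = os.environ.get("SHUTTERFLY_DB_PASSWORD", "")
url = mysql_url("root", password)  # pass this to create_engine(url)

print(mysql_url("root", "p@ss:word"))
# mysql+mysqldb://root:p%40ss%3Aword@localhost:3306/Shutterfly
```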
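import numpy as np

# two-tailed t-test
h_0 = 300                  # mean under the null hypothesis
h_1 = 290                  # mean under the alternative
n = 50                     # sample size
se = 20 / np.sqrt(n)       # standard error of the mean for a standard deviation of 20
power = compute_power(h_0, h_1, se, tail="two")
print("power: %.3f, beta: %.3f" % (power, 1 - power))  # beta = Type II error rate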
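`compute_power` itself is not among the snippets shown here. One way it could work, assuming a normal (z) approximation in which the estimate has standard error `se` under both hypotheses: locate the rejection region around `h_0` from the critical z value, then measure how much of the sampling distribution centred at `h_1` falls inside it. This is a hypothetical reimplementation using the standard library's `statistics.NormalDist`, not the author's code.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def compute_power(h_0, h_1, se, alpha=0.05, tail="two"):
    """Power of a z-test of H0: mean = h_0 when the true mean is h_1 (hypothetical sketch).

    tail is "two", "left" (reject for small estimates) or "right" (reject for large ones).
    """
    # Under H1 the standardised statistic (x - h_0) / se is centred at `shift`.
    shift = (h_1 - h_0) / se
    if tail == "two":
        z = STD_NORMAL.inv_cdf(1 - alpha / 2)
        # Probability of landing below -z or above +z.
        return STD_NORMAL.cdf(-z - shift) + 1 - STD_NORMAL.cdf(z - shift)
    z = STD_NORMAL.inv_cdf(1 - alpha)
    if tail == "left":
        return STD_NORMAL.cdf(-z - shift)
    if tail == "right":
        return 1 - STD_NORMAL.cdf(z - shift)
    raise ValueError("tail must be 'two', 'left' or 'right'")


power = compute_power(300, 290, 20 / sqrt(50), tail="two")
print("power: %.3f, beta: %.3f" % (power, 1 - power))
```

Under these assumptions the two-tailed example gives a power of roughly 0.94; the original `compute_power` may follow a different convention.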
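import numpy as np

# one-tailed z-test
h_0 = 0.8                            # proportion under the null hypothesis
h_1 = 0.75                           # proportion under the alternative
n = 100                              # sample size
se = np.sqrt(h_0 * (1 - h_0) / n)    # standard error of a proportion under H0
power = compute_power(h_0, h_1, se, tail="left")
print("power: %.3f, beta: %.3f" % (power, 1 - power))  # beta = Type II error rate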
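For a proportion, the normal approximation can be cross-checked against the exact binomial distribution. A sketch under the same inputs as the snippet (H0: p = 0.8, true p = 0.75, n = 100, alpha = 0.05, left tail): it places the rejection cutoff where the z-test would, at `h_0 - z * se`, converts it into a count of successes, and sums binomial probabilities with `math.comb`. The helper name `exact_left_tail_power` is hypothetical. Because the standard error under p = 0.75 (about 0.043) differs from the null one used for `se` (0.04), and counts are discrete, the two answers should be close but not identical.

```python
from math import ceil, comb, sqrt
from statistics import NormalDist


def exact_left_tail_power(h_0, h_1, n, alpha=0.05):
    """Exact binomial power of the left-tailed z-test for a proportion (hypothetical helper)."""
    se = sqrt(h_0 * (1 - h_0) / n)
    z = NormalDist().inv_cdf(1 - alpha)
    cutoff = h_0 - z * se              # reject H0 when p_hat < cutoff
    k_max = ceil(n * cutoff) - 1       # largest success count with k / n < cutoff
    # P(X <= k_max) for X ~ Binomial(n, h_1)
    return sum(comb(n, k) * h_1**k * (1 - h_1) ** (n - k) for k in range(k_max + 1))


power = exact_left_tail_power(0.8, 0.75, 100)
print("exact power: %.3f, beta: %.3f" % (power, 1 - power))
```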
from scipy.stats import norm


def critical_z(alpha=0.05, tail="two"):
    """
    Given a significance level alpha, return the (positive) critical z value.
    A two-tailed test splits alpha evenly between both tails.
    """
    if tail == "two":
        p = 1 - alpha / 2
    else:
        p = 1 - alpha
    return norm.ppf(p)
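`critical_z` returns a positive quantile of the standard normal distribution: z at 1 - alpha/2 for a two-tailed test and z at 1 - alpha otherwise, so a left-tailed test compares against its negative. The same values can be obtained without SciPy from `statistics.NormalDist` (Python 3.8+). The variant below is a sketch that returns a signed critical value for left-tailed tests; that signed convention and the name `critical_z_signed` are assumptions, not how the original function behaves.

```python
from statistics import NormalDist


def critical_z_signed(alpha=0.05, tail="two"):
    """Critical z value; negative for a left-tailed test (hypothetical variant)."""
    if tail == "two":
        return NormalDist().inv_cdf(1 - alpha / 2)  # reject if |z| exceeds this
    if tail == "right":
        return NormalDist().inv_cdf(1 - alpha)      # reject if z exceeds this
    if tail == "left":
        return NormalDist().inv_cdf(alpha)          # reject if z falls below this
    raise ValueError("tail must be 'two', 'left' or 'right'")


for tail in ("two", "right", "left"):
    print(tail, round(critical_z_signed(0.05, tail), 3))
# two 1.96
# right 1.645
# left -1.645
```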