Cameron Davidson-Pilon CamDavidsonPilon

import py4j
from pyspark.sql.functions import monotonically_increasing_id

# Very important to cache this: monotonically_increasing_id is not
# deterministic across recomputations, so the index must be materialized once.
df = df.select(monotonically_increasing_id().alias("index"), "*").cache()

MAX = 34359738368  # 2**35

def mod_binary_search(round, previous_winner, dataset):
    # round starts at 0, previous_winner starts at 0
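The body of `mod_binary_search` is truncated above. As a hedged sketch of the underlying idea (my reading, not the author's confirmed method: each round halves the cached index range until the single problematic row is isolated), a plain-Python analog of the search looks like:

```python
def find_bad_index(lo, hi, fails):
    """Binary search over [lo, hi) for the one index where `fails` triggers.

    `fails(a, b)` is a hypothetical predicate returning True when the failure
    reproduces on rows whose index falls in [a, b); in the Spark setting it
    would rerun the failing job on df.filter("{a} <= index AND index < {b}").
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(lo, mid):
            hi = mid  # the bad row is in the lower half
        else:
            lo = mid  # otherwise it must be in the upper half
    return lo
```

With `MAX = 2**35` as the upper bound, the bad row is isolated in about 35 rounds.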
@CamDavidsonPilon
CamDavidsonPilon / model.py
Last active July 9, 2017 11:03
This NN model performs worse than sklearn's `LogisticRegression` model (with default params too). Really I can only get about 57% on the test set, and LR gives me about 58%. What am I doing wrong? I've tried dozens of permutations of the network topology, optimisers, etc. Note the size of the data is (400k, 144), so I have lots of data.
# FFN to model game outcome
import numpy as np
import pandas as pd
from keras.layers import Dense
from keras.models import Sequential

if __name__ == "__main__":
    import sqlite3

    with sqlite3.connect("data/heroes.db") as conn:
from collections import deque

def sequence():
    queue = deque([3, 3, 3, 2])
    while True:
        value = queue.popleft()
        yield value
        queue.extend([3] * value + [2])
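To see what the queue-driven rule produces, here is a self-contained version that yields every term, with a peek at the start of the sequence:

```python
from collections import deque
from itertools import islice

def sequence():
    # pop a term, emit it, then append `term` threes followed by a two
    queue = deque([3, 3, 3, 2])
    while True:
        value = queue.popleft()
        yield value
        queue.extend([3] * value + [2])

first_terms = list(islice(sequence(), 12))
# the first four terms are the initial queue contents: 3, 3, 3, 2
```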
"""
C6 = sum of heads on the 60% coin after N flips
C5 = sum of heads on the 50% coin after N flips
P(C6 > C5 | N flips) >= 0.95 # solve for smallest N
C6 ~ Binomial(N, 0.6)
C5 ~ Binomial(N, 0.5)
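The smallest N can be computed exactly from the two binomial pmfs; a sketch using only the standard library (`math.comb` needs Python 3.8+):

```python
from math import comb

def binom_pmf(n, p):
    # pmf of Binomial(n, p) as a list indexed by k = 0..n
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def prob_c6_beats_c5(n):
    # P(C6 > C5) = sum_k P(C6 = k) * P(C5 <= k - 1)
    p5 = binom_pmf(n, 0.5)
    p6 = binom_pmf(n, 0.6)
    cdf5 = [sum(p5[:k + 1]) for k in range(n + 1)]
    return sum(p6[k] * cdf5[k - 1] for k in range(1, n + 1))

N = 1
while prob_c6_beats_c5(N) < 0.95:
    N += 1
# N is now the smallest number of flips satisfying the constraint
```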
%matplotlib inline
import numpy as np
from matplotlib import pyplot as plt

x = np.arange(32)
A = [5, 10, 14, 20, 24, 27]
B = [3, 4, 6, 7, 8]
C = [2, 12, 22]

plt.scatter(A, 3 * np.ones_like(A), c='k', marker='X', lw=0.5)
plt.scatter(B, 2 * np.ones_like(B), c='k', marker='X', lw=0.5)
plt.scatter(C, 1 * np.ones_like(C), c='k', marker='X', lw=0.5)
@CamDavidsonPilon
CamDavidsonPilon / econ_jobs_at_shopify.md
Last active November 10, 2017 23:07
At Shopify, we empower 500,000+ entrepreneurs all over the world. We’re looking for hard-working, passionate people to help us make commerce better. On the Shopify Decision Science team, it's our job to understand and measure the company using a statistical lens. We strongly feel that economists make a big impact here. Below are some jobs open…

Economists jobs on the Shopify Decision Science Team


Hi there,

At Shopify, we empower 500,000+ entrepreneurs all over the world. We’re looking for hard-working, passionate people to help us make commerce better. On the Shopify Decision Science team, it's our job to understand and measure the company using a statistical lens. We strongly feel that economists make a big impact here. Below are some jobs open for economists:

Finance Analytics Team

Shopify can be seen as a country with 500,000+ citizens (I suppose that makes us the government). From this perspective, we can ask questions like: what is the wealth inequality within Shopify, how are our population cohorts evolving over time, how much should we "tax" citizens, what is the predicted birth rate of the country, and so on. This team works closely with the CFO and the Finance team and sees its results reported publicly.

import numpy as np
from numpy.linalg import matrix_power
from matplotlib import pyplot as plt
import seaborn as sns

SIZE = 100
M = np.zeros((SIZE, SIZE))

# encoding rolls of a die
for y in range(SIZE):
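The snippet cuts off at the loop that fills `M`. One plausible completion (an assumption on my part: the chain tracks the running total of repeated fair-die rolls, with totals at or above SIZE lumped into one absorbing state) looks like:

```python
import numpy as np
from numpy.linalg import matrix_power

SIZE = 100
# states 0..SIZE-1 are running totals; state SIZE absorbs any total >= SIZE
M = np.zeros((SIZE + 1, SIZE + 1))
for y in range(SIZE):
    for roll in range(1, 7):
        M[y, min(y + roll, SIZE)] += 1 / 6
M[SIZE, SIZE] = 1.0

# each step adds at least 1 to the total, so after SIZE steps the
# chain starting from 0 must have been absorbed
dist_after = matrix_power(M, SIZE)[0]
```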
def concordance_index(event_times, predicted_scores, event_observed=None):
    """
    Calculates the concordance index (C-index) between two series
    of event times. The first is the real survival times from
    the experimental data, and the other is the predicted survival
    times from a model of some kind.

    The concordance index is a value between 0 and 1 where:
    0.5 is the expected result from random predictions,
    1.0 is perfect concordance, and
    0.0 is perfect anti-concordance.
    """
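The function body is cut off above. A naive O(n²) sketch matching the docstring's convention (longer observed survival should pair with a longer predicted time; this hypothetical version ignores censoring for brevity):

```python
def naive_concordance_index(event_times, predicted_times):
    # a pair is concordant when the subject with the shorter observed
    # time also has the shorter predicted time; prediction ties count half
    concordant, pairs = 0.0, 0
    n = len(event_times)
    for i in range(n):
        for j in range(i + 1, n):
            if event_times[i] == event_times[j]:
                continue  # tied observed times are skipped in this sketch
            pairs += 1
            shorter, longer = (i, j) if event_times[i] < event_times[j] else (j, i)
            if predicted_times[shorter] < predicted_times[longer]:
                concordant += 1.0
            elif predicted_times[shorter] == predicted_times[longer]:
                concordant += 0.5
    return concordant / pairs
```

Perfectly ordered predictions score 1.0, perfectly reversed ones 0.0.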
def falling_factorial(k, n):
    # product of the terms k * (k-1) * ... * (n+1)
    product = 1
    counter = k
    while counter > n:
        product *= counter
        counter -= 1
    return product

f = lambda k: 2 * falling_factorial(k, k - 100) - k**100
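The lambda compares 2·k·(k−1)···(k−99) against k^100, so f(k) ≥ 0 exactly when 100 draws with replacement from k items are more likely than not to be all distinct. On that reading (my assumption about the intent), the birthday-problem threshold can be found by scanning for the sign change; the definitions are repeated here for self-containment:

```python
def falling_factorial(k, n):
    # product of the terms k * (k-1) * ... * (n+1)
    product = 1
    counter = k
    while counter > n:
        product *= counter
        counter -= 1
    return product

f = lambda k: 2 * falling_factorial(k, k - 100) - k**100

# f(k) >= 0  <=>  P(100 draws from k items are all distinct) >= 1/2;
# the probability increases with k, so there is a single sign change
k = 101
while f(k) < 0:
    k += 1
```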