David Yerrington (dyerrington)

💭
I may be slow to respond.
View GitHub Profile
from statsmodels.stats.power import tt_ind_solve_power
from scipy.interpolate import interp1d
import matplotlib.pyplot as plt

def test_ttest_power_diff(mean, std, sample1_size=None, alpha=0.05, desired_power=0.8, mean_diff_percentages=[0.1, 0.05]):
    '''
    Calculates the power function for a given mean and std. The function plots a graph showing the comparison between the desired mean differences.
    :param mean: the desired mean
    :param std: the std value
    :param sample1_size: if None, it is assumed that both samples (first and second) will have the same size. The function then will
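The listing cuts this gist off mid-docstring. For a rough sense of how tt_ind_solve_power is typically used for this kind of comparison, here is a minimal sketch with the same parameter names; the body below is an assumption, not the original gist code.

# Hypothetical sketch: for each candidate mean difference, compute the
# per-group sample size needed to reach the desired power, then plot the comparison.
def ttest_power_sketch(mean, std, alpha=0.05, desired_power=0.8, mean_diff_percentages=(0.1, 0.05)):
    sizes = {}
    for pct in mean_diff_percentages:
        effect_size = (mean * pct) / std  # standardized shift (Cohen's d)
        sizes[pct] = tt_ind_solve_power(effect_size=effect_size, alpha=alpha,
                                        power=desired_power, ratio=1.0,
                                        alternative='two-sided')
    plt.bar([f"{pct:.0%}" for pct in sizes], list(sizes.values()))
    plt.xlabel("mean difference")
    plt.ylabel("required sample size per group")
    plt.show()
    return sizes

# Example: detecting a 5% vs. 10% shift on a metric with mean 100 and std 15
print(ttest_power_sketch(mean=100, std=15))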
@dyerrington
dyerrington / polar_plot.py
Created November 7, 2019 19:06
Basic implementation of a matplotlib polar plot using basic observations with multiple variables.
from math import pi
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1.inset_locator import inset_axes

# Set data
df = pd.DataFrame({
    # 'group': ['A', 'B', 'C', 'D'],
    'var1': [38, 1.5, 30, 4],
    'var2': [29, 10, 9, 34],
    'var3': [8, 39, 23, 24],
    'var4': [7, 31, 33, 14]
})
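The gist stops after the data setup; a minimal sketch of how a radar-style polar plot could be drawn from this frame is below (the angle and label handling is an assumption, not the original code).

# Hypothetical continuation: draw one observation (row 0) as a radar chart.
categories = list(df.columns)
values = df.loc[0].tolist()
values += values[:1]  # close the loop

angles = [n / float(len(categories)) * 2 * pi for n in range(len(categories))]
angles += angles[:1]

ax = plt.subplot(111, polar=True)
ax.plot(angles, values, linewidth=1, linestyle='solid')
ax.fill(angles, values, alpha=0.1)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(categories)
plt.show()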
@dyerrington
dyerrington / generate_udf_js_big_query.py
Created September 18, 2019 21:33
Python code that creates, essentially, a pivot from a nested BigQuery result set. Based on the original method in the Google BigQuery documentation.
# fighting == most common event type
def build_udf_prototype(event_types):
    null = "null"  # default all types to null in the UDF function
    PIVOT_FEATURES = str(
        {"col_" + event_name.replace("-", "_"): null for event_name in event_types.tolist()}
    ).replace("'null'", "null")
    SQL_RETURN = "STRUCT<"
    for event_type in event_types.tolist():
        event_type = event_type.replace("-", "_")
        SQL_RETURN += f"col_{event_type} INT64, "
    # assumed completion of the truncated listing: close the STRUCT signature
    SQL_RETURN = SQL_RETURN.rstrip(", ") + ">"
    return PIVOT_FEATURES, SQL_RETURN
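A hypothetical call, assuming event_types is a pandas Series of event-type names (the sample values and the return statement above are assumptions, not part of the original listing).

import pandas as pd

event_types = pd.Series(["fighting", "scoring-play", "penalty"])
pivot_features, sql_return = build_udf_prototype(event_types)
print(pivot_features)  # {'col_fighting': null, 'col_scoring_play': null, 'col_penalty': null}
print(sql_return)      # STRUCT<col_fighting INT64, col_scoring_play INT64, col_penalty INT64>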
@dyerrington
dyerrington / readme.md
Last active January 9, 2019 20:59
This is a very basic data generator to test recommender systems. A future version may simulate the actual sparseness of ratings data with a simple bootstrap function, but for now the numpy generator does the job.

RecData

To use this snippet, install faker:

pip install faker
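The generator itself is not shown in this listing; as a rough illustration, a faker-plus-numpy ratings generator might look like the sketch below (the function name, rating scale, and shapes are assumptions, not the gist's code).

import numpy as np
from faker import Faker

# Hypothetical sketch of a RecData-style generator: a dense matrix of random
# 1-5 star ratings with faker-generated user names.
def generate_ratings(n_users=50, n_items=20, seed=42):
    fake = Faker()
    rng = np.random.default_rng(seed)
    users = [fake.name() for _ in range(n_users)]
    items = [f"item_{i}" for i in range(n_items)]
    ratings = rng.integers(1, 6, size=(n_users, n_items))
    return users, items, ratings

users, items, ratings = generate_ratings()
print(users[0], ratings[0, :5])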

Parse Jupyter

This is a basic class that makes it convenient to parse notebooks. I built a larger version of this that was used for clustering documents to create semantic indices that linked related content together for a personal project. You can use this to parse notebooks for things like NLP or preprocessing.

Usage

parser = ParseJupyter("./Untitled.ipynb")
parser.get_cells(source_only=True, source_as_string=True)
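For reference, a minimal sketch of a notebook parser that would satisfy the usage above, using only the standard json module (the class body is an assumption, not the gist's implementation).

import json

class ParseJupyter:
    """Minimal sketch: read a .ipynb file (JSON) and return its cells,
    optionally as a single source string."""

    def __init__(self, path):
        with open(path, "r", encoding="utf-8") as f:
            self.notebook = json.load(f)

    def get_cells(self, source_only=False, source_as_string=False):
        cells = self.notebook.get("cells", [])
        if not source_only:
            return cells
        sources = ["".join(cell.get("source", [])) for cell in cells]
        return "\n".join(sources) if source_as_string else sources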
import tweepy
import wget
import os

oauth = {
    "consumer_key": "",
    "consumer_secret": ""
}
access = {
    # assumed completion of the truncated listing, using tweepy's standard credential names
    "access_token": "",
    "access_token_secret": ""
}
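The rest of this gist is not shown; a hedged sketch of how the imports and credentials might fit together to pull media from a timeline follows (the screen name, output folder, and media handling are assumptions; the tweepy calls follow its standard OAuth1 flow).

# Hypothetical continuation: authenticate, read a user's recent tweets, and
# download any attached media with wget. "some_account" is a placeholder.
auth = tweepy.OAuthHandler(oauth["consumer_key"], oauth["consumer_secret"])
auth.set_access_token(access["access_token"], access["access_token_secret"])
api = tweepy.API(auth)

os.makedirs("media", exist_ok=True)
for status in api.user_timeline(screen_name="some_account", count=50):
    for media in status.entities.get("media", []):
        wget.download(media["media_url"], out="media")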
@dyerrington
dyerrington / my_little_pony_lstm.py
Created July 16, 2018 05:57
As a point of comparison with the default Nietzsche example from the Keras repo, this little experiment swaps out the dataset for forum comments from the My Little Pony subreddit.
'''Example script to generate text from Nietzsche's writings.
At least 20 epochs are required before the generated text
starts sounding coherent.
It is recommended to run this script on GPU, as recurrent
networks are quite computationally intensive.
If you try this script on new data, make sure your corpus
has at least ~100k characters. ~1M is better.
'''
from __future__ import print_function
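Only the docstring header of the script appears in this listing. For orientation, here is a condensed sketch of the character-level setup the description implies, with the Nietzsche download swapped for a local file of subreddit comments (the file name, hyperparameters, and optimizer are assumptions).

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM

# Hypothetical data swap: read forum comments from a local text file instead
# of downloading nietzsche.txt. "mlp_comments.txt" is a placeholder path.
with open("mlp_comments.txt", encoding="utf-8") as f:
    text = f.read().lower()

chars = sorted(set(text))
char_indices = {c: i for i, c in enumerate(chars)}

maxlen, step = 40, 3
sentences = [text[i: i + maxlen] for i in range(0, len(text) - maxlen, step)]
next_chars = [text[i + maxlen] for i in range(0, len(text) - maxlen, step)]

x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

# Same single-LSTM architecture as the Keras Nietzsche example.
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars), activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.fit(x, y, batch_size=128, epochs=20)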