David Yerrington dyerrington

@dyerrington
dyerrington / preprocess_corpus.py
Last active August 29, 2015 14:21
Preprocessing pipeline for documents with Gensim. Easily manage text data to build data frames, run classification, etc.
import numpy as np, pandas as pd, os, seaborn as sns, codecs
from gensim import corpora, models, similarities
from gensim.parsing.preprocessing import STOPWORDS
class preprocess_corpus(object):
    files = []
    dirs = []
    def __init__(self, dir, directory=False, stopwords_file=False):
dyerrington / probability.py
Created May 30, 2015 20:04
Simple probability and stat functions
import bisect
import random
def Mean(t):
    """Computes the mean of a sequence of numbers.
    Args:
        t: sequence of numbers
    Returns:
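import numpy as np
import scipy.stats
def mean_confidence_interval(data, confidence=0.95):
    # Two-sided confidence interval for the mean, based on Student's t distribution.
    a = 1.0 * np.array(data)
    n = len(a)
    m, se = np.mean(a), scipy.stats.sem(a)
    # Use the public ppf (inverse CDF) rather than the private _ppf.
    h = se * scipy.stats.t.ppf((1 + confidence) / 2., n - 1)
    return m, m - h, m + h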
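A quick usage sketch for the confidence-interval helper follows. It repeats the function so the block runs on its own, calls the public `ppf` method, and uses a made-up five-point sample that is not part of the gist.

```python
import numpy as np
import scipy.stats


def mean_confidence_interval(data, confidence=0.95):
    # Two-sided t-based confidence interval for the sample mean.
    a = 1.0 * np.array(data)
    n = len(a)
    m, se = np.mean(a), scipy.stats.sem(a)
    h = se * scipy.stats.t.ppf((1 + confidence) / 2.0, n - 1)
    return m, m - h, m + h


# Invented sample, for illustration only.
m, lo, hi = mean_confidence_interval([1, 2, 3, 4, 5])
print(m, lo, hi)
```

The interval is symmetric around the mean, and a higher `confidence` value widens it.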
dyerrington / auto_coefficients.py
Created October 9, 2015 07:28
Singular linear coefficients
def auto_coefficients(df):
    sorted_coefs = list()
    coefs = df.corr()
    for row_index, row_values in enumerate(coefs.values):
        for col_index, col_value in enumerate(row_values):
            if coefs.columns[row_index] == coefs.columns[col_index]:
dyerrington / pandas_leftjoin.py
Created November 3, 2015 04:56
Basic left join example using pandas data frames.
import pandas as pd
test_1 = pd.DataFrame([['Test 1', 'Dogs', 'Cats'], ['Test 2', 'Fogs', 'Squids']], columns=['Company', 'A1', 'A2'])
test_2 = pd.DataFrame([['Test 1', 4, 5, 6], ['Test 1', 6, 3, 1], ['Test 1', 3, 3, 1], ['Test 2', 2, 3, 4], ['Test 2', 7, 8, 9]], columns=['Company', 'V1', 'V2', 'V3'])
# Both frames share the key column, so on='Company' is enough.
pd.merge(left=test_2, right=test_1, how='left', on='Company')
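One way to sanity-check a left join like this is sketched below. It reuses the two frames from the gist and adds the `validate` and `indicator` options of `pd.merge`; using them here is a suggestion, not something the gist does.

```python
import pandas as pd

test_1 = pd.DataFrame(
    [["Test 1", "Dogs", "Cats"], ["Test 2", "Fogs", "Squids"]],
    columns=["Company", "A1", "A2"],
)
test_2 = pd.DataFrame(
    [["Test 1", 4, 5, 6], ["Test 1", 6, 3, 1], ["Test 1", 3, 3, 1],
     ["Test 2", 2, 3, 4], ["Test 2", 7, 8, 9]],
    columns=["Company", "V1", "V2", "V3"],
)

# validate raises if a Company appears more than once on the right side;
# indicator adds a _merge column saying where each row found a match.
merged = pd.merge(
    test_2, test_1, how="left", on="Company",
    validate="many_to_one", indicator=True,
)
print(merged)
```

Because every `Company` in `test_2` exists in `test_1`, the left join keeps all five rows and marks each one as `both` in `_merge`.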
dyerrington / grouper.py
Last active November 30, 2015 05:02
When I work with dates, it’s nice to have them as actual “datetime” values rather than plain strings. When you first import the CSV, the “Time” feature has the dtype “object”. Once we convert it to a “datetime” type, we can use Pandas Grouper() to group by a time period (i.e., days, weeks, months, or years).
import pandas as pd
# Step 1: convert Time after loading
ufo = pd.read_csv('https://raw.githubusercontent.com/sinanuozdemir/SF_DAT_17/master/data/ufo.csv') # can also read CSVs directly from the web!
ufo['Time'] = pd.to_datetime(ufo['Time'])
# Step 2: group by unique days
ufo.groupby([pd.Grouper(key='Time', freq='1D')])[['Shape Reported']].count()
# Alternatively, you can concatenate Year, Month, and Day into a new feature and group by that. As an engineer, I much prefer to work with strict types and leverage the existing Grouper method.
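The two grouping approaches mentioned above can be compared on a tiny frame. This is a sketch under stated assumptions: the three rows are invented stand-ins for the UFO data, and the `Day` column name is hypothetical.

```python
import pandas as pd

# Invented sample rows; the real dataset is not downloaded here.
ufo_sample = pd.DataFrame({
    "Time": ["6/1/1930 22:00", "6/1/1930 23:30", "6/3/1930 20:00"],
    "Shape Reported": ["TRIANGLE", "OTHER", "OVAL"],
})
ufo_sample["Time"] = pd.to_datetime(ufo_sample["Time"], format="%m/%d/%Y %H:%M")

# Approach 1: Grouper on the datetime column, one bin per calendar day.
by_grouper = ufo_sample.groupby(
    pd.Grouper(key="Time", freq="1D")
)["Shape Reported"].count()

# Approach 2: build a year-month-day string key and group on that.
ufo_sample["Day"] = ufo_sample["Time"].dt.strftime("%Y-%m-%d")
by_string = ufo_sample.groupby("Day")["Shape Reported"].count()

print(by_grouper)
print(by_string)
```

One practical difference: the Grouper version emits a row for every day in the range, including days with no sightings (count 0), while the string key only lists days that occur in the data.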
dyerrington / collision_layer
Created August 15, 2013 18:22
Detect obstacles based on layer properties.
-- DETECT OBSTACLES ------------------------------------------------------------
local obstacle = function(level, locX, locY)
    local detect = mte.getTileProperties({level = level, locX = locX, locY = locY})
    for i = 1, #detect, 1 do
        -- I forget why I check the tile property...
        if detect[i].tile then
            -- is this layer marked solid?
            if layers[i].properties and layers[i].properties.solid and detect[i].tile > 0 then
                detect = "stop"
                player:pause()
dyerrington / controls_dialog
Last active December 21, 2015 03:38
Dealing with dialog actions in Lua
local ActionKey = display.newRect(controlGroup, DpadBack.x + 750, DpadBack.y - 37, 100, 100)
local function move( event )
    if event.phase == "ended" or event.phase == "cancelled" then
        movement = nil
    elseif player.talking then
        return false;
    elseif event.target.id then
        movement = event.target.id
        player.lastMovement = event.target.id
dyerrington / obj_player
Created January 24, 2014 02:26
Basic setup for GML top-down player control
if(keyboard_check(vk_nokey)) {
    image_speed = 0;
}
if(keyboard_check(vk_left)) {
    sprite_index = spr_right;
    image_speed = 0.2;
    image_xscale = 1;
    x -= 4;
}
dyerrington / comprehensive_spray.io.conf
Created January 15, 2016 21:20
After many problems, it has been difficult to pin down which Akka and Spray settings affect performance. This is an attempt to put all the settings that seem to matter in one place.
akka.actor {
  creation-timeout = 20s
  default-dispatcher {
    throughput = 20
    executor = "fork-join-executor"
    fork-join-executor {
      parallelism-min = 16
      parallelism-factor = 2.0
      parallelism-max = 16
    }