David Yerrington dyerrington

dyerrington / pandas_leftjoin.py
Created November 3, 2015 04:56
Basic left join example using pandas data frames.
import pandas as pd
test_1 = pd.DataFrame([['Test 1', 'Dogs', 'Cats'], ['Test 2', 'Fogs', 'Squids']], columns=['Company', 'A1', 'A2'])
test_2 = pd.DataFrame([['Test 1', 4, 5, 6], ['Test 1', 6, 3, 1], ['Test 1', 3, 3, 1], ['Test 2', 2, 3, 4], ['Test 2', 7, 8, 9]], columns=['Company', 'V1', 'V2', 'V3'])
# Left join: keep every row of test_2 and attach the matching A1/A2 columns from test_1
pd.merge(left=test_2, right=test_1, how='left', left_on='Company', right_on='Company')
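A small variation on the join above, as a sketch rather than part of the original gist: the frames `lookup` and `values` and the extra 'Test 3' row are made up for illustration. It shows what a left join does with a key that has no match on the right side, using the `indicator` option of `merge`.

```python
import pandas as pd

# Hypothetical data: 'Test 3' has no matching row in the lookup table
lookup = pd.DataFrame(
    [["Test 1", "Dogs", "Cats"], ["Test 2", "Fogs", "Squids"]],
    columns=["Company", "A1", "A2"],
)
values = pd.DataFrame(
    [["Test 1", 4], ["Test 2", 2], ["Test 3", 9]],
    columns=["Company", "V1"],
)

# indicator=True adds a "_merge" column telling where each row came from
joined = values.merge(lookup, how="left", on="Company", indicator=True)
print(joined)
```

Rows without a match keep their left-hand values and get NaN in the right-hand columns, which is the main thing to watch for after a left join.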
dyerrington / simple_scaler.py
Created November 3, 2015 05:02
Another simple scaler found in scipy
from scipy.interpolate import interp1d
# params: [from_min, from_max], [to_min, to_max]
m = interp1d([1,100],[1,7])
m(99.234)
# output: array(6.9535757575757575)
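With only two points, the mapping above is plain linear rescaling, so the same arithmetic can be sketched without scipy. The helper name `rescale` is hypothetical and not part of the gist. One difference to keep in mind: this formula extrapolates for inputs outside the source range, whereas `interp1d` raises an error there by default.

```python
def rescale(x, src, dst):
    """Linearly map x from the range src=(min, max) to the range dst=(min, max)."""
    (s0, s1), (d0, d1) = src, dst
    return d0 + (x - s0) * (d1 - d0) / (s1 - s0)


# Same ranges as the interp1d call: [1, 100] -> [1, 7]
value = rescale(99.234, (1, 100), (1, 7))
print(value)
```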
dyerrington / comprehensive_spray.io.conf
Created January 15, 2016 21:20
After running into many problems, I found it difficult to tell which Akka and Spray settings actually affect performance. This is an attempt to put all the settings that seem to matter in one place.
akka.actor {
  creation-timeout = 20s
  default-dispatcher {
    throughput = 20
    executor = "fork-join-executor"
    fork-join-executor {
      parallelism-min = 16
      parallelism-factor = 2.0
      parallelism-max = 16
    }
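A quick sketch of how the three `fork-join-executor` values interact, written in Python only to show the arithmetic. The function name is hypothetical. It follows the rule given in the Akka dispatcher documentation: the pool size is ceil(available processors * factor), bounded by `parallelism-min` and `parallelism-max`.

```python
import math


def fork_join_parallelism(cores, factor, minimum, maximum):
    """Pool size: ceil(cores * factor), clamped to [minimum, maximum]."""
    return min(max(math.ceil(cores * factor), minimum), maximum)


# With min and max both set to 16, the factor has no effect on the result
for cores in (1, 4, 8, 64):
    print(cores, fork_join_parallelism(cores, 2.0, 16, 16))
```

So with the values above the pool is pinned at 16 threads regardless of core count; the factor only matters when min and max leave room between them.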
dyerrington / jdbc_type_mapping.scala
Created January 20, 2016 19:13
I know there are better ways to handle this problem explicitly, but I do a lot of prototyping and it's helpful to have JDBC dynamically map to equivalent Scala types. So I read the docs about Java's java.sql.Types (https://docs.oracle.com/javase/6/docs/api/constant-values.html#java.sql.Types) and set up this basic JDBC SQL type to Scala mapping m…
def getJDBCResults(sql:String = "SHOW PROCESSLIST()") : List[Map[String,Any]] = {
  // classOf[com.mysql.jdbc.Driver]
  val conn = DriverManager.getConnection(s"${this.dsn}?user=${this.dbuser}&password=${this.dbpassword}")
  try {
    // Configure to be Read Only
    val statement = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)
    val rs = statement.executeQuery(sql)
dyerrington / print_classpaths.scala
Created January 21, 2016 21:08
Easily print classpath and jar files of loaded classes in scala.
def urlses(cl: ClassLoader): Array[java.net.URL] = cl match {
  case null => Array()
  case u: java.net.URLClassLoader => u.getURLs() ++ urlses(cl.getParent)
  case _ => urlses(cl.getParent)
}
val urls = urlses(getClass.getClassLoader)
urls.filterNot(_.toString.contains("ivy")).foreach(println)
dyerrington / connect4.py
Created March 18, 2016 10:38
This is a quick framework for a game of connect4 that plays itself. While working on it, I thought of some neat ways to write an AI for it, perhaps one driven by a machine learning process.
import numpy as np, random, time, itertools
from itertools import cycle
class connect4:
    # These values will override dynamic class attributes
    defaults = {
        'board_matrix_size': (7,6), # Standard connect-4 board size is 7x6
        'game_simulate_steps': 3, # For testing
        'game_simulate_delay': 1, # Delay between game simulation steps
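Since the class above is cut off, here is a minimal standalone sketch of the "game that plays itself" idea. It is not the gist's implementation: the function names, the 6-row by 7-column orientation, and the purely random move choice are all assumptions made for illustration.

```python
import random
from itertools import cycle

import numpy as np

ROWS, COLS = 6, 7  # assumed orientation: 6 rows, 7 columns


def drop_piece(board, col, player):
    """Put player's piece in the lowest empty cell of col; return the row, or None if full."""
    empty_rows = np.flatnonzero(board[:, col] == 0)
    if empty_rows.size == 0:
        return None
    row = int(empty_rows[-1])  # largest row index is the bottom of the board
    board[row, col] = player
    return row


def has_four(board, player):
    """True if player has four in a row horizontally, vertically, or diagonally."""
    rows, cols = board.shape
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                cells = [(r + i * dr, c + i * dc) for i in range(4)]
                if all(0 <= rr < rows and 0 <= cc < cols and board[rr, cc] == player
                       for rr, cc in cells):
                    return True
    return False


def simulate(seed=0):
    """Two random players alternate until one wins (returns 1 or 2) or the board fills (0)."""
    rng = random.Random(seed)
    board = np.zeros((ROWS, COLS), dtype=int)
    for player in cycle((1, 2)):
        open_cols = [c for c in range(COLS) if board[0, c] == 0]
        if not open_cols:
            return board, 0
        drop_piece(board, rng.choice(open_cols), player)
        if has_four(board, player):
            return board, player


board, winner = simulate(seed=0)
print(board)
print("winner:", winner)
```

Swapping `rng.choice` for a function that scores candidate columns would be the natural place to plug in a smarter player.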
import pandas as pd
data = [
    ["2001-10-1", 2, 3, 4],
    ["2001-11-1", 2, 3, 4],
    ["2001-05-1", 2, 3, 4],
    ["2001-05-1", 2, 3, 4],
    ["2001-03-1", 2, 3, 4]
]
df = pd.DataFrame(data, columns=["date", "a", "b", "c"])
# Parse the string column into a datetime64 column
df["my_date"] = pd.to_datetime(df["date"])
df.dtypes
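A related sketch, using made-up values rather than the data above: once strings are parsed with `pd.to_datetime`, they sort chronologically, and `errors="coerce"` turns unparseable entries into `NaT` instead of raising.

```python
import pandas as pd

# Hypothetical input with one value that is not a date
raw = pd.Series(["2001-10-01", "2001-05-01", "not a date"])

# errors="coerce" replaces anything unparseable with NaT
parsed = pd.to_datetime(raw, errors="coerce")

# Drop the failures and sort chronologically
ordered = parsed.dropna().sort_values()
print(parsed.dtype)
print(ordered)
```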
import pandas as pd
# Careful: rendering too many rows/columns in a single browser session can drive CPU usage up
pd.options.display.max_rows = 999
pd.options.display.max_columns = 999
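If the large limits are only needed for one output, `pd.option_context` can scope them to a block instead of changing them for the whole session. A minimal sketch:

```python
import pandas as pd

default_rows = pd.get_option("display.max_rows")

# The raised limits apply only inside the with-block
with pd.option_context("display.max_rows", 999, "display.max_columns", 999):
    inside = pd.get_option("display.max_rows")
    # print or display the wide/long frame here

after = pd.get_option("display.max_rows")
print(inside, after)
```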
dyerrington / interpolate_missing_object_values.py
Last active May 4, 2016 18:53
One of the problems with interpolate() in Pandas is that it only works on continuous (numeric) data. Using ffill(), you can fill object columns as well. By iterating over groupby subsets, we can fill in missing categorical / object-type cells in our dataframe based on those subsets.
import pandas as pd, numpy as np
data = [["blabla", "234234234", "yoyoyo", "Super Store235"],
[np.nan, np.nan, np.nan, "Super Store"],
[np.nan, np.nan, np.nan, "Super Store"],
["yo yo yo", 456, 789, "Super Store"],
[np.nan, np.nan, np.nan, "Super Store"],
[np.nan, np.nan, np.nan, "Super Store"],
[123, 456, 789, "Super Store2"],
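The listing above is cut off, so here is a separate minimal sketch of the idea with made-up column names and values. It uses `groupby(...).ffill()` directly rather than iterating over the groups, which is an assumption about how one might write it and not necessarily how the gist does it.

```python
import numpy as np
import pandas as pd

# Hypothetical data: store B starts with a missing label
df = pd.DataFrame(
    {
        "store": ["A", "A", "B", "B", "B"],
        "label": ["red", np.nan, np.nan, "blue", np.nan],
    }
)

# Plain ffill leaks store A's last label into store B's first row
df["plain_ffill"] = df["label"].ffill()

# Grouped ffill only carries values forward within the same store
df["grouped_ffill"] = df.groupby("store")["label"].ffill()
print(df)
```

The third row is the one to compare: plain `ffill()` fills it with "red" from the other store, while the grouped version leaves it missing.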
dyerrington / basic_linear_regression.py
Created May 4, 2016 23:56
End to end example of Pandas with sklearn LinearRegression using test data (diabetes data from sklearn.datasets)
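import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_diabetes
# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split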
# We load some test data
data = load_diabetes()
# Put it in a data frame for future reference -- or you work from your own dataframe
df = pd.DataFrame(data['data'])
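Since the listing stops after building the dataframe, here is one way the remaining steps could look. This is a sketch under assumptions, not the rest of the gist: the 25% test split, the fixed `random_state`, and the variable names are my own choices.

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

data = load_diabetes()
df = pd.DataFrame(data["data"], columns=data["feature_names"])

# Hold out a quarter of the rows for evaluation (split size is an arbitrary choice)
X_train, X_test, y_train, y_test = train_test_split(
    df, data["target"], test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# score() returns R^2 on the held-out rows
r2 = model.score(X_test, y_test)
print("coefficients:", dict(zip(df.columns, model.coef_.round(1))))
print("R^2 on test split:", round(r2, 3))
```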