Timothy Renner timothyrenner

@timothyrenner
timothyrenner / pyspark_udf_definition.py
Created January 30, 2019 15:57
Pyspark UDF Definition
import numpy as np

def predict(*features):
    """ Performs a prediction on the features.

    Parameters
    ----------
    features : List[float]
        The feature values the model needs to make a prediction.
timothyrenner / average_agreement.py
Created September 13, 2018 15:39
Average Agreement, Attempt 3
def average_agreement(list1, list2, max_depth):
    # Empty lists evaluate to false.
    if (not list1) or (not list2):
        return 0.0
    ### NEW CODE ###
    # Truncate the depth
    max_list_len = max(len(list1), len(list2))
    max_depth = min(max_depth, max_list_len)
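The preview stops right after the depth truncation. Below is a minimal sketch of how the whole function might read, stitched together from the lines visible across the three attempts. The per-depth score (twice the intersection size divided by the combined set sizes), the final mean, and the guard for a non-positive depth are assumptions for illustration and are not taken from the gist.

```python
def average_agreement(list1, list2, max_depth):
    # Empty lists evaluate to false.
    if (not list1) or (not list2):
        return 0.0

    # Truncate the depth so the loop never runs past the longer list.
    max_list_len = max(len(list1), len(list2))
    max_depth = min(max_depth, max_list_len)

    agreements = []
    for depth in range(1, max_depth + 1):
        set1 = set(list1[:depth])
        set2 = set(list2[:depth])
        intersection = set1 & set2
        # ASSUMPTION: agreement at this depth is the normalized overlap
        # 2 * |A & B| / (|A| + |B|); the gist preview does not show this line.
        agreements.append(2 * len(intersection) / (len(set1) + len(set2)))

    # ASSUMPTION: a non-positive max_depth yields no terms; report 0.0.
    if not agreements:
        return 0.0
    return sum(agreements) / len(agreements)
```

Without the truncation, a very large `max_depth` makes the loop keep recomputing the same full-list sets, which is presumably what the `deadline` setting in the accompanying test was working around.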
timothyrenner / average_agreement_test.py
Created September 13, 2018 15:30
Average Agreement Test, Attempt 2
from hypothesis import given, settings # <- NEW CODE
from hypothesis.strategies import lists, integers

@given(
    list1=lists(integers(min_value=1)),
    list2=lists(integers(min_value=1)),
    depth=integers(min_value=1)
)
@settings(deadline=300) # <- NEW CODE
def test_average_agreement_properties(list1, list2, depth):
timothyrenner / average_agreement.py
Last active September 13, 2018 15:14
Average Agreement, Attempt 1
def average_agreement(list1, list2, max_depth):
    agreements = []
    for depth in range(1, max_depth+1):
        set1 = set(list1[:depth])
        set2 = set(list2[:depth])
        intersection = set1 & set2
timothyrenner / average_agreement.py
Created September 13, 2018 15:14
Average Agreement, Attempt 2
def average_agreement(list1, list2, max_depth):
    ### NEW CODE ###
    # Empty lists evaluate to false.
    if (not list1) or (not list2):
        return 0.0
    agreements = []
    for depth in range(1, max_depth+1):
timothyrenner / average_agreement_test.py
Last active September 13, 2018 14:47
Average Agreement Test, Attempt 1
from hypothesis import given
from hypothesis.strategies import lists, integers

@given(
    list1=lists(integers(min_value=1)),
    list2=lists(integers(min_value=1)),
    depth=integers(min_value=1)
)
def test_average_agreement_properties(list1, list2, depth):
timothyrenner / elastic_async_scan.py
Last active March 13, 2018 12:40
Example of a coroutine that uses the scroll API to allow async scrolling for Elasticsearch queries.
import aiohttp
import asyncio
import json
from toolz import get_in, assoc, concat
url = "http://localhost:9200"
index = "stocks"
doc_type = "stock"
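The preview ends before the coroutine itself. As a rough illustration of the scroll pattern the description refers to, here is a sketch of an async generator that opens a scroll with an initial search and then keeps requesting `/_search/scroll` until a page comes back empty. It does not use `aiohttp`: the `post` argument is a hypothetical coroutine (path and JSON body in, parsed JSON out) standing in for the HTTP call, and `scan` and `_demo` are names invented for this example rather than taken from the gist.

```python
import asyncio


async def scan(post, index, query, scroll="1m"):
    """Yield every hit for `query`, one scroll page at a time.

    `post` is a HYPOTHETICAL coroutine: post(path, body) -> parsed JSON dict.
    In real use it would wrap an HTTP POST against the Elasticsearch URL.
    """
    # The first request opens the scroll context and returns the first page.
    page = await post(f"/{index}/_search?scroll={scroll}", query)
    while page["hits"]["hits"]:
        for hit in page["hits"]["hits"]:
            yield hit
        # Each follow-up request passes the latest scroll id back.
        page = await post(
            "/_search/scroll",
            {"scroll": scroll, "scroll_id": page["_scroll_id"]},
        )


async def _demo():
    # Canned responses standing in for a live cluster (illustrative data only).
    pages = iter([
        {"_scroll_id": "a", "hits": {"hits": [{"_id": 1}, {"_id": 2}]}},
        {"_scroll_id": "b", "hits": {"hits": [{"_id": 3}]}},
        {"_scroll_id": "c", "hits": {"hits": []}},
    ])

    async def fake_post(path, body):
        return next(pages)

    query = {"query": {"match_all": {}}}
    return [hit["_id"] async for hit in scan(fake_post, "stocks", query)]


collected = asyncio.run(_demo())
```

Injecting the transport keeps the pagination logic testable without a running cluster; a real `post` would also need error handling and would ideally clear the scroll context when done.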
timothyrenner / make_haunted_places_map.py
Last active February 25, 2018 23:57
Building a map of haunted places using shapefiles and cartopy.
import pandas as pd
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import cartopy.io.shapereader as shpreader
from shapely.ops import cascaded_union
non_conus_states = {'VI', 'AK', 'HI', 'PR', 'GU', 'MP', 'AS'}
timothyrenner / gorilla-to-markdown.clj
Created December 5, 2015 14:07
Gorilla To Markdown
(ns gorilla-to-markdown.core
  (:require [clojure.string :as str]
            [cheshire.core :as json]))

(defn- process-output-json
  "Builds the HTML from the output JSON.
  Assumes the JSON has already been parsed, and that j is a map."
  [j]
  (case (:type j)
    "html" (:content j)
timothyrenner / groupByTiming.jl
Created December 1, 2014 21:54
Comparison Between Python Pandas and Julia DataFrames GroupBy Operations
using DataFrames
keys = rand(1:100000, 500000);
values = randn(length(keys));
df = DataFrame();
df[:KEY] = keys;
df[:VALUE] = values;
@time by(df, :KEY, x -> sum(x[:VALUE]));