@ramhiser
ramhiser / spark-master-controller.log
Created September 6, 2016 19:25
Kubernetes + Spark Exception: java.net.UnknownHostException: metadata
16/09/06 19:00:49 INFO Master: Registered signal handlers for [TERM, HUP, INT]
16/09/06 19:00:50 INFO SecurityManager: Changing view acls to: root
16/09/06 19:00:50 INFO SecurityManager: Changing modify acls to: root
16/09/06 19:00:50 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/09/06 19:00:51 INFO Slf4jLogger: Slf4jLogger started
16/09/06 19:00:51 INFO Remoting: Starting remoting
16/09/06 19:00:51 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkMaster@spark-master:7077]
16/09/06 19:00:51 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
16/09/06 19:00:51 INFO Master: Starting Spark master at spark://spark-master:7077
16/09/06 19:00:51 INFO Master: Running Spark version 1.5.2
ramhiser / bayesian-billiards.r
Last active August 1, 2016 20:38
Bayesian Billiards simulation in R based on Shiny app from Jason Bryer
# Problem Definition: https://priorprobability.com/2014/04/27/bayesian-billiards/
# Referenced Shiny app: http://jason.bryer.org/posts/2016-02-21/Bayes_Billiards_Shiny.html
library(dplyr)
set.seed(424242)
true_p <- runif(1)
num_draws <- 1000
draws <- sample(c(0, 1), num_draws, replace=TRUE, prob=c(1-true_p, true_p))
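The R preview stops right after drawing the 0/1 outcomes. Under a uniform prior on p, the posterior after k ones in n draws is Beta(1 + k, 1 + n - k), so the estimate follows directly from the counts. The sketch below is a Python illustration of that conjugate update, not the gist's own continuation; `billiards_posterior` and `posterior_summary` are hypothetical names, and Python's RNG will not reproduce R's numbers for seed 424242.

```python
import math
import random


def billiards_posterior(draws):
    """Return Beta(alpha, beta) posterior parameters for p, starting from a uniform Beta(1, 1) prior."""
    successes = sum(draws)
    failures = len(draws) - successes
    return 1 + successes, 1 + failures


def posterior_summary(alpha, beta):
    """Posterior mean and standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    sd = math.sqrt(alpha * beta / (total ** 2 * (total + 1)))
    return mean, sd


# Same setup as the R snippet: unknown p, then 1000 Bernoulli(p) draws.
rng = random.Random(424242)
true_p = rng.random()
draws = [1 if rng.random() < true_p else 0 for _ in range(1000)]

alpha, beta = billiards_posterior(draws)
mean, sd = posterior_summary(alpha, beta)
print(f"true p = {true_p:.3f}, posterior mean = {mean:.3f} (sd {sd:.3f})")
```

With 1000 draws the posterior standard deviation is at most about 0.016, so the posterior mean should land close to the true p.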
ramhiser / cloudvision.py
Last active February 19, 2016 21:51
Simple script to generate JSON to send to Google CloudVision API
# Did this without *requests* to avoid dependencies.
import urllib
import urllib2
import argparse
import base64
import json
API_URL = 'https://vision.googleapis.com/v1/images:annotate'
API_KEY = 'FLUFFY BUNNIES'
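The preview only shows the imports and constants. A minimal Python 3 sketch of the same idea follows, using `urllib.request` in place of the Python 2 `urllib`/`urllib2` modules and still avoiding *requests*. The body layout (`requests`, `image.content` holding base64 bytes, `features` with `type` and `maxResults`) follows the public Cloud Vision v1 request format as I understand it; `build_annotate_body` and `build_request` are hypothetical helpers, and passing the key as a `key` query parameter is an assumption worth checking against the API reference.

```python
import base64
import json
import urllib.parse
import urllib.request

API_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_body(image_bytes, feature_type="LABEL_DETECTION", max_results=10):
    """JSON body for one image and one feature in an images:annotate call."""
    body = {
        "requests": [
            {
                # Raw image bytes travel base64-encoded in the "content" field.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": feature_type, "maxResults": max_results}],
            }
        ]
    }
    return json.dumps(body)


def build_request(body, api_key):
    """Prepare (but do not send) a POST with the API key as the `key` query parameter."""
    url = API_URL + "?" + urllib.parse.urlencode({"key": api_key})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
```

To send it, read an image with `open(path, "rb")`, pass the bytes through `build_annotate_body`, and hand the resulting request to `urllib.request.urlopen`.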
ramhiser / remove_substrings.py
Created December 15, 2015 16:07
Python snippet to remove any substrings of other strings in the list
from collections import defaultdict
def remove_substrings(words):
    """Remove any substrings of other strings in the list.
    O(n) solution from...
    Source: http://stackoverflow.com/a/24049808/234233
    """
    longest = defaultdict(str)
    for word in words: # O(n)
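The preview cuts off inside the loop, so the linked O(n) technique is not reproduced here. As a point of comparison, this is a deliberately simpler pairwise version that swaps in a different approach: it checks every word against every other, which costs O(n² · L) for words of length L but is easy to verify. Collapsing duplicates first is a choice of this sketch, not something the preview specifies.

```python
def remove_substrings(words):
    """Drop every word that occurs inside a different word in the list.

    Simple pairwise check, O(n^2 * L); not the O(n) method from the linked answer.
    """
    unique = list(dict.fromkeys(words))  # de-duplicate while keeping first-seen order
    return [
        word
        for word in unique
        if not any(word != other and word in other for other in unique)
    ]
```

For example, `remove_substrings(["a", "ab", "abc", "d"])` keeps only `"abc"` and `"d"`.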
ramhiser / download-youtube.py
Last active October 14, 2015 21:16
Python script to download YouTube videos from a YouTube user ID.
import argparse
import logging
import requests
from lxml import html
from pytube import YouTube
def get_youtube_links(youtube_userid):
    YT_ROOT = r'http://www.youtube.com'
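The preview stops before the link scraping that `lxml` is imported for. The sketch below shows one way to pull video links out of a page using only the standard library's `html.parser`. It assumes the page is static HTML whose `<a>` tags carry relative `/watch?v=...` hrefs; current YouTube pages are largely rendered by JavaScript, so this may find nothing on a live page. `extract_watch_links` and `_WatchLinkParser` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

YT_ROOT = "http://www.youtube.com"


class _WatchLinkParser(HTMLParser):
    """Collect href values of <a> tags that point at /watch?v=... pages."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.startswith("/watch?v="):
            self.links.append(href)


def extract_watch_links(page_html, root=YT_ROOT):
    """Return absolute, de-duplicated watch URLs found in page_html."""
    parser = _WatchLinkParser()
    parser.feed(page_html)
    # dict.fromkeys removes repeats while keeping the order links appeared in.
    return [urljoin(root, href) for href in dict.fromkeys(parser.links)]
```

Each returned URL could then be handed to `pytube.YouTube` as in the original script.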
ramhiser / tornado-example.py
Created October 12, 2015 18:13
Simple Tornado server example
import tornado
import tornado.ioloop
import tornado.web
from tornado import gen
from tornado.httpclient import AsyncHTTPClient
class MainHandler(tornado.web.RequestHandler):
    @gen.coroutine
    def get(self):
ramhiser / sphinx2vtt.py
Created September 10, 2015 02:49
Convert CMU Sphinx closed-captioning auto alignment to WebVTT format
#!/usr/bin/env python
import argparse
import sys
import time
from itertools import izip, count
def parse_sphinx_line(line):
    '''Parse a line from Sphinx's closed captioning alignment'''
    line_split = line.split()
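The preview ends while parsing a Sphinx line, but the output side has a fixed shape: a WebVTT file starts with `WEBVTT`, cues are separated by blank lines, and each cue opens with a timing line `HH:MM:SS.mmm --> HH:MM:SS.mmm`. The sketch assumes the alignment has already been parsed into `(text, start_seconds, end_seconds)` tuples; that tuple shape and the names `vtt_timestamp` and `to_webvtt` are assumptions, since the preview does not show Sphinx's line format. It always emits the hours field, which WebVTT permits but does not require, and it uses Python 3 f-strings rather than the Python 2 style of the original (`izip`).

```python
def vtt_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues):
    """Render (text, start_seconds, end_seconds) tuples as a WebVTT document."""
    blocks = ["WEBVTT"]
    for text, start, end in cues:
        blocks.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}\n{text}")
    # Cues are separated by a blank line; end the file with a newline.
    return "\n\n".join(blocks) + "\n"
```

For instance, `vtt_timestamp(3661.5)` gives `01:01:01.500`.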
ramhiser / awk.ftw
Created August 23, 2015 21:24
awk one-liners
# Print the number of characters in one CSV column for each row (example: 2nd column)
awk -F, '{print length($2)}'
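`awk -F,` splits on every comma, so a quoted field such as `"d,e"` is cut in two and the reported length is wrong. When the CSV may contain quoted commas, a small Python equivalent built on the standard `csv` module respects quoting. `column_lengths` is a hypothetical helper; like awk's `$2`, the column index is 1-based, and rows that are too short report 0, matching awk's empty field.

```python
import csv
import io


def column_lengths(csv_text, column=2):
    """Character count of the given 1-based column for each CSV row, like awk's length($2)."""
    reader = csv.reader(io.StringIO(csv_text))
    return [len(row[column - 1]) if len(row) >= column else 0 for row in reader]
```

On the input `y,"d,e"` this reports 3 for the second column, where the awk one-liner would print 2 (the length of `"d`).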
ramhiser / sample.js
Last active August 29, 2015 14:26
Weighted random sample from a vector in JavaScript
// Weighted random sample from a vector
//
// By default, the `weights` are set to 1. This equates to equal weighting.
// Loosely based on http://codereview.stackexchange.com/a/4265
//
// If any weight is `null`, revert to default weights (i.e., all 1).
//
// A random-number generator (RNG) seed is optionally set via seedrandom.js.
// NOTE: The JS file is loaded via jQuery.
// Details: https://github.com/davidbau/seedrandom
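The comments describe the behaviour but the preview ends before the code. Below is a Python sketch of the same contract: weights default to 1, any `None` weight (JavaScript's `null`) falls back to equal weights, and an optional seed makes draws repeatable, with `random.Random(seed)` standing in for seedrandom.js. Sampling with replacement is an assumption, as is the name `weighted_sample`. The technique is the usual one: build cumulative weights, draw u uniformly in [0, total), and binary-search for the first cumulative weight greater than u.

```python
import bisect
import itertools
import random


def weighted_sample(vector, size, weights=None, seed=None):
    """Draw `size` items from `vector` with replacement, with probability proportional to `weights`."""
    if not vector:
        raise ValueError("vector must not be empty")
    # Default to equal weights; also fall back if any weight is missing (None).
    if weights is None or any(w is None for w in weights):
        weights = [1] * len(vector)
    if len(weights) != len(vector):
        raise ValueError("weights must have the same length as vector")
    cumulative = list(itertools.accumulate(weights))
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    rng = random.Random(seed)  # seeded RNG, playing the role of seedrandom.js
    last = len(cumulative) - 1
    # The first cumulative weight greater than u marks the chosen item;
    # hi=last guards against float rounding pushing the index past the end.
    return [
        vector[bisect.bisect_right(cumulative, rng.random() * total, 0, last)]
        for _ in range(size)
    ]
```

The standard library's `random.choices(vector, weights=weights, k=size)` uses the same cumulative-weight approach; the explicit version just makes the mechanics and the `None` fallback visible.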
ramhiser / thd.py
Created July 19, 2015 02:11
Simple EDA of daily Home Depot stock prices
%matplotlib inline
from yahoo_finance import Share
import matplotlib.pylab
import pandas as pd
import numpy as np
thd = Share('HD')
thd_prices = thd.get_historical('2010-01-01', '2015-06-01')
thd_prices = pd.DataFrame(thd_prices)
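A natural next EDA step is to index the frame by date and look at daily returns. This sketch assumes each record has `Date` and `Close` keys whose values may arrive as strings; those field names for the `yahoo_finance` output are an assumption here, not confirmed by the preview. `prices_frame` is a hypothetical helper, and because it works on any list of records with that shape, it can be exercised without a network call.

```python
import pandas as pd


def prices_frame(records):
    """Date-indexed closing prices plus simple daily returns from a list of dict records."""
    df = pd.DataFrame(records)
    df["Date"] = pd.to_datetime(df["Date"])
    df["Close"] = df["Close"].astype(float)  # quotes may arrive as strings
    df = df.sort_values("Date").set_index("Date")
    # Simple daily return: today's close over the previous trading day's close, minus 1.
    df["Return"] = df["Close"] / df["Close"].shift(1) - 1
    return df
```

From there, `df["Return"].describe()` summarizes the distribution of daily moves and `df["Close"].plot()` draws the price series in the notebook.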