Skip to content

Instantly share code, notes, and snippets.

View graham-thomson's full-sized avatar

Graham Thomson graham-thomson

  • United States
View GitHub Profile
@mrchristine
mrchristine / vector_sum_udaf.scala
Created November 29, 2017 21:39
Spark UDAF to sum vectors for common keys
package com.databricks.example.pivot
/**
This code allows a user to add vectors together for common keys.
The code in the comments show you how to register the scala UDAF to be called from pyspark.
The UDAF can only be called from a SQL expression (aka spark.sql() or df.expr() )
**/
/**
# Python code to register a scala UDAF
@bsweger
bsweger / useful_pandas_snippets.md
Last active April 19, 2024 18:04
Useful Pandas Snippets

Useful Pandas Snippets

A personal diary of DataFrame munging over the years.

Data Types and Conversion

Convert Series datatype to numeric (will error if column has non-numeric values)
(h/t @makmanalp)

@hoffrocket
hoffrocket / parhttp.py
Created August 28, 2012 00:29
Python parallel http requests using multiprocessing
#!/usr/bin/env python
from multiprocessing import Process, Pool
import time
import urllib2
def millis():
return int(round(time.time() * 1000))
def http_get(url):
@reuben-sutton
reuben-sutton / CosineSimilarity.scala
Created June 14, 2012 21:10
Cosine Similarity Scala Object
/*
* Object in scala for calculating cosine similarity
* Reuben Sutton - 2012
* More information: http://en.wikipedia.org/wiki/Cosine_similarity
*/
object CosineSimilarity {
/*
* This method takes 2 equal length arrays of integers
@jxson
jxson / README.md
Created February 10, 2012 00:18
README.md template

Synopsis

At the top of the file there should be a short introduction and/ or overview that explains what the project is. This description should match descriptions added for package managers (Gemspec, package.json, etc.)

Code Example

Show what the library does as concisely as possible, developers should be able to figure out how your project solves their problem by looking at the code example. Make sure the API you are showing off is obvious, and that your code is short and concise.

Motivation

@ecarter
ecarter / major_us_city_dma_codes.py
Created May 25, 2010 16:18
Major US Cities with Latitude/Longitude and DMA Codes
# Major US Cities with DMA Codes
major_cities = [
{'city': 'Anchorage', 'dma_code': 743, 'latitude': 61.2180556, 'longitude': -149.9002778, 'region': 'AK', 'slug': 'anchorage-ak'},
{'city': 'Fairbanks', 'dma_code': 745, 'latitude': 64.837777799999998, 'longitude': -147.7163889, 'region': 'AK', 'slug': 'fairbanks-ak'},
{'city': 'Juneau', 'dma_code': 747, 'latitude': 58.301944399999996, 'longitude': -134.4197222, 'region': 'AK', 'slug': 'juneau-ak'},
{'city': 'Birmingham', 'dma_code': 630, 'latitude': 33.520660800000002, 'longitude': -86.802490000000006, 'region': 'AL', 'slug': 'birmingham-al'},
{'city': 'Dothan', 'dma_code': 606, 'latitude': 31.223231299999998, 'longitude': -85.3904888, 'region': 'AL', 'slug': 'dothan-al'},
{'city': 'Decatur', 'dma_code': 691, 'latitude': 34.605925300000003, 'longitude': -86.983341699999997, 'region': 'AL', 'slug': 'decatur-al'},
{'city': 'Florence', 'dma_code': 691, 'latitude': 34.799810000000001, 'longitude': -87.677250999999998, 'region': 'AL', 'slug': 'florence-al'},