A personal diary of DataFrame munging over the years.
Convert Series datatype to numeric (will error if column has non-numeric values)
(h/t @makmanalp)
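The conversion described above can be sketched with `pandas.to_numeric`. This is a minimal illustration, not the diary's own code: the sample Series and variable names are made up, and it assumes pandas is installed.

```python
import pandas as pd

# Hypothetical sample data: every value parses cleanly as a number.
clean = pd.Series(["1", "2", "3.5"])
numeric = pd.to_numeric(clean)  # dtype becomes float64

# Hypothetical sample data with one non-numeric value.
mixed = pd.Series(["1", "oops", "3"])

# With the default errors="raise", a non-numeric value raises ValueError.
try:
    pd.to_numeric(mixed)
    raised = False
except ValueError:
    raised = True

# errors="coerce" replaces unparseable values with NaN instead of raising.
coerced = pd.to_numeric(mixed, errors="coerce")
```

Whether to raise or coerce depends on whether bad values should stop the pipeline or be handled later as missing data.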
package com.databricks.example.pivot
/**
This code allows a user to add vectors together for common keys.
The code in the comments shows you how to register the Scala UDAF to be called from PySpark.
The UDAF can only be called from a SQL expression (aka spark.sql() or df.expr() )
**/
/**
# Python code to register a scala UDAF
#!/usr/bin/env python
from multiprocessing import Process, Pool
import time
import urllib2
def millis():
    return int(round(time.time() * 1000))
def http_get(url):
/*
 * Object in scala for calculating cosine similarity
 * Reuben Sutton - 2012
 * More information: http://en.wikipedia.org/wiki/Cosine_similarity
 */
object CosineSimilarity {
  /*
   * This method takes 2 equal length arrays of integers
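The preview above is cut off before the method body. As a rough illustration of the calculation it describes (cosine similarity of two equal-length integer arrays), here is a Python sketch rather than the original Scala; the function name and error handling are assumptions, not taken from the source.

```python
import math

def cosine_similarity(x, y):
    """Cosine similarity of two equal-length sequences of numbers."""
    if len(x) != len(y):
        raise ValueError("both inputs must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        # The similarity is undefined when either vector is all zeros.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)

orthogonal = cosine_similarity([1, 0], [0, 1])
parallel = cosine_similarity([1, 2, 3], [2, 4, 6])
```

Orthogonal vectors score 0 and vectors pointing in the same direction score 1 (up to floating-point rounding).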
At the top of the file there should be a short introduction and/or overview that explains what the project is. This description should match descriptions added for package managers (Gemspec, package.json, etc.)
Show what the library does as concisely as possible. Developers should be able to figure out how your project solves their problem by looking at the code example. Make sure the API you are showing off is obvious, and that your code is short and concise.
# Major US Cities with DMA Codes
major_cities = [
    {'city': 'Anchorage', 'dma_code': 743, 'latitude': 61.2180556, 'longitude': -149.9002778, 'region': 'AK', 'slug': 'anchorage-ak'},
    {'city': 'Fairbanks', 'dma_code': 745, 'latitude': 64.837777799999998, 'longitude': -147.7163889, 'region': 'AK', 'slug': 'fairbanks-ak'},
    {'city': 'Juneau', 'dma_code': 747, 'latitude': 58.301944399999996, 'longitude': -134.4197222, 'region': 'AK', 'slug': 'juneau-ak'},
    {'city': 'Birmingham', 'dma_code': 630, 'latitude': 33.520660800000002, 'longitude': -86.802490000000006, 'region': 'AL', 'slug': 'birmingham-al'},
    {'city': 'Dothan', 'dma_code': 606, 'latitude': 31.223231299999998, 'longitude': -85.3904888, 'region': 'AL', 'slug': 'dothan-al'},
    {'city': 'Decatur', 'dma_code': 691, 'latitude': 34.605925300000003, 'longitude': -86.983341699999997, 'region': 'AL', 'slug': 'decatur-al'},
    {'city': 'Florence', 'dma_code': 691, 'latitude': 34.799810000000001, 'longitude': -87.677250999999998, 'region': 'AL', 'slug': 'florence-al'},