Dave Kincaid dkincaid
@dkincaid
dkincaid / infochimps_geo.R
Created September 10, 2011 15:30
Pulling ACS data from Infochimps Geo API in R
library(RJSONIO)
library(ggplot2)
api.uri <- "http://api.infochimps.com/"
acs.topline <- "social/demographics/us_census/topline/search?"
api.key <- "apikey=xxxxxxxxxx" # replace the x's with your Infochimps API key
radius <- 10000 # in meters
lat <- 44.768202
long <- -91.491603
dkincaid / getcode.R
Created November 14, 2011 21:17
Geocode function for R using Infochimps
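geocode = function(location) {
  library(RJSONIO)
  api.uri = "http://api.infochimps.com/"
  geocode.uri = "geo/utils/geolocate?"
  api.key = "apikey=xxxxxxxxxxxx" # replace the x's with your Infochimps API key
  print(location)
  # URL-encode the address so spaces and punctuation survive in the query string
  uri = paste(api.uri, geocode.uri, api.key, "&f.address_text=", URLencode(location, reserved=TRUE), sep="")
  raw.data = readLines(uri, warn=FALSE)
  # readLines may return several lines; join them before parsing
  results = fromJSON(paste(raw.data, collapse=""))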
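(import '[java.io StringReader]
        '[org.apache.lucene.analysis.standard StandardAnalyzer]
        '[org.apache.lucene.analysis.tokenattributes TermAttribute]
        '[org.apache.lucene.util Version])

(defn standard-tokenizer
  "Uses the Lucene StandardAnalyzer to tokenize the given text. Returns a vector containing
  the tokens."
  [text]
  (let [analyzer (StandardAnalyzer. Version/LUCENE_31)
        tokenstream (.tokenStream analyzer "field" (StringReader. text))
        termatt (.addAttribute tokenstream TermAttribute)]
    ;; Collect each term into a vector instead of printing it
    (loop [terms []]
      (if (.incrementToken tokenstream)
        (recur (conj terms (.term termatt)))
        terms))))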
dkincaid / queries.clj
Created July 13, 2012 01:13
Cascalog sales transaction summary
(ns transaction.queries
(:use [cascalog.api])
(:require [cascalog.ops :as c]
[cascalog.tap :as tap]
[cascalog.workflow :as w])
(:import [com.google.common.hash Hashing]
[org.joda.time.format DateTimeFormat]
[cascading.scheme.hadoop TextDelimited])
(:gen-class))
dkincaid / gist:3277518
Created August 6, 2012 18:52
Mutable state test
@Test
public void changeNameTest() {
    MutableClass original_name = new MutableClass("my name");
    // expected_name is a second reference to the same object, not a copy
    MutableClass expected_name = original_name;
    NameFilter filter = new NameFilter();
    MutableClass new_name = filter.changeName(original_name, "new name");
    // JUnit's argument order is (expected, actual)
    assertEquals(expected_name, new_name);
dkincaid / gist:3277619
Created August 6, 2012 19:08
Fixed mutable state test
@Test
public void changeNameTest() {
    MutableClass original_name = new MutableClass("my name");
    // A separate instance, so it is unaffected by any mutation of original_name
    MutableClass expected_name = new MutableClass("my name");
    NameFilter filter = new NameFilter();
    MutableClass new_name = filter.changeName(original_name, "new name");
    // JUnit's argument order is (expected, actual)
    assertEquals(expected_name, new_name);
dkincaid / gist:3277712
Created August 6, 2012 19:19
Change Name
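public MutableClass changeName(MutableClass oldNameClass, String newName) {
    MutableClass newNameClass = new MutableClass();
    // This reassignment discards the new instance: newNameClass now refers to oldNameClass
    newNameClass = oldNameClass;
    // So this call also changes the caller's object
    newNameClass.setName(newName);
    return newNameClass;
}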
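The method above returns a reference to the object it was given, so the caller's instance is renamed as well. A minimal sketch of the difference between aliasing and copying before mutating follows. The `MutableClass` here is a hypothetical stand-in (the original class body is not shown in these gists), and `MutableStateSketch`, `changeNameAliased`, and `changeNameCopied` are illustrative names, not part of the original code.

```java
public class MutableStateSketch {
    // Hypothetical stand-in for the MutableClass used in the gists above.
    static class MutableClass {
        private String name;
        MutableClass(String name) { this.name = name; }
        String getName() { return name; }
        void setName(String name) { this.name = name; }
    }

    // Same shape as the gist: the "new" object is just another reference to the old one.
    static MutableClass changeNameAliased(MutableClass oldNameClass, String newName) {
        MutableClass newNameClass = oldNameClass;
        newNameClass.setName(newName);
        return newNameClass;
    }

    // Copy first, then mutate the copy, leaving the argument untouched.
    static MutableClass changeNameCopied(MutableClass oldNameClass, String newName) {
        MutableClass newNameClass = new MutableClass(oldNameClass.getName());
        newNameClass.setName(newName);
        return newNameClass;
    }

    public static void main(String[] args) {
        MutableClass a = new MutableClass("my name");
        MutableClass b = changeNameAliased(a, "new name");
        System.out.println(a.getName() + " / " + (a == b)); // prints "new name / true"

        MutableClass c = new MutableClass("my name");
        MutableClass d = changeNameCopied(c, "new name");
        System.out.println(c.getName() + " / " + d.getName()); // prints "my name / new name"
    }
}
```

With the copying variant, a test that keeps a reference to the original object can still observe its old name after the call.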
dkincaid / ClientEmailQuery.java
Last active December 10, 2015 17:48
Example of an issue when using two PailTaps that read from the same Pail in one query.
/* If I execute only the clientQuery or only the emailQuery by itself, everything works correctly.
I set breakpoints inside the ExtractClientEdgeFields() and ExtractClientId() functions, and each
is called only with the Data objects that have the correct property types.
However, if I execute the query as shown here, only one of the two functions is called,
and it receives all of the Data objects from both taps. */
public static Subquery clientEmail(String pailPath) {
PailTap clientEdgeTap = clientEdgeTap(pailPath);
PailTap clientTap = petOwnerTap(pailPath);
dkincaid / WordCount.java
Last active September 6, 2018 03:50
Hadoop job remote submission
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
dkincaid / MinRTF-control-rule.g4
Last active December 22, 2015 09:58
RTF Parser Early Tests
grammar MinRtf ;
document : (control | text )+ ;
text : TEXT ;
control : KEYWORD INT? SPACE? ;
KEYWORD : '\\' (ASCIILETTER)+ ;
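The `control` rule above describes a backslash, one or more letters, an optional integer parameter, and an optional trailing space. For quick experiments outside ANTLR, roughly the same shape can be sketched with a Java regular expression. The definitions of `INT`, `SPACE`, and `ASCIILETTER` are not shown in this preview, so `-?\d+`, a single space, and `[A-Za-z]` are assumptions here, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ControlWordSketch {
    // Assumed token shapes: ASCIILETTER = [A-Za-z], INT = optional minus plus digits, SPACE = one space.
    private static final Pattern CONTROL =
            Pattern.compile("\\\\([A-Za-z]+)(-?\\d+)?( )?");

    /** Returns each control word found in the input as "keyword" or "keyword=param". */
    public static List<String> controls(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = CONTROL.matcher(input);
        while (m.find()) {
            String keyword = m.group(1);
            String param = m.group(2); // null when no integer parameter follows
            found.add(param == null ? keyword : keyword + "=" + param);
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [rtf=1, ansi, fs=24]
        System.out.println(controls("\\rtf1\\ansi\\fs24 Hello"));
    }
}
```

This only finds control words; it does not model the `text` rule or group braces, which a real parser would need.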