@rlankenau
rlankenau / hr_stats.pig
Created August 8, 2013 14:14
Pig script using the RetrosheetLoader.
register 'maprfs:///user/rlankenau/moneyball-1.0-SNAPSHOT.jar';
DEFINE RetrosheetLoader com.mapr.baseball.RetrosheetLoader();
set default_parallel 20;
set job.name 'mapr_baseball_summary';
raw = LOAD '/projects/baseball/*.EV?' USING RetrosheetLoader();
describe raw;
-- This outputs a single record for each play, with all associated game information.
flattened = FOREACH raw GENERATE $0 .. $39, FLATTEN($40);
rlankenau / retrosheet_loadfunc.java
Created July 29, 2013 16:35
Pig LoadFunc for RetroSheet data.
@Override
public InputFormat getInputFormat() throws IOException {
    return new RetrosheetInputFormat();
}

@Override
public Tuple getNext() throws IOException {
    RetrosheetPlayer[] home_players = new RetrosheetPlayer[11];
    RetrosheetPlayer[] away_players = new RetrosheetPlayer[11];
    RetrosheetPlayer[] defense = null;
rlankenau / retrosheet_record_snippet
Created July 29, 2013 16:09
Snippet of a RetroSheet record.
id,CHN200104020
version,2
info,visteam,MON
info,hometeam,CHN
info,site,CHI11
info,date,2001/04/02
...
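The record snippet above is plain comma-separated text, so its header lines can be read with ordinary string splitting. The following is a minimal sketch under that assumption; `RetrosheetHeaderSketch` and `parseHeader` are hypothetical names used only for illustration and are not part of the RetrosheetLoader code. It handles only the `id`, `version`, and `info` line types shown in the snippet and ignores everything else.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not from the gist): collects the header fields of one game. */
final class RetrosheetHeaderSketch {

    /**
     * Parses "id,<value>", "version,<value>" and "info,<key>,<value>" lines.
     * Any other line type (including the elided "..." line) is skipped.
     */
    static Map<String, String> parseHeader(String[] lines) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : lines) {
            // Limit of 3 keeps any commas inside an info value intact.
            String[] parts = line.split(",", 3);
            if (parts.length == 2 && (parts[0].equals("id") || parts[0].equals("version"))) {
                fields.put(parts[0], parts[1]);
            } else if (parts.length == 3 && parts[0].equals("info")) {
                fields.put(parts[1], parts[2]);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // The lines shown in the record snippet above.
        String[] sample = {
            "id,CHN200104020",
            "version,2",
            "info,visteam,MON",
            "info,hometeam,CHN",
            "info,site,CHI11",
            "info,date,2001/04/02"
        };
        parseHeader(sample).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

An insertion-ordered map is used here so the fields print in the same order as they appear in the file; a real loader would more likely emit them as typed tuple fields.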
rlankenau / record_reader_snippet.java
Last active December 20, 2015 09:09
Code to split RetroSheet records by game
private boolean isStartLine(Text t)
{
    /* Find the end of the first field */
    int fieldTerm = t.find(",");
    int idTerm = t.find("id");
    return (idTerm != -1 && fieldTerm != -1 && idTerm < fieldTerm);
}

public boolean nextKeyValue() throws IOException
{
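The preview of `nextKeyValue()` is cut off here, so the actual record-reading loop is not shown. As a rough illustration of the idea in the gist description (splitting records by game), the sketch below applies the same start-line test to plain `String` values instead of Hadoop `Text` and groups lines into games. `GameSplitSketch` and `splitGames` are hypothetical names, and the grouping logic is an assumption about the general approach, not the author's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, Hadoop-free illustration of splitting Retrosheet lines by game. */
final class GameSplitSketch {

    /** Same test as isStartLine in the gist, on a String: "id" occurs before the first comma. */
    static boolean isStartLine(String line) {
        int fieldTerm = line.indexOf(',');
        int idTerm = line.indexOf("id");
        return idTerm != -1 && fieldTerm != -1 && idTerm < fieldTerm;
    }

    /** Groups consecutive lines into games; every start line opens a new group. */
    static List<List<String>> splitGames(List<String> lines) {
        List<List<String>> games = new ArrayList<>();
        List<String> current = null;
        for (String line : lines) {
            if (isStartLine(line)) {
                current = new ArrayList<>();
                games.add(current);
            }
            if (current != null) { // lines before the first "id" line are dropped
                current.add(line);
            }
        }
        return games;
    }

    public static void main(String[] args) {
        // The second game id is made up for illustration only.
        List<String> lines = Arrays.asList(
            "id,CHN200104020", "version,2", "info,visteam,MON",
            "id,CHN200104040", "version,2");
        System.out.println(splitGames(lines).size()); // prints 2
    }
}
```

In a real `RecordReader` the lines would be streamed rather than held in a list, and the reader would also have to handle a game that straddles an input split boundary; neither concern is covered by this sketch.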