Russell Jurney rjurney

@rjurney
rjurney / large.dot
Created May 13, 2012 16:16
DOT formatted large DAG example
digraph pig {
node [label="\N", shape=record];
graph [bb="0,0,861.06,1532"];
scope_819 [label="{<f0>1.1 1.2|<f1>GROUP_BY MULTI_QUERY|<f2>job_201204251821_148386}", pos="360,262", rects="263.55,274.4,456.45,299.2 263.55,249.6,456.45,274.4 263.55,224.8,456.45,249.6", width="2.673", height="1.0472"];
scope_825 [label="{<f0>6.1 6.2 6.3 6.4|<f1>HASH_JOIN MULTI_QUERY|<f2>job_201204251821_148529}", pos="124,150", rects="26.777,162.4,221.22,187.2 26.777,137.6,221.22,162.4 26.777,112.8,221.22,137.6", width="2.6975", height="1.0472"];
scope_884 [label="{<f0>8.1 8.2 8.3 8.4|<f1>REPLICATED_JOIN GROUP_BY|<f2>job_201204251821_148528}", pos="471,150", rects="362.88,162.4,579.12,187.2 362.88,137.6,579.12,162.4 362.88,112.8,579.12,137.6", width="3.0017", height="1.0472"];
scope_801 [label="{<f0>2.1 2.2 2.3|<f1>REPLICATED_JOIN MULTI_QUERY MAP_ONLY|<f2>job_201204251821_147833}", pos="266,486", rects="105.2,498.4,426.8,523.2 105.2,473.6,426.8,498.4 105.2,448.8,426.8,473.6", width="4.4555", height="1.0472"];
scope_815 [lab
rjurney / Hackery and Tomfoolery.java
Created May 17, 2012 05:55
Hacking Datetimes into mongo-hadoop
// Note: Pig does not have a DateTime type yet, so this is hackery.
// See https://issues.apache.org/jira/browse/PIG-1314
case DataType.CHARARRAY:
// If it starts like an ISODate...
if(((String)d).startsWith("ISODate(")) {
// and it ends like an ISODate...
if(((String)d).endsWith(")")) {
// let's treat it like an ISODate.
try {
builder.add( field.getName(), new Date((String)d));
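The prefix and suffix check above can be sketched in Python for comparison. This is a minimal illustration, not the mongo-hadoop code: `parse_isodate` is a hypothetical helper, and it assumes the wrapped value is a UTC timestamp of the form `2012-05-17T05:55:00Z`, optionally quoted, which the original snippet does not confirm.

```python
from datetime import datetime, timezone
from typing import Optional

PREFIX = "ISODate("
SUFFIX = ")"


def parse_isodate(value: str) -> Optional[datetime]:
    """Return a UTC datetime if value looks like ISODate(...), else None."""
    # Same two checks as the Java snippet: starts like an ISODate, ends like one.
    if not (value.startswith(PREFIX) and value.endswith(SUFFIX)):
        return None
    # Strip the wrapper and any surrounding quotes (assumed input shape).
    inner = value[len(PREFIX):-len(SUFFIX)].strip().strip("\"'")
    try:
        parsed = datetime.strptime(inner, "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        # Not a timestamp in the assumed format; leave it to the caller
        # to fall back to treating the value as a plain string.
        return None
    return parsed.replace(tzinfo=timezone.utc)
```

Returning `None` instead of raising lets the caller keep the value as a plain string when it only superficially resembles a date.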
rjurney / mongo.pig
Created May 17, 2012 06:04
The Beauty of ILLUSTRATE in Pig when working with Avro documents and MongoDB
/* Piggybank */
register /me/pig/contrib/piggybank/java/piggybank.jar
/* Avro */
register /me/pig/build/ivy/lib/Pig/avro-1.5.3.jar
register /me/pig/build/ivy/lib/Pig/json-simple-1.1.jar
register /me/pig/build/ivy/lib/Pig/jackson-core-asl-1.7.3.jar
register /me/pig/build/ivy/lib/Pig/jackson-mapper-asl-1.7.3.jar
register /me/pig/build/ivy/lib/Pig/joda-time-1.6.jar
rjurney / hive.sql
Created May 29, 2012 02:14
Unable to create a table in HIVE local mode :(
set mapred.job.tracker=local;
set mapred.local.dir=/tmp/hive;
set hive.exec.mode.local.auto=false;
set fs.default.name=file:///tmp/hive;
create table from_to (from_address string, to_address string, dt string);
FAILED: Error in metadata: MetaException(message:Got exception: java.net.ConnectException Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
rjurney / hive.sql
Created May 31, 2012 01:21
Why can't I order by to_date(dt) in HiveQL?
select to_date(dt), count(*) as total from from_to group by to_date(dt) limit 10;
Works ok, but:
select to_date(dt), count(*) as total from from_to group by to_date(dt) order by to_date(dt) limit 10;
FAILED: Error in semantic analysis: Line 1:81 Invalid table alias or column reference dt :(
rjurney / hive.error
Created May 31, 2012 04:41
Trouble compiling my HIVE Serde
Buildfile: /Users/peyomp/serde/build.xml
compile:
[javac] /Users/peyomp/serde/build.xml:30: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
[javac] Compiling 1 source file to /Users/peyomp/serde/build/classes
[javac] /Users/peyomp/serde/src/TimeSeries.java:3: package org.apache.hadoop.hive.ql.exec does not exist
[javac] import org.apache.hadoop.hive.ql.exec.UDF;
[javac] ^
[javac] /Users/peyomp/serde/src/TimeSeries.java:6: cannot find symbol
[javac] symbol: class UDF
rjurney / gist:2841498
Created May 31, 2012 06:33
Hive SELECT/GROUP/ORDER by problem
# I am making a chart UDF, to convert BigInts from count(6) to ******
# I am trying to find a way to select TimeSeries(count(*)) and group by day, but this results in an error:
select to_date(dt) as dt_date,
TimeSeries(CAST(count(*) AS INT)) as stars,
count(*) as total
from from_to
group by to_date(dt)
order by to_date(dt);
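The charting idea described in the comments above, where a count of 6 becomes ******, can be sketched outside Hive in plain Python. This is only an illustration of the conversion, not the TimeSeries UDF itself: `stars` and `max_width` are hypothetical names, and capping the row width is an assumption added so large counts stay readable.

```python
def stars(count: int, max_width: int = 80) -> str:
    """Render a non-negative count as a row of asterisks, capped at max_width."""
    if count < 0:
        raise ValueError("count must be non-negative")
    # One asterisk per unit, never wider than max_width characters.
    return "*" * min(count, max_width)
```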
rjurney / email_utils.py
Created June 3, 2012 02:07
My Pig script and Python Streaming Stuff
#!/opt/local/bin/python
import imaplib
import sys, signal
from avro import schema, datafile, io
import os, re
import email
import inspect, pprint
import getopt
import time
rjurney / app.py
Created June 15, 2012 22:43
Jinja2 Template being served by Flask on Heroku
import os, sys, time
from flask import Flask, render_template, request, redirect, make_response
PROJECT_ROOT = os.path.dirname(os.path.realpath(__file__))
app = Flask(__name__, static_folder=os.path.join(PROJECT_ROOT, 'static'), static_url_path='/static')
rjurney / error.log
Created June 20, 2012 21:20
Can't start hive metastore
hadoop@ip-10-4-115-52:~$ bin/hive --service metastore
Starting Hive Metastore Server
org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address 0.0.0.0/0.0.0.0:9083.
at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:93)
at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:75)
at org.apache.hadoop.hive.metastore.TServerSocketKeepAlive.<init>(TServerSocketKeepAlive.java:34)
at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:3781)
at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:3742)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
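A "Could not create ServerSocket" failure like the one above is often, though not always, caused by another process already holding the port. A quick way to test that hypothesis is to try binding the port yourself. The sketch below uses only the Python standard library; `port_is_free` is a hypothetical helper name, and the log does not confirm that a port conflict is the cause here.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            # bind() fails with OSError if the address is already in use
            # or the current user may not bind it.
            probe.bind((host, port))
        except OSError:
            return False
    return True
```

Calling `port_is_free(9083)` on the affected host before starting the metastore shows whether the address from the log can be bound at that moment.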