@jeremykarn
jeremykarn / FromJsonInferSchema.java
Created February 27, 2015 23:06
FromJsonInferSchema UDF from Mortar Pig 0.12 fork. Requires json.patch from https://issues.apache.org/jira/browse/PIG-1914
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
jeremykarn / example_mortar_iam_emr_policy.json
Last active August 29, 2015 14:02
Example Mortar IAM EMR Policy document.
{
"Statement": [
{
"Action": [
"s3:*"
],
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::*elasticmapreduce",
"arn:aws:s3:::*elasticmapreduce/*",
May 07 02:41:00 184.73.121.234 local7: 2014-05-07 02:41:00,389 [pool-2-thread-11] (RunJobTask.java:289) ERROR com.mortardata.mhc.model.RunJobTask - java.lang.RuntimeException: Unable to connect to Google to verify authentication.
May 07 02:41:00 184.73.121.234 local7: 2014-05-07 02:41:00,428 [pool-2-thread-11] (MessageHandler.java:397) ERROR com.mortardata.mhc.MessageHandler - An unexpected error occurred while running your job. Please try running the job again, and contact us if you continue to experience issues.
May 07 02:41:00 184.73.121.234 com.mortardata.mhc.exception.HawkInternalException: An unexpected error occurred while running your job. Please try running the job again, and contact us if you continue to experience issues.
May 07 02:41:00 184.73.121.234 logger: <187> at com.mortardata.mhc.model.RunJobTask.handleException(RunJobTask.java:292)
May 07 02:41:00 184.73.121.234 logger: <187> at com.mortardata.mhc.model.RunJobTask.doTask(RunJobTask.java:99)
May 07 02:41:00 184.73.121.234 logge
jeremykarn / gist:9536431
Created March 13, 2014 20:34
DynamoDB IAM policy
{
"Statement": [
{
"Effect":"Allow",
"Action":"dynamodb:*",
"Resource":"*"
}
]
}
jeremykarn / udf.pig
Created September 30, 2013 13:57
Simple Pig example showing UDFs being called in both the map and reduce phases.
REGISTER 'udf.py' USING streaming_python AS my_udfs;
tweets = LOAD 's3n://twitter-gardenhose-mortar/tweets'
USING org.apache.pig.piggybank.storage.JsonLoader(
'text: chararray, place:tuple(name:chararray)');
-- my_length UDF is called in the mapper for each tweet.
long_tweets = FILTER tweets BY my_udfs.my_length(text) > 50;
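The script above registers udf.py, but that file is not shown in this listing. A minimal sketch of what a my_length UDF could look like follows. The function body, the 'length:int' schema string, and the import fallback are assumptions for illustration, not the author's code.

```python
# udf.py: hypothetical contents; the original file is not shown in this listing.
try:
    # pig_util is made available by Pig when a script is registered
    # with streaming_python.
    from pig_util import outputSchema
except ImportError:
    # Assumed fallback so the module can be imported and tested outside Pig.
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator


@outputSchema('length:int')
def my_length(text):
    """Return the number of characters in text, or None for a null input."""
    if text is None:
        return None
    return len(text)
```

With a definition like this, the FILTER above would keep only tweets whose text is longer than 50 characters.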
jeremykarn / nltk.pig
Created September 25, 2013 15:17
CPython and NLTK bigram example.
REGISTER '<python_file>' USING streaming_python AS nltk_udfs;
tweets = LOAD 's3n://twitter-gardenhose-mortar/tweets'
USING org.apache.pig.piggybank.storage.JsonLoader(
'text: chararray, place:tuple(name:chararray)');
-- Group the tweets by place name and use a CPython UDF to find the top 5 bigrams
-- for each of these places.
bigrams_by_place = FOREACH (GROUP tweets BY place.name) GENERATE
group AS place:chararray,
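The preview above is cut off before the UDF call, and the Python file it registers is not shown. As a rough sketch of the idea of finding the top 5 bigrams for a group of tweets, the function below counts adjacent word pairs with the standard library. It swaps in plain whitespace tokenization and raw frequency counts in place of NLTK, and the name top_bigrams is hypothetical.

```python
from collections import Counter


def top_bigrams(texts, n=5):
    """Return the n most frequent adjacent word pairs across texts.

    Hypothetical helper: uses lowercase whitespace tokenization and raw
    counts, not NLTK's tokenizers or collocation scoring.
    """
    counts = Counter()
    for text in texts:
        if not text:
            # Skip null or empty tweet text.
            continue
        words = text.lower().split()
        # Pair each word with its successor to form bigrams.
        counts.update(zip(words, words[1:]))
    return counts.most_common(n)
```

In the Pig script, a function along these lines would receive the bag of tweets for one place and return its most common bigrams.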
jeremykarn / mongo pig schema generator
Last active December 11, 2015 13:38
The Pig and Python scripts for a Mortar web project that generates the MongoLoader schema associated with a Mongo collection. You need to supply your own MongoDB connection details and your own S3 bucket.
#
# Copyright 2012 Mortar Data Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
jeremykarn / awsstatus.coffee
Last active January 29, 2019 20:15
A Hubot script for monitoring the AWS status RSS feed and posting messages to a Campfire room whenever there's a new update.
#
# Copyright 2012 Mortar Data Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software