@jorisbontje
jorisbontje / positweets.hive
Created May 15, 2012 10:53
Twitter sentiment analysis using Apache Hive
drop table if exists raw_tweets;
drop table if exists tweets;
drop table if exists positive_hashtags_per_day;
drop table if exists count_positive_hashtags_per_day;
drop table if exists top5_positive_hashtags_per_day;
create table raw_tweets (json string);
load data local inpath 'sample.json' into table raw_tweets;
create table tweets as
jorisbontje / gist:5803625
Created June 18, 2013 08:31
hadoop default xml
http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-project-dist/hadoop-common/core-default.xml
http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
---
- name: Install R
yum: name=$item state=installed
environment: $proxy_env
with_items:
- R
tags:
- packages
- name: Copy R package installer script
jorisbontje / gist:5056544
Created February 28, 2013 12:57
Hive Avro.txt
0) Download avro-tools jar file from avro.apache.org
1) Extract Avro schema using avro-tools.jar
java -jar avro-tools*.jar getschema file.avro > file.avsc
2) Upload Avro schema to hdfs
hadoop fs -put file.avsc /user/training/file.avsc
jorisbontje / WebHDFS.txt
Last active December 13, 2015 23:49
WebHDFS
curl -O http://python-distribute.org/distribute_setup.py
sudo python distribute_setup.py
curl -O https://raw.github.com/pypa/pip/master/contrib/get-pip.py
sudo python get-pip.py
sudo pip install webhdfs
cp /usr/lib/python2.6/site-packages/webhdfs/example.py .
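The API of the `webhdfs` package installed above is not shown in this snippet, so the sketch below does not use it. Instead it builds a request URL for the WebHDFS REST interface directly, which follows the pattern `http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>`. The helper name `webhdfs_url`, the host name `namenode`, and the default port 50070 (the usual NameNode HTTP port on older Hadoop releases) are assumptions made for illustration.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, port=50070, **params):
    """Build a WebHDFS REST URL for an absolute HDFS path and operation.

    host, port: NameNode HTTP address (assumed values, adjust per cluster).
    params: extra query parameters, e.g. **{"user.name": "training"}.
    """
    # The operation goes first in the query string, followed by any extras.
    query = urlencode({"op": op, **params})
    # quote() escapes unsafe characters but leaves "/" intact.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


if __name__ == "__main__":
    print(webhdfs_url("namenode", "/user/training", "LISTSTATUS"))
```

The resulting URL can be fetched with `curl -i` or any HTTP client; on a cluster without security enabled, the `user.name` query parameter identifies the caller.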
jorisbontje / heroku_dynos.sh
Created November 14, 2012 10:22
Heroku total number of dynos
#!/bin/sh
# Return total number of Heroku dynos for an account
#
# Uses:
# jutil <https://github.com/misterfifths/jutil.git>
# underscore.js <http://underscorejs.org/>
API_KEY="<your Heroku API key>"
curl -s -H "Accept: application/json" -u ":$API_KEY" https://api.heroku.com/apps | jselect 'dynos' | jutil 'return _.reduce($, function(memo, num){ return memo + num; }, 0);'
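The pipeline above selects each app's `dynos` value and folds the values together with `_.reduce`. As a rough equivalent that avoids the jutil dependency, the summing step could be sketched in Python as follows. It assumes the response body is a JSON array of app objects, each with a numeric `dynos` field, which the `jselect 'dynos'` call implies but which this sketch has not checked against the API. `total_dynos` is a hypothetical helper name.

```python
import json


def total_dynos(apps_json: str) -> int:
    """Sum the 'dynos' field over a JSON array of app objects."""
    apps = json.loads(apps_json)
    # Apps with a missing or null 'dynos' value count as zero.
    return sum(app.get("dynos") or 0 for app in apps)


if __name__ == "__main__":
    # Made-up sample data standing in for the API response.
    sample = '[{"name": "a", "dynos": 2}, {"name": "b", "dynos": 3}]'
    print(total_dynos(sample))
```

To use it with the script, the output of the `curl` command would be passed to `total_dynos` as a string, for example by reading standard input.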
jorisbontje / export-scm-config.sh
Created May 27, 2012 12:04
Export the Cloudera Manager configuration
#!/bin/bash
USERNAME=admin
PASSWORD=admin
SCM_URL=http://localhost:7180
COOKIES_FILE=cookies.txt
EXPORT_FILE=export.txt
wget -q --post-data="j_username=${USERNAME}&j_password=${PASSWORD}" --save-cookies ${COOKIES_FILE} --keep-session-cookies -O /dev/null ${SCM_URL}/j_spring_security_check
wget -q -O ${EXPORT_FILE} --load-cookies ${COOKIES_FILE} ${SCM_URL}/cmf/exportCLI
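One caveat with the script above: the username and password are interpolated into `--post-data` as-is, so a password containing `&` or `=` would produce a malformed form body. A minimal sketch of building the same login request with proper form encoding, using only the Python standard library, might look like this. The helper name `build_login_request` is hypothetical; the endpoint path and field names are taken from the script.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_login_request(scm_url: str, username: str, password: str) -> Request:
    """Build the form-encoded login POST used by the export script."""
    # urlencode escapes characters such as '&' and '=' in the credentials.
    body = urlencode({"j_username": username, "j_password": password})
    # Supplying data makes urllib issue a POST instead of a GET.
    return Request(
        scm_url.rstrip("/") + "/j_spring_security_check",
        data=body.encode("ascii"),
    )


if __name__ == "__main__":
    req = build_login_request("http://localhost:7180", "admin", "admin")
    print(req.get_method(), req.full_url)
```

Sending the request and reusing the session for `/cmf/exportCLI` would additionally need a cookie-aware opener, for instance one built with `urllib.request.build_opener` and `HTTPCookieProcessor`, which plays the role of the `--save-cookies` and `--load-cookies` options.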
@SuppressWarnings("serial")
@PlatformRunner.Platform({ LocalPlatform.class, HadoopPlatform.class})
public class SortTest extends PlatformTestCase {
private static final String inputFileSort = "src/test/data/sort.txt";
public SortTest() {
super(false);
}
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.NullPointerException
LazySeq.java:47 clojure.lang.LazySeq.sval
LazySeq.java:56 clojure.lang.LazySeq.seq
Cons.java:39 clojure.lang.Cons.next
RT.java:1178 clojure.lang.RT.length
RT.java:1157 clojure.lang.RT.seqToArray
LazySeq.java:126 clojure.lang.LazySeq.toArray
RT.java:1135 clojure.lang.RT.toArray
core.clj:300 clojure.core/to-array
$ lein test
Testing cascalog-weather.test.weather
11/06/29 22:46:49 INFO hadoop.Hadoop18TapUtil: setting up task: 'attempt_002147483647_0000_m_000000_0' - file:/var/folders/YZ/YZO0QDWpEp0jBsowT4Bo4U+++TI/-Tmp-/tap57/4abafe2e-a136-4415-b7bc-09fd51acc301/_temporary/_attempt_002147483647_0000_m_000000_0
11/06/29 22:46:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
11/06/29 22:46:49 INFO hadoop.TapCollector: closing tap collector for: /var/folders/YZ/YZO0QDWpEp0jBsowT4Bo4U+++TI/-Tmp-/tap57/4abafe2e-a136-4415-b7bc-09fd51acc301/part-00000
11/06/29 22:46:49 INFO hadoop.Hadoop18TapUtil: committing task: 'attempt_002147483647_0000_m_000000_0' - file:/var/folders/YZ/YZO0QDWpEp0jBsowT4Bo4U+++TI/-Tmp-/tap57/4abafe2e-a136-4415-b7bc-09fd51acc301/_temporary/_attempt_002147483647_0000_m_000000_0
11/06/29 22:46:49 INFO hadoop.Hadoop18TapUtil: saved output of task 'attempt_002147483647_0000_m_000000_0' to file:/var/folders/YZ/YZO0QDWpEp0jBsow