@johnynek
johnynek / gist:8290375
Created January 6, 2014 21:47
Example of a LAG-type function in the Scalding Fields API (the typed API is similar)
groupBy('source) {
_.sortBy('links)
.reverse
.mapStream[(String,Int), (String, Int, Int, Int)]
(('destination, 'links) -> ('destination, 'links, 'rank, 'gap)) { destLinks =>
destLinks.scanLeft(None: Option[(String, Int, Int, Int)]) {
(prevRowOut: Option[(String,Int,Int,Int)], thisRow: (String, Int)) =>
val (dest, links) = thisRow
prevRowOut match {
case None => Some((dest, links, 1, 0)) // rank 1, gap 0 -- not exactly what you wanted...
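The preview above is cut off after the first `case`, so the rest of the gist is not shown here. As a rough illustration of the same `scanLeft` idea on plain Scala collections (no Scalding dependency), the sketch below assumes that rank increases by one per row and that gap is the previous row's link count minus the current one. The names `LagSketch` and `rankWithGap` are hypothetical and do not come from the gist.

```scala
// A minimal sketch, not the gist's actual continuation.
// LagSketch and rankWithGap are hypothetical names used for illustration.
object LagSketch {
  // Output row: (destination, links, rank, gap)
  type Ranked = (String, Int, Int, Int)

  // Stands in for one group of ('destination, 'links) rows.
  def rankWithGap(rows: Seq[(String, Int)]): List[Ranked] =
    rows
      .sortBy(_._2)(Ordering[Int].reverse) // most links first
      .scanLeft(None: Option[Ranked]) { (prevRowOut, thisRow) =>
        val (dest, links) = thisRow
        prevRowOut match {
          // First row of the group: rank 1, no previous row to compare with.
          case None => Some((dest, links, 1, 0))
          // Assumption: gap = previous links - current links.
          case Some((_, prevLinks, prevRank, _)) =>
            Some((dest, links, prevRank + 1, prevLinks - links))
        }
      }
      .flatten // drop the None seed that scanLeft emits first
      .toList
}
```

Under those assumptions, `LagSketch.rankWithGap(Seq(("a", 10), ("b", 30), ("c", 25)))` returns `List(("b", 30, 1, 0), ("c", 25, 2, 5), ("a", 10, 3, 15))`. Carrying the previous output row as an `Option` is what lets each step see the prior row's values, which is the essence of a LAG function.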
@mushkevych
mushkevych / Dockerfile
Last active December 30, 2015 07:39
Docker CDH 4.5
FROM ubuntu:precise
MAINTAINER Bohdan Mushkevych
# Installing Oracle JDK
RUN apt-get -y install python-software-properties ;\
add-apt-repository ppa:webupd8team/java ;\
apt-get update && apt-get -y upgrade ;\
echo oracle-java7-installer shared/accepted-oracle-license-v1-1 select true | /usr/bin/debconf-set-selections ;\
apt-get -y install oracle-java7-installer && apt-get clean ;\
update-alternatives --display java ;\
@terrancesnyder
terrancesnyder / kafka-consumer-example.java
Last active October 15, 2015 11:03
Example of processing Kafka messages using jQuery-like deferreds/promises for cleaner async code.
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.avro.io.BinaryDecoder;
@nipra
nipra / notes.txt
Created November 13, 2012 10:04
CDH4 Hadoop + HBase Pseudo-distributed Mode installation
# Installing CDH4 on a Single Linux Node in Pseudo-distributed Mode
# https://ccp.cloudera.com/display/CDH4DOC/Installing+CDH4+on+a+Single+Linux+Node+in+Pseudo-distributed+Mode
# Installing CDH4 with MRv1 on a Single Linux Node in Pseudo-distributed mode
# On Ubuntu and other Debian systems
nipra@lambda:Downloads$ wget -cv http://archive.cloudera.com/cdh4/one-click-install/precise/amd64/cdh4-repository_1.0_all.deb
nipra@lambda:Downloads$ sudo dpkg -i cdh4-repository_1.0_all.deb # Adds /etc/apt/sources.list.d/cloudera-cdh4.list ??
nipra@lambda:Downloads$ dpkg -L cdh4-repository # To view the files on Ubuntu systems
# Install CDH4