epishkin / .gitconfig
Last active October 7, 2015 16:08
My git aliases
[alias]
    co = checkout
    ci = commit
    st = status -sb
    # drop local refs to remote branches that have been deleted
    cln = remote prune origin
    br = branch
    hist = log --pretty=format:\"%h %ad | %s%d [%an]\" --date=short
    hist-graph = log --pretty=format:\"%h %ad | %s%d [%an]\" --graph --date=short
    # commits unique to each of two refs: git lr <ref1> <ref2>
    lr = "!f() { git log $1...$2 --left-right --oneline; }; f"
    # print the object type (blob, tree, commit, tag) of a sha
    type = cat-file -t
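For example, git lr master topic expands to git log master...topic --left-right --oneline, listing the commits reachable from only one of the two refs, each prefixed with < or > to show which side it belongs to.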
epishkin / upload.md
Last active December 19, 2015 12:08
script to upload oozie workflow / coordinator to hdfs

project structure:

.
├── oozie
│   ├── upload.sh
│   ├── combined_queries
│   │   ├── ...
│   └── simple_reports
│       ├── lib
│       │   ├── avro-1.7.4.jar
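The script itself is not shown here; with this layout, upload.sh would typically wrap hadoop fs -put (or -copyFromLocal) to push each workflow directory, including its lib jars, to the corresponding path on HDFS. The exact flags and target paths are assumptions, since only the directory tree is visible.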
epishkin / readme.md
Last active December 27, 2015 16:49
Hadoop Howto
epishkin / gist:9844553
Last active August 29, 2015 13:57 — forked from johnynek/gist:6632488
upgrade code to scalding 0.9 using sed
find . -type f -print0 | xargs -0 gsed -i 's/\.sum(/.sum[Double](/g'
find . -type f -print0 | xargs -0 gsed -i 's/\.plus\[/.sum[/g'
find . -type f -print0 | xargs -0 gsed -i 's/import com.twitter.scalding.DateOps.richDateToCalendar/import com.twitter.scalding.RichDate.toCalendar/'
find . -type f -print0 | xargs -0 gsed -i 's/ RichDate("\([^"]\+\)")(\([^)]\+\))/ com.twitter.scalding.DateParser.default.parse("\1")(\2).get/g'
find . -type f -print0 | xargs -0 gsed -i 's/\.then\([^(Do)]\)/.thenDo\1/g'
find . -type f -print0 | xargs -0 gsed -i 's/Mode\.mode/mode/g'
find . -type f -print0 | xargs -0 gsed -i 's/new RichDate/RichDate/g'
find . -type f -print0 | xargs -0 gsed -i 's/import scalding.avro/import com.twitter.scalding.avro/'
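Taken together, these substitutions mechanically apply the scalding 0.9 renames. A rough before/after sketch, reconstructed from the sed patterns above rather than from the scalding changelog (tz is assumed to be a java.util.TimeZone in scope):

// scalding 0.8
pipe.groupBy('key) { _.sum('clicks).plus[Double]('a -> 'total) }
val day = RichDate("2014-03-28")(tz)

// scalding 0.9
pipe.groupBy('key) { _.sum[Double]('clicks).sum[Double]('a -> 'total) }
val day = com.twitter.scalding.DateParser.default.parse("2014-03-28")(tz).get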
epishkin / git-branches.sh
Created November 14, 2014 16:42
show remote branches of a git repo sorted by date
# ALL_BRANCHES is assumed to be set from the output of `git branch -r`;
# this filters out the "origin/HEAD -> origin/master" pointer entry.
remove_head() {
    for BRANCH in $ALL_BRANCHES; do
        if [ "$BRANCH" = "->" ] || [ "$BRANCH" = "origin/HEAD" ]; then
            continue
        fi
        echo "$BRANCH"
    done | sort -u
}
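The rest of the script presumably feeds git branch -r through remove_head and sorts the result by commit date; current git can do the whole thing in one command: git for-each-ref --sort=-committerdate refs/remotes --format='%(committerdate:short) %(refname:short)'.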
epishkin / count_uniques.scala
Last active August 29, 2015 14:11
Examples for Optimize Scalding Jobs
// 2 m/r jobs :-(
.unique('item_id_from, 'item_id_to, 'user_id)             // 1st m/r
.groupBy('item_id_from, 'item_id_to) { _.size('count) }   // 2nd m/r

// 1 m/r job but more code
.map('user_id -> 'user_id) { id: String => Set(id) }
.groupBy('item_id_from, 'item_id_to) {
  _.sum[Set[String]]('user_id)
}
.map('user_id -> 'count) { ids: Set[String] => ids.size }
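The single-job version works because sum uses the Set monoid: partial user-id sets can be combined on the map side before the shuffle, so the unique + count pair collapses into one reduce. The catch is that each group materializes a full Set[String], so this only pays off while the distinct user count per (item_id_from, item_id_to) pair stays small.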
epishkin / SaveCountersToHdfs.scala
Last active December 4, 2017 13:39
write all counters of a scalding job to hdfs
import java.io.PrintWriter
import cascading.stats.CascadingStats
import com.twitter.scalding._
/**
 * Writes all custom counters into a tsv file args("counters-file") if this property is set.
 *
 * Output format:
 * counter_name value
 */
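The class body is cut off above. As a minimal sketch of the core loop, assuming stats is the finished job's CascadingStats and writer is a PrintWriter opened on the output file (both names are placeholders, not the gist's own):

import scala.collection.JavaConverters._

def writeCounters(stats: CascadingStats, writer: PrintWriter): Unit = {
  // one tab-separated "counter_name value" line per counter
  for {
    group   <- stats.getCounterGroups.asScala
    counter <- stats.getCountersFor(group).asScala
  } writer.println(s"$counter\t${stats.getCounterValue(group, counter)}")
  writer.close()
}
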
import turtle
t = turtle.Pen()

def line(count, size, alpha, beta):
    if count == 0:
        return
    else:
        t.forward(size)
        t.right(180 - beta)