Elmer Garduno elmer-garduno

@azymnis
azymnis / ItemSimilarity.scala
Created December 13, 2013 05:17
Approximate item similarity using LSH in Scalding.
import com.twitter.scalding._
import com.twitter.algebird.{ MinHasher, MinHasher32, MinHashSignature }
/**
* Computes similar items (with a string itemId), based on approximate
* Jaccard similarity, using LSH.
*
* Assumes an input data TSV file of the following format:
*
* itemId userId
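
To illustrate the idea named in the description of ItemSimilarity.scala (approximate Jaccard similarity via MinHash signatures, then LSH banding), here is a minimal hand-rolled sketch. It does not use Algebird's `MinHasher` and is not the gist's code; `MinHashSketch`, `signature`, `estimateJaccard`, and `bandKeys` are hypothetical names, and seeding `MurmurHash3.stringHash` with the hash index is an assumption made for brevity.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical illustration, not the gist's implementation.
object MinHashSketch {
  // One minimum per seeded hash function over the item's user set.
  def signature(users: Set[String], numHashes: Int): Vector[Int] = {
    require(users.nonEmpty, "an item needs at least one user")
    Vector.tabulate(numHashes) { seed =>
      users.iterator.map(u => MurmurHash3.stringHash(u, seed)).min
    }
  }

  // The fraction of matching signature positions estimates Jaccard similarity.
  def estimateJaccard(a: Vector[Int], b: Vector[Int]): Double = {
    require(a.length == b.length, "signatures must have equal length")
    val matches = a.zip(b).count { case (x, y) => x == y }
    matches.toDouble / a.length
  }

  // Exact Jaccard similarity, for comparison on small inputs.
  def exactJaccard(a: Set[String], b: Set[String]): Double =
    if (a.isEmpty && b.isEmpty) 1.0
    else (a & b).size.toDouble / (a | b).size

  // LSH banding: items that share any (bandIndex, rows) key become candidates.
  def bandKeys(sig: Vector[Int], rowsPerBand: Int): Vector[(Int, Vector[Int])] =
    sig.grouped(rowsPerBand).zipWithIndex.map { case (rows, band) => (band, rows) }.toVector
}
```

In a distributed job, the band keys would typically serve as the grouping key, so that only items falling into the same bucket are compared pairwise.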
@guenter
guenter / Main.scala
Last active September 17, 2020 11:25
A simple Mesos "Hello World": downloads and starts a Python web server on every node in the cluster.
import mesosphere.mesos.util.FrameworkInfo
import org.apache.mesos.MesosSchedulerDriver
/**
* @author Tobi Knaup
*/
object Main extends App {
@massie
massie / KryoRegistrator.scala
Created October 29, 2013 23:59
Here's an example of how to embed Avro objects into a Kryo stream. You only need to register each Avro Specific class in the KryoRegistrator using the AvroSerializer class below and you're ready to go.
/*
* Copyright (c) 2013. Regents of the University of California
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
@johnynek
johnynek / branch_clean.scala
Created August 24, 2013 21:15
Keeping old branches clean. Run this and it will delete fully merged branches.
#!/bin/sh
exec scala -savecompiled "$0" "$@"
!#
// Get the shell scripting enrichments
import scala.sys.process._
val alwaysKeep = Set("develop", "master")
// Now delete any merged branches
val branches: String = "git branch".!!
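
The preview of branch_clean.scala is cut off before the deletion logic, so the following is only a sketch of how the described behaviour could look. It assumes `git branch --merged` is used to list fully merged branches and `git branch -d` to delete them; `BranchCleanSketch`, `deletable`, and `run` are hypothetical names, not taken from the gist.

```scala
import scala.sys.process._

// Hypothetical illustration, not the gist's implementation.
object BranchCleanSketch {
  val alwaysKeep = Set("develop", "master")

  // Parses `git branch`-style output: one branch per line,
  // with the checked-out branch prefixed by "* ".
  def deletable(gitOutput: String, keep: Set[String]): List[String] =
    gitOutput.linesIterator
      .map(_.trim)
      .filter(_.nonEmpty)
      .filterNot(_.startsWith("*")) // never delete the current branch
      .filterNot(keep.contains)
      .toList

  // Assumption: `--merged` restricts the listing to branches merged into HEAD.
  def run(): Unit = {
    val merged = "git branch --merged".!!
    deletable(merged, alwaysKeep).foreach { branch =>
      // `-d` refuses to delete branches that are not fully merged.
      Seq("git", "branch", "-d", branch).!
    }
  }
}
```

Keeping the parsing in a pure function makes it possible to check the filtering without touching a real repository.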
@johnynek
johnynek / compression.scala
Created August 12, 2013 05:40
Compress Lists of any item type (as long as the items are immutable and have sane equals and hashCode). This is basically the Lempel-Ziv algorithm.
/**
scala> compress(List.fill(1000)(1))
res17: List[Either[Int,Int]] = List(Left(1), Right(0), Left(1), Right(1), Left(1), Right(2), Left(1), Right(3), Left(1), Right(4), Left(1), Right(5), Left(1), Right(6), Left(1), Right(7), Left(1), Right(8), Left(1), Right(9), Left(1), Right(10), Left(1), Right(11), Left(1), Right(12), Left(1), Right(13), Left(1), Right(14), Left(1), Right(15), Left(1), Right(16), Left(1), Right(17), Left(1), Right(18), Left(1), Right(19), Left(1), Right(20), Left(1), Right(21), Left(1), Right(22), Left(1), Right(23), Left(1), Right(24), Left(1), Right(25), Left(1), Right(26), Left(1), Right(27), Left(1), Right(28), Left(1), Right(29), Left(1), Right(30), Left(1), Right(31), Left(1), Right(32), Left(1), Right(33), Left(1), Right(34), Left(1), Right(35), Left(1), Right(36), Left(1), Right(37), Left(1), Ri...
scala> compress(List.fill(1000)(1)).size
res18: Int = 88
scala> decompress(res17)
res19: List[Int] = List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
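
The transcript above is consistent with an LZ78-style scheme in which `Left(item)` is a literal and `Right(i)` refers to the i-th phrase seen so far. The following is a minimal sketch under that assumption, not the gist's code; `LzSketch` and its internals are hypothetical. On `List.fill(1000)(1)` it produces 88 tokens, beginning `Left(1), Right(0), Left(1), Right(1)`.

```scala
import scala.annotation.tailrec
import scala.collection.mutable

// Hypothetical illustration, not the gist's implementation.
object LzSketch {
  def compress[T](items: List[T]): List[Either[T, Int]] = {
    val dict = mutable.Map.empty[Vector[T], Int] // phrase -> index
    val out = List.newBuilder[Either[T, Int]]
    var current = Vector.empty[T]
    for (x <- items) {
      val candidate = current :+ x
      if (dict.contains(candidate)) current = candidate // keep extending the match
      else {
        if (current.nonEmpty) out += Right(dict(current)) // longest known prefix
        out += Left(x)                                    // the new item
        dict(candidate) = dict.size
        current = Vector.empty
      }
    }
    // Trailing input that matched a known phrase exactly.
    if (current.nonEmpty) out += Right(dict(current))
    out.result()
  }

  def decompress[T](tokens: List[Either[T, Int]]): List[T] = {
    val phrases = mutable.ArrayBuffer.empty[Vector[T]]
    val out = List.newBuilder[T]
    @tailrec def loop(rest: List[Either[T, Int]]): Unit = rest match {
      case Right(i) :: Left(x) :: tail => // known prefix plus one new item
        val p = phrases(i) :+ x
        phrases += p; out ++= p; loop(tail)
      case Left(x) :: tail =>             // single-item phrase
        val p = Vector(x)
        phrases += p; out ++= p; loop(tail)
      case Right(i) :: tail =>            // final, exactly matched phrase
        out ++= phrases(i); loop(tail)
      case Nil => ()
    }
    loop(tokens)
    out.result()
  }
}
```

Because phrases are used as map keys, the items need stable `equals` and `hashCode`, which matches the requirement stated in the gist description.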