Sam Bessalah samklr

# concatenate three part files (part-00001 to part-00003) to construct "minitweets"
cat rawtweets/part-0000[1-3] > minitweets
# change the log4j log level to WARN to reduce noise during the demo
mv conf/ conf/
vim conf/ # Change to WARN
# launch Spark shell REPL
#! /bin/bash
sudo apt-get update
sudo apt-get install -yq git-core
git clone ~/.rbenv
echo 'export PATH="$HOME/.rbenv/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(rbenv init -)"' >> ~/.bashrc
# run "type rbenv"; it should print "rbenv is a function"
gist:de372d543c9ce96efdc8
import com.twitter.algebird._
import HyperLogLog._
import com.twitter.algebird.Monoid
import com.twitter.algebird.DecayedValue
import com.twitter.algebird.Operators._
val hll = new HyperLogLogMonoid(4)
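`HyperLogLogMonoid(4)` above uses 4 bits, i.e. 2^4 = 16 registers. To show roughly what such a monoid does internally, here is a minimal from-scratch HyperLogLog sketch in plain Scala. It is not Algebird's implementation: the names `Hll` and `emptyHll` are hypothetical, and using the standard library's `MurmurHash3.stringHash` as the 32-bit hash is an assumption. The key property is that merging is a register-wise max, which is what makes HLL a monoid and lets Spark combine per-partition sketches.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical minimal HyperLogLog (not Algebird's HLL): 2^bits registers, each keeping
// the largest "rank" (1 + leading zeros) seen among the hashes routed to it.
final case class Hll(bits: Int, registers: Vector[Int]) {
  require(bits >= 4 && bits <= 16, "bits must be in 4..16")
  val m: Int = 1 << bits

  def add(item: String): Hll = {
    val h = MurmurHash3.stringHash(item)   // 32-bit hash (assumed hash choice)
    val idx = h >>> (32 - bits)            // top `bits` bits select a register
    val rest = h << bits                   // remaining 32 - bits bits, left-aligned
    val rank = math.min(Integer.numberOfLeadingZeros(rest), 32 - bits) + 1
    if (rank > registers(idx)) copy(registers = registers.updated(idx, rank)) else this
  }

  // Monoid plus: register-wise max, so merging two sketches equals sketching the union
  def merge(other: Hll): Hll = {
    require(bits == other.bits, "sketches must use the same number of bits")
    copy(registers = registers.zip(other.registers).map { case (x, y) => math.max(x, y) })
  }

  // Raw HLL estimate with the small-range (linear counting) correction
  def estimate: Double = {
    val alpha = m match {
      case 16 => 0.673
      case 32 => 0.697
      case 64 => 0.709
      case _  => 0.7213 / (1 + 1.079 / m)
    }
    val raw = alpha * m * m / registers.map(r => math.pow(2.0, -r)).sum
    val zeros = registers.count(_ == 0)
    if (raw <= 2.5 * m && zeros > 0) m * math.log(m.toDouble / zeros) else raw
  }
}

def emptyHll(bits: Int): Hll = Hll(bits, Vector.fill(1 << bits)(0))

// Two overlapping sets of users, sketched separately (as on two partitions) and merged
val a = (1 to 5000).foldLeft(emptyHll(10))((s, i) => s.add(s"user-$i"))
val b = (2501 to 10000).foldLeft(emptyHll(10))((s, i) => s.add(s"user-$i"))
val union = a.merge(b)
println(f"estimated distinct users: ${union.estimate}%.0f (true count: 10000)")
```

With 16 registers the standard error (about 1.04/sqrt(m)) is roughly 26%, which is why this sketch uses 10 bits (1024 registers, about 3%).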
#! /bin/bash
echo "deb precise main" | sudo tee -a /etc/apt/sources.list
echo "deb-src precise main" | sudo tee -a /etc/apt/sources.list
sudo apt-key adv --keyserver --recv-keys EEA14886
sudo apt-get -y update
## auto accept oracle jdk license
echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | sudo /usr/bin/debconf-set-selections
sudo apt-get install -y oracle-java8-installer ca-certificates
Main.scala
import mesosphere.mesos.util.FrameworkInfo
import org.apache.mesos.MesosSchedulerDriver
/**
 * @author Tobi Knaup
 */
object Main extends App {
QTreeAlgebirdTaxiData.scala
// transcribed from an Apache Spark 1.0 spark-shell session
// using data from
// and the QTree algorithm for approximate quantiles over large datasets
// each of the distanceRange and minutesRange calculations below takes about 15 minutes on my four-core, SSD-based MacBook Pro
import com.twitter.algebird._
import com.twitter.algebird.Operators._
implicit val qtSemigroupD = new QTreeSemigroup[Double](6)
val in = sc.textFile("trip_data") // a directory containing all the trip_data*.csv files downloaded from the above link
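QTree answers quantile queries with bounds rather than exact values, and its partial results combine through a semigroup. That core idea can be sketched with a simpler, fixed-resolution dyadic histogram: each value is counted in a bin of width 2^level, partial histograms merge by adding counts, and a quantile query returns the bin that must contain the answer. This is a simplification, not Algebird's QTree (which builds and prunes an adaptive tree); `DyadicHist` and the synthetic values are hypothetical.

```scala
// Hypothetical fixed-resolution dyadic histogram, a simplified stand-in for QTree:
// each value is counted in the bin [k * 2^level, (k + 1) * 2^level).
final case class DyadicHist(level: Int, counts: Map[Long, Long]) {
  private val width: Double = math.pow(2.0, level)

  def add(x: Double): DyadicHist = {
    val bin = math.floor(x / width).toLong
    copy(counts = counts.updated(bin, counts.getOrElse(bin, 0L) + 1L))
  }

  // Semigroup "plus": add counts bin by bin (histograms must share the same level)
  def merge(other: DyadicHist): DyadicHist = {
    require(level == other.level, "histograms must use the same level")
    val summed = other.counts.foldLeft(counts) { case (acc, (bin, c)) =>
      acc.updated(bin, acc.getOrElse(bin, 0L) + c)
    }
    copy(counts = summed)
  }

  def total: Long = counts.values.sum

  // Lower and upper bounds that are guaranteed to contain the q-quantile
  def quantileBounds(q: Double): (Double, Double) = {
    require(total > 0 && q >= 0.0 && q <= 1.0)
    val rank = math.max(1L, math.ceil(q * total).toLong) // 1-based rank of the target value
    val cumulative = counts.toSeq.sortBy(_._1).scanLeft((0L, 0L)) {
      case ((_, acc), (bin, c)) => (bin, acc + c)
    }.tail
    val bin = cumulative.find(_._2 >= rank).get._1
    (bin * width, (bin + 1) * width)
  }
}

// Two "partitions" of synthetic values 0.1, 0.2, ..., 100.0, merged as a reduce would
val part1 = (1 to 500).map(_ / 10.0).foldLeft(DyadicHist(-2, Map.empty))(_ add _)
val part2 = (501 to 1000).map(_ / 10.0).foldLeft(DyadicHist(-2, Map.empty))(_ add _)
val merged = part1.merge(part2)
val (lo, hi) = merged.quantileBounds(0.5)
println(s"median lies in [$lo, $hi)")
```

A finer level (more negative here) tightens the bounds at the cost of more bins; QTree's depth parameter plays a similar accuracy/size role, though it does not work exactly this way.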
#! /bin/bash
sudo apt-get -y update
sudo apt-get -y install git-core curl
sudo apt-key adv --keyserver --recv E56151BF
DISTRO=$(lsb_release -is | tr '[:upper:]' '[:lower:]')
CODENAME=$(lsb_release -cs)
echo "deb${DISTRO} ${CODENAME} main" | sudo tee /etc/apt/sources.list.d/mesosphere.list
ItemSimilarity.scala
import com.twitter.scalding._
import com.twitter.algebird.{ MinHasher, MinHasher32, MinHashSignature }
/**
 * Computes similar items (with a string itemId), based on approximate
 * Jaccard similarity, using LSH.
 * Assumes an input data TSV file of the following format:
 * itemId userId
 */
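The doc comment above describes approximate Jaccard similarity with LSH. A minimal, dependency-free sketch of the two pieces follows: MinHash signatures (one minimum per seeded hash function), whose rate of agreement estimates the Jaccard similarity of two items' user sets, and LSH banding, which flags a pair as a candidate when any band of signature rows matches exactly. Seeded `MurmurHash3.stringHash` calls stand in for independent hash functions (an assumption), and all names are hypothetical; Algebird's `MinHasher` is configured and used differently.

```scala
import scala.util.hashing.MurmurHash3

// Exact Jaccard similarity |A ∩ B| / |A ∪ B| of two items' user sets
def jaccard(a: Set[String], b: Set[String]): Double =
  (a intersect b).size.toDouble / (a union b).size

// MinHash signature: for each seed, the minimum seeded hash over the item's users
def signature(users: Set[String], numHashes: Int): Vector[Int] = {
  require(users.nonEmpty, "MinHash needs a non-empty set")
  Vector.tabulate(numHashes)(seed => users.iterator.map(u => MurmurHash3.stringHash(u, seed)).min)
}

// The fraction of agreeing positions estimates the Jaccard similarity
def estimatedJaccard(s1: Vector[Int], s2: Vector[Int]): Double =
  s1.zip(s2).count { case (x, y) => x == y }.toDouble / s1.size

// LSH banding: split signatures into bands of `rowsPerBand` rows; a pair becomes a
// candidate if at least one whole band is identical
def isCandidate(s1: Vector[Int], s2: Vector[Int], rowsPerBand: Int): Boolean =
  s1.grouped(rowsPerBand).zip(s2.grouped(rowsPerBand)).exists { case (b1, b2) => b1 == b2 }

val itemA = (1 to 80).map(i => s"user$i").toSet   // users who touched item A
val itemB = (21 to 100).map(i => s"user$i").toSet // 60 shared users, 100 in total
val sigA = signature(itemA, 200)
val sigB = signature(itemB, 200)
println(f"exact ${jaccard(itemA, itemB)}%.2f, MinHash estimate ${estimatedJaccard(sigA, sigB)}%.2f")
println(s"LSH candidate (40 bands x 5 rows): ${isCandidate(sigA, sigB, 5)}")
```

Banding trades precision for recall: with b bands of r rows, a pair with similarity s becomes a candidate with probability 1 - (1 - s^r)^b, so fewer rows per band surface more (and less similar) candidates.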
Bloomfilter.scala
import com.twitter.algebird._
import com.twitter.algebird.Operators._
// generate 2 lists
val A = (1 to 300).toList
val B = (201 to 400).toList
// Generate a Bloom filter
val NUM_HASHES = 6
val WIDTH = 6000 // bits
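With `NUM_HASHES = 6` and `WIDTH = 6000` bits, a Bloom filter built from `A` should report every member of `A` and only rarely claim a member of `B` that is not in `A`. The sketch below is a minimal stand-alone Bloom filter, not Algebird's `BloomFilter`; the name `Bloom` is hypothetical, and deriving the 6 bit positions by double hashing (h1 + i * h2) over two seeded `MurmurHash3` hashes is an assumption chosen for brevity. The expected false-positive rate is about (1 - e^(-kn/m))^k for k hashes, n items and m bits, roughly 0.03% for k = 6, n = 300, m = 6000.

```scala
import scala.collection.immutable.BitSet
import scala.util.hashing.MurmurHash3

// Hypothetical minimal Bloom filter (not Algebird's): `width` bits, `numHashes` probes per item
final case class Bloom(numHashes: Int, width: Int, bits: BitSet = BitSet.empty) {
  // Double hashing: probe i is h1 + i * h2 (mod width), from two seeded MurmurHash3 hashes
  private def positions(item: String): Seq[Int] = {
    val h1 = MurmurHash3.stringHash(item, 0)
    val h2 = MurmurHash3.stringHash(item, 1)
    (0 until numHashes).map(i => Math.floorMod(h1 + i * h2, width))
  }

  def add(item: String): Bloom = copy(bits = bits ++ positions(item))

  // false => definitely absent; true => probably present
  def mightContain(item: String): Boolean = positions(item).forall(p => bits.contains(p))

  // Filters with the same parameters combine by bitwise OR (a union of the inserted sets)
  def union(other: Bloom): Bloom = copy(bits = bits | other.bits)
}

val A = (1 to 300).toList
val B = (201 to 400).toList
val bfA = A.foldLeft(Bloom(numHashes = 6, width = 6000))((bf, x) => bf.add(x.toString))

// Members of B reported as present: 201..300 really are in A,
// anything beyond those 100 is a false positive
val reported = B.count(x => bfA.mightContain(x.toString))
println(s"reported $reported of ${B.size} (100 true hits)")
```

Because OR-ing bit sets is associative and commutative, per-partition filters can be combined in any order, which is the same property the Algebird monoid relies on.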
sudo add-apt-repository ppa:webupd8team/java -y && sudo apt-get update && \
echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | sudo /usr/bin/debconf-set-selections && \
sudo apt-get install -y oracle-java8-installer
sudo dpkg -i sbt-0.13.8.deb
rm sbt-0.13.8.deb