Tuning Storm+Trident

Tuning a dataflow system is easy:

The First Rule of Dataflow Tuning:
* Ensure each stage is always ready to accept records, and
* Deliver each processed record promptly to its destination
@samklr
samklr / interviewitems.MD
Last active August 29, 2015 14:03 — forked from KWMalik/interviewitems.MD
Silly (or not ?) interview questions from tech companies

## Google Interview Questions: Product Marketing Manager

  • Why do you want to join Google? -- Because I want to create tools that let others learn for free. I didn't have much money growing up, so I didn't have access to the same books, computers and resources that others had. I want to help ensure that others can learn on a level playing field, regardless of their family's wealth or location.
  • What do you know about Google’s product and technology? -- A lot, actually. I am a beta tester for numerous products, and I use most of the Google tools, such as Search, Gmail, Drive, Reader, Calendar, G+, YouTube, Webmaster Tools, keyword tools, Analytics, etc.
  • If you are Product Manager for Google’s Adwords, how do you plan to market this?
  • What would you say during an AdWords or AdSense product seminar?
  • Who are Google’s competitors, and how does Google compete with them? -- Google competes in numerous fields: --- Search: Baidu, Bing, DuckDuckGo
package com.grasswire.grasswireurlshortener
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory
import org.apache.commons.validator.routines.UrlValidator
import spray.http.StatusCodes
import spray.routing._
import scala.util.Random
import scalaz.concurrent._
#### Start IPython, generate SHA1 password to use for IPython Notebook server
$ ipython
Python 2.7.5 |Anaconda 1.8.0 (x86_64)| (default, Oct 24 2013, 07:02:20)
Type "copyright", "credits" or "license" for more information.
IPython 1.1.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
samklr / 0.setup.sh
Last active August 29, 2015 14:08 — forked from ceteri/0.setup.sh
# using three part files (part-00001 to part-00003) to construct "minitweets"
cat rawtweets/part-0000[1-3] > minitweets
# change log4j properties to WARN to reduce noise during demo
mv conf/log4j.properties.template conf/log4j.properties
vim conf/log4j.properties # Change to WARN
# launch Spark shell REPL
./bin/spark-shell
import com.twitter.algebird._
import HyperLogLog._
import com.twitter.algebird.Monoid
import com.twitter.algebird.DecayedValue
import com.twitter.algebird.Operators._
val hll = new HyperLogLogMonoid(4)
#! /bin/bash
sudo apt-get -y update
sudo apt-get -y install git-core curl
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E56151BF
DISTRO=$(lsb_release -is | tr '[:upper:]' '[:lower:]')
CODENAME=$(lsb_release -cs)
echo "deb http://repos.mesosphere.io/${DISTRO} ${CODENAME} main" | sudo tee /etc/apt/sources.list.d/mesosphere.list
import com.twitter.scalding._
import com.twitter.algebird.{ MinHasher, MinHasher32, MinHashSignature }
/**
* Computes similar items (with a string itemId), based on approximate
* Jaccard similarity, using LSH.
*
* Assumes an input data TSV file of the following format:
*
* itemId userId
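
The comment above describes approximate Jaccard similarity over `itemId userId` rows. As a small point of reference, here is a sketch of the exact Jaccard computation that MinHash/LSH approximates. It uses only the Scala standard library, not Scalding or Algebird; the sample rows and the names `JaccardSketch`, `rows`, `usersByItem`, and `jaccard` are hypothetical and chosen purely for illustration.

```scala
object JaccardSketch {
  // Hypothetical sample rows in the "itemId userId" format (not from the original gist).
  val rows: Seq[(String, String)] = Seq(
    ("a", "u1"), ("a", "u2"), ("a", "u3"),
    ("b", "u2"), ("b", "u3"), ("b", "u4")
  )

  // Group the rows so that each item maps to the set of users who touched it.
  val usersByItem: Map[String, Set[String]] =
    rows.groupBy(_._1).map { case (item, rs) => item -> rs.map(_._2).toSet }

  // Exact Jaccard similarity: |x ∩ y| / |x ∪ y|, defined here as 0.0 for two empty sets.
  def jaccard[T](x: Set[T], y: Set[T]): Double =
    if (x.isEmpty && y.isEmpty) 0.0
    else (x intersect y).size.toDouble / (x union y).size
}
```

With these sample rows, items `a` and `b` share two of four distinct users, so `JaccardSketch.jaccard(usersByItem("a"), usersByItem("b"))` evaluates to 0.5. The exact computation compares every pair of items, which is the cost that an LSH-based approach is meant to avoid on large inputs.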
samklr / Main.scala
Last active August 29, 2015 14:09 — forked from guenter/Main.scala
import mesosphere.mesos.util.FrameworkInfo
import org.apache.mesos.MesosSchedulerDriver
/**
* @author Tobi Knaup
*/
object Main extends App {
import com.twitter.algebird._
import com.twitter.algebird.Operators._
// generate 2 lists
val A = (1 to 300).toList
val B = (201 to 400).toList
// Generate a Bloom filter
val NUM_HASHES = 6
val WIDTH = 6000 // bits
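
The preview above stops before the filter is built. As a rough, library-free sketch of the idea, the following builds a Bloom filter with the same `NUM_HASHES` and `WIDTH` and probes it with `B`. This is not Algebird's Bloom filter API: `SimpleBloom`, `mightContain`, `BloomSketch`, and `probablyShared` are hypothetical names, and seeding `MurmurHash3` with the hash index is just one assumed way to derive several hash functions.

```scala
import scala.collection.immutable.BitSet
import scala.util.hashing.MurmurHash3

// Hypothetical minimal Bloom filter over Ints; an illustration, not Algebird's implementation.
final case class SimpleBloom(numHashes: Int, width: Int, bits: BitSet = BitSet.empty) {
  // One bit position per hash function; the seed selects the hash function.
  private def positions(item: Int): Seq[Int] =
    (0 until numHashes).map { seed =>
      java.lang.Math.floorMod(MurmurHash3.stringHash(item.toString, seed), width)
    }

  // Adding an item sets all of its bit positions.
  def +(item: Int): SimpleBloom = copy(bits = bits ++ positions(item))

  // True if every position is set: the item is possibly present (false positives can occur).
  def mightContain(item: Int): Boolean = positions(item).forall(bits.contains)
}

object BloomSketch {
  val A = (1 to 300).toList
  val B = (201 to 400).toList
  val NUM_HASHES = 6
  val WIDTH = 6000 // bits

  // Fold every element of A into an initially empty filter.
  val bloomA: SimpleBloom = A.foldLeft(SimpleBloom(NUM_HASHES, WIDTH))(_ + _)

  // Elements of B that the filter reports as possibly being in A.
  val probablyShared: List[Int] = B.filter(bloomA.mightContain)
}
```

Because a Bloom filter never produces false negatives, `probablyShared` always includes 201 through 300, the true overlap of the two lists. It may also include some values from 301 to 400 as false positives; how many depends on the hash functions and on the ratio of `WIDTH` to the number of inserted items.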