Skip to content

Instantly share code, notes, and snippets.

View alaiacano's full-sized avatar

Adam Laiacano alaiacano

  • Spotify
  • New Rochelle, NY
View GitHub Profile
@azymnis
azymnis / ItemSimilarity.scala
Created December 13, 2013 05:17
Approximate item similarity using LSH in Scalding.
import com.twitter.scalding._
import com.twitter.algebird.{ MinHasher, MinHasher32, MinHashSignature }
/**
* Computes similar items (with a string itemId), based on approximate
* Jaccard similarity, using LSH.
*
* Assumes an input data TSV file of the following format:
*
* itemId userId
@rjhall
rjhall / SVD.scala
Last active December 20, 2015 22:08
import org.apache.commons.math3.linear._
import com.twitter.algebird.Operators._
import com.twitter.scalding._
import cascading.pipe.Pipe
import cascading.pipe.joiner.InnerJoin
import cascading.tuple.Fields
object SVD extends Serializable {
@minrk
minrk / nbstripout
Last active June 6, 2023 06:23
git pre-commit hook for stripping output from IPython notebooks
#!/usr/bin/env python
"""strip outputs from an IPython Notebook
Opens a notebook, strips its output, and writes the outputless version to the original file.
Useful mainly as a git filter or pre-commit hook for users who don't want to track output in VCS.
This does mostly the same thing as the `Clear All Output` command in the notebook UI.
LICENSE: Public Domain
@johnynek
johnynek / reach2.scala
Created November 1, 2012 20:35
Second followers in scalding
ackage com.twitter.ads.batch.experimental.oscar
import com.twitter.scalding._
import com.twitter.pluck.job._
import com.twitter.pluck.source._
import com.twitter.pluck.source.matrix._
import com.twitter.pluck.mathematics._
import com.twitter.scalding.mathematics.Monoid
/*