Mansur Ashraf MansurAshraf

  • Apple
  • San Francisco, CA
@MansurAshraf
MansurAshraf / product_catalog.csv
Created March 30, 2013 22:59
Sample product catalog
women | jeans | Levi's 512 | 512 Perfectly Slimming Bootcut, Midnight Star Wash | jeans | 54.00 | 39.00
women | jeans | Tommy Hilfiger Jeans | Hope Boot Cut, Caroline Original Wash | jeans | 69.50 | 54.00
women | jeans | Lauren Jeans Co. Jeans | Slimming Bootcut, Rinse Wash | jeans | 80.00 | 65.00
women | jeans | INC International Concepts Jeans | Curvy-Fit Skinny Ankle-Length | jeans | 69.00 | 48.99
women | jeans | MICHAEL Michael Kors Jeans | Skinny Colored Denim | jeans | 89.50 | 69.99
women | tops | Karen Scott Top | Short-Sleeve Boat-Neck | top | 29.00 | 12.99
women | tops | Cable & Gauge Top | Three-Quarter-Sleeve Solid Twist Front | top | 24.98 | 19.99
women | tops | Style&co. Top | Long-Sleeve Striped Button-Down Shirt | top | 39.98 | 29.99
women | tops | Style&co. Top | Bell-Sleeve Printed Tunic | top | 49.98 | 35.99
women | tops | INC International Concepts Top | Short-Sleeve Ruched Tee | top | 29.50 | 24.99
MansurAshraf / product_recommendations.csv
Created March 31, 2013 00:31
Product Recommendations
Bandolino Shoes | Rampage Shoes | 0.6702
Bandolino Shoes | Marc Fisher Shoes | 0.73206
Bandolino Shoes | Nine West Shoes | 0.81128
Cable & Gauge Top | INC International Concepts Top | 0.65449
Cable & Gauge Top | Karen Scott Top | 0.68115
Cable & Gauge Top | Style&co. Top | 0.9547
INC International Concepts Jeans | MICHAEL Michael Kors Jeans | 0.69658
INC International Concepts Jeans | Tommy Hilfiger Jeans | 0.75921
INC International Concepts Jeans | Levi's 512 | 0.80887
INC International Concepts Top | Karen Scott Top | 0.47929
package com.mansur.scalding
import com.twitter.scalding._
import org.apache.lucene.search.spell._
import org.apache.mahout.common.distance.TanimotoDistanceMeasure
import org.apache.mahout.math.DenseVector
import org.apache.commons.math.util.MathUtils
/**
buildscript {
    repositories {
        maven {
            url "http://repository-uncommon-configuration.forge.cloudbees.com/release/"
        }
        mavenCentral()
    }
    dependencies {
        classpath 'org.github.mansur.oozie:gradle-oozie-plugin:0.1'
<workflow-app xmlns='uri:oozie:workflow:0.1' name='oozie_flow'>
  <start to='ingestor' />
  <action name='ingestor'>
    <java>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.job.queue.name</name>
          <value>default</value>
MansurAshraf / gist:7050203
Created October 19, 2013 00:22
word pair
object WordPair {
  def main(args: Array[String]) = {
    val line = "Android is a Linux-based operating system designed primarily for touchscreen mobile devices".split(" ").toList
    val pairs = wordPair(line)
    println(pairs)
  }
  def wordPair(line: List[String]): List[(String, String)] = line match {
    case Nil => Nil
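The preview above cuts off after the `Nil` case, so the rest of `wordPair` is not shown. A minimal sketch of one way to finish it, assuming the intent is to pair each word with its immediate successor (the object name `WordPairSketch` and that pairing rule are assumptions, not the author's code):

```scala
object WordPairSketch {
  // Assumption: a "word pair" is a word together with the word that follows it.
  def wordPair(line: List[String]): List[(String, String)] = line match {
    case Nil      => Nil // no words, no pairs
    case _ :: Nil => Nil // a single word has no successor
    case first :: (rest @ (second :: _)) =>
      (first, second) :: wordPair(rest) // pair the head, then recurse on the tail
  }

  def main(args: Array[String]): Unit = {
    val words = "Android is a Linux-based operating system".split(" ").toList
    println(wordPair(words))
  }
}
```

Under the same assumption, `line.zip(line.drop(1))` yields the same pairs without explicit recursion.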
1. General Background and Overview
a) Probabilistic Data Structures for Web Analytics and Data Mining : On Highly Scalable Blog (http://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/) : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
b) Models and Issues in Data Stream Systems : (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.106.9846)
c) Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani (http://www.vldb.org/conf/2002/S10P03.pdf) : One of the early papers on the subject.
d) Methods for Finding Frequent Items in Data Streams by Graham Cormode & Marios Hadjieleftheriou (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&type=pdf)
e) The space complexity of approximating the frequency moments by Noga Alon, Yossi Matias, Mario Szegedy : one of the most influential papers introducing succinctness in computing fre
MansurAshraf / gist:10336681
Created April 10, 2014 01:58
Injection blues
override def multiPut[K1 <: String](kvs: Map[K1, Option[String]]): Map[K1, Future[Unit]] = {
  implicit val inj = StringCodec.utf8
  multiPutValues(kvs)
}
def multiPutValues[K: Codec, V: Codec](kvs: Map[K, Option[V]]): Map[K, Future[Unit]] = {...}
Error:(84, 19) Cannot find Injection type class from K1 to Array[Byte]
multiPutValues(kvs)
^
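The error above is consistent with the type class being invariant in its key type: an instance for `String` is not an instance for `K1`, even though `K1 <: String`. A self-contained sketch of that situation and one possible workaround, using a hypothetical `Encoder[A]` trait in place of the real `Injection`/`Codec` types (every name in the block is illustrative and not part of any library):

```scala
object InvarianceSketch {
  // Hypothetical stand-in for an invariant type class such as a byte-array codec.
  trait Encoder[A] { def encode(a: A): Array[Byte] }

  implicit val stringEncoder: Encoder[String] = new Encoder[String] {
    def encode(a: String): Array[Byte] = a.getBytes("UTF-8")
  }

  // Requires an Encoder for the exact key type K, like the context bound in the gist.
  def encodeKeys[K: Encoder, V](kvs: Map[K, V]): Map[K, Array[Byte]] =
    kvs.map { case (k, _) => k -> implicitly[Encoder[K]].encode(k) }

  // Explicitly derive an instance for a subtype of String from the String instance.
  def forSubtype[K1 <: String](implicit base: Encoder[String]): Encoder[K1] =
    new Encoder[K1] { def encode(a: K1): Array[Byte] = base.encode(a) }

  def multiPutLike[K1 <: String](kvs: Map[K1, String]): Map[K1, Array[Byte]] = {
    // Without this line, no Encoder[K1] is in scope: Encoder[String] does not
    // conform to Encoder[K1] because Encoder is invariant in A.
    implicit val k1Encoder: Encoder[K1] = forSubtype[K1]
    encodeKeys(kvs)
  }
}
```

Whether an equivalent derivation is the right fix for the real code depends on how the library defines its instances; the sketch only shows why an implicit for `String` alone may not satisfy a bound on `K1`.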
oozie {
    def common_props = [
        jobTracker: '${jobTracker}',
        namenode: '${nameNode}',
        configuration: ["mapred.job.queue.name": "default"]
    ]
class StreamingQueryService[T] {
  Request[T]
    .flatMap {
      t => Seq(t)
    }
    .lookup(ReadableStore)