Mansur Ashraf MansurAshraf

  • Apple
  • San Francisco, CA
--
-- NOTE:
--
-- File paths need to be edited. Search for $$PATH$$ and
-- replace it with the path to the directory containing
-- the extracted data files.
--
--
-- PostgreSQL database dump
--
class InMemoryStore[M[_]: Monad, K, V] extends Store[M, K, V] {
override val monad: Monad[M] = implicitly[Monad[M]]
private val map = new ConcurrentHashMap[K, V]()
override def get(k: K): M[Option[V]] = monad.pure(Option(map.get(k)))
override def put(kv: (K, V)): M[Unit] = monad.pure {
val (k, v) = kv
map.put(k, v)
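The preview above is cut off before the class closes. As a rough, self-contained sketch of how such a store might be completed, the following assumes a hypothetical minimal `Monad` type class and `Store` trait (neither is shown in the original) and an `Option` instance chosen purely for illustration.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical minimal type class; the original does not show its Monad.
trait Monad[M[_]] {
  def pure[A](a: => A): M[A]
  def flatMap[A, B](ma: M[A])(f: A => M[B]): M[B]
}

object Monad {
  // Illustrative instance only, so the sketch can be exercised.
  implicit val optionMonad: Monad[Option] = new Monad[Option] {
    def pure[A](a: => A): Option[A] = Some(a)
    def flatMap[A, B](ma: Option[A])(f: A => Option[B]): Option[B] = ma.flatMap(f)
  }
}

// Hypothetical Store interface, inferred from the overrides in the preview.
trait Store[M[_], K, V] {
  def monad: Monad[M]
  def get(k: K): M[Option[V]]
  def put(kv: (K, V)): M[Unit]
}

class InMemoryStore[M[_]: Monad, K, V] extends Store[M, K, V] {
  override val monad: Monad[M] = implicitly[Monad[M]]
  private val map = new ConcurrentHashMap[K, V]()

  // ConcurrentHashMap.get returns null for a missing key; Option(...) turns that into None.
  override def get(k: K): M[Option[V]] = monad.pure(Option(map.get(k)))

  override def put(kv: (K, V)): M[Unit] = monad.pure {
    val (k, v) = kv
    map.put(k, v)
    () // discard the previous value returned by put
  }
}
```

With the `Option` instance, `new InMemoryStore[Option, String, Int]` can be used directly; a missing key comes back as `Some(None)` rather than `None`, because the outer `Option` is the effect and the inner one is the lookup result.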
columns: A B C D E
types: String String String Int Int
/**
Group By ABC, and within each group find the max of (E-D)
then fanout from E to D, assign 1 to each value from E to D and put them in a map.
GroupBy AB, merge all the maps
*/
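The comment above leaves some details open, so the following is only one possible reading, sketched with plain Scala collections rather than any pipeline library. It assumes that within each (A, B, C) group the row with the largest `E - D` is kept, that "fanout" means every integer between `D` and `E` inclusive gets the value 1, and that merging maps means summing values per key. The `Row` case class and all function names are hypothetical.

```scala
object GroupFanout {
  // Hypothetical record matching the column list: A, B, C are strings; D, E are ints.
  final case class Row(a: String, b: String, c: String, d: Int, e: Int)

  // Merge two count maps by summing the values of shared keys.
  def merge(x: Map[Int, Int], y: Map[Int, Int]): Map[Int, Int] =
    y.foldLeft(x) { case (acc, (k, v)) => acc.updated(k, acc.getOrElse(k, 0) + v) }

  def run(rows: Seq[Row]): Map[(String, String), Map[Int, Int]] = {
    // Step 1: group by (A, B, C) and keep the row with the largest E - D.
    // toSeq keeps one entry per (A, B, C) group even when (A, B) repeats.
    val perAbc: Seq[((String, String), Map[Int, Int])] =
      rows.groupBy(r => (r.a, r.b, r.c)).toSeq.map { case ((a, b, _), group) =>
        val widest = group.maxBy(r => r.e - r.d)
        // Step 2: fan out over the inclusive range between D and E, assigning 1 to each value.
        val lo = math.min(widest.d, widest.e)
        val hi = math.max(widest.d, widest.e)
        (a, b) -> (lo to hi).map(i => i -> 1).toMap
      }
    // Step 3: group by (A, B) and merge all the maps.
    perAbc.groupBy(_._1).map { case (ab, entries) =>
      ab -> entries.map(_._2).foldLeft(Map.empty[Int, Int])(merge)
    }
  }
}
```

If merging is meant to be a plain union rather than a sum, `merge` can be replaced with `x ++ y` without touching the rest.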
def checkPath(p: String): Boolean
val futureA = Future {
  if (!checkPath(...)) throw new IllegalStateException("path does not exist")
}
val futureB = Future(..)
futureA.flatMap(_ => futureB)
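A minimal sketch of the same guard-then-continue pattern, assuming the intent is that the second future should only run once the path check has passed. `flatMap` takes a function, so the continuation is passed as `_ => next`, and it is taken by name so it is not started before the check. The `checkPath` body (a plain filesystem existence test) and the name `afterPathCheck` are assumptions; the original only declares the signature.

```scala
import java.nio.file.{Files, Paths}
import scala.concurrent.{ExecutionContext, Future}

object PathGuard {
  // Hypothetical implementation: the original shows only the signature.
  def checkPath(p: String): Boolean = Files.exists(Paths.get(p))

  // Runs the path check first; `next` is by-name, so it is only created
  // (and therefore only started) if the check succeeds.
  def afterPathCheck[A](path: String)(next: => Future[A])(implicit ec: ExecutionContext): Future[A] =
    Future {
      if (!checkPath(path)) throw new IllegalStateException(s"path does not exist: $path")
    }.flatMap(_ => next)
}
```

A `val futureB = Future(..)` declared up front starts running immediately, regardless of the check; passing it by name, as here, is one way to avoid that.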
package com.twitter.algebird
import com.twitter.scalding.typed.TypedPipe
/**
* @author Mansur Ashraf.
*/
class jankyMonoid {
implicit val sg = new EnumSemiGroup
class StreamingQueryService[T] {
Request[T]
.flatMap {
t => Seq(t)
}
.lookup(ReadableStore)
oozie {
def common_props = [
jobTracker: '${jobTracker}',
namenode: '${nameNode}',
configuration: ["mapred.job.queue.name": "default"]
]
MansurAshraf / gist:10336681
Created April 10, 2014 01:58
Injection blues
override def multiPut[K1 <: String](kvs: Map[K1, Option[String]]): Map[K1, Future[Unit]] = {
  implicit val inj = StringCodec.utf8
  multiPutValues(kvs)
}
def multiPutValues[K: Codec, V: Codec](kvs: Map[K, Option[V]]): Map[K, Future[Unit]] = {...}
Error:(84, 19) Cannot find Injection type class from K1 to Array[Byte]
multiPutValues(kvs)
^
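The error is probably a consequence of invariance: the call infers `K = K1`, so the compiler needs a codec for `K1` itself, while the only implicit in scope is for `String`, and an invariant type class instance for `String` does not satisfy a request for a subtype. The sketch below reproduces the shape of the problem without the Twitter libraries and shows one possible workaround, widening the keys to `String` before the call. The `Codec` trait here is a hypothetical stand-in (it assumes the real `Codec[T]` is an invariant injection from `T` to `Array[Byte]`, as the error message suggests), and the `written` map is a toy backing store added only so the effect can be observed.

```scala
import scala.collection.mutable
import scala.concurrent.Future

// Hypothetical stand-in for an invariant type class such as Injection[T, Array[Byte]].
trait Codec[T] {
  def encode(t: T): Array[Byte]
}

object InjectionSketch {
  implicit val utf8: Codec[String] = new Codec[String] {
    def encode(t: String): Array[Byte] = t.getBytes("UTF-8")
  }

  // Toy backing store keyed by encoded bytes (assumption, for inspection only).
  val written: mutable.Map[Seq[Byte], Option[Seq[Byte]]] = mutable.Map.empty

  def multiPutValues[K: Codec, V: Codec](kvs: Map[K, Option[V]]): Map[K, Future[Unit]] =
    kvs.map { case (k, v) =>
      val key = implicitly[Codec[K]].encode(k).toSeq
      val value = v.map(x => implicitly[Codec[V]].encode(x).toSeq)
      written.update(key, value)
      k -> Future.successful(())
    }

  def multiPut[K1 <: String](kvs: Map[K1, Option[String]]): Map[K1, Future[Unit]] = {
    // multiPutValues(kvs) would infer K = K1 and require a Codec[K1];
    // only Codec[String] exists and Codec is invariant, so it would not compile.
    // Widening the keys makes the call resolve with K = String.
    val widened: Map[String, Option[String]] = kvs.map { case (k, v) => (k: String) -> v }
    val results = multiPutValues(widened)
    // Re-key the results by the original K1 keys.
    kvs.map { case (k, _) => k -> results(k) }
  }
}
```

The widening step is needed because `Map` is invariant in its key type, so a `Map[K1, _]` cannot simply be passed where a `Map[String, _]` is expected.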
1. General Background and Overview
a) Probabilistic Data Structures for Web Analytics and Data Mining, on the Highly Scalable Blog (http://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/): a great overview of the space of probabilistic data structures and how they are used to implement approximation algorithms.
b) Models and Issues in Data Stream Systems (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.106.9846)
c) Approximate Frequency Counts over Data Streams, by Gurmeet Singh Manku & Rajeev Motwani (http://www.vldb.org/conf/2002/S10P03.pdf): one of the early papers on the subject.
d) Methods for Finding Frequent Items in Data Streams, by Graham Cormode & Marios Hadjieleftheriou (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&type=pdf)
e) The space complexity of approximating the frequency moments by Noga Alon, Yossi Matias, Mario Szegedy : one of the most influential papers introducing succinctness in computing fre