Skip to content

Instantly share code, notes, and snippets.

View dgalichet's full-sized avatar

David Galichet dgalichet

  • Freelance
  • France
View GitHub Profile
@debasishg
debasishg / gist:8172796
Last active July 5, 2024 11:53
A collection of links for streaming algorithms and data structures

General Background and Overview

  1. Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
  2. Models and Issues in Data Stream Systems
  3. Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that visits some of the hostorical perspectives and how it all began with Flajolet
  4. Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
  5. [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
object Reader {
def map2[I, O, O1, O2](r1: Reader[I, O1], r2: Reader[I, O2])(f: (O1, O2) => O): Reader[I, O] = Reader { s: I =>
(r1.p(s), r2.p(s)) match {
case (Success(s1), Success(s2)) => Success(f(s1, s2))
case (Success(_), Failure(e)) => Failure(e)
case (Failure(e), Success(_)) => Failure(e)
case (Failure(e1), Failure(e2)) => Failure(e1 ++ e2)
}
}
@pchiusano
pchiusano / scalaz-stream-design.markdown
Last active August 29, 2015 14:22
WIP design for new scalaz-stream core

All right, here we go! The API is quite a bit different than what we have currently, but it's simpler to use, more general, and the implementation can be much more efficient. The encoding for Tee, Process1, Channel is greatly simplified, these become just regular functions, rather than a Process[F,_] with a funky F. Stepping a Process is now a first-class concept, exposed publicly, in a safe way. The resulting style looks a lot like ordinary list processing, and doing things like a 3-way or N-way merge are trivial.

The algebra and the canonical model for this algebra are given in the attached streams.scala. The algebra for streams is given by the trait Stream[P[+_[_],+_]], and there's a canonical instance, Stream[NF], which specifies what the operations mean. A couple notes:

  • The Process[F,A] data type we have currently would have a Stream instance. (Name of Stream TBD)
  • The Chunk type is TBD
  • Free is just the free monad (well, one formulation of it)
  • I still have some uncert