- Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
- Models and Issues in Data Stream Systems
- Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that visits some of the hostorical perspectives and how it all began with Flajolet
- Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
- [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
object Reader { | |
def map2[I, O, O1, O2](r1: Reader[I, O1], r2: Reader[I, O2])(f: (O1, O2) => O): Reader[I, O] = Reader { s: I => | |
(r1.p(s), r2.p(s)) match { | |
case (Success(s1), Success(s2)) => Success(f(s1, s2)) | |
case (Success(_), Failure(e)) => Failure(e) | |
case (Failure(e), Success(_)) => Failure(e) | |
case (Failure(e1), Failure(e2)) => Failure(e1 ++ e2) | |
} | |
} |
All right, here we go! The API is quite a bit different than what we have currently, but it's simpler to use, more general, and the implementation can be much more efficient. The encoding for Tee
, Process1
, Channel
is greatly simplified, these become just regular functions, rather than a Process[F,_]
with a funky F
. Stepping a Process
is now a first-class concept, exposed publicly, in a safe way. The resulting style looks a lot like ordinary list processing, and doing things like a 3-way or N-way merge are trivial.
The algebra and the canonical model for this algebra are given in the attached streams.scala
. The algebra for streams is given by the trait Stream[P[+_[_],+_]]
, and there's a canonical instance, Stream[NF]
, which specifies what the operations mean. A couple notes:
- The
Process[F,A]
data type we have currently would have aStream
instance. (Name ofStream
TBD) - The
Chunk
type is TBD Free
is just the free monad (well, one formulation of it)- I still have some uncert