- Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
- Models and Issues in Data Stream Systems
- Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that revisits some of the historical background and shows how it all began with Flajolet.
- Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
- [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
---
####
#### THIS IS OLD AND OUTDATED
#### LIKE, ANSIBLE 1.0 OLD.
####
#### PROBABLY HIT UP https://docs.ansible.com MY DUDES
####
#### IF IT BREAKS I'M JUST SOME GUY WITH
#### A DOG, OK, SORRY
####
Every application ever written can be viewed as some sort of transformation on data. Data can come from different sources, such as a network, a file, user input, or the Large Hadron Collider. It can come from many sources at once, to be merged and aggregated in interesting ways, and it can be written to many different output sinks, such as a network, files, or graphical user interfaces. You might produce your output all at once, as a big data dump at the end of the world (right before your program shuts down), or you might produce it incrementally. Every application fits into this model.
The scalaz-stream project is an attempt to make it easy to construct, test, and scale programs that fit within this model (which is to say, everything). It does this by providing an abstraction around a "stream" of data, which is really just the notion of some number of data items being pulled sequentially out of some unspecified data source. On top of this abstraction, sca
object SafeIO {
  trait Brace[M[_]] extends Monad[M] {
    def brace[A,B,C](acquire: M[A])(release: A => M[B], go: A => M[C]): M[C]
    def snag[A](m: M[A], f: Throwable => M[A]): M[A]
    def lift[A](t: Task[A]): M[A]
  }
  object Brace {
    def apply[M[_]:Brace]: Brace[M] = implicitly[Brace[M]]
/*
A script to generate a Google BigQuery-compliant JSON schema from a JSON object.
Make sure the JSON object is complete before generating; null values will be skipped.
References:
https://cloud.google.com/bigquery/docs/data
https://cloud.google.com/bigquery/docs/personsDataSchema.json
https://gist.github.com/igrigorik/83334277835625916cd6
... and a couple of visits to StackOverflow
package experiments
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.{Await, Future}
import scalaz._
import Scalaz._
import scala.concurrent.duration.Duration
import natural.TypeSafeMap