Anton Parkhomenko chuwy

## gist:8172796

      
debasishg / gist:8172796
Last active March 15, 2024 15:05

A collection of links for streaming algorithms and data structures

General Background and Overview


Probabilistic Data Structures for Web Analytics and Data Mining: A great overview of the space of probabilistic data structures and how they are used to implement approximation algorithms.
Models and Issues in Data Stream Systems
Philippe Flajolet’s contribution to streaming algorithms: A presentation by Jérémie Lumbroso that visits some of the historical perspectives and how it all began with Flajolet.
Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani: One of the early papers on the subject.
[Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t


## pedantically_commented_playbook.yml
---
####
#### THIS IS OLD AND OUTDATED
#### LIKE, ANSIBLE 1.0 OLD.
####
#### PROBABLY HIT UP https://docs.ansible.com MY DUDES
####
#### IF IT BREAKS I'M JUST SOME GUY WITH
#### A DOG, OK, SORRY
####

## streams-tutorial.md

      
djspiewak / streams-tutorial.md
Created March 22, 2015 19:55

Introduction to scalaz-stream

Every application ever written can be viewed as some sort of transformation on data.  Data can come from different sources, such as a network or a file or user input or the Large Hadron Collider.  It can come from many sources all at once to be merged and aggregated in interesting ways, and it can be produced into many different output sinks, such as a network or files or graphical user interfaces.  You might produce your output all at once, as a big data dump at the end of the world (right before your program shuts down), or you might produce it more incrementally.  Every application fits into this model.
The scalaz-stream project is an attempt to make it easy to construct, test and scale programs that fit within this model (which is to say, everything). It does this by providing an abstraction around a "stream" of data, which is really just this notion of some number of data being sequentially pulled out of some unspecified data source. On top of this abstraction, sca

  
## schema-generator.js
/*
	A script to generate a Google BigQuery-compliant JSON-schema from a JSON object.

	Make sure the JSON object is complete before generating; null values will be skipped.

	References:
	https://cloud.google.com/bigquery/docs/data
	https://cloud.google.com/bigquery/docs/personsDataSchema.json
	https://gist.github.com/igrigorik/83334277835625916cd6
	... and a couple of visits to StackOverflow

## gist:dba2699f064460228315
object SafeIO {

  trait Brace[M[_]] extends Monad[M] {
    def brace[A,B,C](acquire: M[A])(release: A => M[B], go: A => M[C]): M[C]
    def snag[A](m: M[A], f: Throwable => M[A]): M[A]
    def lift[A](t: Task[A]): M[A]
  }

  object Brace {
    def apply[M[_]:Brace]: Brace[M] = implicitly[Brace[M]]

## ForFree.scala
package experiments

import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.{Await, Future}
import scalaz._
import Scalaz._
import scala.concurrent.duration.Duration

import natural.TypeSafeMap