Alex codejitsu

## gist:5d87afa06dc4d88a85ee

      
Last active August 29, 2015 14:26 (forked from debasishg/gist:8172796)

A collection of links for streaming algorithms and data structures

### General Background and Overview


- Probabilistic Data Structures for Web Analytics and Data Mining: a great overview of the space of probabilistic data structures and how they are used to implement approximation algorithms.
- Models and Issues in Data Stream Systems
- Philippe Flajolet’s contribution to streaming algorithms: a presentation by Jérémie Lumbroso that revisits some of the historical perspective and how it all began with Flajolet.
- Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani: one of the early papers on the subject.
- [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep
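To make the Manku & Motwani entry above more concrete, here is a minimal Ruby sketch of their Lossy Counting idea, written from the paper's description rather than taken from any linked gist. The stream is cut into buckets of width `ceil(1/epsilon)`; each tracked item stores a count plus the maximum number of times it could have been seen before being tracked, and at every bucket boundary entries that cannot be frequent are dropped. `LossyCounter`, `add`, `frequent` and `tracked` are hypothetical names chosen for illustration.

```ruby
# Minimal sketch of Lossy Counting (Manku & Motwani). LossyCounter and its
# methods are hypothetical names; this illustrates the idea and is not a
# reference implementation.
class LossyCounter
  attr_reader :n

  def initialize(epsilon)
    raise ArgumentError, "epsilon must be in (0, 1)" unless epsilon > 0 && epsilon < 1
    @epsilon = epsilon
    @width = (1.0 / epsilon).ceil # bucket width w = ceil(1/epsilon)
    @n = 0                        # number of items seen so far
    @entries = {}                 # item => [count, max_error]
  end

  def add(item)
    @n += 1
    bucket = (@n.to_f / @width).ceil # id of the current bucket, starting at 1
    if (entry = @entries[item])
      entry[0] += 1
    else
      # A newly tracked item may have been seen (and pruned) up to bucket - 1 times.
      @entries[item] = [1, bucket - 1]
    end
    # At each bucket boundary, drop entries that cannot be frequent.
    @entries.delete_if { |_, (count, error)| count + error <= bucket } if (@n % @width).zero?
    self
  end

  # Items whose recorded count is at least (support - epsilon) * n, with their counts.
  def frequent(support)
    threshold = (support - @epsilon) * @n
    @entries.select { |_, (count, _)| count >= threshold }.transform_values(&:first)
  end

  # Number of items currently held in memory.
  def tracked
    @entries.size
  end
end
```

Per the paper, querying with support `s` reports every item whose true frequency is at least `s * N`, and recorded counts undercount the truth by at most `epsilon * N`, while memory stays small because rare items are pruned at each boundary.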


## Client example
```scala
package akkahttptest

import akka.http.Http
import akka.stream.ActorFlowMaterializer
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import akka.http.model._

object TestClient extends App {
```

## _webfaction_setup.rst

      
Created December 12, 2012 13:20 (forked from mihow/_webfaction_setup.rst)

### Setting up Webfaction for modern Django deployment

Last updated: 4/5/2011

Note that this stuff is always a moving target. Much of it has been cribbed and combined from various blog posts, and much of the information in those posts was out of date. If it is more than a couple of months after the last-updated date above, consider some of this likely to be out of date as well.

  
## StreamingHLL.scala
```scala
import spark.streaming.StreamingContext._
import spark.streaming.{Seconds, StreamingContext}
import spark.SparkContext._
import spark.storage.StorageLevel
import spark.streaming.examples.twitter.TwitterInputDStream
import com.twitter.algebird.HyperLogLog._
import com.twitter.algebird._

/**
 * Example of using HyperLogLog monoid from Twitter's Algebird together with Spark Streaming's
```
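The "monoid" in the comment above refers to the fact that HyperLogLog sketches can be combined associatively: merging two sketches is a register-wise max, which is what makes them convenient to aggregate across streaming batches. The Ruby sketch below illustrates that idea under stated assumptions. `TinyHLL` is a hypothetical class, not Algebird's API; it hashes with the first 8 bytes of SHA-1 (an arbitrary choice) and applies only the small-range (linear counting) correction.

```ruby
require "digest"

# Minimal HyperLogLog sketch. TinyHLL is a hypothetical class written for
# illustration; it is not Algebird's API.
class TinyHLL
  attr_reader :bits, :registers

  def initialize(bits = 10, registers = nil)
    @bits = bits
    @m = 1 << bits                              # number of registers
    @registers = registers || Array.new(@m, 0)
  end

  def add(item)
    # 64-bit hash from the first 8 bytes of SHA-1 (an arbitrary choice).
    h = Digest::SHA1.digest(item.to_s).unpack1("Q>")
    idx = h >> (64 - @bits)                     # top bits select a register
    rest = h & ((1 << (64 - @bits)) - 1)        # remaining 64 - bits bits
    rank = (64 - @bits) - rest.bit_length + 1   # position of the leftmost 1-bit
    @registers[idx] = rank if rank > @registers[idx]
    self
  end

  # The monoid "plus": register-wise max. Merging the sketches of two streams
  # yields exactly the sketch of their concatenation.
  def merge(other)
    raise ArgumentError, "precision mismatch" unless other.bits == @bits
    TinyHLL.new(@bits, @registers.zip(other.registers).map(&:max))
  end

  def estimate
    alpha = 0.7213 / (1 + 1.079 / @m)           # bias constant, valid for m >= 128
    raw = alpha * @m * @m / @registers.sum { |r| 2.0**-r }
    zeros = @registers.count(0)
    # Small-range correction: fall back to linear counting.
    if raw <= 2.5 * @m && zeros > 0
      @m * Math.log(@m.to_f / zeros)
    else
      raw
    end
  end
end
```

With 1024 registers the expected relative error is about 1.04 / sqrt(1024), roughly 3%, and adding an item that was already counted never changes the sketch.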

## 0-rate-limiters.md

      
Created April 25, 2018 09:18 (forked from ptarjan/0-rate-limiters.md)

### Scaling your API with rate limiters

The following are examples of the four types of rate limiters discussed in the accompanying blog post. In the examples below I've used pseudocode-like Ruby, so even if you're unfamiliar with Ruby you should be able to translate this approach to other languages easily. Complete examples in Ruby are also provided later in this gist.

In most cases you'll want all these examples to be classes, but I've used simple functions here to keep the code samples brief.

#### Request rate limiter

This uses a basic token bucket algorithm and relies on the fact that Redis scripts execute atomically. No other operations can run between fetching the count and writing the new count.
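The token bucket algorithm described above can be sketched in Ruby as follows. This is a hedged, single-process illustration: `TokenBucket`, `capacity`, `refill_per_sec` and the injectable `clock` are hypothetical names, and a `Mutex` stands in for the atomicity the original obtains by running the logic as a Redis script. Unlike the Redis version, it does not share limits across multiple servers.

```ruby
# Minimal in-memory token bucket. TokenBucket and its parameters are
# hypothetical; a Mutex replaces the atomic Redis script used in the original,
# so this only limits requests within one process.
class TokenBucket
  def initialize(capacity:, refill_per_sec:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @capacity = capacity.to_f
    @rate = refill_per_sec.to_f
    @clock = clock            # injectable so tests can control time
    @tokens = @capacity       # start with a full bucket
    @last = @clock.call
    @lock = Mutex.new
  end

  # Returns true and consumes `cost` tokens if the request may proceed.
  def allow?(cost = 1)
    @lock.synchronize do
      now = @clock.call
      # Refill in proportion to elapsed time, capped at the bucket capacity.
      @tokens = [@capacity, @tokens + (now - @last) * @rate].min
      @last = now
      if @tokens >= cost
        @tokens -= cost
        true
      else
        false
      end
    end
  end
end
```

Holding the lock across the read-refill-write sequence plays the same role as the Redis script: no other request can observe the count between fetching it and writing the new value.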

  
## kafka-cheat-sheet.md

      
Created May 22, 2018 12:36 (forked from ursuad/kafka-cheat-sheet.md)

Quick command reference for Apache Kafka

### Kafka Topics

List existing topics

```shell
bin/kafka-topics.sh --zookeeper localhost:2181 --list
```

Describe a topic

```shell
bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic mytopic
```

Purge a topic

```shell
bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic mytopic --config retention.ms=1000
```

... wait a minute ...

  
## reading-list.md

      
Created May 25, 2018 21:01

### Reading list

#### Programming

##### Books

- Machine Language for Beginners
- Making Games for the Atari 2600 - Steven Hugg
- The Go Programming Language
- Learn You a Haskell for Great Good!


## how_to_reset_kafka_consumer_group_offset.md

      
Created July 11, 2018 11:27 (forked from marwei/how_to_reset_kafka_consumer_group_offset.md)

How to Reset Kafka Consumer Group Offset

Kafka 0.11.0.0 (Confluent 3.3.0) added support for manipulating a consumer group's offsets via the `kafka-consumer-groups` CLI command.

### List the topics to which the group is subscribed

```shell
kafka-consumer-groups --bootstrap-server <kafkahost:port> --group <group_id> --describe
```

Note the values under "CURRENT-OFFSET" and "LOG-END-OFFSET". "CURRENT-OFFSET" is the offset this consumer group has reached in each partition.
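"LOG-END-OFFSET" is the offset that the next message written to a partition will receive, so the difference between the two columns is how many messages the group still has to consume in that partition (the tool also reports this as LAG). The Ruby sketch below only illustrates that arithmetic; `consumer_lag` is a hypothetical helper, not part of Kafka's tooling, and it expects the two offsets per partition to be copied from the `--describe` output by hand.

```ruby
# Hypothetical helper (not part of Kafka): given per-partition
# [CURRENT-OFFSET, LOG-END-OFFSET] pairs, return the per-partition lag
# and the total lag for the group.
def consumer_lag(offsets)
  lag = offsets.transform_values { |current, log_end| log_end - current }
  [lag, lag.values.sum]
end

# Illustrative, made-up offsets for two partitions.
per_partition, total = consumer_lag({ 0 => [120, 150], 1 => [300, 300] })
```

A lag of zero means the group has caught up with that partition; resetting offsets changes CURRENT-OFFSET and therefore the lag.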

### Reset the consumer offset for a topic (preview)


## Kafka commands.md

      
Created July 17, 2018 14:41 (forked from vkroz/Kafka commands.md)

Kafka frequent commands


Assuming that the following environment variables are set:

- `KAFKA_HOME`: where Kafka is installed on the local machine (e.g. `/opt/kafka`)
- `ZK_HOSTS`: identifies the running ZooKeeper ensemble, e.g. `ZK_HOSTS=192.168.0.99:2181`
- `KAFKA_BROKERS`: identifies the running Kafka brokers, e.g. `KAFKA_BROKERS=192.168.0.99:9092`

### Server

Start ZooKeeper and Kafka servers

  
## Redis-BestPractices-General.md

      
Created August 1, 2018 11:59 (forked from JonCole/Redis-BestPractices-General.md)

Redis Best Practices

### Best Practices for Azure Redis

Below is a set of best practices that I recommend for most customers. This information is based on my experience helping hundreds of Azure Redis customers investigate various issues.

#### Configuration and Concepts


- Use the Standard or Premium tier for production systems. The Basic tier is a single-node system with no data replication and no SLA. Also, use at least a C1 cache: C0 caches are really meant for simple dev/test scenarios, since they have a shared CPU core and very little memory and are prone to "noisy neighbor" problems.
- Remember that Redis is an in-memory data store. Read this article so that you are aware of scenarios where data loss can occur.
- Configure your client library to use a "connect timeout" of at least 10 to 15 seconds, giving the system time to connect even under higher CPU conditions. If your client or server tends to be under high load