Skip to content

Instantly share code, notes, and snippets.

View codejitsu's full-sized avatar

Alex codejitsu

View GitHub Profile
@codejitsu
codejitsu / gist:5d87afa06dc4d88a85ee
Last active August 29, 2015 14:26 — forked from debasishg/gist:8172796
A collection of links for streaming algorithms and data structures
  1. General Background and Overview
@codejitsu
codejitsu / Client example
Last active August 29, 2015 14:27 — forked from rklaehn/Client example
akka http file server
package akkahttptest
import akka.http.Http
import akka.stream.ActorFlowMaterializer
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import akka.http.model._
object TestClient extends App {

Setting up Webfaction for modern Django deployment

last updated: 4/5/2011

note that this stuff is always a moving target, much of this has been cribbed and combined from various blog posts. Much of the information was out of date from those, and if it is more than a couple months after the last updated date above, consider some of this likely to now be out of date.

@codejitsu
codejitsu / StreamingHLL.scala
Created July 3, 2017 19:23 — forked from MLnick/StreamingHLL.scala
Spark Streaming meets Algebird's HyperLogLog Monoid
import spark.streaming.StreamingContext._
import spark.streaming.{Seconds, StreamingContext}
import spark.SparkContext._
import spark.storage.StorageLevel
import spark.streaming.examples.twitter.TwitterInputDStream
import com.twitter.algebird.HyperLogLog._
import com.twitter.algebird._
/**
* Example of using HyperLogLog monoid from Twitter's Algebird together with Spark Streaming's

Scaling your API with rate limiters

The following are examples of the four types rate limiters discussed in the accompanying blog post. In the examples below I've used pseudocode-like Ruby, so if you're unfamiliar with Ruby you should be able to easily translate this approach to other languages. Complete examples in Ruby are also provided later in this gist.

In most cases you'll want all these examples to be classes, but I've used simple functions here to keep the code samples brief.

Request rate limiter

This uses a basic token bucket algorithm and relies on the fact that Redis scripts execute atomically. No other operations can run between fetching the count and writing the new count.

@codejitsu
codejitsu / kafka-cheat-sheet.md
Created May 22, 2018 12:36 — forked from ursuad/kafka-cheat-sheet.md
Quick command reference for Apache Kafka

Kafka Topics

List existing topics

bin/kafka-topics.sh --zookeeper localhost:2181 --list

Describe a topic

bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic mytopic

Purge a topic

bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic mytopic --config retention.ms=1000

... wait a minute ...

Reading list

Programming

Books

Kafka 0.11.0.0 (Confluent 3.3.0) added support to manipulate offsets for a consumer group via cli kafka-consumer-groups command.

  1. List the topics to which the group is subscribed
kafka-consumer-groups --bootstrap-server <kafkahost:port> --group <group_id> --describe

Note the values under "CURRENT-OFFSET" and "LOG-END-OFFSET". "CURRENT-OFFSET" is the offset where this consumer group is currently at in each of the partitions.

  1. Reset the consumer offset for a topic (preview)
@codejitsu
codejitsu / Kafka commands.md
Created July 17, 2018 14:41 — forked from vkroz/Kafka commands.md
Kafka frequent commands

Kafka frequent commands

Assuming that the following environment variables are set:

  • KAFKA_HOME where Kafka is installed on local machine (e.g. /opt/kafka)
  • ZK_HOSTS identifies running zookeeper ensemble, e.g. ZK_HOSTS=192.168.0.99:2181
  • KAFKA_BROKERS identifies running Kafka brokers, e.g. KAFKA_BROKERS=192.168.0.99:9092

Server

Start Zookepper and Kafka servers

Best Practices for Azure Redis

Below are a set of best practices that I recommend for most customers. This information is based on my experience helping hundreds of Azure Redis customers investigate various issues.

Configuration and Concepts

  1. Use Standard or Premium Tier for Production systems. The Basic Tier is a single node system with no data replication and no SLA. Also, use at least a C1 cache. C0 caches are really meant for simple dev/test scenarios since they have a shared CPU core, very little memory, are prone to "noisy neighbor", etc.
  2. Remember that Redis is an In-Memory data store. Read this article so that you are aware of scenarios where data loss can occur.
  3. Configure your client library to use a "connect timeout" of at least 10 to 15 seconds, giving the system time to connect even under higher CPU conditions. If your client or server tend to be under high load