- Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used to implement approximation algorithms.
- Models and Issues in Data Stream Systems
- Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that visits some of the historical perspectives and how it all began with Flajolet.
- Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
- [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
import java.time.*;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.time.temporal.ChronoUnit;
import java.time.temporal.TemporalAdjusters;
import java.util.*;
import static java.time.temporal.TemporalAdjusters.*;
public class Java8DateTimeExamples {
"""
This gist demonstrates that spark 1.0.0 and 0.9.1
don't serialize a logger instance properly when code runs on workers.
run this code via:
spark-submit spark_serialization_demo.py
- or -
pyspark spark_serialization_demo.py
"""
import pyspark
The `Getting started`_ instructions are a good start (no surprise there!) but they are somewhat incomplete and currently look a bit outdated (I plan to fix them soon). As a result, I struggled more than I felt necessary to build and run Mesos on a dev VM (Ubuntu 14.04 running under VirtualBox).
Some of the issues seem to arise from an unfortunate combination: the Mesos Master tries to guess its own IP address, the VM is (obviously) not DNS-resolvable and, eventually, the Slave and the Framework fail to communicate properly with the Master.
In the process of solving this, I ended up automating the installation of all the dependencies, as well as building and running the framework; I then broke it down into the following modules to make it easier to run only parts of the process.
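
One way to sidestep the address-guessing problem described above is to look up the VM's routable address yourself and hand it to the Master explicitly (for example via its ``--ip`` option) instead of relying on hostname resolution. The following is only a minimal sketch of that lookup, not part of the automation scripts: the function name ``routable_ip`` and the probe address are hypothetical choices made for illustration.

```python
import socket


def routable_ip(probe_host="192.0.2.1", probe_port=80):
    """Return the local IPv4 address the OS would use to reach probe_host.

    connect() on a UDP socket only selects a route; no packet is sent.
    probe_host is a hypothetical placeholder (a documentation-only address).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect((probe_host, probe_port))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (e.g. networking is down): fall back to loopback.
        return "127.0.0.1"
    finally:
        sock.close()


if __name__ == "__main__":
    print(routable_ip())
```

On a VirtualBox guest with several interfaces (NAT plus host-only, say), this returns the address of whichever interface carries the default route, which may not be the one the Slave and Framework can reach, so the result is worth checking before using it.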
package main
import (
	"database/sql"
	"gopkg.in/gorp.v1"
	"log"
	"strconv"
	"github.com/gin-gonic/gin"
	_ "github.com/go-sql-driver/mysql"
import numpy as np
from keras.layers import GRU, initializations, K
from collections import OrderedDict
class GRULN(GRU):
    '''Gated Recurrent Unit with Layer Normalization
    Current implementation only works with consume_less = 'gpu' which is already
    set.
    # Arguments