- Probabilistic Data Structures for Web Analytics and Data Mining : An overview of probabilistic data structures and how they are used to implement approximation algorithms.
- Models and Issues in Data Stream Systems
- Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that revisits some of the historical perspectives and how it all began with Flajolet.
- Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
- [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
#include <Python.h>
#include <numpy/arrayobject.h>
#include "chi2.h"
/* Docstrings */
static char module_docstring[] =
    "This module provides an interface for calculating chi-squared using C.";
static char chi2_docstring[] =
    "Calculate the chi-squared of some data given a model.";
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
import envoy
import os
import datetime
class NotebookConverterHandler(FileSystemEventHandler):
    def on_modified(self, event):
        if event.src_path.endswith('.ipynb'):
"""
Usage: python remove_output.py notebook.ipynb [ > without_output.ipynb ]
Modified from remove_output by Minrk
"""
import sys
import io
import os
from IPython.nbformat.current import read, write
#!/bin/bash
# generate new personal ed25519 ssh keys
ssh-keygen -o -a 100 -t ed25519 -f ~/.ssh/id_ed25519 -C "rob thijssen <rthijssen@gmail.com>"
ssh-keygen -o -a 100 -t ed25519 -f ~/.ssh/id_robtn -C "rob thijssen <rob@rob.tn>"
# generate new host cert authority (host_ca) ed25519 ssh key
# used for signing host keys and creating host certs
ssh-keygen -t ed25519 -f manta_host_ca -C manta.network
# delete local tag '12345'
git tag -d 12345
# delete remote tag '12345' (eg, GitHub version too)
git push origin :refs/tags/12345
# alternative approach
git push --delete origin tagName
git tag -d tagName
#! /usr/bin/python
import sys
import ldap
from ldap.controls import SimplePagedResultsControl
from distutils.version import LooseVersion
# Check if we're using the Python "ldap" 2.4 or greater API
LDAP24API = LooseVersion(ldap.__version__) >= LooseVersion('2.4')
Here's a little walkthrough of how Yannick and I use feature branches and pull requests to develop new features and add them to the project. Below are the steps I take when working on a new feature. Hopefully this, along with watching the process on GitHub, will serve as a starting point for everyone to use a similar workflow.
Questions, comments, and suggestions for improvements welcome!
When starting a new feature, I make sure to start with the latest and greatest codebase:
git checkout master
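The starting step above could be scripted roughly as follows. This is only a sketch under assumptions: the helper names `start_feature_commands` and `run_commands`, the remote name `origin`, and the example branch name are hypothetical, and the pull and branch-creation commands are a common way to continue from `git checkout master`, not steps confirmed by the walkthrough.

```python
import subprocess
from typing import List


def start_feature_commands(branch: str,
                           remote: str = "origin",
                           base: str = "master") -> List[List[str]]:
    """Build (but do not run) the git commands that start a feature branch.

    Assumption: the feature branch is cut from an up-to-date `base` branch
    that tracks `remote`. All names here are illustrative.
    """
    if not branch or branch.startswith("-"):
        raise ValueError("branch must be a non-empty name not starting with '-'")
    return [
        ["git", "checkout", base],          # switch to the base branch
        ["git", "pull", remote, base],      # bring it up to date
        ["git", "checkout", "-b", branch],  # create and switch to the feature branch
    ]


def run_commands(commands: List[List[str]], cwd: str = ".") -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling `run_commands(start_feature_commands("my-feature"))` inside a clone would execute the three commands in order. Building the argument lists separately from running them makes it easy to print or review the commands first.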
A deploy key is an SSH key registered on your repo that grants a client read-only (or read/write, if you choose) access to that repo.
As the name suggests, its primary use is in the deploy process, in place of a username and password, where only read access is needed. This keeps the repo safe from attack if the server is compromised.
- Generate an SSH key
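
The key-generation step could be sketched as below. This is a minimal illustration, not the original author's procedure: the helper names `deploy_keygen_command` and `public_key_path`, the ed25519 key type, the empty passphrase, and the example file name are all assumptions chosen for the example. It relies only on standard `ssh-keygen` options (`-t`, `-N`, `-C`, `-f`).

```python
import subprocess
from typing import List


def deploy_keygen_command(key_path: str, comment: str = "deploy-key") -> List[str]:
    """Build the ssh-keygen command for a dedicated deploy key.

    Assumptions: an ed25519 key with an empty passphrase (-N ""), so that an
    unattended deploy process can use it. Adjust to your own policy.
    """
    if not key_path:
        raise ValueError("key_path must not be empty")
    return ["ssh-keygen", "-t", "ed25519", "-N", "", "-C", comment, "-f", key_path]


def public_key_path(key_path: str) -> str:
    """ssh-keygen writes the public half next to the private key as <file>.pub."""
    return key_path + ".pub"


def generate_deploy_key(key_path: str, comment: str = "deploy-key") -> str:
    """Run ssh-keygen and return the path of the public key to register."""
    subprocess.run(deploy_keygen_command(key_path, comment), check=True)
    return public_key_path(key_path)
```

For example, `generate_deploy_key("deploy_key_example")` would create the key pair and return `deploy_key_example.pub`. The contents of that public file are what gets registered as the deploy key, while the private file stays on the deploying server.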
#!/usr/bin/env python
#-*- coding: utf-8 -*-
# RescueTime Data Export
import os
import requests
from datetime import date, datetime
from dateutil.rrule import rrule, MONTHLY
from dateutil.parser import parse