- Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
- Models and Issues in Data Stream Systems
- Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that revisits some of the historical background and how it all began with Flajolet.
- Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
- [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
/* Python(ish) string formatting:
 * >>> format('{0}', ['zzz'])
 * "zzz"
 * >>> format('{x}', {x: 1})
 * "1"
 */
var format = (function() {
  var re = /\{([^}]+)\}/g;
  return function(s, args) {
    return s.replace(re, function(_, match){ return args[match]; });
location /resize {
    alias /tmp/nginx/resize;
    set $width 150;
    set $height 100;
    set $dimens "";
    if ($uri ~* "^/resize_(\d+)x(\d+)/(.*)" ) {
        set $width $1;
        set $height $2;
        set $image_path $3;
def read_only_processor(request):
    if read_only_flag():
        if request.method in ['GET', 'HEAD']:
            return {
                'read_only': True,
                'error_message': 'Sorry, this site is read only right now!'
            }
        else:
            #return a "Read only" response
    else:
/*
A goroutine-safe pattern using channels that abstracts away the channels
This is a concept of creating a goroutine-safe object that uses channels under the covers to communicate
with the internal map[string]string structure. I know that typically this kind of solution may be done
with mutexes, but the exercise was in using channels on purpose, although they are heavier.
Note a couple of points:
- When using channels, you can still build a public-facing API that nicely abstracts them away, therefore
On July 22, GitHub announced the 3rd Annual GitHub Data Challenge, presenting multiple sources of available data.
This sounded to me like a good opportunity to take that data and import it into Neo4j, and to have a lot of fun analyzing data that fits naturally in a graph.
As I work mainly offline or behind military proxies that do not permit me to use the REST API, I decided to go for the GitHub Archive available here, from which you can download JSON files representing GitHub events on a daily or hourly basis.
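Before importing anything into a graph, it can help to look at what one downloaded archive file contains. The following is only a minimal sketch, not the import process itself. It assumes each archive file is gzip-compressed with one JSON event per line, and that each event carries a `type` field. The function name `count_event_types` is hypothetical, chosen for illustration.

```python
import gzip
import json
from collections import Counter


def count_event_types(path):
    """Count events by their "type" field in one gzipped archive file.

    Assumption: the file holds one JSON object per line (newline-delimited
    JSON) and is gzip-compressed. Events without a "type" field are
    counted under "unknown".
    """
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            event = json.loads(line)
            counts[event.get("type", "unknown")] += 1
    return counts
```

Calling `count_event_types` on a locally downloaded file returns a `Counter` keyed by event type, which gives a rough idea of which node and relationship types a graph model would need to cover.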
I've been looking for the best Linux backup system, and also reading lots of HN comments.
Instead of listing the pros and cons of every backup system, I'll just list the deal-breakers that would disqualify each one.
I would also like you, the HN community, to add more deal-breakers for these or other backup systems if you know of any. Likewise, if you have data that disproves some of the deal-breakers listed here (benchmarks, or information that something was true for older releases but is fixed in newer ones), please share it so that I can edit this list accordingly.
- It has a lot of management overhead, and that's a problem if you don't have time for a full-time backup administrator.