
Bob Poekert bobpoekert

bobpoekert / ThreadLocalThing.java
Last active Apr 10, 2017
clojure thread local
import clojure.lang.IFn;
import clojure.lang.IDeref;
import java.lang.ThreadLocal;
public class ThreadLocalThing extends ThreadLocal implements IDeref {
    final Object sentinelValue;
    final IFn generator;
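The Java preview stops after declaring a `sentinelValue` and a `generator`. One plausible reading, and it is only an assumption since the rest of the class isn't shown, is that the sentinel marks "nothing built yet on this thread" and `deref` calls the generator lazily the first time each thread asks. A minimal Python analogue of that idea using `threading.local` (the class name is reused for illustration only; this is not the gist's code):

```python
import threading

# Marks "no value generated yet on this thread" (stands in for sentinelValue).
_SENTINEL = object()


class ThreadLocalThing:
    """Lazily builds one value per thread by calling `generator` (assumed semantics)."""

    def __init__(self, generator):
        self.generator = generator
        self._local = threading.local()  # each thread sees its own attributes

    def deref(self):
        value = getattr(self._local, "value", _SENTINEL)
        if value is _SENTINEL:
            # First access on this thread: build the value and cache it.
            value = self.generator()
            self._local.value = value
        return value
```

Each thread gets its own cached value, so an expensive, non-thread-safe object (a parser, a buffer) is built at most once per thread.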
View bad_reasons_to_start_a_company.md

Bad Reasons to Start A Company

  1. You want people to like you / want to be in charge of people / think you're a "leader"
  2. You think you're smarter than everyone else
  3. You think you have some technical "secret sauce" that no one else has and that this has intrinsic value
  4. You think your PhD thesis is a product
  5. Mummy and Daddy are giving you a seed round because they want to get you out of the house
  6. You live in LA or New York and you're jealous of San Francisco
  7. You heard that $fad (Big Data/IoT/Adtech/Fintech/whatever) was big
View sieve.pyx
#!/usr/bin/env python
# Number to guess: How large of a prime can we find,
# using a naive algorithm, in a second?
import sys
import itertools
import math
from libc.stdlib cimport malloc, free
cimport libc.stdio as stdio
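The Cython preview is cut off before any algorithm appears, so the following is just one way to run the same experiment, not the gist's method. The file name and the `malloc` import hint at an array-based sieve, so this sketch uses a plain-Python Sieve of Eratosthenes and keeps doubling the limit until a sieve finishes after the time budget. `sieve`, `largest_prime_within`, and the starting limit of 1024 are illustrative choices.

```python
import math
import time


def sieve(limit):
    """Sieve of Eratosthenes for limit >= 2: flags[i] == 1 iff i is prime."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            # Clear every multiple of p starting at p*p.
            flags[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return flags


def largest_prime_within(seconds=1.0, start=1024):
    """Double the sieve limit until a sieve finishes past the deadline."""
    deadline = time.perf_counter() + seconds
    limit, best = start, 2
    while True:
        flags = sieve(limit)
        if time.perf_counter() > deadline:
            break  # this sieve overran the budget, so its result is discarded
        best = flags.rindex(1)  # largest index still marked prime
        limit *= 2
    return best


if __name__ == "__main__":
    print(largest_prime_within(1.0))
```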
View postgres_webskale.md

Postgres Webskale(tm)

What would it take to fold the ad-hoc sharding that people do with Postgres into Postgres itself? Or, what would it take to make Postgres scale like Riak and Cassandra?

  • A shard routing server, kind of like mongos. This could be implemented as a foreign data wrapper that holds a connection pool and routes each query to shards based on its qualifiers on the column being sharded on. It could get fancy by supporting scatter-gather aggregation and joins, but that probably isn't necessary, because by the time you need to shard, doing aggregation and joins on the production database is already too dangerous (I assert). Updates to the hash ring happen over Paxos (or ZAB or Raft or whatever).
  • A server responsible for anointing new users and assigning them to shards. This could happen at random, or it could take geographic locality and similar factors into account.
  • The multicast views described below
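The routing bullet can be sketched in Python to show how qualifiers on the shard column pick shards off a hash ring. This is a hypothetical model, not a foreign data wrapper: it assumes a consistent-hash ring with virtual nodes (MD5-based, 64 points per shard) and represents a query's qualifiers as a dict of equality predicates. `HashRing` and `route` are made-up names, and ring updates over Paxos/Raft are not modelled.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping shard-key values to shard names."""

    def __init__(self, shards, vnodes=64):
        # Each shard owns `vnodes` points on the ring to even out the load.
        points = sorted(
            (self._hash(f"{shard}#{i}"), shard) for shard in shards for i in range(vnodes)
        )
        self._hashes = [h for h, _ in points]
        self._owners = [shard for _, shard in points]
        self.shards = set(shards)

    @staticmethod
    def _hash(value):
        # MD5 is used only to spread keys around the ring, not for security.
        return int.from_bytes(hashlib.md5(str(value).encode()).digest()[:8], "big")

    def shard_for(self, key):
        # First ring point at or after the key's hash, wrapping past the end.
        i = bisect.bisect_left(self._hashes, self._hash(key)) % len(self._hashes)
        return self._owners[i]


def route(ring, shard_column, qualifiers):
    """Return the shards a query must visit, given its equality qualifiers."""
    if shard_column in qualifiers:
        return {ring.shard_for(qualifiers[shard_column])}  # single-shard query
    return set(ring.shards)  # no shard-key qualifier: scatter to every shard
```

Because existing shards keep their ring points when a shard is added, a key either stays where it was or moves to the new shard, which keeps rebalancing cheap.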
bobpoekert / sha256.c
Last active Aug 29, 2015
NaCl's sha256 implementation pulled out into a standalone executable, buildable from a single C file
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h> /* so we don't have to deal with buffering and fread()-ing */
#include <sys/types.h>
#include <unistd.h>
#include <sys/stat.h>
#include <string.h>
/* http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/utopic/nacl/utopic/view/head:/crypto_hashblocks/sha256/inplace/blocks.c */
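The C preview only shows its includes, but the `mmap` comment states the idea: map the input file instead of buffering it through `fread()`. A rough Python equivalent of that approach, using the standard library's `hashlib` rather than NaCl's code, with a hypothetical `sha256_file` helper and `sha256sum`-style output:

```python
import hashlib
import mmap
import os
import sys


def sha256_file(path):
    """Hash a file's contents by memory-mapping it instead of reading in chunks."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return hashlib.sha256(b"").hexdigest()  # mmap refuses empty files
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return hashlib.sha256(mapped).hexdigest()


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"{sha256_file(path)}  {path}")
```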
View partial.py
class partial(object):
    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args

    def __call__(self, *extra_args):
        return self.fn(*(self.args + extra_args))
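The class above only binds positional arguments. For comparison, the standard library's `functools.partial` does the same thing and also forwards keyword arguments; `greet` here is just an illustrative function.

```python
from functools import partial as std_partial


def greet(greeting, name, punctuation="!"):
    return f"{greeting}, {name}{punctuation}"


hello = std_partial(greet, "Hello")
print(hello("Bob"))                   # Hello, Bob!
print(hello("Bob", punctuation="?"))  # Hello, Bob?
```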
View gist:f4613bde4fabae5b50bb
  • Doing a query with a has_parent filter when a parent-child relation references a mapping that doesn't exist returns a NullPointerException (instead of a more informative error)
  • Adding a port number to a unicast host in elasticsearch.yml causes that node to receive invalid (i.e. unparseable) HTTP requests
  • Missing a newline in a bulk insert request caused subsequent queries on that index to return invalid JSON
  • A delete-by-query that removed a significant number of documents from an index caused refresh requests on that index to return NullPointerExceptions
  • Shards moving between nodes for no apparent reason
  • Shards becoming unassigned for no apparent reason
  • Shards becoming unassigned even when all of the shards in the cluster had been routed manually and shard allocation had been disabled
  • Shards losing all of their documents if a write is performed while the shard is unavailable
View sentiments.clj
(def nlp
  (let [props (new java.util.Properties)]
    (.setProperty props "annotators" "tokenize,ssplit,parse,sentiment")
    (new edu.stanford.nlp.pipeline.StanfordCoreNLP props)))

(defn find-sentiment
  "Determines the sentiment of each sentence in a given glob of text. Returns a collection of integers from 0 to 4, where 0 is 'Very negative', 2 is 'Neutral', and 4 is 'Very positive'."
  [^String glob]
  (let [main-sentiment 0
        longest 0]
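The Clojure preview stops right after binding `main-sentiment` and `longest`. The docstring describes a per-sentence collection, but those two names suggest (an assumption, since the rest isn't shown) the common CoreNLP pattern of reporting the score of the longest sentence as the overall sentiment. A Python sketch of just that selection step, with per-sentence scores passed in as hypothetical `(sentence, score)` pairs instead of coming from CoreNLP; the labels follow CoreNLP's five-class scale:

```python
# CoreNLP's five sentiment classes, indexed by score.
SENTIMENT_LABELS = {
    0: "Very negative",
    1: "Negative",
    2: "Neutral",
    3: "Positive",
    4: "Very positive",
}


def main_sentiment(scored_sentences):
    """Return the score of the longest sentence (0 if there are none)."""
    main, longest = 0, 0  # mirrors the `main-sentiment` / `longest` bindings
    for sentence, score in scored_sentences:
        if len(sentence) > longest:
            main, longest = score, len(sentence)
    return main
```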
bobpoekert / gist:685bce859831eabfdc9e
Last active Aug 29, 2015
Transfer-Encoding: Chuxed

In HTTP/1.1, Transfer-Encoding: chunked allows servers to stream responses to clients in pieces without knowing the length in advance, which is useful in, for example, chat and live streaming applications.

HTTP/1.1 also allows request pipelining, which lets a client send several GET requests back to back over one connection without waiting for each response, saving the latency overhead of opening a new TCP connection for each one.

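It's not currently possible to use both of these things together, because chunks in HTTP chunked encoding don't carry any information about which stream they belong to: each chunk is just <length in hex>\r\n<data>\r\n. Since pipelined responses must come back in request order, a response that is still streaming blocks every response queued behind it.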
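For reference, here is a minimal Python sketch of ordinary chunked framing (hex size, CRLF, data, CRLF, terminated by a zero-size chunk). It ignores chunk extensions and assumes no trailers; `encode_chunked` and `decode_chunked` are illustrative names. Nothing in the framing identifies a stream, which is the gap described above.

```python
def encode_chunked(pieces):
    """Frame byte strings as an HTTP/1.1 chunked body."""
    out = bytearray()
    for piece in pieces:
        if piece:  # a zero-size chunk would end the body early
            out += f"{len(piece):X}\r\n".encode() + piece + b"\r\n"
    out += b"0\r\n\r\n"  # last chunk, then an empty trailer section
    return bytes(out)


def decode_chunked(body):
    """Split a chunked body back into its pieces (no trailers expected)."""
    pieces, pos = [], 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size = int(body[pos:line_end].split(b";")[0], 16)  # drop any extension
        pos = line_end + 2
        if size == 0:
            return pieces
        pieces.append(body[pos:pos + size])
        pos += size + 2  # skip the data and its trailing CRLF
```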

But if a new transfer-encoding were added that had stream IDs, you could do multiple requests in parallel even if the responses were streamed. Let's call this encoding Transfer-Encoding: chuxed (for "chunked, multiplexed").

There are two differences between chuxed and chunked:

View gist:8049579

Data structures:

Hash table mapping tokens -> <document-count, count-min-sketch(document id -> term count)>
Hash table mapping sketch indexes -> heap(<document id, term count dictionary> sorted by document id)

To search:

  1. sum sketches for all terms in the query
  2. find indexes of top k values in result sketch
  3. look up actual document ids and term counts for those indexes
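The three search steps can be sketched in Python against the two tables described above. This is a toy model under stated assumptions: sketches have a hypothetical `DEPTH` of 3 rows and `WIDTH` of 64 counters, a sketch "index" is taken to be a `(row, column)` cell, documents are tokenized by whitespace, and each document id is added once. `SketchIndex` and `cells_for` are made-up names.

```python
import hashlib
import heapq
from collections import Counter, defaultdict

WIDTH = 64  # hypothetical number of counters per sketch row
DEPTH = 3   # hypothetical number of hash rows per sketch


def cells_for(doc_id):
    """Return the (row, column) cell that doc_id hashes to in each sketch row."""
    cells = []
    for row in range(DEPTH):
        digest = hashlib.sha256(f"{row}:{doc_id}".encode()).digest()
        cells.append((row, int.from_bytes(digest[:8], "big") % WIDTH))
    return cells


class SketchIndex:
    def __init__(self):
        # token -> [document count, count-min sketch of doc id -> term count]
        self.tokens = {}
        # sketch cell -> heap of (doc id, term-count dict), ordered by doc id
        self.cells = defaultdict(list)

    def add_document(self, doc_id, text):
        counts = Counter(text.lower().split())
        for token, count in counts.items():
            if token not in self.tokens:
                self.tokens[token] = [0, [[0] * WIDTH for _ in range(DEPTH)]]
            entry = self.tokens[token]
            entry[0] += 1
            for row, col in cells_for(doc_id):
                entry[1][row][col] += count
        for cell in cells_for(doc_id):
            heapq.heappush(self.cells[cell], (doc_id, dict(counts)))

    def search(self, query, k):
        terms = query.lower().split()
        # 1. Sum the sketches of all query terms, cell by cell.
        total = [[0] * WIDTH for _ in range(DEPTH)]
        for term in terms:
            if term in self.tokens:
                sketch = self.tokens[term][1]
                for row in range(DEPTH):
                    for col in range(WIDTH):
                        total[row][col] += sketch[row][col]
        # 2. Find the k cells with the largest summed values.
        all_cells = [(row, col) for row in range(DEPTH) for col in range(WIDTH)]
        top = heapq.nlargest(k, all_cells, key=lambda c: total[c[0]][c[1]])
        # 3. Look up the real documents in those cells and score them exactly.
        scores = {}
        for row, col in top:
            if total[row][col] == 0:
                continue
            for doc_id, counts in self.cells[(row, col)]:
                scores[doc_id] = sum(counts.get(t, 0) for t in terms)
        return sorted(((s, d) for d, s in scores.items() if s > 0), reverse=True)
```

Step 3 rescores candidates with their exact term counts, so hash collisions in the sketch can pull in extra candidates but never inflate a returned score.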