Bob Poekert bobpoekert

@bobpoekert
bobpoekert / ThreadLocalThing.java
Last active December 12, 2023 02:42
clojure thread local
import clojure.lang.IFn;
import clojure.lang.IDeref;
import java.lang.ThreadLocal;
public class ThreadLocalThing extends ThreadLocal implements IDeref {
    final Object sentinelValue;
    final IFn generator;

Bad Reasons to Start A Company

  1. You want people to like you / want to be in charge of people / think you're a "leader"
  2. You think you're smarter than everyone else
  3. You think you have some technical "secret sauce" that no one else has and that this has intrinsic value
  4. You think your PhD thesis is a product
  5. Mummy and Daddy are giving you a seed round because they want to get you out of the house
  6. You live in LA or New York and you're jealous of San Francisco
  7. You heard that $fad (Big Data/IoT/Adtech/Fintech/whatever) was big
#!/usr/bin/env python
# Number to guess: How large of a prime can we find,
# using a naive algorithm, in a second?
import sys
import itertools
import math
from libc.stdlib cimport malloc, free
cimport libc.stdio as stdio

Postgres Webskale(tm)

What would it take to fold the ad-hoc sharding that people do with Postgres into Postgres itself? Or, what would it take to make Postgres scale like Riak and Cassandra?

  • A shard routing server, kind of like mongos. This could be implemented as a foreign data wrapper that holds a connection pool and routes each query to shards based on the qualifiers the query places on the column being sharded on. It could get fancy by supporting scatter-gather aggregation and joins, but that probably isn't necessary: by the point you need to shard, doing aggregation and joins on the production database is already too dangerous (I assert). Updates to the hash ring happen over Paxos (or Zab or Raft or whatever).
  • A server responsible for anointing new users and assigning them to shards. This could happen at random, or it could be aware of geographic locality and things.
  • The multicast views described below
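
The routing step in the first bullet can be sketched roughly as follows. This is a minimal illustration of consistent hashing over a hash ring, not a foreign data wrapper: the `HashRing` class, the shard names, and the choice of SHA-256 with 64 virtual nodes per shard are all assumptions made for the example, and the consensus protocol that would distribute ring updates is left out.

```python
import bisect
import hashlib


def ring_point(key: str) -> int:
    """Map a string to a point on the ring (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical router: picks a shard from the value of the sharding column."""

    def __init__(self, shards, vnodes=64):
        # Each shard owns several virtual nodes so keys spread evenly.
        self._points = sorted(
            (ring_point(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._points]

    def shard_for(self, shard_key) -> str:
        # The first virtual node clockwise from the key's point owns the key.
        idx = bisect.bisect_right(self._keys, ring_point(str(shard_key)))
        return self._points[idx % len(self._points)][1]


# Shard names are made up for illustration.
ring = HashRing(["pg-shard-0", "pg-shard-1", "pg-shard-2"])
target = ring.shard_for(12345)  # e.g. the user_id in "WHERE user_id = 12345"
```

The point of the ring (as opposed to `hash(key) % n`) is that removing or adding a shard only moves the keys that shard owned, so a ring update does not reshuffle the whole cluster.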
bobpoekert / sha256.c
Last active August 29, 2015 14:16
NaCl's sha256 implementation pulled out into a standalone executable, buildable from a single C file
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h> /* so we don't have to deal with buffering and fread()-ing */
#include <sys/types.h>
#include <unistd.h>
#include <sys/stat.h>
#include <string.h>
/* http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/utopic/nacl/utopic/view/head:/crypto_hashblocks/sha256/inplace/blocks.c */
class partial(object):
    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args

    def __call__(self, *extra_args):
        return self.fn(*(self.args + extra_args))
bobpoekert / gist:f4613bde4fabae5b50bb
Last active August 29, 2015 14:05
Elasticsearch errors
  • Doing a query with a has_parent filter when a parent-child relation references a mapping that doesn't exist returns a NullPointerException (instead of a more informative error)
  • Adding a port number to a unicast host in elasticsearch.yml causes that node to receive invalid (i.e. unparseable) HTTP requests
  • Missing a newline in a bulk insert request caused subsequent queries on that index to return invalid JSON
  • Doing a delete-by-query that removed a significant number of documents from an index caused refresh requests on that index to return NullPointerExceptions
  • Shards moving between nodes for no apparent reason
  • Shards becoming unassigned for no apparent reason
  • Shards becoming unassigned even when all of the shards in the cluster had been routed manually and shard allocation had been disabled
  • Shards losing all of their documents if a write is performed while they're unavailable
(def nlp
  (let [props (new java.util.Properties)]
    (.setProperty props "annotators" "tokenize,ssplit,parse,sentiment")
    (new edu.stanford.nlp.pipeline.StanfordCoreNLP props)))

(defn find-sentiment
  "Determines the sentiment of each sentence in a given glob of text. Results in a collection of integers ranging from [0-4]: where 0 is 'Very negative', 2 is 'neutral', and 4 is 'Very positive'"
  [^String glob]
  (let [main-sentiment 0
        longest 0]
bobpoekert / gist:685bce859831eabfdc9e
Last active August 29, 2015 14:02
Transfer-Encoding: Chuxed

In HTTP/1.1, Transfer-Encoding: chunked allows servers to stream responses to clients in pieces without knowing the length in advance, which is useful in, for example, chat and live streaming applications.

HTTP/1.1 also allows request pipelining, which lets clients send multiple GET requests on one connection without waiting for each response, saving the latency overhead of opening a new TCP socket for each one.

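It's not currently possible to use both of these things together, because chunks in HTTP chunked encoding don't carry any information about which stream they came from, so a streamed response blocks every response queued behind it. Each chunk is just <length field>\r\n<data>\r\n, with the length written in hexadecimal.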
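
To make the framing concrete, here is a small sketch of standard chunked encoding and decoding. The function names are made up for this example, and the decoder ignores trailers and does no error handling; it is only meant to show that the length is the only field in a chunk header, so there is nowhere to say which response a chunk belongs to.

```python
def encode_chunk(data: bytes) -> bytes:
    """One chunk: hex length, CRLF, payload, CRLF."""
    return b"%x\r\n%s\r\n" % (len(data), data)


def encode_chunked(pieces) -> bytes:
    """Encode pieces as a chunked body, ending with the zero-length chunk."""
    body = b"".join(encode_chunk(p) for p in pieces if p)
    return body + b"0\r\n\r\n"


def decode_chunked(body: bytes) -> list:
    """Split a chunked body back into its payloads (trailers are ignored)."""
    pieces = []
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        # Anything after ';' on the size line is a chunk extension; skip it.
        size = int(body[pos:eol].split(b";", 1)[0], 16)
        if size == 0:
            return pieces
        start = eol + 2
        pieces.append(body[start:start + size])
        pos = start + size + 2  # skip the payload and its trailing CRLF


wire = encode_chunked([b"hello", b" world"])
```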

But if a new transfer-encoding were added that had stream ids, you could do multiple requests in parallel even if the responses were streamed. Let's call this encoding Transfer-Encoding: chuxed (for "chunked, multiplexed").

There are two differences between chuxed and chunked:

Data structures:

Hash table mapping tokens -> <document count, count-min-sketch(document id -> term count)>
Hash table mapping sketch indexes -> heap(<document id, term count dictionary> sorted by document id)

To search:

  1. sum sketches for all terms in the query
  2. find indexes of top k values in result sketch
  3. look up actual document ids and term counts for those indexes
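
The three search steps above might look something like this in miniature. This is one possible reading of the notes, not a reference implementation: the `Index` class, the sketch dimensions, the SHA-256 cell hashing, and the final exact re-scoring are all assumptions, and a plain dict stands in for the heap sorted by document id.

```python
import hashlib
from collections import Counter

WIDTH, DEPTH = 64, 2  # sketch dimensions, chosen arbitrarily for the example


def cells(doc_id):
    """The (row, column) cells a document id hashes to, one per sketch row."""
    out = []
    for row in range(DEPTH):
        digest = hashlib.sha256(f"{row}:{doc_id}".encode()).digest()
        out.append((row, int.from_bytes(digest[:4], "big") % WIDTH))
    return out


def empty_sketch():
    return [[0] * WIDTH for _ in range(DEPTH)]


class Index:
    def __init__(self):
        self.terms = {}      # token -> [document count, sketch]
        self.cell_docs = {}  # sketch cell -> {doc id: {token: count}}

    def add(self, doc_id, tokens):
        counts = Counter(tokens)
        for token, n in counts.items():
            entry = self.terms.setdefault(token, [0, empty_sketch()])
            entry[0] += 1
            for row, col in cells(doc_id):
                entry[1][row][col] += n
        for cell in cells(doc_id):
            self.cell_docs.setdefault(cell, {})[doc_id] = dict(counts)

    def search(self, query, k=3):
        # 1. Sum the sketches of every query term, cell by cell.
        total = empty_sketch()
        for token in query:
            if token in self.terms:
                sketch = self.terms[token][1]
                for r in range(DEPTH):
                    for c in range(WIDTH):
                        total[r][c] += sketch[r][c]
        # 2. Find the cells holding the k largest summed counts.
        ranked = sorted(
            ((total[r][c], (r, c)) for r in range(DEPTH) for c in range(WIDTH)),
            reverse=True,
        )
        top_cells = [cell for value, cell in ranked[:k] if value > 0]
        # 3. Look up the real documents behind those cells and score them
        #    exactly, which discards documents that only collided in the sketch.
        scores = {}
        for cell in top_cells:
            for doc_id, counts in self.cell_docs.get(cell, {}).items():
                scores[doc_id] = sum(counts.get(t, 0) for t in query)
        hits = [(d, s) for d, s in scores.items() if s > 0]
        return sorted(hits, key=lambda hit: (-hit[1], hit[0]))[:k]


index = Index()
index.add(1, "cat cat dog".split())
index.add(2, "cat fish".split())
index.add(3, "dog dog".split())
results = index.search(["cat"])  # list of (document id, exact score) pairs
```

Because sketch cells can collide, step 2 may surface cells that belong to unrelated documents; the exact term counts stored per cell are what let step 3 filter those out.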