@mythmon
Created December 22, 2015 00:53
future ideas
* rule previewer
* I am a user in CANADA. I am the FOURTH user. What do I get?
Notes from meeting
* no user should get the same question twice
* this could be done on the client
* or we could just never send the payload to the same user twice, providing a set collection primitive.
* we need to record how many users take each action from the responses
* I don't know if this is our part or something in heartbeat/input
* slow ramp up is very important
* I think customizable small bundles will be a good way forward instead of giant mega bundles.
* opt out
* Can we do percentages instead of counts? I think that makes CDN stuff easier.
* privacy is super important
* "intelligently tune"
* there is a value exchange here. there is a tradeoff of magic vs creepiness.
Will told me
* heartbeat uses counts to make sure not to "use up" the population, since
we only get O(number of users) per year.
* Unclear why self-repair needs this.
* ideatown is like gmail labs - opt in for weirdness to get more data.
* To learn more about counts for self-repair, talk to:
* Matt Grimes
* Gregg Lind
* (Michael Verdi)
Talked to Matt Grimes about counts
* Mostly for getting statistical significance
* Use percentage to control rate at which the limit is hit
* Close enough is fine, as long as the number is known.
* Probably err on the side of overshooting
* Also want to be able to guess how long it will take to get to the limit
* Probably guesses based on historic traffic and percentage of users selected
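A rough sketch of that guess in Python. It assumes steady traffic and that every hit is a distinct user, so the result is an optimistic lower bound. The function name and the example numbers are made up for illustration, not from the conversation.

```python
def days_to_limit(limit, hits_per_day, sample_rate):
    """Estimate how many days until `limit` users have been selected.

    Assumes steady traffic and that every hit is a new user, so the
    real time to hit the limit will usually be longer.
    """
    if hits_per_day <= 0 or not 0 < sample_rate <= 1:
        raise ValueError("need positive traffic and a rate in (0, 1]")
    return limit / (hits_per_day * sample_rate)


# Hypothetical numbers: 10,000 users wanted, 1M hits/day, 5% sampled.
print(days_to_limit(10_000, 1_000_000, 0.05))
```

Raising the percentage shortens the wait proportionally, which is the "use percentage to control rate" knob from the notes.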
Architecture
* Name: Normandy
* Architecture (hard mode)
* Editor
* Django
* Writes to database to share with other components
* Server (need better name for this component)
* Maybe Django?
* If it is fast enough
* if not, Rust? :D
* Read only (?)
* Loads everything it needs into memory on boot, serves from that
* Reload data by restarting process (or maybe SIGHUP?)
* Has to handle ~300 million hits / day = ~3500/sec minimum
* Probably target peak of ~double that (7000/sec)
  * 10 ms/request = 70 server processes
    * 4 procs/box = ~18 boxes
    * 16 procs/box = ~5 boxes
    * Can probably do this in one DC.
  * 50 ms/request = 350 server processes
    * 4 procs/box = 88 boxes
    * 16 procs/box = 22 boxes
    * This needs multi DC I bet.
* Kitsune is at about
  * 70ms/req including databases.
  * 30ms/req for just python.
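The box counts above can be reproduced with a few lines of Python. This is only the back-of-envelope math, assuming each server process handles one request at a time and ignoring headroom; the function name is made up.

```python
import math


def capacity(requests_per_sec, ms_per_request, procs_per_box):
    """Return (processes, boxes) if each process serves one request at a time."""
    processes = math.ceil(requests_per_sec * ms_per_request / 1000)
    boxes = math.ceil(processes / procs_per_box)
    return processes, boxes


PEAK = 7000  # target peak requests/sec from the notes

for ms in (10, 50):
    for procs in (4, 16):
        processes, boxes = capacity(PEAK, ms, procs)
        print(f"{ms} ms/request, {procs} procs/box: "
              f"{processes} processes, {boxes} boxes")
```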
* Features
* Make a bundle of code
* Serve it to users following certain rules
* Rules are things like
* date ranges
* a certain count of users
* a particular percentage of users
* particular countries
* "Deploy additional payload P to (compositions of other rules)"
* Rules need to be eventually able to be generated by non-programmers
* v2 goal of a UI to build them
* It would be awesome if they are static and not turing complete.
* Scaling considerations
* Reading bundles has to be cheap (and so fast)
* But the admin can be "slow", and it can take a while to take effect
* Idea:
* Rules and bundles
* Single data center writable service for the editor
* Slow sync to cluster management system
* Reboot immutable server processes to win
* Store in github?
* This gets us code review for free
* Version control of code
* Means interacting with the github api. that is probably fine
* the bundles will probably be in git anyways for someone
* Store config rules there too?
* How does this interact with immutable deploy/data store
* Series of items
* Each item defines some rules and some bundles
* If rules match, send all bundles.
* Stop processing after one.
* Bundles defined separately, referenced from items/rules.
* This way bundles can be de-duped and used in many rules.
* Counter
* A second service
* this is the Geo distributed bit
* This needs to be fast to read, but can be slow to write
* Paxos? Raft? Idk, those things.
* Do we need unique counters? or just counters?
* just counters: easy, just count in the data store
* sysadmins tell me just use RDS, probably
* uniques:
* how high does this need to count? how many uniques?
* how many counters will there be? what dimensionality?
* Can we expire them eventually?
* scale? number?
* easy: hashmap?
* How is this synced?
* medium: bloom filters
* Sync a couple "numbers" - count and bloom bitmap.
* Are growing bloom filters a pain?
* hard: hyperloglog
* store a bunch of numbers, but state storage is harder.
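A minimal sketch of the "medium" option, to make the "count and bloom bitmap" idea concrete. Everything here (class name, bit size, number of hashes, using sha256) is an assumption for illustration. False positives mean it can undercount uniques, but it never counts the same user twice.

```python
import hashlib


class BloomCounter:
    """Approximate unique counter: a Bloom bitmap plus a running count."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bitmap = 0  # a Python int used as a bit set
        self.count = 0

    def _mask(self, user_id):
        # Derive k bit positions by salting one hash function with an index.
        mask = 0
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{user_id}".encode()).digest()
            mask |= 1 << (int.from_bytes(digest[:8], "big") % self.num_bits)
        return mask

    def add(self, user_id):
        """Record a user. Returns True if the user looked new."""
        mask = self._mask(user_id)
        if (self.bitmap & mask) == mask:
            return False  # probably seen before (or a false positive)
        self.bitmap |= mask
        self.count += 1
        return True


counter = BloomCounter()
for uid in ["alice", "bob", "alice"]:
    counter.add(uid)
print(counter.count)
```

The state to sync is exactly the two "numbers": `count` and `bitmap`. Bitmaps from different nodes can be merged with a bitwise OR, but the counts cannot simply be added, since the same user may have been seen on both nodes.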
german translations:
  predicates:
    {type: language, match: de}
  bundles:
    translations/de.js
spanish translations:
  predicates:
    {type: language, match: es}
  bundles:
    translations/es.js
experiment 1:
  predicates:
    {type: daterange, start: 2015-12-11T09:55:36Z, end: 2015-12-18T09:55:36Z}
    {type: language, match: en}
    {type: country, match: canada}
    {type: sample, rate: 0.05}
    {type: count, limit: 10000}
  bundles:
    experiments/exp1.js
default:
  predicates:
  bundles:
    default.js
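A sketch of how a server might evaluate items like these, in Python. The dict layout, function names, and the hash-based sampling are all assumptions, not a decided design. It collects bundles from every matching item, which is what the bundle combinations listed further down imply; the "stop processing after one" variant would return at the first match instead.

```python
import hashlib
from datetime import datetime, timezone


def sample_bucket(user_id):
    """Map a user id to a stable float in [0, 1) so sampling is repeatable."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def matches(pred, user, now, counts, item_name):
    """Evaluate one predicate dict against a user."""
    kind = pred["type"]
    if kind == "language":
        return user["language"] == pred["match"]
    if kind == "country":
        return user["country"] == pred["match"]
    if kind == "daterange":
        return pred["start"] <= now <= pred["end"]
    if kind == "sample":
        return sample_bucket(user["id"]) < pred["rate"]
    if kind == "count":
        # `counts` stands in for the separate counter service.
        return counts.get(item_name, 0) < pred["limit"]
    raise ValueError(f"unknown predicate type: {kind}")


def select_bundles(items, user, now, counts):
    """Return the bundles of every item whose predicates all match."""
    bundles = []
    for item in items:
        if all(matches(p, user, now, counts, item["name"])
               for p in item["predicates"]):
            bundles.extend(item["bundles"])
    return bundles


ITEMS = [
    {"name": "german translations",
     "predicates": [{"type": "language", "match": "de"}],
     "bundles": ["translations/de.js"]},
    {"name": "experiment 1",
     "predicates": [
         {"type": "daterange",
          "start": datetime(2015, 12, 11, 9, 55, 36, tzinfo=timezone.utc),
          "end": datetime(2015, 12, 18, 9, 55, 36, tzinfo=timezone.utc)},
         {"type": "language", "match": "en"},
         {"type": "country", "match": "canada"},
         {"type": "sample", "rate": 0.05},
         {"type": "count", "limit": 10000}],
     "bundles": ["experiments/exp1.js"]},
    {"name": "default", "predicates": [], "bundles": ["default.js"]},
]

user = {"id": "user-4", "language": "de", "country": "canada"}
now = datetime(2015, 12, 12, tzinfo=timezone.utc)
print(select_bundles(ITEMS, user, now, counts={}))
```

Because the predicates are plain data with a fixed set of types, the rules stay static and not Turing complete, which is what a later rule-building UI would need.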
=====================
16 possible bundles in naive case (2^4, one per subset of the 4 items above)
O(2^n)
n=20, == ~1 million
6 observed bundles
n = l + e (l language options, e experiments)
O(l*2^e)
l=5, e=15, == ~160K
l=5, e=10, == ~5000
- translations/de.js, experiments/exp1.js, default.js
- translations/es.js, experiments/exp1.js, default.js
- experiments/exp1.js, default.js
- translations/de.js, default.js
- translations/es.js, default.js
- default.js
3 experiment only bundles
- german, mega (+bitmask)
- spanish, mega (+bitmask)
- mega (+bitmask)
l=5, e=15, == 5
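The counts above can be checked by enumerating the payloads. This Python sketch assumes a user gets at most one translation plus any subset of experiments plus the default bundle; the function name is made up. Here the language options are the listed languages plus "no translation", so two languages give l = 3.

```python
from itertools import combinations


def observed_bundles(languages, experiments, default="default.js"):
    """Enumerate distinct payloads: at most one language, any subset of experiments."""
    payloads = []
    for lang in [*languages, None]:  # None means "no translation"
        for size in range(len(experiments) + 1):
            for exps in combinations(experiments, size):
                parts = ([lang] if lang else []) + list(exps) + [default]
                payloads.append(parts)
    return payloads


payloads = observed_bundles(
    ["translations/de.js", "translations/es.js"],
    ["experiments/exp1.js"],
)
for p in payloads:
    print(", ".join(p))

print(len(payloads))  # l * 2^e with l=3, e=1
print(5 * 2**15)      # l=5, e=15 without a mega bundle
```

With a single mega bundle plus a bitmask for the experiments, the 2^e factor goes away and only the l language variants need to be built.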
============================
Selena says:
* Postgres might not be able to handle this load for counters
* Or rather, it might have bad effects on the users
* Postgres works by lock, update, write.
* No CAS-operations
* Locks could delay users, hurts throughput.
* But Postgres has hll, which might work
* Still need locking stuff
* need pgbouncer
* Selena can help us set this up once we get to that point
* Talk to Jonas Finnemann Jensen about this
* After the new year, set up a 1 hour chat about it.