RedisConf Notes Part IV
Redis Pain - Matt @mranney from Voxer (did node-redis)
* Asked to talk about redis stress points.. "no stress, things work for a while and there is no stress and then... you enter a world of pain."
* Pain from how we use it at Voxer. Explanation of Voxer and its use cases.
* People assume Voxer is "how hard can it be?".. "That's how it used to be until we got a bunch of pictures..."
* Growth curve looks like Pinterest's: "might be the same because there was no label on the y-axis.. might be EXACTLY the same."
* Computers are hard... nothing works as it is supposed to... and eventually you fly into nerd rage and then you become a curmudgeon... getting serious, this isn't real pain -- we aren't selling street sheets -- these are great problems to have...
* They use redis as cache for Riak, also for rapidly changing data, "data we can afford to lose", throttling, NO SAVING.
* "We really like redis."
* "When we first started building voxer, I never understood why people use Redis at all... we have a db and ... you can make them go faster ... buy ssds ... why use redis at all?" Initial pain of getting an idea about why and how to think about redis.
* Docs aren't clear about what you are trading off to get all of this magic.
* Temptation to dive in...setting yourself up for failure down the line.
* Native clients... had to implement rate-limiting... not for hackers, but for ourselves.
* Everything was going to couchdb, which had no way of handling rapidly-changing data.. started using redis for throttling
* "We were already using CouchDB so [when we added redis] we had this hipster devops thing going"
* Pain 2/6: Problems with BGSAVE. Copy-on-write is fine if you don't have quickly-changing data.. but quickly changing data is why you use redis -- so if the bgsave takes a while, then you can consume 2x memory and get yourself into "a deep, deep hole if you are changing your data faster than it can be written".. this bgsave also happens whenever a slave connects.. if your slave connects while a bgsave is happening -- wah-wah, now you have TWO bgsaves going on..
* Pain 3/6: Reading back the data from disk. "As if writing wasn't bad enough, reading is worse".. If you have lots of data, your server is **offline** for minutes. "The server is up and accepting tcp connections, but the only valid command is INFO."
* You know, AOF, I want to mention. We don't actually use it, but theoretically, it might limit the exposure to data loss, but it makes MORE disk i/o.
* Right solution: expect less! Disk store and vm were failures, but that's fine! I'd say aof and snapshots were also failures, but that's fine because it is a GREAT in-memory db. If you just expect it to not [do persistence] then, it is great.
* "The things redis is great at, being fast, are directly at-odds with writing to disk"
* Pain 4/6: Single-CPU
* "We'd really like the API of Redis but the clustering of Riak.."
* "Somebody should do that.. take redis and maybe a postgres backend..?"
* Pain 5/6: Scaling. SPOF.
* Even at riak meetups, everyone is talking about redis cluster.
* You HAVE to shard eventually.. "I would argue the best way is in your sucks but at the moment, that's just kind of how it is.."
* Pain 6/6 Operations: Just because it doesn't crash very often, and it is true it doesn't crash often, doesn't mean things don't go wrong. When they do, it is VERY hard to tell what is going on -- memory, cpu, etc. Complaints about impact of MONITOR (double CPU, gets backlogged, etc.) "What actions can you take to get your memory back, or have you screwed it up?" It would be nice if we had DTrace support...
* It would be great if redis had better visibility so you knew where your memory and cpu were going. "Copy-on-write is clever, but it is very surprising and awkward for ops people."
* The issue of support -- and it is kind of lame to suggest this is a problem -- ... but people who run businesses, they want support contracts. Some people on mailing lists offer support, but it seems amateurish...
* Eventually this will be solved... but until then, we have work to do.
* "It's all WAY more efficient to pipeline...fewer packets on the network..."
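The sharding point under Pain 5/6 can be made concrete with a minimal sketch. The notes do not record how Voxer shards, so this assumes one common approach: the application hashes each key and picks a Redis instance itself. The host names and the `shard_for` helper are hypothetical.

```python
import hashlib

# Hypothetical shard list; these host names are placeholders, not from the talk.
SHARDS = ["redis-a:6379", "redis-b:6379", "redis-c:6379"]


def shard_for(key, shards=SHARDS):
    """Map a key to one shard using a stable hash.

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and would send the same key to different
    shards from different app servers.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for key in ("user:1", "user:2", "throttle:device:abc"):
        print(key, "->", shard_for(key))
```

One trade-off of plain modulo hashing: changing the number of shards remaps most keys, which matters less for cache or throttle data that can be lost than for data that must be kept.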
Redis PubSub and ActivityStreams - Monica Wilkinson from PSHB @ciberch
* Lessons learned building realtime activity stream with node, redis and mongodb
* "About me": Recently joined Crushpath, developer for a while, fond of Node.js. Prior to that, worked at VMware, Facebook, Socialcast, MySpace
* Defining activity streams a la facebook, jira, github, "general event tracking"
* "Generally, activities are shared passively...[unlike twitter's active posting]...they are usually being tracked"
* How can we *share* activity streams (standards?) worked with people from a variety of companies, settled on json.
* Open Web Foundation licensing, so anyone can use it.
* Example of a post.
* Defines activity stream engine, managing pub/sub, fanning out, etc
* Coded up a node/redis/mongo implementation of the spec that she helped write, using express and node-express-boilerplate; switched plaintext for activitystream, added persistence, used redis pub/sub to overcome the single-server limitation
* "Unlike Danielle, my UX skills are terrible, so please don't judge me on that." The final app:
* "I'll talk one slide about Mongo, don't kill me!"
* Explains joinless mongodb usage, but having it be more structured than Riak because of in-document queries. "Some people would call it schemaless, I call it flexible schema.. just like activitystreams schema, anybody can extend it."
* Replaced in-memory store of messages with redis. Description of how PubSub in redis works.
* "In my first version, I forgot to unsubscribe, so in a couple of days I crashed the app.. so don't forget to unsubscribe."
* node module: activity-streams-mongoose. Wraps getters/setters to auto-publish.
* Description of client-side with Backbone hooked up to and jade templates.. moved rendering from server-side to client-side.. "was really cool to be able to use jade templates in both places.."
* Demo: posting photos and watching it update. In redis-cli, subscribing to the photo
* Filtering client side vs server-side, trade-offs.
* Talking about use-cases of activity streams.. started with Facebook just to fill the page and provide opportunities to Like and comment.. enterprise tools / "knowledge sharing"..
* Now, started with rails, lots of 3rd-party integration: LinkedIn, Salesforce, Superfeedr, lots of complexity.
* Talking about using redis as message bus for integration, kind of like an enterprise service bus.
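The subscribe / publish / unsubscribe lifecycle from this talk can be sketched without a server. This is not the speaker's node implementation: `InMemoryPubSub` is a hypothetical single-process stand-in for Redis pub/sub, and the activity fields (`actor`, `verb`, `object`, `published`) are an illustrative guess at an Activity Streams JSON post with made-up values.

```python
import json
from collections import defaultdict


class InMemoryPubSub:
    """Toy stand-in for Redis PUBLISH/SUBSCRIBE/UNSUBSCRIBE (one process only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def unsubscribe(self, channel, callback):
        # Forgetting this step is the leak described in the talk:
        # callbacks pile up for clients that have already gone away.
        self._subscribers[channel].remove(callback)
        if not self._subscribers[channel]:
            del self._subscribers[channel]

    def publish(self, channel, message):
        """Deliver to current subscribers and return how many received it."""
        receivers = list(self._subscribers.get(channel, []))
        for callback in receivers:
            callback(message)
        return len(receivers)

    def subscriber_count(self):
        return sum(len(cbs) for cbs in self._subscribers.values())


# Illustrative activity; all values are invented for this sketch.
activity = {
    "published": "2012-01-01T12:00:00Z",
    "actor": {"objectType": "person", "displayName": "Example User"},
    "verb": "post",
    "object": {"objectType": "image", "url": "http://example.org/photo.jpg"},
}

bus = InMemoryPubSub()
received = []


def on_photo(message):
    received.append(json.loads(message))


bus.subscribe("photos", on_photo)
delivered = bus.publish("photos", json.dumps(activity))
bus.unsubscribe("photos", on_photo)  # clean up when the client disconnects
delivered_after = bus.publish("photos", json.dumps(activity))
```

With a real Redis server the broker lives outside the app process, which is what lets several app servers share one stream of events.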
Rate-limiting w/ Alan Shreve, core engineering and architecture at twilio
* New distributed rate-limiting queues with redis 2.6
* Talk about why it was built, and the implementation.
* The problem is that the phone network has rules. One of the rules is that you can only enter the phone network at a bounded rate -- you can only originate requests at a certain rate -- they provide buffering and queueing to hide this from users.
* Different rules; some regulatory, some bizdev / 3rd party / contracts, "origination" is a limited resource
* Capacity management is tricky, this lets them smooth it out so they can have lead-time to scale to accommodate spiky demand. Animation that illustrates temporal smoothing.
* "Leaky bucket queue" algorithm; no matter how fast you fill it, it always leaks out at the same rate.. form of "traffic shaping".
* This is easy if you have one queue, but it becomes tricky when you scale and have millions of open and active queues.
* Additional twilio-imposed constraints: must be horizontally scalable. low-latency (<10ms). HA & Fault-tolerant. NO DATA LOSS ALLOWED. Runs majority of software on EC2. So, in the case of arbitrary node failure, need to keep all of these properties.
* At least 1k originations/sec per node, 1k queues per node at least. Multi-tenant and transactional. Pull-based, not pushed. Customizable, transparent, introspectable.
* Queueing is a very common problem, but there are not too many rate-limiting queues out there. Most are counter-based, not leaky-bucket.
* "Redis didn't even exist when we first took a crack at this problem".. issues with existing solutions not having fanning support, durability issues.
* Tried several approaches: 1st pass, sql, "when all you have is a hammer..". SQL's over-capacity failure modes for queues didn't fit, deadlock issues, etc. 2nd pass: redis and timers. Service that enqueues and a dequeuer. (Detailed description of system.) 3rd pass: redis w/ lua scripting. (detailed description of how it works).
* Redis wins because it is high-performance and is easily customizable through Lua scripting. "What is advantage over just writing a networked server?" "Easy and free replication and persistence if you want.."
* MONITOR shows all of the commands that your Lua script executes.
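The "leaky bucket queue" idea from this talk, sketched as a single in-process queue. This is not Twilio's implementation (theirs is distributed and built on Redis with Lua scripting); the class name, the millisecond clock, and the 2-per-second rate are assumptions made for illustration. Time is passed in explicitly so the behaviour is deterministic.

```python
from collections import deque


class LeakyBucketQueue:
    """Pull-based queue that releases at most one item per interval."""

    def __init__(self, interval_ms, now_ms=0):
        self.interval_ms = interval_ms
        self.items = deque()
        self.next_release_ms = now_ms

    def enqueue(self, item):
        # Filling is unbounded and instant; only draining is rate-limited.
        self.items.append(item)

    def dequeue(self, now_ms):
        """Return the next item if the leak schedule allows it, else None."""
        if not self.items or now_ms < self.next_release_ms:
            return None
        self.next_release_ms = now_ms + self.interval_ms
        return self.items.popleft()


# Five items arrive in a burst at t=0; the consumer polls every 100 ms.
queue = LeakyBucketQueue(interval_ms=500)  # i.e. 2 items per second
for name in ("a", "b", "c", "d", "e"):
    queue.enqueue(name)

released = []
for t in range(0, 2500, 100):
    item = queue.dequeue(t)
    if item is not None:
        released.append((t, item))

print(released)
# [(0, 'a'), (500, 'b'), (1000, 'c'), (1500, 'd'), (2000, 'e')]
```

However fast the burst arrives, items leave 500 ms apart, which is the traffic-shaping property described above. Unlike a counter-based limiter, nothing is rejected; excess work simply waits in the queue.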
Lightning talk from Brock @sintaxi
* Complains about an antipattern.
* APIs that have functions, so we end up creating SDKs to access our resources
* Thinks we should change how we structure our models: rolodex is a project that supports this paradigm where interface is the same if it is local or remote.
* Also created a project called thug which is a "non-orm", which is Express middleware
* "The theme is that the tools shouldn't be distributing the system, it should be our applications..."