@PharkMillups
Created September 22, 2010 15:36
18:47 <jpartogi_> technoweenie: are you using riak in production?
18:47 <technoweenie> i'm not
18:47 <technoweenie> ericflo is
18:48 <technoweenie> ok maybe i shouldnt have called him out, ha
18:48 <ericflo_> yay
18:48 <ericflo_> no it's cool
18:48 <jpartogi_> the django dude
18:49 <ericflo_> heh, I suppose that's me :)
18:52 <ericflo_> technoweenie: Are you still looking at using Riak for the same purpose you mentioned earlier?
18:52 <ericflo_> technoweenie: Because I was decently drunk when you told me, and I'd be interested to
hear how you end up modeling that.
18:52 <technoweenie> ericflo_: yea, event storage.
18:52 <technoweenie> hah
18:53 <technoweenie> ok so we create Event records in mysql for every event... and then we look at
watchers and create a duplicate for them
18:53 <ericflo_> technoweenie: cool, didn't know if it was public knowledge or not
18:53 <technoweenie> ah, w/e :)
18:54 <technoweenie> so right now i have it split into 2 tables, an events table for the first,
unique event, and a timelines table that tracks all the watchers
18:54 <technoweenie> and i want the events table in riak.
18:54 <ericflo_> technoweenie: got it
18:54 <ericflo_> riak is pretty perfect for that
18:55 <technoweenie> so to build a list of events, i'd do "lrange event:recipient:21 0 29" in redis,
and then a map reduce to get all the values given the 30 keys
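
The read path technoweenie describes (take the newest 30 keys from a per-recipient list, then fetch the values for those keys) can be sketched roughly as follows. This is a minimal illustration using plain Python dicts and lists as stand-ins; it does not talk to Redis or Riak, and the names `timelines`, `events`, `lrange`, and `fetch_events` are assumptions made up for the example. The one detail taken from Redis itself is that LRANGE treats both indexes as inclusive, so `0 29` yields 30 items.

```python
# Plain-Python stand-ins: no Redis or Riak server is contacted.
# Timeline list holds event keys, newest first (an assumption for this sketch).
timelines = {"event:recipient:21": [f"event-{n}" for n in range(100, 0, -1)]}
# Stand-in for the event store: key -> event value.
events = {f"event-{n}": {"id": n, "type": "push"} for n in range(1, 101)}


def lrange(key, start, stop):
    """Mimic Redis LRANGE for non-negative indexes: both ends are inclusive."""
    return timelines.get(key, [])[start:stop + 1]


def fetch_events(keys):
    """Look up each key in order, skipping any key that is missing."""
    return [events[k] for k in keys if k in events]


page_keys = lrange("event:recipient:21", 0, 29)
page = fetch_events(page_keys)
```

In a real deployment the second step would be whatever bulk lookup the store offers (the map reduce mentioned above, or one get per key); the sketch only shows the shape of the two-step read.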
18:55 <ericflo_> Ahh, was just about to ask that, so the timelines are going into Redis?
18:55 <technoweenie> yea thats the plan
18:55 <ericflo_> I think with protobufs you can do multiget
18:55 <technoweenie> right now its in mysql... i ran some redis benches this weekend
18:55 <technoweenie> oh well thatd be perfect
18:56 <technoweenie> though it would be neat to use the map reduce to squash common events or something
18:56 <technoweenie> so if there are 3 wiki updates, squash them into 1
18:57 <ericflo_> technoweenie: Yeah, that'd be cool
18:57 <ericflo_> Although it could be nice to put some info into the keys so that you could do that without
hitting the datastore
18:57 <ericflo_> sorry, you weren't soliciting feedback :)
18:57 <* ericflo_> is nosy
18:57 <technoweenie> well i was thinking of that
18:58 <technoweenie> could store an array of [event-id, actor-id, event-type] instead of just event_id
18:58 <technoweenie> [[event-id, actor-id, event-type], [event-id, actor-id, event-type],
[event-id, actor-id, event-type]]
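
One way to read the idea above: if each timeline entry already carries `[event-id, actor-id, event-type]`, adjacent entries of the same type can be collapsed without fetching the full events. The sketch below is only an illustration under that assumption; the sample data, the `squash` helper, and the rule "collapse adjacent entries with the same event type" are all hypothetical, not something decided in the conversation.

```python
from itertools import groupby

# Hypothetical timeline entries in the [event-id, actor-id, event-type] shape.
timeline = [
    [901, "bob", "wiki"],
    [902, "bob", "wiki"],
    [903, "bob", "wiki"],
    [904, "fred", "issue"],
    [905, "bob", "issue"],
]


def squash(entries):
    """Collapse runs of adjacent entries that share an event type."""
    squashed = []
    for event_type, run in groupby(entries, key=lambda entry: entry[2]):
        run = list(run)
        actors = []
        for _event_id, actor, _event_type in run:
            if actor not in actors:  # keep first-seen order, drop repeats
                actors.append(actor)
        squashed.append({
            "type": event_type,
            "actors": actors,
            "event_ids": [entry[0] for entry in run],
        })
    return squashed


result = squash(timeline)
```

Here the three wiki updates become one entry, and the two issue events merge into a single entry listing both actors. A real rule would probably also need the repository or a time window, which this entry shape does not carry.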
18:58 <ericflo_> yeah
18:59 <technoweenie> but then i might want to squash cases where 2 people post issues right away
18:59 <technoweenie> bob and fred created an issue on rails/rails
18:59 <technoweenie> shrug i dont know :)
18:59 <ericflo_> yeah, it's hard to know what to collapse without doing analysis on the dataset
19:00 <technoweenie> im waiting for kyle to drop a design comp on me
19:00 <ericflo_> man I'm looking in the protobufs for the multiget, I could have sworn I've seen it before
19:00 <technoweenie> were you drunk then too
19:00 <ericflo_> probably
19:01 <technoweenie> the crazy thing is that when i compare the new events implementation to the old one,
the old one is storing roughly 20x more events due to all the extra ones for watchers
19:01 <technoweenie> so if we get that cut down, the events table will be at a much more manageable
level for mysql
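
The storage difference between the two layouts can be sketched as below: the old layout writes a full copy of the event for every watcher, while the new one stores the event once and pushes only its id onto each watcher's timeline. Everything here is a hypothetical illustration with in-memory structures; the function names, the `recipient` field, and the watcher count of 19 are made up for the example and are not taken from the real schema.

```python
def store_duplicated(event, watchers):
    """Old layout: the original event plus one full copy per watcher."""
    return [event] + [dict(event, recipient=w) for w in watchers]


def store_referenced(event, watchers, events, timelines):
    """New layout: store the event once, push only its id onto each timeline."""
    events[event["id"]] = event
    for watcher in watchers:
        # Newest first, so the head of the list is the latest event.
        timelines.setdefault(watcher, []).insert(0, event["id"])


event = {"id": 1, "actor": "bob", "type": "issue"}
watchers = [f"user-{n}" for n in range(19)]  # made-up watcher count

old_rows = store_duplicated(event, watchers)

events, timelines = {}, {}
store_referenced(event, watchers, events, timelines)
```

With these made-up numbers the old layout holds 20 event records where the new one holds a single event plus 19 small timeline entries.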
19:05 <ericflo_> anyway it looks like I must've been drunk or something when I saw that multiget stuff
19:05 <jpartogi_> so is there a django ORM for riak?
19:06 <ericflo_> jpartogi_: Nope
19:06 <jpartogi_> so how do you do it?
19:07 <ericflo_> jpartogi_: It really wouldn't make sense anyway, because Django's ORM maps pretty
closely to SQL but not so well to Riak
19:07 <jpartogi_> oh ok
19:07 <ericflo_> jpartogi_: You import the riak Python client and make queries using that
19:07 <benblack> you write code.
19:07 <ericflo_> jpartogi_: http://hg.basho.com/riak-python-client
19:07 <jpartogi_> do you encode your request to json?
19:08 <benblack> if you want to query it as json with m/r
19:13 <ericflo_> jpartogi_: The client library will take care of the json encoding/decoding
for you automatically.
19:15 <ericflo_> Hmm, doesn't look like there's a single example of how to use the Python
Riak client library.
19:30 <jpartogi_> ericflo: do you use riak for customer facing apps?
19:30 <ericflo_> jpartogi_: Yep
19:31 <ericflo_> jpartogi_: We use it as the backing store for our web sessions, to store friendship
relationships for our social network, to store arbitrary flash objects for our flash developers, and
for metadata about our url shortener.
19:31 <ericflo_> None of them have very much data in it though
19:32 <ericflo_> Sessions has the most, with about 20 million rows a few months back, not
sure how large it is now.
19:32 <ericflo_> s/rows/keys/
19:34 <jpartogi_> ericflo: but is this data served for the customer? or is it just a metadata for
another database?
19:34 <ericflo_> jpartogi_: The web sessions are accessed directly every time any customer hits any
page on our website.
19:35 <ericflo_> jpartogi_: Is that what you mean?
19:35 <benblack> jpartogi_: what is the question you are trying to answer for yourself?
19:40 <jpartogi_> well is this data in riak displayed in the browser?
19:41 <jpartogi_> or is it just a bucket of data for internal use?
19:42 <ericflo_> jpartogi_: displayed in the browser. But I'll echo benblack's question, how does
this question translate into what you're trying to do?
19:44 <jpartogi_> well, some people are using riak only for data warehouse
19:45 <joseph_sh> I've been building a website using nitrogen project, with a backend of everything
riak, including sessions
19:45 <joseph_sh> key lookups are fast
19:46 <ericflo_> jpartogi_: Interesting. Data warehousing isn't really Riak's forté.
19:46 <joseph_sh> we are also building a large scale messaging system on riak
19:46 <joseph_sh> not data warehousing
19:58 <jpartogi_> yeah, I think when people see riak as a key-value store they start abusing
it for data warehousing
19:59 <jpartogi_> I know most people that use cassandra use it as data warehousing as well