Created
September 22, 2010 15:36
18:47 <jpartogi_> technoweenie: are you using riak in production?
18:47 <technoweenie> i'm not
18:47 <technoweenie> ericflo is
18:48 <technoweenie> ok maybe i shouldnt have called him out, ha
18:48 <ericflo_> yay
18:48 <ericflo_> no it's cool
18:48 <jpartogi_> the django dude
18:49 <ericflo_> heh, I suppose that's me :)
18:52 <ericflo_> technoweenie: Are you still looking at using Riak for the same purpose you mentioned earlier?
18:52 <ericflo_> technoweenie: Because I was decently drunk when you told me, and I'd be interested to
hear how you end up modeling that.
18:52 <technoweenie> ericflo_: yea, event storage.
18:52 <technoweenie> hah
18:53 <technoweenie> ok so we create Event records in mysql for every event... and then we look at
watchers and create a duplicate for them
18:53 <ericflo_> technoweenie: cool, didn't know if it was public knowledge or not
18:53 <technoweenie> ah, w/e :)
18:54 <technoweenie> so right now i have it split into 2 tables, and events table for the first,
unique event, and a timelines table that tracks all the watchers
18:54 <technoweenie> and i want the events table in riak.
18:54 <ericflo_> technoweenie: got it
18:54 <ericflo_> riak is pretty perfect for that
18:55 <technoweenie> so to build a list of events, i'd do "lrange event:recipient:21 0 29" in redis,
and then a map reduce to get all the values given the 30 keys
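A minimal sketch of the read path described above, using in-memory stand-ins rather than real Redis or Riak clients. The `timelines` dict plays the role of the Redis list behind `lrange event:recipient:21 0 29`, and the `events` dict plays the role of the Riak bucket. The names `lrange`, `fetch_events`, and the sample data are hypothetical and exist only for illustration.

```python
# In-memory stand-ins; a real setup would use Redis and Riak clients instead.
timelines = {"event:recipient:21": [f"event:{n}" for n in range(100, 0, -1)]}
events = {f"event:{n}": {"id": n, "type": "push"} for n in range(1, 101)}


def lrange(key, start, stop):
    """Mimic Redis LRANGE for non-negative indexes: both ends are inclusive."""
    return timelines.get(key, [])[start:stop + 1]


def fetch_events(keys):
    """Look up each key in the event store, keeping timeline order.

    Keys missing from the store are skipped.
    """
    return [events[k] for k in keys if k in events]


# Newest 30 timeline entries for recipient 21, resolved to event bodies.
page = fetch_events(lrange("event:recipient:21", 0, 29))
print(len(page), page[0]["id"], page[-1]["id"])
```

The timeline holds only keys, so the unique event body is stored once no matter how many watchers reference it.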
18:55 <ericflo_> Ahh, was just about to ask that, so the timelines are going into Redis?
18:55 <technoweenie> yea thats the plan
18:55 <ericflo_> I think with protobufs you can do multiget
18:55 <technoweenie> right now its in mysql... i ran some redis benches this weekend
18:55 <technoweenie> oh well thatd be perfect
18:56 <technoweenie> though it would be neat to use the map reduce to squash common events or something
18:56 <technoweenie> so if there are 3 wiki updates, squash them into 1
18:57 <ericflo_> technoweenie: Yeah, that'd be cool
18:57 <ericflo_> Although it could be nice to put some info into the keys so that you could do that without
hitting the datastore
18:57 <ericflo_> sorry, you weren't soliciting feedback :)
18:57 <* ericflo_> is nosy
18:57 <technoweenie> well i was thinking of that
18:58 <technoweenie> could store an array of [event-id, actor-id, event-type] instead of just event_id
18:58 <technoweenie> [[event-id, actor-id, event-type], [event-id, actor-id, event-type],
[event-id, actor-id, event-type]]
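A rough sketch of why carrying `[event-id, actor-id, event-type]` in each timeline entry helps: runs of the same event type can be squashed from the timeline alone, without fetching event bodies. The squash rule used here (collapse adjacent entries with the same type) is an assumption, as are the `squash` function and the sample data.

```python
from itertools import groupby

# Each timeline entry is [event_id, actor_id, event_type]; sample data is made up.
timeline = [
    [301, 7, "wiki"],
    [300, 7, "wiki"],
    [299, 9, "wiki"],
    [298, 4, "issue"],
    [297, 7, "wiki"],
]


def squash(entries):
    """Collapse runs of adjacent entries that share an event type."""
    squashed = []
    for event_type, run in groupby(entries, key=lambda e: e[2]):
        run = list(run)
        squashed.append({
            "type": event_type,
            "event_ids": [e[0] for e in run],
            "actor_ids": sorted({e[1] for e in run}),
        })
    return squashed


for group in squash(timeline):
    print(group["type"], len(group["event_ids"]), group["actor_ids"])
```

Here the three adjacent wiki updates become one group, while the later wiki update stays separate because an issue event sits between them.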
18:58 <ericflo_> yeah
18:59 <technoweenie> but then i might want to squash cases where 2 people post issues right away
18:59 <technoweenie> bob and fred created an issue on rails/rails
18:59 <technoweenie> shrug i dont know :)
18:59 <ericflo_> yeah, it's hard to know what to collapse without doing analysis on the dataset
19:00 <technoweenie> im waiting for kyle to drop a design comp on me
19:00 <ericflo_> man I'm looking in the protobufs for the multiget, I could have sworn I've seen it before
19:00 <technoweenie> were you drunk then too
19:00 <ericflo_> probably
19:01 <technoweenie> the crazy thing is that when i compare the new events implementation to the old one,
the old one is storing roughly 20x more events due to all the extra ones for watchers
19:01 <technoweenie> so if we get that cut down, the events table will be at a much more manageable
level for mysql
19:05 <ericflo_> anyway it looks like I must've been drunk or something when I saw that multiget stuff
19:05 <jpartogi_> so is there a django ORM for riak?
19:06 <ericflo_> jpartogi_: Nope
19:06 <jpartogi_> so how do you do it?
19:07 <ericflo_> jpartogi_: It really wouldn't make sense anyway, because Django's ORM maps pretty
closely to SQL but not so well to Riak
19:07 <jpartogi_> oh ok
19:07 <ericflo_> jpartogi_: You import the riak Python client and make queries using that
19:07 <benblack> you write code.
19:07 <ericflo_> jpartogi_: http://hg.basho.com/riak-python-client
19:07 <jpartogi_> do you encode your request to json?
19:08 <benblack> if you want to query it as json with m/r
19:13 <ericflo_> jpartogi_: The client library will take care of the json encoding/decoding
for you automatically.
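Roughly what "take care of the json encoding/decoding" means, sketched with a toy in-memory store. `ToyBucket` is hypothetical and is not the riak-python-client API; it only illustrates a client serializing values on write and parsing them on read, so that calling code deals in plain Python objects.

```python
import json


class ToyBucket:
    """Hypothetical stand-in for a key-value bucket that stores JSON text."""

    def __init__(self):
        self._raw = {}  # key -> JSON string, as it would be sent to the server

    def store(self, key, value):
        # Encode the Python object to JSON before it is stored.
        self._raw[key] = json.dumps(value)

    def get(self, key):
        # Decode back to Python objects; return None when the key is absent.
        raw = self._raw.get(key)
        return None if raw is None else json.loads(raw)


sessions = ToyBucket()
sessions.store("session:abc", {"user_id": 42, "flash": ["saved"]})
print(sessions.get("session:abc")["user_id"])
```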
19:15 <ericflo_> Hmm, doesn't look like there's a single example of how to use the Python
Riak client library.
19:30 <jpartogi_> ericflo: do you use riak for customer facing apps?
19:30 <ericflo_> jpartogi_: Yep
19:31 <ericflo_> jpartogi_: We use it as the backing store for our web sessions, to store friendship
relationships for our social network, to store arbitrary flash object for our flash developers, and
to for metadata about our url shortener.
19:31 <ericflo_> None of them have very much data in it though
19:32 <ericflo_> Sessions has the most, with about 20 million rows a few months back, not
sure how large it is now.
19:32 <ericflo_> s/rows/keys/
19:34 <jpartogi_> ericflo: but is this data served for the customer? or is it just a metadata for
another database?
19:34 <ericflo_> jpartogi_: The web sessions are accessed directly every time any customer hits any
page on our website.
19:35 <ericflo_> jpartogi_: Is that what you mean?
19:35 <benblack> jpartogi_: what is the question you are trying to answer for yourself?
19:40 <jpartogi_> well is this data in riak displayed in the browser?
19:41 <jpartogi_> or is it just a bucket of data for internal use?
19:42 <ericflo_> jpartogi_: displayed in the browser. But I'll echo benblack's question, how does
this question translate into what you're trying to do?
19:44 <jpartogi_> well, some people are using riak only for data warehouse
19:45 <joseph_sh> I've been building a website using nitrogen project, with a backend of everything
riak, including sessions
19:45 <joseph_sh> key lookups are fast
19:46 <ericflo_> jpartogi_: Interesting. Data warehousing isn't really Riak's forté.
19:46 <joseph_sh> we are also building a large scale messaging system on riak
19:46 <joseph_sh> not data warehousing
19:58 <jpartogi_> yeah, I think when people see riak as a key-value store they start abusing
it for data warehousing
19:59 <jpartogi_> I know most people that use cassandra use it as data warehousing as well