Created September 17, 2010 17:15
15:25 <mheld> question to pose for y'all
15:25 <mheld> why would one use riak over mongodb?
15:26 <jdmaturen> cue benblack
15:26 <mheld> I don't mean it to be flamebait
15:26 <mheld> I really don't
15:26 <bingeldac> heh
15:26 <bingeldac> some of it is covered on the wiki
15:26 <bingeldac> in the comparison article (here ---> http://wiki.basho.com/display/RIAK/Riak+Compared+to+MongoDB)
15:27 <mheld> those are differences, but not really a "this is when you use riak and this is when you use mongo"
15:28 <ericflo> use mongo when you could use a relational db, there's really not much difference except flexible schema IMO
15:29 <benblack> mheld: think i answered this for you previously.
15:29 <mheld> benblack: if you did, I'm sorry. I've forgotton it
15:29 <mheld> forgotten
15:29 <benblack> first part of the answer is: they really have little in common, so the comparison is forced.
15:30 <mheld> mhmm
15:31 <benblack> second part is: if you want a rich query language that is familiar from the relational world, don't have that much data, don't have hard durability requirements, and don't require real distribution, then mongo is great
15:31 <benblack> if you can deal with a different data and query model, have a lot of data, care about data durability, and don't want to make managing replication and distribution a full time job, then you probably will prefer riak
15:32 <mheld> that's good :-)
15:33 <benblack> but, again, they are really not comparable and you can definitely use them effectively in combination
15:33 <benblack> think of mongo more like memcache with a better query interface and you are not far off
15:33 <mheld> do people ever use just riak as a backend?
15:33 <mheld> or am I being naive?
15:34 <benblack> just as a backend for what?
15:34 <mheld> database
15:34 <mheld> for web services
15:34 <benblack> you mean like people usually use memcache in front of mysql?
15:35 <mheld> like using riak + mysql
15:35 <mheld> vs just riak
15:35 <benblack> think you missed my question
15:36 <benblack> if your goal is to have one database to do everything you should stick with an rdbms. that isn't how you best use nosql systems.
15:36 <benblack> you should expect to use different ones in combination
15:36 <mheld> hmm
15:37 <benblack> that's why questions trying to compare mongo vs riak (or whatever else) are not straightforward
15:37 <mheld> I see
15:38 <mheld> I'd like to use just one database
15:39 <mheld> but it doesn't feel right to use a rdbms for the data munchey stuff we're doing
15:40 <benblack> why would you like to use one database?
15:41 <mheld> well, every type of data we have would have to have the same operations done to them
15:42 <mheld> I've got users which have different relationships to pieces of data
15:43 <mheld> and while it'd make sense to store the users in a rdbms
15:44 <mheld> there'd be a lot of weird data relating users to the other stuff
15:45 <benblack> you can of course try to push everything to the same db
15:45 <benblack> just letting you know it is common to have several databases
15:46 <ericflo> mheld: how much data are you dealing with?
15:48 <mheld> ericflo: now, not much... on the level of hundreds of megabytes
15:49 <mheld> but we're anticipating a much larger level within the next few weeks
15:49 <ericflo> oh wow, yeah any relational database out there can handle that
15:49 <ericflo> how much larger?
15:49 <mheld> orders of magnitude
15:49 <ericflo> mheld: how many?
15:49 <mheld> ideally terabytes
15:49 <mheld> petabytes eventually
15:51 <benblack> mheld: ok. this is a common problem.
15:51 <benblack> "well, ideally, we'll have more data than can possibly fit in anything we can conceive of designing or building!"
15:51 <benblack> you have to be realistic and specific.
15:51 <mheld> we'll we've just acquired our first major partner
15:52 <mheld> and we're in talks with another
15:52 <mheld> and we need to start acquiring more data
15:53 <benblack> sure, big stuff coming
15:53 <benblack> you still have to do capacity planning
15:53 <benblack> because you aren't putting a petabyte in riak right now
15:53 <mheld> mhmm
15:53 <benblack> (or anything else)
15:54 <benblack> folks with petabytes of data to store and process have full time engineers working on storage infrastructure.
15:54 <benblack> they aren't taking an existing system and running it as is
15:55 <mheld> I'm not doubting that, I just want to be sure I'm not looking down the wrong path
15:55 <mheld> I rather like riak
15:55 <benblack> step 1 is being realistic about data growth
15:55 <benblack> realistic and _specific_
15:56 <mheld> alright, say I've given up on the petabyte dream
15:56 <mheld> we're thinking hundreds of gigs
15:56 <benblack> ok
15:56 <benblack> and what kind of processing?
15:57 <mheld> we're essentially the history of the web and how people interact with it
15:57 <mheld> over time
15:57 <mheld> essentially capturing*
15:57 <mheld> I accidentally the verb
15:57 <mheld> we're doing trends, recommendations, and analytics
15:58 <mheld> market searching, info trending
16:01 <benblack> i don't want to scare you away from riak, which i love dearly, but you are describing the sweet spot for hadoop.
16:01 <mheld> ha, it's funny that you mention that
16:02 <mheld> because I've just purchased a few hadoop books
16:03 <mheld> why would this fit hadoop better than riak?
16:04 <benblack> because you are describing taking in a large amount of behavioral data and doing bulk analytics on it
16:04 <benblack> that's what hadoop is for
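The kind of bulk batch job benblack is pointing at can be sketched in the shape Hadoop Streaming expects: a mapper that emits key/value pairs and a reducer that aggregates them per key. This is a hedged illustration, not from the chat; the event format and field names are invented.

```python
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    """Emit (url, 1) for each page-view event line: 'timestamp<TAB>user<TAB>url'."""
    for line in lines:
        _ts, _user, url = line.rstrip("\n").split("\t")
        yield url, 1

def reducer(pairs):
    """Sum counts per url; Hadoop delivers pairs grouped by key, so we sort first here."""
    for url, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield url, sum(count for _url, count in group)

events = [
    "2010-09-17T15:25\talice\thttp://example.com/a",
    "2010-09-17T15:26\tbob\thttp://example.com/a",
    "2010-09-17T15:27\talice\thttp://example.com/b",
]
print(dict(reducer(mapper(events))))
# {'http://example.com/a': 2, 'http://example.com/b': 1}
```

In a real Hadoop job the mapper and reducer run on different machines over far larger inputs; the point is only the processing shape: one full pass over all the behavioral data, aggregated offline.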
16:05 <mheld> hmm
16:13 <mheld> in my head, the only difference (data processing-wise) between riak and hadoop/hbase is the internal data structure (k/v store vs columnar rdbms)
16:13 <mheld> is that not right?
16:13 <benblack> nope, not right
16:14 <mheld> they still both do mapreduce
16:14 <benblack> java and erlang are both programming languages
16:14 <jdmaturen> benblack and I are both human
16:15 <benblack> map reduce is a processing model, it implies nothing about whether a specific implementation of that model is appropriate for a given application
16:15 <benblack> hadoop is built to do large-scale, batch processing for analysis
16:15 <benblack> riak's map reduce is primarily for interactive querying
16:16 <benblack> that one is written in java and the other (essentially) javascript is a good indicator they are not for the same purpose
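For contrast with the Hadoop style, the interactive querying benblack means looks roughly like this: Riak of this era accepts a JSON map/reduce job via HTTP POST to `/mapred`, with the map phase written as JavaScript that is shipped to the data. This is a hedged sketch assembled from Riak's documentation of the time; the bucket, keys, and JSON field `pages` are invented, and the job is only built and serialized here, not actually sent to a node.

```python
import json

# Hypothetical job: sum a numeric field across a handful of stored objects.
job = {
    "inputs": [["visits", "alice"], ["visits", "bob"]],  # explicit bucket/key pairs
    "query": [
        {"map": {"language": "javascript",
                 "source": "function(v) { return [JSON.parse(v.values[0].data).pages]; }"}},
        {"reduce": {"language": "javascript", "name": "Riak.reduceSum"}},
    ],
}
body = json.dumps(job)
# This body would be POSTed to http://localhost:8098/mapred
# with Content-Type: application/json.
print(body)
```

Note the scale of a typical input: a few named keys queried on demand, versus a Hadoop job's full scan of everything, which is the distinction being drawn above.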
16:16 <mheld> how is that different?
16:16 <mheld> can you not use riak to do batch processing?
16:17 <benblack> wow, ok
16:17 <benblack> mheld: what programming languages do you generally use?
16:18 <mheld> if I'm pissing you off, I'll leave you alone, I'm just trying to understand this and apparently failing
16:18 <mheld> day to day I'll use java, ruby, and scala
16:18 <benblack> ok
16:18 <benblack> why do you use ruby vs java?
16:19 <benblack> they are both programming languages, right?
16:19 <mheld> functional constructs that just aren't there in java
16:19 <mheld> is why I'd use ruby
16:19 <benblack> ok
16:19 <benblack> any kind of performance differences?
16:19 <mheld> java is generally faster for most things
16:19 <mheld> iirc
16:19 <benblack> ...by a lot.
16:20 <mheld> mhm
16:20 <benblack> but you want to write something straightforward, and compact
16:20 <benblack> less concerned about performance
16:20 <benblack> you'd probably use ruby, right?
16:21 <mheld> yes
16:21 <mheld> oh
16:21 <makmanalp> lambdaj is pretty neat, btw, but i can't help but feel that it's a hackjob
16:21 <makmanalp> and doesn't really give you everything
16:22 <mheld> benblack: I think I get what you're painting a picture of
16:22 <mheld> I'm just being retarded
16:22 <makmanalp> i'd be better off using something like clojure if the jvm is a must
16:22 <mheld> makmanalp: why not scala?
16:22 <makmanalp> mheld: scala looks weird to me, but that's a matter of personal taste
16:22 <makmanalp> i've used scheme a lot
16:23 <mheld> ah
16:23 <makmanalp> so clojure is familiar
16:23 <benblack> mheld: riak is really great at a lot of things (like being stupidly simple to operate, scale up and down, tuning consistency, tuning backends, etc)
16:23 <benblack> mheld: but bulk analytics on a petabyte, not so much.
16:24 <mheld> hmm
16:24 <makmanalp> and so it programming languages, etc
16:24 <makmanalp> AI, depending on the professor
16:24 <mheld> makmanalp: ah sweet
16:25 <mheld> makmanalp: HTDP?
16:49 <davidc_> I knew I was missing a channel...
18:19 <mheld> how fast are riak mapreduce queries compared to sql queries?
18:19 <mheld> well, sql queries with some magic done on them
18:21 <ericflo> mheld: sql queries don't have some set time that they all take, and neither do riak mapreduce jobs.
18:22 <ericflo> mheld: It also depends on what storage backend you use, and what version of Riak, and what language you choose for your map/reduce tasks, and if you leverage any of the built-in functions, etc.
18:24 <mheld> say I enter "greasy bacon" on google.com
18:25 <mheld> what happens?
18:25 <mheld> not as in GET requests
18:26 <mheld> as in database queries
18:27 <benblack> nothing remotely like what you will have with any database you've ever used
18:27 <benblack> they are precomputing enormous indices, then querying in parallel across a huge number of machines
18:27 <benblack> the indices and the backend queries are specifically structured for the purpose
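A toy version of the precomputed index benblack describes: an inverted index, built offline, maps each term to the documents containing it, so answering a query is an intersection of small posting lists rather than a scan over the data. The documents below are invented for illustration, and a real search engine shards this index over many machines and queries the shards in parallel.

```python
from collections import defaultdict

docs = {
    1: "greasy bacon recipes",
    2: "crispy bacon",
    3: "greasy spoon diner",
}

# Build the index offline: term -> set of doc ids (the "precomputed" part).
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def search(query):
    """Intersect the posting lists of every query term."""
    terms = query.split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

print(sorted(search("greasy bacon")))  # [1]
```

Serving "greasy bacon" never touches the raw documents at query time; all the expensive work happened when the index was built, which is why the query can return in milliseconds.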
18:27 <mheld> how do they get it all done in such a small time?
18:28 <ericflo> 2 minutes of searching found this: http://www.ams.org/samplings/feature-column/fcarc-pagerank
18:28 <benblack> mheld: as i just said.
18:35 <mheld> anybody know anything about http://googleblog.blogspot.com/2010/06/our-new-search-index-caffeine.html ?
18:36 <benblack> nobody outside google.
18:36 * jdmaturen waits for the paper
18:36 <benblack> you might also enjoy the paper from google on Dremel
18:37 <jdmaturen> yes