@PharkMillups
Created September 17, 2010 17:15
15:25 <mheld> question to pose for y'all
15:25 <mheld> why would one use riak over mongodb?
15:26 <jdmaturen> cue benblack
15:26 <mheld> I don't mean it to be flamebait
15:26 <mheld> I really don't
15:26 <bingeldac> heh
15:26 <bingeldac> some of it is covered on the wiki
15:26 <bingeldac> in the comparison article
( here ---> http://wiki.basho.com/display/RIAK/Riak+Compared+to+MongoDB)
15:27 <mheld> those are differences, but not really a "this is when you use riak and this is when you use mongo"
15:28 <ericflo> use mongo when you could use a relational db, there's really not much
difference except flexible schema IMO
15:29 <benblack> mheld: think i answered this for you previously.
15:29 <mheld> benblack: if you did, I'm sorry. I've forgotton it
15:29 <mheld> forgotten
15:29 <benblack> first part of the answer is: they really have little in common, so the comparison is forced.
15:30 <mheld> mhmm
15:31 <benblack> second part is: if you want a rich query language that is familiar from the
relational world, don't have that much data, don't have hard durability requirements, and don't
require real distribution, then mongo is great
15:31 <benblack> if you can deal with a different data and query model, have a lot of data, care
about data durability, and don't want to make managing replication and distribution a full time
job, then you probably will prefer riak
15:32 <mheld> that's good :-)
15:33 <benblack> but, again, they are really not comparable and you can definitely use them
effectively in combination
15:33 <benblack> think of mongo more like memcache with a better query interface and you
are not far off
15:33 <mheld> do people ever use just riak as a backend?
15:33 <mheld> or am I being naive?
15:34 <benblack> just as a backend for what?
15:34 <mheld> database
15:34 <mheld> for web services
15:34 <benblack> you mean like people usually use memcache in front of mysql?
15:35 <mheld> like using riak + mysql
15:35 <mheld> vs just riak
15:35 <benblack> think you missed my question
15:36 <benblack> if your goal is to have one database to do everything you should stick
with an rdbms. that isn't how you best use nosql systems.
15:36 <benblack> you should expect to use different ones in combination
15:36 <mheld> hmm
15:37 <benblack> that's why questions trying to compare mongo vs riak (or whatever else)
are not straightforward
15:37 <mheld> I see
15:38 <mheld> I'd like to use just one database
15:39 <mheld> but it doesn't feel right to use a rdbms for the data munchey stuff we're doing
15:40 <benblack> why would you like to use one database?
15:41 <mheld> well, every type of data we have would have to have the same operations done to them
15:42 <mheld> I've got users which have different relationships to pieces of data
15:43 <mheld> and while it'd make sense to store the users in a rdbms
15:44 <mheld> there'd be a lot of weird data relating users to the other stuff
15:45 <benblack> you can of course try to push everything to the same db
15:45 <benblack> just letting you know it is common to have several databases
15:46 <ericflo> mheld: how much data are you dealing with?
15:48 <mheld> ericflo: now, not much... on the level of hundreds of megabytes
15:49 <mheld> but we're anticipating a much larger level within the next few weeks
15:49 <ericflo> oh wow, yeah any relational database out there can handle that
15:49 <ericflo> how much larger?
15:49 <mheld> orders of magnitude
15:49 <ericflo> mheld: how many?
15:49 <mheld> ideally terabytes
15:49 <mheld> petabytes eventually
15:51 <benblack> mheld: ok. this is a common problem.
15:51 <benblack> "well, ideally, we'll have more data than can possibly fit in anything we
can conceive of designing or building!"
15:51 <benblack> you have to be realistic and specific.
15:51 <mheld> well, we've just acquired our first major partner
15:52 <mheld> and we're in talks with another
15:52 <mheld> and we need to start acquiring more data
15:53 <benblack> sure, big stuff coming
15:53 <benblack> you still have to do capacity planning
15:53 <benblack> because you aren't putting a petabyte in riak right now
15:53 <mheld> mhmm
15:53 <benblack> (or anything else)
15:54 <benblack> folks with petabytes of data to store and process have full time engineers
working on storage infrastructure.
15:54 <benblack> they aren't taking an existing system and running it as is
15:55 <mheld> I'm not doubting that, I just want to be sure I'm not looking down the wrong path
15:55 <mheld> I rather like riak
15:55 <benblack> step 1 is being realistic about data growth
15:55 <benblack> realistic and _specific_
15:56 <mheld> alright, say I've given up on the petabyte dream
15:56 <mheld> we're thinking hundreds of gigs
15:56 <benblack> ok
15:56 <benblack> and what kind of processing?
15:57 <mheld> we're essentially the history of the web and how people interact with it
15:57 <mheld> over time
15:57 <mheld> essentially capturing*
15:57 <mheld> I accidentally the verb
15:57 <mheld> we're doing trends, recommendations, and analytics
15:58 <mheld> market searching, info trending
16:01 <benblack> i don't want to scare you away from riak, which i love dearly, but you
are describing the sweet spot for hadoop.
16:01 <mheld> ha, it's funny that you mention that
16:02 <mheld> because I've just purchased a few hadoop books
16:03 <mheld> why would this fit hadoop better than riak?
16:04 <benblack> because you are describing taking in a large amount of behavioral data
and doing bulk analytics on it
16:04 <benblack> that's what hadoop is for
16:05 <mheld> hmm
16:13 <mheld> in my head, the only difference (data processing-wise) between riak and
hadoop/hbase is the internal data structure (k/v store vs columnar rdbms)
16:13 <mheld> is that not right?
16:13 <benblack> nope, not right
16:14 <mheld> they still both do mapreduce
16:14 <benblack> java and erlang are both programming languages
16:14 <jdmaturen> benblack and I are both human
16:15 <benblack> map reduce is a processing model, it implies nothing about whether a specific
implementation of that model is appropriate for a given application
16:15 <benblack> hadoop is built to do large-scale, batch processing for analysis
16:15 <benblack> riak's map reduce is primarily for interactive querying
16:16 <benblack> that one is written in java and the other (essentially) javascript is a
good indicator they are not for the same purpose
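(Illustration added to the log: a minimal sketch of benblack's point that map reduce is only a processing model. The function names and sample records are hypothetical and show neither Riak's nor Hadoop's API. The same map, group, and reduce steps could sit behind an interactive query or a large batch job; the model alone says nothing about which one an implementation is suited for.)

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one record."""
    return [(word.lower(), 1) for word in record.split()]


def group_by_key(pairs):
    """Shuffle step: collect all values emitted under the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def map_reduce(records):
    """Run map, group, and reduce in sequence over an in-memory list."""
    pairs = [pair for record in records for pair in map_phase(record)]
    grouped = group_by_key(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, small enough to check by hand.
counts = map_reduce(["riak hadoop riak", "hadoop mongo"])
```

(Here `counts` maps each word to how often it appears. Whether a job like this finishes in milliseconds or hours depends on the system running it, not on the model.)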
16:16 <mheld> how is that different?
16:16 <mheld> can you not use riak to do batch processing?
16:17 <benblack> wow, ok
16:17 <benblack> mheld: what programming languages do you generally use?
16:18 <mheld> if I'm pissing you off, I'll leave you alone, I'm just trying to understand
this and apparently failing
16:18 <mheld> day to day I'll use java, ruby, and scala
16:18 <benblack> ok
16:18 <benblack> why do you use ruby vs java?
16:19 <benblack> they are both programming languages, right?
16:19 <mheld> functional constructs that just aren't there in java
16:19 <mheld> is why I'd use ruby
16:19 <benblack> ok
16:19 <benblack> any kind of performance differences?
16:19 <mheld> java is generally faster for most things
16:19 <mheld> iirc
16:19 <benblack> ...by a lot.
16:20 <mheld> mhm
16:20 <benblack> but you want to write something straightforward, and compact
16:20 <benblack> less concerned about performance
16:20 <benblack> you'd probably use ruby, right?
16:21 <mheld> yes
16:21 <mheld> oh
16:21 <makmanalp> lambdaj is pretty neat, btw, but i can't help but feel that it's a hackjob
16:21 <makmanalp> and doesn't really give you everything
16:22 <mheld> benblack: I think I get what you're painting a picture of
16:22 <mheld> I'm just being retarded
16:22 <makmanalp> i'd be better off using something like clojure if the jvm is a must
16:22 <mheld> makmanalp: why not scala?
16:22 <makmanalp> mheld: scala looks weird to me, but that's a matter of personal taste
16:22 <makmanalp> i've used scheme a lot
16:23 <mheld> ah
16:23 <makmanalp> so clojure is familiar
16:23 <benblack> mheld: riak is really great at a lot of things (like being stupidly
simple to operate, scale up and down, tuning consistency, tuning backends, etc)
16:23 <benblack> mheld: but bulk analytics on a petabyte, not so much.
16:24 <mheld> hmm
16:24 <makmanalp> and so it programming languages, etc
16:24 <makmanalp> AI, depending on the professor
16:24 <mheld> makmanalp: ah sweet
16:25 <mheld> makmanalp: HTDP?
16:49 <davidc_> I knew I was missing a channel...
18:19 <mheld> how fast are riak mapreduce queries compared to sql queries?
18:19 <mheld> well, sql queries with some magic done on them
18:21 <ericflo> mheld: sql queries don't have some set time that they all take, and
neither do riak mapreduce jobs.
18:22 <ericflo> mheld: It also depends on what storage backend you use, and what
version of Riak, and what language you choose for your map/reduce tasks, and if you leverage
any of the built-in functions, etc.
18:24 <mheld> say I enter "greasy bacon" on google.com
18:25 <mheld> what happens?
18:25 <mheld> not as in GET requests
18:26 <mheld> as in database queries
18:27 <benblack> nothing remotely like what you will have with any database you've ever used
18:27 <benblack> they are precomputing enormous indices, then querying in parallel across
a huge number of machines
18:27 <benblack> the indices and the backend queries are specifically structured for the purpose
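(Illustration added to the log: a toy sketch of the general idea benblack describes, precomputing an index and then querying the pieces in parallel. It does not reflect how Google's systems actually work. The shard layout, function names, and documents are assumptions made up for the example, and threads stand in for separate machines.)

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_index(docs):
    """Precompute an inverted index (term -> set of doc ids) for one shard."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_shard(index, terms):
    """Return the doc ids on this shard that contain every query term."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def search(shards, query):
    """Fan the query out to all shards at once and merge the partial results."""
    terms = query.lower().split()
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda index: search_shard(index, terms), shards))
    return sorted(set().union(*partials))


# Hypothetical documents split across two shards; indexing happens ahead of time.
shard_a = build_index({1: "greasy bacon sandwich", 2: "crispy bacon"})
shard_b = build_index({3: "greasy spoon diner", 4: "Greasy bacon burger"})

results = search([shard_a, shard_b], "greasy bacon")
```

(At query time each shard only does set lookups and intersections on its own index. The expensive work of reading every document happened earlier, when the index was built.)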
18:27 <mheld> how do they get it all done in such a small time?
18:28 <ericflo> 2 minutes of searching found this:
http://www.ams.org/samplings/feature-column/fcarc-pagerank
18:28 <benblack> mheld: as i just said.
18:35 <mheld> anybody know anything about
http://googleblog.blogspot.com/2010/06/our-new-search-index-caffeine.html ?
18:36 <benblack> nobody outside google.
18:36 <* jdmaturen> waits for the paper
18:36 <benblack> you might also enjoy the paper from google on Dremel
18:37 <jdmaturen> yes