Created July 8, 2010 17:14
copious # seancribbs: are you taking questions ahead of time for tomorrow's webinar?
seancribbs # you can ask right now ;)
copious # well, I think others would be interested in the answers too
copious # one question: what are the ramifications of having millions of buckets?
seancribbs # depending on the intended use, I advocate them
copious # and which backends are more appropriate for large numbers of small buckets vs. small numbers of large buckets?
copious # I was thinking of further uses for riak down the road, and of ways of solving various data access patterns, one of which would involve millions of buckets. As I understand it, that would not be good with the innostore backend, since it would mean one innodb file per bucket
seancribbs # per bucket per partition
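The "per bucket per partition" correction matters because the two factors multiply. A minimal sketch of the arithmetic, assuming a hypothetical 64-partition ring, one InnoDB file per (bucket, partition) pair that actually holds data, and keys hashed uniformly with replication ignored. These are illustrative assumptions, not measured innostore behavior.

```python
def worst_case_files(num_buckets: int, num_partitions: int) -> int:
    """Upper bound: every partition ends up holding a key from every bucket."""
    return num_buckets * num_partitions


def expected_files(num_buckets: int, num_partitions: int, keys_per_bucket: int) -> float:
    """Expected file count if each key lands on one uniformly random partition.

    A bucket with k keys touches P * (1 - (1 - 1/P) ** k) distinct partitions
    on average, so sparse buckets stay below the worst case.
    Replication (several copies per key) is deliberately ignored here.
    """
    p = num_partitions
    touched_per_bucket = p * (1 - (1 - 1 / p) ** keys_per_bucket)
    return num_buckets * touched_per_bucket


if __name__ == "__main__":
    buckets = 1_000_000  # "millions of buckets", as in the question
    partitions = 64      # hypothetical ring size
    print(worst_case_files(buckets, partitions))  # 64000000
    print(round(expected_files(buckets, partitions, keys_per_bucket=10)))
```

Even with only 10 keys per bucket, the expected count under these assumptions is still in the millions of files.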
benblack # you are going to have that problem with any useful backend at present, i expect
benblack # is there a reason you need millions of buckets in particular, or is it just for the URI path?
seancribbs # inno is the best choice if you're going to do full-bucket scans (list-keys) for that reason, but with large numbers of buckets it will be swapping file handles a lot
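Why many buckets means "swapping file handles": if only a bounded number of files can be open at once, access spread across more buckets than that limit keeps evicting and reopening handles. A minimal sketch with a hypothetical LRU cache of open handles (innostore's real handle management is not modeled); it only counts how often a file would have to be reopened.

```python
from collections import OrderedDict


class HandleCache:
    """Hypothetical LRU cache of open file handles, keyed by bucket name."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.open_handles = OrderedDict()  # bucket -> placeholder handle
        self.opens = 0                     # how many times a file had to be (re)opened

    def access(self, bucket: str) -> None:
        if bucket in self.open_handles:
            self.open_handles.move_to_end(bucket)  # mark as most recently used
            return
        if len(self.open_handles) >= self.capacity:
            self.open_handles.popitem(last=False)  # close the least recently used file
        self.open_handles[bucket] = object()       # stand-in for an open file
        self.opens += 1


def count_opens(num_buckets: int, capacity: int, rounds: int) -> int:
    """Touch every bucket once per round, round-robin, and count file opens."""
    cache = HandleCache(capacity)
    for _ in range(rounds):
        for b in range(num_buckets):
            cache.access(f"bucket-{b}")
    return cache.opens


if __name__ == "__main__":
    print(count_opens(num_buckets=100, capacity=1000, rounds=10))   # 100: opened once, then reused
    print(count_opens(num_buckets=2000, capacity=1000, rounds=10))  # 20000: every access reopens
```

Round-robin is the worst case for LRU; with uniformly random access the hit rate is roughly capacity / number of buckets, which still collapses once buckets vastly outnumber open files.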
copious # I was pondering a schema where we have /messages/<message-id>: one bucket with billions of documents
copious # each of those messages has an author; some "authors" have many thousands of messages, and I would have an /authors/<author-id> bucket, with millions of authors in the authors bucket
seancribbs # right, so you might create an author_1284295629_messages/ bucket that uses inno, while other things use a different backend, and that bucket would just contain lightweight objects that point to the original messages
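seancribbs's suggestion, sketched against a hypothetical in-memory stand-in for a bucket/key store (the ToyKV class is not a Riak client API). Messages live in one shared bucket; each author gets an author_<id>_messages bucket holding only small pointer objects, so "all messages by this author" becomes a list-keys over a small bucket followed by fetches.

```python
from collections import defaultdict


class ToyKV:
    """Hypothetical in-memory bucket/key store, for illustration only."""

    def __init__(self):
        self.buckets = defaultdict(dict)

    def put(self, bucket, key, value):
        self.buckets[bucket][key] = value

    def get(self, bucket, key):
        return self.buckets[bucket][key]

    def list_keys(self, bucket):
        return list(self.buckets[bucket].keys())


def index_bucket(author_id):
    # Naming scheme from the discussion: author_<id>_messages
    return f"author_{author_id}_messages"


def store_message(kv, message_id, author_id, body):
    # The full message lives in the shared messages bucket...
    kv.put("messages", message_id, {"author": author_id, "body": body})
    # ...and a lightweight pointer goes into the author's own index bucket.
    kv.put(index_bucket(author_id), message_id, {"bucket": "messages", "key": message_id})


def messages_by_author(kv, author_id):
    # One list-keys over a small bucket, then fetch each pointed-to message.
    bucket = index_bucket(author_id)
    pointers = [kv.get(bucket, k) for k in kv.list_keys(bucket)]
    return [kv.get(p["bucket"], p["key"]) for p in pointers]
```

Adding a message touches only new keys, so no shared document is rewritten; the price is a list-keys per query, which is where the backend choice comes in.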
copious # and to tie the messages to the authors, I could use Link:, or I could have /<author-id>/<message-id> with basically a Link to the message document
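Riak's HTTP interface expresses links as a Link header on the stored object, in the form </riak/<bucket>/<key>>; riaktag="<tag>". A minimal sketch of building and parsing such a header value; the "message" tag is a hypothetical name, and keys are assumed to be URL-safe (real keys may need percent-encoding).

```python
import re

# Matches one link of the form </riak/<bucket>/<key>>; riaktag="<tag>"
LINK_RE = re.compile(r'</riak/([^/>]+)/([^>]+)>;\s*riaktag="([^"]*)"')


def format_links(links):
    """Build a Link header value from (bucket, key, tag) triples."""
    return ", ".join(f'</riak/{bucket}/{key}>; riaktag="{tag}"' for bucket, key, tag in links)


def parse_links(header):
    """Return the (bucket, key, tag) triples found in a Link header value."""
    return LINK_RE.findall(header)


if __name__ == "__main__":
    header = format_links([("messages", "m1", "message"), ("messages", "m2", "message")])
    print(header)
    print(parse_links(header))
```

Every link adds roughly a fixed number of bytes, so an author document linking to thousands of messages carries a header that grows linearly with the author's message count.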
benblack # suggest actually using links rather than proliferating buckets like that
copious # well, the issue is keeping the bidirectional links in place. It would mean updating the /authors/<author-id> document quite a bit, and a not-insignificant portion of the authors would end up having thousands upon thousands of Links to messages
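copious's concern in numbers: if every new message appends a link to the author document, each insert rewrites the whole document, so cumulative bytes written grow quadratically with the author's message count, while one small pointer object per message keeps each write constant-sized. A minimal sketch, assuming hypothetical fixed sizes per link, per author document, and per pointer object.

```python
def bytes_written_author_doc(num_messages: int, bytes_per_link: int, base_doc_bytes: int) -> int:
    """Total bytes written if each insert rewrites the author doc with one more link."""
    total = 0
    for n in range(1, num_messages + 1):
        total += base_doc_bytes + n * bytes_per_link  # the doc now holds n links
    return total


def bytes_written_pointers(num_messages: int, pointer_bytes: int) -> int:
    """Total bytes written if each insert stores one small, independent pointer object."""
    return num_messages * pointer_bytes


if __name__ == "__main__":
    # Hypothetical sizes: 70 bytes per link, 200-byte author doc, 100-byte pointer object.
    for n in (10, 1_000, 10_000):
        print(n, bytes_written_author_doc(n, 70, 200), bytes_written_pointers(n, 100))
```

With those assumed sizes, 10,000 messages cost about 3.5 GB of cumulative author-document writes versus 1 MB of pointer writes, which is why the author document becomes a write bottleneck.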
seancribbs # benblack: the question then becomes whether you want to pay the cost of listing keys or loading a huge object
benblack # indeed
seancribbs # copious: i think you are fine as long as you have only one author
benblack # copious: how many authors?
seancribbs # (per message)
benblack # perhaps i misunderstood which thing was numerous
copious # 1 author per message
seancribbs # then you should never have to change the link on the message
copious # in relational terms, author has many messages, where "many" can be in the thousands
copious # yup, the message -> author link is pretty trivial; the issue is that many times we will want to say "I want all messages from a particular author"
copious # so I could either list all the keys in a /<author-id> bucket, or follow all the links in an /authors/<author-id> document. I just figured that maintaining the /<author-id> bucket would be easier than maintaining the /authors/<author-id> document
seancribbs # this is why knowing the cardinality is important
seancribbs # numerality?
seancribbs # i don't recall the correct word
copious # the scale of N in 1:N relationships?
seancribbs # yeah
copious # yeah, in this case, N can become quite large
benblack # cardinality
seancribbs # then you definitely want their insertion to be bottlenecked by updating the author
seancribbs # benblack: i was right the first time then, heh
benblack # indeed
seancribbs # definitely _don't_ want
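One more reason not to bottleneck inserts on the author document: two writers doing read-modify-write on the same key can overwrite each other. A minimal sketch of that race under a plain last-write-wins dictionary (a hypothetical simplification; Riak can instead keep both versions as siblings when allow_mult is enabled, leaving the merge to the client), contrasted with writing independent pointer keys.

```python
def lost_update_demo():
    """Interleave two read-modify-write cycles on one author document (last write wins)."""
    store = {"authors/a1": {"links": []}}

    # Writer A and writer B both read the same version of the document...
    doc_a = {"links": list(store["authors/a1"]["links"])}
    doc_b = {"links": list(store["authors/a1"]["links"])}

    # ...each appends its own new message link...
    doc_a["links"].append("messages/m1")
    doc_b["links"].append("messages/m2")

    # ...and each writes back; the second write silently replaces the first.
    store["authors/a1"] = doc_a
    store["authors/a1"] = doc_b
    return store["authors/a1"]["links"]


def independent_keys_demo():
    """Same two inserts, but each writes its own pointer key in a per-author bucket."""
    store = {}
    store["author_a1_messages/m1"] = {"key": "messages/m1"}
    store["author_a1_messages/m2"] = {"key": "messages/m2"}
    return sorted(k for k in store if k.startswith("author_a1_messages/"))


if __name__ == "__main__":
    print(lost_update_demo())       # ['messages/m2']: the link to m1 was lost
    print(independent_keys_demo())  # both pointers survive
```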
copious # seancribbs: agreed, that's why I was thinking that /<author-id> buckets would be the proper way to go in this situation
seancribbs # yes. it's not an ideal solution, but it should work
seancribbs # the message of my webinar tomorrow is "everything has tradeoffs"
copious # then it's a matter of making sure the backend is okay with millions of buckets
copious # seancribbs: I completely agree :-)
copious # or I should say, making sure the backend "sucks the least when having millions of buckets"
copious # seancribbs: thanks. benblack: thanks.