@PharkMillups
Created July 8, 2010 17:14
copious # seancribbs: are you taking questions ahead of time for tomorrow's webinar?
seancribbs # you can ask right now ;)
copious # well, I think others would be interested in the answers too.
One question: what are the ramifications of having millions of buckets?
seancribbs # depending on the intended use, I advocate them
copious # and which backends are more appropriate for a large number of small buckets vs. a small
number of large buckets? I was thinking of further uses for riak down our road,
and of solving various data access patterns, one of which would involve millions
of buckets. Which, as I understand it, would not be good to use with the innostore
backend, since that would be one InnoDB file per bucket.
seancribbs # per bucket per partition
benblack # you are going to have that problem with any useful backend at present,
i expect. is there a reason you need millions of buckets in particular, or is
it just for the uri path?
seancribbs # inno is the best choice if you're going to do full-bucket
scans (list-keys) for that reason, but at large numbers of buckets
it will be swapping file handles a lot
copious # I was pondering a schema where we have /messages/<message-id> -- this is
one bucket with billions of documents
copious # each of those messages has an author, some "authors" have many thousands
of messages and I would have /authors/<author-id> bucket, with millions of authors in the
authors bucket
seancribbs # right, so you might create an author_1284295629_messages/ bucket that
uses inno, where other things use a different backend and that bucket would just
contain lightweight objects that point to the original messages
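The layout seancribbs describes — full message documents in one big bucket, plus lightweight pointer objects in a small per-author bucket — can be sketched with an in-memory stand-in for the store. This is not a real Riak client API; the bucket and key names are illustrative, borrowing the author_1284295629_messages example from the discussion:

```python
# In-memory sketch of the two-bucket layout: a dict keyed by (bucket, key)
# stands in for the Riak keyspace.
store = {}

def put(bucket, key, value):
    store[(bucket, key)] = value

def save_message(author_id, message_id, body):
    # Full document goes in the single "messages" bucket
    # (the one with billions of entries).
    put("messages", message_id, {"author": author_id, "body": body})
    # A lightweight pointer goes in the per-author bucket; listing this
    # bucket's keys yields all of the author's message ids.
    put("author_%s_messages" % author_id, message_id,
        {"ref": "/messages/%s" % message_id})

save_message("1284295629", "m1", "hello")
save_message("1284295629", "m2", "world")

author_keys = sorted(k for (b, k) in store
                     if b == "author_1284295629_messages")
print(author_keys)  # ['m1', 'm2']
```

The point of the sketch is that the expensive operation (enumerating an author's messages) becomes a key listing over a small bucket, while the big messages bucket is only ever hit by direct key lookups.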
copious # and to tie the messages to the authors, I could use Link:, or
I could have /<author-id>/<message-id> with basically a Link to the message document
benblack # suggest actually using links rather than proliferating buckets like that
copious # well, the issue is keeping the bi-directional links in place. It
would mean updating the /authors/<author-id> document quite a bit, and a not-insignificant
portion of the authors would end up having thousands upon thousands of Links to messages.
seancribbs # benblack: the question then becomes whether you want to pay
the cost of listing keys or loading a huge object
benblack # indeed
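Riak's HTTP interface carries the Link: mechanism copious mentions as RFC-style Link headers with a riaktag parameter. A minimal sketch of building one such header; the specific tag name here is an assumption for illustration, not from the discussion:

```python
def riak_link(bucket, key, tag):
    # Riak's HTTP API expresses a link as an HTTP Link header pointing at
    # another object's /riak/<bucket>/<key> path, tagged with riaktag.
    return '</riak/%s/%s>; riaktag="%s"' % (bucket, key, tag)

# The cheap direction: each message carries one link back to its author.
print(riak_link("authors", "1284295629", "author"))
# </riak/authors/1284295629>; riaktag="author"
```

The tradeoff seancribbs names falls out of this: the message-side header is a single fixed-size value that never changes (one author per message), while the author-side equivalent would be thousands of such headers on one object.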
seancribbs # copious: i think you are fine as long as you have only one author
benblack # copious: how many authors?
seancribbs # (per message)
benblack # perhaps i misunderstood which thing was numerous
copious # 1 author per message
seancribbs # then you should never have to change the link on the message
copious # in relational terms, author has many messages, where "many" can be in
the thousands. Yup, the message -> author link is pretty trivial. It's
that many times we will want to say "I want all messages from a particular author",
so I could either list all the keys in a /author-id bucket, or follow all
the links in a /authors/author-id document. I just figured that maintaining
the /author-id bucket would be easier than maintaining the /authors/author-id document.
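The alternative copious is weighing against — keeping every message link on the single /authors/<author-id> document — can be sketched to show why that object grows with N. All names here are illustrative:

```python
# Sketch of the rejected design: one author document accumulating a link
# per message. Every new message rewrites this whole object, and for
# prolific authors it grows into the thousands of entries.
author_doc = {"links": []}

def add_message_link(doc, message_id):
    doc["links"].append('</riak/messages/%s>; riaktag="message"' % message_id)
    return doc

for i in range(3):
    add_message_link(author_doc, "m%d" % i)

print(len(author_doc["links"]))  # 3 — grows linearly with message count
```

With the per-author-bucket design, by contrast, each new message is an independent small write, and no single object ever has to be read, modified, and rewritten on the hot path.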
seancribbs # this is why knowing the cardinality is important
numerality?
i don't recall the correct word
copious # the scale of N in 1:N relationships ?
seancribbs # yeah
copious # yeah, in this case, N can become quite large
benblack # cardinality
seancribbs # then you definitely want their insertion to be bottlenecked by updating
the author. benblack: i was right the first time then, heh
benblack # indeed
seancribbs # definitely _don't_ want
copious # seancribbs: agreed, that's why I was thinking that /<author-id> buckets would
be the proper way to go in this situation
seancribbs # yes. it's not an ideal solution, but it should work
seancribbs # the message of my webinar tomorrow is "everything has tradeoffs"
copious # then its the manner of making sure the backend is okay with millions
of buckets. seancribbs: I comletely agree :-) or I should say "sucks the least
when having millions of buckets" seancribbs: thanks. benblack: thanks.