Riak Recap 6/13
johne # I have some questions about limits and was hoping someone could help
me out. If I understand correctly, riak does not impose a limit on the number
of buckets, but the chosen backend may impose some practical limits. Is this correct?
 
arg # pretty much, yes
 
johne # There appears to be a similar situation with the number of keys.
 
arg # innostore currently creates a file per bucket/partition combo,
but all other backends use one file per partition. unless you really want
innostore, we recommend you use bitcask. one other thing with buckets: buckets
don't consume any resources as long as they use the bucket defaults - either
the stock riak defaults or ones you set in your app.config. buckets that
change some of those defaults take up a small amount of space in the ring
data structure that's gossiped around
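
For readers following along: those defaults live in app.config. A minimal illustrative fragment of overriding one - riak_core and default_bucket_props are the standard names, but treat the values as examples, not recommendations:

    %% app.config fragment (illustrative values) - buckets that inherit these
    %% defaults cost nothing; buckets that override them take a small amount
    %% of space in the gossiped ring
    {riak_core, [
        {default_bucket_props, [{n_val, 3}]}
    ]}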
 
johne # ok, good to know
 
arg # number of keys, not so much
 
johne # does bitcask use only a single active data file for all buckets?
 
arg # bitcask does keep a small amount of metadata for each key in RAM
it uses a single data file for each partition
many bucket/key pairs can reside in a single partition
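
A quick aside on "partition": riak divides a consistent-hash ring into a fixed number of partitions and hashes every bucket/key pair onto it. A toy sketch of the mapping, assuming SHA-1's 160-bit hash space and the default ring size of 64 - riak's real key encoding differs, so this is illustrative only:

    import hashlib

    RING_SIZE = 64     # riak's default ring_creation_size
    HASH_BITS = 160    # bucket/key pairs are hashed with SHA-1

    def partition(bucket, key):
        # toy version: hash the bucket/key pair onto the ring, then
        # see which fixed-size slice of the hash space it lands in
        h = int(hashlib.sha1((bucket + "/" + key).encode()).hexdigest(), 16)
        return h // (2**HASH_BITS // RING_SIZE)

    print(partition("users", "johne"))  # a partition index in 0..63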
 
johne # It looks like bitcask limits the number of keys by the size of the
keydir, which must be memory resident? That would be key hash + key metadata?
 
arg # yeah it keeps in memory just key hash + file id + offset
we have some ideas on how to compress that further
 
johne # How big would that be? What is the size of each field there?
 
arg # i think it's around 32 bytes per key: 20 bytes hash + 4 bytes fileid + 8 bytes offset
 
johne # ok, thanks that is very helpful
 
arg # could be 4 bytes offset, not sure
 
johne # Any additional overhead for keydir?
 
arg # some small fixed overhead for the hash data structure
but nothing to worry about if you're doing back-of-the-envelope calculations
all that stuff is in relatively optimized C code
 
johne # I am hoping to go beyond back-of-the-envelope....
 
arg # IOW the overhead of the container for that metadata won't push you over the edge if you do calculations based strictly on the per-key overhead
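
That arithmetic, written out using the figures arg quotes above:

    # per-key keydir footprint, from the figures quoted above
    KEY_HASH = 20   # bytes
    FILE_ID  = 4    # bytes
    OFFSET   = 8    # bytes (arg notes it could be 4)

    per_key = KEY_HASH + FILE_ID + OFFSET
    print(per_key)  # 32 bytes per key (28 if the offset is 4 bytes)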
 
johne # I am looking at migrating an app that is currently using an RDBMS. Current limits because of the application are around 64 billion stored objects.
Currently investigating another solution because we are starting to get worried about that limit
 
roidrage # arg: may i propose putting this discussion in the next recap? very valuable.
 
arg # roidrage: absolutely
 
arg # johne: if you spread that over several nodes you can probably fit the bitcask metadata in ram
or you could use innostore
but we're also actively working on a compressed in-memory keydir format that will be much smaller than 32 bytes/key
 
arg # using burst tries: http://portal.acm.org/citation.cfm?id=506312

benblack # http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3499
 
johne # When you say spread that over several nodes, are you meaning split the keys?
 
benblack # for those without acm logins
 
arg # riak will do that for you
 
benblack # you should clarify what "that" is
 
arg # the more nodes you have the less per-node bitcask overhead there is
benblack: replica placement
 
benblack # i know
 
johne # got it... I misunderstood.
 
benblack # see :P

johne # So, if I understand correctly, riak could handle a larger number of
keys than what a single node's backend supports. Is that correct?
 
arg # yeah, by spreading the keys over a number of hosts
 
johne # yes, very good. Any known limits there?
 
arg # as far as number of hosts?

johne # No, max keys - is it just a factor of the number of nodes?

arg # for bitcask yes

johne # great

arg # if you use innostore there's no per-key overhead but it's a bit slower than bitcask
 
johne # Aside from the performance aspect, is the only other limit of
innostore the open file handles?
 
arg # it caches file handles in an LRU cache, and you can configure the max open handles
innostore can also take much longer to recover from a crash than bitcask
 
johne # If I take the rough numbers you have provided me here, could I reliably predict the number of nodes needed for the desired number of keys?
 
arg # i think so, yes. also remember that each item is stored on 3 physical nodes (by default), so take that into account
 
johne # yes, got that. And also that I should stick to bucket defaults, which I think works for me.
 
arg # it seems to me that if you wanted to deploy this right now, innostore is probably what you want if you have 64 billion objects
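
Putting the thread's figures together, a rough sizing sketch for the bitcask route. Only the 32 bytes/key and the default replication factor of 3 come from the discussion; the per-node RAM budget is a made-up assumption for illustration:

    # rough bitcask keydir sizing for the numbers discussed above
    objects    = 64e9   # 64 billion stored objects
    n_val      = 3      # each item stored on 3 physical nodes by default
    per_key    = 32     # bytes of keydir metadata per key (see above)
    ram_budget = 32e9   # hypothetical RAM per node available for the keydir

    total_ram = objects * n_val * per_key   # keydir RAM across the whole cluster
    nodes     = total_ram / ram_budget      # nodes needed at that RAM budget
    print("%.1f TB of keydir total; ~%d nodes" % (total_ram / 1e12, nodes))
    # -> 6.1 TB of keydir total; ~192 nodes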
 
johne # Starting to reach those limits... What happens when a particular
node is "full"? Can additional nodes be added and the re-balancing 'fixes' it, so to speak?
 
arg # yep, you can always add new nodes and they will take over a
fair share of the keyspace
 
johne # ok
 
arg # gotta run out for ~10mins but i'll be back
 
johne # Thanks a lot for the help. I think I have enough to work with for the moment. I am trying to
ensure that what we are attempting to do is possible before attempting to structure the data.
 
arg # always feel free to email riak@basho.com or the riak-users list if you have more questions
 
johne # Thanks again
 
arg # any time!
