Riak Recap 6/13
johne # I have some questions about limits and was hoping someone could help
me out. If I understand correctly, riak does not impose a limit on the number
of buckets, but the chosen backend may impose some practical limits. Is this correct?
 
arg # pretty much, yes
 
johne # There appears to be a similar situation with the number of keys.
 
arg # innostore currently creates a file per bucket/partition combo,
but all other backends use one file per partition. unless you really want
innostore, we recommend you use bitcask. one other thing with buckets: buckets
don't consume any resources as long as they use the bucket defaults - either
the stock riak defaults or ones you set in your app.config. buckets that
change some of those defaults take up a small amount of space in the ring
data structure that's gossiped around
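
For readers following along: those defaults live in app.config. A minimal illustrative fragment of overriding one - riak_core and default_bucket_props are the standard names, but treat the values as examples, not recommendations:

    %% app.config fragment (illustrative values) - buckets that inherit these
    %% defaults cost nothing; buckets that override them take a small amount
    %% of space in the gossiped ring
    {riak_core, [
        {default_bucket_props, [{n_val, 3}]}
    ]}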
 
johne # ok, good to know
 
arg # number of keys, not so much
 
johne # does bitcask use only a single active data file for all buckets?
 
arg # bitcask does keep a small amount of metadata for each key in RAM
it uses a single data file for each partition
many bucket/key pairs can reside in a single partition
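
A quick aside on "partition": riak divides a consistent-hash ring into a fixed number of partitions and hashes every bucket/key pair onto it. A toy sketch of the mapping, assuming SHA-1's 160-bit hash space and the default ring size of 64 - riak's real key encoding differs, so this is illustrative only:

    import hashlib

    RING_SIZE = 64     # riak's default ring_creation_size
    HASH_BITS = 160    # bucket/key pairs are hashed with SHA-1

    def partition(bucket, key):
        # toy version: hash the bucket/key pair onto the ring, then
        # see which fixed-size slice of the hash space it lands in
        h = int(hashlib.sha1((bucket + "/" + key).encode()).hexdigest(), 16)
        return h // (2**HASH_BITS // RING_SIZE)

    print(partition("users", "johne"))  # a partition index in 0..63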
 
johne # It looks like bitcask limits the number of keys by the size of the
keydir, which must be memory resident? That would be key hash + key metadata?
 
arg # yeah it keeps in memory just key hash + file id + offset
we have some ideas on how to compress that further
 
johne # How big would that be? What is the size of each field there?
 
arg # i think it's around 32 bytes per key: 20 bytes hash + 4 bytes fileid + 8 bytes offset
 
johne # ok, thanks that is very helpful
 
arg # could be 4 bytes offset, not sure
 
johne # Any additional overhead for keydir?
 
arg # some small fixed overhead for the hash data structure
but nothing to worry about if you're doing back-of-the-envelope calculations
all that stuff is in relatively optimized C code
 
johne # I am hoping to go beyond back-of-the-envelope....
 
arg # IOW the overhead of the container for that metadata won't push you over the edge if you do calculations based strictly on the per-key overhead
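
That arithmetic, written out using the figures arg quotes above:

    # per-key keydir footprint, from the figures quoted above
    KEY_HASH = 20   # bytes
    FILE_ID  = 4    # bytes
    OFFSET   = 8    # bytes (arg notes it could be 4)

    per_key = KEY_HASH + FILE_ID + OFFSET
    print(per_key)  # 32 bytes per key (28 if the offset is 4 bytes)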
 
johne # I am looking at migrating an app that is currently using an RDBMS. Current limits because of the application are around 64 billion stored objects.
Currently investigating another solution because we are starting to get worried about that limit
 
roidrage # arg: may i propose putting this discussion in the next recap? very valuable.
 
arg # roidrage: absolutely
 
arg # johne: if you spread that over several nodes you can probably fit the bitcask metadata in ram
or you could use innostore
but we're also actively working on a compressed in-memory keydir format that will be much smaller than 32 bytes/key
 
arg # using burst tries: http://portal.acm.org/citation.cfm?id=506312

benblack # http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3499
 
johne # When you say spread that over several nodes, are you meaning split the keys?
 
benblack # for those without acm logins
 
arg # riak will do that for you
 
benblack # you should clarify what "that" is
 
arg # the more nodes you have the less per-node bitcask overhead there is
benblack: replica placement
 
benblack # i know
 
johne # got it... I misunderstood.
 
benblack # see :P

johne # So, if I understand correctly, riak could handle a larger number of
keys than what a single node's backend supports. Is that correct?
 
arg # yeah, by spreading the keys over a number of hosts
 
johne # yes, very good. Any known limits there?
 
arg # as far as number of hosts?

johne # No, max keys - is it just a factor of the number of nodes?

arg # for bitcask yes

johne # great

arg # if you use innostore there's no per-key overhead but it's a bit slower than bitcask
 
johne # Aside from the performance aspect, is the only other limit of
innostore the open file handles?
 
arg # it caches file handles in an LRU cache, and you can configure the max open handles
innostore can also take much longer to recover from a crash than bitcask
 
johne # If I take the rough numbers you have provided me here, could I reliably predict the number of nodes needed for the desired number of keys?
 
arg # i think so, yes. also remember that each item is stored on 3 physical nodes (by default), so take that into account
 
johne # yes, got that. And also that I should stick to bucket defaults, which I think works for me.
 
arg # it seems to me that if you wanted to deploy this right now, innostore is probably what you want if you have 64 billion objects
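
Putting the thread's figures together, a rough sizing sketch for the bitcask route. Only the 32 bytes/key and the default replication factor of 3 come from the discussion; the per-node RAM budget is a made-up assumption for illustration:

    # rough bitcask keydir sizing for the numbers discussed above
    objects    = 64e9   # 64 billion stored objects
    n_val      = 3      # each item stored on 3 physical nodes by default
    per_key    = 32     # bytes of keydir metadata per key (see above)
    ram_budget = 32e9   # hypothetical RAM per node available for the keydir

    total_ram = objects * n_val * per_key   # keydir RAM across the whole cluster
    nodes     = total_ram / ram_budget      # nodes needed at that RAM budget
    print("%.1f TB of keydir total; ~%d nodes" % (total_ram / 1e12, nodes))
    # -> 6.1 TB of keydir total; ~192 nodes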
 
johne # Starting to reach those limits... What happens when a particular
node is "full"? Can additional nodes be added and the re-balancing 'fixes' it, so to speak?
 
arg # yep, you can always add new nodes and they will take over a
fair share of the keyspace
 
johne # ok
 
arg # gotta run out for ~10mins but i'll be back
 
johne # Thanks a lot for the help. I think I have enough to work with for the moment. I am trying to
ensure that what we are attempting to do is possible before attempting to structure the data.
 
arg # always feel free to email riak@basho.com or the riak-users list if you have more questions
 
johne # Thanks again
 
arg # any time!
