Created
August 4, 2010 20:00
| 12:57 <siculars> hey gang. so i was looking over the bitcask-intro.pdf | |
| file again, | |
| http://downloads.basho.com/papers/bitcask-intro.pdf. | |
| if you look at the bitcask file layout on pgs 2-3 and | |
| read the following on pg 3 "After the append completes, | |
| an in-memory structure called a 'keydir' is updated." you | |
| get an idea for what's going on under the hood. My question is | |
| that in light of the issue of scanning all keys in a cluster for | |
| a map/reduce , why don't you just store the bucket info in the row header | |
| along with the key? that way your in-memory 'keydir' could be filtered by | |
| bucket or constructed differently. | |
| 12:59 <seancribbs> siculars: it is stored there, just not segregated from the key. | |
| 12:59 <seancribbs> you're essentially asking for a hash of hashes | |
| 13:00 <siculars> right , or some other mechanism. so the key column is basically | |
| bucket/key munged together. | |
| 13:02 <seancribbs> yes | |
| 13:02 <siculars> if it's only there for uniqueness, it's not really that helpful. | |
| is it? | |
| 13:03 <seancribbs> actually it's more for lookups. i.e. "where is this piece of data | |
| in bitcask's file structure" | |
| 13:30 <justinsheehy> siculars: the only thing missing from that analysis is | |
| that bitcask doesn't know anything at all about buckets. it's just a | |
| binary-key/binary-value store. | |
| 13:31 <justinsheehy> hence the bucket awareness being in the bitcask | |
| kv backend, which is basically the riak-to-bitcask bridge | |
| 13:33 <siculars> but you could also bin hash the bucket and store | |
| it in a separate field , no ? just add two fields in bitcask for | |
| bucket and bucket size? something like bsz | ksz | vsz | b | k | v | |
| 13:33 <siculars> i dunno, just trying to think of ways out of | |
| the scan all keys in the cluster problem... | |
| 13:35 <justinsheehy> I am also thinking about that problem. :-) | |
| 13:35 <drev1> siculars: seems like that option would add Riak dependent | |
| functionality to Bitcask | |
| 13:36 <justinsheehy> and yes, changing bitcask to be less general-purpose | |
| would be one path if we wanted to go that way | |
| 13:36 <justinsheehy> like drev1 said: right now, bitcask knows nothing at all | |
| about Riak. it's just a local k/v store. | |
| 13:37 <siculars> true true... the bitcask/bucket thing is gonna be a headache fd wise. | |
| 13:38 <justinsheehy> yep | |
| 13:39 <siculars> if you are currently using the hash(bucket/key) to create | |
| your keys isn't there some way to branch your mem index by the bucket? | |
| 13:39 <justinsheehy> that way is pretty easy to do (and thus easy to commit | |
| to having soon if there's nothing else) but has its own kind of pain. hence | |
| still looking to see if there's a better way. | |
| 13:40 <siculars> data architectures of one bucket per user/date frequency, | |
| etc. are gonna get burned. | |
| 13:42 <justinsheehy> if they have to do it that way, yes. but I am not yet | |
| resigned to that. we'll see. | |
| 13:47 <justinsheehy> am hoping to not have to make bitcask too riak-aware, | |
| which is where the tension comes from. can definitely do it that way if we need | |
| to, but will be more work (as right now bitcask is | |
| entirely unaware of the contents of the key, etc.) and would also require | |
| either making an ongoing fork or else making bitcask less general-purpose useful. | |
| 13:48 <justinsheehy> I am confident that we'll find a good answer, especially | |
| since we already know some "okay" answers. it'll just take a bit of work. | |
| 13:49 <drev1> another possibility is adding an arbitrary flag field like | |
| memcache to Bitcask which could be used by Riak for bucket aware keys but that would | |
| increase the size of the in memory key dir | |
| 13:52 <justinsheehy> yeah... a separate problem we're also looking to solve | |
| is reducing/removing some of the RAM constraints imposed by bitcask. heh. | |
| 13:53 <justinsheehy> but it's an interesting idea. hm, | |
| 14:00 <siculars> true. redis has been doing a bunch of work to decrease | |
| their mem footprint... | |
| http://blog.zawodny.com/2010/07/25/1250000000-keyvalue-pairs-in-redis-2-0-0-rc3-on-a-32gb-machine/ |
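The two ideas in the log, the `bsz | ksz | vsz | b | k | v` entry layout and a keydir branched by bucket (the "hash of hashes"), can be sketched roughly as below. This is a minimal illustration in Python, not Bitcask's actual Erlang code or on-disk format: the field widths, the `munge`/`unmunge` encoding, and all function names are assumptions made for the example, and the real format's other header fields are left out.

```python
import struct
from collections import defaultdict

# Assumed field widths: bsz (2 bytes) | ksz (2 bytes) | vsz (4 bytes), big-endian.
HEADER = ">HHI"


def pack_record(bucket, key, value):
    """Hypothetical entry layout from the chat: bsz | ksz | vsz | b | k | v."""
    header = struct.pack(HEADER, len(bucket), len(key), len(value))
    return header + bucket + key + value


def unpack_record(buf):
    """Split one packed entry back into (bucket, key, value)."""
    bsz, ksz, vsz = struct.unpack_from(HEADER, buf)
    pos = struct.calcsize(HEADER)
    bucket = buf[pos:pos + bsz]
    key = buf[pos + bsz:pos + bsz + ksz]
    value = buf[pos + bsz + ksz:pos + bsz + ksz + vsz]
    return bucket, key, value


def munge(bucket, key):
    """Opaque composite key, standing in for 'bucket/key munged together'."""
    return struct.pack(">H", len(bucket)) + bucket + key


def unmunge(composite):
    """Recover (bucket, key) from a composite key built by munge()."""
    (bsz,) = struct.unpack_from(">H", composite)
    return composite[2:2 + bsz], composite[2 + bsz:]


def list_keys_flat(keydir, bucket):
    # Flat keydir: every entry has to be visited and decoded (the full scan).
    return sorted(k for b, k in map(unmunge, keydir) if b == bucket)


def list_keys_nested(keydir, bucket):
    # Hash of hashes: one lookup finds the bucket, then only its keys are read.
    return sorted(keydir.get(bucket, {}))


# Build both index shapes; the integer stands in for a file offset.
flat = {}
nested = defaultdict(dict)
pairs = [(b"users", b"alice"), (b"logs", b"2010-08-04"), (b"users", b"bob")]
for offset, (b, k) in enumerate(pairs):
    flat[munge(b, k)] = offset
    nested[b][k] = offset

print(list_keys_flat(flat, b"users"))
print(list_keys_nested(nested, b"users"))
```

Both calls return the same keys; the difference is that the flat version touches every entry in the index, while the nested one touches only the requested bucket. The trade-off raised in the log still applies: the nested form only works if the store is made aware of buckets.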