@PharkMillups
Created August 4, 2010 20:00
12:57 <siculars> hey gang. so i was looking over the bitcask-intro.pdf
file again,
http://downloads.basho.com/papers/bitcask-intro.pdf.
if you look at the bitcask file layout on pgs 2-3 and
read the following on pg 3: "After the append completes,
an in-memory structure called a 'keydir' is updated." you
get an idea of what's going on under the hood. My question is,
in light of the issue of scanning all keys in a cluster for
a map/reduce, why don't you just store the bucket info in the row header
along with the key? that way your in-memory 'keydir' could be filtered by
bucket or constructed differently.
12:59 <seancribbs> siculars: it is stored there, just not segregated from the key.
12:59 <seancribbs> you're essentially asking for a hash of hashes
13:00 <siculars> right, or some other mechanism. so the key column is basically
bucket/key munged together.
13:02 <seancribbs> yes
13:02 <siculars> if it's only there for uniqueness, it's not really that helpful,
is it?
13:03 <seancribbs> actually it's more for lookups. i.e. "where is this piece of data
in bitcask's file structure"
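[The keydir described above can be sketched in Python. This is an illustrative model, not Bitcask's actual Erlang code: the munging scheme and the entry fields (file_id, value_size, value_pos, timestamp, as listed in bitcask-intro.pdf) are assumptions for the sketch.]

```python
# Illustrative sketch of the keydir: one flat hash from the munged
# bucket/key binary to the record's location in Bitcask's file structure.
keydir = {}  # munged key -> (file_id, value_size, value_pos, timestamp)

def munge(bucket: bytes, key: bytes) -> bytes:
    # Riak's KV backend hands Bitcask one opaque binary; the exact
    # encoding here is an assumption for illustration only.
    return bucket + b"/" + key

def keydir_put(bucket, key, file_id, value_size, value_pos, timestamp):
    keydir[munge(bucket, key)] = (file_id, value_size, value_pos, timestamp)

def keydir_get(bucket, key):
    # A single hash lookup answers "where is this piece of data on disk?"
    return keydir.get(munge(bucket, key))
```

[Because the bucket is folded into one opaque binary, there is no cheap way to enumerate the keys of a single bucket without scanning every entry, which is the problem siculars raises.]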
13:30 <justinsheehy> siculars: the only thing missing from that analysis is
that bitcask doesn't know anything at all about buckets. it's just a
binary-key/binary-value store.
13:31 <justinsheehy> hence the bucket awareness being in the bitcask
'kv' backend, which is basically the riak-to-bitcask bridge
13:33 <siculars> but you could also bin-hash the bucket and store
it in a separate field, no? just add two fields in bitcask for
bucket and bucket size? something like bsz | ksz | vsz | b | k | v
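[siculars' proposed record layout can be sketched with Python's struct module. The field widths (big-endian 32-bit size prefixes) are an assumption; the chat doesn't specify them, and the real Bitcask header also carries a CRC and timestamp that are omitted here for brevity.]

```python
import struct

# Sketch of the proposed on-disk record: bsz | ksz | vsz | b | k | v,
# i.e. a separate bucket field instead of one munged bucket/key binary.
HEADER = ">III"  # bsz, ksz, vsz as big-endian uint32 (assumed widths)

def pack_record(bucket: bytes, key: bytes, value: bytes) -> bytes:
    header = struct.pack(HEADER, len(bucket), len(key), len(value))
    return header + bucket + key + value

def unpack_record(data: bytes):
    bsz, ksz, vsz = struct.unpack(HEADER, data[:12])
    b = data[12:12 + bsz]
    k = data[12 + bsz:12 + bsz + ksz]
    v = data[12 + bsz + ksz:12 + bsz + ksz + vsz]
    return b, k, v
```

[With the bucket stored as its own field, the keydir could be filtered or partitioned by bucket without parsing keys, which is the payoff siculars is after.]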
13:33 <siculars> i dunno, just trying to think of ways out of
the scan-all-keys-in-the-cluster problem...
13:35 <justinsheehy> I am also thinking about that problem. :-)
13:35 <drev1> siculars: seems like that option would add Riak-dependent
functionality to Bitcask
13:36 <justinsheehy> and yes, changing bitcask to be less general-purpose
would be one path if we wanted to go that way
13:36 <justinsheehy> like drev1 said: right now, bitcask knows nothing at all
about Riak. it's just a local k/v store.
13:37 <siculars> true true... the bitcask/bucket thing is gonna be a headache, fd-wise.
13:38 <justinsheehy> yep
13:39 <siculars> if you are currently using hash(bucket/key) to create
your keys, isn't there some way to branch your mem index by the bucket?
13:39 <justinsheehy> that way is pretty easy to do (and thus easy to commit
to having soon if there's nothing else) but has its own kind of pain. hence
still looking to see if there's a better way.
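[The "branch your mem index by the bucket" idea, what seancribbs earlier called a hash of hashes, can be sketched as follows. Purely illustrative Python, not Bitcask's implementation; the location value stands in for the (file_id, value_size, value_pos, timestamp) tuple.]

```python
from collections import defaultdict

# Illustrative hash-of-hashes keydir: a first-level hash on bucket,
# a second-level hash on key. Per-bucket key listing becomes cheap,
# at the cost of a second hash level per lookup.
keydir_by_bucket = defaultdict(dict)  # bucket -> {key -> location}

def put(bucket: bytes, key: bytes, location):
    keydir_by_bucket[bucket][key] = location

def list_keys(bucket: bytes):
    # Enumerates one bucket without scanning every key in the store.
    return list(keydir_by_bucket[bucket].keys())
```

[This is the "pretty easy to do" path justinsheehy mentions; the "pain" is that it bakes a Riak concept (buckets) into an otherwise general-purpose binary-key/binary-value store.]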
13:40 <siculars> data architectures of one bucket per user/date frequency,
etc. are gonna get burned.
13:42 <justinsheehy> if they have to do it that way, yes. but I am not yet
resigned to that. we'll see.
13:47 <justinsheehy> am hoping to not have to make bitcask too riak-aware,
which is where the tension comes from. can definitely do it that way if we need
to, but will be more work (as right now bitcask is
entirely unaware of the contents of the key, etc.) and would also require
either making an ongoing fork or else making bitcask less general-purpose useful.
13:48 <justinsheehy> I am confident that we'll find a good answer, especially
since we already know some "okay" answers. it'll just take a bit of work.
13:49 <drev1> another possibility is adding an arbitrary flag field
(like memcache's) to Bitcask, which could be used by Riak for bucket-aware
keys, but that would increase the size of the in-memory keydir
13:52 <justinsheehy> yeah... a separate problem we're also looking to solve
is reducing/removing some of the RAM constraints imposed by bitcask. heh.
13:53 <justinsheehy> but it's an interesting idea. hm.
14:00 <siculars> true. redis has been doing a bunch of work to decrease
their mem footprint...
http://blog.zawodny.com/2010/07/25/1250000000-keyvalue-pairs-in-redis-2-0-0-rc3-on-a-32gb-machine/