@PharkMillups
Created October 8, 2010 19:32
08:03 <allen> when I want to drop all data in a bucket, if I remove the bucket and create it
again, will the data be deleted too?
08:03 <seancribbs> there's no "delete" for buckets
08:03 benchMark joined
08:05 <allen> hi seancribbs, i miscalculated the data, which caused disk-full on all three nodes
08:05 <allen> I need to re-insert the data after deleting all
08:06 <allen> I can remove data directory, and do it again, is it safe?
08:07 <seancribbs> remove just the bitcask directory, not the ring
08:08 <allen> k, thanks
08:08 <allen> I wish 'remove bucket' could do it.
08:09 <allen> or, 'delete bucket'
08:13 <justinsheehy> I agree that it'd be mighty convenient, but.
08:13 <justinsheehy> here's the trick, allen: how exactly should 'delete bucket' work if
one node is briefly out of communication while you're doing that sweeping delete, and you
issue new inserts to that bucket right after the delete?
08:14 <seancribbs> also, it'll be great until someone does "delete bucket" on production data
08:15 <allen> justinsheehy: you can
08:15 <allen> 'disable bucket' before deleting it
08:15 <justinsheehy> what happens if a node is unreachable during the disable?
08:15 <allen> or 'delete bucket' will 'disable' it first, then do it
08:16 <allen> then, 'disable' will return error, so that we cannot delete it
08:16 <justinsheehy> we don't currently support global transactions across a Riak cluster,
so that's hard.
08:17 <allen> y justinsheehy, it would be hard, it was just my wish
08:17 <justinsheehy> also, a bucket is just a namespace -- not a really separate table-like thing.
08:17 <allen> I can delete the data files by myself.
08:17 <justinsheehy> so it's unclear what it really means to disable it. the storage for a
bucket is not separate from
the rest of the storage.
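
The "bucket is just a namespace" point can be pictured with a toy model. This is only a sketch under stated assumptions, not Riak's storage engine: `store`, `put`, and `keys_in` are hypothetical names, and the single dictionary stands in for storage that every bucket shares.

```python
# Toy model (an assumption for illustration, not Riak's storage engine):
# one flat store shared by every bucket, keyed by (bucket, key).
store = {}

def put(bucket, key, value):
    """Store a value under the combined (bucket, key) name."""
    store[(bucket, key)] = value

def keys_in(bucket):
    """List the keys of one bucket.

    There is no per-bucket container in this model, so finding a
    bucket's keys means scanning every entry in the shared store.
    """
    return [k for (b, k) in store if b == bucket]

put("users", "alice", 1)
put("users", "bob", 2)
put("logs", "alice", "x")
```

In this model there is nothing bucket-shaped to drop or disable: "users" exists only as a prefix on entries that sit next to the "logs" entries.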
08:18 <justinsheehy> I'm not saying you shouldn't wish for it, just trying to share some
understanding of why it's not as simple an operation as it might seem.
08:19 <allen> I meant 'disable' as blocking read/write operations, not as a separate space.
08:19 <justinsheehy> right, but then we need global transactions and also we need every single
request to look up somewhere to decide if that request is for an item in a disabled namespace.
08:19 <allen> y, it will be costly
08:20 <allen> I don't like anything that sacrifices performance
08:20 <justinsheehy> (only actually _need_ transactions for that if you want certain guarantees
along with it, but you certainly would need to take action insisting on all nodes being up and aware
for the duration)
08:21 <justinsheehy> which is a constraint not imposed by any other current user-facing behavior in riak
08:22 <allen> hmm, not easy, eh?
08:29 <justinsheehy> much easier would be a "best effort" bucket delete, that trawls the whole
cluster and deletes everything reachable within a given bucket.
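
A "best effort" bucket delete of the kind described above might look roughly like the following. It is a minimal in-memory sketch, not a Riak API: the `Node` class, its `reachable` flag, and `best_effort_delete_bucket` are all hypothetical names invented for illustration.

```python
class Node:
    """Hypothetical stand-in for one cluster node holding replicas."""
    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.data = {}  # (bucket, key) -> value

def best_effort_delete_bucket(nodes, bucket):
    """Delete every copy of `bucket`'s keys on the nodes we can reach.

    Returns (number of copies deleted, names of nodes that were skipped).
    Unreachable nodes keep their copies; nothing here is transactional.
    """
    deleted, skipped = 0, []
    for node in nodes:
        if not node.reachable:
            skipped.append(node.name)
            continue
        # Collect first, then delete, so the dict is not mutated mid-scan.
        doomed = [bk for bk in node.data if bk[0] == bucket]
        for bk in doomed:
            del node.data[bk]
        deleted += len(doomed)
    return deleted, skipped

a, b, c = Node("a"), Node("b"), Node("c", reachable=False)
for n in (a, b, c):
    n.data[("tmp", "k1")] = "v"
    n.data[("keep", "k1")] = "v"

result = best_effort_delete_bucket([a, b, c], "tmp")
```

Here node `c` is skipped and still holds its copy of `("tmp", "k1")`, which is the gap that separates "best effort" from a guaranteed delete.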
08:32 <allen> not a big deal, 'delete bucket' won't happen often (but others may wish to have, as
well as renaming bucket)
08:33 <justinsheehy> renaming would be MUCH harder, since buckets are not at all like separate
tables. the storage is not independent.
08:34 <justinsheehy> it would literally have to be "for all keys K in bucket B1,
put(B2,K) and delete(B1,K)" and since you can't atomically tie the put and delete to different
keys together... messy.
08:35 <justinsheehy> what should happen if some nodes go down when you're half done?
08:36 <allen> y that would be messy
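
The rename loop quoted above ("for all keys K in bucket B1, put(B2,K) and delete(B1,K)") can be sketched to show the half-done state. This is an illustration under assumptions, not Riak code: the shared dictionary, `rename_bucket`, `NodeDown`, and the `fail_after` parameter are hypothetical, and the failure is simulated.

```python
class NodeDown(Exception):
    """Hypothetical error standing in for nodes going down mid-rename."""

def rename_bucket(store, old, new, fail_after=None):
    """Move every key from bucket `old` to bucket `new`, one key at a time.

    `fail_after` simulates an outage after that many keys have moved.
    The put and the delete are separate steps, so nothing ties them
    together atomically.
    """
    moved = 0
    # Snapshot the key list so the store can be modified while looping.
    for key in sorted(k for (b, k) in store if b == old):
        if fail_after is not None and moved == fail_after:
            raise NodeDown("simulated outage")
        store[(new, key)] = store[(old, key)]  # put(B2, K)
        del store[(old, key)]                  # delete(B1, K)
        moved += 1
    return moved

store = {("b1", k): k.upper() for k in ("a", "b", "c", "d")}
try:
    rename_bucket(store, "b1", "b2", fail_after=2)
except NodeDown:
    pass  # the rename stops here, leaving the data split across both names
```

After the simulated outage, keys `a` and `b` live under `b2` while `c` and `d` remain under `b1`, so readers of either bucket name see only part of the data.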