@gburd
Created April 9, 2012 14:57
Diagnose and fix a corrupt LevelDB SST file preventing Riak from running.

Summary

LevelDB can become corrupted when the filesystem or the hardware misbehaves. Heavily loaded Riak nodes push I/O to its limits, so such failures are not uncommon. This one shows up as the message Compaction error: Corruption: corrupted compressed block contents in the [data_root]/leveldb/[vnode]/LOG file.

Diagnosis

Steps that pinpoint this issue (the find below is run from the [data_root]/leveldb directory)

# find . -name "LOG" -exec grep -l 'Compaction error' {} \; 
./442446784738847563128068650529343492278651453440/LOG 
./448155775509671402652301794407141472824182439936/LOG 
./145579264656007907867945168883848503911040155648/LOG

2012/03/18-16:45:55.649589 57 Compaction error: Corruption: corrupted compressed block contents 
2012/03/18-16:45:55.649643 4b waiting... 
2012/03/18-16:45:56.105357 57 Skipping expansion on level 0 from 12 to 12 files 
2012/03/18-16:45:56.105418 57 Compacting 12@0 + 5@1 files 
2012/03/18-16:45:56.111994 57 Generated table #200557: 162 keys, 174112 bytes 
2012/03/18-16:45:56.169928 57 Generated table #200558: 224 keys, 2112499 bytes 
2012/03/18-16:45:56.227341 57 Generated table #200559: 239 keys, 2111625 bytes 
2012/03/18-16:45:56.285007 57 Generated table #200560: 230 keys, 2108929 bytes 
2012/03/18-16:45:56.341888 57 Generated table #200561: 223 keys, 2109107 bytes 
2012/03/18-16:45:56.375369 57 Generated table #200562: 116 keys, 1287455 bytes 
2012/03/18-16:45:56.429633 57 compacted to: files[ 12 5 54 200 0 0 0 ] 
2012/03/18-16:45:56.430168 57 Delete type=2 #200557 
2012/03/18-16:45:56.430327 57 Delete type=2 #200559 
2012/03/18-16:45:56.430871 57 Delete type=2 #200562 
2012/03/18-16:45:56.431242 57 Delete type=2 #200563 
2012/03/18-16:45:56.432571 57 Delete type=2 #200558 
2012/03/18-16:45:56.433146 57 Delete type=2 #200560 
2012/03/18-16:45:56.433723 57 Delete type=2 #200561 
2012/03/18-16:45:56.434338 57 Compaction error: Corruption: corrupted compressed block contents

This indicates that these vnodes' LevelDB databases need repair. We can do that, but it is very odd to have more than one corrupt at any given moment, which may point to a larger issue.
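The find output above lists LOG file paths, while the repair step needs the vnode directories. A minimal shell sketch of that conversion follows. The function name corrupt_vnodes is hypothetical, and it assumes the leveldb directory is passed as the first argument.

```shell
#!/bin/sh
# corrupt_vnodes is a hypothetical helper, not part of Riak or eleveldb.
# It prints the ID (directory name) of every vnode whose LOG file
# contains a "Compaction error" line.
corrupt_vnodes() {
    leveldb_dir="$1"    # assumed: [data_root]/leveldb
    # grep -l prints only the names of the LOG files that match.
    find "$leveldb_dir" -name LOG -type f \
        -exec grep -l 'Compaction error' {} + |
    while IFS= read -r log; do
        # The vnode ID is the name of the directory holding the LOG file.
        basename "$(dirname "$log")"
    done | sort
}
```

Calling corrupt_vnodes /var/db/riak/leveldb would print one vnode ID per line, assuming that path is the node's leveldb directory.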

Concerns

Things to consider post-diagnosis and pre-solution

  • Finding one compaction error is interesting; finding more than one may strongly indicate a hardware or OS bug.
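To make that check routine, the number of affected vnodes can be counted before any repair is attempted. This is only a sketch: the function name corruption_spread and its messages are hypothetical, and the more-than-one threshold simply mirrors the concern above.

```shell
#!/bin/sh
# corruption_spread is a hypothetical helper. It counts the vnode LOG files
# that report a compaction error and flags the multi-vnode case.
corruption_spread() {
    leveldb_dir="$1"    # assumed: [data_root]/leveldb
    # Count matching LOG files; tr strips the padding some wc versions add.
    count=$(find "$leveldb_dir" -name LOG -type f \
        -exec grep -l 'Compaction error' {} + | wc -l | tr -d '[:space:]')
    if [ "$count" -gt 1 ]; then
        echo "$count vnodes report compaction errors: possible hardware or OS problem"
    elif [ "$count" -eq 1 ]; then
        echo "1 vnode reports a compaction error"
    else
        echo "no compaction errors found"
    fi
}
```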

Solution: steps to address this issue

  1. We assume that Riak isn't running. To begin, start an Erlang session (do not start Riak; we just want Erlang).
/opt/local/riak/erts-5.8.5/bin/erl
  2. From the Erlang console, run the following command to set the eleveldb application environment.
[application:set_env(eleveldb, Var, Val) || {Var, Val} <-
[{max_open_files, 2000},
{block_size, 1048576},
{cache_size, 20*1024*1024*1024},
{sync, false},
{data_root, "/var/db/riak/leveldb"}]].
  3. For each of the corrupted LevelDB databases (found by # find . -name "LOG" -exec grep -l 'Compaction error' {} \; ), run this command, substituting the proper vnode number.
eleveldb:repair("/var/db/riak/leveldb/442446784738847563128068650529343492278651453440", []).
  4. When all repairs have finished successfully, you may restart the node.
riak start
  5. Check for proper operation by looking at the log files in /var/log/riak and at the LOG files in the affected LevelDB vnodes.
  6. Contact us with any concerns.
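When several vnodes are affected, typing each repair call by hand is error-prone. The sketch below prints one eleveldb:repair call per corrupted vnode, ready to paste into the Erlang session. The function name repair_commands is hypothetical, and the script only prints the commands; it does not run them.

```shell
#!/bin/sh
# repair_commands is a hypothetical helper. For every vnode whose LOG file
# shows a compaction error, it prints the matching eleveldb:repair/2 call.
repair_commands() {
    leveldb_dir="$1"    # assumed: [data_root]/leveldb, e.g. /var/db/riak/leveldb
    find "$leveldb_dir" -name LOG -type f \
        -exec grep -l 'Compaction error' {} + |
    sort |
    while IFS= read -r log; do
        # The repair call takes the vnode directory, not the LOG file itself.
        printf 'eleveldb:repair("%s", []).\n' "$(dirname "$log")"
    done
}
```

Running repair_commands /var/db/riak/leveldb while Riak is stopped would list the calls to paste into the Erlang console, assuming that path matches the node's data_root.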
