
@angrycub
Created June 29, 2015 16:46
Examples of Riak Error Log Messages

eacces

Note: the code is spelled eacces, not eaccess. It is the POSIX "permission denied" error.

An example logged when the ring directory is not owned by the user that Riak runs as:

2015-06-23 14:57:43.073 [error] <0.154.0>@riak_core_ring_manager:do_write_ringfile:236 Unable to write ring to "./data/ring/riak_core_ring.default.20150623185743" - {badmatch,{error,eacces}}
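One way to confirm an ownership problem like this is to walk the directory and list anything owned by a different account. The following is a minimal sketch, not a Riak tool: the helper name `find_foreign_files` is made up for illustration, and it only checks ownership, so an eacces caused by permission bits would not be caught.

```python
import os


def find_foreign_files(root, uid):
    """Yield every path at or below `root` whose owner is not `uid`.

    Hypothetical helper for illustration; uses lstat so that broken
    symlinks do not raise.
    """
    if os.lstat(root).st_uid != uid:
        yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if os.lstat(full).st_uid != uid:
                yield full
```

To check the ring directory from the log line above, look up the uid first, for example `pwd.getpwnam("riak").pw_uid`, and pass it with `"./data/ring"`. The account name `riak` is an assumption; use whichever user your node actually runs as.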

emfile

This is a POSIX error emitted by the operating system when a process attempts to open more files than its open files limit allows. The limit can be checked with ulimit -n when logged in as the user the process runs as.

Some examples (not comprehensive):

2015-06-23 13:53:59.481 [info] <0.1356.0> Lock failed trying deleting stale merge input files from "./data/bitcask/890602560248518965780370444936484965102833893376": {error,emfile}
2015-06-23 13:53:59.482 [error] <0.1369.0> Failed to open lock file ./data/bitcask/1141798154164767904846628775559596109106197299200/bitcask.write.lock: emfile
2015-06-23 13:53:59.485 [error] <0.1433.0> CRASH REPORT Process <0.1433.0> with 0 neighbours exited with reason: no function clause matching riak_kv_vnode:terminate({{badmatch,{error,emfile}},[{riak_kv_vnode,init,1,[{file,"src/riak_kv_vnode.erl"},{line,366}]},{...},...]}, undefined) line 935 in gen_fsm:terminate/7 line 589

You might see this expressed as an {error,emfile} tuple or with emfile as the Reason. Sometimes the error is translated into more readable text, as below:

2015-06-23 13:54:08.730 [error] <0.2411.0> CRASH REPORT Process <0.2411.0> with 0 neighbours exited with reason: no match of right hand value {error,{db_open,"IO error: ./data/anti_entropy/1027618338748291114361965898003636498195577569280/CURRENT: Too many open files"}} in hashtree:new_segment_store/2 line 505 in gen_server:init_it/6 line 328
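The open files limit can also be read from a script. This sketch uses Python's standard `resource` module and reports the limit of the process running the script, which matches `ulimit -n` only when run as the same user in a comparable session. The `open_fd_count` helper is an illustrative addition that relies on `/proc` and therefore works on Linux only.

```python
import os
import resource


def open_file_limits():
    """Return the (soft, hard) open-files limit of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid="self"):
    """Count open file descriptors of a process via /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"soft limit: {soft}, hard limit: {hard}")
```

Comparing `open_fd_count(pid)` for the Riak process against the soft limit gives a rough idea of how close the node is to emfile. Reading another user's `/proc/<pid>/fd` usually needs root.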

erofs

erofs is the POSIX error for an attempted write to a read-only file system. It can show up as a Result or in a badmatch error:

2013-09-30 07:00:02.790 [error] <0.279.0> Supervisor riak_core_sup had child riak_core_ring_manager started with riak_core_ring_manager:start_link() at <0.2142.0> exit with reason no match of right hand value {error,erofs} in riak_core_ring_manager:do_write_ringfile/1 line 127 in context child_terminated
2013-09-30 07:00:01.756 [error] <0.295.0> Supervisor riak_core_vnode_proxy_sup had child {riak_kv_vnode,125597796958124469533129165311555572001681702912} started with riak_core_vnode_proxy:start_link(riak_kv_vnode, 125597796958124469533129165311555572001681702912) at <0.784.0> exit with reason {{{badmatch,{error,{{badmatch,{error,erofs}},[{riak_kv_vnode,init,1,[{file,"src/riak_kv_vnode.erl"},{line,240}]},{riak_core_vnode,init,1,[{file,"src/riak_core_vnode.erl"},{line,132}]},{gen_fsm,init_it,6,[{file,"gen_fsm.erl"},{line,361}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}}},[{riak_core_vnode_manager,get_vnode,3,[{file,"src/riak_core_vnode_manager.erl"},{line,489}]},{riak_core_vnode_manager,handle_call,3,[{file,"src/riak_core_vnode_manager.erl"},{line,226}]},{gen_server,...},...]},...} in context shutdown_error

system_limit

The maximum number of simultaneously open files and sockets depends on the maximum number of Erlang ports available, as well as on operating system-specific settings and limits. The maximum number of simultaneously open Erlang ports is configured at startup. For more information, see the +Q command-line flag in the erl(1) manual page in erts.

This is very likely to show up in the logs as a badmatch error together with a module-level crash.

2015-04-05 14:33:02.966 UTC [error] <0.13268.56> CRASH REPORT Process <0.13268.56> with 1 neighbours exited with reason: no case clause matching {error,system_limit} in bitcask:get_filestate/2 line 1007 in gen_fsm:terminate/7 line 611

Some errors indicate what is causing you to hit the system_limit, like the following:

2015-04-05 09:44:05.910 UTC [error] <0.30949.36> CRASH REPORT Process <0.30949.36> with 0 neighbours exited with reason: {error,system_limit,[{erlang,open_port,[{spawn,"zlib_drv"},[binary]],[]},{zlib,open,0,[]},{zlib,zip,1,[]},{riak_kv_pb_object,process,2,[{file,"src/riak_kv_pb_object.erl"},{line,143}]},{riak_api_pb_server,process_message,4,[{file,"src/riak_api_pb_server.erl"},{line,223}]},{riak_api_pb_server,handle_message,3,[{file,"src/riak_api_pb_server.erl"},{line,200}]},{riak_api_pb_server,decode_buffer,1,[{file,"src/riak_api_pb_server.erl"},{line,172}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{...}]}]} in gen_server:terminate/6 line 747

Notice that the error is encountered during the open_port operation. This indicates that the node is attempting to exceed the maximum number of ports allocated to the Erlang VM.
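When triaging a large log, it can help to tag each line with the error codes covered so far. The sketch below is an assumption-heavy illustration: the line layout in `LINE_RE` is inferred only from the sample lines in these notes, and `classify` is a made-up helper, not part of Riak or lager.

```python
import re

# Layout inferred from the samples in these notes:
# "<date> <time>[ UTC] [<level>] <pid>[@module:function:line] <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})(?: UTC)?"
    r" \[(?P<level>\w+)\]"
    r" (?P<pid><\d+\.\d+\.\d+>)(?:@(?P<location>\S+))?"
    r" (?P<message>.*)$"
)

ERROR_ATOMS = ("eacces", "emfile", "erofs", "system_limit")
ATOM_RE = re.compile(r"\b(" + "|".join(ERROR_ATOMS) + r")\b")


def classify(line):
    """Return timestamp, level, pid and the error codes found in one line.

    Returns None when the line does not match the assumed layout.
    """
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    message = m.group("message")
    found = set(ATOM_RE.findall(message))
    # emfile is sometimes rendered as readable text instead of the atom.
    if "Too many open files" in message:
        found.add("emfile")
    return {
        "timestamp": m.group("timestamp"),
        "level": m.group("level"),
        "pid": m.group("pid"),
        "errors": sorted(found),
    }
```

Feeding the lines of a log file through `classify` and grouping on `errors` gives a quick count per error code. Lines that do not match the assumed layout come back as `None` rather than being guessed at.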


#Bitcask specific log strings

##CRC error

2015-05-12 19:29:51.179 UTC [error] <0.7621.751> fold_loop: CRC error at file /data/riak/bitcask/1341612831143602288194788811282525428199781826560/8183.bitcask.data offset 81745648, skipping 16777973 bytes 
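The file, offset and skipped byte count can be pulled out of a line like this for reporting. This is a minimal sketch whose pattern is based solely on the sample above, so other Bitcask versions may word the message differently. `parse_crc_error` is a hypothetical helper, and treating the parent directory name as the partition is an assumption drawn from the sample paths.

```python
import posixpath
import re

# Pattern derived from the single sample line above.
CRC_RE = re.compile(
    r"CRC error at file (?P<path>\S+) offset (?P<offset>\d+), "
    r"skipping (?P<skipped>\d+) bytes"
)


def parse_crc_error(line):
    """Extract path, partition, offset and skipped bytes, or return None."""
    m = CRC_RE.search(line)
    if m is None:
        return None
    path = m.group("path")
    return {
        "path": path,
        # Assumed layout: .../bitcask/<partition>/<n>.bitcask.data
        "partition": posixpath.basename(posixpath.dirname(path)),
        "offset": int(m.group("offset")),
        "skipped": int(m.group("skipped")),
    }
```

Summing `skipped` per partition shows which data directories are losing the most bytes to CRC errors. Paths containing spaces would not be matched by this pattern.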

##Failed to merge

2015-04-29 01:02:37.343 UTC [error] <0.3431.4865> Failed to merge {["/data/riak/bitcask/742168800207099138150308704113737470919028244480/4629.bitcask.data","/data/riak/bitcask/742168800207099138150308704113737470919028244480/5269.bitcask.data","/data/riak/bitcask/742168800207099138150308704113737470919028244480/5861.bitcask.data","/data/riak/bitcask/742168800207099138150308704113737470919028244480/5863.bitcask.data","/data/riak/bitcask/742168800207099138150308704113737470919028244480/5876.bitcask.data","/data/riak/bitcask/742168800207099138150308704113737470919028244480/6970.bitcask.data"],[]}: {merge_locked,locked,"/data/riak/bitcask/742168800207099138150308704113737470919028244480"}

#LevelDB specific log strings

##Compaction error

2014/06/10-02:42:07.708991 8e Compaction error: IO error: /var/db/riak/leveldb/1313067877289483090573623091893535525472126894080/sst_2/004809.sst: No such file or directory 

A properly-running LevelDB instance should not see any event containing the string "Compaction error" emitted to the LOG file. The remainder of the log message should provide a clue as to why compaction failed.


##waiting

The waiting event can indicate LevelDB stalls. The frequency of occurrence is the most interesting aspect of these events. They can happen when a RAID array is degraded or during a RAID rebuild.
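Since frequency is what matters here, a rough way to measure it is to bucket matching LOG lines by hour. This sketch assumes that waiting lines carry the same timestamp prefix as the Compaction error sample above (`2014/06/10-02:42:07.708991`), and it simply matches on the substring `waiting`. `events_per_hour` is an illustrative name, not a LevelDB or Riak utility.

```python
from collections import Counter
from datetime import datetime

# Timestamp layout taken from the Compaction error sample line.
TS_FORMAT = "%Y/%m/%d-%H:%M:%S.%f"


def events_per_hour(lines, needle="waiting"):
    """Count lines containing `needle`, grouped by the hour they were logged."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        stamp = line.split(" ", 1)[0]
        try:
            when = datetime.strptime(stamp, TS_FORMAT)
        except ValueError:
            # Skip lines without the assumed timestamp prefix.
            continue
        counts[when.strftime("%Y-%m-%d %H:00")] += 1
    return counts
```

Passing an open LOG file handle as `lines` returns a `Counter` keyed by hour. The same function can count other strings, for example `needle="Compaction error"`.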
