Note: It is not
eacces
An example logged when the ring directory is not owned by the user Riak is running as:
2015-06-23 14:57:43.073 [error] <0.154.0>@riak_core_ring_manager:do_write_ringfile:236 Unable to write ring to "./data/ring/riak_core_ring.default.20150623185743" - {badmatch,{error,eacces}}
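One way to confirm this cause is to compare the owner of the ring directory with the user the node runs as. The following is a minimal sketch, not part of Riak; the function name `check_ring_dir` is hypothetical, and `./data/ring` is simply the directory from the log line above.

```python
import os

def check_ring_dir(path, uid=None):
    """Report ownership and writability of `path`.

    `uid` defaults to the effective uid of the calling process, so run
    this as the same user Riak runs as.
    """
    if uid is None:
        uid = os.geteuid()
    st = os.stat(path)
    return {
        "owner_uid": st.st_uid,
        "owned_by_user": st.st_uid == uid,
        # os.access checks permissions using the real uid/gid of the process.
        "writable": os.access(path, os.W_OK),
    }

if __name__ == "__main__":
    # Directory taken from the log line above; adjust for your installation.
    ring_dir = "./data/ring"
    if os.path.isdir(ring_dir):
        print(check_ring_dir(ring_dir))
```

If `owned_by_user` or `writable` is false, the node will likely be unable to write its ring file.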
emfile is a POSIX error emitted by the operating system when a user attempts to open more files than their open files limit allows. The limit is discoverable with ulimit -n when logged in as that user.
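The same limit can also be read programmatically. This is a small sketch using Python's standard `resource` module (Unix only); the helper name `open_file_limits` is made up for illustration. Run it as the user in question, since limits are per user and per process.

```python
import resource

def open_file_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

if __name__ == "__main__":
    soft, hard = open_file_limits()
    # resource.RLIM_INFINITY means no limit is enforced.
    print(f"soft={soft} hard={hard}")
```

The soft value is the one that ulimit -n reports by default; the hard value is the ceiling to which the soft limit can be raised without extra privileges.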
Some examples (not comprehensive):
2015-06-23 13:53:59.481 [info] <0.1356.0> Lock failed trying deleting stale merge input files from "./data/bitcask/890602560248518965780370444936484965102833893376": {error,emfile}
2015-06-23 13:53:59.482 [error] <0.1369.0> Failed to open lock file ./data/bitcask/1141798154164767904846628775559596109106197299200/bitcask.write.lock: emfile
2015-06-23 13:53:59.485 [error] <0.1433.0> CRASH REPORT Process <0.1433.0> with 0 neighbours exited with reason: no function clause matching riak_kv_vnode:terminate({{badmatch,{error,emfile}},[{riak_kv_vnode,init,1,[{file,"src/riak_kv_vnode.erl"},{line,366}]},{...},...]}, undefined) line 935 in gen_fsm:terminate/7 line 589
You might see this expressed as {error,emfile} tuples, or with emfile as the Reason. Sometimes this error is translated into more readable text, as below:
2015-06-23 13:54:08.730 [error] <0.2411.0> CRASH REPORT Process <0.2411.0> with 0 neighbours exited with reason: no match of right hand value {error,{db_open,"IO error: ./data/anti_entropy/1027618338748291114361965898003636498195577569280/CURRENT: Too many open files"}} in hashtree:new_segment_store/2 line 505 in gen_server:init_it/6 line 328
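Because the error appears both as the `emfile` atom and as translated text, a log search has to look for both forms. A minimal sketch, assuming the logs are plain text read line by line; the function name `hit_open_file_limit` is hypothetical.

```python
import re

# Matches the bare atom (e.g. "{error,emfile}" or "...lock: emfile").
EMFILE_ATOM = re.compile(r"\bemfile\b")
# The human-readable form seen in the example above.
EMFILE_TEXT = "Too many open files"

def hit_open_file_limit(line):
    """Return True if a log line reports open-file exhaustion in either form."""
    return bool(EMFILE_ATOM.search(line)) or EMFILE_TEXT in line

if __name__ == "__main__":
    import sys
    # Usage: python script.py < console.log
    for line in sys.stdin:
        if hit_open_file_limit(line):
            print(line, end="")
```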
The erofs error can show up as a Result or in a badmatch error:
2013-09-30 07:00:02.790 [error] <0.279.0> Supervisor riak_core_sup had child riak_core_ring_manager started with riak_core_ring_manager:start_link() at <0.2142.0> exit with reason no match of right hand value {error,erofs} in riak_core_ring_manager:do_write_ringfile/1 line 127 in context child_terminated
2013-09-30 07:00:01.756 [error] <0.295.0> Supervisor riak_core_vnode_proxy_sup had child {riak_kv_vnode,125597796958124469533129165311555572001681702912} started with riak_core_vnode_proxy:start_link(riak_kv_vnode, 125597796958124469533129165311555572001681702912) at <0.784.0> exit with reason {{{badmatch,{error,{{badmatch,{error,erofs}},[{riak_kv_vnode,init,1,[{file,"src/riak_kv_vnode.erl"},{line,240}]},{riak_core_vnode,init,1,[{file,"src/riak_core_vnode.erl"},{line,132}]},{gen_fsm,init_it,6,[{file,"gen_fsm.erl"},{line,361}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}}},[{riak_core_vnode_manager,get_vnode,3,[{file,"src/riak_core_vnode_manager.erl"},{line,489}]},{riak_core_vnode_manager,handle_call,3,[{file,"src/riak_core_vnode_manager.erl"},{line,226}]},{gen_server,...},...]},...} in context shutdown_error
The maximum number of simultaneously open files and sockets depends on the maximum number of Erlang ports available, as well as on operating system-specific settings and limits. The maximum number of simultaneously open Erlang ports is configured at startup. For more information, see the +Q command-line flag in the erl(1) manual page in erts.
This is very likely to show up in the logs as a badmatch error and a module-level crash.
2015-04-05 14:33:02.966 UTC [error] <0.13268.56> CRASH REPORT Process <0.13268.56> with 1 neighbours exited with reason: no case clause matching {error,system_limit} in bitcask:get_filestate/2 line 1007 in gen_fsm:terminate/7 line 611
Some errors indicate what caused you to encounter the system_limit, as in the following:
2015-04-05 09:44:05.910 UTC [error] <0.30949.36> CRASH REPORT Process <0.30949.36> with 0 neighbours exited with reason: {error,system_limit,[{erlang,open_port,[{spawn,"zlib_drv"},[binary]],[]},{zlib,open,0,[]},{zlib,zip,1,[]},{riak_kv_pb_object,process,2,[{file,"src/riak_kv_pb_object.erl"},{line,143}]},{riak_api_pb_server,process_message,4,[{file,"src/riak_api_pb_server.erl"},{line,223}]},{riak_api_pb_server,handle_message,3,[{file,"src/riak_api_pb_server.erl"},{line,200}]},{riak_api_pb_server,decode_buffer,1,[{file,"src/riak_api_pb_server.erl"},{line,172}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{...}]}]} in gen_server:terminate/6 line 747
Notice that the error is encountered during the open_port operation. This indicates that the node is attempting to exceed the maximum number of ports allocated to the Erlang VM.
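When a system_limit error carries a stack trace, the first entry of the trace names the call that hit the limit. The sketch below pulls that module and function out of a log line; it assumes the trace follows the `{error,system_limit,[{Module,Function,...` shape shown above, and the name `system_limit_origin` is hypothetical.

```python
import re

# First {Module,Function, entry directly after "system_limit,[".
ORIGIN = re.compile(r"system_limit,\[\{(\w+),(\w+),")

def system_limit_origin(line):
    """Return (module, function) of the first stack frame, or None if absent."""
    m = ORIGIN.search(line)
    return (m.group(1), m.group(2)) if m else None
```

A result of `("erlang", "open_port")` points at port exhaustion. Lines without a trace, such as the bitcask:get_filestate/2 example, return `None` and need to be read by hand.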
#Bitcask specific log strings
##CRC error
2015-05-12 19:29:51.179 UTC [error] <0.7621.751> fold_loop: CRC error at file /data/riak/bitcask/1341612831143602288194788811282525428199781826560/8183.bitcask.data offset 81745648, skipping 16777973 bytes
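To see which data files are affected and how much data was skipped, the fields of this message can be extracted with a regular expression. This is a sketch that assumes the message has exactly the wording of the example above; `parse_crc_error` is a hypothetical helper name.

```python
import re

CRC_ERROR = re.compile(
    r"CRC error at file (?P<file>\S+) offset (?P<offset>\d+), "
    r"skipping (?P<skipped>\d+) bytes"
)

def parse_crc_error(line):
    """Return file, offset and skipped byte count, or None if no match."""
    m = CRC_ERROR.search(line)
    if m is None:
        return None
    return {
        "file": m.group("file"),
        "offset": int(m.group("offset")),
        "skipped": int(m.group("skipped")),
    }
```

Summing `skipped` per file across a log gives a rough picture of how much of each data file could not be read.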
##Failed to merge
2015-04-29 01:02:37.343 UTC [error] <0.3431.4865> Failed to merge {["/data/riak/bitcask/742168800207099138150308704113737470919028244480/4629.bitcask.data","/data/riak/bitcask/742168800207099138150308704113737470919028244480/5269.bitcask.data","/data/riak/bitcask/742168800207099138150308704113737470919028244480/5861.bitcask.data","/data/riak/bitcask/742168800207099138150308704113737470919028244480/5863.bitcask.data","/data/riak/bitcask/742168800207099138150308704113737470919028244480/5876.bitcask.data","/data/riak/bitcask/742168800207099138150308704113737470919028244480/6970.bitcask.data"],[]}: {merge_locked,locked,"/data/riak/bitcask/742168800207099138150308704113737470919028244480"}
#LevelDB specific log strings
##Compaction error
2014/06/10-02:42:07.708991 8e Compaction error: IO error: /var/db/riak/leveldb/1313067877289483090573623091893535525472126894080/sst_2/004809.sst: No such file or directory
A properly-running LevelDB instance should not see any event containing the string "Compaction error" emitted to the LOG file. The remainder of the log message should provide a clue as to why compaction failed.
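A check for this string is easy to automate. The following sketch scans lines from a LevelDB LOG file and yields the remainder of each matching message, which is where the clue to the cause is found; the function name `compaction_errors` is hypothetical.

```python
MARKER = "Compaction error"

def compaction_errors(lines):
    """Yield the detail text that follows "Compaction error" on each matching line."""
    for line in lines:
        idx = line.find(MARKER)
        if idx != -1:
            # Drop the marker, then the ": " separator and trailing newline.
            yield line[idx + len(MARKER):].lstrip(": ").rstrip()

if __name__ == "__main__":
    import sys
    # Usage: python script.py < LOG
    for detail in compaction_errors(sys.stdin):
        print(detail)
```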
##waiting
The waiting event can indicate LevelDB stalls. The frequency of occurrence is the most interesting aspect of these events. They can happen with degraded RAID arrays or during a RAID rebuild.
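Since frequency is what matters, counting these events per hour is a reasonable first step. The sketch below makes two assumptions that should be checked against a real LOG file: that waiting events carry the same `YYYY/MM/DD-HH:MM:SS` timestamp prefix as the compaction error example above, and that they can be recognised by the substring "waiting". The function name `waiting_per_hour` is hypothetical.

```python
import re
from collections import Counter

# Captures the date and hour, e.g. "2014/06/10-02", from the line prefix.
HOUR_PREFIX = re.compile(r"^(\d{4}/\d{2}/\d{2}-\d{2}):")

def waiting_per_hour(lines, marker="waiting"):
    """Count lines containing `marker`, grouped by date and hour."""
    counts = Counter()
    for line in lines:
        m = HOUR_PREFIX.match(line)
        if m and marker in line:
            counts[m.group(1)] += 1
    return counts

if __name__ == "__main__":
    import sys
    # Usage: python script.py < LOG
    for hour, n in sorted(waiting_per_hour(sys.stdin).items()):
        print(hour, n)
```

A sustained rise in the hourly count is more telling than an isolated event.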