@slfritchie
Created September 12, 2013 00:43
Greg's plan for this afternoon:
handoff 1
A+??
riak_core_node_watcher:service_down(riak_kv).
/usr/lib64/riak/erts-5.8.4/bin/escript msgq.escript /etc/riak/vm.args
riak_core_node_watcher:service_up(riak_kv, whereis(riak_kv_sup)).
For setting 'handoff_concurrency' value:
I = fun(Num) -> rpc:multicall(lists:sublist([node()|nodes()], Num), application, set_env, [riak_core, handoff_concurrency, 2]), rpc:multicall(lists:sublist([node()|nodes()], Num), riak_core_vnode_manager, force_handoffs, []) end.
% AllFallbacks = fun(Ring) ->
% [{N,Fallbacks(N, Ring)} || N <- riak_core_ring:all_members(Ring), lists:member(N, [node()|nodes()])]
% end.
%
% rpc:multicall([element(1, N) || N <- lists:sublist(AllFallbacks(element(2, riak_core_ring_manager:get_raw_ring())), 3)], application, set_env, [riak_core, handoff_concurrency, 1]).
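For example, to apply the I fun above to the first three nodes in [node()|nodes()]
(it sets handoff_concurrency to 2 on those nodes and then forces handoffs there):
I(3).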
ssh -p 2222 -l basho prod-04.XXXer.com
[12/7/11 2:36:21 PM] Greg Burd: msgq.escript
[12/7/11 2:36:37 PM] Greg Burd: /usr/lib64/riak/erts-5.8.4/bin/escript msgq.escript /etc/riak/vm.args
[12/7/11 2:37:17 PM] Greg Burd: basho@prod-08:~$ sudo cluster "grep ERL_MAX_PORTS /etc/riak/vm.args"
[12/7/11 2:37:31 PM] Greg Burd: [sudo] password for basho:
Wed Dec  7 12:37:10 PST 2011
+++ Processing: ........................................................................................................
#--- prod-04:
-env ERL_MAX_PORTS 32768
#--- prod-05:
-env ERL_MAX_PORTS 32768
[...]
------------------
hd(lists:reverse(lists:sort([{ process_info(Pid,message_queue_len), Pid} || Pid <- processes()]))).
OwnedPartitions = fun(Owners, Node) ->
[P || {P, Owner} <- Owners, Owner =:= Node]
end.
VnodePids = fun(Node) -> [Pid || {_,Pid,_,_} <- supervisor:which_children({riak_core_vnode_sup, Node}), is_pid(Pid)] end.
RunningVnodes = fun(Node) ->
Pids = VnodePids(Node),
[rpc:call(Node, riak_core_vnode, get_mod_index, [Pid], 30000) || Pid <- Pids]
end.
ActivePartitions = fun(Node) ->
lists:foldl(fun({_,P}, Ps) ->
ordsets:add_element(P, Ps)
end, [], RunningVnodes(Node))
end.
Fallbacks = fun(Node, Ring) ->
Owners = riak_core_ring:all_owners(Ring),
Owned = ordsets:from_list(OwnedPartitions(Owners, Node)),
Active = ordsets:from_list(ActivePartitions(Node)),
Stopped = ordsets:subtract(Owned, Active),
ordsets:subtract(Active, Owned)
end.
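A usage sketch for the funs above (hedged: riak_core_ring_manager:get_my_ring/0 is
the usual way to fetch the local ring; the raw-ring call in the commented-out
AllFallbacks snippet works too):
{ok, Ring} = riak_core_ring_manager:get_my_ring().
Fallbacks(node(), Ring).   % partitions this node is running but does not own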
--------------------------
[12/7/11 2:42:40 PM] Scott Fritchie: +A 257 sounds like a fun number to experiment with.
[12/7/11 2:43:09 PM] Greg Burd:              {wants_claim_fun, {riak_core_new_claim, new_wants_claim}},
             {choose_claim_fun, {riak_core_new_claim, new_choose_claim}},
--------------------------
http://dl.dropbox.com/u/11999159/Basho/2011-12-06-log1.txt
rp(lists:sort([{Mem, Pid, Queue, Dict}
|| Pid <- [Pid || {_,Pid,_,_} <- supervisor:which_children(riak_core_vnode_sup)],
{memory, Mem} <- [catch process_info(Pid, memory)],
{message_queue_len, Queue} <- [catch process_info(Pid, message_queue_len)],
{dictionary, Dict} <- [catch process_info(Pid, dictionary)],
is_integer(Mem)])).
lists:reverse(lists:sort([{Queue, Pid} || Pid <- [Pid || {_,Pid,_,_} <- supervisor:which_children(riak_core_vnode_sup)], {memory, Mem} <- [catch process_info(Pid, memory)], {message_queue_len, Queue} <- [catch process_info(Pid, message_queue_len)], {dictionary, Dict} <- [catch process_info(Pid, dictionary)], is_integer(Mem)])).
--------------------------
basho@prod-04:~$ awk '/# Server: prod-[0123456789]/ { last = 2 } /^10/ && last == 0 { print } { last--}' /etc/hosts
10.28.60.208 prod-04-internal.XXXer.com raid-0-b
10.28.60.210 prod-05-internal.XXXer.com raid-0-b
10.28.60.212 prod-06-internal.XXXer.com RAID0-B
10.28.60.214 prod-07-internal.XXXer.com RAID0-B
10.28.60.216 prod-08-internal.XXXer.com RAID0-B
10.28.60.218 prod-09-internal.XXXer.com RAID0-B
10.28.60.220 prod-10-internal.XXXer.com Raid0-B
10.28.60.222 prod-11-internal.XXXer.com Raid0-B
10.28.60.224 prod-12-internal.XXXer.com RAID0-B
10.28.60.226 prod-13-internal.XXXer.com RAID5-B
10.28.60.228 prod-14-internal.XXXer.com RAID0-B
10.28.60.230 prod-15-internal.XXXer.com RAID0-B
10.28.60.202 prod-16-internal.XXXer.com --Not-In-Service--
10.28.60.232 prod-17-internal.XXXer.com raid-5-b
10.28.60.234 prod-18-internal.XXXer.com raid-5-b
10.28.60.238 prod-19-internal.XXXer.com RAID0-B
10.28.60.240 prod-20-internal.XXXer.com RAID0-B
10.28.60.242 prod-21-internal.XXXer.com RAID0-B
10.28.60.244 prod-22-internal.XXXer.com raid0-b
10.28.60.246 prod-23-internal.XXXer.com RAID0-B
10.28.60.248 prod-24-internal.XXXer.com RAID0-B
10.28.60.250 prod-25-internal.XXXer.com raid0-b
10.28.60.254 prod-26-internal.XXXer.com RAID0-B
10.28.60.252 prod-27-internal.XXXer.com RAID0-A
10.28.233.195 prod-28-internal.XXXer.com raid0-b
10.28.233.197 prod-29-internal.XXXer.com raid0-b
10.28.233.199 prod-30-internal.XXXer.com RAID0-B
10.28.233.201 prod-31-internal.XXXer.com RAID0-B
10.28.233.203 prod-32-internal.XXXer.com raid0-b
10.28.233.205 prod-33-internal.XXXer.com RAID0-B
10.28.233.207 prod-34-internal.XXXer.com Raid0-B
10.28.233.209 prod-35-internal.XXXer.com RAID0-B
10.28.60.236 prod-99-internal.XXXer.com
------------
/usr/lib/riak/erts-5.8.4/bin/erl -setcookie riak_XXXer_prod -hidden -name slf$$@127.0.0.1 -remsh riaksearch@10.28.60.212
Erlang R14B03 (erts-5.8.4) [source] [64-bit] [smp:24:24] [rq:24] [async-threads:0] [kernel-poll:false]
Eshell V5.8.4 (abort with ^G)
(riaksearch@10.28.60.212)1> {percept_profile:start("/tmp/slf.percept.prod-06.nopatch"), timer:sleep(5*1000), percept_profile:stop()}.
Starting profiling.
{{ok,#Port<0.76127726>},ok,ok}
(riaksearch@10.28.60.212)2> os:cmd("bzip2 -9v /tmp/slf.percept.prod-06.nopatch").
" /tmp/slf.percept.prod-06.nopatch: 11.485:1, 0.697 bits/byte, 91.29% saved, 69744612 in, 6072897 out.\n"
----------
* Convoy hunting
In slf.percept.prod-17.a.patch, at times:
14.5 - 15.1 seconds
delta [ms] time [ms] pid activity module:function/arity #runnables
0.0260 14505.0890 <0.398.0> inactive gen_server2:process_next_msg/9 3
0.1250 14505.2140 <0.11337.3385> inactive prim_inet:recv0/3 2
263.2790 14768.4930 <0.82.0> active gen_server:loop/6 3
0.0070 14768.5000 <0.91.0> active gen_server:loop/6 4
299.7850 15068.2850 <0.398.0> active gen_server2:process_next_msg/9 5
0.0570 15068.3420 <0.398.0> inactive gen_server2:process_next_msg/9 4
0.3950 15068.7370 <0.11995.3385> active prim_inet:recv0/3 5
0.0330 15068.7700 <0.82.0> inactive gen_server:loop/6 4
0.0800 15068.8500 <0.13398.3385> active prim_inet:accept0/2 5
0.0200 15068.8700 <0.13399.3385> active prim_inet:accept0/2 6
0.0140 15068.8840 <0.13403.3385> active prim_inet:accept0/2 7
.....
20.3 - 20.4 seconds
delta [ms] time [ms] pid activity module:function/arity #runnables
0.0290 20303.8780 <0.6542.3> active gen_fsm:loop/7 5
0.0080 20303.8860 <0.371.0> inactive gen_server:loop/6 4
41.4230 20345.3090 <0.13147.3384> inactive prim_file:drv_get_response/1 3
0.0530 20345.3620 <0.19601.3385> active gen_fsm:loop/7 4
0.0450 20345.4070 <0.19549.3385> active mochiweb_http:request/3 5
0.0050 20345.4120 <0.19553.3385> active mochiweb_http:request/3 6
20.48 - 20.9 seconds
delta [ms] time [ms] pid activity module:function/arity #runnables
0.0050 20493.0300 <0.19699.3385> inactive mochiweb_http:request/3 6
0.0070 20493.0370 <0.19811.3385> inactive erlang:apply/2 5
386.2050 20879.2420 <0.20033.7> inactive erlang:crc32/2 4
0.0070 20879.2490 <0.19444.3385> active erlang:process_info/2 5
0.0110 20879.2600 <0.398.0> active gen_server2:process_next_msg/9 6
0.0060 20879.2660 <0.82.0> active gen_server:loop/6 7
0.0050 20879.2710 <0.20033.7> active erlang:crc32/2 8
0.0030 20879.2740 <0.23401.3375> active prim_inet:close_pend_loop/2 9
0.0060 20879.2800 <0.8920.4> active gen_fsm:loop/7 10
25.90 - 26.7 seconds
delta [ms] time [ms] pid activity module:function/arity #runnables
0.0240 25991.9890 <0.29987.9> inactive gen_fsm:loop/7 3
0.0080 25991.9970 <0.398.0> inactive gen_server2:process_next_msg/9 2
493.1710 26485.1680 <0.91.0> active gen_server:loop/6 3
0.0230 26485.1910 <0.28651.3375> active mochiweb_http:request/3 4
0.0290 26485.2200 <0.92.0> active gen_event:fetch_msg/5 5
0.0080 26485.2280 <0.91.0> inactive gen_server:loop/6 4
In slf.percept.prod-17.b.patch, holy cow, there's bad behavior between
1 and about 8 seconds!!
Each "active" message included below is the first event after the big pause.
Also, pid <0.398.0> is riak_kv_stat.
1.29 - 2.1 seconds (775 msec)
<0.398.0> active gen_server2:process_next_msg/9
2.29 - 2.97 sec (667 msec)
<0.398.0> active gen_server2:process_next_msg/9
3.00 - 3.87 sec (859 msec)
<0.398.0> active gen_server2:process_next_msg/9
4.63 - 4.77 sec (131 msec)
<0.398.0> active gen_server2:process_next_msg/9
4.79 - 5.67 sec (861 msec)
<0.6542.3> inactive gen_server2:cast_msg/1
WEEEEEIIIRRRRDDD?????
5.69 - 6.6 sec (861 msec)
<0.398.0> active gen_server2:process_next_msg/9
7.14 - 7.46 sec (310 msec)
<0.398.0> active gen_server2:process_next_msg/9
7.67 - 8.36 sec (673 msec)
<0.9404.3489> active prim_file:drv_get_response/1
In slf.percept.prod-17.c.patch, holy cow, even worse: a single pause
of 3+ seconds!!
NOTE: during this time there was a bitcask merge. !@#$! I didn't save
the Riak log message. But it happened near the end of the 31 sec
sample window.
delta [ms] time [ms] pid activity module:function/arity #runnables
0.0080 25637.3680 <0.398.0> active gen_server2:process_next_msg/9 4
0.0050 25637.3730 <0.8920.4> inactive gen_fsm:loop/7 3
0.0100 25637.3830 <0.2472.11> inactive gen_fsm:loop/7 2
0.0070 25637.3900 <0.398.0> inactive gen_server2:process_next_msg/9 1
3187.6570 28825.0470 <0.24.0> active gen_server:loop/6 2
0.0130 28825.0600 <0.30277.3558> inactive gen:do_call/4 1
0.0210 28825.0810 <0.608.0> active gen_server:loop/6 2
0.0070 28825.0880 <0.82.0> active gen_server:loop/6 3
0.0090 28825.0970 <0.666.0> active mi_scheduler:worker_loop/1 4
0.0110 28825.1080 <0.6542.3> active gen_fsm:loop/7 5
0.0080 28825.1160 <0.20033.7> active gen_fsm:loop/7 6
0.0040 28825.1200 <0.665.0> active gen_server:loop/6 7
0.0100 28825.1300 <0.666.0> inactive mi_scheduler:worker_loop/1 6
0.0080 28825.1380 <0.23178.8> active gen_fsm:loop/7 7
0.0040 28825.1420 <0.20.0> active gen_server:loop/6 8
0.0100 28825.1520 <0.608.0> inactive gen:do_call/4 7
0.0230 28825.1750 <0.608.0> active gen:do_call/4 8
0.0040 28825.1790 <0.20.0> inactive gen_server:loop/6 7
0.0070 28825.1860 <0.91.0> active gen_server:loop/6 8
0.0060 28825.1920 <0.665.0> inactive gen_server:loop/6 7
0.0140 28825.2060 <0.23225.3559> active prim_inet:accept0/2 8
0.0060 28825.2120 <0.23228.3559> active prim_inet:accept0/2 9
0.0040 28825.2160 <0.23231.3559> active prim_inet:accept0/2 10
0.0050 28825.2210 <0.23233.3559> active prim_inet:accept0/2 11
0.0040 28825.2250 <0.82.0> inactive gen_server:loop/6 10
0.0020 28825.2270 <0.452.0> active gen_server:loop/6 11
0.0030 28825.2300 <0.24.0> inactive prim_file:drv_get_response/1 10
0.0040 28825.2340 <0.23235.3559> active prim_inet:accept0/2 11
0.0060 28825.2400 <0.23239.3559> active prim_inet:accept0/2 12
0.0060 28825.2460 <0.23244.3559> active prim_inet:accept0/2 13
0.0190 28825.2650 <0.23247.3559> active prim_inet:accept0/2 14
0.0080 28825.2730 <0.23249.3559> active prim_inet:accept0/2 15
0.0070 28825.2800 <0.23253.3559> active prim_inet:accept0/2 16
0.0110 28825.2910 <0.6542.3> inactive gen_fsm:loop/7 15
0.0140 28825.3050 <0.23254.3559> active prim_inet:accept0/2 16
0.0110 28825.3160 <0.23260.3559> active prim_inet:accept0/2 17
0.0090 28825.3250 <0.23259.3559> active prim_inet:accept0/2 18
0.0040 28825.3290 <0.23262.3559> active prim_inet:accept0/2 19
0.0080 28825.3370 <0.371.0> active gen_server:loop/6 20
---------------------------------------
basho@prod-17:~$ cat /home/basho/DO.sh
#!/bin/sh
TIME=$1
export PATH=/usr/lib/riak/erts-5.8.4/bin:$PATH
sh -c "while [ 1 ]; do echo RESTARTING ; date; /tmp/etop -setcookie riak_XXXer_prod -node riaksearch@10.28.60.232 -interval 1 -sort msg_q 2>&1 | egrep -v --line-buffered 'application_master:m| 0 gen_fsm:loop/7' | egrep --line-buffered 'Reds|proc_lib|riak_kv' ; done" &
ETOPPID=$!
sh -c "while [ 1 ]; do date ; sleep 1 ; done" &
DATEPID=$!
sh -c "tail -f /var/log/riak/console.log | egrep --line-buffered '@mi_|/ssd|monitor'" &
TAILPID=$!
sh -c ' while [ 1 ]; do ps 29259 | grep pts | grep -v Ssl; sleep 0.05; done' &
PSPID=$!
iostat -kx 1 $TIME
kill $ETOPPID
kill $DATEPID
kill $TAILPID
kill $PSPID
sleep 1
kill `ps axww | grep etop | grep -v grep | awk '{print $1}'`
---------------------------------------
basho@prod-17:~$ for i in /proc/sys/vm/*dirty*; do echo $i ; cat $i ; echo "" ; done
/proc/sys/vm/dirty_background_bytes
0
/proc/sys/vm/dirty_background_ratio
10
/proc/sys/vm/dirty_bytes
0
/proc/sys/vm/dirty_expire_centisecs
3000
/proc/sys/vm/dirty_ratio
20
/proc/sys/vm/dirty_writeback_centisecs
500
---------------------------------------
Dammit, I *knew* that I was seeing prod-17's disks act much, *much*
slower than prod-15's. ... There are 3 boxes that have RAID 5
settings, see the 'XXXer-raid.txt' file.
prod-15:
Logical device number 1
Logical device name : RAID0-B
RAID level : 0
Status of logical device : Optimal
Size : 762382 MB
Stripe-unit size : 256 KB
Read-cache mode : Enabled
MaxIQ preferred cache setting : Enabled
MaxIQ cache setting : Disabled
Write-cache mode : Enabled (write-back)
Write-cache setting : Enabled (write-back) when protected by battery/ZMM
Partitioned : Yes
Protected by Hot-Spare : No
Bootable : No
Failed stripes : No
Power settings : Disabled
prod-17:
Logical device number 1
Logical device name : raid-5-b
RAID level : 5
Status of logical device : Optimal
Size : 667090 MB
Stripe-unit size : 256 KB
Read-cache mode : Enabled
MaxIQ preferred cache setting : Disabled
MaxIQ cache setting : Disabled
Write-cache mode : Enabled (write-back)
Write-cache setting : Enabled (write-back) when protected by battery/ZMM
Partitioned : Yes
Protected by Hot-Spare : No
Bootable : No
Failed stripes : No
Power settings : Disabled
-----
tail -f /var/log/riak/console.log | egrep --line-buffered -v 'exit,normal|riak_search_backend:collect_info_response:55|riak_solr_searcher_wm:malformed_request:82|webmachine error: path|too_many_results,|to_mochijson2| monitor |Problem invoking pre-commit hook|riak_indexed_doc,index,|@mi_|riak_search_kv_hook,precommit|badmatch,.error,closed|error,function_clause.*base64' &
-----
slf.percept.prod-29.bigbacklog2.patch.bz2
16ms pause: 5.2053 - 5.2277 sec
16ms pause: 1.5708 - 1.5932 sec
9ms pause : 11.7316 - 11.7463 sec
: 16.2913 - 16.2957 sec
23ms pause: 24.0534 - 24.0796 sec
-----
Answer (?) to weird Erlang proc scheduling problem: busy_dist_port.
riak_sysmon was buggy and mis-configured! ARRGH!
Solution: add +zdbbl 16384 to all nodes, then do rolling restart.
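For reference, busy_dist_port events can also be watched straight from the Erlang
shell with the VM's system monitor. A minimal sketch only, not how riak_sysmon is
wired up; note that erlang:system_monitor/2 steals the single monitor slot away
from riak_sysmon, so only try this on a node you're willing to restart:
Watcher = spawn(fun() ->
            erlang:system_monitor(self(), [busy_dist_port]),
            F = fun(Loop) ->
                    receive
                        {monitor, Pid, busy_dist_port, Port} ->
                            io:format("~p blocked on dist port ~p~n", [Pid, Port])
                    end,
                    Loop(Loop)
                end,
            F(F)
          end).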
During the rolling restart, run 32 different basho_bench instances,
each aimed at a single Riak cluster member. Used two shell 1-liner
hacks to monitor the basho_bench data:
* Look for non-zero error rates for GET
* Look for bad max latency values for GET
while [ 1 ]; do date ; for i in tests.prod-*/current/get*csv; do echo `echo $i | sed 's/-internal.*//'` `tail -1 $i`; done | awk '$12 > 0 { print }' | sort -nr -k 11 | head -5 ; sleep 15; done
while [ 1 ]; do date ; for i in tests.prod-*/current/get*csv; do echo `echo $i | sed 's/-internal.*//'` `tail -1 $i`; done | sort -nr -k 10 | head -5 ; sleep 15; done
-----
* Return of the vnode mailbox backups from hell.
Example:
Mon Dec 19 21:18:29 PST 2011
'riaksearch@10.28.233.207': [{riak_kv_vnode,327571},{riak_pipe_vnode,0},{riak_search_vnode,0}]
'riaksearch@10.28.60.208': [{riak_kv_vnode,17893},{riak_pipe_vnode,0},{riak_search_vnode,0}]
'riaksearch@10.28.233.205': [{riak_kv_vnode,6},{riak_pipe_vnode,0},{riak_search_vnode,0}]
'riaksearch@10.28.60.232': [{riak_kv_vnode,2},{riak_pipe_vnode,0},{riak_search_vnode,0}]
'riaksearch@10.28.60.218': [{riak_kv_vnode,2},{riak_pipe_vnode,0},{riak_search_vnode,0}]
'riaksearch@10.28.60.228': [{riak_kv_vnode,1},{riak_pipe_vnode,0},{riak_search_vnode,0}]
'riaksearch@10.28.60.240': [{riak_kv_vnode,1},{riak_pipe_vnode,0},{riak_search_vnode,0}]
'riaksearch@10.28.60.220': [{riak_kv_vnode,1},{riak_pipe_vnode,0},{riak_search_vnode,0}]
'riaksearch@10.28.233.195': [{riak_kv_vnode,1},{riak_pipe_vnode,0},{riak_search_vnode,0}]
'riaksearch@10.28.233.219': [{riak_kv_vnode,0},{riak_search_vnode,0}]
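(The per-node summary above came from a polling script that isn't pasted in this
gist. A rough reconstruction, reusing the riak_core_vnode_sup + get_mod_index
calls from the funs earlier in these notes -- hedged sketch only, no badrpc
handling:)
VnodeMailboxes = fun(Node) ->
    Children = rpc:call(Node, supervisor, which_children, [riak_core_vnode_sup]),
    lists:foldl(fun(Pid, Acc) ->
                    case catch {riak_core_vnode:get_mod_index(Pid),
                                rpc:call(Node, erlang, process_info,
                                         [Pid, message_queue_len])} of
                        {{Mod, _Idx}, {message_queue_len, Q}} ->
                            orddict:update_counter(Mod, Q, Acc);
                        _ ->
                            Acc    % vnode exited mid-scan; skip it
                    end
                end, orddict:new(), [P || {_, P, _, _} <- Children, is_pid(P)])
end.
[{Nd, VnodeMailboxes(Nd)} || Nd <- lists:sort([node()|nodes()])].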
So, we appear to be having packet drop problems. If we measure using
"ping", we get one set of data: no packets dropped on the external
interfaces.
root@prod-04:~# sh -c 'FILE=/tmp/slf.capt`date +%s` ; echo Putting tcpdump output in $FILE ; tcpdump -i eth0 -c 200000 -w $FILE ; tcptrace -l $FILE | grep "rexmt data pkts" | head -5 ; echo Pseudo-count of TCP flows with retransmitted packets: ; tcptrace -l $FILE | grep "rexmt data pkts" | grep -v "rexmt data pkts: 0 rexmt data pkts: 0" | wc -l'
Putting tcpdump output in /tmp/slf.capt1324358762
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
200000 packets captured
200012 packets received by filter
0 packets dropped by kernel
rexmt data pkts: 0 rexmt data pkts: 0
rexmt data pkts: 0 rexmt data pkts: 0
rexmt data pkts: 0 rexmt data pkts: 0
rexmt data pkts: 0 rexmt data pkts: 14
rexmt data pkts: 0 rexmt data pkts: 16
Pseudo-count of TCP flows with retransmitted packets:
82
... and ...
basho@prod-04:~$ bash -c 'for t in "" -internal; do for i in prod-{0{4,5,6,7,8,9},1{0,1,2,3,4,5,7,8,9},2{0,1,2,3,4,5,6,7,8,9},3{0,1,2,3,4,5,6,7,8,9},40,99}$t.XXXer.com; do sudo ping -i 0.02 -c 50 $i 2>&1; done done' | awk '/PING/ {host = $2} /transmitted/ { print host, $0 }' | grep -v '50 received'
prod-05-internal.XXXer.com 50 packets transmitted, 0 received, 100% packet loss, time 1369ms
prod-06-internal.XXXer.com 50 packets transmitted, 49 received, 2% packet loss, time 995ms
prod-07-internal.XXXer.com 50 packets transmitted, 49 received, 2% packet loss, time 1005ms
prod-08-internal.XXXer.com 50 packets transmitted, 49 received, 2% packet loss, time 1157ms
prod-10-internal.XXXer.com 50 packets transmitted, 48 received, 4% packet loss, time 1114ms
prod-11-internal.XXXer.com 50 packets transmitted, 48 received, 4% packet loss, time 1030ms
prod-12-internal.XXXer.com 50 packets transmitted, 48 received, 4% packet loss, time 1009ms
prod-13-internal.XXXer.com 50 packets transmitted, 49 received, 2% packet loss, time 1006ms
prod-14-internal.XXXer.com 50 packets transmitted, 49 received, 2% packet loss, time 993ms
prod-17-internal.XXXer.com 50 packets transmitted, 49 received, 2% packet loss, time 1018ms
prod-20-internal.XXXer.com 50 packets transmitted, 49 received, 2% packet loss, time 1001ms
prod-21-internal.XXXer.com 50 packets transmitted, 49 received, 2% packet loss, time 1004ms
prod-24-internal.XXXer.com 50 packets transmitted, 48 received, 4% packet loss, time 1019ms
prod-26-internal.XXXer.com 50 packets transmitted, 49 received, 2% packet loss, time 1021ms
prod-28-internal.XXXer.com 50 packets transmitted, 49 received, 2% packet loss, time 1025ms
prod-30-internal.XXXer.com 50 packets transmitted, 47 received, 6% packet loss, time 1082ms
prod-32-internal.XXXer.com 50 packets transmitted, 47 received, 6% packet loss, time 1012ms
----
WRONG WRONG WRONG here for doc purposes only!
root@prod-04:~# for i in `seq 1 5`; do sh -c 'ifconfig eth0 ; sleep 3 ; ifconfig eth0' | grep bytes | sed -e 's/.. bytes://g' | awk 'NR == 1 { rx = $1; tx = $4} NR == 2 { printf "rx %.1f Mbit/sec tx %1.f Mbit/sec WRONG WRONG WRONG\n", ($1 - rx) / (1024*1024) / 3, ($4 - tx) / (1024*1024) / 3 }' ; done
rx 71.5 Mbit/sec tx 104 Mbit/sec
rx 75.5 Mbit/sec tx 113 Mbit/sec
rx 64.5 Mbit/sec tx 44 Mbit/sec
rx 56.2 Mbit/sec tx 94 Mbit/sec
rx 79.4 Mbit/sec tx 108 Mbit/sec
2011-12-20 update: WWWHHHAAAAAA! That f'ing awk script is *wrong*.
It's not converting bytes to bits!!!!!
Here's a correct one:
root@prod-04:~# date ; for i in `seq 1 5`; do sh -c 'ifconfig eth0 ; sleep 3 ; ifconfig eth0' | grep bytes | sed -e 's/.. bytes://g' | awk 'NR == 1 { rx = $1; tx = $4} NR == 2 { printf "rx %.1f Mbit/sec tx %1.f Mbit/sec\n", ($1 - rx) * 8 / (1024*1024) / 3, ($4 - tx) * 8 / (1024*1024) / 3 }' ; done
Tue Dec 20 15:18:49 PST 2011
rx 423.0 Mbit/sec tx 488 Mbit/sec
rx 491.6 Mbit/sec tx 566 Mbit/sec
rx 483.2 Mbit/sec tx 519 Mbit/sec
rx 437.1 Mbit/sec tx 495 Mbit/sec
rx 456.5 Mbit/sec tx 489 Mbit/sec
---
root@prod-04:~# grep -i qu /tmp/slf.foo
105815089 requests sent out
12900 reassemblies required
source quenches: 106
echo requests: 58153
echo request: 95724
416815 packets pruned from receive queue because of socket buffer overrun
2222 packets pruned from receive queue
1 packets dropped from out-of-order queue because of socket buffer overrun
Quick ack mode was activated 9682843 times
413634 times the listen queue of a socket overflowed
600138 packets directly queued to recvmsg prequeue.
1780022 bytes directly received in process context from prequeue
167 packets header predicted and directly queued to user
4475848 packets collapsed in receive queue due to low socket buffer
watch -d "sh -c 'netstat -s | grep -i queue'"
watch -d "sh -c 'netstat -s | egrep -i retr'"
The "netstat -s" method confirms that there are several hundred
packets being retransmitted by TCP each second, on average.
--
Wed 12-21 early morning:
Set MTU on all eth0 interfaces from 1500 -> 9000.
Then disconnect and reconnect all Erlang nodes.
for i in `awk '{print $1}' /tmp/slf.some-hosts `; do /usr/lib/riak/erts-5.8.4/bin/erl -name slf$$@127.0.0.1 -hidden -setcookie riak_XXXer_prod -remsh riaksearch@$i ; done
length(nodes()).
WARNING: Search for "WOW, that caused some unintentional chaos monkey" string below!!!
lists:usort([{net_kernel:disconnect(Nd), timer:sleep(20), net_kernel:connect(Nd), net_adm:ping(Nd)} || Nd <- nodes()]).
length(nodes()).
---
Wed 12-21:
Change CPU freq governor from 'ondemand' -> 'performance'
See set*expect script for how-to
Experimentation on prod-23 and prod-21
13 ethtool --show-pause eth0
14 ethtool --pause autoneg off rx off tx off
15 ethtool --pause eth0 autoneg off rx off tx off
16 sudo ethtool --pause eth0 autoneg off rx off tx off
17 ethtool --show-pause eth0
19 ethtool --show-ring eth0
20 xsudo ethtool --set-ring eth0 rx 4096 tx 4096 rx-mini 256 rx-jumbo 256
21 sudo ethtool --set-ring eth0 rx 4096 tx 4096 rx-mini 256 rx-jumbo 256
22 sudo ethtool --set-ring eth0 rx 4096 tx 4096
24 ethtool --show-ring eth0
Nothing really seems to help.
Except that Matt Ranney & his crew changed the way the timeline
history is managed, shrinking the single key's max size for appending
history info (originally 60KB?) by half(?). That made a roughly
100 Mbit/sec average decrease on all machines.
---
Thu 12-22
Matt shrank the timeline history max size again. Nice win.
Also, after:
"Matt R. Man, it feels really good to be finally making some
progress. All it took was spending an entire day yelling at
SoftLayer."
... we learned that SL's explanation is that despite average
utilization @ 300Mbit/sec, there are microburst periods where other
cluster members coincidentally burst traffic to a single receiver
above line rate and beyond the switch's very limited capacity to
buffer. So, during those very small time periods, yes, packets are
dropped. And yes, the 1-sec average utilization rate seems quite low
and reasonable.
That theory seems to be making some sense. See the escript file in
the data dir for polling the /sys/path/to/eth0/rx_bytes counter.
Either there's a timing error (the kernel's timer resolution is 100Hz,
so the smallest timer interval Erlang can fire at is 10 milliseconds),
or it's true that these micro-bursts are happening on final egress out
of the switching fabric. Bummer.
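(The escript itself isn't in this gist. A rough sketch of the idea -- hedged:
/sys/class/net/eth0/statistics/rx_bytes is my guess at the counter path, and the
10 ms sleep is the floor imposed by the 100Hz tick mentioned above:)
#!/usr/bin/env escript
%% Sketch only: poll the NIC's received-byte counter and print the average
%% rate over each ~10 ms wakeup, to look for sub-second bursts.
main(_) ->
    loop(rx_bytes(), now_usec()).

loop(PrevBytes, PrevT) ->
    timer:sleep(10),
    Bytes = rx_bytes(),
    T = now_usec(),
    MBits = (Bytes - PrevBytes) * 8 / ((T - PrevT) / 1000000) / 1000000,
    io:format("~.1f Mbit/sec over ~p usec~n", [MBits, T - PrevT]),
    loop(Bytes, T).

rx_bytes() ->
    {ok, Bin} = file:read_file("/sys/class/net/eth0/statistics/rx_bytes"),
    list_to_integer(string:strip(binary_to_list(Bin), right, $\n)).

now_usec() ->
    {Mega, Sec, Micro} = now(),
    (Mega * 1000000 + Sec) * 1000000 + Micro.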
---
ets:match() elimination
-----------------------
Data files in /Users/fritchie/b/notes/2011-12-07.XXXer.data/ets-match-stuff/*
prod-2000 has pthreads locking (not the default custom locks) and no
-fomit-frame-pointer (for decent stack traces).
[root@prod-2000 ~]# awk '/PID/ { c = 0} /beam.smp/ && c < 5 { usr += $3; sys += $4; lck += $8; slp += $9} END { print "usr, sys, lck, slp" ; print usr, sys, lck, slp; }' /root/prstat.before-slf
usr, sys, lck, slp
7692.6 825.5 57352 5297.3
[root@prod-2000 ~]# awk '/PID/ { c = 0} /beam.smp/ && c < 5 { usr += $3; sys += $4; lck += $8; slp += $9} END { print "usr, sys, lck, slp" ; print usr, sys, lck, slp; }' /root/prstat.after-slf-draft3
usr, sys, lck, slp
5073.4 553.2 58133.1 4885
Makes a pretty big difference for USR time.
prod-2009 is stock riak 1.0.2 + eleveldb
[root@prod-2009 ~]# awk '/PID/ { c = 0} /beam.smp/ && c < 5 { usr += $3; sys += $4; lck += $8; slp += $9} END { print "usr, sys, lck, slp" ; print usr, sys, lck, slp; }' /root/prstat.before-slf
usr, sys, lck, slp
1346.8 650.6 59578 5631.3
[root@prod-2009 ~]# awk '/PID/ { c = 0} /beam.smp/ && c < 5 { usr += $3; sys += $4; lck += $8; slp += $9} END { print "usr, sys, lck, slp" ; print usr, sys, lck, slp; }' /root/prstat.after-slf-draft3
usr, sys, lck, slp
1269.4 584.8 56486 5512.3
--------
Crazy experimentation with plockstat.
I've managed to use a hacked version of plockstat for up to 250
milliseconds at a time. (Locks are really, really hot and generate
many thousands of DTrace events in that 1/4 second.) The differences
between 1/4-second samples are striking: the most highly contended locks
fall into several different areas. More sampling (more than 6 or 7 runs)
is required, plus some automated script hacks, to figure out where the
truly hot areas are.
[root@prod-2009 ~]# for i in a b c d e f; do dtrace -x dynvarsize=4m -s /root/plockstat.d -p 28476 > /tmp/foo250.$i ; sleep 2 ; done
---
my.joyentcloud.com
devops@XXXer.com
XXXerJoyentPasswordHere
From roughly Dec 23 00:00 through Dec 27 17:00 ... some polling
intervals are every 30 seconds, some are every 5 seconds, and
sometimes polling was not running at all.
awk '/ PST 2011$/ {c = 0 } /riak_kv_vnode/ { c++ } /riak_kv_vnode/ && c == 1 {print}' ~/XXXer.txt | sed 's/[{},]/ /g' | awk '{ print $4}' | sed -e 's/.$//' -e 's/.$//' -e 's/.$//' | sort -nr | uniq -c
instances vnode mailbox size x 1000
--------- -------------------------
1 258-259
1 70-70
1 54-55
1 28-29
5 5-6
17 4-5
80 3-4
424 2-3
1233 1-2
26882 0-1
---
So, using the rto_min setting has some kind of units conversion
problem. If you actually specify "200ms", you end up getting 120
*seconds* of timeout. "1ms" seems to do about the same thing.
Using "1ns" or "10ns" or perhaps "100ns" seems fine: there's never an
"iperf" run using less bandwidth than 250 Mbit/sec.
root@prod-04:~# for size in 1ns 10ns 1ns 100ns 1ns 1us 1ns 10us 1ns; do echo "HEY, changing to $size"; ip route change to 10.28.60.192/26 dev eth0 rto_min $size; ip route show to 10.28.60.192/26 ; for i in `seq 1 50`; do iperf -c prod-18-internal.XXXer.com -n 15M 2>&1 | grep sec ; sleep 0.1; done; done
HEY, changing to 1ns
10.28.60.192/26 dev eth0 scope link rto_min lock 0.00ms
[ 3] 0.0- 0.2 sec 15.0 MBytes 514 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 592 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 665 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 594 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 546 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 657 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 493 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 393 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 585 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 576 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 616 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 559 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 584 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 566 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 523 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 434 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 553 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 443 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 658 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 703 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 606 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 576 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 606 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 626 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 526 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 570 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 583 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 515 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 625 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 676 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 511 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 631 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 586 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 564 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 550 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 658 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 575 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 487 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 655 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 563 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 574 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 637 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 589 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 572 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 576 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 573 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 603 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 660 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 618 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 526 Mbits/sec
HEY, changing to 10ns
10.28.60.192/26 dev eth0 scope link rto_min lock 0.00ms
[ 3] 0.0- 0.2 sec 15.0 MBytes 546 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 557 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 598 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 584 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 543 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 642 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 582 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 657 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 553 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 634 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 646 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 624 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 608 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 489 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 556 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 538 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 520 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 605 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 523 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 576 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 561 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 543 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 402 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 555 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 619 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 610 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 587 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 638 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 610 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 605 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 588 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 534 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 617 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 537 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 601 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 672 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 658 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 624 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 635 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 629 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 631 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 604 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 656 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 735 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 423 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 443 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 553 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 573 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 652 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 544 Mbits/sec
HEY, changing to 1ns
10.28.60.192/26 dev eth0 scope link rto_min lock 0.00ms
[ 3] 0.0- 0.2 sec 15.0 MBytes 629 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 457 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 608 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 508 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 760 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 574 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 591 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 555 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 622 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 453 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 398 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 647 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 776 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 621 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 570 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 622 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 546 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 614 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 500 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 525 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 569 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 570 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 628 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 646 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 461 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 601 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 749 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 551 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 442 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 474 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 564 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 650 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 608 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 529 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 529 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 642 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 547 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 512 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 616 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 730 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 678 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 574 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 495 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 524 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 531 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 565 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 611 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 625 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 630 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 560 Mbits/sec
HEY, changing to 100ns
10.28.60.192/26 dev eth0 scope link rto_min lock 0.00ms
[ 3] 0.0- 0.3 sec 15.0 MBytes 470 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 587 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 640 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 670 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 521 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 518 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 632 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 543 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 553 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 579 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 637 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 673 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 503 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 502 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 575 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 619 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 638 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 608 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 592 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 610 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 509 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 521 Mbits/sec
[ 3] 0.0- 0.4 sec 15.0 MBytes 356 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 659 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 647 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 732 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 648 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 624 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 587 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 646 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 606 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 566 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 507 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 498 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 486 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 697 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 767 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 476 Mbits/sec
[ 3] 0.0- 0.1 sec 15.0 MBytes 863 Mbits/sec
[ 3] 0.0- 0.1 sec 15.0 MBytes 849 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 752 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 783 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 755 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 749 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 793 Mbits/sec
[ 3] 0.0- 0.1 sec 15.0 MBytes 872 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 771 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 743 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 788 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 725 Mbits/sec
HEY, changing to 1ns
10.28.60.192/26 dev eth0 scope link rto_min lock 0.00ms
[ 3] 0.0- 0.2 sec 15.0 MBytes 801 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 775 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 795 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 750 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 779 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 809 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 606 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 532 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 380 Mbits/sec
[ 3] 0.0- 0.5 sec 15.0 MBytes 273 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 366 Mbits/sec
[ 3] 0.0- 0.4 sec 15.0 MBytes 349 Mbits/sec
[ 3] 0.0- 0.4 sec 15.0 MBytes 288 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 501 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 648 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 719 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 505 Mbits/sec
[ 3] 0.0- 0.4 sec 15.0 MBytes 298 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 573 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 397 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 528 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 598 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 699 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 726 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 579 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 585 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 614 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 661 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 558 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 588 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 584 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 586 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 400 Mbits/sec
[ 3] 0.0- 0.4 sec 15.0 MBytes 349 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 512 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 521 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 652 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 668 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 618 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 615 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 704 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 590 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 481 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 544 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 502 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 586 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 686 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 656 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 462 Mbits/sec
[ 3] 0.0- 0.4 sec 15.0 MBytes 341 Mbits/sec
HEY, changing to 1us
10.28.60.192/26 dev eth0 scope link rto_min lock 0.00ms
[ 3] 0.0- 0.3 sec 15.0 MBytes 370 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 536 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 633 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 516 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 531 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 458 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 542 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 608 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 711 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 575 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 637 Mbits/sec
[ 3] 0.0- 1.2 sec 15.0 MBytes 103 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 618 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 713 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 629 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 705 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 650 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 572 Mbits/sec
[ 3] 0.0- 1.3 sec 15.0 MBytes 95.5 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 553 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 677 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 662 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 611 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 596 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 581 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 618 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 625 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 568 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 641 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 581 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 607 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 600 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 626 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 581 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 671 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 524 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 480 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 560 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 603 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 686 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 618 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 553 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 536 Mbits/sec
[ 3] 0.0- 1.4 sec 15.0 MBytes 92.2 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 523 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 525 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 522 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 487 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 620 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 419 Mbits/sec
HEY, changing to 1ns
10.28.60.192/26 dev eth0 scope link rto_min lock 0.00ms
[ 3] 0.0- 0.2 sec 15.0 MBytes 573 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 574 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 618 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 543 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 640 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 552 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 375 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 488 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 458 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 486 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 583 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 645 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 624 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 579 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 566 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 542 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 581 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 626 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 685 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 647 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 722 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 452 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 553 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 507 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 497 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 666 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 730 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 441 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 555 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 755 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 609 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 525 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 651 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 719 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 636 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 518 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 641 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 671 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 595 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 622 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 626 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 579 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 377 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 366 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 378 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 434 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 697 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 636 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 694 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 702 Mbits/sec
HEY, changing to 10us
10.28.60.192/26 dev eth0 scope link rto_min lock 0.01ms
[ 3] 0.0- 0.2 sec 15.0 MBytes 685 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 614 Mbits/sec
[ 3] 0.0-10.3 sec 15.0 MBytes 12.2 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 666 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 472 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 528 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 654 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 457 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 644 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 620 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 574 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 494 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 569 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 672 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 613 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 673 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 528 Mbits/sec
[ 3] 0.0-10.2 sec 15.0 MBytes 12.3 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 589 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 585 Mbits/sec
[ 3] 0.0-10.2 sec 15.0 MBytes 12.3 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 677 Mbits/sec
[ 3] 0.0-10.2 sec 15.0 MBytes 12.3 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 694 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 505 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 502 Mbits/sec
[ 3] 0.0-10.2 sec 15.0 MBytes 12.3 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 704 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 444 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 444 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 526 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 583 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 686 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 727 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 793 Mbits/sec
[ 3] 0.0- 0.1 sec 15.0 MBytes 971 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 639 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 483 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 516 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 576 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 620 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 560 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 574 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 570 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 552 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 635 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 576 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 655 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 564 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 730 Mbits/sec
HEY, changing to 1ns
10.28.60.192/26 dev eth0 scope link rto_min lock 0.00ms
[ 3] 0.0- 0.2 sec 15.0 MBytes 834 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 759 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 579 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 565 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 458 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 388 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 497 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 454 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 509 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 415 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 644 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 512 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 608 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 627 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 594 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 518 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 686 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 485 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 539 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 606 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 646 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 680 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 638 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 547 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 634 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 601 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 504 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 413 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 522 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 796 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 796 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 385 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 412 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 388 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 594 Mbits/sec
[ 3] 0.0- 0.3 sec 15.0 MBytes 423 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 688 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 572 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 519 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 652 Mbits/sec
[ 3] 0.0- 0.4 sec 15.0 MBytes 317 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 606 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 538 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 550 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 529 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 545 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 664 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 687 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 737 Mbits/sec
[ 3] 0.0- 0.2 sec 15.0 MBytes 582 Mbits/sec
----
Changed the RTO_min value for all SL cluster members on
Wed Dec 28 18:47:43 PST 2011.
I screwed up prod-09, which now has a 2x1GigE bonded interface.
Adding explicit routes on that machine to one of the member interfaces
was not a good idea. 99th% latencies shot pretty high for a little
while there, drat. Otherwise, things went smoothly. prod-09 was the
only machine that I screwed up, so no Riak operations failed.
Starting Wed Dec 28 21:06:12 PST 2011 - Wed Dec 28 21:14:44 PST 2011,
I used the net_kernel:disconnect() & reconnect trick (listed above) to
reset TCP connections between all Erlang nodes. I was still seeing
temporary (1 or 2 minute) see-saws in the 4-hour Cacti stats windows,
which suggests that perhaps the old RTO_min value was still being used??
We'll see if more of those dips appear now or not -- it may be too
late in the day to see the necessary traffic pattern to trigger them,
so wait until tomorrow?
WOW, that caused some unintentional chaos monkey. Apparently my
innocent Riak internal networking disconnect & reconnect caused very
visible problems. Among other things, all node.js processes in
Seattle restarted: a bug was found in an error handler in XXXer's
Riak client that raised another exception (and then cascaded?).
So, if we ever do that again, we need to be much more careful.
----
1st week January 2012:
Libumem tuning, leak analysis, fragmentation analysis
See Campfire transcripts and Greg Burd for more details.
----
Tue Jan 10 2012:
leveldb on prod-2024: weird slow behavior for one vnode.
[lots of bogus D script and output omitted here.]
AH! The D script was incorrect! I mixed up self-> and this->, doh!
On a normal prod-2010:
[root@prod-2010 ~]# for i in 1 2 3 4 ; do date ; dtrace -n 'BEGIN { total = 0; mystart = timestamp; } pid$target::eleveldb_write:entry {self->start = timestamp;} pid$target::eleveldb_write:return /self->start > 0/ { this->nsec = timestamp - self->start ; @ = quantize((this->nsec)/1000000) ; self->start = 0; total = total + this->nsec; this->nsec = 0} tick-5sec { printa(@); printf("Total milliseconds in func: %d out of %d\n", total/1000000, (timestamp - mystart) / 1000000); exit(0);}' -p 2159 ; done
fixit for write
January 11, 2012 09:26:48 PM UTC
dtrace: description 'BEGIN ' matched 4 probes
CPU ID FUNCTION:NAME
15 70345 :tick-5sec
value ------------- Distribution ------------- count
-1 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 677
1 | 2
2 | 0
Total milliseconds in func: 37 out of 4877
[...]
[root@prod-2010 ~]# echo fixit ; for i in 1 2 3 4 ; do date ; dtrace -n 'BEGIN { total = 0; mystart = timestamp; } pid$target::eleveldb_get:entry {self->start = timestamp;} pid$target::eleveldb_get:return /self->start > 0/ { this->nsec = timestamp - self->start ; @ = quantize((this->nsec)/1000000) ; self->start = 0; total = total + this->nsec; this->nsec = 0} tick-5sec { printa(@); printf("Total milliseconds in func: %d out of %d\n", total/1000000, (timestamp - mystart) / 1000000); exit(0);}' -p 2159 ; done
January 11, 2012 09:28:22 PM UTC
dtrace: description 'BEGIN ' matched 4 probes
CPU ID FUNCTION:NAME
5 70345 :tick-5sec
value ------------- Distribution ------------- count
-1 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 2185
1 |@ 78
2 | 12
4 | 11
8 | 28
16 | 15
32 | 4
64 | 0
128 | 0
256 | 2
512 | 0
Total milliseconds in func: 2204 out of 4876
[...]
----
OK, now, let's count function calls during an eleveldb_get func call:
Again, on prod-2010, our "normal" node.
[root@prod-2010 ~]# for i in 1 2 3 4; do date ; dtrace -n 'BEGIN { total = 0; mystart = timestamp; } pid$target::eleveldb_get:entry {self->active = 1; total++;} pid$target::eleveldb_get:return {self->active = 0;} syscall:::entry /pid == $target && self->active == 1/ { @[probefunc] = count(); } syscall::write*:entry /pid == $target && self->active == 1/ { @w[ustack()] = count(); } tick-5sec { printa(@); printf("Total calls to func: %d\n", total); printa(@w); exit(0);}' -p 2159 ; done
January 11, 2012 09:31:52 PM UTC
dtrace: description 'BEGIN ' matched 239 probes
CPU ID FUNCTION:NAME
14 70345 :tick-5sec
close 551
open 552
lwp_park 1224
pread 2222
Total calls to func: 2973
[...]
OK, here's what I do for prod-2024:
* Put riak_kv service up
* Wait for msgq on bad vnode to get to 1,600-2,000 messages.
It takes about 60 seconds (!) for 1,600 messages to drain.
* Put riak_kv service down
* Wait a sec or two
* Start this dtrace script:
[root@prod-2024 ~]# for i in 1 2 3 4; do date ; dtrace -n 'BEGIN { total = 0; mystart = timestamp; } pid$target::eleveldb_get:entry {self->active = 1; total++;} pid$target::eleveldb_get:return {self->active = 0;} syscall:::entry /pid == $target && self->active == 1/ { @[probefunc] = count(); } syscall::write*:entry /pid == $target && self->active == 1/ { @w[ustack()] = count(); } tick-5sec { printa(@); printf("Total calls to func: %d\n", total); printa(@w); exit(0);}' -p 7610 ; done
January 11, 2012 09:51:34 PM UTC
dtrace: description 'BEGIN ' matched 239 probes
CPU ID FUNCTION:NAME
7 69397 :tick-5sec
mmap 1
lwp_park 20
close 1655
open 1669
pread 8971
Total calls to func: 14332
January 11, 2012 09:51:41 PM UTC
dtrace: description 'BEGIN ' matched 239 probes
CPU ID FUNCTION:NAME
15 69397 :tick-5sec
lwp_park 73
close 529
open 536
pread 2696
Total calls to func: 5408
[...]
WEIRD. There are at *most* 2K vnode operations in the vnode's
mailbox. Yet the eleveldb_get function is being called many *many*
more times than that.
HRM, WEIRDER, that many-thousand-calls-per-5-sec rate happens only occasionally.
Bimodal behavior. And it's less common than the other mode, which is only a
few hundred calls per 5 sec.
So ... if we use prstat during the mailbox draining of vnode ops, we
see this: 98-99% system time for one LWP.
Total: 51 processes, 777 lwps, load averages: 1.10, 1.10, 0.96
PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
7610 riak 1.5 98 0.0 0.0 0.0 0.0 0.0 0.0 0 13 208 0 beam.smp/582
So, let's see if we can create a D script that will look only at that
LWP. Viz. DTrace book page 80.
BUMMER, the sched provider isn't available inside a zone.
SO, it's possible to use tid == LWP# as a guard to look at only a
single thread. Hooray. Using "prstat -mLc" as shown above shows a
single LWP at 98-99% system time. That LWP changes periodically: the
Erlang VM's scheduler migrates the work from time to time, and it is
prone to change when the process is having DTrace probes enabled.
So, I'm eyeballing "prstat -mLc 1" output to catch which LWP is the
98% system time LWP, running probes guarded by that tid, and then
continuing to watch "prstat" output to make sure the same
system-time-eating LWP remains hot through the entire dtrace snapshot
of 2-4 seconds. Those separate outputs are below ... I'll try to
combine them using awk to see if there's anything interesting when
all of the runs are aggregated.
---
^C[root@prod-2024 ~]# dtrace -n 'profile-997 /execname == "beam.smp" && arg1 && tid == 581/ { @[ufunc(arg1)] = count(); } tick-2s {printa(@); exit(0);}'
dtrace: description 'profile-997 ' matched 2 probes
CPU ID FUNCTION:NAME
1 69398 :tick-2s
beam.smp`mbc_realloc 1
beam.smp`enc_atom 1
beam.smp`erts_dsig_send_msg 1
beam.smp`small_times 1
eleveldb.so`_ZN7leveldb6DBImpl5WriteERKNS_12WriteOptionsEPNS_10WriteBatchE 1
eleveldb.so`_ZN7leveldb8FindFileERKNS_21InternalKeyComparatorERKSt6vectorIPNS_12FileMetaDataESaIS5_EERKNS_5SliceE 1
eleveldb.so`_ZN7leveldb12_GLOBAL__N_116TwoLevelIteratorD1Ev 1
eleveldb.so`_ZN7leveldb12_GLOBAL__N_115ShardedLRUCache6LookupERKNS_5SliceE 1
eleveldb.so`_ZN7leveldb6crc32c6ExtendEjPKcm 1
eleveldb.so`_ZN7leveldb5Block11NewIteratorEPKNS_10ComparatorE 1
eleveldb.so`_ZN7leveldb5Block4IterD0Ev 1
libumem.so.1`umem_cache_alloc 1
libc.so.1`memcmp 1
libc.so.1`memcpy 1
libc.so.1`localtime_r 1
libc.so.1`clear_lockbyte 1
libc.so.1`mutex_unlock_queue 1
libc.so.1`mutex_lock_impl 2
beam.smp`process_main 3
[root@prod-2024 ~]# dtrace -n 'profile-997 /execname == "beam.smp" && arg1 && tid == 582/ { @[ufunc(arg1)] = count(); } tick-2s {printa(@); exit(0);}'
dtrace: description 'profile-997 ' matched 2 probes
CPU ID FUNCTION:NAME
7 69398 :tick-2s
beam.smp`buf_to_intlist 1
beam.smp`io_list_to_buf 1
beam.smp`encode_size_struct2 1
beam.smp`binary_to_term_1 1
beam.smp`tag_val_def 1
eleveldb.so`_ZN7leveldb6DBImpl19NewInternalIteratorERKNS_11ReadOptionsEPm 1
eleveldb.so`_ZNK7leveldb8SkipListIPKcNS_8MemTable13KeyComparatorEE18FindGreaterOrEqualERKS2_PPNS5_4NodeE 1
eleveldb.so`_ZN7leveldb10TableCache11NewIteratorERKNS_11ReadOptionsEmmPPNS_5TableE 1
eleveldb.so`_ZN7leveldb12_GLOBAL__N_116TwoLevelIteratorD1Ev 1
eleveldb.so`_ZN7leveldb12_GLOBAL__N_115ShardedLRUCache6LookupERKNS_5SliceE 1
eleveldb.so`_ZN7leveldb6crc32c6ExtendEjPKcm 1
eleveldb.so`_ZN7leveldb5Block11NewIteratorEPKNS_10ComparatorE 1
eleveldb.so`_ZN7leveldb9ReadBlockEPNS_16RandomAccessFileERKNS_11ReadOptionsERKNS_11BlockHandleEPPNS_5BlockE 1
libumem.so.1`umem_free 1
libumem.so.1`vmem_xfree 1
libc.so.1`strcmp 1
libc.so.1`clear_lockbyte 1
libc.so.1`mutex_lock_impl 1
libc.so.1`mutex_unlock 1
eleveldb.so`_ZNK7leveldb5Table11NewIteratorERKNS_11ReadOptionsE 2
libc.so.1`memcpy 2
beam.smp`process_main 4
eleveldb.so`_ZN7leveldb8FindFileERKNS_21InternalKeyComparatorERKSt6vectorIPNS_12FileMetaDataESaIS5_EERKNS_5SliceE 4
[root@prod-2024 ~]# dtrace -n 'profile-997 /execname == "beam.smp" && arg1 && tid == 579/ { @[ufunc(arg1)] = count(); } tick-2s {printa(@); exit(0);}'
dtrace: description 'profile-997 ' matched 2 probes
CPU ID FUNCTION:NAME
2 69398 :tick-2s
beam.smp`mbc_alloc 1
beam.smp`remote_send 1
beam.smp`enc_term 1
beam.smp`dsig_send 1
beam.smp`big_norm 1
beam.smp`erts_bs_private_append 1
beam.smp`erts_bs_append 1
beam.smp`tag_val_def 1
beam.smp`ethr_atomic32_inc 1
eleveldb.so`_ZN7leveldb10TableCache11NewIteratorERKNS_11ReadOptionsEmmPPNS_5TableE 1
eleveldb.so`_ZN7leveldb12_GLOBAL__N_115MergingIteratorD0Ev 1
eleveldb.so`_ZN7leveldb12_GLOBAL__N_115ShardedLRUCache6LookupERKNS_5SliceE 1
eleveldb.so`_ZN7leveldb5Block4IterD0Ev 1
eleveldb.so`_ZN7leveldb9ReadBlockEPNS_16RandomAccessFileERKNS_11ReadOptionsERKNS_11BlockHandleEPPNS_5BlockE 1
libc.so.1`mutex_trylock_adaptive 1
libc.so.1`mutex_unlock_queue 1
beam.smp`encode_size_struct2 2
beam.smp`erts_get_aligned_binary_bytes_extra 2
eleveldb.so`_ZN7leveldbL22GetLengthPrefixedSliceEPKc 2
eleveldb.so`_ZN7leveldb8FindFileERKNS_21InternalKeyComparatorERKSt6vectorIPNS_12FileMetaDataESaIS5_EERKNS_5SliceE 2
eleveldb.so`_ZN7leveldb6crc32c6ExtendEjPKcm 2
libc.so.1`pread 2
libc.so.1`memcpy 3
beam.smp`process_main 4
[root@prod-2024 ~]# dtrace -n 'profile-997 /execname == "beam.smp" && arg1 && tid == 581/ { @[ufunc(arg1)] = count(); } tick-4s {printa(@); exit(0);}'
dtrace: description 'profile-997 ' matched 2 probes
CPU ID FUNCTION:NAME
9 69403 :tick-4s
beam.smp`tree_delete 1
beam.smp`bf_unlink_free_block 1
beam.smp`get_free_block 1
beam.smp`and_2 1
beam.smp`eq 1
beam.smp`cmp 1
beam.smp`atom_to_list_1 1
beam.smp`encode_size_struct2 1
beam.smp`dsig_send 1
beam.smp`new_binary 1
beam.smp`small_to_big 1
beam.smp`small_times 1
beam.smp`erts_whereis_name 1
beam.smp`enif_get_list_cell 1
libstdc++.so.6.0.14`_ZdlPv 1
eleveldb.so`_ZN7leveldb12_GLOBAL__N_16DBIter4SeekERKNS_5SliceE 1
eleveldb.so`_ZNK7leveldb8SkipListIPKcNS_8MemTable13KeyComparatorEE14KeyIsAfterNodeERKS2_PNS5_4NodeE 1
eleveldb.so`_ZNK7leveldb8SkipListIPKcNS_8MemTable13KeyComparatorEE18FindGreaterOrEqualERKS2_PPNS5_4NodeE 1
eleveldb.so`_ZN7leveldb12_GLOBAL__N_115MergingIterator4SeekERKNS_5SliceE 1
eleveldb.so`_ZN7leveldb12_GLOBAL__N_116TwoLevelIterator4SeekERKNS_5SliceE 1
eleveldb.so`_ZN7leveldb12_GLOBAL__N_116TwoLevelIteratorD1Ev 1
eleveldb.so`_ZN7leveldb12_GLOBAL__N_115ShardedLRUCache7ReleaseEPNS_5Cache6HandleE 1
eleveldb.so`_ZN7leveldb5Block11NewIteratorEPKNS_10ComparatorE 1
eleveldb.so`_ZN7leveldb5Block4IterD0Ev 1
eleveldb.so`_ZNK7leveldb12_GLOBAL__N_121PosixRandomAccessFile4ReadEmmPNS_5SliceEPc 1
libumem.so.1`malloc 1
libumem.so.1`umem_depot_alloc 1
libumem.so.1`vmem_getseg 1
libumem.so.1`vmem_alloc 1
libc.so.1`memcpy 1
libc.so.1`___errno 1
libc.so.1`mutex_lock_impl 1
libc.so.1`cond_broadcast 1
libc.so.1`pthread_self 1
libumem.so.1`umem_cache_alloc 2
libc.so.1`memcmp 2
eleveldb.so`_ZN7leveldb8FindFileERKNS_21InternalKeyComparatorERKSt6vectorIPNS_12FileMetaDataESaIS5_EERKNS_5SliceE 3
eleveldb.so`_ZN7leveldb5Block4Iter4SeekERKNS_5SliceE 3
eleveldb.so`_ZN7leveldb12_GLOBAL__N_115ShardedLRUCache6LookupERKNS_5SliceE 5
beam.smp`process_main 10
^C[root@prod-2024 ~]# dtrace -n 'profile-997 /execname == "beam.smp" && arg1 && tid == 585/ { @[ufunc(arg1)] = count(); } tick-4s {printa(@); exit(0);}'
dtrace: description 'profile-997 ' matched 2 probes
CPU ID FUNCTION:NAME
5 69403 :tick-4s
beam.smp`mbc_free 1
beam.smp`tree_delete 1
beam.smp`bf_get_free_block 1
beam.smp`cmp 1
beam.smp`erts_send_message 1
beam.smp`schedule 1
beam.smp`erts_gc_mixed_times 1
beam.smp`decoded_size 1
beam.smp`encode_size_struct2 1
beam.smp`dec_term 1
beam.smp`build_term_list 1
beam.smp`erts_find_export_entry 1
libstdc++.so.6.0.14`_Znam 1
eleveldb.so`_ZN7leveldbL22GetLengthPrefixedSliceEPKc 1
eleveldb.so`_ZNK7leveldb8SkipListIPKcNS_8MemTable13KeyComparatorEE18FindGreaterOrEqualERKS2_PPNS5_4NodeE 1
eleveldb.so`_ZN7leveldbL10UnrefEntryEPvS0_ 1
eleveldb.so`_ZNK7leveldb5Table11NewIteratorERKNS_11ReadOptionsE 1
eleveldb.so`_ZN7leveldb12_GLOBAL__N_116TwoLevelIteratorD1Ev 1
eleveldb.so`_ZN7leveldb4HashEPKcmj 1
eleveldb.so`_ZN7leveldb5BlockC1EPKcm 1
libumem.so.1`malloc 1
libumem.so.1`highbit 1
libumem.so.1`umem_alloc 1
libumem.so.1`umem_free 1
libc.so.1`offtime_u 1
libc.so.1`pread 1
libc.so.1`mutex_lock 1
libc.so.1`lmutex_lock 1
libc.so.1`pthread_self 1
beam.smp`bf_link_free_block 2
beam.smp`tag_val_def 2
eleveldb.so`_ZN7leveldb12_GLOBAL__N_115ShardedLRUCache7ReleaseEPNS_5Cache6HandleE 2
eleveldb.so`_ZN7leveldb5Block11NewIteratorEPKNS_10ComparatorE 2
eleveldb.so`_ZN7leveldb5Block4IterD0Ev 2
libc.so.1`clear_lockbyte 2
eleveldb.so`_ZN7leveldb5Block4Iter4SeekERKNS_5SliceE 3
libc.so.1`memcmp 3
libc.so.1`memcpy 3
libc.so.1`mutex_lock_impl 4
beam.smp`process_main 6
eleveldb.so`_ZN7leveldb8FindFileERKNS_21InternalKeyComparatorERKSt6vectorIPNS_12FileMetaDataESaIS5_EERKNS_5SliceE 7
WEIRD WEIRD WEIRD WEIRD WEIRD WEIRD WEIRD WEIRD
So, how does this work?
Steps taken:
* riak_core_node_watcher:service_up(riak_kv, whereis(riak_kv_sup)).
* Use etop to watch until mailbox backlog for pid 0.661.0 grows to
about 500.
* riak_core_node_watcher:service_down(riak_kv).
* Move mouse pointer over to "iostat -x 1" window then hit ENTER a
few times to help mark the point in time that I'd run service_down().
* Move mouse over to "dtrace" window and run the script below.
* Watch the dtrace window for output @ END.
* Move mouse over to "iostat" window and press control-c to stop it.
What do I see in the "iostat" window? Zero read IOPS for the entire
10 second period that the dtrace script is running. Yet dtrace is
claiming that the process has made several hundred pread(2) calls that
have taken 16 milliseconds or longer. WEIRD.
dtrace -n 'BEGIN {start = timestamp; } pid$target::eleveldb_get:entry {self->active = 1; total++;} pid$target::eleveldb_get:return {self->active = 0;} syscall:::entry /pid == $target && self->active == 1/ { self->start = timestamp } syscall:::return /pid == $target && self->active == 1 && self->start > 0/ { @[probefunc] = quantize((timestamp - self->start) / 1000000) ; @l[probefunc] = sum(timestamp-self->start); self->start = 0; } tick-5sec { printa(@); printf("Total calls to func: %d in %d msec wall-clock time\n", total, (timestamp - start) / 1000000); normalize(@l, 1000000); printf("Report: total milliseconds per system call (remember, in parallel!):\n"); printa(@l); exit(0);}' -p 7610
dtrace: description 'BEGIN ' matched 470 probes
CPU ID FUNCTION:NAME
9 69404 :tick-10sec
close
value ------------- Distribution ------------- count
-1 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 9
1 | 0
open
value ------------- Distribution ------------- count
-1 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 10
1 | 0
pread
value ------------- Distribution ------------- count
-1 | 0
0 |@@@@@@@@@@@@ 171
1 | 0
2 | 0
4 | 0
8 | 0
16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ 367
32 |@ 15
64 | 0
Total calls to func: 604 in 9863 msec wall-clock time
Report: total milliseconds per system call (remember, in parallel!):
close 0
open 0
pread 9611
And here's the output of iostat -x 1 | egrep -v 'sd0|ramdisk',
starting shortly where I'd pressed ENTER to mark the time when I made
the transition for the riak_kv service from up -> down.
extended device statistics
device r/s w/s kr/s kw/s wait actv svc_t %w %b
sd1 22.0 200.0 2816.6 10669.3 0.0 0.2 0.9 0 12
extended device statistics
device r/s w/s kr/s kw/s wait actv svc_t %w %b
sd1 11.0 0.0 1282.4 0.0 0.0 0.1 6.7 0 6
extended device statistics
device r/s w/s kr/s kw/s wait actv svc_t %w %b
sd1 9.0 0.0 1089.1 0.0 0.0 0.1 6.1 0 5
extended device statistics
device r/s w/s kr/s kw/s wait actv svc_t %w %b
sd1 12.0 4.0 1534.9 16.0 0.0 0.1 5.2 0 8
extended device statistics
device r/s w/s kr/s kw/s wait actv svc_t %w %b
sd1 6.0 44.0 768.5 5431.2 0.0 0.1 1.5 0 4
extended device statistics
device r/s w/s kr/s kw/s wait actv svc_t %w %b
sd1 0.0 200.9 0.0 8396.9 0.0 0.1 0.3 0 1
extended device statistics
device r/s w/s kr/s kw/s wait actv svc_t %w %b
sd1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
extended device statistics
device r/s w/s kr/s kw/s wait actv svc_t %w %b
sd1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
extended device statistics
device r/s w/s kr/s kw/s wait actv svc_t %w %b
sd1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
extended device statistics
device r/s w/s kr/s kw/s wait actv svc_t %w %b
sd1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
extended device statistics
device r/s w/s kr/s kw/s wait actv svc_t %w %b
sd1 0.0 140.0 0.0 3399.8 0.0 0.0 0.2 0 0
extended device statistics
device r/s w/s kr/s kw/s wait actv svc_t %w %b
sd1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
extended device statistics
device r/s w/s kr/s kw/s wait actv svc_t %w %b
sd1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
extended device statistics
device r/s w/s kr/s kw/s wait actv svc_t %w %b
sd1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
extended device statistics
device r/s w/s kr/s kw/s wait actv svc_t %w %b
sd1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
extended device statistics
device r/s w/s kr/s kw/s wait actv svc_t %w %b
sd1 0.0 146.9 0.0 5718.7 0.0 0.0 0.3 0 1
extended device statistics
device r/s w/s kr/s kw/s wait actv svc_t %w %b
sd1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
extended device statistics
device r/s w/s kr/s kw/s wait actv svc_t %w %b
sd1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
^C
---
Timeline for weird-pread-latency.mov:
0:00 -- Upper left: DTrace window, currently idle
Lower left: Riak shell, currently idle
Upper right: iostat -x 1 | egrep -v 'sd0|ramdisk'
Lower right: etop output
0:03 -- Lower left: Enable the riak_kv service to start workload
0:08 -- Upper left: Get dtrace command ready
0:10 -- Lower right: Just watching the "MsgQ" column for a backlog
of about 600 messages. Once we hit that point, there's
about 10 seconds' worth of work left after the riak_kv service
is disabled (i.e., no new work arriving from outside)
1:14 -- Lower right: etop crashed, so I restarted it
2:00 -- Lower right: Finally above the 600 message mark, time to go to work
2:02 -- Lower left: Disable riak_kv service
2:03 -- Upper right: press ENTER a few times to mark when the riak_kv
service went down
2:05 -- Upper left: Start dtrace
2:17 -- Upper left: DTrace script has finished.
Upper right: iostat reported one read operation for the entire
approx. 10 seconds that the D script ran.
---
ANSWER for the puzzling pread latency problem: leveldb is reading 30MB
per request (an entire leveldb file), and that file is cached, so it
isn't generating any disk I/O. All the system time is the kernel
copying 30MB of data, at 22-24 milliseconds per operation. (Copying
~30MB per call at memory-copy speed works out to tens of milliseconds,
which lines up with the 16-32 msec pread bucket above.)
dtrace -qn 'pid$target::eleveldb_get:entry {self->active = 1; total++;} pid$target::eleveldb_get:return {self->active = 0;} syscall::pread:entry /pid == $target && self->active == 1 && arg2 > 1000000/ { printf("pread %d bytes offset %d fd %d path %s\n", arg2, arg3, arg0, fds[arg0].fi_pathname); }' -p 7610
Bummer: apparently the fds[] array isn't available inside a zone? There is no
output from this script except this:
dtrace: error on enabled probe ID 3 (ID 318: syscall::pread:entry): invalid kernel access in action #4 at DIF offset 64
dtrace: error on enabled probe ID 3 (ID 318: syscall::pread:entry): invalid kernel access in action #4 at DIF offset 64
Fine. So, we'll modify it a bit to provide a timestamp and omit the path.
dtrace -qn 'BEGIN { start = timestamp; } pid$target::eleveldb_get:entry {self->active = 1; total++;} pid$target::eleveldb_get:return {self->active = 0;} syscall::pread:entry /pid == $target && self->active == 1 && arg2 > 1000000/ { printf("time %d pread %d bytes offset %d fd %d\n", (timestamp - start) / 1000000000, arg2, arg3, arg0); }' -p 7610
time 3 pread 30929259 bytes offset 311941 fd 4397
time 3 pread 30929259 bytes offset 311941 fd 4397
time 3 pread 30929259 bytes offset 311941 fd 4397
time 3 pread 30929259 bytes offset 311941 fd 4397
time 3 pread 30929259 bytes offset 311941 fd 4397
time 3 pread 30929259 bytes offset 311941 fd 4397
time 3 pread 30929259 bytes offset 311941 fd 4397
time 3 pread 30929259 bytes offset 311941 fd 4397
[...]
dtrace -qn 'pid$target::_ZN7leveldb5SliceC1EPKcm:entry {this->size = (int) arg2; printf("size %d\n", this->size);}'
dtrace -n 'BEGIN { start = timestamp; } pid$target::_ZN7leveldb5SliceC1EPKcm:entry { self->data = arg1; self->size = (arg2 > 0) ? arg2 : 1; } pid$target::eleveldb_get:entry {self->active = 1; total++;} pid$target::eleveldb_get:return {self->active = 0;} syscall::pread:entry /pid == $target && self->active == 1 && arg2 > 50000/ { printf("time %d pread %d bytes offset %d fd %d self->size %d\n", (timestamp - start) / 1000000000, arg2, arg3, arg0, self->size); tracemem(copyin(self->data, 64), 64); }' -p 14922
11 318 pread:entry time 6 pread 62177 bytes offset 1427179 fd 2163 self->size 53
0 1 2 3 4 5 6 7 8 9 a b c d e f 0123456789abcdef
0: 10 00 00 00 03 0c b7 80 08 12 b1 5b ec 96 9b 2d ...........[...-
10: ce bf 72 b0 dd c0 08 12 98 cc e6 53 39 ac de 73 ..r........S9..s
20: 34 9c cc 26 33 09 cd 7e 63 34 9a cd 27 33 99 a4 4..&3..~c4..'3..
30: da 69 35 00 08 66 20 08 02 00 00 00 00 83 44 04 .i5..f .......D.
(riak@127.0.0.1)13> sext:decode(<<16#10, 16#00, 16#00, 16#00, 16#03, 16#0c, 16#b7, 16#80, 16#08, 16#12, 16#b1, 16#5b, 16#ec, 16#96, 16#9b, 16#2d,16#ce, 16#bf, 16#72, 16#b0, 16#dd, 16#c0, 16#08, 16#12, 16#98, 16#cc, 16#e6, 16#53, 16#39, 16#ac, 16#de, 16#73,16#34, 16#9c, 16#cc, 16#26, 16#33, 16#09, 16#cd, 16#7e, 16#63, 16#34, 16#9a, 16#cd, 16#27, 16#33, 16#99, 16#a4,16#da, 16#69, 16#35, 16#00, 16#08>>).
{o,<<"bodies_raw">>,<<"1323579490109_1454994645">>}
---
Later in the same day, Thu Jan 12, 6:21pm CST
Hrm. Well, what's a person to do? I wanted to do more investigation
with DTrace and eleveldb.so, but I wanted a shared lib that had pid
provider probes for class constructors, e.g. leveldb::Slice class. So
I copied the eleveldb.so shared lib from prod-2000 to prod-2024,
stopped Riak, then restarted to use the new shared lib.
What I forgot was that the restart would start the riak_kv service and
leave it running. Duh! The weird/crazy/but-also-nice thing is that
LevelDB apparently wrote enough data to trigger a merge (?),
and now the problem SST file is gone. *And* the latency problem has
disappeared.
I managed to make a copy of the allegedly-bad SST file; it's in the
tarball bad-latency.840833.sst.tar.gz.
-rw-r--r-- 0 riak riak 31242320 Jan 6 02:07 var/db/riak/leveldb/813531184842397132203223002586212227738165575680/840833.sst
... try to reproduce, starting from empty data dirs, Riak + eleveldb backend:
{ok, C} = riak:local_client().
[C:put(riak_object:new(<<"abcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcd">>, <<"qwerqwerqwerqwerqwerqwerqwerqwer">>, <<"valvalvalvalvalvalvalvalvalvalvalval">>)) || _ <- lists:seq(1,11*1000)].
... will create 3 .sst files. Then clobber those 3 .sst files with the
bad one from XXXer. ... Bah, that doesn't work.
---
Run from sbb:
#!/bin/sh
for i in {0,1,2,3}{0,1,2,3,4,5,6,7,8,9} 40; do
ssh -n -p 2222 root@prod-20$i.XXXer.com "/usr/bin/time curl http://prod-20$i.XXXer.com:8098/stats 2>&1 | awk '/real/ { print \"prod-20$i\", \$0}' " &
done | sort -nr -k 3
---
Ring partition ownership helper functions:
Fowned_partitions = fun(Owners, Node) -> [P || {P, Owner} <- Owners, Owner =:= Node] end.
Fvnode_pids = fun(Node) -> [Pid || {_,Pid,_,_} <- supervisor:which_children({riak_core_vnode_sup, Node})] end.
Frunning_vnodes = fun(Node) -> Pids = Fvnode_pids(Node), [rpc:call(Node, riak_core_vnode, get_mod_index, [Pid], 30000) || Pid <- Pids] end.
Factive_partitions = fun(Node) -> lists:foldl(fun({_,P}, Ps) -> ordsets:add_element(P, Ps) end, [], Frunning_vnodes(Node)) end.
Fpartitions = fun(Node, Ring) -> Owners = riak_core_ring:all_owners(Ring), Owned = ordsets:from_list(Fowned_partitions(Owners, Node)), Active = ordsets:from_list(Factive_partitions(Node)), Stopped = ordsets:subtract(Owned, Active), Secondary = ordsets:subtract(Active, Owned), Primary = ordsets:subtract(Active, Secondary), {Primary, Secondary, Stopped} end.
{ok, Ring1} = riak_core_ring_manager:get_my_ring().
{ok, RRing1} = riak_core_ring_manager:get_raw_ring().
(media-riak@prod-2024-internal.XXXer.com)33> riak_core_ring:index_owner(Ring1, 1444374665018431399630985401082889078019339583488).
'media-riak@prod-2024-internal.XXXer.com'
(media-riak@prod-2024-internal.XXXer.com)34> riak_core_ring:index_owner(Ring1, 552344857077206473969556670176954617780122943488).
'media-riak@prod-2023-internal.XXXer.com'
{Primary1, Secondary1, Stopped1} = Fpartitions(node(), Ring1).
(media-riak@prod-2024-internal.XXXer.com)57> lists:usort([riak_core_ring:next_owner(Ring1, II) || II <- Primary1 ++ Secondary1]).
[{undefined,undefined,undefined}]
Hmmmm, I don't believe that's good: next_owner() does most of
the heavy lifting for riak_core_vnode:should_handoff().
{ok, RRing1} = riak_core_ring_manager:get_raw_ring().
element(7, RRing1). %% element 7 = #chstate_v2.next list
[]
----------
Interruption: error_logger <0.6.0> on prod-2000 is apparently hung.
(media-riak@prod-2000-internal.XXXer.com)19> io:format("~s\n", [element(2,process_info(pid(0,6,0), backtrace))]).
Program counter: 0x00000000030bb790 (gen:do_call/4 + 576)
CP: 0x0000000000000000 (invalid)
arity = 0
0xfffffd7fe59ec4c0 Return addr 0x00000000030c32d0 (gen_event:rpc/2 + 96)
y(0) #Ref<0.0.76.177386>
y(1) 'media-riak@prod-2000-internal.XXXer.com'
y(2) []
y(3) infinity
y(4) {sync_notify,{log,3,{["2012",45,["0",49],45,"14"],["16",58,"24",58,["0",50],46,"534"]},[["[","error","] "],["<0.22085.415>",32],"webmachine error: path=\"/riak/bodies_raw/1326326952460_e24032fb530dcd8a_ba15cfa417c2a5a2\"\n{error,{exit,normal,[{webmachine_request,read_chunk_length,3},{webmachine_request,recv_chunked_body,3},{webmachine_request,read_whole_stream,5},{webmachine_request,call,2},{wrq,req_body,1},{riak_kv_wm_object,accept_doc_body,2},{webmachine_resource,resource_call,3},{webmachine_resource,do,3}]}}"]}}
y(5) <0.6.0>
y(6) <0.41.0>
0xfffffd7fe59ec500 Return addr 0x0000000004777e18 (error_logger_lager_h:handle_event/2 + 6072)
0xfffffd7fe59ec508 Return addr 0x00000000030c73c0 (gen_event:server_update/4 + 272)
y(0) []
y(1) []
y(2) []
y(3) []
y(4) []
y(5) []
y(6) {}
0xfffffd7fe59ec548 Return addr 0x00000000030c6fa8 (gen_event:server_notify/4 + 136)
y(0) error_logger
y(1) {error,<0.89.0>,{<0.22085.415>,"webmachine error: path=~p~n~p~n",["/riak/bodies_raw/1326326952460_e24032fb530dcd8a_ba15cfa417c2a5a2",{error,{exit,normal,[{webmachine_request,read_chunk_length,3},{webmachine_request,recv_chunked_body,3},{webmachine_request,read_whole_stream,5},{webmachine_request,call,2},{wrq,req_body,1},{riak_kv_wm_object,accept_doc_body,2},{webmachine_resource,resource_call,3},{webmachine_resource,do,3}]}}]}}
y(2) {handler,error_logger_lager_h,false,{},<0.47.0>}
y(3) {}
y(4) error_logger_lager_h
y(5) Catch 0x00000000030c73c0 (gen_event:server_update/4 + 272)
0xfffffd7fe59ec580 Return addr 0x00000000030c3ea0 (gen_event:handle_msg/5 + 256)
y(0) error_logger
y(1) handle_event
y(2) {error,<0.89.0>,{<0.22085.415>,"webmachine error: path=~p~n~p~n",["/riak/bodies_raw/1326326952460_e24032fb530dcd8a_ba15cfa417c2a5a2",{error,{exit,normal,[{webmachine_request,read_chunk_length,3},{webmachine_request,recv_chunked_body,3},{webmachine_request,read_whole_stream,5},{webmachine_request,call,2},{wrq,req_body,1},{riak_kv_wm_object,accept_doc_body,2},{webmachine_resource,resource_call,3},{webmachine_resource,do,3}]}}]}}
y(3) []
0xfffffd7fe59ec5a8 Return addr 0x00000000030de690 (proc_lib:init_p_do_apply/3 + 56)
y(0) []
y(1) error_logger
y(2) <0.2.0>
0xfffffd7fe59ec5c8 Return addr 0x00000000006df368 (<terminate process normally>)
y(0) Catch 0x00000000030de6b0 (proc_lib:init_p_do_apply/3 + 88)
ok
(media-riak@prod-2000-internal.XXXer.com)20> io:format("~s\n", [element(2,process_info(pid(0,41,0), backtrace))]).
Program counter: 0x0000000004782d88 (io:wait_io_mon_reply/2 + 56)
CP: 0x0000000000000000 (invalid)
arity = 0
0xfffffd7ff73b8aa0 Return addr 0x00000000047806d8 (io:o_request/3 + 96)
y(0) #Ref<0.0.76.177389>
y(1) <0.37.0>
0xfffffd7ff73b8ab8 Return addr 0x00000000046fe798 (lager_console_backend:handle_event/2 + 752)
y(0) []
y(1) []
y(2) put_chars
y(3) {put_chars,unicode,[["16",58,"24",58,["0",50],46,"534"]," ",["[","error","] "],"webmachine error: path=\"/riak/bodies_raw/1326326952460_e24032fb530dcd8a_ba15cfa417c2a5a2\"\n{error,{exit,normal,[{webmachine_request,read_chunk_length,3},{webmachine_request,recv_chunked_body,3},{webmachine_request,read_whole_stream,5},{webmachine_request,call,2},{wrq,req_body,1},{riak_kv_wm_object,accept_doc_body,2},{webmachine_resource,resource_call,3},{webmachine_resource,do,3}]}}","\n"]}
y(4) <0.37.0>
0xfffffd7ff73b8ae8 Return addr 0x00000000030c73c0 (gen_event:server_update/4 + 272)
y(0) {state,6,false}
0xfffffd7ff73b8af8 Return addr 0x00000000030c6fa8 (gen_event:server_notify/4 + 136)
y(0) lager_event
y(1) {log,3,{["2012",45,["0",49],45,"14"],["16",58,"24",58,["0",50],46,"534"]},[["[","error","] "],["<0.22085.415>",32],"webmachine error: path=\"/riak/bodies_raw/1326326952460_e24032fb530dcd8a_ba15cfa417c2a5a2\"\n{error,{exit,normal,[{webmachine_request,read_chunk_length,3},{webmachine_request,recv_chunked_body,3},{webmachine_request,read_whole_stream,5},{webmachine_request,call,2},{wrq,req_body,1},{riak_kv_wm_object,accept_doc_body,2},{webmachine_resource,resource_call,3},{webmachine_resource,do,3}]}}"]}
y(2) {handler,lager_console_backend,false,{state,6,false},<0.44.0>}
y(3) {state,6,false}
y(4) lager_console_backend
y(5) Catch 0x00000000030c73c0 (gen_event:server_update/4 + 272)
0xfffffd7ff73b8b30 Return addr 0x00000000030c7088 (gen_event:server_notify/4 + 360)
y(0) lager_event
y(1) handle_event
y(2) {log,3,{["2012",45,["0",49],45,"14"],["16",58,"24",58,["0",50],46,"534"]},[["[","error","] "],["<0.22085.415>",32],"webmachine error: path=\"/riak/bodies_raw/1326326952460_e24032fb530dcd8a_ba15cfa417c2a5a2\"\n{error,{exit,normal,[{webmachine_request,read_chunk_length,3},{webmachine_request,recv_chunked_body,3},{webmachine_request,read_whole_stream,5},{webmachine_request,call,2},{wrq,req_body,1},{riak_kv_wm_object,accept_doc_body,2},{webmachine_resource,resource_call,3},{webmachine_resource,do,3}]}}"]}
y(3) []
0xfffffd7ff73b8b58 Return addr 0x00000000030c4008 (gen_event:handle_msg/5 + 616)
y(0) {handler,lager_file_backend,false,{state,[{file,"/var/log/riak/error.log",3,{file_descriptor,prim_file,{#Port<0.1773>,11}},880192,false,10485760,[{hour,0}],5},{file,"/var/log/riak/console.log",6,{file_descriptor,prim_file,{#Port<0.1774>,12}},880193,false,10485760,[{hour,0}],5}]},<0.45.0>}
0xfffffd7ff73b8b68 Return addr 0x00000000030de690 (proc_lib:init_p_do_apply/3 + 56)
y(0) []
y(1) lager_event
y(2) <0.40.0>
y(3) {<0.6.0>,#Ref<0.0.76.177386>}
y(4) <0.6.0>
0xfffffd7ff73b8b98 Return addr 0x00000000006df368 (<terminate process normally>)
y(0) Catch 0x00000000030de6b0 (proc_lib:init_p_do_apply/3 + 88)
ok
NOTE: We've been waiting on the same ref, #Ref<0.0.76.177389>, for
many minutes now. :-(
This may be a DTrace-related I/O problem (though no Erlang VM probes
are enabled that I know of). Or not. I can't tell.
UPDATE: See below for more info.
---
OK, on to trying to see how many LevelDB calls might be executing
simultaneously.
dtrace -qn 'BEGIN { highest = 0; cnt = 0; } pid$target:eleveldb.so:eleveldb_*:entry { cnt++ ; highest = (cnt > highest) ? cnt : highest; self->start = timestamp; } pid$target:eleveldb.so:eleveldb_*:return /self->start > 0/ { cnt = (cnt < 1) ? 0 : (cnt -1); @ = quantize((timestamp - self->start) / 1000000); self->start = 0;} tick-5sec { printf("Highest is %d\n", highest); highest = 0;} tick-30sec { cnt = 0; /*printf("Breakdown in milliseconds\n"); printa(@);*/ trunc(@);} ' -p 14922
---
"Bad SST" ... may not be bad, but stupid app asking for the same
big-ass key, {<<"bodies_raw">>, <<"key">>}, zillions of times.
{_, M} = process_info(pid(0,544,0), messages).
Ks = lists:sort([element(2, element(4, element(2, GE))) || GE <- M, element(1, element(4, element(2, GE))) == riak_kv_get_req_v1]).
Weird = [BK || {<<"bodies_raw">>, <<"key">>} = BK <- Ks].
(media-riak@prod-2023-internal.XXXer.com)27> length(Weird).
30590
And the value stored under that key is 50MB.
(media-riak@prod-2024-internal.XXXer.com)9> file:write_file("/tmp/bad-sst-but-really-big-bkey-get.bin", term_to_binary(M)).
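A quick way to double-check that size straight from the backend (a sketch only:
it assumes the sext-encoded {o, Bucket, Key} layout seen in the hex dump above,
should be run against whichever vnode's leveldb directory holds a replica of
the key -- the path here is just borrowed from the session below -- and the
eleveldb:get/3 usage is from memory):
{ok, DbRef} = eleveldb:open("/var/db/riak/leveldb/812103937149691172322164716616762732601782829056", []),
BKey = sext:encode({o, <<"bodies_raw">>, <<"key">>}),
case eleveldb:get(DbRef, BKey, []) of
    {ok, Val} -> io:format("stored value is ~p bytes\n", [byte_size(Val)]);  %% expect ~50MB
    not_found -> io:format("bkey not in this vnode's leveldb\n", [])
end.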
----
(media-riak@prod-2023-internal.XXXer.com)1> {ok, Ref} = eleveldb:open("/var/db/riak/leveldb/812103937149691172322164716616762732601782829056", []).
{ok,<<>>}
(media-riak@prod-2023-internal.XXXer.com)3> X = eleveldb:fold_keys(Ref, fun(X, Acc) -> [X|Acc] end, [], []).
(media-riak@prod-2023-internal.XXXer.com)4> file:write_file("/var/tmp/keys-just-in-case.bin", term_to_binary(X)).
ok
(media-riak@prod-2023-internal.XXXer.com)5> length(X).
3864400
(media-riak@prod-2023-internal.XXXer.com)6> sext:decode(hd(X)).
{o,<<"messages">>,
<<"zyana.casillas.fb.100002111391145:contacts">>}
fun() ->
%% Dump every key in this vnode's leveldb to a length-prefixed binary file,
%% so we have a copy to poke at offline.
{ok, Ref} = eleveldb:open("/var/db/riak/leveldb/812103937149691172322164716616762732601782829056", []),
{ok, FH} = file:open("/var/tmp/812103937149691172322164716616762732601782829056.slfhackdump", [write, binary, raw, compressed]),
WriteOne = fun(X, {FH2, Count}) ->
XBin = term_to_binary(X),
XBinSize = size(XBin),
%% Record format: 4-byte big-endian size, then the term_to_binary blob.
file:write(FH2, [<<XBinSize:32>>, XBin]),
%% Test cap: bail out after 100 records ("TEST IT!"); remove for a full dump.
if Count /= 100 -> ok; true -> error_logger:info_msg("Hack at ~p, TEST IT!\n", [Count]), file:close(FH2), exit(normal) end,
if Count rem 10000 == 0 -> error_logger:info_msg("Hack dump wrote ~p records\n", [Count]); true -> ok end,
{FH2, Count+1}
end,
{_, Final} = eleveldb:fold_keys(Ref, WriteOne, {FH, 1}, []),
error_logger:info_msg("Hack dump final: wrote ~p records\n", [Final]),
exit(normal)
end.
---
dtrace -qn 'BEGIN { highest = 0; cnt = 0; } pid$target:eleveldb.so:eleveldb_*:entry,pid$target:eleveldb.so:*DoCompactionWork*:entry { cnt++ ; highest = (cnt > highest) ? cnt : highest; } pid$target:eleveldb.so:eleveldb_*:return,pid$target:eleveldb.so:*DoCompactionWork*:return { cnt = (cnt < 1) ? 0 : (cnt -1); } tick-5sec { printf("Highest is %d\n", highest); highest = 0;} tick-30sec { printf("reset\n"); cnt = 0;} ' -p 23271
Highest is 9
Highest is 8
Highest is 7
Highest is 10
Highest is 10
[...]
dtrace -qn 'pid$target:eleveldb.so:*DoCompactionWork*:entry {self->start = timestamp;} pid$target:eleveldb.so:*DoCompactionWork*:return { printf("compaction %d msec\n", (timestamp - self->start) / 1000000);}' -p 23271
... Hm hrm hmmmm very interesting, see 2011-12-07.XXXer.data/slf.sh and
slf.d implementation.
prod-2013 compaction 3445242949531516 -> 3445579140520676 = 336190 msec
prod-2016 compaction 3444935493726522 -> 3445264709366428 = 329215 msec
prod-2017 compaction 3445574248440866 -> 3445866808284379 = 292559 msec
prod-2016 compaction 3446367501174857 -> 3446645210490365 = 277709 msec
prod-2004 compaction 3446351606124390 -> 3446569990225495 = 218384 msec
prod-2009 compaction 3447014449109281 -> 3447087977597887 = 73528 msec
prod-2010 compaction 3446561322573171 -> 3446630495727881 = 69173 msec
prod-2009 compaction 3444289135231845 -> 3444357866368973 = 68731 msec
prod-2024 compaction 3218058352359332 -> 3218126783109779 = 68430 msec
prod-2002 compaction 3445177206907731 -> 3445244794304685 = 67587 msec
prod-2002 compaction 3446004235060663 -> 3446065053403479 = 60818 msec
prod-2011 compaction 3446845235168486 -> 3446904391635234 = 59156 msec
prod-2016 compaction 3444818206254735 -> 3444877176239571 = 58969 msec
prod-2006 compaction 3445691300975842 -> 3445747795173234 = 56494 msec
prod-2006 compaction 3447533560070312 -> 3447588734869068 = 55174 msec
prod-2000 compaction 3447230625813014 -> 3447284372201923 = 53746 msec
prod-2013 compaction 3446347422456985 -> 3446398488149603 = 51065 msec
prod-2002 compaction 3446112111566514 -> 3446160085767080 = 47974 msec
prod-2000 compaction 3446089380507528 -> 3446129998030687 = 40617 msec
prod-2009 compaction 3443726676664708 -> 3443766386818881 = 39710 msec
prod-2017 compaction 3443354706109044 -> 3443391235674135 = 36529 msec
prod-2006 compaction 3445627567244311 -> 3445658942845795 = 31375 msec
prod-2009 compaction 3446209955030151 -> 3446239145536200 = 29190 msec
prod-2020 compaction 3218239502629725 -> 3218266849931259 = 27347 msec
---
Notes on the plane (see notebook for originals)
* Enumerate _observed_ problems with the Joyent cluster:
1. ** Node 'media-riak@prod-20XX-internal.XXXer.com' not responding **
NOTE: This happened 20 times to 16 unique machines as observed
by prod-2039 on 15-Jan-2012
2. Slow HTTP queries to /stats and (!) /ping URIs
NOTE: This happens even when the curl client is on localhost, so
network effects (e.g., packet loss) are not a factor
3. eleveldb interference during the monster merges at startup, e.g. in the
wee hours of Sunday 15-Jan-2012.
* Investigate everything an HTTP /stats query does: how many Erlang
processes does the service communicate with? Though we also see
latency problems with the /ping URI, I've seen curl requests to
http://localhost:8098/stats take over 60 seconds.
* We don't have any meaningful network stats history. XXXer hasn't
set up Cacti or equivalent yet. So I believe we need to monitor
this ourselves on each node and keep a history, polling data from
"netstat -s" on a very frequent basis, e.g. every 5 seconds.
* Would it be helpful to have a D script to watch the internal VM
functions for sending & receiving distributed Erlang messages? This
could also confirm what the cluster-wide "not responding" errors claim:
that the complained-about node takes a nap/goes on holiday for at
least 60 seconds and sometimes for several minutes. (A rough
Erlang-side logger sketch appears just after this list.)
- We still don't know if this problem is related to network problems
(e.g., packet loss), NIF interference with Erlang scheduler threads,
both, or neither.
* Has Matt & crew found the app code path that's generating the
bizillion requests for the bogus 30-50MB {<<"bodies_raw">>,
<<"key">>} bkey?
- If not, then how about patching the get & put FSMs to look for that
!@#$! bkey and complain whenever they see it? Or ignore any op
with that key? Or return an HTTP 972 CRAZY REPLY to provoke their
app into crashing ASAP?
* OT: create D to decode Erlang empty list, atom
* OT: resolve R15B source merge conflict
* OT: pester CDL about network monitoring best practices
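Re: the nodedown-confirmation idea above, here's a rough Erlang-side sketch
(not the proposed D script): it logs every nodeup/nodedown with its reason and
a local timestamp, so we can measure exactly how long a "not responding" node
stays gone. The consoles already show net_kernel events; this is just a
self-contained way to capture them on any node.
NodeEventLogger = fun() ->
    ok = net_kernel:monitor_nodes(true, [{node_type, visible}, nodedown_reason]),
    Loop = fun(F) ->
               receive
                   {nodeup, Node, Info} ->
                       error_logger:info_msg("nodeup ~p ~p ~p\n",
                                             [Node, Info, calendar:local_time()]);
                   {nodedown, Node, Info} ->
                       error_logger:info_msg("nodedown ~p ~p ~p\n",
                                             [Node, Info, calendar:local_time()])
               end,
               F(F)
           end,
    Loop(Loop)
end.
spawn(NodeEventLogger).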
----
Output from slf.d:
January 17, 2012 05:38:18 AM UTC
prod-2017 tid 593 compaction 3453448176397581 -> 3453774276161487 = 326099 msec
Report from prod-2000 (and all others, though not included here):
2012-01-17 05:36:56.216 ...** Node 'media-riak@prod-2017-internal.XXXer.com' not responding **
2012-01-17 05:36:56.492 ...net_kernel event: {nodedown,'media-riak@prod-2017-internal.XXXer.com',[{nodedown_reason,net_tick_timeout},{node_type,visible}]}
2012-01-17 05:38:58.641 ...net_kernel event: {nodeup,'media-riak@prod-2017-internal.XXXer.com',[{node_type,visible}]}
January 17, 2012 05:30:18 AM UTC
prod-2014 tid 594 compaction 3453278623074909 -> 3453568805849073 = 290182 msec
Report from prod-2000 (and all others, though not included here):
2012-01-17 05:29:24.449 ...** Node 'media-riak@prod-2014-internal.XXXer.com' not responding **
2012-01-17 05:29:24.678 ...net_kernel event: {nodedown,'media-riak@prod-2014-internal.XXXer.com',[{nodedown_reason,net_tick_timeout},{node_type,visible}]}
2012-01-17 05:30:29.045 ...net_kernel event: {nodeup,'media-riak@prod-2014-internal.XXXer.com',[{node_type,visible}]}
January 17, 2012 05:40:18 AM UTC
prod-2012 tid 593 compaction 3454206245670796 -> 3454284912945709 = 78667 msec
No nodedown events reported
January 17, 2012 05:00:18 AM UTC
prod-2010 tid 593 compaction 3451910297021077 -> 3451988140075538 = 77843 msec
No nodedown events reported
---
2012-01-18 Wednesday
On the LevelDB front, we appear to have found two different bugs:
* A single compaction op can "Expand" to add thousands of extra files
* While trying to recreate the above bug, found another apparent bug
where a single eleveldb put triggers a cascade of 3-file (2-file?)
compactions, dozens or hundreds of them, that blocks the put for 30+
seconds. Each little compaction appears to involve the same number of
files.
----
More on the error-logger-is-blocked front.
My reading of this is that the lager_event process, <0.43.0>, is
blocked waiting for a reply from its group leader, <0.39.0>. The
group leader's mailbox is empty, so what happened? Did the
{io_request, ...} message never arrive at the group leader, or did
the group leader's reply never arrive back? In either case, how
could such intra-machine messages ever get lost??
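As a shortcut for spotting this state without dumping backtraces by hand,
something like this (just a sketch) lists every local process currently parked
in io:wait_io_mon_reply/2, plus its registered name and group leader:
StuckOnIo = fun() ->
    [{Pid,
      proplists:get_value(registered_name, Info),
      proplists:get_value(group_leader, Info)}
     || Pid <- processes(),
        Info <- [process_info(Pid, [current_function, registered_name,
                                    group_leader])],
        Info =/= undefined,
        proplists:get_value(current_function, Info) =:=
            {io, wait_io_mon_reply, 2}]
end.
StuckOnIo().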
(media-riak@prod-2000-internal.XXXer.com)5> lists:sort([{Size, Pid}
(media-riak@prod-2000-internal.XXXer.com)5> || Pid <- processes(),
(media-riak@prod-2000-internal.XXXer.com)5> {message_queue_len, Size} <-
(media-riak@prod-2000-internal.XXXer.com)5> [catch process_info(Pid, message_queue_len)],
(media-riak@prod-2000-internal.XXXer.com)5> Size > 0]).
[{1,<0.32.0>},{16,<0.43.0>},{261,<0.89.0>},{2856,<0.5.0>}]
(media-riak@prod-2000-internal.XXXer.com)6>
(media-riak@prod-2000-internal.XXXer.com)6>
(media-riak@prod-2000-internal.XXXer.com)6>
(media-riak@prod-2000-internal.XXXer.com)6> io:format("~s\n", [element(2,process_info(pid(0,5,0), backtrace))]).
Program counter: 0x00000000030aa5a0 (gen:do_call/4 + 576)
CP: 0x0000000000000000 (invalid)
arity = 0
0xfffffd7fd0445600 Return addr 0x00000000030b6a18 (gen_event:rpc/2 + 96)
y(0) #Ref<0.0.22.18473>
y(1) 'media-riak@prod-2000-internal.XXXer.com'
y(2) []
y(3) infinity
y(4) {sync_notify,{log,3,{["2012",45,["0",49],45,"19"],[["0",49],58,"31",58,"32",46,["0",50,50]]},[["[","error","] "],["<0.19513.124>",32],"webmachine error: path=\"/riak/bodies_raw/1326936634921_0291564814_f691cea\"\n{error,{exit,normal,[{webmachine_request,read_chunk_length,3},{webmachine_request,recv_chunked_body,3},{webmachine_request,read_whole_stream,5},{webmachine_request,call,2},{wrq,req_body,1},{riak_kv_wm_object,accept_doc_body,2},{webmachine_resource,resource_call,3},{webmachine_resource,do,3}]}}"]}}
y(5) <0.5.0>
y(6) <0.43.0>
0xfffffd7fd0445640 Return addr 0x0000000004781518 (error_logger_lager_h:handle_event/2 + 6072)
0xfffffd7fd0445648 Return addr 0x00000000030bab08 (gen_event:server_update/4 + 272)
y(0) []
y(1) []
y(2) []
y(3) []
y(4) []
y(5) []
y(6) {}
0xfffffd7fd0445688 Return addr 0x00000000030ba6f0 (gen_event:server_notify/4 + 136)
y(0) error_logger
y(1) {error,<0.91.0>,{<0.19513.124>,"webmachine error: path=~p~n~p~n",["/riak/bodies_raw/1326936634921_0291564814_f691cea",{error,{exit,normal,[{webmachine_request,read_chunk_length,3},{webmachine_request,recv_chunked_body,3},{webmachine_request,read_whole_stream,5},{webmachine_request,call,2},{wrq,req_body,1},{riak_kv_wm_object,accept_doc_body,2},{webmachine_resource,resource_call,3},{webmachine_resource,do,3}]}}]}}
y(2) {handler,error_logger_lager_h,false,{},<0.49.0>}
y(3) {}
y(4) error_logger_lager_h
y(5) Catch 0x00000000030bab08 (gen_event:server_update/4 + 272)
0xfffffd7fd04456c0 Return addr 0x00000000030b75e8 (gen_event:handle_msg/5 + 256)
y(0) error_logger
y(1) handle_event
y(2) {error,<0.91.0>,{<0.19513.124>,"webmachine error: path=~p~n~p~n",["/riak/bodies_raw/1326936634921_0291564814_f691cea",{error,{exit,normal,[{webmachine_request,read_chunk_length,3},{webmachine_request,recv_chunked_body,3},{webmachine_request,read_whole_stream,5},{webmachine_request,call,2},{wrq,req_body,1},{riak_kv_wm_object,accept_doc_body,2},{webmachine_resource,resource_call,3},{webmachine_resource,do,3}]}}]}}
y(3) []
0xfffffd7fd04456e8 Return addr 0x00000000030afec8 (proc_lib:init_p_do_apply/3 + 56)
y(0) []
y(1) error_logger
y(2) <0.2.0>
0xfffffd7fd0445708 Return addr 0x00000000006df368 (<terminate process normally>)
y(0) Catch 0x00000000030afee8 (proc_lib:init_p_do_apply/3 + 88)
ok
(media-riak@prod-2000-internal.XXXer.com)7> process_info(pid(054,0)).
** exception error: undefined shell command pid/2
(media-riak@prod-2000-internal.XXXer.com)8> process_info(pid(0,5,0)).
[{registered_name,error_logger},
{current_function,{gen,do_call,4}},
{initial_call,{proc_lib,init_p,5}},
{status,waiting},
{message_queue_len,2858},
{messages,[{notify,{error,<0.91.0>,
{<0.30723.124>,"webmachine error: path=~p~n~p~n",
["/riak/bodies_raw/1326936665946_0712657087_890c94f",
{error,{exit,normal,
[{webmachine_request,read_chunk_length,3},
{webmachine_request,recv_chunked_body,3},
{webmachine_request,read_whole_stream,...},
{webmachine_request,...},
{...}|...]}}]}}},
{notify,{error,<0.91.0>,
{<0.31283.124>,"webmachine error: path=~p~n~p~n",
["/riak/bodies_raw/1326927019020_4817631435_19183809",
{error,{exit,normal,
[{webmachine_request,read_chunk_length,3},
{webmachine_request,recv_chunked_body,...},
{webmachine_request,...},
{...}|...]}}]}}},
{notify,{error,<0.91.0>,
{<0.27205.124>,"webmachine error: path=~p~n~p~n",
["/riak/bodies_raw/1326936639145_0eefb4d7ced50272_d431aae4335c28d6",
{error,{exit,normal,
[{webmachine_request,read_chunk_length,...},
{webmachine_request,...},
{...}|...]}}]}}},
{notify,{error,<0.91.0>,
{<0.19987.125>,"webmachine error: path=~p~n~p~n",
["/riak/bodies_raw/1326936749141_2996738267",
{error,{exit,normal,
[{webmachine_request,...},{...}|...]}}]}}},
{notify,{error,<0.91.0>,
{<0.20989.125>,"webmachine error: path=~p~n~p~n",
["/riak/bodies_raw/1326936319977_0755845112_1753401",
{error,{exit,normal,[{...}|...]}}]}}},
{notify,{error,<0.91.0>,
{<0.18515.121>,"webmachine error: path=~p~n~p~n",
["/riak/bodies_raw/1326936525750_0871077864_f2af806",
{error,{exit,normal,[...]}}]}}},
{notify,{error,<0.91.0>,
{<0.24401.125>,"webmachine error: path=~p~n~p~n",
["/riak/bodies_raw/1326936763467_1191873287",
{error,{exit,normal,...}}]}}},
{notify,{error,<0.91.0>,
{<0.7968.126>,"webmachine error: path=~p~n~p~n",
["/riak/bodies_raw/lrg_1326930983572_279395581_7b1d173f",
{error,{error,...}}]}}},
{notify,{error,<0.91.0>,
{<0.7101.126>,"webmachine error: path=~p~n~p~n",
["/riak/bodies_raw/1326936457880_bab39ccd82aa0131_da56ea4b2b8643b3",
{error,{...}}]}}},
{notify,{error,<0.91.0>,
{<0.21435.126>,"webmachine error: path=~p~n~p~n",
["/riak/bodies_raw/1326936030608_c540d7f481f91100_ccd676e4dabd9b62",
{error,...}]}}},
{notify,{error,<0.91.0>,
{<0.1781.127>,"webmachine error: path=~p~n~p~n",
["/riak/bodies_raw/1326936083601_0249619846_0a82470",
{...}]}}},
{notify,{error,<0.91.0>,
{<0.20431.127>,"webmachine error: path=~p~n~p~n",
[[...]|...]}}},
{notify,{error,<0.91.0>,
{<0.28118.127>,"webmachine error: path=~p~n~p~n",[...]}}},
{notify,{error,<0.91.0>,{<0.4580.128>,[...],...}}},
{notify,{error,<0.91.0>,{<0.11898.128>,...}}},
{notify,{error,<0.91.0>,{...}}},
{notify,{error,<0.91.0>,...}},
{notify,{error,...}},
{notify,{...}},
{notify,...},
{...}|...]},
{links,[<0.32.0>,<0.49.0>,<0.0.0>]},
{dictionary,[{'$ancestors',[<0.2.0>]},
{'$initial_call',{gen_event,init_it,6}}]},
{trap_exit,true},
{error_handler,error_handler},
{priority,normal},
{group_leader,<0.32.0>},
{total_heap_size,1346269},
{heap_size,1346269},
{stack_size,35},
{reductions,6818201},
{garbage_collection,[{min_bin_vheap_size,46368},
{min_heap_size,233},
{fullsweep_after,0},
{minor_gcs,0}]},
{suspending,[]}]
(media-riak@prod-2000-internal.XXXer.com)9> process_info(pid(0,43,0)).
[{registered_name,lager_event},
{current_function,{io,wait_io_mon_reply,2}},
{initial_call,{proc_lib,init_p,5}},
{status,waiting},
{message_queue_len,16},
{messages,[{<0.11163.124>,
{<0.11163.124>,#Ref<0.0.22.3946>},
{sync_notify,{log,6,
{["2012",45,["0",49],45,"19"],
[["0",49],58,"31",58,"17",46,"799"]},
[["[","info","] "],
["<0.11163.124>",64,"riak_core_handoff_sender",58,
"start_fold",58,"39",32],
"Starting handoff of partition riak_kv_vnode 1084708246456529509604297336781616303650887434240 from 'media-riak@prod-2000-internal.XXXer.com' to 'media-riak@prod-2006-internal.XXXer.com'"]}}},
{<0.11237.124>,
{<0.11237.124>,#Ref<0.0.22.5130>},
{sync_notify,{log,6,
{["2012",45,["0",49],45,"19"],
[["0",49],58,"31",58,"18",46,["0",50|...]]},
[["[","info","] "],
["<0.11237.124>",64,"riak_core_handoff_receiver",58,
"process_message",58,"101",32],
"Receiving handoff data for partition riak_kv_vnode:1113253200310648707225463056170606206378542366720"]}}},
{<0.11241.124>,
{<0.11241.124>,#Ref<0.0.22.5131>},
{sync_notify,{log,6,
{["2012",45,["0",49],45,"19"],
[["0",49],58,"31",58,"18",46,[[...]|...]]},
[["[","info","] "],
["<0.11241.124>",64,"riak_core_handoff_receiver",58,
"process_message",58,
[...]|...],
"Receiving handoff data for partition riak_kv_vnode:1076144760300293750317947620964919332832590954496"]}}},
{<0.11289.124>,
{<0.11289.124>,#Ref<0.0.22.5202>},
{sync_notify,{log,6,
{["2012",45,["0",49],45,"19"],
[["0",49],58,"31",58,"18",46,[...]]},
[["[","info","] "],
["<0.11289.124>",64,"riak_core_handoff_receiver",58,
"process_message",58|...],
"Receiving handoff data for partition riak_kv_vnode:1001927880279583836502916750553545585740688130048"]}}},
{<0.11293.124>,
{<0.11293.124>,#Ref<0.0.22.5204>},
{sync_notify,{log,6,
{["2012",45,["0",49],45,"19"],
[["0",49],58,"31",58,"18",46|...]},
[["[","info","] "],
["<0.11293.124>",64,"riak_core_handoff_receiver",58,
[...]|...],
"Receiving handoff data for partition riak_kv_vnode:667951920186389224335277833702363723827125420032"]}}},
{<0.17488.124>,
{<0.17488.124>,#Ref<0.0.22.13770>},
{sync_notify,{log,6,
{["2012",45,["0",49],45,"19"],
[["0",49],58,"31",58,[...]|...]},
[["[","info","] "],
["<0.17488.124>",64,"riak_core_handoff_receiver",58|...],
"Receiving handoff data for partition riak_kv_vnode:1447229160403843319393101973021788068292105076736"]}}},
{<0.5.0>,
{<0.5.0>,#Ref<0.0.22.18473>},
{sync_notify,{log,3,
{["2012",45,["0",49],45,[...]],[["0",49],58,"31",58|...]},
[["[","error","] "],
["<0.19513.124>",32],
"webmachine error: path=\"/riak/bodies_raw/1326936634921_0291564814_f691cea\"\n{error,{exit,normal,[{webmachine_request,read_chunk_length,3},{webmachine_request,recv_chunked_body,3},{webmachine_request,read_whole_stream,5},{webmachine_request,call,2},{wrq,req_body,1},{riak_kv_wm_object,accept_doc_body,2},{webmachine_resource,resource_call,3},{webmachine_resource,do,3}]}}"]}}},
{<0.24660.124>,
{<0.24660.124>,#Ref<0.0.22.24522>},
{sync_notify,{log,6,
{["2012",45,[[...]|...],45|...],[["0",49],58,[...]|...]},
[["[","info",[...]],
["<0.24660.124>",64|...],
"Receiving handoff data for partition riak_kv_vnode:593735040165679310520246963290989976735222595584"]}}},
{<0.30213.125>,
{<0.30213.125>,#Ref<0.0.22.84272>},
{sync_notify,{log,6,
{["2012",45,[...]|...],[[[...]|...],58|...]},
[["[",[...]|...],[[...]|...],[...]]}}},
{<0.2254.127>,
{<0.2254.127>,#Ref<0.0.22.142846>},
{sync_notify,{log,6,
{["2012",45|...],[[...]|...]},
[[[...]|...],[...]|...]}}},
{<0.89.0>,
{<0.89.0>,#Ref<0.0.24.65121>},
{sync_notify,{log,6,{[[...]|...],[...]},[[...]|...]}}},
{<0.29908.243>,
{<0.29908.243>,#Ref<0.0.44.234047>},
{sync_notify,{log,6,{[...],...},[...]}}},
{<0.10551.244>,
{<0.10551.244>,#Ref<0.0.44.253694>},
{sync_notify,{log,6,{...},...}}},
{<0.1660.325>,
{<0.1660.325>,#Ref<0.0.60.172925>},
{sync_notify,{log,6,...}}},
{<0.11720.325>,
{<0.11720.325>,#Ref<0.0.60.189769>},
{sync_notify,{log,...}}},
{<0.8324.340>,
{<0.8324.340>,#Ref<0.0.63.180000>},
{sync_notify,{...}}}]},
{links,[<0.46.0>,<0.47.0>,#Port<0.1755082>,<0.42.0>,
#Port<0.1755079>]},
{dictionary,[{'$ancestors',[lager_sup,<0.40.0>]},
{'$initial_call',{gen_event,init_it,6}}]},
{trap_exit,true},
{error_handler,error_handler},
{priority,normal},
{group_leader,<0.39.0>},
{total_heap_size,10946},
{heap_size,10946},
{stack_size,33},
{reductions,975453},
{garbage_collection,[{min_bin_vheap_size,46368},
{min_heap_size,233},
{fullsweep_after,0},
{minor_gcs,0}]},
{suspending,[]}]
(media-riak@prod-2000-internal.XXXer.com)10>
(media-riak@prod-2000-internal.XXXer.com)10>
(media-riak@prod-2000-internal.XXXer.com)10> io:format("~s\n", [element(2,process_info(pid(0,43,0), backtrace))]).
Program counter: 0x0000000004773650 (io:wait_io_mon_reply/2 + 56)
CP: 0x0000000000000000 (invalid)
arity = 0
0xfffffd7ffad53d58 Return addr 0x0000000004770fa0 (io:o_request/3 + 96)
y(0) #Ref<0.0.22.3921>
y(1) <0.39.0>
0xfffffd7ffad53d70 Return addr 0x0000000004700e78 (lager_console_backend:handle_event/2 + 752)
y(0) []
y(1) []
y(2) put_chars
y(3) {put_chars,unicode,[[["0",49],58,"31",58,"17",46,"796"]," ",["[","info","] "],"Starting handoff of partition riak_kv_vnode 1402984481929958563080295107968853719064239931392 from 'media-riak@prod-2000-internal.XXXer.com' to 'media-riak@prod-2021-internal.XXXer.com'","\n"]}
y(4) <0.39.0>
0xfffffd7ffad53da0 Return addr 0x00000000030bab08 (gen_event:server_update/4 + 272)
y(0) {state,6,false}
0xfffffd7ffad53db0 Return addr 0x00000000030ba6f0 (gen_event:server_notify/4 + 136)
y(0) lager_event
y(1) {log,6,{["2012",45,["0",49],45,"19"],[["0",49],58,"31",58,"17",46,"796"]},[["[","info","] "],["<0.11161.124>",64,"riak_core_handoff_sender",58,"start_fold",58,"39",32],"Starting handoff of partition riak_kv_vnode 1402984481929958563080295107968853719064239931392 from 'media-riak@prod-2000-internal.XXXer.com' to 'media-riak@prod-2021-internal.XXXer.com'"]}
y(2) {handler,lager_console_backend,false,{state,6,false},<0.46.0>}
y(3) {state,6,false}
y(4) lager_console_backend
y(5) Catch 0x00000000030bab08 (gen_event:server_update/4 + 272)
0xfffffd7ffad53de8 Return addr 0x00000000030ba7d0 (gen_event:server_notify/4 + 360)
y(0) lager_event
y(1) handle_event
y(2) {log,6,{["2012",45,["0",49],45,"19"],[["0",49],58,"31",58,"17",46,"796"]},[["[","info","] "],["<0.11161.124>",64,"riak_core_handoff_sender",58,"start_fold",58,"39",32],"Starting handoff of partition riak_kv_vnode 1402984481929958563080295107968853719064239931392 from 'media-riak@prod-2000-internal.XXXer.com' to 'media-riak@prod-2021-internal.XXXer.com'"]}
y(3) []
0xfffffd7ffad53e10 Return addr 0x00000000030b7750 (gen_event:handle_msg/5 + 616)
y(0) {handler,lager_file_backend,false,{state,[{file,"/var/log/riak/error.log",3,{file_descriptor,prim_file,{#Port<0.1755082>,11}},1727659,false,10485760,[{hour,0}],5},{file,"/var/log/riak/console.log",6,{file_descriptor,prim_file,{#Port<0.1755079>,12}},1727658,false,10485760,[{hour,0}],5}]},<0.47.0>}
0xfffffd7ffad53e20 Return addr 0x00000000030afec8 (proc_lib:init_p_do_apply/3 + 56)
y(0) []
y(1) lager_event
y(2) <0.42.0>
y(3) {<0.11161.124>,#Ref<0.0.22.3919>}
y(4) <0.11161.124>
0xfffffd7ffad53e50 Return addr 0x00000000006df368 (<terminate process normally>)
y(0) Catch 0x00000000030afee8 (proc_lib:init_p_do_apply/3 + 88)
ok
(media-riak@prod-2000-internal.XXXer.com)11>
(media-riak@prod-2000-internal.XXXer.com)11>
(media-riak@prod-2000-internal.XXXer.com)11>
(media-riak@prod-2000-internal.XXXer.com)11> process_info(pid(0,39,0)).
[{current_function,{application_master,main_loop,2}},
{initial_call,{proc_lib,init_p,5}},
{status,waiting},
{message_queue_len,0},
{messages,[]},
{links,[<0.6.0>,<0.40.0>]},
{dictionary,[{'$ancestors',[<0.38.0>]},
{'$initial_call',{application_master,init,4}}]},
{trap_exit,true},
{error_handler,error_handler},
{priority,normal},
{group_leader,<0.39.0>},
{total_heap_size,233},
{heap_size,233},
{stack_size,6},
{reductions,4369},
{garbage_collection,[{min_bin_vheap_size,46368},
{min_heap_size,233},
{fullsweep_after,0},
{minor_gcs,0}]},
{suspending,[]}]
(media-riak@prod-2000-internal.XXXer.com)12> io:format("~s\n", [element(2,process_info(pid(0,39,0), backtrace))]).
Program counter: 0x00000000012f7130 (application_master:main_loop/2 + 64)
CP: 0x0000000000000000 (invalid)
arity = 0
0xfffffd7ff3a862b0 Return addr 0x00000000030afec8 (proc_lib:init_p_do_apply/3 + 56)
y(0) []
y(1) {state,<0.40.0>,{appl_data,lager,[],undefined,{lager_app,[]},[error_logger_lager_h,lager,lager_app,lager_console_backend,lager_crash_log,lager_file_backend,lager_handler_watcher,lager_handler_watcher_sup,lager_mochiglobal,lager_stdlib,lager_sup,lager_transform,lager_trunc_io,lager_util],[],infinity,infinity},[],0,<0.32.0>}
y(2) <0.6.0>
0xfffffd7ff3a862d0 Return addr 0x00000000006df368 (<terminate process normally>)
y(0) Catch 0x00000000030afee8 (proc_lib:init_p_do_apply/3 + 88)
ok
(media-riak@prod-2000-internal.XXXer.com)13> x.
x
(media-riak@prod-2000-internal.XXXer.com)14>
(media-riak@prod-2000-internal.XXXer.com)14>
(media-riak@prod-2000-internal.XXXer.com)14>
(media-riak@prod-2000-internal.XXXer.com)14>
(media-riak@prod-2000-internal.XXXer.com)14>
(media-riak@prod-2000-internal.XXXer.com)14>
(media-riak@prod-2000-internal.XXXer.com)14> process_info(pid(0,89,0)).
[{registered_name,riak_sysmon_handler},
{current_function,{gen,do_call,4}},
{initial_call,{proc_lib,init_p,5}},
{status,waiting},
{message_queue_len,305},
{messages,[{notify,{monitor,<0.13317.112>,long_gc,
[{timeout,580},
{old_heap_block_size,0},
{heap_block_size,75025},
{mbuf_size,0},
{stack_size,33},
{old_heap_size,0},
{heap_size,25272}]}},
{notify,{monitor,<0.26837.116>,long_gc,
[{timeout,119},
{old_heap_block_size,0},
{heap_block_size,46368},
{mbuf_size,0},
{stack_size,78},
{old_heap_size,0},
{heap_size,14577}]}},
{notify,{monitor,<0.14736.112>,long_gc,
[{timeout,121},
{old_heap_block_size,0},
{heap_block_size,46368},
{mbuf_size,0},
{stack_size,78},
{old_heap_size,0},
{heap_size,14575}]}},
{notify,{monitor,<0.7202.137>,long_gc,
[{timeout,266},
{old_heap_block_size,0},
{heap_block_size,4181},
{mbuf_size,0},
{stack_size,29},
{old_heap_size,0},
{heap_size,1448}]}},
{notify,{monitor,<0.32645.111>,long_gc,
[{timeout,533},
{old_heap_block_size,0},
{heap_block_size,46368},
{mbuf_size,0},
{stack_size,30},
{old_heap_size,0},
{heap_size,14501}]}},
{notify,{monitor,<0.14405.111>,long_gc,
[{timeout,595},
{old_heap_block_size,0},
{heap_block_size,46368},
{mbuf_size,0},
{stack_size,82},
{old_heap_size,0},
{heap_size,14489}]}},
{notify,{monitor,<0.107.0>,long_gc,
[{timeout,728},
{old_heap_block_size,0},
{heap_block_size,75025},
{mbuf_size,0},
{stack_size,9},
{old_heap_size,0},
{heap_size,...}]}},
{notify,{monitor,<0.14353.111>,long_gc,
[{timeout,603},
{old_heap_block_size,0},
{heap_block_size,46368},
{mbuf_size,0},
{stack_size,30},
{old_heap_size,...},
{...}]}},
{notify,{monitor,<0.14164.139>,long_gc,
[{timeout,466},
{old_heap_block_size,0},
{heap_block_size,46368},
{mbuf_size,0},
{stack_size,...},
{...}|...]}},
{notify,{monitor,<0.27900.116>,long_gc,
[{timeout,599},
{old_heap_block_size,0},
{heap_block_size,46368},
{mbuf_size,...},
{...}|...]}},
{notify,{monitor,<0.13157.111>,long_gc,
[{timeout,599},
{old_heap_block_size,0},
{heap_block_size,...},
{...}|...]}},
{notify,{monitor,<0.25009.111>,long_gc,
[{timeout,599},{old_heap_block_size,...},{...}|...]}},
{notify,{monitor,<0.14045.112>,long_gc,
[{timeout,...},{...}|...]}},
{notify,{monitor,<0.25009.111>,long_gc,[{...}|...]}},
{notify,{monitor,<0.5408.112>,long_gc,[...]}},
{notify,{monitor,<0.32645.111>,long_gc,...}},
{notify,{monitor,<0.25144.111>,...}},
{notify,{monitor,...}},
{notify,{...}},
{notify,...},
{...}|...]},
{links,[<0.87.0>,<0.94.0>]},
{dictionary,[{'$ancestors',[riak_sysmon_sup,<0.86.0>]},
{'$initial_call',{gen_event,init_it,6}}]},
{trap_exit,true},
{error_handler,error_handler},
{priority,normal},
{group_leader,<0.85.0>},
{total_heap_size,17711},
{heap_size,17711},
{stack_size,34},
{reductions,112210},
{garbage_collection,[{min_bin_vheap_size,46368},
{min_heap_size,233},
{fullsweep_after,0},
{minor_gcs,0}]},
{suspending,[]}]
(media-riak@prod-2000-internal.XXXer.com)15> io:format("~s\n", [element(2,process_info(pid(0,89,0), backtrace))]).
Program counter: 0x00000000030aa5a0 (gen:do_call/4 + 576)
CP: 0x0000000000000000 (invalid)
arity = 0
0xfffffd7ff86e2b40 Return addr 0x00000000030b6a18 (gen_event:rpc/2 + 96)
y(0) #Ref<0.0.24.65121>
y(1) 'media-riak@prod-2000-internal.XXXer.com'
y(2) []
y(3) infinity
y(4) {sync_notify,{log,6,{["2012",45,["0",49],45,"19"],[["0",49],58,"41",58,"18",46,"933"]},[["[","info","] "],["<0.89.0>",64,"riak_core_sysmon_handler",58,"handle_event",58,"85",32],"monitor long_gc <0.568.0> [{name,riak_kv_put_fsm_sup},{initial_call,{supervisor,riak_kv_put_fsm_sup,1}},{almost_current_function,{proc_lib,sync_wait,2}}] [{timeout,578},{old_heap_block_size,0},{heap_block_size,987},{mbuf_size,0},{stack_size,28},{old_heap_size,0},{heap_size,395}]"]}}
y(5) <0.89.0>
y(6) <0.43.0>
0xfffffd7ff86e2b80 Return addr 0x00000000048114f8 (riak_core_sysmon_handler:handle_event/2 + 648)
0xfffffd7ff86e2b88 Return addr 0x00000000030bab08 (gen_event:server_update/4 + 272)
y(0) []
y(1) []
y(2) []
y(3) []
y(4) []
y(5) {state}
0xfffffd7ff86e2bc0 Return addr 0x00000000030ba6f0 (gen_event:server_notify/4 + 136)
y(0) riak_sysmon_handler
y(1) {monitor,<0.568.0>,long_gc,[{timeout,578},{old_heap_block_size,0},{heap_block_size,987},{mbuf_size,0},{stack_size,28},{old_heap_size,0},{heap_size,395}]}
y(2) {handler,riak_core_sysmon_handler,false,{state},<0.94.0>}
y(3) {state}
y(4) riak_core_sysmon_handler
y(5) Catch 0x00000000030bab08 (gen_event:server_update/4 + 272)
0xfffffd7ff86e2bf8 Return addr 0x00000000030b75e8 (gen_event:handle_msg/5 + 256)
y(0) riak_sysmon_handler
y(1) handle_event
y(2) {monitor,<0.568.0>,long_gc,[{timeout,578},{old_heap_block_size,0},{heap_block_size,987},{mbuf_size,0},{stack_size,28},{old_heap_size,0},{heap_size,395}]}
y(3) []
0xfffffd7ff86e2c20 Return addr 0x00000000030afec8 (proc_lib:init_p_do_apply/3 + 56)
y(0) []
y(1) riak_sysmon_handler
y(2) <0.87.0>
0xfffffd7ff86e2c40 Return addr 0x00000000006df368 (<terminate process normally>)
y(0) Catch 0x00000000030afee8 (proc_lib:init_p_do_apply/3 + 88)
ok
(media-riak@prod-2000-internal.XXXer.com)16>
** See my yellow notepad for more notes, but...
... erlang.log.4 shows that Greg (?) used the console successfully
prior to the console going dead.
(media-riak@prod-2000-internal.XXXer.com)5>
rp(lists:keysort(2, [{N, element(1, rpc:call(N, erlang, statistics, [wall_clock]))} || N <- [node() | nodes()]])).
[{'media-riak@prod-2000-internal.XXXer.com',6519722},
{'media-riak@prod-2021-internal.XXXer.com',7155611},
{'media-riak@prod-2020-internal.XXXer.com',7183842},
{'media-riak@prod-2022-internal.XXXer.com',7187230},
{'media-riak@prod-2019-internal.XXXer.com',7195017},
{'media-riak@prod-2023-internal.XXXer.com',7232344},
{'media-riak@prod-2018-internal.XXXer.com',7246924},
{'media-riak@prod-2017-internal.XXXer.com',7256024},
{'media-riak@prod-2016-internal.XXXer.com',7279753},
{'media-riak@prod-2015-internal.XXXer.com',7297350},
{'media-riak@prod-2024-internal.XXXer.com',7299218},
{'media-riak@prod-2014-internal.XXXer.com',7369359},
{'media-riak@prod-2013-internal.XXXer.com',7378016},
{'media-riak@prod-2025-internal.XXXer.com',7402411},
{'media-riak@prod-2012-internal.XXXer.com',7436498},
{'media-riak@prod-2011-internal.XXXer.com',7455697},
{'media-riak@prod-2026-internal.XXXer.com',7493179},
{'media-riak@prod-2009-internal.XXXer.com',7498511},
{'media-riak@prod-2010-internal.XXXer.com',7511964},
{'media-riak@prod-2008-internal.XXXer.com',7558773},
{'media-riak@prod-2007-internal.XXXer.com',7576800},
{'media-riak@prod-2006-internal.XXXer.com',7606920},
{'media-riak@prod-2005-internal.XXXer.com',7623072},
{'media-riak@prod-2027-internal.XXXer.com',7646077},
{'media-riak@prod-2004-internal.XXXer.com',7656642},
{'media-riak@prod-2028-internal.XXXer.com',7678676},
{'media-riak@prod-2002-internal.XXXer.com',7683218},
{'media-riak@prod-2003-internal.XXXer.com',7715771},
{'media-riak@prod-2029-internal.XXXer.com',7767783},
{'media-riak@prod-2001-internal.XXXer.com',7788377},
{'media-riak@prod-2030-internal.XXXer.com',7806678},
{'media-riak@prod-2031-internal.XXXer.com',7833097},
{'media-riak@prod-2032-internal.XXXer.com',7872286},
{'media-riak@prod-2033-internal.XXXer.com',7888006},
{'media-riak@prod-2034-internal.XXXer.com',7923088},
{'media-riak@prod-2035-internal.XXXer.com',7945616},
{'media-riak@prod-2036-internal.XXXer.com',7962402},
{'media-riak@prod-2037-internal.XXXer.com',7993250},
{'media-riak@prod-2038-internal.XXXer.com',8088754},
{'media-riak@prod-2039-internal.XXXer.com',8127035},
{'media-riak@prod-2040-internal.XXXer.com',8200300}]
ok
(media-riak@prod-2000-internal.XXXer.com)6>
===== ALIVE Thu Jan 19 01:47:12 UTC 2012
===== ALIVE Thu Jan 19 02:02:12 UTC 2012
===== ALIVE Thu Jan 19 02:17:12 UTC 2012
===== ALIVE Thu Jan 19 02:32:12 UTC 2012
[SLF: many more 15-minute-interval messages omitted...]
===== ALIVE Thu Jan 19 06:47:13 UTC 2012
===== ALIVE Thu Jan 19 07:02:13 UTC 2012
^R
x
y
asdflkjasdflkjasdf
^R
^G^M
BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded^M
(v)ersion (k)ill (D)b-tables (d)istribution^M
c
Eh?^M
^M
c
salkasdflkj
asdflkjasdf
asdflkjqwer
oiquwer
qwoeriua
asdflkj
qweoriu
-- end of log file --
Also, I didn't think much of it at the time (other than
"Hm, that seems a little bit high"), but beam.smp's CPU utilization was
consistently high at that time:
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
11504 riak 9046M 8893M run 59 0 25:00:10 32% beam.smp/23
... and ...
Total: 56 processes, 287 lwps, load averages: 5.11, 4.91, 4.97
PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
11504 riak 48 2.9 0.0 0.0 0.0 34 15 0.1 19K 2 24K 0 beam.smp/20
11504 riak 46 2.2 0.0 0.0 0.0 44 7.5 0.1 14K 15 17K 0 beam.smp/14
11504 riak 40 2.3 0.0 0.0 0.0 51 6.1 0.1 16K 7 19K 0 beam.smp/19
11504 riak 38 3.1 0.0 0.0 0.0 50 8.4 0.2 24K 5 28K 0 beam.smp/8
11504 riak 38 1.9 0.0 0.0 0.0 52 7.6 0.2 14K 7 17K 0 beam.smp/13
11504 riak 35 1.6 0.0 0.0 0.0 60 2.8 0.1 9K 1 11K 0 beam.smp/16
11504 riak 33 2.7 0.0 0.0 0.0 51 13 0.1 18K 6 22K 0 beam.smp/7
11504 riak 30 1.3 0.0 0.0 0.0 59 10 0.1 9K 4 12K 0 beam.smp/18
11504 riak 27 1.2 0.0 0.0 0.0 69 2.6 0.0 7K 3 8K 0 beam.smp/15
11504 riak 23 1.6 0.0 0.0 0.0 71 4.0 0.2 10K 6 11K 0 beam.smp/6
11504 riak 19 1.2 0.0 0.0 0.0 75 3.1 1.0 7K 3 9K 0 beam.smp/11
11504 riak 15 0.6 0.0 0.0 0.0 81 3.3 0.0 4K 1 4K 0 beam.smp/12
11504 riak 10 0.5 0.0 0.0 0.0 86 2.6 0.0 3K 3 4K 0 beam.smp/9
11504 riak 4.7 0.2 0.0 0.0 0.0 95 0.3 0.0 1K 2 1K 0 beam.smp/17
11504 riak 2.2 0.2 0.0 0.0 0.0 97 0.5 0.0 1K 2 1K 0 beam.smp/21
I did not use "truss" or DTrace to try to see what it was doing at
that time, alas.
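In hindsight, a cheap in-VM substitute would have been to rank processes by
reductions burned over a short window. A sketch (not from the original
session), using only process_info/2, timer:sleep/1, and dict:
%% Sample reductions twice, SleepMs apart; report the N biggest burners.
TopRed = fun(SleepMs, N) ->
    Snap = fun() -> [{P, R} || P <- processes(),
                               {reductions, R} <- [process_info(P, reductions)]]
           end,
    Before = dict:from_list(Snap()),
    timer:sleep(SleepMs),
    Deltas = [{R2 - R1, P} || {P, R2} <- Snap(),
                              {ok, R1} <- [dict:find(P, Before)]],
    lists:sublist(lists:reverse(lists:sort(Deltas)), N)
end.
TopRed(5000, 10).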
So, when I restarted riak on prod-2000 via kill {pid} ("riak stop"
didn't work), a lot of handoffs were triggered. After a few minutes,
all outstanding transfers were finished and none remained
cluster-wide.
So, prod-2000 was in an odd state where "erl -remsh prod-2000..." would
work, but "riak attach" did not, and apparently its vnodes weren't
working correctly? Odd.
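For next time, a quick remote sanity check might look like this (a sketch,
not from the log; riak_core_node_watcher:services/1 is assumed to exist in
this Riak release):
Node = 'media-riak@prod-2000-internal.XXXer.com'.
rpc:call(Node, riak_core_node_watcher, services, [Node]).  %% which services does the node itself think are up?
rpc:call(Node, erlang, statistics, [run_queue]).           %% scheduler run-queue length, a rough busyness hint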
----
Find all current transfers:
rp(lists:flatten(element(1,(rpc:multicall(lists:sort([node() | nodes()]), ets, tab2list, [global_locks]))))).
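A slower but easier-to-read variant (a sketch, not from the log) calls each
node in turn so the lock counts line up with node names; a down node shows
up as a caught error rather than crashing the whole report:
Nodes = lists:sort([node() | nodes()]).
rp([{N, (catch length(rpc:call(N, ets, tab2list, [global_locks])))} || N <- Nodes]).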
----
Pids = [pid(0,31950,667), pid(0,31204,667), pid(0,3869,668)].
[begin
{_, M} = process_info(P, messages),
lists:sort([element(2, element(4, element(2, GE))) || GE <- M, element(1, element(4, element(2, GE))) == riak_kv_get_req_v1])
end || P <- Pids].
[[{<<"bodies">>,<<"1323062665794_1623433739">>},
{<<"bodies">>,<<"1323062665794_1623433739">>},
{<<"bodies">>,<<"1323291523887_1261585781">>},
{<<"bodies">>,<<"1323291523887_1261585781">>},
{<<"bodies">>,<<"1323749748395_1663680247">>},
{<<"bodies">>,<<"1323749748395_1663680247">>},
{<<"bodies">>,
<<"1325103665235_a6d971e73bc93292_11ef8a9f712ba82f">>},
{<<"bodies">>,
<<"1325103665235_a6d971e73bc93292_11ef8a9f712ba82f">>},
{<<"bodies">>,<<"1325181319905_0223540070">>},
{<<"bodies">>,<<"1325292430028_726416307">>},
{<<"bodies">>,<<"1325292430028_726416307">>},
{<<"bodies">>,
<<"1325572004610_ad09ab45a4e270ab_305d714fc0460266">>},
{<<"bodies">>,
<<"1325572004610_ad09ab45a4e270ab_305d714fc0460266">>},
{<<"bodies">>,<<"1325626956996_1424663528">>},
{<<"bodies">>,<<"1325626956996_1424663528">>},
{<<"bodies">>,<<"1325695381884_1709701352">>},
{<<"bodies">>,<<"1325695381884_1709701352">>},
{<<"bodies">>,<<"1326132704350_a4fae6cc359908b1_a"...>>},
{<<"bodies">>,<<"1326231759930_144d4bfd752838"...>>},
{<<"bodies">>,<<"1326494084453_5114271274">>},
{<<"bodies">>,<<"1326612744480_193742"...>>},
{<<"bodies">>,<<"1326612744480_19"...>>},
{<<"bodies">>,<<"132661274448"...>>},
{<<"bodies">>,<<"13266127"...>>},
{<<"bodies">>,<<"1326"...>>},
{<<"bodi"...>>,<<...>>},
{<<...>>,...},
{...}|...],
[{<<"bodies">>,<<"1323897305170_6801318432">>},
{<<"bodies">>,<<"1323897305170_6801318432">>},
{<<"bodies">>,<<"1327040954254_0443391082_3a2190a">>},
{<<"bodies">>,<<"1327277886646_7864500163_edc0f306">>},
{<<"bodies">>,<<"1327295682946_4578250107_454b0b52">>},
{<<"bodies">>,<<"1327311363065_8737892066_5ae266b1">>},
{<<"bodies">>,<<"1327332554454_8735578690">>},
{<<"bodies">>,<<"1327355380079_4177460385_5c04edbb">>},
{<<"bodies">>,<<"1327388143845_0554939193_552f81b">>},
{<<"bodies">>,<<"1327432439439_5035365936_10c70a78">>},
{<<"bodies">>,<<"1327432439439_5035365936_10c70a78">>},
{<<"bodies">>,<<"1327432439439_5035365936_10c70a78">>},
{<<"bodies">>,<<"1327432439439_5035365936_10c70a78">>},
{<<"bodies">>,
<<"1327437260171_96c2d749a6e2d109_e2d87f6b66007"...>>},
{<<"bodies">>,
<<"1327437260171_96c2d749a6e2d109_e2d87f6b6"...>>},
{<<"bodies">>,<<"1327437260171_96c2d749a6e2d109_e2d87"...>>},
{<<"bodies">>,<<"1327437260171_96c2d749a6e2d109_e"...>>},
{<<"bodies">>,<<"1327437260171_96c2d749a6e2d1"...>>},
{<<"bodies">>,<<"1327437260171_96c2d749a6"...>>},
{<<"bodies">>,<<"1327437260171_96c2d7"...>>},
{<<"bodies">>,<<"1327437260171_96"...>>},
{<<"bodies">>,<<"132745260716"...>>},
{<<"bodies">>,<<"13274541"...>>},
{<<"bodies">>,<<"1327"...>>},
{<<"bodi"...>>,<<...>>},
{<<...>>,...},
{...}|...],
[]]
(media-riak@prod-2015-internal.XXXer.com)17> [length(LL) || LL <- v(16)].
[28050,7425,0]
(media-riak@prod-2015-internal.XXXer.com)19> file:write_file("/tmp/odd-vnode-backlog.keys.bin", term_to_binary(v(61))).
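To read that dump back in later (a sketch, not from the log):
{ok, Bin} = file:read_file("/tmp/odd-vnode-backlog.keys.bin").
BKeys = binary_to_term(Bin).   %% whatever term was dumped (presumably the backlog key lists above)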
[begin
{_, M} = process_info(P, messages),
length(lists:sort([element(2, element(4, element(2, GE))) || GE <- M, element(1, element(4, element(2, GE))) == riak_kv_get_req_v1]))
end || P <- Pids].
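The same element/2 digging, restated with comments (a sketch; it assumes the
message layout implied above: each mailbox entry wraps a vnode request
record whose 4th element is the embedded KV request, tagged
riak_kv_get_req_v1 with the {Bucket, Key} pair as its 2nd field):
BKeysInQueue = fun(P) ->
    {_, Msgs} = process_info(P, messages),
    [element(2, element(4, element(2, M)))      %% the {Bucket, Key} of the get request
     || M <- Msgs,
        element(1, element(4, element(2, M))) == riak_kv_get_req_v1]
end.
[length(BKeysInQueue(P)) || P <- Pids].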
----
dtrace -n 'profile-997 /execname == "beam.smp" && arg1/ { @[uaddr(arg1)] = count(); } tick-5s { trunc(@, 10); exit(0); }'
dtrace -q -n 'pid$target:eleveldb.so::entry { self->level = 1; } pid$target:eleveldb.so::return { self->level = 0 } syscall::open:entry /self->level > 0/ { printf("%s\n", copyinstr(arg0)); }' -p `pgrep -u riak beam.smp`
dtrace -q -n 'pid$target:eleveldb.so:eleveldb_*:entry { self->level = 1; } pid$target:eleveldb.so:eleveldb_*:return { self->level = 0 } syscall::open:entry /self->level > 0/ { printf("%s\n", copyinstr(arg0)); } tick-5sec { exit(0);}' -p `pgrep -u riak beam.smp` | awk -F/ '{print $6}' | sort | uniq -c | sort -nr | head -20
dtrace -n 'pid$target:eleveldb.so:*DoCompactionWork*:entry' -p `pgrep -u riak beam.smp`
dtrace -n 'pid$target:eleveldb.so:*OpenCompactionOutputFile*:entry' -p `pgrep -u riak beam.smp`
dtrace -q -n 'pid$target:eleveldb.so:*OpenCompactionOutputFile*:entry { self->level = 1; } pid$target:eleveldb.so:*OpenCompactionOutputFile*:return { self->level = 0 } syscall::open:entry /self->level > 0/ { printf("%s\n", copyinstr(arg0)); } tick-5sec { exit(0);}' -p `pgrep -u riak beam.smp`
/var/db/riak/leveldb/874902835628753407088729299272540518602623680512/352337.sst
/var/db/riak/leveldb/874902835628753407088729299272540518602623680512/352338.sst
/var/db/riak/leveldb/874902835628753407088729299272540518602623680512/352339.sst
[...]
dtrace -n 'syscall::*read*:entry { @[execname, lwp] = count} tick-5sec { exit(0);}'
----
repair action on prod-2003, 2014, 2015, and 2019
repair action on prod-2006
For example, on 2014:
[root@prod-2014 /var/db/riak/leveldb]# ls -lt | head
total 52123
drwxr-xr-x 2 riak riak 62692 2012-01-27 02:00 984800907967112317930217318920151644104095170560
drwxr-xr-x 2 riak riak 68096 2012-01-27 02:00 725041827894627619577609272480343529282435284992
drwxr-xr-x 2 riak riak 18 2012-01-27 01:28 943410724878639481379527025806116285148995518464
drwxr-xr-x 2 riak riak 17 2012-01-27 01:28 646543204795799826119403544160621296781384220672
[...]
So, two hosed dirs.
Verify by running "grep error: $vnode_dir/LOG";
there should be at least one match, probably thousands.
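If you already have a shell on the node, the same check can be done from
Erlang (a sketch, not from the log), scanning each vnode dir's LOG for
error lines:
Base = "/var/db/riak/leveldb".
{ok, Dirs} = file:list_dir(Base).
Bad = [D || D <- Dirs,
            {ok, Log} <- [file:read_file(filename:join([Base, D, "LOG"]))],
            re:run(Log, "error", [caseless]) =/= nomatch].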
cd /var/db/riak/leveldb
erl -pz /tmp/eleveldb+repair/ebin
At shell prompt, e.g.,
eleveldb:repair("./984800907967112317930217318920151644104095170560", []).
Then, there's a second step, which takes a long long long time also.
1> {ok, DB} = eleveldb:open("./1455792646560079078679451688838485039110401556480", [write]).
{ok,<<>>}
Watch "iostat -x 1" and the 1455*/LOG file for compaction activity.
If compaction activity doesn't happen, or stops after only a few
minutes, then use the following (perhaps alternating put & delete
several times) to trigger enough writes to restart compaction:
10> [eleveldb:put(DB, <<X:32>>, <<"asdlfkjasdlfkjasdlkfjasdfasdlfkjasdlasdlfkjasdlasdlfkjasdlasdlfkjasdlasdlfkjasdl">>, []) || X <- lists:seq(1,20000)].
[ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,
ok,ok,ok,ok,ok,ok,ok,ok,ok,ok|...]
11> [eleveldb:delete(DB, <<X:32>>, []) || X <- lists:seq(1,20000)].
[ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,
ok,ok,ok,ok,ok,ok,ok,ok,ok,ok|...]
When compaction has finally finished (no "iostat -x" activity, and no
activity in the db's LOG file), then:
12> q().
ok
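A sketch (not from the log) that rolls the per-directory repair + churn
recipe above into one shell fun, using only the eleveldb calls already shown
(repair/2, open/2, put/4, delete/3); watching iostat and the LOG file for
compaction to finish is still a manual step, and the directory name below is
a placeholder:
RepairAndChurn = fun(Dir) ->
    io:format("repair ~s -> ~p~n", [Dir, eleveldb:repair(Dir, [])]),
    {ok, DB} = eleveldb:open(Dir, [write]),
    Blob = <<0:(80*8)>>,                  %% 80-byte filler value, like the string above
    [eleveldb:put(DB, <<X:32>>, Blob, []) || X <- lists:seq(1, 20000)],
    [eleveldb:delete(DB, <<X:32>>, []) || X <- lists:seq(1, 20000)],
    ok
end.
RepairAndChurn("./<hosed_vnode_dir>").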
----
dtrace -qn 'pid$target:eleveldb.so:*DoCompactionWork*:entry { self->start = timestamp; } pid$target:eleveldb.so:*DoCompactionWork*:return /self->start > 0/ { @ = quantize((timestamp - self->start) / 1000000); self->start = 0; } END {printf("DoCompactionWork in milliseconds\n"); printa(@); }' -p `pgrep -u riak beam.smp`
dtrace -qn 'pid$target:eleveldb.so:eleveldb_*:entry { self->start = timestamp; } pid$target:eleveldb.so:eleveldb_*:return /self->start > 0/ { @[probefunc] = quantize((timestamp - self->start) / 1000000); self->start = 0; } END {printf("NIF function calls in milliseconds\n"); printa(@); } tick-300sec { exit(0); }' -p `pgrep -u riak beam.smp` ; date
---
Sunday 2012-01-29
Interesting, a new way for riak_sysmon_filter to get hosed.
(media-riak@prod-2101-internal.XXXer.com)10> process_info(pid(0,91,0)).
[{registered_name,riak_sysmon_filter},
{current_function,{net_kernel,get_status,3}},
{initial_call,{proc_lib,init_p,5}},
{status,waiting},
{message_queue_len,33004},
{messages,[reset,reset,reset,reset,reset,reset,reset,reset,
reset,reset,reset,reset,reset,reset,reset,reset,reset,reset,
reset,reset,reset|...]},
{links,[<0.79.0>,<0.90.0>]},
{dictionary,[{'$ancestors',[riak_sysmon_sup,<0.89.0>]},
{'$initial_call',{riak_sysmon_filter,init,1}}]},
{trap_exit,false},
{error_handler,error_handler},
{priority,normal},
{group_leader,<0.88.0>},
{total_heap_size,2584},
{heap_size,2584},
{stack_size,27},
{reductions,118673663},
{garbage_collection,[{min_bin_vheap_size,46368},
{min_heap_size,233},
{fullsweep_after,0},
{minor_gcs,0}]},
{suspending,[]}]
(media-riak@prod-2101-internal.XXXer.com)11> io:format("~s\n", [element(2,process_info(pid(0,91,0), backtrace))]).
Program counter: 0x0000000004435cf0 (net_kernel:get_status/3 + 216)
CP: 0x0000000000000000 (invalid)
arity = 0
0x0000000000e0dc48 Return addr 0x0000000004435a38 (net_kernel:get_node_info/1 + 424)
y(0) 'media-riak@prod-2038-internal.XXXer.com'
y(1) <0.10111.2>
0x0000000000e0dc60 Return addr 0x0000000004436180 (net_kernel:get_nodes_info/2 + 96)
y(0) normal
y(1) {net_address,{{192,168,25,113},60765},"prod-2038-internal.XXXer.com",tcp,inet}
y(2) <0.10111.2>
y(3) up
0x0000000000e0dc88 Return addr 0x000000000487bed0 (riak_sysmon_filter:get_node_map/0 + 64)
y(0) [{'media-riak@prod-2098-internal.XXXer.com',[{owner,<0.195.0>},{state,up},{address,{net_address,{{192,168,25,237},50572},"prod-2098-internal.XXXer.com",tcp,inet}},{type,normal},{in,16612},{out,24082}]},{'media-riak@prod-2012-internal.XXXer.com',[{owner,<0.10202.2>},{state,up},{address,{net_address,{{192,168,25,99},40263},"prod-2012-internal.XXXer.com",tcp,inet}},{type,normal},{in,14},{out,5}]},{'media-riak@prod-2032-internal.XXXer.com',[{owner,<0.10287.2>},{state,pending},{address,{net_address,{{192,168,25,20},62879},"prod-2032-internal.XXXer.com",tcp,inet}},{type,normal},{in,0},{out,0}]},{'media-riak@prod-2002-internal.XXXer.com',[{owner,<0.211.0>},{state,up},{address,{net_address,{{192,168,25,18},50827},"prod-2002-internal.XXXer.com",tcp,inet}},{type,normal},{in,25453},{out,24814}]},{'media-riak@prod-2035-internal.XXXer.com',[{owner,<0.10191.2>},{state,up},{address,{net_address,{{192,168,25,68},37453},"prod-2035-internal.XXXer.com",tcp,inet}},{type,normal},{in,10},{out,5}]},{'media-riak@prod-2015-internal.XXXer.com',[{owner,<0.10185.2>},{state,up},{address,{net_address,{{192,168,25,102},57790},"prod-2015-internal.XXXer.com",tcp,inet}},{type,normal},{in,11},{out,4}]},{'media-riak@prod-2034-internal.XXXer.com',[{owner,<0.10189.2>},{state,up},{address,{net_address,{{192,168,25,59},45894},"prod-2034-internal.XXXer.com",tcp,inet}},{type,normal},{in,11},{out,6}]},{'media-riak@prod-2104-internal.XXXer.com',[{owner,<0.9036.2>},{state,up},{address,{net_address,{{192,168,26,7},35922},"prod-2104-internal.XXXer.com",tcp,inet}},{type,normal},{in,64},{out,70}]},{'media-riak@prod-2017-internal.XXXer.com',[{owner,<0.10181.2>},{state,up},{address,{net_address,{{192,168,25,104},34530},"prod-2017-internal.XXXer.com",tcp,inet}},{type,normal},{in,5},{out,4}]},{'media-riak@prod-2037-internal.XXXer.com',[{owner,<0.10094.2>},{state,up},{address,{net_address,{{192,168,25,112},51065},"prod-2037-internal.XXXer.com",tcp,inet}},{type,normal},{in,15},{out,8}]}]
y(1) ['media-riak@prod-2108-internal.XXXer.com','media-riak@prod-2028-internal.XXXer.com','media-riak@prod-2029-internal.XXXer.com','media-riak@prod-2039-internal.XXXer.com','media-riak@prod-2036-internal.XXXer.com','media-riak@prod-2106-internal.XXXer.com','media-riak@prod-2006-internal.XXXer.com','media-riak@prod-2026-internal.XXXer.com','media-riak@prod-2016-internal.XXXer.com','media-riak@prod-2023-internal.XXXer.com','media-riak@prod-2013-internal.XXXer.com','media-riak@prod-2103-internal.XXXer.com','media-riak@prod-2033-internal.XXXer.com','media-riak@prod-2003-internal.XXXer.com','media-riak@prod-2010-internal.XXXer.com','media-riak@prod-2040-internal.XXXer.com','media-riak@prod-2100-internal.XXXer.com','media-riak@prod-2020-internal.XXXer.com','media-riak@prod-2030-internal.XXXer.com','media-riak@prod-2021-internal.XXXer.com','media-riak@prod-2001-internal.XXXer.com','media-riak@prod-2031-internal.XXXer.com','media-riak@prod-2011-internal.XXXer.com']
y(2) 'media-riak@prod-2038-internal.XXXer.com'
0x0000000000e0dca8 Return addr 0x000000000487b560 (riak_sysmon_filter:handle_info/2 + 2600)
y(0) Catch 0x000000000487bf48 (riak_sysmon_filter:get_node_map/0 + 184)
0x0000000000e0dcb8 Return addr 0x00000000043c5cc8 (gen_server:handle_msg/5 + 1680)
y(0) []
y(1) {0,nil}
y(2) {state,0,30,0,2,{0,nil},[{{{192,168,25,98},46335},'media-riak@prod-2011-internal.XXXer.com'},{{{192,168,24,169},48853},'media-riak@prod-2031-internal.XXXer.com'},{{{192,168,25,4},45159},'media-riak@prod-2001-internal.XXXer.com'},{{{192,168,24,123},50343},'media-riak@prod-2021-internal.XXXer.com'},{{{192,168,24,168},47264},'media-riak@prod-2030-internal.XXXer.com'},{{{192,168,24,121},59550},'media-riak@prod-2020-internal.XXXer.com'},{{{192,168,25,248},39400},'media-riak@prod-2100-internal.XXXer.com'},{{{192,168,24,233},57098},'media-riak@prod-2000-internal.XXXer.com'},{{{192,168,25,115},34939},'media-riak@prod-2040-internal.XXXer.com'},{{{192,168,25,97},51083},'media-riak@prod-2010-internal.XXXer.com'},{{{192,168,25,41},38256},'media-riak@prod-2033-internal.XXXer.com'},{{{192,168,25,250},64683},'media-riak@prod-2103-internal.XXXer.com'},{{{192,168,25,100},33050},'media-riak@prod-2013-internal.XXXer.com'},{{{192,168,24,125},55403},'media-riak@prod-2023-internal.XXXer.com'},{{{192,168,25,103},38466},'media-riak@prod-2016-internal.XXXer.com'},{{{192,168,24,127},41757},'media-riak@prod-2026-internal.XXXer.com'},{{{192,168,25,90},45461},'media-riak@prod-2006-internal.XXXer.com'},{{{192,168,26,11},34804},'media-riak@prod-2106-internal.XXXer.com'},{{{192,168,25,87},41709},'media-riak@prod-2036-internal.XXXer.com'},{{{192,168,25,96},43607},'media-riak@prod-2009-internal.XXXer.com'},{{{192,168,25,240},45068},'media-riak@prod-2099-internal.XXXer.com'},{{{192,168,25,106},61231},'media-riak@prod-2019-internal.XXXer.com'},{{{192,168,25,114},58133},'media-riak@prod-2039-internal.XXXer.com'},{{{192,168,24,166},63433},'media-riak@prod-2029-internal.XXXer.com'},{{{192,168,25,105},62116},'media-riak@prod-2018-internal.XXXer.com'},{{{192,168,25,111},48904},'media-riak@prod-2028-internal.XXXer.com'},{{{192,168,26,18},44053},'media-riak@prod-2108-internal.XXXer.com'},{{{192,168,25,113},60765},'media-riak@prod-2038-internal.XXXer.com'},{{{192,168,25,95},56026},'media-riak@prod-2008-internal.XXXer.com'},{{{192,168,25,237},50572},'media-riak@prod-2098-internal.XXXer.com'},{{{192,168,25,99},40263},'media-riak@prod-2012-internal.XXXer.com'},{{{192,168,25,20},46256},'media-riak@prod-2032-internal.XXXer.com'},{{{192,168,25,249},54807},'media-riak@prod-2102-internal.XXXer.com'},{{{192,168,24,124},64396},'media-riak@prod-2022-internal.XXXer.com'},{{{192,168,25,18},50827},'media-riak@prod-2002-internal.XXXer.com'},{{{192,168,25,68},37453},'media-riak@prod-2035-internal.XXXer.com'},{{{192,168,25,89},65223},'media-riak@prod-2005-internal.XXXer.com'},{{{192,168,25,102},57790},'media-riak@prod-2015-internal.XXXer.com'},{{{192,168,24,199},47972},'media-riak@prod-2025-internal.XXXer.com'},{{{192,168,26,8},55805},'media-riak@prod-2105-internal.XXXer.com'},{{{192,168,25,101},44535},'media-riak@prod-2014-internal.XXXer.com'},{{{192,168,25,59},45894},'media-riak@prod-2034-internal.XXXer.com'},{{{192,168,24,122},36804},'media-riak@prod-2024-internal.XXXer.com'},{{{192,168,25,88},33156},'media-riak@prod-2004-internal.XXXer.com'},{{{192,168,26,7},35922},'media-riak@prod-2104-internal.XXXer.com'},{{{192,168,25,91},56857},'media-riak@prod-2007-internal.XXXer.com'},{{{192,168,25,104},34530},'media-riak@prod-2017-internal.XXXer.com'},{{{192,168,26,12},42432},'media-riak@prod-2107-internal.XXXer.com'},{{{192,168,25,112},51065},'media-riak@prod-2037-internal.XXXer.com'},{{{192,168,24,208},53930},'media-riak@prod-2027-internal.XXXer.com'}],{interval,#Ref<0.0.0.185>},false}
0x0000000000e0dcd8 Return addr 0x00000000030bc6e8 (proc_lib:init_p_do_apply/3 + 56)
y(0) riak_sysmon_filter
y(1) {state,0,30,0,2,{0,nil},[{{{192,168,25,98},46335},'media-riak@prod-2011-internal.XXXer.com'},{{{192,168,24,169},48853},'media-riak@prod-2031-internal.XXXer.com'},{{{192,168,25,4},45159},'media-riak@prod-2001-internal.XXXer.com'},{{{192,168,24,123},50343},'media-riak@prod-2021-internal.XXXer.com'},{{{192,168,24,168},47264},'media-riak@prod-2030-internal.XXXer.com'},{{{192,168,24,121},59550},'media-riak@prod-2020-internal.XXXer.com'},{{{192,168,25,248},39400},'media-riak@prod-2100-internal.XXXer.com'},{{{192,168,24,233},57098},'media-riak@prod-2000-internal.XXXer.com'},{{{192,168,25,115},34939},'media-riak@prod-2040-internal.XXXer.com'},{{{192,168,25,97},51083},'media-riak@prod-2010-internal.XXXer.com'},{{{192,168,25,41},38256},'media-riak@prod-2033-internal.XXXer.com'},{{{192,168,25,250},64683},'media-riak@prod-2103-internal.XXXer.com'},{{{192,168,25,100},33050},'media-riak@prod-2013-internal.XXXer.com'},{{{192,168,24,125},55403},'media-riak@prod-2023-internal.XXXer.com'},{{{192,168,25,103},38466},'media-riak@prod-2016-internal.XXXer.com'},{{{192,168,24,127},41757},'media-riak@prod-2026-internal.XXXer.com'},{{{192,168,25,90},45461},'media-riak@prod-2006-internal.XXXer.com'},{{{192,168,26,11},34804},'media-riak@prod-2106-internal.XXXer.com'},{{{192,168,25,87},41709},'media-riak@prod-2036-internal.XXXer.com'},{{{192,168,25,96},43607},'media-riak@prod-2009-internal.XXXer.com'},{{{192,168,25,240},45068},'media-riak@prod-2099-internal.XXXer.com'},{{{192,168,25,106},61231},'media-riak@prod-2019-internal.XXXer.com'},{{{192,168,25,114},58133},'media-riak@prod-2039-internal.XXXer.com'},{{{192,168,24,166},63433},'media-riak@prod-2029-internal.XXXer.com'},{{{192,168,25,105},62116},'media-riak@prod-2018-internal.XXXer.com'},{{{192,168,25,111},48904},'media-riak@prod-2028-internal.XXXer.com'},{{{192,168,26,18},44053},'media-riak@prod-2108-internal.XXXer.com'},{{{192,168,25,113},60765},'media-riak@prod-2038-internal.XXXer.com'},{{{192,168,25,95},56026},'media-riak@prod-2008-internal.XXXer.com'},{{{192,168,25,237},50572},'media-riak@prod-2098-internal.XXXer.com'},{{{192,168,25,99},40263},'media-riak@prod-2012-internal.XXXer.com'},{{{192,168,25,20},46256},'media-riak@prod-2032-internal.XXXer.com'},{{{192,168,25,249},54807},'media-riak@prod-2102-internal.XXXer.com'},{{{192,168,24,124},64396},'media-riak@prod-2022-internal.XXXer.com'},{{{192,168,25,18},50827},'media-riak@prod-2002-internal.XXXer.com'},{{{192,168,25,68},37453},'media-riak@prod-2035-internal.XXXer.com'},{{{192,168,25,89},65223},'media-riak@prod-2005-internal.XXXer.com'},{{{192,168,25,102},57790},'media-riak@prod-2015-internal.XXXer.com'},{{{192,168,24,199},47972},'media-riak@prod-2025-internal.XXXer.com'},{{{192,168,26,8},55805},'media-riak@prod-2105-internal.XXXer.com'},{{{192,168,25,101},44535},'media-riak@prod-2014-internal.XXXer.com'},{{{192,168,25,59},45894},'media-riak@prod-2034-internal.XXXer.com'},{{{192,168,24,122},36804},'media-riak@prod-2024-internal.XXXer.com'},{{{192,168,25,88},33156},'media-riak@prod-2004-internal.XXXer.com'},{{{192,168,26,7},35922},'media-riak@prod-2104-internal.XXXer.com'},{{{192,168,25,91},56857},'media-riak@prod-2007-internal.XXXer.com'},{{{192,168,25,104},34530},'media-riak@prod-2017-internal.XXXer.com'},{{{192,168,26,12},42432},'media-riak@prod-2107-internal.XXXer.com'},{{{192,168,25,112},51065},'media-riak@prod-2037-internal.XXXer.com'},{{{192,168,24,208},53930},'media-riak@prod-2027-internal.XXXer.com'}],{interval,#Ref<0.0.0.185>},false}
y(2) riak_sysmon_filter
y(3) <0.90.0>
y(4) reset
y(5) Catch 0x00000000043c5cc8 (gen_server:handle_msg/5 + 1680)
0x0000000000e0dd10 Return addr 0x0000000000672c08 (<terminate process normally>)
y(0) Catch 0x00000000030bc708 (proc_lib:init_p_do_apply/3 + 88)
ok
---
Mon 2012-01-30: more riak_sysmon_filter mailbox stuffiness
prod-2102:
Pid = pid(0,6777,3).
process_info(Pid).
{_, M} = process_info(Pid, messages).
[X || X <- M, X /= reset, element(1, X) /= monitor].
result on 2102: []
result on 2107: []
result on 2108: []
All 3 have stack backtraces that are identical.
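A sketch (not from the log) for tallying a stuck mailbox by message shape,
so a flood like the 'reset' storm above stands out at a glance:
MailboxTally = fun(P) ->
    {_, Msgs} = process_info(P, messages),
    D = lists:foldl(fun(M, Acc) when is_tuple(M) ->
                            dict:update_counter(element(1, M), 1, Acc);
                       (M, Acc) ->
                            dict:update_counter(M, 1, Acc)
                    end, dict:new(), Msgs),
    lists:reverse(lists:keysort(2, dict:to_list(D)))
end.
MailboxTally(Pid).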
See also: Jon's notes @ https://raw.github.com/gist/67963a1342931416e552/b5200f3264db0a4c091bdfd49d679cd393d6ec87/gistfile1.txt
----
-define(CHSTATE, #chstate_v2).
-record(chstate_v2, {
nodename :: node(), % the Node responsible for this chstate
vclock :: vclock:vclock(), % for this chstate object, entries are
% {Node, Ctr}
chring :: chash:chash(), % chash ring of {IndexAsInt, Node} mappings
meta :: dict(), % dict of cluster-wide other data (primarily
% bucket N-value, etc)
clustername :: {node(), term()},
next :: [{integer(), node(), node(), [module()], awaiting | complete}],
members :: [{node(), {member_status(), vclock:vclock(), []}}],
claimant :: node(),
seen :: [{node(), vclock:vclock()}],
rvsn :: vclock:vclock()
}).
{ok, R} = riak_core_ring_manager:get_my_ring().
Changes = riak_core_ring:pending_changes(R).
Ps = [{Idx, Owner, NewOwner} || {Idx, Owner, NewOwner, Mods, _St} <- Changes].
RunDu = fun(Nd, Cmd) ->
Dunno = "-1\t",
Out = try case rpc:call(Nd, os, cmd, [Cmd]) of
L when is_list(L) -> L;
_ -> Dunno
end
catch _:_ -> Dunno
end,
case string:tokens(Out, "\t") of
[Str|_] -> list_to_integer(Str);
_ -> -1
end
end.
Space = [begin
Cmd = "du -k /var/db/riak/leveldb/" ++ integer_to_list(Idx),
OwnK = RunDu(Owner, Cmd),
NewK = RunDu(NewOwner, Cmd),
{Idx, Owner, OwnK, NewOwner, NewK}
end || {Idx, Owner, NewOwner} <- Ps].
%% sort by owner node
Space2 = lists:sort(fun(X, Y) -> element(2, X) < element(2, Y) end, Space).
%% calculate space ratios
rp([{Idx, Owner, kbyte_ratio_at_destination_PERCENT, NewK / OwnK * 100}|| {Idx, Owner, OwnK, NewOwner, NewK} <- Space2]).
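Since RunDu returns -1 when the remote du fails, a guarded variant (a
sketch) skips those entries instead of producing a bogus ratio:
rp([{Idx, Owner, kbyte_ratio_at_destination_PERCENT, NewK / OwnK * 100}
    || {Idx, Owner, OwnK, _NewOwner, NewK} <- Space2, OwnK > 0, NewK >= 0]).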
---
dtrace -qn 'BEGIN { c = 0; } pid$target:eleveldb.so:eleveldb_iterator_move:entry {self->i = 1} syscall::read:entry{ c += arg2 ;} syscall::pread:entry{ c += arg2;} tick-10sec { printf("Iterator reads @ 10 sample: %d MBytes/sec\n", c / (1024*1024)/10); c = 0; }' -p `pgrep -u riak beam.smp`
----
Alright, back to prod-2003, which has several LevelDB instances
complaining about "bad magic number" or other errors, e.g.:
[root@prod-2003 /var/db/riak/leveldb]# grep -i error */LOG | head
1043318063368056673053607043667580944695787782144/LOG:2012/01/28-21:47:27.868097 251 Compaction error: Invalid argument: not an sstable (bad magic number)
1043318063368056673053607043667580944695787782144/LOG:2012/01/28-21:48:26.053983 251 Compaction error: Invalid argument: not an sstable (bad magic number)
1043318063368056673053607043667580944695787782144/LOG:2012/01/28-21:48:39.765415 251 Compaction error: Invalid argument: not an sstable (bad magic number)
[root@prod-2003 /var/db/riak/leveldb]# grep -i error */LOG | awk -F: '{ print $1}' | uniq -c
16928 1043318063368056673053607043667580944695787782144/LOG
16926 1117534943388766586868637914078954691787690606592/LOG
16925 1191751823409476500683668784490328438879593431040/LOG
16926 1377294023461251285221245960518762806609350492160/LOG
300 1414402463471606242128761395724449680155301904384/LOG
16926 523799903223087276348390950787964715052468011008/LOG
16926 598016783243797190163421821199338462144370835456/LOG
16927 709342103274862060885968126816399082782225072128/LOG
16926 820667423305926931608514432433459703420079308800/LOG
for i in `cat /tmp/slf.foo`; do echo Moving $i; mv $i ./slf.leveldb-troublemakers ; done
Moving 1043318063368056673053607043667580944695787782144 repair+open, screen 0
Moving 1117534943388766586868637914078954691787690606592 repair+open, screen 2
Moving 1191751823409476500683668784490328438879593431040 repair+open, screen 3
Moving 1377294023461251285221245960518762806609350492160 repair+open, screen 4
Moving 1414402463471606242128761395724449680155301904384 repair+open, screen 5
Moving 523799903223087276348390950787964715052468011008 repair+open, screen 6
Moving 598016783243797190163421821199338462144370835456 repair+open, screen 7
Moving 709342103274862060885968126816399082782225072128 repair+open, screen 8
Moving 820667423305926931608514432433459703420079308800 repair, screen 9
... of the remaining ones ...
[root@prod-2003 /var/db/riak/leveldb]# du -hs * | grep G
39G 1006209623357701716146091608461894071149836369920 seems ok
53G 1080426503378411629961122478873267818241739194368 seems ok
41G 1154643383399121543776153349284641565333642018816 seems ok
53G 1228860263419831457591184219696015312425544843264 seems ok
41G 1265968703430186414498699654901702185971496255488 seems ok
41G 1303077143440541371406215090107389059517447667712 seems ok
41G 1340185583450896328313730525313075933063399079936 seems ok
41G 1451510903481961199036276830930136553701253316608 seems ok
42G 560908343233442233255906385993651588598419423232 seems ok
41G 635125223254152147070937256405025335690322247680 seems ok
41G 672233663264507103978452691610712209236273659904 seems ok
43G 746450543285217017793483562022085956328176484352 seems ok
41G 783558983295571974700998997227772829874127896576 seems ok
45G 857775863316281888516029867639146576966030721024 seems ok
40G 894884303326636845423545302844833450511982133248 seems ok
46G 931992743336991802331060738050520324057933545472 seems ok
40G 969101183347346759238576173256207197603884957696 seems ok
All seem ok, so let's see what happens when we try the "Moved"
directories.... Damn, every single one is complaining about "Invalid
argument: not an sstable (bad magic number)" or about missing .sst
files. So, I'm repairing every single !@#$! one of them and letting the
!@#$! thing chug through 415GB of data for however long it takes.
NOTE: *Very* important to set the ulimit first!
ulimit -n 65000
erl -pz /opt/local/riak/lib/eleveldb-1.0.1/ebin
f().
{ok, DB0} = eleveldb:open("./", [write]).
B = <<1:(1024*1024*8)>>.
time().
[eleveldb:put(DB0, <<X:32>>, B, []) || X <- lists:seq(1,2000)].
[eleveldb:delete(DB0, <<X:32>>, []) || X <- lists:seq(1,2000)].
B2 = <<1:(8*1024)>>.
[eleveldb:put(DB0, <<X:32>>, B2, []) || X <- lists:seq(1,1*1024*1000)].
[eleveldb:delete(DB0, <<X:32>>, []) || X <- lists:seq(1,1*1024*1000)].
time().
-----
2012-02-13 Monday
[2/13/12 2:45:17 PM] Scott Fritchie: Where would I find the list of 12
nodes currently leaving?
[2/13/12 2:45:36 PM] Justin Shoffstall:
media-riak@prod-2002-internal.XXXer.com
media-riak@prod-2005-internal.XXXer.com
media-riak@prod-2008-internal.XXXer.com
media-riak@prod-2009-internal.XXXer.com
media-riak@prod-2010-internal.XXXer.com
media-riak@prod-2011-internal.XXXer.com
media-riak@prod-2012-internal.XXXer.com
media-riak@prod-2014-internal.XXXer.com
media-riak@prod-2015-internal.XXXer.com
media-riak@prod-2020-internal.XXXer.com
media-riak@prod-2031-internal.XXXer.com
media-riak@prod-2035-internal.XXXer.com
---
2012-02-18 Saturday morning, 12:30 AM or so CST
Owner: media-riak@prod-2027-internal.XXXer.com
Next Owner: media-riak@prod-2106-internal.XXXer.com
Index: 174124218510127105489110888272838406638695088128
Waiting on: [riak_kv_vnode]
Complete: [riak_pipe_vnode]
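The same status can be pulled straight from the ring (a sketch, reusing the
pending_changes/1 call from earlier in these notes):
{ok, Ring} = riak_core_ring_manager:get_my_ring().
Idx = 174124218510127105489110888272838406638695088128.
[Chg || {I, _, _, _, _} = Chg <- riak_core_ring:pending_changes(Ring), I == Idx].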
So, let's study these things:
* I/O activity on 2106 (the destination)
* I/O activity on 2027 (the source)
Right now on 2106, there appear to be two transfers happening:
[root@prod-2106 ~]# env TZ=CST6CDT date ; netstat -na | grep 8099 | grep EST
February 18, 2012 12:32:56 AM CST
192.168.26.11.8099 192.168.24.208.43407 128872 0 70265 0 ESTABLISHED
192.168.26.11.8099 192.168.24.124.47353 128872 0 128872 0 ESTABLISHED
So, one of them is coming from prod-2027 (192.168.24.208), hooray
for the sanity check. The other is coming from
prod-2022 (192.168.24.124). So, for grins and giggles, let's
monitor the vnode handoff folding iterator on prod-2022 also.
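An in-VM cross-check of that netstat output (a sketch; standard inet/erlang
calls only, with 8099 being the handoff port seen above):
HandoffPort = 8099.
TcpPorts = [P || P <- erlang:ports(), erlang:port_info(P, name) == {name, "tcp_inet"}].
rp([Peer || P <- TcpPorts,
            {ok, {_, LPort}} <- [inet:sockname(P)],
            LPort == HandoffPort,
            {ok, Peer} <- [inet:peername(P)]]).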
Output from all 3 boxes is in:
~/b/notes/2011-12-07.XXXer.data/2012-02-18.transfers-data
Only one file has a "date" timestamp in it, alas, but all three
sets of output were started within a few seconds of each
other.
----
Screen experiment on prod-2106: making copies of eleveldb dirs
for iteration experiments....
rsync -rav --delete leveldb/944837972571345441260585311775565780285378265088 991937146430642117335508748767399119786008903680 999073384894171916740800178614646595467922636800 slf-basho-leveldb-exp-data/
NOTE: The rsync is now finished, see ~root/slf-basho-leveldb-exp-data/screen.1.out
for the output of the rsync commands.