@functioner
Created January 17, 2021 00:51
Attachments for a ZooKeeper 3.6.2 bug case

Usage

Step 1

Set the ZK_HOME and BYTEMAN_HOME variables in the setup.sh script. Don't forget to compile ZooKeeper in ZK_HOME.
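Before running setup.sh, it can help to confirm that both directories contain the files the generated scripts will call. The following is a minimal sketch; check_homes is a hypothetical helper, and the paths it tests are the ones referenced in setup.sh. It does not verify that ZooKeeper has actually been compiled.

```shell
#!/bin/bash
# Sketch: verify that ZK_HOME and BYTEMAN_HOME contain the files that the
# scripts generated by setup.sh invoke. check_homes is a hypothetical name.
check_homes() {
  local zk_home=$1 byteman_home=$2 missing=0
  local f
  for f in "$zk_home/bin/zkServer.sh" "$zk_home/bin/zkCli.sh" \
           "$byteman_home/lib/byteman.jar" "$byteman_home/bin/bmsubmit.sh"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  # Returns 0 when every file exists, 1 otherwise.
  return "$missing"
}
```

For example, check_homes "$ZK_HOME" "$BYTEMAN_HOME" prints each missing path to stderr and returns a non-zero status if anything is absent.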

Step 2

$ ./setup.sh

The setup.sh script creates the environment and other scripts in this directory.

Step 3

$ ./start_zookeeper_cluster.sh

Start a 3-node ZooKeeper cluster, with all of its configuration, logs, and data kept in this directory.

Step 4

$ ./client.sh 1

Start the client connected to server 1, which is assumed to be a follower. You can connect to server 2 or 3 similarly.

Then you can try to make some requests in the client REPL. Everything should work well now.
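The server number passed to client.sh (and to the injection scripts) is turned into a port by adding it to a base value. As a sketch of that arithmetic, the helper below prints the ports for one server. The base values are copied from setup.sh; the function name and the output labels are hypothetical.

```shell
#!/bin/bash
# Port bases copied from setup.sh; server_ports is a hypothetical helper.
CLIENT_PORT_BASE=10711
BPORT_BASE=9090

# Print the ports associated with server $1 as "name=port" lines.
server_ports() {
  local i=$1
  echo "client=$((CLIENT_PORT_BASE + i))"   # clientPort in zoo.cfg
  echo "byteman=$((BPORT_BASE + i))"        # Byteman agent listener port
  echo "quorum=$((10813 + i))"              # first port in server.N=...
  echo "election=$((10913 + i))"            # second port in server.N=...
}
```

So server 1 accepts clients on port 10712 and its Byteman agent listens on port 9091.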

Step 5

$ ./start_injection.sh 1

Inject the fault into server 1. You can inject it into server 2 or 3 in the same way.

After the injection, each invocation matched by the rule in writePacket-delay.btm is delayed by 30s. You can change this duration in the btm file.
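If you prefer not to edit the btm file by hand, the duration can be rewritten with sed. This is a minimal sketch: set_delay_ms is a hypothetical helper that changes only the argument of Thread.sleep(...), so the traceln message will still say "30s" unless you update it yourself.

```shell
#!/bin/bash
# Sketch: write a copy of a Byteman script with a different sleep duration.
# set_delay_ms is a hypothetical helper; it rewrites only the numeric
# argument of Thread.sleep(...) and leaves every other line unchanged.
set_delay_ms() {
  local src=$1 dst=$2 ms=$3
  sed -E "s/Thread\.sleep\([0-9]+\)/Thread.sleep(${ms})/" "$src" > "$dst"
}
```

For example, set_delay_ms writePacket-delay.btm /tmp/delay.btm 5000 followed by mv /tmp/delay.btm writePacket-delay.btm gives a 5s delay. Do this before running start_injection.sh, since that script submits writePacket-delay.btm.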

Step 6

You can continue to use the client from Step 4 or start a new client as described in Step 4.

Since you've injected the fault into server 1 (or another server), you should see that a client connected to the faulty server gets stuck.
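To confirm that the rule is actually firing, you can look for the message printed by traceln in the faulty server's log directory. The sketch below assumes that the server's standard output ends up in a file under ./logs-N (zkServer.sh normally redirects it to a .out file in ZOO_LOG_DIR); count_injections is a hypothetical helper.

```shell
#!/bin/bash
# Count lines containing the rule's trace message under one log directory.
# count_injections is a hypothetical helper name; the message text is the
# one passed to traceln in writePacket-delay.btm.
count_injections() {
  local log_dir=$1
  grep -rh "Captured by Byteman" "$log_dir" 2>/dev/null | wc -l | tr -d ' '
}
```

For example, count_injections ./logs-1 prints how many times the delay has been applied on server 1 so far.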

You can also find related messages in the log of the leader node (in ./logs-2, if server 2 is the leader).
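If you are not sure which server is the leader, zkServer.sh status reports each server's role on a line starting with "Mode:". The following sketch extracts that line. parse_mode and show_modes are hypothetical helpers, and show_modes assumes that ZK_HOME is set as in setup.sh and that the cluster is running.

```shell
#!/bin/bash
# Extract the role (e.g. leader or follower) from `zkServer.sh status`
# output read on stdin. parse_mode is a hypothetical helper.
parse_mode() {
  sed -n 's/^Mode: *//p'
}

# Hypothetical wrapper: print the role of each of the 3 servers whose
# conf-N directories live under directory $1. Requires ZK_HOME to be set.
show_modes() {
  local dir=$1 i
  for i in 1 2 3; do
    echo "server $i: $(ZOOCFGDIR="$dir/conf-$i" "$ZK_HOME/bin/zkServer.sh" status 2>/dev/null | parse_mode)"
  done
}
```

Running show_modes . in this directory then tells you which logs-N directory belongs to the leader.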

Step 7

$ ./stop_injection.sh 1

Stop the fault injection in server 1. You can do the same for server 2 or 3.

After 30s (or whatever duration you specified in the btm file), the client should no longer be stuck and everything should work normally.
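One way to compare the behavior with and without the injection is to time a single request. This is a minimal sketch using bash's built-in SECONDS counter; elapsed_seconds is a hypothetical helper, and it discards the command's output.

```shell
#!/bin/bash
# Run a command and print how many whole seconds it took.
# elapsed_seconds is a hypothetical helper; SECONDS is a bash builtin that
# counts seconds since the shell started.
elapsed_seconds() {
  local start=$SECONDS
  "$@" >/dev/null 2>&1
  echo $((SECONDS - start))
}
```

For example, assuming ZK_HOME is set, elapsed_seconds "$ZK_HOME/bin/zkCli.sh" -server localhost:10712 create /probe times one write request sent through server 1. Running it while the injection is active and again after stopping it should show the difference.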

Step 8

$ ./stop_zookeeper_cluster.sh

Stop the ZooKeeper cluster for this experiment.

# Copyright 2012 The Apache Software Foundation
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Define some default values that can be overridden by system properties
zookeeper.root.logger=INFO, CONSOLE
zookeeper.console.threshold=INFO
zookeeper.log.dir=.
zookeeper.log.file=zookeeper.log
zookeeper.log.threshold=INFO
zookeeper.log.maxfilesize=256MB
zookeeper.log.maxbackupindex=20
zookeeper.tracelog.dir=${zookeeper.log.dir}
zookeeper.tracelog.file=zookeeper_trace.log
log4j.rootLogger=${zookeeper.root.logger}
#
# console
# Add "console" to rootlogger above if you want to use this
#
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.Threshold=${zookeeper.console.threshold}
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} [myid:%X{myid}] - %-5p [%t:%C{1}@%L] - %m%n
#
# Add ROLLINGFILE to rootLogger to get log file output
#
log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.ROLLINGFILE.Threshold=${zookeeper.log.threshold}
log4j.appender.ROLLINGFILE.File=${zookeeper.log.dir}/${zookeeper.log.file}
log4j.appender.ROLLINGFILE.MaxFileSize=${zookeeper.log.maxfilesize}
log4j.appender.ROLLINGFILE.MaxBackupIndex=${zookeeper.log.maxbackupindex}
log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d{ISO8601} [myid:%X{myid}] - %-5p [%t:%C{1}@%L] - %m%n
#
# Add TRACEFILE to rootLogger to get log file output
# Log TRACE level and above messages to a log file
#
log4j.appender.TRACEFILE=org.apache.log4j.FileAppender
log4j.appender.TRACEFILE.Threshold=TRACE
log4j.appender.TRACEFILE.File=${zookeeper.tracelog.dir}/${zookeeper.tracelog.file}
log4j.appender.TRACEFILE.layout=org.apache.log4j.PatternLayout
### Notice we are including log4j's NDC here (%x)
log4j.appender.TRACEFILE.layout.ConversionPattern=%d{ISO8601} [myid:%X{myid}] - %-5p [%t:%C{1}@%L][%x] - %m%n
#
# zk audit logging
#
zookeeper.auditlog.file=zookeeper_audit.log
zookeeper.auditlog.threshold=INFO
audit.logger=INFO, RFAAUDIT
log4j.logger.org.apache.zookeeper.audit.Log4jAuditLogger=${audit.logger}
log4j.additivity.org.apache.zookeeper.audit.Log4jAuditLogger=false
log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender
log4j.appender.RFAAUDIT.File=${zookeeper.log.dir}/${zookeeper.auditlog.file}
log4j.appender.RFAAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.RFAAUDIT.Threshold=${zookeeper.auditlog.threshold}
# Max log file size of 10MB
log4j.appender.RFAAUDIT.MaxFileSize=10MB
log4j.appender.RFAAUDIT.MaxBackupIndex=10
#!/bin/bash
CUR_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
ZK_HOME=$CUR_DIR/../3.6.2 # you may modify this to set the ZooKeeper directory
BYTEMAN_HOME=$CUR_DIR/../../byteman # you may modify this to set the ByteMan directory
CLIENT_PORT_BASE=10711
BPORT_BASE=9090
BM_SPEC=$CUR_DIR/writePacket-delay.btm
for i in {1..3}
do
rm -rf $CUR_DIR/conf-$i $CUR_DIR/logs-$i $CUR_DIR/store-$i
mkdir -p $CUR_DIR/conf-$i $CUR_DIR/logs-$i $CUR_DIR/store-$i
echo $i > $CUR_DIR/store-$i/myid
cp $CUR_DIR/log4j.properties $CUR_DIR/conf-$i/
echo "
tickTime=2000
initLimit=10
syncLimit=5
clientPort=$(($CLIENT_PORT_BASE + $i))
dataDir=$CUR_DIR/store-$i" > $CUR_DIR/conf-$i/zoo.cfg
for j in {1..3}
do
echo "server.$j=localhost:$((10813 + $j)):$((10913 + $j))" >> $CUR_DIR/conf-$i/zoo.cfg
done
done
echo "#!/bin/bash
for ((i=1;i<=3;i++))
do
export ZOOCFGDIR=$CUR_DIR/conf-\$i
export ZOO_LOG_DIR=$CUR_DIR/logs-\$i
export SERVER_JVMFLAGS=\"-javaagent:${BYTEMAN_HOME}/lib/byteman.jar=listener:true,port:\$(($BPORT_BASE + \$i)),boot:${BYTEMAN_HOME}/lib/byteman.jar -Dorg.jboss.byteman.transform.all\"
$ZK_HOME/bin/zkServer.sh start
unset ZOOCFGDIR
unset ZOO_LOG_DIR
unset SERVER_JVMFLAGS
done
" > $CUR_DIR/start_zookeeper_cluster.sh
chmod +x $CUR_DIR/start_zookeeper_cluster.sh
echo "#!/bin/bash
for ((i=1;i<=3;i++))
do
export ZOOCFGDIR=$CUR_DIR/conf-\$i
export ZOO_LOG_DIR=$CUR_DIR/logs-\$i
$ZK_HOME/bin/zkServer.sh stop
unset ZOOCFGDIR
unset ZOO_LOG_DIR
done
" > $CUR_DIR/stop_zookeeper_cluster.sh
chmod +x $CUR_DIR/stop_zookeeper_cluster.sh
echo "#!/bin/bash
$BYTEMAN_HOME/bin/bmsubmit.sh -p \$(($BPORT_BASE + \$1)) -l $BM_SPEC
" > $CUR_DIR/start_injection.sh
chmod +x $CUR_DIR/start_injection.sh
echo "#!/bin/bash
$BYTEMAN_HOME/bin/bmsubmit.sh -p \$(($BPORT_BASE + \$1)) -u $BM_SPEC
" > $CUR_DIR/stop_injection.sh
chmod +x $CUR_DIR/stop_injection.sh
echo "#!/bin/bash
export ZOOCFGDIR=$CUR_DIR/conf-\$1
export ZOO_LOG_DIR=$CUR_DIR/logs-\$1
$ZK_HOME/bin/zkCli.sh -server localhost:\$(($CLIENT_PORT_BASE + \$1))
unset ZOOCFGDIR
unset ZOO_LOG_DIR
" > $CUR_DIR/client.sh
chmod +x $CUR_DIR/client.sh

FollowerRequestProcessor (appears when doing transactions and replying to the client):

  • org.apache.zookeeper.server.quorum.FollowerRequestProcessor#run -> org.apache.zookeeper.server.quorum.Learner#request -> org.apache.zookeeper.server.quorum.Learner#writePacket

SyncRequestProcessor (appears when doing transactions):

  • org.apache.zookeeper.server.SyncRequestProcessor#run -> org.apache.zookeeper.server.SyncRequestProcessor#flush -> org.apache.zookeeper.server.quorum.SendAckRequestProcessor#processRequest -> org.apache.zookeeper.server.quorum.Learner#writePacket

QuorumPeer (the first two stack traces are part of the ZAB protocol at the beginning of the FOLLOWING state, and the last one is the heartbeat PING):

  • org.apache.zookeeper.server.quorum.QuorumPeer#run -> org.apache.zookeeper.server.quorum.Follower#followLeader -> org.apache.zookeeper.server.quorum.Learner#registerWithLeader -> org.apache.zookeeper.server.quorum.Learner#writePacket
  • org.apache.zookeeper.server.quorum.QuorumPeer#run -> org.apache.zookeeper.server.quorum.Follower#followLeader -> org.apache.zookeeper.server.quorum.Learner#syncWithLeader -> org.apache.zookeeper.server.quorum.Learner#writePacket
  • org.apache.zookeeper.server.quorum.QuorumPeer#run -> org.apache.zookeeper.server.quorum.Follower#followLeader -> org.apache.zookeeper.server.quorum.Follower#processPacket -> org.apache.zookeeper.server.quorum.Learner#ping -> org.apache.zookeeper.server.quorum.Learner#writePacket
RULE writePacket_delay
CLASS org.apache.zookeeper.server.quorum.Learner
METHOD writePacket
AT INVOKE org.apache.jute.OutputArchive.writeRecord
IF TRUE
DO
traceln("Captured by Byteman, sleep for 30s");
Thread.sleep(30000);
ENDRULE