@functioner
Last active February 6, 2021 20:41
Attachments for ZooKeeper-4203

Usage

Step 1

Set the ZK_HOME and BYTEMAN_HOME variables in the setup.sh script to your ZooKeeper and Byteman directories. Make sure the ZooKeeper source in ZK_HOME has been compiled.
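
Before running setup.sh, it may help to confirm that both directories contain the files the generated scripts call. The sketch below is a minimal check, not part of the gist: the function name check_homes is hypothetical, and the paths it tests (bin/zkServer.sh, bin/zkCli.sh, lib/byteman.jar, bin/bmsubmit.sh) are the ones that setup.sh refers to.

```shell
# Hypothetical helper (not part of the gist): verify that the ZooKeeper
# directory ($1) and the Byteman directory ($2) contain the files that the
# scripts generated by setup.sh invoke.
check_homes() {
  missing=0
  for f in "$1/bin/zkServer.sh" "$1/bin/zkCli.sh" \
           "$2/lib/byteman.jar" "$2/bin/bmsubmit.sh"
  do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  # Return 0 only if every file was found.
  return "$missing"
}
```

With the default layout of setup.sh, a call such as check_homes ../3.6.2 ../../byteman should return success before you continue with Step 2.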

Step 2

$ ./setup.sh

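The setup.sh script creates the per-server directories (conf-N, logs-N, store-N) and generates the other scripts (start_zookeeper_cluster.sh, stop_zookeeper_cluster.sh, client.sh, stop_injection.sh) in this directory.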
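
The port layout that setup.sh writes into each zoo.cfg follows directly from the constants in the script. As a worked example, the sketch below recomputes it. The function names are illustrative only; the bases 10711, 10813, and 10913 are taken from setup.sh.

```shell
# Illustrative helpers (not part of the gist) mirroring the arithmetic in setup.sh.
CLIENT_PORT_BASE=10711

client_port()   { echo $((CLIENT_PORT_BASE + $1)); }  # clientPort in conf-N/zoo.cfg
quorum_port()   { echo $((10813 + $1)); }             # first port of server.N
election_port() { echo $((10913 + $1)); }             # second port of server.N

# Print the server.N line that every zoo.cfg contains for server N.
server_line() {
  echo "server.$1=localhost:$(quorum_port "$1"):$(election_port "$1")"
}

# Summarize the ports of all three servers.
for n in 1 2 3; do
  echo "server $n: clientPort=$(client_port "$n") $(server_line "$n")"
done
```

For server 3, for instance, this gives clientPort=10714 and server.3=localhost:10816:10916, which is why ./client.sh 3 connects to localhost:10714.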

Step 3

$ ./start_zookeeper_cluster.sh

This starts a 3-node ZooKeeper cluster that keeps all of its data in this directory. The fault is injected by the Byteman script serverSocketAccept-exception.btm, which is attached to server 2. In the logs, we can see that server 3 keeps trying to join the quorum and always fails. Servers 1 and 2 keep receiving its requests but cannot accept server 3 because of the exception injected in the leader (server 2).
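
One way to confirm that the fault actually fired is to search the leader's log directory for the message printed by the Byteman rule. This is a sketch under assumptions: the function name is hypothetical, and it assumes that the traceln output of the rule ends up in some file under logs-2 (for example the file that zkServer.sh redirects the server's standard output to).

```shell
# Hypothetical helper (not part of the gist): count the lines that contain the
# Byteman trace message in all files under the log directory given as $1.
# Prints 0 if the directory does not exist or nothing matches.
count_injections() {
  grep -rhF -- "Captured by Byteman, inject IOException" "$1" 2>/dev/null \
    | wc -l | tr -d '[:space:]'
}
```

Calling count_injections logs-2 a few seconds after the cluster has started should then print a number greater than zero if the injection was triggered in server 2.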

Step 4 (only required to check the fix)

$ ./stop_injection.sh 2

This stops the injection in the leader (assuming it is server 2). Otherwise, the same fault is injected again every time the leader recovers from it. With the fix applied and the injection cancelled, the problematic follower is able to join the quorum and works normally. To double-check, feed some workload to the problematic server through the client: ./client.sh 3, where 3 is the server id of the problematic follower.
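
The role of each server can also be checked without opening a client session. The sketch below extracts the role from the output of zkServer.sh status. The function name is hypothetical, and it assumes that the status output contains a line of the form "Mode: follower" or "Mode: leader".

```shell
# Hypothetical helper (not part of the gist): read the output of
# `zkServer.sh status` on stdin and print the value of its "Mode:" line.
# Nothing is printed if there is no such line.
parse_mode() {
  sed -n 's/^Mode: *//p'
}
```

For example, ZOOCFGDIR=$PWD/conf-3 $ZK_HOME/bin/zkServer.sh status 2>&1 | parse_mode would be expected to print follower once server 3 has joined the quorum, and to print nothing while it is still failing to join.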

Step 5

$ ./stop_zookeeper_cluster.sh

Stop the ZooKeeper cluster for this experiment.

log4j.properties

# Copyright 2012 The Apache Software Foundation
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Define some default values that can be overridden by system properties
zookeeper.root.logger=INFO, CONSOLE
zookeeper.console.threshold=INFO
zookeeper.log.dir=.
zookeeper.log.file=zookeeper.log
zookeeper.log.threshold=INFO
zookeeper.log.maxfilesize=256MB
zookeeper.log.maxbackupindex=20
zookeeper.tracelog.dir=${zookeeper.log.dir}
zookeeper.tracelog.file=zookeeper_trace.log
log4j.rootLogger=${zookeeper.root.logger}
#
# console
# Add "console" to rootlogger above if you want to use this
#
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.Threshold=${zookeeper.console.threshold}
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} [myid:%X{myid}] - %-5p [%t:%C{1}@%L] - %m%n
#
# Add ROLLINGFILE to rootLogger to get log file output
#
log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.ROLLINGFILE.Threshold=${zookeeper.log.threshold}
log4j.appender.ROLLINGFILE.File=${zookeeper.log.dir}/${zookeeper.log.file}
log4j.appender.ROLLINGFILE.MaxFileSize=${zookeeper.log.maxfilesize}
log4j.appender.ROLLINGFILE.MaxBackupIndex=${zookeeper.log.maxbackupindex}
log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d{ISO8601} [myid:%X{myid}] - %-5p [%t:%C{1}@%L] - %m%n
#
# Add TRACEFILE to rootLogger to get log file output
# Log TRACE level and above messages to a log file
#
log4j.appender.TRACEFILE=org.apache.log4j.FileAppender
log4j.appender.TRACEFILE.Threshold=TRACE
log4j.appender.TRACEFILE.File=${zookeeper.tracelog.dir}/${zookeeper.tracelog.file}
log4j.appender.TRACEFILE.layout=org.apache.log4j.PatternLayout
### Notice we are including log4j's NDC here (%x)
log4j.appender.TRACEFILE.layout.ConversionPattern=%d{ISO8601} [myid:%X{myid}] - %-5p [%t:%C{1}@%L][%x] - %m%n
#
# zk audit logging
#
zookeeper.auditlog.file=zookeeper_audit.log
zookeeper.auditlog.threshold=INFO
audit.logger=INFO, RFAAUDIT
log4j.logger.org.apache.zookeeper.audit.Log4jAuditLogger=${audit.logger}
log4j.additivity.org.apache.zookeeper.audit.Log4jAuditLogger=false
log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender
log4j.appender.RFAAUDIT.File=${zookeeper.log.dir}/${zookeeper.auditlog.file}
log4j.appender.RFAAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.RFAAUDIT.Threshold=${zookeeper.auditlog.threshold}
# Max log file size of 10MB
log4j.appender.RFAAUDIT.MaxFileSize=10MB
log4j.appender.RFAAUDIT.MaxBackupIndex=10

serverSocketAccept-exception.btm

RULE create countDown for org.apache.zookeeper.server.quorum.Leader$LearnerCnxAcceptor$LearnerCnxAcceptorHandler
CLASS org.apache.zookeeper.server.quorum.Leader$LearnerCnxAcceptor$LearnerCnxAcceptorHandler
METHOD <init>
IF TRUE
DO createCountDown($0, 1)
ENDRULE
RULE serverSocketAccept_exception
CLASS org.apache.zookeeper.server.quorum.Leader$LearnerCnxAcceptor$LearnerCnxAcceptorHandler
METHOD acceptConnections
AT INVOKE java.net.ServerSocket.accept
IF countDown($0)
DO
traceln("Captured by Byteman, inject IOException");
THROW new java.io.IOException("inject exception");
ENDRULE

setup.sh

#!/bin/bash
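# Directory containing this script; all generated files are placed here.
CUR_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
ZK_HOME=$CUR_DIR/../3.6.2 # you may modify this to set the ZooKeeper directory
BYTEMAN_HOME=$CUR_DIR/../../byteman # you may modify this to set the Byteman directory
CLIENT_PORT_BASE=10711 # server i accepts clients on CLIENT_PORT_BASE + i
BPORT_BASE=9090 # the Byteman agent of server i listens on BPORT_BASE + i
BM_SPEC=$CUR_DIR/serverSocketAccept-exception.btm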
for i in {1..3}
do
  # Recreate the config, log, and data directories of server i from scratch.
  rm -rf "$CUR_DIR/conf-$i" "$CUR_DIR/logs-$i" "$CUR_DIR/store-$i"
  mkdir -p "$CUR_DIR/conf-$i" "$CUR_DIR/logs-$i" "$CUR_DIR/store-$i"
  echo "$i" > "$CUR_DIR/store-$i/myid"
  cp "$CUR_DIR/log4j.properties" "$CUR_DIR/conf-$i/"
  echo "
tickTime=2000
initLimit=10
syncLimit=5
clientPort=$(($CLIENT_PORT_BASE + $i))
dataDir=$CUR_DIR/store-$i" > "$CUR_DIR/conf-$i/zoo.cfg"
  # Every server lists the same three quorum members.
  for j in {1..3}
  do
    echo "server.$j=localhost:$((10813 + $j)):$((10913 + $j))" >> "$CUR_DIR/conf-$i/zoo.cfg"
  done
done
# Generate start_zookeeper_cluster.sh; the Byteman agent is attached to server 2 only.
echo "#!/bin/bash
for ((i=1;i<=3;i++))
do
  export ZOOCFGDIR=$CUR_DIR/conf-\$i
  export ZOO_LOG_DIR=$CUR_DIR/logs-\$i
  [[ \$i = 2 ]] && export SERVER_JVMFLAGS=\"-javaagent:${BYTEMAN_HOME}/lib/byteman.jar=script:$BM_SPEC,listener:true,port:\$(($BPORT_BASE + \$i)),boot:${BYTEMAN_HOME}/lib/byteman.jar -Dorg.jboss.byteman.transform.all\"
  $ZK_HOME/bin/zkServer.sh start
  unset ZOOCFGDIR
  unset ZOO_LOG_DIR
  unset SERVER_JVMFLAGS
done
" > "$CUR_DIR/start_zookeeper_cluster.sh"
chmod +x "$CUR_DIR/start_zookeeper_cluster.sh"
# Generate stop_zookeeper_cluster.sh.
echo "#!/bin/bash
for ((i=1;i<=3;i++))
do
  export ZOOCFGDIR=$CUR_DIR/conf-\$i
  export ZOO_LOG_DIR=$CUR_DIR/logs-\$i
  $ZK_HOME/bin/zkServer.sh stop
  unset ZOOCFGDIR
  unset ZOO_LOG_DIR
done
" > "$CUR_DIR/stop_zookeeper_cluster.sh"
chmod +x "$CUR_DIR/stop_zookeeper_cluster.sh"
# Generate client.sh <server id>: open a zkCli session against that server.
echo "#!/bin/bash
export ZOOCFGDIR=$CUR_DIR/conf-\$1
export ZOO_LOG_DIR=$CUR_DIR/logs-\$1
$ZK_HOME/bin/zkCli.sh -server localhost:\$(($CLIENT_PORT_BASE + \$1))
unset ZOOCFGDIR
unset ZOO_LOG_DIR
" > "$CUR_DIR/client.sh"
chmod +x "$CUR_DIR/client.sh"
# Generate stop_injection.sh <server id>: unload the rules from that server's Byteman agent.
echo "#!/bin/bash
$BYTEMAN_HOME/bin/bmsubmit.sh -p \$(($BPORT_BASE + \$1)) -u $BM_SPEC
" > "$CUR_DIR/stop_injection.sh"
chmod +x "$CUR_DIR/stop_injection.sh"