Skip to content

Instantly share code, notes, and snippets.

@mattlord
Last active August 25, 2023 13:53
Show Gist options
  • Save mattlord/ea49abc347b02b2ee22db4b776800a8d to your computer and use it in GitHub Desktop.
Save mattlord/ea49abc347b02b2ee22db4b776800a8d to your computer and use it in GitHub Desktop.
buffering during sharded keyspace primary vttablet restart

Test case:

pushd ${VTROOT}/examples/local

sed -i "" '42i\
  --enable_buffer \\
' ../common/scripts/vtgate-up.sh

source ../common/env.sh

CELL=zone1 ../common/scripts/etcd-up.sh

CELL=zone1 ../common/scripts/vtctld-up.sh

for i in 100 101 102; do
        CELL=zone1 TABLET_UID=$i ../common/scripts/mysqlctl-up.sh
        SHARD=-80 CELL=zone1 KEYSPACE=commerce TABLET_UID=$i ../common/scripts/vttablet-up.sh
done

for i in 200 201 202; do
        CELL=zone1 TABLET_UID=$i ../common/scripts/mysqlctl-up.sh
        SHARD=80- CELL=zone1 KEYSPACE=commerce TABLET_UID=$i ../common/scripts/vttablet-up.sh
done

vtctldclient InitShardPrimary --force commerce/-80 zone1-100
vtctldclient InitShardPrimary --force commerce/80- zone1-200

vtctldclient ApplyVSchema --vschema='{"sharded":true,"vindexes":{"hash":{"type":"hash"}},"tables":{"customer":{"column_vindexes":[{"column":"customer_id","name":"hash"}]}}}' commerce
vtctldclient ApplySchema --sql='create table customer(customer_id bigint not null auto_increment, email varbinary(128), primary key(customer_id))' commerce

CELL=zone1 ../common/scripts/vtgate-up.sh

echo "Starting inserts at $(date)..."
rm /tmp/query-results.txt
id=1
while true; do date >> /tmp/query-results.txt; mysql commerce -e "insert into customer (customer_id, email) values (${id}, '${RANDOM}_person@planetscale.com')" >> /tmp/query-results.txt 2>&1; let id=id+1; done &

echo "Original tablet status:"
vtctldclient GetTablets

echo "Pre shard commerce/-80 primary tablet restart buffering stats:"
curl -s localhost:15001/debug/vars | grep -E "^\"Buffer"

echo "Restarting commerce/-80 primary tablet zone1-100 $(date)..."
CELL=zone1 TABLET_UID=100 ../common/scripts/vttablet-down.sh
CELL=zone1 SHARD=-80 TABLET_UID=100 KEYSPACE=commerce ../common/scripts/vttablet-up.sh
echo "Stopping commerce/80- primary tablet zone1-200 $(date)..."
CELL=zone1 TABLET_UID=200 ../common/scripts/vttablet-down.sh
echo "Sleeping for 20 seconds before restarting commerce/80- primary tablet zone1-200 $(date)..."
sleep 20
CELL=zone1 SHARD=80- TABLET_UID=200 KEYSPACE=commerce ../common/scripts/vttablet-up.sh
echo "Completed restarts at $(date)..."

echo "New tablet status:"
vtctldclient GetTablets

echo "Sleeping for 10 seconds..."
sleep 10

echo "Stopping inserts..."
kill %1

echo "Rows created:"
mysql commerce -e "select count(*) from customer"

echo "Post restart buffering stats:"
curl -s localhost:15001/debug/vars | grep -E "^\"Buffer"

grep -Ei "event|buffer" ${VTDATAROOT}/tmp/vtgate.INFO

Results:

...
Starting inserts at Fri Aug 25 09:44:31 EDT 2023...
[1] 11607
Original tablet status:
zone1-0000000100 commerce -80 primary localhost:15100 localhost:17100 [] 2023-08-25T13:44:30Z
zone1-0000000101 commerce -80 replica localhost:15101 localhost:17101 [] <null>
zone1-0000000102 commerce -80 rdonly localhost:15102 localhost:17102 [] <null>
zone1-0000000200 commerce 80- primary localhost:15200 localhost:17200 [] 2023-08-25T13:44:30Z
zone1-0000000201 commerce 80- replica localhost:15201 localhost:17201 [] <null>
zone1-0000000202 commerce 80- rdonly localhost:15202 localhost:17202 [] <null>
Pre shard commerce/-80 primary tablet restart buffering stats:
"BufferFailoverDurationSumMs": {"commerce.80-": 0, "commerce.-80": 0},
"BufferLastFailoverDurationMs": {},
"BufferLastRequestsDryRunMax": {},
"BufferLastRequestsInFlightMax": {},
"BufferRequestsBuffered": {"commerce.80-": 0, "commerce.-80": 0},
"BufferRequestsBufferedDryRun": {"commerce.80-": 0, "commerce.-80": 0},
"BufferRequestsDrained": {"commerce.80-": 0, "commerce.-80": 0},
"BufferRequestsEvicted": {"commerce.-80.ContextDone": 0, "commerce.-80.BufferFull": 0, "commerce.-80.WindowExceeded": 0, "commerce.80-.ContextDone": 0, "commerce.80-.BufferFull": 0, "commerce.80-.WindowExceeded": 0},
"BufferRequestsSkipped": {"commerce.80-.Shutdown": 0, "commerce.80-.LastReparentTooRecent": 0, "commerce.80-.LastFailoverTooRecent": 0, "commerce.-80.LastFailoverTooRecent": 0, "commerce.80-.BufferFull": 0, "commerce.-80.BufferFull": 0, "commerce.-80.Disabled": 0, "commerce.-80.Shutdown": 0, "commerce.-80.LastReparentTooRecent": 0, "commerce.80-.Disabled": 0},
"BufferSize": 1000,
"BufferStarts": {"commerce.80-": 0, "commerce.-80": 0},
"BufferStops": {"commerce.-80.Shutdown": 0, "commerce.80-.ReshardingComplete": 0, "commerce.80-.NewPrimarySeen": 0, "commerce.80-.MaxDurationExceeded": 0, "commerce.80-.Shutdown": 0, "commerce.-80.ReshardingComplete": 0, "commerce.-80.NewPrimarySeen": 0, "commerce.-80.MaxDurationExceeded": 0},
"BufferUtilizationDryRunSum": {"commerce.80-": 0, "commerce.-80": 0},
"BufferUtilizationSum": {"commerce.-80": 0, "commerce.80-": 0},
Restarting commerce/-80 primary tablet zone1-100 Fri Aug 25 09:44:32 EDT 2023...
Stopping vttablet...
Starting vttablet for zone1-0000000100...
HTTP/1.1 200 OK
Date: Fri, 25 Aug 2023 13:44:42 GMT
Content-Type: text/html; charset=utf-8

Stopping commerce/80- primary tablet zone1-200 Fri Aug 25 09:44:42 EDT 2023...
Stopping vttablet...
Sleeping for 20 seconds before restarting commerce/80- primary tablet zone1-200 Fri Aug 25 09:44:52 EDT 2023...
Starting vttablet for zone1-0000000200...
HTTP/1.1 200 OK
Date: Fri, 25 Aug 2023 13:45:12 GMT
Content-Type: text/html; charset=utf-8

Completed restarts at Fri Aug 25 09:45:12 EDT 2023...
New tablet status:
zone1-0000000100 commerce -80 primary localhost:15100 localhost:17100 [] 2023-08-25T13:44:30Z
zone1-0000000101 commerce -80 replica localhost:15101 localhost:17101 [] <null>
zone1-0000000102 commerce -80 rdonly localhost:15102 localhost:17102 [] <null>
zone1-0000000200 commerce 80- primary localhost:15200 localhost:17200 [] 2023-08-25T13:44:30Z
zone1-0000000201 commerce 80- replica localhost:15201 localhost:17201 [] <null>
zone1-0000000202 commerce 80- rdonly localhost:15202 localhost:17202 [] <null>
Sleeping for 10 seconds...
Stopping inserts...
Rows created:
[1]  + 11607 terminated  while true; do; date >> /tmp/query-results.txt; command mysql --no-defaults -
ERROR 1105 (HY000) at line 1: target: commerce.80-.primary: current keyspace is being resharded
Post restart buffering stats:
"BufferFailoverDurationSumMs": {"commerce.80-": 20001, "commerce.-80": 272},
"BufferLastFailoverDurationMs": {"commerce.-80": 272, "commerce.80-": 20001},
"BufferLastRequestsDryRunMax": {"commerce.-80": 0, "commerce.80-": 0},
"BufferLastRequestsInFlightMax": {"commerce.-80": 1, "commerce.80-": 1},
"BufferRequestsBuffered": {"commerce.80-": 2, "commerce.-80": 1},
"BufferRequestsBufferedDryRun": {"commerce.80-": 0, "commerce.-80": 0},
"BufferRequestsDrained": {"commerce.80-": 1, "commerce.-80": 1},
"BufferRequestsEvicted": {"commerce.80-.BufferFull": 0, "commerce.80-.WindowExceeded": 1, "commerce.-80.ContextDone": 0, "commerce.-80.BufferFull": 0, "commerce.-80.WindowExceeded": 0, "commerce.80-.ContextDone": 0},
"BufferRequestsSkipped": {"commerce.-80.BufferFull": 0, "commerce.-80.Disabled": 0, "commerce.-80.Shutdown": 0, "commerce.-80.LastReparentTooRecent": 0, "commerce.80-.Disabled": 0, "commerce.80-.Shutdown": 0, "commerce.80-.LastReparentTooRecent": 0, "commerce.80-.LastFailoverTooRecent": 1236, "commerce.-80.LastFailoverTooRecent": 0, "commerce.80-.BufferFull": 0},
"BufferSize": 1000,
"BufferStarts": {"commerce.80-": 1, "commerce.-80": 1},
"BufferStops": {"commerce.80-.NewPrimarySeen": 0, "commerce.80-.MaxDurationExceeded": 1, "commerce.80-.Shutdown": 0, "commerce.-80.ReshardingComplete": 0, "commerce.-80.NewPrimarySeen": 1, "commerce.-80.MaxDurationExceeded": 0, "commerce.-80.Shutdown": 0, "commerce.80-.ReshardingComplete": 0},
"BufferUtilizationDryRunSum": {"commerce.80-": 0, "commerce.-80": 0},
"BufferUtilizationSum": {"commerce.80-": 0, "commerce.-80": 0},

Condensed output of grep -Ei "event|buffer" ${VTDATAROOT}/tmp/vtgate.INFO:

I0825 09:44:31.138972   11587 flags.go:187] vtgate buffer enabled. PRIMARY requests will be buffered during detected failovers.
I0825 09:44:31.139187   11587 keyspace_events.go:90] started watching keyspace events in "zone1"
I0825 09:44:31.142645   11587 keyspace_events.go:537] created dedicated watcher for keyspace zone1/commerce
I0825 09:44:31.147711   11587 keyspace_events.go:304] keyspace event resolved: commerce/commerce is now consistent (serving: true)
I0825 09:44:31.147713   11587 keyspace_events.go:304] keyspace event resolved: commerce/commerce is now consistent (serving: true)
I0825 09:44:31.147728   11587 shard_buffer.go:485] disruption in shard commerce/80- resolved (serving: true), movetable state discovery.MoveTablesState{Typ:0, State:0}
I0825 09:44:31.147742   11587 shard_buffer.go:485] disruption in shard commerce/-80 resolved (serving: true), movetable state discovery.MoveTablesState{Typ:0, State:0}
I0825 09:44:42.124610   11587 buffer.go:92] CausedByFailover signalling failover for reason: current keyspace is being resharded
I0825 09:44:42.124624   11587 shard_buffer.go:284] Starting buffering for shard: commerce/-80 (window: 10s, size: 1000, max failover duration: 20s) (A failover was detected by this seen error: current keyspace is being resharded.)
I0825 09:44:42.397176   11587 keyspace_events.go:304] keyspace event resolved: commerce/commerce is now consistent (serving: true)
I0825 09:44:42.397180   11587 keyspace_events.go:304] keyspace event resolved: commerce/commerce is now consistent (serving: true)
I0825 09:44:42.397187   11587 shard_buffer.go:485] disruption in shard commerce/80- resolved (serving: true), movetable state discovery.MoveTablesState{Typ:0, State:0}
I0825 09:44:42.397190   11587 shard_buffer.go:485] disruption in shard commerce/-80 resolved (serving: true), movetable state discovery.MoveTablesState{Typ:0, State:0}
I0825 09:44:42.397196   11587 shard_buffer.go:590] Stopping buffering for shard: commerce/-80 after: 0.3 seconds due to: a primary promotion has been detected. Draining 1 buffered requests now.
I0825 09:44:42.397668   11587 shard_buffer.go:616] Draining finished for shard: commerce/-80 Took: 458.209µs for: 1 requests.
I0825 09:44:52.540470   11587 buffer.go:92] CausedByFailover signalling failover for reason: current keyspace is being resharded
I0825 09:44:52.540481   11587 shard_buffer.go:284] Starting buffering for shard: commerce/80- (window: 10s, size: 1000, max failover duration: 20s) (A failover was detected by this seen error: current keyspace is being resharded.)
I0825 09:45:12.541600   11587 shard_buffer.go:590] Stopping buffering for shard: commerce/80- after: 20.0 seconds due to: stopping buffering because failover did not finish in time (20s). Draining 1 buffered requests now.
I0825 09:45:12.544185   11587 shard_buffer.go:616] Draining finished for shard: commerce/80- Took: 2.539666ms for: 1 requests.
I0825 09:45:12.581370   11587 buffer.go:92] CausedByFailover signalling failover for reason: current keyspace is being resharded
I0825 09:45:12.581486   11587 shard_buffer.go:191] FailoverTooRecent-commerce/80-: NOT starting buffering for shard: commerce/80- because the last failover which triggered buffering is too recent (39.896ms < 1m0s). (A failover was detected by this seen error: Code: CLUSTER_EVENT
I0825 09:45:12.582033   11587 buffer.go:92] CausedByFailover signalling failover for reason: current keyspace is being resharded
...
I0825 09:45:17.550773   11587 buffer.go:92] CausedByFailover signalling failover for reason: current keyspace is being resharded
I0825 09:45:17.598480   11587 shard_buffer.go:191] FailoverTooRecent-commerce/80-: NOT starting buffering for shard: commerce/80- because the last failover which triggered buffering is too recent (5.056904291s < 1m0s). (A failover was detected by this seen error: Code: CLUSTER_EVENT
I0825 09:45:17.598605   11587 buffer.go:92] CausedByFailover signalling failover for reason: current keyspace is being resharded
...
I0825 09:45:22.603785   11587 buffer.go:92] CausedByFailover signalling failover for reason: current keyspace is being resharded
I0825 09:45:22.603796   11587 shard_buffer.go:191] FailoverTooRecent-commerce/80-: NOT starting buffering for shard: commerce/80- because the last failover which triggered buffering is too recent (10.062217583s < 1m0s). (A failover was detected by this seen error: Code: CLUSTER_EVENT
I0825 09:45:22.603925   11587 buffer.go:92] CausedByFailover signalling failover for reason: current keyspace is being resharded
...
I0825 09:45:22.984544   11587 buffer.go:92] CausedByFailover signalling failover for reason: current keyspace is being resharded
I0825 09:45:25.371575   11587 keyspace_events.go:304] keyspace event resolved: commerce/commerce is now consistent (serving: true)
I0825 09:45:25.371581   11587 keyspace_events.go:304] keyspace event resolved: commerce/commerce is now consistent (serving: true)
I0825 09:45:25.371599   11587 shard_buffer.go:485] disruption in shard commerce/80- resolved (serving: true), movetable state discovery.MoveTablesState{Typ:0, State:0}
I0825 09:45:25.371614   11587 shard_buffer.go:485] disruption in shard commerce/-80 resolved (serving: true), movetable state discovery.MoveTablesState{Typ:0, State:0}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment