These are possible steps to reset a csync2 cluster that has been seriously fubared. This is an apocalyptic approach and should only be used when more surgical fixes (like correcting an individual conflict) aren't workable.
This will solve errors like:
ERROR from peer 10.0.0.1: File is also marked dirty here!
And, depending on the cause, it may fix errors like:
Database backend is exceedingly busy => Terminating (requesting retry).
ERROR from peer 10.0.0.1: Connection closed.
References to lsyncd assume you're triggering csync2 via lsyncd (as with puppet-clustersync) but can be safely ignored if that doesn't apply to you.
As written, these commands do not specify a csync2 config file. If you are using csync2 with a named config, each csync2 command should include -C [config-name].
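If you do use a named config, one way to avoid forgetting the flag is a small shell wrapper. This is only a sketch: `cs2`, `CSYNC2_CFG` and `DRY_RUN` are hypothetical names invented here, not part of csync2. The only csync2 feature it relies on is the `-C` option mentioned above.

```shell
# cs2: hypothetical wrapper around csync2 (not shipped with csync2).
# If CSYNC2_CFG is set, "-C <name>" is prepended to every call.
# If DRY_RUN is set, the command is printed instead of executed.
cs2() {
    if [ -n "${CSYNC2_CFG:-}" ]; then
        set -- -C "$CSYNC2_CFG" "$@"
    fi
    if [ -n "${DRY_RUN:-}" ]; then
        printf 'csync2 %s\n' "$*"
    else
        csync2 "$@"
    fi
}
```

With `CSYNC2_CFG` set to your config name, `cs2 -cIr /` stands in for `csync2 -C [config-name] -cIr /`; setting `DRY_RUN=1` first lets you preview each command before running it for real.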
-
On all servers, forcibly disable all the automatic syncing:
service lsyncd stop
killall csync2
-
Choose the one server that's closest to the correct state and manually adjust its files as needed to make it the authoritative master.
-
On all servers, update the csync2 database to match the filesystem, without marking anything as needing to be synced out to other servers:
csync2 -cIr /
Run all commands in this phase from the authoritative server selected in the last phase.
-
Find all differences between the authoritative server and the remotes and mark them for sync:
csync2 -TUXI
-
Reset the database to force the current server to be the winner on any conflicts:
csync2 -fr /
-
Run a sync to all other servers:
csync2 -xr /
-
On all servers, confirm that all files are in sync (if successful, this will list no files):
csync2 -T
-
On all servers, restart automatic sync service:
service lsyncd start
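
The confirmation step above (csync2 -T lists no files when everything is in sync) can be turned into a scripted check, ideally run before restarting lsyncd. This is a minimal sketch: `report_sync` is a hypothetical helper, and it assumes the file list is written to standard output. It looks only at whether that output is empty and does not rely on csync2's exit code.

```shell
# report_sync: hypothetical helper, not part of csync2.
# $1 is the captured output of `csync2 -T`.
# Empty output is treated as "in sync"; anything else is echoed back.
report_sync() {
    if [ -z "$1" ]; then
        printf 'in sync\n'
        return 0
    fi
    printf 'out of sync:\n%s\n' "$1"
    return 1
}
```

Typical use would be `report_sync "$(csync2 -T)"` on each server (adding -C [config-name] if needed). A non-zero return means some files were still listed, so automatic syncing should stay off until they are resolved.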