
@nihr43
Last active July 1, 2022 04:18
gluster Self-heal daemon is not running
The issue here is that these two nodes will not sync; the self-heal daemon apparently will not run.
This is not necessarily a classic two-node "split brain": this was originally a stable 3-node cluster that ran fine for a few months, until one day two of the nodes went out of sync. I dropped both out-of-sync nodes in hopes of live-rebuilding the cluster, but this had no effect.
The state shown below is from just after adding a fresh node.
```
~# gluster volume heal gv0 info
Brick 10.0.0.105:/var/db/glusterfs/gv0
/
Status: Connected
Number of entries: 1
Brick 10.0.200.3:/var/db/glusterfs/gv0
Status: Connected
Number of entries: 0
```
```
~# gluster v status
Status of volume: gv0
Gluster process                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.0.0.105:/var/db/glusterfs/gv0       49152     0          Y       1043
Brick 10.0.200.3:/var/db/glusterfs/gv0       49152     0          Y       18493
Self-heal Daemon on localhost                N/A       N/A        N       N/A
Bitrot Daemon on localhost                   N/A       N/A        N       N/A
Scrubber Daemon on localhost                 N/A       N/A        N       N/A
Self-heal Daemon on 10.0.200.3               N/A       N/A        N       N/A
Bitrot Daemon on 10.0.200.3                  N/A       N/A        N       N/A
Scrubber Daemon on 10.0.200.3                N/A       N/A        N       N/A
Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks
```
```
~# gluster volume heal gv0
Launching heal operation to perform index self heal on volume gv0 has been unsuccessful:
Self-heal daemon is not running. Check self-heal daemon log file.
~# gluster volume heal gv0 enable
Enable heal on volume gv0 has been successful
~# gluster volume heal gv0 full
Launching heal operation to perform full self heal on volume gv0 has been unsuccessful:
Self-heal daemon is not running. Check self-heal daemon log file.
```
Tailing /var/log/glusterfs/* reveals no clues.
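For anyone landing here with the same symptom, the usual knobs for getting glustershd back up (standard gluster CLI commands; I'm not claiming any of them fixed this particular cluster) are sketched below. The volume name `gv0` is from this setup; substitute your own.

```shell
# glustershd writes its own log, separate from the brick and glusterd logs:
tail -f /var/log/glusterfs/glustershd.log

# "start ... force" respawns any processes that should be running but
# aren't, including the self-heal daemon, without disturbing healthy bricks:
gluster volume start gv0 force

# toggling the self-heal-daemon option also restarts glustershd:
gluster volume set gv0 cluster.self-heal-daemon off
gluster volume set gv0 cluster.self-heal-daemon on

# the bigger hammer: restart the management daemon on each node:
systemctl restart glusterd
```

None of these produced a running self-heal daemon here.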
This is the third gluster cluster I've lost to unexplained clustering/consistency issues.