@surajssd
Last active June 8, 2020 18:48
Debugging ceph

The cluster was in HEALTH_WARN state with backfill errors, so I followed the advice from https://centosquestions.com/how-to-resolve-ceph-pool-getting-activeremappedbackfill_toofull/.

See the health:

# ceph health detail
HEALTH_WARN 1 backfillfull osd(s); 1 pool(s) backfillfull
OSD_BACKFILLFULL 1 backfillfull osd(s)
    osd.8 is backfill full
POOL_BACKFILLFULL 1 pool(s) backfillfull
    pool 'replicapool' is backfillfull
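
Per-OSD utilization can also be checked from the command line before changing any weights. The following is a minimal sketch, assuming it runs somewhere the ceph CLI can reach the cluster (for example the toolbox pod). The CEPH variable and the preview_rebalance function name are hypothetical helpers for illustration, not part of Ceph. The arguments 120, 0.05 and 4 are oload, max_change and max_change_osds, the same values the reweight run reports.

```shell
# CEPH is a hypothetical override; it defaults to the plain ceph binary.
# It is expanded unquoted on purpose so a multi-word wrapper also works.
CEPH="${CEPH:-ceph}"

preview_rebalance() {
  # Show size, usage and PG count for every OSD, grouped by CRUSH tree.
  $CEPH osd df tree
  # Dry run: report which OSD weights would change, without applying anything.
  $CEPH osd test-reweight-by-utilization 120 0.05 4
}
```

Calling preview_rebalance prints the current distribution followed by the proposed weight changes; nothing is modified until ceph osd reweight-by-utilization itself is run.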

As the following image shows, some of the OSDs still have free space, but data is not being rebalanced onto them:

So I triggered a rebalance:

# ceph osd reweight-by-utilization
moved 4 / 384 (1.04167%)
avg 25.6
stddev 10.5249 -> 10.5502 (expected baseline 4.88808)
min osd.4 with 8 -> 8 pgs (0.3125 -> 0.3125 * mean)
max osd.1 with 39 -> 37 pgs (1.52344 -> 1.44531 * mean)

oload 120
max_change 0.05
max_change_osds 4
average_utilization 0.6327
overload_utilization 0.7592
osd.8 weight 1.0000 -> 0.9500
osd.1 weight 1.0000 -> 0.9500
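
The numbers in this output can be reproduced by hand. The sketch below assumes that overload_utilization is average_utilization scaled by oload percent, and that an overloaded OSD's new weight is weight * average_utilization / osd_utilization, limited to a drop of at most max_change. That is my understanding of how Ceph computes it, not something the output itself states. The function names are hypothetical, and the 0.90 utilization for osd.8 is an assumed value (the output does not show it), chosen only because the OSD was reported backfill full.

```shell
# Threshold above which an OSD counts as overloaded.
# args: average_utilization, oload (percent)
overload_threshold() {
  awk -v avg="$1" -v oload="$2" 'BEGIN { printf "%.4f\n", avg * oload / 100 }'
}

# New weight for an overloaded OSD, limited by max_change.
# args: current weight, average_utilization, osd utilization, max_change
clamped_weight() {
  awk -v w="$1" -v avg="$2" -v util="$3" -v mc="$4" 'BEGIN {
    nw = w * avg / util              # proportional reduction
    if (nw < w - mc) nw = w - mc     # never drop by more than max_change
    printf "%.4f\n", nw
  }'
}

overload_threshold 0.6327 120          # prints 0.7592
clamped_weight 1.0 0.6327 0.90 0.05    # prints 0.9500 (0.90 is assumed)
```

Under these assumptions the proportional weight would be about 0.70, so the 0.05 limit is what produces the 1.0000 -> 0.9500 step. That would also explain why more than one run can be needed for a badly unbalanced cluster.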

Now they are all rebalanced:

And the cluster is back to HEALTH_OK:

# ceph health detail
HEALTH_OK

After a rook ceph mgr pod restart, port-forwarding to the dashboard may fail like this:

$ kubectl -n rook port-forward svc/rook-ceph-mgr-dashboard 8443:8443
Forwarding from 127.0.0.1:8443 -> 8443
Forwarding from [::1]:8443 -> 8443
Handling connection for 8443
E0609 00:11:54.021904 2368833 portforward.go:400] an error occurred forwarding 8443 -> 8443: error forwarding port 8443 to pod db763885aa7d42544815ffeda2b36e334a24a59094c7e84a7569c864475c77a1, uid : exit status 1: 2020/06/08 18:41:53 socat[1448] E connect(6, AF=2 127.0.0.1:8443, 16): Connection refused
Handling connection for 8443

If you see an error like the one above (socat reporting Connection refused on 127.0.0.1:8443), run the following from the toolbox pod:

ceph config set mgr mgr/dashboard/server_addr 0.0.0.0
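
A slightly fuller version of this step could also confirm the setting and reload the dashboard module. This is a sketch under assumptions: whether the disable/enable cycle is required for the new address to take effect may depend on the Ceph version. The CEPH variable and the rebind_dashboard function name are hypothetical helpers for illustration.

```shell
# CEPH is a hypothetical override; it defaults to the plain ceph binary.
CEPH="${CEPH:-ceph}"

rebind_dashboard() {
  # Bind the dashboard to all interfaces instead of a single address.
  $CEPH config set mgr mgr/dashboard/server_addr 0.0.0.0
  # Read the value back to confirm it was stored.
  $CEPH config get mgr mgr/dashboard/server_addr
  # Reload the dashboard module so it picks up the new address.
  $CEPH mgr module disable dashboard
  $CEPH mgr module enable dashboard
  # List the service URLs the active mgr is now serving.
  $CEPH mgr services
}
```

After running rebind_dashboard from the toolbox pod, retrying the kubectl port-forward command should show whether the dashboard accepts connections on 8443 again.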