surajssd/000-unbalanced-storage-rebalancing.md

## 000-unbalanced-storage-rebalancing.md

The cluster was in HEALTH_WARN state with backfillfull warnings, so I followed the advice from https://centosquestions.com/how-to-resolve-ceph-pool-getting-activeremappedbackfill_toofull/.
Here is the health detail:
```
# ceph health detail
HEALTH_WARN 1 backfillfull osd(s); 1 pool(s) backfillfull
OSD_BACKFILLFULL 1 backfillfull osd(s)
    osd.8 is backfill full
POOL_BACKFILLFULL 1 pool(s) backfillfull
    pool 'replicapool' is backfillfull
```
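`backfillfull` means an OSD's utilization has crossed the cluster's backfillfull ratio, at which point Ceph stops backfilling data onto it. The shell sketch below is a simplified illustration, not Ceph's own code, of how a utilization value maps to the nearfull, backfillfull and full states. It assumes the default ratios (0.85, 0.90 and 0.95); the helper `osd_state` is hypothetical, and your cluster's actual ratios are listed by `ceph osd dump | grep ratio`.

```shell
# Simplified sketch, not Ceph's implementation: classify one OSD's
# utilization (a fraction between 0 and 1) against the fullness ratios.
# ASSUMPTION: the default ratios below; override them via the environment
# with the values reported by `ceph osd dump | grep ratio`.
NEARFULL="${NEARFULL:-0.85}"
BACKFILLFULL="${BACKFILLFULL:-0.90}"
FULL="${FULL:-0.95}"

# osd_state UTIL: print full, backfillfull, nearfull or ok.
osd_state() {
  awk -v u="$1" -v n="$NEARFULL" -v b="$BACKFILLFULL" -v f="$FULL" 'BEGIN {
    if (u > f)      print "full"          # writes are blocked
    else if (u > b) print "backfillfull"  # no more backfill onto this OSD
    else if (u > n) print "nearfull"      # warning only
    else            print "ok"
  }'
}
```

For example, `osd_state 0.92` prints `backfillfull`: the OSD is above 0.90 but still below the 0.95 full ratio, which is the state osd.8 is in above.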
As the following image shows, some of the OSDs still have free space, yet data is not being rebalanced onto them:

Now I run the rebalance:
```
# ceph osd reweight-by-utilization
moved 4 / 384 (1.04167%)
avg 25.6
stddev 10.5249 -> 10.5502 (expected baseline 4.88808)
min osd.4 with 8 -> 8 pgs (0.3125 -> 0.3125 * mean)
max osd.1 with 39 -> 37 pgs (1.52344 -> 1.44531 * mean)

oload 120
max_change 0.05
max_change_osds 4
average_utilization 0.6327
overload_utilization 0.7592
osd.8 weight 1.0000 -> 0.9500
osd.1 weight 1.0000 -> 0.9500
```
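The figures in this report can be re-derived by hand. As far as I understand Ceph's logic, `overload_utilization` is `average_utilization * oload / 100`, and each OSD above it gets the weight `weight * average_utilization / osd_utilization`, but is never lowered by more than `max_change` in one run (and at most `max_change_osds` OSDs are changed). The sketch below reproduces that arithmetic with `awk`; the function names are hypothetical, and because the report does not print osd.8's actual utilization, the 0.92 used below is an assumed value. Running `ceph osd test-reweight-by-utilization` first prints the same kind of report without applying any change.

```shell
# Re-derive figures from the `ceph osd reweight-by-utilization` report.
# Pure arithmetic; nothing here talks to a cluster.

# overload_util AVG_UTIL OLOAD: utilization above which an OSD is reweighted.
overload_util() {
  awk -v a="$1" -v o="$2" 'BEGIN { printf "%.4f\n", a * o / 100 }'
}

# new_weight WEIGHT OSD_UTIL AVG_UTIL MAX_CHANGE: lowered weight for an
# overloaded OSD, scaled toward the average but capped by MAX_CHANGE.
new_weight() {
  awk -v w="$1" -v u="$2" -v a="$3" -v m="$4" 'BEGIN {
    nw = w * a / u                # weight that would bring the OSD to the average
    if (nw < w - m) nw = w - m    # never drop by more than max_change per run
    printf "%.4f\n", nw
  }'
}

# pg_ratio PGS AVG_PGS: an OSD's PG count relative to the mean ("* mean").
pg_ratio() {
  awk -v p="$1" -v m="$2" 'BEGIN { printf "%.5f\n", p / m }'
}
```

For example, `overload_util 0.6327 120` prints `0.7592`, matching `overload_utilization`; `new_weight 1.0 0.92 0.6327 0.05` prints `0.9500`, because the uncapped value (about 0.69) is cut off by `max_change`; and `pg_ratio 39 25.6` prints `1.52344`, the "before" ratio for osd.1.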
After the reweight, the data is rebalanced across all the OSDs:

And the cluster is back to HEALTH_OK:

```
# ceph health detail
HEALTH_OK
```

  
## 001-port-forward-failure.md

If port-forwarding to the dashboard fails like this after the Rook Ceph mgr pod restarts:
```
$ kubectl -n rook port-forward svc/rook-ceph-mgr-dashboard 8443:8443
Forwarding from 127.0.0.1:8443 -> 8443
Forwarding from [::1]:8443 -> 8443
Handling connection for 8443
E0609 00:11:54.021904 2368833 portforward.go:400] an error occurred forwarding 8443 -> 8443: error forwarding port 8443 to pod db763885aa7d42544815ffeda2b36e334a24a59094c7e84a7569c864475c77a1, uid : exit status 1: 2020/06/08 18:41:53 socat[1448] E connect(6, AF=2 127.0.0.1:8443, 16): Connection refused
Handling connection for 8443
```
If you see an error like this, where socat gets `Connection refused` on `127.0.0.1:8443` inside the pod, run the following from the toolbox pod:

```
ceph config set mgr mgr/dashboard/server_addr 0.0.0.0
```

This makes the dashboard listen on all interfaces, including the loopback address that the port-forward connects to.
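If you would rather not open a shell in the toolbox, the same fix can be applied through `kubectl exec`. This is a sketch under assumptions that are not in the original note: the toolbox runs as the deployment `rook-ceph-tools` in the `rook` namespace (override with `NAMESPACE` and `TOOLBOX`), and the helper names are hypothetical. With `DRY_RUN=1` the helpers only print the commands they would run.

```shell
# Sketch: apply the dashboard bind fix from a workstation via the toolbox pod.
# ASSUMPTIONS: toolbox deployment "rook-ceph-tools" in namespace "rook";
# override with NAMESPACE=... and TOOLBOX=... if yours differ.
NAMESPACE="${NAMESPACE:-rook}"
TOOLBOX="${TOOLBOX:-deploy/rook-ceph-tools}"

# run_in_toolbox CMD...: run a command inside the toolbox pod.
# With DRY_RUN=1 it prints the kubectl command instead of running it.
run_in_toolbox() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "kubectl -n $NAMESPACE exec $TOOLBOX -- $*"
  else
    kubectl -n "$NAMESPACE" exec "$TOOLBOX" -- "$@"
  fi
}

# fix_dashboard_bind: bind the dashboard to all interfaces (including
# 127.0.0.1, where port-forward connects), then read the value back.
fix_dashboard_bind() {
  run_in_toolbox ceph config set mgr mgr/dashboard/server_addr 0.0.0.0 &&
    run_in_toolbox ceph config get mgr mgr/dashboard/server_addr
}
```

Run `DRY_RUN=1 fix_dashboard_bind` to preview the two commands, then `fix_dashboard_bind` to apply them, and retry the `kubectl port-forward` command.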

  
## 01.png

## 02.png
