@Millnert
Created May 5, 2018 12:42
Ceph recovery when FUBAR (OSDs crash-looping on memory/CPU and getting OOM-killed)
## Stop all OSDs
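A minimal per-node sketch of this step, assuming systemd-managed OSDs. It stops only the OSD daemons through ceph-osd.target (ceph.target would also stop any monitor or manager on the same host, and the `ceph osd set` commands that follow need the monitors up). It then waits until `pgrep` finds no ceph-osd process. The function name `stop_local_osds` and the `STOP_POLL_SECS` variable are hypothetical.

```shell
# Hypothetical helper: stop this node's OSDs and wait for them to exit.
stop_local_osds() {
    # ceph-osd.target covers only the OSD daemons on this host
    systemctl stop ceph-osd.target || return 1
    # pgrep -x matches the exact process name "ceph-osd"
    while pgrep -x ceph-osd >/dev/null; do
        sleep "${STOP_POLL_SECS:-2}"
    done
}
```

Run it on every OSD node (for example over ssh) before setting the flags.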
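## Set OSD noout, so the stopped OSDs are not marked out and their data rebalanced
ceph osd set noout
## Set OSD nodown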
ceph osd set nodown
## Set OSD nobackfill
ceph osd set nobackfill
## Set OSD noup
ceph osd set noup
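To confirm the flags actually landed in the OSD map before starting any OSD, a sketch like the following can help. It assumes the plain-text `ceph osd dump` output contains a line starting with `flags` followed by a comma-separated list; `set_recovery_flags` and `flags_set` are hypothetical helper names.

```shell
# Hypothetical helper: set each named flag, stopping at the first failure.
set_recovery_flags() {
    local flag
    for flag in "$@"; do
        ceph osd set "$flag" || return 1
    done
}

# Hypothetical helper: succeed only if every named flag is listed on the
# "flags" line of `ceph osd dump` (assumed format: "flags a,b,c").
flags_set() {
    local current flag
    current=$(ceph osd dump | awk '$1 == "flags" { print $2 }')
    for flag in "$@"; do
        case ",$current," in
            *",$flag,"*) ;;
            *) echo "missing flag: $flag" >&2; return 1 ;;
        esac
    done
}
```

For example: `set_recovery_flags nodown nobackfill noup && flags_set nodown nobackfill noup`.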
## Shrink the OSD map cache (and the related map-epoch limits) to reduce each OSD's memory footprint. In the [osd] section of /etc/ceph/ceph.conf on each OSD node, add:
[osd]
osd map cache size = 50
osd map max advance = 25
osd map share max epochs = 25
osd pg epoch persisted max stale = 25
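These settings are read when an OSD starts, so once the OSDs are running again it is worth checking that each local daemon picked them up. This sketch queries the admin sockets with `ceph --admin-daemon <socket> config get <option>`. It assumes the default socket location /var/run/ceph/ceph-osd.<id>.asok for a cluster named "ceph"; `show_map_settings` and the `ASOK_DIR` override are hypothetical names.

```shell
# Hypothetical helper: print the running map-cache settings of each local OSD.
show_map_settings() {
    local dir sock opt
    dir=${ASOK_DIR:-/var/run/ceph}
    for sock in "$dir"/ceph-osd.*.asok; do
        [ -e "$sock" ] || continue   # glob matched nothing: no local OSDs
        echo "== ${sock##*/}"
        for opt in osd_map_cache_size osd_map_max_advance \
                   osd_map_share_max_epochs osd_pg_epoch_persisted_max_stale; do
            # prints a small JSON object such as {"osd_map_cache_size": "50"}
            ceph --admin-daemon "$sock" config get "$opt"
        done
    done
}
```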
## Start all OSDs and let them catch up on their maps.
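One way to see whether the OSDs have caught up is to compare each daemon's newest_map (from the admin socket `status` command) with the current epoch on the first line of `ceph osd dump`. A minimal sketch, assuming those output formats and the default socket path; `map_lag` and `ASOK_DIR` are hypothetical names.

```shell
# Hypothetical helper: print how many OSD map epochs each local OSD is behind.
# Assumes `ceph osd dump` starts with "epoch N" and that the admin socket
# `status` command returns JSON containing "newest_map": N.
map_lag() {
    local dir cluster_epoch sock newest
    dir=${ASOK_DIR:-/var/run/ceph}
    cluster_epoch=$(ceph osd dump | awk '$1 == "epoch" { print $2; exit }')
    for sock in "$dir"/ceph-osd.*.asok; do
        [ -e "$sock" ] || continue   # glob matched nothing: no local OSDs
        newest=$(ceph --admin-daemon "$sock" status |
                 sed -n 's/.*"newest_map": *\([0-9][0-9]*\).*/\1/p')
        if [ -z "$newest" ]; then
            echo "${sock##*/} no newest_map in status output" >&2
            continue
        fi
        echo "${sock##*/} newest_map=$newest behind=$((cluster_epoch - newest))"
    done
}
```

A reading of behind=0 on every node suggests the OSDs have caught up and noup can be unset.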
## Unset noup to trigger peering across all PGs at once.
ceph osd unset noup
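Peering progress can be polled from `ceph pg stat`. The sketch below simply waits until the summary line stops mentioning any state containing "peering"; it assumes state names appear verbatim in that line, and `wait_for_peering` and `PEER_POLL_SECS` are hypothetical names.

```shell
# Hypothetical helper: poll `ceph pg stat` until no PG state containing
# "peering" is reported, printing the summary on each round.
wait_for_peering() {
    while ceph pg stat | grep -q peering; do
        ceph pg stat
        sleep "${PEER_POLL_SECS:-10}"
    done
}
```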
## Once peering has completed (ceph pg stat no longer shows any peering PGs), unset noout, nodown and nobackfill.
ceph osd unset noout
ceph osd unset nodown
ceph osd unset nobackfill
This should allow the recovery to complete using a smaller memory footprint.
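A closing check that the flags were really removed: `clear_recovery_flags` (a hypothetical name) unsets each named flag and then fails if any of them is still on the `flags` line of `ceph osd dump`, assuming that line has the form "flags a,b,c".

```shell
# Hypothetical helper: unset each named flag, then fail if any of them is
# still listed on the "flags" line of `ceph osd dump`.
clear_recovery_flags() {
    local flag remaining
    for flag in "$@"; do
        ceph osd unset "$flag" || return 1
    done
    remaining=$(ceph osd dump | awk '$1 == "flags" { print $2 }')
    for flag in "$@"; do
        case ",$remaining," in
            *",$flag,"*) echo "still set: $flag" >&2; return 1 ;;
        esac
    done
}
```

For example: `clear_recovery_flags noout nodown nobackfill`.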
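## Commands used on each OSD node. ceph-osd.target stops only the OSD daemons;
## ceph.target would also stop any monitor, manager or other Ceph daemon on the same host.
systemctl stop ceph-osd.target
## Check that no ceph-osd process is left
ps aux | grep '[c]eph-osd'
systemctl start ceph-osd.target
## Watch CPU and memory of the ceph user's processes, with full command lines
top -c -u ceph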
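To see whether the smaller map cache is actually keeping memory down, `top` can be complemented with a one-shot listing of OSD resident memory. A sketch assuming procps `ps` (for `-C` and `--sort`), which reports RSS in KiB; `osd_memory` is a hypothetical name.

```shell
# Hypothetical helper: list running ceph-osd processes, largest resident
# memory first. procps `ps` reports RSS in KiB; awk converts it to MiB.
osd_memory() {
    ps -C ceph-osd -o pid=,rss=,args= --sort=-rss |
        awk '{
            pid = $1; mib = int($2 / 1024)
            $1 = ""; $2 = ""; sub(/^ +/, "")
            printf "%s %d MiB %s\n", pid, mib, $0
        }'
}
```

Call it repeatedly during recovery, for example `while sleep 5; do osd_memory; echo; done`.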