@jasonrm
Last active May 29, 2017 08:41
Roughly the initial state

❯ ceph osd dump
2017-05-26 21:46:43.373290 7f8bb3513700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
2017-05-26 21:46:43.379603 7f8bb3513700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
epoch 3361
fsid bc56bb17-4fad-40c2-93fa-cb07c3f7da0a
created 2017-02-28 08:43:12.121679
modified 2017-05-25 06:42:27.953396
flags noout,sortbitwise,require_jewel_osds,require_kraken_osds,require_luminous_osds
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client firefly
min_compat_client firefly 0.80
pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 178 flags hashpspool stripe_width 0
        removed_snaps [1~3]
pool 1 '.rgw.root' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 179 flags hashpspool stripe_width 0
pool 2 'default.rgw.control' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 180 flags hashpspool stripe_width 0
pool 3 'default.rgw.data.root' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 181 flags hashpspool stripe_width 0
pool 4 'default.rgw.gc' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 182 flags hashpspool stripe_width 0
pool 5 'default.rgw.log' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 183 flags hashpspool stripe_width 0
pool 6 'default.rgw.intent-log' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 184 flags hashpspool stripe_width 0
pool 7 'default.rgw.meta' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 185 flags hashpspool stripe_width 0
pool 8 'default.rgw.usage' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 186 flags hashpspool stripe_width 0
pool 9 'default.rgw.users.keys' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 187 flags hashpspool stripe_width 0
pool 10 'default.rgw.users.email' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 188 flags hashpspool stripe_width 0
pool 11 'default.rgw.users.swift' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 189 flags hashpspool stripe_width 0
pool 12 'default.rgw.users.uid' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 190 flags hashpspool stripe_width 0
pool 13 'default.rgw.buckets.extra' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 191 flags hashpspool stripe_width 0
pool 14 'default.rgw.buckets.index' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 16 pgp_num 16 last_change 192 flags hashpspool stripe_width 0
pool 15 'default.rgw.buckets.data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 512 pgp_num 512 last_change 193 flags hashpspool stripe_width 0
pool 16 'default.rgw.lc' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 195 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 17 'default.rgw.buckets.non-ec' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 197 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 18 'cephfs_data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 329 flags hashpspool stripe_width 0
pool 19 'cephfs_metadata' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 332 flags hashpspool stripe_width 0
max_osd 10
# Node 1
osd.0 down in  weight 1 up_from 3338 up_thru 3338 down_at 3343 last_clean_interval [2861,2999) 10.76.35.13:6800/14024 10.76.35.13:6801/14024 10.76.35.13:6802/14024 10.76.35.13:6803/14024 exists ccaf6a9b-55bf-4dd2-8f88-42c94d4f679e
osd.1 down in  weight 1 up_from 3282 up_thru 3282 down_at 3289 last_clean_interval [3171,3171) 10.76.35.13:6804/41918 10.76.35.13:6805/41918 10.76.35.13:6806/41918 10.76.35.13:6807/41918 exists 5193a91d-58eb-47c3-bbdf-8ba2c50ca8cf
osd.2 down in  weight 1 up_from 3284 up_thru 3284 down_at 3287 last_clean_interval [2863,2999) 10.76.35.13:6800/44608 10.76.35.13:6801/44608 10.76.35.13:6802/44608 10.76.35.13:6803/44608 exists 10ffb66e-26ac-4cab-9484-f730d691cb87
# Node 2
osd.3 up   in  weight 1 up_from 3345 up_thru 3345 down_at 3344 last_clean_interval [2839,3002) 10.76.35.12:6800/28031 10.76.35.12:6801/28031 10.76.35.12:6802/28031 10.76.35.12:6803/28031 exists,up 2c36efd1-80eb-431a-89c2-f1e41920b2aa
osd.4 down in  weight 1 up_from 3161 up_thru 3161 down_at 3165 last_clean_interval [2837,3002) 10.76.35.12:6800/16133 10.76.35.12:6801/16133 10.76.35.12:6802/16133 10.76.35.12:6803/16133 exists 8e0c8bc4-dfb8-486f-85e8-8611984616e3
osd.5 down in  weight 1 up_from 3154 up_thru 3154 down_at 3174 last_clean_interval [2847,3002) 10.76.35.12:6804/13617 10.76.35.12:6805/13617 10.76.35.12:6806/13617 10.76.35.12:6807/13617 exists e4a26d57-301a-4309-8c5b-eee7c35c7724
# Node 3
osd.6 up   in  weight 1 up_from 3360 up_thru 3360 down_at 3359 last_clean_interval [2984,2985) 10.76.35.14:6800/26017 10.76.35.14:6801/26017 10.76.35.14:6802/26017 10.76.35.14:6803/26017 exists,up 0f4f44ff-8f57-43ac-a5e5-aa9ebee4ad3a
osd.7 down in  weight 1 up_from 3260 up_thru 3260 down_at 3266 last_clean_interval [2964,2965) 10.76.35.14:6804/9161 10.76.35.14:6805/9161 10.76.35.14:6806/9161 10.76.35.14:6807/9161 exists afeb1fae-b657-4b46-80b7-a7558697029b
osd.8 down in  weight 1 up_from 3262 up_thru 3262 down_at 3266 last_clean_interval [2916,2920) 10.76.35.14:6808/16264 10.76.35.14:6809/16264 10.76.35.14:6810/16264 10.76.35.14:6811/16264 exists bc554390-91a8-44ad-9e4c-440e2ebef8f9
osd.9 down in  weight 1 up_from 3265 up_thru 3265 down_at 3271 last_clean_interval [2938,2940) 10.76.35.14:6800/23072 10.76.35.14:6801/23072 10.76.35.14:6802/23072 10.76.35.14:6803/23072 exists f3dc533b-fd07-4602-bb0b-e8e0b4d7a973
~/osdmaps root@nemo
❯ ceph -s
    cluster bc56bb17-4fad-40c2-93fa-cb07c3f7da0a
     health HEALTH_ERR
            1307 pgs are stuck inactive for more than 300 seconds
            149 pgs degraded
            155 pgs down
            424 pgs peering
            728 pgs stale
            149 pgs stuck degraded
            579 pgs stuck inactive
            728 pgs stuck stale
            728 pgs stuck unclean
            149 pgs stuck undersized
            149 pgs undersized
            recovery 178288/1788248 objects degraded (9.970%)
            8/10 in osds are down
            noout flag(s) set
            no active mgr
     monmap e7: 3 mons at {dory=10.76.35.13:6789/0,marlin=10.76.35.12:6789/0,nemo=10.76.35.14:6789/0}
            election epoch 182, quorum 0,1,2 marlin,dory,nemo
        mgr no daemons active
     osdmap e3364: 10 osds: 2 up, 10 in
            flags noout
      pgmap v607024: 728 pgs, 20 pools, 2942 GB data, 873k objects
            6117 GB used, 13533 GB / 19650 GB avail
            79.533% pgs inactive
            178288/1788248 objects degraded (9.970%)
                 424 stale+peering
                 155 stale+down
                 149 stale+active+undersized+degraded

Note that the ceph osd dump and ceph -s output above lie: no OSDs were actually running at the time.

There was also some weirdness where running the same command multiple times would not return the same result.

~/osdmaps root@nemo
❯ ceph osd getmap 177 | shasum
got osdmap epoch 177
e01de918e0d1d29cb4711b2247e9847c993f3130  -

~/osdmaps root@nemo
❯ ceph osd getmap 177 | shasum
Error ENOENT: there is no map for epoch 177
da39a3ee5e6b4b0d3255bfef95601890afd80709  -

~/osdmaps root@nemo
❯ ceph osd getmap 177 | shasum
Error ENOENT: there is no map for epoch 177
da39a3ee5e6b4b0d3255bfef95601890afd80709  -
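One way to narrow that kind of thing down (a hedged sketch, not something from the original debugging session; -m just points the client at a single monitor) is to ask each monitor for the epoch directly and see which ones are missing it:

# Query each monitor individually for the same epoch (addresses from the monmap above)
for MON in 10.76.35.12 10.76.35.13 10.76.35.14; do
  echo "== mon $MON =="
  ceph -m $MON:6789 osd getmap 177 | shasum
done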

Thought it might have been a monitor issue, so I stopped all monitors, removed all but one from the monmap, and restarted the remaining monitor with the modified monmap. That seems to have fixed the intermittent ceph osd getmap 177 failures, but it didn't help with starting OSDs.
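For reference, shrinking to a single monitor looked roughly like this (a sketch from memory; the temp path and systemd unit name are assumptions, and every ceph-mon must be stopped first; marlin was the survivor):

# On marlin, with all ceph-mon daemons stopped:
ceph-mon -i marlin --extract-monmap /tmp/monmap
monmaptool --rm dory /tmp/monmap
monmaptool --rm nemo /tmp/monmap
ceph-mon -i marlin --inject-monmap /tmp/monmap
systemctl start ceph-mon@marlin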

Possible Fix (what could go wrong with using --force)

Extracted all of the osdmaps I could from the monitors. The 3200 comes from the ceph osd dump output: it's a number higher than the last_clean_interval of most of the OSDs. I probably could have picked something lower.

# Pull every osdmap epoch the monitors still have; skip files already extracted.
# Epochs the monitors no longer hold produce an error and an empty file (cleaned up later).
for i in $(seq 1 3200); do
  if [ ! -f osdmap.$i ]; then
    ceph osd getmap $i > osdmap.$i
  fi
done
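A quick convenience check (not part of the original notes) to see which epoch range actually made it out of the monitors; the empty files from missing epochs are included here, they get cleaned up below:

# Lowest and highest epoch extracted so far
ls osdmap.* | awk -F. '{print $2}' | sort -n | sed -n '1p;$p'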

And extract some from the OSDs (maybe). This should probably count down from one less than the lowest epoch that was extracted from the monitors' copy of the osdmap.

# One OSD at a time; repeat for each OSD that might still hold older maps.
# The OSD must be stopped for ceph-objectstore-tool to open its store.
OSD_NUM=7
# Walk epochs downwards and pull any map the monitors could not provide.
for i in $(seq 3200 -1 1); do
  if [ ! -f osdmap.$i ]; then
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$OSD_NUM --op get-osdmap --file osdmap.$i --epoch $i
  fi
done

Before we start writing maps, I had to delete any map files that were empty (append -delete to actually delete them, of course):

find . -iname "osdmap.*" -size 0
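Once the list looks sane, the same command with -delete removes the zero-byte files:

find . -iname "osdmap.*" -size 0 -delete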

And then finally, to fill in the gaps, we did something horrible: we wrote the last map before each gap to every epoch in the gap. In my case this was 426, so for convenience I just hard coded it. It might be possible to…

OSD_NUM=7
LATEST_MAPFILE=osdmap.1
# Write a map into the OSD's store for every epoch in the range. Where a real
# osdmap.$i exists it becomes the new "latest"; where it does not, the last map
# seen before the gap is written in its place (hence "horrible").
for i in $(seq 178 2671); do
  MAPFILE=osdmap.$i
  if [ -f osdmap.$i ]; then
    LATEST_MAPFILE=$MAPFILE
  else
    MAPFILE=$LATEST_MAPFILE
  fi
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$OSD_NUM --op set-osdmap --file $MAPFILE --epoch $i --force
done
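A spot check before starting the OSD seemed worthwhile (a hedged sketch; the epoch and temp path are arbitrary, pick an epoch whose osdmap.$i actually exists): read one epoch back out of the OSD's store and compare it against the file that was written.

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --op get-osdmap --file /tmp/osdmap.check --epoch 2000
shasum /tmp/osdmap.check osdmap.2000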

Hours later (ran it on just a single OSD at first to see if it would fatally kill it, then ran it in parallel on all OSDs)…
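The parallel run was just the same loop launched once per OSD. A rough sketch, assuming the loop above is saved as a hypothetical fill-osdmaps.sh that reads OSD_NUM from $1 (OSD numbers per node taken from the dump above, e.g. node 3):

# On node 3: one writer per local OSD, in the background
for OSD_NUM in 6 7 8 9; do
  bash fill-osdmaps.sh $OSD_NUM > fill-$OSD_NUM.log 2>&1 &
done
wait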

❯ ceph -s
2017-05-28 20:53:52.209119 7f050c2b2700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
2017-05-28 20:53:52.215834 7f050c2b2700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
    cluster bc56bb17-4fad-40c2-93fa-cb07c3f7da0a
     health HEALTH_ERR
            noout flag(s) set
            no active mgr
     monmap e8: 1 mons at {marlin=10.76.35.12:6789/0}
            election epoch 200, quorum 0 marlin
        mgr no daemons active
     osdmap e3745: 10 osds: 10 up, 10 in
            flags noout
      pgmap v617686: 728 pgs, 20 pools, 2942 GB data, 873k objects
            6119 GB used, 13531 GB / 19650 GB avail
                 726 active+clean
                   2 active+clean+scrubbing+deep
                   
❯ ceph osd dump
2017-05-28 20:55:10.504495 7f193536b700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
2017-05-28 20:55:10.514740 7f193536b700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
epoch 3745
fsid bc56bb17-4fad-40c2-93fa-cb07c3f7da0a
created 2017-02-28 08:43:12.121679
modified 2017-05-28 20:53:31.939708
flags noout,sortbitwise,require_jewel_osds,require_kraken_osds,require_luminous_osds
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client firefly
min_compat_client firefly 0.80
pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 178 flags hashpspool stripe_width 0
        removed_snaps [1~3]
pool 1 '.rgw.root' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 179 flags hashpspool stripe_width 0
pool 2 'default.rgw.control' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 180 flags hashpspool stripe_width 0
pool 3 'default.rgw.data.root' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 181 flags hashpspool stripe_width 0
pool 4 'default.rgw.gc' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 182 flags hashpspool stripe_width 0
pool 5 'default.rgw.log' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 183 flags hashpspool stripe_width 0
pool 6 'default.rgw.intent-log' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 184 flags hashpspool stripe_width 0
pool 7 'default.rgw.meta' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 185 flags hashpspool stripe_width 0
pool 8 'default.rgw.usage' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 186 flags hashpspool stripe_width 0
pool 9 'default.rgw.users.keys' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 187 flags hashpspool stripe_width 0
pool 10 'default.rgw.users.email' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 188 flags hashpspool stripe_width 0
pool 11 'default.rgw.users.swift' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 189 flags hashpspool stripe_width 0
pool 12 'default.rgw.users.uid' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 190 flags hashpspool stripe_width 0
pool 13 'default.rgw.buckets.extra' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 191 flags hashpspool stripe_width 0
pool 14 'default.rgw.buckets.index' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 16 pgp_num 16 last_change 192 flags hashpspool stripe_width 0
pool 15 'default.rgw.buckets.data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 512 pgp_num 512 last_change 193 flags hashpspool stripe_width 0
pool 16 'default.rgw.lc' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 195 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 17 'default.rgw.buckets.non-ec' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 197 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 18 'cephfs_data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 329 flags hashpspool stripe_width 0
pool 19 'cephfs_metadata' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 332 flags hashpspool stripe_width 0
max_osd 10
osd.0 up   in  weight 1 up_from 3718 up_thru 3739 down_at 3713 last_clean_interval [3619,3712) 10.76.35.13:6800/3254 10.76.35.13:6801/3254 10.76.35.13:6802/3254 10.76.35.13:6803/3254 exists,up ccaf6a9b-55bf-4dd2-8f88-42c94d4f679e
osd.1 up   in  weight 1 up_from 3680 up_thru 3740 down_at 3643 last_clean_interval [3393,3405) 10.76.35.13:6804/44544 10.76.35.13:6805/44544 10.76.35.13:6806/44544 10.76.35.13:6807/44544 exists,up 5193a91d-58eb-47c3-bbdf-8ba2c50ca8cf
osd.2 up   in  weight 1 up_from 3686 up_thru 3744 down_at 3645 last_clean_interval [3391,3405) 10.76.35.13:6808/44646 10.76.35.13:6809/44646 10.76.35.13:6810/44646 10.76.35.13:6811/44646 exists,up 10ffb66e-26ac-4cab-9484-f730d691cb87
osd.3 up   in  weight 1 up_from 3727 up_thru 3731 down_at 3723 last_clean_interval [3471,3722) 10.76.35.12:6800/31421 10.76.35.12:6801/31421 10.76.35.12:6802/31421 10.76.35.12:6803/31421 exists,up 2c36efd1-80eb-431a-89c2-f1e41920b2aa
osd.4 up   in  weight 1 up_from 3737 up_thru 3742 down_at 3734 last_clean_interval [3665,3733) 10.76.35.12:6808/47642 10.76.35.12:6809/47642 10.76.35.12:6810/47642 10.76.35.12:6811/47642 exists,up 8e0c8bc4-dfb8-486f-85e8-8611984616e3
osd.5 up   in  weight 1 up_from 3647 up_thru 3720 down_at 3410 last_clean_interval [3401,3407) 10.76.35.12:6804/1790 10.76.35.12:6805/1790 10.76.35.12:6806/1790 10.76.35.12:6807/1790 exists,up e4a26d57-301a-4309-8c5b-eee7c35c7724
osd.6 up   in  weight 1 up_from 3708 up_thru 3739 down_at 3701 last_clean_interval [3465,3700) 10.76.35.14:6808/13880 10.76.35.14:6809/13880 10.76.35.14:6810/13880 10.76.35.14:6811/13880 exists,up 0f4f44ff-8f57-43ac-a5e5-aa9ebee4ad3a
osd.7 up   in  weight 1 up_from 3706 up_thru 3739 down_at 3701 last_clean_interval [3524,3700) 10.76.35.14:6804/13877 10.76.35.14:6805/13877 10.76.35.14:6806/13877 10.76.35.14:6807/13877 exists,up afeb1fae-b657-4b46-80b7-a7558697029b
osd.8 up   in  weight 1 up_from 3704 up_thru 3739 down_at 3701 last_clean_interval [3597,3700) 10.76.35.14:6800/13874 10.76.35.14:6801/13874 10.76.35.14:6802/13874 10.76.35.14:6803/13874 exists,up bc554390-91a8-44ad-9e4c-440e2ebef8f9
osd.9 up   in  weight 1 up_from 3706 up_thru 3739 down_at 3701 last_clean_interval [3576,3700) 10.76.35.14:6812/13883 10.76.35.14:6813/13883 10.76.35.14:6814/13883 10.76.35.14:6815/13883 exists,up f3dc533b-fd07-4602-bb0b-e8e0b4d7a973
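The remaining HEALTH_ERR items are just the noout flag and the missing mgr; once things look stable the flag can be cleared in the usual way (and a mgr daemon started):

ceph osd unset noout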