@EshaMaharishi
Last active October 25, 2016 14:28
###Base scenario:

You add a replica set as a shard, remove it, then add another replica set with the same setName (and therefore the same shardName, since we enforce that they match).

If a ShardRegistry reload doesn't happen between the removeShard() and the second addShard() (reloads happen every 30 seconds), the config server will use the old ReplicaSetMonitor (RSM) during the second addShard().

This means it will target the old shard's hosts for things like validating the new shard, checking for conflicting databases, upserting the shardIdentity document, etc.

###Bug 1

If the first replica set is still up (removeShard doesn't imply the shard servers were shut down), you will be unable to add the new shard, with errmsg:

"in seed list mySet/eshamaharishi-X10DAi:15516, host eshamaharishi-X10DAi:15516 does not belong to replica set mySet; found { hosts: [ "eshamaharishi-X10DAi:15515" ], setName: "mySet", setVersion: 1, ismaster: true, secondary: false, primary: "eshamaharishi-X10DAi:15515", me: "eshamaharishi-X10DAi:15515\ ..."

Potential fix: call ReplicaSetMonitor::remove() from the OpObserver hook for deletes on config.shards.
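
A rough sketch of what that hook could look like. The onDelete signature and the config.shards parsing below are simplified assumptions (the real OpObserver interface differs across server versions); the load-bearing call is just ReplicaSetMonitor::remove():

```cpp
// Sketch only: simplified signature, no error handling. The idea is to drop the stale
// monitor as soon as a shard's document disappears from config.shards.
void OpObserverImpl::onDelete(OperationContext* txn,
                              const NamespaceString& ns,
                              const BSONObj& deletedDoc) {
    if (ns.ns() == ShardType::ConfigNS) {  // "config.shards"
        // The shard document's "host" field holds the connection string,
        // e.g. "mySet/eshamaharishi-X10DAi:15515".
        auto shardDoc = ShardType::fromBSON(deletedDoc);
        if (shardDoc.isOK()) {
            auto connString = ConnectionString::parse(shardDoc.getValue().getHost());
            if (connString.isOK() &&
                connString.getValue().type() == ConnectionString::SET) {
                // Tear down the monitor so a later addShard with the same setName
                // builds a fresh one instead of reusing stale hosts.
                ReplicaSetMonitor::remove(connString.getValue().getSetName());
            }
        }
    }
    // ... rest of the existing onDelete logic ...
}
```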

###Bug 2

Bug 1 is exacerbated because a removed shard will still respond to sharding requests: it still has its shardIdentity document, and its ShardingState was never disabled.

Potential fix: add a command to disable sharding state, and make ShardingCatalogClientImpl::removeShard() send this command to the shard.

The "disableShardingState" command could easily be made to also remove the shardIdentity document from the removed shard as a bonus...

###Bug 3

If the first shard was shut down, the config server will get HostUnreachable trying to contact it during addShard.

The config server will then pass this HostUnreachable error back to mongos, which will luckily retry the _configsvrAddShard command, thinking the HostUnreachable applies to the config server itself (this is a known issue: nodes transparently pass retriable errors upstream).

Fix: overhaul how we pass retriable errors upstream.

###Bug 4 (independent of the Base Scenario)

By a second stroke of luck (in this case), the stopMonitoringGuard in ShardingCatalogManagerImpl::addShard removes the replica set monitor on any error while attempting to add the shard.

So when the _configsvrAddShard command is retried because of Bug 3, a new ReplicaSetMonitor is created for the new shard, and everything works.

However, it's bad that a simple malformed addShard (accidental or malicious... what if it's done repeatedly?) can remove the ReplicaSetMonitor for any existing shard on the config server, forcing a ShardRegistry reload each time it happens.

Potential fix: only apply the stopMonitoringGuard if we actually created the ReplicaSetMonitor for the addShard.
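
A sketch of that change, with simplified names (the surrounding addShard logic and the exact guard/monitor helpers are assumed from memory, not quoted from the source tree):

```cpp
// Sketch only: inside ShardingCatalogManagerImpl::addShard, remember whether a
// ReplicaSetMonitor for this set already existed before this attempt, and only tear
// one down on failure if this attempt created it.
const bool isReplSetShard = (shardConnectionString.type() == ConnectionString::SET);
const bool monitorExistedBefore = isReplSetShard &&
    (ReplicaSetMonitor::get(shardConnectionString.getSetName()) != nullptr);

auto stopMonitoringGuard = MakeGuard([&] {
    // Without this extra check, any failed addShard (even a malformed one) evicts the
    // monitor of an existing shard and forces a ShardRegistry reload.
    if (isReplSetShard && !monitorExistedBefore) {
        ReplicaSetMonitor::remove(shardConnectionString.getSetName());
    }
});

// ... validate the new shard, check for conflicting databases, upsert the
// shardIdentity document, insert the shard into config.shards ...

stopMonitoringGuard.Dismiss();  // success: keep the monitor
```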

Note: the above bugs only apply to replica set shards, since there is no ReplicaSetMonitor for standalone shards.
