Stack overcloud CREATE_COMPLETE
Overcloud Endpoint: http://172.21.0.18:5000/v2.0
Overcloud Deployed
real 59m46.251s
user 0m5.943s
sys 0m0.459s
Mon Oct 3 04:29:52 UTC 2016
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
id_rsa 100% 1679 1.6KB/s 00:00
real 5m39.505s
user 0m1.558s
sys 0m3.230s
Stack overcloud UPDATE_COMPLETE
Overcloud Endpoint: http://172.21.0.18:5000/v2.0
Overcloud Deployed
real 234m50.263s
user 0m13.174s
sys 0m0.948s
Tue Oct 4 16:23:19 UTC 2016
Our scale deployment failed because a Supermicro node became wedged. I tried to scale back, but that ended in failure as well, so I have started over.
To deploy 30 nodes, the timings were near identical.
When attempting to deploy 60 nodes out of the gate, we saw the following on the overcloud controller:
Oct 6 20:47:50 localhost os-collect-config: <ErrorResponse><Error><Message>The request processing has failed due to an internal error:Timed out waiting for a reply to message ID ed3a0334366348479a8fdcbf65b3a68e</Message><Code>InternalFailure</Code><Type>Server</Type></Error></ErrorResponse>+ rm /tmp/tmp.xZ5F0rDJNI
According to Shardy on IRC, this could be due to:
05:36:30 shardy | rook: Yes, it can happen when a back end process (such as heat-engine) is overloaded and thus takes too long to respond to an RPC call
...
05:43:05 shardy | rook: it's possible we need to tune the swift config on the undercloud to cope with 60 nodes hitting the API at the same time?
We successfully deployed 3 controllers and 30 computes. When we attempted to scale to 60 compute nodes, computenode-10 hung on the following:
[stack@c04-h01-6048r ~]$ heat resource-list -n 5 overcloud | grep -i prog
WARNING (shell) "heat resource-list" is deprecated, please use "openstack stack resource list" instead
| ComputeAllNodesDeployment | 2c102ae9-e979-4e50-9201-d740de9b012e | OS::Heat::StructuredDeployments | UPDATE_IN_PROGRESS | 2016-10-07T12:27:57 | overcloud |
| 10 | ea656a19-71e7-4008-80d2-bfba03afd4e6 | OS::Heat::StructuredDeployment | UPDATE_IN_PROGRESS | 2016-10-07T12:28:55 | overcloud-ComputeAllNodesDeployment-oeyrhq6e7mor
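The same check can be sketched as a small filter that keeps only the name, type and status columns of any resource still in progress. This is a minimal sketch: the function name in_progress is hypothetical, the replacement command is the one named in the deprecation warning above, and the assumption is that it accepts the same -n nested-depth flag and prints the same pipe-delimited table.

```shell
# Hypothetical helper: read a pipe-delimited resource table on stdin and
# print "name type status" for every row whose status column contains IN_PROGRESS.
in_progress() {
    awk -F'|' '
        $5 ~ /IN_PROGRESS/ {
            # Trim the padding spaces around the columns we print.
            for (i = 2; i <= 5; i++) gsub(/^ +| +$/, "", $i)
            print $2, $4, $5
        }'
}

# Assumed usage on the undercloud (not run here):
#   openstack stack resource list -n 5 overcloud | in_progress
```

Matching on the status column rather than grepping the whole line avoids false hits when a resource or stack name happens to contain "prog".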
<rant> The first deployment works with computenode-10, but when scaling to 60, things fail? This is totally broken. </rant>
Success!
Stack overcloud UPDATE_COMPLETE
Overcloud Endpoint: http://172.21.0.13:5000/v2.0
Overcloud Deployed
real 262m42.476s
user 0m13.942s
sys 0m1.072s
Fri Oct 7 19:18:20 UTC 2016
2016-10-07 19:50:09 [overcloud-Compute-gci7wjialpks]: UPDATE_FAILED ResourceInError: resources[71].resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-10-07 19:50:09 [NovaComputeDeployment]: SIGNAL_IN_PROGRESS Signal: deployment d8f4ebbe-1937-4939-9725-71ad607bfbc8 succeeded
2016-10-07 19:50:09 [NovaComputeDeployment]: CREATE_COMPLETE state changed
2016-10-07 19:50:10 [overcloud-Compute-gci7wjialpks-83-nip4cihj75bx]: CREATE_FAILED Resource CREATE failed: Operation cancelled
2016-10-07 19:50:11 [Compute]: UPDATE_FAILED resources.Compute: ResourceInError: resources[71].resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-10-07 19:50:11 [overcloud]: UPDATE_FAILED resources.Compute: ResourceInError: resources[71].resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-10-07 19:50:15 [UpdateDeployment]: SIGNAL_IN_PROGRESS Signal: deployment 62cae6a1-a07a-43bf-87fe-4587c5a0d91f succeeded
2016-10-07 19:50:15 [UpdateDeployment]: CREATE_COMPLETE state changed
2016-10-07 19:50:18 [NetworkDeployment]: SIGNAL_IN_PROGRESS Signal: deployment 3f0dbe87-6b5e-4e53-9b5e-21e47c3e3ea4 succeeded
2016-10-07 19:50:18 [NetworkDeployment]: CREATE_COMPLETE state changed
2016-10-07 19:50:18 [overcloud-Compute-gci7wjialpks-77-2ofjvzs6pf6i]: CREATE_FAILED Resource CREATE failed: Operation cancelled
Stack overcloud UPDATE_FAILED
Heat Stack update failed.
Stack overcloud CREATE_COMPLETE
Overcloud Endpoint: http://172.21.0.18:5000/v2.0
Overcloud Deployed
real 366m46.211s
user 0m15.143s
sys 0m2.439s
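The wall-clock figures in these notes are easier to compare in hours. A minimal sketch, assuming the timings are in the "real NNNmSS.sss" form shown above; the function name real_to_hours is hypothetical.

```shell
# Hypothetical helper: convert the "real" line of `time` output on stdin
# (for example "real 262m42.476s") to hours with two decimal places.
real_to_hours() {
    awk '$1 == "real" {
        split($2, t, "m")      # t[1] = minutes, t[2] = seconds plus trailing "s"
        sub(/s$/, "", t[2])    # drop the trailing "s"
        printf "%.2f\n", (t[1] * 60 + t[2]) / 3600
    }'
}

# Example: echo 'real 262m42.476s' | real_to_hours
```

Lines that do not start with "real" are ignored, so a whole deploy log can be piped through it.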