Last active
January 18, 2017 15:18
#################################
# May Need Further Investigating
#################################
- ca4cbc80-f4e7-4b10-b16f-83266b5761a9(research complete):
-> NeutronNetworks.create_and_list_networks(1): kill neutron-server service on one node
-> This one had a lot of on-and-off downtime at around 150 -> 25 seconds into the test. This was well after the restart and before the 1k limit was hit.
- 48feed1c-c902-4bb1-b166-1b5cd62687b5(research complete)
-> NeutronNetworks.list_agents(1): restart keystone service on one node
-> NeutronNetworks.list_agents(2): restart keystone service on one node
-> NeutronNetworks.list_agents(3): restart keystone service on one node
-> These had failures around 50 seconds in on each run.
- 46bcfabc-2299-4822-8d08-cf8d4b212077
-> NovaFlavors.list_flavors(1): restart keystone service on one node
-> NovaFlavors.list_flavors(2): restart keystone service on one node
-> NovaFlavors.list_flavors(3): restart keystone service on one node
-> Errors around 50 seconds in on each run
- 58857254-5e0a-44f0-a3b7-27d65348332e(research complete)
-> NovaServers.boot_and_delete_server(1): kill nova-api-os-compute service on one node
- efc3cf41-1cb0-453b-b2d6-7396344f3bb2
-> NovaServers.boot_and_delete_server(1): kill nova-api-os-compute service on one node
- 59494bf7-f799-4b0a-bd30-05bf6fc9453a
-> NovaServers.boot_and_delete_server(1): kill nova-api-os-compute service on one node
- 65434a05-205f-4f1f-862f-da08dd0233e2
-> NovaServers.boot_and_delete_server(1): kill nova-api-os-compute service on one node
- 7e273eb5-08c1-4395-b8bd-523f9ba67830
-> NovaServers.boot_and_delete_server(1): kill nova-api-os-compute service on one node
- 89cdf8d1-29b6-4779-90f0-6d6765eb5253
-> NovaServers.boot_and_delete_server(1): kill nova-api-os-compute service on one node
-> Error around 130 seconds in on several runs.
- 21b6a490-c6a9-4ed1-a4c3-6859830a9115(research complete)
-> SwiftObjects.create_container_and_object_then_delete_all(1): restart swift-proxy service on one node
-> Errors around 130-140 seconds in.
- 84ff85e5-08d2-4e14-a10e-d8f9187ba04a(research complete)
-> SwiftObjects.create_container_and_object_then_delete_all(1): restart mysql service on one node
-> Errors from about 130 to 290 seconds. (I think they had to restart it manually.)
- 577d7dd9-0f17-4372-ad11-ee7e0cafe07b(research complete)
-> SwiftObjects.list_objects_in_containers(1): restart keystone service
-> SwiftObjects.list_objects_in_containers(2): restart keystone service
-> SwiftObjects.list_objects_in_containers(3): restart keystone service
-> Error at about 56 seconds in
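The "errors around N seconds in" notes above were read off the result graphs. A minimal sketch of pulling the same windows out programmatically is below. It assumes you have already reduced a run to `(seconds_into_run, succeeded)` pairs; that input shape, the `error_windows` name, and the sample numbers are all made up for illustration and are not the Rally or os-faults output format.

```python
from typing import List, Tuple


def error_windows(samples: List[Tuple[float, bool]],
                  max_gap: float = 5.0) -> List[Tuple[float, float]]:
    """Collapse failed samples into (start, end) windows, in seconds into the run.

    samples: (seconds_into_run, succeeded) pairs (hypothetical input shape).
    Failures no more than max_gap seconds apart are merged into one window,
    even if a successful sample happened in between.
    """
    windows: List[Tuple[float, float]] = []
    for t, ok in sorted(samples):
        if ok:
            continue
        if windows and t - windows[-1][1] <= max_gap:
            # Extend the current window to cover this failure.
            windows[-1] = (windows[-1][0], t)
        else:
            # Start a new window at this failure.
            windows.append((t, t))
    return windows


# Made-up sample data, only to show the call.
samples = [(48.0, True), (50.0, False), (51.5, False),
           (53.0, True), (130.0, False), (140.0, False)]
print(error_windows(samples))
# [(50.0, 51.5), (130.0, 130.0), (140.0, 140.0)]
```

Raising `max_gap` is one way to treat "on and off" downtime as a single window rather than many short ones.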
################
# Neutron runs
################
# NeutronNetworks.create_and_list_networks(1): restart neutron-metadata-agent service on one node
# NeutronNetworks.create_and_list_networks(1): kill neutron-metering-agent service on one node
# NeutronNetworks.create_and_list_networks(1): restart neutron-l3-agent service on one node
# NeutronNetworks.create_and_list_networks(1): restart neutron-server service on one node
# NeutronNetworks.create_and_list_networks(1): kill neutron-linuxbridge-agent service on one node
# NeutronNetworks.create_and_list_networks(1): restart neutron-linuxbridge-agent service on one node
# NeutronNetworks.create_and_list_networks(1): restart neutron-dhcp-agent service on one node
# NeutronNetworks.create_and_list_networks(1): kill neutron-metadata-agent service on one node
# NeutronNetworks.create_and_list_networks(1): kill neutron-l3-agent service on one node
# NeutronNetworks.create_and_list_networks(1): kill neutron-dhcp-agent service on one node
# NeutronNetworks.create_and_list_networks(1): restart mysql service on one node
# NeutronNetworks.create_and_list_networks(2): restart mysql service on one node
# NeutronNetworks.create_and_list_networks(3): restart mysql service on one node
- You can see a small red line from the restart, but things seemed to recover quickly and did not differ much from the baseline.
- It dies after a while because it runs past the 1k vxlan limit in neutron. (https://01.org/jira/browse/OSIC-904)
- Because the number of networks increases with every creation, the run will always reach a point where it exceeds the baseline threshold.
################################
# All runs without major issues
################################
# Authenticate.keystone(1): restart mysql service on one node.
# Authenticate.keystone(2): restart mysql service on one node.
# Authenticate.keystone(3): restart mysql service on one node.
# Authenticate.keystone(4): restart mysql service on one node.
# Authenticate.keystone(5): restart mysql service on one node.
# Authenticate.keystone(1): kill memcached service on one node.
# Authenticate.keystone(2): kill memcached service on one node.
# Authenticate.keystone(3): kill memcached service on one node.
# Authenticate.keystone(1): stressmem keystone service on one node.
# Authenticate.keystone(1): restart memcached service on one node.
# Authenticate.keystone(2): restart memcached service on one node.
# Authenticate.keystone(3): restart memcached service on one node.
# Authenticate.keystone(1): stressdisk keystone service on one node.
# Authenticate.keystone(1): stresscpu keystone service on one node.
# Authenticate.keystone(1): restart rabbitmq service on one node.
# Authenticate.keystone(2): restart rabbitmq service on one node.
# Authenticate.keystone(3): restart rabbitmq service on one node.
# Authenticate.keystone(4): restart rabbitmq service on one node.
# Authenticate.keystone(5): restart rabbitmq service on one node.
# CinderVolumes.list_volumes(1): restart keystone service on one node.
# CinderVolumes.list_volumes(2): restart keystone service on one node.
# CinderVolumes.list_volumes(3): restart keystone service on one node.
# GlanceImages.list_images(1): kill glance-api service on one node.
# GlanceImages.list_images(1): restart rabbitmq service on one node
# GlanceImages.list_images(1): restart keystone service on one node
# GlanceImages.list_images(2): restart keystone service on one node
# GlanceImages.list_images(3): restart keystone service on one node
# GlanceImages.list_images(1): restart mysql service on one node
# GlanceImages.list_images(1): restart glance-api service on one node.
# GlanceImages.list_images(1): kill glance-registry service on one node.
# GlanceImages.list_images(1): restart memcached service on one node.
# GlanceImages.list_images(1): restart glance-registry service on one node.
# GlanceImages.list_images(1): kill memcached service on one node.
# NovaServers.boot_and_delete_server(1): restart memcached service on one node
# NovaServers.boot_and_delete_server(1): kill nova-cert service on one node
# NovaServers.boot_and_delete_server(1): restart nova-scheduler service on one node
# NovaServers.boot_and_delete_server(1): restart nova-api-metadata service on one node
# NovaServers.boot_and_delete_server(1): kill nova-consoleauth service on one node
# NovaServers.boot_and_delete_server(1): restart rabbitmq service on one node
# NovaServers.boot_and_delete_server(1): restart nova-consoleauth service on one node
# NovaServers.boot_and_delete_server(1): restart nova-compute service on one node
# NovaServers.boot_and_delete_server(1): restart nova-cert service on one node
# NovaServers.boot_and_delete_server(1): kill nova-api-metadata service on one node
# NovaServers.boot_and_delete_server(1): kill nova-compute service on one node
# NovaServers.boot_and_delete_server(1): reboot one node with rabbitmq service
# NovaServers.boot_and_delete_server(1): kill memcached service on one node
# NovaServers.boot_and_delete_server(1): restart mysql service on one node
# NovaServers.boot_and_delete_server(1): restart nova-api-os-compute service on one node
# NovaServers.boot_and_delete_server(1): restart nova-spicehtml5proxy service on one node
# NovaServers.boot_and_delete_server(1): kill nova-conductor service on one node
# NovaServers.boot_and_delete_server(1): kill nova-spicehtml5proxy service on one node
# NovaServers.boot_and_delete_server(1): restart nova-conductor service on one node
# NovaServers.boot_and_delete_server(1): kill nova-scheduler service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): restart swift-object-auditor service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): restart swift-object-server service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): restart swift-container-sync service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): kill swift-account-reaper service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): kill swift-container-auditor service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): restart swift-account-reaper service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): restart swift-object-replicator service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): kill swift-object-updater service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): kill swift-container-server service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): restart swift-object-updater service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): kill swift-proxy-server service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): kill swift-account-server service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): kill swift-object-server service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): kill swift-container-replicator service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): restart swift-account-auditor service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): restart swift-container-server service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): kill swift-object-replicator service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): kill swift-object-auditor service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): restart swift-container-auditor service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): restart swift-account-replicator service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): restart swift-container-reconciler service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): restart memcached service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): kill swift-account-auditor service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): kill memcached service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): restart swift-account-server service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): restart swift-container-replicator service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): restart swift-object-expirer service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): kill swift-container-updater service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): kill swift-object-expirer service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): kill swift-container-reconciler service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): kill swift-account-replicator service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): restart rabbitmq service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): restart swift-container-updater service on one node
# SwiftObject.create_container_and_object_then_delete_all(1): kill swift-container-sync service on one node
- You can see a small red line from the restart, but things seemed to recover quickly and did not differ much from the baseline.
- Overall, it looks like we might need to run some tests and get a baseline for the 'Degradation Threshold'. It looks like these might be set to some default which doesn't reflect the baseline for this env.
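One way a per-environment baseline could be turned into a threshold is sketched below. This is only an assumption about how such a number might be derived (mean of fault-free iteration durations plus `k` standard deviations); it is not how the tooling computes its 'Degradation Threshold', and the function names and sample durations are made up.

```python
import statistics
from typing import List, Sequence


def degradation_threshold(baseline_durations: Sequence[float], k: float = 3.0) -> float:
    """Threshold = mean + k * sample stdev of durations from a fault-free run.

    The mean + k*stdev rule is an assumption for illustration only.
    Needs at least two baseline samples (statistics.stdev requirement).
    """
    mean = statistics.mean(baseline_durations)
    stdev = statistics.stdev(baseline_durations)
    return mean + k * stdev


def degraded(durations: Sequence[float], threshold: float) -> List[int]:
    """Indexes of iterations whose duration exceeds the threshold."""
    return [i for i, d in enumerate(durations) if d > threshold]


# Made-up durations (seconds): one fault-free run, one run with a fault injected.
baseline = [1.0, 1.2, 0.8, 1.0]
faulted = [1.1, 1.6, 0.9, 2.0]

threshold = degradation_threshold(baseline)
print(degraded(faulted, threshold))  # [1, 3]
```

Running each scenario once without any fault in this env and feeding those durations in as `baseline` would give a threshold that reflects this deployment instead of a default.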
Created https://01.org/jira/browse/OSIC-933 with the results for 84ff85e5-08d2-4e14-a10e-d8f9187ba04a. Looking into 577d7dd9-0f17-4372-ad11-ee7e0cafe07b next.
Starting on 21b6a490-c6a9-4ed1-a4c3-6859830a9115
Created https://01.org/jira/browse/OSIC-936 for 21b6a490-c6a9-4ed1-a4c3-6859830a9115
Log event start/finish:
(rally) root@deploy:~/os-faults/tools/output/json# grep timestamp ca4cbc80-f4e7-4b10-b16f-83266b5761a9.json | awk '{ print $2 }' | cut -d, -f1 | while read;do date -d@$REPLY;done | head -1
Tue Jan 3 17:52:01 UTC 2017
(rally) root@deploy:~/os-faults/tools/output/json# grep timestamp ca4cbc80-f4e7-4b10-b16f-83266b5761a9.json | awk '{ print $2 }' | cut -d, -f1 | while read;do date -d@$REPLY;done | tail -1
Tue Jan 3 18:00:01 UTC 2017
Bad Gateway:
(rally) root@deploy:~/os-faults/tools/output/json# date -d@1483466064.038841
Tue Jan 3 17:54:24 UTC 2017
Neutron event triggered start/finish:
(rally) root@deploy:~/os-faults/tools/output/json# date -d@1483466042.004516
Tue Jan 3 17:54:02 UTC 2017
(rally) root@deploy:~/os-faults/tools/output/json# date -d@1483466064.521169
Tue Jan 3 17:54:24 UTC 2017
The Bad Gateway event (17:54:24) falls inside the neutron restart window (17:54:02 to 17:54:24), which began about 120 seconds into the run.
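The same correlation can be checked without eyeballing `date` output. A small sketch, using only the epoch values quoted in the commands above; the run start is rebuilt from the printed "17:52:01" line, so it is accurate to whole seconds only, and the helper names are my own.

```python
from datetime import datetime, timezone


def to_utc(epoch: float) -> datetime:
    """Convert an epoch timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


def in_window(t: float, start: float, end: float) -> bool:
    """True if t falls inside [start, end]."""
    return start <= t <= end


# Epoch values copied from the date -d@... calls above.
restart_start = 1483466042.004516
restart_end = 1483466064.521169
bad_gateway = 1483466064.038841

# First log timestamp, taken from the printed date (whole seconds only).
run_start = datetime(2017, 1, 3, 17, 52, 1, tzinfo=timezone.utc).timestamp()

print(to_utc(bad_gateway).strftime("%H:%M:%S"))           # 17:54:24
print(round(restart_start - run_start))                   # 121 (seconds into the run)
print(round(restart_end - restart_start, 1))              # 22.5 (restart duration, seconds)
print(in_window(bad_gateway, restart_start, restart_end)) # True
```

So the restart was triggered about 121 seconds in, took roughly 22.5 seconds, and the Bad Gateway landed about half a second before it finished.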