Skip to content

Instantly share code, notes, and snippets.

@gregdek
Created February 9, 2014 06:48
Show Gist options
  • Save gregdek/8895362 to your computer and use it in GitHub Desktop.
Save gregdek/8895362 to your computer and use it in GitHub Desktop.
Analysis of cloud-error.log for CIAB post-install
103 error messages in the error log from a functional CIAB install:
[root@ciab200 eucalyptus]# grep ERROR cloud-error.log | wc -l
103
OK, a little digging indicates that 87 of these errors relate to the failure of loadbalancing to start:
[root@ciab200 eucalyptus]# grep ERROR cloud-error.log | grep loadbalanc | wc -l
87
Which leaves 16. Comments inline:
Sun Feb 9 00:41:36 2014 ERROR [BindingCache:main] [com.eucalyptus.binding.BindingCache$InternalSoapBindingGenerator$RootObjectTypeBinding.process(BindingCache.java):668] BUG: Fix your message type definitions for class edu.ucsb.eucalyptus.msgs.ErrorResponse
Sun Feb 9 00:41:36 2014 ERROR [BindingCache:main] [com.eucalyptus.binding.BindingCache$InternalSoapBindingGenerator$RootObjectTypeBinding.process(BindingCache.java):669] BUG: Binding generation encountered an element naming conflict. Using edu.ucsb.eucalyptus.msgs.ErrorResponse for edu.ucsb.eucalyptus.msgs.ErrorResponse
Sun Feb 9 00:42:11 2014 ERROR [BindingCache:main] [com.eucalyptus.binding.BindingCache$InternalSoapBindingGenerator$RootObjectTypeBinding.process(BindingCache.java):668] BUG: Fix your message type definitions for class edu.ucsb.eucalyptus.msgs.ErrorResponse
Sun Feb 9 00:42:11 2014 ERROR [BindingCache:main] [com.eucalyptus.binding.BindingCache$InternalSoapBindingGenerator$RootObjectTypeBinding.process(BindingCache.java):669] BUG: Binding generation encountered an element naming conflict. Using edu.ucsb.eucalyptus.msgs.ErrorResponse for edu.ucsb.eucalyptus.msgs.ErrorResponse
^^^ These look like they should be warns...?
Sun Feb 9 00:43:46 2014 ERROR [TCPListener:Thread-15] [com.eucalyptus.cloud.ws.TCPListener.run(TCPListener.java):103] java.net.SocketException: Socket closed
Sun Feb 9 00:43:46 2014 ERROR [TCPListener:Thread-15] [com.eucalyptus.cloud.ws.TCPListener.run(TCPListener.java):99] Cannot start service. Invalid socket.
^^^ These happen around the same time as the load balancer issues... related?
Sun Feb 9 00:43:58 2014 ERROR [NioServerHandler:New I/O worker #130] [com.eucalyptus.ws.server.NioServerHandler.exceptionCaught(NioServerHandler.java):199] Internal Error: Missing required parameter: AWSAccessKeyId
^^^ Hmm, no idea what this is.
Sun Feb 9 00:44:40 2014 ERROR [ConfigurationManager:Configuration.02] [com.eucalyptus.config.ConfigurationManager.registerComponent(ConfigurationManager.java):135] Unpartitioned component (walrus) is being registered w/o a partition. Using fixed partition=walrus for request: <?xml version="1.0" encoding="UTF-8"?>
^^^ Is this CIAB misconfiguring Walrus? And if it's being handled by the code, should this be changed to a WARN?
Sun Feb 9 00:44:49 2014 ERROR [Topology:Eucalyptus.bootstrap:EphemeralConfiguration:arn:euca:bootstrap:Topology::external/.class java.util.concurrent.ThreadPoolExecutor$Worker#21] [com.eucalyptus.component.Topology$TopologyChange.doTopologyChange(Topology.java):1155] START:DISABLED arn:euca:eucalyptus:CLUSTER01:cluster:cc_01/ INITIALIZED->DISABLED=NOTREADY [Cluster credentials are invalid: cc_01]
Sun Feb 9 00:44:49 2014 ERROR [Topology:Eucalyptus.cluster:EphemeralConfiguration:arn:euca:eucalyptus::cluster:com.eucalyptus.component.Topology-192.168.1.200-cc_01/.class java.util.concurrent.ThreadPoolExecutor$Worker#279] [com.eucalyptus.component.Topology$TopologyChange.doTopologyChange(Topology.java):1155] ENABLE:ENABLED arn:euca:eucalyptus:CLUSTER01:cluster:cc_01/ NOTREADY->ENABLED=NOTREADY [Cluster credentials are invalid: cc_01]
^^^ I end up with a perfectly valid cc_01 when I run euca-describe-availability-zones. Are these transient errors during startup? If so, should they be considered ERRORS or WARNS? It may be that they are more like WARNS at startup, but in a running cloud would be ERROR instead. How would we handle such a case?
Sun Feb 9 00:44:52 2014 ERROR [Topology:Eucalyptus.bootstrap:EphemeralConfiguration:arn:euca:bootstrap:Topology::external/.class java.util.concurrent.ThreadPoolExecutor$Worker#27] [com.eucalyptus.component.Topology$TopologyChange.doTopologyChange(Topology.java):1155] CHECK:null arn:euca:eucalyptus:CLUSTER01:cluster:cc_01/ NOTREADY->DISABLED=NOTREADY [Action:ProblemAction Code:wsse:InvalidSecurity Id:RelatesTo Error: Signature Verification failed.]
^^^ No idea what this is.
Sun Feb 9 00:44:54 2014 ERROR [Topology:Eucalyptus.bootstrap:EphemeralConfiguration:arn:euca:bootstrap:Topology::external/.class java.util.concurrent.ThreadPoolExecutor$Worker#14] [com.eucalyptus.component.Topology$TopologyChange.doTopologyChange(Topology.java):1155] START:DISABLED arn:euca:eucalyptus:CLUSTER01:storage:sc_01/ INITIALIZED->DISABLED=BROKEN [SC blockstorageamanger not configured. Found empty or unset manager(null). Legal values are: das,overlay]
Sun Feb 9 00:44:54 2014 ERROR [Topology:Eucalyptus.storage:EphemeralConfiguration:arn:euca:eucalyptus::storage:com.eucalyptus.component.Topology-192.168.1.200-sc_01/.class java.util.concurrent.ThreadPoolExecutor$Worker#280] [com.eucalyptus.component.Topology$TopologyChange.doTopologyChange(Topology.java):1155] ENABLE:ENABLED arn:euca:eucalyptus:CLUSTER01:storage:sc_01/ BROKEN->ENABLED=BROKEN [SC blockstorageamanger not configured. Found empty or unset manager(null). Legal values are: das,overlay]
^^^ Hm, do we not configure storage controller properly in CIAB?
Sun Feb 9 00:44:56 2014 ERROR [Topology:Eucalyptus.bootstrap:EphemeralConfiguration:arn:euca:bootstrap:Topology::external/.class java.util.concurrent.ThreadPoolExecutor$Worker#13] [com.eucalyptus.component.Topology$TopologyChange.doTopologyChange(Topology.java):1155] CHECK:null arn:euca:eucalyptus:CLUSTER01:cluster:cc_01/ NOTREADY->DISABLED=NOTREADY [Action:ProblemAction Code:wsse:InvalidSecurity Id:RelatesTo Error: Signature Verification failed.]
^^^ Again, no idea what this is.
Sun Feb 9 00:45:02 2014 ERROR [ListenerRegistry:SystemClockTimer] [com.eucalyptus.event.ListenerRegistry$ReentrantListenerRegistry.fireEvent(ListenerRegistry.java):326] com.eucalyptus.event.EventFailedException: Failed to fire event: listener=com.eucalyptus.blockstorage.BlockStorageAvailabilityEventListener event=com.eucalyptus.event.ClockTick@19bd2e6 because of: example: [StorageInfo name=CLUSTER01 maxTotalVolumeSizeInGb=null maxVolumeSizeInGB=null shouldTransferSnapshots=null maxSnapTransferRetries=null id=null version=null creationTimestamp=null lastUpdateTimestamp=null naturalId=null]
Sun Feb 9 00:45:12 2014 ERROR [ListenerRegistry:SystemClockTimer] [com.eucalyptus.event.ListenerRegistry$ReentrantListenerRegistry.fireEvent(ListenerRegistry.java):326] com.eucalyptus.event.EventFailedException: Failed to fire event: listener=com.eucalyptus.blockstorage.BlockStorageAvailabilityEventListener event=com.eucalyptus.event.ClockTick@4126c4cf because of: example: [StorageInfo name=CLUSTER01 maxTotalVolumeSizeInGb=null maxVolumeSizeInGB=null shouldTransferSnapshots=null maxSnapTransferRetries=null id=null version=null creationTimestamp=null lastUpdateTimestamp=null naturalId=null]
^^^ System clock timer failures?
@zhill
Copy link

zhill commented Feb 9, 2014

Starting from the top, I'll address each of the errors to my knowledge:

Line 17 comment: These should be warnings, but not sure how easily we can change this since it is generated by a dependency of ours (JIBX) rather than our code directly. We could suppress those messages but we would have to be careful to not suppress real errors that could come from the same subsystem.

Line 21: May be LB, may be DNS, not sure.

Line 24: That is an error we are fixing in 4.0. That comes from the Walrus system, from what I can tell in this output. It is benign, but annoying. If this is the message from the source I think it is, then it will be fixed.

Line 27: This is normal, not sure why this is an error. The CLC, registers Walrus just fine but shows an error because it doesn't have a partition, which is expected since it is a global service. This should be removed completely.

Line 31: I think there is something strange with the config here, this usually indicates an issue with the keys and certs used. But, if it goes to ENABLED in the end, then maybe this is just transitory between when the keys are synced and the component is registered.

Line 34: Same thing

Line 38: This is expected in the interval between when the SC is registered and when it is configured. It goes away once the SC backend is configured. As long as you don't see it continue in the logs, things are okay. This is a runtime configuration only, so there is no way to configure it prior to starting the system. We could make it a warn-level message, but this issue prevents the SC from going to ENABLED and requires admin intervention, so I think ERROR is the proper level.

Line 41: Same as 31

Line 45: Not sure what this is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment