SWSUP-665 describes the problem. In short:
- OS-5950 changed things so that we attempt to start metadata once we see the VM go 'running'
- if qemu comes up too slowly, it's possible we'll see 'running' before the socket is actually usable, in which case we'll get ECONNREFUSED
- when we hit ECONNREFUSED on the initial connection, we rely on the periodic (every minute) retry
- by the time we retry, Ubuntu zones with broken cloud-init (all of them, see IMAGE-1014) will already be stuck and will never properly recover
In order to work around this, we'll need to either:
- have metadata retry after ECONNREFUSED, before the periodic timer
- have some other mechanism that notifies the metadata agent that the qemu socket is ready to be used
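
A rough sketch of the first option (retry soon after ECONNREFUSED instead of waiting for the periodic timer) could look like the following. This is illustrative Python, not the metadata agent's actual code: the function names, the socket path argument and the delay schedule are all assumptions made up for the example.

```python
import socket
import time


def _unix_connect(path):
    """Open a UNIX stream socket to `path` (hypothetical qemu socket path)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
    except OSError:
        sock.close()
        raise
    return sock


def connect_with_retry(path, connect=None, sleep=time.sleep,
                       delays=(0.1, 0.2, 0.5, 1.0, 2.0, 5.0)):
    """Connect to the socket, retrying quickly while it refuses connections.

    Makes one immediate attempt, then one attempt after each delay in
    `delays` (seconds; the schedule is an assumption, not a tuned value).
    Returns the connection on success, or None if every attempt hit
    ECONNREFUSED, so the caller can fall back to the periodic retry.
    Any other error is raised to the caller unchanged.
    """
    if connect is None:
        connect = _unix_connect
    for delay in (0,) + tuple(delays):
        if delay:
            sleep(delay)
        try:
            return connect(path)
        except ConnectionRefusedError:
            # ECONNREFUSED: qemu is 'running' but not listening yet.
            continue
    return None
```

With the delays above, the quick retries cover roughly the first nine seconds after 'running' is seen, which is well inside the one-minute periodic interval. Whether that window is long enough for a slow qemu start is not established here and would need checking against the cases in SWSUP-665.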