qdzlug/OS-5999.md

## OS-5999.md

      
    Raw
  

              OS-5999.md
            
          
    SWSUP-665 has a description of the problem, basically:

OS-5950 fixed so we attempt to start metadata once we see the VM go 'running'
if qemu is up too slowly, it's possible we'll see 'running' before the socket is actually usable, in which case we'll get ECONNREFUSED
when we hit ECONNREFUSED on the initial connection, we rely on the periodic (every minute) retry
by the time we retry, Ubuntu zones with broken cloud-init (all of them, see IMAGE-1014) will get stuck and never properly recover

In order to work around this, we'll need to either:

have metadata retry after ECONNREFUSED, before the periodic timer
have some other mechanism that notifies metadata agent that the qemu socket is ready to be used