The original goal of this ticket was to handle the case where a volume
unexpectedly changes its IP address on its existing network due to, e.g.,
operator changes, migration, or bugs.
The first implementation, tried at
https://github.com/joyent/sdc-vmapi/tree/ZAPI-793, updates the
{{internal_metadata}} of VMs that mount a given volume during their {{start}}
workflow, refreshing the IP address of every volume that they require.
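For illustration, here is a rough sketch of what that refresh could look like
as a {{start}} workflow step. The helper and metadata key names
({{getVolume}}, {{nfsvolumes}}) are assumptions made for this sketch, not the
actual code on the ZAPI-793 branch:
{code:javascript}
// Hypothetical sketch only: refresh the NFS volume IPs recorded in a VM's
// internal_metadata before the VM is started. The "nfsvolumes" key and the
// volapi.getVolume() call are assumptions, not the actual ZAPI-793 code.
async function refreshVolumeIps(volapi, vm) {
    // Assumed shape: a JSON array of { volume_uuid, nfs_ip, nfs_path }.
    const mounted = JSON.parse(vm.internal_metadata.nfsvolumes || '[]');

    for (const entry of mounted) {
        // Ask VOLAPI for the volume's current IP on its network.
        const volume = await volapi.getVolume(entry.volume_uuid);
        if (volume.nfs_ip !== entry.nfs_ip) {
            entry.nfs_ip = volume.nfs_ip;
        }
    }

    // The start workflow would then persist the updated internal_metadata
    // through VMAPI before booting the VM.
    return Object.assign({}, vm.internal_metadata, {
        nfsvolumes: JSON.stringify(mounted)
    });
}
{code}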
While this approach solves part of the original use case, it has the
fundamental limitation that users need to stop _and_ start every VM that
mounts a volume whose IP address changed.
It doesn't handle the case when VMs mounting volumes are rebooted by users, or
reboot automatically.
We could definitely use the same approach as
https://github.com/joyent/sdc-vmapi/tree/ZAPI-793 in the VMAPI {{reboot}}
workflow, but the system would still not handle a VM reboot not initiated by a
VMAPI workflow.
To handle all use cases, it seems like a lower-level approach would be
necessary.
At this point, we can step back and ask whether the original use case
(handling unexpected IP address changes) is worth the implementation
complexity and the limitations described above.
If this was the only use case, my answer would be that it's not worth it.
However, there is another use case that potentially changes this trade-off.
The {{AttachVolumeToNetwork}} and {{DetachVolumeFromNetwork}} endpoints
described at
https://github.com/joyent/rfd/tree/master/rfd/0026#attachvolumetonetwork-post-volumesidattachtonetwork-mvp-milestone
would allow users to change the network(s) on which a given volume is
reachable.
Without any way for existing VMs that mount those volumes to update how they
connect to them, users would need to recreate those VMs, which might not be an
acceptable limitation.
I'll assume for now that we want to solve that use case. Going back to
potential lower-level approaches, I've identified two of them:
1. use DNS names instead of IP addresses when automatically mounting NFS
volumes
2. update the IP address of volumes at the init subsystem level ({{lxinit}}
for Docker and LX VMs, {{mdata-fetch}} for "infrastructure" (SmartOS)
containers)
Approach #2 would still require users to reboot VMs mounting volumes, but would
handle the case of unplanned reboots.
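To make approach #2 a bit more concrete, here is a rough sketch of the
boot-time refresh it implies: look up each required volume's current IP and
rewrite the NFS mount source before mounting. The lookup interface, the data
shape and the illumos-style mount invocation are all assumptions for
illustration; the real code would live in {{lxinit}}/{{mdata-fetch}} rather
than in a standalone Node.js script:
{code:javascript}
// Sketch of approach #2's boot-time step. Data shapes and the lookup
// function are assumptions; error handling is omitted for brevity.
const { execFile } = require('child_process');

// volumes: [{ volume_uuid, nfs_path, mountpoint }]
// lookupCurrentIp: asks the control plane (e.g. VOLAPI, possibly through a
// metadata proxy in the global zone) for the volume's current IP.
async function mountRequiredVolumes(volumes, lookupCurrentIp) {
    for (const vol of volumes) {
        const ip = await lookupCurrentIp(vol.volume_uuid);
        const source = ip + ':' + vol.nfs_path;

        // illumos-style mount invocation, for illustration only.
        await new Promise((resolve, reject) => {
            execFile('mount', ['-F', 'nfs', source, vol.mountpoint],
                (err) => (err ? reject(err) : resolve()));
        });
    }
}
{code}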
I have questions for both approaches.
Using DNS names would require the following guarantees in Triton:
1. Every VM that can mount NFS volumes has access to a DNS server able to
resolve NFS volumes' host names. This should always be the case for VMs on
networks that have a NAT zone, since they should be able to query Joyent's
public DNS, which would answer with CNS entries corresponding to any NFS
volume. However, it's not clear to me that all fabric networks are guaranteed
to have NAT zones (e.g., currently, or until recently, there were at least
some cases where Terraform did _not_ create NAT zones when creating instances
on a fabric network). It's even less clear whether instances provisioned on
non-fabric networks would be guaranteed to have access to a DNS service able
to serve those records. Since we're currently discussing allowing VMs on
non-fabric networks to mount NFS volumes, this could be a relevant use case.
2. We would need to verify that the NFS client implementation on SmartOS (and
potentially Linux and other systems for KVM) retries DNS name lookups when
various operations fail due to timeouts or errors because the NFS server
serving a volume's data is unreachable.
Updating IP addresses at the init subsystem level would probably be
implemented in {{lxinit}}, in {{mdata-fetch}}, and in user scripts for KVM
instances. I don't foresee specific issues with that in {{lxinit}}, since
requests to internal services (e.g. VOLAPI) would be performed by code
controlled by the implementation, and could not be abused by users (although
we could imagine a large number of LX containers stuck in an autoreboot loop
sending requests to VOLAPI for IP addresses, it seems we could still mitigate
that using e.g. a cache).
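For example, if those lookups went through an agent in the global zone (e.g.
the metadata agent, which persists across container reboots), a small TTL
cache in that agent would bound the request rate to VOLAPI even with
containers stuck in a reboot loop. A minimal sketch, with the lookup function
and the cache's location assumed:
{code:javascript}
// Minimal sketch of the cache mentioned above: memoize volume IP lookups for
// a short TTL so that a container rebooting in a loop results in at most one
// VOLAPI request per volume per TTL. Where this cache lives (e.g. a global
// zone agent) is an assumption.
function makeCachedLookup(lookupCurrentIp, ttlMs) {
    const cache = new Map(); // volume_uuid -> { ip, expiresAt }

    return async function cachedLookup(volumeUuid) {
        const entry = cache.get(volumeUuid);
        if (entry && entry.expiresAt > Date.now()) {
            return entry.ip;
        }

        const ip = await lookupCurrentIp(volumeUuid);
        cache.set(volumeUuid, { ip, expiresAt: Date.now() + ttlMs });
        return ip;
    };
}
{code}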
However, performing that update in {{mdata-fetch}} or in user scripts would
imply exposing to users tools able to refresh the IP address of a volume. It
seems that, without being careful, those tools could be used to DoS the
internal services that they would depend on.
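One way to be careful here would be to have the user-facing tool go through a
per-VM throttle enforced outside the guest (again, e.g. in a global zone
agent), so that repeated refresh requests do not all reach VOLAPI. A rough
sketch, with all names assumed:
{code:javascript}
// Sketch of a per-VM throttle for guest-initiated "refresh volume IPs"
// requests; requests arriving faster than minIntervalMs per VM are rejected
// (or could be served from the cache above). Names are assumptions.
function makeRefreshThrottle(minIntervalMs) {
    const lastAllowed = new Map(); // vm_uuid -> timestamp of last refresh

    return function allowRefresh(vmUuid) {
        const now = Date.now();
        if (now - (lastAllowed.get(vmUuid) || 0) < minIntervalMs) {
            return false;
        }
        lastAllowed.set(vmUuid, now);
        return true;
    };
}
{code}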
Moreover, while it seems that this could be implemented in the metadata agent
for infrastructure (SmartOS) containers, it's also not clear how that
interface would be exposed to KVM VMs. It seems that it would require
customizing guest images (similar to cloud-init for Ubuntu images).
As a result, this is my current position on how we should move forward on this,
in order:
1. Determine whether the use case of changing the network reachability of
volumes needs to be solved (that is, whether requiring users to
destroy/recreate mounting VMs is acceptable in this case).
2. If that use case needs to be solved, implement the approach used for
https://github.com/joyent/sdc-vmapi/tree/ZAPI-793 for the {{reboot}} and
{{start}} workflows and document usage and limitations. Otherwise just close
this ticket.
3. Evaluate the feasibility of a robust and safe implementation for updating
IP addresses of volumes at the init subsystem level for all brands/types of
instances/machines. Document limitations (users need to reboot VMs mounting
volumes).
4. Evaluate the feasibility of using DNS names for referring to NFS volumes
everywhere.
Thoughts?