The original goal of this ticket was to handle the case where a volume
unexpectedly changes its IP address on its existing network due to, e.g.,
operator changes, migration, or bugs.
The first implementation, tried at
https://github.com/joyent/sdc-vmapi/tree/ZAPI-793, updates the
{{internal_metadata}} of VMs that mount a given volume during their {{start}}
workflow, refreshing the IP address of every volume that they require.
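For illustration, here is a rough sketch of what that refresh could look like
as a {{start}} workflow step. The helper and metadata key names
({{getVolume}}, {{nfsvolumes}}) are assumptions made for this sketch, not the
actual code on the ZAPI-793 branch:
{code:javascript}
// Hypothetical sketch only: refresh the NFS volume IPs recorded in a VM's
// internal_metadata before the VM is started. The "nfsvolumes" key and the
// volapi.getVolume() call are assumptions, not the actual ZAPI-793 code.
async function refreshVolumeIps(volapi, vm) {
    // Assumed shape: a JSON array of { volume_uuid, nfs_ip, nfs_path }.
    const mounted = JSON.parse(vm.internal_metadata.nfsvolumes || '[]');

    for (const entry of mounted) {
        // Ask VOLAPI for the volume's current IP on its network.
        const volume = await volapi.getVolume(entry.volume_uuid);
        if (volume.nfs_ip !== entry.nfs_ip) {
            entry.nfs_ip = volume.nfs_ip;
        }
    }

    // The start workflow would then persist the updated internal_metadata
    // through VMAPI before booting the VM.
    return Object.assign({}, vm.internal_metadata, {
        nfsvolumes: JSON.stringify(mounted)
    });
}
{code}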
While this approach solves part of the original use case, it has the
fundamental limitation that users need to stop _and_ start every VM that
mounts a volume whose IP address changed.
It doesn't handle the case when VMs mounting volumes are rebooted by users, or
reboot automatically.
We could definitely use the same approach as
https://github.com/joyent/sdc-vmapi/tree/ZAPI-793 in the VMAPI {{reboot}}
workflow, but the system would still not handle a VM reboot not initiated by a
VMAPI workflow.
To handle all use cases, it seems like a lower-level approach would be
necessary.
At this point, we can step back and ask whether the original use case
(handling unexpected IP address changes) is worth the implementation
complexity and the limitations described above.
If this was the only use case, my answer would be that it's not worth it.
However, there is another use case that potentially changes this trade-off.
The {{AttachVolumeToNetwork}} and {{DetachVolumeFromNetwork}} endpoints
described at
https://github.com/joyent/rfd/tree/master/rfd/0026#attachvolumetonetwork-post-volumesidattachtonetwork-mvp-milestone
would allow users to change the network(s) on which a given volume is
reachable.
Without any way for existing VMs that mount those volumes to update how they
connect to them, users would need to recreate those VMs, which might not be an
acceptable limitation.
I'll assume for now that we want to solve that use case. Going back to
potential lower-level approaches, I've identified two of them:
1. use DNS names instead of IP addresses when automatically mounting NFS
volumes
2. update the IP address of volumes at the init subsystem level ({{lxinit}}
for Docker and LX VMs, {{mdata-fetch}} for "infrastructure" (SmartOS)
containers)
Approach #2 would still require users to reboot VMs mounting volumes, but would
handle the case of unplanned reboots.
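To make approach #2 a bit more concrete, here is a rough sketch of the
boot-time refresh it implies: look up each required volume's current IP and
rewrite the NFS mount source before mounting. The lookup interface, the data
shape and the illumos-style mount invocation are all assumptions for
illustration; the real code would live in {{lxinit}}/{{mdata-fetch}} rather
than in a standalone Node.js script:
{code:javascript}
// Sketch of approach #2's boot-time step. Data shapes and the lookup
// function are assumptions; error handling is omitted for brevity.
const { execFile } = require('child_process');

// volumes: [{ volume_uuid, nfs_path, mountpoint }]
// lookupCurrentIp: asks the control plane (e.g. VOLAPI, possibly through a
// metadata proxy in the global zone) for the volume's current IP.
async function mountRequiredVolumes(volumes, lookupCurrentIp) {
    for (const vol of volumes) {
        const ip = await lookupCurrentIp(vol.volume_uuid);
        const source = ip + ':' + vol.nfs_path;

        // illumos-style mount invocation, for illustration only.
        await new Promise((resolve, reject) => {
            execFile('mount', ['-F', 'nfs', source, vol.mountpoint],
                (err) => (err ? reject(err) : resolve()));
        });
    }
}
{code}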
I have questions for both approaches.
Using DNS names would require the following guarantees in Triton:
1. Every VM that can mount NFS volumes has access to a DNS server able to
resolve NFS volumes' host names. This should always be the case for VMs on
networks that have a NAT zone, since they should be able to query Joyent's
public DNS, which would answer with CNS entries corresponding to any NFS
volume. However, it's not clear to me that all fabric networks are guaranteed
to have NAT zones (e.g., currently, or until recently, there were at least
some cases where Terraform did _not_ create NAT zones when creating instances
on a fabric network). It's even less clear whether instances provisioned on
non-fabric networks would be guaranteed to have access to a DNS service able
to serve those records. Since we're currently discussing allowing VMs on
non-fabric networks to mount NFS volumes, this could be a relevant use case.
2. We would need to verify that the NFS client implementation on SmartOS (and
potentially Linux and other systems for KVM) retries DNS name lookups when
various operations fail due to timeouts or errors because the NFS server
serving a volume's data is unreachable.
Updating IP addresses at the init subsystem level would probably be
implemented in {{lxinit}}, in {{mdata-fetch}}, and in user scripts for KVM
instances. I don't foresee specific issues with that in {{lxinit}}, since
requests to internal services (e.g. VOLAPI) would be performed by code
controlled by the implementation, and could not be abused by users (although
we could imagine a large number of LX containers stuck in an autoreboot loop
sending requests to VOLAPI for IP addresses, it seems we could still mitigate
that using e.g. a cache).
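For example, if those lookups went through an agent in the global zone (e.g.
the metadata agent, which persists across container reboots), a small TTL
cache in that agent would bound the request rate to VOLAPI even with
containers stuck in a reboot loop. A minimal sketch, with the lookup function
and the cache's location assumed:
{code:javascript}
// Minimal sketch of the cache mentioned above: memoize volume IP lookups for
// a short TTL so that a container rebooting in a loop results in at most one
// VOLAPI request per volume per TTL. Where this cache lives (e.g. a global
// zone agent) is an assumption.
function makeCachedLookup(lookupCurrentIp, ttlMs) {
    const cache = new Map(); // volume_uuid -> { ip, expiresAt }

    return async function cachedLookup(volumeUuid) {
        const entry = cache.get(volumeUuid);
        if (entry && entry.expiresAt > Date.now()) {
            return entry.ip;
        }

        const ip = await lookupCurrentIp(volumeUuid);
        cache.set(volumeUuid, { ip, expiresAt: Date.now() + ttlMs });
        return ip;
    };
}
{code}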
However, performing that update in {{mdata-fetch}} or in user scripts would
imply exposing to users tools able to refresh the IP address of a volume. It
seems that, without being careful, those tools could be used to DoS the
internal services that they would depend on.
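One way to be careful here would be to have the user-facing tool go through a
per-VM throttle enforced outside the guest (again, e.g. in a global zone
agent), so that repeated refresh requests do not all reach VOLAPI. A rough
sketch, with all names assumed:
{code:javascript}
// Sketch of a per-VM throttle for guest-initiated "refresh volume IPs"
// requests; requests arriving faster than minIntervalMs per VM are rejected
// (or could be served from the cache above). Names are assumptions.
function makeRefreshThrottle(minIntervalMs) {
    const lastAllowed = new Map(); // vm_uuid -> timestamp of last refresh

    return function allowRefresh(vmUuid) {
        const now = Date.now();
        if (now - (lastAllowed.get(vmUuid) || 0) < minIntervalMs) {
            return false;
        }
        lastAllowed.set(vmUuid, now);
        return true;
    };
}
{code}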
Moreover, while it seems that this could be implemented in the metadata agent
for infrastructure (SmartOS) containers, it's also not clear how that
interface would be exposed to KVM VMs. It seems that it would require
customizing guest images (similar to cloud-init for Ubuntu images).
As a result, this is my current position on how we should move forward on this,
in order:
1. Determine whether the use case of changing the network reachability of
volumes needs to be solved (that is, whether requiring users to
destroy/recreate mounting VMs is acceptable in this case).
2. If that use case needs to be solved, implement the approach used for
https://github.com/joyent/sdc-vmapi/tree/ZAPI-793 for the {{reboot}} and
{{start}} workflows and document usage and limitations. Otherwise just close
this ticket.
3. Evaluate the feasibility of a robust and safe implementation for updating
IP addresses of volumes at the init subsystem level for all brands/types of
instances/machines. Document limitations (users need to reboot VMs mounting
volumes).
4. Evaluate the feasibility of using DNS names for referring to NFS volumes
everywhere.
Thoughts?