Currently, OpenStack Havana does not have built-in support for highly available virtual routers or DHCP services. In the existing Havana release, virtual routers and DHCP services are scheduled to a single Quantum network node, and are not rescheduled on network node failure.
Since virtual routers and DHCP services are normally scheduled approximately evenly across network nodes, the failure of a single Quantum network node can cause IP addressing and routing failures on a share of networks inversely proportional to the number of Quantum network nodes in use (roughly one network in N for N nodes). Because this is generally an unacceptable risk in production environments, most production deployments of OpenStack have traditionally either used the "old-style" nova network driver in HA mode instead of Quantum, or used Quantum with provider networks so as to externalize these services for higher availability.
This has the unfortunate consequence of reducing the utility of software-defined networking, which is frequently one of the most compelling features of OpenStack itself. While the Quantum project itself will likely find solutions to the problem, production requirements dictated that a solution be found sooner. To attempt to solve some of these issues, we have developed a service that monitors topology changes in a running OpenStack cluster and automatically makes changes to the networking configuration to maintain availability of services even in the event of Quantum network node failure.
The RPCDaemon is a Python-based daemon that subscribes to the nova message bus and watches for events that it should take action on. Three plugins are currently implemented:
- DHCPAgent: Implements high availability in DHCP services
- L3Agent: Implements high availability in virtual routers
- Dump: Simple plugin to dump message traffic. This is typically only used for development or troubleshooting purposes.
The operation of the DHCPAgent plugin is simple to describe. At periodic intervals, DHCP services are removed from any Quantum DHCP agent that is no longer reporting itself as available. In addition, DHCP services are provisioned on every Quantum DHCP agent node that doesn't already have them provisioned.
Likewise, when a DHCP-enabled network is removed, the DHCPAgent plugin ensures that DHCP services are deprovisioned on all Quantum DHCP agent nodes.
The operational effect of these actions is that when new DHCP-enabled networks are created, DHCP servers appear on every Quantum network node, rather than on a single Quantum network node. While this slightly increases DHCP traffic, because each DHCP discovery request receives multiple offers, it does so safely, as the OpenStack DHCP implementation uses DHCP reservations to ensure virtual machines always boot with predictable IP addresses.
Because of this, DHCP requests can continue to be serviced by other available network nodes, even in the event of catastrophic failure of a single network node.
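The periodic reconciliation described above can be sketched as a pure function. This is only an illustration under stated assumptions, not the plugin's actual code: the function name `plan_dhcp_changes`, the dictionary shapes, and the agent and network names are all hypothetical, and the real plugin talks to the Quantum API rather than to in-memory dictionaries.

```python
def plan_dhcp_changes(agents, hosting):
    """Return (to_remove, to_add) lists of (network_id, agent_id) pairs.

    agents:  dict mapping agent id -> True if the agent reports itself alive
    hosting: dict mapping network id -> set of agent ids serving DHCP for it
    (Both structures are hypothetical stand-ins for Quantum API results.)
    """
    alive = {agent for agent, is_alive in agents.items() if is_alive}
    to_remove, to_add = [], []
    for network, hosts in sorted(hosting.items()):
        # Remove DHCP service from agents that no longer report as available.
        for agent in sorted(hosts - alive):
            to_remove.append((network, agent))
        # Provision DHCP service on every live agent that does not have it yet.
        for agent in sorted(alive - hosts):
            to_add.append((network, agent))
    return to_remove, to_add


# Hypothetical cluster: three DHCP agents, one of which has stopped reporting.
agents = {"net1": True, "net2": False, "net3": True}
hosting = {"blue": {"net1", "net2"}, "green": {"net3"}}

to_remove, to_add = plan_dhcp_changes(agents, hosting)
print("remove:", to_remove)
print("add:", to_add)
```

After one pass, every network in this example is served by both live agents and by no dead one, which is the property that lets another node answer DHCP requests when one fails.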
The L3Agent plugin also runs periodically, but is only interested in virtual routers that are currently assigned to L3 agents that have become inactive. If the L3Agent plugin observes a "down" L3 agent that Quantum believes is hosting a virtual router, then the L3Agent plugin deprovisions the virtual router from that node and reprovisions it on another active Quantum L3 agent node.
This reprovisioning action does not occur immediately, and there will be some minimal network interruption while the virtual router is migrated. However, the corrective action happens without intervention, and any network outage is transient. While not perfect, this does allow a higher availability of virtual routing, and may be acceptable for some production workloads.
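The rescheduling decision can be sketched in the same spirit. The text only says that a router is moved to "another active" L3 agent, so the round-robin choice of target below is an assumption, as are the function name `plan_router_moves` and the data shapes.

```python
import itertools


def plan_router_moves(agents, routers):
    """Return a list of (router_id, old_agent, new_agent) moves.

    agents:  dict mapping L3 agent id -> True if the agent is active
    routers: dict mapping router id -> id of the agent hosting it
    (Hypothetical stand-ins for what the Quantum API would report.)
    """
    active = sorted(agent for agent, up in agents.items() if up)
    if not active:
        # Nowhere to move routers to; leave everything in place.
        return []
    # Assumption: spread displaced routers over active agents round-robin.
    targets = itertools.cycle(active)
    moves = []
    for router, agent in sorted(routers.items()):
        if not agents.get(agent, False):
            moves.append((router, agent, next(targets)))
    return moves


# Hypothetical cluster: agent l3-b is down and hosts two routers.
agents = {"l3-a": True, "l3-b": False, "l3-c": True}
routers = {"r1": "l3-b", "r2": "l3-a", "r3": "l3-b"}

moves = plan_router_moves(agents, routers)
print(moves)
```

Routers on healthy agents (here `r2`) are never touched, which matches the plugin's stated interest only in routers assigned to inactive agents.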
While the configuration options are not currently exposed by the
Rackspace Private Cloud cookbooks, the following is a description of
the configuration values in the rpcdaemon configuration file
(typically located at /etc/rpcdaemon.conf).
The configuration file is a Python ConfigParser ini-style file. There is one section for general daemon settings, followed by a configuration section for each plugin.
General daemon options are specified in the Daemon section of the
configuration file. Available options include:
- plugins: Comma separated list of plugins to load. Valid options include L3Agent, DHCPAgent, and Dump.
- rpchost: Kombu connection URL for the OpenStack message server. In the case of rabbitmq, an IP address is sufficient. See the [Kombu Documentation](http://kombu.readthedocs.org/en/latest/userguide/connections.html) for more information on Kombu connection URLs.
- pidfile: Location of the daemon pid file.
- logfile: Location of the log file.
- loglevel: Verbosity of logging. Valid options include DEBUG, INFO, WARNING, ERROR, and CRITICAL.
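As an illustration of the options above, the following sketch parses a sample Daemon section. The option names come from the list above; the values (IP address, pidfile and logfile paths) are made up for the example, and the parsing code uses the Python 3 `configparser` module rather than whatever the daemon itself does internally.

```python
import configparser
import logging

# Sample values only: the address and file paths are illustrative assumptions.
SAMPLE = """
[Daemon]
plugins = L3Agent, DHCPAgent
rpchost = 192.0.2.10
pidfile = /var/run/rpcdaemon.pid
logfile = /var/log/rpcdaemon.log
loglevel = INFO
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)
daemon = config["Daemon"]

# Split the comma separated plugin list and drop surrounding whitespace.
plugins = [name.strip() for name in daemon["plugins"].split(",") if name.strip()]

# Map the textual level (DEBUG, INFO, ...) onto the logging module's constant.
level = getattr(logging, daemon["loglevel"].upper())

print(plugins, level)
```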
L3Agent options are specified in the L3Agent section of the
configuration file. The L3Agent logs to the logfile specified in the
Daemon section, but the log level of the L3Agent can be configured
independently of the daemon itself. Available configuration options
include:
- conffile: Path to the quantum (or neutron) configuration file.
- loglevel: Verbosity of logging.
DHCPAgent options are specified in the DHCPAgent section of the
configuration file. Like the L3Agent, logs will also be sent to the
logfile specified in the Daemon section, while the log level is
independently configurable. The DHCPAgent takes the same configuration
options as the L3Agent, namely:
- conffile: Path to the quantum (or neutron) configuration file.
- loglevel: Verbosity of logging.
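Because each plugin section carries its own loglevel, the two agents can log at different verbosities. The sketch below reads both sections from a sample file; the conffile path shown is an assumed example, and the `plugin_settings` dictionary is a hypothetical structure used only for illustration.

```python
import configparser
import logging

# Sample plugin sections; the conffile path is an illustrative assumption.
SAMPLE = """
[L3Agent]
conffile = /etc/quantum/quantum.conf
loglevel = DEBUG

[DHCPAgent]
conffile = /etc/quantum/quantum.conf
loglevel = WARNING
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

plugin_settings = {}
for section in ("L3Agent", "DHCPAgent"):
    plugin_settings[section] = {
        "conffile": config.get(section, "conffile"),
        # Each plugin resolves its own level, independent of the other.
        "loglevel": getattr(logging, config.get(section, "loglevel")),
    }
    print(section, plugin_settings[section])
```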
Unsurprisingly, the Dump plugin options are specified in the Dump
section of the configuration file. In daemon mode, the Dump plugin
will log to the logfile specified in the Daemon section, and
although the log level is configurable, dumped messages are emitted at
DEBUG level, so any other loglevel setting is essentially useless.
The Dump plugin is most useful when running in foreground mode. See
the Command Line Options section for more information.
Available options:
- loglevel: any valid loglevel verbosity, but should be DEBUG as explained previously.
- queue: queue to dump. Typically quantum or neutron to view network related messages.
The RPCDaemon currently understands only two command-line options:
- -d: don't detach (run in foreground). When running in foreground, a pidfile is not dropped, the default log level is set to DEBUG, and the daemon logs to stderr rather than the specified logfile. This is most useful for running the Dump plugin, but can be helpful in development mode as well.
- -c: path to configuration file. The default configuration file path is /usr/local/etc/rpcdaemon.conf, but init scripts on packaged versions of RPCDaemon pass -c /etc/rpcdaemon.conf.
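For illustration, the two options could be parsed as follows. This is a sketch using `argparse`, not a copy of the daemon's own option handling; the program name `rpcdaemon` and the attribute names `foreground` and `config` are assumptions.

```python
import argparse

# Default path as documented above.
DEFAULT_CONFIG = "/usr/local/etc/rpcdaemon.conf"


def parse_args(argv):
    # add_help=False keeps the sketch to exactly the two documented options.
    parser = argparse.ArgumentParser(prog="rpcdaemon", add_help=False)
    parser.add_argument("-d", dest="foreground", action="store_true",
                        help="don't detach (run in foreground)")
    parser.add_argument("-c", dest="config", default=DEFAULT_CONFIG,
                        help="path to configuration file")
    return parser.parse_args(argv)


# Equivalent to what a packaged init script passes, plus foreground mode.
args = parse_args(["-d", "-c", "/etc/rpcdaemon.conf"])
print(args.foreground, args.config)
```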
Two further options are not yet documented above:
- queue_expire: Used in the neutronagent plugins to set x-expires headers. The default is 60.
- timeout: The default is 20 seconds.