
@hairyhum
Last active June 19, 2019 21:24

Cluster state on disk:

  1. Mnesia schema:
     • db_nodes - nodes of the schema: either disc nodes or nodes to which tables are replicated.
     • extra_db_nodes - configuration telling Mnesia which nodes to connect to on startup.
     • running_db_nodes - nodes Mnesia is currently connected to. [1]
     • table nodes - nodes on which tables are replicated: each table has a list of "all nodes" and a list of "active" nodes. "All nodes" is a subset of db_nodes; "active nodes" is a subset of running_db_nodes. In a way, db_nodes and running_db_nodes are the "all nodes" and "active nodes" of the schema table.
  2. nodes_running_at_shutdown - a list of the nodes that are currently running. Similar to running_db_nodes, but maintained by the node monitor: it is modified when a node starts, joins/leaves the cluster, or when the rabbit process stops on a node.
  3. cluster_nodes.config - two lists, one containing all clustered nodes and one containing the disc nodes. Modified when a node joins or leaves the cluster.
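The pieces of state above can be modeled as a toy sketch. This is my own Python simplification, not RabbitMQ code; the class and field names are made up:

```python
# Toy model (not RabbitMQ code) of the on-disk cluster status described
# above: all clustered nodes, the subset of disc nodes, and the set of
# currently running nodes. Join/leave mutate the first two; the running
# set is the analogue of nodes_running_at_shutdown.

class ClusterStatus:
    def __init__(self):
        self.all_nodes = set()   # every clustered node
        self.disc_nodes = set()  # subset of all_nodes that persist to disc
        self.running = set()     # nodes currently running

    def join(self, node, disc=True):
        self.all_nodes.add(node)
        if disc:
            self.disc_nodes.add(node)
        self.running.add(node)

    def leave(self, node):
        self.all_nodes.discard(node)
        self.disc_nodes.discard(node)
        self.running.discard(node)

status = ClusterStatus()
status.join("rabbit@a")
status.join("rabbit@b", disc=False)
assert status.disc_nodes == {"rabbit@a"}
assert status.all_nodes == {"rabbit@a", "rabbit@b"}
```

The invariant to notice is that disc_nodes is always a subset of all_nodes, mirroring how the two lists in cluster_nodes.config relate.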

Monitors:

  1. mnesia_monitor - a process linked to the corresponding monitors on all db_nodes.

  2. rabbit_node_monitor - monitors nodes (net_kernel:monitor_nodes/2) and rabbit processes on remote nodes.

  3. All the queues/channels/gm can monitor state across nodes.

Messages:

Common:

  • nodedown - a message from the Erlang built-in node monitor. Handled by mnesia_monitor to keep track of down nodes (it does not directly remove them from running_db_nodes), and by rabbit_node_monitor to track how many nodes are running for the pause_minority and pause_if_all_down strategies; it also triggers check_partial_partition.

  • nodeup - the counterpart of nodedown. Handled by mnesia_monitor to check the cluster status; this handler may emit an inconsistent_database event. rabbit_node_monitor just logs the event and does nothing else.
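A rough sketch of how a single nodedown fans out to the two monitors. This is illustrative Python with invented names; the real handlers are Erlang gen_server callbacks:

```python
# Toy fan-out of a nodedown event to the two interested processes
# described above (names are mine, not RabbitMQ's).

class MnesiaMonitor:
    def __init__(self):
        self.down = set()

    def handle_nodedown(self, node):
        # Tracks down nodes; does NOT remove them from running_db_nodes.
        self.down.add(node)

class RabbitNodeMonitor:
    def __init__(self, running):
        self.running = set(running)
        self.partition_checks = []

    def handle_nodedown(self, node):
        self.running.discard(node)
        # Would fan out check_partial_partition to the other running nodes.
        self.partition_checks.append(node)
        # The running count feeds pause_minority / pause_if_all_down.
        return len(self.running)

mm = MnesiaMonitor()
rm = RabbitNodeMonitor({"rabbit@a", "rabbit@b", "rabbit@c"})
mm.handle_nodedown("rabbit@c")
left = rm.handle_nodedown("rabbit@c")
assert "rabbit@c" in mm.down
assert left == 2
```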

mnesia_monitor:

  • EXIT signal from a linked mnesia_monitor - updates running_db_nodes and the active nodes for all tables.

rabbit_node_monitor:

  • notify_node_up - notifies all nodes in running_db_nodes (except self) by sending them a node_up message.

  • DOWN from a rabbit process - updates the cluster status (removes the stopped node); cleans up transient queues, listeners, and alarms; updates partition tracking (handle_dead_rabbit).

  • node_up (not to be confused with nodeup) - sent by the node monitor on a freshly started remote node to notify the cluster (in a boot step). Updates the cluster status, updates alarms, and removes the started node from the recoverable slaves of mirrored queues (handle_live_rabbit).

  • joined_cluster/left_cluster - update the cluster status.

  • {mnesia_system_event, {inconsistent_database, running_partitioned_network, Node}} - this message is treated as a reconnect after a partial partition. Updates alarms, removes the started node from the recoverable slaves of mirrored queues (handle_live_rabbit), and records the partitioned state. I'm not sure this is the right message to signal a reconnect: it may be emitted multiple times and does not necessarily mean that a node has rejoined.
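The rabbit_node_monitor messages above boil down to a message-to-actions mapping, sketched here as a plain dict. This structure is hypothetical; the real monitor dispatches in Erlang handle_cast/handle_info clauses:

```python
# Hypothetical summary of the rabbit_node_monitor dispatch described above:
# which incoming message triggers which actions. Action names follow the
# notes (handle_live_rabbit / handle_dead_rabbit are real function names;
# the rest are my shorthand).

ACTIONS = {
    "node_up":               ["update_cluster_status", "update_alarms",
                              "handle_live_rabbit"],
    "DOWN":                  ["update_cluster_status", "cleanup_transient",
                              "handle_dead_rabbit"],
    "joined_cluster":        ["update_cluster_status"],
    "left_cluster":          ["update_cluster_status"],
    "inconsistent_database": ["update_alarms", "handle_live_rabbit",
                              "record_partition"],
}

# Both the "a node came up" paths converge on handle_live_rabbit.
assert "handle_live_rabbit" in ACTIONS["node_up"]
assert "handle_live_rabbit" in ACTIONS["inconsistent_database"]
```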

Partial partition handling:

  • check_partial_partition: this message is sent by a node handling a nodedown message to all the running nodes except the sender and the node that is reported "down". The message contains the GUIDs of these two nodes.

    A node that receives this message checks whether the "down" node is actually down, both by checking its status (in the node_monitor data) and by sending an RPC request calling rabbit:is_running/0. If the "down" node is running, the "checker" node responds to the "reporter" node with a partial_partition message containing the "checker" node and the "down" node. The RPC request is sent from a one-off process.

    This feels dangerous intuitively and not that easy to reason about.

  • partial_partition: this message tells a node that there is a partial partition. It contains the "checker" node and the "not_really_down" node. On receiving it, the node monitor forcibly disconnects from the "checker" node and sends it a partial_partition_disconnect message. The node may instead pause if it is in pause_minority or pause_if_all_down mode.

  • partial_partition_disconnect: the message tells a node to disconnect from another node.

The assumption here is that a node should be promoted to a full partition, disconnecting from the "checker" node and leaving the "checker" and the "down" nodes in a partition together.

But because DOWN messages are symmetric and there is no additional coordination, this process may leave the entire cluster disconnected or keep disconnecting nodes for some time.
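The detection flow above can be simulated in a few lines. This is my own simplification assuming a static reachability map; the real code performs the verification via an RPC to rabbit:is_running/0 from a one-off process:

```python
# Toy simulation of check_partial_partition: the reporter believes `down`
# is down and asks every other node it can reach to verify; a checker that
# can still reach `down` replies with a partial_partition message.

def check_partial_partition(reporter, down, reachable):
    """reachable[x] is the set of nodes x can currently talk to."""
    replies = []
    for checker in reachable[reporter] - {down}:
        # The checker verifies the "down" node (rpc rabbit:is_running/0
        # in the real code) and reports back if it is actually running.
        if down in reachable[checker]:
            replies.append(("partial_partition", checker, down))
    return replies

# A cannot see C, but B still can -> a partial partition is detected via B.
reachable = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B"},
}
assert check_partial_partition("A", "C", reachable) == \
    [("partial_partition", "B", "C")]
```

If C were genuinely down (no node could reach it), no checker would reply and the nodedown would be treated as a plain node failure.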

A note on disconnect:

When disconnecting, the nodes will disable reconnection for 1 second.

When some nodes are down, the node monitor pings the entire cluster every second.

It also casts a keepalive message to all running nodes every 10 seconds.

[1] running_db_nodes: this value is maintained by internal Mnesia monitors. A node is removed from this list when the mnesia_monitor process detects that another mnesia_monitor is "down". When the node is rediscovered, it will not be automatically re-added unless the schema is merged. This can be triggered explicitly with mnesia:change_config(extra_db_nodes, [Node]), or by restarting the node. You may need to set the same extra_db_nodes configuration that is already there to reconnect the cluster. When nodes are discovered, Mnesia sends a message like {mnesia_system_event, {inconsistent_database, running_partitioned_network, Node}} to all processes subscribed to such events. This may happen every time Mnesia checks schema consistency, both when the node is discovered to be up (e.g. a message is sent between nodes) and when connecting with mnesia:change_config(extra_db_nodes, ...).
