Chapter 4. VACM Architecture Overview

This section describes the VACM components and how they fit together.

VACM Components

VACM consists of three main software components. Some must run constantly; others need only be run when you want VACM to perform a given operation. The components are described in detail below.

Nexxus (Node Controller)

Nexxus can be considered the "heart" of VACM. It receives requests from users or automated scripts and dispatches them to the proper low-level handler for execution. It also maintains the "nodelist". Nexxus should run on a dedicated piece of hardware referred to as the "Node Controller". Depending on the types of monitoring you wish to do, the Node Controller may or may not require special hardware. For example, if you want a Node Controller to monitor and manage a bank of UPSes that are controlled over serial cables, you may have to install serial port extenders in the Node Controller so that it has enough ports to communicate with them. This does not mean that a machine designated as a Node Controller must run Nexxus exclusively.

Depending on the number of machines you wish to monitor and the types of monitoring you wish to do, giving a Node Controller other processing responsibilities is just fine. Keep in mind, however, that if your Node Controller is one of the nodes you are monitoring, you must exercise extreme care when managing that node. For example, if your server acts as both a Node Controller *and* a processing node, and a request powers down the entire cluster, the Node Controller itself will be powered down as well. To avoid 'painting yourself into a corner', VA recommends dedicating a node to the Node Controller role.

Modules

Modules are the "workhorses" of VACM. They receive requests from Nexxus and act upon them in a specific way. For example, the EMP module takes requests from Nexxus and performs operations specific to the Emergency Management Port on many server motherboards. The BAYTECH module takes requests and performs operations related to the Baytech serial port addressable power strips.

While some modules, such as EMP and VA1000, communicate with nodes using specific management protocols, other modules provide higher-level services for cluster management. Many of these services are implemented entirely in software and allow monitoring and control of the software running on the nodes in the cluster. For example, the SYSSTAT module allows users to inspect various details of a node's OS and hardware configuration.

Clients

Clients are the user interface to VACM. They handle requests from the user and submit the appropriate IPC command to Nexxus. They receive IPC responses from Nexxus and can display them to the user in one form or another. Clients are available for the command line (such as vash), as well as for X11 (such as Hoover and Flim).

VACM IPC Messaging

VACM commands take the form of IPC (Inter Process Communication) message strings. These are NUL-terminated ASCII strings whose fields are separated by colons (:). All messages have the same basic format:
MODULE_IPC_TAG:COMMAND:NODE_ID:ARGS:...
The MODULE_IPC_TAG identifies the module to which the message is routed. The COMMAND names the operation to be performed. NODE_ID is the target node on which the operation is to be performed (if applicable). In most cases where a node applies, a shell-style GLOB string can be used to select a number of nodes. ARGS is a colon-delimited list of the arguments the command requires to complete its operation. A few example messages are shown:
EMP:POWER_OFF:sanbox  -- Instruct the EMP module to 
                         power off node 'sanbox'
ICMP_ECHO:PING:sanbox -- Instruct the ICMP_ECHO module to 
                         ping node 'sanbox'
NEXXUS:MODULES        -- Instruct Nexxus to return a list
                         of all loaded modules
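The message layout above can be illustrated with a minimal Python sketch. It is not part of VACM: the names IpcMessage, parse_ipc_message and select_nodes are hypothetical, the sketch assumes that fields never contain a colon (no escaping rule is described here), and it uses Python's fnmatch module as a stand-in for whatever GLOB matching Nexxus actually performs.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple, Optional


class IpcMessage(NamedTuple):
    """Hypothetical container for the fields of one IPC message."""
    module_tag: str
    command: str
    node_id: Optional[str]   # None when the command takes no node
    args: list               # remaining colon-delimited arguments


def parse_ipc_message(raw: str) -> IpcMessage:
    """Split MODULE_IPC_TAG:COMMAND:NODE_ID:ARGS:... into its fields.

    Assumption: fields never contain ':' themselves.
    """
    # Drop the terminating NUL, if the caller left it on the string.
    fields = raw.rstrip("\0").split(":")
    if len(fields) < 2 or not fields[0] or not fields[1]:
        raise ValueError(f"expected MODULE_IPC_TAG:COMMAND, got {raw!r}")
    node_id = fields[2] if len(fields) > 2 else None
    return IpcMessage(fields[0], fields[1], node_id, fields[3:])


def select_nodes(pattern: str, nodelist: list) -> list:
    """Return the nodes whose names match a shell-style glob pattern."""
    return [node for node in nodelist if fnmatchcase(node, pattern)]


if __name__ == "__main__":
    print(parse_ipc_message("EMP:POWER_OFF:sanbox"))
    print(parse_ipc_message("NEXXUS:MODULES"))
    print(select_nodes("box*", ["box1", "box2", "sanbox"]))
```

With a nodelist of box1, box2 and sanbox, the pattern box* selects box1 and box2 only, because the pattern must match the whole node name.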
When a message is sent to Nexxus, a job id is assigned to the task associated with the message and sent back to the originator of the message. Because operations performed by VACM for a node may run concurrently, the originator needs the job id to tell which return messages belong to which operation it has issued. A job id is an unsigned 32-bit integer value, with the value 0 reserved for return messages that are not associated with a requested operation. These special return messages are known as unsolicited messages. Here is an example IPC transaction using vash:
[root@lysithea nexxus]# vash  (We execute vash from the 
                               command line)
vash$ connect lc              (Instruct vash to 
                               connect to Nexxus)
lc login: blum                (Enter in username 
                               for Nexxus auth)
Password: ****                (Enter in password
                               for user)
NEXXUS_READY                  (Nexxus informs us it is ready)
vash$ ipc lc nexxus:node_list (Get a list of nodes)
NEXXUS:2:JOB_STARTED          (Job started and job id 
                               notification)
NEXXUS:2:NODELIST:box1        (List of nodes)
NEXXUS:2:NODELIST:box2
NEXXUS:2:JOB_COMPLETED        (Job completion notification)
The vash client is described in detail in 'Using VACM with vash'.
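A client that issues several requests at once has to sort return messages by job id. The following Python sketch shows one way to do that. It assumes the MODULE:JOB_ID:STATUS:... layout seen in the transcript above, which is inferred from that example rather than taken from a formal specification, and the function names are hypothetical.

```python
from collections import defaultdict

UNSOLICITED_JOB_ID = 0   # job id 0 is reserved for unsolicited messages
MAX_JOB_ID = 2**32 - 1   # job ids are unsigned 32-bit integers


def group_by_job(lines):
    """Group return messages by job id.

    Returns {job_id: [(module, [field, ...]), ...]}, keeping the
    messages for each job in arrival order.
    """
    jobs = defaultdict(list)
    for line in lines:
        # Assumed layout: MODULE:JOB_ID:REST, where REST holds the
        # status or data fields, again separated by colons.
        module, job_id_text, rest = line.split(":", 2)
        job_id = int(job_id_text)
        if not 0 <= job_id <= MAX_JOB_ID:
            raise ValueError(f"job id out of range: {job_id}")
        jobs[job_id].append((module, rest.split(":")))
    return dict(jobs)


def nodes_for_job(jobs, job_id):
    """Collect the node names carried by NODELIST messages of one job."""
    return [fields[1] for _, fields in jobs.get(job_id, [])
            if fields[0] == "NODELIST" and len(fields) > 1]


if __name__ == "__main__":
    replies = [
        "NEXXUS:2:JOB_STARTED",
        "NEXXUS:2:NODELIST:box1",
        "NEXXUS:2:NODELIST:box2",
        "NEXXUS:2:JOB_COMPLETED",
    ]
    jobs = group_by_job(replies)
    print(nodes_for_job(jobs, 2))
```

Messages that arrive with job id 0 end up under UNSOLICITED_JOB_ID, so a client can handle them separately from the replies to its own requests.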

Node Global Variables

Sometimes a piece of information about a node pertains to all modules. An IP address, for example, is in most cases global for a node, and many modules that communicate with a node over the network need to know it. For this reason, a node can have "global variables" associated with it. These variables can be read or written by the user, and they are sent to all modules when the modules are loaded or when a variable is modified. Setting and getting global variables is discussed in 'Getting and Setting Node Global Variables' in the 'Nexxus Loopback Module' section.
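The distribution behaviour described above (every module receives the variables when it is loaded, and again whenever one changes) can be pictured with a small Python sketch. This is only a conceptual model, not how Nexxus is implemented: the class GlobalVariableStore, its methods and the variable name IP_ADDRESS are all hypothetical.

```python
class GlobalVariableStore:
    """Conceptual model of per-node global variables (hypothetical)."""

    def __init__(self):
        self._vars = {}      # node_id -> {variable name: value}
        self._modules = []   # callbacks of the form fn(node_id, name, value)

    def load_module(self, on_variable):
        """Register a module and send it every variable known so far."""
        self._modules.append(on_variable)
        for node_id, variables in self._vars.items():
            for name, value in variables.items():
                on_variable(node_id, name, value)

    def set(self, node_id, name, value):
        """Write a variable and notify every loaded module of the change."""
        self._vars.setdefault(node_id, {})[name] = value
        for on_variable in self._modules:
            on_variable(node_id, name, value)

    def get(self, node_id, name):
        """Read a variable; raises KeyError if it was never set."""
        return self._vars[node_id][name]


if __name__ == "__main__":
    store = GlobalVariableStore()
    store.set("box1", "IP_ADDRESS", "10.0.0.5")   # set before any module loads
    store.load_module(lambda node, name, value: print("module saw", node, name, value))
    store.set("box1", "IP_ADDRESS", "10.0.0.6")   # loaded modules see the update
```

The point of the model is that a module never has to ask for a node's IP address: it is told the value once at load time and again on every modification.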