1. Notification on Interconnect Status Changes

The Network Manager provides a mechanism to trigger actions when the state of the interconnect changes. The action to be triggered is a user-definable script or executable that is run by the Network Manager when the interconnect status changes.

1.1. Interconnect Status

The interconnect can be in any of these externally visible states:

UP

All Cluster Nodes and interconnect links are functional.

REDUCED

All Cluster Nodes and interconnect links that can be reached by the Interconnect Manager are functional, but on some Cluster Nodes the node manager is not responding (for example, because the Ethernet connection is missing or the Dolphin Software has not been installed on the Cluster Node).

DEGRADED

All Cluster Nodes are up, but one or more interconnect links have been disabled. Links can be disabled either manually via dis_admin or by the Network Manager in response to problems reported by the node managers. In status DEGRADED, all Cluster Nodes can still communicate via PCI Express, but the overall performance of the interconnect may be reduced.

FAILED

One or more Cluster Nodes are down (the node manager is not reachable via Ethernet), and/or so many links have been disabled that one or more Cluster Nodes are isolated from the interconnect. These Cluster Nodes cannot communicate via PCI Express, but SuperSockets, for example, will fall back to communicating via Ethernet if it is available.

UNSTABLE

UNSTABLE is a state which is only visible externally. If the interconnect is changing states frequently (e.g. because Cluster Nodes are rebooted one after the other), the interconnect will enter the state UNSTABLE. After a certain period of less frequent internal status changes (which are continuously recorded by the Network Manager), the external state will again be set to UP, REDUCED, DEGRADED or FAILED. During the first 60 seconds of operation, the Network Manager does not consider the unstable state.

The user can set the option "-unstableinterval <interval in minutes>" in networkmanager.conf. If the cluster changes state more than 5 times within <interval in minutes>, the state is set to UNSTABLE. If <interval in minutes> is set to 0, this state is never set. The UNSTABLE state is left again when this condition no longer applies.
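
The rule above can be illustrated with a small shell sketch. It is not the Network Manager's implementation; the function name is_unstable and the use of epoch timestamps are assumptions made for this example only.

```shell
#!/bin/sh
# Illustration only: NOT the Network Manager's actual implementation.
# is_unstable INTERVAL_MIN NOW TS...
#   INTERVAL_MIN  value of the unstable interval in minutes (0 disables UNSTABLE)
#   NOW           current time in seconds since the epoch
#   TS...         times of recorded state changes, in seconds since the epoch
# Returns 0 (true) if more than 5 state changes fall inside the interval.
is_unstable() {
    interval_min=$1
    now=$2
    shift 2

    # An interval of 0 means the UNSTABLE state is never set.
    [ "$interval_min" -eq 0 ] && return 1

    window_start=$((now - interval_min * 60))
    count=0
    for ts in "$@"; do
        # Count only changes that happened inside the interval.
        if [ "$ts" -gt "$window_start" ] && [ "$ts" -le "$now" ]; then
            count=$((count + 1))
        fi
    done

    [ "$count" -gt 5 ]
}

# Six state changes within the last 10 minutes.
if is_unstable 10 1000 500 600 700 800 900 1000; then
    echo "UNSTABLE"
else
    echo "stable"
fi
```

With exactly five changes in the interval, the function reports the state as stable, matching the "more than 5 times" wording.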

While in status UNSTABLE, the Network Manager will enable verbose logging (to /var/log/dis_networkmgr.log) to make sure that no internal events are lost.

1.2. Notification Interface

When the Network Manager invokes the specified script or executable, it hands over a number of parameters by setting environment variables. The content of these variables can be evaluated by the script or executable. The following variables are set:

DIS_FABRIC

The number of the fabric for which this notification is generated. Can be 0, 1 or 2.

DIS_STATE

The new state of the fabric. One of UP, REDUCED, DEGRADED, FAILED or UNSTABLE.

DIS_OLDSTATE

The previous state of the fabric. One of UP, REDUCED, DEGRADED, FAILED or UNSTABLE.

DIS_ALERT_TARGET

This variable contains the target address for the notification. This target address is provided by the user when the notification is enabled (see below), and the user needs to make sure that the content of this variable is useful for the chosen alert script. For example, if the alert script should send an email, the content of this variable needs to be an email address.

DIS_ALERT_VERSION

The version number of this interface (currently 1). It will be increased if incompatible changes to the interface need to be introduced, such as a change in the possible content of an existing environment variable or the removal of an environment variable. This is unlikely and would not necessarily make an alert script fail, but a script that depends on such details should verify the content of this variable.
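
As a minimal sketch of how a custom alert script might evaluate these variables, the following example checks the interface version and prints a one-line message. The function names and the message format are illustrative assumptions and are not taken from the alert script shipped by Dolphin.

```shell
#!/bin/sh
# Sketch of an alert script. Function names and message format are
# illustrative assumptions, not part of the Dolphin software.

SUPPORTED_VERSION=1

# Return 0 if the interface version set by the Network Manager is the
# one this script was written for.
check_version() {
    [ "${DIS_ALERT_VERSION:-}" = "$SUPPORTED_VERSION" ]
}

# Build a one-line message from the variables set by the Network Manager.
# Unset variables are shown as "?".
build_message() {
    printf 'Fabric %s changed state: %s -> %s\n' \
        "${DIS_FABRIC:-?}" "${DIS_OLDSTATE:-?}" "${DIS_STATE:-?}"
}

# Print the alert target and the message to standard output, or fail if
# the interface version is not the expected one.
main() {
    if ! check_version; then
        echo "unsupported DIS_ALERT_VERSION: ${DIS_ALERT_VERSION:-unset}" >&2
        return 1
    fi
    echo "To: ${DIS_ALERT_TARGET:-unset}"
    build_message
}

main || echo "alert not delivered" >&2
```

A real script would hand the message to whatever delivery mechanism matches the alert target (for an email address, a mail client) instead of printing it.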

1.3. Setting Up and Controlling Notification

1.3.1. Configure Notification via dis_netconfig

Notification on interconnect status changes is configured via dis_netconfig. In the Cluster Edit dialog, tick the check box above Alert target as shown in the screenshot below.

Then enter the alert target and choose the alert script by pressing the button and selecting the script in the file dialog. Dolphin provides an alert script /opt/DIS/etc/dis/alert.sh (for the default installation path) which sends out an email to the specified alert target. Any other executable can be specified here. Note that this script is executed in the context of the user running the Network Manager (typically root), so path settings and permissions should be managed accordingly.

For the changes made in this dialog to take effect, you need to save the configuration files (to /etc/dis on the Cluster Management Node) and then restart the Network Manager:

# service dis_networkmgr restart

1.3.2. Configure Notification Manually

If dis_netconfig cannot be used, notification can also be configured by editing /etc/dis/networkmanager.conf. Notification is controlled by two options in this file:

-alert_script <file>

This parameter specifies the alert script <file> to be executed.

-alert_target <target>

This parameter specifies the alert target <target> which is passed to the chosen alert script.

To disable notification, these lines can be commented out (prefix them with a #).
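
As an illustration, the relevant lines in /etc/dis/networkmanager.conf might look as follows. The script path is the default installation path of the alert script provided by Dolphin; the email address is a placeholder, and the layout with one option per line is assumed from the remark about commenting out lines.

```
-alert_script /opt/DIS/etc/dis/alert.sh
-alert_target admin@example.com
```

With notification disabled, the same lines would read:

```
#-alert_script /opt/DIS/etc/dis/alert.sh
#-alert_target admin@example.com
```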

After the file has been edited, the Network Manager needs to be restarted to make the changes effective:

# service dis_networkmgr restart

1.3.3. Verifying Notification

To verify that notification is actually working, you should provoke an interconnect status change manually. This can easily be done from dis_diag by disabling any link via the Node Settings dialog of any Cluster Node.
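
Independently of a real status change, the alert script itself can be exercised by hand by setting the environment variables of the notification interface and running it. The following is a sketch under that assumption; the helper name run_alert_test and the chosen test values are hypothetical.

```shell
#!/bin/sh
# run_alert_test SCRIPT TARGET
# Run SCRIPT once with the environment variables the Network Manager
# would set for a change from UP to DEGRADED on fabric 0.
# The helper name and the values are test data chosen for this example.
run_alert_test() {
    script=$1
    target=$2
    # Variables placed before the command are exported to that command only.
    DIS_FABRIC=0 \
    DIS_OLDSTATE=UP \
    DIS_STATE=DEGRADED \
    DIS_ALERT_TARGET="$target" \
    DIS_ALERT_VERSION=1 \
    "$script"
}

# Example call (commented out so that sourcing this file sends nothing):
# run_alert_test /opt/DIS/etc/dis/alert.sh admin@example.com
```

This only checks the script, not the Network Manager configuration, so the manual link test described above is still needed. Running the helper as the same user as the Network Manager also exercises the path settings and permissions the script will see in production.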

1.3.4. Disabling and Enabling Notification Temporarily

Once the notification has been configured, it can be controlled via dis_diag. This is useful if the alerts should be stopped for some time. To disable alerts, open the Cluster Settings dialog and switch the setting next to Alert script as needed.

This is a per-session setting and will be lost if the Network Manager is restarted.

Warning

Make sure that alerts are enabled again before you quit dis_diag. Otherwise, no notifications will be sent for interconnect status changes until the Network Manager is restarted.