Continuous operation with automatic failover

Automatic switchover to a backup engine and event processing server when the active master domain manager becomes unavailable.

Ensure continuous operation with the automatic failover and high availability features of HCL Workload Automation. Configure one or more backup engines so that, when a backup detects that the active master has become unavailable, it triggers a long-term switchmgr operation to itself. You can optionally define the potential backups in an ordered list, placing the preferred backups at the top. The backup engines monitor the master domain manager to detect anomalous behavior. If one or more of the following conditions persist for more than 5 minutes, an automatic long-term switchmgr operation is triggered:
  • WebSphere Application Server Liberty Base is down. HCL Workload Automation monitors WebSphere Application Server Liberty Base and tries to restart it if it is down.
  • The fault-tolerant agent of the master domain manager is monitored to check the status of processes such as Batchman, Mailman, and Jobman. If one or more of these processes are down, the master domain manager makes three attempts to restart them. If all three attempts fail, automatic failover is triggered.
  • The engine can no longer contact the database, for example, due to a network outage.
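The trigger rule above can be modeled as "any monitored condition that has lasted longer than 5 minutes". The following is a minimal Python sketch of that rule only, not the product's implementation. `MasterHealth`, `should_trigger_failover`, and `fta_restart_exhausted` are hypothetical names. The sketch also assumes that the Batchman/Mailman/Jobman condition starts counting only after the three restart attempts have failed; the text does not state exactly how the restart attempts and the 5-minute threshold interact.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# From the description: a condition must persist for MORE than 5 minutes.
PERSISTENCE_THRESHOLD = timedelta(minutes=5)
# From the description: three restart attempts for Batchman, Mailman, Jobman.
MAX_RESTART_ATTEMPTS = 3


@dataclass
class MasterHealth:
    """Hypothetical snapshot of what a backup engine observes on the master.

    Each field holds the time at which the condition was first observed,
    or None if the condition is not currently present.
    """
    liberty_down_since: Optional[datetime] = None
    # Assumption: set only once all FTA process restart attempts have failed.
    fta_processes_failed_since: Optional[datetime] = None
    db_unreachable_since: Optional[datetime] = None


def fta_restart_exhausted(failed_attempts: int) -> bool:
    """True once the master has used up its restart attempts."""
    return failed_attempts >= MAX_RESTART_ATTEMPTS


def should_trigger_failover(health: MasterHealth, now: datetime) -> bool:
    """True if any monitored condition has persisted past the threshold."""
    for since in (health.liberty_down_since,
                  health.fta_processes_failed_since,
                  health.db_unreachable_since):
        if since is not None and now - since > PERSISTENCE_THRESHOLD:
            return True
    return False
```

For example, if the database became unreachable at 12:00, `should_trigger_failover` returns False at exactly 12:05 and True at 12:05:01, reflecting the "more than 5 minutes" wording.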
If you have defined a list of potential backups and, after 5 minutes, the switch cannot be performed with the first backup in the list because that backup is unavailable, the remaining backups are tried in the order specified in the list until an available backup is found to perform the switch. In this case, 5 minutes pass between each attempt.
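The ordered walk through the backup list, and the time it can take, can be sketched as follows. This is a hypothetical Python model of the described behavior, not the product's code. `choose_backup` and the workstation names `BKM1`, `BKM2`, and `BKM3` are illustrative assumptions, and returning `None` when no backup is available is a choice made for this sketch only; the text does not describe that case.

```python
from datetime import timedelta
from typing import Callable, Optional, Sequence, Tuple

# From the description: the first attempt happens after 5 minutes, and
# 5 minutes pass between each subsequent attempt.
ATTEMPT_INTERVAL = timedelta(minutes=5)


def choose_backup(
    backups: Sequence[str],
    is_available: Callable[[str], bool],
) -> Tuple[Optional[str], timedelta]:
    """Return the first available backup in list order, plus the delay.

    The delay is measured from the moment the master became unavailable
    to the attempt that succeeded. If no backup is available, this sketch
    returns (None, delay of the last attempt).
    """
    delay = timedelta(0)
    for name in backups:
        delay += ATTEMPT_INTERVAL  # wait 5 minutes before each attempt
        if is_available(name):
            return name, delay
    return None, delay
```

With the list `["BKM1", "BKM2", "BKM3"]` and only `BKM3` available, `choose_backup` returns `("BKM3", timedelta(minutes=15))`, which shows why placing the most reliable backups at the top of the list shortens the time to switch.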

Similarly, if a backup event processor detects that the event processor (which can be different from the master domain manager) is no longer available, the backup triggers a long-term switchevtproc operation to itself. You can configure a subset of workstations to act as backups for the event processor. This list is separate from the list of potential master domain manager backups because you might have a workstation that can serve as an event processor backup but that you do not want to act as a potential master domain manager backup.
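The independence of the two backup lists can be illustrated with a small model. This Python sketch is hypothetical: `FailoverConfig`, its field names, and the workstation names are assumptions made for illustration and do not correspond to actual HCL Workload Automation configuration keys.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FailoverConfig:
    """Hypothetical model of the two independent, ordered backup lists."""
    master_backups: List[str] = field(default_factory=list)
    event_processor_backups: List[str] = field(default_factory=list)

    def can_take_over_master(self, workstation: str) -> bool:
        # Only workstations in the master list may trigger switchmgr.
        return workstation in self.master_backups

    def can_take_over_event_processor(self, workstation: str) -> bool:
        # Only workstations in the event processor list may trigger
        # switchevtproc; membership here implies nothing about the master.
        return workstation in self.event_processor_backups
```

For example, with `FailoverConfig(master_backups=["BKM1"], event_processor_backups=["BKM1", "EVT1"])`, the workstation `EVT1` can take over the event processor but is never considered as a master domain manager backup.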

When you perform a fresh installation of the product, these features are enabled by default. When you upgrade from a back-level version, the features are disabled, but you can enable and configure them. Note that the automatic failover feature requires that the master domain manager and the backup master domain managers be installed by the same user.

For information about configuring automatic failover, see Automatic failover.

To achieve a high availability configuration, configure a load balancer in front of the master domain manager and the backup master domain managers. Users are then unaware of when a switch occurs, and administrators configure a single engine connection in single sign-on that points to the name or IP address and port number of the load balancer, so they never need to know the hostname of the currently active master. In a highly available configuration, where the master and backup masters are behind a load balancer, workload requests are dispatched across all configured nodes, which prevents any single node from being overloaded and avoids a single point of failure. To see a sample of an end-to-end high availability configuration, see An active-active high availability scenario.