Network link problems

HCL Workload Automation has a high degree of fault tolerance in the event of a communications problem. Each fault-tolerant agent has its own copy of the Symphony file, containing the production period's processing. When link failures occur, the agents continue processing using their own copies of the Symphony file. Any inter-workstation dependencies, however, must be resolved locally using appropriate console manager (conman) commands: deldep and release, for example.

While a link is down, any messages destined for a non-communicating workstation are stored by the sending workstation in the <TWA_home>/TWS/pobox directory, in files named <workstation>.msg. When the link is restored, the sending workstation begins sending its stored messages. If the links to a domain manager are down for an extended period of time, it might be necessary to switch to a backup domain manager (see IBM® Workload Scheduler: Administration Guide).
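
To see which workstations have message files in the pobox directory, the files can be listed with a short script. The following Python sketch is illustrative only and is not part of the product: the function name pending_message_files is hypothetical, and the directory you pass must be your own expansion of <TWA_home>/TWS/pobox. It reports only file names and sizes and does not interpret the file contents.

```python
import sys
from pathlib import Path


def pending_message_files(pobox_dir):
    """Return (workstation, size_in_bytes) pairs for every <workstation>.msg
    file in pobox_dir, largest file first (ties ordered by name).

    Hypothetical helper: it looks only at file names and sizes.
    """
    pobox = Path(pobox_dir)
    entries = [
        (path.stem, path.stat().st_size)   # stem drops the ".msg" suffix
        for path in pobox.glob("*.msg")
        if path.is_file()
    ]
    return sorted(entries, key=lambda entry: (-entry[1], entry[0]))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python list_pobox.py <path to the pobox directory>
    for workstation, size in pending_message_files(sys.argv[1]):
        print(f"{workstation}\t{size} bytes")
```

A file that keeps growing while a link is down suggests that messages for that workstation are accumulating. Treat the sizes as a rough indicator only.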

Note:
  1. The conman submit job and submit schedule commands can be issued on an agent that cannot communicate with its domain manager, provided that the agent is configured for, and can establish, a direct HTTP connection to the master domain manager. Configure this connection using the conman connection options in the localopts file, or the corresponding options in the user's useropts file (see the IBM® Workload Scheduler: Administration Guide for details).

    However, all events have to pass through the domain manager, so although jobs and job streams can be submitted, their progress can be monitored only locally, not at the master domain manager. It is therefore important to correct the link problem as soon as possible.

  2. If the link to a standard agent workstation is lost, there is no temporary recovery option available, because standard agents are hosted by their domain managers. In networks with a large number of standard agents, you can choose to switch to a backup domain manager.