Fault recovery in a cluster

Fault recovery is the ability of an HCL Domino® server to clean up and restart itself after a failure. Fault recovery works well in a Domino® cluster. If there is no Domino® server to fail over to, fault recovery restores access to data as soon as the failed server restarts. Even if users fail over to another cluster server, fault recovery increases availability because the failed server becomes available again. In addition, depending on the workload balancing parameters you've set, some users will fail back to the original server when they open new databases.

About this task

If you are using an operating system cluster in conjunction with a Domino® cluster, whether to use fault recovery depends on how you configured the operating system cluster. If you configured the operating system cluster to fail over on a hardware failure only, fault recovery works well. Fault recovery restarts Domino® on its current server, and no operating system failover occurs.

If you configured your operating system cluster to fail over on both hardware and software failures, you don't need fault recovery because the operating system cluster will restart Domino® on another server in the cluster. In fact, you should disable fault recovery so you won't have Domino® restarting itself while the operating system cluster is also restarting it.
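The decision described above can be summarized as a small sketch. This is only an illustration of the rule in this topic, not a Domino® API: the `OsClusterFailover` enumeration and the `should_enable_fault_recovery` function are hypothetical names, and the case with no operating system cluster is an assumption based on the statement that fault recovery works well in a Domino® cluster.

```python
from enum import Enum


class OsClusterFailover(Enum):
    """Hypothetical labels for how the operating system cluster is configured."""
    NONE = "no operating system cluster"
    HARDWARE_ONLY = "fails over on hardware failure only"
    HARDWARE_AND_SOFTWARE = "fails over on hardware and software failures"


def should_enable_fault_recovery(mode: OsClusterFailover) -> bool:
    """Return True when Domino fault recovery is appropriate for this setup.

    If the operating system cluster already restarts Domino after a software
    failure, fault recovery should be disabled so that the two mechanisms do
    not both try to restart the server.
    """
    return mode is not OsClusterFailover.HARDWARE_AND_SOFTWARE


if __name__ == "__main__":
    for mode in OsClusterFailover:
        answer = "enable" if should_enable_fault_recovery(mode) else "disable"
        print(f"{mode.value}: {answer} fault recovery")
```

The function returns a recommendation only; you still enable or disable the setting yourself in the Server document.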

By default, fault recovery is disabled. You enable it in the server document.

Procedure

  1. From the Domino® Administrator or the Web Administrator, click the Configuration tab.
  2. In the Task pane, expand Server, and click All Server Documents.
  3. In the Results pane, select the Server document you want, click Edit Server, and then click the Basics tab.
  4. In the Fault Recovery section, click Enabled in the Automatically Restart Server After Fault/Crash field.
  5. Optional: Complete any of the following fields that you want.
    • In the Run This Script After Server Fault/Crash field, enter the name of a cleanup script.
      Note: Do not try to activate NSD from this field. Instead, activate NSD in the Run NSD To Collect Diagnostic Information field.
    • In the Run NSD To Collect Diagnostic Information field, choose Enabled to activate NSD when there is a fault or crash.
    • In the Cleanup Script / NSD Maximum Execution Time field, enter the maximum time, in seconds, that a cleanup script or NSD can run before being terminated. The maximum time you can specify is 1,800 seconds.
    • In the Maximum Fault Limits field, enter the maximum number of restarts allowed during the specified period. If the number of restarts exceeds the limit, the server does not restart automatically again.
    • In the Mail Fault Notification to field, enter the names of people or groups to receive an e-mail notification message each time the server restarts.
  6. Make any other changes you want to the Server document, and then click Save & Close.
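To make the Maximum Fault Limits and maximum execution time settings easier to reason about, the following sketch models them in Python. It is a conceptual illustration under stated assumptions, not how Domino® implements fault recovery: the `RestartLimiter` class, the `validate_execution_time` function, and the sliding-window interpretation of "the specified period" are all hypothetical. Only the 1,800-second ceiling comes from this topic.

```python
from collections import deque

# Upper bound for the Cleanup Script / NSD Maximum Execution Time field,
# as stated in the procedure.
MAX_EXECUTION_SECONDS = 1800


def validate_execution_time(seconds: int) -> int:
    """Reject values outside the range the field accepts (assumed to be 1..1800)."""
    if not 1 <= seconds <= MAX_EXECUTION_SECONDS:
        raise ValueError(
            f"execution time must be between 1 and {MAX_EXECUTION_SECONDS} seconds"
        )
    return seconds


class RestartLimiter:
    """Allow at most max_faults restarts within period_seconds.

    Assumption: the period is treated as a sliding window that ends at the
    current fault. The real server may count restarts differently.
    """

    def __init__(self, max_faults: int, period_seconds: float) -> None:
        self.max_faults = max_faults
        self.period_seconds = period_seconds
        self._restarts = deque()  # timestamps of restarts still inside the window

    def allow_restart(self, now: float) -> bool:
        # Forget restarts that have aged out of the window.
        while self._restarts and now - self._restarts[0] >= self.period_seconds:
            self._restarts.popleft()
        # Limit reached: the server stays down.
        if len(self._restarts) >= self.max_faults:
            return False
        self._restarts.append(now)
        return True


if __name__ == "__main__":
    limiter = RestartLimiter(max_faults=3, period_seconds=300)
    for t in (0, 60, 120, 180, 300):
        print(t, limiter.allow_restart(t))
```

In this model, a fourth fault inside the same five-minute window is refused, while a fault that arrives after the oldest restart has aged out of the window is allowed again.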