Specifying recovery criteria

You specify the recovery criteria as part of the JCL for the operation, in the form of special control statements. For jobs, insert the recovery instructions between the JOB statement and the first execution step (after the JOBLIB or JOBCAT statement and any in-stream procedures). HCL Workload Automation for Z ignores recovery instructions within in-stream procedures. Therefore, for started tasks, where the whole JCL is an in-stream procedure, place any recovery statements before the PROC statement.

Using these statements, you specify the type of error for which the recovery is attempted and how the recovery is achieved. If the error type is not one that you have covered in your specification, the failed operation remains in the Operations Ended-in-Error list.

HCL Workload Automation for Z retrieves the JCL for automatic recovery from the JCL repository (JS) data set. This means that automatic recovery can take place only for jobs or started tasks submitted by HCL Workload Automation for Z.

The automatic recovery function takes over when a job or started task ends in error. At that time, this information is available:
  • The error code for the operation. This can be:
    • The abend code of an abending step
    • The return code of the last step
    • An error code set by HCL Workload Automation for Z, such as JCLI, CCUN, JCL, CLNO, CLNA, CLNC, CAN, PCAN, CLNP, OFxx, or OSxx
    • An error code set by the job completion checker
      Note: Automatic recovery is not applicable for error codes, such as OSUP, that refer to jobs that have not reached the job queue.
  • The name of the abending step, if the error is associated with a step.
  • Step completion codes and step names for all steps executed. The step completion code is either an abend code or a return code.

If the error occurs in the initialization phase or in the completion phase of the job or started task, no step information is available. Statements that specify recovery actions for certain steps are not applicable to such errors.

HCL Workload Automation for Z begins the automatic recovery process by scanning the job for the first //*%OPC RECOVER statement where:
  • The step name matches the name of the failing step from the operating system.
  • The error code matches the error code from the job and started-task tracking function.
  • The return code matches the step return codes or abend codes from the job and started-task tracking function.
  • The RECOVER statement is unconditional (it specifies no step name, error code, return code, or abend code).

This means you should place the RECOVER statements with the most restrictive matching conditions before the RECOVER statements that deal with more general cases.

For example, assume there are three recovery procedures for a job. R1 is set up to handle errors of type E. R2 is set up to handle errors of type T, which includes error type E. R3 is a general recovery procedure for all errors in the job. The RECOVER statements should be placed in this order:
//*%OPC RECOVER   if error E      - actions R1
//*%OPC RECOVER   if error type T - actions R2
//*%OPC RECOVER   unconditionally - actions R3

This ensures that each error is handled by the RECOVER statement that is best designed to handle it.
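The effect of statement order can be illustrated with a small Python model of the first-match scan. This is only a sketch of the rule described above, not the product's implementation: the RecoverStatement class, the first_match function, and the abend codes chosen to stand for error types E and T are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class RecoverStatement:
    """Hypothetical stand-in for one RECOVER statement (not real syntax)."""
    actions: str
    step_names: Optional[FrozenSet[str]] = None   # None: applies to any step
    error_codes: Optional[FrozenSet[str]] = None  # None: applies to any code

    def matches(self, failing_step: Optional[str], error_code: str) -> bool:
        # A criterion that is not specified never prevents a match, so a
        # statement with no criteria at all is unconditional.
        step_ok = self.step_names is None or failing_step in self.step_names
        code_ok = self.error_codes is None or error_code in self.error_codes
        return step_ok and code_ok


def first_match(statements: List[RecoverStatement],
                failing_step: Optional[str],
                error_code: str) -> Optional[RecoverStatement]:
    """Return the first statement, in JCL order, that matches the error."""
    for statement in statements:
        if statement.matches(failing_step, error_code):
            return statement
    return None


# Illustrative codes only: error type T includes error type E.
TYPE_E = frozenset({"S0C7"})
TYPE_T = frozenset({"S0C7", "S0C4"})

r1 = RecoverStatement("R1", error_codes=TYPE_E)
r2 = RecoverStatement("R2", error_codes=TYPE_T)
r3 = RecoverStatement("R3")  # unconditional

ordered = [r1, r2, r3]      # most restrictive first
misordered = [r3, r2, r1]   # general case first

print(first_match(ordered, "STEP1", "S0C7").actions)     # R1
print(first_match(ordered, "STEP1", "S0C4").actions)     # R2
print(first_match(ordered, "STEP1", "SB37").actions)     # R3
print(first_match(misordered, "STEP1", "S0C7").actions)  # R3
```

In the misordered list the unconditional statement is reached first, so R1 and R2 are never selected. That is the situation the recommended ordering avoids.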

When a matching statement is found, the recovery actions are controlled by the parameters on that statement. The RECOVER statement can specify these actions:
  • Restart the current occurrence at the failed operation, with or without JCL changes.
  • Restart the current occurrence at another operation.
  • Add occurrences of special recovery applications, and make the restart of the failed occurrence dependent on the completion of the recovery occurrences. This action lets you, for example, perform a data set recovery before restarting your main application. See Adding predecessor recovery occurrences to the current plan.
  • Release a dependent occurrence.
  • Restart the current occurrence at the failed step or at another step, with any of these JCL modifications:
    • Delete steps
    • Add recovery steps
    • Change JCL statements in a program exit module
  • Remain in error status.

The rule that controls how HCL Workload Automation for Z selects the failed step is described in Deciding which step of an operation has failed. For example, the error selection criterion "if error E" might match more than one failed step. In this case, HCL Workload Automation for Z selects the first step in the job that meets the criteria specified in the RECOVER statement. If this is not the step you intend, you must change your RECOVER statements so that the correct step is chosen.
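The following Python sketch models only the sentence above: when several executed steps meet the criteria, the first one in job order is selected. It does not reproduce the full rule described in Deciding which step of an operation has failed. The function name select_failed_step and the representation of step completion codes as strings are assumptions made for illustration.

```python
from typing import Iterable, List, Optional, Tuple


def select_failed_step(executed_steps: List[Tuple[str, str]],
                       wanted_codes: Iterable[str]) -> Optional[str]:
    """Return the name of the first executed step whose completion code
    is one of wanted_codes, or None if no step qualifies."""
    wanted = set(wanted_codes)
    for step_name, completion_code in executed_steps:
        if completion_code in wanted:
            return step_name
    return None


# Hypothetical job: two steps ended with the same return code.
steps = [("STEP1", "0000"), ("STEP2", "0008"), ("STEP3", "0008")]

print(select_failed_step(steps, {"0008"}))  # STEP2
```

If STEP3 is the step that should drive recovery in a case like this, the criteria need to be made more specific, for example by also restricting the step name.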

You can disable and enable the recovery function from the SERVICE FUNCTIONS panel. Also, you can set defaults so that the recovery function does not react to certain errors or reacts only at certain times unless specific requests are made.

When a RECOVER statement is matched against an error condition, the statement is changed to a JCL comment. If the job is rerun, it no longer functions as a RECOVER statement. Other RECOVER statements in the JCL remain active.
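The consequence for reruns can be modeled with a short Python sketch. It is an illustration under stated assumptions, not the product's logic: each statement is a plain dictionary, the "active" flag stands in for the change of the statement to a JCL comment, and the names and error codes are hypothetical.

```python
from typing import Dict, List, Optional


def recover_once(statements: List[Dict], error_code: str) -> Optional[str]:
    """Find the first active statement that matches error_code, mark it
    as used (modeling its change to a comment), and return its name.
    Return None when no active statement matches."""
    for statement in statements:
        codes = statement["codes"]  # None models an unconditional statement
        if statement["active"] and (codes is None or error_code in codes):
            statement["active"] = False
            return statement["name"]
    return None


jcl_statements = [
    {"name": "R1", "codes": {"S0C7"}, "active": True},
    {"name": "R3", "codes": None, "active": True},
]

first = recover_once(jcl_statements, "S0C7")   # R1 matches and is used up
second = recover_once(jcl_statements, "S0C7")  # rerun fails again: R3 matches
third = recover_once(jcl_statements, "S0C7")   # nothing left to match

print(first, second, third)  # R1 R3 None
```

In this model the third failure finds no active statement, which corresponds to the case where the failed operation remains in the Operations Ended-in-Error list.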