What happens if jobs or started tasks fail?

If you are automating a system, remember the 1% or 0.1% of cases when something goes wrong. Nobody likes a telephone call at 3 a.m., even if there is a terminal next to the bed. Ask your on-call experts what they do when they have to look at failed jobs. Then get HCL Workload Automation for Z to do it for them.

Here are the payroll analyst's notes on PAYDAILY:
  1. PAY04 transfers the hours-worked data from the CICSA key-sequenced data set to a sequential data set. If this program fails, the job can be rerun.
  2. PAY06 validates the hours-worked data and updates the payroll database if the data is valid. If there are validation errors (return code 4), payroll inspects and corrects the data, and the job is restarted from PAY04. If there are other errors, PAY04 can be rerun after running PAYRECOV.

This is how to change the PAYDAILY JCL so that HCL Workload Automation for Z handles these failures automatically:

//PAYDAILY JOB  (890122,NOBO),'SAMPLE'
//*
//*       PAYMORE PAYROLL SAMPLE
//*       THIS JOB RUNS PAY04 AND PAY06
//*%OPC RECOVER ERRSTEP=PAY04                       1
//*%OPC RECOVER ERRSTEP=PAY06,STEPCODE=4,RESTART=N  2
//*%OPC RECOVER ERRSTEP=PAY06,ADDAPPL=PAYRECOV      3
//PAY04    EXEC PGM=PAY04
//STEPLIB  DD DSN=XRAYNER.OPC.LOADLIB,DISP=SHR
//PAYIN    DD DSN=XRAYNER.CICS.PAYDB,DISP=SHR
//PAYOUT   DD DSN=XRAYNER.DAY.TRANS,DISP=SHR
//SYSPRINT DD SYSOUT=*
//PAY06    EXEC PGM=PAY06,COND=(0,LT)
//STEPLIB  DD DSN=XRAYNER.OPC.LOADLIB,DISP=SHR
//PAYIN    DD DSN=XRAYNER.DAY.TRANS,DISP=SHR
//PAYOUT   DD DSN=XRAYNER.PAYROLL.DATABASE,DISP=SHR
//SYSPRINT DD SYSOUT=*

This is an explanation of the marked lines:

1 tells HCL Workload Automation for Z what to do if PAY04 fails for any reason. Because this RECOVER statement specifies no action, the default applies: HCL Workload Automation for Z reruns the job. Before rerunning the job, it turns this RECOVER statement into an HCL Workload Automation for Z comment and saves the JCL in the job repository, so that PAY04 is not rerun in an endless loop if it keeps failing.

2 tells HCL Workload Automation for Z what to do if step PAY06 fails with return code 4. RESTART=N means leave the operation on the error list for the analysts to look at.

3 is the action for any other failure in PAY06. HCL Workload Automation for Z will add the PAYRECOV application to the plan, run it, and make the rerun of PAYDAILY dependent on PAYRECOV (PAYRECOV becomes the predecessor of PAYDAILY).
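
The way the three RECOVER statements divide the failure cases can be pictured with a small Python model. This is only an illustrative sketch, not product code: the `Recover` class, the `choose_action` function, and the `used` flag (which stands in for the statement being turned into a comment) are hypothetical names. The sketch also assumes that statements are checked in JCL order, that the first active match wins, and that a failure with no matching statement gets no automatic recovery. Return codes 12 and 8 are arbitrary example values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Recover:
    """Hypothetical stand-in for one //*%OPC RECOVER statement."""
    errstep: str
    stepcode: Optional[int] = None  # None: matches any failure in errstep
    restart: bool = True            # RESTART=N maps to False
    addappl: Optional[str] = None
    used: bool = False              # True models "turned into a comment"


def paydaily_statements() -> List[Recover]:
    """The three statements from the PAYDAILY JCL, in JCL order."""
    return [
        Recover("PAY04"),                             # 1
        Recover("PAY06", stepcode=4, restart=False),  # 2
        Recover("PAY06", addappl="PAYRECOV"),         # 3
    ]


def choose_action(statements: List[Recover],
                  failed_step: str, return_code: int) -> str:
    """Return the modeled recovery action for one failure."""
    # Assumption: statements are checked in order; first active match wins.
    for stmt in statements:
        if stmt.used or stmt.errstep != failed_step:
            continue
        if stmt.stepcode is not None and stmt.stepcode != return_code:
            continue
        if not stmt.restart:
            return "leave on error list"
        # Before a rerun, retire the statement so that a repeated
        # failure cannot trigger the same rerun again.
        stmt.used = True
        if stmt.addappl is not None:
            return f"run {stmt.addappl}, then rerun job"
        return "rerun job"
    return "no automatic recovery"


statements = paydaily_statements()
print(choose_action(statements, "PAY04", 12))  # first PAY04 failure
print(choose_action(statements, "PAY04", 12))  # PAY04 fails again
print(choose_action(paydaily_statements(), "PAY06", 4))
print(choose_action(paydaily_statements(), "PAY06", 8))
```

In this model the second PAY04 failure finds no active statement, which mirrors the loop protection described for statement 1. A PAY06 failure with return code 4 is caught by statement 2 before statement 3 is considered.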

See Automatic recovery of jobs and started tasks for a full description of automatic recovery.