Procedure to resolve event-reapplying problems at controller startup

When the controller starts and a current plan already exists, you must ensure that no JT events were lost while the controller was stopped. JTOPTS CURRPLAN is the statement used by the controller to determine from which current plan to start. You can set it to one of the following values:
  • Set CURRPLAN(NEW) to start from the latest valid new current plan (NCP) and apply all the events logged in EQQJTARC and in the current EQQJTxx.

    - OR -

  • Set CURRPLAN(CURRENT) to start from the inactive current plan and apply the events logged in EQQJTxx with a date and time later than the inactive current plan backup.
Note: For detailed information about the JTOPTS statement, see Customization and Tuning.

However, if you set the CURRPLAN(CURRENT) option and the controller finds some inconsistencies, the option is not applied and you are requested to set CURRPLAN(NEW). The restarting process then runs based on the checkpoint data set EQQCKPT, which contains information about the current status of HCL Workload Automation for Z.

If problems occur during the restart, you can understand what went wrong by analyzing the initial values stored in the current plan VSAM headers, checkpoint data set, and job-tracking files. This data, however, might have changed or been lost during the restarting process and be difficult to reproduce. To prevent this problem, HCL Workload Automation for Z logs the following messages and provides you with the following modify command:
  • Message EQQN140I, to show the checkpoint data set content at controller startup and stop.
  • Message EQQN141I, to show the checkpoint data set content when the restart processing begins.
  • Modify command /F subsys,DCKPT to show the checkpoint data set header.

The following scenario describes how these messages and command help you to understand what happened when the controller restarts.

Understanding the restarting process

The following scenario describes how you can use EQQMLOG and modify command /F subsys,DCKPT to understand what happens when the controller restarts. Suppose that you:
  1. Start the controller after a REFRESH function. EQQMLOG stores message EQQN140I to show the checkpoint data set header at startup. The value for PLAN EXISTING is N, meaning that a current plan does not exist (because the restart occurred after a REFRESH).
    Note: In this scenario JTOPTS is set to CURRPLAN(CURRENT).
    ...                                                                       
    03/20 17.16.22 EQQZ015I INIT STATEMENT:          CURRPLAN(CURRENT)        
    ....                                                                      
    03/20 17.16.22 EQQN140I CKPT FILE AT START TIME                           
    03/20 17.16.22 EQQN140I ---------------------------------------------     
    03/20 17.16.22 EQQN140I CKPT FILE JUST INITIALIZED: N                     
    03/20 17.16.22 EQQN140I BACKUP IN PROGRESS        : N                     
    03/20 17.16.22 EQQN140I INACTIVE PLAN DATA SET    : EQQCP2DS              
    03/20 17.16.22 EQQN140I CURRENT JT NUMBER         : 05                    
    03/20 17.16.22 EQQN140I LAST ARCHIVED JT NUMBER   : 04                    
    03/20 17.16.22 EQQN140I PLAN EXISTING             : N                     
    03/20 17.16.22 EQQN140I ACTIONS ON NCP            : NO ACTIONS NEEDED     
    03/20 17.16.22 EQQN140I ---------------------------------------------    
    Because a current plan does not exist, the following message EQQN141I does not provide information that is useful for the controller restarting decision.
    03/20 17.16.24 EQQN141I CKPT DATA AFTER RESTART DECISION                  
    03/20 17.16.24 EQQN141I -----------------------------------              
    03/20 17.16.24 EQQN141I INACTIVE PLAN DATA SET      : EQQCP2DS           
    03/20 17.16.24 EQQN141I OCP  LAST BACKUP TIMESTAMP  :                    
    03/20 17.16.24 EQQN141I USED LAST BACKUP TIMESTAMP  :                    
    03/20 17.16.24 EQQN141I EVENTS FOUND                : N                  
    03/20 17.16.24 EQQN141I JTARC MAY BE TRUNCATED      : N                  
    
    03/20 17.16.24 EQQN141I ----------------------------------
    Message EQQN105W provides you with even more detailed information:
    03/20 17.16.24 EQQN105W NO VALID CURRENT PLAN EXISTS. CURRENT PLAN VSAM  
                   I/O IS NOT POSSIBLE   
    Message EQQN012I informs you that, as soon as you create a plan, all job-tracking events will be stored in file EQQJT05 (this is the JT number 05 shown in the previous message EQQN140I):
    03/20 17.16.24 EQQN012I JOB TRACKING EVENTS ARE NOW BEING LOGGED ON FILE   
                   EQQJT05
    To look at the entire content of the checkpoint data set, enter the following modify command:
    /F subsys,DCKPT 
    
    
    
    EQQZ024I LAST CKPT HEADER GET/PUT                                       
    EQQZ024I     START ADDRESS: 0007FF68                TIME: 130320 1716488
    EQQZ024I     000000  F0F0F0F0 F0F0F0F0 F0F00000 00000000  *0000000000...*
    EQQZ024I     000010  00004040 40404040 40404040 C5D8D8C3  *..          E*
    EQQZ024I     000020  D7F1C4E2 C5D8D8C3 D7F2C4E2 F0F5F0F4  *P1DSEQQCP2DS0*
    EQQZ024I     000030  F0F5D500 00000000 00000000 D5D5D5C5  *05N.........N*
    EQQZ024I     000040  D8D8D1E2 F1C4E2D5 F1F0C8E6 E2F9F1F0  *QQJS1DSN10HWS*
    EQQZ024I     000050  F040D500 00000000 404040D6 00640000  *0 N.....   O.*
    EQQZ024I     000060  00000000 00000000                    *........     *
    EQQZ048I A MODIFY COMMAND HAS BEEN PROCESSED. MODIFY CWSG,DCKPT 
    Note: The layout is mapped by the OPC - OPCSTAT record.
  2. Create a new current plan by running a DP batch. EQQMLOG records that a new current plan (NCP) is created, named EQQCP2DS. The inactive plan is EQQCP1DS, the current job-tracking file is EQQJT01, and EQQJT05 was archived in EQQJTARC.
    03/20 17.29.27 EQQN121I A DAILY PLANNING ACTIVITY WAS STARTED             
    03/20 17.29.57 EQQN120I DP BATCH NCP CREATION PHASE ENDED                 
    03/20 17.29.57 EQQN116I A NEW NCP HAS BEEN CREATED                        
    ...                                                                       
    03/20 17.29.58 EQQN056I A CURRENT PLAN COPY PROCESS HAS STARTED           
    03/20 17.29.58 EQQN057I A CURRENT PLAN DATA SET WAS SUCCESSFULLY COPIED:  
                   FROMDD=EQQNCPDS, TODD=EQQCP1DS                             
    ...                                                                       
    03/20 17.29.59 EQQN051I A CURRENT PLAN BACKUP PROCESS HAS STARTED.        
                   TRIGGER WAS: DP END                                        
    ...                                                                       
    03/20 17.30.00 EQQN012I JOB TRACKING EVENTS ARE NOW BEING LOGGED ON FILE  
                   EQQJT01                                                    
    ...                                                                       
    03/20 17.30.04 EQQN056I A CURRENT PLAN COPY PROCESS HAS STARTED           
    03/20 17.30.05 EQQN057I A CURRENT PLAN DATA SET WAS SUCCESSFULLY COPIED:  
                   FROMDD=EQQCP1DS, TODD=EQQCP2DS                             
                                                                              
    03/20 17.30.05 EQQN090I THE JOB TRACKING LOG DATA SET DEFINED BY DDNAME   
                   EQQJT05 HAS BEEN COPIED TO THE JOB TRACKING LOG ARCHIVE    
                   DATA SET 
  3. Stop the controller by using the /P subsys command. EQQMLOG shows the following messages:
    03/20 18.09.08 EQQZ000I A STOP COMMAND HAS BEEN RECEIVED                  
    ...                                                                       
    03/20 18.09.14 EQQN051I A CURRENT PLAN BACKUP PROCESS HAS STARTED.        
                   TRIGGER WAS: NMM END   
    Before closing, the controller creates a current plan backup. EQQMLOG shows you the checkpoint data set information recorded before the CP backup, as follows:
    03/20 18.09.14 EQQN140I CKPT FILE AT NMM ENDING: BACKUP                   
    03/20 18.09.14 EQQN140I ---------------------------------------------     
    03/20 18.09.14 EQQN140I CKPT FILE JUST INITIALIZED: N                     
    03/20 18.09.14 EQQN140I BACKUP IN PROGRESS        : N                     
    03/20 18.09.14 EQQN140I INACTIVE PLAN DATA SET    : EQQCP1DS              
    03/20 18.09.14 EQQN140I CURRENT JT NUMBER         : 01                    
    03/20 18.09.14 EQQN140I LAST ARCHIVED JT NUMBER   : 05                    
    03/20 18.09.14 EQQN140I PLAN EXISTING             : Y                     
    03/20 18.09.14 EQQN140I ACTIONS ON NCP            : NO ACTIONS NEEDED     
    03/20 18.09.14 EQQN140I ---------------------------------------------     
                                                                              
    03/20 18.09.19 EQQN056I A CURRENT PLAN COPY PROCESS HAS STARTED           
    03/20 18.09.19 EQQN057I A CURRENT PLAN DATA SET WAS SUCCESSFULLY COPIED:  
                   FROMDD=EQQCP2DS, TODD=EQQCP1DS                             
    03/20 18.09.19 EQQN090I THE JOB TRACKING LOG DATA SET DEFINED BY DDNAME   
                   EQQJT01 HAS BEEN COPIED TO THE JOB TRACKING LOG ARCHIVE  
                   DATA SET  
    After the CP backup completes, the checkpoint data set information is updated. EQQMLOG shows that the JT data set was archived, the inactive plan is EQQCP2DS, the current JT data set has become EQQJT02, and the latest archived JT data set is EQQJT01. For more detailed information about the archiving process, see Job-tracking-log archiver (JL).
    03/20 18.09.19 EQQN090I COPIED TO THE JOB TRACKING LOG ARCHIVE DATA SET 
    03/20 18.09.19 EQQN140I CKPT FILE AT NMM ENDING: LOG ARCHIVE            
    03/20 18.09.19 EQQN140I ---------------------------------------------   
    03/20 18.09.19 EQQN140I CKPT FILE JUST INITIALIZED: N                   
    03/20 18.09.19 EQQN140I BACKUP IN PROGRESS        : N                   
    03/20 18.09.19 EQQN140I INACTIVE PLAN DATA SET    : EQQCP2DS            
    03/20 18.09.19 EQQN140I CURRENT JT NUMBER         : 02                  
    03/20 18.09.19 EQQN140I LAST ARCHIVED JT NUMBER   : 01                  
    03/20 18.09.19 EQQN140I PLAN EXISTING             : Y                   
    03/20 18.09.19 EQQN140I ACTIONS ON NCP            : NO ACTIONS NEEDED   
    03/20 18.09.19 EQQN140I --------------------------------------------- 
  4. Restart the controller. Message EQQN140I shows exactly the same information that was recorded in the latest run.
    03/20 19.53.55 EQQZ015I INIT STATEMENT:          CURRPLAN(CURRENT)      
    ...                                                                     
    03/20 19.53.55 EQQN140I CKPT FILE AT START TIME                         
    03/20 19.53.55 EQQN140I ---------------------------------------------   
    03/20 19.53.55 EQQN140I CKPT FILE JUST INITIALIZED: N                   
    03/20 19.53.55 EQQN140I BACKUP IN PROGRESS        : N                   
    03/20 19.53.55 EQQN140I INACTIVE PLAN DATA SET    : EQQCP2DS            
    03/20 19.53.55 EQQN140I CURRENT JT NUMBER         : 02                  
    03/20 19.53.55 EQQN140I LAST ARCHIVED JT NUMBER   : 01                  
    03/20 19.53.55 EQQN140I PLAN EXISTING             : Y                   
    03/20 19.53.55 EQQN140I ACTIONS ON NCP            : NO ACTIONS NEEDED   
    03/20 19.53.55 EQQN140I ---------------------------------------------   
    ...                                                                     
    03/20 19.53.56 EQQN141I CKPT DATA AFTER RESTART DECISION                
    03/20 19.53.56 EQQN141I ---------------------------------------------   
    03/20 19.53.56 EQQN141I INACTIVE PLAN DATA SET      : EQQCP2DS          
    03/20 19.53.56 EQQN141I OCP  LAST BACKUP TIMESTAMP  : 130320/17091869   
    03/20 19.53.56 EQQN141I USED LAST BACKUP TIMESTAMP  : 130320/17091869   
    03/20 19.53.56 EQQN141I EVENTS FOUND                : N                 
    03/20 19.53.56 EQQN141I JTARC MAY BE TRUNCATED      : N                 
    
    03/20 19.53.56 EQQN141I ---------------------------------------------   
    ... 

    Suppose that the controller runs one day and performs all the jobs scheduled in the CP and then you run a CP backup. EQQMLOG shows that the inactive plan is now EQQCP1DS and the latest archived JT data set is EQQJT02.

    03/21 15.48.50 EQQN051I A CURRENT PLAN BACKUP PROCESS HAS STARTED.        
                   TRIGGER WAS: BACKUP CMD                                    
    03/21 15.48.50 EQQN012I JOB TRACKING EVENTS ARE NOW BEING LOGGED ON FILE  
                   EQQJT03                                                    
    03/21 15.48.53 EQQN056I A CURRENT PLAN COPY PROCESS HAS STARTED           
    03/21 15.48.53 EQQN057I A CURRENT PLAN DATA SET WAS SUCCESSFULLY COPIED:  
                   FROMDD=EQQCP1DS, TODD=EQQCP2DS                             
    03/21 15.48.54 EQQN090I THE JOB TRACKING LOG DATA SET DEFINED BY DDNAME   
                   EQQJT02 HAS BEEN COPIED TO THE JOB TRACKING LOG ARCHIVE    
                   DATA SET
    You can verify this information also by issuing the modify command /F subsys,DCKPT:
    EQQZ024I LAST CKPT HEADER GET/PUT                                       
    EQQZ024I     START ADDRESS: 0007FF68                TIME: 130321 1552432
    EQQZ024I     000000  F0F0F0F0 F0F0F0F0 F0F00000 00000000  *0000000000...
    EQQZ024I     000010  0000F4F1 F0F3F2F0 F2F4F0F0 C5D8D8C3  *..4103202400E
    EQQZ024I     000020  D7F2C4E2 C5D8D8C3 D7F1C4E2 F0F3F0F2  *P2DSEQQCP1DS0
    EQQZ024I     000030  F0F5D500 00000000 00000000 E8D5D5C5  *05N.........Y
    EQQZ024I     000040  D8D8D1E2 F1C4E2D5 F1F0C8E6 E2F9F1F0  *QQJS1DSN10HWS
    EQQZ024I     000050  F040D500 00000000 404040D6 00640000  *0 N.....   O.
    EQQZ024I     000060  00000000 00000000                    *........     
    EQQZ048I A MODIFY COMMAND HAS BEEN PROCESSED. MODIFY CWSG,DCKPT 
  5. As you continue to work, you add new occurrences to the plan. You then force the controller to immediately stop with a CANCEL function.
  6. When you restart the controller, EQQMLOG shows you that the inactive plan is EQQCP1DS and the latest archived JT data set is EQQJT02:
    03/21 15.55.46 EQQN140I CKPT FILE AT START TIME                         
    03/21 15.55.46 EQQN140I ---------------------------------------------   
    03/21 15.55.46 EQQN140I CKPT FILE JUST INITIALIZED: N                   
    03/21 15.55.46 EQQN140I BACKUP IN PROGRESS        : N                   
    03/21 15.55.46 EQQN140I INACTIVE PLAN DATA SET    : EQQCP1DS            
    03/21 15.55.46 EQQN140I CURRENT JT NUMBER         : 03                  
    03/21 15.55.46 EQQN140I LAST ARCHIVED JT NUMBER   : 02                  
    03/21 15.55.46 EQQN140I PLAN EXISTING             : Y                   
    03/21 15.55.46 EQQN140I ACTIONS ON NCP            : NO ACTIONS NEEDED   
    03/21 15.55.46 EQQN140I ---------------------------------------------          
    The controller uses the information shown in message EQQN141I to decide how to restore a correct current plan. The inactive plan EQQCP1DS is used as the start plan and the events stored in EQQJT03 after the backup date and time 130321/13485048 are applied. The following messages are recorded to show that the inactive plan EQQCP1DS is copied to EQQCP2DS:
    03/21 15.55.49 EQQN056I A CURRENT PLAN COPY PROCESS HAS STARTED           
    03/21 15.55.49 EQQN057I A CURRENT PLAN DATA SET WAS SUCCESSFULLY COPIED:  
                   FROMDD=EQQCP1DS, TODD=EQQCP2DS 
    The following messages show that EQQJT03 is used to update the plan before archiving it, EQQJT04 becomes the current JT data set, and EQQCP1DS becomes the current plan after the CP backup completes. You are also shown that the catch-up process identified and applied all the events required to restore a correct plan:
    03/21 15.55.50 EQQN058I THE SCHEDULER WILL RECOVER THE CP DATA SET FROM   
                   THE JTLOGS                                                 
    03/21 15.55.50 EQQN058I PLEASE BE PATIENT. THIS WILL TAKE A WHILE         
    03/21 15.55.50 EQQN014I THE JOB TRACKING LOG DATA SET, DDNAME EQQJT03 ,   
                   HAS BEEN USED TO UPDATE THE CURRENT PLAN.                  
                   2 EVENTS HAVE BEEN APPLIED TO DD-NAME EQQCP2DS             
    03/21 15.55.50 EQQN051I A CURRENT PLAN BACKUP PROCESS HAS STARTED.        
                   TRIGGER WAS: CATCH UP                                      
    03/21 15.55.53 EQQN056I A CURRENT PLAN COPY PROCESS HAS STARTED           
    03/21 15.55.54 EQQN057I A CURRENT PLAN DATA SET WAS SUCCESSFULLY COPIED:  
                   FROMDD=EQQCP2DS, TODD=EQQCP1DS                             
    03/21 15.55.54 EQQN012I JOB TRACKING EVENTS ARE NOW BEING LOGGED ON FILE  
                   EQQJT04                                                    
    03/21 15.55.54 EQQN013I JOB TRACKING IS NOW ACTIVE AND CURRENT PLAN       
                   DD-NAME IS EQQCP1DS  

Description of messages EQQN140 and EQQN141

Messages EQQN140 and EQQN141 are informational. EQQN140 is issued at controller startup and stop, and shows the following information:
CKPT FILE AT: &phase
---------------------------------------------
CKPT FILE JUST INITIALIZED: &NEWCKPT   
BACKUP IN PROGRESS        : &BACKUP    
INACTIVE PLAN DATA SET    : &OLDCP     
CURRENT JT NUMBER         : &CURRJT    
LAST ARCHIVED JT NUMBER   : &LASTJT    
PLAN EXISTING             : &VALCP     
ACTIONS ON NCP            : &NCPSTAT     
---------------------------------------------     
Where:
CKPT FILE AT
It can be one of the following values:
START TIME
Controller has just started.
NMM ENDING: BACKUP
Controller is stopping and the current plan is being backed up.
NMM ENDING: LOG ARCHIVE
Controller is stopping, the current plan is backed up, and the latest JT data set is archived.
CKPT FILE JUST INITIALIZED
It can be one Y or N. Y means that the checkpoint data set has just been initialized with the default values. This might occur the first time that the controller starts, when the checkpoint data set is empty.
BACKUP IN PROGRESS
It can be Y or N. Y means that when the controller was stopped, a backup plan was in progress, therefore the controller could not complete switching the CP and JT data set.
INACTIVE PLAN DATA SET
The DD name of the inactive plan (the old current plan).
CURRENT JT NUMBER
The name of the current JT data set.
LAST ARCHIVED JT NUMBER
The name of the latest archived JT data set.
PLAN EXISTING
It can be Y or N. Y means that a current plan exists.
ACTION ON NCP
The status related to the creation of CP. Possible values are:
NCP TO BE USED AS CURRENT PLAN
The new current plan is to be used by the controller as the current plan.
NCP UNDER CREATION, DP BATCH IS WORKING ON IT
A DP batch command is running to create a new plan.
NO ACTIONS NEEDED
No action to be done on the NCP.
EQQN141 is issued after the restarting decision was taken and shows the following information:
CKPT DATA &phase
---------------------------------------------
INACTIVE PLAN DATA SET      : &OLDCP          
OCP  LAST BACKUP TIMESTAMP  : &DTCP           
USED LAST BACKUP TIMESTAMP  : &DTUS           
EVENTS FOUND                : &EVENT          
JTARC MAY BE TRUNCATED      : &JTARCT         
---------------------------------------------   
Where:
CKPT DATA
It always has the value AFTER RESTART DECISION.
INACTIVE PLAN DATA SET
The DD name of the inactive plan (the old current plan).
OCP LAST BACKUP TIMESTAMP
Date and time of the latest backup in the inactive CP data set.
USED LAST BACKUP TIMESTAMP
Date and time that the latest backup was actually used. Normally it is the same as OCP LAST BACKUP TIMESTAMP. However, if a backup was in progress at stop time and the backup is not reliable, the controller considers the date and time of the first event in the current JT data set as the latest backup. This date is used to decide which events are to be applied.
EVENTS FOUND
Events to be applied are found in the JT data set.
JTARC MAY BE TRUNCATED
It can be Y or N. Y means that the latest archived JT data set was not complete and that data might have been lost.