Posting incidents and sharing information through a chat tool

When a problem arises, reacting quickly is crucial. Identifying the issue, gathering possible solutions, and choosing the best way to proceed are the fundamentals of problem-solving, and they all depend on rapid communication. By integrating with IBM Z ChatOps, HCL Workload Automation for Z provides you with a chat tool where you are notified about incidents and can share information with other team members. You are alerted through the chat platform of your choice (such as Slack, Microsoft Teams, or Mattermost) and can communicate with the other chat users to share data and perform actions. Collaboration becomes easy, immediate, and effective for promoting teamwork and addressing daily issues.

To have the Z controller post an incident through a chat tool when an alert condition occurs, you need to:
  1. Add the EQQINCID DD statement to the HCL Workload Automation for Z JCL procedure. For detailed information about EQQINCID, see Incident data set (EQQINCID).
  2. Define the INCOPTS statement.
  3. In the INCIDENT parameter of the ALERTS statement, set the alert conditions for which to post an incident through the chat tool.
  4. Add the certificates that you have downloaded from your incident notifying tool to a key ring that is trusted by HCL Workload Automation for Z, as follows:
    If you use a SAF ring, perform the following steps:
    1. Create the sequential data sets where the downloaded certificates are to be stored. In this procedure, the certificates INCTOOL.CERT.SERVER and INCTOOL.CERT.ROOT are used as an example.

      If your certificate is chained, you must create a data set for each intermediate or root certificate and save them.

    2. Create a key ring (in this example, EQQRING) by using the certificate management command RACDCERT. Skip this step if you already use a key ring for HCL Workload Automation for Z services. (For more information about the RACDCERT command, see the section RACDCERT (Manage RACF digital certificates) in the IBM z/OS Server Security RACF Command Language Reference manual.)
      RACDCERT ADDRING(EQQRING) ID(Your_RACF_userID)
    3. Add each certificate to the RACF database:
      RACDCERT CERTAUTH ID(Your_RACF_userID) ADD('INCTOOL.CERT.ROOT') TRUST WITHLABEL('INCTOOL ROOT')
      RACDCERT CERTAUTH ID(Your_RACF_userID) ADD('INCTOOL.CERT.SERVER') TRUST WITHLABEL('INCTOOL SERVER')
    4. Connect each certificate to the EQQRING key ring:
      RACDCERT ID(Your_RACF_userID) CONNECT(LABEL('INCTOOL ROOT') RING(EQQRING) USAGE(CERTAUTH))
      RACDCERT ID(Your_RACF_userID) CONNECT(LABEL('INCTOOL SERVER') RING(EQQRING) USAGE(PERSONAL))
    5. Check that the certificates have been successfully added to the chain:
      RACDCERT ID(Your_RACF_userID) LISTCHAIN(LABEL('INCTOOL SERVER'))
    6. Update the SSL parameters in the HTTPOPTS statement according to the values that you have set in this procedure.
    If you use the keystore in the UNIX System Services, perform the following steps:
    1. Save the downloaded certificates into a USS directory. In this procedure, /u/mycerts is used as an example.

      If your certificate is chained, you must create a file for each intermediate or root certificate and save them.

    2. From /u/mycerts, create a keystore database (in this procedure, the gskkyman utility is used). Skip this step if you already use a database for HCL Workload Automation for Z services.
      gskkyman
    3. From the Database Menu, select option 1 - Create new database. Skip this step if you already use a database for HCL Workload Automation for Z services. On completion, the following message is issued:
      Key database /u/mycerts/my_db_name.kdb created
    4. To store your database password in a file, from the Key Management Menu select option 10 - Store database password. Skip this step if you already use a database for HCL Workload Automation for Z services.
      On completion, the following message is issued:
      Database password stored in /u/mycerts/my_db_name.sth
    5. From the Database Menu, select option 2 - Open database and enter the key database name and database password.
    6. Import each certificate to your keystore database by selecting option 7 - Import a certificate from the Key Management Menu.
    7. Based on the values that you set in this procedure, update the SSL parameters in the HTTPOPTS statement.
You can have incidents posted for the following alert conditions:
DURATION
An operation in the current plan is active for an unexpectedly long time.
ERROROPER
An operation in the current plan is set to ended-in-error status.
HIGHRISK
The risk level of a critical operation in the current plan has become High.
LATEOPER
An operation in the current plan becomes late, which means that it reaches its latest start time and does not have the status started, complete, or deleted.
OPCERROR
An HCL Workload Automation for Z subtask or subsystem ends unexpectedly.
POTENTRISK
The risk level of a critical operation in the current plan has become Potential.
SPECRES
The time that an operation in the current plan is waiting to allocate a given resource exceeds the time specified by the RESOPTS CONTENTIONTIME parameter.
WLMOPER
An operation in the current plan is promoted by WLM.
The EQQINCID data set includes:
  • The members containing the text of the incidents, which you set in ALERTS INCIDENT.
  • A member named RULES (required).

    This member contains the rules that must be met for the incidents to be notified. Each rule consists of a FILTER, HEADER, and optionally a TEXTMEMBER, in the following format:

    FILTER(expression1, expression2, ..., expressionn)
    HEADER(header_text)
    [TEXTMEMBER(member_name)]
    Note: Each rule is associated with only one FILTER, HEADER, and TEXTMEMBER. If within a single rule you specify more than one FILTER, HEADER, or TEXTMEMBER, only the first occurrence is taken into account.
    Where:
    FILTER(expression1, expression2, ..., expressionn)
    The expressions to be satisfied for the incident to be notified, separated by commas. The incident is notified when all the expressions in the filter are met; for each satisfied filter the corresponding incident is posted.
    Each expression has the following format, which is not case-sensitive:
    value=filter
    where:
    value
    String of alphanumeric characters, including variables (for details about variables, see Variables allowed in the EQQINCID members). It cannot contain blanks.
    filter
    String of alphanumeric characters. It cannot contain blanks. You can use the wildcard characters asterisk (*) and percent sign (%).
    For example, you can set a FILTER that selects all the applications whose name begins with MY and that ended with error code 16, as follows:
    FILTER(&OADID=MY*,&OERRCODE=16)
    HEADER(header_text)
    Information used for the incident header, separated by blanks. As the header_text you can specify the following information.
    Note:
    • Each piece of information (Summary, Priority, and Severity) is followed by a colon (:) and can be set only once. If you specify it more than once, only the first occurrence is considered.
    • A colon (:) cannot be specified inside the header_text. If you specify it, the text that follows is not considered.
    Summary:
    Required.
    Priority:
    Optional. Valid values are high, medium, low. The default is low.
    Severity:
    Optional. Valid values are fatal, critical, major, minor, low. The default is low.
    For example, you can set a HEADER as follows:
    HEADER(
    Summary: This is the application error
    Priority: High 
    Severity: Fatal 
    )
    TEXTMEMBER(member_name)
    Optional. The member containing the text of the incident. If you do not specify any, the member set in ALERTS INCIDENT is used as default. For each alert condition, one member is defined.
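The FILTER and HEADER rules described above can be illustrated with a short Python sketch. It is not the product's implementation: the function names are hypothetical, and it assumes that the asterisk (*) matches any run of characters (including none) and that the percent sign (%) matches exactly one character, which the text above does not state explicitly. It also assumes that Priority and Severity values can be normalized to lowercase.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    # Assumption: '*' = any run of characters, '%' = exactly one character.
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "%":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # Expressions are documented as not case-sensitive.
    return re.compile("".join(parts), re.IGNORECASE)


def resolve(value: str, variables: dict) -> str:
    # Replace each &NAME with its value; unknown names are left untouched
    # (an assumption made for this sketch).
    return re.sub(
        r"&([A-Za-z0-9]+)",
        lambda m: variables.get(m.group(1).upper(), m.group(0)),
        value,
    )


def filter_matches(filter_body: str, variables: dict) -> bool:
    # filter_body is the text inside FILTER(...), for example
    # "&OADID=MY*,&OERRCODE=16". Every expression must be satisfied.
    for expression in filter_body.split(","):
        value, _, pattern = expression.strip().partition("=")
        if not wildcard_to_regex(pattern).fullmatch(resolve(value, variables)):
            return False
    return True


def parse_header(header_text: str) -> dict:
    # Split on the three keywords, each followed by a colon.
    pieces = re.split(r"\b(summary|priority|severity)\s*:", header_text,
                      flags=re.IGNORECASE)
    fields = {}
    # pieces looks like [prefix, key1, text1, key2, text2, ...].
    for key, text in zip(pieces[1::2], pieces[2::2]):
        # Text after a stray colon is not considered.
        text = " ".join(text.split(":", 1)[0].split())
        # Only the first occurrence of each keyword is kept.
        fields.setdefault(key.lower(), text)
    if "summary" not in fields:
        raise ValueError("Summary is required")
    fields["priority"] = fields.get("priority", "low").lower()
    fields["severity"] = fields.get("severity", "low").lower()
    return fields
```

With the FILTER example above, filter_matches("&OADID=MY*,&OERRCODE=16", {"OADID": "MYAPP01", "OERRCODE": "16"}) is satisfied, while the same filter with error code 12 is not.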
Table 1 shows the variables that you can use in the EQQINCID members. A variable is resolved only if it is meaningful for the alert condition that you set in the ALERTS INCIDENT parameter; otherwise, it is ignored.
Table 1. Variables allowed in the EQQINCID members
| Variable name (must be preceded by &) | Variable description | Max length | Alert condition |
|---|---|---|---|
| ALERCOND | Alert condition that generated the incident (for details, see the alert conditions listed in ALERTS INCIDENT). It can assume the following values: DURATION, ERROROPER, HIGHRISK, LATEOPER, OPCERROR, POTENTRISK, SPECRES, WLMOPER. | 10 | DURATION, ERROROPER, HIGHRISK, LATEOPER, OPCERROR, POTENTRISK, SPECRES, WLMOPER |
| OADID | Application ID. | 16 | DURATION, ERROROPER, HIGHRISK, LATEOPER, POTENTRISK, SPECRES, WLMOPER |
| OADOWNER | Occurrence owner. | 16 | DURATION, ERROROPER, LATEOPER, SPECRES, WLMOPER |
| OTOKEN | Occurrence token. | 8 | DURATION, ERROROPER, LATEOPER, SPECRES, WLMOPER |
| OAUGROUP | Authority group. | 8 | DURATION, ERROROPER, HIGHRISK, LATEOPER, POTENTRISK, SPECRES, WLMOPER |
| ODMY1 | Occurrence input arrival date, DDMMYY. | 6 | DURATION, ERROROPER, HIGHRISK, LATEOPER, POTENTRISK, SPECRES, WLMOPER |
| OJOBNAME | Operation job name. | 8 | DURATION, ERROROPER, HIGHRISK, LATEOPER, POTENTRISK, SPECRES, WLMOPER |
| OOPNO | Operation number within the occurrence, right-justified and padded with zeros. | 3 | DURATION, ERROROPER, HIGHRISK, LATEOPER, POTENTRISK, SPECRES, WLMOPER |
| OWSID | Workstation ID for the current operation. | 4 | DURATION, ERROROPER, HIGHRISK, LATEOPER, POTENTRISK, SPECRES, WLMOPER |
| OJOBID | Job number. | 8 | DURATION, ERROROPER, WLMOPER |
| OERRCODE | Error code. | 4 | ERROROPER |
| RESNAME | Resource name. | 44 | SPECRES |
| RESWTTM | Resource waiting time. | 4 | SPECRES |
| TASKNAME | HCL Workload Automation for Z task name. | 16 | OPCERROR |
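Because a variable is resolved only when it is meaningful for the alert condition, it can help to check a text member before you use it. The following Python sketch encodes the Alert condition column of Table 1 and reports the variables in a piece of text that would not be resolved for a given condition. The helper name unresolved_variables and the sample text are hypothetical; the product does not provide this function.

```python
import re

# Alert conditions that relate to an operation in the current plan.
ALL_OPER = {"DURATION", "ERROROPER", "HIGHRISK", "LATEOPER",
            "POTENTRISK", "SPECRES", "WLMOPER"}
# The same set without the two risk-level conditions.
NO_RISK = ALL_OPER - {"HIGHRISK", "POTENTRISK"}

# Variable name -> alert conditions for which it is meaningful (Table 1).
VARIABLE_CONDITIONS = {
    "ALERCOND": ALL_OPER | {"OPCERROR"},
    "OADID": ALL_OPER,
    "OADOWNER": NO_RISK,
    "OTOKEN": NO_RISK,
    "OAUGROUP": ALL_OPER,
    "ODMY1": ALL_OPER,
    "OJOBNAME": ALL_OPER,
    "OOPNO": ALL_OPER,
    "OWSID": ALL_OPER,
    "OJOBID": {"DURATION", "ERROROPER", "WLMOPER"},
    "OERRCODE": {"ERROROPER"},
    "RESNAME": {"SPECRES"},
    "RESWTTM": {"SPECRES"},
    "TASKNAME": {"OPCERROR"},
}


def unresolved_variables(member_text: str, alert_condition: str) -> list:
    """Return the &variables in member_text that are not meaningful
    for alert_condition (or are not listed in Table 1), sorted by name."""
    condition = alert_condition.upper()
    names = {name.upper() for name in re.findall(r"&([A-Za-z0-9]+)", member_text)}
    return sorted(
        name for name in names
        if condition not in VARIABLE_CONDITIONS.get(name, set())
    )


sample = "Job &OJOBNAME ended with &OERRCODE on workstation &OWSID"
print(unresolved_variables(sample, "ERROROPER"))  # []
print(unresolved_variables(sample, "LATEOPER"))   # ['OERRCODE']
```

In the sample, &OERRCODE is reported for LATEOPER because Table 1 lists it only for ERROROPER.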

Troubleshooting

When errors occur in detecting and notifying an incident, messages are logged in the EQQMLOG file. You can set a further level of diagnosis by adding DIAGNOSE MONFLAGS(X'00000200') to the member of the EQQPARM library.