Defining step failure behavior settings

When creating an Automation Plan, you can define, for each step, the behavior that occurs if that step fails. This is known as step failure behavior and is distinct from adding a failure step to a step. You can also set advanced failure behavior, which enables you to specify a period of time after which to fail the step on targets that have not returned a status.

How it works

Use the step failure behavior feature when designing your Automation Plan to control the flow of the plan on endpoints. It enables you to define what happens when steps in your Automation Plan fail on some or all endpoints.

The overall step failure behavior is defined by two separate settings. The first setting, step failure mode, defines whether the Automation Plan stops at that point. The second setting, the failed targets behavior, defines whether failed targets are included in or excluded from subsequent steps. Separately, and regardless of the values that you define for step failure behavior, if a failure step is defined, the failure step is always run before the step failure behavior is processed. After the failure step is run, the system processes the remaining steps in the plan according to the step failure behavior settings of the step that failed.

To define step failure behavior, first choose whether to continue or stop the Automation Plan. To do this, select an option from the Step Failure Mode list. If you select the option to stop the Automation Plan on step failure, you do not make any further selection. The Automation Plan action is stopped at this point. If you want to continue the Automation Plan, you must decide if you want to continue on all endpoints or only on those endpoints on which the step was successful.

When adding a step to your Automation Plan, you can define the failure behavior as described in the following tables.
Table 1. Defining the step failure mode

Option | Description
Stop Automation Plan | Select this option to run any associated failure step and then stop the Automation Plan.
Continue Automation Plan | Select this option to run any associated failure step and then move on to the next step in the Automation Plan.

If you decide to continue the Automation Plan, choose from the options described in the following table to define the targeting. If you stop the Automation Plan, no targeting selection is required.
Table 2. Defining targeting

Option | Description
Include in Future Steps | The endpoints on which the step failed are included in future steps in the Automation Plan. If you specified a non-reporting threshold, unresponsive endpoints are treated as failed. If you select Include in Future Steps, unresponsive endpoints are included in future steps if they subsequently report back within the specified
Exclude from Future Steps | The Automation Plan continues on the endpoints on which the step was successful. The endpoints on which the step failed are removed from future steps. This setting also controls whether unresponsive endpoints that were excluded from plan steps are also excluded from future steps in the plan.
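
The combined effect of the two settings can be summarized in a short sketch. The following Python is illustrative only: the enum and function names are hypothetical and are not part of the product. It models which endpoints the next step would target, assuming any failure step has already run.

```python
from enum import Enum


class StepFailureMode(Enum):
    STOP = "Stop Automation Plan"
    CONTINUE = "Continue Automation Plan"


class FailedTargets(Enum):
    INCLUDE = "Include in Future Steps"
    EXCLUDE = "Exclude from Future Steps"


def next_step_targets(mode, failed_targets, succeeded, failed):
    """Return the endpoints for the next step, or None if the plan stops.

    Hypothetical model: `succeeded` and `failed` are sets of endpoint names
    for the step that has just finished.
    """
    if not failed:
        # Nothing failed, so the failure behavior settings do not apply.
        return set(succeeded)
    if mode is StepFailureMode.STOP:
        # Stop Automation Plan: no further steps are run.
        return None
    if failed_targets is FailedTargets.INCLUDE:
        # Continue on all endpoints, including those that failed.
        return set(succeeded) | set(failed)
    # Continue only on endpoints on which the step was successful.
    return set(succeeded)
```

For example, with the step successful on endpoint "a" and failed on endpoint "b", Continue Automation Plan with Exclude from Future Steps leaves only "a" as a target for the next step.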

For Automation Plans created previously, a default value is applied. The default value is Stop Automation Plan. When you open a legacy Automation Plan and save it, the new attributes are added to the saved Automation Plan.

Step failure behavior and failure step targeting

Step failure behavior targeting is different from failure step targeting. When you add a failure step to an Automation Plan, you can apply that failure step to all endpoints targeted in the step, or to only the endpoints on which the step failed. If you set the targeting for a failure step to apply to all endpoints, this targeting might be superseded by the step failure behavior settings. If the step failure mode is Continue Automation Plan and the targeting is Exclude from Future Steps, any associated failure step targeting is set automatically to Failed Only. The reason is that the endpoints on which the step was successful go on to run future steps, so you do not want to run the failure step against them. Instead, the failure step runs only on the endpoints that will be omitted from future steps.
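
The override described above can be expressed as a small sketch. The function name and the "All" label below are hypothetical, chosen for illustration; only Failed Only and the two option names come from this topic.

```python
def effective_failure_step_targeting(configured, mode, targeting):
    """Return the targeting actually used by the failure step.

    configured: "All" (hypothetical label) or "Failed Only"
    mode:       "Stop Automation Plan" or "Continue Automation Plan"
    targeting:  "Include in Future Steps" or "Exclude from Future Steps"
    """
    if (mode == "Continue Automation Plan"
            and targeting == "Exclude from Future Steps"):
        # Successful endpoints continue to future steps, so the failure
        # step is restricted to the endpoints on which the step failed.
        return "Failed Only"
    # Otherwise the configured failure step targeting is kept.
    return configured
```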

Tracking Automation Plan actions and step failure behavior

You can view Automation Plan actions and step actions on the Automation Plan Action Status dashboard. If a step in your Automation Plan fails, the failure is indicated in the Status column. For steps that fail and do not have step failure behavior defined, a status of Failed is displayed. In this case, the Automation Plan runs any associated failure step and then stops. For steps that fail and have step failure behavior defined, a status of Failed on some targets is displayed. In this case, the Automation Plan continues to run according to the step failure behavior settings, either on all endpoints or only on the endpoints on which the step was successful.

To view the endpoints on which the step failed, click the Detail icon for the particular step.

Advanced failure behavior

Advanced failure behavior enables you to design your Automation Plan to run within scheduled maintenance windows by specifying a time limit for steps to complete on target endpoints. When the time limit that you specify elapses, the step is failed on any endpoints on which it has not completed. For example, if you have a maintenance window of 60 minutes and need to include three steps in your plan, you can enable advanced failure behavior and enter a time period of 20 minutes for each step. If 20 minutes elapse after a step starts and the step has not completed on some endpoints, the step is failed on those endpoints. The advanced failure behavior settings are disabled by default.
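
The following sketch illustrates the time limit logic under stated assumptions. The function name, the status strings, and the use of None for an endpoint that has not yet returned a result are all hypothetical and serve only to model the behavior described above.

```python
from datetime import datetime, timedelta


def fail_incomplete_targets(statuses, started_at, now, limit_minutes):
    """Return per-endpoint statuses after applying the time limit.

    statuses maps an endpoint name to "Completed", "Failed", or None,
    where None means the endpoint has not returned a result yet.
    """
    if now - started_at < timedelta(minutes=limit_minutes):
        # The time limit has not elapsed: nothing changes.
        return dict(statuses)
    # The time limit has elapsed: endpoints with no result are failed.
    return {ep: ("Failed" if s is None else s) for ep, s in statuses.items()}


# Three steps of 20 minutes each fit a 60-minute maintenance window.
step_limits = [20, 20, 20]
fits_window = sum(step_limits) <= 60
```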

To configure advanced failure behavior:
  • If you are creating your plan, click the Default Settings icon for the step and go to the Execution tab. Enable the check box for Fail incomplete targets and enter a period of time, in minutes, after which you want to fail the step on any endpoints on which the step has not completed.
  • If you are running your plan, from the Take Automation Plan Action screen, click the Execution tab and enable the check box for Fail incomplete targets and enter a period of time, in minutes, after which you want to fail the step on any endpoints on which the step has not completed.
Note: If you open a legacy Automation Plan that had timeout settings configured, the timeout targets are treated as failures. A message appears to indicate this. When you save the plan, the plan is updated and the timeout settings are changed to failure behavior settings.

Setting step failure threshold

The step failure threshold enables you to determine the success or failure of a step from the percentage of target endpoints on which the step fails. For example, if you set the Step Failure Threshold at more than 5% and the step fails on more than 5% of targeted endpoints, the step is treated as a failed step. If you have set a failure step, the failure step is then executed. If you set the Step Failure Threshold at more than 5% and the step fails on 5% or fewer of targeted endpoints, the step is treated as successful and a failure step, if set, is not executed.
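
The threshold comparison can be sketched as follows. The function name and parameters are hypothetical; passing None for the threshold models the default value of any, which is described in the procedure below.

```python
def step_is_failed(failed_count, total_targets, threshold_percent=None):
    """Return True if the step is treated as failed.

    threshold_percent=None models the default value "any": the step is
    failed if it fails on any endpoint.
    """
    if total_targets <= 0:
        raise ValueError("total_targets must be positive")
    if threshold_percent is None:
        return failed_count > 0
    # "More than N%": strictly greater than the threshold. Integer
    # arithmetic avoids rounding errors at the boundary.
    return failed_count * 100 > threshold_percent * total_targets
```

With a threshold of more than 5% and 100 targeted endpoints, 5 failures leave the step successful, while 6 failures cause the step to be treated as failed.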

To set the step failure threshold:
  1. Open the Automation Plan that contains the step for which you want to configure the step failure threshold and click Edit.
  2. Select the step for which you want to configure the step failure threshold.
  3. Click the Default Settings icon for the step and go to the Execution tab.
  4. In the Step Failure Threshold section, enter a percentage value for the threshold at which to fail the step. For example, if you enter more than 25%, the step is failed if it is unsuccessful on more than 25% of the endpoints targeted. If the step is unsuccessful on 25% or fewer of the endpoints, the step is treated as successful. The default value is any, which means that if the step fails on any endpoint, the step is treated as a failure and, if you have defined a failure step for the step, the failure step is executed.
  5. Click OK and then repeat this process for each step for which you want to configure a step failure threshold.

Pending Restart step actions and step failure behavior

When target endpoints report a Pending Restart status for a step action, the system does not automatically stop those Pending Restart step actions. If a Pending Restart step action were stopped, the step action could not be updated with the actual result of the action after the restart completed. Instead, the Pending Restart step action remains in an Open state, so that a subsequent Restart Endpoint and Wait for Restart to Complete step can allow its status to be updated after the restart completes. This enables you to get the actual outcome of the step action once it becomes known.

Steps in a Pending Restart state can fail if the step times out or if one or more endpoints report a failure status. If a step in a Pending Restart state fails, the step failure behavior becomes more complex because the Pending Restart step action remains open. The following two scenarios illustrate how the Pending Restart state works with the step failure behavior settings.

Scenario 1: Failed Pending Restart step and failure behavior set to Stop Automation Plan

In this scenario, step failure behavior is set to Stop Automation Plan. A step fails but some endpoints report back a Pending Restart status. The step that fails has a failure step set, and the failure step is a Restart Endpoints and Wait step. The Pending Restart is then processed as follows:

  • The system leaves the failed step in an Open state and runs the failure step.
  • The targets that are in a Pending Restart state eventually report back with the actual result of the step action.
  • The system then stops all actions: the action for the failed step, the action for the failure step, and the plan action.

Scenario 2: Failed Pending Restart step and failure behavior set to Continue Automation Plan

In this scenario, step failure behavior is set to Continue Automation Plan. A step fails but some endpoints report back a Pending Restart status. The step that fails does not have a failure step set. The Pending Restart is processed as follows:

  • The system leaves the failed step in an Open state and executes the next step in the plan, which in this scenario is a Restart Endpoints and Wait step.
  • Next, the Pending Restart targets of the failed step report back the actual status of the step action.
  • The system then stops the Restart Endpoints and Wait step and processes the remaining steps in the plan.
  • Last, when all steps have been processed, the system stops all remaining open step actions, including the action for the failed step, and then stops the plan action.