Automating patching of operating systems in Microsoft Windows clusters

You can patch Windows® clusters by using Server Automation Fixlets. To automate the workflow, you combine the Fixlets in an Automation Plan. The plan first runs a Fixlet to pause the node that you are patching, and then a Fixlet to move any groups on that node to another node. The Fixlets that patch the operating system run next, and the last Fixlet in the workflow resumes the node. The Fixlets to pause and resume nodes are specific to the operating system, and the supported operating systems are listed in each Fixlet description. This topic describes two procedures: one to patch clusters on Windows 2003 and 2008 operating systems, and another to patch clusters on Windows 2008 Release 2 and later operating systems.

Before you begin

The Windows cluster control scripts must be installed on the target nodes in the cluster. You can install these scripts by running the 110 Install Windows Cluster Control Application Fixlet.

About this task

The Server Automation cluster patching Fixlets support general Microsoft Windows clusters, including patching scenarios such as Microsoft SQL Server cluster patching.

The following procedure is designed for clusters whose quorum is configured in one of the following ways:

  • DiskWitness only (2012 systems)
  • NoMajority (Disk Only) (2008 systems)
  • Standard Quorum (2003 systems)
With this setup, the cluster remains operational as long as at least one node in the cluster is still running. If the quorum for the cluster is configured in any other way, ensure when you target nodes that half of the total nodes plus one remain operational at all times. For example, in a cluster of 20 nodes, 11 nodes (10 + 1) must be operational at any one time. In a cluster of 16 nodes, 9 nodes (8 + 1) must be operational at any one time. Otherwise, the cluster stops working and remains unavailable until more than half of the nodes have resumed the cluster service.
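
The "half of the total nodes plus one" rule can be checked with a few lines of Python before you choose targets. This is a minimal sketch, not part of Server Automation: the function names are hypothetical, and rounding half down for an odd number of nodes is an assumption, because the examples in this topic cover only even node counts.

```python
def min_operational_nodes(total_nodes: int) -> int:
    """Return half of the total nodes plus one.

    Assumption: for an odd node count, "half" is rounded down
    (for example, 5 nodes -> 2 + 1 = 3).
    """
    if total_nodes < 1:
        raise ValueError("total_nodes must be at least 1")
    return total_nodes // 2 + 1


def max_nodes_to_target(total_nodes: int) -> int:
    """Return how many nodes can be paused for patching at the same time."""
    return total_nodes - min_operational_nodes(total_nodes)


if __name__ == "__main__":
    for total in (20, 16):
        print(total, min_operational_nodes(total), max_nodes_to_target(total))
```

For the two examples in this topic, the sketch gives 11 operational nodes (at most 9 targeted at once) for a 20-node cluster and 9 operational nodes (at most 7 targeted at once) for a 16-node cluster.
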
Note: If your clusters are on Windows 2008 Release 2 or later operating systems, you can use an alternative procedure to automate the patching workflow. For more information, see Automating patching of clusters on Windows 2008 Release 2 and later operating systems.

Complete the following steps to patch a Windows cluster using the Server Automation Fixlets.

Procedure

  1. Pause the node or nodes in the cluster that you want to patch. Run Fixlet 112 Pause Node in the Cluster (Windows 2003) to pause the node on Windows 2003 and Fixlet 116 Pause Node in the Cluster (Windows 2008-2012) to pause the node on Windows 2008 and 2012. When running the Fixlet to pause the node, target each node that you want to pause.
  2. Move any groups on the node to another node. Use Fixlet 111 Move Groups from Node in the Cluster (Windows 2003) to move groups on Windows 2003 and use Fixlet 114 Move Groups from Node in the Cluster (Windows 2008-2012) to move groups on Windows 2008 and 2012.
  3. Patch or update the node as required. If you use a Baseline to patch the node, check whether the Baseline contains an action script that causes the node to report a status of Pending Restart, for example, action requires restart. If the node returns a Pending Restart status, the system treats this as a wait state and does not complete the step. To correct this, you must include a restart Fixlet as part of the Baseline.
  4. Resume the node. To resume on Windows 2003, use Fixlet 113 Resume Node in the Cluster (Windows 2003). To resume on Windows 2008 or Windows 2012, use Fixlet 115 Resume Node in the Cluster (Windows 2008-2012).
  5. Repeat this process for the remaining nodes in the cluster.
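
The choice of Fixlet for each step depends on the operating system of the node. The following Python sketch records that mapping as a lookup table and returns the ordered workflow for one node. It is only a planning aid under stated assumptions: the Fixlet IDs come from the steps above, but the table, the function name, and the patch-step placeholder are hypothetical and are not a Server Automation API.

```python
# Fixlet IDs taken from the procedure above; the structure itself is illustrative.
CLUSTER_FIXLETS = {
    "2003": {"pause": 112, "move_groups": 111, "resume": 113},
    "2008-2012": {"pause": 116, "move_groups": 114, "resume": 115},
}


def node_workflow(os_family, patch_step="<patch Fixlet or Baseline>"):
    """Return the ordered (step name, Fixlet ID or patch placeholder) pairs for one node."""
    if os_family not in CLUSTER_FIXLETS:
        raise ValueError(f"No cluster Fixlets recorded for {os_family!r}")
    ids = CLUSTER_FIXLETS[os_family]
    return [
        ("pause", ids["pause"]),
        ("move_groups", ids["move_groups"]),
        ("patch", patch_step),  # placeholder: the source does not fix a patch Fixlet
        ("resume", ids["resume"]),
    ]


if __name__ == "__main__":
    for step in node_workflow("2008-2012"):
        print(step)
```

You would repeat the returned sequence for each remaining node, as described in step 5.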

Automating patching of clusters on Windows 2008 Release 2 and later operating systems

To patch clusters with Windows 2008 Release 2 and later operating systems, there are two enhanced patching Tasks available. Using these Tasks in an Automation Plan to automate the workflow reduces the number of group moves during the patching cycle, while keeping the clustered applications available for the maximum possible time. The cluster groups remain on their original nodes for as long as possible, and each node is returned to its original state as soon as it is patched, rather than at the end of the entire cluster patching process.

About this task

You use an automation plan to process the automation flow and patch the nodes in the cluster. You complete the patching in two phases. In the first phase, you patch the first node or group of nodes. In the second phase, you repeat the process on the second node or group of nodes. The cluster remains operational at all times throughout the patching process.

Complete a procedure similar to the following to automate the patching process.

Procedure

  1. Create a new automation plan or copy a suitable sample automation plan.
  2. Add Task 138 Pre Patching Task For Non Hyper-V Clustered Microsoft Servers (Version 2008 R2 onwards) as the first step in the plan. When you are running the plan, you target the first node or group of nodes.
    This Task performs the following functions:
    1. Creates a file that lists all groups and virtual machines in the cluster, including what nodes they are on and the states of each resource in the cluster. If there are empty groups (groups with no resources) in the cluster, a second file is created detailing these groups.
    2. Pauses the node.
    3. Moves any groups on the node that have other available owners. If no other potential owners are available to take a group, the Task fails. If a group is set up to be online only on the current node (the target node is the only possible owner of the group or of at least one resource in the group), the Task takes that group offline.
    4. Takes any empty groups offline.
    5. Checks that the node is paused, has no active groups remaining, and is in a state suitable for patching the server operating system.
    Note: This Task does not move any groups that were offline before the Task was run.
  3. Add a second step to the plan, selecting the Fixlet, Task, or Baseline to patch the underlying operating system. When you are running the plan, you target the first node or group of nodes.
  4. Add a third step to the plan, selecting Server Automation Task ID 126 Restart Endpoint and Wait for Endpoint to Restart. When you are running the plan, you target the first node or group of nodes.
  5. Add a fourth step to the plan, selecting Server Automation Task ID 129 Post Patching task for Microsoft Server Clusters (Server 2008 R2 onwards including Hyper-V Clusters). When you are running the plan, you target the first node or group of nodes.
    This Task performs the following functions:
    1. Resumes the node.
    2. If the cluster detail file is found on the targeted endpoints, the Task checks this file and moves any virtual machines and groups back on to the node if they were moved off the node during patching. The Task then brings back online any empty groups or groups which had the target node as the only possible owner of the group before patching.
  6. Add four more steps to the plan, to repeat the automation flow completed in steps 2 to 5 for the remaining nodes in the cluster. For each of these steps, you must target the second node or group of nodes in the cluster.
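
The resulting plan has eight steps: the same four actions run against the first node group and then against the second. The following Python sketch lays out that structure so that you can review the order and targeting before you build the plan in the console. It is a minimal illustration under stated assumptions: the `PlanStep` class, the `build_plan` function, and the group labels are hypothetical, the Task names are shortened from the procedure above, and nothing here calls a Server Automation API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanStep:
    number: int   # position of the step in the automation plan
    action: str   # Task, Fixlet, or Baseline to run
    target: str   # node or group of nodes to target


# Task IDs and names as given in the procedure above (names shortened).
PRE_PATCH = "Task 138 Pre Patching Task For Non Hyper-V Clustered Microsoft Servers"
RESTART = "Task 126 Restart Endpoint and Wait for Endpoint to Restart"
POST_PATCH = "Task 129 Post Patching task for Microsoft Server Clusters"


def build_plan(patch_action, groups=("first node group", "second node group")):
    """Return the ordered steps: pre-patch, patch, restart, post-patch per group."""
    steps = []
    for group in groups:
        for action in (PRE_PATCH, patch_action, RESTART, POST_PATCH):
            steps.append(PlanStep(len(steps) + 1, action, group))
    return steps


if __name__ == "__main__":
    for step in build_plan("<patch Fixlet, Task, or Baseline>"):
        print(f"{step.number}. {step.action} -> {step.target}")
```

Plan steps 1 to 4 in the output correspond to procedure steps 2 to 5, and plan steps 5 to 8 correspond to the four additional steps added in procedure step 6.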