ICp high-availability (HA) requirements and configuration prerequisites

Review system requirements before configuring IBM® Cloud Private for a high-availability (HA) deployment.

Prerequisites for installing for high availability

High availability deployment is available only with IBM® Cloud Private Enterprise Edition, which is available from IBM® Passport Advantage®. Download both of the following files for your system; they are used during the HA installation of IBM® Cloud Private.
  • IBM® Cloud Private 1.2.1 Installer (CNM6SEN)
  • IBM® Cloud Private 1.2.1 for Linux® (64-bit) Docker English (CNM6TEN)

Installing for high availability

High availability support is now available for IBM® Cloud Private. It allows you to set up multiple IBM® Cloud Private master and proxy nodes.
CAUTION: An HA deployment of n master nodes can tolerate the failure of at most (n-1)/2 of them. For example, a deployment with 3 master nodes can tolerate losing only 1 master node.
Follow these additional requirements if you are installing for high availability:
  • You must have the following two files downloaded or available from a URL before installing:
    • ibm-cloud-private-installer-1.2.1.tar.gz
    • ibm-cloud-private-x86_64-1.2.1.tar.gz
  • Specify three master nodes and three proxy nodes.
  • The boot node and one master node must be on the same server.
  • Host names for these servers must be lowercase. A good practice is to name them after the role they play in the deployment, plus the number of the machine. You can then assign the floating IP a host name that matches the role. For example, name your servers master1.hostname.com, master2.hostname.com, and master3.hostname.com, and name the floating IP master.hostname.com.
  • All machines of the same type must be on the same subnetwork. That is, all master nodes and the floating master IP address must be on the same subnet, and all proxy nodes and the floating proxy IP address must be on the same subnet.
    Note: If the proxy and master servers are not co-located, they can be on different subnets. If they are co-located, they must all be on the same subnet, including the floating IP addresses.
  • The worker nodes can be on any subnet as long as they can access the persistent storage, but it is good practice to keep them on the same subnet and on a network with a fast connection to the persistent storage.
To check that the machines are on the same subnet, run the ip address command and verify that the public network prefixes match; for example, they are all in 9.162.155.xxx. Then, verify that the gateways and netmasks also match.
Note: The network interface name must also be consistent across all master nodes and across all proxy nodes. The output of the ip address command on each server should list the same network interface name, for example, eth0. A quick check is sketched after this note.
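The following is a minimal sketch of such a check, assuming three master nodes named master1, master2, and master3 that are reachable over SSH as root; the host names, subnet, and interface name are illustrative only:

    # Print the addresses and default gateway for each master and compare them.
    for host in master1 master2 master3; do
      echo "== $host =="
      ssh root@$host "ip -4 address show | grep -w inet"    # prefixes should match, for example 9.162.155.xxx/24
      ssh root@$host "ip route show default"                # gateways should also match
    done

All masters should report addresses in the same subnet, the same default gateway, and the same network interface name, such as eth0.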

For high availability to be supported, specify the high availability parameters that are described in the Installing IBM® Cloud Private topic.

CARP: High Availability heartbeat

For high availability to work, CARP is needed to provide a heartbeat between the master servers and between the proxy servers. If a node in a cluster does not receive a heartbeat, it assumes that it must be the active node in the cluster. To support CARP, Promiscuous Mode, MAC Address Changes, and Forged Transmits must all be enabled on the port group that hosts the subnets of the servers. For more details on CARP, consult this article.
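As an optional diagnostic, you can confirm that the heartbeat traffic reaches each node by capturing VRRP packets (IP protocol 112) on the primary interface. This is only a sketch; the interface name eth0 is an assumption:

    # Watch for VRRP/CARP heartbeat advertisements on the primary interface (Ctrl+C to stop).
    sudo tcpdump -i eth0 -n 'ip proto 112'

If no heartbeat packets appear on a node while the cluster is running, review the port group settings described above and the subnet configuration.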

ICp High Availability (HA) - Requirements

Your systems must meet the following requirements to deploy ICp HA.

Boot node

One boot node, co-located with a chosen master node.

Master nodes

Multiple master nodes are required in a high availability (HA) environment to allow for failover if the leading master host fails. Hosts that can act as the master are called master candidates. Three or five master nodes are required to support an HA environment.

Worker nodes

A worker node is a node that provides a containerized environment for running tasks.

Proxy nodes

A proxy node is a node that transmits external requests to the services created inside your cluster. Multiple proxy nodes are deployed in a high availability (HA) environment to allow for failover if the leading proxy host fails. Proxy nodes are co-located with the worker nodes. Three or five proxy nodes are required to support an HA environment.

Shared Storage / Image Repository
  • Each master node needs access to persistent volumes.
  • The Docker image repository needs to be available on all nodes.

Network Requirements

The following are the network requirements for high availability.

  • The master nodes (minimum 3) must be on the same network subnet.
  • The proxy nodes (minimum 3) must be on the same network subnet.
  • The master and proxy clusters do not need to be on the same subnet unless master and proxy nodes are co-located.
  • Two additional IP addresses must be allocated; these serve as floating virtual IPs (VIPs) for whichever master or proxy node is the active member of its cluster.
    • Allocate one IP for the master cluster, and it must be on the same subnet as the master cluster.
    • Allocate another IP for the proxy cluster which also must be on the same subnet as the proxy cluster.
    Note: The host names associated with these IPs do not matter, but a naming convention is convenient. If the master nodes have host names master1, master2, and master3, then a logical host name for the master VIP is simply master. An example /etc/hosts sketch follows this list.

  • The primary network interface on the master nodes and proxy nodes must be uniform across all nodes (eth0 by default).
  • CARP is used as a heartbeat to detect whether any node of a master or proxy cluster is up or down. If a node in a cluster does not receive the heartbeat, it assumes that it must become the active member of the cluster. CARP uses VRRP for this heartbeat.
    Note: In virtualized environments, the following settings must be set to Accept for the port group assigned to the subnet hosting the virtual machines in the cluster:
    • Promiscuous Mode
    • MAC Address Changes
    • Forged Transmits
    For more information, contact your VMware administrator or review this article.
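As a convenience, the naming convention for the VIPs and node host names can be recorded in DNS or in /etc/hosts. The following /etc/hosts sketch is illustrative only; every address and host name in it is an assumption that you replace with your own values:

    # Example /etc/hosts entries for a co-located HA cluster (illustrative addresses on one subnet).
    9.162.155.11   master1.hostname.com   master1
    9.162.155.12   master2.hostname.com   master2
    9.162.155.13   master3.hostname.com   master3
    9.162.155.10   master.hostname.com    master    # floating VIP for the active master
    9.162.155.21   proxy1.hostname.com    proxy1
    9.162.155.22   proxy2.hostname.com    proxy2
    9.162.155.23   proxy3.hostname.com    proxy3
    9.162.155.20   proxy.hostname.com     proxy     # floating VIP for the active proxy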

Image Registry in an HA environment

For HA, you must set up shared storage across your master nodes.

Important: Perform this step before installing ICp.
The following directories must be mounted on this shared storage:
  • /var/lib/registry - this directory is used to store images in the private image repository. This shared images directory is needed so that these images are kept synchronized across all master nodes.
  • /var/lib/icp/audit - this directory is used to store audit logs. Audit logs are used for tracking and storing data that is related to your IBM® Cloud Private usage.

This can be done by using the --master_HA_mount_registry and --master_HA_mount_audit arguments when installing ICp (a hypothetical invocation is sketched below), or it can be done manually with the numbered steps that follow the sketch:
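The following is only an illustrative sketch of passing those arguments to the installer; the installer script name, its location, and the exact argument syntax shown here are assumptions, so check the Installing IBM® Cloud Private topic for the supported invocation:

    # Hypothetical invocation: the install.sh path and the argument format are assumptions.
    cd /<extractedFolder>/microservices/hybridcloud
    sudo ./install.sh \
      --master_HA_mount_registry <Host_name/IP of NFS Server>:/CFC_IMAGE_REPO \
      --master_HA_mount_audit <Host_name/IP of NFS Server>:/CFC_AUDIT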

  1. Provision a machine to act as an NFS Server.
  2. Copy the NFS image repository setup script from the extracted installation package on the boot node by running the following commands on the NFS server:
    sudo mkdir -p $HOME/nfsSetup
    cd $HOME/nfsSetup
    sudo scp root@<IP Address of Boot Node>:/<extractedFolder>/microservices/hybridcloud/doc/samples/nfsImageRepoHASetup.sh .
  3. Give the nfsImageRepoHASetup.sh script execution permission and run it to install and configure NFS:
    sudo chmod +x nfsImageRepoHASetup.sh
    sudo bash nfsImageRepoHASetup.sh
    This creates the directories /CFC_IMAGE_REPO and /CFC_AUDIT and exports them as NFS shares for the ICp master servers to consume. A rough sketch of an equivalent manual configuration follows these steps.
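For reference, the following is a rough sketch of an equivalent manual configuration on an RHEL-style NFS server; the package name, export options, and client subnet are assumptions, and the nfsImageRepoHASetup.sh script remains the supported method:

    # Create the export directories and publish them as NFS shares (illustrative only).
    sudo yum install -y nfs-utils
    sudo mkdir -p /CFC_IMAGE_REPO /CFC_AUDIT
    echo "/CFC_IMAGE_REPO <master subnet>/24(rw,sync,no_root_squash)" | sudo tee -a /etc/exports
    echo "/CFC_AUDIT <master subnet>/24(rw,sync,no_root_squash)" | sudo tee -a /etc/exports
    sudo systemctl enable --now nfs-server
    sudo exportfs -ra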

On each ICp Master in your HA environment, perform these steps to consume the NFS shares created above.

  1. Create the directories:
    sudo mkdir -p /var/lib/registry
    sudo mkdir -p /var/lib/icp/audit
    
  2. Append the following lines to the /etc/fstab file on each ICp Master:
    <Host_name/IP of NFS Server>:/CFC_IMAGE_REPO /var/lib/registry   nfs  rsize=8192,wsize=8192,timeo=14,intr  0 0
    <Host_name/IP of NFS Server>:/CFC_AUDIT /var/lib/icp/audit   nfs  rsize=8192,wsize=8192,timeo=14,intr  0 0
    These entries mount the /CFC_IMAGE_REPO and /CFC_AUDIT shares that are exported by the NFS server at /var/lib/registry and /var/lib/icp/audit.
  3. Enter this command to mount the shares:
    sudo mount -a
    This command must be run on all ICp Masters; a loop that runs it over SSH on every master is sketched after these steps.
  4. To verify that the share is mounted successfully, enter:
    df -k

    Confirm that <Hostname/IP of NFS Server>:/CFC_IMAGE_REPO is mounted on /var/lib/registry and that <Hostname/IP of NFS Server>:/CFC_AUDIT is mounted on /var/lib/icp/audit.

    For example:
    [root@<Hostname/IP of CFC Master> ~]# df -k
    Filesystem                  1K-blocks     Used Available Use% Mounted on
    /dev/mapper/rhel-root       102202368  4715088  97487280   5% /
    devtmpfs                      3981568        0   3981568   0% /dev
    tmpfs                         3997156      172   3996984   1% /dev/shm
    tmpfs                         3997156     9144   3988012   1% /run
    tmpfs                         3997156        0   3997156   0% /sys/fs/cgroup
    /dev/sda1                      508588   202692    305896  40% /boot
    tmpfs                          799432       12    799420   1% /run/user/0
    tmpfs                          799432       16    799416   1% /run/user/42
    <Hostname/IP of NFS Server>:/CFC_IMAGE_REPO 102202368 24478576  77723792  24% /var/lib/registry
    <Hostname/IP of NFS Server>:/CFC_AUDIT 102202368 24478576  77723792  24% /var/lib/icp/audit
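If you manage the masters from the boot node, a small SSH loop can run the mount and the verification on every ICp Master in one pass; the host names master1, master2, and master3 are assumptions:

    # Mount the fstab entries and confirm the NFS shares on every ICp Master (illustrative host names).
    for m in master1 master2 master3; do
      echo "== $m =="
      ssh root@$m 'mount -a && df -k | grep -E "CFC_IMAGE_REPO|CFC_AUDIT"'
    done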
    
    

Persistent Storage in an HA environment

In an HA configuration, best practice is to maintain persistent storage separately from the ICp Masters themselves, on a machine that all ICp Masters can access (persistent volumes are set up after ICp is installed).

Otherwise, if the persistent storage is deployed on an ICp Master that becomes unavailable, HA fails over to an elected ICp Master but can no longer access the persistent storage. For more information, see Setting up persistent volumes on a high availability deployment (NFS).
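For orientation only, the following is a minimal sketch of registering an NFS-backed persistent volume after ICp is installed; the server name storage.hostname.com, the export path /storage/pv001, and the capacity are illustrative assumptions, and the topic referenced above describes the supported procedure:

    # Save as pv-nfs-example.yaml, then run: kubectl create -f pv-nfs-example.yaml
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pv-nfs-example
    spec:
      capacity:
        storage: 10Gi
      accessModes:
        - ReadWriteMany
      persistentVolumeReclaimPolicy: Retain
      nfs:
        server: storage.hostname.com
        path: /storage/pv001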