Determining the number and placement of replicas in a cluster

There are two major reasons to create a replica for a database in a cluster -- to provide constant availability of the data and to distribute the workload between multiple servers. Before you create replicas in a cluster, consider how frequently users access a database and their need for data redundancy. If a database is extremely busy or its availability is extremely important, create multiple replicas and locate them on your most reliable servers. For databases that are not very busy and whose constant availability is not important, you may not want to create any replicas at all. A server log file, for example, does not need to have a replica on another server.

The more replicas of a database, the more accessible the data. Creating too many replicas, however, can add unnecessarily to the overhead of maintaining a system and affect performance. As you plan your cluster strategy, try to create a balance between your users' requirements for data availability and the physical ability of each server in your cluster to manage additional workload. More than three replicas of a database may not provide you with significant incremental availability. If users can adequately access a database from one or two servers, do not increase the number of replicas in the cluster.

When users require the constant availability of a specific database, consider placing replicas on every server in the cluster if you have adequate disk space and resources.

In addition, try to distribute the busiest databases to different servers so that no server contains too many busy databases. If the servers in the cluster all have a similar amount of processing power, you can have an equal load on each server, including the processing power reserved for failover. If a server has significantly more or less processing power than the other servers, consider changing the number of databases on the server and the number of databases that can fail over to the server. Also, distribute mail files across a cluster, or set up separate servers or separate clusters for mail.
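The balancing idea above can be sketched as a simple greedy assignment. This is an illustration only, not a Domino feature: the function name, the server names, the relative power ratings, and the per-database load figures are all hypothetical, and the sketch does not model the processing power you would reserve for failover.

```python
def distribute_databases(db_loads, server_power):
    """Assign each database to the server with the lowest load relative to its power.

    db_loads: dict of database name -> estimated load (hypothetical units)
    server_power: dict of server name -> relative processing power
    """
    assigned = {server: [] for server in server_power}
    load = {server: 0.0 for server in server_power}
    # Place the busiest databases first so they tend to land on different servers.
    for db, db_load in sorted(db_loads.items(), key=lambda item: (-item[1], item[0])):
        # Pick the server whose load-to-power ratio would be lowest after placement;
        # ties are broken by server name so the result is deterministic.
        target = min(
            server_power,
            key=lambda s: ((load[s] + db_load) / server_power[s], s),
        )
        assigned[target].append(db)
        load[target] += db_load
    return assigned


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    db_loads = {
        "Product Discussion": 600,
        "Sales Tracking": 200,
        "Classified Ads": 50,
        "Company Research": 20,
    }
    server_power = {"ServerA": 2.0, "ServerB": 1.0}
    for server, dbs in distribute_databases(db_loads, server_power).items():
        print(server, dbs)
```

With these made-up numbers, the busiest database gets the more powerful server to itself and the three quieter databases share the other one. In practice you would derive the load estimates from your own cluster statistics.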

Because busy databases in a cluster can create a lot of replication events, it is a good idea to install these replicas on the fastest disk hardware available in the cluster. If possible, place these replicas where other processes are not in contention -- for example, on a partition other than the one that contains the operating system swap file.

To view which databases and replicas already exist in the cluster, open the Cluster Database Directory (CLDBDIR.NSF). It contains one document for each database and replica in the cluster, and each document stores information about that database or replica.

Note: Selective replication formulas work differently in a cluster.

How many replicas to create

The following list describes some factors to consider when determining how many replicas to create.

  • The number of replicas of a database you create depends on how important the availability of that database is and the amount of use the database receives.
  • You should create at least one replica of a database for which you want data redundancy. If a database becomes unavailable, users can then fail over to the replica.
  • If you want to be sure that a database is available at all times, you can create more than one replica. The more important availability is, the more replicas you should create. Add multiple replicas for very important databases only, because unneeded replicas consume cluster and network resources.
  • For most databases, a single replica is adequate. Rarely are more than three replicas needed, unless a database is truly mission-critical.
  • Consider the power and bandwidth of your system when creating replicas. The busier a database is, the more network traffic and processing power it takes to keep replicas updated. If you have systems with limited power and bandwidth, you may want to create fewer replicas of busy databases than you would if you had more power and bandwidth, or you may want to add more processors and other resources to the servers. In a cluster with limited resources, creating replicas of busy databases can be counterproductive because of the additional resources needed for cluster replication. (Clustering is not a solution for inadequate resources.) The less busy a database is, however, the less overhead it takes to keep that database updated.
  • If you aren't sure how many replicas to create, start with one and track the cluster statistics. If the statistics show that the server becomes unavailable or that performance becomes a problem, increasing the number of replicas may solve the problem.
  • Do not create replicas of databases for which neither availability nor workload balancing is a goal.

Analyzing databases to determine the number of replicas

There are many factors to consider when deciding how many replicas to create. Some factors suggest creating more replicas, and some suggest creating fewer replicas. The following list describes those factors and how they might affect your cluster traffic and performance.

Prior to distributing databases in a cluster, it can be helpful to create a table of information about the databases and the cluster hardware. You can use the table to determine how important specific databases are and how adequate your resources are. You can include some or all of the following:

  • Titles of the databases

    This identifies each database.

  • Size of each database

    Large databases consume a lot of disk space. Depending on your disk capacity, you may want to create fewer replicas of larger databases to preserve disk space.

  • Number and distribution of database users

    If you have a large number of users, they will probably experience better performance if usage is spread across multiple servers. This requires multiple replicas. If the number of users is small, they probably won't notice a performance improvement from additional replicas.

  • How often user transactions take place

    If the transaction rate is high, creating multiple replicas may improve performance.

    To find out the rate of activity for a database, look in the HCL Notes® log file.

  • Expected volume of new data

    If you expect a large amount of new data in the database, additional replicas may slow down performance because cluster replication will cause a lot of additional traffic. If you have powerful servers and a lot of bandwidth, this may not create a problem.

  • Capacity of HCL Domino® server hardware

    The more powerful the servers and the more disk space they have, the more active replicas you can create without significantly affecting performance.

  • Type of network connections between servers

    Cluster replication can create a bottleneck on a network that does not have enough bandwidth. Therefore, the greater the bandwidth, the more replicas you can create.

  • How critical the database is to the functioning of your business

    For databases that are mission-critical, you should create multiple replicas. For databases where availability is less important, create fewer replicas or none at all.
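One way to keep such a planning table is as a small set of records that you can query. The sketch below is an assumption about how you might organize the information, not something Domino provides: the `DatabaseProfile` class and its field names are hypothetical. It treats every replica as a full copy of the database and ignores future growth.

```python
from dataclasses import dataclass


@dataclass
class DatabaseProfile:
    title: str
    size_gb: float
    max_concurrent_users: int
    replicas: int  # additional copies beyond the original database

    def total_disk_gb(self) -> float:
        """Disk used cluster-wide: the original plus each full-copy replica."""
        return self.size_gb * (1 + self.replicas)

    def users_per_copy(self) -> float:
        """Peak users per copy if the load were spread perfectly evenly."""
        return self.max_concurrent_users / (1 + self.replicas)


if __name__ == "__main__":
    # Hypothetical entry for illustration only.
    profile = DatabaseProfile("Example Database", size_gb=4,
                              max_concurrent_users=600, replicas=2)
    print(profile.title, profile.total_disk_gb(), profile.users_per_copy())
```

Figures like these make the trade-off visible: each added replica lowers the peak load per copy but raises the total disk space the cluster must provide.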

Example table

This table helps identify which databases require high availability, which databases are busiest, and how much additional disk space you will need in the future. In this example, two databases are very important and are growing rapidly. Make sure that there are enough replicas of these databases so that they are always available, and that there is adequate disk space for growth on every server that contains a replica of these databases. One database is of medium importance, not growing as quickly, and not very active. Provide no more than one replica of this database, unless it would affect your business negatively if the database were unavailable for a while. One database is not very important and does not require a replica in the cluster.

The number of concurrent users helps you determine the need for workload balancing.

The following table uses a subset of the preceding information to determine the number of replicas needed.

Table 1. Sample table of organization-specific database information

| Database Title | Size | Maximum Concurrent Users | Transaction Rate | Growth Rate | Need for Availability | Suggested Number of Replicas |
|---|---|---|---|---|---|---|
| Product Discussion | 4 GB | 600 | High | High | High | 2 |
| Sales Tracking | 1 GB | 200 | Medium | High | Critical | 2 or more |
| Company Research | 2 GB | 20 | Low | Medium | Medium | 0 or 1 |
| Classified Ads | 1 GB | 50 | Medium | Medium | Low | 0 |
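
In the sample table, the suggested number of replicas tracks the "Need for Availability" column closely. A minimal sketch of that mapping follows, assuming availability is the only input. That is a deliberate simplification, since the guidance above also weighs transaction rate, growth, and hardware capacity. The names `SUGGESTED_REPLICAS`, `SAMPLE_DATABASES`, and `suggest_replicas` are hypothetical.

```python
# Mapping read off the sample table; adjust it to your own organization's needs.
SUGGESTED_REPLICAS = {
    "Critical": "2 or more",
    "High": "2",
    "Medium": "0 or 1",
    "Low": "0",
}

# Rows from the sample table:
# (title, size in GB, max concurrent users, transaction rate, growth rate, need for availability)
SAMPLE_DATABASES = [
    ("Product Discussion", 4, 600, "High", "High", "High"),
    ("Sales Tracking", 1, 200, "Medium", "High", "Critical"),
    ("Company Research", 2, 20, "Low", "Medium", "Medium"),
    ("Classified Ads", 1, 50, "Medium", "Medium", "Low"),
]


def suggest_replicas(need_for_availability: str) -> str:
    """Return the suggested replica count for an availability rating."""
    return SUGGESTED_REPLICAS[need_for_availability]


if __name__ == "__main__":
    for row in SAMPLE_DATABASES:
        title, need = row[0], row[5]
        print(f"{title}: {suggest_replicas(need)}")
```

Treat the result as a starting point only, then track the cluster statistics and adjust the number of replicas as described earlier.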