Sizing Kubernetes for a production-grade cluster

This topic offers best practices for sizing Kubernetes for a production-grade, high-availability cluster.

For production and high availability, we advise the following:
  • At least three master nodes
  • At least three non-infrastructure worker nodes
  • At least three exclusively infrastructure worker nodes
Note: Three nodes of each type is the minimum that provides "by the book" high availability of all services.

Preconfigured Component Pack CPU and memory limits

To better understand how Kubernetes manages its resources and what Component Pack requires, see the Kubernetes documentation on managing resources for containers, and refer to the following tables:

Table 1. Limits for application containers
Application Container | Limits | Requests
analysisservice | CPU: 500m, memory: 1Gi | CPU: 50m, memory: 100Mi
appregistry-client | CPU: 500m, memory: 400Mi | CPU: 50m, memory: 75Mi
appregistry-service | CPU: 500m, memory: 500Mi | CPU: 100m, memory: 150Mi
cnx-ingress-controller | CPU: 500m, memory: 512Mi | CPU: 20m, memory: 64Mi
community-suggestions | CPU: 500m, memory: 400Mi | CPU: 50m, memory: 75Mi
haproxy | CPU: 500m, memory: 200Mi | CPU: 50m, memory: 50Mi
indexingservice | CPU: 500m, memory: 1Gi | CPU: 200m, memory: 100Mi
itm-services | CPU: 1, memory: 500Mi | CPU: 100m, memory: 75Mi
mail-service | CPU: 500m, memory: 500Mi | CPU: 50m, memory: 75Mi
middleware-graphql | CPU: 1, memory: 500Mi | CPU: 100m, memory: 75Mi
mw-proxy | CPU: 500m, memory: 400Mi | CPU: 50m, memory: 75Mi
orient-web-client | CPU: 1, memory: 1Gi | CPU: 100m, memory: 75Mi
people-idmapping | CPU: 500m, memory: 400Mi | CPU: 50m, memory: 75Mi
people-migrate | CPU: 1, memory: 1000Mi | CPU: 100m, memory: 75Mi
people-relation | CPU: 500m, memory: 400Mi | CPU: 50m, memory: 75Mi
people-scoring | CPU: 500m, memory: 1500Mi | CPU: 50m, memory: 75Mi
retrieval-service | CPU: 500m, memory: 1Gi | CPU: 200m, memory: 100Mi
userprefs-service | CPU: 500m, memory: 400Mi | CPU: 50m, memory: 75Mi
Table 2. Limits for infrastructure containers
Infrastructure Container | Limits | Requests
es-client | CPU: 2, memory: 2Gi | CPU: 100m, memory: 1536Mi
es-data | CPU: 2, memory: 4Gi | CPU: 500m, memory: 3Gi
es-master | CPU: 1, memory: 1Gi | CPU: 100m, memory: 768Mi
filebeat | CPU: 2, memory: 2Gi | CPU: 500m, memory: 512Mi
kibana | CPU: 3, memory: 4Gi | CPU: 1, memory: 1Gi
logstash | CPU: 3, memory: 8Gi | CPU: 500m, memory: 400Mi
mongo | CPU: 2, memory: 3096Mi | CPU: 100m, memory: 100Mi
redis-sentinel | CPU: 500m, memory: 100Mi | CPU: 10m, memory: 50Mi
redis-server | CPU: 1, memory: 1Gi | CPU: 50m, memory: 75Mi
sanity | CPU: 100m, memory: 512Mi | CPU: 100m, memory: 128Mi
sanity-watcher | CPU: 500m, memory: 100Mi | CPU: 10m, memory: 50Mi
solr | CPU: 2, memory: 4Gi | CPU: 20m, memory: 600Mi
zookeeper | CPU: 500m, memory: 400Mi | CPU: 10m, memory: 300Mi
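
These limits and requests map directly onto the standard resources block of a Kubernetes container spec. The following minimal sketch shows how the analysisservice row of Table 1 would be expressed; the pod metadata and image reference are illustrative only, not taken from an actual Component Pack chart:

    apiVersion: v1
    kind: Pod
    metadata:
      name: analysisservice-example   # illustrative name, not a real chart object
    spec:
      containers:
        - name: analysisservice
          image: analysisservice:latest   # placeholder image reference
          resources:
            requests:
              cpu: 50m          # guaranteed share, from Table 1
              memory: 100Mi
            limits:
              cpu: 500m         # burst ceiling; the container is throttled past this
              memory: 1Gi       # exceeding this gets the container OOM-killed

To confirm the values actually applied to a running pod, use kubectl describe pod <pod-name>.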

Sizing the masters

The following recommendation is in line with best practices and the official Kubernetes guidance. Note that master sizing is a function of the total number of nodes in the cluster and of the number of users and requests the cluster will serve: more active users mean more requests, and more requests mean more processing for each master.

For the optimal production scenario, we recommend at least three masters.

Note: Before you create your environment, be sure to provision your cluster so that it can scale, allowing you to add more nodes of any type later (see the sketch after the following table).
Maximum Number of Nodes in Cluster | Resource Requirements | AWS Equivalent
Up to 100 nodes (per the Kubernetes documentation) | 4 CPUs, 16G of RAM, 100G of disk space | m4.xlarge
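
If the cluster was provisioned with kubeadm (an assumption; other provisioning tools have their own join mechanisms), adding a node later is straightforward: generate a fresh join token on a master, then run the printed command on the new machine.

    # On an existing master: print a join command with a fresh token
    kubeadm token create --print-join-command
    # On the new node: run the command printed above
    # Back on a master: confirm that the new node has registered
    kubectl get nodes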

Sizing the workers

Sizing any type of worker is a function of what you are going to run on it, namely the sum of the limits of all the containers it will host.
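
To see what that sum looks like on a live cluster, kubectl reports the committed totals per node; the Allocated resources section of the output shows the aggregate CPU and memory requests and limits of everything scheduled there:

    # <node-name> is a placeholder; list node names with: kubectl get nodes
    kubectl describe node <node-name>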

To run all the services shipped with Component Pack, we suggest at least three workers (each running one replica of each pod) with at least 8 cores and 32G of RAM each (the AWS equivalent is an m4.2xlarge instance). Remember that this sizing covers the scenario in which everything runs at 100% of its limits, not merely the ability to start the services with no load.

However, if you start noticing performance issues tied to resource usage, try creating another three infrastructure workers, which will automatically take over the load of everything tagged to run on infrastructure workers.
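
Pods land on infrastructure workers because those nodes carry a distinguishing tag, which in Kubernetes terms is a node label matched by the pods' node selectors. The label key and value below are illustrative only; check your Component Pack charts for the selector they actually use:

    # Tag a new node as an infrastructure worker (illustrative key/value)
    kubectl label nodes infra-worker-4 type=infrastructure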

Sizing the storage

Persistent volumes are a firm requirement for Component Pack, but even without them, nodes need disk space for caching images and maintaining normal system operation.

For normal system operation, it is best for each master to have at least 100G of dedicated disk space, and for each worker at least 150G of dedicated disk space.

For persistent volume storage (used by Elasticsearch, Customizer, MongoDB, Solr, and ZooKeeper), we suggest at least 200G of storage.
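
As a rough illustration, a persistent volume of that size backed by an NFS export could be declared as follows; the volume name, server, and path are placeholders, and your storage backend or provisioner may differ:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: connections-pv            # placeholder name
    spec:
      capacity:
        storage: 200Gi                # the suggested minimum
      accessModes:
        - ReadWriteMany
      persistentVolumeReclaimPolicy: Retain
      nfs:
        server: nfs.example.com       # placeholder NFS server
        path: /exports/connections    # placeholder export path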