Sizing Kubernetes for a production-grade cluster

This topic offers best practices for sizing Kubernetes for a production-grade, high-availability cluster.

For production and high availability, we advise the following:
  • At least three master nodes
  • At least three non-infrastructure worker nodes
  • At least three exclusively infrastructure worker nodes
Note: Three nodes of each type is the minimum that provides "by the book" high availability of all services.

Preconfigured Component Pack CPU and memory limits

To better understand how Kubernetes manages its resources and what Component Pack requires, see the Kubernetes documentation on managing resources for containers, and refer to the following tables:

Table 1. Limits for application containers
Application Container | Limits | Requests
analysisservice | CPU: 500m, memory: 1Gi | CPU: 50m, memory: 100Mi
appregistry-client | CPU: 500m, memory: 400Mi | CPU: 50m, memory: 75Mi
appregistry-service | CPU: 500m, memory: 500Mi | CPU: 100m, memory: 150Mi
cnx-ingress-controller | CPU: 500m, memory: 512Mi | CPU: 20m, memory: 64Mi
community-suggestions | CPU: 500m, memory: 400Mi | CPU: 50m, memory: 75Mi
haproxy | CPU: 500m, memory: 200Mi | CPU: 50m, memory: 50Mi
indexingservice | CPU: 500m, memory: 1Gi | CPU: 200m, memory: 100Mi
itm-services | CPU: 1, memory: 500Mi | CPU: 100m, memory: 75Mi
mail-service | CPU: 500m, memory: 500Mi | CPU: 50m, memory: 75Mi
middleware-graphql | CPU: 1, memory: 500Mi | CPU: 100m, memory: 75Mi
mw-proxy | CPU: 500m, memory: 400Mi | CPU: 50m, memory: 75Mi
orient-web-client | CPU: 1, memory: 1Gi | CPU: 100m, memory: 75Mi
people-idmapping | CPU: 500m, memory: 400Mi | CPU: 50m, memory: 75Mi
people-migrate | CPU: 1, memory: 1000Mi | CPU: 100m, memory: 75Mi
people-relation | CPU: 500m, memory: 400Mi | CPU: 50m, memory: 75Mi
people-scoring | CPU: 500m, memory: 1500Mi | CPU: 50m, memory: 75Mi
retrieval-service | CPU: 500m, memory: 1Gi | CPU: 200m, memory: 100Mi
userprefs-service | CPU: 500m, memory: 400Mi | CPU: 50m, memory: 75Mi
Table 2. Limits for infrastructure containers
Infrastructure Container | Limits | Requests
es-client | CPU: 2, memory: 2Gi | CPU: 100m, memory: 1536Mi
es-data | CPU: 2, memory: 4Gi | CPU: 500m, memory: 3Gi
es-master | CPU: 1, memory: 1Gi | CPU: 100m, memory: 768Mi
filebeat | CPU: 2, memory: 2Gi | CPU: 500m, memory: 512Mi
kibana | CPU: 3, memory: 4Gi | CPU: 1, memory: 1Gi
logstash | CPU: 3, memory: 8Gi | CPU: 500m, memory: 400Mi
mongo | CPU: 2, memory: 3096Mi | CPU: 100m, memory: 100Mi
redis-sentinel | CPU: 500m, memory: 100Mi | CPU: 10m, memory: 50Mi
redis-server | CPU: 1, memory: 1Gi | CPU: 50m, memory: 75Mi
sanity | CPU: 100m, memory: 512Mi | CPU: 100m, memory: 128Mi
sanity-watcher | CPU: 500m, memory: 100Mi | CPU: 10m, memory: 50Mi
solr | CPU: 2, memory: 4Gi | CPU: 20m, memory: 600Mi
zookeeper | CPU: 500m, memory: 400Mi | CPU: 10m, memory: 300Mi
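
These limits and requests map directly onto the standard resources block of a Kubernetes container spec. The following minimal sketch shows how the analysisservice row of Table 1 would be expressed; the pod metadata and image reference are illustrative only, not taken from an actual Component Pack chart:

    apiVersion: v1
    kind: Pod
    metadata:
      name: analysisservice-example   # illustrative name, not a real chart object
    spec:
      containers:
        - name: analysisservice
          image: analysisservice:latest   # placeholder image reference
          resources:
            requests:
              cpu: 50m          # guaranteed share, from Table 1
              memory: 100Mi
            limits:
              cpu: 500m         # burst ceiling; the container is throttled past this
              memory: 1Gi       # exceeding this gets the container OOM-killed

To confirm the values actually applied to a running pod, use kubectl describe pod <pod-name>.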

Sizing the masters

The following recommendation is in line with best practices and the official Kubernetes guidance. Note that master sizing is a function of the total number of nodes in the cluster and of the number of users and requests the cluster will serve: more active users mean more requests, and more requests mean more processing for each master.

For the optimal production scenario, we recommend at least three masters.

Note: Before you create your environment, be sure to provision your cluster so that it can scale, allowing you to add more nodes of any type later (see the sketch after the following table).
Maximum Number of Nodes in Cluster | Resource Requirements | AWS Equivalent
Up to 100 nodes (per the Kubernetes documentation) | 4 CPUs, 16G of RAM, 100G of disk space | m4.xlarge
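
If the cluster was provisioned with kubeadm (an assumption; other provisioning tools have their own join mechanisms), adding a node later is straightforward: generate a fresh join token on a master, then run the printed command on the new machine.

    # On an existing master: print a join command with a fresh token
    kubeadm token create --print-join-command
    # On the new node: run the command printed above
    # Back on a master: confirm that the new node has registered
    kubectl get nodes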

Sizing the workers

Sizing any type of worker is a function of what you are going to run on it, namely the sum of the limits of all the containers it will host.
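
To see what that sum looks like on a live cluster, kubectl reports the committed totals per node; the Allocated resources section of the output shows the aggregate CPU and memory requests and limits of everything scheduled there:

    # <node-name> is a placeholder; list node names with: kubectl get nodes
    kubectl describe node <node-name>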

To run all the services shipped with Component Pack, we suggest at least three workers (each running one replica of each pod) with at least 8 cores and 32G of RAM each (the AWS equivalent is an m4.2xlarge instance). Remember that this sizing covers the scenario in which everything runs at 100% of its limits, not merely the ability to start the services with no load.

However, if you start noticing performance issues tied to resource usage, try creating another three infrastructure workers, which will automatically take over the load of everything tagged to run on infrastructure workers.
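
Pods land on infrastructure workers because those nodes carry a distinguishing tag, which in Kubernetes terms is a node label matched by the pods' node selectors. The label key and value below are illustrative only; check your Component Pack charts for the selector they actually use:

    # Tag a new node as an infrastructure worker (illustrative key/value)
    kubectl label nodes infra-worker-4 type=infrastructure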

Sizing the storage

Persistent volumes are a firm requirement for Component Pack, but even without them, nodes need disk space for caching images and maintaining normal system operation.

For normal system operation, it is best for each master to have at least 100G of dedicated disk space, and for each worker at least 150G of dedicated disk space.

For persistent volume storage (used by Elasticsearch, Customizer, MongoDB, Solr, and ZooKeeper), we suggest at least 200G of storage.
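
As a rough illustration, a persistent volume of that size backed by an NFS export could be declared as follows; the volume name, server, and path are placeholders, and your storage backend or provisioner may differ:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: connections-pv            # placeholder name
    spec:
      capacity:
        storage: 200Gi                # the suggested minimum
      accessModes:
        - ReadWriteMany
      persistentVolumeReclaimPolicy: Retain
      nfs:
        server: nfs.example.com       # placeholder NFS server
        path: /exports/connections    # placeholder export path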