Docker performance tuning

The following performance tuning information can be used to help you configure your containerized HCL Commerce environments for optimal performance.

In HCL Commerce Version 9.0, the application servers (Store, Transaction, Search) run within Docker containers. These containers are often managed by some form of Docker orchestration system, such as Kubernetes (K8S) or IBM Cloud Private.

Previously, HCL Commerce application servers ran on physical machines (bare metal servers) or on virtual machines, such as LPAR, VMware, and Xen. Docker adds another level of virtualization, but its implementation differs from traditional virtualization, which affects how you tune for performance.

The following recommendations are based on internal testing results. Your production environment might differ from the environments that were used for internal testing. Always monitor your production environment and adjust your performance tuning accordingly.

Docker container resource configuration

Docker containers include the following general computing resources:
  • CPU
  • Memory
  • Network I/O
  • Disk I/O
In general, Network I/O and Disk I/O for the application Docker containers are not limited. Limits are placed only on the CPU and Memory usage of the application containers. Limiting Memory is more straightforward because it uses a single sizing parameter, whereas limiting CPU is more complex.

Comparison between two different CPU resource settings

Docker provides several different approaches to control CPU resources. For more information about controlling CPU resources, see Docker runtime constraints on resources.

CPUset (CPU binding) and CPU-quota are two commonly used tuning approaches that are simple to understand. Current Linux-based Docker hosts often have multiple virtual CPUs. CPUset binds the processes of a Docker container to specific virtual CPUs, so the upper limit of the CPU resource that the container can use is set by the number of virtual CPUs assigned to it. In production Docker environments, the container might not reach that limit because other containers can compete with it for the same virtual CPUs.

CPU-quota is a different, time-based approach. It controls the total CPU time that a specific Docker container can use, but the container can run on any of the virtual CPUs on the host.

At first glance, the two approaches might appear equivalent. However, when the host has many virtual CPUs, there can be significant performance differences between them, because switching Docker processes among different virtual CPUs has its own cost. The bigger the gap between the CPU-quota number and the total number of virtual CPUs, the more likely it is that a Docker process needs to switch among different virtual CPUs.
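The difference between the two approaches can be illustrated with the options that each one adds to a docker run command. The following Python sketch only builds argument lists and does not start any container. The helper names are hypothetical, and the 100000 microsecond period is assumed to be the Docker default; --cpuset-cpus, --cpu-period, and --cpu-quota are standard docker run options.

```python
from typing import List


def cpuset_args(first_cpu: int, count: int) -> List[str]:
    """Options that bind a container to `count` consecutive virtual CPUs.

    Hypothetical helper: the container can run only on the listed CPUs.
    """
    if first_cpu < 0 or count < 1:
        raise ValueError("first_cpu must be >= 0 and count must be >= 1")
    last_cpu = first_cpu + count - 1
    spec = str(first_cpu) if count == 1 else f"{first_cpu}-{last_cpu}"
    return ["--cpuset-cpus", spec]


def cpu_quota_args(cpus: float, period_us: int = 100_000) -> List[str]:
    """Options that cap total CPU time at `cpus` CPUs' worth per period.

    Hypothetical helper: the container may run on any virtual CPU, but its
    combined CPU time per period cannot exceed the quota.
    """
    if cpus <= 0 or period_us <= 0:
        raise ValueError("cpus and period_us must be positive")
    quota_us = int(round(cpus * period_us))
    return ["--cpu-period", str(period_us), "--cpu-quota", str(quota_us)]


# Same nominal capacity (6 virtual CPUs), expressed in the two ways.
print(cpuset_args(0, 6))
print(cpu_quota_args(6))
```

On a host with 24 virtual CPUs, the first option list keeps the container on CPUs 0 through 5, while the second lets it run on any of the 24 CPUs within the same overall time budget.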

Internal testing conducted a comparison between the two approaches for limiting CPU resources. The test used a bare metal server with 24 virtual CPUs. Three test cases were performed with the three different HCL Commerce application servers: Store, Transaction, and Search. For each test, one Docker container was used for the target application tier. The environment was also configured so the only possible bottleneck was the CPU resource for the target application tier. In other words, all other potential bottlenecks were removed.

Internal testing revealed the same pattern for all three servers:
  • In general, CPUset showed better performance than the CPU-quota approach.
  • The optimal number of virtual CPUs is 6 to 8. Too few (4 or fewer) or too many (10 or more) virtual CPUs might negatively impact performance.

Recommendation for CPU resource setting

In general, CPUset is a safe approach to limit the CPU resources for HCL Commerce Docker containers. This approach can be used when you run or manage your containers with docker run or docker-compose. Unfortunately, CPUset limits your ability to move containers freely among different Docker hosts, which can constrain the design of a Docker orchestration system. As a result, most Docker orchestration systems support only the CPU-quota approach to control CPU usage, and there is often a default CPU limit for each application container. For example, the CPU resource limit is 2 virtual CPUs in K8S/IBM Cloud Private systems. This limit should not be a problem if the Docker host has only a few virtual CPUs. However, if the Docker host has many virtual CPUs, consider changing the CPU limit to 6 to 8 virtual CPUs for application containers.
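As a rough illustration of raising the CPU limit in an orchestrated environment, the following Python sketch builds the resources fragment of a Kubernetes container specification and checks a value against the 6 to 8 range from internal testing. The function and constant names are hypothetical, and how the fragment is applied to your deployment depends on your own charts or manifests.

```python
# Range of virtual CPUs that performed best in the internal testing
# described above. The constant names are hypothetical.
TESTED_MIN_CPUS = 6
TESTED_MAX_CPUS = 8


def in_tested_range(cpus: int) -> bool:
    """Return True if `cpus` falls inside the 6 to 8 range from testing."""
    return TESTED_MIN_CPUS <= cpus <= TESTED_MAX_CPUS


def cpu_limit_fragment(cpus: int) -> dict:
    """Build a Kubernetes-style resources fragment with a CPU limit.

    The nesting (resources -> limits -> cpu) follows the Kubernetes
    container spec; the CPU quantity is written as a string.
    """
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return {"resources": {"limits": {"cpu": str(cpus)}}}


print(in_tested_range(2))
print(cpu_limit_fragment(6))
```

A value of 2 falls outside the tested range, which matches the suggestion to raise the limit on hosts that have many virtual CPUs.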

Recommendation for Memory resource setting

HCL Commerce application servers are all Java programs. With the introduction of containerization, the JVM heap size is easily confused with the Docker process memory size. A Java program requires two parts of memory: the JVM heap, and the native heap, which is the memory that is used by the JVM management program rather than by Java application code. The total physical memory that an HCL Commerce Docker container requires is the sum of the two parts.

Currently, HCL Commerce application servers require a JVM heap size of 2 GB to 4 GB. For more information, see HCL Commerce default tunables.

The default memory resource limit for application containers in K8S/IBM Cloud Private systems is 4 GB. When the HCL Commerce application JVM heap nears 4 GB, the total memory size exceeds 4 GB because additional space is required for the native heap. As a result, the container reaches its memory cap and is terminated.

The correct solution in this case is to add an extra buffer for the native heap. In most cases, a 2 GB buffer is sufficient. If the maximum JVM heap size is 4 GB, set the resource limit for the application container to 6 GB. If the maximum JVM heap size is larger than 4 GB, increase the Docker memory limit accordingly. As with all performance tuning, performance testing is required to determine these values.
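The sizing rule above is simple arithmetic: container memory limit = maximum JVM heap + native heap buffer. The following Python sketch applies it with the 2 GB buffer as the default and renders the result as a docker run --memory option. The function names are hypothetical, and the buffer is a starting point to confirm through performance testing, not a fixed value.

```python
from typing import List

# Starting-point buffer for the native heap, as suggested above.
DEFAULT_NATIVE_BUFFER_GB = 2


def container_memory_limit_gb(
    max_heap_gb: int, native_buffer_gb: int = DEFAULT_NATIVE_BUFFER_GB
) -> int:
    """Return the container memory limit: maximum JVM heap plus buffer."""
    if max_heap_gb <= 0 or native_buffer_gb < 0:
        raise ValueError("heap must be positive and buffer non-negative")
    return max_heap_gb + native_buffer_gb


def docker_memory_args(
    max_heap_gb: int, native_buffer_gb: int = DEFAULT_NATIVE_BUFFER_GB
) -> List[str]:
    """Render the limit as a docker run --memory option (value in GB)."""
    limit_gb = container_memory_limit_gb(max_heap_gb, native_buffer_gb)
    return ["--memory", f"{limit_gb}g"]


# A 4 GB maximum heap with the default buffer gives a 6 GB container limit.
print(container_memory_limit_gb(4))
print(docker_memory_args(4))
```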

Note: All the previous CPU/Memory settings are specific to production or performance test environments. For authoring or functional test environments, the system is often underutilized, so the CPU/Memory limit can be set to a smaller value; for example, 2 virtual CPUs and 4 GB of memory.