HCL Commerce Version or later

Elasticsearch-based search performance tuning part II

This Ingest Tuning Guide continues the previous guide, Elasticsearch-based search performance tuning, for the NiFi/Elasticsearch components of the HCL Commerce search solution. It helps you gain a broad understanding of tuning and discusses how to tune the solution for a particular setup and catalog structure.

Multiple factors influence the performance of the search index build, including the hardware footprint, the catalog size, and the richness and cardinality of the attribute dictionary data. To fine-tune the search solution, you need to understand where the bottlenecks are and how they show up across the whole process.

Background information

The index creation process consists of three key steps:
  1. Data retrieval.
  2. Data processing and transformation.
  3. Data uploading.
A set of predefined connectors is available. Each connector consists of multiple process groups that serve different purposes, and each process group often contains nested process groups that handle the data retrieval, processing, and uploading stages of the index creation process:
  • Retrieving data group: Fetches data from the database or Elasticsearch.
  • Processing data group: Builds, updates, or copies the index documents from the fetched data.
  • Uploading data group: Uploads the index documents to Elasticsearch.

Each group influences the speed and efficiency of the index building process. For example, the retrieving data group controls the flow file size (bucket size) and the query execution frequency (scroll page size). By adjusting these variables, you can optimize both the payload, which is the chunk of data that NiFi processes as a unit, and the cost of retrieving it from the database. The flow file size also affects Elasticsearch upload performance: large, complex structures take Elasticsearch longer to parse, which results in poor scalability.
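The trade-off between the two retrieval variables can be illustrated with simple arithmetic. The sketch below is a hypothetical model, not the connector's actual logic. It assumes that each query returns up to `scroll_page_size` rows and that each flow file carries up to `bucket_size` documents. The function name `retrieval_plan` is illustrative only.

```python
import math


def retrieval_plan(total_docs: int, scroll_page_size: int, bucket_size: int) -> dict:
    """Estimate query and flow file counts for one index build (hypothetical model).

    Assumes each query returns up to `scroll_page_size` rows and each flow
    file carries up to `bucket_size` documents; real connectors may differ.
    """
    if scroll_page_size <= 0 or bucket_size <= 0:
        raise ValueError("scroll_page_size and bucket_size must be positive")
    return {
        # A larger scroll page size means fewer, heavier queries.
        "queries": math.ceil(total_docs / scroll_page_size),
        # A larger bucket size means fewer, larger flow files (bigger upload payloads).
        "flow_files": math.ceil(total_docs / bucket_size),
    }


if __name__ == "__main__":
    for page, bucket in [(1_000, 100), (10_000, 500)]:
        print(page, bucket, retrieval_plan(1_000_000, page, bucket))
```

Raising either value reduces per-query or per-flow-file overhead but makes each unit heavier. This is why larger is not automatically better for Elasticsearch parsing.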

The processing data group controls the amount of work NiFi can do. For example, you can control a processor's thread count to regulate how many flow files it processes concurrently. This increases the processing speed of a typical processor and can reduce the pileup of flow files in front of it. The NLP processor is a typical processor that benefits significantly from additional threads. With the more specialized bulk update type of processor, you can control how many concurrent connections to Elasticsearch are made, which allows you to send more data to Elasticsearch.
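The idea of capping concurrent connections to Elasticsearch can be sketched with a thread pool whose size acts as the connection limit. This is a simulation under stated assumptions. It does not use a real Elasticsearch client, and `time.sleep` stands in for a bulk request. The names `ConnectionTracker` and `bulk_upload_all` are hypothetical. The tracker records the peak number of in-flight requests, which never exceeds the configured cap.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConnectionTracker:
    """Counts how many simulated bulk requests are in flight at the same time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self) -> "ConnectionTracker":
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc) -> bool:
        with self._lock:
            self.active -= 1
        return False


def bulk_upload_all(flow_files, max_connections: int, tracker: ConnectionTracker) -> int:
    """Send every flow file using at most `max_connections` concurrent requests."""

    def send(flow_file) -> int:
        with tracker:
            time.sleep(0.01)  # stand-in for one bulk request to Elasticsearch
            return len(flow_file)  # number of documents "uploaded"

    # The pool size is the concurrency cap: no more than max_connections sends run at once.
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        return sum(pool.map(send, flow_files))
```

For example, `bulk_upload_all([["doc"] * 3] * 10, 4, ConnectionTracker())` uploads 30 simulated documents through at most four simultaneous requests.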

These scenarios will be examined in further detail in Interpreting patterns and tuning search solution, using real-world examples and data.

Infrastructure requirements

The product's infrastructure requirements are well defined. NiFi and Elasticsearch can run on a reduced footprint, but performance may suffer. Both the NiFi and Elasticsearch infrastructure need good I/O bandwidth and enough memory for the Java heap and Java native memory allocation. Ideally, they also need enough memory for file caching. You may need to account for this file cache memory in the pod specification so that the operating system has enough additional RAM for the service.
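The memory split described above can be expressed as a simple budget. The sketch below is illustrative only. The function name `memory_budget` and all input figures are placeholders, not HCL Commerce sizing guidance. The one external rule it encodes is Elastic's published recommendation that the Elasticsearch JVM heap be no more than 50% of available memory.

```python
def memory_budget(pod_limit_gib: float, heap_gib: float, native_gib: float) -> dict:
    """Split a pod memory limit into Java heap, native memory, and OS file cache.

    Illustrative only: the inputs are placeholders, not product sizing guidance.
    """
    # Whatever is not claimed by the JVM is left for the operating system's file cache.
    file_cache_gib = pod_limit_gib - heap_gib - native_gib
    if file_cache_gib < 0:
        raise ValueError("heap + native memory exceed the pod memory limit")
    return {
        "file_cache_gib": file_cache_gib,
        # Elastic recommends a heap of no more than 50% of available memory
        # (applies to Elasticsearch pods; treat it as a heuristic elsewhere).
        "heap_within_half": heap_gib <= pod_limit_gib / 2,
    }
```

With a 16 GiB limit, an 8 GiB heap, and 2 GiB of native memory, 6 GiB would remain for file caching.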

Key Tunable Variables

The following key tunable variables can be adjusted to improve overall processing time:

Processor thread count (Concurrent Tasks)

By default, a processor runs a single thread and processes one flow file at a time. If you want concurrent processing, you can increase the number of concurrent tasks the processor runs. To set the number of threads for a processor in the process group, change the processor's Concurrent Tasks value on the SCHEDULING tab of the processor configuration.
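The effect of Concurrent Tasks on a queue that has already built up can be estimated with an idealized model. The model assumes that every flow file takes the same time to process and that the threads never contend for CPU, I/O, or locks. Real processors scale less well than this. The function name `drain_time_seconds` is hypothetical.

```python
import math


def drain_time_seconds(flow_files: int, seconds_per_file: float, concurrent_tasks: int) -> float:
    """Idealized time for one processor to work through a full queue.

    Assumes uniform per-file cost and perfect parallelism across threads.
    """
    if concurrent_tasks < 1:
        raise ValueError("concurrent_tasks must be at least 1")
    # Each "round" processes up to concurrent_tasks flow files in parallel.
    rounds = math.ceil(flow_files / concurrent_tasks)
    return rounds * seconds_per_file
```

Under these assumptions, 100 flow files at 2 seconds each take 200 seconds with the default single task and 50 seconds with four tasks. Going from four to eight tasks only brings this down to 26 seconds, because the last round is not full.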

Increasing the number of threads can improve throughput when the CPU has capacity to run more work in parallel. Two such examples are the transformation processor (as in NLP) and the Bulk update processor, which sends data to Elasticsearch. This change does not help every processor. Most processors come with a default configuration that already takes this variable into account and does not need to be altered. When performance testing reveals a bottleneck, such as flow files piling up in front of a processor, that processor's default configuration may benefit from further tuning.

The Concurrent Tasks value is ideal when the processor handles flow files at the same rate as they arrive. This prevents large pileups of flow files in the processor's wait queue. Because such a balance is not always achievable, the best setup focuses on minimizing the flow file pileup in the wait queue.
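The balance between arrival rate and processing rate can be made concrete with a simple rate model. The model assumes constant arrival and per-thread processing rates (flow files per second) and assumes that throughput scales linearly with Concurrent Tasks. These are simplifications, and measured rates from performance testing should replace them. All function names are hypothetical.

```python
import math


def queue_growth_per_second(arrival_rate: float, per_task_rate: float, concurrent_tasks: int) -> float:
    """Net change in queued flow files per second; a positive value means a pileup is forming."""
    return arrival_rate - per_task_rate * concurrent_tasks


def min_concurrent_tasks(arrival_rate: float, per_task_rate: float) -> int:
    """Smallest Concurrent Tasks value whose combined rate keeps up with arrivals."""
    return max(1, math.ceil(arrival_rate / per_task_rate))


def backlog_after(initial_queue: int, growth_per_second: float, seconds: float) -> float:
    """Queue length after `seconds`; a queue cannot shrink below zero."""
    return max(0.0, initial_queue + growth_per_second * seconds)
```

In this model, suppose 50 flow files arrive per second and each task handles 12 per second. Four tasks let the queue grow by 2 flow files every second. Five tasks, the value returned by `min_concurrent_tasks(50, 12)`, drain it instead.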