HCL Commerce Version 9.1.12.0 or later

Troubleshooting: Slow search performance with 504 error

Adding too many business functions to an Elasticsearch query can cause very high CPU utilization and long query response times from Elasticsearch.

Problem

When too many business functions have been added to an Elasticsearch query, CPU utilization rises and query response times grow. Left unchecked, these long-running queries can eventually saturate all Elasticsearch nodes, prolonging response times and possibly crashing the system. To keep runaway threads from accumulating inside Elasticsearch, consider enabling a query timeout circuit breaker.

Solution

To enable the query timeout circuit breaker, configure the following settings in the component configuration:
  • Use the global maxTimeout in the component configuration to control the Elasticsearch query timeout.
  • Set allowPartialSearchResults to true in the component configuration to enable partial search results to be returned.
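
To illustrate what these two settings control, the following Python sketch builds the parameters of an Elasticsearch search request. It relies on two options of the Elasticsearch search API: the timeout field in the request body and the allow_partial_search_results query parameter. How HCL Commerce maps maxTimeout and allowPartialSearchResults onto these options is an assumption here, as is the use of milliseconds as the unit; the function name build_search_request is hypothetical and not part of the product.

```python
def build_search_request(query, max_timeout_ms, allow_partial):
    """Build (query_params, body) for an Elasticsearch _search call.

    max_timeout_ms stands in for the global maxTimeout setting
    (unit assumed to be milliseconds), and allow_partial stands in
    for allowPartialSearchResults. Both mappings are assumptions.
    """
    # allow_partial_search_results is a query-string parameter of _search.
    params = {
        "allow_partial_search_results": "true" if allow_partial else "false",
    }
    # timeout in the request body limits how long each shard spends
    # collecting hits before it returns what it has found so far.
    body = {
        "query": query,
        "timeout": f"{max_timeout_ms}ms",
    }
    return params, body


params, body = build_search_request(
    {"match": {"name": "chair"}}, max_timeout_ms=3000, allow_partial=True
)
```

With both options set, Elasticsearch can return the hits gathered before the timeout instead of failing the whole request.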

When a query exceeds the timeout and the circuit breaker opens, a partial search result set is returned and no error is returned to the caller. At the same time, the backend server terminates all pending operations that have not yet completed, so that no runaway operations remain inside Elasticsearch. The partial search result set is not cached; a query is cached only when it has been processed successfully in full and the complete search result has been returned.
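
The caching rule described above can be sketched in Python as follows. This is a minimal illustration, not the product's implementation: it assumes the caller has an Elasticsearch search response as a dictionary, uses the response's timed_out flag to detect a partial result, and uses a plain dictionary as a stand-in cache. The function name handle_search_response is hypothetical.

```python
def handle_search_response(cache, cache_key, response):
    """Return the hits to the caller and cache them only if complete.

    response is an Elasticsearch search response parsed into a dict.
    A response whose timed_out flag is true holds partial results, so
    it is returned to the caller without an error but is not cached.
    """
    hits = response["hits"]["hits"]
    if not response.get("timed_out", False):
        # Complete result set: safe to reuse for later identical queries.
        cache[cache_key] = hits
    return hits


cache = {}
partial = {"timed_out": True, "hits": {"hits": [{"_id": "1"}]}}
complete = {"timed_out": False, "hits": {"hits": [{"_id": "1"}, {"_id": "2"}]}}

handle_search_response(cache, "q=chair", partial)   # returned, not cached
handle_search_response(cache, "q=chair", complete)  # returned and cached
```

Skipping the cache for partial results means a later, faster run of the same query can still populate the cache with the full result set.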