HCL Commerce Version 9.1.15.0 or later

Indexing data lifecycles in dual-nodegroup Elasticsearch configurations

Several processes make up the indexing data lifecycle in a dual Elasticsearch node group configuration: Near-Real-Time (NRT) updates, offline dataloads, Push-To-Live, and Update-Live. Each of these updates also triggers cache invalidation.

Near Real Time (NRT) incremental updates in the Authoring environment

All business data updates made in Management Center for HCL Commerce are written to the database first. An incremental update event that carries the Authoring context is then sent through Redis. Acting as the message bus, Redis broadcasts the change request to all NiFi connectors that monitor for this kind of operation. Business logic in the Apache NiFi indexing pipeline analyzes and processes the change request, and the appropriate Search indices are updated incrementally. This workflow is indicated by the blue dashed line in the following diagram.
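
The write-then-broadcast sequence described above can be sketched in a few lines. This is a minimal illustration only: an in-memory class stands in for Redis, plain functions stand in for NiFi connectors, and every name (the `incremental-update` channel, the event fields, the connector functions) is hypothetical rather than part of HCL Commerce, Redis, or NiFi.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class MessageBus:
    """In-memory stand-in for the Redis message bus (illustration only)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, event: dict) -> int:
        """Deliver the event to every subscriber and return the delivery count."""
        handlers = self._subscribers.get(channel, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


database: Dict[str, dict] = {}      # stand-in for the Commerce database
search_index: Dict[str, dict] = {}  # stand-in for an Authoring search index
audit_log: List[str] = []


def index_connector(event: dict) -> None:
    # Hypothetical connector: re-read the changed object and update
    # only that document in the index (an incremental update).
    if event["environment"] == "auth":
        search_index[event["id"]] = dict(database[event["id"]])


def audit_connector(event: dict) -> None:
    # A second hypothetical connector monitoring the same channel.
    audit_log.append(event["id"])


def save_business_object(bus: MessageBus, object_id: str, fields: dict) -> int:
    database[object_id] = fields  # 1. the database is written first
    event = {"environment": "auth", "id": object_id}
    return bus.publish("incremental-update", event)  # 2. then the event is broadcast


bus = MessageBus()
bus.subscribe("incremental-update", index_connector)
bus.subscribe("incremental-update", audit_connector)
delivered = save_business_object(bus, "product-1001", {"name": "Blue kettle"})
```

Because the event carries only an identifier and the environment context, each connector reads the current state from the database rather than trusting a payload that may already be stale.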

The gray dashed line shows how the data object cache and REST caching used in the Query service for the Authoring environment are invalidated. JSP caching used in the Store server may also be invalidated, if applicable. As a result, the Query service can provide an NRT update experience when you preview the storefront in the Authoring environment, as indicated by the yellow dashed line in the diagram.

Offline dataload

After Dataload updates the Commerce database with the business data, the corresponding change history is written to the two TI_DELTA database tables (as is done for the Solr search engine). When the load operation completes successfully, an event is sent through Redis to launch an indexing flow in NiFi. This data flow is indicated by the blue dashed line in the following diagram.
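
The ordering in this paragraph (write the data, record the change history, and notify only after a successful load) can be illustrated with a small sketch. It makes several assumptions: the table names `delta_catentry` and `delta_catgroup`, the event fields, and the `run_dataload` function are hypothetical, and Python dictionaries and lists stand in for the database and Redis.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, str, dict]  # (delta table name, object id, fields)


def run_dataload(
    rows: List[Row],
    database: Dict[str, dict],
    delta_tables: Dict[str, List[str]],
    notify: Callable[[dict], None],
) -> bool:
    """Apply a batch, record its change history, then notify on success."""
    staged_data: Dict[str, dict] = {}
    staged_deltas: Dict[str, List[str]] = {name: [] for name in delta_tables}

    for table, object_id, fields in rows:
        if table not in delta_tables:
            return False  # failed load: nothing is written and no event is sent
        staged_data[object_id] = fields
        staged_deltas[table].append(object_id)

    database.update(staged_data)                # 1. business data
    for table, ids in staged_deltas.items():    # 2. change history
        delta_tables[table].extend(ids)
    notify({"event": "dataload-complete", "changed": len(staged_data)})  # 3. event
    return True


database: Dict[str, dict] = {}
delta_tables: Dict[str, List[str]] = {"delta_catentry": [], "delta_catgroup": []}
events: List[dict] = []  # stand-in for events sent through the message bus

ok = run_dataload(
    [
        ("delta_catentry", "sku-1", {"name": "Kettle"}),
        ("delta_catgroup", "cat-7", {"name": "Kitchen"}),
    ],
    database,
    delta_tables,
    events.append,
)
```

Sending a single event at the end of the batch, rather than one per row, is what distinguishes this flow from the NRT path: the indexing flow works from the recorded change history instead of reacting to each update.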

The data object cache and REST caching used in the Query service for the Authoring environment are invalidated; JSP caching used in the Store server may also be invalidated, if applicable. This process is indicated by the gray dashed line in the diagram. Because Dataload is a batch, non-interactive operation that runs in the background and may contain a large number of data updates, the search indices may take longer to update, and changes become visible only after cache invalidation takes place.

Push-To-Live in dual nodegroup scenarios

When staging propagation starts, the publish operation on the Authoring transaction server pushes and replicates all production-ready changes from the Authoring database to the Live database. It also issues an index propagation Push-To-Live request to the Ingest service, which triggers replication of the approved updates from the Authoring index to the Live index. This process is indicated by the dashed blue line in the following diagram.

This Push-To-Live operation no longer requires replicating to subordinate nodes. Instead, a copy of the new live index is created in Live and is swapped in once the new index is ready. The old version is decommissioned, depending on the value of the parameter alias.keep.backup. You can force decommissioning by setting its value to 0.
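
The build, swap, and decommission sequence can be sketched as follows. This is not the Ingest service implementation. The `LiveCluster` class and `push_to_live` function are hypothetical, and the sketch assumes that `alias.keep.backup` is a count of previous index versions to retain. The text above confirms only that a value of 0 forces decommissioning.

```python
from typing import Dict, List


class LiveCluster:
    """In-memory stand-in for the Live Elasticsearch node group."""

    def __init__(self) -> None:
        self.indices: Dict[str, dict] = {}       # index name -> documents
        self.alias_target: Dict[str, str] = {}   # alias -> index it points to
        self.backups: Dict[str, List[str]] = {}  # alias -> retired index names


def push_to_live(
    live: LiveCluster,
    alias: str,
    new_index_name: str,
    auth_docs: dict,
    keep_backup: int,
) -> List[str]:
    """Build a new Live index, swap the alias, and retire old versions.

    keep_backup models alias.keep.backup as a retention count (an assumption).
    Returns the names of the indices that were decommissioned.
    """
    # 1. Create the copy while the alias still points at the old index,
    #    so shoppers are unaffected during the build.
    live.indices[new_index_name] = dict(auth_docs)

    # 2. Swap the alias only once the new index is ready.
    previous = live.alias_target.get(alias)
    live.alias_target[alias] = new_index_name

    # 3. Decommission old versions beyond the retention count.
    history = live.backups.setdefault(alias, [])
    if previous is not None:
        history.append(previous)
    removed: List[str] = []
    while len(history) > keep_backup:
        oldest = history.pop(0)
        del live.indices[oldest]
        removed.append(oldest)
    return removed


live = LiveCluster()
push_to_live(live, "live.catalog", "catalog-v1", {"sku-1": {"price": 20}}, keep_backup=1)
removed_v2 = push_to_live(live, "live.catalog", "catalog-v2", {"sku-1": {"price": 18}}, keep_backup=1)
removed_v3 = push_to_live(live, "live.catalog", "catalog-v3", {"sku-1": {"price": 17}}, keep_backup=0)
```

In this model, the second push keeps `catalog-v1` as a backup, and the third push, with a retention count of 0, removes every earlier version as soon as the alias has moved.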

This dual Elasticsearch node group setup has a few advantages over the single Elasticsearch node group setup:
  • It enables one dedicated set of nodes and capacity to be used for ingest operations, and another dedicated only for end users' browsing and search traffic.
  • It minimizes the change window that affects shoppers when indexing is performed directly against Live indexes. It also eliminates the window that affects shoppers when indexing is performed directly against Auth indexes.

The data object cache and REST caching used in the Query service for the Live environment are invalidated. JSP caching used in the Store server may also be invalidated, if applicable. This process is indicated by the dashed yellow lines in the diagram.

Update-Live

The Update-Live operation allows Live Feeds to update Live indexes directly in the dual Elasticsearch node group setup without going through Authoring. This “push” operation, indicated by the dashed blue line in the following diagram, can be thought of as performing only the second half of the Push-To-Live operation. It does not publish anything from Authoring to Live.

You can store data from live feeds, such as inventory and price, in placeholder indexes (for example, live.inventory and live.prices) on a recurring schedule. The Update-Live connector can later be triggered to push these changes out to the catalog indexes in the Live environment.
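
As a rough sketch of this push, the following function merges partial documents from placeholder indexes into a catalog index. The index names come from the example above. The `update_live` function, the document fields, and the choice to skip unknown documents and clear each placeholder after the push are assumptions made for illustration, not documented connector behavior.

```python
from typing import Dict, List


def update_live(
    catalog_index: Dict[str, dict],
    placeholder_indexes: Dict[str, Dict[str, dict]],
) -> List[str]:
    """Merge feed data into existing catalog documents.

    Returns the IDs of the documents whose content actually changed.
    """
    changed: List[str] = []
    for feed in placeholder_indexes.values():
        for doc_id, partial in feed.items():
            if doc_id not in catalog_index:
                continue  # assumption: feeds never create catalog documents
            before = dict(catalog_index[doc_id])
            catalog_index[doc_id].update(partial)
            if catalog_index[doc_id] != before and doc_id not in changed:
                changed.append(doc_id)
        feed.clear()  # assumption: a placeholder is emptied once pushed
    return changed


catalog = {
    "sku-1": {"name": "Kettle", "price": 20.0, "inventory": 5},
    "sku-2": {"name": "Toaster", "price": 35.0, "inventory": 2},
}
placeholders = {
    "live.prices": {"sku-1": {"price": 18.5}},
    "live.inventory": {"sku-1": {"inventory": 3}, "sku-9": {"inventory": 1}},
}
changed_ids = update_live(catalog, placeholders)
```

Only the fields supplied by a feed are overwritten, so catalog data such as the product name is left untouched.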

Fine-grained cache invalidation is issued once the Update-Live operation completes successfully, as indicated by the dashed yellow lines in the diagram.
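
To show what fine-grained invalidation means in contrast to clearing a whole cache, here is a minimal sketch of a cache that tracks which entries depend on which catalog objects. The `DependencyCache` class, the cache keys, and the dependency IDs are hypothetical and do not reflect the actual caching mechanism used by the Query service or Store server.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Optional, Set


class DependencyCache:
    """Cache whose entries can be evicted by the objects they depend on."""

    def __init__(self) -> None:
        self._entries: Dict[str, Any] = {}
        self._by_dependency: Dict[str, Set[str]] = defaultdict(set)

    def put(self, key: str, value: Any, depends_on: Iterable[str]) -> None:
        self._entries[key] = value
        for dependency in depends_on:
            self._by_dependency[dependency].add(key)

    def get(self, key: str) -> Optional[Any]:
        return self._entries.get(key)

    def invalidate(self, dependency_ids: Iterable[str]) -> int:
        """Evict only entries that depend on the given IDs; return the count."""
        evicted = 0
        for dependency in dependency_ids:
            for key in self._by_dependency.pop(dependency, set()):
                if self._entries.pop(key, None) is not None:
                    evicted += 1
        return evicted


cache = DependencyCache()
cache.put("/products/sku-1", {"price": 20.0}, depends_on=["sku-1"])
cache.put("/products/sku-2", {"price": 35.0}, depends_on=["sku-2"])
cache.put("/search?q=kettle", ["sku-1"], depends_on=["sku-1"])

evicted = cache.invalidate(["sku-1"])  # only sku-1 changed in the live feed
```

Entries for unchanged products stay cached, which keeps the Live environment's cache hit rate high while frequent feed updates are applied.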