Comparing Solr and Elasticsearch index lifecycles

Apache Solr and Elasticsearch have different approaches to updating your search index. Solr does delta updates on a regular schedule, usually every five minutes. Elasticsearch performs an incremental update in near real-time.

After the complete search index is built, you can update it quickly as needed. The two search architectures available in HCL Commerce Version 9.1 take slightly different approaches to this update. Both Apache Solr and Elasticsearch are embedded in the larger HCL Commerce system, so they share certain components, chiefly in the front end of catalog updating and database access. Where they differ is in the way new items are added to the working index. Solr performs a regular batch operation based on updates to the CACHEIVL database table. Elasticsearch also reads from CACHEIVL, but then feeds the data into the Ingest service, which enables near real-time preview of the updates in the Management Center.

Delta updates using Apache Solr

When you use Apache Solr as your search engine, all business data updates made in the Management Center are first written to the database. The sequence is:
  1. A business user working in the Authoring environment makes catalog updates in the Management Center. These changes update the database.
  2. The database update triggers an index invalidation request, which is inserted into the CACHEIVL database table.
  3. During the next indexing operation, typically run on a five-minute cycle, the TI_DELTA_CATENTRY and TI_DELTA_CATGROUP tables are read and the index is updated accordingly.
    Note: Reindexing also happens when a user triggers a storefront preview in the Management Center. There can be a delay before this is visible in the preview.
  4. While the index is being updated, CACHEIVL can be receiving new invalidation requests, which will be incorporated into the next scheduled indexing operation.
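
The sequence above can be sketched as a toy simulation. This is a minimal illustration and not HCL Commerce code: the in-memory list only stands in for the CACHEIVL table, and the names InvalidationQueue, record, drain, and run_scheduled_index are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class InvalidationQueue:
    """Toy stand-in for the CACHEIVL table (hypothetical, in-memory only)."""
    pending: list = field(default_factory=list)

    def record(self, catentry_id: str) -> None:
        # Step 2: a database update inserts an invalidation request.
        self.pending.append(catentry_id)

    def drain(self) -> list:
        # Hand over the pending requests and start a fresh list, so requests
        # that arrive during indexing wait for the next cycle (step 4).
        batch, self.pending = self.pending, []
        return batch


def run_scheduled_index(queue: InvalidationQueue, index: dict, catalog: dict) -> int:
    """One scheduled delta run (step 3): reindex only the invalidated entries."""
    batch = queue.drain()
    for catentry_id in batch:
        index[catentry_id] = catalog[catentry_id]
    return len(batch)


catalog = {"sku-1": "Red shirt", "sku-2": "Blue shirt"}
index = dict(catalog)               # the index starts in sync with the catalog
queue = InvalidationQueue()

catalog["sku-1"] = "Crimson shirt"  # step 1: a business user edits the catalog
queue.record("sku-1")               # step 2: invalidation request recorded

updated = run_scheduled_index(queue, index, catalog)  # step 3: scheduled run
```

Until run_scheduled_index is called, the index still holds the old value, which corresponds to the delay a business user can see between saving a change and finding it in search results.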

Push-to-live in Solr

When staging propagation starts, the publish operation on the Authoring Transaction server replicates all production-ready changes from the Authoring database to the Live database. This triggers a cache invalidation request, which is inserted into the CACHEIVL table of the Live database.

Once staging propagation completes, the Search Repeater starts an IndexProp operation that pulls the latest version of all search indexes from the Search Master into the Search Repeater. One minute later, this version is pulled by each Search subordinate running in the Live environment.

Once index replication is completed, a cache invalidation restart request is inserted into the CACHEIVL table of the Live database. This database once again performs all the cache invalidation requests starting from a given timestamp using the IndexProp command.

Emergency updates in Solr

After an emergency update has been approved in the Solr environment, the scheduled commit operation on the Authoring Transaction server pushes updates to both the Authoring and Live databases. This triggers a cache invalidation for the CACHEIVL tables of both the Authoring and Live databases. Every five minutes, an indexing scheduler job on the Live-environment Transaction server checks whether any indexing operation is required against the Search Repeater. While the indexing operation is running, fine-grained cache invalidation requests are inserted into the CACHEIVL table of the Live database. Any emergency index update on the Search Repeater is then replicated to each Search subordinate every minute.

Incremental updates using Elasticsearch

While full indexing is possible in Elasticsearch (and must be done initially), the recommended approach is to do incremental updates. If the delta is small enough, the amount of invalidation in an incremental update will be relatively small compared to a full invalidation and reindex. This does not mean that you cannot do a full reindex; you just do not have to do it as often as you would with Solr. This is because Solr does not have event-driven invalidation and its invalidation is much more coarse-grained, which requires more frequent reindexing. Elasticsearch has much more fine-grained invalidation and can be event-driven, which you can take full advantage of.

The Elasticsearch process starts in the same way as the Solr reindexing, but differs significantly in both how the index is updated and in how the update propagates through your data centers.

  1. A business user working in the Authoring environment updates the catalog in the Management Center.
  2. The update triggers an invalidation request which is sent to CACHEIVL.
  3. This data is also propagated along the Redis bus to the Ingest service, which uses Apache NiFi pipes to analyze the data and update the Elasticsearch index.
  4. The user can call the Query service to view these updates in the Authoring environment in an effectively real-time manner. For more information, see the Query REST API.
  5. These incremental business changes from the Management Center are represented as Redis events and sent back to the Ingest service. Here they can be further analyzed and processed.
  6. Once the index has been updated, all the related cache invalidation events are sent to the Redis bus, where they can be replicated across data centers.
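
The event-driven flow in these steps can be sketched with a small in-process publish/subscribe bus. This is only an illustration under stated assumptions: the EventBus class stands in for the Redis bus, the ingest function stands in for the Ingest service, and the topic names catalog-change and cache-invalidation are hypothetical.

```python
from collections import defaultdict


class EventBus:
    """Toy in-process publish/subscribe bus standing in for the Redis bus."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Deliver the event to every subscriber of the topic immediately.
        for handler in list(self._subscribers[topic]):
            handler(event)


bus = EventBus()
search_index = {}          # stand-in for the Elasticsearch index
invalidated_caches = []    # cache invalidation events seen downstream


def ingest(event):
    # Steps 3 and 5: the ingest side processes the change and updates the
    # index, then (step 6) emits a cache invalidation event for that document.
    search_index[event["id"]] = event["name"]
    bus.publish("cache-invalidation", {"id": event["id"]})


bus.subscribe("catalog-change", ingest)
bus.subscribe("cache-invalidation", lambda e: invalidated_caches.append(e["id"]))

# Steps 1 and 2: a catalog edit is published as an event as soon as it happens.
bus.publish("catalog-change", {"id": "sku-1", "name": "Crimson shirt"})
```

In contrast to a scheduled batch, nothing here waits for a timer: the index entry and the matching cache invalidation are both produced as a direct consequence of the change event, and only for the document that changed.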

Push-to-live using Elasticsearch

When staging propagation starts, the Authoring Transaction server replicates all production-ready changes from the Authoring database to the Live database. It also issues a staging propagation request to the Ingest service to trigger replication of the approved updates from the Authoring index to the Live index.

The data object and REST caches used in the Live instance Query service are also invalidated. If applicable, the JSP caches used by the Store server in the Live instance will be invalidated as well.

Unlike the Solr process, this push-to-live process does not require replication to subordinate nodes. Instead, a copy of the new live index is created in the Live environment and is swapped in once the new index is ready. The old version is decommissioned immediately.

HCL Commerce Version 9.1.4.0 Restriction: If catalog data changes are not available in your live stores after a push-to-live operation, trigger a WCT+ESINDEX invalidation operation when you make the update. For more information about the caches to be updated and the procedure, see Index changes are not reflected in storefront after Elasticsearch push-to-live.

Coordinated updates

You can rebuild the entire search index as a separate new index to minimize the business impact of the update. Once the new index is fully rebuilt, immediately after the Indexing Complete event has been issued, the current live index can be swapped out with the newly created index. This is done by simply repointing the current index alias to the new index.
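
As an illustration of an alias swap, the sketch below builds a request body for the Elasticsearch POST /_aliases endpoint, which applies its remove and add actions atomically. The alias and index names (catalog-live, catalog-v1, catalog-v2) and the helper build_alias_swap are hypothetical; the names that HCL Commerce actually uses for its indexes are not shown here.

```python
import json


def build_alias_swap(alias: str, old_index: str, new_index: str) -> dict:
    """Build the request body for Elasticsearch's POST /_aliases endpoint."""
    return {
        "actions": [
            # Detach the alias from the index that is currently live...
            {"remove": {"index": old_index, "alias": alias}},
            # ...and attach it to the freshly rebuilt index in the same request.
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


body = build_alias_swap("catalog-live", "catalog-v1", "catalog-v2")
print(json.dumps(body, indent=2))
```

Because both actions travel in one request, queries addressed to the alias are served either by the old index or by the new one, with no window in which the alias points at nothing.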

If you load a large amount of data through an ingest connector, the connector can optionally insert that data into a separate copy of the target index. This resembles a full index rebuild, but the new data is loaded on top of a copy of the target index.

The Elasticsearch Disaster Recovery strategy involves creating regular local ES Index Snapshots. This enables a speedy index recovery to the most recent incremental snapshot. Once the index is fully restored, the current live index can be swapped out with it, and an Indexing Complete event triggered.
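
The snapshot-and-restore cycle can be sketched in terms of the Elasticsearch snapshot REST endpoints. The helpers below only build the method, path, and body for each call; they do not contact a cluster. The repository name commerce-backups, the snapshot name snap-001, the index name catalog-v2, and the helper names are all assumptions for illustration, and the sketch assumes a snapshot repository has already been registered.

```python
def snapshot_request(repo: str, snapshot: str) -> tuple:
    """Method and path to create a snapshot in a registered repository."""
    return ("PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true")


def restore_request(repo: str, snapshot: str, index: str, prefix: str) -> tuple:
    """Method, path, and body to restore one index under a new name."""
    body = {
        "indices": index,
        # Restore under a prefixed name so the live index keeps serving
        # queries until the restored copy is ready to be swapped in.
        "rename_pattern": "(.+)",
        "rename_replacement": f"{prefix}$1",
    }
    return ("POST", f"/_snapshot/{repo}/{snapshot}/_restore", body)


create = snapshot_request("commerce-backups", "snap-001")
restore = restore_request("commerce-backups", "snap-001", "catalog-v2", "restored-")
```

Restoring into a renamed index rather than over the live one matches the swap described above: the restored copy replaces the live index only after the restore has finished.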