Migrating from the Solr-based to the Elasticsearch-based search solution

This topic describes the Solr-based and Elasticsearch-based search solutions and their conceptual and technical paradigms. It also provides a table of guidelines to consider when migrating from the Solr-based search solution to the Elasticsearch-based search solution.

Solution overviews

Solr-based search solution

The Solr-based search solution was used in previous releases of WebSphere Commerce, and HCL continues to support its use with HCL Commerce Version 9.1.

Elasticsearch-based search solution
With the release of HCL Commerce Version 9.1, a new search and indexing platform based on the popular open-source Elasticsearch search engine was introduced. In this new search platform paradigm:
  • The Ingest Service provides a new data loading architecture, allowing for easier customization and incremental index updates.
  • The Query Service provides enhanced search relevancy for site users navigating through the catalog on the storefront.
This microservice-based search solution retains backwards compatibility with the HCL Commerce Transaction server, the storefront, and externalized customizations (xC). Advancements in natural language parsing, part-of-speech tagging, and text mining enable more sophisticated merchandising and analysis capabilities.

Migration guidelines

The following table contains general guidelines to help you migrate your Solr-based search solution to the Elasticsearch-based search solution.

Migration Item With Solr-based search With Elasticsearch-based search Migration guideline
Ingest / Indexing
Index Schema Index schema definitions (such as fields, filters, and analyzers) are defined in the schema.xml and x-schema.xml files.

See Solr schema file (schema.xml) for details.

Defined in the data specification for each object type (product, category, attribute, and so on) that is part of the NiFi Ingest Service connector. Index field definitions that were previously located in the Solr schema.xml file are now defined in the data specification for each object type (product, category, attribute, and so on) that is part of the NiFi Ingest Service connector.

For the specifications of the different connectors that are used, see Ingest Product index pipeline.

Pre-process Configuration Pre-process configuration is defined in XML files and data is processed in temporary database tables and views.

For more information on search preprocessing, see Configuring the search preprocessor.

The di-preprocess command is available to launch pre-processing for the search index in WebSphere Commerce versions 7 and 8. This command has been removed from HCL Commerce version 9, where pre-processing is performed automatically as part of the index build.

Defined in the NiFi Ingest Service connector (consisting of input, mapping, processing, and output pipelines). The approach to preprocessing data for the index has changed significantly with the new HCL Commerce version 9.1 architecture.

Preprocess configuration is no longer defined in XML files, and no temporary database tables are needed to flatten the index data. The new Ingest Service handles the extraction or ingestion, and has the flexibility to process data from many different sources and to transform the data in memory in a high-performance Apache NiFi cluster. The extract, transform, and load (ETL) processes are defined in the NiFi Ingest Service connector (consisting of input, mapping, processing, and output pipelines). For more information, see Ingest Product index pipeline.

Extension indexes Some data, such as inventory and price, can be built into a separate index so that it can be updated separately with its own frequency.

For an example of setting up a separate inventory index, see Configuring the search preprocessor.

Not required with the new incremental index update function. Extension indexes are no longer necessary. They were introduced in the Solr-based search solution to avoid rebuilding the entire base Catalog Entry (product) index document when only certain frequently changed fields, such as inventory and price, are updated. With incremental index updates, it is now possible to update only the desired fields in any existing indexed document. The limitations of extension indexes, including the inability to perform sorting, faceting, and grouping, are lifted in HCL Commerce 9.1 with the Elasticsearch-based search solution, because all of the fields are now in the same product index.
Unstructured content indexes The unstructured content such as PDF files or attachment files for products can be indexed and categorized under the catalog entry search index.

For information on indexing unstructured content with the Solr-based search solution, see Unstructured and site content.

A single ingest connector needs to be created for each source of documents. In HCL Commerce 9.1, this process has been simplified. A single ingest connector needs to be created for each source of documents. This process allows you to more easily customize the extensions that will consume the data and load it into the indexes of the search component.
Site content crawler Unmanaged site content of a store can be crawled and indexed into the Solr index using the crawler REST API. A crawler processor can be used to gather and index web page URLs, to scrape and index page content, or both.

Over 250 freely available NiFi processors can be used to handle different ingestion scenarios.

There are also many freely available use case examples to aid in their implementation.

With HCL Commerce 9.1, the site crawler can continue to be used to gather the URLs of the pages, but it will not automatically load the data into the search index.

However, a custom connector and pipeline can be created to load the data.

The following articles highlight how to build a web crawler pipeline with NiFi:
Launch building index The di-buildindex command is available to launch index builds for WebSphere Commerce versions 7 and 8.

The build index REST API is available in HCL Commerce versions 9 and 9.1 to launch index builds. See Building the HCL Commerce Search index for details.

The NiFi pipeline output will trigger an Elasticsearch index build or update. In HCL Commerce 9.1 the search index can be built by triggering a NiFi connector job.

See Building the Elasticsearch index for more details.
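
The incremental index update described in the extension indexes row depends on being able to change individual fields of a document that is already indexed. As a rough illustration of that idea only, the following Python sketch builds the path and body of an Elasticsearch partial-update request (the `_update` endpoint with a `doc` body, as in Elasticsearch 7). The index name, document ID, and field names are hypothetical, and the request is only printed, not sent. In HCL Commerce, such updates are driven by the Ingest Service connectors rather than issued by hand.

```python
import json


def build_partial_update(index, doc_id, changed_fields):
    """Return (path, body) for an Elasticsearch partial document update.

    Only the fields in changed_fields are sent; Elasticsearch merges them
    into the existing document instead of replacing the whole document.
    """
    if not changed_fields:
        raise ValueError("changed_fields must not be empty")
    path = f"/{index}/_update/{doc_id}"
    body = json.dumps({"doc": changed_fields}, sort_keys=True)
    return path, body


# Hypothetical index name, document ID, and field names (assumptions).
path, body = build_partial_update(
    "example-product-index", "10001", {"inventory": 42, "price": 19.99}
)
print("POST", path)
print(body)
```

Because only inventory and price appear in the body, the remaining fields of the product document are left untouched, which is the property that makes a separate extension index unnecessary.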

Query / Runtime
Search runtime configuration Configuration related to search application features, such as search profiles, query expressions, and facets, can be defined in wc-search.xml. See HCL Commerce Search configuration file (wc-search.xml) for details.

Configuration related to the search server, such as spell check, relevancy, and search runtime properties, can be defined in the wc-component.xml file. See Search properties in the component configuration file (wc-component.xml) for details.

In general, search configurations are stored in ZooKeeper with Query REST services.

In HCL Commerce 9.1, ZooKeeper is used to store your custom configurations. At runtime, each microservice polls ZooKeeper for any custom configurations to automatically override default behaviors, such as query responses. For the Ingest Service, ZooKeeper stores connector descriptors that are used for loading new custom NiFi connectors.

For more information on setting up custom profiles in ZooKeeper, as well as implementing custom search options, see Configuring Query services in ZooKeeper.

Search profiles Search profiles control the behavior of search on the storefront by manipulating the final search query and formatting the search responses.

The search profile is defined in the wc-search.xml and contains the fields that are being searched, expression providers, the query pre-processors and post-processors to be used, and other relevant information.

Search profiles still define the search runtime behavior. However, profile information is now stored in the dynamic configuration service using ZooKeeper, and can be managed using REST. Existing Solr-based search profiles should be assessed, along with the defined expression providers and any custom query pre-processors and post-processors in the profiles.

The introduction of Natural Language Processing (NLP) in the new HCL Commerce version 9.1 Elasticsearch-based search solution may provide an opportunity to remove runtime customizations that are defined in your Solr-based search profiles.

To set up search profiles in ZooKeeper, see Configuring Query services in ZooKeeper.

Query expression providers Query expression providers allow for modifications to be made to queries before they are read by query pre-processors and added to the search query. This allows for more control over search parameter values. The existing query expression provider extension programming framework is also provided with the Elasticsearch-based search solution.
Note: The Query Service does not provide direct database access. Data should be indexed, or additional microservices should be built to expose any custom data that is required.
Custom expression providers related to search relevancy and ranking should be assessed and evaluated considering the addition of Natural Language Processing (NLP).
Query pre-processors Query pre-processors are used to modify the query before it is processed by Solr. You can use control parameters that are provided for the search request to add data to the query.

For example, you can add a sort parameter based on the value in the _wcf.search.sort control parameter.

The existing query pre-processor provider extension programming framework is also provided with the Elasticsearch-based search solution.
Note: The Query Service does not provide direct database access. Data should be indexed, or additional microservices should be built to expose any custom data that is required.
The Elasticsearch-based search solution provides significant enhancements to search matching and ranking with the introduction of Natural Language Processing (NLP), which may provide an opportunity to remove runtime customizations related to search relevancy. Custom query pre-processors related to search relevancy and ranking should be assessed and evaluated considering the addition of NLP.
Query post-processors Query post-processors are used to modify the query results before they are returned as the search response. The Elasticsearch index and the Elasticsearch search response are both structured differently from their Solr counterparts. Hence, the two solutions also use different post-processors.

Any custom post-processors will need to be recreated to conform with the Elasticsearch solution.

Note: The Query Service does not provide direct database access. Data should be indexed, or additional microservices should be built to expose any custom data that is required.
Custom query post-processors should be assessed to determine whether the customization could be recreated in a NiFi Ingest Service connector, and used to structure the static Elasticsearch index.

Custom post-processors that cannot be moved to index-time will need to be recreated, as the Elasticsearch search response is in a different format from the Solr search response.

Storefront REST API HCL Commerce V1 REST API HCL Commerce V1 and V2 REST API Storefronts should be evaluated for compatibility, and updated to utilize the HCL Commerce V1 REST API.
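
The query pre-processor row mentions adding a sort parameter based on the value of the _wcf.search.sort control parameter. The Python sketch below illustrates that idea conceptually: it reads the control parameter and, when a known value is present, adds an Elasticsearch-style `sort` clause to a query. It is not the HCL Commerce extension framework. The sort codes, field names, and the `apply_sort` function are hypothetical and exist only for this illustration.

```python
# Hypothetical mapping from sort codes to (index field, order).
SORT_OPTIONS = {
    "1": ("example_name", "asc"),
    "2": ("example_price", "asc"),
    "3": ("example_price", "desc"),
}


def apply_sort(query, control_params):
    """Return a copy of query with a sort clause added when one is requested."""
    result = dict(query)
    code = control_params.get("_wcf.search.sort")
    if code in SORT_OPTIONS:
        field, order = SORT_OPTIONS[code]
        # Elasticsearch expects a list of {field: {"order": ...}} entries.
        result["sort"] = [{field: {"order": order}}]
    return result


base_query = {"query": {"match": {"example_name": "red shirt"}}}
sorted_query = apply_sort(base_query, {"_wcf.search.sort": "3"})
unsorted_query = apply_sort(base_query, {})
print(sorted_query["sort"])
```

Returning a modified copy rather than changing the incoming query keeps each step in a pre-processor chain easy to reason about and to test in isolation.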
Configuration
Solr-specific configurations Some Solr configurations are in Solr-specific configuration files. For example, core configuration properties and request URL mappings are in solr.xml, and Solr request handler mappings and Solr parameters are in the solrconfig.xml file. These configurations are Solr-specific, and are not needed for the Elasticsearch-based search solution. These configurations are not needed in the Elasticsearch-based search solution.
Search configurations in STORECONF Some search properties, such as price mode and entitlement, are configured as entries in the STORECONF database table. Existing properties that are configured in STORECONF are still valid.

Some new properties are also stored in STORECONF. For example, a property that indicates whether a store is headless.

No migration is needed.
Search Merchandising & Relevancy
Search rules Search Rules can be defined in the Marketing tool in Management Center for HCL Commerce. Existing search rules are still valid. No migration is needed.
Search term associations Search Term Associations are defined in the Catalogs tool in Management Center for HCL Commerce. Existing search term associations are still valid. No migration is needed.
Tuning Relevancy Search relevancy can be tuned in Management Center for HCL Commerce with the combination of search rules, search term association, and other search tuning functions. Search relevancy can be tuned with the Natural Language Processing (NLP) feature. Search relevancy in the Elasticsearch-based search solution provides significant search matching and ranking improvements with the inclusion of Natural Language Processing (NLP) and part-of-speech tagging. New NLP filters have been added that provide expanded opportunities to improve the search experience.
Hero product image The hero product image functionality returns the most representative item for a product search. The hero product image functionality returns the most representative item for a product search. No migration is needed.
Deployment
Search caching Search caching uses a data object cache using Dynacache with Kafka and ZooKeeper for cache invalidation. HCL Cache, with Redis as the cache provider, provides REST and data object caching that offers event-based, coarse-grained cache invalidation. Time-based (TTL) cache invalidation is provided as a fallback. Client microservices that expose custom data should be evaluated for caching and invalidation requirements.

Custom Kafka events (event bus) should be migrated to Redis.

Local store A local store, that is, a store running on the Transaction server, works with Solr-based search. A REST-based local store can work with the Elasticsearch-based search platform.

A BOD-based local store will not work with the Elasticsearch-based search solution.

It is recommended to use a remote store, or headless store to work with the Elasticsearch-based search solution.
Sharding and scalability Limited index scaling.

No query or runtime sharding or scaling.

Scalable Ingest Service microservices.

The Query Service provides runtime dynamic scalability and distributed search services using Elasticsearch clustering.

No migration is needed.
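
When evaluating the caching and invalidation requirements of custom microservices, as recommended in the search caching row, it can help to see how event-based invalidation and a time-based (TTL) fallback interact. The Python sketch below is a minimal in-memory illustration of that combination. It does not use HCL Cache or Redis, and the `TtlCache` class, its method names, and the cache keys are hypothetical.

```python
import time


class TtlCache:
    """In-memory cache with explicit invalidation and a TTL fallback."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expiry time)

    def put(self, key, value):
        self._entries[key] = (value, self._clock() + self._ttl)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # TTL fallback: drop entries that no event invalidated.
            del self._entries[key]
            return None
        return value

    def on_invalidation_event(self, key):
        """Event-based invalidation: remove the entry when its data changes."""
        self._entries.pop(key, None)


# A fake clock keeps the example deterministic.
now = [0.0]
cache = TtlCache(ttl_seconds=60, clock=lambda: now[0])
cache.put("product:10001", {"name": "Example"})
cache.put("product:10002", {"name": "Other"})

cache.on_invalidation_event("product:10001")
hit_after_event = cache.get("product:10001")  # removed by the event
hit_before_ttl = cache.get("product:10002")   # still cached
now[0] = 61.0
hit_after_ttl = cache.get("product:10002")    # expired by the TTL fallback
```

The event path removes stale data promptly, while the TTL bounds how long an entry can survive if an invalidation event is missed.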