Customizing your Solr-based search service

In keeping with HCL's commitment to current and open standards, HCL Commerce Search uses Apache Lucene as the basis of its Search framework. Lucene powers both the Apache Solr search engine and the Elasticsearch search engine. The indexing pipeline is open, flexible, and scalable, and is tightly integrated with the data service. Because the pipelines are built on the underlying dataflow technology and architecture, you can customize them easily. This open-standards approach considerably eases the process of integrating Search with existing and third-party applications.
HCL Commerce Search uses a multi-channel, container-based system for search and merchandising.

How Solr-based search works

At the front end, HCL Commerce Search interacts with users and other components of the store environment by using HTTP REST requests. On the server, it merges the speed of relational databases with the flexibility of unstructured XML. As queries come in, they are transformed into XML documents whose content is compared against a detailed index by the Apache Solr engine.

Search requests have the following processing cycle.
  1. A customer enters a search string in the browser. Autosuggest and look-ahead make this process easier. The browser submits the request to the storefront by HTTP, in the form of a JSON expression.
  2. The expression is routed to the search interface on the main server or a dedicated Search server. The search interface can accommodate multiple languages and high-volume transaction environments. The interface parses the expression and applies spelling corrections, thesaurus lookup, and other optimizations. It also assigns a search profile to the incoming expression (declaring it to be a ProductView request, for instance).
  3. Any number of business and merchandiser rules can now come into play during expression preprocessing. Customer profiles and segments can also alter the expression before it is processed. Dynamic states, such as the current contents of the customer's cart, can also influence the query at this stage. In B2B situations, access and entitlement rules can filter the results.
  4. The search processor compares the finalized contents of the search expression to the fields in the index.
  5. The results of this search can be processed again by using a customizable expression post-processing provider. A response is built, taking advantage of the storefront's customer-centric features, such as faceted navigation, previewing, and catalog navigation.
  6. The response is returned as XML and can be further processed and presented in an appropriate manner: on a landing page, in a preview, as a suggestion, or within a widget.
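The cycle above can be sketched as a chain of small functions. This is a minimal illustration only, not HCL Commerce code: the `SearchExpression` class, the function names, and the toy in-memory `INDEX` are all assumptions made for the example, and steps such as spelling correction are left out.

```python
from dataclasses import dataclass, field

@dataclass
class SearchExpression:
    """Hypothetical container for a request as it moves through the cycle."""
    term: str
    profile: str = ""
    filters: list = field(default_factory=list)

# Toy stand-in for the search index: flat documents of field name -> content.
INDEX = [
    {"name": "red cotton shirt", "category": "apparel"},
    {"name": "blue denim jacket", "category": "apparel"},
    {"name": "red ceramic mug", "category": "kitchen"},
]

def parse(raw):
    # Step 2: normalize the incoming request and assign a search profile.
    return SearchExpression(term=raw["term"].strip().lower(), profile="ProductView")

def preprocess(expr, extra_filters):
    # Step 3: business rules, segments, or entitlement add (field, value) filters.
    expr.filters.extend(extra_filters)
    return expr

def execute(expr):
    # Step 4: compare the finalized expression to the fields in the index.
    hits = [doc for doc in INDEX if expr.term in doc["name"]]
    for field_name, value in expr.filters:
        hits = [doc for doc in hits if doc.get(field_name) == value]
    return hits

def postprocess(hits):
    # Step 5: build a response, here with simple facet counts per category.
    facets = {}
    for doc in hits:
        facets[doc["category"]] = facets.get(doc["category"], 0) + 1
    return {"results": hits, "facets": facets}

def handle_request(raw, extra_filters=()):
    # Steps 2 through 5 chained together for one request.
    expr = preprocess(parse(raw), list(extra_filters))
    return postprocess(execute(expr))
```

For example, `handle_request({"term": "red"}, [("category", "kitchen")])` narrows the two "red" matches down to the single kitchen item, which mirrors how a rule applied in step 3 changes what step 4 returns.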
In going through this cycle, the search request interacts with the following server components.

The Solr search server

The search server consists of a set of REST services, a search runtime framework, and a set of HCL Commerce foundation services that also provide access to the production database. The search runtime engine includes the search expression providers and expression processor. All of these services run together in a compact container. With the exception of index preprocessing, all Search-related functions take place inside the container.

REST services

The search interface uses REST to convert the incoming request into a specific resource call, for instance a ProductView call. The resulting expression is handed off to the expression processor, where further business logic is applied.

Expression providers

Depending on the search profile of the request, various business components might get involved, such as Marketing for search-based merchandising rules, or Contracts for entitlement. Each business component can contribute a portion of the search expression through its own expression provider. Each contribution is combined with the main search expression that is generated by the REST services. The resulting search expression is run by the search processor.
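One way to picture this composition is a registry of provider functions per search profile, each returning extra filter clauses that are attached to the main query. The sketch below is an assumption-laden illustration, not the product's provider interface: the provider names, the context keys, and the index field names `clearance` and `catalog_id` are hypothetical. Only the `q` and `fq` keys are borrowed from Solr's standard query parameters.

```python
from typing import Callable, Dict, List

# A provider inspects the request context and returns zero or more filter clauses.
Provider = Callable[[dict], List[str]]

def marketing_provider(ctx: dict) -> List[str]:
    # Hypothetical merchandising rule: premium shoppers do not see clearance items.
    if ctx.get("segment") == "premium":
        return ["clearance:false"]
    return []

def contract_provider(ctx: dict) -> List[str]:
    # Hypothetical entitlement rule: restrict results to entitled catalogs.
    allowed = ctx.get("entitled_catalogs", [])
    if not allowed:
        return []
    return ["catalog_id:(" + " OR ".join(allowed) + ")"]

# Which business components contribute depends on the search profile.
PROVIDERS_BY_PROFILE: Dict[str, List[Provider]] = {
    "ProductView": [marketing_provider, contract_provider],
}

def build_expression(main_query: str, profile: str, ctx: dict) -> dict:
    """Combine the main query with every provider's contribution."""
    filters: List[str] = []
    for provider in PROVIDERS_BY_PROFILE.get(profile, []):
        filters.extend(provider(ctx))
    return {"q": main_query, "fq": filters}
```

Keeping each contribution in its own function means a business component can be added to or removed from a profile without touching the others.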

The expression processor

The expression processor uses the Solr engine to run the search against the index, and captures the result for post-processing and final response.

The search index

The search index is the key to the system's power. This index is a large flat table that contains data fields that are optimized for search performance. Each field consists of a name, its content, and metadata that tells HCL Commerce Search how to handle the content. Typically, fields can contain Boolean values, numbers, or strings. Field types are extensible, so you can define your own type in the system's schema file.
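The name, content, and metadata triple can be modeled roughly as follows. This is a conceptual sketch in Python, not the schema file format: `FieldType`, `Field`, the `indexed` and `stored` flags as modeled here, and the sample `sku` type are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, Tuple

@dataclass(frozen=True)
class FieldType:
    """Metadata describing how the content of a field should be handled."""
    name: str
    python_types: Tuple[type, ...]
    indexed: bool = True   # assumed flag: content is searchable
    stored: bool = True    # assumed flag: content can be returned in results

    def accepts(self, value: Any) -> bool:
        # bool is a subclass of int in Python, so reject it for non-Boolean types.
        if isinstance(value, bool) and bool not in self.python_types:
            return False
        return isinstance(value, self.python_types)

BOOLEAN = FieldType("boolean", (bool,))
NUMBER = FieldType("number", (int, float))
STRING = FieldType("string", (str,))
# A user-defined type, analogous to adding your own type to the schema.
SKU = FieldType("sku", (str,), stored=False)

@dataclass
class Field:
    name: str
    content: Any
    field_type: FieldType

    def __post_init__(self):
        if not self.field_type.accepts(self.content):
            raise TypeError(f"{self.name}: {self.content!r} is not a {self.field_type.name}")

def make_document(fields):
    """Flatten a list of fields into one row of the flat index table."""
    return {f.name: f.content for f in fields}
```

A row of the flat table is then simply `make_document([Field("name", "Red shirt", STRING), Field("price", 19.99, NUMBER)])`.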

Catalog items must be in the index to be searchable. Therefore, an indexing step is required before Search is deployed. The index can be updated at regular intervals or as needed during the operation of the store. During the indexing step, Solr collects, parses, and files catalog data from the store to facilitate fast and accurate retrieval of information. In its emphasis on building and optimizing an index, the Solr engine acts more like a traditional database than a web server or transaction processor. It is its own application, and can reside on its own dedicated server.
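To illustrate why an up-front indexing step makes retrieval fast, the following toy inverted index maps each token to the catalog items that contain it. It is a simplified teaching sketch and does not reflect how Solr or Lucene store data internally. The function names and the sample catalog are assumptions.

```python
from collections import defaultdict

def tokenize(text):
    # Parse step: lowercase the text and split it on whitespace.
    return text.lower().split()

def build_index(catalog):
    """Collect and file catalog data: map each token to the item ids containing it."""
    index = defaultdict(set)
    for item_id, description in catalog.items():
        for token in tokenize(description):
            index[token].add(item_id)
    return index

def search(index, query):
    """Return the ids of items that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Hypothetical catalog data: item id -> description.
catalog = {"A1": "Red cotton shirt", "B2": "Blue cotton shirt"}
index = build_index(catalog)
```

Because lookups go through the prebuilt token map rather than scanning every description, an item that was never indexed cannot be found, which is the reason catalog changes require the index to be updated.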