Building the search index

The HCL Commerce Search index is built by using the Build Index REST API call.

The following diagram illustrates the relationship between preprocessing and index building in HCL Commerce:
Figure: Full build, preprocess, and data import

The Build Index call extracts and flattens HCL Commerce data, and then outputs the data into a set of temporary tables inside the HCL Commerce database. The data in the temporary tables is then used to populate the search indexes by using the Data Import Handler (DIH). When there are multiple indexes, for example, when each language has its own separate index, the index build runs once for each index.
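For reference, a Build Index call is an HTTP request against the search server. A minimal sketch follows, assuming a version 9-style endpoint; the host, port, and masterCatalogId value are placeholders, and the exact path and parameters can differ by release:

http://host:port/search/admin/resources/index/build?masterCatalogId=10101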

Preprocessing data

Preprocessing data is an automatic step. It involves querying HCL Commerce tables and creating a set of temporary tables to hold the data. By default, preprocessing is used for HCL Commerce attributes. The default data preprocessors process the data based on the configuration information that is defined in wc-dataimport-preprocess.xml.

The first temporary table to be loaded is defined in either the wc-dataimport-preprocess-fullbuild.xml or the wc-dataimport-preprocess-deltaupdate.xml file, since loading it can be time consuming. Loading this table first helps keep the data consistent across the temporary tables. Both files define the same temporary table; only the SQL statement that retrieves the data differs between full index builds and delta index builds.

For example, all the qualified catalog entry IDs for a master catalog are stored in this table when the REST call starts. A benefit of this approach is that, whether the build is a full index build or a delta index build, all the other data import preprocessing configuration files remain the same.
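As an illustrative sketch only, the contrast might look as follows; the _config:query element follows the pattern of the shipped sample files, but the table names, column names, and SQL conditions here are hypothetical:

<!-- wc-dataimport-preprocess-fullbuild.xml: select every qualified catalog entry -->
<_config:query sql="SELECT CATENTRY_ID FROM CATENTRY WHERE MARKFORDELETE = 0"/>

<!-- wc-dataimport-preprocess-deltaupdate.xml: select only the entries flagged for update -->
<_config:query sql="SELECT CATENTRY_ID FROM TI_DELTA_CATENTRY WHERE ACTION = 'U'"/>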

Sample preprocessing configuration files

Sample files can be found in the WCDE_installdir\WC\xml\search\dataImport\v3\db2 directory. For more information, see Temporary table schema definition. The naming convention for the configuration files is wc-dataimport-preprocess-*.xml.
Important: For large index sizes, specifying a larger batch size can decrease build times. A batchSize setting of 300,000 to 500,000 is recommended, depending on the amount of free system memory. The following snippet specifies a batch size of 300,000:

<_config:data-processing-config
    processor="com.ibm.commerce.foundation.dataimport.preprocess.CatalogHierarchyDataPreProcessor"
    masterCatalogId="10101"
    batchSize="300000">
  ...
</_config:data-processing-config>
This step caches information that can be reused to determine all of the ancestor catalog groups for each catalog entry, which results in fewer database hits to retrieve this information.

Index building and the Data Import Handler (DIH)

The index building REST API is a wrapper utility that uses the DIH service to build the index, either partially through delta index updates or completely through full index builds. Commands are issued to the DIH through URLs. For example:

http://host:port/solr/MasterCatalog_CatalogEntry_en_US/dataimport?command=full-import
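Similarly, a delta index update can use the DIH delta-import command against the same core:

http://host:port/solr/MasterCatalog_CatalogEntry_en_US/dataimport?command=delta-import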
The index building utility uses the DIH to connect to the HCL Commerce database through a JDBC connection. The DIH crawls the temporary tables that are populated by the preprocessing step, and then populates the Solr index. The wc-data-config.xml configuration file defines the JDBC connection settings and the crawling SQL statements.
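A minimal sketch of such a file follows, using the standard Solr DIH configuration elements and assuming a DB2 data source; the connection URL, credentials, temporary table name, and field names are placeholders rather than the shipped defaults:

<dataConfig>
  <!-- JDBC connection to the HCL Commerce database (placeholder URL and credentials) -->
  <dataSource type="JdbcDataSource"
              driver="com.ibm.db2.jcc.DB2Driver"
              url="jdbc:db2://dbhost:50000/wcdb"
              user="wcuser"
              password="wcpassword"/>
  <document>
    <!-- Crawl a preprocessed temporary table and map its columns to index fields -->
    <entity name="catalogEntry"
            query="SELECT CATENTRY_ID, PARTNUMBER FROM TI_CATENTRY_0">
      <field column="CATENTRY_ID" name="catentry_id"/>
      <field column="PARTNUMBER" name="partNumber_ntk"/>
    </entity>
  </document>
</dataConfig>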