Index Load configuration files for indexing from database

Index Load requires configuration files before you can run it from a web browser.
It uses three types of configuration files, which are based on the XML schema definitions of the Data Load framework:

Index Load configuration files

Index Load configuration file | Schema definition file
Environment configuration file (wc-indexload-env.xml) | wc-dataload-env.xsd
Profile configuration file (wc-indexload-profileName.xml) | wc-indexload.xsd
Profile item configuration file (wc-indexload-businessobject.xml) | wc-indexload-item.xsd

Environment configuration file (wc-indexload-env.xml)

The wc-indexload-env.xml file contains environment control information and global properties that are required by Index Load, including a common data writer and data source to be used to persist the data.

The wc-indexload-env.xml file does not typically require customization. You can use the default sample file as-is.

Example: wc-indexload-env.xml

<_config:DataLoadEnvConfiguration
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://www.ibm.com/xmlns/prod/commerce/foundation/config ../../../xml/config/xsd/wc-dataload-env.xsd" 
	xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config">

	<_config:DataSource reference="com.ibm.commerce.foundation.server.services.search.datasource" />

	<_config:DataWriter className="com.ibm.commerce.foundation.internal.server.services.indexload.writer.SolrIndexLoadWriter" >
		 <_config:DataLoadBatchService className="com.ibm.commerce.foundation.server.services.indexload.writer.solr.SolrIndexLoadBatchService" />
	</_config:DataWriter>

</_config:DataLoadEnvConfiguration>

Profile configuration file (wc-indexload-profileName.xml)

The wc-indexload-profileName.xml file contains configurable performance attributes and load item configurations.

The profile name that you define in a configuration file is passed as a URL parameter when you call Index Load from a web browser.

The load item configurations are listed under the load order section of this file. They are processed in the same order as they are specified.

The file can contain one or more LoadItem definitions. Each LoadItem specifies its own business object configuration file and its target coreName. Multiple LoadItems run in parallel, in no particular sequence.

Example: wc-indexload-price.xml

<_config:LoadItem name="ExternalPrice-1" businessObjectConfigFile="wc-indexload-price-sql.xml">
	<_config:property name="coreName" value="MC_10001_CatalogEntry_Price_generic" />
	<_config:property name="groupName" value="1" />
</_config:LoadItem>

The following configurable performance attributes apply to profile configuration files:
batchSize
The threshold at which documents are soft committed in memory.
The default value is 1. If you specify a value of 0, Index Load does not commit until the load item finishes.
commitCount
The threshold at which documents are hard committed from memory to disk.
You can use a commitCount of 0 if you use a memory-based commit. For more information, see Tuning Index Load.
ThreadLaunchTimeDelay
The time, in milliseconds, to wait before starting each new thread. The delay avoids overloading the system at startup.
The default value is 1000.
OptimizeAfterIndexing
Indicates whether Index Load performs index optimization after commit.
Note: Performing optimization after a full indexing improves runtime performance; however, it increases the overall indexing time.
StatusRefreshInterval
The maximum time, in seconds, to wait before refreshing the current Index Load status and displaying it in the administrative log.
The default value is 300. Use a value of -1 to disable the service.
DocumentSizeSamplingInterval
The interval, in seconds, at which the size of the indexed document is calculated. The default value is 300. Use a value of -1 to disable the service.
IndexHeightCacheHint
A numeric hint that the system uses to size the applicable caches for index height during indexing.
IndexWidthCacheHint
A numeric hint that the system uses to size the applicable caches for index width during indexing.
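
As a rough sketch of where these attributes could appear, the following fragment assumes that the profile file reuses the Data Load framework layout: commitCount and batchSize as attributes of a LoadOrder element, and the other settings as property elements. The DataLoadConfiguration, DataLoadEnvironment, and LoadOrder element names, the placement of each attribute, and the OptimizeAfterIndexing value are assumptions. Only the default values and the LoadItem come from the descriptions and example in this section. Check the shipped sample file before you rely on this layout.

```xml
<!-- Hypothetical profile file layout; element names and attribute placement are assumptions. -->
<_config:DataLoadConfiguration
	xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config">

	<!-- Assumed: points to the environment configuration file. -->
	<_config:DataLoadEnvironment configFile="wc-indexload-env.xml" />

	<!-- Assumed: batchSize and commitCount are LoadOrder attributes.
	     batchSize="1" is the documented default; commitCount="0" suits a memory-based commit. -->
	<_config:LoadOrder batchSize="1" commitCount="0">
		<!-- Documented defaults: 1000 ms thread delay, 300 s refresh and sampling intervals. -->
		<_config:property name="ThreadLaunchTimeDelay" value="1000" />
		<_config:property name="StatusRefreshInterval" value="300" />
		<_config:property name="DocumentSizeSamplingInterval" value="300" />
		<!-- Assumed value: skip optimization to keep the overall indexing time short. -->
		<_config:property name="OptimizeAfterIndexing" value="false" />

		<_config:LoadItem name="ExternalPrice-1" businessObjectConfigFile="wc-indexload-price-sql.xml">
			<_config:property name="coreName" value="MC_10001_CatalogEntry_Price_generic" />
			<_config:property name="groupName" value="1" />
		</_config:LoadItem>
	</_config:LoadOrder>

</_config:DataLoadConfiguration>
```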

Profile item configuration file (wc-indexload-businessobject.xml)

The wc-indexload-businessobject.xml file contains detailed DataLoader configurations, which include the dataload className, DataReader, and BusinessObjectBuilder. The SolrIndexLoadQueryLoader is used to load objects from the database.

Example: wc-indexload-price-sql.xml

<_config:DataLoader className="com.ibm.commerce.foundation.server.services.indexload.loader.solr.SolrIndexLoadQueryLoader" >

The following configurable performance attributes apply to profile item configuration files:
ParallelThreads
The maximum number of loader threads that the search work manager can dispatch. The loader threads read data in parallel and share the data writer.
An empty value or a value of 1 indicates no parallel indexing.
ParallelLowerRangeSQL
SQL queries that get the first keys.
You can use these SQL queries so that Index Load loads only part of the objects from the database.
ParallelUpperRangeSQL
SQL queries that get the end keys.
ParallelNextRangeSQL
An SQL statement that determines the next available identifier when an empty range ID is detected in the parallel range. Typically, the nextStartKey value is the firstKey, and the nextEndKey value is firstKey+prefetchSize-1. For example, with a firstKey of 1 and a prefetchSize of 10000, nextStartKey is 1 and nextEndKey is 10000.
ParallelLowerRange
A hardcoded value for the lower range key. If defined, it is an absolute number for the lower range and overrides the value of ParallelLowerRangeSQL.
ParallelUpperRange
A hardcoded value for the upper range key. If defined, it is an absolute number for the upper range and overrides the value of ParallelUpperRangeSQL.
ParallelPrefetchSize
Determines how much data the reader fetches in one run when it queries the database. If defined, the run time breaks the entire data range into fragments so that an overly large query result set does not overload the database sort heap.
The default value is 10000.
ParallelDeltaUpdate
Determines whether the SQL result set is merged into an existing indexed document that contains a matching primary key. This delta update operation is equivalent to the Atomic Update feature provided by Solr.
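
The parallel attributes above might be combined as in the following sketch. It assumes that they are set as property elements inside the DataLoader element, in the same name and value style that the LoadItem example uses. That placement, the SQL statements, the CATENTRY table and CATENTRY_ID column, and all values other than the documented ParallelPrefetchSize default are illustrative assumptions. The namespace is declared on DataLoader only to keep the fragment self-contained.

```xml
<!-- Hypothetical fragment; property placement and SQL statements are assumptions. -->
<_config:DataLoader
	xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config"
	className="com.ibm.commerce.foundation.server.services.indexload.loader.solr.SolrIndexLoadQueryLoader">

	<!-- Up to four loader threads read in parallel and share the data writer. -->
	<_config:property name="ParallelThreads" value="4" />

	<!-- Range boundaries come from SQL because ParallelLowerRange and ParallelUpperRange are not set. -->
	<_config:property name="ParallelLowerRangeSQL" value="SELECT MIN(CATENTRY_ID) FROM CATENTRY" />
	<_config:property name="ParallelUpperRangeSQL" value="SELECT MAX(CATENTRY_ID) FROM CATENTRY" />

	<!-- Documented default fragment size. -->
	<_config:property name="ParallelPrefetchSize" value="10000" />

	<!-- Assumed boolean value: merge results into existing documents that have a matching primary key. -->
	<_config:property name="ParallelDeltaUpdate" value="true" />

</_config:DataLoader>
```

If ParallelLowerRange or ParallelUpperRange were also defined, the hardcoded value would override the corresponding SQL property, as described above.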

The profile item configuration file contains a data reader section that defines how data can be read and inserted into the index. Two data readers are provided by default:
com.ibm.commerce.foundation.server.services.indexload.reader.solr.SolrIndexLoadQueryReader
A simple SQL loader that reads the original physical data from the data source in parallel as specified by the configuration files.
com.ibm.commerce.foundation.server.services.indexload.reader.solr.SolrIndexLoadQueryMultiplexReader
Requires the index entity to define the KeyFieldName property and to have only one primary key field. The database column that maps to this primary key index field is used as the identifier for the index entry.
This reader is configured in the following way:
  • The KeyFieldName property is the index field name for the primary key.
  • The query tag is the database SQL query to be used, and must be ordered by the primary key field.
  • Multiple ColumnMapping tags can be used, each one mapping a database table column (name) to an index field name (value).
  • The DynamicFields section allows a list of dynamic fields to be defined. Multiplexing is applied to each dynamic field: the field name is the value that is resolved from dynamicFieldName, and the field value is the value that is resolved from dynamicFieldValue. In addition, dynamicFieldName and dynamicFieldValue can be used as templates in which other field variable names are declared. An optional parameter, indexingMode, which defaults to replace, defines how multiple values in this dynamic column are handled. The other supported modes are append, for multi-value index fields, and sum, for adding up all the values.
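
The points above could translate into a reader definition along the following lines. This is only a sketch: the nesting of the Query, SQL, and DynamicField elements, the placement of dynamicFieldName, dynamicFieldValue, and indexingMode as attributes, the %NAME% template syntax, and the MY_PRICE table with its columns are all assumptions made for illustration, not details confirmed by this topic. Compare it with the shipped sample before reusing any of it.

```xml
<!-- Hypothetical fragment; element nesting, attribute placement, and the %NAME% template syntax are assumptions. -->
<_config:DataReader
	xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config"
	className="com.ibm.commerce.foundation.server.services.indexload.reader.solr.SolrIndexLoadQueryMultiplexReader">

	<!-- Index field name of the single primary key. -->
	<_config:property name="KeyFieldName" value="catentry_id" />

	<_config:Query>
		<!-- The query must be ordered by the primary key column. MY_PRICE is a made-up table. -->
		<_config:SQL>
			SELECT CATENTRY_ID, CURRENCY, PRICE FROM MY_PRICE ORDER BY CATENTRY_ID
		</_config:SQL>

		<!-- Maps a database column (name) to an index field (value). -->
		<_config:ColumnMapping name="CATENTRY_ID" value="catentry_id" />

		<_config:DynamicFields>
			<!-- Assumed templates: field name resolved from CURRENCY, field value resolved from PRICE. -->
			<_config:DynamicField dynamicFieldName="price_%CURRENCY%" dynamicFieldValue="%PRICE%" indexingMode="replace" />
		</_config:DynamicFields>
	</_config:Query>

</_config:DataReader>
```

Under this reading, rows for the same CATENTRY_ID with the currencies USD and EUR would populate price_USD and price_EUR in one index document. Setting indexingMode to append or sum would instead collect or add up repeated values for the same dynamic field.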