Feature Pack 8

Index Load configuration files for indexing

Index Load must be configured before it can be run from a web browser.
It requires three types of configuration files, which are based on the XML schema definitions of the Data Load framework:

Index Load configuration files

Index Load configuration file                                      Data Load definition file
Environment configuration file (wc-indexload-env.xml)              wc-dataload-env.xsd
Profile configuration file (wc-indexload-profileName.xml)          wc-dataload.xsd
Profile item configuration file (wc-indexload-businessobject.xml)  wc-dataload-businessobject.xsd

Environment configuration file (wc-indexload-env.xml)

The wc-indexload-env.xml file contains environment control information and global properties required by Index Load, including a common data writer and data source to be used to persist the data.

The wc-indexload-env.xml file does not typically require customization; you can use the default sample file as-is.

Example: wc-indexload-env.xml

<_config:DataLoadEnvConfiguration
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://www.ibm.com/xmlns/prod/commerce/foundation/config ../../../xml/config/xsd/wc-dataload-env.xsd" 
	xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config">

	<_config:DataSource reference="com.ibm.commerce.foundation.server.services.search.datasource" />

	<_config:DataWriter className="com.ibm.commerce.foundation.internal.server.services.indexload.writer.SolrIndexLoadWriter" >
		 <_config:DataLoadBatchService className="com.ibm.commerce.foundation.server.services.indexload.writer.solr.SolrIndexLoadBatchService" />
	</_config:DataWriter>

</_config:DataLoadEnvConfiguration>

Profile configuration file (wc-indexload-profileName.xml)

The wc-indexload-profileName.xml file contains configurable performance attributes and load item configurations.

Profile names that you define in configuration files are then passed as a URL parameter when you call Index Load from a web browser.

The load item configurations are listed under the load order section of this file. They are processed in the same order as they are specified.

The file can contain one or more LoadItem definitions. Each LoadItem specifies its business object configuration file (businessObjectConfigFile) and its target coreName. Multiple LoadItems run in parallel, in no particular sequence.

Example: wc-indexload-price.xml

<_config:LoadItem name="ExternalPrice-1" businessObjectConfigFile="wc-indexload-price-sql.xml">
	<_config:property name="coreName" value="MC_10001_CatalogEntry_Price_generic" />
	<_config:property name="groupName" value="1" />
</_config:LoadItem>

The following configurable performance attributes apply to profile configuration files:
batchSize
The threshold at which documents are soft committed in memory.
The default value is 1. If a value of 0 is specified, Index Load does not commit until the load item finishes.
commitCount
The threshold at which documents are hard committed from memory to disk.
You can use a commitCount of 0 if you use a memory-based commit. For more information, see Tuning Index Load.
ThreadLaunchTimeDelay
The amount of time in milliseconds to wait before launching another new thread, to avoid overloading the system at startup.
The default value is 1000.
OptimizeAfterIndexing
Indicates whether Index Load performs index optimization after commit.
Note: Performing optimization after a full indexing improves runtime performance; however, it increases the overall indexing time.
StatusRefreshInterval
The maximum amount of time, in seconds, to wait before refreshing the current Index Load status and displaying it in the administrative log.
The default value is 300. Use a value of -1 to disable the service.
IndexHeightCacheHint
A numeric hint that the system uses to size the applicable caches for index height during indexing.
IndexWidthCacheHint
A numeric hint that the system uses to size the applicable caches for index width during indexing.
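
The attributes above can be pictured together in one profile configuration file. The following is a minimal sketch, not a shipped sample: it assumes that the file follows the general Data Load layout (a DataLoadConfiguration root, a DataLoadEnvironment reference, and a LoadOrder section), that batchSize and commitCount are LoadOrder attributes, and that the remaining settings are supplied as property elements. The OptimizeAfterIndexing value is illustrative only. Check the sample files for the actual structure.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch: element and attribute placement is assumed, not confirmed by this topic -->
<_config:DataLoadConfiguration
	xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config">

	<!-- Assumed: reference to the environment configuration file -->
	<_config:DataLoadEnvironment configFile="wc-indexload-env.xml" />

	<!-- batchSize="1" is the documented default; commitCount="0" suits a memory-based commit -->
	<_config:LoadOrder batchSize="1" commitCount="0">
		<!-- Documented defaults: 1000 milliseconds and 300 seconds -->
		<_config:property name="ThreadLaunchTimeDelay" value="1000" />
		<_config:property name="StatusRefreshInterval" value="300" />
		<!-- Illustrative value; optimization lengthens the overall indexing time -->
		<_config:property name="OptimizeAfterIndexing" value="false" />

		<!-- Load item taken from the wc-indexload-price.xml example -->
		<_config:LoadItem name="ExternalPrice-1" businessObjectConfigFile="wc-indexload-price-sql.xml">
			<_config:property name="coreName" value="MC_10001_CatalogEntry_Price_generic" />
			<_config:property name="groupName" value="1" />
		</_config:LoadItem>
	</_config:LoadOrder>

</_config:DataLoadConfiguration>
```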

Profile item configuration file (wc-indexload-businessobject.xml)

The wc-indexload-businessobject.xml file contains detailed DataLoader configurations, which include the dataload className, DataReader, and BusinessObjectBuilder. The SolrIndexLoadQueryLoader is used to load objects from the database.

Example: wc-indexload-price-sql.xml

<_config:DataLoader className="com.ibm.commerce.foundation.server.services.indexload.loader.solr.SolrIndexLoadQueryLoader" >

The following configurable performance attributes apply to profile item configuration files:
ParallelThreads
The maximum number of loader threads that the search work manager can dispatch. The loader threads read data in parallel and share the same data writer.
An empty value or 1 indicates no parallel indexing.
ParallelLowerRangeSQL
SQL queries that retrieve the first keys of the range.
You can use these SQL queries to make Index Load load only a subset of the objects from the database.
ParallelUpperRangeSQL
SQL queries that retrieve the end keys of the range.
ParallelNextRangeSQL
An SQL statement that determines the next available identifier when an empty range ID is detected from the parallel range. Typically, the nextStartKey value is the firstKey, and the nextEndKey is the firstKey+prefetchSize-1.
ParallelLowerRange
A hardcoded value for the lower range key. If defined, it is an absolute number for the lower range and overrides the value of ParallelLowerRangeSQL.
ParallelUpperRange
A hardcoded value for the upper range key. If defined, it is an absolute number for the upper range and overrides the value of ParallelUpperRangeSQL.
ParallelPrefetchSize
Determines how much data to read in one run when the reader queries the database. If defined, the runtime breaks up the entire data range into fragments to avoid overloading the database sort heap with too large a query result set.
The default value is 10000.
ParallelDeltaUpdate
Determines whether the SQL result set will be merged into an existing indexed document that contains a matching primary key.
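
As an illustration of how the parallel attributes might fit together, the following sketch sets them on the DataLoader element from the earlier example. It assumes that each attribute is supplied as a property element inside DataLoader. The table SAMPLE_TABLE, the column KEY_ID, and the thread count of 4 are hypothetical placeholders, and the wrapper elements, DataReader, and BusinessObjectBuilder are omitted. Treat it as a sketch rather than a working configuration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch: property placement, table name, and column name are assumptions -->
<_config:DataLoader
	xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config"
	className="com.ibm.commerce.foundation.server.services.indexload.loader.solr.SolrIndexLoadQueryLoader">

	<!-- Up to four loader threads read in parallel and share one data writer -->
	<_config:property name="ParallelThreads" value="4" />

	<!-- Queries that return the first and end keys of the range to load -->
	<_config:property name="ParallelLowerRangeSQL" value="SELECT MIN(KEY_ID) FROM SAMPLE_TABLE" />
	<_config:property name="ParallelUpperRangeSQL" value="SELECT MAX(KEY_ID) FROM SAMPLE_TABLE" />

	<!-- Documented default: the key range is read in fragments of this size -->
	<_config:property name="ParallelPrefetchSize" value="10000" />

	<!-- DataReader and BusinessObjectBuilder definitions omitted from this sketch -->
</_config:DataLoader>
```

With these assumed values, a fragment that starts at key 10001 would typically end at key 20000, following the firstKey+prefetchSize-1 rule (10001 + 10000 - 1).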

Sample configuration files

You can use the following sample configuration files for reference: IndexLoadSampleCode.zip.

For reference, the sample includes the configuration files used by Index Load and the manual updates performed in the task Indexing contract prices using Index Load.