Customizing the preprocessor and Data Import Handler (DIH)

The preprocessing tasks are controlled by the wc-dataimport-preprocess XML files, which contain table definitions, database schema metadata, and references to the Java classes used in the preprocessing steps. The di-preprocess utility invokes these files, extracts and flattens WebSphere Commerce data, and writes the result to a set of temporary tables in the WebSphere Commerce database. The index building utility then uses the Solr Data Import Handler (DIH) to populate the Solr indexes from the data in those temporary tables.

Preprocessor

The sample wc-dataimport-preprocess XML files are in the following directory:
  • WC_installdir\IBM\WCDE_INT70\components\foundation\samples\dataimport\catalog
Note: The deployed wc-dataimport-preprocess files are database-specific, as each database can contain specific data types and syntax.

To customize the preprocessor, create a custom wc-dataimport-preprocess XML file. The di-preprocess utility invokes this custom file when it runs.

The following sample is a template preprocess configuration that goes into the custom wc-dataimport-preprocess XML file:

<_config:DIHPreProcessConfig
    xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.ibm.com/xmlns/prod/commerce/foundation/config ../../xsd/wc-dataimport-preprocess.xsd">

  <_config:data-processing-config
      processor="com.ibm.commerce.foundation.dataimport.preprocess.EmptyPreProcessor"
      masterCatalogId="#MASTER_CATALOG_ID#">
    <_config:table definition="" name="" />
    <_config:query sql="" />
    <_config:mapping>
      <_config:key queryColumn="" tableColumn=""/>
      <_config:column-mapping>
        <_config:column-column-mapping>
          <_config:column-column queryColumn="" tableColumn="" />
        </_config:column-column-mapping>
      </_config:column-mapping>
    </_config:mapping>
  </_config:data-processing-config>

</_config:DIHPreProcessConfig>
Note: The catalog entry customizable fields field1, field3, and field5 do not require preprocessing, so the DIH can map them directly from the CATENTRY table.

For more information about customizing the preprocessor, see Configuring the Data Import Handler mapping.

Solr Data Import Handler (DIH)

The index building utility uses DIH to connect to the WebSphere Commerce database through a JDBC connection. It crawls the WebSphere Commerce tables, and then populates the Solr index. The JDBC configuration and crawling SQL statements are defined in the wc-data-config.xml configuration file.
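
As a rough sketch of how these two pieces fit together, a Solr DIH configuration file generally has the following shape. The driver class, URL, credentials, entity name, index field name, and query are placeholders assumed for illustration (a DB2 connection is assumed). They are not copied from the deployed wc-data-config.xml file, which can differ in structure and detail.

<dataConfig>
  <!-- JDBC configuration: placeholder values, assuming a DB2 database -->
  <dataSource type="JdbcDataSource"
      driver="com.ibm.db2.jcc.DB2Driver"
      url="jdbc:db2://localhost:50000/WCDB"
      user="wcuser"
      password="wcpassword" />
  <document>
    <!-- Crawling SQL: hypothetical entity name and a deliberately simplified query -->
    <entity name="CatalogEntry"
        query="SELECT CATENTRY.CATENTRY_ID FROM CATENTRY">
      <!-- Maps a result-set column to an index field (field name assumed) -->
      <field column="CATENTRY_ID" name="catentry_id" />
    </entity>
  </document>
</dataConfig>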

The sample wc-data-config.xml files are in the following directory:
  • WC_installdir\IBM\WCDE_INT70\components\foundation\subcomponents\search\solr\home\template\CatalogEntry\conf
Note: The deployed wc-data-config.xml files are database-specific, as each database can contain specific data types and syntax.
To customize the DIH to index new fields, modify the wc-data-config.xml file with the following steps:
  1. Add the new field name to the SELECT clause of the query and the deltaImportQuery.
  2. Add the source table name of the new field to the FROM clause of the query and the deltaImportQuery.
  3. Add a field mapping from the new column to the corresponding index field name.
For example, the following steps add the catalog entry customizable fields field1, field3, and field5 to the wc-data-config.xml file:
  1. Append the following snippet to the list of fields in the SELECT clause of the query and the deltaImportQuery:
    
    CATENTRY.FIELD1, CATENTRY.FIELD3, CATENTRY.FIELD5,
    
  2. Leave the FROM clause unchanged, because the CATENTRY table is already included in its list of tables.
  3. Add the mapping for each of the fields:
    
    <field column="FIELD1" name="catentry_field1" />
    <field column="FIELD3" name="catentry_field3" />
    <field column="FIELD5" name="catentry_field5" />
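
Put together, the edited entity might look like the following simplified sketch. The entity name, the pk attribute, the shortened queries, and the WHERE clause are assumptions made for illustration; the deployed queries select many more columns and join more tables, and the deltaQuery attribute that a Solr delta import also relies on is omitted here. Only the three FIELD columns and their field mappings come from the steps above.

<!-- Hypothetical entity name and simplified queries; not the deployed configuration -->
<entity name="Product" pk="CATENTRY_ID"
    query="SELECT CATENTRY.CATENTRY_ID, CATENTRY.FIELD1, CATENTRY.FIELD3, CATENTRY.FIELD5 FROM CATENTRY"
    deltaImportQuery="SELECT CATENTRY.CATENTRY_ID, CATENTRY.FIELD1, CATENTRY.FIELD3, CATENTRY.FIELD5 FROM CATENTRY WHERE CATENTRY.CATENTRY_ID = ${dataimporter.delta.CATENTRY_ID}">
  <!-- Each column value matches a column name that the queries return -->
  <field column="FIELD1" name="catentry_field1" />
  <field column="FIELD3" name="catentry_field3" />
  <field column="FIELD5" name="catentry_field5" />
</entity>

Keeping the SELECT lists of the query and the deltaImportQuery identical means that full and delta imports populate the same index fields.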
    

Naming conventions

Use the prefix XI_ when naming custom temporary tables. This naming convention prevents conflicts between custom tables and the default WebSphere Commerce tables.
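
For instance, the table element of a custom preprocess configuration could declare a temporary table as in the following sketch. The table name XI_WARRANTY_0 and its columns are hypothetical, and the sketch assumes that the definition attribute holds the CREATE TABLE statement for the table. Adjust the data types to the syntax of the target database.

<!-- Hypothetical custom temporary table that follows the XI_ prefix convention -->
<_config:table
    definition="CREATE TABLE XI_WARRANTY_0 (CATENTRY_ID BIGINT NOT NULL, WARRANTY_TERM VARCHAR(64), PRIMARY KEY (CATENTRY_ID))"
    name="XI_WARRANTY_0" />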