Customizing the preprocessor and Data Import Handler (DIH)

The preprocessing tasks are controlled by the wc-dataimport-preprocess XML files. The files contain table definitions, database schema metadata, and references to the Java classes used in the preprocessing steps. The di-preprocess utility reads these files, extracts and flattens WebSphere Commerce data, and writes the result to a set of temporary tables in the WebSphere Commerce database. The index building utility then uses the Solr Data Import Handler (DIH) to populate the Solr indexes from the data in those temporary tables.

Preprocessor

The sample wc-dataimport-preprocess XML files are in the following directory:
  • WCDE_installdir\samples\dataimport\catalog
Note: The deployed wc-dataimport-preprocess files are database-specific, as each database can contain specific data types and syntax.

To customize the preprocessor, create a custom wc-dataimport-preprocess XML file. The di-preprocess utility invokes the custom file when it runs.

The following sample is a template preprocess configuration that goes into the custom wc-dataimport-preprocess XML file:

<_config:DIHPreProcessConfig xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/xmlns/prod/commerce/foundation/config 
../../xsd/wc-dataimport-preprocess.xsd">

  <_config:data-processing-config processor="com.ibm.commerce.foundation.dataimport.preprocess.EmptyPreProcessor" 
masterCatalogId="#MASTER_CATALOG_ID#">
    <_config:table definition="" name="" />
    <_config:query sql="" />
    <_config:mapping>
      <_config:key queryColumn="" tableColumn=""/>
      <_config:column-mapping>
        <_config:column-column-mapping>
          <_config:column-column queryColumn="" tableColumn="" />
        </_config:column-column-mapping>
      </_config:column-mapping>
    </_config:mapping>
  </_config:data-processing-config>


</_config:DIHPreProcessConfig>
Note: The customizable catalog entry fields field1, field3, and field5 do not require preprocessing, so the DIH can map them directly from the CATENTRY table.
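
As an illustration only, the template above might be filled in as follows for a hypothetical custom table. The source table XWARRANTY, its WARRANTY_TERM column, and the temporary table XI_CATENTRY_WARRANTY are invented for this sketch. The CREATE TABLE statement assumes DB2 syntax, and the StaticAttributeDataPreProcessor class name is an assumption; check the shipped sample files for the processor and SQL dialect that apply to your database.

```xml
<_config:DIHPreProcessConfig xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/xmlns/prod/commerce/foundation/config
../../xsd/wc-dataimport-preprocess.xsd">

  <!-- Processor class name is an assumption; use the one from the shipped samples -->
  <_config:data-processing-config processor="com.ibm.commerce.foundation.dataimport.preprocess.StaticAttributeDataPreProcessor"
masterCatalogId="#MASTER_CATALOG_ID#">
    <!-- Hypothetical temporary table; DB2 syntax assumed -->
    <_config:table definition="CREATE TABLE XI_CATENTRY_WARRANTY (CATENTRY_ID BIGINT NOT NULL, WARRANTY_TERM VARCHAR(64), PRIMARY KEY (CATENTRY_ID))" name="XI_CATENTRY_WARRANTY" />
    <!-- Hypothetical source query that supplies the rows to flatten -->
    <_config:query sql="SELECT CATENTRY_ID, WARRANTY_TERM FROM XWARRANTY" />
    <_config:mapping>
      <!-- Key column shared by the query result and the temporary table -->
      <_config:key queryColumn="CATENTRY_ID" tableColumn="CATENTRY_ID"/>
      <_config:column-mapping>
        <_config:column-column-mapping>
          <_config:column-column queryColumn="WARRANTY_TERM" tableColumn="WARRANTY_TERM" />
        </_config:column-column-mapping>
      </_config:column-mapping>
    </_config:mapping>
  </_config:data-processing-config>

</_config:DIHPreProcessConfig>
```

The temporary table name uses the XI_ prefix so that it cannot collide with a default WebSphere Commerce table.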

Solr Data Import Handler (DIH)

The index building utility uses DIH to connect to the WebSphere Commerce database through a JDBC connection. It crawls the WebSphere Commerce tables, and then populates the Solr index. The JDBC configuration and crawling SQL statements are defined in the wc-data-config.xml configuration file.
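
For orientation, a Solr DIH configuration file generally has the shape sketched below: a dataSource element that carries the JDBC settings, and one or more entity elements that carry the crawling SQL and the column-to-field mappings. This is a generic sketch, not the shipped wc-data-config.xml; the driver, URL, credentials, entity name, query, and index field names are placeholders chosen for illustration.

```xml
<dataConfig>
  <!-- JDBC connection settings (placeholder values, DB2 assumed) -->
  <dataSource type="JdbcDataSource"
              driver="com.ibm.db2.jcc.DB2Driver"
              url="jdbc:db2://localhost:50000/MALL"
              user="db_user"
              password="db_password" />
  <document>
    <!-- Crawling SQL: each row of the root entity becomes one Solr document -->
    <entity name="CatalogEntry"
            query="SELECT CATENTRY.CATENTRY_ID, CATENTRY.PARTNUMBER FROM CATENTRY">
      <!-- Map SQL column names to index field names (hypothetical names) -->
      <field column="CATENTRY_ID" name="catentry_id" />
      <field column="PARTNUMBER" name="partNumber" />
    </entity>
  </document>
</dataConfig>
```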

The sample wc-data-config.xml files are in the following directory:
  • WCDE_installdir\components\foundation\subcomponents\search\solr\home\template\CatalogEntry\conf
Note: The deployed wc-data-config.xml files are database-specific, as each database can contain specific data types and syntax.
To customize the DIH to index new fields, define the custom query fragments in the solrcore.properties file and the field mapping in the x-data-config.xml file:
  1. In the solrcore.properties file, set the dataImporter.ext.querySelect property to the new field name, followed by a comma.
  2. In the solrcore.properties file, set the dataImporter.ext.queryFrom property to the source table name of the new field.
    Note: You must use a LEFT OUTER JOIN statement in this property.
  3. Add a field mapping between the SQL column name and the actual index field name in the x-data-config.xml file.
For example, the following steps add the customizable catalog entry fields field1, field3, and field5 to the index query:
  1. Add the following property to the solrcore.properties file:
    
    dataImporter.ext.querySelect=CATENTRY.FIELD1, CATENTRY.FIELD3, CATENTRY.FIELD5,
    
  2. Add the mapping for each of the fields into the x-data-config.xml file:
    
    <field column="FIELD1" name="catentry_field1" />
    <field column="FIELD3" name="catentry_field3" />
    <field column="FIELD5" name="catentry_field5" />
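
The same steps extend to a field that lives outside the CATENTRY table. The sketch below assumes that the base query selects from CATENTRY and that a hypothetical temporary table XI_CATENTRY_WARRANTY exists with a WARRANTY_TERM column, keyed by CATENTRY_ID. The property values are shown inside an XML comment for reference. The index field name warranty_term is invented and would also need to be declared in the Solr schema.

```xml
<!-- solrcore.properties (hypothetical values):
     dataImporter.ext.querySelect=XI_CATENTRY_WARRANTY.WARRANTY_TERM,
     dataImporter.ext.queryFrom=LEFT OUTER JOIN XI_CATENTRY_WARRANTY ON (CATENTRY.CATENTRY_ID=XI_CATENTRY_WARRANTY.CATENTRY_ID)
-->

<!-- x-data-config.xml: map the SQL column name to the index field name -->
<field column="WARRANTY_TERM" name="warranty_term" />
```

A LEFT OUTER JOIN keeps catalog entries that have no matching row in the joined table; an inner join would drop them from the result set and therefore from the index.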
    

Naming conventions

Use a prefix of XI_ when naming custom temporary tables. This naming convention prevents naming conflicts between customization tables and default WebSphere Commerce tables.