Configuring the search preprocessor

In this lesson, you modify the preprocessor configuration to support custom data. You add a custom configuration file that references the required processing Java classes and defines the temporary tables for the customer ratings data. Preprocessing tasks are controlled by the wc-dataimport-preprocess XML files, which contain table definitions, database schema metadata, and references to the Java classes that are used in the preprocessing steps.

About this task

The customer ratings data is preprocessed in two steps:
  1. First, the data is loaded into a temporary table.
  2. Then, internal HCL Commerce referential constraints are resolved and the resolved data is loaded into a secondary table. The secondary table data is used for indexing purposes.
This data is processed in two steps because the external ratings data does not contain internal identifiers, such as CATENTRY.CATENTRY_ID and CATENTRY.MEMBER_ID. In this tutorial, the ratings data refers to catalog entries by their CATENTRY.PARTNUMBER column values. The CATENTRY_ID is resolved from the PARTNUMBER and MEMBER_ID values. The MEMBER_ID value that is used for this tutorial is set in the provided code snippet.
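
The two-step flow can be pictured with a small in-memory sketch. It is not the tutorial's StaticRatingsDataPopulator class, which performs the resolution with SQL against the database. The names RatingResolutionSketch, CatEntry, and Rating, and the catalog entry ID 10001, are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: mimics the two-step resolution in memory, without a database. */
public class RatingResolutionSketch {

    /** Hypothetical stand-in for the CATENTRY columns that the resolution needs. */
    public static final class CatEntry {
        public final long catentryId;
        public final long memberId;
        public final String partNumber;

        public CatEntry(long catentryId, long memberId, String partNumber) {
            this.catentryId = catentryId;
            this.memberId = memberId;
            this.partNumber = partNumber;
        }
    }

    /** A rating row; catentryId stays null until the second step resolves it. */
    public static final class Rating {
        public final Long catentryId;
        public final String partNumber;
        public final String type;
        public final String rating;

        public Rating(Long catentryId, String partNumber, String type, String rating) {
            this.catentryId = catentryId;
            this.partNumber = partNumber;
            this.type = type;
            this.rating = rating;
        }
    }

    /** Step 2: match each staged row to a catalog entry by part number and owning member. */
    public static List<Rating> resolve(List<Rating> staged, List<CatEntry> catalog, long memberId) {
        // Index the catalog entries that belong to the given member by part number.
        Map<String, Long> idByPartNumber = new HashMap<>();
        for (CatEntry entry : catalog) {
            if (entry.memberId == memberId) {
                idByPartNumber.put(entry.partNumber, entry.catentryId);
            }
        }
        List<Rating> resolved = new ArrayList<>();
        for (Rating row : staged) {
            Long id = idByPartNumber.get(row.partNumber);
            if (id != null) { // like an inner join, unmatched part numbers are skipped
                resolved.add(new Rating(id, row.partNumber, row.type, row.rating));
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Step 1: rows as loaded from the external file (the internal ID is still unknown).
        List<Rating> staged = new ArrayList<>();
        staged.add(new Rating(null, "AC-01", "quality", "1.7"));
        staged.add(new Rating(null, "UNKNOWN-PART", "quality", "3.0"));

        // Made-up catalog content; the member ID matches the one used in this tutorial.
        List<CatEntry> catalog = new ArrayList<>();
        catalog.add(new CatEntry(10001L, 7000000000000000101L, "AC-01"));

        for (Rating r : resolve(staged, catalog, 7000000000000000101L)) {
            System.out.println(r.catentryId + " " + r.partNumber + " " + r.rating);
        }
    }
}
```

Rows whose part number has no matching catalog entry for the member are dropped, which is why the member ID value must be correct for your store.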

Procedure

  1. Copy the sample Ratings.xml file into any directory within your development environment. This file is included in the searchindexratings.zip compressed file that you downloaded from the tutorial introduction. The following steps assume that the file is in the HCL Commerce bin directory. This XML file includes sample customer ratings, which you load into your database and use to develop product rankings with HCL Commerce Search. If you want to include more ratings data, you can edit the file.
  2. In your file manager utility, go to the workspace_dir\WC\xml\search\dataImport\v3\dbtype\CatalogEntry directory, where dbtype is the database type for your environment, such as DB2.


  3. In this folder, create an XML file and name the file wc-dataimport-preprocess-custom.xml.
  4. Add the following code into your new file.
    
    <?xml version="1.0" encoding="UTF-8"?>
    <_config:DIHPreProcessConfig xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xsi:schemaLocation="http://www.ibm.com/xmlns/prod/commerce/foundation/config ../../xsd/wc-dataimport-preprocess.xsd ">
      
      <!-- load ratings into temp table -->
      <_config:data-processing-config processor="com.mycompany.commerce.preprocess.StaticRatingsDataPreProcessor" batchSize="500">
        <_config:table definition="CREATE TABLE TI_RATING_TEMP ( PART_NUMBER VARCHAR(256),RTYPE VARCHAR(256), RATING VARCHAR(256))" name="TI_RATING_TEMP"/>
        <_config:query sql=""/>
        <_config:mapping>
          <_config:key queryColumn="CATENTRY_ID" tableColumn="CATENTRY_ID"/>
          <_config:column-mapping>
            <_config:column-column-mapping>
              <_config:column-column queryColumn="" tableColumn=""/>
            </_config:column-column-mapping>
          </_config:column-mapping>
        </_config:mapping>
    
        <!-- This property sets the input file path so that the location is not hard-coded to WC\bin -->
        <_config:property name="inputFile" value="WCDE_installdir\bin\Ratings.xml"/>
      </_config:data-processing-config>
      
      <_config:data-processing-config processor="com.mycompany.commerce.preprocess.StaticRatingsDataPopulator" batchSize="500">
        <_config:table definition="CREATE TABLE TI_RATING ( CATENTRY_ID BIGINT NOT NULL, PART_NUMBER VARCHAR(256),RTYPE VARCHAR(256), RATING VARCHAR(256))" 
          name="TI_RATING"/>
        <_config:query sql="insert into TI_RATING ( catentry_id,part_number, rating,rtype ) select catentry_id,part_number,rating,rtype 
          from catentry,ti_rating_temp where catentry.partnumber=ti_rating_temp.part_number and catentry.member_id=7000000000000000101"/>
        <_config:mapping>
          <_config:key queryColumn="CATENTRY_ID" tableColumn="CATENTRY_ID"/>
          <_config:column-mapping>
            <_config:column-column-mapping>
              <_config:column-column queryColumn="" tableColumn=""/>
            </_config:column-column-mapping>
          </_config:column-mapping>
        </_config:mapping>
      </_config:data-processing-config>
    </_config:DIHPreProcessConfig>
    
    Notes:
    • Ensure that any member ID values and the inputFile property value are correct for your store and environment. The inputFile property must point to the XML file that includes the customer ratings data.
    • The first <_config:data-processing-config> element uses its processor attribute to refer to the com.mycompany.commerce.preprocess.StaticRatingsDataPreProcessor Java class, which loads the data. This element defines the first temporary table, TI_RATING_TEMP, by using the <_config:table> subelement. The remaining subelements are unused and are included only so that the XML has the structure that the preprocessor expects.
    • The second <_config:data-processing-config> element refers to the com.mycompany.commerce.preprocess.StaticRatingsDataPopulator Java class, which reads the data that is produced by the first stage of preprocessing and resolves the internal identifiers. This element defines the secondary temporary table, TI_RATING, which stores the resolved data. The <_config:query> subelement defines the SQL that is used to resolve and load the data.
    • For the runtime environment, the inputFile property must point to the location where the Ratings.xml file is located within the container.
    • For Oracle databases, change the CATENTRY_ID column type from BIGINT to NUMBER in the TI_RATING table definition: CREATE TABLE TI_RATING ( CATENTRY_ID NUMBER NOT NULL, PART_NUMBER VARCHAR(256),RTYPE VARCHAR(256), RATING VARCHAR(256))
    • Changes that you make to the preprocess XML configuration files do not take effect until you run a DROP TABLE command on the corresponding table.
  5. Save and close the file.
  6. Create the preprocessor Java classes in your environment for preprocessing your custom data. The following procedure creates sample StaticRatingsDataPopulator.java and StaticRatingsDataPreProcessor.java files. These files include sample code for only this tutorial. If you need to index different data, you must define your own custom Java files.
    1. Open HCL Commerce Developer and switch to the Enterprise Explorer view.
    2. Expand WebSphereCommerceServerExtensionsLogic > src.
    3. Right-click src and select Import.
    4. In the Import dialog, expand General. Select File System and click Next.
    5. Browse to the directory where you downloaded and extracted the searchindexratings.zip compressed file from the introductory topic of this tutorial. Select the src directory within the extracted file and click OK.
    6. Select the check box next to the src directory and click Finish.
      The following packages should be imported into the WebSphereCommerceServerExtensionsLogic project within the src directory:
      • com.mycompany.commerce.preprocess
      • com.mycompany.commerce.preprocess.rating
    Note: The RatingXMLReader.java file within the com.mycompany.commerce.preprocess.rating package is a simple Java class that takes an XML file name and parses the file. How this class is implemented depends on the format of the XML, and both the format and the parsing approach are up to you. For example, the following snippet shows a sample format for the Ratings.xml file.
    
    <?xml version="1.0" encoding="utf-8"?>
    <customInfo>
      <product partNumber="AC-01">
        <rating type="quality">
          <averageRating>1.7</averageRating>
          <reviewCount>60</reviewCount>
        </rating>
      </product>
      <product partNumber="AC-0101">
        <rating type="quality">
          <averageRating>4.6</averageRating>
          <reviewCount>85</reviewCount>
        </rating>
      </product>
    </customInfo>
    
  7. Package the classes within a JAR file in your WC_eardir directory so that the utilities can locate the classes at run time.
    1. Export the WebSphereCommerceServerExtensionsLogic project into a JAR file within a temporary directory. Ensure that you name the file WebSphereCommerceServerExtensionsLogic.jar.
    2. Copy the exported WebSphereCommerceServerExtensionsLogic.jar file from your temporary directory into the WC_eardir directory within your environment.
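
The sample Ratings.xml layout shown in the note for step 6 could be read along the following lines. This is a minimal sketch that uses the standard Java DOM parser; it is not the RatingXMLReader class from the tutorial download. The class name RatingsXmlSketch is hypothetical, and the choice to take the averageRating value as the rating is an assumption. Each returned row holds the part number, rating type, and rating, which correspond to the PART_NUMBER, RTYPE, and RATING columns of the TI_RATING_TEMP table.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical reader for the sample Ratings.xml layout; not the tutorial's RatingXMLReader. */
public class RatingsXmlSketch {

    /** Returns one {partNumber, type, averageRating} row per rating element. */
    public static List<String[]> read(InputStream in) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // The ratings come from an external source, so refuse DOCTYPE declarations.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(in);

            List<String[]> rows = new ArrayList<>();
            NodeList products = doc.getElementsByTagName("product");
            for (int i = 0; i < products.getLength(); i++) {
                Element product = (Element) products.item(i);
                NodeList ratings = product.getElementsByTagName("rating");
                for (int j = 0; j < ratings.getLength(); j++) {
                    Element rating = (Element) ratings.item(j);
                    NodeList averages = rating.getElementsByTagName("averageRating");
                    if (averages.getLength() == 0) {
                        continue; // skip rating elements that carry no average
                    }
                    rows.add(new String[] {
                        product.getAttribute("partNumber"),
                        rating.getAttribute("type"),
                        averages.item(0).getTextContent().trim()
                    });
                }
            }
            return rows;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse the ratings XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<customInfo><product partNumber=\"AC-01\"><rating type=\"quality\">"
                + "<averageRating>1.7</averageRating><reviewCount>60</reviewCount>"
                + "</rating></product></customInfo>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        for (String[] row : read(in)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

If your ratings file uses a different layout, only the element and attribute names in the read method would need to change.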