Customizing Index Load

You can customize Index Load to suit your business needs. For example, you can customize Index Load to read from multiple sources, or change how the source input is transformed.

Before you begin

Download and extract the following sample code: IndexLoadSampleCode.zip. The sample includes the configuration files that Index Load uses and, for reference, the manual updates that are performed in this task.

Procedure

  • Customizing Index Load by using SQL statements:

    The wc-indexload-profileName.xml file contains the business object and load item definitions.

    Use the following sample files as a reference.
    • wc-indexload-price.xml
    • wc-indexload-price-sql.xml
    The following sample snippet shows how to define the business object:
    
    <_config:LoadItem name="ExternalPrice-1" businessObjectConfigFile="wc-indexload-price-sql.xml">
    <_config:property name="coreName" value="MC_10001_CatalogEntry_Price_generic" />
    </_config:LoadItem>
    
    The following sample snippet shows how to define the load items by using SQL statements:
    
    <_config:DataLoader className="com.ibm.commerce.foundation.server.services.indexload.loader.solr.SolrIndexLoadQueryLoader">
      <_config:property name="ParallelThreads" value="2" />
      <_config:property name="ParallelLowerRangeSQL" value="SELECT MIN(CE.CATENTRY_ID) FROM TI_CNTRPRICE_0 CE" />
      <_config:property name="ParallelUpperRangeSQL" value="SELECT MAX(CE.CATENTRY_ID) FROM TI_CNTRPRICE_0 CE" />
      <_config:property name="ParallelNextRangeSQL" value="SELECT MIN(CE.CATENTRY_ID) FROM TI_CNTRPRICE_0 CE WHERE CE.CATENTRY_ID &gt; ?" />
      <_config:property name="ParallelLowerRange" value="" />
      <_config:property name="ParallelUpperRange" value="" />
      <_config:property name="ParallelPrefetchSize" value="100" />
      <_config:DataReader className="com.ibm.commerce.foundation.server.services.indexload.reader.solr.SolrIndexLoadQueryMultiplexReader">
        <_config:DynamicFields>
        </_config:DynamicFields>
        <_config:Query>
          <_config:SQL>
            SELECT TI.CATENTRY_ID, TI.PRICE
            FROM TI_CNTRPRICE_0 TI
            WHERE TI.CATENTRY_ID &gt;= %ParallelLowerRange%
              AND TI.CATENTRY_ID &lt;= %ParallelUpperRange%
            ORDER BY TI.CATENTRY_ID
          </_config:SQL>
          <_config:ColumnMapping name="CATENTRY_ID" value="catentry_id" />
          <_config:ColumnMapping name="PRICE" value="price" />
        </_config:Query>
      </_config:DataReader>
      <_config:BusinessObjectBuilder className="com.ibm.commerce.foundation.internal.server.services.indexload.builder.SolrIndexLoadMapObjectBuilder" >
        <_config:BusinessObjectMediator className="com.ibm.commerce.foundation.internal.server.services.indexload.mediator.SolrIndexLoadBusinessObjectMediator">
          <_config:extension className="com.ibm.commerce.foundation.server.services.indexload.mediator.solr.SolrIndexLoadExternalPriceMediator" />
        </_config:BusinessObjectMediator>
      </_config:BusinessObjectBuilder>
    </_config:DataLoader>
    
  • Customizing Index Load by using ranges to read the database in parallel:

    You can configure load items to run in parallel so that the data set is split evenly across threads. The SolrIndexLoadQueryLoader uses multiple threads to support automatic range-based parallel indexing.

    Use the following sample files as a reference:
    • wc-indexload-price-adv.xml
    • wc-indexload-external-price-adv1.xml
    • wc-indexload-external-price-adv2.xml
    The following sample snippet shows how to define ParallelLowerRangeSQL and ParallelUpperRangeSQL SQL ranges:
    
    <_config:property name="ParallelThreads" value="2" />
    <_config:property name="ParallelLowerRangeSQL" value="SELECT MIN(CE.CATENTRY_ID) FROM TI_CNTRPRICE_0 CE" />
    <_config:property name="ParallelUpperRangeSQL" value="SELECT MAX(CE.CATENTRY_ID) FROM TI_CNTRPRICE_0 CE" />
    <_config:property name="ParallelNextRangeSQL" value="SELECT MIN(CE.CATENTRY_ID) FROM TI_CNTRPRICE_0 CE WHERE CE.CATENTRY_ID &gt; ?" />
    <_config:SQL>
      SELECT TI.CATENTRY_ID, TI.PRICE
      FROM TI_CNTRPRICE_0 TI
      WHERE TI.CATENTRY_ID &gt;= %ParallelLowerRange%
        AND TI.CATENTRY_ID &lt;= %ParallelUpperRange%
      ORDER BY TI.CATENTRY_ID
    </_config:SQL>
    
    The following sample snippet shows how to define ParallelLowerRange and ParallelUpperRange hardcoded ranges:
    
    <_config:property name="ParallelThreads" value="2" />
    <_config:property name="ParallelLowerRange" value="1000" />
    <_config:property name="ParallelUpperRange" value="2000" />
    <_config:property name="ParallelNextRangeSQL" value="SELECT MIN(CE.CATENTRY_ID) FROM TI_CNTRPRICE_0 CE WHERE CE.CATENTRY_ID &gt; ?" />
    <_config:SQL>
      SELECT TI.CATENTRY_ID, TI.PRICE
      FROM TI_CNTRPRICE_0 TI
      WHERE TI.CATENTRY_ID &gt;= %ParallelLowerRange%
        AND TI.CATENTRY_ID &lt;= %ParallelUpperRange%
      ORDER BY TI.CATENTRY_ID
    </_config:SQL>
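
    To make the idea of range-based parallel loading concrete, the following Java sketch divides an inclusive key range into contiguous sub-ranges, one per thread. It is an illustration only: the RangeSplitSketch class and its split method are hypothetical and are not part of Index Load, and the sketch assumes an even split by key value that ignores gaps in the key sequence. The way that SolrIndexLoadQueryLoader actually assigns ranges might differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Index Load: splits an inclusive key range
// into contiguous sub-ranges, at most one per worker thread.
final class RangeSplitSketch {

    // Returns {lower, upper} pairs (both inclusive) that together cover [lower, upper].
    static List<long[]> split(long lower, long upper, int threads) {
        if (threads < 1 || upper < lower) {
            throw new IllegalArgumentException("invalid range or thread count");
        }
        long total = upper - lower + 1;
        long base = total / threads;
        long extra = total % threads; // the first 'extra' ranges receive one more key
        List<long[]> ranges = new ArrayList<>();
        long start = lower;
        for (int i = 0; i < threads && start <= upper; i++) {
            long size = base + (i < extra ? 1 : 0);
            if (size == 0) {
                continue; // more threads than keys: nothing left to assign
            }
            long end = start + size - 1;
            ranges.add(new long[] { start, end });
            start = end + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Same bounds and thread count as the hardcoded-range sample above.
        for (long[] r : split(1000L, 2000L, 2)) {
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```

    With the hardcoded bounds 1000 and 2000 and two threads, the sketch yields the sub-ranges 1000 to 1500 and 1501 to 2000. Each sub-range corresponds to one substitution of %ParallelLowerRange% and %ParallelUpperRange% in the SQL statement.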
    
  • Customizing Index Load to use horizontal sharding:

    You can configure Index Load to use horizontal sharding by using multiple cores that contain the price schema.

    Use the following sample files as a reference:
    • wc-indexload-price-adv.xml
    • wc-indexload-external-price-adv1.xml
    • wc-indexload-external-price-adv2.xml
    1. Define multiple load items in the Index Load configuration file.
      For example, in the wc-indexload-price.xml file:
      
      <_config:LoadItem name="ExternalPrice-1" businessObjectConfigFile="wc-indexload-external-price-adv1.xml">
      <_config:property name="coreName" value="MC_3074457345616676668_CatalogEntry_Price1_generic" />
      </_config:LoadItem>
      <_config:LoadItem name="ExternalPrice-2" businessObjectConfigFile="wc-indexload-external-price-adv2.xml">
      <_config:property name="coreName" value="MC_3074457345616676668_CatalogEntry_Price2_generic" />
      </_config:LoadItem>
      
    2. Define different data ranges in the load item configuration file.
      For example, in the wc-indexload-external-price-adv1.xml file:
      
      <_config:property name="ParallelThreads" value="2" />
      <_config:property name="ParallelLowerRangeSQL" value="" />
      <_config:property name="ParallelUpperRangeSQL" value="" />
      <_config:property name="ParallelNextRangeSQL" value="SELECT MIN(CE.CATENTRY_ID) FROM TI_CNTRPRICE_0 CE WHERE CE.CATENTRY_ID &gt; ?" />
      <_config:property name="ParallelLowerRange" value="3074457345616676668" />
      <_config:property name="ParallelUpperRange" value="3074457345616678880" />     
      
    3. Define different data ranges in another load item configuration file.
      For example, in the wc-indexload-external-price-adv2.xml file:
      
      <_config:property name="ParallelThreads" value="2" />
      <_config:property name="ParallelLowerRangeSQL" value="" />
      <_config:property name="ParallelUpperRangeSQL" value="" />
      <_config:property name="ParallelNextRangeSQL" value="SELECT MIN(CE.CATENTRY_ID) FROM TI_CNTRPRICE_0 CE WHERE CE.CATENTRY_ID &gt; ?" />
      <_config:property name="ParallelLowerRange" value="3074457345616678881" />
      <_config:property name="ParallelUpperRange" value="3074457345616680584" />     
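
      When each load item is given a hardcoded range, the ranges should not share keys, otherwise the same catalog entry could be indexed into more than one shard. The following Java sketch checks a set of inclusive ranges for overlap. The ShardRangeCheckSketch class is hypothetical and is not part of Index Load; it only illustrates the check, using the two ranges from the samples above.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper, not part of Index Load: verifies that shard ranges do not overlap.
final class ShardRangeCheckSketch {

    // Returns true when no two inclusive {lower, upper} ranges share a key.
    static boolean disjoint(long[][] ranges) {
        long[][] sorted = ranges.clone(); // shallow copy so the caller's order is preserved
        Arrays.sort(sorted, Comparator.comparingLong((long[] r) -> r[0]));
        for (int i = 1; i < sorted.length; i++) {
            if (sorted[i][0] <= sorted[i - 1][1]) {
                return false; // this range starts before the previous one ends
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[][] shards = {
            { 3074457345616676668L, 3074457345616678880L }, // wc-indexload-external-price-adv1.xml
            { 3074457345616678881L, 3074457345616680584L }  // wc-indexload-external-price-adv2.xml
        };
        System.out.println("disjoint: " + disjoint(shards));
    }
}
```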
      
  • Customizing Index Load to merge multiple horizontal shards:

    You can use Index Load to merge multiple horizontal shards into a single index. In the Index Load configuration file, define multiple load items that point to the same load item configuration file, but to different source indexes. The coreName property specifies the target index that the shards are merged into.

    For example:
    
    <_config:LoadItem name="PriceIndexData-1" businessObjectConfigFile="wc-indexload-merge-base.xml">
    <_config:property name="coreName" value="MC_3074457345616676668_CatalogEntry_Price_generic" />
    <_config:DataSourceLocation location="C:\Price1\data\index" />
    </_config:LoadItem>
    <_config:LoadItem name="PriceIndexData-2" businessObjectConfigFile="wc-indexload-merge-base.xml">
    <_config:property name="coreName" value="MC_3074457345616676668_CatalogEntry_Price_generic" />
    <_config:DataSourceLocation location="C:\Price2\data\index" />
    </_config:LoadItem>
    
    For more information, see Index Load configuration files for merging indexes.
  • Customizing the Business Object Mediator:

    You can customize the Business Object Mediator to change how the source input is transformed.

    Create a custom business object mediator class that extends SolrIndexLoadBusinessObjectMediator:
    
    protected void transform(Object dataObjects, boolean deleteFlag)
          throws DataLoadException {
       final String METHODNAME = "transform(Object, boolean)";
       if (LoggingHelper.isEntryExitTraceEnabled(LOGGER)) {
          LOGGER.entering(CLASSNAME, METHODNAME, new Object[] { dataObjects,
                deleteFlag });
       }

       // Populate the physical document objects from dataObjects here.

       if (LoggingHelper.isEntryExitTraceEnabled(LOGGER)) {
          LOGGER.exiting(CLASSNAME, METHODNAME);
       }
    }
    

    The subclass must implement the abstract method transform(). This method transforms the input logical business object into a physical document object to be saved in the Solr index. After the transform() method finishes, the superclass passes the list of physical objects to the persistence layer, which persists them in the Solr index. The subclass is responsible for populating all data in the physical objects.

    For example, to transform the following input into multiple columns in the price index:
    
    catentry_id         price
    10001               price_USD_10001:100.00||price_EUR_10001:78.29||price_JPY_10001:11274||
                        price_KRW_10001:95048||price_BRL_10001:232.15||price_CNY_10001:802.25
    
    The following snippet shows the default implementation of the transformation:
    
    public void transform(Map<String, Object> document)
             throws SolrIndexLoadException {
          final String METHODNAME = "transform(Map<String, Object>)";
          if (LoggingHelper.isEntryExitTraceEnabled(LOGGER)) {
             LOGGER.entering(CLASSNAME, METHODNAME, new Object[] { document });
          }
    
          if (document != null && !document.isEmpty()) {
             Object fieldValue = document.get("price");
             StringTokenizer st = new StringTokenizer(fieldValue.toString(), "||");
             while(st.hasMoreTokens()){
                String priceElement = (String)st.nextElement();
                int i = priceElement.lastIndexOf(":");
                if (i < 0) {
                   LOGGER.logp(Level.WARNING, CLASSNAME, METHODNAME,
                         "ignoring invalid data format: " + priceElement + "("
                               + String.valueOf(document.get(0)) + ")");
                   continue;
                }
                String priceFieldName = priceElement.substring(0, i);
                //String currency = value.substring(6,i);
                String price = priceElement.substring(i + 1);
                Float newprice = Float.valueOf(price);
                document.put(priceFieldName, newprice);
             }
             document.remove("price");
          } else {
             if (LoggingHelper.isTraceEnabled(LOGGER)) {
                LOGGER.logp(Level.WARNING, CLASSNAME, METHODNAME,
                      "nothing to process");
             }
          }
    
          if (LoggingHelper.isEntryExitTraceEnabled(LOGGER)) {
             LOGGER.exiting(CLASSNAME, METHODNAME);
          }
       }
    
    The following values are formed as a result:
    
    <float name="price_USD_10001">100.00</float>
    <float name="price_EUR_10001">78.29</float>
    <float name="price_JPY_10001">11274</float>
    <float name="price_KRW_10001">95048</float>
    <float name="price_BRL_10001">232.15</float>
    <float name="price_CNY_10001">802.25</float>
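
    The parsing step can also be tried in isolation, outside the mediator. The following Java sketch is a simplified stand-in, not the product code: the PriceFieldSplitSketch class and its splitPrices method are hypothetical, and the sketch splits on the literal "||" separator with String.split rather than using a StringTokenizer. Elements without a colon are skipped; a malformed number still raises a NumberFormatException.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Index Load: splits a packed price string
// such as "price_USD_10001:100.00||price_EUR_10001:78.29||" into one field per price.
final class PriceFieldSplitSketch {

    static Map<String, Float> splitPrices(String packed) {
        Map<String, Float> fields = new LinkedHashMap<>(); // keeps the input order
        if (packed == null) {
            return fields;
        }
        for (String element : packed.split("\\|\\|")) {
            String trimmed = element.trim();
            int i = trimmed.lastIndexOf(':');
            if (trimmed.isEmpty() || i < 0) {
                continue; // skip empty or malformed elements
            }
            String fieldName = trimmed.substring(0, i);
            Float price = Float.valueOf(trimmed.substring(i + 1));
            fields.put(fieldName, price);
        }
        return fields;
    }

    public static void main(String[] args) {
        String packed = "price_USD_10001:100.00||price_EUR_10001:78.29||price_JPY_10001:11274||";
        splitPrices(packed).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```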
    
  • Customizing the SolrIndexLoadQueryReader to transform multiple data entries from a database table into a single index row:

    The SolrIndexLoadQueryMultiplexReader can be used to transform multiple data entries from a database table into a single index row that contains multiple dynamic index fields.

    Define the KeyFieldName property by using one primary key field. The database column that maps to this primary key index field is used as the identifier for the index entry.
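
    The effect of multiplexing can be sketched in plain Java: rows that share the same key value are folded into one document, and each row contributes one dynamic field. This is a conceptual illustration only and does not reproduce the SolrIndexLoadQueryMultiplexReader. The MultiplexSketch class, the assumed row shape (catentry_id, currency, price), and the price_<currency> field naming are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not part of Index Load: folds several table rows
// that share a key into a single document with one dynamic field per row.
final class MultiplexSketch {

    // Each row is assumed to be {Long catentry_id, String currency, Float price}.
    static Map<Long, Map<String, Object>> multiplex(List<Object[]> rows) {
        Map<Long, Map<String, Object>> documents = new LinkedHashMap<>();
        for (Object[] row : rows) {
            Long key = (Long) row[0];
            // Create the document the first time this key is seen.
            Map<String, Object> doc = documents.computeIfAbsent(key, k -> {
                Map<String, Object> d = new LinkedHashMap<>();
                d.put("catentry_id", k); // the key field identifies the index entry
                return d;
            });
            doc.put("price_" + row[1], row[2]); // assumed dynamic field name
        }
        return documents;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] { 10001L, "USD", 100.00f });
        rows.add(new Object[] { 10001L, "EUR", 78.29f });
        rows.add(new Object[] { 10002L, "USD", 5.00f });
        multiplex(rows).values().forEach(System.out::println);
    }
}
```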