Customizing Index Load

You can customize Index Load to suit your business needs. For example, you can customize Index Load to read from multiple sources, or change how the source input is transformed.

Procedure

  • Customizing Index Load by using SQL statements:

    The wc-indexload-profileName.xml file contains the business object and load item definitions.

    Use the following sample files as a reference.
    • wc-indexload-price.xml
    • wc-indexload-price-sql.xml
    The following sample snippet shows how to define the business object:
    
    <_config:LoadItem name="ExternalPrice-1" businessObjectConfigFile="wc-indexload-price-sql.xml">
      <_config:property name="coreName" value="MC_10001_CatalogEntry_Price_generic" />
    </_config:LoadItem>
    
    The following sample snippet shows how to define the load items by using SQL statements:
    
    <_config:DataLoader className="com.ibm.commerce.foundation.server.services.indexload.loader.solr.SolrIndexLoadQueryLoader">
      <_config:property name="ParallelThreads" value="2" />
      <_config:property name="ParallelLowerRangeSQL" value="SELECT MIN(CE.CATENTRY_ID) FROM TI_CNTRPRICE_0 CE" />
      <_config:property name="ParallelUpperRangeSQL" value="SELECT MAX(CE.CATENTRY_ID) FROM TI_CNTRPRICE_0 CE" />
      <_config:property name="ParallelNextRangeSQL" value="SELECT MIN(CE.CATENTRY_ID) FROM TI_CNTRPRICE_0 CE WHERE CE.CATENTRY_ID &gt; ?" />
      <_config:property name="ParallelLowerRange" value="" />
      <_config:property name="ParallelUpperRange" value="" />
      <_config:property name="ParallelPrefetchSize" value="100" />
      <_config:DataReader className="com.ibm.commerce.foundation.server.services.indexload.reader.solr.SolrIndexLoadQueryMultiplexReader">
        <_config:DynamicFields>
        </_config:DynamicFields>
        <_config:Query>
          <_config:SQL>
            SELECT TI.CATENTRY_ID, TI.PRICE
            FROM TI_CNTRPRICE_0 TI
            WHERE TI.CATENTRY_ID &gt;= %ParallelLowerRange%
              AND TI.CATENTRY_ID &lt;= %ParallelUpperRange%
            ORDER BY TI.CATENTRY_ID
          </_config:SQL>
          <_config:ColumnMapping name="CATENTRY_ID" value="catentry_id" />
          <_config:ColumnMapping name="PRICE" value="price" />
        </_config:Query>
      </_config:DataReader>
      <_config:BusinessObjectBuilder className="com.ibm.commerce.foundation.internal.server.services.indexload.builder.SolrIndexLoadMapObjectBuilder" >
        <_config:BusinessObjectMediator className="com.ibm.commerce.foundation.internal.server.services.indexload.mediator.SolrIndexLoadBusinessObjectMediator">
          <_config:extension className="com.ibm.commerce.foundation.server.services.indexload.mediator.solr.SolrIndexLoadExternalPriceMediator" />
        </_config:BusinessObjectMediator>
      </_config:BusinessObjectBuilder>
    </_config:DataLoader>
    
  • Customizing Index Load by using ranges to read the database in parallel:

    You can configure parallel load items that split the data set evenly into ranges. The SolrIndexLoadQueryLoader then processes the ranges on multiple threads, which supports automatic range-based parallel indexing.

    Use the following sample files as a reference:
    • wc-indexload-price-adv.xml
    • wc-indexload-external-price-adv1.xml
    • wc-indexload-external-price-adv2.xml
    The following sample snippet shows how to define ParallelLowerRangeSQL and ParallelUpperRangeSQL SQL ranges:
    
    <_config:property name="ParallelThreads" value="2" />
    <_config:property name="ParallelLowerRangeSQL" value="SELECT MIN(CE.CATENTRY_ID) FROM TI_CNTRPRICE_0 CE" />
    <_config:property name="ParallelUpperRangeSQL" value="SELECT MAX(CE.CATENTRY_ID) FROM TI_CNTRPRICE_0 CE" />
    <_config:property name="ParallelNextRangeSQL" value="SELECT MIN(CE.CATENTRY_ID) FROM TI_CNTRPRICE_0 CE WHERE CE.CATENTRY_ID &gt; ?" />
    <_config:SQL>
      SELECT TI.CATENTRY_ID, TI.PRICE
      FROM TI_CNTRPRICE_0 TI
      WHERE TI.CATENTRY_ID &gt;= %ParallelLowerRange%
        AND TI.CATENTRY_ID &lt;= %ParallelUpperRange%
      ORDER BY TI.CATENTRY_ID
    </_config:SQL>
    
    The following sample snippet shows how to define ParallelLowerRange and ParallelUpperRange hardcoded ranges:
    
    <_config:property name="ParallelThreads" value="2" />
    <_config:property name="ParallelLowerRange" value="1000" />
    <_config:property name="ParallelUpperRange" value="2000" />
    <_config:property name="ParallelNextRangeSQL" value="SELECT MIN(CE.CATENTRY_ID) FROM TI_CNTRPRICE_0 CE WHERE CE.CATENTRY_ID &gt; ?" />
    <_config:SQL>
      SELECT TI.CATENTRY_ID, TI.PRICE
      FROM TI_CNTRPRICE_0 TI
      WHERE TI.CATENTRY_ID &gt;= %ParallelLowerRange%
        AND TI.CATENTRY_ID &lt;= %ParallelUpperRange%
      ORDER BY TI.CATENTRY_ID
    </_config:SQL>
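
    How SolrIndexLoadQueryLoader divides a range internally is not described here. As a rough illustration only, the following standalone Java sketch shows one way that an inclusive ID range could be divided evenly among the number of workers given by ParallelThreads. The RangeSplitter class and its split method are hypothetical names and are not part of the Index Load API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an Index Load class: splits an inclusive
// ID range into contiguous, near-equal sub-ranges, one per thread.
final class RangeSplitter {

    // Each element is a two-item array: { lowerBound, upperBound }, both inclusive.
    static List<long[]> split(long lower, long upper, int threads) {
        if (threads < 1 || lower > upper) {
            throw new IllegalArgumentException("invalid range or thread count");
        }
        long total = upper - lower + 1;
        // Never create more sub-ranges than there are IDs.
        int parts = (int) Math.min(threads, total);
        long base = total / parts;
        long extra = total % parts;

        List<long[]> ranges = new ArrayList<>();
        long start = lower;
        for (int i = 0; i < parts; i++) {
            // The first 'extra' sub-ranges take one additional ID each.
            long size = base + (i < extra ? 1 : 0);
            ranges.add(new long[] { start, start + size - 1 });
            start += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Same values as the hardcoded sample: range 1000..2000, two threads.
        for (long[] r : split(1000, 2000, 2)) {
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```

    With the hardcoded range 1000 to 2000 and two threads, this sketch yields the sub-ranges 1000 to 1500 and 1501 to 2000. Conceptually, each pair of bounds plays the role of the %ParallelLowerRange% and %ParallelUpperRange% placeholders in the SQL statement. The actual loader might choose its boundaries differently, for example by consulting ParallelNextRangeSQL.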
    
  • Customizing the Business Object Mediator:

    You can customize the Business Object Mediator to change how the source input is transformed.

    Create a custom business object mediator class that extends SolrIndexLoadBusinessObjectMediator:
    
    protected void transform(Object dataObjects, boolean deleteFlag)
          throws DataLoadException {
       final String METHODNAME = "transform(Object, boolean)";
       if (LoggingHelper.isEntryExitTraceEnabled(LOGGER)) {
          LOGGER.entering(CLASSNAME, METHODNAME, new Object[] { dataObjects,
                deleteFlag });
       }

       // Custom transformation logic goes here: build the physical
       // document objects from the input business object.

       if (LoggingHelper.isEntryExitTraceEnabled(LOGGER)) {
          LOGGER.exiting(CLASSNAME, METHODNAME);
       }
    }
    

    The subclass must implement the abstract method transform(). This method transforms the input logical business object into a physical document object to be saved in the Solr index. After the transform() method finishes, the superclass passes the list of physical objects to the persistence layer, which persists them in the Solr index. The subclass is responsible for populating all data in the physical objects.

    For example, consider transforming the following input into multiple columns in the price index:
    
    catentry_id         price
    10001               price_USD_10001:100.00||price_EUR_10001:78.29||price_JPY_10001:11274||
                        price_KRW_10001:95048||price_BRL_10001:232.15||price_CNY_10001:802.25
    
    The following snippet is the default implementation to perform the transform:
    
    public void transform(Map<String, Object> document)
             throws SolrIndexLoadException {
          final String METHODNAME = "transform(Map<String, Object>)";
          if (LoggingHelper.isEntryExitTraceEnabled(LOGGER)) {
             LOGGER.entering(CLASSNAME, METHODNAME, new Object[] { document });
          }
    
          if (document != null && !document.isEmpty()) {
             Object fieldValue = document.get("price");
             StringTokenizer st = new StringTokenizer(fieldValue.toString(), "||");
             while(st.hasMoreTokens()){
                String priceElement = (String)st.nextElement();
                int i = priceElement.lastIndexOf(":");
                if (i < 0) {
                   LOGGER.logp(Level.WARNING, CLASSNAME, METHODNAME,
                         "ignoring invalid data format: " + priceElement + "("
                               + String.valueOf(document.get("catentry_id")) + ")");
                   continue;
                }
                String priceFieldName = priceElement.substring(0, i);
                //String currency = value.substring(6,i);
                String price = priceElement.substring(i + 1);
                Float newprice = Float.valueOf(price);
                document.put(priceFieldName, newprice);
             }
             document.remove("price");
          } else {
             if (LoggingHelper.isTraceEnabled(LOGGER)) {
                LOGGER.logp(Level.WARNING, CLASSNAME, METHODNAME,
                      "nothing to process");
             }
          }
    
          if (LoggingHelper.isEntryExitTraceEnabled(LOGGER)) {
             LOGGER.exiting(CLASSNAME, METHODNAME);
          }
       }
    
    The following values are formed as a result:
    
    <float name="price_USD_10001">100.00</float>
    <float name="price_EUR_10001">78.29</float>
    <float name="price_JPY_10001">11274</float>
    <float name="price_KRW_10001">95048</float>
    <float name="price_BRL_10001">232.15</float>
    <float name="price_CNY_10001">802.25</float>
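
    The mediator snippet above depends on Commerce classes and cannot be run on its own. To experiment with the packed price format outside the server, the following standalone Java sketch reproduces only the parsing step. The PriceFieldSplitter class is a hypothetical name used for illustration; it is not the product implementation, and unlike the snippet above it skips values that are not valid numbers.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical helper, not a Commerce class: splits a packed value such as
// "price_USD_10001:100.00||price_EUR_10001:78.29||" into one field per price.
final class PriceFieldSplitter {

    static Map<String, Float> split(String packed) {
        Map<String, Float> fields = new LinkedHashMap<>();
        if (packed == null) {
            return fields;
        }
        // StringTokenizer treats every character of the delimiter string as a
        // separate delimiter and never returns empty tokens, so "|" behaves
        // the same as "||" for this format.
        StringTokenizer st = new StringTokenizer(packed, "|");
        while (st.hasMoreTokens()) {
            String element = st.nextToken().trim();
            int i = element.lastIndexOf(':');
            if (i <= 0) {
                continue; // no field name or no separator: ignore the element
            }
            try {
                fields.put(element.substring(0, i),
                        Float.valueOf(element.substring(i + 1)));
            } catch (NumberFormatException e) {
                // Not a number: ignore the element instead of failing the load.
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String packed = "price_USD_10001:100.00||price_EUR_10001:78.29||price_JPY_10001:11274||";
        split(packed).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

    A LinkedHashMap is used so that the fields keep the order in which they appear in the input, which makes the output easier to compare with the source row.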
    
  • Customizing the SolrIndexLoadQueryReader to transform multiple data entries from a database table into a single index row:

    The SolrIndexLoadQueryMultiplexReader can be used to transform multiple data entries from a database table into a single index row that contains multiple dynamic index fields.

    Define the KeyFieldName property by using one primary key field. The database column that maps to this primary key index field is used as the identifier for the index entry.
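
    The general idea of folding several table rows into one index entry can be sketched in plain Java. This is an illustration of the concept only, not of how SolrIndexLoadQueryMultiplexReader is implemented. The RowMultiplexer class, the currency column, and the price_ field prefix are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Commerce class: groups rows that share a key
// value into a single document with one dynamic field per row.
final class RowMultiplexer {

    static Map<Object, Map<String, Object>> multiplex(List<Map<String, Object>> rows,
            String keyField, String nameField, String valueField, String prefix) {
        Map<Object, Map<String, Object>> documents = new LinkedHashMap<>();
        for (Map<String, Object> row : rows) {
            Object key = row.get(keyField);
            if (key == null) {
                continue; // a row without a key cannot identify an index entry
            }
            // Create the document the first time a key value is seen.
            Map<String, Object> doc = documents.computeIfAbsent(key, k -> {
                Map<String, Object> d = new LinkedHashMap<>();
                d.put(keyField, k);
                return d;
            });
            // Each row contributes one dynamic field, for example price_USD.
            doc.put(prefix + row.get(nameField), row.get(valueField));
        }
        return documents;
    }

    // Builds one sample row; the column names are assumptions for this sketch.
    static Map<String, Object> row(long catentryId, String currency, float price) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("catentry_id", catentryId);
        r.put("currency", currency);
        r.put("price", price);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row(10001L, "USD", 100.00f));
        rows.add(row(10001L, "EUR", 78.29f));
        rows.add(row(10002L, "USD", 5.00f));
        multiplex(rows, "catentry_id", "currency", "price", "price_")
                .values().forEach(System.out::println);
    }
}
```

    In this sketch, the keyField argument plays the role that the text assigns to KeyFieldName: rows with the same key value end up in the same index entry.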