Extending the Dataload indexer mediator for WebSphere Commerce search

The default Dataload indexer generic mediator enables indexing indexer-ready data directly into WebSphere Commerce search. A new mediator can be created to support new data types.

There are two requirements for the data file:
  1. The data file schema structure must match the index schema structure. That is, the external name of XML elements must match the internal name of the index field.
  2. All required index fields are provided in the data file.

If the external names differ from the internal index names, the mapping can be defined in a configuration file. If the data file does not contain all the required fields, the mediator must be extended to compute the missing fields.

A new mediator must be created for each data type. The new mediator extends the com.ibm.commerce.foundation.dataimport.dataload.mediator.AbstractSolrInputDocumentMediator class, and implements the transform() method. The logic to resolve any missing data from the data file can be added into the new mediator class.
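Before writing a custom mediator, it can help to confirm which required fields a record actually lacks. The following is a minimal sketch using only the Java standard library. The class name, method name, and field names are hypothetical and are not part of the Dataload framework.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Dataload framework.
class RequiredFieldCheck {

    // Returns the required index fields that are absent or blank in one record,
    // in the same order as the requiredFields list.
    static List<String> missingFields(Map<String, String> record, List<String> requiredFields) {
        List<String> missing = new ArrayList<>();
        for (String field : requiredFields) {
            String value = record.get(field);
            if (value == null || value.trim().isEmpty()) {
                missing.add(field);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Placeholder field names; substitute the required fields of your index schema.
        List<String> required = Arrays.asList("internal_field_A", "internal_field_B", "internal_field_C");

        Map<String, String> record = new HashMap<>();
        record.put("internal_field_A", "10001");
        record.put("internal_field_B", "");

        // Any field reported here would have to be computed by the custom mediator.
        System.out.println("Missing required fields: " + missingFields(record, required));
    }
}
```

Fields reported as missing are candidates for the computation logic in the transform() method of the new mediator.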

The Dataload framework is enhanced to support indexing data directly into the WebSphere Commerce search server. A SolrJ Java client is used to index a flat data structure from an external data source in XML or CSV format. The Dataload processing is done in multiple components: the reader, the mediator, and the writer:
Where:
  1. The reader parses the input file, and constructs a name-value-pair object. There are generic readers for CSV and XML data formats available by default.
  2. The business object builder builds a map object of the name-value-pairs read by the reader.
  3. A Solr document object is created in the mediator. The document is constructed and populated according to the index schema structure. A generic Solr mediator is provided by default that can be used to index indexer-ready data.
    Note: If the input data is not indexer-ready data, a custom mediator must process the data to make it ready for indexing.
  4. The constructed Solr document is inserted into the Solr server using the SolrJ Java client.
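The four steps above can be pictured with a toy model of the data flow. This sketch uses only the Java standard library: a plain map stands in for the Solr document and a list stands in for the Solr server. All class and method names are hypothetical, and none of them belong to the Dataload framework or SolrJ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the reader -> builder -> mediator -> writer flow.
// Every name here is hypothetical; this is not the Dataload or SolrJ API.
class PipelineSketch {

    // Reader: pair each header name with the value in the same column.
    // Naive split; quoted values that contain commas are not handled.
    static List<String[]> read(String headerLine, String dataLine) {
        String[] names = headerLine.split(",");
        String[] values = dataLine.split(",", -1);
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < names.length && i < values.length; i++) {
            pairs.add(new String[] { names[i].trim(), values[i].trim() });
        }
        return pairs;
    }

    // Builder: collect the name-value pairs into a map object.
    static Map<String, String> build(List<String[]> pairs) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String[] pair : pairs) {
            map.put(pair[0], pair[1]);
        }
        return map;
    }

    // Mediator: shape the map into a flat document (stand-in for a Solr document).
    static Map<String, Object> mediate(Map<String, String> data) {
        return new LinkedHashMap<String, Object>(data);
    }

    // Writer: stand-in for sending the document to the search server.
    static void write(List<Map<String, Object>> index, Map<String, Object> doc) {
        index.add(doc);
    }

    public static void main(String[] args) {
        List<Map<String, Object>> index = new ArrayList<>();
        List<String[]> pairs = read("internal_field_A,internal_field_B", "10001,715");
        write(index, mediate(build(pairs)));
        System.out.println(index);
    }
}
```

In the real framework each stage is a configurable class named in the loader file; the sketch only shows how one record moves from stage to stage.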

    The reader, builder, mediator and other configurations are specified in the loader file.

    Within the <_config:DataMapping> section of the loader file:
    • internal_field_A is the Solr index field name.
    • external_field_A is the field name in the CSV or XML file.
    For example, the XML loader file:
    
    <?xml version="1.0" encoding="UTF-8"?>
    <_config:DataloadBusinessObjectConfiguration
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.ibm.com/xmlns/prod/commerce/foundation/config ../xsd/wc-dataload-businessobject.xsd"
        xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config">
      <_config:DataLoader className="com.ibm.commerce.foundation.dataload.BusinessObjectLoader">

        <_config:DataReader className="com.ibm.commerce.foundation.dataload.datareader.XmlReader">
          <_config:XmlHandler className="com.ibm.commerce.foundation.dataload.xmlhandler.NVPXmlHandler" />
        </_config:DataReader>

        <_config:BusinessObjectBuilder className="com.ibm.commerce.foundation.dataload.businessobjectbuilder.MapObjectBuilder">
          <_config:DataMapping>
            <_config:mapping xpath="internal_field_A" value="external_field_A" />
            <_config:mapping xpath="internal_field_B" value="external_field_B" />
            <_config:mapping xpath="internal_compositeUniqueKey" value="" valueFrom="Fixed">
              <_config:ValueHandler className="com.ibm.commerce.foundation.dataload.config.OrderedConcatenateValueHandler">
                <_config:Parameter name="1" value="external_field_A" />
                <_config:Parameter name="2" value="_" valueFrom="Fixed" />
                <_config:Parameter name="3" value="external_field_B" />
              </_config:ValueHandler>
            </_config:mapping>
            <_config:mapping xpath="" value="delete" deleteValue="true" />
          </_config:DataMapping>
          <_config:BusinessObjectMediator className="com.mycompany.commerce.foundation.dataimport.dataload.mediator.myCompanySolrInputDocumentMediator">
            <!-- idFieldName value should match the index uniqueKey value -->
            <_config:property name="idFieldName" value="internal_compositeUniqueKey" />
          </_config:BusinessObjectMediator>
        </_config:BusinessObjectBuilder>
      </_config:DataLoader>

    </_config:DataloadBusinessObjectConfiguration>
    
    For example, the CSV loader file:
    
    <?xml version="1.0" encoding="UTF-8"?>
    <_config:DataloadBusinessObjectConfiguration
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.ibm.com/xmlns/prod/commerce/foundation/config ../xsd/wc-dataload-businessobject.xsd"
        xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config">
      <_config:DataLoader className="com.ibm.commerce.foundation.dataload.BusinessObjectLoader">

        <_config:DataReader className="com.ibm.commerce.foundation.dataload.datareader.CSVReader" firstLineIsHeader="true" />

        <_config:BusinessObjectBuilder className="com.ibm.commerce.foundation.dataload.businessobjectbuilder.MapObjectBuilder">
          <_config:DataMapping>
            <_config:mapping xpath="internal_field_A" value="external_field_A" />
            <_config:mapping xpath="internal_field_B" value="external_field_B" />
            <_config:mapping xpath="internal_compositeUniqueKey" value="" valueFrom="Fixed">
              <_config:ValueHandler className="com.ibm.commerce.foundation.dataload.config.OrderedConcatenateValueHandler">
                <_config:Parameter name="1" value="external_field_A" />
                <_config:Parameter name="2" value="_" valueFrom="Fixed" />
                <_config:Parameter name="3" value="external_field_B" />
              </_config:ValueHandler>
            </_config:mapping>
            <_config:mapping xpath="" value="delete" deleteValue="true" />
          </_config:DataMapping>
          <_config:BusinessObjectMediator className="com.mycompany.commerce.foundation.dataimport.dataload.mediator.myCompanySolrInputDocumentMediator">
            <!-- idFieldName value should match the index uniqueKey value -->
            <_config:property name="idFieldName" value="internal_compositeUniqueKey" />
          </_config:BusinessObjectMediator>
        </_config:BusinessObjectBuilder>
      </_config:DataLoader>

    </_config:DataloadBusinessObjectConfiguration>
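
The effect of the <_config:DataMapping> section in the loader files above can be approximated in plain Java: external names are renamed to internal index names, and the composite unique key is formed by joining external_field_A, a fixed "_" separator, and external_field_B in order. This is a minimal sketch under that reading of the configuration. The class and method names are hypothetical and do not reproduce the OrderedConcatenateValueHandler implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the mapping step; not the framework's implementation.
class MappingSketch {

    // Renames fields. The mapping is keyed by internal name (xpath) and holds
    // the external name (value), mirroring the <_config:mapping> attributes.
    static Map<String, String> applyMapping(Map<String, String> external,
                                            Map<String, String> internalToExternal) {
        Map<String, String> internal = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : internalToExternal.entrySet()) {
            String value = external.get(entry.getValue());
            if (value != null) {
                internal.put(entry.getKey(), value);
            }
        }
        return internal;
    }

    // Joins two field values around a fixed separator, in parameter order.
    // Assumes both fields are present in the record.
    static String compositeKey(Map<String, String> external,
                               String firstField, String separator, String secondField) {
        return external.get(firstField) + separator + external.get(secondField);
    }

    public static void main(String[] args) {
        Map<String, String> external = new LinkedHashMap<>();
        external.put("external_field_A", "10001");
        external.put("external_field_B", "715");

        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("internal_field_A", "external_field_A");
        mapping.put("internal_field_B", "external_field_B");

        Map<String, String> internal = applyMapping(external, mapping);
        internal.put("internal_compositeUniqueKey",
                compositeKey(external, "external_field_A", "_", "external_field_B"));
        System.out.println(internal);
    }
}
```

Because idFieldName must match the index uniqueKey, the name used for the composite key in the mapping and in the mediator property must be identical.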
    
To extend the Dataload indexer mediator:
  1. Create a myCompanySolrInputDocumentMediator class that extends the com.ibm.commerce.foundation.dataimport.dataload.mediator.AbstractSolrInputDocumentMediator class.
  2. Implement the transform method, which takes the dataObject object and the deleteFlag boolean as input parameters. The dataObject object is a Java representation of the name-value-pair (NVP) mapping that is passed from the builder. The deleteFlag boolean indicates whether the passed dataObject is to be deleted from the index.

Example

The following snippet shows pseudocode for this class:

public class myCompanySolrInputDocumentMediator extends
        AbstractSolrInputDocumentMediator {

    protected void transform(Object dataObject, boolean deleteFlag) throws DataLoadException {

        // The builder passes the name-value pairs as a map (assumed here,
        // because the loader file uses MapObjectBuilder)
        Map data = (Map) dataObject;
        ArrayList readProductAttribute1 = null;
        ArrayList readProductAttribute2 = null;

        // Read the data object and assign the values to local variables
        if (data != null && !data.isEmpty()) {
            readProductAttribute1 = (ArrayList) data.get(PRODUCT_ATTRIBUTE1_DATA_NAME);
            readProductAttribute2 = (ArrayList) data.get(PRODUCT_ATTRIBUTE2_DATA_NAME);
        }

        // Compute Product Attribute 3 from read attributes 1 and 2
        computedProductAttribute3 = ....

        // Create the SolrInputDocument object
        SolrInputDocument doc = new SolrInputDocument();

        // Read the unique ID field name of the Solr document from the loader configuration file
        String idFieldName = getSolrIdFieldName();

        // Add the product attribute field names and their values to the Solr document
        if (idFieldName != null) {
            doc.addField(PRODUCTATTRIBUTE1_SOLR_FIELD_NAME, readProductAttribute1);
            doc.addField(PRODUCTATTRIBUTE2_SOLR_FIELD_NAME, readProductAttribute2);
            doc.addField(PRODUCTATTRIBUTE3_SOLR_FIELD_NAME, computedProductAttribute3);
        }

        // Create the SolrDocumentDataObject object, and set the operation mode
        SolrDocumentDataObject solrDocDO = new SolrDocumentDataObject(idFieldName, doc);
        if (deleteFlag) {
            solrDocDO.setOpMode('D');
        } else {
            solrDocDO.setOpMode('U');
        }

        // Add the SolrDocumentDataObject to the list of SolrDocumentDataObject objects
        addSolrInputDocDataObjects(solrDocDO);
    }

}

What to do next

To work with an inventory index in WebSphere Commerce search, complete the following tutorial: Tutorial: Indexing external inventory data in WebSphere Commerce search.