Using the Solr atomic update feature with Search

Atomic update, also known as partial update, enables you to make index updates on specified stored fields in an existing document. This approach is especially useful when a core has many fields and only a small number of them have been changed between index builds.

Solr supports several atomic update modifiers:
set
Sets or replaces a particular value, or removes the value if null is specified as the new value.
add
Adds an additional value to a list.
remove
Removes a value (or a list of values) from a list.
removeregex
Removes from a list all values that match the given Java regular expression.
inc
Increments a numeric value by a specific amount (use a negative value to decrement).

All original source fields must be stored for field modifiers to work correctly. This is the default in Solr. IndexLoad supports only the set modifier. By default, IndexLoad fetches the data from a CSV file or a database and uses that data to replace the value of the specified stored Solr field.
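
As a rough illustration of how these modifiers appear in an update request, the following Python sketch builds one atomic-update document in the shape that Solr's JSON update handler accepts. The helper name atomic_update, the use of catentry_id as the unique key, and the tags and shortDescription fields are assumptions made for illustration. IndexLoad itself issues only set operations.

```python
import json

ALLOWED_MODIFIERS = {"set", "add", "remove", "removeregex", "inc"}

def atomic_update(doc_id, **field_ops):
    """Build one atomic-update document.

    Each keyword argument maps a field name to a (modifier, value) pair.
    """
    doc = {"catentry_id": doc_id}  # assumed unique key field
    for field, (modifier, value) in field_ops.items():
        if modifier not in ALLOWED_MODIFIERS:
            raise ValueError(f"unsupported modifier: {modifier}")
        doc[field] = {modifier: value}
    return doc

update = atomic_update(
    "10044",
    inv_strlocqty_1=("set", 400),    # replace the stored value
    inv_strlocqty_3=("inc", -25),    # negative amount decrements
    tags=("add", ["clearance"]),     # hypothetical multivalued field
    shortDescription=("set", None),  # null removes the value
)

# The JSON update handler accepts an array of documents.
payload = json.dumps([update], indent=2)
print(payload)
```

Fields that are not named in the update document are left untouched, which is why every original source field has to be stored.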

Example

Assume there are three inventory records for the same product, one for each of three stores, as follows:
catentry_id        "10044"
inv_strlocqty_1        100
inv_strlocqty_2        200
inv_strlocqty_3        300
indexedTime        "2018-11-28T14:51:58.042Z"
An inventory update occurs that changes the available inventory in Store 1 to 400 and the available inventory in Store 2 to 500.
catentry_id store_id availquantity
10044 1 400
10044 2 500
To capture this change, run IndexLoad, loading data using the following source CSV file. You need only provide the final updated quantity for the changed stores in this CSV file.
catentry_id,inv_strlocqty_1,inv_strlocqty_2
10044,400,500
After the index load, the document in Solr looks like this:

catentry_id        "10044"
inv_strlocqty_1        400
inv_strlocqty_2        500
inv_strlocqty_3        300
indexedTime        "2018-11-28T15:51:38.033Z"
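
The effect of this example can be sketched in Python. The code below is an in-memory emulation under stated assumptions, not IndexLoad's implementation: the function names to_set_updates and apply_set are hypothetical, catentry_id is assumed to be the key column, and the indexedTime field is left out.

```python
import csv
import io

# Stored document before the update (values from the example above).
stored = {
    "catentry_id": "10044",
    "inv_strlocqty_1": 100,
    "inv_strlocqty_2": 200,
    "inv_strlocqty_3": 300,
}

# Delta CSV: the header names the key column and only the changed fields.
delta_csv = "catentry_id,inv_strlocqty_1,inv_strlocqty_2\n10044,400,500\n"

def to_set_updates(text, key="catentry_id"):
    """Turn each CSV row into an atomic-update document that uses only 'set'."""
    updates = []
    for row in csv.DictReader(io.StringIO(text)):
        doc = {key: row[key]}
        for field, value in row.items():
            if field != key:
                doc[field] = {"set": int(value)}
        updates.append(doc)
    return updates

def apply_set(document, update, key="catentry_id"):
    """Emulate 'set': named fields are replaced, all other fields are kept."""
    merged = dict(document)
    for field, op in update.items():
        if field != key:
            merged[field] = op["set"]
    return merged

updates = to_set_updates(delta_csv)
result = apply_set(stored, updates[0])
print(result)
```

The quantity for Store 3 is not in the CSV file, so it keeps its stored value of 300.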

Procedure to use atomic update with a CSV file

  1. Create the environment configuration file workspace_dir\workspace\search-config-ext\src\index\indexload\wc-indexload-profileName-csv.xml, where profileName is the URL parameter that you use when you call IndexLoad in a web browser. In the following scenarios, price-delta is used as the profileName for the CSV scenario, and inventory-delta is used as the profileName for the SQL scenario.
    The wc-indexload-profileName-csv.xml file contains environment control information and global properties that are required by IndexLoad. For example, it includes the data mapping between each CSV field and the corresponding Solr field. (You can leave the data mapping empty when the column names in the CSV file match the Solr field names.) This file also specifies the DataReader and the mediator. To load from a CSV file, specify com.ibm.commerce.search.indexload.reader.SearchIndexLoadCSVReader as the DataReader, and com.ibm.commerce.search.indexload.mediator.SearchIndexLoadCSVMediator as the BusinessObjectMediator. The wc-indexload-profileName-csv.xml file does not typically require customization, so you can use the following sample file as-is.
    <_config:DataLoader className="com.ibm.commerce.search.indexload.loader.SearchIndexLoadCSVLoader">
      <_config:property name="FirstLineIsHeader" value="true" />
      <_config:property name="Charset" value="UTF-8" />
      <_config:property name="TokenDelimiter" value="," />
      <_config:DataReader className="com.ibm.commerce.search.indexload.reader.SearchIndexLoadCSVReader" />
      <_config:BusinessObjectBuilder>
        <_config:DataMapping>
        </_config:DataMapping>
        <_config:BusinessObjectMediator className="com.ibm.commerce.foundation.internal.server.services.indexload.mediator.SolrIndexLoadBusinessObjectMediator" />
        <_config:BusinessObjectMediator className="com.ibm.commerce.search.indexload.mediator.SearchIndexLoadCSVMediator" />
      </_config:BusinessObjectBuilder>
    </_config:DataLoader>
  2. Create the profile configuration file wc-indexload-profileName.xml.

    The wc-indexload-profileName.xml file contains configurable performance attributes and one or more load item definitions. It also contains the CSV file location and the target core name. The profile name that you define in the configuration file name is then used as a URL parameter when you call IndexLoad in a web browser.

    The load item configurations are listed under the load order section of this file. Every LoadItem definition specifies a particular load item configuration, such as coreName or location, and multiple load items are run in parallel. Every load item configuration section must specify the environment configuration file wc-indexload-profileName-csv.xml. The profile configuration file also contains the DataWriter configuration; keep the original com.ibm.commerce.search.indexload.writer.SearchIndexLoadBatchService as the writer.

    The CSV file needs to contain only the changed field values. IndexLoad uses the Solr atomic update API to update the specified stored fields.

    Example: wc-indexload-price-delta.xml
    <_config:LoadItem name="ExternalPrice-1" fileName="wc-indexload-externalprice-csv.xml">
            <_config:property name="coreName" value="MC_10001_CatalogEntry_Price1_generic" />
            <_config:property name="groupName" value="1" />
            <_config:DataSourceLocation location="resources/search/index/indexload/contract-price-example1.csv" />
    </_config:LoadItem>
  3. Run IndexLoad in POST mode with the profileName defined in step 2. For example, if the profile configuration file is named wc-indexload-price-delta.xml, run IndexLoad with the following URL:
    https://searchMaster:3738/search/admin/resources/indexload/profile/price-delta/start?catalogId=#MASTER_CATALOG_ID
  4. After IndexLoad has run successfully, run WCB to build the package and deploy the package into the Search Docker container. For more information, see Packaging customized code for deployment.
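
The POST call in step 3 could be prepared from a script along the following lines. This is a minimal sketch, assuming that 10001 stands in for #MASTER_CATALOG_ID and that the helper name build_indexload_request is hypothetical. It only builds the request; authentication and TLS trust settings depend on your environment and are not shown.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_indexload_request(host, port, profile_name, master_catalog_id):
    """Build (but do not send) the POST request that starts an IndexLoad profile."""
    query = urlencode({"catalogId": master_catalog_id})
    url = (
        f"https://{host}:{port}/search/admin/resources/indexload/"
        f"profile/{profile_name}/start?{query}"
    )
    return Request(url, method="POST")

# 10001 is an assumed master catalog ID used only for illustration.
request = build_indexload_request("searchMaster", 3738, "price-delta", 10001)
print(request.get_method(), request.full_url)
```

To send the request, pass it to urllib.request.urlopen after adding whatever credentials your Search server requires.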

Procedure to use atomic update via SQL

  1. Create the environment configuration file workspace_dir\workspace\search-config-ext\src\index\indexload\wc-indexload-profileName-sql.xml.

    This SQL version of the environment configuration file specifies two things: the parallel indexing configuration, which is used to split the dataset evenly across multiple threads when it is run with the SolrIndexLoadQueryLoader, and the SQL query that captures the data from the specified data source.

    This configuration file also specifies the data reader. Two DataReader classes are available:
    com.ibm.commerce.search.indexload.reader.SearchIndexLoadQueryReader
    Use this reader to read unique records from the database and save them into the index.
    com.ibm.commerce.search.indexload.reader.SearchIndexLoadQueryMultiplexReader
    Use this reader to transform multiple data entries from a database table into a single index row with numerous dynamic index fields.
    Following is a sample DataReader entry that gets the inventory that was updated since a specific time. Because there are multiple records for any unique catentryId, the example uses com.ibm.commerce.search.indexload.reader.SearchIndexLoadQueryMultiplexReader to accumulate multiple rows.
    <_config:DataReader className="com.ibm.commerce.search.indexload.reader.SearchIndexLoadQueryMultiplexReader">
      <_config:DynamicFields>
        <_config:DynamicField dynamicFieldName="inv_strlocqty_%storeId%" dynamicFieldValue="%quantity%" indexingMode="replace" />
      </_config:DynamicFields>
      <_config:property name="KeyFieldName" value="catentry_id" />
      <_config:property name="ExcludeFieldNames" value="storeId,quantity" />
      <_config:property name="minDelta" value="5" />
      <_config:Query>
        <_config:SQL>
          SELECT invavl.catentry_id, invavl.STORE_ID, INVAVL.AVAILQUANTITY
          FROM INVAVL, CATGPENREL
          WHERE CATGPENREL.CATALOG_ID = 10001
            AND INVAVL.CATENTRY_ID = CATGPENREL.CATENTRY_ID
            AND INVAVL.QUANTITYMEASURE = 'C62'
            AND INVAVL.LASTUPDATE BETWEEN '2018-11-25 16:45:24.000' AND current timestamp
          ORDER BY INVAVL.CATENTRY_ID
          WITH UR
        </_config:SQL>
        <_config:ColumnMapping columnName="CATENTRY_ID" indexFieldName="catentry_id" />
        <_config:ColumnMapping columnName="STORE_ID" indexFieldName="storeId" />
        <_config:ColumnMapping columnName="AVAILQUANTITY" indexFieldName="quantity" />
      </_config:Query>
    </_config:DataReader>
  2. Create the profile configuration file workspace_dir\workspace\search-config-ext\src\index\indexload\wc-indexload-profileName.xml.
    As with the CSV file approach, specify the SQL configuration file within the load item section:
    <_config:LoadItem name="Inventory-Delta" fileName="wc-indexload-dom-delta-inventory-sql.xml">
            <_config:property name="coreName" value="MC_10001_CatalogEntry_Inventory_generic" />
            <_config:property name="groupName" value="I" />
    </_config:LoadItem>
  3. Run IndexLoad with the defined profileName. For example, if the profile configuration file from step 2 is named wc-indexload-inventory-delta.xml, run IndexLoad with the following URL:
    https://searchMaster:3738/search/admin/resources/indexload/profile/inventory-delta/start?catalogId=#MASTER_CATALOG_ID
  4. After IndexLoad has run successfully, run WCB to build the package and deploy the package into the Search Docker container. For more information, see Packaging customized code for deployment.
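
The row accumulation that the DataReader sample in step 1 relies on can be pictured with the following Python sketch. It is an approximation under stated assumptions, not the reader's implementation: the function name multiplex is hypothetical, the rows are assumed to already use the index field names from the ColumnMapping entries, and the minDelta property is not modeled.

```python
def multiplex(rows, key_field="catentry_id",
              name_pattern="inv_strlocqty_%storeId%",
              value_pattern="%quantity%",
              exclude=("storeId", "quantity")):
    """Collapse rows that share a key into one document with dynamic fields."""
    docs = {}
    for row in rows:
        doc = docs.setdefault(row[key_field], {key_field: row[key_field]})
        name, value = name_pattern, value_pattern
        for column, cell in row.items():
            token = f"%{column}%"
            name = name.replace(token, str(cell))
            value = value.replace(token, str(cell))
            # Columns that are neither the key nor excluded are copied as-is.
            if column != key_field and column not in exclude:
                doc[column] = cell
        # A later row for the same dynamic field overwrites the earlier value.
        doc[name] = value
    return list(docs.values())

# Two database rows for the same catalog entry, one per store.
rows = [
    {"catentry_id": "10044", "storeId": 1, "quantity": 400},
    {"catentry_id": "10044", "storeId": 2, "quantity": 500},
]
docs = multiplex(rows)
print(docs)
```

The two rows become a single document keyed by catentry_id, with one inv_strlocqty_ field per store. Substituted values are kept as strings in this sketch.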