Introduced in Feature Pack 2

Preprocessing the WebSphere Commerce search index data

You must preprocess the search index data to prepare your WebSphere Commerce data for indexing.

Before you begin

  • WebSphere Commerce Developer: Ensure that the test server is stopped.
  • Ensure that your administrative server is started.
    • If WebSphere Commerce is managed by WebSphere Application Server Deployment Manager (dmgr), start the deployment manager and all node agents. Your cluster can also be started.
    • If WebSphere Commerce is not managed by WebSphere Application Server Deployment Manager (dmgr), start the WebSphere Application Server server1.
  • Ensure your server has one of the following feature packs installed to support WebSphere Commerce search:
    • Feature Pack 2: Use this procedure to preprocess a catalog entry index for your master catalog.
    • Introduced in Feature Pack 3: Use this procedure to preprocess both a catalog entry index and a category index for your master catalog.
  • Ensure that you have completed the following task:
  • Feature Pack 8: If you use the TRUNCATE statement when running the utility:
    1. Download the interim fix for APAR JR52982, or apply the latest cumulative interim fix for Feature Pack 8, JR53438.fep.
    2. Ensure that your database is either DB2 9.7 or later, or Oracle.

About this task

The preprocess utility extracts and flattens WebSphere Commerce data from the base schema and writes it to a set of temporary tables inside the WebSphere Commerce database. The index building utility then uses the data in these temporary tables to populate the Solr indexes by using the Solr Data Import Handler (DIH).

Feature Pack 8: The preprocess utility can use a TRUNCATE statement to avoid dropping tables during preprocessing. The TRUNCATE statement that is used by default is:
  • DB2: TRUNCATE TABLE #TABLE_NAME# IMMEDIATE
  • Oracle: TRUNCATE TABLE #TABLE_NAME#
If you have a business need to change the statement, complete step 1 in this task.

Procedure

  1. Optional (Feature Pack 8): Change the default TRUNCATE statement:
    1. Create a properties file that contains the new statement. This properties file is then passed in to the di-preprocess utility.
      The properties file must contain content that resembles the following snippet:
      • DB2: truncateTableSQL=TRUNCATE TABLE #TABLE_NAME# IMMEDIATE
      • Oracle: truncateTableSQL=TRUNCATE TABLE #TABLE_NAME#
    2. If you are using the UpdateSearchIndex scheduler job and store preview, update the catalog component configuration file with the following values:
      DB2
      
      <_config:property name="DropTempTable" value="false" />
      <_config:property name="TruncateTableSQL" value="TRUNCATE TABLE #TABLE_NAME# IMMEDIATE" />
      
      Oracle
      
      <_config:property name="DropTempTable" value="false" />
      <_config:property name="TruncateTableSQL" value="TRUNCATE TABLE #TABLE_NAME#" />
      
      For more information, see Changing properties in the component configuration file (wc-component.xml) (WC EAR).
    3. If you are using parallel preprocessing, define the following global property in the parallel processing property file:
      
      Global.preprocessing-truncate-table-sql=TRUNCATE TABLE #TABLE_NAME#
      
      For more information, see Sharding input properties file.
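The properties file from step 1 can be generated rather than typed by hand. The following Python sketch is one way to do that. The key truncateTableSQL and the two default statements are taken from this task; the function names, the check that the #TABLE_NAME# placeholder is kept, and the ISO-8859-1 file encoding are assumptions made for illustration.

```python
from pathlib import Path

# Default TRUNCATE statements quoted from this task, keyed by database type.
DEFAULT_TRUNCATE_SQL = {
    "DB2": "TRUNCATE TABLE #TABLE_NAME# IMMEDIATE",
    "Oracle": "TRUNCATE TABLE #TABLE_NAME#",
}


def render_truncate_properties(statement: str) -> str:
    """Return the one-line properties content for a TRUNCATE statement.

    Assumption: a custom statement must keep the #TABLE_NAME# placeholder,
    because both default statements in this task contain it.
    """
    if "#TABLE_NAME#" not in statement:
        raise ValueError("statement must keep the #TABLE_NAME# placeholder")
    return f"truncateTableSQL={statement}\n"


def write_truncate_properties(path: Path, statement: str) -> Path:
    """Write the properties file (hypothetical helper) and return its path."""
    path.write_text(render_truncate_properties(statement), encoding="iso-8859-1")
    return path


if __name__ == "__main__":
    # Print the DB2 variant; call write_truncate_properties to create the file.
    print(render_truncate_properties(DEFAULT_TRUNCATE_SQL["DB2"]), end="")
```

The path of the generated file is what you would pass to the utility with the propFile parameter.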
  2. Complete one of the following tasks:
    • Solaris, Linux, AIX: Log on as a WebSphere Commerce non-root user.
    • IBM i: Log on with a user profile that has *SECOFR authority.
    • Windows: Log on with a user ID that is a member of the Windows Administration group.
  3. Go to the following directory:
    • IBM i, Solaris, Linux, AIX: WC_installdir/bin
    • WebSphere Commerce Developer: WCDE_installdir\bin
  4. Run the preprocessing utility:
    • Windows: di-preprocess.bat full-path -instance instance_name -dbuser dbuser -dbuserpwd dbuserpwd [-fullbuild true | false] [-localename localename] [-onelevel true | false] [-multithread true | false] [-workspace workspaceId] [-dbURL dbURL] [-skipDeltaNoEntry skipDeltaNoEntry] [-passwordFile passwordFile] [-nonLangTables nonLangTables] [-langTables langTables] [-publishedOnly true | false] [-deepUnpublish true | false] [-deepSequence true | false] [-dropTempTable true | false] [-propFile propFile]
    • IBM i, Solaris, Linux, AIX: di-preprocess.sh full-path -instance instance_name -dbuser dbuser -dbuserpwd dbuserpwd [-fullbuild true | false] [-localename localename] [-onelevel true | false] [-multithread true | false] [-workspace workspaceId] [-dbURL dbURL] [-skipDeltaNoEntry skipDeltaNoEntry] [-passwordFile passwordFile] [-nonLangTables nonLangTables] [-langTables langTables] [-publishedOnly true | false] [-deepUnpublish true | false] [-deepSequence true | false] [-dropTempTable true | false] [-propFile propFile]
    • WebSphere Commerce Developer: di-preprocess.bat full-path [-fullbuild true | false] [-localename localename] [-onelevel true | false] [-multithread true | false] [-dbURL dbURL] [-skipDeltaNoEntry skipDeltaNoEntry] [-passwordFile passwordFile] [-nonLangTables nonLangTables] [-langTables langTables] [-publishedOnly true | false] [-deepUnpublish true | false] [-deepSequence true | false]
    Where:
    full-path
    Required: The full directory location of the preprocessing configuration files, for example: CommerceServer70/instances/instance_name/search/pre-processConfig/MC_10001/DB2.
    The names of these files start with wc-dataimport-preprocess, for example, wc-dataimport-preprocess-fullbuild.xml. The search index setup utility installs this set of files when you deploy WebSphere Commerce search.
    Tip: To preprocess both the catalog entry index and the category index at the same time, specify the full path for the catalog entry index only. By default, the utility then processes the configuration files for both indexes, because the directory that contains the configuration files for the category index (called CatalogGroup) is located one level below the directory for the catalog entry index (called CatalogEntry).
    For the catalog entry index: by default, the configuration files install at the following path:
    • Solaris, Linux, AIX, Windows: WC_installdir/instances/instance_name/search/pre-processConfig/MC_masterCatalogId/databaseType
    • IBM i: WC_instance_root/instances/instance_name/search/pre-processConfig/MC_masterCatalogId/databaseType
    • WebSphere Commerce Developer: WCDE_installdir/search/pre-processConfig/MC_masterCatalogId/databaseType
    For the category index (introduced in Feature Pack 3): by default, the configuration files install in a directory called CatalogGroup, one level below the catalog entry index configuration files, for example:
    • Solaris, Linux, AIX, Windows: WC_installdir/instances/instance_name/search/pre-processConfig/MC_masterCatalogId/databaseType/CatalogGroup
    • IBM i: WC_instance_root/instances/instance_name/search/pre-processConfig/MC_masterCatalogId/databaseType/CatalogGroup
    • WebSphere Commerce Developer: WCDE_installdir/search/pre-processConfig/MC_masterCatalogId/databaseType/CatalogGroup
    For the inventory index (Feature Pack 6 or later): by default, the configuration files install in a directory called Inventory, inside the SubTypes directory below the catalog entry index configuration files, for example:
    • Solaris, Linux, AIX, Windows: WC_installdir/instances/instance_name/search/pre-processConfig/MC_masterCatalogId/databaseType/SubTypes/Inventory
    • IBM i: WC_instance_root/instances/instance_name/search/pre-processConfig/MC_masterCatalogId/databaseType/SubTypes/Inventory
    • WebSphere Commerce Developer: WCDE_installdir/search/pre-processConfig/MC_masterCatalogId/databaseType/SubTypes/Inventory
    instance
    The name of the WebSphere Commerce instance with which you are working (for example, demo).
    dbuser

    DB2: The name of the user who is connecting to the database.

    Oracle: The user ID that is connecting to the database. If you are using workspaces, the database user must be granted cross-schema privileges to create and drop tables. Otherwise, you cannot preview changes made in workspaces.

    dbuserpwd
    The password for the user who is connecting to the database.
    Feature Pack 8: Alternatively, you can use the passwordFile parameter to specify the encrypted password from a file.
    fullbuild
    Optional: A flag that indicates whether it is a full index build. The accepted values are either true or false. The default value is true.
    localename
    Optional: The locale that should be indexed. The accepted values are either:
    • All
    Or one of the following values:
    • en_US
    • fr_FR
    • de_DE
    • it_IT
    • es_ES
    • pt_BR
    • zh_CN
    • zh_TW
    • ko_KR
    • ja_JP
    • ru_RU
    • ro_RO
    • pl_PL
    The default value is All.
    onelevel (introduced in Feature Pack 3)
    Optional: A flag that you can use to save time when setting up preprocessing for the catalog entry index (CatalogEntry indextype) and the category index (CatalogGroup indextype) at the same time. If you set the onelevel flag to true, then for the full-path value, you only need to specify the path to the preprocessing configuration files for the catalog entry index. The utility automatically looks for the category index files in the CatalogGroup directory one level down.

    Example:

    • Instead of specifying both paths for your full-path value, as shown here:
      C:/Program Files/IBM/WebSphere/CommerceServer70/instances/demo/search/pre-processConfig/MC_10001/DB2,
      C:/Program Files/IBM/WebSphere/CommerceServer70/instances/demo/search/pre-processConfig/MC_10001/DB2/CatalogGroup
    • Specify only the first path, as shown here:
      C:/Program Files/IBM/WebSphere/CommerceServer70/instances/demo/search/pre-processConfig/MC_10001/DB2

    The default value of the onelevel flag is true.

    multithread (introduced in Feature Pack 3)
    Optional: Preprocesses data by using multiple threads.
    The number of threads used is based on the number of existing wc-dataimport-preprocess-XXXXX.xml files, excluding the wc-dataimport-preprocess-fullbuild.xml and wc-dataimport-preprocess-deltaupdate.xml files.
    The default value is false.
    workspace (Feature Pack 5 or later)
    The workspace index to preprocess. This value is case-sensitive. If you specify a workspace, that workspace index is preprocessed. If you do not specify a workspace, the base schema index is preprocessed, which is the default behavior.
    To get the workspace ID, either:
    • Open the workspace in the Workspace Management tool in the Management Center. The workspace code is the workspace ID; or
    • If the workspace has an active task group, run the following SQL query: select * from cmwsschema. The workspace ID is listed in the workspace column.
    dbURL (Feature Pack 5 or later, Oracle; see footnote 1)
    The database URL that the utility uses to connect to the database. If not provided, the utility constructs a database URL based on the default database value.
    skipDeltaNoEntry (Feature Pack 7 or later)
    Optional: During delta preprocessing (fullbuild set to false), the utility checks whether there are any delta updates to perform. The utility ends if there are no delta updates to perform. Otherwise, the preprocessing is performed as expected.
    If this parameter is set to false, and there are no delta updates to perform, the delta preprocessing updates all of the temporary tables to empty. This might save time, where the utility would otherwise check all the tables and process no records.
    The default value is false.

    Feature Pack 7: You must apply the interim fix for APAR JR50553 to use this parameter.

    publishedOnly (Feature Pack 7 or later)
    Optional: Allows only products from published categories to be displayed in the keyword search results when deep category unpublish is enabled.
    Note (Feature Pack 7): Interim fixes for Feature Pack 7 are required to use this parameter. Apply the latest cumulative interim fix for Feature Pack 7, JR52306.fep.
    The default value is false.
    deepUnpublish (Feature Pack 7 or later)
    Optional: Enables preprocessing for the deep category unpublish feature.
    Note (Feature Pack 7): Interim fixes for Feature Pack 7 are required to use this parameter. Apply the latest cumulative interim fix for Feature Pack 7, JR52306.fep.
    The default value is false.
    For more information, see Hiding categories and products using deep category unpublish.
    deepSequence (Feature Pack 7 or later)
    Optional: Enables preprocessing for the deep search sequencing feature.
    Note (Feature Pack 7): Interim fixes for Feature Pack 7 are required to use this parameter. Apply the latest cumulative interim fix for Feature Pack 7, JR52306.fep.
    The default value is false.
    For more information, see Hiding categories and products using deep category unpublish.
    passwordFile (Feature Pack 8)
    Optional: The full path to the password.properties file that contains the password for the user who is connecting to the database. For example, C:\password.properties.
    The password.properties file contains the following content:
    
    dbUserPassword=encrypted_pwd
    
    Where encrypted_pwd is the password that has been encrypted using the wcs_encrypt utility.
    nonLangTables (Feature Pack 8)
    Optional: Preprocesses only language-insensitive tables.
    Language-insensitive tables use the following naming convention: TI_string_number. For example, TI_CATENTRY_0.
    The default value is false.
    langTables (Feature Pack 8)
    Optional: Preprocesses only language-sensitive tables.
    Language-sensitive tables use the following naming convention: TI_string_number_number. For example, TI_ATTR_0_1 for United States English.
    The default value is false.
    Usage notes for the language table parameters (Feature Pack 8):
    • When both the nonLangTables and langTables values are set to false, all tables are preprocessed.
    • If you want to set both values to true, first run the utility with nonLangTables, and then run the utility with langTables.
    • For sites with many supported languages enabled, you can preprocess only the language-insensitive tables first, and then preprocess the language-sensitive tables in parallel.
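The two naming conventions above can be told apart mechanically. The following Python sketch classifies a temporary table name with a regular expression. It assumes that the string part starts with a letter and that the trailing parts are plain integers; the function name and the returned labels are hypothetical and are not part of the utility.

```python
import re
from typing import Optional

# TI_string_number        -> language-insensitive (for example, TI_CATENTRY_0)
# TI_string_number_number -> language-sensitive   (for example, TI_ATTR_0_1)
_TABLE_NAME = re.compile(r"TI_(?P<name>[A-Za-z]\w*?)_(?P<first>\d+)(?:_(?P<second>\d+))?")


def classify_temp_table(table_name: str) -> Optional[str]:
    """Return the table kind, or None if the name matches neither convention."""
    match = _TABLE_NAME.fullmatch(table_name)
    if match is None:
        return None
    if match.group("second") is not None:
        return "language-sensitive"
    return "language-insensitive"


if __name__ == "__main__":
    for name in ("TI_CATENTRY_0", "TI_ATTR_0_1", "CATENTRY"):
        print(name, "->", classify_temp_table(name))
```

A check like this can help confirm which tables a run with nonLangTables or langTables is expected to touch.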
    dropTempTable (Feature Pack 8)
    Indicates whether to drop tables when preprocessing the search index.
    Passing in a value of false uses a TRUNCATE statement on the tables.
    The default value is true, which uses a DROP statement on the tables.
    Note: This parameter supports only DB2 9.7 or later, or Oracle databases.
    propFile (Feature Pack 8)
    The full path to the properties file to pass in to the utility.
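When the utility is called from automation, it can help to assemble the argument list in one place. The following Python sketch builds the argument list for the server (non-Developer) syntax shown in step 4. The flag names are taken from that syntax; the function name, the requirement that at least one of dbuserpwd or passwordFile is supplied, and all sample values are assumptions. The sketch only builds the list and does not run the utility.

```python
from typing import List, Optional

# Optional flags listed in the server syntax of step 4.
ALLOWED_OPTIONS = {
    "fullbuild", "localename", "onelevel", "multithread", "workspace",
    "dbURL", "skipDeltaNoEntry", "nonLangTables", "langTables",
    "publishedOnly", "deepUnpublish", "deepSequence", "dropTempTable",
    "propFile",
}


def _format(value) -> str:
    """Render Python booleans as the true/false strings the syntax shows."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_preprocess_command(script: str, full_path: str, instance: str,
                             dbuser: str, dbuserpwd: Optional[str] = None,
                             password_file: Optional[str] = None,
                             **options) -> List[str]:
    """Build the di-preprocess argument list (hypothetical helper)."""
    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    if dbuserpwd is None and password_file is None:
        # Assumption: the password comes from one of these two parameters.
        raise ValueError("supply dbuserpwd or password_file")
    command = [script, full_path, "-instance", instance, "-dbuser", dbuser]
    if dbuserpwd is not None:
        command += ["-dbuserpwd", dbuserpwd]
    if password_file is not None:
        command += ["-passwordFile", password_file]
    for name in sorted(options):
        command += [f"-{name}", _format(options[name])]
    return command


if __name__ == "__main__":
    # All values below are placeholders, not real paths or credentials.
    print(build_preprocess_command(
        "./di-preprocess.sh", "/path/to/pre-processConfig/MC_10001/DB2",
        "demo", "dbuser", dbuserpwd="dbuserpwd",
        fullbuild=False, localename="en_US"))
```

The returned list can be passed to subprocess.run from the bin directory named in step 3.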
  5. Ensure that the utility runs successfully.
    Verify that the output from the script contains no errors and that the last part of the output contains the following lines:
    "Program exiting with exit code: 0.
    Data import pre-processing completed successfully with no errors."
    
    Also, inspect the following file for errors:
    • WebSphere Commerce Developer: WCDE_installdir\logs\wc-dataimport-preprocess.log
    • Solaris, Linux, AIX, Windows: WC_installdir\logs\wc-dataimport-preprocess.log
    For more information about exit codes, see WebSphere Commerce search utility exit codes.
    To get more logging information, update the logging level from INFO to FINEST in the following file:
    • WebSphere Commerce Developer: WCDE_installdir\workspace\WC\xml\config\dataimport\logging.properties
    • Solaris, Linux, AIX, Windows: WC_installdir/instances/instance_name/xml/config/dataimport/logging.properties
    • IBM i: WC_instance_root/xml/config/dataimport/logging.properties
    # Default global logging level, INFO
    .level=FINEST
    
    Feature Pack 7 or later:
    # Default global logging level, INFO
    com.ibm.commerce.level=FINEST
    
    You can also increase the log file size and the number of log files. For example:
    
    # Limiting size of output file in bytes:
    java.util.logging.FileHandler.limit=50000000
    
    # Number of output files to cycle through
    java.util.logging.FileHandler.count=2 
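The success check in step 5 can be scripted. The following Python sketch looks for the two success lines quoted above near the end of the captured output. The two marker strings come from this task; the function name, the choice to search the last ten non-empty lines, and the substring match (which tolerates prefixes such as timestamps) are assumptions.

```python
# Success lines quoted in step 5 of this task.
SUCCESS_MARKERS = (
    "Program exiting with exit code: 0.",
    "Data import pre-processing completed successfully with no errors.",
)


def preprocessing_succeeded(output: str, tail_lines: int = 10) -> bool:
    """Return True if both success markers appear in the tail of the output."""
    non_empty = [line for line in output.splitlines() if line.strip()]
    tail = non_empty[-tail_lines:]
    return all(any(marker in line for line in tail) for marker in SUCCESS_MARKERS)


if __name__ == "__main__":
    sample = (
        "...\n"
        "Program exiting with exit code: 0.\n"
        "Data import pre-processing completed successfully with no errors.\n"
    )
    print(preprocessing_succeeded(sample))
```

This check complements, and does not replace, inspecting wc-dataimport-preprocess.log for errors.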
    

What to do next

After preprocessing the search index data, you must build the search index.
1. Feature Pack 5: You must apply the interim fix for APAR JR44514 to use this parameter.