WebSphere Commerce search performance tuning

Consider the following performance tuning hints and tips when you administer WebSphere Commerce search.

WebSphere Commerce search performance tuning falls under the following sections:

Indexing server
Search runtime server

Indexing server

Consider the following factors when tuning the indexing server:

Search caching for the indexing server

You should typically disable all Solr caches on the indexing server.
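
As a rough sketch of what disabling the caches can look like, the cache definitions in the query section of the indexing server's solrconfig.xml file can be commented out, so that Solr does not create them. The class names and sizes below are taken from a stock Solr example configuration and are assumptions; your shipped file might define the caches differently.

```xml
<query>
  <!-- Caches disabled on the indexing server: the cache elements are
       commented out so that Solr does not create them.
       Sizes shown are stock Solr example values, not WebSphere Commerce defaults.
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  -->
</query>
```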

When to perform full search index builds

The WebSphere Commerce search index is automatically built when certain business tasks are performed, as outlined in ../refs/rsdsearchindexhints.html. In several cases, common business tasks result in delta index builds that do not pose a significant risk to production system performance. However, performing several delta index builds without occasional full index builds might cause the search index to degrade gradually over time due to fragmentation. To avoid this issue, perform full search index builds when possible so that the search index continues to perform well over time.

When Lucene receives a delete request, it does not delete entries from the index, but instead marks them for deletion and adds updated records to the end of the index. As a result, the catalog spreads unevenly across different segment data files in the search index, which might increase search response times. If you have a dedicated indexing server, consider scheduling a full search index build that runs in the background approximately once per month, so that the deleted entries are flushed out and the data is optimized.

Tuning index buffer size and commit actions during dataimport (buildindex)

You can tune your solrconfig.xml file to allocate sufficient memory for index buffering and prevent commit actions when you are building your index. When the RAM buffer for index updates is full, Solr begins to perform commit actions that persist data to disk. When these commit actions occur, Solr has a global exclusive lock on your entire JVM. This lock prevents other threads from performing update operations, even when those threads are working on different records or files. This locking can increase the amount of time that is required to build your index. By increasing your RAM buffer size and disabling the commit trigger, you can reduce the chances of this locking. You can tune your Solr parameters for commit timing and buffer size in the solrconfig.xml file:

Allocate more memory for index buffering by changing the value for the ramBufferSizeMB parameter. 2048 MB is the maximum memory that you can allocate:
```
<ramBufferSizeMB>2048</ramBufferSizeMB>
```
Disable the document-based count buffer setting to reduce the occurrence of commit actions by commenting out the maxBufferedDocs parameter:
```

```
Disable the server side automatic commit trigger to also reduce occurrence of commit actions by commenting out the autoCommit trigger:
```

```
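
As a sketch of the two preceding steps, the commented-out settings might look like the following. The element names are standard Solr settings, but the numeric values are illustrative placeholders and not the values that are shipped in your solrconfig.xml file; leave whatever values your file already contains inside the comments.

```xml
<!-- Document-count flush trigger disabled; 1000 is a placeholder value.
<maxBufferedDocs>1000</maxBufferedDocs>
-->

<!-- Server-side automatic commit disabled; both limits are placeholder values.
<autoCommit>
  <maxDocs>10000</maxDocs>
  <maxTime>60000</maxTime>
</autoCommit>
-->
```

With both triggers commented out, flushing during a build is driven by the ramBufferSizeMB setting rather than by document counts or timers.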

Paging and database heap size configuration

Ensure that your memory and paging size are configured according to the size of your catalog data, particularly if your environment contains multiple indexes for different languages. For example, if you are having issues with accessing or building large amounts of index data:

Increase the default paging size for your operating system. For example, 3 GB. In cases where the operating system requires a higher paging size, adding more memory to the system also helps to resolve issues.
Increase the default database heap size to a larger value. For example, increase the DB2 heap size to 8192.
Increase the file descriptor limit to a higher value. For example: ulimit -n 8192.

Heap size configuration

Ensure that your WebSphere Commerce search heap size is configured according to the size of your catalog data, particularly if your environment contains multiple indexes for different languages. For example, if you are having issues with accessing large amounts of index data, increase the default heap size to a larger value such as 1280 MB.

Important:

Using large heap sizes in WebSphere Commerce search, for example, those more than 4 GB in size, requires a 64-bit installation of Apache Solr. For example, if you intend to increase the heap size to values greater than 1280, ensure that you install the 64-bit version of Apache Solr.
Do not exceed 28 GB of heap size per JVM, even when using a 64-bit environment. A 64-bit JVM uses a compressed references optimization that might be disabled if the heap size exceeds 28 GB, which can degrade overall throughput by up to 30%.

Shared pool size configuration

Ensure that the SHARED_POOL_SIZE is configured according to your environment. Increasing the shared pool size might improve the performance of the di-preprocess utility.

For example:


ALTER SYSTEM SET SHARED_POOL_SIZE='668M' SCOPE=BOTH

Multithreaded running of SQL query expressions

Consider using multithreading in DB2 to allow for increased performance when preprocessing the search index.

To do so, set the currentDegree data source property of com.ibm.db2.jcc.DB2BaseDataSource to ANY. For more information, see Common IBM Data Server Driver for JDBC and SQLJ properties for DB2 servers.

Search runtime server

Consider the following factors when tuning the search runtime server:

Caching considerations

Search caching for the runtime production subordinate servers

The starter configuration included in the CatalogEntry solrconfig.xml file is only designed for a small scale development environment, such as WebSphere Commerce Developer.

When you redeploy this index configuration to a larger scale system, such as a staging or production system, customize at a minimum the following cache parameters to tune your system performance:

queryResultWindowSize
queryResultMaxDocsCached
queryResultCache
filterCache (Required on the product index when an extension index such as Inventory exists)
documentCache (Required on the product index when an extension index such as Inventory exists)

The following example demonstrates how to define cache sizes for the Catalog Entry index and its corresponding memory heap space required in the JVM:

Sample catalog size:

Catalog size: 1.8 million entries
Total attributes: 2000
Total categories: 10000
Each product contains: 20 attributes
Average size of each Catalog Entry: 10 KB

Sample calculation:

queryResultWindowSize

Start with the size of each search result page in the storefront, such as 12 items per page, and multiply it by the number of pages to cache: the current page plus 2 prefetch pages.

This results in a queryResultWindowSize value of 36 (3 x 12).

queryResultMaxDocsCached

For optimal performance, set this value to be the same value as queryResultWindowSize.

queryResultCache

The size of each queryResultCache entry is 4 bytes per docId (int) reference x queryResultWindowSize (36), for a value of 144 bytes.

Allocate 10M cache slots for caching the first three pages of the main query.

This results in a total required queryResultCache size of 1.44 GB (144 B x 10000000).

filterCache

Assume an average search result size to be 5% of the entire catalog size of 1.8 M, or 90,000.

The size of each filterCache entry is 4 bytes per docId (int) reference x the average number of search hits (90,000), which equals 360 KB.

Assign 5000 entries for the filterCache.

The total required size for the filterCache results in a value of 1.8 GB (360 KB x 5000).

Note: The filterCache is required on the product index when an extension index such as Inventory exists, so that the query component functions correctly.

documentCache

Assume an average size of each Catalog Entry document to be 10 KB.

Assign approximately 5% of the entire catalog to be cached (90,000 entries, rounded up to 100000 entries) for the documentCache.

The total required size for the documentCache results in a value of 1.0 GB (10 KB x 100000).

Note:

The documentCache size should be set to at least the maximum anticipated size of a search result.
The documentCache is required on the product index when an extension index such as Inventory exists, so that the query component functions correctly.

As a result, the estimated JVM heap size required for each Catalog Entry core is approximately 4.24 GB (1.44 GB + 1.8 GB + 1.0 GB).
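
The sample calculation can be expressed in the query section of the CatalogEntry solrconfig.xml file roughly as follows. This is a sketch under stated assumptions: the size values come from the calculation above, while the cache class choices and the autowarmCount values are assumptions that follow a stock Solr example configuration and should be adjusted for your environment.

```xml
<query>
  <!-- 3 pages x 12 items per page -->
  <queryResultWindowSize>36</queryResultWindowSize>
  <!-- Same value as queryResultWindowSize -->
  <queryResultMaxDocsCached>36</queryResultMaxDocsCached>

  <!-- 10M slots x 144 bytes per entry: about 1.44 GB -->
  <queryResultCache class="solr.LRUCache" size="10000000" autowarmCount="0"/>

  <!-- 5000 entries x 360 KB per entry: about 1.8 GB -->
  <filterCache class="solr.FastLRUCache" size="5000" autowarmCount="0"/>

  <!-- 100000 entries x 10 KB per document: about 1.0 GB -->
  <documentCache class="solr.LRUCache" size="100000" autowarmCount="0"/>
</query>
```

The size attribute is a count of cache entries, not bytes, which is why the heap estimate multiplies each size by the approximate size of one entry.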

Managing cache sizes to conform to JVM memory

Ensure that you configure the fieldValueCache of the catalog entry index core in the solrconfig.xml file. This configuration can prevent out-of-memory issues by limiting its size to conform to JVM memory.

The number of cache entries depends on the number of facet fields, and the size of each entry depends on the catalog size. The size of each cache entry can be roughly computed as the number of catalog entries in the index core multiplied by 4 bytes. For example, with 1.8 million catalog entries, each cache entry is roughly 7.2 MB. The potential number of cache entries equals the number of potential facets.

For example, in the solrconfig.xml file:


<fieldValueCache class="solr.FastLRUCache"
                 size="300"
                 autowarmCount="128"
                 showItems="32" />

Note: The recommended solr.FastLRUCache caching implementation does not have a hard limit to its size. It is useful for caches that have high hit ratios, but may significantly exceed the size value that you set. If you are using solr.FastLRUCache, monitor your heap utilization during peak periods. If the cache is significantly exceeding its limit, consider changing the fieldValueCache class to solr.LRUCache in order to avoid performance issues or an out-of-memory condition.
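
If monitoring shows that the cache grows well past its configured size, the switch that the note describes might look like the following sketch. The size and autowarmCount values are carried over from the example above. The showItems attribute is left out here on the assumption that it applies only to solr.FastLRUCache.

```xml
<!-- solr.LRUCache evicts synchronously, so the cache stays at or below its size -->
<fieldValueCache class="solr.LRUCache"
                 size="300"
                 autowarmCount="128" />
```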

For more information, see Solr Caching.

Tuning the search relevancy data cache

Ensure that you tune the search relevancy data cache for your catalog size.

Relevance data is stored in the following cache instance:

services/cache/WCSearchNavigationDistributedMapCache

Each entry ranges between 8 - 10 KB, containing 10 - 20 relevancy fields. The cache instance also contains other types of cache entries. The database is used for every page hit when the cache instance is full, reducing performance.

Tuning the search data cache for faceted navigation

The WebSphere Commerce search server code uses the WebSphere Dynamic Cache facility to perform caching of database query results. Similar to the data cache used by the main WebSphere Commerce server, this caching code is referred to as the WebSphere Commerce search server data cache.

For more information, see WebSphere Commerce search data cache.

Facet performance considerations

Consider the following performance tuning factors when you work with facets in starter stores:

Tune the size of the services/cache/WCSearchNavigationDistributedMapCache cache instance according to the number of categories.
Tune the size of the services/cache/WCSearchAttributeDistributedMapCache cache instance according to the number of attribute dictionary facetable attributes.
Avoid enabling many attribute dictionary faceted navigation attributes in the storefront (Show facets in search results). Limiting the number of these attributes can help prevent Solr out-of-memory issues.

Extension index considerations

Consider the following usage when an extension index such as Inventory exists in WebSphere Commerce search:

The filterCache and documentCache are required on the product index when an extension index such as Inventory exists in WebSphere Commerce search, so that the query component functions correctly.
You should typically disable all other internal Solr caches for the extension index in the search runtime.

Configuration options

Search configuration

Ensure that you are familiar with the various Solr configuration parameters that are described in Solr Wiki: solrconfig.xml. The documentation contains information for typical configuration customizations that can potentially increase your search server performance. For example, if your store contains a high number of categories or contracts, or if your search server is receiving Too many boolean clauses errors, increase the default value for maxBooleanClauses.
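
For illustration, the maxBooleanClauses setting is defined in the query section of the solrconfig.xml file. The value 2048 below is an arbitrary example rather than a recommendation; choose a value based on the number of categories or contracts that your queries expand to.

```xml
<query>
  <!-- Stock Solr default is 1024; 2048 is an illustrative value only -->
  <maxBooleanClauses>2048</maxBooleanClauses>
</query>
```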

Indexing changes and other considerations

Garbage collection

The default garbage collector policy for the WebSphere Commerce JVM is the Generational Concurrent Garbage Collector. Typically, you do not need to change this garbage collector policy.

You can activate the Generational Concurrent Garbage Collector for the WebSphere Commerce search JVM using the -Xgcpolicy:gencon command-line option.

Note: Using a garbage collection policy other than the Generational Concurrent Garbage Collector might result in situations with increased request processing times and high CPU utilization.

For more information, see Generational Concurrent Garbage Collector.

Spell checking

There might be a performance impact when you enable spell checking for WebSphere Commerce search terms.

You might see performance gains in transaction throughput when spell checking is skipped where it is not needed, or when users search for products with catalog overrides.

For example, a search term that is submitted in a different language than the storefront requires resources for spell checking. However, product names with catalog overrides are already known and do not require any resources for spell checking.

The following spell checking methods are used:

The spell check index is used for spell checking.
The spell checker component, DirectSolrSpellChecker, uses data directly from the CatalogEntry index, instead of relying on a separate stand-alone index.

Tuning the spell check index

The spell check index ensures that automatically suggested search terms accurately reflect the terms in the search index.

It is built automatically during commits (build index and replication), including on subordinate search nodes in a clustered environment.

The automatic build is defined in the solrconfig.xml file by the buildOnCommit parameter:


<lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">spellCheck</str>
      <str name="classname">solr.IndexBasedSpellChecker</str>
      <str name="buildOnCommit">true</str>
      <str name="buildOnOptimize">true</str>
      <str name="spellcheckIndexDir">./spellchecker</str>
</lst>

If you are running frequent delta updates during the day, you might notice high CPU usage on your search subordinate servers. If the CPU usage is excessive, you can set the buildOnCommit parameter to false. Then, you can manually trigger a build of the spell check index for a specific index by using the following command:


http://host_name:search_port/solr/MC_masterCatalogId_CatalogEntry_locale/select?q=query&spellcheck=true&spellcheck.collate=true&spellcheck.build=true
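
If you turn off the automatic build as described above, the spellchecker definition in the solrconfig.xml file might look like the following sketch. Only the buildOnCommit value is changed; keeping buildOnOptimize set to true is an assumption, not a requirement.

```xml
<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">spellCheck</str>
  <str name="classname">solr.IndexBasedSpellChecker</str>
  <!-- Skip the spell check index rebuild on every commit to reduce CPU usage -->
  <str name="buildOnCommit">false</str>
  <!-- Assumption: still rebuild when the index is optimized -->
  <str name="buildOnOptimize">true</str>
  <str name="spellcheckIndexDir">./spellchecker</str>
</lst>
```

With this setting, the spell check index is refreshed only when you issue a request with spellcheck.build=true, or when the index is optimized.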

Improving Store Preview performance for search changes

To improve performance when previewing search changes, you can skip indexing unstructured content when business users launch Store Preview:

In the wc-component.xml file, set the IndexUnstructured property to false.

For more information, see Changing properties in the component configuration file (wc-component.xml) (Search EAR).

Performance monitoring

You can monitor Solr in WebSphere Commerce search using the following methods:

Solr administrative interface

The Solr native administrative interface can be used to gather runtime statistics for each Solr core that is running on the search server. It can also be used to perform simple search queries. For more information, see Enabling the Solr administrative interface.

Lucene Index Toolbox (Luke)

Luke is a development and diagnostic tool for search indexes. It enables you to display and modify search index content. For more information, see Luke - Lucene Index Toolbox.

WebSphere Application Server JMX clients

JMX clients can read runtime statistics from Solr.

To set up the client:

Add the JMX registry in the Solr core configuration file, solrconfig.xml:


<jmx serviceURL="service:jmx:iiop://host_name:2809/jndi/JMXConnector"/>

Use jconsole in Rational Application Developer to connect to the runtime JMX.
When the Solr core is initialized, you can use jconsole to view information from JMX, such as statistics information for caches.

For more information, see SolrJmx.

Advanced configuration of the WebSphere Commerce search configuration file (wc-search.xml) (WC EAR)

Ensure that your advanced configuration is tuned to meet your performance needs.

For example, consider changing performance values from:


<_config:server name="AdvancedConfiguration_1">
  <_config:common-http
    URL="http://host_name:3737/solr/"
    allowCompression="true" connectionTimeout="15000"
    defaultMaxConnectionsPerHost="100" followRedirects="false"
    maxRetries="1" maxTotalConnections="100"
    retryTimeInterval="1000" soTimeout="5000"/>
</_config:server>

To tuned values:


<_config:server name="AdvancedConfiguration_1">
  <_config:common-http
    URL="http://host_name:3737/solr/"
    allowCompression="true" connectionTimeout="1200000"
    defaultMaxConnectionsPerHost="600" followRedirects="false"
    maxRetries="1" maxTotalConnections="600"
    retryTimeInterval="6" soTimeout="1200000"/>
</_config:server>

Note: Avoid using values that are too large for the two timeout values, because they cause the storefront page to hang for a prolonged period if the search server is not available. The search runtime retries up to maxRetries times and waits retryTimeInterval milliseconds before each retry attempt. This timeout setting should only be used as the HTTP connection timeout. For the search request timeout, use a combination of connectionTimeout and soTimeout from the Advanced Configuration, with the maxTimeAllowed parameter inside of each search profile.

For more information about this parameter and other wc-search.xml configurations, see WebSphere Commerce search configuration file (wc-search.xml) (WC EAR).