WebSphere Commerce Search best practices

Follow these best practices to help ensure that your WebSphere Commerce Search implementation works efficiently. The best practices are organized by role: site developers, site administrators, and business users.

Site developers

Know your use cases. Based on your site's usage pattern, you can fine-tune the search code path and payload to maximize your throughput. For more information, see Disabling search expression providers and result filters in the search configuration file (wc-search.xml).
Avoid programming against the database directly from the storefront. The database is the data store for back-office operations, while the search index is the data store for the storefront.
Avoid declaring too many expression providers, query preprocessors and postprocessors, or result filters in any search profile. Instead, isolate each search profile by its usage at the storefront, and include only the required processing logic in each search profile. Unnecessary processing and filtering increase the overall search response time.
Group similar usages into one index field, and assign the appropriate analyzers to that field.
Use inheritance when you reuse search profile properties.
Include only the necessary index fields, both for searching and in the result set.
Avoid declaring too many facets for each search operation.
Use search expression providers for modifying search expressions.
Use search result filters for modifying Business Object Document (BOD) responses.
Use search query postprocessors for modifying REST responses.
Use pagination to reduce payload.
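
Pagination bounds the payload of each request: instead of returning every matching document, each request asks Solr for one page, expressed through Solr's standard `start` (zero-based offset) and `rows` (page size) query parameters. The sketch below is a hypothetical helper, not part of the WebSphere Commerce API; it only shows the page-to-offset arithmetic under the assumption that page numbers are 1-based.

```java
// Hypothetical helper (not a WebSphere Commerce API): converts a 1-based page
// number into Solr's zero-based "start" offset and "rows" page size.
final class Pagination {
    private Pagination() {}

    /** Zero-based offset of the first result on the given 1-based page. */
    static int start(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        return Math.multiplyExact(page - 1, pageSize); // fails loudly on overflow
    }

    /** Number of pages needed to display totalHits results (ceiling division). */
    static int pageCount(long totalHits, int pageSize) {
        if (totalHits < 0 || pageSize < 1) {
            throw new IllegalArgumentException("totalHits must be >= 0 and pageSize >= 1");
        }
        return (int) ((totalHits + pageSize - 1) / pageSize);
    }

    /** Paging portion of a Solr query string, for example "start=24&rows=12". */
    static String solrPagingParams(int page, int pageSize) {
        return "start=" + start(page, pageSize) + "&rows=" + pageSize;
    }
}
```

Keeping `rows` small means each response carries only one page of documents, no matter how many documents match the query.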

When caching:

Increase cache sizes to suit your production environment; by default, cache sizes are not optimized for production. Make cache sizes as large as your available memory allows, so that searches are more likely to be served from cached filter results (fq parameters) rather than by executing new queries (q parameters). The more Solr can cache, the better.
Use fragment caching for each result that is displayed to increase the cache-hit ratio across different search requests.
Use time-based cache invalidation for keyword search results to simplify cache policy management.
Cache invalidation can be performed only after reindexing and after index replication is complete.
Invalidating large amounts of cached content is difficult to time with index replication.
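
Time-based invalidation avoids tracking which cached entries a reindex affects: each keyword result simply expires after a fixed time-to-live (TTL). The sketch below is a hypothetical, in-memory illustration of that policy; `KeywordResultCache` and its methods are not part of WebSphere Commerce or DynaCache, and a production site would configure the equivalent timeout in its caching layer instead. The clock is injected so that expiry can be tested without waiting.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical sketch of time-based (TTL) invalidation for keyword search
// results. Not the WebSphere Commerce or DynaCache API.
final class KeywordResultCache {
    /** Cached result list plus the instant at which it becomes stale. */
    private static final class Entry {
        final List<String> productIds;
        final long expiresAtMillis;

        Entry(List<String> productIds, long expiresAtMillis) {
            this.productIds = productIds;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clockMillis; // e.g. System::currentTimeMillis

    KeywordResultCache(long ttlMillis, LongSupplier clockMillis) {
        this.ttlMillis = ttlMillis;
        this.clockMillis = clockMillis;
    }

    /** Normalizes the keyword so that "Shirt " and "shirt" share one entry. */
    private static String key(String keyword) {
        return keyword.trim().toLowerCase(Locale.ROOT);
    }

    void put(String keyword, List<String> productIds) {
        List<String> copy = Collections.unmodifiableList(new ArrayList<>(productIds));
        entries.put(key(keyword), new Entry(copy, clockMillis.getAsLong() + ttlMillis));
    }

    /** Cached results, or empty if absent or expired (the caller then re-runs the search). */
    Optional<List<String>> get(String keyword) {
        String k = key(keyword);
        Entry e = entries.get(k);
        if (e == null) {
            return Optional.empty();
        }
        if (clockMillis.getAsLong() >= e.expiresAtMillis) {
            entries.remove(k, e); // expire lazily; keeps a newer entry if one was just put
            return Optional.empty();
        }
        return Optional.of(e.productIds);
    }
}
```

Because entries expire only on a timer, results can remain stale for up to one TTL after reindexing and replication complete, so the TTL is best chosen with the replication schedule in mind.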

Site administrators

When you set up the production system network:

Configure WebSphere Commerce for load balancing and failover, with the code installed on the primary node. A load balancer can be placed in front of the web server for WebSphere Commerce and WebSphere Commerce Search; together, the load balancer and the web server route traffic for workload balancing. The WebSphere Commerce application on the primary node can be added to the WebSphere Commerce cluster to handle live traffic. The deployment manager (DMGR) is used to federate the managed nodes and the repeaters.
Use a true failover configuration with a load balancer for both WebSphere Commerce and WebSphere Commerce Search. For example, when the CPU is pegged on one of the search servers, WebSphere Commerce threads might time out while they wait for a response, which leads to timed-out requests at the storefront while shoppers are accessing the site. A proper hardware load balancer in front of the web server reduces this risk because it can better manage heartbeats and routing: in addition to balancing load, it also routes traffic based on response times.
Use more than one repeater to provide failover support for index replication, in case one of the indexing repeaters becomes unavailable. Push changes to all the repeaters and designate one of them as the master, which is configured to replicate to all the Solr subordinate servers in production. When the master repeater becomes unavailable, a backup repeater can immediately take over index replication.
A storage area network (SAN) is more resilient than direct-attached storage (DAS). With SAN, all Solr subordinate servers can mount the same search index directory, which reduces the overall replication workload on the repeater.
Although the repeater is optional, deploy at least one repeater in your production environment. The repeater acts as an index snapshot of what is in production, so that emergency fixes can be applied. Because the staging index might contain changes that are not ready for production, it cannot be used for applying emergency fixes. Do not perform indexing directly on the Solr subordinate servers in production, because they serve live storefront traffic.
Do not create workspace Solr cores in the production environment, as this can significantly increase the overall amount of memory that is used on the search server.
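
The routing and replication failover described above can be reduced to two small selection rules. This is a hypothetical model, not load balancer, WebSphere, or Solr configuration: `Node`, `routeByResponseTime`, and `replicationSource` are illustrative names, and the heartbeat flag and average response time are assumed to come from some external health check.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical model of the two failover decisions described above.
// Not load balancer, WebSphere, or Solr configuration.
final class FailoverSketch {
    private FailoverSketch() {}

    /** Health snapshot of one server, as a heartbeat check might report it. */
    static final class Node {
        final String name;
        final boolean heartbeatOk;
        final double avgResponseMillis;

        Node(String name, boolean heartbeatOk, double avgResponseMillis) {
            this.name = name;
            this.heartbeatOk = heartbeatOk;
            this.avgResponseMillis = avgResponseMillis;
        }
    }

    /** Load balancer: route to the healthy search server with the lowest average response time. */
    static Optional<Node> routeByResponseTime(List<Node> searchServers) {
        return searchServers.stream()
                .filter(n -> n.heartbeatOk) // skip servers that fail their heartbeat
                .min(Comparator.comparingDouble((Node n) -> n.avgResponseMillis));
    }

    /** Replication: the first healthy repeater in priority order (master first, then backups). */
    static Optional<Node> replicationSource(List<Node> repeatersInPriorityOrder) {
        return repeatersInPriorityOrder.stream()
                .filter(n -> n.heartbeatOk)
                .findFirst();
    }
}
```

A server with a pegged CPU tends to show a rising response time before its heartbeat fails, so routing on response time moves traffic away from it earlier than routing on heartbeats alone.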

For more information about the recommended configurations when you implement a search cluster, see WebSphere Commerce Search server: Advanced configuration.

When you administer the site:

Perform index optimization only when system usage is low.
Use the UpdateSearchIndex scheduler task in production to keep the search index synchronized.
Configure the UpdateSearchIndex scheduler task in production to run delta updates frequently, and perform a clean full build less often, when system usage is low.
Production search statistics can be moved or replicated into staging and can accumulate over time. Therefore, periodically archive old statistics data to achieve better response times in the Management Center Search Statistics tooling.
Do not crawl catalog-related pages. When you crawl the WebSphere Commerce site, configure all URLs that are to be crawled in the StaticContentSitemap, which allows SEO-enabled URL tags.
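
The scheduling advice for UpdateSearchIndex can be read as a policy that picks a build type each time the task fires. The sketch below is a hypothetical illustration only: it does not invoke UpdateSearchIndex, and `IndexBuildPolicy`, its low-usage window, and its full-build interval are assumptions that you would replace with values from your own traffic data.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical policy: frequent delta updates, occasional full builds that
// run only inside a low-usage window. Does not call UpdateSearchIndex.
final class IndexBuildPolicy {
    enum BuildType { DELTA, FULL }

    private final LocalTime lowUsageStart;
    private final LocalTime lowUsageEnd;
    private final Duration fullBuildInterval;

    IndexBuildPolicy(LocalTime lowUsageStart, LocalTime lowUsageEnd, Duration fullBuildInterval) {
        this.lowUsageStart = lowUsageStart;
        this.lowUsageEnd = lowUsageEnd;
        this.fullBuildInterval = fullBuildInterval;
    }

    /** True if t is inside [start, end); handles windows that wrap past midnight. */
    boolean inLowUsageWindow(LocalTime t) {
        if (!lowUsageStart.isAfter(lowUsageEnd)) { // e.g. 02:00-05:00
            return !t.isBefore(lowUsageStart) && t.isBefore(lowUsageEnd);
        }
        // e.g. 23:00-04:00 wraps past midnight
        return !t.isBefore(lowUsageStart) || t.isBefore(lowUsageEnd);
    }

    /** Full build only when one is due and usage is low; otherwise a delta update. */
    BuildType choose(LocalDateTime now, LocalDateTime lastFullBuild) {
        boolean fullBuildDue = Duration.between(lastFullBuild, now).compareTo(fullBuildInterval) >= 0;
        return (fullBuildDue && inLowUsageWindow(now.toLocalTime())) ? BuildType.FULL : BuildType.DELTA;
    }
}
```

A full build that becomes due during peak hours is deferred to the next low-usage window, while delta updates continue in the meantime.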

Storage

Consider the following storage factors:

Conserve storage space by cleaning up old or backed-up search indexes that are no longer in use. For example, after replication the solrhome/data directory might contain multiple time-stamped index directories; these can safely be removed unless they are explicitly kept for backup purposes.
By default, the solrhome/data/index directory contains the active index files, unless another location is specified in the solrhome/data/index.properties file.
Consider the following factors when you use a storage area network (SAN) or direct-attached storage (DAS) as your storage space:
When you consider SAN:

Hardware failure: Highly resilient, with no single point of failure; unlikely to have any business impact.
Network failure: Highly resilient, with no single point of failure; unlikely to have any business impact, assuming that the search servers are properly clustered. The load balancer automatically routes traffic to a server that has sufficient capacity to handle full peak load.
Scalability: Provides much greater scalability than local disks.
Performance: Outperforms local disks in input/output operations per second (IOPS), provided that Fibre Channel is used rather than NFS.
Total cost of ownership: Higher than DAS.
Flexibility: Moving LPARs, JVMs, or indexes is relatively straightforward when you use partition management software such as Live Partition Mobility on AIX. Adding extra disks should not affect operation.

When you consider DAS:

Hardware failure: Disks are mirrored, so single-disk failures should have no impact. Some server models might have disk hot-swap capability.
Network failure: Disks are local and are unlikely to be affected by network issues.
Scalability: Limited by the amount of hardware that the physical server can hold.
Performance: Local disks are likely easier to configure, but on average SAN should be relatively faster than local disks.
Total cost of ownership: Lower than SAN.
Flexibility: Moving LPARs, JVMs, or indexes requires rebuilding. Adding disks might involve downtime to reconfigure the RAID array.
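
The cleanup of old time-stamped index directories described under Storage can be checked with a dry run before anything is deleted. The sketch below assumes Solr's standard layout, where the active index directory is named by the `index` property in `solrhome/data/index.properties` and defaults to `index`; `StaleIndexFinder` is a hypothetical helper. It only lists candidates, because some time-stamped directories might be kept intentionally as backups, and it is safest to run it when no replication is in progress, since a directory that is still being copied might also appear stale.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Hypothetical dry-run helper: lists index directories under solrhome/data
// that are NOT the active index. Nothing is deleted.
final class StaleIndexFinder {
    private StaleIndexFinder() {}

    /** Active index directory name: the "index" property of index.properties, else "index". */
    static String activeIndexDir(Path dataDir) throws IOException {
        Path props = dataDir.resolve("index.properties");
        if (Files.isRegularFile(props)) {
            Properties p = new Properties();
            try (InputStream in = Files.newInputStream(props)) {
                p.load(in);
            }
            String name = p.getProperty("index");
            if (name != null && !name.trim().isEmpty()) {
                return name.trim();
            }
        }
        return "index";
    }

    /** Directories named "index" or "index.*" other than the active one, in sorted order. */
    static List<Path> staleIndexDirs(Path dataDir) throws IOException {
        String active = activeIndexDir(dataDir);
        List<Path> stale = new ArrayList<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(dataDir, "index*")) {
            for (Path d : dirs) {
                String name = d.getFileName().toString();
                boolean looksLikeIndex = name.equals("index") || name.startsWith("index.");
                // index.properties also matches the glob but is a file, so isDirectory excludes it
                if (Files.isDirectory(d) && looksLikeIndex && !name.equals(active)) {
                    stale.add(d);
                }
            }
        }
        Collections.sort(stale);
        return stale;
    }
}
```

Review the listed directories, confirm that none of them is kept as a backup, and only then remove them.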

Marketing business users

Use catalog filters to filter site content, for example, to include or exclude content based on category, product, attribute, or property.
Use search rules for reordering products in the storefront.
Group all search rules with the same trigger into the same rule.
Use separate search rules when appropriate, for example, when there is a need to separately control or track the search rule.
When using store preview, use the search summary to debug search rules, and use the relevancy score of each search result to fine-tune the boost factor.

Catalog business users

Use synonyms to increase search scope, and use replacement terms to reduce search scope.
Understand the scope of search administration, for example, stage propagation versus emergency fixes, to determine the business impact and when changes become available in production.
Avoid indexing other languages into the same index for a locale.
Avoid loading many catalog objects into a workspace. Instead, load production-ready data, to avoid indexing the same data twice: once under the workspace, and again under the base schema upon approval. For more information, see mergeFactor.
Starting store preview and triggering reindexing is intended for previewing a small number of catalog changes. Depending on the size of your catalog, up to 10,000 to 15,000 changes might be previewed in approximately 30 minutes. To preview a large number of changes, load and preview them in the base schema.
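
The difference between synonyms and replacement terms can be modeled as a rewrite of each search term. In the hypothetical `TermRewriter` below, a synonym group adds terms (broader scope), while a replacement substitutes one term for another, for example a broad term for a more specific one (narrower scope). This is an illustrative model under those assumptions, not how WebSphere Commerce stores or applies search term associations.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical model of how synonyms and replacement terms change the set of
// terms that a keyword search matches. Not the WebSphere Commerce implementation.
final class TermRewriter {
    private final Map<String, Set<String>> synonyms = new HashMap<>();
    private final Map<String, String> replacements = new HashMap<>();

    private static String norm(String term) {
        return term.trim().toLowerCase(Locale.ROOT);
    }

    /** Two-way synonym group: searching for any member also matches every other member. */
    void addSynonymGroup(String... terms) {
        Set<String> group = new LinkedHashSet<>();
        for (String t : terms) {
            group.add(norm(t));
        }
        for (String t : group) {
            synonyms.computeIfAbsent(t, k -> new LinkedHashSet<>()).addAll(group);
        }
    }

    /** Replacement: the original term is no longer searched; the replacement is searched instead. */
    void addReplacement(String term, String replacement) {
        replacements.put(norm(term), norm(replacement));
    }

    /** Terms that are searched (OR-ed together) for one user-entered term. */
    Set<String> rewrite(String term) {
        String t = norm(term);
        String replacement = replacements.get(t);
        if (replacement != null) {
            return Collections.singleton(replacement); // narrower scope: original term dropped
        }
        Set<String> terms = new LinkedHashSet<>();
        terms.add(t);
        terms.addAll(synonyms.getOrDefault(t, Collections.emptySet())); // broader scope
        return terms;
    }
}
```

For example, with "sofa" and "couch" as synonyms, a search for "couch" also matches "sofa"; with "shirt" replaced by "dress shirt", a search for "shirt" matches only "dress shirt".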

Searchable and facetable attributes (Site administrators and Marketing business users)

The following workflow applies to attribute dictionary attributes that are marked as searchable or facetable in WebSphere Commerce Search.

Marking attribute dictionary attributes as searchable or facetable: You can use Management Center, or you can load attributes as searchable or facetable by using Dataload or catalog upload. A searchable or facetable flag that is changed in a workspace environment is saved to the workflow content, not the approved content.
The searchable or facetable flag that is associated with the attribute dictionary attribute cannot be managed by workflow in the Management Center. After the attribute is marked searchable or facetable in Management Center, the change is immediately committed into the approved content.
Once the searchable or facetable flag is selected for an attribute dictionary attribute, it cannot be cleared, because links are created for the attribute throughout the WebSphere Commerce database and search index. For reliability and consistency, these links remain intact even if the searchable or facetable check boxes are cleared; once specified, these settings cannot be changed.
When an attribute dictionary attribute is marked as searchable or facetable, business users can start working with it in workspaces, for example by associating it with products, and can preview it in the storefront.