Troubleshooting: Long load times for query response pages

Slow response times to search queries can have various causes. A systematic process for investigating such delays is provided.

Problem

Long delays in loading a search result page or category page might be due to preprocessing, postprocessing, or even REST service issues on the side of the WebSphere Commerce server. A systematic investigation that eliminates WebSphere Commerce causes and identifies Search server-related issues can quickly find the cause.

, then the first step is to identify whether the problem lies with the Search server, or with WebSphere Commerce itself. If the issue is on the Search side, you can test each of the query stages (preprocessing, query execution, and postprocessing) to determine which is the source of the delay. With the query lifecycle stage determined, the final step is to isolate the specific process that is causing the delay.

Eliminating causes on the WebSphere Commerce side

If all you know, initially, is that query responses are taking too long, then determine whether it is the Search server or WebSphere Commerce that is causing the problem. For instance, say that a search for the word "coffee" is taking 25 seconds to complete. You can start troubleshooting on the WebSphere Commerce side by enabling Performance Logger tracing. Review the logging data in the Performance Measurement Tool. The Operation Performance Report (from the report-operations.csv file) shows how long each query operation took. Say that the REST request to retrieve product data for the coffee search is shown to be taking over 20 seconds to respond to the WebSphere Commerce server with the search results. Since 80% of the query delay is due to activities in the Search server, it is reasonable to look there for answers.

Solution

To understand what is delaying return of the search results to the Commerce server, first determine in which step of the query lifecycle the delay is occurring. Each step takes time to complete.

Time is required for the construction of the search query.
Execution of the search query takes time.
Time is also required for any further processing of the search query's results before it is returned to the WebSphere Commerce server.

When you know which step is causing the delay, you can focus your attention there.

Time that is taken for construction of the query

In this step, the query expression providers and query pre-processors construct the search query. You can use ServiceLogger tracing on the Search server to determine how long each expression provider and preprocessor is taking to complete. For instance, With com.ibm.commerce.foundation.logging.ServiceLogger=finest tracing enabled, the log shows the elapsed time for each class that is called by the process.

[12/16/15 10:38:34:054 EST] 00000028 ServiceLogger <   
     REST <com.ibm.commerce.foundation.server.services.rest.search.expression.solr.SolrRESTSearchBasedMerchandisingExpressionProvider> 
        <elapsed:1032ms>  RETURN
[12/16/15 10:38:34:066 EST] 00000028 ServiceLogger <   
     REST <com.ibm.commerce.foundation.server.services.rest.search.expression.solr.SolrRESTSearchTermAssociationExpressionProvider> 
        <elapsed:11ms>  RETURN
[12/16/15 10:38:34:076 EST] 00000028 ServiceLogger <   
     REST <com.ibm.commerce.foundation.server.services.rest.search.expression.solr.SolrRESTSearchByKeywordRelevancyExpressionProvider> 
        <elapsed:10ms> 
…
<com.ibm.commerce.foundation.server.services.rest.search.expression.solr.SolrRESTSearchExpressionProvider> 
<elapsed:1119ms>  RETURN

In this example, the SolrRESTSearchBasedMerchandisingExpressionProvider class took 1032 ms of the total 1119 ms for processing expression providers. This class is used to process search rules and apply relevant search rules to the search query. You can use similar troubleshooting for the query preprocessors to identify any that are running for a long period.

Time that is taken for the search query to run

This part is made up of the time to process the query itself. You can review the query results by running the query (captured in the Search server's trace log) directly against the Search server.

<response> 
<lst name="responseHeader"> 
  <int name= "status">0</int> 
  <int name="QTime">52</int> 
<lst name="params"> 
  <Str name="q">coffee</str> 
  <Str name="ps">2</str> 
  <Str name="debugQuery">true</str> 
  <Str name="fl">name</str> 
  <Str name="pf">name^100</str> 
  <Str name="defType">edismax</str> 
<result name="response" numFound="52" start="0 >

In the example, QTime represents how long the search query took to complete in milliseconds. The params list includes all the parameters of the search query, and the values used for each parameter. numFound shows how many Solr documents were found in the index based on the search query.

Using this data, you can confirm how long the search query itself took to complete in relation to the entire scenario (for instance, a 52 ms search query versus a 15-second page load). If the search query had a high QTime, you can review the parameters of the query to confirm that the complexity of the query is expected.

Continuing with the query results, you can review the contents of the debug, parsedquery, and explain parameters for further information.

<lst name="debug"> 
  <Str name="rawquerystring">coffee</str> 
  <Str name="querystring">coffee</str> 
  <Str name="parsedquery"> 
     (+DisjunctionMaxQuery((defaultSearch:coffe)) ())/no coord 
  </str> 
  <Str name="parsedquery_toString">+(defaultSearch:coffe) ()</str> 
<lst name="explain"> 
  <Str name=" 17010"> 
2.0115635 = (MATCH) sum of: 2.0115635 = (MATCH) 
weight(defaultSearch:coffe in 3057) [DefaultSimilarity] , result of: 
2.0115635 = score(doc=3057 = termFreq=3.0 ) , product of: 
0.99999994 = queryWeight, product of: 5.3091516 = 
idf(docFreq=52, maxDocs=3942) 0.188354 = queryNorm 2.0115638 
= fieldWeight in 3057 , product of: 1.7320508 = tf(freq=3.0), with 
freq of: 3.0 = termFreq=3.0 5.3091516 = idf(docFreq=52, 
maxDocs=3942) 0.21875 = fieldNorm(doc=3057) 
  </str>

The debug parameter is added to the query response if a debugQuery=true parameter is added to the search query. parsedquery shows what the search query looks like after the parameters of the query are processed. The explain parameter contains the relevancy score calculations for each catentry_id in the result. With this data, if the query had a high QTime, you can review the parsedquery to confirm how complex it is. A search for a single search term can quickly become complex if functions such as search term associations, search rules, boosting, or phrase slop are used.

Lastly, The process parameter shows how long each component took to process. You can also review this data to confirm if a particular component is taking a long time to process.

  <lst name="process"> 
<double ">50.0</double> 
  <lst name="wc_query"> 
<double name="time">40.0</double> 
  <lst name="wc facet"> 
<double name="time">0.0</double> 
  <lst name="mlt"> 
<double name="time">0.0</double> 
  <lst name="stats"> 
<double name="time">0.0</double> 
  <lst name="debug"> 
<double name="time">10.0</double> 
  <lst name="wc_spellcheck"> 
<double name="time">0.0</double>

Time that is taken to process the search query results

In this step, query post-processors are used to do further processing on the search query results. You can use ServiceLogger tracing on the Search server to determine how long each post-processor is taking to complete. When com.ibm.commerce.foundation.logging.ServiceLogger=finest tracing is enabled, the log shows the time taken by each post-processing class.

[3/14/16 14:12:08:469 CDT] 00000029 ServiceLogger <   
     REST <com.ibm.commerce.foundation.server.services.rest.search.postprocessor.solr.SolrRESTSearchCatalogEntryViewAttributesQueryPostprocessor> 
      <elapsed:19609ms> RETURN
…
[3/14/16 14:12:08:469 CDT] 00000029 ServiceLogger <   
    REST <com.ibm.commerce.foundation.server.services.rest.search.processor.solr.SolrRESTSearchExpressionProcessor> 
       <elapsed:19890ms>  RETURN

In this example, the SolrRESTSearchCatalogEntryViewAttributesQueryPostprocessor took 19.6 seconds out of 19.9 seconds for processing query post-processors. Further investigation by using the tracing logs can help you determine why this query post-processor is taking so long to run.