HCL Commerce Version 9.1.12.0 or later

Expanding synonyms and Search Term Associations at query time

Search Term Association (STA) and synonym expansion are performed by the Query service before the query is passed to the Elasticsearch engine.

Search term associations suggest additional, different, or replacement products in search results. They can also link search terms to a selected landing page in the store. Search term associations are used as a product recommendation strategy to increase store sales when customers search for products, as the search submission is modified to increase or target search results. For a detailed explanation of what they are and how they work, see Search term associations.

How expansion is performed

When you use Elasticsearch as your search solution, STA and synonym expansion are performed by the SearchNLPSTAExpansionProviderHelper class defined in the HCL_NLPProfile. This class also removes any stop words that were added through the /configuration REST API endpoint. The characteristics of query-time synonym expansion are:
  • No spell check on any search term defined in STA.
  • Lemmatization is used to match input search terms with search term associations. Stemming will be performed on the original STA set for the primary search.
  • All synonym-expanded terms are contained in a SHOULD clause. Natural Language Processing (NLP) analysis is not performed on expanded STAs. The exception is a replacement term with a one-to-one relationship, which does go through NLP processing. For more information on the expected behavior from this design decision, see Default behaviors after STA expansion.
  • Processing is only performed in a single direction (from top to bottom):
    1. The first pass manages all replacement terms.
    2. The second pass is for synonym expansion.

Note that the overall combined search scope may be increased slightly when one or more of the original search terms are identified to be adjectives. This could result in more search hits for the remaining expanded search terms.

Special attention is needed when using query time synonym expansion together with long tail removal. The outcome of the synonym expansion may be affected by the long tail removal function when the logic to handle too many search results has been triggered. To avoid this condition:
  1. Place the most significant terms at the beginning of the synonym expansion list. The Query service parser can then perform the appropriate boosting against those significant terms at runtime. The result is a more relevant result set that can be presented to the end user. This applies when long tail is enabled; otherwise, the input search term will be used for boosting.
  2. Use only the singular form in the synonym expansion list so that more accurate stemming can be applied at query time.
  3. List all related synonym terms on one line. Do not repeat them across multiple lines.
  4. All the synonyms must be spelled correctly. Misspelled keywords can be added as replacement terms.
  5. Replacement terms should not span multiple lines. For example, the following is invalid:
    vision => vision, eyes
    eyes => eye
    
    The correct syntax is:
    vision => vision, eyes, eye
  6. Synonym expansion is only performed in the forward direction.
  7. Avoid duplicate entries for search term associations and synonyms added through the Management Center or the /configuration endpoint.
Note:
  1. If no synonym expansion has been done, then the Query service does dependency parsing to get the root keyword out of the search term.
  2. NLP processing will not be performed on the synonyms or replacement terms, except for replacement with a one-to-one relationship. For example, chair => sofa.
  3. In the case of a one-to-one replacement, the replacement term can be expanded with synonyms if applicable.
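
Guidelines 3 and 5 above can be checked mechanically before entries are loaded. The following Python sketch is illustrative only. It assumes the `=>` and comma syntax used in the examples in this topic, and the `parse_entry` and `lint_entries` helpers are hypothetical names that are not part of HCL Commerce.

```python
def parse_entry(line):
    """Split one entry line into (kind, left_terms, right_terms)."""
    def split_terms(text):
        return [t.strip().lower() for t in text.split(",") if t.strip()]

    if "=>" in line:
        left, right = line.split("=>", 1)
        return "replacement", split_terms(left), split_terms(right)
    return "synonym", split_terms(line), []


def lint_entries(lines):
    """Return warnings for synonyms repeated across lines and for
    replacement terms that are chained across lines."""
    warnings = []
    parsed = [parse_entry(line) for line in lines]

    seen_synonyms = {}        # term -> first line that lists it as a synonym
    replacement_sources = {}  # left-hand term -> line that replaces it
    for number, (kind, left, _right) in enumerate(parsed, start=1):
        if kind == "synonym":
            for term in left:
                if term in seen_synonyms:
                    warnings.append(
                        f"line {number}: synonym '{term}' already listed "
                        f"on line {seen_synonyms[term]}"
                    )
                else:
                    seen_synonyms[term] = number
        else:
            for term in left:
                replacement_sources[term] = number

    # A replacement target that is itself replaced on another line means
    # the replacement spans multiple lines (guideline 5).
    for number, (kind, _left, right) in enumerate(parsed, start=1):
        if kind != "replacement":
            continue
        for term in right:
            other = replacement_sources.get(term)
            if other is not None and other != number:
                warnings.append(
                    f"line {number}: target '{term}' is replaced again "
                    f"on line {other}; merge into one line"
                )
    return warnings


if __name__ == "__main__":
    for warning in lint_entries(["vision => vision, eyes", "eyes => eye"]):
        print(warning)
```

Running the check against the invalid example reports the chained replacement, while the corrected single-line entry produces no warnings.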

General guidelines for composing synonyms and replacement terms

  1. Exercise care when adding synonyms and replacement terms.
    • Replacement terms are designed to replace the search terms that a shopper enters on the storefront with terms that match the product data and return the desired results. Synonyms are intended to be used by merchandisers to aggregate and return similar products that have different product data names or terms together in the search results. The search service can handle typical spelling mistakes and most inflected forms of terms (-ing, -ed, plurals, and so on). Use singular terms, and do not add plural or inflected forms unless you find that the search service is not matching them on its own. If you are not sure whether the search service will properly match inflected terms for your top searches, you can check each term’s “stem” at https://snowballstem.org/demo.html
      • Synonyms may be used for inflected terms used in product data that are not being “stemmed” down to a matching root form of the term. (Example: “conditioner, conditioning”, “shelf, shelves”)
      • Replacements should be used where the shopper’s incoming inflected search term is not matching with non-inflected terms in the product data. (Example: “welder => weld”)
    • Synonyms are "global", and adding a synonym to improve one search use case may impact relevancy for other searches. To minimize potential search relevancy issues, improve product data and add keywords instead of using synonyms. Improvements to product data isolate the change to specific products and may also provide Search Engine Optimization (SEO) benefits (Google cannot crawl or index synonym data) and Key Performance Indicator (KPI) benefits.
    • Use only nouns for terms and avoid duplicate or overlapping entries.
    • Do not add misspelled forms of terms as synonyms. If you find commonly misspelled search terms in search analytics that the search service is having difficulty matching, use a replacement term entry.
    • Use replacement terms for abbreviations unless abbreviations are used in the product data. Synonym entries can be used to aggregate results for similar products where the abbreviations are inconsistent. (Example: “recip, reciprocate”).
  2. Other guidelines when creating synonyms:
    • Use either the Management Center Search Term Association (STA) tool or the query configuration REST API. Note that synonyms added using the synonyms API will not be visible in the Management Center STA tool.
    • Replacement terms are processed before synonyms and synonym expansion is not performed on the replaced keywords.
    • Keep synonym entries as short and as simple as possible. Use single-word synonyms whenever possible to simplify similar multi-word terms. For example, consider the following multi-word synonyms used for describing a hoist for lifting a vehicle.

      Original:

      vehicle lift, vehicle hoist, car lift, car hoist, automobile lift

      vehicle ramp, car ramp, automobile ramp

      Simplified:

      automobile, car, vehicle

      hoist, lift
    • Use terms consistently across the product data, synonyms, and replacement terms. For example: bandsaw versus band saw, e-track versus etrack, and units of measure (such as in, “, inch) in product names.
    • Place the most significant terms at the beginning of the synonyms entry when long tail search is enabled.
    • Ensure that the number of synonyms in a given synonym entry does not exceed the SynonymExpansionThreshold (default of 20 terms) configuration setting.
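
As a rough illustration of the last few guidelines, the following Python sketch reviews a single synonym entry for size and for multi-word terms. The default of 20 is the SynonymExpansionThreshold default quoted above. The `review_synonym_entry` helper is a hypothetical name and is not part of the product.

```python
# Default quoted in this topic; set this to match your own configuration.
SYNONYM_EXPANSION_THRESHOLD = 20


def review_synonym_entry(entry, threshold=SYNONYM_EXPANSION_THRESHOLD):
    """Return a list of findings for one comma-separated synonym entry."""
    terms = [t.strip() for t in entry.split(",") if t.strip()]
    findings = []
    if len(terms) > threshold:
        findings.append(f"{len(terms)} terms exceeds threshold of {threshold}")
    multi_word = [t for t in terms if len(t.split()) > 1]
    if multi_word:
        findings.append("multi-word terms: " + ", ".join(multi_word))
    return findings


if __name__ == "__main__":
    print(review_synonym_entry("vehicle lift, vehicle hoist, car lift"))
    print(review_synonym_entry("automobile, car, vehicle"))
```

Multi-word terms are reported only as candidates for simplification, because the guideline asks for single-word synonyms "whenever possible" rather than always.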

Default behaviors after STA expansion

  1. If you have added the synonyms "style home chair, sofa, sofa set" and a customer searches for 'sofa set', the expanded query is not processed through NLP parsing. The system performs a text search of all of these words against the catalog, using a query grouped as follows:
     query : ("style" AND "home" AND "chair") OR ("sofa") OR ("sofa" AND "set")
     fields : [sta query fields]
  2. If you have added the replacement "style home chair => sofa set" with the replacement type set to "Also Search For" (that is, also search for: A > A, B), the expanded query is not processed through NLP parsing. Instead, a text search of all of these words is performed against the catalog. When a customer searches for 'style home chair', the query is grouped as follows:
    query : ("style" AND "home" AND "chair") OR ("sofa" AND "set")
    fields : [sta query fields]
  3. If you have added the replacement "style home chair => sofa, sofa set" with the replacement type set to "Instead Search For" (that is, instead search for: A > B, C), the expanded query is not processed through NLP parsing. Instead, a text search of all of these words is performed against the catalog. When a customer searches for 'style home chair', the query is grouped as follows:
    query : ("sofa") OR ("sofa" AND "set")
    fields : [sta query fields]
  4. If you have added the replacement "style home chair => sofa set" with the replacement type set to "Instead Search For" (that is, instead search for: A > B), the expanded query is processed through NLP parsing, because the replacement has a one-to-one relationship. When a customer searches for 'style home chair', the replacement term is parsed through the NLP process and the search is performed based on the resulting NLP classification:
    query : ("sofa" "set")
    fields : [nlp classification query fields]
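
The four cases above follow one pattern, which the Python sketch below reproduces. It is a minimal model of the grouping shown in this topic, not the Query service implementation. The function names and the "also" and "instead" mode strings are hypothetical.

```python
STA_FIELDS = "[sta query fields]"
NLP_FIELDS = "[nlp classification query fields]"


def group(term):
    """Render one term as an AND group of its quoted words."""
    return "(" + " AND ".join(f'"{word}"' for word in term.split()) + ")"


def expand_synonyms(synonyms):
    """Case 1: every synonym in the entry becomes one OR'ed group."""
    return " OR ".join(group(term) for term in synonyms), STA_FIELDS


def expand_replacement(source, targets, mode):
    """Cases 2 to 4: mode is 'also' or 'instead' (hypothetical labels)."""
    if mode == "also":
        # Also Search For: keep the original phrase and add the targets.
        return expand_synonyms([source] + targets)
    if mode == "instead":
        if len(targets) == 1:
            # One-to-one replacement: handed to NLP processing instead.
            words = " ".join(f'"{word}"' for word in targets[0].split())
            return f"({words})", NLP_FIELDS
        return expand_synonyms(targets)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    print(expand_synonyms(["style home chair", "sofa", "sofa set"]))
    print(expand_replacement("style home chair", ["sofa set"], "also"))
    print(expand_replacement("style home chair", ["sofa", "sofa set"], "instead"))
    print(expand_replacement("style home chair", ["sofa set"], "instead"))
```

Only the one-to-one "instead" case switches to the NLP classification fields. Every other case stays a plain text search over the STA query fields.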

Sample Use Case A – difference between keywords and synonyms

Suppose there is a hoist category with chain hoist products, and another gardening category with garden hose products. A shopper searches for chain hoists, and the result is that 486 products are found. Later there is a requirement to associate the term link to chain. But since there are over 400 products involved, the merchandiser chooses to add a simple synonym link, chain.

Next, another requirement arises, related to associating strap to link for a small number of tie-down products. The merchandiser was not aware that there are several products in the gardening category that have "strap" in their descriptions, specifically garden hoses with a storage strap. When the merchandiser adds link, chain, strap to the above synonym, a shopper searching for the same chain hoist receives chain hoists in the response, but also garden hoses.

The suggested way of addressing the strap to link requirement is to add strap and link to the keyword field of all the tie-downs instead of using synonyms.

Synonyms are generally global, while keywords are only specific to the assigned products or items.
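
The difference can be seen with a toy model. The Python sketch below uses a three-product in-memory catalog and single-term searches only. The catalog, the matching rule, and the function names are all assumptions made for illustration and do not reflect how the Query service scores or matches products.

```python
def make_catalog(tie_down_keywords):
    """Build a tiny catalog; only the tie-down carries merchandiser keywords."""
    return [
        {"name": "chain hoist", "keywords": []},
        {"name": "garden hose with storage strap", "keywords": []},
        {"name": "ratchet tie-down", "keywords": list(tie_down_keywords)},
    ]


def expand(term, synonym_entries):
    """Return the synonym set for a term; the first matching entry wins."""
    for entry in synonym_entries:
        if term in entry:
            return set(entry)
    return {term}


def search(term, catalog, synonym_entries):
    """Match a product if any expanded term appears in its name or keywords."""
    wanted = expand(term, synonym_entries)
    hits = []
    for product in catalog:
        words = set(product["name"].lower().split()) | set(product["keywords"])
        if wanted & words:
            hits.append(product["name"])
    return hits


if __name__ == "__main__":
    # Global synonym: "strap" now affects every search for "chain".
    print(search("chain", make_catalog([]), [["link", "chain", "strap"]]))
    # Keywords on the tie-downs only: the garden hose is no longer returned.
    print(search("chain", make_catalog(["strap", "link"]), [["link", "chain"]]))
```

In the first search the garden hose is returned because the synonym applies to the whole catalog. In the second, the keyword change is confined to the tie-down product.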

Sample Use Case B – size of result set before and after synonym expansion

The following example shows how the size of the result set after synonym expansion can differ from what the result sets of the individual terms in the synonym list would suggest. Consider the following search results.

When a customer searches for drill, 1723 results are returned.

Insert an arbitrary term, such as non-existing, into this search phrase. The same number of hits is returned. This is because the Query service has automatically adjusted the search expression to remove the term non-existing, because non-existing drill produces a null result. The adjustment is reported in the response metadata:
"metaData": { 
    "price": "1", 
    "searchPhrase": { 
      "original": "non-existing drill", 
      "adjusted": "drill" 
    }, 
    "spellcheck": [] 
  }, 
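
A client can detect this adjustment by comparing the two searchPhrase values. The Python sketch below wraps the fragment above in braces so that it parses as standalone JSON. The `dropped_terms` helper is a hypothetical name, and treating a missing "adjusted" value as "no adjustment" is an assumption.

```python
import json

# The metaData fragment from this topic, wrapped in braces to make it valid JSON.
RESPONSE_TEXT = """
{
  "metaData": {
    "price": "1",
    "searchPhrase": {
      "original": "non-existing drill",
      "adjusted": "drill"
    },
    "spellcheck": []
  }
}
"""


def dropped_terms(response):
    """Return the words of the original phrase that are absent after adjustment."""
    phrase = response["metaData"]["searchPhrase"]
    original = phrase["original"].split()
    # Assumption: if "adjusted" is absent, the phrase was not adjusted.
    adjusted = set(phrase.get("adjusted", phrase["original"]).split())
    return [word for word in original if word not in adjusted]


response = json.loads(RESPONSE_TEXT)

if __name__ == "__main__":
    print(dropped_terms(response))
```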

Consider two similar searches, electric drill and cordless drill, which produce 130 hits and 278 hits respectively.

Combine all these terms into one single synonym list: non-existing drill, electric drill, cordless drill. Now when searching for cordless drill, the returned result set size is 342.

Search Term          Synonyms                                              Size of Search Result Set
drill                none                                                  1723
non-existing drill   none                                                  1723
electric drill       none                                                  130
cordless drill       none                                                  278
electric drill       non-existing drill, electric drill, cordless drill    342

One might expect the synonym-expanded result set to be at least as large as the largest result set returned for any single entry in the synonym list, that is, 1723. Instead, the size returned is only 342.

To understand why the size is smaller, first consider what the query part of the search expression contains after the three synonyms have been expanded:
“non-existing drill” OR “electric drill” OR “cordless drill” 

Even though non-existing drill does not return any search hits, the Query service does not automatically adjust that first condition, because the remaining conditions can still generate results. So, when all three synonyms are combined, only the last two actually contribute hits. The final result is 342, which is greater than the maximum of [130, 278].
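
The arithmetic can be reproduced with sets. The Python sketch below uses synthetic product IDs chosen so that the sizes match the table. It assumes that the expanded result is a plain union of the hits for each phrase, which implies an overlap of 130 + 278 - 342 = 66 products between electric drill and cordless drill. That overlap is inferred from the figures and is not stated by the product.

```python
# Synthetic product IDs; only the set sizes are taken from this topic.
PHRASE_HITS = {
    "non-existing drill": set(),              # no exact hits, and not adjusted
    "electric drill": set(range(0, 130)),     # 130 products
    "cordless drill": set(range(64, 342)),    # 278 products, 66 shared
}


def expanded_hits(synonyms):
    """Union the hits of every phrase in the expanded OR expression."""
    result = set()
    for phrase in synonyms:
        result |= PHRASE_HITS[phrase]
    return result


if __name__ == "__main__":
    combined = expanded_hits(
        ["non-existing drill", "electric drill", "cordless drill"]
    )
    print(len(combined))
```

Under this assumption the combined size always falls between the largest single result (278) and the sum of the two (408), which is consistent with the 342 reported above.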

Sample Use Case C – difference between replacement term and synonym expansion 

Merchandisers generally use synonyms to aggregate and return similar products with different product data names together in the search results. Replacement terms are designed to replace the shopper’s search terms used on the storefront with terms that match the product data and return desired results. Consider the following example:
“drum hoist => drum lifter” : “r” 

This is a replacement term. When a shopper searches for drum hoist, the Query service will automatically replace this input phrase with drum lifter and use it for an NLP enabled term search. The result will be exactly the same as if the shopper entered drum lifter.

An additional synonym expansion against the replaced term may be needed, for example:
“drum, barrel” : “s” 
In such a case, whenever the word drum or barrel is detected in the search phrase, the Query service expands the term in its original position in the search phrase to drum OR barrel. When a shopper searches for drum lifter, the final search expression will look like the following:
“( drum OR barrel ) lifter” 
Combining the two examples above gives the following two STA entries:
“drum hoist => drum lifter” : “r” 
“drum, barrel” : “s” 
With these entries, the search expressions are as follows when a shopper searches for:
“drum hoist” – expression will be “( drum OR barrel ) lifter” 
“drum lifter” – expression will be “( drum OR barrel ) lifter”

When using replacement terms with one-to-one relationships, NLP processing will be performed on the replaced term as well.

To demonstrate that synonym expansion is only performed at most once per substitution and does not expand recursively with additional matching synonyms, consider the following entry, added immediately after our STA example above:
 “barrel, tub”: “s” 

When searching for either drum hoist or drum lifter, the expression remains the same: “( drum OR barrel ) lifter”. This is because the term drum has already been expanded once into drum OR barrel, so the third STA entry (barrel, tub) is ignored even though the final search expression contains the matching term barrel.

Important: Replacement terms are processed before synonyms and the synonym expansion on the replacement keywords is performed only once and is not recursive.
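
The two-pass behavior in this use case can be sketched in Python as follows. This is a simplified model that handles single-word synonyms only and lets the first matching entry win. It is not the SearchNLPSTAExpansionProviderHelper implementation, and all function names are hypothetical.

```python
def tokens(text):
    """Lowercase a phrase and split it into words."""
    return text.lower().split()


def apply_replacements(query_tokens, replacements):
    """First pass: swap the first matching source phrase for its target."""
    for source, target in replacements:
        src = tokens(source)
        size = len(src)
        for i in range(len(query_tokens) - size + 1):
            if query_tokens[i:i + size] == src:
                return query_tokens[:i] + tokens(target) + query_tokens[i + size:]
    return query_tokens


def apply_synonyms(query_tokens, synonym_entries):
    """Second pass: expand each token at most once, without recursion."""
    expanded = []
    for token in query_tokens:
        for entry in synonym_entries:
            if token in entry:
                # The expanded group is emitted as-is and never re-scanned.
                expanded.append("( " + " OR ".join(entry) + " )")
                break
        else:
            expanded.append(token)
    return expanded


def expand(query, replacements, synonym_entries):
    """Run the replacement pass, then the synonym pass."""
    replaced = apply_replacements(tokens(query), replacements)
    return " ".join(apply_synonyms(replaced, synonym_entries))


if __name__ == "__main__":
    replacements = [("drum hoist", "drum lifter")]
    synonyms = [["drum", "barrel"], ["barrel", "tub"]]
    print(expand("drum hoist", replacements, synonyms))
    print(expand("drum lifter", replacements, synonyms))
```

Because the synonym pass iterates over the tokens produced by the replacement pass and never revisits its own output, the barrel, tub entry has no effect on either search.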