Managing synonyms, stop words, and Search Term Associations

ZooKeeper manages several customizable lists of associated terms that the search function uses. Stop words are removed from the query before it is processed. Synonyms and Search Term Associations (STAs) both implement catalog substitutions for query terms, in slightly different ways. Each list is accessible, and you can change stop words and synonyms directly in ZooKeeper.

Introduction

ZooKeeper stores a set of lists that are used by the Version 9.1 Query service. Each list entry consists of a word and either its associated terms or the action to take when the word is encountered. Three lists are currently maintained:
  • The stop words list records the words that are filtered out of the search query before Natural Language Processing (NLP) is performed on it. This list usually contains the most common words in a language (such as "the" in English).
  • Synonyms and Search Term Associations broaden search results by adding search terms to each search submission. The results include those for the submitted search term, plus those for the defined synonyms or STAs. Although they are processed in the same way, synonyms and STAs are separate lists that are generated by different mechanisms. The STA mechanism provides a backward-compatible approach in which the terms are loaded directly from the database. The synonyms mechanism uses ZooKeeper and is intended to provide more options for managing associations in the Elasticsearch environment.
These custom lists are stored in JSON format in ZooKeeper, in language-specific dictionaries. The following sections describe the structure of these dictionaries, and how you can interact with them in ZooKeeper using the REST API.
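As a rough illustration only, the following Python sketch shows the order of operations described above: stop words are removed from the query, and each remaining term is then expanded with its associated terms. The `process_query` function, its naive whitespace tokenizer, and the sample dictionaries are hypothetical; the Query service applies these lists inside its own Elasticsearch analysis, not with code like this.

```python
from typing import Dict, List, Set


def process_query(query: str, stop_words: Set[str],
                  associations: Dict[str, List[str]]) -> List[str]:
    """Hypothetical sketch: drop stop words, then add associated terms."""
    # Naive lowercase whitespace tokenization (an assumption for illustration).
    tokens = [t for t in query.lower().split() if t not in stop_words]
    expanded: List[str] = []
    for token in tokens:
        # Keep the submitted term so that its own results are still included.
        if token not in expanded:
            expanded.append(token)
        # Add the synonyms or STAs defined for this term, if any.
        for extra in associations.get(token, []):
            if extra not in expanded:
                expanded.append(extra)
    return expanded


print(process_query("the red couch", {"the", "and"}, {"couch": ["sofa"]}))
# ['red', 'couch', 'sofa']
```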

The Stop Words dictionary

You interact with the stop words dictionary using REST calls. The permitted calls are GET, POST, and PATCH. For a GET call, the response body contains a JSON-formatted set of the terms in the requested dictionary. There is no explicit DELETE call; to delete an item, use a POST with empty content instead.

The address for this query is:
http://data_environment_hostname:30920/search/resources/api/v2/configuration?nodeName=environmentType_storeID_product_stopwords&locale=en_US
where environmentType is either auth or live, and storeID is the ID of your store.
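The following Python sketch assembles the stop words address from its parts, which can help avoid typos in the nodeName value. The `stopwords_url` helper, the host name, and store ID 1 are hypothetical; only the URL shape and the auth/live choice come from the address above.

```python
from urllib.parse import urlencode


def stopwords_url(host: str, environment_type: str, store_id: int,
                  locale: str = "en_US") -> str:
    """Hypothetical helper: build the stop words configuration URL."""
    # environmentType must be auth or live, as described above.
    if environment_type not in ("auth", "live"):
        raise ValueError("environment_type must be 'auth' or 'live'")
    node_name = f"{environment_type}_{store_id}_product_stopwords"
    query = urlencode({"nodeName": node_name, "locale": locale})
    return f"http://{host}:30920/search/resources/api/v2/configuration?{query}"


print(stopwords_url("data_environment_hostname", "auth", 1))
```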
The response contains a set of stop words, as in the following example:
{
   "the": "",
   "and": ""
}
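Because each stop word is a JSON key with an empty string as its value, a client only needs the keys. The Python sketch below reads them from a GET response body and builds a body in the same shape. The helper names are hypothetical, and the assumption that POST and PATCH bodies use the same shape as the GET response should be checked against your environment.

```python
import json
from typing import Dict, Iterable, List


def parse_stopwords(response_body: str) -> List[str]:
    """Hypothetical helper: return the stop words (the JSON keys), sorted."""
    entries: Dict[str, str] = json.loads(response_body)
    return sorted(entries)


def stopwords_body(words: Iterable[str]) -> str:
    """Hypothetical helper: map each word to an empty string, as in the GET response."""
    # Assumption: POST/PATCH accept the same shape that GET returns.
    return json.dumps({word: "" for word in words})


print(parse_stopwords('{"the": "", "and": ""}'))  # ['and', 'the']
print(stopwords_body(["the", "and"]))              # {"the": "", "and": ""}
```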
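
ZooKeeper Query Options
The general form of a configuration query is:
http://search_server:30920/search/resources/api/v2/configuration?nodeName=nodeName&locale=en_US
where nodeName is the flattened namespace for the index (for example, environmentType_storeID_product_stopwords).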
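The general query form above can be wrapped in a small client. The following Python sketch uses only the standard library to prepare GET, POST, or PATCH requests for any nodeName without sending them; `send` performs the call and needs a reachable server. The names `build_request`, `send`, and `BASE`, the JSON content type, and the body shape for POST and PATCH are all assumptions, not a documented HCL Commerce client.

```python
import json
import urllib.request
from typing import Dict, Optional
from urllib.parse import urlencode

# Hypothetical base address; search_server is the placeholder host used above.
BASE = "http://search_server:30920/search/resources/api/v2/configuration"


def build_request(node_name: str, locale: str = "en_US", method: str = "GET",
                  entries: Optional[Dict[str, str]] = None,
                  base: str = BASE) -> urllib.request.Request:
    """Hypothetical helper: prepare (but do not send) a configuration request."""
    if method not in ("GET", "POST", "PATCH"):
        raise ValueError("only GET, POST, and PATCH are permitted")
    url = f"{base}?{urlencode({'nodeName': node_name, 'locale': locale})}"
    data: Optional[bytes] = None
    headers: Dict[str, str] = {}
    if method != "GET":
        # Assumption: the body uses the same JSON shape as the GET response.
        data = json.dumps(entries or {}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request: urllib.request.Request) -> Dict[str, str]:
    """Hypothetical helper: send the request and decode any JSON response body."""
    with urllib.request.urlopen(request) as response:
        raw = response.read().decode("utf-8")
    return json.loads(raw) if raw else {}
```

For example, `send(build_request("live_1_product_stopwords"))` would retrieve the live stop words for a hypothetical store 1.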

Updating synonyms and replacement terms

You can also update synonyms and replacement terms using the Query service configuration REST API. As with stop words, the permitted calls are GET, POST, and PATCH; for a GET call, the response body contains a JSON-formatted set of the terms in the requested dictionary. There is no explicit DELETE call; to delete an item, use a POST with empty content instead.

The address for this query is:
http://data_environment_hostname:30920/search/resources/api/v2/configuration?nodeName=environmentType_storeID_product_sta&locale=en_US
where environmentType is either auth or live.
The response contains a set of synonyms, as in the following example:
{
   "couch, sofa": "",
   "coff => coffee": "",
   "driveway, road, street": "",
   ...
}
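Before posting a synonyms dictionary, it can help to check that each key uses one of the two notations in the example above: a comma-separated list of terms, or a `=>` mapping. The Python sketch below is a hypothetical pre-check (`check_synonym_entries` is not part of HCL Commerce or Elasticsearch); it also flags non-empty values, since every value in the examples is an empty string, and single-term keys, which define no synonym.

```python
import json
from typing import List


def check_synonym_entries(response_body: str) -> List[str]:
    """Hypothetical pre-check: return problems found in a synonyms dictionary."""
    problems: List[str] = []
    for rule, value in json.loads(response_body).items():
        # In the examples above, every value is an empty string.
        if value != "":
            problems.append(f"{rule!r}: value should be an empty string")
        if "=>" in rule:
            # Explicit mapping: terms are needed on both sides of a single '=>'.
            left, _, right = rule.partition("=>")
            if not left.strip() or not right.strip() or "=>" in right:
                problems.append(f"{rule!r}: malformed '=>' mapping")
        elif len([t for t in rule.split(",") if t.strip()]) < 2:
            # A single term has nothing to be equivalent to.
            problems.append(f"{rule!r}: needs at least two comma-separated terms")
    return problems
```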
Each synonym entry is a JSON key with an empty string as its value, which is the same structure used for stop words. The key can list equivalent terms separated by commas (such as "couch, sofa") or map a term explicitly to a replacement using => (such as "coff => coffee"). How this notation is interpreted depends on your settings. The default settings are:
  • expand = true
  • lenient = false
For a complete description of how Elasticsearch uses these settings, see Synonym token filter in the Elasticsearch reference. This flexibility in constructing and interpreting the synonyms list is why the HCL Commerce Elasticsearch implementation distinguishes between synonyms and STAs.
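The effect of the expand setting on the two notations can be modeled in a few lines. The Python sketch below is a simplified model of the Solr-format rule semantics that the Elasticsearch synonym filter documents, not its implementation: with expand = true, every term in a comma-separated list maps to all terms in that list; with expand = false, all terms map to the first term; a `=>` rule replaces the left-hand terms with the right-hand terms regardless of expand. Tokenization and multi-word synonyms are ignored, and `expansion_map` is a hypothetical name.

```python
from typing import Dict, List, Set


def expansion_map(rules: List[str], expand: bool = True) -> Dict[str, Set[str]]:
    """Hypothetical model: map each term to the terms it is rewritten to."""
    mapping: Dict[str, Set[str]] = {}
    for rule in rules:
        if "=>" in rule:
            # Explicit mapping: left-hand terms are replaced by right-hand terms.
            left, _, right = rule.partition("=>")
            targets = {t.strip() for t in right.split(",") if t.strip()}
            for term in (t.strip() for t in left.split(",") if t.strip()):
                mapping.setdefault(term, set()).update(targets)
        else:
            terms = [t.strip() for t in rule.split(",") if t.strip()]
            if not terms:
                continue
            # expand=true: each term maps to all terms; expand=false: to the first term.
            targets = set(terms) if expand else {terms[0]}
            for term in terms:
                mapping.setdefault(term, set()).update(targets)
    return mapping


print(expansion_map(["couch, sofa", "coff => coffee"])["coff"])  # {'coffee'}
```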

For information on how to tune synonym processing, see Synonym-related configurations.

Working with STAs

Search Term Associations are functionally the same as synonyms, and they are stored in the same format in ZooKeeper. Elasticsearch treats STAs the same way that Solr does; for more information, see Search term associations. Elasticsearch STAs are generated using the same mechanism as for the Solr search engine, rather than by the REST call mechanism used for synonyms and stop words: when an STA is saved in Management Center for HCL Commerce, a near-real-time update is triggered that overwrites the existing STA list in ZooKeeper with a new list from the database.

You can issue a GET request to verify the changes. The response contains the updated STA list, as in the following example:

{
   "couch, sofa": "",
   ...
}
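Because the near-real-time update overwrites the whole STA list, comparing a GET response captured before saving an STA with one captured afterward shows exactly what changed. The Python sketch below performs that comparison on two response bodies; `diff_entries` is a hypothetical helper, and capturing the two responses is left to you.

```python
import json
from typing import Dict, Set, Tuple


def diff_entries(before_body: str, after_body: str) -> Tuple[Set[str], Set[str]]:
    """Hypothetical helper: return (added, removed) keys between two GET responses."""
    before: Dict[str, str] = json.loads(before_body)
    after: Dict[str, str] = json.loads(after_body)
    added = set(after) - set(before)
    removed = set(before) - set(after)
    return added, removed


print(diff_entries('{"couch, sofa": ""}',
                   '{"couch, sofa": "", "driveway, road, street": ""}'))
```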