Managing stop words

ZooKeeper manages several kinds of customizable lists of associated terms used by the search function, including the list of stop words. Stop words are removed from the query before it is processed. Each custom list is accessible, and you can change stop words directly in ZooKeeper.

Stop word lists in ZooKeeper

ZooKeeper stores a set of lists that are used by the HCL Commerce Version 9.1 Query service. Each list entry consists of a word and its accompanying association, or the action to be taken when the word is encountered. The Stop words list records all the words that are to be filtered out of the search query before Natural Language Processing is performed on it. This list usually contains the most common words in a language (such as "the" for English).

HCL Commerce Version 9.1.15.0 or later
The default list of English Stop words is:
(a|an|and|are|as|at|be|but|by|for|if|in|into|is|it|no|not|of|on|or|such|that|the|their|then|there|these|they|this|to|was|will|with|,
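To illustrate what filtering stop words out of a query means, the following Python sketch removes the default English words listed above from a query string. It is only an illustration: the whitespace tokenization and the function name remove_stop_words are assumptions, and the sketch does not describe how the Query service itself is implemented. The punctuation entries at the end of the list are left out because the list is cut off in the text above.

```python
# Words copied from the default English list shown above.
# Assumption: only the word entries are used; trailing punctuation
# entries are omitted because the list above is incomplete.
DEFAULT_STOP_WORDS = (
    "a|an|and|are|as|at|be|but|by|for|if|in|into|is|it|no|not|of|on|or|such|"
    "that|the|their|then|there|these|they|this|to|was|will|with"
).split("|")


def remove_stop_words(query, stop_words=DEFAULT_STOP_WORDS):
    """Return the query with stop words removed (case-insensitive).

    Hypothetical helper: splits on whitespace only, which is a
    simplification of real query tokenization.
    """
    stop = {word.lower() for word in stop_words}
    kept = [token for token in query.split() if token.lower() not in stop]
    return " ".join(kept)


if __name__ == "__main__":
    print(remove_stop_words("the red shoes for the kids"))
```

With this sketch, a query such as "the red shoes for the kids" is reduced to the terms that carry meaning for the search.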

These custom lists are stored in JSON format in ZooKeeper, in language-specific dictionaries. The following section describes the structure of these dictionaries, and how you can interact with them in ZooKeeper using the REST API.

The Stop Words dictionary

You interact with the Stop Words dictionary using REST calls. The permitted calls are GET, POST, and PATCH. For example, the response body of a GET call contains a JSON-formatted set of the terms you requested. There is no explicit DELETE call; however, you can delete an item by sending a POST with empty content.

The address for this query is:
http://data_environment_hostname:30920/search/resources/api/v2/configuration?nodeName=environmentType_storeID_product_stopwords&locale=en_US
Where environmentType is either auth or live.
The response contains a set of stop words, as in the following example.
{
"the": "",
"and": ""
}
Note: For more information on these calls, refer to the Query REST API. For more information on the Query Service configuration API, see Configuring Query services in ZooKeeper.
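The GET call described above can be sketched in Python as follows. This is a minimal example under stated assumptions: the helper names are hypothetical, the store ID passed in is whatever applies to your store, and any authentication or TLS settings your environment requires are not covered. The URL pattern, the port 30920, and the response shape are taken from the example above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def stop_words_url(host, env_type, store_id, locale="en_US", port=30920):
    """Build the configuration URL for a store's stop words node."""
    if env_type not in ("auth", "live"):
        raise ValueError("env_type must be 'auth' or 'live'")
    # nodeName follows the pattern environmentType_storeID_product_stopwords
    node_name = f"{env_type}_{store_id}_product_stopwords"
    query = urlencode({"nodeName": node_name, "locale": locale})
    return f"http://{host}:{port}/search/resources/api/v2/configuration?{query}"


def parse_stop_words(body):
    """Return the stop words (the JSON object's keys) in sorted order."""
    return sorted(json.loads(body).keys())


def fetch_stop_words(host, env_type, store_id, locale="en_US"):
    """Send the GET request and return the parsed stop words.

    Assumption: the endpoint is reachable without extra headers.
    """
    with urlopen(stop_words_url(host, env_type, store_id, locale)) as resp:
        return parse_stop_words(resp.read().decode("utf-8"))
```

For example, fetch_stop_words("data_environment_hostname", "auth", 11) would request the node auth_11_product_stopwords, where 11 is only a placeholder store ID.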
HCL Commerce Version 9.1.15.0 or later

Extending the Stop words

You can extend the list of Stop words. Two common scenarios where this is likely are:
  • When your customers commonly use additional terms, in English or in technical terminology, that need to be filtered out in addition to the default set.
  • When you need to add Stop words in a language other than English.
Use the PATCH method at the following REST API endpoint to extend the Stop words list.
PATCH http://host:port/search/resources/api/v2/configuration?nodeName=component&envType=auth
Where the body of the request is similar to the following example, in which the value contains a list of Spanish Stop words.
{
  "extendedconfiguration": {
    "configgrouping": [
      {
        "name": "SearchConfiguration",
        "property": {
          "name": "SkipWords",
          "value": "\\s+(de|la|que|el|en|,|a|los|.|del|se|')\\s+"
        }
      }
    ]
  }
}
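As a sketch of how such a request might be assembled, the following Python example builds the PATCH body from a list of words and wraps it in a request object. The function names and the Content-Type header are assumptions made for illustration; the endpoint path, the query parameters, and the JSON structure are copied from the example above. The words are joined verbatim to reproduce the value shown, so no regular expression escaping is applied.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Spanish stop words from the example above.
SPANISH_STOP_WORDS = ["de", "la", "que", "el", "en", ",", "a", "los", ".",
                      "del", "se", "'"]


def build_skip_words_body(words):
    """Build the request body that sets the SkipWords property."""
    # Joined as-is to match the documented value; entries are not escaped.
    pattern = r"\s+(" + "|".join(words) + r")\s+"
    return {
        "extendedconfiguration": {
            "configgrouping": [
                {
                    "name": "SearchConfiguration",
                    "property": {"name": "SkipWords", "value": pattern},
                }
            ]
        }
    }


def build_patch_request(base_url, words):
    """Return a PATCH request for the configuration endpoint.

    base_url is the scheme, host, and port, for example the
    http://host:port part of the endpoint shown above.
    """
    query = urlencode({"nodeName": "component", "envType": "auth"})
    url = f"{base_url}/search/resources/api/v2/configuration?{query}"
    data = json.dumps(build_skip_words_body(words)).encode("utf-8")
    # Assumption: the service accepts a JSON content type header.
    return Request(url, data=data, method="PATCH",
                   headers={"Content-Type": "application/json"})
```

The returned object can be passed to urllib.request.urlopen once base_url points at a real host and port. Serializing the body with json.dumps produces the doubled backslashes (\\s+) seen in the example.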