Natural Language Processing profiles

A Natural Language Processing (NLP) profile is used to preprocess search terms and modify search queries to fetch desired search results at the storefront.

Logic for processing search strings

Shoppers are rarely experts at formulating search queries, and are often unaware of the ideal search terms to use to find products or services at the storefront. Using Natural Language Processing, the Query service parses plain-language search terms and discerns what shoppers are trying to find. It then modifies the search term at runtime to fetch the desired search results. The following examples describe the search term processing logic of the Query service. Each example consists of a search term and the logic that the Query service uses to process that term and fetch search results at the storefront.

"white shirt girls"

The NLP parser generates the following three tokens to map the search term and then runs the Elasticsearch query to fetch the search results at the storefront. It returns two products.

  • white – COLOR
  • shirt – CATEGORY
  • girls – CATEGORY

"white shirt girls under 89$"

The NLP parser generates the following four tokens to map the search term and then runs the Elasticsearch query to fetch the search results at the storefront. It returns a single product.

  • white – COLOR
  • shirt – CATEGORY
  • girls – CATEGORY
  • under 89$ – FILTER

"white shirt girls under 20$"

The NLP parser generates the following four tokens to map the search term and then runs the Elasticsearch query to fetch the search results at the storefront. It returns zero matches.

  • white – COLOR
  • shirt – CATEGORY
  • girls – CATEGORY
  • under 20$ – FILTER

In this case, the NLP parser applies the search term dropping logic. It drops tokens from the left of the search phrase, one at a time, until the remaining tokens return search results or four iterations have run. Any price filter in the search phrase is also removed in this process. After the dropping logic completes, the NLP parser runs the Elasticsearch query based on the following two tokens. It returns all eight products from the girls’ category, matching shirt as a category or in the name or short description of the product.

  • shirt – CATEGORY
  • girls – CATEGORY

"vitamin capsules"

The NLP parser generates the following two tokens to map the search term and then runs the Elasticsearch query to fetch the search results. It returns zero matches because capsule has been set as an attribute value, whereas the Elasticsearch query built from these tokens searches for capsules in the Noun field.

  • vitamin – CATEGORY
  • capsules – NOUN

In this case, the NLP parser applies the search term dropping logic. This also returns empty search results, because capsule has been set as an attribute value and the Elasticsearch query searches for capsules in the Noun field. To handle such situations, business logic runs a fallback Elasticsearch query based on the previous analysis of the search phrase. The fallback Elasticsearch query is executed based on the following token and returns all the products in the vitamin category.

  • vitamin – CATEGORY

NLP profiles

A Natural Language Processing (NLP) profile is used to control the preprocessing flow of search terms before executing an Elasticsearch query. The profile is a .json file and is stored in your query runtime container.

You can find the default HCL_NLPProfile.json file in the resources\profiles\nlp directory of the query runtime. An NLP profile can also be created through the /profiles REST endpoint, in which case it is stored inside the Zookeeper “nlpprofiles” node. The /profiles endpoint accepts an optional query parameter, profileType, with a value of Search or NLP to differentiate the profile type. Search is the default if this parameter is not provided.

The NLP profile contains three main sections:

  1. A list of provider classes. These classes help to preprocess the search term. For more information about the provider classes, see Provider class reference.
  2. A list of NLP classifications that will override default classifications at query run time.
  3. A search term dropping priority section, which defines the sequence in which search terms are dropped.

The following is a sample HCL_NLPProfile.json file, showing how the data is organized. In this sample the classification is provided for informational purposes only. In the HCL default profile this section is empty.

Creating or updating an NLP profile

To create or update an NLP profile, send a POST request to the /profiles endpoint with the profileType=NLP query parameter, as in the following example.

POST https://server:port/search/resources/api/v2/documents/profiles/HCL_NLPProfile?profileType=NLP
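
As a rough illustration, the request above could be assembled in Python as follows. This is only a sketch: the function name build_nlp_profile_request, the host server.example.com, the JSON content type, and the empty placeholder body are assumptions, because the layout of the profile payload is not shown here. The request is built but not sent, and authentication is not covered.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_nlp_profile_request(base_url: str, profile_name: str, profile: dict) -> Request:
    """Build (but do not send) a POST request that creates or updates an NLP profile."""
    # profileType=NLP distinguishes the profile from a Search profile (the default).
    query = urlencode({"profileType": "NLP"})
    url = f"{base_url}/search/resources/api/v2/documents/profiles/{profile_name}?{query}"
    # Assumption: the profile is sent as a JSON document in the request body.
    body = json.dumps(profile).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Placeholder host and empty body, used only to show the resulting URL.
request = build_nlp_profile_request("https://server.example.com", "HCL_NLPProfile", {})
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; whether the server accepts the request depends on credentials and on a valid profile body.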

How the Query service finds the NLP profile

The Query runtime can load the NLP profile configuration details from the store index at runtime, or in response to a call via the Query REST API. If the configuration details are not provided, it falls back on the default HCL NLP profile. The Query service performs the following steps to look up the name of the NLP profile.

  1. The Query service checks for an NLP profile for the store locale. If one is found, this profile is loaded from Zookeeper.
  2. If no NLP profile is configured for the store locale, the Query service finds the base locale from the language code of the store locale, and searches for a profile name for that base locale. If one is found, this profile is loaded from Zookeeper. The base locale can be any one of “en_US”, “es_ES”, “fr_FR”, “de_DE”, or “zh_CN”.
  3. If no NLP profile has been configured for the base locale, the Query service finds the default NLP profile name for the store that is configured in the STORECONF table. If one is found, this profile is loaded from Zookeeper.
  4. If no default NLP profile is configured in the STORECONF table, the Query service falls back to the default NLP profile.
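
The lookup order above can be sketched in Python as a simple fallback chain. This is an illustration, not the Query service implementation: the function resolve_nlp_profile_name, the locale_profiles mapping, and the store_default argument (standing in for the STORECONF value) are hypothetical, and the default profile name is assumed to be HCL_NLPProfile.

```python
from typing import Mapping, Optional

# Base locales named in the text, keyed by language code.
BASE_LOCALES = {"en": "en_US", "es": "es_ES", "fr": "fr_FR", "de": "de_DE", "zh": "zh_CN"}
# Assumption: the default profile is named after the default profile file.
DEFAULT_PROFILE = "HCL_NLPProfile"


def resolve_nlp_profile_name(
    store_locale: str,
    locale_profiles: Mapping[str, str],
    store_default: Optional[str] = None,
) -> str:
    """Return the NLP profile name for a store locale, following the four steps."""
    # Step 1: a profile configured for the exact store locale wins.
    if store_locale in locale_profiles:
        return locale_profiles[store_locale]
    # Step 2: derive the base locale from the language code and try again.
    language = store_locale.split("_", 1)[0].lower()
    base_locale = BASE_LOCALES.get(language)
    if base_locale is not None and base_locale in locale_profiles:
        return locale_profiles[base_locale]
    # Step 3: use the store-level default (STORECONF in the text), if any.
    if store_default:
        return store_default
    # Step 4: fall back to the default NLP profile.
    return DEFAULT_PROFILE


print(resolve_nlp_profile_name("en_CA", {"en_US": "EnglishProfile"}))
```

In this sketch a store with locale en_CA and no profile of its own resolves to the profile configured for the en_US base locale.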

Automatically handling search misses

Prior to HCL Commerce Version 9.1.8.0, if a search returned no results, search terms were dropped from left to right, one token at a time. You can now specify which tokens are removed from the search term when there are no results. In the NLP profile, the termDroppingPriority section sets the priority in which tokens are removed from the search term. After removing a token, the process makes another call with the updated search term. If a result is found, the Query service returns the result; otherwise, based on the configuration, another token is removed from the search term. If you are using the default NLP profile, the dropping logic is applied in the order below, but you can change the order or remove items from the list.

  1. FILTER: Will remove the price filter from search term.
  2. MEASUREMENT: Will remove measurement details from search term.
  3. BRAND: Will remove brand name from search term.
  4. COLOR: Will remove color name from search term.
  5. ADJECTIVE: Will remove adjective from search term.
  6. CATEGORY: Will remove category name from search term.
  7. NOUN: Will remove nouns from search term.

Before applying the term dropping logic, the process also removes tokens that are not identified by the NLP processor. For more information about term dropping, see Addressing search misses due to search dropping.
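
The priority-based dropping described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the Query service code: search_with_term_dropping, the (text, classification) token tuples, and the run_query callback are hypothetical, all tokens of one classification are dropped in a single step, and a step that would empty the query is skipped. Passing None as the priority models disabled term dropping.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# (text, classification); a classification of None means "not identified".
Token = Tuple[str, Optional[str]]

# Default dropping order listed in the text.
DEFAULT_PRIORITY = ("FILTER", "MEASUREMENT", "BRAND", "COLOR", "ADJECTIVE", "CATEGORY", "NOUN")


def search_with_term_dropping(
    tokens: Sequence[Token],
    run_query: Callable[[List[Token]], List[str]],
    priority: Optional[Sequence[str]] = DEFAULT_PRIORITY,
) -> List[str]:
    """Retry a query, dropping token classes in priority order until it hits."""
    results = run_query(list(tokens))
    if results:
        return results
    # Before dropping by priority, discard tokens that were not identified.
    remaining = [t for t in tokens if t[1] is not None]
    for classification in priority or ():
        kept = [t for t in remaining if t[1] != classification]
        if len(kept) == len(remaining) or not kept:
            continue  # nothing to drop, or dropping would empty the query
        remaining = kept
        results = run_query(remaining)
        if results:
            return results
    return []


def fake_query(tokens: List[Token]) -> List[str]:
    # Hypothetical catalog: only category-only queries find products.
    if tokens and all(label == "CATEGORY" for _, label in tokens):
        return ["girls-shirt-1", "girls-shirt-2"]
    return []


tokens = [("white", "COLOR"), ("shirt", "CATEGORY"), ("girls", "CATEGORY"), ("under 20$", "FILTER")]
print(search_with_term_dropping(tokens, fake_query))
```

With the default order, the price filter is dropped first and the color second, which leaves the two CATEGORY tokens, as in the "white shirt girls under 20$" example.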

If there are still no results after applying this logic, the Query service makes a final fallback call based on the spell-corrected search term. This step cannot be customized.

Note (HCL Commerce Version 9.1.15.0 or later): You can disable term dropping by setting termDroppingPriority to a null value. For more information, see Disabling term dropping.

NLP profile classification

At query runtime, the NLP profile classification overrides the classifications that the Query service derived from index data or the default NLP data model.

For example, consider a case where the word “apple” is classified as a BRAND_NAME by the Query service based on the index data. If you want to classify “apple” as a category instead, you can configure this change in the NLP profile classification section.
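
The override behavior can be pictured as a dictionary merge in which profile entries win. The sketch below is illustrative only: the function effective_classifications and the flat term-to-label mappings are assumptions, and they do not reflect the actual layout of the classification section in the profile.

```python
from typing import Dict, Mapping


def effective_classifications(
    derived: Mapping[str, str], profile_overrides: Mapping[str, str]
) -> Dict[str, str]:
    """Merge classifications so that NLP profile entries take precedence."""
    merged = dict(derived)            # classifications derived from index data
    merged.update(profile_overrides)  # profile entries replace matching terms
    return merged


derived = {"apple": "BRAND_NAME", "shirt": "CATEGORY"}
overrides = {"apple": "CATEGORY"}
print(effective_classifications(derived, overrides))
```

Terms that the profile does not mention, such as shirt in this example, keep the classification derived from the index.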

Helper classes for NLP providers
Provider class reference
The provider classes used in the profile have the following functions.
PartNumber
com.hcl.commerce.search.internal.expression.provider.SearchNLPPartNumberProviderHelper
Matches the input search term against the part number patterns. If it matches, a part number search is performed and the remaining helper classes are not executed.
BlankSpace
com.hcl.commerce.search.internal.expression.provider.SearchNLPWhiteSpaceProviderHelper
Replace more than two white spaces with a single white space.
CurrenySymbol
com.hcl.commerce.search.internal.expression.provider.SearchNLPCurrencySymbolProviderHelper
If the search term contains a price filter with currency symbol, then the currency symbol will be removed from the search term.
ExcludeSearchTerm
com.hcl.commerce.search.internal.expression.provider.SearchNLPExcludedTermProviderHelper
Remove the excluded term from the search term.
STA
com.hcl.commerce.search.internal.expression.provider.SearchNLPSTAExpansionProviderHelper
Performs Search Term Association (STA) expansion and replacement at query time in the Query service.
MultiWordSearchTerm
com.hcl.commerce.search.internal.expression.provider.SearchNLPMultiwordTermProviderHelper
Check the search term for multiword category, brand name, attribute value, and color name terms. If any are present, add them to the respective list.
LowerCase
com.hcl.commerce.search.internal.expression.provider.SearchNLPLowerCaseProviderHelper
Convert the search term into lowercase.
PriceRangeSeparator
com.hcl.commerce.search.internal.expression.provider.SearchNLPPriceRangeSeparatorProviderHelper
Check for search terms that contain a price range filter with "-". If one is found, replace "-" with the appropriate locale-specific separator. For example: en – to, es – a, zh – 至.
DMM
com.hcl.commerce.search.internal.expression.provider.SearchNLPDMMProviderHelper
Check whether the search term contains dimension details. If it does, parse the search term for the dimension matchmaker.
SpecialCharacter
com.hcl.commerce.search.internal.expression.provider.SearchNLPSpecialCharacterProviderHelper
If the search term contains a special character, add that token in the list of nouns.
MultiWordPriceFilter
com.hcl.commerce.search.internal.expression.provider.SearchMultiwordFilterProviderHelper
Check for a multiword filter in the search term. If one is found, replace each space with NNNN so that the next processor identifies the term as a single word.
Stopword
com.hcl.commerce.search.internal.expression.provider.SearchNLPStopwordProviderHelper
Remove words marked with IGNORE_TERM by the configuration filter from the search term.
WordToNumber
com.hcl.commerce.search.internal.expression.provider.SearchNLPWordToNumberProviderHelper
Convert the word into its equivalent numeric format.
PriceFilter
com.hcl.commerce.search.internal.expression.provider.SearchNLPPriceFilterProviderHelper
Check for search terms that contain a price filter, including multiword filters joined with NNNN.
POS_NER
com.hcl.commerce.search.internal.expression.provider.SearchNLPPOSAndNERProviderHelper
Perform POS tagging and NER extraction. Check for NOUN, CATEGORY, BRAND_NAME, ADJECTIVE, ATTRIBUTE_VALUE and so on. Then add them to the respective list.
Color
com.hcl.commerce.search.internal.expression.provider.SearchNLPColorMMProviderHelper
Retrieve the color family details for the color matchmaker, based on the color name entered in the search term.
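
To show how such helpers could chain together, the following Python sketch runs a search term through a few simplified stand-ins for the PartNumber, BlankSpace, CurrenySymbol, and LowerCase helpers. Everything here is hypothetical: the function names, the part number pattern, the set of currency symbols, and the choice to collapse any run of two or more spaces are assumptions made for illustration, and the real helpers are Java classes with their own logic.

```python
import re
from typing import Callable, Sequence, Tuple

# A provider returns the (possibly rewritten) term and a flag that stops the chain.
Provider = Callable[[str], Tuple[str, bool]]

PART_NUMBER = re.compile(r"^[A-Z]{2}-\d{4}$")  # hypothetical part number pattern


def part_number(term: str) -> Tuple[str, bool]:
    # A part number match stops the chain; later helpers are not executed.
    return term, bool(PART_NUMBER.match(term.strip()))


def blank_space(term: str) -> Tuple[str, bool]:
    # Assumption: any run of two or more whitespace characters is collapsed.
    return re.sub(r"\s{2,}", " ", term).strip(), False


def currency_symbol(term: str) -> Tuple[str, bool]:
    # Assumption: only these three symbols are treated as currency symbols.
    return re.sub(r"[$€£]", "", term), False


def lower_case(term: str) -> Tuple[str, bool]:
    return term.lower(), False


def preprocess(term: str, providers: Sequence[Provider]) -> str:
    """Apply each provider in order until one asks to stop."""
    for provider in providers:
        term, stop = provider(term)
        if stop:
            break
    return term


CHAIN = [part_number, blank_space, currency_symbol, lower_case]
print(preprocess("White  Shirt   under 20$", CHAIN))
```

A term that matches the hypothetical part number pattern, such as AB-1234, passes through unchanged because the first helper stops the chain.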
