Natural Language Processing profiles

A Natural Language Processing (NLP) profile is used to preprocess search terms and modify search queries to fetch desired search results at the storefront.

Logic for processing search strings

Shoppers are rarely experts at formulating a search query that yields the results they want, and they are often unaware of the ideal search terms for finding products or services at the storefront. Using Natural Language Processing, the Query service parses plain-language search terms to discern what shoppers are trying to find, and modifies the search term at runtime to fetch the desired search results.

The following examples describe the search term processing logic of the Query service. Each example consists of a sample search term and the logic that the Query service uses to process it and fetch search results at the storefront.

"white shirt girls"

The NLP parser generates the following three tokens to map the search term and then runs the Elasticsearch query to fetch the search results at the storefront. It returns two products.

white – COLOR
shirt – CATEGORY
girls – CATEGORY

"white shirt girls under 89$"

The NLP parser generates the following four tokens to map the search term and then runs the Elasticsearch query to fetch the search results at the storefront. It returns a single product.

white – COLOR
shirt – CATEGORY
girls – CATEGORY
under 89$ – FILTER

"white shirt girls under 20$"

The NLP parser generates the following four tokens to map the search term and then runs the Elasticsearch query to fetch the search results at the storefront. It returns zero matches.

white – COLOR
shirt – CATEGORY
girls – CATEGORY
under 20$ – FILTER

In this case, the NLP parser uses its search term dropping logic. It drops tokens from the left of the search phrase, one at a time, until the remaining tokens return search results or until it has made four iterations. Any price filter in the search term is also removed in this process. After the term dropping logic completes, the NLP parser runs the Elasticsearch query based on the following two tokens. It returns all eight products from the girls’ category, matching shirt as a category or in the product name or short description.

shirt – CATEGORY
girls – CATEGORY
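
The left-to-right dropping described above can be sketched in a few lines of Python. This is only an illustration, not the Query service's implementation: the function name `drop_from_left`, the `(text, tag)` token shape, and the stand-in `fake_query` are all hypothetical, and removing the price filter before the first drop is an assumption chosen so that the sketch reproduces the outcome of this example.

```python
from typing import Callable, List, Tuple

Token = Tuple[str, str]  # (text, tag), for example ("white", "COLOR")


def drop_from_left(
    tokens: List[Token],
    run_query: Callable[[List[Token]], List[str]],
    max_iterations: int = 4,
) -> Tuple[List[Token], List[str]]:
    """Drop one token at a time from the left until the query returns hits."""
    # Assumption: the price filter is removed up front as part of dropping.
    remaining = [t for t in tokens if t[1] != "FILTER"]
    for _ in range(max_iterations):
        if len(remaining) <= 1:
            break  # never drop the last remaining token
        remaining = remaining[1:]  # drop the leftmost token
        hits = run_query(remaining)
        if hits:
            return remaining, hits
    return remaining, []


def fake_query(tokens: List[Token]) -> List[str]:
    """Stand-in for the Elasticsearch call: only 'shirt girls' has matches."""
    texts = [text for text, _ in tokens]
    if texts == ["shirt", "girls"]:
        return [f"product-{i}" for i in range(1, 9)]
    return []


search_tokens = [
    ("white", "COLOR"),
    ("shirt", "CATEGORY"),
    ("girls", "CATEGORY"),
    ("under 20$", "FILTER"),
]
kept_tokens, products = drop_from_left(search_tokens, fake_query)
```

With the stand-in query, `kept_tokens` holds the shirt and girls tokens and `products` holds eight entries, mirroring the example.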

"vitamin capsules"

The NLP parser generates the following two tokens to map the search term and then runs the Elasticsearch query to fetch the search results. It returns zero matches because capsule is set as an attribute value, whereas the Elasticsearch query built from these tokens searches for capsules against the Noun field.

vitamin – CATEGORY
capsules – NOUN

In this case, the NLP parser uses its search term dropping logic. This also returns empty search results, for the same reason: capsule is set as an attribute value, and the Elasticsearch query searches for capsules against the Noun field. To handle such situations, business logic runs a fallback Elasticsearch query based on the previous analysis of the search term. The Elasticsearch query is executed based on the following token and returns all the products in the vitamin category.

vitamin – CATEGORY

NLP profiles

A Natural Language Processing (NLP) profile controls the preprocessing flow of search terms before an Elasticsearch query is executed. The profile is a .json file that is stored in your query runtime container.

The default HCL_NLPProfile.json file can be found in the resources\profiles\nlp directory of the query runtime. An NLP profile can also be created through the /profiles REST endpoint and stored inside the Zookeeper “nlpprofiles” node. The /profiles endpoint accepts an optional query parameter, profileType, with a value of Search or NLP, to differentiate the profile type. Search is the default if this parameter is not provided.

The NLP profile contains three main sections:

A list of provider classes. These classes help to preprocess the search term. For more information about the provider classes, see Provider class reference.
A list of NLP classifications that will override default classifications at query run time.
A search term dropping priority section, which defines the sequence in which search terms are dropped.

The following is a sample HCL_NLPProfile.json file, showing how the data is organized. In this sample the classification is provided for informational purposes only. In the HCL default profile this section is empty.

Creating or updating an NLP profile

To create or update an NLP profile, use the POST method, as in the following example.

POST https://server:port/search/resources/api/v2/documents/profiles/HCL_NLPProfile?profileType=NLP
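
As a rough sketch, the request above could be assembled with Python's standard library as follows. The host `https://query.example.com`, the helper name `build_profile_request`, and the shape of the request body are assumptions made for illustration; only the path and the profileType=NLP parameter come from the example above. The sketch builds the request without sending it, and it omits any authentication that your environment might require.

```python
import json
import urllib.request


def build_profile_request(
    base_url: str, profile_name: str, profile: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates or updates an NLP profile."""
    url = (
        f"{base_url}/search/resources/api/v2/documents/profiles/"
        f"{profile_name}?profileType=NLP"
    )
    body = json.dumps(profile).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host and a minimal, assumed profile body.
request = build_profile_request(
    "https://query.example.com",
    "HCL_NLPProfile",
    {"profileName": "HCL_NLPProfile", "classification": {"Bath": "CATEGORY"}},
)
```

To send the request, pass it to `urllib.request.urlopen(request)` once the host and credentials match your environment.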

How the Query service finds the NLP profile

The Query runtime can load the NLP profile configuration details from the store index at runtime, or in response to a call through the Query REST API. If the configuration details are not provided, it falls back on the default HCL NLP profile. The Query service performs the following steps to look up the name of the NLP profile.

The Query service checks for an NLP profile for the store locale. If one is found, this profile is loaded from Zookeeper.
If no NLP profile is configured for the store locale, the Query service derives the base locale from the language code of the store locale and searches for a profile name for that base locale. If one is found, this profile is loaded from Zookeeper. The base locale can be any one of “en_US”, “es_ES”, “fr_FR”, “de_DE”, or “zh_CN”.
If no NLP profile is configured for the base locale, the Query service looks for the default NLP profile name for the store that is configured in the STORECONF table. If one is found, this profile is loaded from Zookeeper.
If no default NLP profile is configured in the STORECONF table, the Query service falls back to the default NLP profile.
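
The lookup order above can be summarized in a short Python sketch. It is illustrative only: the function name, the dictionary that maps locales to profile names, and the `storeconf_default` argument are hypothetical stand-ins for the store index and STORECONF data, and the sketch assumes that the default profile is named HCL_NLPProfile.

```python
from typing import Mapping, Optional

# Base locales listed above, keyed by language code.
BASE_LOCALES = {
    "en": "en_US",
    "es": "es_ES",
    "fr": "fr_FR",
    "de": "de_DE",
    "zh": "zh_CN",
}
DEFAULT_PROFILE = "HCL_NLPProfile"  # assumed name of the default profile


def resolve_profile_name(
    store_locale: str,
    locale_profiles: Mapping[str, str],
    storeconf_default: Optional[str] = None,
) -> str:
    """Return the NLP profile name by following the documented lookup order."""
    # 1. A profile configured for the store locale itself.
    if store_locale in locale_profiles:
        return locale_profiles[store_locale]
    # 2. A profile configured for the base locale of the language code.
    base_locale = BASE_LOCALES.get(store_locale.split("_")[0])
    if base_locale in locale_profiles:
        return locale_profiles[base_locale]
    # 3. The store default from STORECONF, if one is configured.
    if storeconf_default:
        return storeconf_default
    # 4. The default NLP profile.
    return DEFAULT_PROFILE
```

For example, under these assumptions a store with locale fr_CA and a profile configured only for fr_FR resolves to the fr_FR profile.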

Automatically handling search misses

Prior to HCL Commerce Version 9.1.8.0, if a search returned no results, tokens were dropped from the search term from left to right, one token at a time. You can now specify which token is removed from the search term when there are no results. In the NLP profile, the termDroppingPriority section lists the priority in which tokens are removed from the search term. After removing a token, the process makes another call with the updated search term. If a result is found, the Query service returns the result; otherwise, based on the configuration, another token is removed from the search term. If you are using the default NLP profile, the dropping logic is applied in the order below, but you can change the order or remove items from the list.

FILTER: Removes the price filter from the search term.
MEASUREMENT: Removes measurement details from the search term.
BRAND: Removes the brand name from the search term.
COLOR: Removes the color name from the search term.
ADJECTIVE: Removes adjectives from the search term.
CATEGORY: Removes the category name from the search term.
NOUN: Removes nouns from the search term.

Before applying the term dropping logic, the process also removes tokens that are not identified by the NLP processor. For more information about term dropping, see Addressing search misses due to search dropping.

If there are still no results after applying this logic, the Query service makes a final fallback call based on the spell-corrected details. This step cannot be customized.

Note: You can disable term dropping by setting termDroppingPriority to a null value. For more information, see Disabling term dropping.
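
The priority-based dropping described in this section can be sketched as follows. This is a simplified illustration, not the product's code: the function name, the `(text, tag)` token shape, and the stand-in `fake_query` are hypothetical. It assumes that all tokens with a given tag are removed together, that a tag is skipped if removing it would leave no tokens, and that a null termDroppingPriority is represented by `None`. Removal of unrecognized tokens and the final spell-corrected fallback are not modeled.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Token = Tuple[str, str]  # (text, tag)

# Default order listed above.
DEFAULT_PRIORITY = (
    "FILTER", "MEASUREMENT", "BRAND", "COLOR", "ADJECTIVE", "CATEGORY", "NOUN",
)


def drop_by_priority(
    tokens: List[Token],
    run_query: Callable[[List[Token]], List[str]],
    priority: Optional[Sequence[str]] = DEFAULT_PRIORITY,
) -> Tuple[List[Token], List[str]]:
    """Retry the query, removing tokens tag by tag in priority order."""
    remaining = list(tokens)
    hits = run_query(remaining)
    if hits or priority is None:  # None models disabled term dropping
        return remaining, hits
    for tag in priority:
        kept = [t for t in remaining if t[1] != tag]
        if len(kept) == len(remaining) or not kept:
            continue  # nothing to drop for this tag, or it would empty the term
        remaining = kept
        hits = run_query(remaining)
        if hits:
            return remaining, hits
    return remaining, []


def fake_query(tokens: List[Token]) -> List[str]:
    """Stand-in for the Elasticsearch call: the price filter rules out every product."""
    if any(tag == "FILTER" for _, tag in tokens):
        return []
    return ["product-1", "product-2"]


search_tokens = [
    ("white", "COLOR"),
    ("shirt", "CATEGORY"),
    ("girls", "CATEGORY"),
    ("under 20$", "FILTER"),
]
kept_tokens, products = drop_by_priority(search_tokens, fake_query)
```

With the default order, only the FILTER token is dropped here, so the color and category tokens still narrow the results, unlike the left-to-right behavior of earlier versions.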

NLP profile classification

At query runtime, the NLP profile classification overrides the classification that the Query service derived from index data or the default NLP data model. For example, suppose the NER file generated for Catalog A contains the following details:

Dresses = CATEGORY
Bath = ATTRIBUTE_NAME
Style Home = BRAND_NAME
Hermitage Collection = BRAND_NAME
Albini = BRAND_NAME

You can override this NER classification through the NLP profile, as shown in Sample json 1.0. With this configuration, Dresses is treated as an ATTRIBUTE_VALUE NER, and Bath, Style Home, and Hermitage Collection are always tagged with a CATEGORY NER for all e-sites and catalogs.

Sample json 1.0

{
	"profileName": "HCL_NLPProfile",
	…..
	"classification": {
		"Dresses": "ATTRIBUTE_VALUE",
		"Bath": "CATEGORY",
		"Style Home": "CATEGORY",
		"Hermitage Collection": "CATEGORY"
	}
	…..
}

Create catalog-specific NER classifications

Building on Sample json 1.0 above, consider four catalogs with the following IDs:

3074457345616678668
3074457345616678669
3074457345616678670
3074457345616678671

In the NLP profile classification, if Bath is given a CATEGORY tag without a catalog list, it is treated as a CATEGORY irrespective of the catalog.

Bath: CATEGORY

To specify a catalog-specific NER, define the classification value in the form NER_TAG:[list of catalogId]. If one keyword must map to different classifications for different catalogs, separate the values with a semicolon, in the form NER_TAG1:[list of catalogId];NER_TAG2:[list of catalogId].

Note: Do not add the same catalog to different classifications of the same keyword, because the first tag that matches the catalog takes precedence. For example, in “Keyword”: “NER_TAG1:[Catalog A,Catalog B];NER_TAG2:[Catalog B]”, Catalog B is duplicated, and NER_TAG1 takes precedence for it. Do not include spaces when defining the value for a keyword.

Here is the sample NLP Profile for catalog-specific NER classification.

Sample json 2.0

{
	"profileName": "HCL_NLPProfile",
	…..
	"classification": {
		"Dresses": "NOUN:[3074457345616678668]",
		"Bath": "CATEGORY",
		"Hermitage Collection": "CATEGORY:[3074457345616678669];NOUN:[3074457345616678668]",
		"Style Home": "CATEGORY",
		"Albini": "CATEGORY:[3074457345616678669,3074457345616678670];NOUN:[3074457345616678671]"
	},
	…..
}

As per the sample NER data for Catalog A:

Dresses is tagged with the CATEGORY NER classification.
Bath is tagged with the ATTRIBUTE_NAME NER classification.
Style Home is tagged with the BRAND_NAME NER classification.
Hermitage Collection is tagged with the BRAND_NAME NER classification.
Albini is tagged with the BRAND_NAME NER classification.

As per Sample json 2.0:

A search for Dresses with catalog ID 3074457345616678668 is tagged as NOUN, and the search is performed on the noun fields. For all catalogs other than 3074457345616678668, Dresses is considered a CATEGORY.
A search for Bath is considered a CATEGORY because it is not mapped to any catalog, and the search is performed on the category fields.
A search for Hermitage Collection is tagged as CATEGORY with catalog 3074457345616678669. With catalog 3074457345616678668 it is tagged as NOUN, and the search is performed on the noun fields.
A search for Style Home is considered a CATEGORY because it is not mapped to any catalog, and the search is performed on the category fields.
A search for Albini with any catalog other than 3074457345616678669, 3074457345616678670, and 3074457345616678671 is tagged as BRAND_NAME.
With catalog 3074457345616678669 or 3074457345616678670, Albini is tagged as CATEGORY, and the search is performed on the category fields.
With catalog 3074457345616678671, Albini is tagged as NOUN, and the search is performed on the noun fields.
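
To make the catalog-specific format concrete, the following Python sketch resolves a classification value for a given catalog. It is an illustration under stated assumptions rather than the Query service's parser: the function name `resolve_ner_tag` is hypothetical, and the `default_tag` argument stands for the classification that the Query service would otherwise derive from the NER data.

```python
def resolve_ner_tag(value: str, catalog_id: str, default_tag: str) -> str:
    """Resolve a value such as 'TAG1:[id,id];TAG2:[id]' for one catalog."""
    for part in value.split(";"):
        tag, separator, ids = part.partition(":")
        if not separator:
            # No catalog list: the tag applies to every catalog.
            return tag.strip()
        catalog_ids = [c.strip() for c in ids.strip().strip("[]").split(",")]
        if catalog_id in catalog_ids:
            # The first tag that matches the catalog takes precedence.
            return tag.strip()
    # No entry matches this catalog: keep the classification from the NER data.
    return default_tag


# Values taken from Sample json 2.0.
ALBINI = "CATEGORY:[3074457345616678669,3074457345616678670];NOUN:[3074457345616678671]"
DRESSES = "NOUN:[3074457345616678668]"

albini_670 = resolve_ner_tag(ALBINI, "3074457345616678670", "BRAND_NAME")
albini_671 = resolve_ner_tag(ALBINI, "3074457345616678671", "BRAND_NAME")
albini_668 = resolve_ner_tag(ALBINI, "3074457345616678668", "BRAND_NAME")
dresses_669 = resolve_ner_tag(DRESSES, "3074457345616678669", "CATEGORY")
```

Under these assumptions, Albini resolves to CATEGORY, NOUN, and BRAND_NAME for the three catalogs shown, and Dresses falls back to CATEGORY for a catalog that is not listed.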

For example, consider a case where the word “apple” is classified as a BRAND_NAME by the Query service based on the index data. If you want to classify “apple” as a category, this change can be configured in the NLP profile classification section.

Helper classes for NLP providers

Provider class reference: The provider classes used in the profile have the following functions.
PartNumber: com.hcl.commerce.search.internal.expression.provider.SearchNLPPartNumberProviderHelper; Matches the input search term against the part number patterns. If it matches, a part number search is performed and the rest of the helper classes are not executed.
BlankSpace: com.hcl.commerce.search.internal.expression.provider.SearchNLPWhiteSpaceProviderHelper; Replace more than two white spaces with a single white space.
CurrenySymbol: com.hcl.commerce.search.internal.expression.provider.SearchNLPCurrencySymbolProviderHelper; If the search term contains a price filter with the currency symbol, then the currency symbol will be removed from the search term.
ExcludeSearchTerm: com.hcl.commerce.search.internal.expression.provider.SearchNLPExcludedTermProviderHelper; Remove the excluded term from the search term.
STA: com.hcl.commerce.search.internal.expression.provider.SearchNLPSTAExpansionProviderHelper; Performs Search Term Association (STA) expansion and replacement at query time in the Query service.
MultiWordSearchTerm: com.hcl.commerce.search.internal.expression.provider.SearchNLPMultiwordTermProviderHelper; Check for a multiword category, brand name, attribute value, or color name. If one is present, add it to the respective list.
LowerCase: com.hcl.commerce.search.internal.expression.provider.SearchNLPLowerCaseProviderHelper; Convert the search term to lowercase.
PriceRangeSeparator: com.hcl.commerce.search.internal.expression.provider.SearchNLPPriceRangeSeparatorProviderHelper; Check whether the search term contains a price range filter with "-". If it does, replace "-" with the appropriate locale-specific separator, for example, en: to, es: a, zh: 至.
DMM: com.hcl.commerce.search.internal.expression.provider.SearchNLPDMMProviderHelper; Check whether the search term contains dimension details, then parse the search term for dimension matchmaker.
SpecialCharacter: com.hcl.commerce.search.internal.expression.provider.SearchNLPSpecialCharacterProviderHelper; If the search term contains a special character, add that token to the list of nouns.
MultiWordPriceFilter: com.hcl.commerce.search.internal.expression.provider.SearchMultiwordFilterProviderHelper; Check for multiword filters in search terms. Then, replace space with NNNN for the next processor to identify the term as a single word.
Stopword: com.hcl.commerce.search.internal.expression.provider.SearchNLPStopwordProviderHelper; Remove words marked with IGNORE_TERM by the configuration filter from the search term.
WordToNumber: com.hcl.commerce.search.internal.expression.provider.SearchNLPWordToNumberProviderHelper; Convert the word into its equivalent numeric format.
PriceFilter: com.hcl.commerce.search.internal.expression.provider.SearchNLPPriceFilterProviderHelper; Check for search terms that contain a multiword price filter joined with NNNN.
POS_NER: com.hcl.commerce.search.internal.expression.provider.SearchNLPPOSAndNERProviderHelper; Perform POS tagging and NER extraction. Check for NOUN, CATEGORY, BRAND_NAME, ADJECTIVE, ATTRIBUTE_VALUE and so on. Then add them to the respective list.
Color: com.hcl.commerce.search.internal.expression.provider.SearchNLPColorMMProviderHelper; Retrieve the color family details for the color matchmaker based on the color name entered in the search term.
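
The way a list of providers preprocesses a search term can be pictured with a small Python sketch. It is a loose analogy, not the behavior of the Java helper classes listed above: the three functions only approximate the BlankSpace, CurrenySymbol, and LowerCase descriptions, the part number pattern is a made-up placeholder, the provider order is assumed, and the currency symbol is stripped unconditionally for simplicity.

```python
import re
from typing import Tuple

# Hypothetical part number pattern; real patterns come from configuration.
PART_NUMBER = re.compile(r"^[A-Z]{2,}-\d+$")


def collapse_whitespace(term: str) -> str:
    """Replace runs of white space with a single space."""
    return re.sub(r"\s{2,}", " ", term)


def strip_currency_symbol(term: str) -> str:
    """Remove common currency symbols (simplified: not limited to price filters)."""
    return re.sub(r"[$€£¥]", "", term)


def lowercase(term: str) -> str:
    """Convert the search term to lowercase."""
    return term.lower()


# Assumed order; in the product the order is defined by the profile.
PROVIDERS = [collapse_whitespace, strip_currency_symbol, lowercase]


def preprocess(term: str) -> Tuple[str, bool]:
    """Return (processed term, is_part_number); a part number match skips the rest."""
    candidate = term.strip()
    if PART_NUMBER.match(candidate):
        return candidate, True
    for provider in PROVIDERS:
        term = provider(term)
    return term, False


processed, is_part_number = preprocess("White   Shirt  girls under 20$")
```

The chain-of-functions shape keeps each step independent, which is one way to picture why providers can be listed and reordered in a profile.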
