Adding custom nouns and classifications to NLP Named Entity Recognition (NER)

Named Entity Recognition (NER) is one of the most common text pre-processing techniques in Natural Language Processing (NLP), and it is used in many fields of Artificial Intelligence (AI), including Machine Learning. In addition to the default NER entities in the Stanford CoreNLP Natural Language Processor, you can add custom language-specific nouns and classifications. Custom nouns and classifications for NER are configured in the ZooKeeper filter node. New nouns and classifications are added using the POST request method. Existing nouns and classifications are updated using the PATCH request method.

Endpoints

The endpoints for this service are:
http://data_environment_hostname:30920/search/resources/api/v2/configuration?nodeName=filter&envType=auth&locale=en_US
https://data_environment_hostname:30921/search/resources/api/v2/configuration?nodeName=filter&envType=auth&locale=en_US
Include your configuration in JSON syntax in the body of the request. For example, the following request body assigns NER tags to several terms, including "Dresses", which is classified as a CATEGORY.
{
	"Dresses" : "CATEGORY",
	"Versatil" : "BRAND_NAME",
	"Size" : "ATTRIBUTE_NAME",
	"XL" : "ATTRIBUTE_VALUE",
	"an" : "IGNORE_TERM",
	"One and half" : "TO_NUMBER~1.5",
	"below" : "FILTER_LTE~1",
	"above" : "FILTER_GTE~1",
	"red" : "COLOR",
	"inch" : "UOM"
}
Note: To introduce new nouns and classifications, use the PATCH request method for the en_US locale and the POST request method for non-en_US locales.
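
The request described above can be sketched in Python as follows. This is a minimal illustration, not a supported client: the function name build_ner_request is hypothetical, the Content-Type header is an assumption, and the host name is the placeholder from the endpoint list. The request method is left to the caller because the choice between PATCH and POST depends on the locale, as the note explains.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Placeholder host from the endpoint list; substitute your data environment host.
BASE_URL = "https://data_environment_hostname:30921/search/resources/api/v2/configuration"


def build_ner_request(terms, method, locale="en_US", base_url=BASE_URL):
    """Build (but do not send) a request that submits term-to-tag mappings."""
    query = urlencode({"nodeName": "filter", "envType": "auth", "locale": locale})
    body = json.dumps(terms).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}?{query}",
        data=body,
        method=method,
        # Assumption: the service expects a JSON content type.
        headers={"Content-Type": "application/json"},
    )


request = build_ner_request({"Dresses": "CATEGORY", "red": "COLOR"}, method="PATCH")
# To send the request: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and body before anything reaches the server.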

Default NER tags

HCL Commerce Search comes with a default set of NER tags. These should allow you to classify words for most situations.
CATEGORY
Maps to the category related index fields in the Product index.
BRAND_NAME
Maps to the manufacturer name indexed field in the Product index.
ATTRIBUTE_VALUE
Maps to the NLP Adjective indexed fields in the Product index.
ATTRIBUTE_NAME
Maps to the indexed attribute name in the Product index.
IGNORE_TERM
Removes the matching terms from the term search expression.
TO_NUMBER
Maps to the NLP Numeric indexed fields in the Product index.
FILTER_GTE~1
Defines a range filter condition that is greater than or equal to the given argument. This argument is the term that immediately follows the matching pattern.
FILTER_LTE~1
Defines a range filter condition that is less than or equal to the given argument. This argument is the term that immediately follows the matching pattern.
UOM
Maps to the MatchMaker unit of measure indexed fields in the Product index.
COLOR
Maps to the MatchMaker color family indexed fields in the Product index.
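
Before submitting a set of term classifications, it can help to check that every value refers to one of the default tags listed above. The following Python sketch is one way to do that. The helper names split_tag and unknown_tags are hypothetical, and the treatment of everything after the ~ character as an opaque suffix is an assumption drawn from the sample values (TO_NUMBER~1.5, FILTER_LTE~1), not a documented grammar.

```python
# Default tag names as listed above, without any "~" suffix.
DEFAULT_TAGS = {
    "CATEGORY", "BRAND_NAME", "ATTRIBUTE_VALUE", "ATTRIBUTE_NAME",
    "IGNORE_TERM", "TO_NUMBER", "FILTER_GTE", "FILTER_LTE", "UOM", "COLOR",
}


def split_tag(value):
    """Split 'TAG~suffix' into (tag, suffix); suffix is None when there is no '~'."""
    tag, separator, suffix = value.partition("~")
    return tag, (suffix if separator else None)


def unknown_tags(terms, known=DEFAULT_TAGS):
    """Return the term-to-value pairs whose tag name is not in the known set."""
    return {
        term: value
        for term, value in terms.items()
        if split_tag(value)[0] not in known
    }


terms = {"One and half": "TO_NUMBER~1.5", "below": "FILTER_LTE~1", "silk": "FABRIC"}
flagged = unknown_tags(terms)  # only "silk" is flagged; FABRIC is not a default tag
```

A term flagged this way is not necessarily wrong: it may refer to a custom NER tag that you have defined yourself.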
Note: New nouns and classifications are added using the PATCH request method for the en_US locale. The POST request method is used for non-en_US locales.
HCL Commerce Version 9.1.11.0 or later

Custom NER tags

If the default NER tags do not cover all of your needs, you can define your own. To do so, add a new NER tag mapping through the /configuration endpoint, using the PATCH method and a request body that contains a JSON-formatted tag definition. Each request extends one NLPSearchFieldMapping object, and you can map only to Product index fields.

Procedure

  1. Define your new mapping tag. In the example below, the mapping tag is SELLER.
    {
    	"extendedconfiguration": {
    		"configgrouping": [
    			{
    				"name": "NLPSearchFieldMapping",
    				"property": {
    					"name": "NLPFieldsDetail",
    					"value": "[{\"NERTag\":\"SELLER\",\"IndexRawFieldName\":\"seller.raw\",\"IndexNormalizedFieldName\":\"seller.normalized\",\"FieldLevelLemmatization\":\"false\", \"BoostFactor\":\"100.0\"}]"
    				}
    			}
    		]
    	}
    }
    The SELLER mapping tag is defined by five properties. Three of them are mandatory; the other two are optional and have default values.
    NERTag
    The name of the NER tag used to classify the token (SELLER).
    IndexRawFieldName
    The field to use as the aggregation field while training custom data and as the raw field while searching for the term (seller.raw).
    IndexNormalizedFieldName
    The field to use as a normalized field while searching for the term (seller.normalized).
    FieldLevelLemmatization
    While training and searching, apply lemmatization on the field if set to true. The default value is false.
    BoostFactor
    The boost factor to apply to this field during search. The default value is 100.0.
    Note: The first three properties are mandatory. If any of them is missing, an error is logged and processing continues with the other tags, which can affect the search results.
  2. Submit the new NER tag mapping to the /configuration endpoint using the PATCH request method. Add your new mapping as the request body.
    PATCH http://dataQueryHost:dataQueryPort/search/resources/api/v2/configuration?nodeName=component&envType=auth
  3. After you have updated the mapping, restart the Query service.
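
The request body in step 1 embeds the mapping list as a JSON string inside a JSON document, which is easy to get wrong when escaping by hand. The following Python sketch builds the same structure programmatically and rejects mappings that lack a mandatory property. The function name build_mapping_payload is hypothetical, and the client-side check is an illustration only; it is not how the service itself validates input.

```python
import json

# The three mandatory properties described in step 1.
MANDATORY = ("NERTag", "IndexRawFieldName", "IndexNormalizedFieldName")


def build_mapping_payload(mappings):
    """Wrap NER tag mappings in the extendedconfiguration structure.

    The NLPFieldsDetail value is itself JSON serialized to a string, so the
    mappings are encoded once here and again when the payload is dumped.
    """
    for mapping in mappings:
        missing = [key for key in MANDATORY if not mapping.get(key)]
        if missing:
            raise ValueError(f"mapping {mapping!r} is missing {missing}")
    return {
        "extendedconfiguration": {
            "configgrouping": [
                {
                    "name": "NLPSearchFieldMapping",
                    "property": {
                        "name": "NLPFieldsDetail",
                        "value": json.dumps(mappings),
                    },
                }
            ]
        }
    }


seller = {
    "NERTag": "SELLER",
    "IndexRawFieldName": "seller.raw",
    "IndexNormalizedFieldName": "seller.normalized",
    "FieldLevelLemmatization": "false",
    "BoostFactor": "100.0",
}
payload = build_mapping_payload([seller])
body = json.dumps(payload, indent=2)  # request body for the PATCH call in step 2
```

Letting json.dumps handle both levels of encoding avoids manual backslash escaping of the inner quotation marks.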

Result

Your custom NER tag is now available for use in NLP processing.