Optimizing the Natural Language Processing service

In certain circumstances you may want to use the Basic mode instead, for example when your language is not supported by CoreNLP, or when you want to reduce the size of your Query containers or the heap memory that the Query container requires in the runtime environment. The NLP mode and its scope are controlled by the environment variable NLP_ENABLE_LANGUAGE_CODE or the Vault key nlpEnableLanguageCode. This variable accepts a list of up to eight languages that are supported by a Stanford CoreNLP module. If you add languages beyond the eight supported by CoreNLP, as described in Adding languages to the NLP service, the extra languages are evaluated by the Basic NLP module.

For more information on Basic NLP, see Considerations when using Basic Natural Language Processing.

Blocking the Query service during NLP initialization

When the Query service starts, it begins Natural Language Processing initialization. This can take several minutes, and during that time calls can still be made to the Search service. Search results may be inconsistent if a search is handled by a Query engine that is still initializing and the same search string is then handled by one whose NLP service is already running. This undesirable behavior can occur if you are running or starting multiple Query services.

You can block the Query service from serving live data from a partially initialized cache. With the block in place, the /health endpoint check of a Query service that is otherwise up returns Service Unavailable (HTTP status code 503) until the NLP service is fully initialized. After that, the Query service returns HTTP status code 200.
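A deployment script can rely on this behavior to hold traffic until the Query service reports ready. The following Python sketch polls a health URL until it returns 200. It is an illustration only: the function names, the polling interval, and the timeout are assumptions, and the full URL of the /health endpoint depends on your deployment.

```python
import time
import urllib.error
import urllib.request


def health_status(url, timeout=5.0):
    """Return the HTTP status code of a GET to url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; a 503 from the blocker lands here.
        return err.code
    except urllib.error.URLError:
        # The container is not accepting connections yet.
        return None


def wait_until_ready(url, fetch=health_status, interval=10.0,
                     max_wait=600.0, sleep=time.sleep):
    """Poll url until it answers 200. Return False if max_wait seconds elapse."""
    waited = 0.0
    while True:
        if fetch(url) == 200:
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
```

The fetch and sleep parameters exist only so that the loop can be exercised without a running service. In normal use you would call wait_until_ready with the health URL of your Query service and route traffic to it only when the call returns True.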

You can selectively block different environments based on the environment type and environment name. The Auth and Live environment types are checked first, and then the environment name configured in the component configuration. By default, the Query service performs this action for the prod and preprod environments, but you can include other environments at the /configuration endpoint shown below. This configuration checks the ENVIRONMENT environment variable to verify that the environments exist.

Note: If the "value" property is empty, the blocker is disabled.

Use the POST request method at the /configuration endpoint if this is the first time you are making this configuration change. Subsequently, use the PATCH request method to make updates. Restart the Query service after making any changes.

PATCH/POST http://dataQueryHost:dataQueryPort/search/resources/api/v2/configuration?nodeName=component&envType=auth

{
    "extendedconfiguration": {
        "configgrouping": [
            {
                "name": "SearchConfiguration",
                "property": [
                    {
                        "name": "LockQueryServiceForNLPIntialization",
                        "value": "prod,preprod,dev"
                    }
                ]
            }
        ]
    }
}
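The same request can be prepared from a script. The Python sketch below builds the request with the standard library, choosing POST for the first change and PATCH for later updates, as described above. The helper name build_blocker_request, its parameters, and the Content-Type header are assumptions made for illustration, and any authentication that your environment requires is not shown.

```python
import json
import urllib.request

# Endpoint path and query parameters as shown in the request above.
CONFIG_URL = ("http://{host}:{port}/search/resources/api/v2/configuration"
              "?nodeName=component&envType={env_type}")


def build_blocker_request(host, port, environments, first_time, env_type="auth"):
    """Build (but do not send) the request that sets the blocked environments.

    environments is a list of environment names; an empty list produces an
    empty "value", which disables the blocker.
    """
    payload = {
        "extendedconfiguration": {
            "configgrouping": [
                {
                    "name": "SearchConfiguration",
                    "property": [
                        {
                            "name": "LockQueryServiceForNLPIntialization",
                            "value": ",".join(environments),
                        }
                    ],
                }
            ]
        }
    }
    return urllib.request.Request(
        CONFIG_URL.format(host=host, port=port, env_type=env_type),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST" if first_time else "PATCH",
    )
```

To send the request, pass the returned object to urllib.request.urlopen, and then restart the Query service so that the change takes effect.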

Setting the NLP refresh interval

The NLP refresh interval defines the time between queries to the Stanford CoreNLP object for updated Named Entity Recognition (NER) classification data. If a custom NER classification is added, or index data is updated with category names, manufacturer names, or attribute values, the NLP service identifies the change only after the scheduled job has updated the NER details in the NLP object. By default, the NLP refresh interval is 30 minutes. You can change this interval with the NLP_REFRESH_INTERVAL environment variable.