HCL Commerce Version 9.1.15.0 or later

Collecting Elasticsearch MustGather data for Ingest issues

About this task

This MustGather can be used to investigate the cause of Elasticsearch-based issues where Ingest does not complete its run. Use this MustGather if you observe one of the following symptoms:
  1. Ingest normally takes a predictable amount of time and has considerably exceeded that without finishing.
  2. The Ingest process appears to be 'stuck' working on one stage.

Procedure

  1. Run the following REST API call to start the indexing job and save the runId value from the response for later use:
    https://ingest_hostname:port/connectors/connectorId/run?storeId=storeId&envType=envType 
  2. Run the following REST API call to read the current status of the Ingest run, using the runId from Step 1:
    https://ingest_hostname:port/connectors/connectorId/runs/runId/status 
  3. Periodically check the status of the Ingest run by re-running the request from Step 2. Wait for it to reach the stage that it fails to complete.
  4. Once the Ingest run reaches the stage that you believe it is not completing, collect the responses from the following REST API calls:
    https://ingest_hostname:port/connectors/connectorId/runs/runId/
    https://ingest_hostname:port/connectors/connectorId/runs/
    https://ingest_hostname:port/connectors/connectorId
  5. Open your NiFi console in a browser:
    http://nifi_hostname:port/nifi/
  6. Use the search bar to locate the stage referenced from your /status API call (from Step 3). For example, if the stage was auth.reindex - Product Stage 1e, look for the following element:
    <INSERT_IMAGE_ProductStage1e>
    
  7. Navigate through the process group until you find the processor(s) that contain recently processed data and note them for later investigation. For example, the following string indicates that the Execute SQL - Find Attributes processor has shown recent activity:
    <INSERT_IMAGE_ProductStage1e_query>
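
The REST calls in Steps 2 and 4 can be scripted so that the responses are gathered in one pass. The following is a minimal Python sketch, not an official HCL tool. The helper names (ingest_urls, fetch_json, collect_mustgather) are hypothetical, and the sketch assumes that these endpoints accept an unauthenticated GET request and return JSON; confirm the HTTP method and any authentication requirements for your environment before relying on it.

```python
import json
import urllib.request


def ingest_urls(base, connector_id, run_id):
    """Build the status (Step 2) and data-collection (Step 4) URLs.

    `base` is a placeholder such as "https://ingest_hostname:port".
    """
    root = f"{base.rstrip('/')}/connectors/{connector_id}"
    return {
        "status": f"{root}/runs/{run_id}/status",
        "run": f"{root}/runs/{run_id}/",
        "runs": f"{root}/runs/",
        "connector": root,
    }


def fetch_json(url, timeout=30):
    """GET a URL and decode the JSON body (assumption: GET returning JSON)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def collect_mustgather(base, connector_id, run_id, fetch=fetch_json):
    """Collect the three Step 4 responses into one dict keyed by name."""
    urls = ingest_urls(base, connector_id, run_id)
    return {name: fetch(urls[name]) for name in ("run", "runs", "connector")}
```

Passing a different `fetch` function makes it possible to add authentication headers or to exercise the collection logic without a live Ingest server.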

Results

Reviewing the data

Ingest run status
You can use the Ingest run status to identify the current stage of processing that a given run is in, or report a summary of the results of a completed run. For example, here is sample output from an Ingest run that is currently processing Product Stage 1e:
{
  "date": "2023-11-09T19:01:59.769",
  "runId": "i-b11b8f08-2143-4bc8-b0ac-8e83131d21c7",
  "status": -1,
  "progress": "52% (47 out of 90 pipes processed)",
  "totalTime": "",
  "type": "Ingest",
  "message": "Indexing running, current progression of indexing is at process group: auth.reindex - Product Stage 1e (Find Attributions)",
  "summary": ""
}
                

If Ingest does not complete a particular stage, repeated status calls made over a period of time continue to report the same stage of processing.

Note that "pipes" refers to the individual NiFi objects within the process group. This metric can be useful if the Ingest process is looping over a particular set of processors. Depending on how many times you check the status and on the interval between checks, you may observe that the run still reports the same stage but with different "pipe" values.
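
To compare status responses over time, it helps to pull the stage name and the pipe counts out of the free-text fields. The following Python sketch does this for a single response. The function name parse_status and the regular expressions are assumptions based only on the sample output shown above; other product versions may word these fields differently, in which case the corresponding values come back as None.

```python
import re

# Patterns inferred from the sample status output; not a documented format.
PROGRESS_RE = re.compile(r"(\d+)% \((\d+) out of (\d+) pipes processed\)")
STAGE_RE = re.compile(r"process group: (.+)$")


def parse_status(status):
    """Extract percent, pipe counts, and stage name from one status response."""
    progress = PROGRESS_RE.search(status.get("progress", ""))
    stage = STAGE_RE.search(status.get("message", ""))
    return {
        "date": status.get("date"),
        "percent": int(progress.group(1)) if progress else None,
        "pipes_done": int(progress.group(2)) if progress else None,
        "pipes_total": int(progress.group(3)) if progress else None,
        "stage": stage.group(1).strip() if stage else None,
    }
```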

Example scenario 1: Ingest looping
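The following status checks, taken about five minutes apart, may point to looping behavior: the stage stays the same, but the pipe count drops from 47 to 36 and then returns to 47.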
{
  "date": "2023-11-09T19:01:59.769",
  "runId": "i-b11b8f08-2143-4bc8-b0ac-8e83131d21c7",
  "status": -1,
  "progress": "52% (47 out of 90 pipes processed)",
  "totalTime": "",
  "type": "Ingest",
  "message": "Indexing running, current progression of indexing is at process group: auth.reindex - Product Stage 1e (Find Attributions)",
  "summary": ""
}

{
  "date": "2023-11-09T19:06:55.244",
  "runId": "i-b11b8f08-2143-4bc8-b0ac-8e83131d21c7",
  "status": -1,
  "progress": "40% (36 out of 90 pipes processed)",
  "totalTime": "",
  "type": "Ingest",
  "message": "Indexing running, current progression of indexing is at process group: auth.reindex - Product Stage 1e (Find Attributions)",
  "summary": ""
}

{
  "date": "2023-11-09T19:11:47.312",
  "runId": "i-b11b8f08-2143-4bc8-b0ac-8e83131d21c7",
  "status": -1,
  "progress": "52% (47 out of 90 pipes processed)",
  "totalTime": "",
  "type": "Ingest",
  "message": "Indexing running, current progression of indexing is at process group: auth.reindex - Product Stage 1e (Find Attributions)",
  "summary": ""
}

Example scenario 2: Ingest does not complete
The following status checks may point to Ingest not completing a specific process: across all three checks, the stage and the pipe count (47 out of 90) stay the same.
{
  "date": "2023-11-09T19:01:59.769",
  "runId": "i-b11b8f08-2143-4bc8-b0ac-8e83131d21c7",
  "status": -1,
  "progress": "52% (47 out of 90 pipes processed)",
  "totalTime": "",
  "type": "Ingest",
  "message": "Indexing running, current progression of indexing is at process group: auth.reindex - Product Stage 1e (Find Attributions)",
  "summary": ""
}

{
  "date": "2023-11-09T19:06:55.244",
  "runId": "i-b11b8f08-2143-4bc8-b0ac-8e83131d21c7",
  "status": -1,
  "progress": "52% (47 out of 90 pipes processed)",
  "totalTime": "",
  "type": "Ingest",
  "message": "Indexing running, current progression of indexing is at process group: auth.reindex - Product Stage 1e (Find Attributions)",
  "summary": ""
}

{
  "date": "2023-11-09T19:11:47.312",
  "runId": "i-b11b8f08-2143-4bc8-b0ac-8e83131d21c7",
  "status": -1,
  "progress": "52% (47 out of 90 pipes processed)",
  "totalTime": "",
  "type": "Ingest",
  "message": "Indexing running, current progression of indexing is at process group: auth.reindex - Product Stage 1e (Find Attributions)",
  "summary": ""
}

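
The two scenarios can be told apart mechanically by comparing the pipe counts across successive status checks. The following Python sketch is a heuristic only: the function names and the labels "looping", "stalled", and "progressing" are hypothetical, and the parsing assumes the progress wording shown in the samples above. A loop that completes between two checks will not be detected, so the result depends on how often you sample.

```python
import re


def pipes_processed(status):
    """Return N from 'N out of M pipes processed', or None if absent."""
    match = re.search(r"(\d+) out of \d+ pipes processed", status.get("progress", ""))
    return int(match.group(1)) if match else None


def classify_run(statuses):
    """Label a time-ordered list of status responses for one run.

    Returns "looping" if the pipe count ever decreases (scenario 1),
    "stalled" if it never changes (scenario 2), "progressing" if it never
    decreases and increases at least once, or "unknown" if fewer than two
    responses contain a usable pipe count.
    """
    counts = [pipes_processed(s) for s in statuses]
    counts = [c for c in counts if c is not None]
    if len(counts) < 2:
        return "unknown"
    pairs = list(zip(counts, counts[1:]))
    if any(later < earlier for earlier, later in pairs):
        return "looping"
    if all(later == earlier for earlier, later in pairs):
        return "stalled"
    return "progressing"
```

A result of "stalled" or "looping" only indicates where to look; the NiFi console is still needed to identify the processor involved.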
NiFi Console
Once you have identified the stage of processing that Ingest is not completing, you can use the NiFi console to review the ongoing behavior of this stage. Specifically, you can identify the particular processor or processors that are seeing activity while Ingest appears to be 'stuck'. This can help confirm the cause of the issue.

Each process group is made up of three general stages (ETL):

  • Extraction: Retrieving data from a particular location (index, database, etc.)
  • Transformation: Modifying data to prepare it for loading into the desired index fields
  • Loading: Pushing data into the Elasticsearch index

For example, if Ingest does not complete the processes used for extraction, there may be an issue at the database layer that prevents your SQL queries from completing in a timely manner (for instance, database locks or high load).