Common components of NiFi process groups

A common set of NiFi processors is used to perform various processing tasks within HCL Commerce NiFi process groups.

Process groups simplify complex data flows by letting you group components, such as processors, inside their own embedded canvas in the NiFi user interface. HCL Commerce Search comes with a set of default components that are commonly used in Ingest process groups. These processors are described below, along with the NiFi-provided processors that are most commonly used.

For more information about process groups, see Anatomy of a Process Group in the Apache NiFi documentation.

HCL-supplied processors

ComposeDatabaseSQL
Typically used before ExecuteSQL. It defines the SQL statement that ExecuteSQL runs, and acts as a user exit where an optional Ingest profile can further modify that SQL before it is sent to ExecuteSQL.
AnalyzeExecuteSQLRecordResponse
Typically used after ExecuteSQL to analyze the database query response. It has two properties: Relationship Type and Refresh Index.
  • Relationship Type: Defines whether the incoming connection is in a state of "success" or "failure". Dedicated logic within this processor classifies the response as a real failure, a success, or empty.
  • Refresh Index: An optional function that makes the Elasticsearch index perform a refresh operation immediately after each database page is processed.
RouteOnCatalog
Used only at the junction of the main processing flow at each flow stage, to determine how many additional flow files should be sent to the side flow. A "side flow" in NiFi is an alternative, optional processing flow in an ingestion pipeline that is used with Ingest profiles for performing custom ETL tasks. This processor uses three properties to control the side flows, which are based on Master Catalog, Default Catalog and Other Catalogs.

For more information, see Customizing Ingest profiles.

FilterOnCatalog
Used only at the junction of the main processing flow at each flow stage, to ensure that flow files with desired catalog properties are sent to the side flow. This processor uses three properties to control what can and cannot be routed to side flows: Master Catalog, Default Catalog, and Other Catalogs.
RouteOnLanguage
Used only at the junction of the main processing flow at each flow stage, to determine how many additional flow files should be sent to the side flow. This processor uses two properties to control the side flows, which are based on Default Language and Other Supported Languages.
FilterOnLanguage
Used only at the junction of the main processing flow at each flow stage, to ensure that flow files with desired language properties are sent to the side flow. This processor uses two properties to control what can and cannot be routed to the side flows: Default Language and Other Supported Languages.
TrackBulkRequest
Used only at the beginning, immediately after entering any of the Bulk Services. TrackBulkRequest registers additional metadata against each incoming flowfile, to track its state and the total time spent within this Bulk Service. The processor has a property, Dataflow Rate Control, which can be used to enable or disable rate control against the incoming dataflow. Rate control can be used to slow down the dataflow to the specified rate to avoid overloading Elasticsearch. In addition, this processor acts as a user exit for an optional Ingest profile to perform additional customization of the incoming dataflow.
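To illustrate the general idea behind rate control, the following Python sketch spaces out units of work so that no more than a fixed number pass per second. It is not how TrackBulkRequest is implemented; the class name RateLimiter and its parameters are hypothetical and exist only for this illustration.

```python
import time


class RateLimiter:
    """Hypothetical helper: spaces out calls to at most `rate_per_second`."""

    def __init__(self, rate_per_second, clock=time.monotonic, sleep=time.sleep):
        if rate_per_second <= 0:
            raise ValueError("rate_per_second must be positive")
        self._interval = 1.0 / rate_per_second
        self._clock = clock      # injectable so the limiter can be tested
        self._sleep = sleep      # injectable for the same reason
        self._next_allowed = clock()

    def acquire(self):
        """Block until the next unit of work may proceed; return the wait in seconds."""
        now = self._clock()
        wait = max(0.0, self._next_allowed - now)
        if wait > 0:
            self._sleep(wait)
        # Schedule the following slot one interval after this one.
        self._next_allowed = max(now, self._next_allowed) + self._interval
        return wait
```

A caller would invoke acquire() before forwarding each unit of work downstream. Lowering the rate trades ingest throughput for a lighter load on Elasticsearch.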
AnalyzeBulkResponse
Used only at the end of a Bulk Service. Its main uses are to analyze the Elasticsearch Bulk response for errors, and to act as a user exit for an optional Ingest profile to perform additional post-processing customization of the dataflow. This processor also detects the last flowfile of a stage and sends a release signal to the corresponding Wait Link of that stage in the main flow, to allow it to continue to the next stage.
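As a rough illustration of what analyzing a Bulk response involves, the following Python sketch walks the items array of an Elasticsearch Bulk API response and collects the entries that failed. It relies only on the documented response shape (a top-level errors flag and one result object per item, keyed by action name). The function name summarize_bulk_response is hypothetical, and the sketch does not reflect the internal logic of AnalyzeBulkResponse.

```python
import json


def summarize_bulk_response(body: str) -> dict:
    """Hypothetical helper: list the failed items in a Bulk API response."""
    response = json.loads(body)
    items = response.get("items", [])
    failed = []
    for item in items:
        # Each item holds a single key naming the action (index, create, update, delete).
        for action, result in item.items():
            if "error" in result or result.get("status", 200) >= 400:
                error = result.get("error") or {}
                failed.append({
                    "action": action,
                    "id": result.get("_id"),
                    "status": result.get("status"),
                    "reason": error.get("reason"),
                })
    return {
        "total": len(items),
        "has_errors": bool(response.get("errors")) or bool(failed),
        "failed": failed,
    }
```

The per-item status matters because a Bulk request can return HTTP 200 overall while individual operations inside it fail.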
ScrollElasticsearch
Scrolls through a given Elasticsearch result set.
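For readers unfamiliar with scrolling, the following Python sketch shows the request pattern of the Elasticsearch scroll API: an initial search that opens a scroll context, followed by repeated calls that pass the returned _scroll_id until a page comes back empty. The post argument is a hypothetical callable standing in for whatever HTTP client is used, and the sketch is not the implementation of ScrollElasticsearch.

```python
def scroll_hits(post, index, query, keep_alive="1m"):
    """Yield every hit of a search by following the scroll API.

    `post(path, body)` is an assumed callable that sends a JSON POST to
    Elasticsearch and returns the decoded JSON response as a dict.
    """
    # The first request opens the scroll context and returns the first page.
    page = post(f"/{index}/_search?scroll={keep_alive}", query)
    while page["hits"]["hits"]:
        yield from page["hits"]["hits"]
        # Later requests pass the scroll id from the previous page.
        page = post("/_search/scroll",
                    {"scroll": keep_alive, "scroll_id": page["_scroll_id"]})
```

In a real client, the scroll context would normally also be cleared once iteration finishes; that step is omitted here for brevity.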
ComposeIndexSchema
Calls a given Ingest profile (if defined) to customize an existing index schema for Elasticsearch.
SerializeDocument
Reads all (two-dimensional) records serially and converts them into a (single-dimension) format to be processed by a downstream custom processor.
MapIndexFieldsFromDatabase
Maps custom database table columns to the corresponding index schema fields for the ingest operation.
PublishEvent
Publishes the current flowfile content as an event to HCL Cache.
SubscribeEvent
Subscribes to events generated from HCL Cache.
UpdateDocumentCounter
Increases or decreases a given HCL Cache counter by the provided delta value. This processor is mainly used with event counters for tracking dataflows inside NiFi.
TrackDocument
Registers metadata to be used for tracking the dataflow in the current ingest stage, such as Product Stage 1a - Create Product Documents.
RetryDocument
Retries a selected portion of a given bulk request flowfile in its waiting queue.
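To make the idea of retrying only part of a bulk request concrete, the following Python sketch pairs each operation in an Elasticsearch Bulk request body (newline-delimited JSON) with its result in the Bulk response, and rebuilds a smaller request that contains only the operations worth retrying. The selection rule used here (status 429 or 503) is an assumption made for illustration, as is the function name build_retry_body; the actual rules applied by RetryDocument are not described in this topic.

```python
import json

# Assumption for this sketch: only throttled or unavailable items are retried.
RETRY_STATUSES = frozenset({429, 503})


def build_retry_body(request_body: str, response_body: str,
                     retry_statuses=RETRY_STATUSES) -> str:
    """Hypothetical helper: keep only the bulk operations that should be retried."""
    lines = [line for line in request_body.split("\n") if line.strip()]

    # Group request lines into operations. A delete has an action line only;
    # index, create, and update have an action line followed by a source line.
    operations = []
    i = 0
    while i < len(lines):
        action = next(iter(json.loads(lines[i])))
        size = 1 if action == "delete" else 2
        operations.append(lines[i:i + size])
        i += size

    # Response items are returned in the same order as the request operations.
    items = json.loads(response_body).get("items", [])
    retry_lines = []
    for operation, item in zip(operations, items):
        result = next(iter(item.values()))
        if result.get("status") in retry_statuses:
            retry_lines.extend(operation)

    # A Bulk body must end every line, including the last, with a newline.
    return "".join(line + "\n" for line in retry_lines)
```

Resending only the affected operations avoids reindexing documents that already succeeded.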

NiFi-supplied processors

ExecuteSQL
Executes the provided SQL statement. For more information, see ExecuteSQL in the Apache NiFi documentation.
ControlRate
Controls the rate at which data is transferred to follow-on processors. For more information, see ControlRate in the Apache NiFi documentation.
InvokeHTTP
Mainly used to interact with a configurable Elasticsearch HTTP endpoint. For more information, see InvokeHTTP in the Apache NiFi documentation.
RetryFlowFile
Mainly used, along with the default RetryDocument processor, to perform rule-based retry operations. For more information, see RetryFlowFile in the Apache NiFi documentation.