Release changes to the Ingest service

SQL calls can be issued from inside the NiFi pipeline to retrieve data from the database. The returned data is converted from the two-dimensional tabular format of the database into a one-dimensional string for Elasticsearch. Normally this list aggregation takes place inside the database; however, each database imposes a limit on the length of the returned string, and if the SQL tries to serialize a longer string, the string is truncated. A solution to this issue was introduced in HCL Commerce Version 9.1.7.
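To make the truncation problem concrete, the following Python sketch simulates database-side list aggregation. It collapses two-dimensional rows (a key plus one value per row) into one delimited string per key and cuts the result at a fixed length. The function name, the 20-character limit, the comma delimiter, and the sample rows are all hypothetical; real limits differ for each database.

```python
from collections import defaultdict

# Hypothetical maximum length of an aggregated string; real limits
# depend on the database product and its configuration.
DB_LIMIT = 20


def db_style_listagg(rows, delimiter=",", limit=DB_LIMIT):
    """Simulate database-side list aggregation: one delimited string
    per key, silently truncated to `limit` characters."""
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)
    return {key: delimiter.join(values)[:limit] for key, values in grouped.items()}


# Two-dimensional input: (product id, category) rows.
rows = [
    ("P1", "shoes"),
    ("P1", "running"),
    ("P1", "outdoor"),
    ("P1", "clearance"),
    ("P2", "hats"),
]

result = db_style_listagg(rows)
print(result["P1"])  # shoes,running,outdoo  (the rest of the list is lost)
print(result["P2"])  # hats
```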

LISTAGG parameter introduced in Version 9.1.7

HCL Commerce Search provides an application-level function that performs the serialization instead of aggregating in the database. You control this behavior with the flow.database.listagg variable (flowfile attribute). The default value is True, which keeps string aggregation at the database level.

Setting the flow.database.listagg attribute in ReindexLink, NRTLink, or DataloadLink applies the property globally throughout the entire dataflow. Setting the flow.database.listagg flowfile attribute in an individual connector pipe instead scopes application-level string aggregation to that pipeline processing stage. Scoping it this way is the preferred approach for optimizing ingestion performance.
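The scoping rule can be modeled as a simple lookup: a value set on the connector pipe wins for that stage, otherwise the value set in ReindexLink, NRTLink, or DataloadLink applies, otherwise the default of True. This Python sketch is an illustrative model only, not HCL's internal implementation; the function `resolve_listagg` and the attribute dictionaries are hypothetical, and it assumes the attribute holds the strings "true" or "false", since NiFi flowfile attributes are strings.

```python
ATTRIBUTE = "flow.database.listagg"
# Documented default: database-level aggregation.
DEFAULT_LISTAGG = True


def resolve_listagg(pipe_attributes, global_attributes):
    """Hypothetical model of attribute scoping: a pipe-level value
    overrides the dataflow-wide value, which overrides the default."""
    for scope in (pipe_attributes, global_attributes):
        value = scope.get(ATTRIBUTE)
        if value is not None:
            # Flowfile attributes are strings, so parse "true"/"false".
            return value.strip().lower() == "true"
    return DEFAULT_LISTAGG


# Nothing set anywhere: the default applies.
print(resolve_listagg({}, {}))                        # True
# Set only on one connector pipe: affects that stage only.
print(resolve_listagg({ATTRIBUTE: "false"}, {}))      # False
# Set in ReindexLink (global): affects every stage without its own value.
print(resolve_listagg({}, {ATTRIBUTE: "false"}))      # False
# A pipe-level value overrides the global one.
print(resolve_listagg({ATTRIBUTE: "true"}, {ATTRIBUTE: "false"}))  # True
```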

The default value of True means that the system relies on the database to perform list aggregation. This setting is fast, but it limits the size of the returned strings, and that limit differs for each of the supported databases. Setting the value to False switches on application-level list aggregation, which places no size limit on the returned query result. However, this change may roughly double ingestion processing time.
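For comparison, here is a minimal Python sketch of application-level aggregation in concept: the query returns one row per value, and the application groups and joins the rows with no length cap. The function name, delimiter, and sample rows are hypothetical. Moving every row out of the database and grouping it in the application is one plausible source of the extra processing time, not a measured explanation.

```python
from collections import defaultdict


def app_level_listagg(rows, delimiter=","):
    """Group (key, value) rows in application memory and join each
    group's values into one string, with no length limit."""
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)
    return {key: delimiter.join(values) for key, values in grouped.items()}


rows = [
    ("P1", "shoes"),
    ("P1", "running"),
    ("P1", "outdoor"),
    ("P1", "clearance"),
    ("P2", "hats"),
]

result = app_level_listagg(rows)
print(result["P1"])  # shoes,running,outdoor,clearance
print(result["P2"])  # hats
```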

Tip: As a general guideline, use the database LISTAGG function wherever possible, and disable it only on the specific connectors where aggregation limits or issues arise. This avoids the unnecessary overhead of performing list aggregation inside the NiFi application. For more information, see Tunable Parameters in the setup of NiFi and Elasticsearch.
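One way to apply this guideline is to check, per connector, whether any aggregated value returned by the database reaches the database limit, because a value at the limit may have been truncated. The Python sketch below assumes you can collect sample aggregated strings for each connector and know your database's limit; the connector names, the limit value, and `connectors_needing_app_listagg` are hypothetical. It measures length in UTF-8 bytes; check your database documentation for how its limit is actually counted.

```python
def connectors_needing_app_listagg(samples, db_limit_bytes):
    """Return connectors whose aggregated strings reach the database limit.
    Those are candidates for setting flow.database.listagg to False on
    their pipes; all other connectors keep database-level aggregation."""
    flagged = []
    for connector, values in samples.items():
        if any(len(v.encode("utf-8")) >= db_limit_bytes for v in values):
            flagged.append(connector)
    return sorted(flagged)


samples = {
    "hypothetical-product-connector": ["a" * 4000, "short,list"],
    "hypothetical-category-connector": ["x,y,z"],
}
print(connectors_needing_app_listagg(samples, db_limit_bytes=4000))
# ['hypothetical-product-connector']
```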

Upgrading to Version 9.1.10

In Version 9.1.10.0 and later, the Ingest service automatically synchronizes NiFi with your custom connector descriptors stored in Zookeeper. The default connector descriptors, which earlier releases stored in Zookeeper, are no longer needed there. Customized descriptors are the exception: you must maintain your own copy of each of them inside Zookeeper.

In versions prior to 9.1.10.0, whenever the Ingest service is restarted, all connector descriptors are deleted from the Zookeeper /connectors node, and the Ingest service then recreates the connectors inside Zookeeper from the default connector descriptors. As a result, any customized connector descriptors are lost.
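On a release earlier than 9.1.10.0, it can be worth exporting your customized descriptors before the Ingest service restarts. The Python sketch below copies the data of every child node of /connectors into local files, one file per node. It assumes that each descriptor is stored as a direct child node of /connectors, and it accepts any client exposing `get_children(path)` and `get(path)` returning `(data, stat)`, which the Kazoo library's `KazooClient` provides. The function `backup_connectors`, the host name, and the output layout are hypothetical; this is not an HCL-provided tool.

```python
import os


def backup_connectors(zk, out_dir, root="/connectors"):
    """Copy the data of every child node under `root` into `out_dir`,
    one file per node, and return the names of the saved nodes.
    `zk` is any client with get_children(path) and get(path) -> (bytes, stat),
    for example a started kazoo KazooClient."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for name in zk.get_children(root):
        data, _stat = zk.get(f"{root}/{name}")
        with open(os.path.join(out_dir, name), "wb") as fh:
            fh.write(data or b"")
        saved.append(name)
    return saved


# Example usage with kazoo (hypothetical host name):
# from kazoo.client import KazooClient
# zk = KazooClient(hosts="zookeeper-host:2181")
# zk.start()
# backup_connectors(zk, "connector-backup")
# zk.stop()
```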

For detailed instructions, see Migrating Ingest service customizations.