HCL Commerce Version 9.1.7.0 or later

Using List Aggregation with Ingest

You can issue SQL calls from inside the NiFi pipeline to retrieve database data. This data is converted from the two-dimensional tabular format in which the database stores it into the one-dimensional string that the Elasticsearch process expects. Normally this list aggregation takes place inside the database. However, each database limits the length of the string it can return; for DB2, for example, the limit is 32k. If the SQL tries to serialize a longer string, the result is truncated.
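
As a rough illustration of database-side list aggregation, the following Python sketch collapses several rows per key into one delimited string inside the database. It uses SQLite's group_concat as a stand-in for LISTAGG. The table name attr, its columns, and the helper aggregate_in_database are hypothetical and are not part of HCL Commerce. SQLite does not apply the DB2 limit, so the sketch shows only the shape of the conversion, not the truncation.

```python
import sqlite3


def aggregate_in_database(rows):
    """Collapse (key, value) rows into one delimited string per key, inside the database."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: one row per (catalog entry, attribute value) pair.
    conn.execute("CREATE TABLE attr (catentry_id INTEGER, value TEXT)")
    conn.executemany("INSERT INTO attr VALUES (?, ?)", rows)
    # group_concat plays the role that LISTAGG plays in DB2 or Oracle.
    cursor = conn.execute(
        "SELECT catentry_id, group_concat(value, ',') "
        "FROM attr GROUP BY catentry_id"
    )
    result = dict(cursor.fetchall())
    conn.close()
    return result


if __name__ == "__main__":
    sample = [(1, "red"), (1, "blue"), (2, "green")]
    print(aggregate_in_database(sample))
```

The database returns one row per key, so any limit on the length of the returned string applies to the whole aggregated value.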

To avoid this problem, HCL Commerce Search provides an application-level function that performs the serialization instead of the database. You control this behavior with the variable flow.database.listagg. The default value is True, which leaves the aggregation in the database. You can set the variable at two levels. Setting it in ReindexLink, NRTLink, or DataloadLink acts as a global switch and defines the property for the entire dataflow. Setting it against a connector pipe scopes it to that stage only.
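
The idea behind application-level aggregation can be sketched in Python as follows. This is a minimal illustration, not the NiFi implementation. The function name aggregate_in_application, the comma delimiter, and the row layout are assumptions. The rows are fetched unaggregated and joined in application memory, so the database limit on returned string length does not apply to the combined value.

```python
from collections import defaultdict


def aggregate_in_application(rows, delimiter=","):
    """Join the values of unaggregated (key, value) rows into one string per key."""
    grouped = defaultdict(list)
    for key, value in rows:
        # Each value arrives as its own row, so no single returned string is long.
        grouped[key].append(value)
    return {key: delimiter.join(values) for key, values in grouped.items()}


if __name__ == "__main__":
    # 400 values of 100 characters each exceed 32k once joined.
    rows = [(1, "x" * 100) for _ in range(400)]
    combined = aggregate_in_application(rows)
    print(len(combined[1]))
```

The trade-off is that the application receives and processes many more rows than it would with aggregation in the database.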

In general, you should use the database LISTAGG function and disable it only when the length limit is reached. This avoids the unnecessary overhead of performing the list aggregation inside the NiFi application.