Full reindexing connector

This topic describes the dataflow of the default NiFi connector used for full-reindexing of the Elasticsearch search index.

HCL Commerce Version 9.1.10.0 or later

Full reindexing connector dataflow

The reindexing connector rebuilds the Elasticsearch search index. This index consists of the indices Store, Catalog, Category, Attribute, Product, URL, and Price. The indexing process starts with the creation of a schema, the product database, and STA. Data is loaded by drilling progressively down into catalog and then category. Then the main flow loop takes over, processing attributes to products to SEO URLs and finally emerging from that loop, to price.

The following diagram illustrates this dataflow as of HCL Commerce Version 9.1.10. The flow consists of four main stages, each of which contains sub-stages that may repeat. Each stage is further described in Table 1.

The Background processing stage improves the search relevancy of each of the product documents. It does not affect the storefront. Once the search index goes live the store becomes operational.
Note: Natural Language Processing (NLP) and matchmaker relationships such as color matching do not rely on the index being finished to be usable. They are built separately and in parallel to the index and supplement its contents when they are complete.
Table 1. Table 1

Full Reindexing Connector

Pipeline Dataflow Stages & Pipe/Flow Names

LANGUAGE LOOP
SCHEMA STA STORE CATALOG CATEGORY ATTRIBUTE PRODUCT SEO URL PRICE BACKGROUND
Store1

StoreSchema2

Zookeeper

DatabaseSTAZookeeperStage1

Store

DatabaseStoreStage1

1a (Main Document)

DatabaseCatalogStage1a

1a (Main Document)

DatabaseCategoryStage1a

1a (Main Document)

DatabaseAttributeStage1a

1a (Main Document)

DatabaseProductStage1a

1a (Category URL)

DatabaseURLStage1a

1a (Find Product Prices)

DatabasePriceStage1a

Category Stage 1c (Long Descriptions)

DatabaseCategoryStage1c

Catalog

CatalogSchema

Synonym Stopword

LoadSTASynonymsStopwordStage1

1b (Find Filters)

DatabaseCatalogStage1b

1b (Find Facets)

DatabaseCategoryStage1b

1b (Find Attribute Values)

DatabaseAttributeStage1b

1b (Enrich Document)

DatabaseProductStage1b

1b (Product URL)

DatabaseURLStage1b

1b (Find Bundle Prices)

DatabasePriceStage1b

Product Stage 1g (Long Descriptions)

DatabaseProductStage1g

Attribute

AttributeSchema

1d (Build Hierarchy)

DatabaseCategoryStage1d

1h (Find Child Items)

DatabaseProductStage1h

1c (Page Type)

DatabaseURLStage1c

2 (Copy To Product)

CopyLink

Enrich NLP

NLPStage1

URL

URLSchema

1e (Find Hidden Facets)

DatabaseCategoryStage1e

1i (Find Parent Category)

DatabaseProductStage1i

1d (Category URL Fallback)

DatabaseURLStage1d

SplitLink – Inventory

(Launches separate Inventory Connector:)

Category

CategorySchema

1e (Find Attributions)

DatabaseProductStage1e

1e (Product URL Fallback)

DatabaseURLStage1e

Inventory Stage 1a (Parent Inventories)

DatabaseInventoryStage1a

Product

ProductSchema

1f (Content URL)

DatabaseURLStage1f

Inventory Stage 1b (Child Inventories)

DatabaseInventoryStage1b

Description

DescriptionSchema

Inventory Stage 2 (Copy To Product)

InventoryStage2

Price

PriceSchema

Inventory

InventorySchema

Workspace

WorkspaceSchema

1Process Group Label

2Pipe Name / Flow Name (use Flow Names in NiFi to find specific Process Groups in a given Stage)

This connector uses the following schemas.

HCL Commerce Version 9.1.12.0 or later

eSite versus Catalog Asset Store reindexing

As described in Choosing your index model, HCL Commerce Search Version 9.1.12 introduces an alternative to the preexisting eSite indexing model. In that model, reindexing for eSite stores uses the process flow described in Full reindexing connector dataflow. Focusing on the main and side flows, the processes are performed in the following sequence:

Each processing flow can only accommodate one store and one supported language at a time. A flow control component is used to perform recurring internal dataflows in order to manage additional supported languages. SEO URL and other related metadata are indexed into a separate URL index for query time lookup.

In contrast, the Catalog Asset Store (CAS) indexing approach processes all underlying extended sites and their supported languages in one single dataflow. The processing stages are very similar to the eSite model, except that the URL stage is replaced with a Page stage that only indexes SEO templates associated with each keyword. SEO URL and meta data are no longer indexed and are calculated and cached at query time instead. The resulting process flow is shown in the following diagram.

For more information about indexing flow in the CAS indexing model, see Catalog Asset Store index features.