Full reindexing connector

This topic describes the dataflow of the default NiFi connector used for full-reindexing of the Elasticsearch search index.

Full reindexing connector dataflow

The reindexing connector rebuilds the Elasticsearch search index. This index consists of the indices Store, Catalog, Category, Attribute, Product, URL, and Price. The indexing process starts with the creation of a schema, the product database, and STA. Data is loaded by drilling progressively down into catalog and then category. Then the main flow loop takes over, processing attributes to products to SEO URLs and finally emerging from that loop, to price.

The following diagram illustrates this dataflow as of HCL Commerce Version 9.1.10. The flow consists of four main stages, each of which contains sub-stages that may repeat. Each stage is further described in Table 1.

The Background processing stage improves the search relevancy of each of the product documents. It does not affect the storefront. Once the search index goes live the store becomes operational.

Note: Natural Language Processing (NLP) and matchmaker relationships such as color matching do not rely on the index being finished to be usable. They are built separately and in parallel to the index and supplement its contents when they are complete.

Table 1. Table 1
Full Reindexing Connector Pipeline Dataflow Stages & Pipe/Flow Names
					LANGUAGE LOOP
SCHEMA	STA	STORE	CATALOG	CATEGORY	ATTRIBUTE	PRODUCT	SEO URL	PRICE	BACKGROUND
Store¹ StoreSchema²	Zookeeper DatabaseSTAZookeeperStage1	Store DatabaseStoreStage1	1a (Main Document) DatabaseCatalogStage1a	1a (Main Document) DatabaseCategoryStage1a	1a (Main Document) DatabaseAttributeStage1a	1a (Main Document) DatabaseProductStage1a	1a (Category URL) DatabaseURLStage1a	1a (Find Product Prices) DatabasePriceStage1a	Category Stage 1c (Long Descriptions) DatabaseCategoryStage1c
Catalog CatalogSchema	Synonym Stopword LoadSTASynonymsStopwordStage1		1b (Find Filters) DatabaseCatalogStage1b	1b (Find Facets) DatabaseCategoryStage1b	1b (Find Attribute Values) DatabaseAttributeStage1b	1b (Enrich Document) DatabaseProductStage1b	1b (Product URL) DatabaseURLStage1b	1b (Find Bundle Prices) DatabasePriceStage1b	Product Stage 1g (Long Descriptions) DatabaseProductStage1g
Attribute AttributeSchema				1d (Build Hierarchy) DatabaseCategoryStage1d		1h (Find Child Items) DatabaseProductStage1h	1c (Page Type) DatabaseURLStage1c	2 (Copy To Product) CopyLink	Enrich NLP NLPStage1
URL URLSchema				1e (Find Hidden Facets) DatabaseCategoryStage1e		1i (Find Parent Category) DatabaseProductStage1i	1d (Category URL Fallback) DatabaseURLStage1d		SplitLink – Inventory (Launches separate Inventory Connector:)
Category CategorySchema						1e (Find Attributions) DatabaseProductStage1e	1e (Product URL Fallback) DatabaseURLStage1e		Inventory Stage 1a (Parent Inventories) DatabaseInventoryStage1a
Product ProductSchema							1f (Content URL) DatabaseURLStage1f		Inventory Stage 1b (Child Inventories) DatabaseInventoryStage1b
Description DescriptionSchema									Inventory Stage 2 (Copy To Product) InventoryStage2
Price PriceSchema
Inventory InventorySchema
Workspace WorkspaceSchema

¹Process Group Label

²Pipe Name / Flow Name (use Flow Names in NiFi to find specific Process Groups in a given Stage)

This connector uses the following schemas.

eSite versus Catalog Asset Store reindexing

As described in Choosing your index model, HCL Commerce Search Version 9.1.12 introduces an alternative to the preexisting eSite indexing model. In that model, reindexing for eSite stores uses the process flow described in Full reindexing connector dataflow. Focusing on the main and side flows, the processes are performed in the following sequence:

Each processing flow can only accommodate one store and one supported language at a time. A flow control component is used to perform recurring internal dataflows in order to manage additional supported languages. SEO URL and other related metadata are indexed into a separate URL index for query time lookup.

In contrast, the Catalog Asset Store (CAS) indexing approach processes all underlying extended sites and their supported languages in one single dataflow. The processing stages are very similar to the eSite model, except that the URL stage is replaced with a Page stage that only indexes SEO templates associated with each keyword. SEO URL and meta data are no longer indexed and are calculated and cached at query time instead. The resulting process flow is shown in the following diagram.

For more information about indexing flow in the CAS indexing model, see Catalog Asset Store index features.