WebSphere Commerce Search index schema definition

The WebSphere Commerce Search index schema definition contains information for processing data in WebSphere Commerce.

For more detailed information, check the solrhome/MC_masterCatalogId/locale/CatalogEntry/conf/schema.xml file for the search index schema definition that matches your version of WebSphere Commerce search.

Catalog Entry extended sites override

When an extended site overrides any of the following catalog entry properties: Name, short description, and keywords, both the asset store, and the extended site properties are being indexed. If a hit matches either the asset store, or the extended site store, the product is returned and updated by the run time with the extended site override display text.

Category Name

The category name is being indexed under the CatalogEntry index. This is a searchable field, which is used to enhance the relevancy of the product that is being returned. Runtime components boost this field. For more information, see Search relevancy and merchandising.

Master catalogs and multiple languages

A WebSphere Commerce instance might have multiple master catalogs, with each master catalog that belong to one store, or in extended sites, might belong to a cluster of stores. The master catalog is not reflected in the index schema. That is, it is reflected at the level of index cores. In addition, each master catalog contains its supported languages through its attached stores. A separate index core is created for each language and master catalog.

The following are the main benefits of maintaining separate index cores for each language:
Clean schema design
The text search schema is language-dependent. For example, different languages contain different analyzers, where some languages contain spell checking or stemming, while others do not.
Runtime query performance
Shoppers typically only search by using one language. The query then runs against only one index.
Easier maintenance for language dictionaries
Each language contains different synonyms, stop words, and rules.

Where the WebSphere Commerce instance contains two master catalogs, with each master catalog possibly containing different languages. Separate catalog entry index cores are created in each language, under each master catalog. When new WebSphere Commerce objects are indexed, new cores are created by using the same structure. Each index then provides its own set of search functionality in the storefront, such as its own automatic keyword suggestions, spelling corrections, and dictionaries.

WebSphere Commerce store models and catalog entry types

Search indexing supports various store models by indexing the top-level Catentry store_id. This information is stored in the STORECENT table. In extended site, the store path information is built in query conditions, including both the shopper's extended site store_id, and parent asset store_id.

WebSphere Commerce catentry types such as Product, Item, Bundle, Package, and DynamicKit are supported by default for WebSphere Commerce Search indexing. They are in the same index, with their data extracted separately from the WebSphere Commerce database. For example, the query to extract attribute values for products differs for items. The CATENTTYPE_ID is indexed as a field in the search index to distinguish each catentry's type.

Prices in WebSphere Commerce Search

WebSphere Commerce Search indexes the default offer price. The default offer price is typically considered as the offer price under the catentry's owning store's master catalog's default trading position container. The default offer price can also be complex by considering the following variables:
The store's default currency
Each store contains only one default currency. This default currency price is indexed as a searchable field. The default index contains USD as the default currency, following the WebSphere Commerce sample stores. If your store's default currency is different, you can index it by adding new searchable fields in the index schema and adding new SQL in the Data Import Handler (DIH).
Multiple stores, each with a different default currency
This requires separate index fields for each currency, For example, price_USD, price_CAD, price_EUR, price_CNY. The following currency codes contain a predefined default index field:
  • USD
  • CAD
  • EUR
  • CNY
  • TWD
  • JPY
  • GBP
  • KRW
  • BRL
  • PLN
  • RON
  • RUB
  • EGP
  • ILS
  • TRY
Where the default indexing indexes the price for different currencies, if there is a value for the currency in the OFFERPRICE table.
Stores with multiple currencies (currency conversion)
There are two different approaches in WebSphere Commerce to support multiple currencies and currency conversion. The first approach is populating the OFFERPRICE table with the price in each currency. The second approach is calculating the currency exchange based on the default currency price during run time, with the exchange rate in the CURCONVERT table. Since the WebSphere Commerce sample stores use the second approach and contain only the price for USD, the default indexing only contains the price_USD field as populated.
Miscellaneous attributes of offer
The effective date (OFFER.STARTDATE and OFFER.ENDDATE) is not considered by the default index. Instead, the offer precedence is considered in indexing, with only the offer price with the highest OFFER.PRECEDENCE indexed. The offer quantity is also considered in indexing, where only the offer with (OFFER.MINIMUMQUANTITY IN (1, 0) OR OFFER.MINIMUMQUANTITY IS NULL) is indexed.
Price override in extended site stores
Not indexed by default.
Price rule
Not indexed by default.
Catalog Entry List price
Indexed by default.

Contract entitlement

The runtime contract entitlement filtering is performed through the product set. Based on the shopper's eligible contract in the session, where the contracts are represented as a list of included and excluded product sets, the PRSETCEREL table is checked against each catentry to ensure that the shopper is eligible to access the catentry. Therefore, the PRSETCEREL.PRODUCTSET_ID is indexed and queried with the shopper's list of included and excluded product sets. When the includeEntireCatalog flag is true the entitlement logic ignores productset checking. That is, the entire catalog is selected and productset_id is not evaluated against the search index.

Spell check

A dedicated indexed multivalue field is being used to maintain the source and scope of the spell check suggestions. The content of the spellCheck field is populated by copying other column's content. The following are all the fields that are copied into the spellCheck field:
  • name
  • shortDescription
  • keyword
  • nameOverride
  • shortDescriptionOverride
  • keywordOverride

The spell check index field is the source of the keyword suggestions under the Keyword auto-suggest section, and also the source of the suggestions of misspelled keywords in the storefront.

Parent catalog groups and catentries

There are two parent catalog group fields in the index:
  • parentCatgroup_id_search
  • parentCatgroup_id_facet
The parentCatgroup_id_search field is used for search filtering. For example, searching for a catentry under a specific parent catalog group that can be in master catalog or sales catalog, where the parent catalog group is not necessarily the direct parent of the catentries. To improve the runtime search performance, parentCatgroup_id_search includes both the catentry's direct parent group and all its parents in the catalog hierarchical graph. In addition, since a catentry can contain different parent groups under different catalogs, the parent catgroup must be prefixed by the catalog_id: CatalogId_ParentCatgroupId. This multivalued field improves the search performance, but reduces the catgroup faceting performance. This is because the search engine calculates the faceting count for every parent catgroup, even if most of the calculation results are not useful for faceting.

Therefore, the parentCatgroup_id_facet field is used for search faceting, and only indexes the catentry's direct parent catalog groups in one specific catalog. That is, catgroup faceting needs only the faceting numbers on leaf catalog groups to build the navigation tree. The format of the field is parentCatgroupId.

For parent catentries in WebSphere Commerce, an item can belong to one product, many packages, many bundles, and many kits. These parent catentry's IDs are indexes in a field in the index, so that the parent catentry can be easily identified.

Attribute Dictionary

Attributes in the Attribute Dictionary are indexed as catentry properties. Not only are the attributes indexed, but business users can dynamically add attributes from the index.

The mapping is stored in the ATTRDICTSRCHCONF table. ADS_Fn columns are for string type of attributes, ADI_Fn columns are for integer type of attributes, and ADF_Fn columns are for float type of attributes.

Attribute Dictionary attributes

Some of the catalog entry's associated Attribute Dictionary Attributes properties are being index under the multivalue ad_attribute catalogEntry index field. The properties are tokenized into a string, which is separated by a delimiter.

The following snippet is an example of the raw Attribute Dictionary attributes data as it appears in the catalog entry index:

<arr name="ad_attribute">
  <str>7000000000000000003/_/construction/_/Construction/_/1.00000/_/7000000000000000025/_/Knit/_/Knit/_/19.00000/_/1/_/0/_/1/_/0/_/0/_/2/_/0</str> 
  <str>7000000000000000004/_/length/_/Length/_/1.00000/_/7000000000000000027/_/Short/_/Short/_/20.00000/_/1/_/0/_/1/_/0/_/0/_/2/_/0</str> 
  <str>7000000000000000005/_/material/_/Material/_/1.00000/_/7000000000000000030/_/Synthetic/_/Synthetic/_/21.00000/_/1/_/0/_/1/_/0/_/0/_/2/_/0</str> 
  <str>7000000000000000002/_/swatchcolor/_/Color/_/1.00000/_/0/_/0/_/0/_/.00000/_/1/_/1/_/1/_/0/_/1/_/1/_/0</str> 
  <str>7000000000000000006/_/occasion/_/Occasion/_/1.00000/_/7000000000000000037/_/Cocktail & Evening/_/Cocktail & Evening/_/25.00000/_/1/_/0/_/1/_/0/_/0/_/2/_/0</str> 
  <str>7000000000000000001/_/swatchSize/_/Available Sizes/_/2.00000/_/0/_/0/_/0/_/.00000/_/1/_/1/_/1/_/0/_/1/_/1/_/0</str> 
</arr>
The following properties are being indexed and mapped to the following internal index field name:
Attributes properties
Index Database source table/column name Internal field name as defined in the wc-component.xml Description
0 CATENTRYATTR.ATTR_ID attr_id The attribute unique ID.
1 ATTR.IDENTIFIER identifier The attribute identifier.
2 ATTRDESC.NAME name The attribute display name.
3 CATENTRYATTR.SEQUENCE sequence The attribute sequence with its product.
4 CATENTRYATTR.ATTRVAL_ID attrval_id The attribute value unique ID.
5 ATTRVAL.IDENTIFIER identifier The attribute value identifier.
6 ATTRVALDESC.STRINGVALUE,ATTRVALDESC.VALUE value The actual attribute value display value (if string type, uses the STRINGVALUE field; otherwise, uses the VALUE field.
7 ATTRVALDESC.SEQUENCE sequence The attribute value sequence with its attribute.
8 ATTR.DISPLAYABLE displayable The attribute is marked searchable.
9 ATTR.SEARCHABLE searchable The attribute is marked displayable.
10 ATTR.COMPARABLE comparable The attribute is marked comparable.
11 ATTR.STOREDISPLAY storeDisplay The attribute is being used for ribbon ad.
12 ATTR.FACETABLE facetable The attribute is marked facetable.
13 CATENTRYATTR.USAGE usage The attribute is assigned to a product as either descriptive or defining attribute.
14 ATTRVALDESC.IMAGE1 image1 The attribute value image 1.

Bundles, Packages, and Dynamic Kits' components

Some of the catalog entry's components properties are indexed under the CatalogEntry component's multivalue index field. The properties are tokenized into a string, which is separated by a delimiter, and sequenced according to their sequence value in the CATENTREL.SEQUENCE, DKPDCCOMPLIST.SEQUENCE database columns.

The following snippet is an example of the raw components data as it appears in the catalog entry index:

Bundle, and Package component index field:

<arr name="components">
   <str>11412/_/1.00000</str> 
   <str>11420/_/1.00000</str> 
</arr>
Dynamic Kit component index field:

<arr name="components">
   <str>12173/_/1.00000/_/11504/_/11504/_/assetModel001Class</str> 
</arr>
The following dynamic Bundle and Package properties are being indexed and mapped to the following internal index field name:
Bundle and Package properties
Index Database source table/column name Internal field name as defined in the wc-component.xml Description
0 CATENTREL.CATENTRY_ID_CHILD, and DKPDCCOMPLIST.CATENTRY_ID catentry_id The component unique ID.
1 CATENTREL.QUANTITY, DKPDCCOMPLIST.QUANTITY quantity The component quantity.
2 DKPDCCOMPLIST.DKPREDEFCONF_ID preDefConfId The Dynamic kit predefined configuration ID.
3 KPDCCOMPLIST.DKPDCCOMPLIST_ID componentListId The Dynamic kit components ID.
4 DKPDCCOMPLIST.GROUPNAME groupName The Dynamic kit group header name that is used to group like components for display.

Dynamic Kits indexed properties

The following Dynamic Kit properties are being indexed and mapped to the following internal index field name:
Dynamic Kit properties
Database source table/column name Internal field name as defined in the wc-component.xml Description
DKPREDEFCONF.COMPLETE isDKPreConfigured This is a flag to specify whether the predefined configuration is complete, or if it requires extra input from an external configurator. The default value, 0, indicates that the predefined configuration is not a complete configuration, and therefore cannot be added to a shopping cart without going through a configurator.
CATCONFINF.REFERENCE dkModelReference Reference information that is required by an external product configurator.
CATCONFINF.URL dkURL (if null, the value is read from the TRANSPORT table by run time) The URL location of an external product configurator.
CATCONFINF.CONFIGURATION dkDefaultConfiguration The detailed information of this configuration, typically, it might be XML from an external configurator.