Site content crawling in Extended Sites

You must consider extra factors at indexing time and at run time when you use Extended Sites.
At indexing time
The storeId of each Extended Site is passed in from the database, on the assumption that the manifest.txt file defines the list of content files that are accessible by the given Extended Site. Each indexed record must have the owning storeId associated with it.
Note:
  • The store path is resolved at indexing time, not at run time. That is, the manifest.txt file must list Asset store files before Extended Site files, and the crawler must override the URL for Extended Site-specific static pages that use the static site map.
  • The static site map is used as the integration point of the crawler and the storefront. The static site map returns both SEO-enabled URLs and all indexable non-catalog related URLs. This static site map controller command takes in a static content flag, a language flag, and an operational store ID parameter.
The storeId of the Asset store is typically not passed in, since WebSphere Commerce Search only indexes operational store site content.
Even though the site content file is the same, the content might be different for each Extended Site. Therefore, index all of the accessible pages even though they come from the Asset store.
At run time
Site content that resides only with the Asset store: Search result URLs are expected to navigate shoppers to the Asset store's site content file.
Site content that resides only with the Extended Site store: Search result URLs are expected to navigate shoppers to the Extended Site's site content file.
Extended Site-specific site content that overrides the Asset store: Search result URLs are expected to navigate shoppers to the Extended Site's site content file, not to the Asset store's file.
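
The three run-time cases can be summarized as a small decision function. The following Python sketch is illustrative only and is not part of WebSphere Commerce. The function name, its two boolean parameters, and the returned labels are hypothetical.

```python
from typing import Optional


def expected_content_owner(in_asset_store: bool, in_extended_site: bool) -> Optional[str]:
    """Return which store's site content file a search result URL should lead to."""
    if in_extended_site:
        # Covers content that exists only in the Extended Site, and
        # Extended Site content that overrides the Asset store copy.
        return "extended_site"
    if in_asset_store:
        # Content that resides only with the Asset store.
        return "asset_store"
    # The content file is not available to this store at all.
    return None
```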

Managing site content in Extended Sites

The SRCHCONFEXT configuration table contains the information that is needed to index the site content static pages. This information is populated by the setupSearchIndex utility when the setupWebContent parameter is set to true, which is also the default when the parameter is not specified. When you build the site content index, the di-buildindex utility uses this information to locate the manifest.txt file and to associate the storeId with the static content.

Setting the base path to work for your environment

The base path is the path from the Solr server to access the static content. The base path examples that are listed are for files that are on the local server.

For remote server configurations, the base path must point to the Solr server path that is mounted to the WebSphere Commerce file server.

Creating new Extended Site stores

More Extended Sites can be added over time. If you add new Extended Site stores before you run the setupSearchIndex utility, running the utility populates the table with all the needed configuration information, and no further updates are required. However, if you add new Extended Site stores after the utility has already run, the BasePath and storeId of each new Extended Site store must be added to the existing CONFIG entry in the SRCHCONFEXT table. That is, add the store with its corresponding MasterCatalog_Id and language_ID.

For example, an Extended Site store is created before you run the setupSearchIndex utility. The following entry with indexSubType=WebContent is created, containing a pair of BasePath and storeId values and other configuration information:

DB2

select config from SRCHCONFEXT where indexsubtype='WebContent'
CONFIG
SearchServerName=search_host_name,SearchServerPort=3737,
BasePath=c:\WebSphere\AppServer\profiles\demo\installedApps\WC_demo_cell\WC_demo.ear\Stores.war\MadisonsStorefrontAssetStore\StaticContent\en_US\,
StoreId=10152
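
To make the layout of the CONFIG value concrete, the following Python sketch splits a value of this shape into BasePath and StoreId pairs. It is an illustration only, not how di-buildindex is implemented. It assumes that the stored value is one comma-separated string of key=value fields, that semicolons separate multiple values, and that BasePath and StoreId values pair up by position. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_config(config: str) -> Dict[str, List[str]]:
    """Map each key in a CONFIG value to its list of semicolon-separated values."""
    fields: Dict[str, List[str]] = {}
    for part in config.split(","):
        part = part.strip()
        if not part:
            continue  # skip empty pieces left by stray commas or line wraps
        key, _, value = part.partition("=")
        fields[key.strip()] = [v.strip() for v in value.split(";")]
    return fields


def base_path_store_pairs(config: str) -> List[Tuple[str, str]]:
    """Pair each BasePath with the StoreId at the same position (assumed pairing)."""
    fields = parse_config(config)
    base_paths = fields.get("BasePath", [])
    store_ids = fields.get("StoreId", [])
    if len(base_paths) != len(store_ids):
        raise ValueError("BasePath and StoreId must list the same number of values")
    return list(zip(base_paths, store_ids))
```

A mismatch between the number of BasePath values and StoreId values raises an error in this sketch, because the pairing would otherwise be ambiguous.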

A new Extended Site is created later, with a storeId of 10751. This action requires updating the existing entry in the database table with the BasePath and storeId of the new store. A semicolon is used as a separator. The additions are the second BasePath value and the second StoreId value, 10751:

DB2

select config from SRCHCONFEXT where indexsubtype='WebContent'
CONFIG
SearchServerName=search_host_name,SearchServerPort=3737,
BasePath=c:\WebSphere\AppServer\profiles\demo\installedApps\WC_demo_cell\WC_demo.ear\Stores.war\MadisonsStorefrontAssetStore\StaticContent\en_US\;
c:\WebSphere\AppServer\profiles\demo\installedApps\WC_demo_cell\WC_demo.ear\Stores.war\MadisonsStorefrontAssetStore\StaticContent\en_US\,
StoreId=10152;10751
Note: Running the setupSearchIndex utility would result in a similar outcome. However, there are scenarios where it is not recommended to run the utility again, such as when the BasePath of the existing stores is updated by the site content crawler.
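
The manual update can be sketched as a string operation on the CONFIG value. The following Python sketch is an illustration under the assumption that the value is a single comma-separated string of key=value fields. The function name is hypothetical, and the sketch does not read from or write to the database.

```python
def add_store_to_config(config: str, base_path: str, store_id: str) -> str:
    """Append one BasePath and one StoreId to a CONFIG value, separated by semicolons."""
    updated = []
    seen = set()
    for part in config.split(","):
        part = part.strip()
        if not part:
            continue  # ignore empty pieces
        key, _, value = part.partition("=")
        if key == "BasePath":
            value = value + ";" + base_path
            seen.add(key)
        elif key == "StoreId":
            value = value + ";" + store_id
            seen.add(key)
        updated.append(key + "=" + value)
    if seen != {"BasePath", "StoreId"}:
        raise ValueError("CONFIG value must contain both BasePath and StoreId")
    return ",".join(updated)
```

Appending both values in the same call keeps the BasePath list and the StoreId list the same length, so that each path stays aligned with its store.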

Indexing Extended Site store-specific static content pages

An Extended Site store can contain its own specific static content pages, where the BasePath points to its own manifest.txt file. It can also share its Asset store static content pages, where the BasePath points to the Asset store manifest.txt file. It can also do both.

For example, the store with storeId 10152 is an Extended Site that originally shares its Asset store static content pages with a second store. A business requirement calls for extra static content pages that are specific to the first store only. As a result, the first store shares some common static content pages from the Asset store and also has pages that are specific to it.

The original configuration:

DB2

select config from SRCHCONFEXT where indexsubtype='WebContent'
CONFIG
SearchServerName=search_host_name,SearchServerPort=3737,
BasePath=c:\WebSphere\AppServer\profiles\demo\installedApps\WC_demo_cell\WC_demo.ear\Stores.war\MadisonsStorefrontAssetStore\StaticContent\en_US\;
c:\WebSphere\AppServer\profiles\demo\installedApps\WC_demo_cell\WC_demo.ear\Stores.war\MadisonsStorefrontAssetStore\StaticContent\en_US\,
StoreId=10152;10751

Later, the first store's directory is created and the specific static content pages are placed under its store directory. Then, the SRCHCONFEXT configuration is updated to include the BasePath of the new manifest.txt file for the specific pages. The additions are the third BasePath value and the third StoreId value:

DB2

select config from SRCHCONFEXT where indexsubtype='WebContent'
CONFIG
SearchServerName=search_host_name,SearchServerPort=3737,
BasePath=c:\WebSphere\AppServer\profiles\demo\installedApps\WC_demo_cell\WC_demo.ear\Stores.war\MadisonsStorefrontAssetStore\StaticContent\en_US\;
c:\WebSphere\AppServer\profiles\demo\installedApps\WC_demo_cell\WC_demo.ear\Stores.war\MadisonsStorefrontAssetStore\StaticContent\en_US\;
c:\WebSphere\AppServer\profiles\demo\installedApps\WC_demo_cell\WC_demo.ear\Stores.war\StoreA\StaticContent\en_US\,
StoreId=10152;10751;10152
As a result, the first store shares the common pages that are located under its Asset store, and is also able to use its own specific static content pages.
Note: If a static content page with the same name exists under both the Asset store and the Extended Site store, the last one overrides the first one and is displayed in the storefront.
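
The override behavior can be pictured as a later entry replacing an earlier entry with the same page name. The following Python sketch is an illustration only. It assumes that "last" means the later BasePath in the CONFIG entry for the same storeId, and it takes the page names under each BasePath as a plain dictionary instead of reading manifest.txt. All names in it are hypothetical.

```python
from typing import Dict, List, Tuple


def effective_pages(
    pairs: List[Tuple[str, str]],
    pages_by_base_path: Dict[str, List[str]],
    store_id: str,
) -> Dict[str, str]:
    """Map each page name visible to store_id to the BasePath that supplies it."""
    resolved: Dict[str, str] = {}
    for base_path, owner in pairs:
        if owner != store_id:
            continue  # this BasePath belongs to another store
        for page in pages_by_base_path.get(base_path, []):
            # A later BasePath replaces an earlier one for the same page name.
            resolved[page] = base_path
    return resolved


pairs = [("AssetStore", "10152"), ("AssetStore", "10751"), ("StoreA", "10152")]
pages = {
    "AssetStore": ["help.html", "contact.html"],
    "StoreA": ["help.html", "promo.html"],
}
store_10152_pages = effective_pages(pairs, pages, "10152")
```

In this example, store 10152 takes help.html and promo.html from StoreA and contact.html from AssetStore, while store 10751 sees only the AssetStore pages.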