Data Load best practices for Catalog

The following best practices are recommended when you use the Data Load utility to load catalog data.

General

Ensure that you load catalog, category, and catalog entry objects into the store that owns the objects. When you use the Data Load utility to load catalog, category, or catalog entry objects, ensure that you specify the store that owns these objects. Specify this ownership in the data load order configuration file. When the Data Load utility creates, replaces, or deletes an object, it uses the store owner identifier to resolve the object identity and then runs the requested operation.

Catalog

Initialize the attribute dictionary. The attribute dictionary attribute model is more efficient than assigning unique attributes to each product. The attribute dictionary facilitates data sharing and uses fewer rows in the database. To initialize it, set the following property in the catalog data load configuration file:
<_config:property name="initAttributeDictionary" value="true" />

Catalog Entry

Use the right mediator for the purpose. The CatalogEntryMediator is general enough to load catalog entries, their descriptions, relationships, prices, calculation codes, attributes, SEO data, and other related information within one input file. However, if you are loading only one kind of information, such as descriptions, use the mediator for that information, in this case the CatalogEntryDescriptionMediator. The specific mediators are more focused and more efficient.

Use the Data Load utility in update mode to load minor changes for catalog entries and catalog entry descriptions. You can use the Data Load utility in update mode for loading only catalog entry or catalog entry description information. In update mode, the utility compares the catalog entry data in the input file with the corresponding data for the catalog entries in the database. The utility then replaces or adds data for only the columns that are specified in the input file. All other columns remain unchanged. For more information about configuring and running the Data Load utility in update mode, see Scenario: Catalog entry update load.
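The column-level behavior of update mode can be pictured with a small model. The following Python sketch is only an illustration of the semantics described above, not the utility's implementation; the function name and the column names in the usage note are hypothetical.

```python
def apply_update(existing_row, input_row):
    """Model of update mode: replace or add only the columns present in input_row.

    existing_row stands in for the data already in the database and input_row
    for one record of the input file. Both are plain dictionaries here.
    """
    updated = dict(existing_row)   # copy, so the original row is left untouched
    updated.update(input_row)      # columns absent from input_row keep their values
    return updated
```

For example, applying an input record that contains only PartNumber and Name to an existing row that also has a ShortDescription column changes Name and leaves ShortDescription as it was.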

Use the Data Load utility to load catalog entry SEO information. If your site uses SEO and you plan to load SEO URLs with the Data Load utility, you must include the loadSEO parameter with its value set to "true". Set this parameter in the data load order configuration file in the following format:
<_config:property name="loadSEO" value="true"/>
When you run the utility to load SEO information, you must include the Dinstance parameter and value to identify your instance. Ensure that you have the files for the instance that you specify in the utilities_root/instances/instance_name directory and that you have access to the files. When SEO is first enabled for a store, the seourlkeywordgen utility is used to generate SEO URLs and keywords for your store. However, after this initial generation of SEO URLs and keywords, use the Data Load utility with the loadSEO parameter enabled to load SEO URL information instead of reusing the seourlkeywordgen utility. To load the SEO-related information when you are loading catalog entry data with the Data Load utility, your input files must include the SEO information along with the catalog data. For more information about structuring your catalog entry input files to include SEO information, see CatalogEntrySEO.
Note: When you set a value to be loaded into the SEOURLKEYWORD.URLKEYWORD database column for a catalog entry, ensure that you do not include any invalid characters in the value. The list of invalid characters for an SEO URL keyword is configured with the invalidURLCharactersList property in the infrastructure component configuration (wc-admin-component.xml) file. For more information about this property, see Configuration properties in the infrastructure component.
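A check along the following lines can flag invalid keywords in an input file before the load runs. This is a minimal sketch: the character set shown is a hypothetical placeholder, not the product default, and the function name is an assumption. Use the value of the invalidURLCharactersList property from your own configuration.

```python
# Hypothetical placeholder; replace with the characters configured in the
# invalidURLCharactersList property of your environment.
INVALID_URL_CHARACTERS = set("?&#% /")


def find_invalid_characters(keyword, invalid_characters=INVALID_URL_CHARACTERS):
    """Return the sorted list of invalid characters that appear in keyword."""
    return sorted(set(keyword) & set(invalid_characters))
```

An empty list means that the keyword contains none of the configured characters.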

Use the Data Load utility to create the parent product before you load the child SKU. You can use the catalog entry load to create the parent product while you create the child SKUs. To improve performance, however, it is best to load the specific parent product before you load the child SKUs. By loading the parent product first, you can skip the task of caching the parent products and related attributes.

Delete the old parent category relationship before you move a catalog entry to another parent category. To move a catalog entry from one parent category to another parent category, the old parent-child relationship must be deleted first. Only then can you create a new relationship. For example, to move the catalog entry with part number "Test-PN-10001" from category "Accessory" to "Pants", two records are needed in the input file:
PartNumber,ParentGroupIdentifier,Sequence,Delete
Test-PN-10001,Accessory,2,1
Test-PN-10001,Pants,1,0
The second row deletes the old relationship and the third row adds the new parent-child relationship. The data loader configuration for this scenario:
<_config:mapping xpath="CatalogEntryIdentifier/ExternalIdentifier/PartNumber" value="PartNumber" />
<_config:mapping xpath="ParentCatalogGroupIdentifier/ExternalIdentifier/GroupIdentifier" value="ParentGroupIdentifier"  /> 
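When many catalog entries are moved at once, the delete and add records can be generated rather than typed by hand. The following Python sketch produces the two records shown above in the required order; the function names are hypothetical, and the column names are taken from the example input file.

```python
import csv
import io

FIELDNAMES = ["PartNumber", "ParentGroupIdentifier", "Sequence", "Delete"]


def build_move_rows(part_number, old_parent, old_sequence, new_parent, new_sequence):
    """Return the delete record for the old parent, then the add record for the new one."""
    return [
        # Delete=1: remove the existing parent-child relationship first.
        {"PartNumber": part_number, "ParentGroupIdentifier": old_parent,
         "Sequence": old_sequence, "Delete": 1},
        # Delete=0: create the relationship with the new parent category.
        {"PartNumber": part_number, "ParentGroupIdentifier": new_parent,
         "Sequence": new_sequence, "Delete": 0},
    ]


def to_csv(rows):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling to_csv(build_move_rows("Test-PN-10001", "Accessory", 2, "Pants", 1)) reproduces the example input file.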

Specify the relationship types when you load the child catalog entries to bundle or kit relationships. When you load the child catalog entries to bundle or kit relationships, the type of relationship is an optional field in the input file. However, you can provide this field in the input file to optimize data load performance. If this field is not provided, the Data Load utility retrieves the catalog entry type from the database. Depending on the catalog entry type, the corresponding relationship type is created.

Use the mark for delete option when you delete a catalog entry. The default behavior for deleting a catalog entry is to mark the catalog entry for deletion. The mark for delete flag of the catalog entry in the database is set to '1', and the catalog entry is not physically deleted from the database. Although you can change this default delete option to physically delete a catalog entry, it is recommended that you use the default mark for delete option. This default ensures that any order items that refer to the deleted catalog entry are not removed as a result of the database cascade delete.

HCL Commerce Enterprise: Do not load catalog entries into an extended site store with part numbers that duplicate inherited catalog entry part numbers. When you load new catalog entries into an extended site store, the new catalog entries can have part numbers that duplicate the part numbers of inherited catalog entries. An extended site store catalog entry and a catalog asset store catalog entry can have the same part number because the catalog entries belong to different stores. If duplicate part numbers exist, store functions that retrieve catalog entries by only the part number can behave unexpectedly or can result in an error. For example, if a store function uses the part number to retrieve a single catalog entry and instead finds two catalog entries with the same part number, an error can occur. Ensure that the part numbers for the catalog entries that you are loading do not exist for any inherited catalog entries. If your extended site store does include duplicate part numbers, you can use the Catalogs tool to change the part numbers for your extended site store catalog entries.
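A simple pre-load comparison can reveal such duplicates. The following Python sketch assumes that you have already obtained the inherited part numbers as a list, for example from an export of the catalog asset store; how you obtain that list is outside the scope of the sketch, and the function name is hypothetical.

```python
def find_duplicate_part_numbers(new_part_numbers, inherited_part_numbers):
    """Return the sorted part numbers that appear in both collections."""
    inherited = set(inherited_part_numbers)
    return sorted({part for part in new_part_numbers if part in inherited})
```

If the returned list is not empty, change those part numbers in the input file before you run the load.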

Delete the catalog entry to delete UserData. To remove UserData, delete the entire catalog entry that contains the UserData. You can also load blank fields to the UserData tables.

Avoid circular dependencies

Categories can be defined with parent-child relationship in the catalog. For example, "Lights" could be a parent of "Bulbs", "Bulbs" in turn could be the parent of "Festive Lights," etc. Within this hierarchy, if "Lights" is added as a child of "Bulbs" or "Festive Lights," or as a child of any of the categories further down the same hierarchy, it creates a circular dependency in the data. During data load, when the application tries to build the hierarchy in memory, it can never find leaf nodes and continues in infinite cycles, since one of the parents is added as a child in the same hierarchy. This scenario can lead to memory and performance issues.

To prevent such performance issues, avoid including circular dependencies in the catalog.
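One way to catch a circular dependency before a load is to check the parent-child pairs in the input data with a depth-first search. The following Python sketch is an assumption about how such a check could be written and is not part of the Data Load utility; the function name and the (parent, child) pair format are hypothetical.

```python
def find_cycle(relationships):
    """Return a list of categories that form a cycle, or None if there is none.

    relationships is an iterable of (parent, child) pairs.
    """
    children = {}
    for parent, child in relationships:
        children.setdefault(parent, []).append(child)
        children.setdefault(child, [])

    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {node: UNVISITED for node in children}
    path = []  # categories on the current walk from the starting category

    def visit(node):
        state[node] = IN_PROGRESS
        path.append(node)
        for child in children[node]:
            if state[child] == IN_PROGRESS:
                # The child is already on the current path: a cycle is closed.
                return path[path.index(child):] + [child]
            if state[child] == UNVISITED:
                cycle = visit(child)
                if cycle:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for node in children:
        if state[node] == UNVISITED:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

With the example above, adding "Lights" as a child of "Festive Lights" makes the function return the path from "Lights" through "Bulbs" and "Festive Lights" back to "Lights". The search is recursive, so an extremely deep hierarchy might require an iterative variant.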

Category (catalog group)

Delete the old parent category relationship before you move a category to another parent category. When you move a category from one parent group to another parent group, delete the category from the old parent group before you add it to the new parent group. For example, to move group "Accessory" from old parent group "Womens Fashions" to new parent group "Mens Fashions", two rows must be specified in the input file. The second row deletes the old relationship, and the third row adds the new relationship.

GroupIdentifier,ParentGroupIdentifier,Sequence,Delete
Accessory,Womens Fashions,2,1
Accessory,Mens Fashions,3,0
The data loader configuration for this scenario:
<_config:mapping xpath="CatalogGroupIdentifier/ExternalIdentifier/GroupIdentifier" value="GroupIdentifier" />
<_config:mapping xpath="topCatalogGroup" value="TopGroup" />
<_config:mapping xpath="ParentCatalogGroupIdentifier/ExternalIdentifier/GroupIdentifier" value="ParentGroupIdentifier" />
<_config:mapping xpath="displaySequence" value="Sequence"  />
<_config:mapping xpath="" value="Delete"  deleteValue="1"/>

Use the Data Load utility to load category SEO information. If your site uses SEO and you plan to load SEO URLs with the Data Load utility, you must include the loadSEO parameter with its value set to "true". Set this parameter in the data load order configuration file in the following format:
<_config:property name="loadSEO" value="true"/>
When you run the utility to load SEO information, you must include the Dinstance parameter and value to identify your instance. When SEO is first enabled for a store, the seourlkeywordgen utility is used to generate SEO URLs and keywords for your store. However, after this initial generation of SEO URLs and keywords, use the Data Load utility with the loadSEO parameter enabled to load SEO URL information instead of reusing the seourlkeywordgen utility. To load the SEO-related information when you are loading category data with the Data Load utility, your input files must include the SEO information along with the catalog data. For more information about structuring your category input files to include SEO information, see CatalogGroupSEO.
Note: When you set a value to be loaded into the SEOURLKEYWORD.URLKEYWORD database column for a category, ensure that you do not include any invalid characters in the value. The list of invalid characters for an SEO URL keyword is configured with the invalidURLCharactersList property in the infrastructure component configuration (wc-admin-component.xml) file. For more information about this property, see Configuration properties in the infrastructure component.

When you load category data and the utility generates an SEO URL keyword, the utility can generate a different SEO URL keyword if a duplicate keyword is encountered. When the utility generates an SEO URL keyword for a category, the utility first uses the category name as the SEO URL keyword. If the keyword is already used by another category, the utility generates a different keyword with the category name and identifier. If that keyword is still not unique, the utility then generates a keyword with the category name, identifier, and language ID. For example, if you are loading data for a category "Shirts", the utility first attempts to generate the SEO keyword "Shirts". If another category already uses this keyword, the utility then attempts to generate a keyword that also includes the category identifier, such as 10001. If this alternate keyword, "Shirts10001", is also used by another category, the utility then includes the language ID, "-1", to generate the keyword, "Shirts10001-1". For more information about generating SEO URL keywords when duplicate keywords exist, see Creating descriptive storefront URLs when duplicate keywords exist.
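The fallback order described above can be sketched as follows. This Python function only models the three candidates named in the text and is not the utility's code; the function name, the parameters, and the None result when all three candidates are taken are assumptions.

```python
def generate_seo_keyword(name, identifier, language_id, used_keywords):
    """Return the first unused candidate keyword, trying the documented order."""
    candidates = [
        name,                                  # category name only
        f"{name}{identifier}",                 # name and identifier
        f"{name}{identifier}{language_id}",    # name, identifier, and language ID
    ]
    for candidate in candidates:
        if candidate not in used_keywords:
            return candidate
    # Assumption: the source does not describe what happens when all
    # three candidates are already in use, so this sketch returns None.
    return None
```

For the category "Shirts" with identifier 10001 and language ID -1, the candidates are "Shirts", "Shirts10001", and "Shirts10001-1", in that order.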

Load data for multiple languages in separate files. Although the Data Load utility supports loading categories with multi-language descriptions in one input file, it is recommended that you load each language in its own input file. Input files for different languages can use different encoding settings, and separate files are easier to manage. For example, double-byte languages such as Chinese can require different encoding settings.

Do not use the Data Load utility to load category hierarchy changes for linked categories. The Data Load utility and the Catalog Upload feature in Management Center cannot handle loading hierarchy data for linked categories. If you use the Data Load utility or Catalog Upload to change the hierarchy for a category, the load process does not synchronize the data for any linked categories. For example, if you delete a child category or change the parent category, the changes are not reflected in any linked categories. The linked categories continue to have the original hierarchy. If the linked categories are not updated with the changes separately, you can encounter errors when you browse the categories on your store. You can use the Data Load utility or Catalog Upload to add or remove catalog entries from a category. When you load changes to the catalog entry assignments for a category, the load process does synchronize the changes across any linked categories.

Attribute

Use the attribute dictionary for attributes. The attribute dictionary facilitates data sharing and uses fewer rows in the database. For more information, see Attribute dictionary. When you use the attribute dictionary, it is recommended that you follow the best practices. For more information, see Best practices for using the attribute dictionary.

Load attributes and allowed values together. Attributes and allowed values can be loaded together or separately. For simplicity and manageability, it is recommended that you load them together in one input file. The separate attribute value load is intended for more granular loading. Use this separate load when you must load details (such as Field1, Field2, Field3, Image1, Image2) for each individual attribute value.

Load catalog entry and attribute relationships by loading the SKU and attribute value relationship. After you load the attributes and allowed values into the attribute dictionary, load the relationship between the product SKU item and the attribute value directly. The relationship between product and attribute is automatically handled by the data load mediator.

Configure the Data Load utility to reuse assigned values. You can enable the Data Load utility to share attribute assigned values when the same value is needed for multiple catalog entries. By sharing attribute assigned values across catalog entries, you can reduce the number of duplicate values that the utility creates in the database. Reducing the number of duplicate values can improve the performance of retrieving attribute information from the database. When you enable the utility to reuse assigned values, the utility creates only the first instance of a value that is included in the input file. The utility then reuses that value for all other instances where it loads the same value for an attribute. For more information about enabling the reuse of assigned attribute values, see Reuse attribute assigned values with the Data Load utility.