Overview of the Data Extract utility

You can run the utility in the staging and production environments, but run it in an environment that has the information that you need to extract. For example, the staging environment might not have inventory or pricing information for a catalog entry. In this case, run the utility in the production environment.

This utility uses the Data Load utility framework and follows a similar interaction process:

The configured data reader for the utility reads the data that is to be extracted from the database and returns the data to the business object builder.
The business object builder populates a business object that is based on the data that is passed from the data reader. The business object builder passes the object to the business object mediator.
The business object mediator transforms the business object into a list of map objects that is then passed to the data writer.
The data writer then generates the configured output file and writes the list of CSV or XML objects into the output file.
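
The four stages above can be pictured as a simple chain. The following is a minimal conceptual sketch in Python, not the utility's actual classes or API: every name in it (`read_rows`, `build_object`, `mediate`, `write_csv`, `BusinessObject`) and the sample rows are hypothetical and exist only to show how data moves from the reader to the writer.

```python
import csv
import io
from dataclasses import dataclass

# All names here are hypothetical stand-ins, not the utility's real classes.


@dataclass
class BusinessObject:
    identifier: str
    name: str


def read_rows():
    """Data reader role: stands in for rows read from the database."""
    yield {"ID": "10001", "NAME": "Summer Sale"}
    yield {"ID": "10002", "NAME": "Free Shipping"}


def build_object(row):
    """Business object builder role: populates an object from a reader row."""
    return BusinessObject(identifier=row["ID"], name=row["NAME"])


def mediate(obj):
    """Business object mediator role: turns the object into a list of maps."""
    return [{"identifier": obj.identifier, "name": obj.name}]


def write_csv(maps, out):
    """Data writer role: writes the list of maps as CSV records."""
    writer = csv.DictWriter(
        out, fieldnames=["identifier", "name"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(maps)


def run_extract():
    """Chain the four roles: reader -> builder -> mediator -> writer."""
    out = io.StringIO()
    maps = []
    for row in read_rows():
        maps.extend(mediate(build_object(row)))
    write_csv(maps, out)
    return out.getvalue()
```

Calling `run_extract()` returns the CSV text that a real writer would place in the configured output file.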

The Data Extract utility supports two methods for extracting data: an SQL-based extraction and a business logic-based extraction. The method that you configure the utility to use depends on the type of data that you want to extract.

If you want to extract promotions, marketing, or Commerce Composer objects, you must use the SQL-based extraction.
If you want to extract catalog data to generate Enterprise Product Report (EPR) data for use with IBM Product Recommendations, you must use the business logic-based extraction.

SQL-based extraction

The SQL-based extraction uses a direct database connection and SQL statements to extract data. Unless you are extracting data for use with IBM Product Recommendations, or data that cannot be directly retrieved from the database, use this SQL-based approach. The SQL-based extract process improves the performance and flexibility of the utility in comparison to the business logic-based extraction method.

The SQL-based process can also reduce the implementation cost of customizing the utility to extract data that the utility does not support by default. By default, the utility supports extracting the following types of data with the SQL-based extraction process:

Promotions
Commerce Composer objects, such as widgets, layouts, layout templates, and pages
Marketing objects, such as activities, e-Marketing Spots, content, campaigns, attachments, and customer segments

To configure the utility to use an SQL-based extract process instead of the business logic process, configure the utility to use the following classes:

UniqueIdReader: This data reader class adds support for the utility to use SQL statements to retrieve the unique ID value for a business object. The data reader class can then send a map object for the business object to the business object builder.
AssociatedObjectMediator: This business object mediator adds support for the utility to use SQL statements to retrieve the detailed business object information for the map object. The mediator can then send an updated map object that contains the detailed business object information to the configured data writer class.
CSVWriter: A data writer class that can convert the map objects that are sent by the business object mediator into a CSV formatted record. This writer class can then write the record into the configured output CSV file. Use either this data writer class or the XmlWriter data writer class.
XmlWriter: A data writer class that can convert the map objects that are sent by the business object mediator into an XML formatted element. This writer class can then write the element and any subelements into the configured output XML file. Use either this data writer class or the CSVWriter data writer class.
ValueHandler: This interface provides a customization point that you can use when the utility cannot retrieve data directly from the database. You can also use this interface when you need to modify data before the data writer class writes the data into the output file.
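
The division of work among these classes can be illustrated with a small Python sketch that uses an in-memory SQLite database. It is only an illustration under stated assumptions: the `DEMO_PROMO` table, its columns, and the functions `read_unique_ids`, `load_details`, and `status_handler` are hypothetical and do not reflect the WebSphere Commerce schema or the signatures of the classes listed above. The sketch shows one SQL statement that returns only unique IDs, a second statement that retrieves the details for each ID, and a handler that modifies a value before the record is written.

```python
import csv
import io
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical DEMO_PROMO table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE DEMO_PROMO "
        "(PROMO_ID INTEGER PRIMARY KEY, NAME TEXT, STATUS INTEGER)"
    )
    conn.executemany(
        "INSERT INTO DEMO_PROMO VALUES (?, ?, ?)",
        [(1, "Summer Sale", 1), (2, "Clearance", 0)],
    )
    return conn


def read_unique_ids(conn):
    """Reader role: one SQL statement returns only the unique IDs."""
    rows = conn.execute(
        "SELECT PROMO_ID FROM DEMO_PROMO ORDER BY PROMO_ID"
    ).fetchall()
    return [{"PROMO_ID": promo_id} for (promo_id,) in rows]


def load_details(conn, id_map):
    """Mediator role: a second SQL statement adds the details to the map."""
    name, status = conn.execute(
        "SELECT NAME, STATUS FROM DEMO_PROMO WHERE PROMO_ID = ?",
        (id_map["PROMO_ID"],),
    ).fetchone()
    return {**id_map, "NAME": name, "STATUS": status}


def status_handler(value):
    """Value handler role: modify a value before it is written."""
    return "Active" if value == 1 else "Inactive"


def extract_to_csv(conn):
    """Writer role: convert each detailed map into a CSV record."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["PROMO_ID", "NAME", "STATUS"], lineterminator="\n"
    )
    writer.writeheader()
    for id_map in read_unique_ids(conn):
        record = load_details(conn, id_map)
        record["STATUS"] = status_handler(record["STATUS"])
        writer.writerow(record)
    return out.getvalue()
```

Keeping the ID query separate from the detail query means that the reader stays cheap, and the per-object detail lookup and any value handling can change without touching the reader.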

For more information about configuring the Data Extract utility to use these classes and the SQL-based extraction process, see Configuring and running the Data Extract utility. When you configure the utility, copy and edit the provided sample configuration files to help you configure and run the utility quickly.

Business logic-based extraction

This approach uses business logic to fetch the data, similar to the behavior of the existing web services. The configured data reader class for the utility uses the catalog web service to retrieve data in the catalog business object (noun) format. The business object builder class does not populate any data in this process. Instead, the builder class passes the noun objects from the data reader class to the business object mediator class. The mediator class then extracts the data from the business object to build a map object. The data writer then converts the map object into CSV formatted output files, such as EPCMF and ECDF files for use with IBM Product Recommendations.

This business logic approach is useful when data cannot be directly retrieved from the database. For example, complicated business logic might be needed to compute the data, such as when you extract pricing data that uses price rules. To extract this pricing data, logic is needed to apply the price rules before the catalog entry prices can be determined, extracted, and written to an output file. With this approach, you do not need to reimplement the logic that is used to load or create the data in order to extract the data.
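
To see why a stored value alone can be insufficient, consider the following minimal Python sketch. It is purely illustrative and does not represent how WebSphere Commerce price rules are defined or evaluated: the SKUs, the stored prices, and the rule functions `percent_off` and `minimum_price` are all hypothetical. The point is only that the extracted price is the result of applying logic to the stored value, so a plain SQL read of that value would not produce the correct output.

```python
from decimal import Decimal

# Hypothetical stored values: only an input to the final price.
STORED_PRICES = {"SKU-1": Decimal("100.00"), "SKU-2": Decimal("40.00")}


def percent_off(percent):
    """Build a hypothetical rule that discounts a price by a percentage."""
    factor = (Decimal(100) - Decimal(percent)) / Decimal(100)

    def rule(price):
        return (price * factor).quantize(Decimal("0.01"))

    return rule


def minimum_price(floor):
    """Build a hypothetical rule that never lets a price drop below a floor."""
    floor = Decimal(floor)

    def rule(price):
        return max(price, floor)

    return rule


# Rules are applied in order; the order is part of the business logic.
PRICE_RULES = [percent_off(10), minimum_price("38.00")]


def resolve_price(sku):
    """Apply every rule to the stored price to get the price to extract."""
    price = STORED_PRICES[sku]
    for rule in PRICE_RULES:
        price = rule(price)
    return price


def extract_prices():
    """Build the map objects that a data writer would receive."""
    return [
        {"sku": sku, "price": str(resolve_price(sku))}
        for sku in sorted(STORED_PRICES)
    ]
```

In this sketch the extracted price for `SKU-2` is 38.00, a value that is stored nowhere and exists only after the rules run.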

This approach, however, has a few disadvantages:

The approach can cause the performance of the extraction process to be slow. The logic-based services for retrieving data are intended to retrieve a single business object or a list of business objects. If any of the business objects are large, however, the performance can be slow.
Customizing the extraction process requires significant effort to retrieve custom data or data that is not supported for extracting by default. If you need to extract custom data or data that is not supported for extracting with the utility, you must implement your own custom services to extract the data.

Configuration files for the Data Extract utility

The Data Extract utility uses three types of configuration files. Samples of each type of file are provided, but you must update the sample files with configuration information specific to your environment. These configuration files are based on the Data Load utility configuration files, but include some extensions.

wc-dataextract.xml

This file is the order configuration file that you must point to when you run the Data Extract utility. This file specifies the paths to the environment configuration file and to the business object configuration file.

wc-dataextract-env.xml

The environment configuration file, which includes the environment variables for your WebSphere Commerce instance. These variables include the following information:

Business context variables, including the store identifier, catalog identifier, and the default language and currency for your store.
Database environment settings, including the database type, name, and schema.

wc-dataextract-business_object.xml

The business object configuration file, which configures how the utility identifies the data to extract for a specific business object. By default, sample business object configuration files are provided for extracting data for the following types of objects with the SQL-based extraction process:

Commerce Composer objects
Sample configuration files for extracting Commerce Composer widgets, layouts, templates, and pages. The files are configured to generate CSV files that can be used with the Data Load utility.
Promotions
The sample configuration files for extracting promotion data are configured to generate an XML file that can be used with the Data Load utility.
Marketing objects
Sample configuration files are provided for extracting marketing activities, campaigns, content, attachments, customer segments, and e-Marketing Spots. The files are configured to generate CSV files that can be used with the Data Load utility.

These files include the following information:

Business context information.
Data mappings that are required to transform WebSphere Commerce business objects to the data that can be written in the output file.
Definitions for the order in which the utility writes the data to the columns in the file.
Pointers to interfaces and implementation classes that the utility uses to extract and transform the data.
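
The role of the data mappings and the column order can be sketched as follows. This Python example is a conceptual illustration only and does not use the real configuration file syntax: the `COLUMN_MAPPING` list, the column names, and the shape of the sample business object are assumptions made for the example. Each entry pairs an output column with the location of its value in the business object, and the order of the entries determines the order of the columns in the output file.

```python
import csv
import io

# Hypothetical mapping: (output column name, path into the business object).
# The list order is the column order in the output file.
COLUMN_MAPPING = [
    ("ActivityName", "name"),
    ("StoreId", "context.store"),
    ("Language", "context.language"),
]


def resolve(obj, path):
    """Follow a dotted path through nested dictionaries."""
    for key in path.split("."):
        obj = obj[key]
    return obj


def to_record(obj):
    """Apply the mapping to one business object to get one output record."""
    return {column: resolve(obj, path) for column, path in COLUMN_MAPPING}


def write_records(objects):
    """Write the records with columns in the order that the mapping defines."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=[column for column, _ in COLUMN_MAPPING],
        lineterminator="\n",
    )
    writer.writeheader()
    for obj in objects:
        writer.writerow(to_record(obj))
    return out.getvalue()
```

Reordering the entries in `COLUMN_MAPPING` changes the column order of the output without any change to the code that reads or transforms the data.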

Note: Sample configuration files are also provided for extracting catalog entry data into an EPCMF file and category data into an ECDF file for use with IBM Product Recommendations. These sample configuration files configure the utility to use the business logic-based extraction method. For more information about configuring the utility to use these sample files, see Data extraction utility for dynamic recommendations in IBM Product Recommendations.

Best practices

When you use the Data Extract utility, there are general configuration recommendations that you can use to ensure that you take advantage of the full capability of the utility. For more information, see Data Extract utility best practices.