Configure the comma-separated values (CSV) data reader in the business object
configuration file to modify the way data is read from CSV source files. You might want to change
the default settings of the CSV data reader to better work with the format of your existing source
data.
About this task
The CSV data reader reads and processes data from an input CSV file one record at a time until
the end of the file is reached. Each record in the CSV file must have the same data structure. The
data read from the CSV file can be mapped to a WebSphere Commerce business object by using a
business object configuration file. Using the configuration file, each column of data in the input
CSV file is mapped directly to a property of a WebSphere Commerce business object.
A CSV file can contain multiple data records, with each record spanning multiple columns. Each
column value for a record is also known as a token. The CSV file must include delimiter characters
to separate tokens within each record and to separate records. The CSV data reader uses these
delimiter characters to identify each record and token.
- Tokens are separated by a tokenDelimiter character. By default, the tokenDelimiter character is
a comma (,). Each token can optionally be enclosed in tokenValueDelimiter
characters, with a tokenValueDelimiter at the beginning and the end of the token. The default
tokenValueDelimiter is the double quotation mark character ("). If a token value
contains a special character, such as the tokenDelimiter or the lineDelimiter character, the token
must be enclosed in tokenValueDelimiter characters. As an example, the following string includes
commas and is enclosed in tokenValueDelimiter characters:
"Men's fashions for business, casual, and formal occasions"
- Records are separated by a lineDelimiter character, which can also be called a record delimiter.
By default, the lineDelimiter character is the new line character. This character indicates the end
of a record for an object and the beginning of a new object record.
Since the default character
is a new line character, the CSV data reader reads each line in the file as a separate object
record. If you include data for a column or record across multiple lines in your file, you can
encounter errors or issues with the load process or with your data. If you want data for a column to
span multiple lines, enclose the data within the configured tokenValueDelimiter characters. If you
want data for an entire record to span multiple lines, you must configure a different lineDelimiter
character to use instead of the new line character to identify the end of each record.
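As a rough sketch of the last point, the following fragment sets a semicolon as the record delimiter so that a record can continue across several lines of the file. It uses only the lineDelimiter parameter that is described in the procedure that follows. The choice of a semicolon is an assumption for illustration, and the fragment is meant to sit inside an existing wc-loader-<object>.xml file rather than stand alone. Any other attributes that your existing DataReader element already has stay as they are.

```xml
<!-- Sketch only: records end at ";" instead of at the new line character. -->
<!-- A semicolon inside a token must then be enclosed in tokenValueDelimiter characters. -->
<_config:DataReader lineDelimiter=";" />
```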
Procedure
- Open the wc-loader-<object>.xml configuration file in edit mode.
A sample of this file is in the WC_installdir/samples/DataLoad/Catalog directory.
- Find the <_config:DataReader> element.
- Add the following optional parameters inside the <_config:DataReader> tag:
- lineDelimiter
- Specifies the line separator character or record separator character. The default value is the
new line character. The lineDelimiter character cannot appear in the content of a token unless
enclosed within the tokenValueDelimiter character.
Note: If you want records in a CSV file to span
multiple lines, you can configure a custom lineDelimiter character to identify the end of a record.
By configuring a different delimiter character, CSV files can include newline characters within
object records, instead of having the data reader handle each newline character as the end of a
record. For instance, you can configure the lineDelimiter to be a semicolon (;)
instead of the newline character. With this new lineDelimiter character configured, the following
CSV file is considered to have a single object record instead of two records.
Column1, Column2, Column3, Column4, Column5;
Value1,Val
ue2,Value3,Value4,Value5;
The CSV data reader reads this object record as a single record with the value for
Column2 spanning multiple lines.
- tokenDelimiter
- Specifies the token separator character. The default is the comma character (,).
- tokenValueDelimiter
- Specifies the string separator character. The tokenValueDelimiter is used to indicate the
beginning and the end of a token. The default tokenValueDelimiter character is the double quotation
mark ("). For instance, the following token, which contains commas, can be used for a catalog entry
short description:
"Men's fashions for business, casual, and formal occasions"
Notes:
- If you are editing your file with a plain text editor, use the tokenValueDelimiter when your
token contains special characters, such as the tokenDelimiter character or the tokenValueDelimiter
itself. To use the tokenValueDelimiter character within the token, you must use two
tokenValueDelimiter characters. For instance, the following token, which contains commas and
quotation marks, can be used for a catalog entry short description:
"Men's fashions for ""business"", ""casual"", and ""formal"" occasions."
The output can resemble the following string:
Men's fashions for "business", "casual", and "formal" occasions.
These usages of the tokenValueDelimiter apply only when you are using a plain text editor
to edit your file.
- If you want to include column values that span multiple lines within your input file, enclose
the column value within tokenValueDelimiter characters. By enclosing the value within these
characters, you can include the newline character in the column value without causing the data
reader to handle the newline character as the end of the object record.
- charset
- Specifies the character set of the CSV file. The default character set is UTF-8.
- firstLineIsHeader
- Indicates that the first line in the CSV file is column header information. Use this header line
for providing the column mappings in the <_config:Data> element in the
wc-loader-<object>.xml configuration file. The default value is false.
- useHeaderAsColumnName
- Indicates that the first line in the CSV file is used as column information. The default value
for useHeaderAsColumnName is false. There are four possible combinations of the firstLineIsHeader
and useHeaderAsColumnName parameters:
- firstLineIsHeader = "false" and useHeaderAsColumnName = "false". In this case, the
column mappings in the wc-loader-<object>.xml configuration file are mandatory.
- firstLineIsHeader = "false" and useHeaderAsColumnName = "true". In this case, the
useHeaderAsColumnName flag is ignored and the column mappings are mandatory.
- firstLineIsHeader = "true" and useHeaderAsColumnName = "false". In this case, the
column mapping configuration is optional. If the column mapping configuration is defined in the
wc-loader-<object>.xml configuration file, the column mapping configuration is used.
If not, the CSV header is used for the column names.
- firstLineIsHeader = "true" and useHeaderAsColumnName = "true". In this case, the
column mapping configuration is ignored and the CSV header is always used for the column names.
Note: The DataReader element can contain nested elements. To add column
mappings, you can use the following code as an example:
<_config:DataReader firstLineIsHeader="false" useHeaderAsColumnName="false">
<_config:Data>
<_config:column number="1" name="FIRST" />
<_config:column number="2" name="SECOND" />
</_config:Data>
</_config:DataReader>
- Save and close the file.
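The third firstLineIsHeader and useHeaderAsColumnName combination listed in the procedure (header present, column mappings taking precedence) might look like the following sketch. The column names PartNumber and Name are hypothetical placeholders, not names that the data load configuration requires.

```xml
<!-- Sketch only: the first line of the CSV file is a header, but because
     useHeaderAsColumnName is "false" and mappings are defined, the mappings
     below supply the column names. -->
<_config:DataReader firstLineIsHeader="true" useHeaderAsColumnName="false">
  <_config:Data>
    <!-- Hypothetical column names for illustration -->
    <_config:column number="1" name="PartNumber" />
    <_config:column number="2" name="Name" />
  </_config:Data>
</_config:DataReader>
```

Setting useHeaderAsColumnName to "true" in the same fragment would cause the mappings to be ignored in favor of the CSV header, as the fourth combination describes.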
Example
The following code snippet demonstrates how to use the parameters. This code snippet uses
all default values:
<_config:DataReader lineDelimiter="\n" tokenDelimiter="," tokenValueDelimiter='"'
charset="UTF-8" firstLineIsHeader="false" useHeaderAsColumnName="false" />
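For contrast, a reader for a pipe-delimited file whose header row supplies the column names could be sketched as follows. The pipe and single quotation mark delimiters are assumptions chosen for illustration; substitute the characters that your source files actually use.

```xml
<!-- Sketch only: non-default delimiters; parameters that are omitted keep their defaults. -->
<_config:DataReader tokenDelimiter="|" tokenValueDelimiter="'"
    firstLineIsHeader="true" useHeaderAsColumnName="true" />
```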