Transforming character-delimited text to XML

About this task

To transform data from a character-delimited format, such as comma-separated values (CSV), to an XML data format:

Procedure

  1. Create a character-delimited format file that contains the data you want to transform.
  2. In a text editor, create an XML schema file that defines how the character-delimited format file maps to an XML file.
    Structure your file similarly to the following example:
    
    <?xml version="1.0" encoding="UTF-8" ?>
    <TextSchema DataType = "CSV Format">
       <RecordDescription
          FieldSeparator = ","
          RecordSeparator = "&#010;&#013;"
          StringDelimiter = "&quot;"
          HeaderIncluded = "true"
          HeaderLines = "1"
          ElementName = "category"
       >
          <FieldDescription FieldName = "categoryName" FieldPosition = "1" />
          <FieldDescription FieldName = "markForDelete" FieldPosition = "2" />
          <FieldDescription FieldName = "field1" FieldPosition = "3" />
          <FieldDescription FieldName = "field2" FieldPosition = "4" />
       </RecordDescription>
    </TextSchema>
    
    Use the following tags and attributes when creating your XML schema file:
    DataType
    Enter a description of the data format of your character-delimited format file.
    RecordDescription
    This tag describes the structure of your character-delimited format file. This tag uses the following attributes to define the file structure:
    FieldSeparator
    This attribute specifies the character or characters separating fields in the character-delimited format file.
    RecordSeparator
    This attribute specifies the character or characters separating records in the character-delimited format file.

    Special characters must be entered as decimal numeric Unicode character references. For example, a line feed or new line (\n) must be entered as &#010; and a carriage return (\r) must be entered as &#013;.

    StringDelimiter
    This attribute specifies the character or characters that enclose each field in a record in your character-delimited format file.
    HeaderIncluded
    Valid values for this attribute are:
    false
    The character-delimited format file does not contain a header line that indicates the field names.
    true
    The character-delimited format file contains a header line that indicates the field names.
    HeaderLines
    This attribute specifies the number of records at the start of the character-delimited format file, as separated by the RecordSeparator, that are not to be considered as data. These lines will not be converted to XML.
    ElementName
    This attribute defines the root element for each record.
    FieldDescription
    This element describes a field in the records in your character-delimited format file. There must be one FieldDescription element for each field in your character-delimited format file. This element uses the following attributes to define each field:
    FieldName
    This attribute defines the name of the field. The field name is used as the name of the child element that holds the field value inside each record element.
    FieldPosition
    This attribute indicates the position of this field in a record. The first field in a record is in field position 1.

    Save your file as an XML file. This file is not an XSD file.

  3. Create the parameters file that specifies the parameters required by the txttransform utility.

    The order of the values in the parameters file is important. The parameters must be separated by commas and appear in the file in the following order:

    Input file
    Name of the character-delimited variable format file to be transformed.
    Schema file
    Name of the XML schema file to be used in the transformation.
    Output file
    Name for the output XML file in which the transformed data will be stored.
    Transformation method
    Method to be used in adding the data to the output file. The following methods are valid:
    Create
    Create a new XML file from the text file.
    Append
    Append new XML data to an existing XML file.
    Encoding
    The character encoding scheme of the input file. Any character encoding scheme supported by Java can be specified.

    For example, your parameters file could consist of the following line of text:

    sample.csv,sample_schema.xml,catalog.xml,Create,UTF8

    This parameters file tells the txttransform utility to create a new XML file (catalog.xml) from a UTF-8 encoded comma-separated value (CSV) file (sample.csv), using the schema defined in the XML schema file (sample_schema.xml).

  4. Run the txttransform utility.
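
The fixed ordering of the parameters file described in step 3 can be checked before the utility is run. The following Python sketch is only an illustration and is not part of the txttransform utility. The names TransformParams and parse_params_line are hypothetical. The sketch assumes that file names contain no commas and that the method names are spelled exactly as listed in step 3.

```python
from typing import NamedTuple


class TransformParams(NamedTuple):
    """The five values of a parameters file, in the required order."""
    input_file: str
    schema_file: str
    output_file: str
    method: str
    encoding: str


# Assumption: the method names are matched exactly as written in step 3.
VALID_METHODS = ("Create", "Append")


def parse_params_line(line: str) -> TransformParams:
    """Split one parameters-file line and check its shape."""
    parts = [part.strip() for part in line.strip().split(",")]
    if len(parts) != 5:
        raise ValueError(f"expected 5 comma-separated values, got {len(parts)}")
    params = TransformParams(*parts)
    if params.method not in VALID_METHODS:
        raise ValueError(f"unknown transformation method: {params.method!r}")
    return params


params = parse_params_line("sample.csv,sample_schema.xml,catalog.xml,Create,UTF8")
print(params.input_file, "->", params.output_file, "using", params.schema_file)
```

Naming the positions makes an ordering mistake, such as swapping the schema file and the output file, easier to spot than in the raw comma-separated line.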

Results

Based on the example XML schema file, each record would have the following XML structure:


<category>
  <categoryName>Category_name_value</categoryName>
  <markForDelete>Marked_for_delete_value</markForDelete>
  <field1>field1_value</field1>
  <field2>field2_value</field2>
</category>
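
The mapping from a schema file to this record structure can be illustrated with a short Python sketch. It is not the txttransform implementation and only mimics the behavior described above. The function name records_to_xml and the sample data are hypothetical, and the schema is a shortened variant of the example that uses two fields and a single line feed as the record separator.

```python
import csv
import xml.etree.ElementTree as ET

# Shortened, hypothetical variant of the example schema file.
SCHEMA = """<TextSchema DataType="CSV Format">
  <RecordDescription FieldSeparator="," RecordSeparator="&#010;"
      StringDelimiter="&quot;" HeaderIncluded="true" HeaderLines="1"
      ElementName="category">
    <FieldDescription FieldName="categoryName" FieldPosition="1" />
    <FieldDescription FieldName="markForDelete" FieldPosition="2" />
  </RecordDescription>
</TextSchema>"""

# Hypothetical input: one header line followed by two records.
CSV_TEXT = 'categoryName,markForDelete\n"Shoes",0\n"Hats, caps",1\n'


def records_to_xml(schema_xml: str, text: str) -> list:
    """Build one XML element per data record, as the schema describes."""
    desc = ET.fromstring(schema_xml).find("RecordDescription")
    # (position, name) pairs, sorted by the 1-based FieldPosition.
    fields = sorted(
        (int(f.get("FieldPosition")), f.get("FieldName"))
        for f in desc.findall("FieldDescription")
    )
    # Split into records and drop empty ones (for example, after a trailing separator).
    records = [r for r in text.split(desc.get("RecordSeparator")) if r]
    # The first HeaderLines records are not data and are skipped.
    data = records[int(desc.get("HeaderLines", "0")):]
    rows = csv.reader(
        data,
        delimiter=desc.get("FieldSeparator"),
        quotechar=desc.get("StringDelimiter"),
    )
    elements = []
    for row in rows:
        record = ET.Element(desc.get("ElementName"))
        for position, name in fields:
            ET.SubElement(record, name).text = row[position - 1]
        elements.append(record)
    return elements


for element in records_to_xml(SCHEMA, CSV_TEXT):
    print(ET.tostring(element, encoding="unicode"))
```

In this sketch the string delimiter lets a field value contain the field separator, so "Hats, caps" stays a single categoryName value.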