Reading a DTD

A Document Type Definition (DTD) describes the content and hierarchy of the Extensible Markup Language (XML) tags that an organization uses to define the data it wants to share across platforms. A DTD consists of the following:

  • Elements and their allowed content
  • Attribute lists for elements
  • Entities that make DTDs more modular
  • Comments that make DTDs more understandable

For a complete listing and description of the various declarations in a DTD, refer to the XML Standard Recommendations Web site at http://www.w3.org/TR/REC-xml.

You can view the Domino DTD file in any text editor. If you accepted the defaults when installing Notes, the path of the file is Notes/domino_6_0.dtd.

Elements

An element is a tag, defined as follows:

<!ELEMENT tagname ( content-model ) >

where tagname is the name of the tag. Tag names are case-sensitive. The content-model is an expression stating what content, including nested elements, can appear inside the element. In an XML document, the content of an element appears between its start and end tags. For example, the viewentry element in the Domino DTD is defined as follows:

<!ELEMENT viewentry ( entrydata* ) >

This element is used in an XML output as follows:

<viewentry children="3">
  <entrydata columnnumber="1">
    <text> Joe Smith </text>
  </entrydata>
</viewentry>
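To see how an application might walk this nesting, here is a minimal sketch that uses Python's standard xml.etree.ElementTree module. The choice of Python and the variable names are assumptions made for illustration; they are not part of Domino.

```python
import xml.etree.ElementTree as ET

# The sample output from this topic, reproduced as a string.
XML_OUTPUT = """<viewentry children="3">
  <entrydata columnnumber="1">
    <text> Joe Smith </text>
  </entrydata>
</viewentry>"""

viewentry = ET.fromstring(XML_OUTPUT)

# Each entrydata element is a direct child of viewentry, as the
# content model ( entrydata* ) allows.
for entrydata in viewentry.findall("entrydata"):
    column = entrydata.get("columnnumber")
    value = entrydata.findtext("text").strip()
    print(column, value)
```

The loop prints the column number and the trimmed text of every entrydata child.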

The operators you can use to structure the content model are:

  • OR ( | ) means exactly one of the listed elements can appear
  • comma ( , ) means the elements must appear in the specified sequence
  • asterisk ( * ) means the element can appear 0 or more times
  • plus ( + ) means the element must appear 1 or more times
  • question mark ( ? ) means the element can appear 0 or 1 time; the element is optional
  • parentheses ( () ) are for grouping elements

The following example shows the use of operators in structuring the content model:

<!ELEMENT acl ( role*, aclentry+, logentry* )>

In this case, an <acl> element can contain any number of <role> elements, followed by one or more <aclentry> elements, followed by any number of <logentry> elements.
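A content model behaves much like a regular expression over the names of the child elements. The following sketch rewrites the acl content model as a Python regular expression to show which child sequences it accepts. The helper name matches_acl_model and the space-terminated encoding of tag names are assumptions made for this illustration; a validating parser does not work this way internally.

```python
import re

# Hypothetical translation of ( role*, aclentry+, logentry* ) into a
# regular expression. Each child tag name is followed by one space.
ACL_MODEL = re.compile(r"(role )*(aclentry )+(logentry )*")

def matches_acl_model(child_tags):
    """Return True if the sequence of child tag names satisfies the model."""
    sequence = "".join(tag + " " for tag in child_tags)
    return ACL_MODEL.fullmatch(sequence) is not None

print(matches_acl_model(["role", "role", "aclentry"]))  # True
print(matches_acl_model(["aclentry", "logentry"]))      # True
print(matches_acl_model(["role", "logentry"]))          # False: no aclentry
print(matches_acl_model(["aclentry", "role"]))          # False: wrong order
```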

Another example of operators is:

<!ELEMENT text ( #PCDATA | break )* >

Here a <text> element can contain any quantity of plain text and <break> elements, in any order. #PCDATA is parsed character data: plain text that the parser scans for markup. For that reason, certain characters, such as an ampersand (&), must be represented by an entity reference, such as &amp;. The following table lists the five internal entities that are predefined in XML. The less-than sign and the ampersand cannot appear literally in parsed character data; the other three characters can be written the same way wherever they would otherwise be mistaken for markup, for example quotation marks inside an attribute value:

Character              XML representation
< (less than)          &lt;
> (greater than)       &gt;
' (apostrophe)         &apos;
" (quotation marks)    &quot;
& (ampersand)          &amp;
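As a minimal sketch of how these replacements are applied in practice, the following Python example escapes a string before placing it inside a <text> element and then parses it back. The sample string is invented for illustration. The escape function in the standard xml.sax.saxutils module handles the ampersand, less-than, and greater-than characters by default; the two quotation characters are passed in as extra mappings.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw = 'Smith & Sons <"quality">'

# escape() replaces &, < and > itself; the dictionary adds the two quote characters.
escaped = escape(raw, {"'": "&apos;", '"': "&quot;"})
print(escaped)  # Smith &amp; Sons &lt;&quot;quality&quot;&gt;

# A parser turns the entity references back into the original characters.
element = ET.fromstring("<text>" + escaped + "</text>")
print(element.text == raw)  # True
```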

Attributes

The attributes for any element are declared in the following manner:

<!ATTLIST tagname
    attr1    type1    default_value1
    attr2    type2    default_value2
    ... >

The attributes, along with their types and default values, are defined for the tagname element.

Consider this example for the element with the tagname book:

<!ATTLIST book
    title     CDATA                                   #REQUIRED
    isbn      CDATA                                   #IMPLIED
    cover     ( hard|paper )                          #IMPLIED
    format    ( bound|looseleaf|scroll|parchment )    "bound" >

This example defines four allowed attributes for the <book> element: title, isbn, cover, and format. If there is no other ATTLIST declaration for <book>, then these are the only attributes allowed. Any others are illegal. The title attribute can have any value, but it must appear because its default_value equals #REQUIRED. A <book> element without a title attribute is not valid according to this DTD. The isbn attribute can have any value, or can be omitted entirely. The cover attribute can have one of two values: "hard" or "paper." Any other value is illegal, but the attribute can also be omitted. The format attribute can have one of four possible values, and if it is omitted, a validating parser will provide "bound" as its value just as if format="bound" had appeared in the XML content.
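The default-value behavior can be observed with a short Python sketch. It assumes the ATTLIST declaration is placed in the internal DTD subset of the document, and it adds a hypothetical <!ELEMENT book EMPTY> declaration and a sample title for completeness. Python's standard-library parser does not validate, so it would not reject a missing title or an illegal cover value, but it does read attribute declarations in the internal subset and supplies declared defaults.

```python
import xml.etree.ElementTree as ET

# The book ATTLIST from this topic, embedded in an internal DTD subset.
DOCUMENT = """<!DOCTYPE book [
  <!ELEMENT book EMPTY>
  <!ATTLIST book
      title   CDATA                                 #REQUIRED
      isbn    CDATA                                 #IMPLIED
      cover   ( hard|paper )                        #IMPLIED
      format  ( bound|looseleaf|scroll|parchment )  "bound" >
]>
<book title="Reading a DTD"/>"""

book = ET.fromstring(DOCUMENT)

print(book.get("title"))   # supplied in the document
print(book.get("format"))  # not in the document; taken from the declared default
print(book.get("isbn"))    # None, because #IMPLIED declares no default
```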

All attributes are fundamentally strings, as shown in this table:

Type or default     Meaning
CDATA               Unparsed character data, which means any text string.
NMTOKEN             A single name token composed of letters, numbers, periods, hyphens, underscores, or colons.
NMTOKENS            A list of name tokens separated by spaces.
ID                  A name that identifies an instance of an element in a document. No two ID attributes in a document can have the same value.
( a | b | c )       "a" or "b" or "c". You can specify any of the following as the default value:
                      • one of the choices from the enumerated list, "a," for example
                      • #REQUIRED, indicating that one of the enumerated choices must be supplied in the document
                      • #IMPLIED, indicating that no value is required
#IMPLIED            Optional. A value is not required and the DTD supplies no default; the application decides how to treat a missing value.
#REQUIRED           The value must be supplied in the document. There is no default value.
'literal'           The default value to use if a value is not specified in the document.
#FIXED 'literal'    The default value to use. If the document specifies a value, it must be equal to this fixed value.

Since attribute values are strings, they are always enclosed by quotation marks in XML. The order of attributes in the ATTLIST declaration is irrelevant (just as it is in XML data), and white space can be used to make the DTD more legible.

Entities

There are three types of XML entities:

  • Internal: Defined in the DTD as replacement values.
  • External: Defined in an external DTD or DTD fragment.
  • Parameter: Expansion macros for simplifying DTDs.

Internal entities can be referenced in an XML document. For instance, you might declare the following internal entity in your DTD:

<!ENTITY productName "2000 Calendar">

Then you could reference it in your .xml file as follows:

<response>Thank you for purchasing the &productName; from us.</response> 

This means that in January of 2001, you could update the DTD to change the value of the &productName; entity to "2001 Calendar" and the change would be reflected in all XML documents that reference that entity.
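The following Python sketch illustrates the substitution. It assumes the entity is declared in the internal DTD subset of the document so that the example is self-contained; the TEMPLATE variable is an invention for this illustration. Changing only the entity declaration changes the text that the parser delivers.

```python
import xml.etree.ElementTree as ET

# The {name} placeholder is filled in by str.format before parsing.
TEMPLATE = """<!DOCTYPE response [
  <!ENTITY productName "{name}">
]>
<response>Thank you for purchasing the &productName; from us.</response>"""

for name in ("2000 Calendar", "2001 Calendar"):
    response = ET.fromstring(TEMPLATE.format(name=name))
    # The parser replaces &productName; with the declared value.
    print(response.text)
```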

The entities defined in the Domino DTD are parameter entities. Parameter entities are entities created to be referenced exclusively within the DTD itself. They facilitate code reusability and decrease the size of the DTD. You cannot reference parameter entities in an .xml file.

The Domino DTD entities serve as macros that are declared as shown:

<!ENTITY % macroname "macrocontent" >

For example:

<!ENTITY % string          "CDATA" >
<!ENTITY % book.formats    "bound|looseleaf|scroll|parchment" >
<!ENTITY % book.covers     "hard|paper" >

The preceding entities are used as follows:

<!ATTLIST book
    title     %string;              #REQUIRED
    isbn      %string;              #IMPLIED
    cover     ( %book.covers; )     #IMPLIED
    format    ( %book.formats; )    "bound" >

The %book.formats; entity, which is defined as a list of formatting options, replaces the list in the attribute list declaration for the book element. A reference to a parameter entity consists of the entity name between a percent sign and a semicolon, and the contents are interpolated like a C macro. Representing the formatting options with an entity makes the DTD more compact because the entity can be used more than once. For instance, suppose this DTD also defined a manual element that is available only in soft cover but in the same formats as a book. The %book.formats; entity could then be reused in the attribute list for the manual element.
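The macro-like interpolation can be pictured with a deliberately simplified Python sketch. The function expand_parameter_entities and the manual declaration are hypothetical and exist only for this illustration. A real XML parser does more than plain text substitution (for example, it also resolves references nested inside entity values), so treat this as a mental model rather than as parser behavior.

```python
import re

# Hypothetical table of the parameter entities declared in this topic.
PARAMETER_ENTITIES = {
    "string": "CDATA",
    "book.formats": "bound|looseleaf|scroll|parchment",
    "book.covers": "hard|paper",
}

def expand_parameter_entities(declaration, entities):
    """Replace each %name; reference with its replacement text."""
    pattern = r"%([A-Za-z_:][\w.:-]*);"
    return re.sub(pattern, lambda match: entities[match.group(1)], declaration)

# Assumed declaration for the manual element described above.
manual = '<!ATTLIST manual format ( %book.formats; ) "bound" >'
print(expand_parameter_entities(manual, PARAMETER_ENTITIES))
```

The printed declaration contains the full list of formats in place of the %book.formats; reference.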

Comments

Comments within a DTD are enclosed as shown:

<!-- comment text, anything except two dashes. -->