This article describes a set of JAXP-based APIs that are meant to facilitate the generation and parsing of XML.

Table of contents

1 Design Principles

2 Categories of functions

2.1 Convenience classes
2.2 Handling of xml:base

3 Access to the service

3.1 Example how to access an implementation
3.2 Example how to access the registry

4 Examples

4.1 Handle links in an XML document with xml:base tags

4.1.1 Reader chain
4.1.2 Handler
4.1.3 Result

4.2 Interoperability with the ATOM_API

Design Principles

This API is intended to help with commonly required tasks based on the JAXP APIs. In particular, the streaming concepts of the JAXP APIs are preserved in this helper API.

Content handler chains

One of the basic design principles of JAXP is the possibility to construct chains of handlers of SAX events. The idea is that each layer in this chain is responsible for exactly one aspect of the event processing and delegates to the following elements in the chain for further processing. Because each layer concentrates on solving just one aspect of the complete processing, it is more likely to be reusable than a handler that implements the complete algorithm.

Clients of such handlers would typically construct a handler chain at initialization time that represents the intended algorithm. This chain would then be reused to process many different XML inputs.
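
As a rough illustration of this pattern, the following sketch builds a two-layer filter chain once and then reuses it for two inputs. It relies only on the standard SAX classes (`XMLFilterImpl`, `XMLReader`); the class names `FilterChainSketch`, `DropDebugFilter` and `UpperCaseFilter` and the `<debug>` element are hypothetical and are not part of the helper API described in this article.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

final class FilterChainSketch {

    /** Hypothetical layer 1: drops every <debug> element together with its content. */
    static final class DropDebugFilter extends XMLFilterImpl {
        private int depth; // greater than 0 while inside a <debug> element

        DropDebugFilter(XMLReader parent) { super(parent); }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) throws SAXException {
            if (depth > 0 || "debug".equals(qName)) { depth++; return; }
            super.startElement(uri, local, qName, atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            if (depth > 0) { depth--; return; }
            super.endElement(uri, local, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (depth == 0) super.characters(ch, start, length);
        }
    }

    /** Hypothetical layer 2: upper-cases element names and leaves everything else alone. */
    static final class UpperCaseFilter extends XMLFilterImpl {
        UpperCaseFilter(XMLReader parent) { super(parent); }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) throws SAXException {
            super.startElement(uri, local, qName.toUpperCase(Locale.ROOT), atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            super.endElement(uri, local, qName.toUpperCase(Locale.ROOT));
        }
    }

    /** Initialization time: parser -> DropDebugFilter -> UpperCaseFilter. */
    static XMLReader buildChain() {
        try {
            XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            return new UpperCaseFilter(new DropDebugFilter(parser));
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runtime: pushes one input through the existing chain and collects the element names seen. */
    static List<String> elementNames(XMLReader chain, String xml) {
        final List<String> names = new ArrayList<>();
        chain.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                names.add(qName);
            }
        });
        try {
            chain.parse(new InputSource(new StringReader(xml)));
        } catch (IOException | SAXException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        XMLReader chain = buildChain();
        System.out.println(elementNames(chain, "<a><debug><x/></debug><b/></a>")); // prints [A, B]
        System.out.println(elementNames(chain, "<feed><entry/></feed>"));          // prints [FEED, ENTRY]
    }
}
```

Each layer knows nothing about the other, so either one could be reused in a different chain.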

This design pattern implies that it is possible to concatenate a chain of handlers and to associate different XML inputs at runtime. In JAXP there exist two different modes of processing:

parsing: an XML parser is represented by the XMLReader interface. A chain of parsers that filter the events during input is represented by XMLFilter, an extension of XMLReader. Clients construct the parser filter chain using the constructors of the individual filter elements. For each different XML input, clients either specify a new InputSource object and reuse the existing filter chain, or call setParent on the XMLFilter to associate a completely different XML parser with the chain. The individual chain elements are advised to route the setParent call to their outermost parent, which keeps the chain intact and reassociates only the root parser.
handling: a parser produces SAX events and passes them to a ContentHandler. The second possibility is to filter events during this handling process: the ContentHandler that is attached to an XMLReader can itself be a handler chain. Again, this chain is built using the constructors of the individual chain elements, and each layer in the chain dispatches the events to its next layer in addition to performing its layer-specific logic. JAXP, however, has no extension to ContentHandler that allows the target (leaf) element of this handler chain to be reassociated, which is required to reuse a chain over multiple XML streams. The ResettableContentHandler extension closes this gap.
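
The gap on the handling side can be sketched with plain SAX classes. The article does not show the method signatures of ResettableContentHandler, so the interface `Retargetable` and its `setTarget` method below are hypothetical stand-ins, as are the layer classes; the sketch only illustrates the idea of routing a "new target" request through the chain to its last layer.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

final class ResettableChainSketch {

    /** Hypothetical stand-in for the article's ResettableContentHandler. */
    interface Retargetable extends ContentHandler {
        void setTarget(ContentHandler target);
    }

    /** Generic chain layer; XMLFilterImpl already forwards every event to its content handler. */
    static class Layer extends XMLFilterImpl implements Retargetable {
        Layer(ContentHandler next) { setContentHandler(next); }

        @Override
        public void setTarget(ContentHandler target) {
            ContentHandler next = getContentHandler();
            if (next instanceof Retargetable) {
                ((Retargetable) next).setTarget(target); // pass the request on towards the leaf
            } else {
                setContentHandler(target);               // last layer: swap the target
            }
        }
    }

    /** Layer-specific logic 1: counts elements. */
    static final class ElementCounter extends Layer {
        int count;
        ElementCounter(ContentHandler next) { super(next); }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) throws SAXException {
            count++;
            super.startElement(uri, local, qName, atts);
        }
    }

    /** Layer-specific logic 2: upper-cases character data. */
    static final class UpperCaseText extends Layer {
        UpperCaseText(ContentHandler next) { super(next); }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            char[] up = new String(ch, start, length).toUpperCase(Locale.ROOT).toCharArray();
            super.characters(up, 0, up.length);
        }
    }

    /** Simple target that records the character data it receives. */
    static final class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Builds the chain once, then reuses it for two streams with two different targets. */
    static String[] demo() {
        TextCollector first = new TextCollector();
        TextCollector second = new TextCollector();
        ElementCounter chain = new ElementCounter(new UpperCaseText(first));
        try {
            XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            parser.setContentHandler(chain);
            parser.parse(new InputSource(new StringReader("<a>one</a>")));
            chain.setTarget(second); // only the leaf reference changes, the layers stay connected
            parser.parse(new InputSource(new StringReader("<b><c>two</c></b>")));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return new String[] { first.text.toString(), second.text.toString(), String.valueOf(chain.count) };
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", demo())); // prints ONE, TWO, 3
    }
}
```

In this sketch the last layer is detected with an `instanceof` check. An explicit leaf element at the end of the chain, which holds the resettable reference to the target, would avoid that check.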

Categories of functions

Convenience classes

The following classes simplify the construction of XML filter chains.

LeafContentHandler: represents the last element in a ResettableContentHandler chain, i.e. the element that is not connected to any further chain element. This object does, however, keep a resettable reference to the target ContentHandler.
ResettableContentHandlerAdapter: can be used as a convenient base class for custom implementations of ResettableContentHandler.

Handling of `xml:base`

In many documents, and in particular in ATOM feeds, the use of the xml:base tag is explicitly permitted. This implies that all URL references need to be interpreted relative to the closest base tag. See http://www.w3.org/TR/xmlbase/ for more details.

The XML helper API supplies classes that handle the resolution of nested base tags automatically and allow clients to resolve (relative) URLs in the scope of the current base tag. The management of the base tags is done transparently to the client application by an introspection of the XML stream. This introspection can be done at XML generation time via the ResolverXMLFilter interface or at consumption time via the ResolverContentHandler interface. Both interfaces leave the XML stream completely unmodified. Clients can make use of the URLResolver interface to resolve relative references.
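
To make the scoping rule concrete, here is a minimal sketch of nested `xml:base` resolution written against plain SAX and `java.net.URI`. It is not the implementation behind ResolverXMLFilter or ResolverContentHandler; the class `XmlBaseSketch`, its method `resolveHrefs` and the sample document are hypothetical. The document base is written with a trailing slash so that `java.net.URI` has a path to resolve against.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class XmlBaseSketch extends DefaultHandler {

    /** One entry per open element: the base URI that is in scope for that element. */
    private final Deque<URI> bases = new ArrayDeque<>();
    final List<String> resolved = new ArrayList<>();

    XmlBaseSketch(URI documentBase) {
        bases.push(documentBase);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        URI base = bases.peek();
        // an xml:base value may itself be relative to the enclosing base
        String xmlBase = atts.getValue(XMLConstants.XML_NS_URI, "base");
        if (xmlBase != null) {
            base = base.resolve(xmlBase);
        }
        bases.push(base);
        // resolve href attributes in the scope of the current base
        String href = atts.getValue("", "href");
        if (href != null) {
            resolved.add(base.resolve(href).toString());
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        bases.pop(); // leaving the element ends the scope of its base
    }

    /** Parses the XML text and returns all href values as absolute URIs. */
    static List<String> resolveHrefs(String documentBase, String xml) {
        XmlBaseSketch handler = new XmlBaseSketch(URI.create(documentBase));
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // required to look up xml:base by namespace
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return handler.resolved;
    }

    public static void main(String[] args) {
        String xml = "<feed>"
                + "<link href='feed.atom'/>"
                + "<entry xml:base='http://entry.org/entry/'>"
                + "<link href='../2005/04/02/atom'/>"
                + "<content xml:base='img/'><link href='a.png'/></content>"
                + "<link href='podcast.mp3'/>"
                + "</entry>"
                + "</feed>";
        for (String url : resolveHrefs("http://www.ibm.com/", xml)) {
            System.out.println(url);
        }
        // prints:
        // http://www.ibm.com/feed.atom
        // http://entry.org/2005/04/02/atom
        // http://entry.org/entry/img/a.png
        // http://entry.org/entry/podcast.mp3
    }
}
```

The stack mirrors the element nesting, so the handler never has to modify the event stream in order to know which base is currently in scope.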

Access to the service

The XML factory defines the Eclipse extension point XMLFactory.EXTENSION_POINT_ID. Clients can either look for any available extension of this extension point or directly look for the extension XMLFactory.DEFAULT_EXTENSION_ID.

Example how to access an implementation

// extReg is the IExtensionRegistry, obtained as shown in the registry example below
final IExtensionPoint extPoint = extReg
        .getExtensionPoint(XMLFactory.EXTENSION_POINT_ID);
final IExtension ext = extPoint
        .getExtension(XMLFactory.DEFAULT_EXTENSION_ID);
final IConfigurationElement[] cfgElems = ext.getConfigurationElements();
final XMLFactory xmlFct = (XMLFactory) cfgElems[0]
        .createExecutableExtension(XMLFactory.ATTR_CLASS);

Example how to access the registry

InitialContext initialcontext = new InitialContext();
IExtensionRegistry extReg = (IExtensionRegistry) initialcontext
                .lookup("services/extensionregistry/global");

Examples

This section provides some examples of how to use the XML helper APIs. The following XML data is used as example input.

<?xml version="1.0" encoding="UTF-8"?>
<atom:feed xmlns:atom="http://www.w3.org/2005/Atom"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">

        <atom:title>dive into mark</atom:title>
        <atom:subtitle type="html">
                A &lt;em&gt;lot&lt;/em&gt; of effort went into making this
                effortless
        </atom:subtitle>

        <atom:updated>2006-08-08T20:48:48.00Z</atom:updated>
        <atom:id>tag:example.org,2003:3</atom:id>
        <atom:link href="http://example.org" type="text/html" hreflang="en" />

        <atom:link href="feed.atom" rel="self"
                type="application/atom+xml" />
        <atom:rights></atom:rights>

        <atom:generator uri="http://www.example.com/" version="1.0">
                Example Toolkit
        </atom:generator>
        <atom:entry xml:base="http://entry.org/entry/">
                <atom:title>Atom draft-07 snapshot</atom:title>

                <atom:link href="../2005/04/02/atom"
                        type="text/html" />
                <atom:link href="ph34r_my_podcast.mp3"
                        rel="enclosure" type="audio/mpeg" length="1337" />

                <atom:id>tag:example.org,2003:3.2397</atom:id>
                <atom:updated>2006-08-08T20:48:48.00Z</atom:updated>
                <atom:published>2006-08-08T20:48:48.00Z</atom:published>
                <atom:author>

                        <atom:name>Mark Pilgrim</atom:name>
                        <atom:uri>http://example.org/</atom:uri>
                        <atom:email>f8dy@example.com</atom:email>
                </atom:author>

                <atom:contributor>
                        <atom:name>Sam Ruby</atom:name>
                </atom:contributor>
                <atom:contributor>
                        <atom:name>Joe Gregorio</atom:name>

                </atom:contributor>
                <atom:content xml:lang="en" xml:base="http://diveintomark.org/"
                        type="xhtml">
                        <div xmlns="http://www.w3.org/1999/xhtml">

                                <p>
                                        <i>[Update: The Atom draft is finished.]</i>
                                </p>
                        </div>
                </atom:content>

        </atom:entry>
</atom:feed>

Handle links in an XML document with `xml:base` tags

In order to construct a parser that automatically takes the xml:base tags into account, we need to produce two artifacts: a parser (reader) chain and a handler.

Reader chain

// base URL to start with
final URL baseURL = new URL("http://www.ibm.com");
                
// step1: setup the XML generation chain                
// the original parser
final XMLReader rdr = XMLReaderFactory.createXMLReader();
// the filter layer that handles xml:base
// (XML_FCT is assumed to be the XMLFactory instance obtained from the extension registry)
final ResolverXMLFilter xmlBaseRdr = XML_FCT.createXMLBaseFilter(rdr);
xmlBaseRdr.setBaseURL(baseURL);
// the content handler that wants to resolve xml:base links
final ContentHandler handler = new XMLBaseHandler(xmlBaseRdr);
// construct the chain
xmlBaseRdr.setContentHandler(handler);
                
// this is the actual object to use as the parser, we just need the XMLReader interface
final XMLReader parser = xmlBaseRdr;

// step2: parse actual sources, the same chain can be used for many sources
// the actual XML data source
final InputSource src = new InputSource(getClass().getResourceAsStream("sampleBaseFeed.xml"));
parser.parse(src);

Handler

package com.ibm.wps.resolver.test;

import java.io.IOException;
import java.net.URL;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import com.ibm.portal.resolver.xml.URLResolver;

public class XMLBaseHandler extends DefaultHandler {

        /**
         * the resolver that can convert between relative and absolute URLs
         */
        protected final URLResolver urlResolver;
        
        public XMLBaseHandler(final URLResolver aUrlResolver) {
                urlResolver = aUrlResolver;
        }

        public void startElement(final String uri, final String localName, final String qName, final Attributes attributes) throws SAXException {
                // try to resolve all hrefs
                final String href = attributes.getValue("href");
                if (href != null) {
                        // resolve this
                        try {
                                final URL url = urlResolver.resolve(href);
                                System.out.println(url);
                        } catch (IOException ex) {
                                throw new SAXException(ex);
                        }
                }
        }
        
}

Result

The example generates the following output. Notice that the xml:base URLs have been respected and resolved correctly.


http://example.org
http://www.ibm.com/feed.atom
http://entry.org/2005/04/02/atom
http://entry.org/entry/ph34r_my_podcast.mp3

Interoperability with the ATOM_API

xml:base resolution can also be combined easily with the ATOM_API. The same design patterns as in the previous example apply, except that the handler receives method calls instead of SAX events.

// base URL to start with
final URL baseURL = new URL("http://www.ibm.com");
                
// step1: setup the XML generation chain                
// the original parser
final XMLReader rdr = XMLReaderFactory.createXMLReader();
// the filter layer that handles xml:base
final ResolverXMLFilter xmlBaseRdr = XML_FCT.createXMLBaseFilter(rdr);
xmlBaseRdr.setBaseURL(baseURL);
// the layer that decodes the ATOM stream
final AtomXMLReader atomRdr = ATOM_FCT.createAtomXMLReader(xmlBaseRdr);
// the content handler that interprets the ATOM events
final AtomContentHandler atomHandler = new AtomBaseHandler(xmlBaseRdr);
// construct the chain
atomRdr.setAtomContentHandler(atomHandler);

// step2: parse actual sources, the same chain can be used for many sources
// the actual ATOM data source
final InputSource src = new InputSource(getClass().getResourceAsStream("sampleBaseFeed.xml"));
atomRdr.parse(src);

The handler side receives semantically meaningful method callbacks:

package com.ibm.wps.resolver.test;

import java.io.IOException;
import java.net.URL;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;

import com.ibm.portal.resolver.atom.helper.EmptyAtomContentHandler;
import com.ibm.portal.resolver.xml.URLResolver;

public class AtomBaseHandler extends EmptyAtomContentHandler{
        
        /**
         * the resolver that can convert between relative and absolute URLs
         */
        protected final URLResolver urlResolver;
        
        public AtomBaseHandler(final URLResolver aUrlResolver) {
                urlResolver = aUrlResolver;
        }

        /* (non-Javadoc)
         * @see com.ibm.portal.resolver.atom.AtomContentHandler#startLink(java.lang.String, java.lang.String, java.lang.String, java.lang.String, java.lang.String, java.lang.String, org.xml.sax.Attributes)
         */
        public void startLink(String rHref, String oRel, String oType, String oHrefLang, String oTitle, String oLength, Attributes attrs) throws SAXException {
                try {
                        // resolve the rHref
                        final URL url = urlResolver.resolve(rHref);
                        // display the resolved URL
                        System.out.println(url);                        
                } catch (IOException ex) {
                        // convert to a SAX exception
                        throw new SAXException(ex);
                }
        }
}

Interface Summary
Constants	Convenience constants for XML processing.
DefaultContentHandler	General convenience interface that allows common data fields to be formatted and written as characters onto a content handler.
ResettableContentHandler	Extension of `ContentHandler` that allows content handler chains to be built.
ResolverContentHandler	Content handler filter that dispatches to a target content handler and that can use the context of the events flowing through it to perform context-sensitive URL resolution.
ResolverXMLFilter	XML filter that produces XML elements and that can use the context of the events flowing through it to perform context-sensitive URL resolution.
URIResolver	Resolves (relative) URI references into absolute URIs.
URLResolver	Resolves (relative) URL references into absolute URLs.