Interface Summary | |
---|---|
Constants | Convenience constants for XML processing. |
DefaultContentHandler | General convenience interface for formatting common data fields and writing them as characters onto a content handler. |
ResettableContentHandler | Extension of ContentHandler that allows content handler chains to be built. |
ResolverContentHandler | Content handler filter that dispatches to a target content handler and that can use the context of the events passing through it to perform context-sensitive URL resolution. |
ResolverXMLFilter | XML filter that produces XML elements and that can use the context of the events passing through it to perform context-sensitive URL resolution. |
URIResolver | Resolves (relative) URI references into absolute URIs. |
URLResolver | Resolves (relative) URL references into absolute URLs. |
This article describes a set of JAXP-based APIs that are meant to facilitate the generation and parsing of XML.
This API should help in the execution of commonly required tasks based on the JAXP APIs. In particular the streaming concepts of the JAXP APIs are considered and preserved in this helper API.
One of the basic design principles of JAXP is the possibility to construct chains of handlers of SAX events. The idea is that each layer in this chain is responsible for dealing with one single aspect of the event processing and delegates to the following elements in the chain for further processing. Because each layer concentrates on solving just one aspect of the complete processing, it is more likely to be reusable than a handler that implements the complete algorithm.
Clients of such handlers would typically construct a handler chain at initialization time that represents the intended algorithm. This chain would then be reused to process many different XML inputs.
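The chaining idea can be illustrated with the standard SAX helper class XMLFilterImpl alone. The following is a minimal sketch, not part of the helper API described here: the class names HandlerChainSketch, WhitespaceFilter and UpperCaseFilter are hypothetical and exist only for illustration. Each layer handles exactly one concern and delegates to the next layer, and the chain is built once and then fed two different inputs.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical demo class; none of these names belong to the helper API.
final class HandlerChainSketch {

    /** Layer with one concern: suppress whitespace-only text events. */
    static final class WhitespaceFilter extends XMLFilterImpl {
        WhitespaceFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (!new String(ch, start, length).trim().isEmpty()) {
                super.characters(ch, start, length); // delegate to the next layer
            }
        }
    }

    /** Layer with one concern: upper-case all text events. */
    static final class UpperCaseFilter extends XMLFilterImpl {
        UpperCaseFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            char[] upper = new String(ch, start, length).toUpperCase(Locale.ROOT).toCharArray();
            super.characters(upper, 0, upper.length);
        }
    }

    /** Builds the chain once: parser -> WhitespaceFilter -> UpperCaseFilter -> sink. */
    static XMLReader buildChain(StringBuilder sink) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();
            XMLReader chain = new UpperCaseFilter(new WhitespaceFilter(parser));
            chain.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    sink.append(ch, start, length);
                }
            });
            return chain;
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Feeds one input through the existing chain and returns the collected text. */
    static String process(XMLReader chain, StringBuilder sink, String xml) {
        sink.setLength(0);
        try {
            chain.parse(new InputSource(new StringReader(xml)));
        } catch (IOException | SAXException e) {
            throw new IllegalStateException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        StringBuilder sink = new StringBuilder();
        XMLReader chain = buildChain(sink); // initialization time
        System.out.println(process(chain, sink, "<a> <b>one</b> </a>"));
        System.out.println(process(chain, sink, "<a><b>two</b>\n</a>"));
    }
}
```

Only the InputSource changes between the two calls to process; the filter objects and their wiring are reused as they are.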
This design pattern implies that it is possible to concatenate a chain of handlers and to associate different XML inputs at runtime. In JAXP there exist two different modes of processing:
- Parsing: the parsing of an XML input is represented by the XMLReader interface. A chain of parsers that filter the events during input is represented by XMLFilter, an extension of XMLReader. Clients can construct the parser filter chain using the constructors of the individual filter elements. For each different XML input, clients either just specify a new InputSource object and reuse the existing filter chain, or they call setParent on the XMLFilter to associate a completely different XML parser with the chain. The individual chain elements are advised to route the setParent call to their outermost parent, so that the chain stays intact and only the root parser is reassociated.
- Handling: the handling of the parsed events is represented by ContentHandler. The second possibility to filter events is during this handling process: the ContentHandler that is attached to an XMLReader can internally also be a handler chain. Again, this chain is built using the constructors of the individual chain elements, and each layer in the chain dispatches the events to its next layer in addition to performing its layer-specific logic. JAXP, however, offers no extension of ContentHandler that allows the target (leaf) element of this handler chain to be reassociated, which is required to reuse a chain over multiple XML streams. The ResettableContentHandler extension closes this gap.

The following classes simplify the construction of XML filter chains.
- LeafContentHandler: the leaf of a ResettableContentHandler chain, i.e. the element that is not connected to any further chain element. This object, however, keeps a resettable reference to the target ContentHandler.
- ResettableContentHandlerAdapter: an adapter class for ResettableContentHandler.
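The idea of a resettable leaf can be sketched with plain SAX classes. The method names of ResettableContentHandler are not shown in this document, so the sketch below uses a hypothetical Retargetable interface with a hypothetical setTarget method as a stand-in. It assumes that every layer routes the call towards the leaf, which is the only element that actually stores the target.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical demo class; it does not reproduce the real ResettableContentHandler API.
final class RetargetableChainSketch {

    /** Hypothetical stand-in for a resettable content handler. */
    interface Retargetable extends ContentHandler {
        void setTarget(ContentHandler target);
    }

    /** Leaf of the chain: holds the replaceable reference to the target handler. */
    static final class Leaf extends XMLFilterImpl implements Retargetable {
        @Override
        public void setTarget(ContentHandler target) {
            setContentHandler(target);
        }
    }

    /** Inner layer: drops whitespace-only text and routes setTarget towards the leaf. */
    static final class TrimLayer extends XMLFilterImpl implements Retargetable {
        private final Retargetable next;

        TrimLayer(Retargetable next) {
            this.next = next;
            setContentHandler(next);
        }

        @Override
        public void setTarget(ContentHandler target) {
            next.setTarget(target);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (!new String(ch, start, length).trim().isEmpty()) {
                super.characters(ch, start, length);
            }
        }
    }

    /** Simple target that collects character data. */
    static final class TextSink extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Parses each input with the same handler chain, retargeting the leaf per input. */
    static String[] run(String... inputs) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();
            Retargetable chain = new TrimLayer(new Leaf()); // built once
            parser.setContentHandler(chain);
            String[] results = new String[inputs.length];
            for (int i = 0; i < inputs.length; i++) {
                TextSink sink = new TextSink();
                chain.setTarget(sink); // only the target behind the leaf changes
                parser.parse(new InputSource(new StringReader(inputs[i])));
                results[i] = sink.text.toString();
            }
            return results;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String result : run("<a> <b>first</b> </a>", "<a>second</a>")) {
            System.out.println(result);
        }
    }
}
```

The client only holds a reference to the head of the chain and never needs to know how many layers sit between the head and the leaf.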
xml:base
In many documents, and in particular in ATOM feeds, the use of the xml:base attribute is explicitly permitted. This implies that all URL references need to be interpreted relative to the closest enclosing xml:base declaration. See http://www.w3.org/TR/xmlbase/ for more details.
The XML helper API supplies classes that handle the resolution of nested xml:base declarations automatically and allow clients to resolve (relative) URLs in the scope of the current base. The base URLs are managed transparently to the client application by inspecting the XML stream. This inspection can be done at XML generation time via the ResolverXMLFilter interface or at consumption time via the ResolverContentHandler interface. Both interfaces leave the XML stream completely unmodified. Clients can make use of the URLResolver interface to resolve relative references.
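How such transparent base tracking can work is sketched below with standard SAX classes only. This is an illustration under assumptions, not the implementation behind ResolverXMLFilter: the names BaseTrackingSketch, BaseTrackingFilter, SAMPLE and resolveLinks are hypothetical. The filter keeps a stack of base URIs, pushes on every start tag and pops on every end tag, and forwards all events unchanged.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical demo class; it is not the implementation behind ResolverXMLFilter.
final class BaseTrackingSketch {

    /** Small input with nested and relative xml:base declarations. */
    static final String SAMPLE = "<root><link href='a.html'/>"
            + "<section xml:base='http://example.org/docs/'><link href='b.html'/>"
            + "<part xml:base='img/'><link href='c.png'/></part><link href='../d.html'/>"
            + "</section><link href='e.html'/></root>";

    /** Forwards every event unchanged and tracks the xml:base scopes on a stack. */
    static final class BaseTrackingFilter extends XMLFilterImpl {
        private final URI initialBase;
        private final Deque<URI> bases = new ArrayDeque<>();

        BaseTrackingFilter(XMLReader parent, URI initialBase) {
            super(parent);
            this.initialBase = initialBase;
        }

        URI currentBase() {
            return bases.isEmpty() ? initialBase : bases.peek();
        }

        /** Resolves a reference against the base that is currently in scope. */
        URI resolve(String reference) {
            return currentBase().resolve(reference);
        }

        @Override
        public void startDocument() throws SAXException {
            bases.clear(); // lets the same filter instance serve several inputs
            super.startDocument();
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            String declared = atts.getValue(XMLConstants.XML_NS_URI, "base");
            URI enclosing = currentBase();
            // a nested xml:base may itself be relative to the enclosing base
            bases.push(declared == null ? enclosing : enclosing.resolve(declared));
            super.startElement(uri, localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            super.endElement(uri, localName, qName);
            bases.pop(); // leave the scope of this element
        }
    }

    /** Parses the input and returns every href attribute, resolved in its scope. */
    static List<String> resolveLinks(String xml, String initialBase) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed to look up xml:base by namespace
            XMLReader parser = factory.newSAXParser().getXMLReader();
            BaseTrackingFilter filter = new BaseTrackingFilter(parser, URI.create(initialBase));
            List<String> resolved = new ArrayList<>();
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    String href = atts.getValue("", "href");
                    if (href != null) {
                        resolved.add(filter.resolve(href).toString());
                    }
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
            return resolved;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // the initial base ends with "/" so that java.net.URI resolves against a non-empty path
        resolveLinks(SAMPLE, "http://example.com/").forEach(System.out::println);
    }
}
```

The base is pushed before the start event is forwarded, so a handler that resolves a reference inside its own startElement callback already sees the xml:base declared on that very element.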
The XML factory defines the Eclipse extension point XMLFactory.EXTENSION_POINT_ID. Clients can either look for any available extension of this extension point or directly look for the extension XMLFactory.DEFAULT_EXTENSION_ID.

    // obtain the extension registry via a JNDI lookup
    InitialContext initialcontext = new InitialContext();
    IExtensionRegistry extReg = (IExtensionRegistry) initialcontext
            .lookup("services/extensionregistry/global");

    // look up the default extension of the XML factory extension point
    final IExtensionPoint extPoint = extReg
            .getExtensionPoint(XMLFactory.EXTENSION_POINT_ID);
    final IExtension ext = extPoint
            .getExtension(XMLFactory.DEFAULT_EXTENSION_ID);
    final IConfigurationElement[] cfgElems = ext.getConfigurationElements();
    // instantiate the factory from the first configuration element
    final XMLFactory xmlFct = (XMLFactory) cfgElems[0]
            .createExecutableExtension(XMLFactory.ATTR_CLASS);
This section provides some examples of how to use the XML helper APIs. The following XML data is used as an example input.

    <?xml version="1.0" encoding="UTF-8"?>
    <atom:feed xmlns:atom="http://www.w3.org/2005/Atom"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
      <atom:title>dive into mark</atom:title>
      <atom:subtitle type="html">
        A <em>lot</em> of effort went into making this effortless
      </atom:subtitle>
      <atom:updated>2006-08-08T20:48:48.00Z</atom:updated>
      <atom:id>tag:example.org,2003:3</atom:id>
      <atom:link href="http://example.org" type="text/html" hreflang="en" />
      <atom:link href="feed.atom" rel="self" type="application/atom+xml" />
      <atom:rights></atom:rights>
      <atom:generator uri="http://www.example.com/" version="1.0">
        Example Toolkit
      </atom:generator>
      <atom:entry xml:base="http://entry.org/entry/">
        <atom:title>Atom draft-07 snapshot</atom:title>
        <atom:link href="../2005/04/02/atom" type="text/html" />
        <atom:link href="ph34r_my_podcast.mp3" rel="enclosure"
            type="audio/mpeg" length="1337" />
        <atom:id>tag:example.org,2003:3.2397</atom:id>
        <atom:updated>2006-08-08T20:48:48.00Z</atom:updated>
        <atom:published>2006-08-08T20:48:48.00Z</atom:published>
        <atom:author>
          <atom:name>Mark Pilgrim</atom:name>
          <atom:uri>http://example.org/</atom:uri>
          <atom:email>f8dy@example.com</atom:email>
        </atom:author>
        <atom:contributor>
          <atom:name>Sam Ruby</atom:name>
        </atom:contributor>
        <atom:contributor>
          <atom:name>Joe Gregorio</atom:name>
        </atom:contributor>
        <atom:content xml:lang="en" xml:base="http://diveintomark.org/" type="xhtml">
          <div xmlns="http://www.w3.org/1999/xhtml">
            <p>
              <i>[Update: The Atom draft is finished.]</i>
            </p>
          </div>
        </atom:content>
      </atom:entry>
    </atom:feed>
xml:base tags
In order to construct a parser that automatically takes the xml:base attributes into account, we need to produce two artifacts:
- An XMLReader object that pipes SAX events from a parent reader via the ResolverXMLFilter layer onto a ContentHandler. The complete chain exposes only the XMLReader interface, so it looks (and behaves) exactly like a normal XML parser to a client.
- An implementation of the ContentHandler interface to receive the SAX events from a parser. In order to correctly interpret relative references, however, this handler keeps a reference to the URLResolver interface that is part of the reader chain. Because this resolver intercepts the XML data stream, it can keep track of the current base URL and allows the handler to correctly resolve all references.

    // base URL to start with
    final URL baseURL = new URL("http://www.ibm.com");

    // step 1: set up the XML generation chain
    // the original parser
    final XMLReader rdr = XMLReaderFactory.createXMLReader();
    // the filter layer that handles xml:base
    final ResolverXMLFilter xmlBaseRdr = XML_FCT.createXMLBaseFilter(rdr);
    xmlBaseRdr.setBaseURL(baseURL);
    // the content handler that wants to resolve xml:base links
    final ContentHandler handler = new XMLBaseHandler(xmlBaseRdr);
    // construct the chain
    xmlBaseRdr.setContentHandler(handler);
    // this is the actual object to use as the parser, we just need the XMLReader interface
    final XMLReader parser = xmlBaseRdr;

    // step 2: parse actual sources, the same chain can be used for many sources
    // the actual XML data source
    final InputSource src = new InputSource(getClass().getResourceAsStream("sampleBaseFeed.xml"));
    parser.parse(src);

    package com.ibm.wps.resolver.test;

    import java.io.IOException;
    import java.net.URL;

    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    import com.ibm.portal.resolver.xml.URLResolver;

    public class XMLBaseHandler extends DefaultHandler {

        /**
         * the resolver that can convert between relative and absolute URLs
         */
        protected final URLResolver urlResolver;

        public XMLBaseHandler(final URLResolver aUrlResolver) {
            urlResolver = aUrlResolver;
        }

        public void startElement(final String uri, final String localName,
                final String qName, final Attributes attributes) throws SAXException {
            // try to resolve all hrefs
            final String href = attributes.getValue("href");
            if (href != null) {
                // resolve this
                try {
                    final URL url = urlResolver.resolve(href);
                    System.out.println(url);
                } catch (IOException ex) {
                    throw new SAXException(ex);
                }
            }
        }
    }

The example generates the following output. Notice that the xml:base URLs have been respected and resolved correctly.

    http://example.org
    http://www.ibm.com/feed.atom
    http://entry.org/2005/04/02/atom
    http://entry.org/entry/ph34r_my_podcast.mp3

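The four results can be cross-checked with plain java.net.URI, independently of the helper API. The sketch below is only such a cross-check, and the class and method names are hypothetical. It assumes that the feed-level links are resolved against the base passed to setBaseURL and the entry-level links against the xml:base of the atom:entry element. Because the initial base http://www.ibm.com has an empty path, the helper method appends a "/" before resolving.

```java
import java.net.URI;

// Hypothetical cross-check; it does not use the helper API itself.
final class XmlBaseOutputCheck {

    /** Resolves a reference against a base, treating an empty base path as "/". */
    static URI resolve(URI base, String reference) {
        URI effectiveBase = base;
        if (base.getRawPath() == null || base.getRawPath().isEmpty()) {
            // with an empty base path, java.net.URI may append the reference directly to the host
            effectiveBase = URI.create(base + "/");
        }
        return effectiveBase.resolve(reference);
    }

    public static void main(String[] args) {
        URI feedBase = URI.create("http://www.ibm.com");        // value passed to setBaseURL
        URI entryBase = URI.create("http://entry.org/entry/");  // xml:base on atom:entry

        System.out.println(resolve(feedBase, "http://example.org"));    // absolute, unaffected by the base
        System.out.println(resolve(feedBase, "feed.atom"));
        System.out.println(resolve(entryBase, "../2005/04/02/atom"));   // ".." leaves the entry/ segment
        System.out.println(resolve(entryBase, "ph34r_my_podcast.mp3"));
    }
}
```

The third link shows why the scope matters: the same reference resolved against the feed-level base would not point below entry.org at all.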
The interpretation of xml:base can also be done easily with the ATOM_API. The same design patterns as in the previous example apply, except that the handler receives method calls instead of SAX events.

    // base URL to start with
    final URL baseURL = new URL("http://www.ibm.com");

    // step 1: set up the XML generation chain
    // the original parser
    final XMLReader rdr = XMLReaderFactory.createXMLReader();
    // the filter layer that handles xml:base
    final ResolverXMLFilter xmlBaseRdr = XML_FCT.createXMLBaseFilter(rdr);
    xmlBaseRdr.setBaseURL(baseURL);
    // the layer that decodes the ATOM stream
    final AtomXMLReader atomRdr = ATOM_FCT.createAtomXMLReader(xmlBaseRdr);
    // the content handler that interprets the ATOM events
    final AtomContentHandler atomHandler = new AtomBaseHandler(xmlBaseRdr);
    // construct the chain
    atomRdr.setAtomContentHandler(atomHandler);

    // step 2: parse actual sources, the same chain can be used for many sources
    // the actual ATOM data source
    final InputSource src = new InputSource(getClass().getResourceAsStream("sampleBaseFeed.xml"));
    atomRdr.parse(src);

And the handler side that receives semantically meaningful method callbacks:

    package com.ibm.wps.resolver.test;

    import java.io.IOException;
    import java.net.URL;

    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;

    import com.ibm.portal.resolver.atom.helper.EmptyAtomContentHandler;
    import com.ibm.portal.resolver.xml.URLResolver;

    public class AtomBaseHandler extends EmptyAtomContentHandler {

        /**
         * the resolver that can convert between relative and absolute URLs
         */
        protected final URLResolver urlResolver;

        public AtomBaseHandler(final URLResolver aUrlResolver) {
            urlResolver = aUrlResolver;
        }

        /* (non-Javadoc)
         * @see com.ibm.portal.resolver.atom.AtomContentHandler#startLink(java.lang.String, java.lang.String, java.lang.String, java.lang.String, java.lang.String, java.lang.String, org.xml.sax.Attributes)
         */
        public void startLink(String rHref, String oRel, String oType,
                String oHrefLang, String oTitle, String oLength, Attributes attrs)
                throws SAXException {
            try {
                // resolve the rHref
                final URL url = urlResolver.resolve(rHref);
                // display the resolved URL
                System.out.println(url);
            } catch (IOException ex) {
                // convert to a SAX exception
                throw new SAXException(ex);
            }
        }
    }