Mapping XML to Java, Part 1

Employ the SAX API to map XML documents to Java objects

XML is hot. Because XML is a form of self-describing data, it can be used to encode rich data models. It’s easy to see XML’s utility as a data exchange medium between very dissimilar systems. Data can be easily exposed or published as XML from all kinds of systems: legacy COBOL programs, databases, C++ programs, and so on.

TEXTBOX:

TEXTBOX_HEAD: Mapping XML to Java: Read the whole series!

  • Part 1 — Employ the SAX API to map XML documents to Java objects
  • Part 2 — Create a class library that uses the SAX API to map XML documents to Java objects

:END_TEXTBOX

However, using XML to build systems poses two challenges. First, while generating XML is a straightforward procedure, the inverse operation, using XML data from within a program, is not. Second, current XML technologies are easy to misapply, which can leave a programmer with a slow, memory-hungry system. Indeed, heavy memory requirements and slow speeds can prove problematic for systems that use XML as their primary data exchange format.

Some standard tools currently available for working with XML are better than others. The SAX API in particular has some important runtime features for performance-sensitive code. In this article, we will develop some patterns for applying the SAX API. You will be able to create fast XML-to-Java mapping code with a minimum memory footprint, even for fairly complex XML structures (with the exception of recursive structures).

In Part 2 of this series, we will cover applying the SAX API to recursive XML structures in which some of the XML elements represent lists of lists. We will also develop a class library that manages the navigational aspects of the SAX API. This library simplifies writing XML mapping code based on SAX.

Mapping code is similar to compiling code

Writing programs that use XML data is a lot like writing a compiler. Most compilers convert source code into a runnable program in three steps. First, a lexer module groups characters into words or tokens that the compiler recognizes, a process known as tokenizing. A second module, called the parser, analyzes groups of tokens in order to recognize legal language constructs. Finally, a third module, the code generator, takes a set of legal language constructs and generates executable code. Sometimes, parsing and code generation are intermixed.

To use XML data in a Java program, we must undergo a similar process. First, we analyze every character in the XML text in order to recognize legal XML tokens such as start tags, attributes, end tags, and CDATA sections.

Second, we verify that the tokens form legal XML constructs. If an XML document consists entirely of legal constructs per the XML 1.0 specification, it is well-formed. At the most basic level, we need to make sure that, for instance, all of the tagging has matching opening and closing tags, and the attributes are properly structured in the opening tag.
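To see the well-formedness check in action, here is a minimal sketch that uses a SAX parser purely as a checker. It assumes the JAXP SAXParserFactory that ships with the JDK rather than a particular Xerces distribution, and the WellFormedCheck class and its isWellFormed method are hypothetical. A document that breaks a well-formedness rule, such as a missing end tag or an unquoted attribute value, makes the parser throw a SAXParseException.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: use a SAX parser only to check that a document is well-formed.
public class WellFormedCheck {

    // Returns true if the text parses cleanly, false on a fatal
    // (well-formedness) error reported by the parser.
    public static boolean isWellFormed(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance()
                    .newSAXParser().getXMLReader();
            // DefaultHandler ignores every content event, and its
            // fatalError() rethrows, so parse() stops at the first error.
            DefaultHandler handler = new DefaultHandler();
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            System.out.println("Line " + e.getLineNumber() + ": " + e.getMessage());
            return false;
        } catch (Exception e) {
            // Configuration or I/O problems are not well-formedness errors.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<simple><name> Bob </name></simple>"));
        // Missing </name>: the tags do not match.
        System.out.println(isWellFormed("<simple><name> Bob </simple>"));
    }
}
```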

Also, if a DTD is available, we have the option to make sure that the XML constructs found during parsing are legal in terms of the DTD, as well as being well-formed XML.
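Validation is opt-in. The sketch below, again assuming the JDK's JAXP SAXParserFactory, switches on DTD validation with setValidating(true). One detail is easy to miss: a validating SAX parser reports validity errors to the ErrorHandler's error() method, and DefaultHandler's version of that method does nothing, so the sketch installs a handler that rethrows. The DTD and the ValidatingCheck class are hypothetical, written to match the article's simple sample document.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: validate a document against its DTD while parsing with SAX.
public class ValidatingCheck {

    // Hypothetical internal DTD for the article's <simple> document.
    public static final String SIMPLE_DTD =
            "<!DOCTYPE simple ["
            + "<!ELEMENT simple (name, location)>"
            + "<!ATTLIST simple date CDATA #IMPLIED>"
            + "<!ELEMENT name (#PCDATA)>"
            + "<!ELEMENT location (#PCDATA)>"
            + "]>";

    // DefaultHandler silently ignores validity errors; this subclass
    // rethrows them so that parse() stops at the first one.
    static class StrictErrors extends DefaultHandler {
        @Override
        public void error(SAXParseException e) throws SAXException {
            throw e;
        }
    }

    public static boolean isValid(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // check constructs against the DTD
            XMLReader reader = factory.newSAXParser().getXMLReader();
            StrictErrors handler = new StrictErrors();
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            System.out.println("Invalid: " + e.getMessage());
            return false;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Well-formed and valid: name comes before location.
        System.out.println(isValid(SIMPLE_DTD + "<simple date=\"7/7/2000\">"
                + "<name> Bob </name><location> New York </location></simple>"));
        // Well-formed but invalid: the DTD requires name first.
        System.out.println(isValid(SIMPLE_DTD
                + "<simple><location> New York </location><name> Bob </name></simple>"));
    }
}
```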

Finally, we use the data contained in the XML document to accomplish something useful — I call this mapping XML into Java.

XML Parsers

Fortunately, off-the-shelf components called XML parsers perform some of these compiler-related tasks for us: they handle all of the lexical analysis and parsing. Many currently available Java-based XML parsers support two popular parsing standards: the SAX and DOM APIs.

The availability of an off-the-shelf XML parser may make it seem that the hard part of using XML in Java has been done for you. In reality, applying an off-the-shelf XML parser is an involved task.

SAX and DOM APIs

The SAX API is event-based. XML parsers that implement the SAX API generate events that correspond to different features found in the parsed XML document. By responding to this stream of SAX events in Java code, you can write programs driven by XML-based data.

The DOM API is an object-model-based API. XML parsers that implement DOM create a generic object model in memory that represents the contents of the XML document. Once the XML parser has completed parsing, the memory contains a tree of DOM objects that offers information about both the structure and contents of the XML document.
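For comparison with the SAX examples later on, here is a minimal sketch of the DOM style, assuming the JAXP DocumentBuilderFactory bundled with the JDK. The whole document is parsed into a tree first, and only then does the code go looking for the data it wants. The DomLookup class and its method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch of DOM-style access: build the whole tree, then navigate it.
public class DomLookup {

    // First pass: the parser builds the complete in-memory tree.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Second pass: search the tree for the first element with this tag
    // and return its text, or null if there is no such element.
    public static String firstText(Document doc, String tagName) {
        NodeList matches = doc.getElementsByTagName(tagName);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
    }

    public static void main(String[] args) {
        Document doc = parse("<simple date=\"7/7/2000\">"
                + "<name> Bob </name><location> New York </location></simple>");
        System.out.println("date: " + doc.getDocumentElement().getAttribute("date"));
        System.out.println("name: [" + firstText(doc, "name") + "]");
    }
}
```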

The DOM concept grew out of the HTML browser world, where a common document object model represents the HTML document loaded in the browser. This HTML DOM then becomes available for scripting languages like JavaScript. HTML DOM has been very successful in this application.

Dangers of DOM

At first glance, the DOM API seems to be more feature-rich, and therefore better, than the SAX API. However, DOM has serious efficiency problems that can hurt performance-sensitive applications.

The current group of XML parsers that support DOM implement the in-memory object model by creating many tiny objects that represent DOM nodes containing either text or other DOM nodes. This sounds natural enough, but has negative performance implications. One of the most expensive operations in Java is the new operator. Correspondingly, for every new operator executed in Java, the JVM garbage collector must eventually remove the object from memory when no references to the object remain. The DOM API tends to really thrash the JVM memory system with its many small objects, which are typically tossed aside soon after parsing.

Another DOM issue is the fact that it loads the entire XML document into memory. For large documents, this becomes a problem. Again, since the DOM is implemented as many tiny objects, the memory footprint is even larger than the XML document itself because the JVM stores a few extra bytes of information regarding all of these objects, as well as the contents of the XML document.
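To get a feel for how many objects a DOM parser creates, the sketch below walks the tree for the article's four-line sample document and counts DOM nodes, including attributes. It assumes the JDK's bundled JAXP DOM parser, and the DomNodeCount class is hypothetical. Because no DTD tells the parser which whitespace is insignificant, the line breaks and indentation between tags become text nodes of their own. On this sample the count should be ten: the document node, three elements, the date attribute, two text nodes holding Bob and New York, and three text nodes holding nothing but whitespace. The count covers DOM nodes only, not whatever internal objects a particular parser allocates behind them.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: count the DOM nodes built for a small document.
public class DomNodeCount {

    // The article's first sample document, whitespace included.
    public static final String SAMPLE =
            "<?xml version=\"1.0\"?>\n"
            + "<simple date=\"7/7/2000\" >\n"
            + "   <name> Bob </name>\n"
            + "   <location> New York </location>\n"
            + "</simple>\n";

    // Counts this node, its attributes, and all of its descendants.
    public static int countNodes(Node node) {
        int count = 1;
        NamedNodeMap attrs = node.getAttributes(); // null unless an element
        if (attrs != null) {
            count += attrs.getLength();
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            count += countNodes(children.item(i));
        }
        return count;
    }

    // Parses the text into a DOM tree and counts its nodes.
    public static int countDocument(String xml) {
        try {
            Node doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return countNodes(doc);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("DOM nodes: " + countDocument(SAMPLE));
    }
}
```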

It is also troubling that many Java programs don’t actually use DOM’s generic object structure. Instead, as soon as the DOM structure loads in memory, they copy the data into an object model specific to a particular problem domain — a subtle yet wasteful process.

Another subtle issue for the DOM API is that code written for it must scan the XML document twice. The first pass creates the DOM structure in memory, the second locates all XML data the program is interested in. Certain coding styles may traverse the DOM structure several additional times while locating different pieces of XML data. By contrast, SAX’s coding style encourages locating and collecting XML data in a single pass.

Some of these issues could be addressed with a better underlying data structure for the DOM object model. However, issues such as encouraging multiple processing passes and translating between generic and specific object models are inherent in the DOM approach, so they cannot be addressed within the XML parsers.

SAX for survival

Compared to the DOM API, the SAX API is an attractive approach. SAX doesn’t have a generic object model, so it doesn’t have the memory or performance problems associated with abusing the new operator. And with SAX, there is no generic object model to ignore if you plan to use a specific problem-domain object model instead. Moreover, since SAX processes the XML document in a single pass, it requires much less processing time.

SAX does have a few drawbacks, but they are mostly related to the programmer, not the runtime performance of the API. Let’s look at a few.

The first drawback is conceptual. Programmers are accustomed to navigating to get data; to find a file on a file server, you navigate by changing directories. Similarly, to get data from a database, you write an SQL query for the data you need. With SAX, this model is inverted. That is, you set up code that listens while the parser lists every piece of XML data in the document. That code activates only when interesting XML data are being listed. At first, the SAX API seems odd, but after a while, thinking in this inverted way becomes second nature.

The second drawback is more dangerous. With SAX code, the naive “let’s take a hack at it” approach will backfire fairly quickly, because the SAX parser exhaustively navigates the XML structure while simultaneously supplying the data stored in the XML document. Most people focus on the data-mapping aspect and neglect the navigational aspect. If you don’t directly address the navigational aspect of SAX parsing, the code that keeps track of the location within the XML structure during SAX parsing will become spread out and have many subtle interactions. This problem is similar to those associated with overdependence on global variables. But if you learn to properly structure SAX code to keep it from becoming unwieldy, it is more straightforward than using the DOM API.

Basic SAX

There are currently two published versions of the SAX API. We’ll use version 2 (see Resources) for our examples. Version 2 uses different class and method names than version 1, but the structure of the code is the same.

SAX is an API, not a parser, so this code is generic across XML parsers. To get the examples to run, you will need to access an XML parser that supports SAX v2. I use Apache’s Xerces parser. (See Resources.) Review your parser’s getting-started guide for specifics on invoking a SAX parser.

The SAX API specification is pretty straightforward. It includes many details, but your primary task is to create a class that implements the ContentHandler interface, a callback interface used by XML parsers to notify your program of SAX events as they are found in the XML document.

The SAX API also conveniently supplies a DefaultHandler class that implements the ContentHandler interface with empty methods, so you can override only the events you care about.

Once you’ve implemented the ContentHandler or extended the DefaultHandler, you need only direct the XML parser to parse a particular document.
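The article's examples obtain a parser through XMLReaderFactory, which still exists in current JDKs but has been deprecated since Java 9. The sketch below shows another way to get a SAX2 XMLReader, through JAXP's SAXParserFactory, and pairs it with a DefaultHandler subclass that overrides just one callback. One caveat: JAXP factories are not namespace-aware by default, and when namespace processing is off, SAX allows localName to be an empty string, which would break every localName comparison in this article. The ReaderSetup class and its methods are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: obtain a SAX2 XMLReader via JAXP and attach a minimal handler.
public class ReaderSetup {

    // Returns a namespace-aware SAX2 reader from the JDK's JAXP factory.
    public static XMLReader newReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Off by default; without it, localName may arrive as "".
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("No SAX2 parser available", e);
        }
    }

    // Records the localName of every start tag, in document order.
    public static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        XMLReader reader = newReader();
        // DefaultHandler supplies no-op versions of every other callback,
        // so only startElement needs to be overridden here.
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attr) {
                names.add(localName);
            }
        });
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints [simple, name, location]
        System.out.println(elementNames("<simple date=\"7/7/2000\">"
                + "<name> Bob </name><location> New York </location></simple>"));
    }
}
```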

Our first example extends the DefaultHandler to print each SAX event to the console. This will give you a feel for what SAX events will be generated and in what order.

To get started, here’s the sample XML document we will use in our first example:

<?xml version="1.0"?>
<simple date="7/7/2000" >
   <name> Bob </name>
   <location> New York </location>
</simple>

Next is the source code for the first example:

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;
public class Example1 extends DefaultHandler {
   // Override methods of the DefaultHandler class
   // to gain notification of SAX Events.
   //
   // See org.xml.sax.ContentHandler for all available events.
   //
   public void startDocument( ) throws SAXException {
      System.out.println( "SAX Event: START DOCUMENT" );
   }
   public void endDocument( ) throws SAXException {
      System.out.println( "SAX Event: END DOCUMENT" );
   }
   public void startElement( String namespaceURI,
              String localName,
              String qName,
              Attributes attr ) throws SAXException {
      System.out.println( "SAX Event: START ELEMENT[ " +
                  localName + " ]" );
      // Also, let's print the attributes if
      // there are any...
      for ( int i = 0; i < attr.getLength(); i++ ){
         System.out.println( "   ATTRIBUTE: " +
                  attr.getLocalName(i) +
                  " VALUE: " +
                  attr.getValue(i) );
      }
   }
   public void endElement( String namespaceURI,
              String localName,
              String qName ) throws SAXException {
      System.out.println( "SAX Event: END ELEMENT[ " +
                  localName + " ]" );
   }
   public void characters( char[] ch, int start, int length )
                  throws SAXException {
      System.out.print( "SAX Event: CHARACTERS[ " );
      try {
         OutputStreamWriter outw = new OutputStreamWriter(System.out);
         outw.write( ch, start,length );
         outw.flush();
      } catch (Exception e) {
         e.printStackTrace();
      }
      System.out.println( " ]" );
   }
   public static void main( String[] argv ){
      System.out.println( "Example1 SAX Events:" );
      try {
         // Create SAX 2 parser...
         XMLReader xr = XMLReaderFactory.createXMLReader();
         // Set the ContentHandler...
         xr.setContentHandler( new Example1() );
         // Parse the file...
         xr.parse( new InputSource(
               new FileReader( "Example1.xml" )) );
      }catch ( Exception e )  {
         e.printStackTrace();
      }
   }
}

Finally, here is the output generated by running the first example with our sample XML document:

Example1 SAX Events:
SAX Event: START DOCUMENT
SAX Event: START ELEMENT[ simple ]
   ATTRIBUTE: date VALUE: 7/7/2000
SAX Event: CHARACTERS[
    ]
SAX Event: START ELEMENT[ name ]
SAX Event: CHARACTERS[  Bob  ]
SAX Event: END ELEMENT[ name ]
SAX Event: CHARACTERS[
    ]
SAX Event: START ELEMENT[ location ]
SAX Event: CHARACTERS[  New York  ]
SAX Event: END ELEMENT[ location ]
SAX Event: CHARACTERS[
 ]
SAX Event: END ELEMENT[ simple ]
SAX Event: END DOCUMENT

As you can see, the SAX parser will call the appropriate ContentHandler method for every SAX event it discovers in the XML document.

Hello world

Now that we understand the basic pattern of SAX, we can start to do something slightly useful: extract values from our simple XML document and demonstrate the classic hello world program.

First, for each element we are interested in mapping to Java, we will reset our collection buffer in the startElement SAX event handler. Then, after startElement for a tag has occurred but before endElement has, we will collect the characters presented by the characters SAX event. Finally, when endElement for the tag occurs, we will store the collected characters in the appropriate field of a Java object.

Below you’ll find the sample data for our hello world example:

<?xml version="1.0"?>
<simple date="7/7/2000" >
   <name> Bob </name>
   <location> New York </location>
</simple>

Here’s the source listing for the XML mapping code of the hello world example:

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;
public class Example2 extends DefaultHandler {
   // Local variables to store data
   // found in the XML document
   public  String  name       = "";
   public  String  location   = "";
   // Buffer for collecting data from
   // the "characters" SAX event.
   private CharArrayWriter contents = new CharArrayWriter();
   // Override methods of the DefaultHandler class
   // to gain notification of SAX Events.
   //
   // See org.xml.sax.ContentHandler for all available events.
   //
   public void startElement( String namespaceURI,
              String localName,
              String qName,
              Attributes attr ) throws SAXException {
      contents.reset();
   }
   public void endElement( String namespaceURI,
              String localName,
              String qName ) throws SAXException {
      if ( localName.equals( "name" ) ) {
         name = contents.toString();
      }
      if ( localName.equals( "location" ) ) {
         location = contents.toString();
      }
   }
   public void characters( char[] ch, int start, int length )
                  throws SAXException {
      contents.write( ch, start, length );
   }
   public static void main( String[] argv ){
      System.out.println( "Example2:" );
      try {
         // Create SAX 2 parser...
         XMLReader xr = XMLReaderFactory.createXMLReader();
         // Set the ContentHandler...
         Example2 ex2 = new Example2();
         xr.setContentHandler( ex2 );
         // Parse the file...
         xr.parse( new InputSource(
               new FileReader( "Example2.xml" )) );
         // Say hello...
         System.out.println( "Hello World from " + ex2.name
                              + " in " + ex2.location );
      }catch ( Exception e )  {
         e.printStackTrace();
      }
   }
}

The following is the output of our hello world example:

Example2:
Hello World from  Bob  in  New York

This is not the simplest hello world program ever written, and there are several things worth noting in the example code.

First, the code demonstrates some of the bad features of event-driven code. Things get tricky when event-driven code needs to respond to a pattern of events instead of just a single event. In this specific case, we are looking for a pattern of SAX events that mark the name and location of our simple XML document.

The tagged content is presented in the characters SAX event; the tags themselves are spread between the startElement and endElement SAX events. I got around this in the hello world example by coordinating around the contents buffer, which startElement always resets. The endElement handler assumes that the contents have been collected and assigns them to the appropriate field. This is not a bad pattern, but it assumes that no two fields of a Java object possess the same tag, which is not always a valid assumption. We will address this issue later.

Another interesting feature of the example code is the use of a contents buffer, which sidesteps a little SAX gotcha. You could create a string directly in the characters SAX event instead of copying the characters to a buffer as in the example. But that ignores the fact that, according to the SAX specification of the characters() method, the XML parser may deliver a single run of text through several characters() calls. Keeping only the string from the last call will lose data if the text between two tags is large, or if the buffer feeding the XML parser happens to end in the middle of that text. Also, reusing a buffer is much more efficient than constantly creating new strings.
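The buffering rule is easy to follow once you know about it. The sketch below uses a StringBuilder where the article uses a CharArrayWriter; either works. It also counts the characters() calls that arrive for the name element. Entity references are one common reason a parser splits text, so the sample includes &amp;; how many chunks you actually see depends on the parser, which is exactly why the handler must not assume there is only one. It assumes the JDK's JAXP SAX parser, and the class and field names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: collect text across however many characters() calls arrive.
public class ChunkedCharacters extends DefaultHandler {

    private final StringBuilder contents = new StringBuilder(); // reused buffer
    private int chunks = 0; // characters() calls since the last start tag

    public String name = "";
    public int nameChunks = 0;

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attr) {
        contents.setLength(0); // reset instead of allocating a new buffer
        chunks = 0;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        contents.append(ch, start, length); // append, never overwrite
        chunks++;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (localName.equals("name")) {
            name = contents.toString();
            nameChunks = chunks;
        }
    }

    public static ChunkedCharacters parse(String xml) {
        ChunkedCharacters handler = new ChunkedCharacters();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so localName is filled in
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler;
    }

    public static void main(String[] args) {
        ChunkedCharacters h = parse("<simple><name>Bob &amp; Alice</name></simple>");
        // The chunk count is parser-dependent; the assembled text is not.
        System.out.println("name = [" + h.name + "] from "
                + h.nameChunks + " characters() call(s)");
    }
}
```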

Mapping our first Java object

Now that we’ve gotten through hello world, let’s try a more useful example that maps an XML document to a Java object. This example is similar to hello world, but it maps data to a single object and provides an accessor for that object, a useful SAX pattern that appears in the rest of the examples. Unlike objects returned by a constructor or a factory method, objects mapped by a SAX parser are not available until parsing finishes. A clean way to deal with this difference is to give the mapping class an accessor method that returns the finished mapped object. That way, you create the mapping class, attach it to an XMLReader, parse the XML, and then call the accessor to get a reference to the mapped object. A variation on this theme is to give the mapping class a set method and supply the object to be mapped just before parsing.
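Here is a minimal sketch of that setter variation. The caller creates the customer object, hands it to the handler with setCustomer, parses, and then simply keeps using its own reference. To keep the block self-contained, it declares a small nested Customer class standing in for the article's common.Customer, and it obtains its reader through JAXP with namespace awareness switched on; CustomerTarget, setCustomer, and mapInto are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Sketch of the "set the target before parsing" variation.
public class CustomerTarget extends DefaultHandler {

    // Stand-in for the article's common.Customer class.
    public static class Customer {
        public String firstName = "";
        public String lastName = "";
        public String custId = "";
    }

    private Customer cust; // supplied by the caller before parsing
    private final StringBuilder contents = new StringBuilder();

    public void setCustomer(Customer cust) {
        this.cust = cust;
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attr) {
        contents.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        contents.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        switch (localName) {
            case "FirstName": cust.firstName = contents.toString(); break;
            case "LastName":  cust.lastName  = contents.toString(); break;
            case "CustId":    cust.custId    = contents.toString(); break;
            default: break; // other tags are ignored
        }
    }

    // Fills the caller's object from the XML text and hands it back.
    public static Customer mapInto(Customer target, String xml) {
        CustomerTarget handler = new CustomerTarget();
        handler.setCustomer(target);
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so localName is filled in
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return target;
    }

    public static void main(String[] args) {
        Customer c = new Customer();
        mapInto(c, "<customer><FirstName>Bob</FirstName>"
                + "<LastName>Hustead</LastName><CustId>abc.123</CustId></customer>");
        System.out.println(c.firstName + " " + c.lastName + " (" + c.custId + ")");
    }
}
```

The trade-off is that the handler is unusable until setCustomer has been called, whereas the accessor style always owns a ready-made object.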

Take a look at the sample XML document for the third example:

<?xml version="1.0"?>
<customer>
   <FirstName> Bob </FirstName>
   <LastName> Hustead </LastName>
   <CustId> abc.123 </CustId>
</customer>

Next, we see a simple class that will be mapped with data supplied by our XML document:

package common;
import java.io.*;
// Customer is a very simple class
// that holds fields for a dummy Customer
// data.
// It has a simple method to print itself
// to a print stream.
public class Customer {
   // Customer member variables.
   public String firstName = "";
   public String lastName  = "";
   public String custId    = "";
   public void print( PrintStream out ) {
      out.println( "Customer: " );
      out.println( "  First Name -> "  + firstName );
      out.println( "  Last Name -> "   + lastName  );
      out.println( "  Customer Id -> " + custId    );
   }
}

This is the source code that does the XML mapping for our third example:

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;
import common.*;
public class Example3 extends DefaultHandler {
   // Local Customer object to collect
   // customer XML data.
   private  Customer cust = new Customer();
   // Buffer for collecting data from
   // the "characters" SAX event.
   private CharArrayWriter contents = new CharArrayWriter();
   // Override methods of the DefaultHandler class
   // to gain notification of SAX Events.
   //
   // See org.xml.sax.ContentHandler for all available events.
   //
   public void startElement( String namespaceURI,
              String localName,
              String qName,
              Attributes attr ) throws SAXException {
      contents.reset();
   }
   public void endElement( String namespaceURI,
              String localName,
              String qName ) throws SAXException {
      if ( localName.equals( "FirstName" ) ) {
         cust.firstName = contents.toString();
      }
      if ( localName.equals( "LastName" ) ) {
         cust.lastName = contents.toString();
      }
      if ( localName.equals( "CustId" ) ) {
         cust.custId = contents.toString();
      }
   }
   public void characters( char[] ch, int start, int length )
                  throws SAXException {
      contents.write( ch, start, length );
   }
   public Customer getCustomer()  {
      return cust;
   }
   public static void main( String[] argv ){
      System.out.println( "Example3:" );
      try {
         // Create SAX 2 parser...
         XMLReader xr = XMLReaderFactory.createXMLReader();
         // Set the ContentHandler...
         Example3 ex3 = new Example3();
         xr.setContentHandler( ex3 );
         // Parse the file...
         xr.parse( new InputSource(
               new FileReader( "Example3.xml" )) );
         // Display customer to stdout...
         Customer cust = ex3.getCustomer();
         cust.print( System.out );
      }catch ( Exception e )  {
         e.printStackTrace();
      }
   }
}

The following is the output generated by our simple Customer object, populated with data from our XML document:

Example3:
Customer:
  First Name ->  Bob
  Last Name ->  Hustead
  Customer Id ->  abc.123

A simple list of Java objects

For more complex XML documents, we will need to map lists of objects into Java. Mapping object lists is like bartending: when a bartender pours several beers in a row, he usually leaves the tap running while he quickly swaps glasses under the tap. This is exactly what we need to do to capture a list of objects. We have no control over incoming SAX events; they flow in like beer from a tap that we can’t shut off. To solve the problem, we need to provide empty containers, allow them to fill up, and continually replace them.

Our next example highlights this technique. Using an XML document that represents some information about a fictional customer order, we will map the XML that represents a list of order items to a vector of Java order-item objects. The key to implementing this concept is the current item. We’ll create a variable named currentOrderItem. Every time we get an event indicating a new order item (startElement for the OrderItem tag), we will create a new empty order-item object, add it to the list of order items, and assign it as the current order item. The XML parser does the rest.

First, here is the XML document representing our fictional customer order:

<?xml version="1.0"?>
<CustomerOrder>
   <Customer>
      <FirstName> Bob </FirstName>
      <LastName> Hustead </LastName>
      <CustId> abc.123 </CustId>
   </Customer>
   <OrderItems>
      <OrderItem>
         <Quantity> 1 </Quantity>
         <ProductCode> 48.GH605A </ProductCode>
         <Description> Pet Rock </Description>
         <Price> 19.99 </Price>
      </OrderItem>
      <OrderItem>
         <Quantity> 12 </Quantity>
         <ProductCode> 47.9906Z </ProductCode>
         <Description> Bazooka Bubble Gum </Description>
         <Price> 0.33 </Price>
      </OrderItem>
      <OrderItem>
         <Quantity> 2 </Quantity>
         <ProductCode> 47.7879H </ProductCode>
         <Description> Fluorescent Orange Squirt Gun </Description>
         <Price> 2.50 </Price>
      </OrderItem>
   </OrderItems>
</CustomerOrder>

Again, here is our simple customer class:

package common;
import java.io.*;
// Customer is a very simple class
// that holds fields for a dummy Customer
// data.
// It has a simple method to print itself
// to a print stream.
public class Customer {
   // Customer member variables.
   public String firstName = "";
   public String lastName  = "";
   public String custId    = "";
   public void print( PrintStream out ) {
      out.println( "Customer: " );
      out.println( "  First Name -> "  + firstName );
      out.println( "  Last Name -> "   + lastName  );
      out.println( "  Customer Id -> " + custId    );
   }
}

Next, a simple class to represent an order item:

package common;
import java.io.*;
// OrderItem is a very simple class
// that holds fields for dummy order
// item data.
// It has a simple method to print itself
// to a print stream.
public class OrderItem {
   // OrderItem member variables.
   public int    quantity     = 0;
   public String productCode  = "";
   public String description  = "";
   public double price        = 0.0;
   public void print( PrintStream out ) {
      out.println( "OrderItem: " );
      out.println( "  Quantity -> "  + Integer.toString(quantity) );
      out.println( "  Product Code -> "   + productCode  );
      out.println( "  Description -> " + description    );
      out.println( "  price -> " + Double.toString( price )    );
   }
}

Now, we turn our attention to the SAX parser for example four, which maps customers and order items:

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;
import java.util.*;
import common.*;
public class Example4 extends DefaultHandler {
   // Local Customer object to collect
   // customer XML data.
   private  Customer cust = new Customer();
   // Local list of order items...
   private Vector orderItems = new Vector();
   // Local current order item reference...
   private OrderItem currentOrderItem;
   // Buffer for collecting data from
   // the "characters" SAX event.
   private CharArrayWriter contents = new CharArrayWriter();
   // Override methods of the DefaultHandler class
   // to gain notification of SAX Events.
   //
   // See org.xml.sax.ContentHandler for all available events.
   //
   public void startElement( String namespaceURI,
              String localName,
              String qName,
              Attributes attr ) throws SAXException {
      contents.reset();
      // New twist: each OrderItem start tag creates a fresh,
      // empty order item, adds it to the list, and makes it current.
      if ( localName.equals( "OrderItem" ) ) {
         currentOrderItem = new OrderItem();
         orderItems.addElement( currentOrderItem );
      }
   }
   public void endElement( String namespaceURI,
              String localName,
              String qName ) throws SAXException {
      if ( localName.equals( "FirstName" ) ) {
         cust.firstName = contents.toString();
      }
      if ( localName.equals( "LastName" ) ) {
         cust.lastName = contents.toString();
      }
      if ( localName.equals( "CustId" ) ) {
         cust.custId = contents.toString();
      }
      if ( localName.equals( "Quantity" ) ) {
         currentOrderItem.quantity = Integer.valueOf(contents.toString().trim()).intValue();
      }
      if ( localName.equals( "ProductCode" ) ) {
         currentOrderItem.productCode = contents.toString();
      }
      if ( localName.equals( "Description" ) ) {
         currentOrderItem.description = contents.toString();
      }
      if ( localName.equals( "Price" ) ) {
         currentOrderItem.price = Double.valueOf(contents.toString().trim()).doubleValue();
      }
   }
   public void characters( char[] ch, int start, int length )
                  throws SAXException {
      contents.write( ch, start, length );
   }
   public Customer getCustomer()  {
      return cust;
   }
   public Vector getOrderItems() {
      return orderItems;
   }
   public static void main( String[] argv ){
      System.out.println( "Example4:" );
      try {
         // Create SAX 2 parser...
         XMLReader xr = XMLReaderFactory.createXMLReader();
         // Set the ContentHandler...
         Example4 ex4 = new Example4();
         xr.setContentHandler( ex4 );
         // Parse the file...
         xr.parse( new InputSource(
               new FileReader( "Example4.xml" )) );
         // Display customer to stdout...
         Customer cust = ex4.getCustomer();
         cust.print( System.out );
         // Display all order items to stdout...
         OrderItem i;
         Vector items = ex4.getOrderItems();
         Enumeration e = items.elements();
         while( e.hasMoreElements()){
            i = (OrderItem) e.nextElement();
            i.print( System.out );
         }
      }catch ( Exception e )  {
         e.printStackTrace();
      }
   }
}

Here’s the output generated by our Customer and OrderItems objects:

Example4:
Customer:
  First Name ->  Bob
  Last Name ->  Hustead
  Customer Id ->  abc.123
OrderItem:
  Quantity -> 1
  Product Code ->  48.GH605A
  Description ->  Pet Rock
  price -> 19.99
OrderItem:
  Quantity -> 12
  Product Code ->  47.9906Z
  Description ->  Bazooka Bubble Gum
  price -> 0.33
OrderItem:
  Quantity -> 2
  Product Code ->  47.7879H
  Description ->  Fluorescent Orange Squirt Gun
  price -> 2.5

When the structure of the XML document becomes more complex, the real task is managing the creation of empty containers to contain the flow of SAX events. For simpler things like a single list of objects, this management is straightforward. However, we will need to develop techniques to help manage more complicated containment hierarchies such as lists of lists and lists of objects that contain lists.

Objects sharing tags

Before we get to the more advanced containment layouts, there is another SAX difficulty we will sometimes need to address. Occasionally, data at different places in the XML document will be tagged with the same tag but will have to be mapped to different objects in Java. Suppose you have a customer section and a customer representative section in your XML document. Both of these sections have fields with FirstName and LastName as tags. Because of this ambiguity, you can no longer be sure which object the contents buffer should be assigned to during the endElement SAX event. You must keep some information about the enclosing startElement SAX events to determine which object collects the contents during the shared endElement SAX event.

This problem can become dangerous, even with XML documents that don’t initially have this structure, if the XML document doesn’t have a DTD or the DTD is changed without updating the mapping code. Without a DTD, your clients can legally place any tag you are mapping at an unexpected location within the XML document.

In truth, the only way to safely deal with the problem is to constantly track information about all open start tags. As a simple example, let’s say you have the following XML document:

<?xml version="1.0"?>
<CustomerInformation>
   <Customer>
      <Name>
      Some Customer Name
      </Name>
      <Company>
         <Name>
         The customer's company name
         </Name>
      </Company>
   </Customer>
</CustomerInformation>

Even though the tag name Name is ambiguous, the full path to the name is not — it’s either CustomerInformation->Customer->Name or CustomerInformation->Customer->Company->Name. Keeping the full path available at all times guarantees that accidentally reusing a tag name won’t fool your mapping code. It turns out that mapping recursive XML structures requires a solution to this problem; we will cover this issue in the next article.
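As a rough sketch of the full-path idea (not the class library planned for Part 2), the handler below keeps a stack of the currently open element names and files each element's text under its complete path, joined with -> to match the notation above. It assumes the JDK's JAXP SAX parser with namespace awareness on, trims surrounding whitespace, and records text only for elements whose content is non-whitespace text, which is enough for documents like the one above; mixed content is out of scope. PathTracker and its parse method are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: key every piece of text by the full path of open elements.
public class PathTracker extends DefaultHandler {

    // Currently open elements, outermost first.
    private final Deque<String> openTags = new ArrayDeque<>();
    private final StringBuilder contents = new StringBuilder();
    // Collected text, keyed by paths such as "Customer->Company->Name".
    private final Map<String, String> values = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attr) {
        openTags.addLast(localName);
        contents.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        contents.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Build the path before popping, so it ends with this element.
        String path = String.join("->", openTags);
        String text = contents.toString().trim();
        if (!text.isEmpty()) {
            values.put(path, text);
        }
        openTags.removeLast();
        contents.setLength(0); // keep this text out of the parent's buffer
    }

    public static Map<String, String> parse(String xml) {
        PathTracker handler = new PathTracker();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so localName is filled in
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.values;
    }

    public static void main(String[] args) {
        Map<String, String> values = parse("<CustomerInformation><Customer>"
                + "<Name>Some Customer Name</Name><Company>"
                + "<Name>The customer's company name</Name>"
                + "</Company></Customer></CustomerInformation>");
        for (Map.Entry<String, String> e : values.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```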

Next, we’ll examine two examples for dealing with this situation. The first is a brute-force solution built on if statements. I will set some flags during the containing element’s startElement SAX event. Then, during the endElement event, I will test those flags in if statements to determine which object the contents should be assigned to.

Below you’ll find our sample XML document demonstrating overlapping tag names:

<?xml version="1.0"?>
<Shapes>
   <Triangle name="tri1" >
      <x> 3 </x>
      <y> 0 </y>
      <height> 3 </height>
      <width> 5 </width>
   </Triangle>
   <Triangle name="tri2" >
      <x> 5 </x>
      <y> 0 </y>
      <height> 3 </height>
      <width> 5 </width>
   </Triangle>
   <Square name="sq1" >
      <x> 0 </x>
      <y> 0 </y>
      <height> 3 </height>
      <width> 3 </width>
   </Square>
   <Circle name="circ1" >
      <x> 10 </x>
      <y> 10 </y>
      <height> 3 </height>
      <width> 3 </width>
   </Circle>
</Shapes>

The following is a base class for all of our dummy shape classes:

package common;
// Dummy base class to hold values 
// common to shapes.
public class Shape {
   public int x = 0;
   public int y = 0;
   public int height = 0;
   public int width  = 0;
   
}

Here’s a simple triangle class:

package common;
import java.io.*;
// Dummy triangle shape.
public class Triangle extends Shape {
   // Dummy Triangle specific stuff...
   public String name = "";   
   
   public void print( PrintStream out ){
      out.println( "Triange: " + name + 
            " x: " + x  +
            " y: " + y  +
            " width: " + width + 
            " height: " + height );    
   }
}

Next, we see a simple square class:

package common;
import java.io.*;
// Dummy square shape.
public class Square extends Shape {
   // Dummy Square specific stuff...
   public String name = "";   
   
   public void print( PrintStream out ){
      out.println( "Square: " + name + 
            " x: " + x  +
            " y: " + y  +
            " width: " + width + 
            " height: " + height );    
   }
}

Here’s a simple circle shape:

package common;
import java.io.*;
// Dummy circle shape.
public class Circle extends Shape {
   // Dummy Circle specific stuff...
   public String name = "";
   public void print( PrintStream out ){
      out.println( "Circle: " + name +
            " x: " + x  +
            " y: " + y  +
            " width: " + width +
            " height: " + height );
   }
}

Next comes the mapping code that uses the brute-force method to separate identical tag names associated with different objects:

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;
import java.util.*;
import common.*;
public class Example5 extends DefaultHandler {
   // Flags to help us capture the contents
   // of a tagged element.
   private boolean inCircle      = false;
   private boolean inTriangle    = false;
   private boolean inSquare      = false;
   // Local list of different shapes...
   private Vector triangles = new Vector();
   private Vector squares = new Vector();
   private Vector circles = new Vector();
   // Local current shape references...
   private Triangle currentTriangle;
   private Circle   currentCircle;
   private Square   currentSquare;
   // Buffer for collecting data from
   // the "characters" SAX event.
   private CharArrayWriter contents = new CharArrayWriter();
   // Override methods of the DefaultHandler class
   // to gain notification of SAX Events.
   //
   // See org.xml.sax.ContentHandler for all available events.
   //
   public void startElement( String namespaceURI,
               String localName,
              String qName,
              Attributes attr ) throws SAXException {
      contents.reset();
      if ( localName.equals( "Circle" ) ) {
         inCircle = true;
         currentCircle = new Circle();
         currentCircle.name = attr.getValue( "name" );
         circles.addElement( currentCircle );
      }
      if ( localName.equals( "Square" ) ) {
         inSquare = true;
         currentSquare = new Square();
         currentSquare.name = attr.getValue( "name" );
         squares.addElement( currentSquare );
      }
      if ( localName.equals( "Triangle" ) ) {
         inTriangle = true;
         currentTriangle = new Triangle();
         currentTriangle.name = attr.getValue( "name" );
         triangles.addElement( currentTriangle );
      }
   }
   public void endElement( String namespaceURI,
               String localName,
              String qName ) throws SAXException {
      // Each field tag is assigned according to the open shape's flag.
      // Testing inTriangle (rather than a bare else) means a stray
      // field tag outside any shape is ignored instead of causing
      // a NullPointerException.
      if ( localName.equals( "x" ) ) {
         if ( inCircle ) {
            currentCircle.x = Integer.parseInt( contents.toString().trim() );
         }
         else if ( inSquare ) {
            currentSquare.x = Integer.parseInt( contents.toString().trim() );
         }
         else if ( inTriangle ) {
            currentTriangle.x = Integer.parseInt( contents.toString().trim() );
         }
      }
      if ( localName.equals( "y" ) ) {
         if ( inCircle ) {
            currentCircle.y = Integer.parseInt( contents.toString().trim() );
         }
         else if ( inSquare ) {
            currentSquare.y = Integer.parseInt( contents.toString().trim() );
         }
         else if ( inTriangle ) {
            currentTriangle.y = Integer.parseInt( contents.toString().trim() );
         }
      }
      if ( localName.equals( "width" ) ) {
         if ( inCircle ) {
            currentCircle.width = Integer.parseInt( contents.toString().trim() );
         }
         else if ( inSquare ) {
            currentSquare.width = Integer.parseInt( contents.toString().trim() );
         }
         else if ( inTriangle ) {
            currentTriangle.width = Integer.parseInt( contents.toString().trim() );
         }
      }
      if ( localName.equals( "height" ) ) {
         if ( inCircle ) {
            currentCircle.height = Integer.parseInt( contents.toString().trim() );
         }
         else if ( inSquare ) {
            currentSquare.height = Integer.parseInt( contents.toString().trim() );
         }
         else if ( inTriangle ) {
            currentTriangle.height = Integer.parseInt( contents.toString().trim() );
         }
      }
      // Leaving a shape element clears its flag.
      if ( localName.equals( "Circle" ) ) {
         inCircle = false;
      }
      if ( localName.equals( "Square" ) ) {
         inSquare = false;
      }
      if ( localName.equals( "Triangle" ) ) {
         inTriangle = false;
      }
   }
   public void characters( char[] ch, int start, int length )
                  throws SAXException {
      // accumulate the contents into a buffer.
      contents.write( ch, start, length );
   }
   public Vector getCircles() {
      return circles;
   }
   public Vector getSquares() {
      return squares;
   }
   public Vector getTriangles() {
      return triangles;
   }
   public static void main( String[] argv ){
      System.out.println( "Example5:" );
      try {
         // Create SAX 2 parser...
         XMLReader xr = XMLReaderFactory.createXMLReader();
         // Set the ContentHandler...
         Example5 ex5 = new Example5();
         xr.setContentHandler( ex5 );
         // Parse the file...
         xr.parse( new InputSource(
               new FileReader( "Example5.xml" )) );
         // Display all circles to stdout...
         Circle c;
         Vector items = ex5.getCircles();
         Enumeration e = items.elements();
         while( e.hasMoreElements()){
            c = (Circle) e.nextElement();
            c.print( System.out );
         }
         // Display all squares to stdout...
         Square s;
         items = ex5.getSquares();
         e = items.elements();
         while( e.hasMoreElements()){
            s = (Square) e.nextElement();
            s.print( System.out );
         }
         // Display all triangles to stdout...
         Triangle t;
         items = ex5.getTriangles();
         e = items.elements();
         while( e.hasMoreElements()){
            t = (Triangle) e.nextElement();
            t.print( System.out );
         }
      }catch ( Exception e )  {
         e.printStackTrace();
      }
   }
}

Running Example5 prints the data collected into our shape classes:

Example5:
Circle: circ1 x: 10 y: 10 width: 3 height: 3
Square: sq1 x: 0 y: 0 width: 3 height: 3
Triange: tri1 x: 3 y: 0 width: 5 height: 3
Triange: tri2 x: 5 y: 0 width: 5 height: 3
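Because x, y, height, and width all live in the Shape base class, the flag tests in Example5 can also be collapsed by keeping a single base-class reference to whichever shape element is currently open. The sketch below is our suggested variation, not the article’s method; to stay self-contained it defines its own stand-in classes (BaseShape and its subclasses, all hypothetical names) instead of importing the common package. It still assumes shape elements never nest inside one another.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Stand-ins for the article's common.* classes (hypothetical names).
class BaseShape { String name; int x, y, height, width; }
class CircleShape extends BaseShape {}
class SquareShape extends BaseShape {}
class TriangleShape extends BaseShape {}

// Hypothetical alternative to Example5: one base-class reference replaces the flags.
class PolymorphicShapeHandler extends DefaultHandler {
    final List<BaseShape> shapes = new ArrayList<>();
    private BaseShape current; // null while no shape element is open
    private final StringBuilder contents = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attr) {
        contents.setLength(0);
        BaseShape shape = null;
        if (qName.equals("Circle")) shape = new CircleShape();
        else if (qName.equals("Square")) shape = new SquareShape();
        else if (qName.equals("Triangle")) shape = new TriangleShape();
        if (shape != null) {
            shape.name = attr.getValue("name");
            shapes.add(shape);
            current = shape;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        contents.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null) return; // ignore field tags outside any shape
        String text = contents.toString().trim();
        switch (qName) {
            case "x":      current.x = Integer.parseInt(text); break;
            case "y":      current.y = Integer.parseInt(text); break;
            case "height": current.height = Integer.parseInt(text); break;
            case "width":  current.width = Integer.parseInt(text); break;
            case "Circle": case "Square": case "Triangle": current = null; break;
            default: break;
        }
    }

    static List<BaseShape> parse(String xml) throws Exception {
        PolymorphicShapeHandler handler = new PolymorphicShapeHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.shapes;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Shapes><Square name=\"sq1\"><x> 0 </x><y> 0 </y>"
                + "<height> 3 </height><width> 3 </width></Square></Shapes>";
        for (BaseShape s : parse(xml)) {
            System.out.println(s.getClass().getSimpleName() + ": " + s.name
                    + " x: " + s.x + " y: " + s.y
                    + " width: " + s.width + " height: " + s.height);
        }
    }
}
```

This shortcut helps only when the ambiguous tags map to the same field on every candidate object; when they do not, the flags or the handler-swapping technique remain necessary.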

The second solution takes advantage of the fact that you can replace the ContentHandler of a SAX parser while it’s running; the SAX XMLReader contract requires the parser to begin using a newly registered handler immediately. This allows us to divide our mapping tasks into modular pieces, each of which deals only with its own fragment of the XML document.

Unlike Example5, the endElement() methods of the second example contain no network of nested if statements. This modularity becomes critical when processing more complex XML documents. It also keeps this style of mapping code from misassigning data when duplicate tag names turn up in unexpected locations within the XML document.

Although the second method is a little bulkier, because most of the class definition is replicated, this technique of swapping the ContentHandler is the first step toward a more generic solution to parsing with SAX. Swapping the ContentHandler is another way for us to swap mugs under the running tap of a SAX parser.
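One possible step toward that generic solution is to move the swap-in and swap-back logic into a reusable base class. In the sketch below, a hypothetical FragmentHandler counts element depth so that it returns control to its parent when the element that activated it closes, instead of testing a hard-coded tag name such as "Circle". It relies on the documented behavior of XMLReader.setContentHandler: a handler registered mid-parse is used immediately. FragmentHandler, PointHandler, DocumentHandler, and the onField hook are illustrative names, not part of the SAX API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical reusable base class for a swapped-in child ContentHandler.
abstract class FragmentHandler extends DefaultHandler {
    private XMLReader reader;
    private ContentHandler parent;
    private int depth; // elements currently open inside the fragment's root
    private final StringBuilder contents = new StringBuilder();

    // Call from the parent's startElement for the fragment's root element.
    protected void takeOver(XMLReader reader, ContentHandler parent) {
        this.reader = reader;
        this.parent = parent;
        this.depth = 0;
        reader.setContentHandler(this); // the parser switches to this handler immediately
    }

    // Called for every element that closes inside the fragment.
    protected abstract void onField(String name, String text);

    @Override
    public void startElement(String uri, String localName, String qName, Attributes a) {
        depth++;
        contents.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        contents.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth == 0) { // the fragment's own root element is closing
            reader.setContentHandler(parent);
            return;
        }
        depth--;
        onField(qName, contents.toString().trim());
    }
}

// Hypothetical child handler: maps <x> and <y> inside a <Point> element.
class PointHandler extends FragmentHandler {
    final List<int[]> results = new ArrayList<>();
    private int[] current;

    void begin(XMLReader reader, ContentHandler parent) {
        current = new int[2];
        results.add(current);
        takeOver(reader, parent);
    }

    @Override
    protected void onField(String name, String text) {
        if (name.equals("x")) current[0] = Integer.parseInt(text);
        if (name.equals("y")) current[1] = Integer.parseInt(text);
    }
}

// Hypothetical top-level handler: ignores its own <x>, delegates each <Point>.
class DocumentHandler extends DefaultHandler {
    private final XMLReader reader;
    final PointHandler pointMapper = new PointHandler();

    DocumentHandler(XMLReader reader) { this.reader = reader; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes a) {
        if (qName.equals("Point")) pointMapper.begin(reader, this);
    }

    static List<int[]> parse(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        DocumentHandler handler = new DocumentHandler(reader);
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.pointMapper.results;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Doc><x>99</x><Point><x>1</x><y>2</y></Point></Doc>";
        for (int[] p : parse(xml)) {
            System.out.println(p[0] + "," + p[1]); // prints 1,2
        }
    }
}
```

In the sample document, the top-level x element never reaches PointHandler, because DocumentHandler delegates only when a Point element opens. Each fragment handler can therefore use short, local tag names without colliding with the same names elsewhere in the document.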

The following code demonstrates the ContentHandler swap technique. Example6 and each of the type-specific ContentHandler classes (package-private classes defined in the same source file) keep their own contents buffer:

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;
import java.util.*;
import common.*;
public class Example6 extends DefaultHandler {
   // XML Parser...
   XMLReader parser;
   // Mapping delegates...
   Example6Circle circleMapper = new Example6Circle();
   Example6Square squareMapper = new Example6Square();
   Example6Triangle triangleMapper = new Example6Triangle();
   // Local list of different shapes...
   private Vector circles = new Vector();
   private Vector triangles = new Vector();
   private Vector squares = new Vector();
   // Buffer for collecting data from
   // the "characters" SAX event.
   private CharArrayWriter contents = new CharArrayWriter();
   // Constructor with XML Parser...
   Example6( XMLReader parser ) {
           this.parser = parser;
   }
   // Override methods of the DefaultHandler class
   // to gain notification of SAX Events.
   //
   // See org.xml.sax.ContentHandler for all available events.
   //
   public void startElement( String namespaceURI,
               String localName,
              String qName,
              Attributes attr ) throws SAXException {
      contents.reset();
      if ( localName.equals( "Circle" ) ) {
         Circle aCircle = new Circle();
         aCircle.name = attr.getValue( "name" );
         circles.addElement( aCircle );
         circleMapper.collectCircle( parser, this, aCircle );
      }
      if ( localName.equals( "Square" ) ) {
         Square aSquare = new Square();
         aSquare.name = attr.getValue( "name" );
         squares.addElement( aSquare );
         squareMapper.collectSquare( parser, this, aSquare );
      }
      if ( localName.equals( "Triangle" ) ) {
         Triangle aTriangle = new Triangle();
         aTriangle.name = attr.getValue( "name" );
         triangles.addElement( aTriangle );
         triangleMapper.collectTriangle( parser, this, aTriangle);
      }
   }
   public void endElement( String namespaceURI,
               String localName,
              String qName ) throws SAXException {
      // Nothing left for the Example 6 mapper
      // to handle in the endElement SAX event.
   }
   public void characters( char[] ch, int start, int length )
                  throws SAXException {
      // accumulate the contents into a buffer.
      contents.write( ch, start, length );
   }
   public Vector getCircles() {
      return circles;
   }
   public Vector getSquares() {
      return squares;
   }
   public Vector getTriangles() {
      return triangles;
   }
   public static void main( String[] argv ){
      System.out.println( "Example6:" );
      try {
         // Create SAX 2 parser...
         XMLReader xr = XMLReaderFactory.createXMLReader();
         // Set the ContentHandler...
         Example6 ex6 = new Example6(xr);
         xr.setContentHandler( ex6 );
         // Parse the file...
         xr.parse( new InputSource(
               new FileReader( "Example6.xml" )) );
         // Display all circles to stdout...
         Circle c;
         Vector items = ex6.getCircles();
         Enumeration e = items.elements();
         while( e.hasMoreElements()){
            c = (Circle) e.nextElement();
            c.print( System.out );
         }
         // Display all squares to stdout...
         Square s;
         items = ex6.getSquares();
         e = items.elements();
         while( e.hasMoreElements()){
            s = (Square) e.nextElement();
            s.print( System.out );
         }
         // Display all triangles to stdout...
         Triangle t;
         items = ex6.getTriangles();
         e = items.elements();
         while( e.hasMoreElements()){
            t = (Triangle) e.nextElement();
            t.print( System.out );
         }
      }catch ( Exception e )  {
         e.printStackTrace();
      }
   }
}
class Example6Circle extends DefaultHandler {
   // Local current circle reference...
   private Circle   currentCircle;
   // Parent...
   ContentHandler parent;
   // XML Parser
   XMLReader parser;
   // Buffer for collecting data from
   // the "characters" SAX event.
   private CharArrayWriter contents = new CharArrayWriter();
   public void collectCircle( XMLReader parser,
               ContentHandler parent,
               Circle newCircle ) {
      this.parent = parent;
      this.parser = parser;
      parser.setContentHandler( this );
      currentCircle = newCircle;
   }
   // Override methods of the DefaultHandler class
   // to gain notification of SAX Events.
   //
   // See org.xml.sax.ContentHandler for all available events.
   //
   public void startElement( String namespaceURI,
               String localName,
              String qName,
              Attributes attr ) throws SAXException {
      contents.reset();
   }
   public void endElement( String namespaceURI,
               String localName,
              String qName ) throws SAXException {
      if ( localName.equals( "x" ) ) {
         currentCircle.x = Integer.parseInt( contents.toString().trim() );
      }
      if ( localName.equals( "y" ) ) {
         currentCircle.y = Integer.parseInt( contents.toString().trim() );
      }
      if ( localName.equals( "width" ) ) {
         currentCircle.width = Integer.parseInt( contents.toString().trim() );
      }
      if ( localName.equals( "height" ) ) {
         currentCircle.height = Integer.parseInt( contents.toString().trim() );
      }
      if ( localName.equals( "Circle" ) ) {
         // swap content handler back to parent
         parser.setContentHandler( parent );
      }
   }
   public void characters( char[] ch, int start, int length )
                  throws SAXException {
      // accumulate the contents into a buffer.
      contents.write( ch, start, length );
   }
}
class Example6Square extends DefaultHandler {
   // Local current square reference...
   private Square   currentSquare;
   // Parent...
   ContentHandler parent;
   // XML Parser
   XMLReader parser;
   // Buffer for collecting data from
   // the "characters" SAX event.
   private CharArrayWriter contents = new CharArrayWriter();
   public void collectSquare( XMLReader parser,
               ContentHandler parent,
               Square newSquare ) {
      this.parent = parent;
      this.parser = parser;
      parser.setContentHandler( this );
      currentSquare = newSquare;
   }
   // Override methods of the DefaultHandler class
   // to gain notification of SAX Events.
   //
   // See org.xml.sax.ContentHandler for all available events.
   //
   public void startElement( String namespaceURI,
               String localName,
              String qName,
              Attributes attr ) throws SAXException {
      contents.reset();
   }
   public void endElement( String namespaceURI,
               String localName,
              String qName ) throws SAXException {
      if ( localName.equals( "x" ) ) {
         currentSquare.x = Integer.parseInt( contents.toString().trim() );
      }
      if ( localName.equals( "y" ) ) {
         currentSquare.y = Integer.parseInt( contents.toString().trim() );
      }
      if ( localName.equals( "width" ) ) {
         currentSquare.width = Integer.parseInt( contents.toString().trim() );
      }
      if ( localName.equals( "height" ) ) {
         currentSquare.height = Integer.parseInt( contents.toString().trim() );
      }
      if ( localName.equals( "Square" ) ) {
         // swap content handler back to parent
         parser.setContentHandler( parent );
      }
   }
   public void characters( char[] ch, int start, int length )
                  throws SAXException {
      // accumulate the contents into a buffer.
      contents.write( ch, start, length );
   }
}
class Example6Triangle extends DefaultHandler {
   // Local current triangle reference...
   private Triangle currentTriangle;
   // Parent...
   ContentHandler parent;
   // XML Parser
   XMLReader parser;
   // Buffer for collecting data from
   // the "characters" SAX event.
   private CharArrayWriter contents = new CharArrayWriter();
   public void collectTriangle( XMLReader parser,
               ContentHandler parent,
               Triangle newTriangle ) {
      this.parent = parent;
      this.parser = parser;
      parser.setContentHandler( this );
      currentTriangle = newTriangle;
   }
   // Override methods of the DefaultHandler class
   // to gain notification of SAX Events.
   //
   // See org.xml.sax.ContentHandler for all available events.
   //
   public void startElement( String namespaceURI,
               String localName,
               String qName,
               Attributes attr ) throws SAXException {
      // Start every element with an empty buffer; without this,
      // text from earlier fields accumulates and parseInt fails.
      contents.reset();
   }
   public void endElement( String namespaceURI,
               String localName,
               String qName ) throws SAXException {
      if ( localName.equals( "x" ) ) {
         currentTriangle.x = Integer.parseInt( contents.toString().trim() );
      }
      if ( localName.equals( "y" ) ) {
         currentTriangle.y = Integer.parseInt( contents.toString().trim() );
      }
      if ( localName.equals( "width" ) ) {
         currentTriangle.width = Integer.parseInt( contents.toString().trim() );
      }
      if ( localName.equals( "height" ) ) {
         currentTriangle.height = Integer.parseInt( contents.toString().trim() );
      }
      if ( localName.equals( "Triangle" ) ) {
         // swap content handler back to parent
         parser.setContentHandler( parent );
      }
   }
   public void characters( char[] ch, int start, int length )
                  throws SAXException {
      // accumulate the contents into a buffer.
      contents.write( ch, start, length );
   }
}

Notice that we get the same output from our shape classes:

Example6:
Circle: circ1 x: 10 y: 10 width: 3 height: 3
Square: sq1 x: 0 y: 0 width: 3 height: 3
Triange: tri1 x: 3 y: 0 width: 5 height: 3
Triange: tri2 x: 5 y: 0 width: 5 height: 3

Conclusion

We’ve demonstrated that SAX, when properly applied, has many advantages over the DOM API. We’ve covered some of the basic perspectives regarding SAX that allow us to effectively write XML-to-Java mapping code for simple and moderately complex XML documents. We’ve also highlighted some of the danger areas in applying the SAX API.

Finally, I hope you now understand the implications of using the DOM API in performance-sensitive environments, where the SAX API shines.

In the next article, we will tackle recursive XML structures, the ambiguous tag name problem, and the navigational aspects of SAX. These three threads come together in a general-purpose class library that turns even the most complicated XML mapping code into a declarative style of coding that focuses on container management: swapping beer glasses under the open tap.

Source: www.infoworld.com