Easy Java/XML integration with JDOM, Part 2

Use JDOM to create and mutate XML

In Part 1 of this series, we introduced you to JDOM, and discussed how you can use it to extract information from an existing XML data source such as a file, input stream, or URL. Additionally, we covered the core JDOM classes and explained how those classes work together to represent an XML document.

In this article, we tell the rest of the story and explain how you can modify XML documents or even create them from scratch. JDOM accomplishes those tasks using standard conventions wherever possible, and it lets you create, change, or move nearly every document component at runtime.

One word of warning before we begin: The JDOM API is in beta and subject to change until the 1.0 release. That release is expected within the next few months, but it may come sooner or later, depending on when the JDOM community is satisfied that the design and implementation are solid. Keep an eye on the JDOM Website for changes.

Creating a Document

In Part 1 you learned how to construct a JDOM Document from an external source:

SAXBuilder builder = new SAXBuilder();  // parameters control validation, etc
Document doc = builder.build(url);

It’s also possible and even easier to construct a JDOM Document from scratch with no existing XML data store:

Element root = new Element("myRootElement");
Document doc = new Document(root);

As simple as that, you can construct an Element and a Document using that Element as the document root. At that point, normal document manipulation can occur as well:

root.setText("This is a root element");

The above code sets the root element’s text content to the given string. If the Document were output now, using XMLOutputter as discussed in Part 1, the result would be the following XML:

<?xml version="1.0"?>
<myRootElement>This is a root element</myRootElement>

One major advantage of JDOM is that you don’t need factories or other advanced programming models to create either the JDOM Document or the constructs within it. Compare that with DOM, which requires the following code to accomplish the same task. (We cannot compare it with SAX, because SAX is a read-only API.)

// DOM code:
// Creating the document is vendor-specific (here, Apache Xerces), or requires the use of JAXP
Document doc = new org.apache.xerces.dom.DocumentImpl();
// Create the root node and its text node, using the document as a factory
Element root = doc.createElement("myRootElement");
Text text = doc.createTextNode("This is a root element");
// Put the nodes into the document tree
root.appendChild(text);
doc.appendChild(root);
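For comparison, here is the same DOM task written against JAXP. Current JDKs bundle JAXP, so this version compiles without a vendor-specific class. It is a sketch of our own. The names JaxpCreateDemo, buildDocument, and serialize are illustrative, and serialization uses an identity Transformer rather than any JDOM class. Notice that the factory pattern remains: every node is still created through the Document.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class JaxpCreateDemo {
    // Builds <myRootElement>This is a root element</myRootElement> with plain DOM.
    static Document buildDocument() {
        try {
            // JAXP hides the vendor-specific DocumentImpl class behind a factory
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            // Every node must still be created through its owning Document
            Element root = doc.createElement("myRootElement");
            root.appendChild(doc.createTextNode("This is a root element"));
            doc.appendChild(root);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes a DOM tree with an identity transform (no stylesheet involved).
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(buildDocument()));
    }
}
```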

One thing you’ll notice is that with DOM you don’t have a standard way to create a Document. You have to either use implementation-specific commands or use Sun Microsystems’ new JAXP API, which doesn’t work with all parsers and doesn’t yet support DOM2 and SAX 2.0. With JDOM, that is not an issue because JDOM documents are created through normal construction calls. You’ll also notice that a DOM Node is always constructed through a factory method on its parent Document. That poses the strange chicken-and-egg problem that a DOM Element can only be created using a DOM Document object, but an XML Document should never exist without a root element! Again, with JDOM that is not an issue because elements can be created independently of documents.

Additionally, it is obvious how you move JDOM elements between documents: you simply move the children. Consider this example, where parent1 and parent2 are Elements in different JDOM documents:

// Add to first document's element
Element movable = new Element("movableRootElement");
parent1.addContent(movable);
// Remove and add to another document's element
parent1.removeContent(movable);
parent2.addContent(movable);

That straightforward move operation is possible because a JDOM Element is not created by, or tied to, a specific Document. In DOM, you must first import the Element from the first DOM Document into the second and then perform the append. A direct move throws an exception.

// DOM code (parent1 belongs to doc1, parent2 to doc2):
Element movable = doc1.createElement("movable");
parent1.appendChild(movable);
// Try to append to another document's element
parent2.appendChild(movable);
// This throws a DOMException (WRONG_DOCUMENT_ERR): movable belongs to doc1
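Here is a runnable sketch of the failure and one workaround, using the DOM classes bundled with current JDKs. The class name and the format of the demo() report are our own. The sketch shows that a direct append produces the WRONG_DOCUMENT_ERR code. It then calls importNode(), which copies the node into the second document, and removes the original from the first. DOM Level 3 also offers Document.adoptNode(), which moves a node instead of copying it.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DomMoveDemo {
    // Creates an empty document whose root element has the given name.
    static Element newRoot(DocumentBuilder builder, String name) {
        Document doc = builder.newDocument();
        Element root = doc.createElement(name);
        doc.appendChild(root);
        return root;
    }

    // Reports whether the direct append failed and where "movable" ended up.
    static String demo() {
        DocumentBuilder builder;
        try {
            builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element parent1 = newRoot(builder, "parent1");
        Element parent2 = newRoot(builder, "parent2");
        Document doc1 = parent1.getOwnerDocument();
        Document doc2 = parent2.getOwnerDocument();

        Element movable = doc1.createElement("movable");
        parent1.appendChild(movable);

        // A direct append across documents is rejected before anything is moved
        boolean directFailed;
        try {
            parent2.appendChild(movable);
            directFailed = false;
        } catch (DOMException e) {
            directFailed = e.code == DOMException.WRONG_DOCUMENT_ERR;
        }

        // importNode makes a deep copy owned by doc2; the original is then removed from doc1
        Node copy = doc2.importNode(movable, true);
        parent2.appendChild(copy);
        parent1.removeChild(movable);

        return "directFailed=" + directFailed
                + " parent1Children=" + parent1.getChildNodes().getLength()
                + " parent2First=" + parent2.getFirstChild().getNodeName();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```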

As you can see here, one of JDOM’s core features is simplifying the way you create and modify documents.

Creating an Element

Every JDOM Document should contain, at a minimum, a root Element. Each Element, in turn, can contain as few or as many child elements as needed to represent the data in the XML document. The root Element can be passed to a Document at the time it’s created, as we showed earlier:

Element root = new Element("myRootElement");
Document doc = new Document(root);

To change a Document’s root Element, you call setRootElement(Element element) on the Document:

Element newRoot = new Element("myNewRootElement");
doc.setRootElement(newRoot);

The constructor for org.jdom.Element takes the Element’s string name and can optionally take information about the Element’s namespace (which we will discuss later). The name is checked against the XML 1.0 specification’s requirements for element naming. If the name is illegal, an IllegalNameException is thrown. IllegalNameException extends IllegalArgumentException, a runtime exception, so you don’t have to catch exceptions every time you create an Element. You need to take no special action when creating a new Element, and JDOM still ensures that only legal XML names are used. For example:

// This code will throw an IllegalNameException at runtime.
Element badElement = new Element("*(foo)");

That validation is handled by the org.jdom.Verifier class, which contains rules for all aspects of XML syntax. Applications can also use that class directly. An XML editor, for example, could use it to check a user-supplied name before using it.
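To give a feel for what such a check involves, here is a deliberately simplified sketch of an XML name test. It is not the Verifier’s code, and NameCheck and its methods are hypothetical. The character rules are approximated with Java’s Character.isLetter() and Character.isLetterOrDigit() rather than the specification’s exact Unicode tables.

```java
// Hypothetical, simplified approximation of the XML 1.0 Name production.
// Ignores the spec's exact character tables (combining chars, extenders, etc.).
class NameCheck {
    // First char: letter, '_' or ':'; later chars may also be digits, '.', or '-'.
    static boolean isLegalXmlName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        char first = name.charAt(0);
        if (!(Character.isLetter(first) || first == '_' || first == ':')) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean ok = Character.isLetterOrDigit(c)
                    || c == '.' || c == '-' || c == '_' || c == ':';
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    // Mirrors the behavior described above: bad names raise an unchecked exception.
    static String checkElementName(String name) {
        if (!isLegalXmlName(name)) {
            throw new IllegalArgumentException("Illegal element name: " + name);
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(isLegalXmlName("myRootElement")); // true
        System.out.println(isLegalXmlName("*(foo)"));        // false
    }
}
```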

Elements without children most typically contain textual content. That content looks like this in XML:

<elementName>Here is some textual content</elementName>

In Part 1, we discussed that you can retrieve that text content with getText():

String textualContent = element.getText();

That content can also be set through setText(String text):

element.setText("My textual content");

JDOM simplifies the handling of special characters such as &, <, >, ', and " by dealing with them automatically. Simply set the content using the raw string value. On output, the XMLOutputter class automatically converts those characters to the proper XML entity references. Likewise, all builders automatically convert the entity references back to their literal characters. To demonstrate, the following is perfectly legal in JDOM:

element.setText("Save cocoon.properties in <TOMCAT_HOME>/conf");

The angle brackets are automatically converted to entity references when output through XMLOutputter:

Save cocoon.properties in &lt;TOMCAT_HOME&gt;/conf 
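The conversion itself is simple. The hypothetical TextEscaper below is our own sketch, not XMLOutputter’s implementation, and it shows the substitutions involved. Element text needs &, <, and > replaced. Attribute values are quoted, so they also need their quote characters escaped.

```java
// Hypothetical helper illustrating the conversion; not XMLOutputter's actual code.
class TextEscaper {
    // Escapes characters that would otherwise be read as markup in element text.
    static String escapeText(String s) {
        return escape(s, false);
    }

    // Attribute values are quoted, so quote characters must be escaped as well.
    static String escapeAttribute(String s) {
        return escape(s, true);
    }

    private static String escape(String s, boolean inAttribute) {
        StringBuilder out = new StringBuilder(s.length());
        // Single pass: each input character is examined once, so the '&'
        // at the start of an emitted entity reference is never escaped again.
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append(inAttribute ? "&quot;" : "\""); break;
                case '\'': out.append(inAttribute ? "&apos;" : "'"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeText("Save cocoon.properties in <TOMCAT_HOME>/conf"));
    }
}
```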

Making kids

Elements often have child elements. To add a child to an element, you can use the addContent(Element element) method:

// Create a parent and two kids
Element parent = new Element("parent");
Element child1 = new Element("firstChild").setText("I'm number one");
Element child2 = new Element("secondChild").setText("I'm number two");
// Add the kids
parent.addContent(child1);
parent.addContent(child2);

The resulting XML would look like:

<parent>
  <firstChild>I'm number one</firstChild>
  <secondChild>I'm number two</secondChild>
</parent>

JDOM also supports a nesting shortcut, based on a design that may be familiar to users of other APIs such as BEA/WebLogic’s htmlKona package and the Java Apache Element Construction Set (ECS). The idea is that operations on an Element return that Element for further manipulation:

Document doc = new Document(
  new Element("family")
    .addContent(new Element("mom"))
    .addContent(new Element("dad")
       .addContent(new Element("kidOfDad"))));

Just be careful: it’s the placement of the parentheses, not the indentation, that determines the family tree. While that shortcut is handy in some cases, it can be confusing and error prone when constructing large JDOM Documents. If you don’t want to use it, just ignore the returned value.
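The shortcut works because each add method returns the element it was called on, not the child. The sketch below uses a hypothetical MiniElement class, which is not part of JDOM, to show the pattern. It also shows how moving a closing parenthesis changes the tree even when the indentation looks the same.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an element class; only illustrates the "return this" pattern.
class MiniElement {
    private final String name;
    private final List<MiniElement> children = new ArrayList<>();

    MiniElement(String name) {
        this.name = name;
    }

    // Adds a child and returns this element (not the child), so calls can be chained.
    MiniElement addContent(MiniElement child) {
        children.add(child);
        return this;
    }

    // Renders the tree as compact XML; childless elements use the empty-element form.
    String toXml() {
        if (children.isEmpty()) {
            return "<" + name + "/>";
        }
        StringBuilder sb = new StringBuilder("<").append(name).append('>');
        for (MiniElement child : children) {
            sb.append(child.toXml());
        }
        return sb.append("</").append(name).append('>').toString();
    }

    public static void main(String[] args) {
        // kidOfDad sits inside dad's parentheses, so it becomes dad's child
        String nested = new MiniElement("family")
            .addContent(new MiniElement("mom"))
            .addContent(new MiniElement("dad")
                .addContent(new MiniElement("kidOfDad")))
            .toXml();
        // dad's parentheses close early, so kidOfDad is added to family despite the indentation
        String flat = new MiniElement("family")
            .addContent(new MiniElement("mom"))
            .addContent(new MiniElement("dad"))
                .addContent(new MiniElement("kidOfDad"))
            .toXml();
        System.out.println(nested); // <family><mom/><dad><kidOfDad/></dad></family>
        System.out.println(flat);   // <family><mom/><dad/><kidOfDad/></family>
    }
}
```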

Because elements are constructed without factories, you can easily subclass the Element class, allowing for customizable elements as well as template elements. For example, if every XML document should have a common footer, you can construct that footer as a FooterElement:

root.addContent(new FooterElement());

Its contents might be something like this:

<footer>
  <copyright>
    JavaWorld 2000
  </copyright>
</footer>

You could write the FooterElement class like this:

public class FooterElement extends Element {
  public FooterElement() {
    super("footer");
    addContent(new Element("copyright").setText("JavaWorld 2000"));
  }
}

When the Element is created, it automatically adds its child elements and all their content. You can expand that idea to construct a general template for copyrights using an additional constructor:

  public FooterElement(int copyrightYear) {
    super("footer");
    addContent(new Element("copyright").setText("JavaWorld " + copyrightYear));
  }

Next year, you could then use it as follows:

root.addContent(new FooterElement(2001));

Managing the population

In Part 1, we showed you how to get an Element’s children as a Java List object:

List children = element.getChildren();

What’s exciting is that the returned list of children is live and can be used to directly manage the children — any changes to the list affect the actual children of the element. By leveraging the Java Collections API, JDOM lets the programmer change the document, using an API that is well understood by developers, making manipulations easy to understand and learn. The following code demonstrates a few things you can do with the returned List:

List children = element.getChildren();
// Remove the third child (List indexes start at 0)
children.remove(2);
// Remove all children named "jack"
children.removeAll(element.getChildren("jack"));
// Add a new child
children.add(new Element("jane"));
// Add a new child in the second position
children.add(1, new Element("second"));

Of course, for common tasks there are convenience methods that don’t require List manipulations:

// A non-List way to remove children named "jack"
element.removeChildren("jack");
// A non-List way to add a new child
element.addContent(new Element("jane"));

Setting element attributes

Manipulating element attributes is even simpler than manipulating children because attributes always have a single textual value. JDOM provides basic methods to add and remove Attributes to and from a JDOM Element:

table.addAttribute("vspace", "0");
table.removeAttribute("border");

Additionally, you can construct an Attribute directly and add the Attribute object to the Element through an overloaded addAttribute() method:

element.addAttribute(new Attribute("align", "right"));

Constructing an Attribute such as that makes most sense when the Attribute needs to be placed in a namespace, which we cover in the next section.

Similar to handling child elements, you can obtain the attributes of an Element in a Java List through the getAttributes() method.

// Get all attributes
List attributes = element.getAttributes();
// Remove all attributes
table.getAttributes().clear();

Also similar to Elements, the Attribute constructors will ensure that correct naming is used and throw IllegalNameException when an invalid name is supplied:

// This will throw a runtime exception, IllegalNameException
Attribute illegalAttribute = new Attribute("@lutris.com", "value");

Working with Namespaces

XML Namespaces are a way of giving an XML name two dimensions. For example, consider an ambiguous element name such as table. It could mean a table at which you eat, an HTML table, or a data table as in a spreadsheet. That confusion arises because the name carries only one dimension of information. You also couldn’t use two elements named table in the same document to mean different things, because you would have a namespace collision. The XML Namespaces recommendation solves that by providing an additional dimension. If you knew that the second dimension of an element was furniture, it would be easy to understand what table means. XML represents this by prefacing the element name with another name (a namespace prefix) and separating the two with a colon:

<furniture:table />

That prefix then maps to a URI, which is the actual unique identifier. The mapping is usually declared on the root element of an XML document, using an xmlns attribute:

<?xml version="1.0"?>
<root xmlns:furniture="
      xmlns:xlink="
  <furniture:table>
    <furniture:numChairs>4</furniture:numChairs>
    <furniture:cushion available="yes" xlink:href="
  </furniture:table>
</root>

In JDOM, the org.jdom.Namespace class represents an XML Namespace. You can obtain a namespace through that class by supplying the URI and a prefix to map to that URI:

Namespace furniture =
  Namespace.getNamespace("furniture", ");
Namespace xlink =
  Namespace.getNamespace("xlink", "

The Namespace.getNamespace() method ensures that each request for the same prefix/URI Namespace returns the exact same Namespace object, allowing faster output because namespace comparisons can use the very fast (==) equality check. The Namespace object is passed to the constructor of an Element or Attribute:

Element table = new Element("table", furniture);
Attribute href = new Attribute("href", " xlink);
table.addAttribute(href);
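That caching approach is an instance of the flyweight pattern. A minimal sketch follows. It assumes a hypothetical CachedNamespace class and is not JDOM’s actual Namespace source. The point is that get() hands back one shared object per prefix/URI pair, which is what makes == a valid comparison.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical flyweight sketch; not JDOM's Namespace implementation.
final class CachedNamespace {
    private static final Map<String, CachedNamespace> CACHE = new ConcurrentHashMap<>();

    private final String prefix;
    private final String uri;

    // Private constructor: instances are only obtained through get()
    private CachedNamespace(String prefix, String uri) {
        this.prefix = prefix;
        this.uri = uri;
    }

    static CachedNamespace get(String prefix, String uri) {
        // A namespace prefix cannot contain a space, so "prefix uri" is an unambiguous key
        String key = prefix + " " + uri;
        // computeIfAbsent is atomic on ConcurrentHashMap, so concurrent callers share one instance
        return CACHE.computeIfAbsent(key, k -> new CachedNamespace(prefix, uri));
    }

    // Default namespace: no prefix
    static CachedNamespace get(String uri) {
        return get("", uri);
    }

    String getPrefix() { return prefix; }
    String getURI() { return uri; }
}
```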

The various JDOM output classes automatically add the required namespace declarations and prefixes to Elements. It’s a relatively simple design, created after weeks of debate on the jdom-interest list to find the right compromise between simplicity, speed, and power.

In addition to the version of getNamespace() that takes a namespace prefix and URI, there is a version that takes a simple namespace URI. That provides a means to assign JDOM Elements to a default namespace, which has no prefix:

Namespace javaworldNS = Namespace.getNamespace("
Element myElement = new Element("article", javaworldNS);
myElement.addContent(new Element("title")
                   .setText("Easy Java/XML integration with JDOM"));

That results in the following output:

<article xmlns="
  <title>Easy Java/XML integration with JDOM</title>
</article>

To handle namespaces, the removeChildren() and removeAttribute() methods also have overloaded versions that accept a Namespace object as an additional argument:

element.removeChildren("img", xhtml);
element.removeAttribute("width", xlink);

Because each Element and Attribute has a reference to its Namespace, you can easily move those elements and attributes from one Document to another without having to worry about moving their Namespaces. That eliminates the possibility that an XML construct would lose its Namespace in transit.

One last note about namespaces: Because JDOM does namespace handling internally, JDOM supports namespaces and validation (with a DTD) at the same time. With other APIs, that isn’t possible.

Providing Processing Instructions

XML Processing Instructions (often simply called PIs) are directives that are included in an XML document. They are intended for use by processing applications to help in special processing of an XML document. In a document, they look like this:

<?cocoon-process type="xslt"?>
<?format-names %s//%gn?>

The processing instructions are split into two pieces of information: the target and the data. The target is the keyword that appears first in the PI, which in the examples above would be cocoon-process and format-names. The data is the rest of the data contained, complete with spaces and other text. The data in the first example PI is type="xslt" and in the second, it is %s//%gn.

In JDOM, the org.jdom.ProcessingInstruction class represents XML PIs. You can construct a PI with target and data strings:

ProcessingInstruction pi =
  new ProcessingInstruction("cocoon-process", "type=\"xslt\"");

Additionally, JDOM takes advantage of the fact that the PI’s data is often formatted as name/value pairs, as in the first example (where the name is type and the value is xslt). Because that is so common, you can construct PIs with a target and a Java Map object that contains the name/value pairs:

Map pairs = new HashMap();
pairs.put("type", "xslt");
ProcessingInstruction pi =
  new ProcessingInstruction("cocoon-process", pairs);

That makes constructing multiple name/value pairs a piece of cake, especially when the value comes from an outside source. For example, consider the following PI:

<?message-format type="XML-RPC" format="XML" encoding="UTF8" length="256"?>

You can create that PI by using the following code:

Map pairs = new HashMap();
pairs.put("type", "XML-RPC");
pairs.put("format", "XML");
pairs.put("encoding", encoding);  // variable substitution
pairs.put("length", length);      // variable substitution
ProcessingInstruction message =
  new ProcessingInstruction("message-format", pairs);
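The Map ultimately becomes the familiar name="value" form of the PI’s data. The hypothetical PiData helper below sketches both directions: it formats a Map into PI data and parses such data back into a Map. It assumes values contain no double quotes. It uses a LinkedHashMap so the pairs keep their insertion order. With a HashMap, as in the code above, the pairs may be written in a different order, which does not change their meaning.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes pair values never contain a double quote.
class PiData {
    // Matches name="value" pairs separated by optional whitespace around '='.
    private static final Pattern PAIR =
            Pattern.compile("([A-Za-z_][\\w.:-]*)\\s*=\\s*\"([^\"]*)\"");

    // Joins the pairs as name="value" separated by single spaces, in map iteration order.
    static String format(Map<String, String> pairs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append("=\"").append(e.getValue()).append('"');
        }
        return sb.toString();
    }

    // Extracts name/value pairs from PI data, keeping the order in which they appear.
    static Map<String, String> parse(String data) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(data);
        while (m.find()) {
            pairs.put(m.group(1), m.group(2));
        }
        return pairs;
    }
}
```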

PIs are most often added at the document level, but you can place them anywhere within the document:

ProcessingInstruction pi =
  new ProcessingInstruction("cocoon-process", "type=\"xslt\"");
Element root = new Element("root");
Document doc = new Document(root);
// Add to the document
doc.addProcessingInstruction(pi);
// Add under an element
root.addContent(new ProcessingInstruction("myPI", "myData"));

Notice that PIs are added to an Element by using the addContent() method, the same method name that allowed the addition of child elements. That overloading lets you simply add any type of legal JDOM construct to an Element.

You can remove PIs by following the same pattern, supplying either a ProcessingInstruction object or the target name to remove:

// Remove PI
doc.removeProcessingInstruction(pi);
// Remove PI with given target name
doc.removeProcessingInstruction("cocoon-process");

Additionally, if there are multiple PIs with the same target at the document level, you can remove all of them with the following convenience method:

// Remove all with given target name
doc.removeProcessingInstructions("cocoon-process");

As with the other JDOM constructs, trying to create a PI with an illegal or invalid target will result in an IllegalNameException.

// This will cause a runtime exception
ProcessingInstruction badPI =
  new ProcessingInstruction("$foo", "not legal name!");

DocTypes

A DOCTYPE declaration in XML lets an XML document reference a DTD and specify that the DTD should be used to validate the document (if validation is requested). A DOCTYPE declaration contains several pieces of data. The first is the name of the root element being constrained, generally the root element of the XML document itself. The second is an optional public ID, which names the DTD with a well-known identifier, such as those defined by the W3C. The third is a system ID, which specifies where to find the DTD if the public ID cannot be resolved. A common DOCTYPE declaration, which specifies the XHTML DTD, looks like this:

<!DOCTYPE html PUBLIC
  "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

In that case, html is the root element being constrained, the public ID is "-//W3C//DTD XHTML 1.0 Transitional//EN", and the system ID is "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd". In that example, the system ID is a URI, which would require network access for resolution. The system ID could also reference a local file on the filesystem:

<!DOCTYPE html PUBLIC
  "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "/usr/local/xml/DTDs/xhtml1-transitional.dtd">

Those declarations are represented in JDOM by the org.jdom.DocType class. Because many portions of the declaration are optional, there are several versions of the constructor. First, all three pieces of information can be given:

DocType xhtml = new DocType("html",
                 "-//W3C//DTD XHTML 1.0 Transitional//EN",
                 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd");

In the case where no public identifier is provided, the following constructor would work:

xhtml = new DocType("html",
                 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd");

Once constructed, you can add the DocType to a JDOM Document through the setDocType(DocType docType) method:

doc.setDocType(xhtml);
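For comparison, the standard JAXP serializer bundled with current JDKs can emit the same three pieces of information when it writes a DOM tree. This sketch is not JDOM, and the class name is our own. It relies on the Transformer output properties OutputKeys.DOCTYPE_PUBLIC and OutputKeys.DOCTYPE_SYSTEM, and the serializer takes the root element name from the document itself.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

class DoctypeDemo {
    // Serializes an empty <html/> document preceded by the XHTML DOCTYPE declaration.
    static String xhtmlSkeleton() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            doc.appendChild(doc.createElement("html"));

            Transformer t = TransformerFactory.newInstance().newTransformer();
            // Public and system IDs are passed as output properties, not as DOM nodes
            t.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC,
                    "-//W3C//DTD XHTML 1.0 Transitional//EN");
            t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM,
                    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(xhtmlSkeleton());
    }
}
```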

JDOM currently does not support validating the document in memory, although it might at a later time.

Comments anyone?

As in most languages, XML allows document authors to insert comments that XML parsers and processors ignore. XML comments are text surrounded by <!-- on the left and --> on the right:

<?xml version="1.0"?>
<!-- Comments can be at the document level -->
<root>
  <!-- Comments can be within elements as well -->
  <placeholder />
</root>

In JDOM, comments are represented by the org.jdom.Comment class. You can construct a comment with a simple string of text:

Comment comment = new Comment("I would have lots of content here");

A comment cannot contain a double hyphen (“--”). When a comment is created, the Verifier class checks it and throws an IllegalDataException if the comment is invalid. Like IllegalNameException, IllegalDataException extends IllegalArgumentException.

// This will throw a runtime exception
Comment badComment = new Comment("This --> will never work!");
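The rule itself is easy to express in code. The hypothetical CommentCheck class below is our own sketch based on the XML 1.0 grammar for comments. That grammar forbids “--” anywhere in the comment text. It also forbids a trailing hyphen, which would run into the closing -->. JDOM’s Verifier may phrase or order its checks differently.

```java
// Hypothetical checker modeled on the XML 1.0 Comment production; not JDOM's Verifier.
class CommentCheck {
    // Returns null if the text is legal comment content, otherwise the reason it is not.
    static String checkCommentData(String text) {
        if (text.contains("--")) {
            return "Comments cannot contain double hyphens (--)";
        }
        if (text.endsWith("-")) {
            return "Comment data cannot end with a hyphen";
        }
        return null;
    }

    // Throws an unchecked exception, mirroring the behavior described above.
    static String requireLegalComment(String text) {
        String reason = checkCommentData(text);
        if (reason != null) {
            throw new IllegalArgumentException(reason + ": " + text);
        }
        return text;
    }
}
```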

The Element class has methods that allow comment manipulation:

// Add a comment
root.addContent(new Comment("Anything but double dashes"));

The Document class has similar methods specifically for Comments:

doc.addComment(new Comment("This is a document level comment"));

Mixed content

In Part 1, we described the concept of “mixed content” of an Element. That is content that contains more than just text, and when obtained from an Element through the getMixedContent() method, the resulting List may contain text (Strings), Elements, ProcessingInstructions, Comments, and Entities. In other words, that method returns all content from an Element, regardless of type.

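At times, you may need to set an Element’s content as a list of mixed content.

For example, imagine you need to move all the children of one Element to another Element. The List returned by getMixedContent() reflects the Element’s live content. The code below therefore copies it first, so that the loop does not remove items from the same List it is iterating over:

// Copy the mixed content so the loop below doesn't modify the List it iterates over
List mixed = new ArrayList(oldElement.getMixedContent());
// Remove it all from the old Element
for (Iterator i = mixed.iterator(); i.hasNext(); ) {
  oldElement.removeContent(i.next());
}
// Add it to the new Element - this is easier than iterating through again
newElement.setMixedContent(mixed);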
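DOM programmers face the same task. The following sketch is our own, and its class and method names are hypothetical. DOM’s appendChild() detaches a node from its old parent automatically, so repeatedly moving the first child empties the source Element. The loop never iterates over a list that is changing underneath it. Both Elements must belong to the same Document.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class MoveChildrenDemo {
    // Moves every child node (text, elements, comments, PIs) from one element to another.
    static void moveAllChildren(Element from, Element to) {
        Node child;
        while ((child = from.getFirstChild()) != null) {
            to.appendChild(child); // detaches child from 'from', so the loop terminates
        }
    }

    // Builds mixed content, moves it, and reports child counts and node names.
    static String demo() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("root");
        doc.appendChild(root);
        Element oldElement = doc.createElement("old");
        Element newElement = doc.createElement("new");
        root.appendChild(oldElement);
        root.appendChild(newElement);

        // Mixed content: text, an element, a comment, and a processing instruction
        oldElement.appendChild(doc.createTextNode("Some text"));
        oldElement.appendChild(doc.createElement("child"));
        oldElement.appendChild(doc.createComment("a comment"));
        oldElement.appendChild(doc.createProcessingInstruction("myPI", "myData"));

        moveAllChildren(oldElement, newElement);

        StringBuilder names = new StringBuilder();
        for (Node n = newElement.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (names.length() > 0) {
                names.append(',');
            }
            names.append(n.getNodeName());
        }
        return "old=" + oldElement.getChildNodes().getLength() + " new=" + names;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```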

Fibonacci: Mixing the old and the new

Realistic JDOM use cases are too long to print in this article, but below is a short example program that demonstrates several of the features discussed in this series. It’s the classic Fibonacci sequence generator with a modern XML twist. It creates an XML output file that looks like the following:

<?xml version="1.0" encoding="UTF-8"?>
<Fibonacci_Numbers>
  <fibonacci index="0">0</fibonacci>
  <fibonacci index="1">1</fibonacci>
  <fibonacci index="2">1</fibonacci>
  <fibonacci index="3">2</fibonacci>
  <fibonacci index="4">3</fibonacci>
  <fibonacci index="5">5</fibonacci>
  <fibonacci index="6">8</fibonacci>
  <fibonacci index="7">13</fibonacci>
  <fibonacci index="8">21</fibonacci>
  <fibonacci index="9">34</fibonacci>
  <fibonacci index="10">55</fibonacci>
  <fibonacci index="11">89</fibonacci>
  <fibonacci index="12">144</fibonacci>
  <fibonacci index="13">233</fibonacci>
  <fibonacci index="14">377</fibonacci>
  <fibonacci index="15">610</fibonacci>
  <fibonacci index="16">987</fibonacci>
  <fibonacci index="17">1597</fibonacci>
  <fibonacci index="18">2584</fibonacci>
  <fibonacci index="19">4181</fibonacci>
  <fibonacci index="20">6765</fibonacci>
  <fibonacci index="21">10946</fibonacci>
  <fibonacci index="22">17711</fibonacci>
  <fibonacci index="23">28657</fibonacci>
  <fibonacci index="24">46368</fibonacci>
  <fibonacci index="25">75025</fibonacci>
</Fibonacci_Numbers>

The program below is based on an example written by Elliotte Rusty Harold, author of The XML Bible (IDG Books, July 1999), for an XML DevCon presentation (see Resources). It constructs a root element named Fibonacci_Numbers, then adds child elements named fibonacci. Each child has an index attribute and the corresponding Fibonacci number as its content. After the document is constructed, it’s written to a file named “fibonacci.xml” using the standard XMLOutputter. If an I/O error occurs, the exception is printed to standard error.

import org.jdom.Element;
import org.jdom.Document;
import org.jdom.output.XMLOutputter;
import java.math.BigInteger;
import java.io.*;
public class FibonacciJDOM {
  public static void main(String[] args) {
    Element root = new Element("Fibonacci_Numbers");
    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;
    for (int i = 0; i <= 25; i++) {
      Element fibonacci = new Element("fibonacci");
      fibonacci.addAttribute("index", String.valueOf(i));
      fibonacci.setText(low.toString());
      BigInteger temp = high;
      high = high.add(low);
      low = temp;
      root.addContent(fibonacci);
    }
    Document doc = new Document(root);
    // serialize it into a file
    try {
      FileOutputStream out = new FileOutputStream("fibonacci.xml");
      XMLOutputter serializer = new XMLOutputter();
      serializer.output(doc, out);
      out.flush();
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }
  }
}

If you’re interested in learning how to write this program using DOM, we recommend you read Harold’s presentation, which includes that program as well as several others.
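For a quick comparison, here is our own sketch of the same program using the DOM and JAXP classes bundled with current JDKs. It is not Harold’s code. To keep it testable, it returns the XML as a String instead of writing fibonacci.xml. It also leaves indentation off, which setting OutputKeys.INDENT to "yes" would turn on.

```java
import java.io.StringWriter;
import java.math.BigInteger;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class FibonacciDOM {
    // Builds <Fibonacci_Numbers> with one <fibonacci index="i"> child per number, 0..maxIndex.
    static String fibonacciXml(int maxIndex) {
        try {
            // With DOM, the Document must exist before any Element can be created
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element root = doc.createElement("Fibonacci_Numbers");
            doc.appendChild(root);

            BigInteger low = BigInteger.ZERO;
            BigInteger high = BigInteger.ONE;
            for (int i = 0; i <= maxIndex; i++) {
                Element fibonacci = doc.createElement("fibonacci");
                fibonacci.setAttribute("index", String.valueOf(i));
                fibonacci.appendChild(doc.createTextNode(low.toString()));
                root.appendChild(fibonacci);
                // Advance the pair (low, high) to (high, low + high)
                BigInteger temp = high;
                high = high.add(low);
                low = temp;
            }

            Transformer t = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fibonacciXml(25));
    }
}
```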

Final words

At this point, you should be ready to tackle the world…or at least the part that can be represented in XML. Hopefully you have seen that JDOM’s simplicity can greatly aid you in getting your XML-related tasks done easily and efficiently, without resorting to APIs that weren’t written for the Java developer. JDOM is about making XML accessible to all Java developers rather than an elite few. Additionally, the JDOM community is adding more functionality every day, and JDOM 1.0 looks to have support for XSL transformations in an early form, and the 1.1 API will have XPath support — both in the intuitive, easy-to-use fashion that the rest of the API is built upon. So check out JDOM today, and have fun!

Jason Hunter is a senior technologist with Collab.net, a company that provides tools and services for open source collaboration. In addition to being the cocreator of JDOM, he is the author of Java Servlet Programming (O’Reilly) and the publisher of He has worked on projects from the largest (setting up an intranet application for a Fortune 100 company) to the smallest (helping develop a commercial product for a small startup). He contributes to Apache’s Jakarta project and belongs to the working group responsible for Servlet API development.

Brett McLaughlin works as an Enterprise Java consultant at Metro Information Services and specializes in distributed systems architecture. In addition to cocreating JDOM, he has written Java and XML (O’Reilly) and Enterprise Applications in Java (O’Reilly). Brett is involved in technologies such as Java servlets, Enterprise JavaBeans, XML, and business-to-business applications. He is an active developer on the Apache Cocoon project and EJBoss EJB server, and he is a cofounder of the Apache Turbine project.

Source: www.infoworld.com