Server-Side Java: Using XML and JSP together

Two great tastes that taste great together

For the purpose of this article I’m going to assume that you know what JavaServer Pages (JSP) and Extensible Markup Language (XML) are, but you may be a little unclear on how you can use them. JSP use is pretty easy to defend. It allows you to design a Website built from files that look and act a lot like HTML. The only difference is that JSPs also act dynamically — for example, they can process forms or read databases — using Java as a server-side scripting language. XML use is more difficult to justify. While it seems as if every new product supports it, each one seems to be using XML for a different purpose.

In this article, you will learn to design a system using XML in a fairly modest way. Many Websites have vast collections of data that are displayed in a more or less standard way. I will design a system that uses XML files to store data on a Web server and JSP files to display that data.

XML versus relational databases

“But wait,” you may ask, “you’re using XML to store data? Why not use a database?” Good question. The answer is that for many purposes, a database is overkill. To use a database, you have to install and support a separate server process, which often also requires installing and supporting a database administrator. You must learn SQL, and write SQL queries that convert data from a relational to an object structure and back again. If you store your data as XML files, you lose the overhead of an extra server. You also gain an easy way to edit your data: just use a text editor, rather than a complicated database tool. XML files are also easier to back up, to share with your friends, or to download to your clients. You can also easily upload new data to your site, using FTP.

A more abstract advantage of XML is that, being a hierarchical rather than a relational format, it can be used in a much more straightforward manner to design data structures that fit your needs. You don’t need to use an entity relationship editor or normalize your schema. If you have one element that contains another element, you can represent that directly in the format, rather than using a join table.

Note that for many applications, a filesystem will not suffice. If you have a high volume of updates, a filesystem may get confused or corrupted by simultaneous writes; databases usually support transactions, which allow concurrency without corruption. Further, a database is an excellent tool if you need to make complicated queries, especially if they will vary from time to time. Databases build indexes, and are optimized for keeping the indexes up to date with a constantly changing data set. Relational databases also have many other advantages, including a rich query language, mature authoring and schema design tools, proven scalability, fine-grained access control, and so on.

(Note: You can use simple file locking to provide a poor man’s transaction server. And you can also implement an XML index-and-search tool in Java, but that’s a topic for another article.)

In this case, as in most low-to-medium volume, publishing-based Websites, you can assume the following: most of the data access is reads, not writes; the data, though potentially large, is relatively unchanging; you won’t need to do complicated searches, but if you do, you’ll use a separate search engine. The advantages of using a mature RDBMS fade, while the advantages of using an object-oriented data model come to the fore.

Finally, it’s entirely possible to provide a wrapper for your database that makes SQL queries and translates them into XML streams, so you could have it both ways. XML becomes a more robust, programmer-friendly frontend to a mature database for storing and searching. (Oracle’s XSQL servlet is one example of this technique.)

The application: An online photo album

Everybody loves photos! People love showing pictures of themselves, their friends, their pets, and their vacations. The Web is the ultimate medium for self-indulgent shutterbugs — they can annoy their relatives from thousands of miles away. While a full-fledged photo album site would require a complicated object model, I’ll focus on defining a single Picture object. (The source code for this application is available in Resources.) The object representing a picture needs fields representing its title, the date it was taken, an optional caption, and, obviously, a pointer to the image source.

An image, in turn, needs a few fields of its own: the location of the source file (a GIF or JPEG) and the height and width in pixels (to assist you in building <img> tags). Here there is one neat advantage to using the filesystem as your database: you can store the image files in the same directory as the data files.

Finally, let’s extend the picture record with an element defining a set of thumbnail images for use in the table of contents or elsewhere. Here I use the same concept of image I defined earlier.

The XML representation of a picture could look something like this:

<picture>
  <title>Alex On The Beach</title>
  <date>1999-08-08</date>
  <caption>Trying in vain to get a tan</caption>
  <image>
    <src>alex-beach.jpg</src>
    <width>340</width>
    <height>200</height>
  </image>
  <thumbnails>
    <image>
      <src>alex-beach-sm.jpg</src>
      <width>72</width>
      <height>72</height>
    </image>
    <image>
      <src>alex-beach-med.jpg</src>
      <width>150</width>
      <height>99</height>
    </image>
  </thumbnails>
</picture>

Note that by using XML, you put all the information about a single picture into a single file, rather than scattering it among three or four separate tables. Let’s call this a .pix file — so your filesystem might look like this:

 summer99/alex-beach.pix
 summer99/alex-beach.jpg
 summer99/alex-beach-sm.jpg
 summer99/alex-beach-med.jpg
 summer99/alex-snorkeling.pix
 etc.

Techniques

There’s more than one way to skin a cat, and there’s more than one way to bring XML data on to your JSP page. Here is a list of some of those ways. (This list is not exhaustive; many other products and frameworks would serve equally well.)

DOM: You can use classes implementing the DOM interface to parse and inspect the XML file
XMLEntryList: You can use my code to load the XML into a java.util.List of name-value pairs
XPath: You can use an XPath processor (like Resin) to locate elements in the XML file by path name
XSL: You can use an XSL processor to transform the XML into HTML
Cocoon: You can use the open source Cocoon framework
Roll your own bean: You can write a wrapper class that uses one of the other techniques to load the data into a custom JavaBean

Note that these techniques could be applied equally well to an XML stream you receive from another source, such as a client or an application server.

JavaServer Pages

The JSP spec has had many incarnations, and different JSP products implement different, incompatible versions of the spec. I will use Tomcat, for the following reasons:

It supports the most up-to-date versions of the JSP and servlet specs
It’s endorsed by Sun and Apache
You can run it standalone without configuring a separate Web server
It’s open source

(For more information on Tomcat, see Resources.)

You are welcome to use any JSP engine you like, but configuring it is up to you! Be sure that the engine supports at least the JSP 1.0 spec; there were many changes between 0.91 and 1.0. The JSWDK (Java Server Web Development Kit) will work just fine.

The JSP structure

When building a JSP-driven Website (also known as a Webapp), I prefer to put common functions, imports, constants, and variable declarations in a separate file called init.jsp, located in the source code for this article.

I then load that file into each JSP file using <%@include file="init.jsp"%>. The <%@include%> directive acts like the C language’s #include, pulling in the text of the included file (here, init.jsp) and compiling it as if it were part of the including file (here, picture.jsp). By contrast, the <jsp:include> tag compiles the file as a separate JSP file and embeds a call to it in the compiled JSP.

Finding the file

When the JSP starts, the first thing it needs to do after initialization is find the XML file you want. How does it know which of the many files you need? The answer is from a CGI parameter. The user will invoke the JSP with the URL picture.jsp?file=summer99/alex-beach.pix (or by passing a file parameter through an HTML form).

However, when the JSP receives the parameter, you’re still only halfway there. You still need to know where on the filesystem the root directory lies. For example, on a Unix system, the actual file may be in the directory /home/alex/public_html/pictures/summer99/alex-beach.pix. JSPs do not have a concept of a current directory while executing, so you need to provide an absolute pathname to the java.io package.

The Servlet API provides a method to turn a URL path, relative to the root of the Web application, into an absolute filesystem path. The method ServletContext.getRealPath(String) does the trick. Every JSP has a ServletContext object called application, so the code would be:

String picturefile =
  application.getRealPath("/" + request.getParameter("file"));

or

String picturefile =
  getServletContext().getRealPath("/" + request.getParameter("file"));

which also works inside a servlet. (You must prepend a / because the method expects a path that starts with one, such as the result of request.getPathInfo().)

One important note: whenever you access local resources, be very careful to validate the incoming data. A hacker, or a careless user, can send bogus data to hack your site. For instance, consider what would happen if the value file=../../../../etc/passwd were entered. The user could in this way read your server’s password file.
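
One way to guard against this kind of request is to resolve the parameter against the album's root directory and refuse anything that ends up outside it. The following is a minimal sketch, not code from the article's source: the class name SafePaths, the method resolveInsideRoot, and the rule that only .pix files are served are all assumptions made for the illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: resolves a user-supplied file parameter safely. */
final class SafePaths {
    private SafePaths() {}

    /**
     * Returns the absolute path of the requested file. Throws
     * IllegalArgumentException if the parameter is missing, does not name
     * a .pix file, or would resolve to a location outside rootDir.
     */
    static Path resolveInsideRoot(String rootDir, String requested) {
        if (requested == null || !requested.endsWith(".pix")) {
            throw new IllegalArgumentException("not a picture file: " + requested);
        }
        Path root = Paths.get(rootDir).toAbsolutePath().normalize();
        // normalize() collapses "." and ".." segments before the check below
        Path candidate = root.resolve(requested).normalize();
        if (!candidate.startsWith(root)) {
            throw new IllegalArgumentException("outside album root: " + requested);
        }
        return candidate;
    }
}
```

In a JSP, the first argument would be the value of application.getRealPath("/") and the second the raw file parameter. This check is purely textual; it does not follow symbolic links, so a stricter version would also need to compare real paths on disk.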

The Document Object Model

DOM stands for the Document Object Model. It is a standard API for browsing XML documents, developed by the World Wide Web Consortium (W3C). The interfaces are in package org.w3c.dom and are documented at the W3C site (see Resources).

There are many DOM parser implementations available. I have chosen IBM’s XML4J, but you can use any DOM parser. This is because the DOM is a set of interfaces, not classes — and all DOM parsers must return objects that faithfully implement those interfaces.

Unfortunately, though standard, the DOM has two major flaws:

The API, though object-oriented, is fairly cumbersome.
There is no standard API for a DOM parser, so, while each parser returns an org.w3c.dom.Document object, the means of initializing the parser and loading the file itself is always parser specific.

The simple picture file described above is represented in the DOM by several objects in a tree structure.

Document Node
 --> Element Node "picture"
      --> Text Node "\n  " (whitespace)
      --> Element Node "title"
        --> Text Node "Alex On The Beach"
      --> Element Node "date"
           --> ... etc.

To acquire the value Alex On The Beach you would have to make several method calls, walking the DOM tree. Further, the parser may choose to intersperse any number of whitespace text nodes, through which you would have to loop and either ignore or concatenate (you can correct this by calling the normalize() method). The parser may also include separate nodes for XML entities (like &amp;), CDATA nodes, or other element nodes (for instance, the <b>big</b> bear would turn into at least three nodes, one of which is a b element, containing a text node, containing the text big). There is no method in the DOM to simply say “get me the text value of the title element.” In short, walking the DOM is a bit cumbersome. (See the XPath section of this article for an alternative to DOM.)
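
To make the amount of walking concrete, here is a minimal sketch that reads the title using only the org.w3c.dom interfaces. It is an illustration under assumptions: it uses the JAXP DocumentBuilderFactory bundled with current JDKs in place of XML4J, and the class name RawDomWalk and method titleOf are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Illustration only: walks the DOM by hand to find the title text. */
final class RawDomWalk {
    private RawDomWalk() {}

    /** Returns the text of the root's title child, or null if there is none. */
    static String titleOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            // Step over whitespace text nodes until the title element appears.
            for (Node child = root.getFirstChild(); child != null;
                    child = child.getNextSibling()) {
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && "title".equals(child.getNodeName())) {
                    // Concatenate the text and CDATA children of the element.
                    StringBuilder text = new StringBuilder();
                    for (Node t = child.getFirstChild(); t != null;
                            t = t.getNextSibling()) {
                        if (t.getNodeType() == Node.TEXT_NODE
                                || t.getNodeType() == Node.CDATA_SECTION_NODE) {
                            text.append(t.getNodeValue());
                        }
                    }
                    return text.toString();
                }
            }
            return null; // no title element under the root
        } catch (Exception e) {
            throw new IllegalStateException("could not parse picture XML", e);
        }
    }
}
```

Later revisions of the DOM (Level 3) added Node.getTextContent(), which replaces the inner loop with a single call, but the outer search for the right child element remains.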

From a higher perspective, the problem with DOM is that the XML objects are not available directly as Java objects, but they must be accessed piecemeal via the DOM API. See my conclusion for a discussion of Java-XML Data Binding technology, which uses this straight-to-Java approach for accessing XML data.

I have written a small utility class, called DOMUtils, that contains static methods for performing common DOM tasks. For instance, to acquire the text content of the title child element of the root (picture) element, you would write the following code:

Document doc = DOMUtils.xml4jParse(picturefile);
Element nodeRoot = doc.getDocumentElement();
Node nodeTitle = DOMUtils.getChild(nodeRoot, "title");
String title = (nodeTitle == null) ? null : DOMUtils.getTextValue(nodeTitle);

Getting the values for the image subelements is equally straightforward:

Node nodeImage = DOMUtils.getChild(nodeRoot, "image");
Node nodeSrc = DOMUtils.getChild(nodeImage, "src");
String src =  DOMUtils.getTextValue(nodeSrc);

And so on.
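
The DOMUtils class itself is in the article's source download. As a rough guess at the shape such helpers might take (not the author's implementation), the sketch below offers null-tolerant getChild and getTextValue methods. The class name DomHelpers is hypothetical, and the code assumes the JDK's JAXP parser and DOM Level 3's getTextContent() rather than XML4J.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helpers in the spirit of DOMUtils; not the article's code. */
final class DomHelpers {
    private DomHelpers() {}

    /** Parses an XML string and returns its root element. */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    /** Returns the first direct child element with the given name, or null. */
    static Node getChild(Node parent, String name) {
        if (parent == null) {
            return null; // lets lookups be chained without null checks
        }
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && name.equals(n.getNodeName())) {
                return n;
            }
        }
        return null;
    }

    /** Returns the trimmed text inside the node, or null for a null node. */
    static String getTextValue(Node node) {
        return (node == null) ? null : node.getTextContent().trim();
    }
}
```

Because both helpers accept null, a chained lookup such as getTextValue(getChild(getChild(root, "image"), "src")) returns null instead of throwing when an optional element is missing.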

Once you have Java variables for each relevant element, all you must do is embed the variables inside your HTML markup, using standard JSP tags.

 <table bgcolor="#FFFFFF" border="0" cellspacing="0" cellpadding="5">
 <tr>
 <td align="center" valign="center">
 <img src="<%=src%>" width="<%=width%>" height="<%=height%>" border="0" alt="<%=title%>"></td>
 </tr>
 </table>

See the full source code for more details. The HTML output produced by the JSP file — an HTML screenshot, if you will — is in picture-dom.html.

Use JSP beans for model/view separation

All that code at the top of picture-dom.jsp, located in the source code, is unattractive. While you can put hundreds of lines of Java code inside a JSP, a cleaner approach exists: you can use JSP JavaBeans to store significant amounts of Java code, while reserving the use of JSP scriptlet tags (<% and %>) for control flow and minor variable manipulation inside the JSP page.

For prototyping purposes, it is generally easier to start a project by throwing all your Java code inside the JSP. Once you have a better idea of your needs, you can go back and extract the code and write some JavaBeans. The investment is higher, but so is the payoff in the long run, since your applications will be more modular. You can use the same beans in several different pages without the horror of copy-and-paste code reuse.

In our case, a clear candidate for a JSP JavaBean is the code that extracts String values from an XML file. You can define classes Picture, Image, and Thumbnails, representing the major elements in the XML file. These beans will have constructors or setter methods that take in a DOM node or a filename from which to extract their values. You can browse the picturebeans package source directory from the source code file in Resources.

When looking through the source, be sure to notice the following:

I defined interfaces separately from implementation classes, so you are free to choose alternate implementations in the future. You may want to store the values in a List, in the DOM itself, or even in a database.
The beans are defined in a custom package, picturebeans. All JSP beans must be in a package; most JSP engines won’t be able to find classes that are in the default package.
I provided set methods in addition to get methods. At the moment, you’re only reading; however, in the future, you may want to let users edit pictures, so you need to plan for the ability to change and write properties.
I now have to say <%=picture.getCaption()%> instead of just <%=caption%>, since the values are stored in a bean rather than in local variables. However, if you want, you can define local variables like String caption = picture.getCaption();. This is acceptable because it makes the code a little easier to read and understand.
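
As a rough picture of the interface/implementation split described above, the sketch below defines an Image interface with get and set methods and one plain in-memory implementation. The names SimpleImage and toImgTag are assumptions for this example and are not taken from the picturebeans package; a real bean would also declare a package, as noted above, which is omitted here only to keep the example in one piece.

```java
/** Read/write view of an image, independent of where the values live. */
interface Image {
    String getSrc();
    void setSrc(String src);
    int getWidth();
    void setWidth(int width);
    int getHeight();
    void setHeight(int height);
}

/** Hypothetical implementation that keeps the values in plain fields. */
final class SimpleImage implements Image {
    private String src;
    private int width;
    private int height;

    /** JavaBeans need a public no-argument constructor. */
    public SimpleImage() {}

    public String getSrc() { return src; }
    public void setSrc(String src) { this.src = src; }
    public int getWidth() { return width; }
    public void setWidth(int width) { this.width = width; }
    public int getHeight() { return height; }
    public void setHeight(int height) { this.height = height; }

    /** Convenience for the page: renders an img tag from the properties. */
    String toImgTag() {
        return "<img src=\"" + src + "\" width=\"" + width
                + "\" height=\"" + height + "\">";
    }
}
```

A DOM-backed or database-backed class could implement the same Image interface later without any change to the pages that call the getters.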

Zooming through thumbnails

You may have noticed that the output from my first JSP, picture-dom.html, used the full-sized source image file. Let’s change the code slightly, so that instead of showing the full-sized image, it shows a smaller, thumbnail version. I will use the list of thumbnail images stored in the XML data file.

Let’s define a parameter, zoom, whose value determines which of the thumbnail images to display. Clicking on the thumbnail will show the full-sized raw image source; clicking on a Zoom In or Zoom Out button will select the next or previous thumbnail in the list.

Since the Thumbnails object returns a java.util.List of Image objects, finding the right thumbnail couldn’t be easier: just say (Image)picture.getThumbnails().get(i).

To build the Zoom In and Zoom Out links, you must generate a recursive reference to the same page, with different parameters. For this, you use the request.getRequestURI() method. This only gives you the path to the servlet, with no parameters, so you can then tack on the parameters you want.

<% 
if (zoom < (thumbnails.size() -1)) { 
     out.print("<a href=\"" +
            request.getRequestURI() +
            "?file=" + request.getParameter("file") +
            "&zoom=" + (zoom+1) +
            "\">");
     out.print("Zoom In</a>");
} 
%>
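
The same link-building logic can also live in a plain method, which makes it testable outside the page and gives it a natural place to URL-encode the file parameter. This is a sketch under assumptions: ZoomLinks and zoomInLink are hypothetical names, and the request URI and parameter value are passed in as strings so the example runs without a servlet container.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper that builds the Zoom In link outside the JSP. */
final class ZoomLinks {
    private ZoomLinks() {}

    /**
     * Returns an anchor tag pointing at the next thumbnail size, or an
     * empty string when zoom already selects the last thumbnail.
     */
    static String zoomInLink(String requestUri, String file,
                             int zoom, int thumbnailCount) {
        if (zoom >= thumbnailCount - 1) {
            return ""; // nothing larger to zoom in to
        }
        String encodedFile;
        try {
            // Encode the value so characters such as '/' or '&' in the
            // file name cannot break the query string.
            encodedFile = URLEncoder.encode(file, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
        // "&amp;" is the HTML-escaped form of '&' inside an attribute value.
        return "<a href=\"" + requestUri
                + "?file=" + encodedFile
                + "&amp;zoom=" + (zoom + 1)
                + "\">Zoom In</a>";
    }
}
```

From the page, the call would be along the lines of out.print(ZoomLinks.zoomInLink(request.getRequestURI(), request.getParameter("file"), zoom, thumbnails.size())); a matching Zoom Out method would test zoom > 0 and subtract one.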

Here is an HTML screenshot of the working JSP page.

Using JSP bean tags

The JSP spec defines the <jsp:useBean> tag for automatically instantiating and using JavaBeans from a JSP page. The useBean tag can always be replaced by embedded Java code, and that’s what I’ve done here. For this reason, many people question the need for the useBean and setProperty tags. The arguments in favor of these tags are:

The tag syntax is arguably less intimidating to HTML designers.
useBean has a scope parameter that automatically determines whether the bean should be stored as a local variable, a session variable, or an application attribute.
If the variable is persistent (session or application), useBean initializes it if necessary, but fetches the variable if it already exists.
The tags are potentially more portable to future versions of the JSP spec, or alternate implementations (for example, a hypothetical JSP engine that stores variables in a database, or shares them across server processes).

The equivalent useBean syntax for this application is:

<jsp:useBean id="picture" scope="request" class="picturebeans.DOMPicture">
 <% 
  Document doc = DOMUtils.xml4jParse(picturefile);
  Element nodeRoot = doc.getDocumentElement();
  nodeRoot.normalize();
  picture.setNode(nodeRoot);
 %>
</jsp:useBean>

or, if you define a setFile(String) method inside DOMPicture:

<jsp:useBean id="picture" scope="request" class="picturebeans.DOMPicture">
 <jsp:setProperty name="picture" property="file" value="<%=picturefile%>"/>
</jsp:useBean>

Using XMLEntryList

To overcome some of the difficulties of using the DOM APIs, I have created a class called XMLEntryList. This class implements the Java Collections interface java.util.List, as well as the get and put methods of java.util.Map, providing a more intuitive set of methods with which to traverse a simple XML tree structure. You can use the standard abstraction of the Collections API to do things like acquire iterators and subviews. Each entry in an XMLEntryList has a key and a value, like a Map; the keys are the names of the child nodes, and the values are either Strings or child XMLEntryLists.

XMLEntryList is not meant to be a full replacement for the DOM; it cannot perform several DOM functions. However, it is a convenient wrapper for performing basic getting, setting, and list-oriented functions on your XML data structure. For instance, to get the caption element of the picture node, you can say:

 String caption = (String)picturelist.get("caption");

The value of the caption field has already been parsed and stored as a String.
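
XMLEntryList itself ships with the article's source. To show the general idea of exposing child elements as name-value entries, here is a much-reduced stand-in built from standard collections. It is not the author's class: EntryLists and its methods are names invented for the example, it is read-only, and it assumes the JDK's JAXP parser.

```java
import java.io.StringReader;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical stand-in for the idea behind XMLEntryList. */
final class EntryLists {
    private EntryLists() {}

    /** Parses XML text and returns the root element's children as entries. */
    static List<Map.Entry<String, Object>> parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return entriesOf(root);
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    /** Each child element becomes (name, String) or (name, nested list). */
    static List<Map.Entry<String, Object>> entriesOf(Element parent) {
        List<Map.Entry<String, Object>> entries = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text, comments, and so on
            }
            Element child = (Element) n;
            Object value;
            if (child.getElementsByTagName("*").getLength() > 0) {
                value = entriesOf(child);               // has sub-elements
            } else {
                value = child.getTextContent().trim();  // leaf: plain text
            }
            entries.add(new AbstractMap.SimpleEntry<String, Object>(
                    child.getNodeName(), value));
        }
        return entries;
    }

    /** Returns the value of the first entry with the given key, or null. */
    static Object get(List<Map.Entry<String, Object>> entries, String key) {
        if (entries == null) {
            return null;
        }
        for (Map.Entry<String, Object> e : entries) {
            if (e.getKey().equals(key)) {
                return e.getValue();
            }
        }
        return null;
    }

    /** Returns the nested list stored under the key, or null if absent. */
    @SuppressWarnings("unchecked")
    static List<Map.Entry<String, Object>> child(
            List<Map.Entry<String, Object>> entries, String key) {
        Object v = get(entries, key);
        return (v instanceof List) ? (List<Map.Entry<String, Object>>) v : null;
    }
}
```

Keeping the entries in a list rather than a map preserves document order and allows repeated names, such as the two image elements under thumbnails.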

Caching

Whatever its advantages, parsing an XML file takes time. To improve the performance of XML-based applications, you should use some sort of cache. This cache stores XML objects in memory, keyed by the name of the file from which they came. If the file has been modified since the object was loaded, the object must be reloaded. I have written a simple version of this data structure, called CachedFS.java. You supply a CachedFS with a callback, written as an inner class, that actually performs the XML parsing, transforming a file into an object. The cache then stores that object in memory.

Here is the code for creating a cache. This object has application scope, so future requests get to use the same object cache. I will put this code in init.jsp, so that you don’t need to copy and paste the initialization code into the other JSPs in the Webapp. In general, you should define application-scope objects in a common location, so you don’t end up with different initialization routines in different areas.

<jsp:useBean id="cache" class="com.purpletech.io.CachedFS" scope="application">
 <% cache.setRoot(application.getRealPath("/"));
    cache.setLoader( new CachedFS.Loader() {
      // load in a single Picture file
      public Object process(String path, InputStream in) throws IOException
      {
          try {
           Document doc = DOMUtils.xml4jParse
               (new BufferedReader(new InputStreamReader(in)));
           Element nodeRoot = doc.getDocumentElement();
           nodeRoot.normalize();
           Picture picture = new DOMPicture(nodeRoot);
           return picture;
          }
          catch (XMLException e) {
           e.printStackTrace();
           throw new IOException(e.getMessage());
          }
      }
     });
 %>
</jsp:useBean>
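
The internals of CachedFS are in the downloadable source and are not reproduced here. To illustrate the reload-if-modified idea in isolation, the following sketch keys each cached object on the file's path and remembers the modification time seen when it was loaded. ReloadingCache, its Function-based loader, and the use of ConcurrentHashMap are assumptions made for the example, not a description of CachedFS.

```java
import java.io.File;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache that reloads an object when its file changes. */
final class ReloadingCache<T> {

    /** A loaded value plus the file timestamp it was loaded at. */
    private static final class Entry<V> {
        final long lastModified;
        final V value;

        Entry(long lastModified, V value) {
            this.lastModified = lastModified;
            this.value = value;
        }
    }

    private final Map<String, Entry<T>> entries = new ConcurrentHashMap<>();
    private final Function<File, T> loader;

    /** The loader turns a file into an object, for example by parsing XML. */
    ReloadingCache(Function<File, T> loader) {
        this.loader = loader;
    }

    /** Returns the cached object, reloading it if the file has changed. */
    T get(File file) {
        String key = file.getAbsolutePath();
        long stamp = file.lastModified();
        Entry<T> cached = entries.get(key);
        if (cached == null || cached.lastModified != stamp) {
            // First request, or the file was edited since the last load.
            cached = new Entry<>(stamp, loader.apply(file));
            entries.put(key, cached);
        }
        return cached.value;
    }
}
```

Two simultaneous requests for a file that has just changed may both run the loader; for read-mostly data that costs one extra parse and is otherwise harmless. A loader that can fail with a checked exception would need to wrap it, since Function does not declare any.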

XPath

XPath is a simple syntax for locating nodes in an XML tree. It is easier to use than DOM, since instead of making method calls each time you want to step to another node, you embed the entire path in a string — for example, /picture/thumbnails/image[2]. The Resin product, by Caucho (see Resources), includes an XPath processor that you can use in your own apps. You can use the Caucho XPath object on its own, without buying into the rest of the Resin framework.

Node verse = XPath.find("chapter/verse", node);

Resin also includes a scripting language, compatible with JavaScript, that allows easy access to XPath and XSL functionality from inside your JSP.
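
For readers who are not using Resin, the javax.xml.xpath package in current JDKs can evaluate the same kind of path expression. The sketch below uses that standard API in place of the Caucho XPath class; the class name XPathLookup and the idea of passing the document as a string are assumptions made to keep the example self-contained.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Illustration: evaluates an XPath expression with the JDK's own API. */
final class XPathLookup {
    private XPathLookup() {}

    /**
     * Returns the string value of the first node matched by the
     * expression, or an empty string when nothing matches.
     */
    static String evaluate(String xml, String expression) {
        try {
            return XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(
                    "bad expression or document: " + expression, e);
        }
    }
}
```

With the picture file shown earlier, evaluating /picture/thumbnails/image[2]/src selects the source of the second thumbnail. Note that XPath positions start at 1, not 0.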

XSL

This article has discussed embedding Java inside a JSP to extract data from XML nodes. There is another popular model for accomplishing this task: the Extensible Stylesheet Language (XSL). This model is radically different from the JSP model I’ve been discussing. In JSP, the main document is HTML, containing snippets of Java code; in XSL, the main document is an XSL document, containing snippets of HTML. There is a lot to say about the relationship between XSL and Java/JSP, much more than I have space for here. A future article in JavaWorld will explore using XSL and JSP together.
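
To give a flavor of that inversion, here is a small sketch that applies a stylesheet to a picture document from Java. It assumes the javax.xml.transform API bundled with current JDKs; the class name XslDemo and the tiny stylesheet are made up for the illustration and are not part of the article's source.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Illustration: an XSL document containing snippets of HTML. */
final class XslDemo {
    private XslDemo() {}

    /** Hypothetical stylesheet: HTML fragments wrapped in XSL templates. */
    static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\" indent=\"no\"/>"
        + "<xsl:template match=\"/picture\">"
        + "<h1><xsl:value-of select=\"title\"/></h1>"
        + "<p><xsl:value-of select=\"caption\"/></p>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Transforms a picture document into an HTML fragment. */
    static String toHtml(String pictureXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader(pictureXml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }
}
```

Here the stylesheet, not the page, decides where the title and caption appear; the Java code only wires the two documents together.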

Conclusion and routes for future advancements

After reading this tutorial, you should have a good idea of the structure of a JSP-XML application, and of its power. You should also have some idea of its limitations.

The most tedious part of developing a JSP-XML application is creating JavaBeans for each of the elements in your XML schema. The XML Data Binding group is developing technology that will automatically generate Java classes from a given schema. Also, I have developed a prototype open source Java-XML data binding technology. And IBM alphaWorks has recently released XML Master, or XMas, another XML-Java data binding system.

Another possibility is to expand the functionality of the filesystem, building some more powerful features, such as queries and transactions. Naturally, I am contemplating implementing this type of functionality as an open source project as well. Anybody want to write an XML search engine?

Alex Chaffee is a software guru with jGuru. He has been promoting, teaching, and programming in Java since 1995. As the director of software engineering for Earthweb, Alex cocreated Gamelan, a directory for the Java community. He has presented at numerous users groups and conferences, written articles for several Java magazines, and contributed to the book The Official Gamelan Java Directory. You can see his source code at

JavaWorld and jGuru have formed a partnership to help the community better understand server-side Java technology. Together, JavaWorld and jGuru are jointly producing articles and free educational Web events.

Source: www.infoworld.com