Blend the power of XML and databases using custom SAX and DOM APIs
Databases and XML offer complementary functionality for storing data. Databases store data for efficient retrieval, whereas XML offers an easy information exchange that enables interoperability between applications. To take advantage of XML’s features we can convert database tables into XML documents. We can then use XML tools with such documents for further processing. For example, XML documents can be presented as HTML pages with XSLT stylesheets, can be searched with XML-based query languages such as XQL, can be used as a data-exchange format, and so on. However, converting a database into an XML document is an expensive approach, one that requires not only the initial cost of conversion but also the subsequent costs of synchronizing both information sources.
TEXTBOX: TEXTBOX_HEAD: Abbreviations in this article API: application programming interface
DOM: Document Object Model
DTD: document type definition
JAXP: Java API for XML Parsing
JDBC: Java Database Connectivity
SAX: Simple API for XML
XML: Extensible Markup Language
XQL: XML Query Language
XSL: Extensible Stylesheet Language
XSLT: XSL Transformations
XT: An XSLT processor
:END_TEXTBOX
For processing XML documents, most XML tools work with the SAX or DOM API. In this article, we’ll look at a way to implement the same APIs directly over a database, enabling XML tools to treat databases as if they were XML documents. That way, we can obviate the need of converting a database.
We’ll see an implementation of the SAX API for Databases that should work with any database with a JDBC driver. Next, we’ll examine an implementation of the DOM API for Databases that uses the SAX API internally. To demonstrate the SAX API for Databases, we’ll look at its integration with XT (an XSLT processor). We’ll also see an example of how such integration can create HTML pages that incorporate an XSLT stylesheet directly from a database and how it can convert a database into an XML document. Finally, we’ll look at how the DOM API for Databases integrates with an XQL processor.
In this article, I make use of existing tools rather than create new tools to illustrate the applications of the SAX and DOM APIs for Databases. I show how to leverage a number of available XML tools to work with a database. All the XML tools I mention are either available for free or free for noncommercial use (though you should, of course, check licensing agreements).
Overview of the SAX and DOM APIs
SAX is an event-based API for XML. With it, the SAX parser reports events such as the start and end of elements to the application as it walks over the document. Because the parser reports events as it visits different parts of the document, it does not have to build any internal structure. That reduces the strain on system resources, which makes the parser attractive for large documents. For XML documents received as continuous streams, an event-based API is the only choice.
The DOM API, on the other hand, follows a treelike construct. Elements have parent-child relations with other elements. With this API, the parser builds an internal structure such that an application can navigate it in a treelike fashion. DOM allows an application to have random access to the tree-structured document at the cost of increased memory usage.
XML APIs for databases: The basics
Because of a database’s highly regular data-storage structure, we can map it into data-centric XML documents. For example, we can transform a database table into an XML document with a DTD of the following form:
<!ELEMENT table rows*>
<!ELEMENT rows (column1, column2, ...)>
<!ELEMENT column1 #PCDATA>
<!ELEMENT column2 #PCDATA>
....
In other words, with an XML API for databases, we can make the database look like an XML document; these APIs present the database as a virtual XML document. We are at the most basic concept of object-oriented design: it is the interface — not the implementation — that matters. In our situation, the tools using such an XML API need not care whether they are operating on a database table or an XML file.
Implementing the SAX API for Databases
To implement the SAX API for Databases, we need to implement a parser that operates on a JDBC data source, iterates over each row and column, and generates appropriate SAX events while iterating. The SAX specification provides the org.xml.sax.InputSource
class that models a data source representing a URL or a byte stream. To represent a database, we need a specialized form of it that can represent a table in a database. We therefore implement JDBCInputSource
, which extends the org.xml.sax.InputSource
class. Let’s look at JDBCInputSource
in more detail:
// JDBCInputSource.java
package dbxml.sax;
import java.sql.*;
import org.xml.sax.InputSource;
public class JDBCInputSource extends InputSource {
private String _connectionURL;
private String _userName;
private String _passwd;
private String _tableName;
public JDBCInputSource(String connectionURL, String userName,
String passwd, String tableName) {
super(connectionURL);
_connectionURL = connectionURL;
_userName = userName;
_passwd = passwd;
_tableName = tableName;
}
public String getTableName() {
return _tableName;
}
public Connection getConnection() throws SQLException {
return DriverManager.getConnection(_connectionURL, _userName, _passwd);
}
}
In the code above, the constructor takes the information needed to connect to a database and the name of the table to be parsed. The method getConnection()
connects to the database and returns a Connection
object.
Next, we need to implement the SAX parser that uses JDBCInputSource
to iterate over database table rows and columns and generates SAX events along the way. To simplify the code, we create an abstract ParserBase
class, which implements the org.xml.sax.Parser
and has responsibility only for managing various handlers. We then create our SAX parser for the JDBC source JDBCSAXParser
that extends the ParserBase
class.
(To view the code for ParserBase.java, click here.)
// JDBCSAXParser.java
package dbxml.sax;
import java.io.IOException;
import java.sql.*;
import org.xml.sax.*;
import org.xml.sax.helpers.AttributeListImpl;
public class JDBCSAXParser extends ParserBase {
private static final AttributeList _stockEmptyAttributeList
= new AttributeListImpl();
//------------------------------------------------------------------
// Methods from the Parser interface
//------------------------------------------------------------------
public void parse (InputSource source) throws SAXException, IOException {
if (! (source instanceof JDBCInputSource)) {
throw new SAXException("JDBCSAXParser can work only with source "
+ "of JDBCInputSource type");
}
parse((JDBCInputSource)source);
}
public void parse (String systemId) throws SAXException, IOException {
throw new SAXException("JDBCSAXParser needs more information to "
+ "connect to database");
}
//------------------------------------------------------------------
// Additional methods
//------------------------------------------------------------------
public void parse(JDBCInputSource source)
throws SAXException, IOException {
try {
Connection connection = source.getConnection();
if (connection == null) {
throw new SAXException("Could not establish connection with "
+ "database");
}
String sqlQuery = getSelectorSQLStatement(source.getTableName());
PreparedStatement pstmt = connection.prepareStatement(sqlQuery);
ResultSet rs = pstmt.executeQuery();
parse(rs, source.getTableName());
rs.close();
connection.close();
} catch (SQLException ex) {
throw new SAXException(ex);
}
}
public void parse(ResultSet rs, String tableName)
throws SAXException, SQLException, IOException {
if (_documentHandler == null) {
return; // nobody is interested in me, no need to sweat!
}
ResultSetMetaData rsmd = rs.getMetaData();
int numCols = rsmd.getColumnCount();
String tableMarker = getTableMarker(tableName);
String rowMarker = getRowMarker();
_documentHandler.startDocument();
_documentHandler.startElement(tableMarker, _stockEmptyAttributeList);
while(rs.next()) {
_documentHandler.startElement(rowMarker,
_stockEmptyAttributeList);
for (int i = 1; i <= numCols; i++) {
generateSAXEventForColumn(rsmd, rs, i);
}
_documentHandler.endElement(rowMarker);
}
_documentHandler.endElement(tableMarker);
_documentHandler.endDocument();
}
public void parse(String connectionURL, String userName, String passwd,
String tableName) throws SAXException, IOException {
parse(new JDBCInputSource(connectionURL, userName, passwd, tableName));
}
//------------------------------------------------------------------
// Protected methods that derived classes could override to
// customize the parsing.
//------------------------------------------------------------------
protected void generateSAXEventForColumn(ResultSetMetaData rsmd,
ResultSet rs,
int columnIndex)
throws SAXException, SQLException {
String columnValue = rs.getString(columnIndex);
if (columnValue == null) {
return;
}
String columnMarker
= getColumnMarker(rsmd.getColumnLabel(columnIndex));
char[] columnValueChars = columnValue.toCharArray();
_documentHandler.startElement(columnMarker,
_stockEmptyAttributeList);
_documentHandler.characters(columnValueChars,
0, columnValueChars.length);
_documentHandler.endElement(columnMarker);
}
protected String getTableMarker(String tableName) {
return tableName;
}
protected String getRowMarker() {
return "row";
}
protected String getColumnMarker(String columnName) {
return columnName;
}
protected String getSelectorSQLStatement(String tableName) {
return "select * from " + tableName;
}
}
Let’s examine the code in more detail. JDBCSAXParser
includes several overloaded parse()
methods. In the list below, the org.xml.sax.Parser
interface requires implementing the parse(InputSource)
and parse(String)
methods. The other parse()
methods simplify the code and allow derived classes to override them to modify the parser behavior.
-
The
parse(InputSource)
method calls theparse(JDBCInputSource)
method if the argument is of typeJDBCInputSource
; otherwise, it throws aSAXException
as it cannot deal with it. -
The
parse(String)
method throws aSAXException
as the information supplied is not sufficient to access the database. -
The
parse(JDBCInputSource)
method gets aConnection
object from the input source and executes a query to obtain aResultSet
object. It then callsparse(ResultSet)
with this object. -
The
parse(ResultSet, String)
method performs the core parsing logic. It iterates over each row in the result set and each column in the rows. The row iteration loop is surrounded by calls tostartElement()
andendElement()
with a table marker as the element-name argument. Similarly, each column iteration loop is surrounded by calls tostartElement()
andendElement()
with a row marker as the element-name argument. In both cases an empty attribute list passes as the second argument to thestartElement()
methods. On each visit to a column, thegenerateSAXEventForColumn()
method is called with column-name and column-value arguments. The value of a column is accessed by thegetString()
method on the result-set object, as we need a string representation of the column data to be notified in thecharacters()
SAX event. - The convenience method
parse(String, String, String, String)
simply creates aJDBCInputSource
object with the arguments passed to it and then calls theparse(JDBCInputSource)
method with it.
The protected methods of JDBCSAXParser
offer some customization possibilities through overriding:
-
The
generateSAXEventForColumn()
method generates events for column data. A null value for a column in a database has a different meaning from a column with an empty string. We capture the difference by not firing any events for a column that has a null value. Another choice for representing a null value in a database is to use a binary attribute likeisNull
. With this option, atrue
value will be set for null data; otherwise it will befalse
. -
The
getTableMarker()
,getRowMarker()
, andgetColumnMarker()
methods return reasonable defaults for table, row, and column markers. Derived classes may override these to provide custom markups. - The
getSelectorSQLStatement()
method returns a “select * from <table class="legacyTable"Name>
” string. Derived classes can override it to provide a different select query string to offer database-level filtering.
The JDBCSAXUtils
convenience class provides two methods for creating a JDBCInputSource
: it can be done from either a property file or a Property
object. There’s no need to supply a long list of parameters that describe a database to an application that uses either the SAX or DOM APIs for database. The class expects the user to supply a property file that contains entries for a database URL, a user name and password to connect to the database, a JDBC driver to establish a connection, and a table name. The code below demonstrates a typical property file:
# portfolio.prop
# JDBCSAXSource property file
URL=jdbc:odbc:db1
user=jw
password=jw-passwd
table=portfolio
driver=sun.jdbc.odbc.JdbcOdbcDriver
The story so far …
We now have a simple parser that can generate appropriate SAX events for the information in a database table. It takes care of null data and offers some marker customization. While such functionality may be sufficient for some applications, the complete solution will consider additional functionality because:
-
The parser does not incorporate relational information. That can be solved by using a XPointer/XLink to set the reference to a foreign key in a table.
-
A text column in a database may contain marked-up data. A SAX parser for databases should parse those data as well and generate appropriate SAX events. If such functionality is important for an application, it could override
generateSAXEventForColumn()
and parse the content of the column and generate additional SAX events. -
In databases, a table contains an unordered list of columns; the order in which columns are stored is not important. An XML DTD, on the other hand, does not have a way to describe an unordered collection of child elements.
We can deal with this problem in a few ways. If the task is to convert a database into another XML document, say an HTML page, the XSLT stylesheet written for that purpose can create output in the correct order. We could also override the
getSelectorSQLStatement()
method to supply an explicit list of columns in the correct order. -
It is desirable to present only a selected part of a table as a document based on some query. While XML tools can do the filtering, databases are better at it. The
getSelectorSQLStatement()
method can be overridden to return the appropriate select query string. -
The parser uses the
getString()
method on the result-set object to obtain the string representation of the value in a column. This works fine for columns with text, numbers, and so on, but it does not work well with binary data. While binary data can be represented as text, that may not be suitable for certain tasks. The parser also does not deal with user-defined types available with SQL3/JDBC 2.0.We can solve both problems by overriding the
generateSAXEventForColumn()
method and providing a suitable implementation.
Implementing the DOM API for Databases
To build a DOM tree for a database table, we could iterate over rows and columns and build nodes for a tree as we visit them. Or we could employ another library, like Sun’s JAXP implementation, that builds a DOM tree from a SAX event stream. The latter approach is simpler, and it requires less coding because it reuses an existing facility. To implement the DOM API using such an approach we need just a clever reuse of the SAX parser for databases we implemented earlier.
Class JDBCDOMParser
implements the DOM API for Databases:
// JDBCDOMParser.java
package dbxml.dom;
import java.io.IOException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import com.sun.xml.tree.XmlDocumentBuilder;
import dbxml.sax.*;
public class JDBCDOMParser {
public static Document createDocument(JDBCInputSource inputSource)
throws SAXException, IOException {
XmlDocumentBuilder documentBuilder = new XmlDocumentBuilder();
JDBCSAXParser saxParser = new JDBCSAXParser();
documentBuilder.setParser(saxParser);
saxParser.parse(inputSource);
return documentBuilder.getDocument();
}
}
The implementation for class JDBCDOMParser
is simple. It uses the XmlDocumentBuilder
class provided by JAXP to build a DOM document from a SAX event stream. The JDBCDOMParser
has only one method: createDocument()
, which takes a JDBC data source as the argument. The method creates a JDBCSAXParser
and sets it as a parser for an XmlDocumentBuilder
object. It then fires the parsing and returns the resulting document from the XmlDocumentBuilder
object. Internally, the XmlDocumentBuilder
object responds to the SAX events generated by the JDBCSAXParser
object by building a DOM document.
Using the SAX API for Databases
We have already looked at one example of using the SAX API for Databases to implement a DOM API for Databases. Here we look at another example of using the SAX API for Databases. In this section, we look at using the SAX API for Databases to integrate with XT — an XSLT processor written in Java. With such integration, we can apply an XSLT stylesheet directly to the virtual XML documents stored in a database.
We wrap the logic of creating a SAX source for a given database source and processing it with the given XSLT stylesheet to produce an output file in class JDBCXSLProcessor
, which is based on com.jclark.xsl.sax.Driver
from XT. The main method takes three arguments: a database property file, an XSLT stylesheet file, and an output file.
As we’ll see below, we can use this approach to generate HTML pages directly without incurring the penalty of creating an intermediate XML file. Moreover, we’ll look at how we can use the integration of the SAX API for databases and XT to convert a database into a nonvirtual XML document.
(To view the source code for JDBCXSLProcessor.java
, click here.)
Generating HTML pages directly from a database using an XSLT stylesheet
Here we look at a simple stylesheet that formats a highly regular XML document represented by a database table. The database table is formatted as an HTML table. The generic stylesheet createTable.xsl
can be used to transform any XML document with a tablelike structure. The stylesheet uses names of markers for columns as table headers.
(To view the stylesheet createTable.xsl
, click here.)
Converting a database to XML with an XSLT stylesheet
While most XML applications work with the SAX or DOM API, we may need to obtain an XML document file in some situations. For example, we need an XML document to use a tool that does not work with either API. Here, I suggest a way to convert a database into an XML file. With this approach, we write an XSLT stylesheet to do the identity transformation. Using such a stylesheet with the SAX API for Databases will yield an XML document that represents a table in the database. I have provided a stylesheet — identity.xsl
— for the identity transformation that works with the current implementation of XT.
identity.xsl
<xsl:stylesheet version="1.0"
xmlns:xsl="
<xsl:template match="@*|*">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Note that while an XSLT stylesheet easily performs the identity transformation, it is not very efficient as it goes through the complex general-purpose logic of applying the full stylesheet. This inefficiency can be a particular problem for database tables with large numbers of records. An alternative approach is to write an application to perform the identity transformation. Such an application would listen to SAX events and create XML elements and data in response to them, resulting in an XML document that represents the table in the database.
Using the DOM API for Databases
For most situations, the SAX API for Databases is more memory efficient than the DOM API for Databases. However, some applications need random access to the XML documents and therefore require the treelike structure that the DOM API for Databases offers.
Integrate the DOM API for Databases with the XQL processor
XML Query Language (XQL), a query language for XML documents, has a syntax similar that of to XPath patterns. Here, we integrate our DOM parser for database with GMD-IPSI’s XQL Engine. With such integration, we can perform SQL-like queries on the XML document representing a database table.
As an example of integration, I provide a simple shell for querying the XML document obtained from a database table. The class JDBCXQLProcessor
creates a shell-like environment, which takes the queries from the user and prints the resulting document. The method processQueries()
can work with any Document
object — not just objects created by the JDBCDOMParser
. It reads the System.in
for the query string, performs the query, and prints the result on System.out
. The main()
method creates a JDBCInputSource
object from its argument and uses it with JDBCDOMParser
to obtain a Document
object corresponding to the given database table.
(To view the code for JDBCXQLProcessor.java
, click here.)
As a side note, writing a database-to-XML converter is a snap with XMLWriter
and JDBCDOMParser
. Just get a Document
object from JDBCDOMParser
and write it to the desired file with XMLWriter.write(Document)
.
Conclusion
In this article, we discussed using the XML APIs for database to tap the information residing in databases. With such APIs, we can avoid the costs associated with converting a database into XML documents and the cost of keeping them synchronized. We presented a Java implementation of SAX and DOM APIs for databases, which can work with any database with a JDBC driver. We then presented an integration of SAX API for Databases with XT. We illustrated the uses of such integration to create HTML pages directly from a database and to convert databases into XML documents. Finally, we integrated the DOM API for Databases with an XQL processor.