Dodge the traps hiding in the URLConnection class

The URLConnection class’s generic design causes snags when posting to a URL

A pitfall is Java code that compiles fine but leads to erroneous, and sometimes disastrous, results. Avoiding pitfalls can save you hours of frustration. In this article, I will present a pitfall you might encounter when posting to a URL, and another that plagues Java beginners.

Pitfall 5: The hidden complexity of posting to a URL

As the Simple Object Access Protocol (SOAP) and other XML remote procedure calls (RPCs) continue to grow in popularity, posting to a URL will become a more common and more important operation — it is the method for sending the SOAP or RPC request to the respective server.

While implementing a standalone SOAP server, I stumbled upon multiple pitfalls associated with posting to a URL, starting with the nonintuitive design of the URL-related classes and ending with specific usability pitfalls in the URLConnection class.

A simple HttpClient class would be the most direct way to perform an HTTP post operation on a URL, but after scanning the java.net package, you’ll come up empty. Some open source HTTP clients are available, but I have not tested them. (If you have tested those clients, drop me an email regarding their utility and stability.) Interestingly, there is an HttpClient in the sun.net.www.http package that is shipped with the JDK (and used by HttpURLConnection), but it is not part of the public API. Instead, the java.net URL classes were designed to be extremely generic and take advantage of dynamic class-loading of both protocols and content handlers. Before we jump into the specific problems with posting, let’s examine the overall structure of the classes we will use (either directly or indirectly).

This UML diagram of the URL-related classes in the java.net package illustrates the classes’ interrelatedness. (The diagram was created with ArgoUML — see Resources for a link.) For brevity’s sake, the diagram shows only key methods and no data members.

[Figure: UML diagram of the URL-related classes in java.net]

Pitfall 5 centers on the main class: URLConnection. However, you cannot instantiate that class directly — it is abstract. Instead, you will receive a reference to a specific subclass of URLConnection via the URL class.

Admittedly, the figure above is complex. The general sequence of events works like this: A URL specifies the location of some content and the protocol needed to access it. When a URL object is created, the URL class finds a URLStreamHandler that understands that protocol, either from a URLStreamHandlerFactory (if the application has installed one with URL.setURLStreamHandlerFactory(), which can be called at most once per JVM) or from the JDK’s built-in protocol handlers. When you call openConnection(), that URLStreamHandler instantiates the appropriate URLConnection subclass, which opens a connection to the URL and, if you ask for the content with getContent(), locates the appropriate ContentHandler to interpret it.
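To see the end of that chain for yourself, the short program below asks the URL class for a connection and prints the concrete URLConnection subclass it receives. It is a minimal sketch assuming a modern JDK (it uses URI.toURL() rather than the URL constructor); example.com is only a placeholder, because openConnection() builds the connection object without contacting the server. The class name printed is an implementation detail; current OpenJDK builds typically report sun.net.www.protocol.http.HttpURLConnection for an http URL, the same class that appears in the program output later in this article.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

public class ShowUrlConnectionClass {

    // Returns the concrete URLConnection class the protocol handler chose.
    // openConnection() only creates the object; no request is sent until
    // connect(), getInputStream(), or getOutputStream() is called.
    static String connectionClassFor(String spec) throws IOException {
        URL url = URI.create(spec).toURL();
        URLConnection con = url.openConnection();
        return con.getClass().getName();
    }

    // True if the handler for this URL produced an HTTP-capable connection.
    static boolean isHttp(String spec) throws IOException {
        return URI.create(spec).toURL().openConnection() instanceof HttpURLConnection;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URLs; neither is contacted.
        System.out.println("http -> " + connectionClassFor("http://example.com/"));
        System.out.println("file -> " + connectionClassFor("file:///tmp/"));
    }
}
```

Different protocols yield different subclasses, which is exactly the dynamic dispatch the figure tries to capture.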

So what is the problem? Because of the classes’ overly generic design, they lack a clear conceptual model. In his book, The Design of Everyday Things (Doubleday, 1990), Donald Norman states that one of the primary principles of good design is a sound conceptual model that allows us to “predict the effects of our actions.” Some problems with the URL classes’ conceptual model include:

  • The URL class is conceptually overloaded. A URL is merely an abstraction for an address or an endpoint. A better design would feature URL subclasses that differentiate static resources from dynamic services. What is conceptually missing is a URLClient class that uses the URL as the endpoint to read from or write to.
  • The URL class is biased toward retrieving data from a URL. There are three methods that retrieve content from a URL, but only one that writes data to a URL. A cleaner design would offer a URL subclass for static resources with only a read operation, and a URL subclass for dynamic services with both read and write methods. That design would provide a clean conceptual model for use.
  • Calling the protocol handlers “stream” handlers is confusing because their primary purpose is to generate (or build) a connection. A better model would emulate the Java API for XML Processing (JAXP), where a DocumentBuilderFactory produces a DocumentBuilder, which produces a Document. Applying that model to the URL classes would yield a URLConnectorFactory that generates a URLConnector that produces a URLConnection.
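The JAXP-style model proposed in the last bullet might look something like the following sketch. URLConnectorFactory and URLConnector are the hypothetical names from the text, not java.net types, and newConnection() and DefaultURLConnectorFactory are invented for illustration. The default implementation simply delegates to URL.openConnection(), so it changes the vocabulary rather than the behavior.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical: builds URLConnection objects for one protocol (not a JDK type).
interface URLConnector {
    URLConnection newConnection(URL endpoint) throws IOException;
}

// Hypothetical: produces URLConnectors, in the spirit of DocumentBuilderFactory.
abstract class URLConnectorFactory {
    static URLConnectorFactory newInstance() {
        return new DefaultURLConnectorFactory();
    }

    abstract URLConnector newConnector(String protocol);
}

// Hypothetical default: delegates to the existing java.net machinery.
class DefaultURLConnectorFactory extends URLConnectorFactory {
    @Override
    URLConnector newConnector(String protocol) {
        return endpoint -> {
            if (!endpoint.getProtocol().equalsIgnoreCase(protocol)) {
                throw new IOException("connector for " + protocol
                        + " cannot handle " + endpoint);
            }
            // Creates the connection object only; nothing is sent yet.
            return endpoint.openConnection();
        };
    }
}

public class ConnectorDemo {
    public static void main(String[] args) throws IOException {
        URLConnector connector = URLConnectorFactory.newInstance().newConnector("http");
        URLConnection con = connector.newConnection(URI.create("http://example.com/").toURL());
        System.out.println("Connector produced: " + con.getClass().getName());
    }
}
```

As with DocumentBuilderFactory.newInstance(), the static factory method is the natural place where a fuller design could plug in alternative implementations per protocol.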

Now you are ready to tackle the URLConnection class and attempt to post to a URL. The goal is to create a simple Java program that posts some text to a common gateway interface (CGI) program. To test the programs, I created a simple CGI program in C that echoes (in an HTML wrapper) whatever passes into it. (See Resources to download the source code for any program in this article, including the CGI program.)
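If you do not have a CGI-capable Web server handy, a rough stand-in for the echo program can be written with the JDK’s built-in com.sun.net.httpserver package (a JDK-specific API, not part of java.net). This sketch is an assumption, not the author’s C program: it listens on 127.0.0.1, answers requests to a hypothetical /echo path, and mimics only the “Length of content” and “Content” lines of the output shown later. It requires Java 9 or later for InputStream.readAllBytes().

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class EchoServer {

    // Starts a loopback HTTP server that answers /echo with the request body
    // wrapped in minimal HTML. Port 0 lets the OS choose a free port.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/echo", exchange -> {
            byte[] body;
            try (InputStream in = exchange.getRequestBody()) {
                body = in.readAllBytes();
            }
            String content = new String(body, StandardCharsets.UTF_8);
            String html = "<HTML><BODY>\n"
                    + (body.length == 0
                        ? "No content! ERROR!\n"
                        : "Length of content: " + body.length + "\nContent: " + content + "\n")
                    + "</BODY></HTML>\n";
            byte[] reply = html.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html");
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(reply);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8080;
        HttpServer server = start(port);
        System.out.println("Echo endpoint: http://127.0.0.1:"
                + server.getAddress().getPort() + "/echo");
    }
}
```

Passing the printed endpoint to the listings that follow exercises them the same way the CGI program does, although the HTML wrapper differs.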

The URLConnection class has getOutputStream() and getInputStream() methods, just like the Socket class. Based on that similarity, you would expect that sending data to a URL would be as easy as writing data to a Socket. Armed with that information and an understanding of the HTTP protocol, we write the program in Listing 5.1, BadURLPost.java.

Listing 5.1 BadURLPost.java

package com.javaworld.jpitfalls.article3;
import java.net.*;
import java.io.*;
public class BadURLPost
{
    public static void main(String args[])
    {
        // get an HTTP connection to POST to
        if (args.length < 1)
        {
            System.out.println("USAGE: java com.javaworld.jpitfalls.article3.BadURLPost url");
            System.exit(1);
        }
        try
        {
            // get the url as a string
            String surl = args[0];
            URL url = new URL(surl);
            URLConnection con = url.openConnection();
            System.out.println("Received a : " + con.getClass().getName());
            con.setDoInput(true);
            con.setDoOutput(true);
            con.setUseCaches(false);
            String msg = "Hi HTTP SERVER! Just a quick hello!";
            con.setRequestProperty("CONTENT_LENGTH", "5"); // Not checked
            con.setRequestProperty("Stupid", "Nonsense");
            System.out.println("Getting an input stream...");
            InputStream is = con.getInputStream();
            System.out.println("Getting an output stream...");
            OutputStream os = con.getOutputStream();
            /*
            con.setRequestProperty("CONTENT_LENGTH", "" + msg.length());
            Illegal access error - can't reset method.
            */
            OutputStreamWriter osw = new OutputStreamWriter(os);
            osw.write(msg);
            osw.flush();
            osw.close();
            System.out.println("After flushing output stream. ");
            // any response?
            InputStreamReader isr = new InputStreamReader(is);
            BufferedReader br = new BufferedReader(isr);
            String line = null;
            while ( (line = br.readLine()) != null)
            {
                System.out.println("line: " + line);
            }
        } catch (Throwable t)
          {
            t.printStackTrace();
          }
    }
}

A run of Listing 5.1 produces:

E:\classes\com\javaworld\jpitfalls\article3>java -Djava.compiler=NONE com.javaworld.jpitfalls.article3.BadURLPost

Received a : sun.net.www.protocol.http.HttpURLConnection
Getting an input stream...
Getting an output stream...
java.net.ProtocolException: Can't reset method: already connected
        at java.net.HttpURLConnection.setRequestMethod(HttpURLConnection.java:102)
        at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:349)
        at com.javaworld.jpitfalls.article2.BadURLPost.main(BadURLPost.java:38)

When we try to obtain the HttpURLConnection class’s output stream, the program informs us that we cannot reset the method because we are already connected. The message is puzzling because our code never calls setRequestMethod(); the stack trace shows that getOutputStream() called it on our behalf. The “method” in question is the HTTP method, which should be POST when we send data to the URL and GET when we retrieve data from the URL.

The getOutputStream() method causes the program to throw a ProtocolException with the error message “Can’t reset method: already connected.” The JDK source code reveals why: the getInputStream() method has the side effect of sending the request, using the default request method GET, to the Web server. When getOutputStream() is called afterward, it tries to change the method to POST, which is no longer possible because the request has already gone out. This is similar to a side effect in the ObjectInputStream and ObjectOutputStream constructors, detailed in my book, Java Pitfalls: Time Saving Solutions and Workarounds to Improve Programs (John Wiley & Sons, 2000).
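The ordering problem can be reproduced against a local server with the sketch below. It assumes a modern JDK, where the failure is still a ProtocolException but the message differs from the JDK 1.2 output above (current OpenJDK says it cannot write output after reading input). The tiny server exists only so the example is self-contained; outputAfterInput() and startOkServer() are illustrative names.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class InputBeforeOutput {

    // Repeats Listing 5.1's ordering: getInputStream() first, then
    // getOutputStream(). Returns the exception from the second call,
    // or null if the JDK allowed it.
    static IOException outputAfterInput(String url) throws IOException {
        HttpURLConnection con =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        con.setDoOutput(true);
        con.setUseCaches(false);
        InputStream in = con.getInputStream(); // side effect: request is sent now
        try {
            con.getOutputStream();             // too late to attach a body
            return null;
        } catch (IOException e) {
            return e;                          // typically a ProtocolException
        } finally {
            in.close();
            con.disconnect();
        }
    }

    // Minimal local server that answers every request with "ok".
    static HttpServer startOkServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            exchange.getRequestBody().readAllBytes();
            byte[] reply = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(reply);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startOkServer();
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
            System.out.println("getOutputStream() after getInputStream(): "
                    + outputAfterInput(url));
        } finally {
            server.stop(0);
        }
    }
}
```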

The pitfall is the assumption that the getInputStream() and getOutputStream() methods behave just as they do for a Socket connection. Since the underlying mechanism for communicating with the Web server actually is a Socket, it is not an unreasonable assumption. A better implementation of HttpURLConnection would postpone the side effects until the first read from the input stream or the first write to the output stream. That could be done with an HttpInputStream and an HttpOutputStream, which would keep the Socket model intact. You could argue that HTTP is a stateless request/response protocol, so the Socket model does not fit. Nevertheless, the API should fit the conceptual model: if the API presents the same model as a Socket connection, it should behave like one. If it does not, the abstraction has been stretched too far.
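The deferred-stream idea can be sketched in a few lines. HttpInputStream here is the hypothetical class suggested above, not a JDK type: it holds a URLConnection and calls getInputStream(), which sends the request, only on the first read. To keep the example runnable without a server, it uses a stand-in URLConnection subclass (CountingConnection, also hypothetical) that merely counts how often its input stream is requested.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical HttpInputStream: postpones URLConnection.getInputStream()
// (and therefore sending the request) until the first read.
public class HttpInputStream extends InputStream {
    private final URLConnection con;
    private InputStream delegate; // null until the first read

    public HttpInputStream(URLConnection con) {
        this.con = con;
    }

    private InputStream delegate() throws IOException {
        if (delegate == null) {
            delegate = con.getInputStream(); // the request would be sent here
        }
        return delegate;
    }

    @Override
    public int read() throws IOException {
        return delegate().read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return delegate().read(b, off, len);
    }

    @Override
    public void close() throws IOException {
        if (delegate != null) {
            delegate.close();
        }
    }

    // Stand-in connection that records when its input stream is requested.
    static class CountingConnection extends URLConnection {
        int inputRequests;

        CountingConnection() throws IOException {
            super(URI.create("http://example.com/").toURL());
        }

        @Override
        public void connect() {
            connected = true;
        }

        @Override
        public InputStream getInputStream() {
            inputRequests++;
            return new ByteArrayInputStream("response".getBytes(StandardCharsets.UTF_8));
        }
    }

    // Returns {getInputStream() calls before reading, calls after two reads}.
    static int[] demo() throws IOException {
        CountingConnection con = new CountingConnection();
        InputStream in = new HttpInputStream(con);
        int before = con.inputRequests;
        in.read();
        in.read();
        int after = con.inputRequests;
        in.close();
        return new int[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        int[] counts = demo();
        System.out.println("before first read: " + counts[0] + ", after reads: " + counts[1]);
    }
}
```

A matching HttpOutputStream would do the same for getOutputStream(); with both in place, the order in which a caller obtains the two streams would no longer matter, only the order of the first write and the first read.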

In addition to the error message, there are two problems with the above code:

  • The setRequestProperty() method parameters are not checked, which we demonstrate by setting a property called Stupid with a value of Nonsense. Since those properties go directly into the HTTP request headers and are not validated by the method (as they should be), you must take extra care to ensure that the parameter names and values are correct.
  • Although that line is commented out in Listing 5.1, it is also illegal to set a request property after obtaining an input or output stream, because by then the connection is already open. The documentation for URLConnection indicates the sequence to set up a connection, although it does not state that the sequence is mandatory.
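Both points in the list above can be checked with a short sketch, assuming a current JDK. It sets the nonsense header from Listing 5.1 (the JDK accepts it without complaint and returns it on request), then calls connect() and tries to add another header. A bare ServerSocket stands in for the Web server because connect() only opens the TCP connection; no HTTP request is exchanged. On current JDKs the late call fails with IllegalStateException rather than the error recorded in the listing’s comment.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.URI;

public class HeaderAfterConnect {

    // Returns "<header value read back before connecting>, <outcome of late call>".
    static String demo() throws IOException {
        // A listening socket is enough: connect() opens the TCP connection
        // but does not send the HTTP request.
        try (ServerSocket listener =
                     new ServerSocket(0, 1, InetAddress.getByName("127.0.0.1"))) {
            String url = "http://127.0.0.1:" + listener.getLocalPort() + "/";
            HttpURLConnection con =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            con.setRequestProperty("Stupid", "Nonsense"); // accepted without validation
            String before = con.getRequestProperty("Stupid");
            con.connect();
            try {
                con.setRequestProperty("Content-Type", "text/plain");
                return before + ", no exception";
            } catch (IllegalStateException e) {
                return before + ", " + e.getClass().getSimpleName();
            } finally {
                con.disconnect();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```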

If we did not have the luxury of examining the source code (which should definitely not be a requirement to use an API), we would be reduced to trial and error, the absolute worst way to program. Neither the documentation nor the API of the HttpURLConnection class affords us any understanding of how the protocol is implemented, so we feebly attempt to reverse the order of calls to getInputStream() and getOutputStream(). Listing 5.2, BadURLPost1.java, is an abbreviated version of that program.

Listing 5.2 BadURLPost1.java

package com.javaworld.jpitfalls.article3;
import java.net.*;
import java.io.*;
public class BadURLPost1
{
    public static void main(String args[])
    {
// ...
        try
        {
// ...
            System.out.println("Getting an output stream...");
            OutputStream os = con.getOutputStream();
            System.out.println("Getting an input stream...");
            InputStream is = con.getInputStream();
// ...
        } catch (Throwable t)
          {
            t.printStackTrace();
          }
    }
}

A run of Listing 5.2 produces:

E:\classes\com\javaworld\jpitfalls\article3>java -Djava.compiler=NONE com.javaworld.jpitfalls.article3.BadURLPost1

Received a : sun.net.www.protocol.http.HttpURLConnection
Getting an output stream...
Getting an input stream...
After flushing output stream.
line: <HEAD>
line: <TITLE> Echo CGI program </TITLE>
line: </HEAD>
line: <BODY BGCOLOR='#ebebeb'><CENTER>
line: <H2> Echo </H2>
line: </CENTER>
line: No content! ERROR!
line: </BODY>
line: </HTML>

Although the program compiles and runs, the CGI program reports that no data was sent! Why? The side effects of getInputStream() bite us again, causing the POST request to be sent before anything is placed in the post’s output buffer, thus sending an empty POST request.

After failing twice, we understand that getInputStream() is the key method that actually writes the requests to the server. Therefore we must perform the operations serially (open output, write, open input, read) as we do in Listing 5.3, GoodURLPost.

Listing 5.3 GoodURLPost.java

package com.javaworld.jpitfalls.article3;
import java.net.*;
import java.io.*;
public class GoodURLPost
{
    public static void main(String args[])
    {
        // get an HTTP connection to POST to
        if (args.length < 1)
        {
            System.out.println("USAGE: java com.javaworld.jpitfalls.article3.GoodURLPost url");
            System.exit(1);
        }
        try
        {
            // get the url as a string
            String surl = args[0];
            URL url = new URL(surl);
            URLConnection con = url.openConnection();
            System.out.println("Received a : " + con.getClass().getName());
            con.setDoInput(true);
            con.setDoOutput(true);
            con.setUseCaches(false);
            String msg = "Hi HTTP SERVER! Just a quick hello!";
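            // No Content-Length header is set by hand: HttpURLConnection buffers
            // the body and sends Content-Length itself. ("CONTENT_LENGTH" is a
            // CGI environment variable name, not an HTTP header.)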
            System.out.println("Msg Length: " + msg.length());
            System.out.println("Getting an output stream...");
            OutputStream os = con.getOutputStream();
            OutputStreamWriter osw = new OutputStreamWriter(os);
            osw.write(msg);
            osw.flush();
            osw.close();
            System.out.println("After flushing output stream. ");
            System.out.println("Getting an input stream...");
            InputStream is = con.getInputStream();
            // any response?
            InputStreamReader isr = new InputStreamReader(is);
            BufferedReader br = new BufferedReader(isr);
            String line = null;
            while ( (line = br.readLine()) != null)
            {
                System.out.println("line: " + line);
            }
        } catch (Throwable t)
          {
            t.printStackTrace();
          }
    }
}

A run of Listing 5.3 produces:

E:\classes\com\javaworld\jpitfalls\article3>java -Djava.compiler=NONE com.javaworld.jpitfalls.article3.GoodURLPost

Received a : sun.net.www.protocol.http.HttpURLConnection
Msg Length: 35
Getting an output stream...
After flushing output stream.
Getting an input stream...
line: <HEAD>
line: <TITLE> Echo CGI program </TITLE>
line: </HEAD>
line: <BODY BGCOLOR='#ebebeb'><CENTER>
line: <H2> Echo </H2>
line: </CENTER>
line: Length of content: 35
line: Content: Hi HTTP SERVER! Just a quick hello!
line: </BODY>
line: </HTML>

Finally, success! We can now post data to a CGI program running on a Web server. To avoid the HTTP post pitfall, do not assume that the methods behave as they do for a Socket. Instead, remember that the getInputStream() method has the side effect of sending the request to the Web server. Therefore, you must observe the proper sequence: configure the connection, obtain the output stream, write the data and close the stream, and only then call getInputStream().
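For reuse, the working sequence from Listing 5.3 can be wrapped in a helper. This is a sketch against a modern JDK, not the author’s code: it sets the POST method explicitly, encodes the body as UTF-8, uses setFixedLengthStreamingMode() so the JDK sends the Content-Length header itself, and closes both streams with try-with-resources. The local echo server, startEcho(), and the /echo path are assumptions added so the example runs on its own.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PostHelper {

    // Posts body to url in the order HttpURLConnection requires:
    // configure, open output, write, close output, THEN open input and read.
    static String post(URL url, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        try {
            con.setRequestMethod("POST");
            con.setDoOutput(true);
            con.setUseCaches(false);
            con.setRequestProperty("Content-Type", "text/plain; charset=UTF-8");
            con.setFixedLengthStreamingMode(bytes.length); // JDK emits Content-Length
            try (OutputStream out = con.getOutputStream()) {
                out.write(bytes);
            }
            try (InputStream in = con.getInputStream()) { // response is read here
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } finally {
            con.disconnect();
        }
    }

    // Local stand-in for the echo CGI program: returns the request body verbatim.
    static HttpServer startEcho() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/echo", exchange -> {
            byte[] received = exchange.getRequestBody().readAllBytes();
            // -1 means "no response body" to HttpExchange
            exchange.sendResponseHeaders(200, received.length == 0 ? -1 : received.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(received);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startEcho();
        try {
            URL url = URI.create("http://127.0.0.1:"
                    + server.getAddress().getPort() + "/echo").toURL();
            System.out.println(post(url, "Hi HTTP SERVER! Just a quick hello!"));
        } finally {
            server.stop(0);
        }
    }
}
```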

Pitfall 6: JVM throws NoClassDefFoundError

Pitfall 6 — in which the JVM throws a NoClassDefFoundError — is guaranteed to surface numerous times in every introductory Java course. The JVM’s inability to locate the classfiles it needs crops up even after the programmer has learned to properly set the classpath. I recently encountered this problem while trying to teach a programmer to run a Java program from the command line instead of from within Symantec’s integrated development environment (IDE). Her classpath and path were set fine; however, the JVM continued to throw the NoClassDefFoundError. To solve this problem, you need to know what the VM thinks the classpath is, not what you think it is.

To find out what the beginner programmer’s VM thought the classpath was, I wrote a simple command line utility to print out the classpath:

Listing 6.1 PrintClassPath

import java.util.*;
public class PrintClassPath
{
    public static void main(String args[])
    {
        try
        {
            System.out.println(System.getProperty("java.class.path"));
        } catch (Throwable t)
          {
            t.printStackTrace();
          }
    }
}

On my Windows NT box, a run of PrintClassPath produced:

E:\classes\com\javaworld\jpitfalls\article2>java PrintClassPath
e:\java\rhino\js.jar;e:\borland\interclient\interclient.jar;e:\Program Files\InterBase Corp\InterClient\interclient.jar;e:javappsjcannery;e:\java\xml-soap\lib\soap.jar;e:\classes;.;e:\jdk1.1.8\lib\classes.zip;e:\java\xerces-1_2_0\xerces.jar;e:\java\jaxp1.0\jaxp.jar;e:\java\jaxp1.0\parser.jar;e:\java\JClass36\lib\jctable362.jar;e:\java\JClass36;e:\java\JClass36;e:\java\JClass36\lib\jcbwt362.jar;e:\java\JClass36;e:synergysolutionsXml-in-Java;e:\Program Files\Netscape\Program\Java\Classes\java40.jar;e:\java\jdom\build\jdom-b4.jar

With this program, it quickly became evident that the virtual machine was not using the classpath set in the environment variable. The problem became clear when the programmer explained that she was using the built-in VM that came with her Symantec Visual Café IDE. That VM reads its classpath from an sc.ini text file instead of the CLASSPATH environment variable set in DOS. After changing the classpath in the sc.ini file, everything worked fine.

Nine times out of ten, when you see NoClassDefFoundError, it is a classpath issue. With the simple command line tool in Listing 6.1, you can quickly ascertain what classpath the virtual machine thinks it is using. Most of the time you will find that the classpath includes a typographic error, or the directory structure does not match the package name.
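Listing 6.1 prints the classpath; the sketch below goes one step further by also printing which VM is running (java.home and java.version, which would have exposed the IDE’s built-in VM in the story above) and by flagging classpath entries that do not exist on disk, a common form of the typographic errors just mentioned. It is an illustrative extension assuming Java 7 or later, and CheckClassPath is a hypothetical name; it cannot detect a directory structure that does not match the package name.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class CheckClassPath {

    // Splits a classpath on the platform separator (';' on Windows, ':' on
    // Unix) and returns the entries that do not exist on disk.
    static List<String> missingEntries(String classPath) {
        List<String> missing = new ArrayList<>();
        for (String entry : classPath.split(File.pathSeparator)) {
            if (entry.isEmpty()) {
                continue; // skip empty entries, e.g. from doubled separators
            }
            if (!new File(entry).exists()) {
                missing.add(entry);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Which VM is actually running? An IDE's bundled VM may not be the one on PATH.
        System.out.println("java.home       = " + System.getProperty("java.home"));
        System.out.println("java.version    = " + System.getProperty("java.version"));
        String cp = System.getProperty("java.class.path");
        System.out.println("java.class.path = " + cp);
        for (String entry : missingEntries(cp)) {
            System.out.println("MISSING: " + entry);
        }
    }
}
```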

Developing for both Java 1.2 and Java 1.1.8 can be quite difficult. You must be careful not to mix your environments, and especially careful not to use classes that are incompatible with your target JVM. To test an applet, I recently switched my VM from 1.2 to 1.1.8 without switching my classpath, since it correctly pointed to the target classes. Unfortunately, I then received the dreaded NoClassDefFoundError on a class that was clearly present. Listings 6.2 and 6.3 demonstrate the problem.

Listing 6.2 TopLevel.java

package com.javaworld.jpitfalls.article2;
public class TopLevel
{
    static SuperFrame frame;
    public static void main(String args[])
    {
        frame = new SuperFrame();
    }
}

Under JDK 1.1.8, attempting to run TopLevel using a classpath configured for JDK 1.2 produces:

E:\classes\com\javaworld\jpitfalls\article2>java com.javaworld.jpitfalls.article2.TopLevel
java.lang.NoClassDefFoundError: com/javaworld/jpitfalls/article2/SuperFrame
        at com.javaworld.jpitfalls.article2.TopLevel.main(Compiled Code)

The error states that the JVM cannot find the class SuperFrame.class; however, the directory listing shows that SuperFrame.class is in the same package and the same directory as TopLevel.class, so the class is clearly present. If the JVM can find TopLevel.class, why can’t it find SuperFrame.class?

When I tried to run the SuperFrame class directly, I received:

E:\classes\com\javaworld\jpitfalls\article2>java com.javaworld.jpitfalls.article2.SuperFrame
Can't find class com/javaworld/jpitfalls/article2/SuperFrame

Still no help. The code for SuperFrame.java is found in Listing 6.3; it merely extends a JFrame and is only used to illustrate the behavior.

Listing 6.3 SuperFrame.java

package com.javaworld.jpitfalls.article2;
import com.sun.java.swing.JFrame;
public class SuperFrame extends JFrame
{
}

To diagnose the problem, I wrote a simple command line utility to load a class and catch any thrown errors, hoping that its error reporting would be more detailed. Listing 6.4 presents the LoadClass class.

Listing 6.4 LoadClass.java

import java.util.*;
public class LoadClass
{
    public static void main(String args[])
    {
        if (args.length < 1)
        {
            System.out.println("USAGE: java LoadClass fullClassName");
            System.exit(1);
        }
        try
        {
            Class c = Class.forName(args[0]);
            if (c != null)
                System.out.println("Class: " + c.getName() + " loaded successfully.");
            else
                System.out.println("Unable to load: " + args[0]);
        } catch (Throwable t)
          {
            t.printStackTrace();
          }
    }
}

When I ran the utility program LoadClass to load the class in question, I received:

E:\classes\com\javaworld\jpitfalls\article2>java LoadClass com.javaworld.jpitfalls.article2.SuperFrame
java.lang.NoClassDefFoundError: com/sun/java/swing/JFrame
        at LoadClass.main(LoadClass.java:15)

That revealed the real error. The JVM incorrectly reported that it could not find a class; actually, it could not fully load the class because the class depended on another class that could not be loaded. In this simple example, the SuperFrame class needs the Swing class com.sun.java.swing.JFrame, which is not in the classpath. This has been fixed in Java 1.2, but it is important to point out for the many people still developing programs in JDK 1.1.8. In Java 1.2, the JVM accurately reports the class it cannot find (here, the missing superclass) instead of the class that depends on it. In Java 1.1.8, the LoadClass utility can help you determine why the JVM cannot load a class.
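On current JVMs, a variant of LoadClass can also report where a class was loaded from, which helps when two copies of a class sit on the classpath. The sketch below is an extension under stated assumptions, not the original utility; LocateClass and describe() are hypothetical names. It loads the class without running its static initializers, prints the code source location (absent for classes that come from the JDK itself), and, on failure, prints every exception in the cause chain, since the class named in a NoClassDefFoundError may be a dependency rather than the class requested.

```java
import java.net.URL;
import java.security.CodeSource;

public class LocateClass {

    // Loads className (without initializing it) and reports its origin,
    // or the full chain of errors if it cannot be loaded.
    static String describe(String className) {
        try {
            Class<?> c = Class.forName(className, false, LocateClass.class.getClassLoader());
            CodeSource source = c.getProtectionDomain().getCodeSource();
            URL location = (source == null) ? null : source.getLocation();
            return "loaded " + c.getName() + " from "
                    + (location == null ? "the JDK's own classes" : location.toString());
        } catch (ClassNotFoundException | LinkageError e) {
            StringBuilder sb = new StringBuilder("cannot load " + className);
            for (Throwable t = e; t != null; t = t.getCause()) {
                sb.append("\n  ").append(t); // e.g. NoClassDefFoundError naming the real culprit
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        String[] names = args.length > 0 ? args : new String[] {"java.lang.String"};
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```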

Conclusion

The key to avoiding these two pitfalls is to be careful in your assumptions. The URLConnection class resembles the Socket class in some ways, but their methods will not behave in the same way. As you learned in Pitfall 5, getInputStream() produces adverse side effects in the URLConnection class. Therefore, when posting to a URL, you must observe the proper sequence or you will fall into a trap. Also, don’t assume that your JVM is using the classpath set in the environment variable. Doing so might cause a NoClassDefFoundError. Find out what the JVM understands the classpath to be by using the command line utility in Listing 6.1. To compound the problem, JVM 1.1.8 poorly conveys this type of error. To receive more detailed reports of your errors, use LoadClass.java in Listing 6.4 to get around the NoClassDefFoundError pitfall. Avoiding these traps will save you and your team hours of wasted time.

Michael C. Daconta is the director of Web and technology services for McDonald Bradley, where he conducts training seminars and develops advanced systems with Java, JavaScript, and XML. In the past 15 years, Daconta has held every major development position, including chief scientist, technical director, chief developer, team leader, systems analyst, and programmer. He is a Sun-certified Java programmer and coauthor of Java Pitfalls (John Wiley & Sons, 2000), Java 2 and JavaScript for C and C++ Programmers (John Wiley & Sons, 1999), and XML Development with Java 2 (Sams Publishing, 2000). He also wrote C++ Pointers and Dynamic Memory Management (John Wiley & Sons, 1995).

Source: www.infoworld.com