Java Tip 122: Beware of Java typesafe enumerations

Think twice before relying on instance identity

Departing from traditional practice for JavaWorld’s Tips ‘N Tricks column, I will talk about when not to use a previously suggested trick. Specifically, the typesafe enum construct, covered in JDC Tech Tips and other publications, can sometimes be hazardous to your code.

Because Java lacks a proper C/C++ enumeration (enum) feature, Java programmers have opted to define simple sets of primitive values:

public class Colors
{
    public static final int GREEN = 0;
    public static final int RED = 1;
    ...
}

This is not particularly typesafe, but it works. You can easily copy and serialize these constants, and then use them for fast switch lookups and so on. In fact, this is how Java language designers originally advised Java programmers to handle Java’s lack of an enumeration feature (see “The Java Language Environment” whitepaper).
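
To make the "not particularly typesafe" remark concrete, here is a minimal sketch. The names IntColors, IntColorsDemo, and describe() are hypothetical and used only for illustration; they do not come from the whitepaper or from any library.

```java
// Hypothetical illustration: int constants offer no type or range safety.
final class IntColors
{
    static final int GREEN = 0;
    static final int RED = 1;

    private IntColors () {}

    // Any int is accepted here; the compiler cannot restrict callers
    // to GREEN or RED.
    static String describe (int color)
    {
        switch (color) // int constants do allow a plain switch
        {
            case GREEN: return "green";
            case RED:   return "red";
            default:    return "unknown (" + color + ")";
        }
    }
}

class IntColorsDemo
{
    public static void main (String [] args)
    {
        System.out.println (IntColors.describe (IntColors.RED)); // prints "red"
        System.out.println (IntColors.describe (42));            // compiles, prints "unknown (42)"
    }
}
```

The second call is the weakness: nothing stops an arbitrary int from being passed where a color is expected, so every consumer has to handle out-of-range values itself.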

How it’s supposed to work

The typesafe Java enum concept basically replaces the set of primitive constants above with a set of static final object references encapsulated in a class that (possibly) restricts further instantiation. A basic example would be:

public final class Enum
{
    public static final Enum TRUE = new Enum ();
    public static final Enum FALSE = new Enum ();
    private Enum () {}
} // end of class

Because the set of instances is restricted by the private constructor and Enum class being final, we can assume that Enum.TRUE and Enum.FALSE are the only instances of the Enum class. Thus, we can use the identity comparison (==) operator instead of the equals() method when comparing enum values. The cost of using the == operator equates to directly comparing pointer values in C/C++. Great, right? We have both type and range safety for enum values while keeping value comparisons efficient.

Is that enough?

Alas, the simple Enum class above lacks a few features. One missing feature is that we cannot pass instances of our Enum class as an argument to an RMI (remote method invocation) or EJB (Enterprise JavaBeans) method. To do that, we must mark the class Serializable:

public final class Enum implements java.io.Serializable
{
    public static final Enum TRUE = new Enum ();
    public static final Enum FALSE = new Enum ();
    private Enum () {}
} // end of class

Ok, so what is wrong with that?

The above has a subtle trap, as shown here:

    ByteArrayOutputStream bout = new ByteArrayOutputStream ();
    ObjectOutputStream out = new ObjectOutputStream (bout);
    Enum e1 = Enum.TRUE;
    out.writeObject (e1);
    out.flush ();
    ByteArrayInputStream bin = new ByteArrayInputStream (bout.toByteArray ());
    ObjectInputStream in = new ObjectInputStream (bin);
    Enum e2 = (Enum) in.readObject ();
    System.out.println ((e2 == Enum.TRUE || e2 == Enum.FALSE));

This code will print out false, indicating that e2 is neither Enum.TRUE nor Enum.FALSE. This happens because deserializing an object creates a new object without regard to the class’s constructors — the instantiation protection that we thought we got from making the Enum constructor private doesn’t affect deserialization.
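
The claim that deserialization bypasses the private constructor can be checked directly. The sketch below is a self-contained variant of the fragment above; NaiveEnum, NaiveEnumDemo, roundTrip(), and the constructor counter are hypothetical names added for this illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for the Serializable Enum class (no readResolve()).
final class NaiveEnum implements Serializable
{
    // Counts constructor calls; declared without an initializer so that
    // static initialization order cannot reset it.
    private static int s_constructed;

    static final NaiveEnum TRUE = new NaiveEnum ();
    static final NaiveEnum FALSE = new NaiveEnum ();

    private NaiveEnum () { ++ s_constructed; }

    static int constructedCount () { return s_constructed; }
}

class NaiveEnumDemo
{
    // Serializes obj into a byte array and reads it back.
    static Object roundTrip (Object obj)
    {
        try
        {
            ByteArrayOutputStream bout = new ByteArrayOutputStream ();
            ObjectOutputStream out = new ObjectOutputStream (bout);
            out.writeObject (obj);
            out.flush ();
            ObjectInputStream in = new ObjectInputStream (
                new ByteArrayInputStream (bout.toByteArray ()));
            return in.readObject ();
        }
        catch (IOException e) { throw new IllegalStateException (e); }
        catch (ClassNotFoundException e) { throw new IllegalStateException (e); }
    }

    public static void main (String [] args)
    {
        Object copy = roundTrip (NaiveEnum.TRUE);
        System.out.println ("is a NaiveEnum: " + (copy instanceof NaiveEnum));
        System.out.println ("is TRUE or FALSE: "
            + (copy == NaiveEnum.TRUE || copy == NaiveEnum.FALSE));
        System.out.println ("constructor calls: " + NaiveEnum.constructedCount ());
    }
}
```

The copy is a genuine NaiveEnum, yet it is a third instance, and the counter stays at the two calls made for TRUE and FALSE because the serialization runtime never invokes the class's own constructor.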

This could lead to unexpected results in runtime environments like an EJB container, especially since most EJB containers support optimization options to disable the serialization marshalling of method arguments for beans deployed in the same JVM. Your code’s runtime behavior will then depend on this option’s setting — not a very comforting thought, is it?

As pointed out by Joshua Bloch, we must do more to ensure that serialization doesn’t result in illegal Enum instances unexpectedly springing up at runtime. At a minimum, we have to add a readResolve() method and an instance field to use as the real instance ID:

public final class Enum implements java.io.Serializable
{
    public static final Enum TRUE = new Enum (true);
    public static final Enum FALSE = new Enum (false);
    public String toString ()
    {
        return String.valueOf (m_value).toUpperCase ();
    }
    private Enum (boolean value)
    {
        m_value = value;
    }
    private Object readResolve () throws java.io.ObjectStreamException
    {
        return (m_value ? TRUE : FALSE);
    }
    private boolean m_value;
} // end of class

Here, in the readResolve() method, I check the value ID of the instance just created and replace the deserialized instance with one of the static objects.
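
A quick way to confirm the substitution is to round-trip both constants and compare references. The following is a minimal sketch under the same assumptions as the class above; ResolvingEnum, ResolvingEnumDemo, and roundTrip() are hypothetical names chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;

// Hypothetical copy of the readResolve()-based Enum, renamed for the demo.
final class ResolvingEnum implements Serializable
{
    static final ResolvingEnum TRUE = new ResolvingEnum (true);
    static final ResolvingEnum FALSE = new ResolvingEnum (false);

    private final boolean m_value;

    private ResolvingEnum (boolean value) { m_value = value; }

    // Invoked by the serialization runtime on the freshly deserialized copy;
    // the object returned here is what readObject() hands to the caller.
    private Object readResolve () throws ObjectStreamException
    {
        return m_value ? TRUE : FALSE;
    }
}

class ResolvingEnumDemo
{
    // Serializes obj into a byte array and reads it back.
    static Object roundTrip (Object obj)
    {
        try
        {
            ByteArrayOutputStream bout = new ByteArrayOutputStream ();
            ObjectOutputStream out = new ObjectOutputStream (bout);
            out.writeObject (obj);
            out.flush ();
            ObjectInputStream in = new ObjectInputStream (
                new ByteArrayInputStream (bout.toByteArray ()));
            return in.readObject ();
        }
        catch (IOException e) { throw new IllegalStateException (e); }
        catch (ClassNotFoundException e) { throw new IllegalStateException (e); }
    }

    public static void main (String [] args)
    {
        System.out.println (roundTrip (ResolvingEnum.TRUE) == ResolvingEnum.TRUE);   // prints "true"
        System.out.println (roundTrip (ResolvingEnum.FALSE) == ResolvingEnum.FALSE); // prints "true"
    }
}
```

Note that the temporary deserialized copy is still created; readResolve() only ensures that it is discarded before the caller ever sees it.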

Unfortunately, many programmers today are unaware that they must implement readResolve() to perform instance substitution during deserialization (a feature that was not even available before Java 2). If we omit it, we get no compiler or runtime errors; the reference comparison simply fails each time we compare an Enum value against a deserialized Enum instance. Depending on the enumeration’s size, the amount of work needed for a correct, serializable typesafe class may be too much compared to the good old “typeunsafe” pattern (the standard practice of defining simple-minded sets of constants referred to earlier), which lacks this issue.

Interestingly enough, Sun’s JDK uses the typesafe enum pattern but is inconsistent about making such types Serializable: some typesafe enum classes are not (for example, Swing’s javax.swing.text.html.HTML.Tag), while others are (for example, java.util.logging.Level in JDK 1.4+).

Dealing with classloaders

Another scenario in which the typesafe enum pattern breaks completely is when the Java runtime loads the Enum class multiple times. Although this sounds obscure, it can happen more easily than you might think.

Consider an EJB invoking a method on another EJB. If the EJBs come from different deployment JAR units, different classloaders may load them. Both deployment JARs could package the Enum class, and the particular details of the container classloader hierarchy can conspire to have each classloader load its own copy of Enum. If the two EJBs then exchange data that includes the Enum type and the data is not marshalled by means of serialization, comparisons that rely on object reference identity will fail.

Consider another possibility: a JavaServer Page (JSP) or a servlet placing data that includes Enum instances in an HTTP session. If the servlet later reloads (for example, because the JSP updates) and then attempts to compare anything against Enum values left in the session, this will create the same effect of a class in one classloader namespace acquiring data from a different classloader namespace.

The typesafe enum pattern fails in the above cases for a reason different from serialization intricacies: the same class loaded by different classloaders is, strictly speaking, a different class each time. The Enum class’s static data will be created anew by each classloader loading the class. Instances of such classes can coexist in the VM, but they will be instances of incompatible types: they cannot be cast to each other, and a reference comparison (==) between an instance created under one classloader and a constant owned by the other can never succeed.

The following code simulates this runtime scenario. This new class, EnumConsumer, will act as something that uses the Enum type:

public class EnumConsumer implements IEnumConsumer
{
    public Vector getObjects ()
    {
        Vector result = new Vector ();
        result.add (Enum.FALSE);
        result.add (Enum.TRUE);
        return result;
    }
    public void validate (Vector objects)
    {
        if (objects.get (0) != Enum.FALSE)
            System.out.println ("element 0 [" + objects.get (0) + "] != Enum.FALSE");
        else
            System.out.println ("element 0 Ok");
        if (objects.get (1) != Enum.TRUE)
            System.out.println ("element 1 [" + objects.get (1) + "] != Enum.TRUE");
        else
            System.out.println ("element 1 Ok");
    }
} // end of class

EnumConsumer implements a simple test interface, IEnumConsumer, that we will use to drive EnumConsumer instances across multiple classloader namespaces:

public interface IEnumConsumer
{
    Vector getObjects ();
    void validate (Vector objects);
} // end of interface

The idea here is simple enough: getObjects() returns a Vector of two possible Enum values in known order. If that Vector is passed into validate(), it will check the expected state of data and complain if something is wrong. Naively, I expect that if I execute getObjects() and send the result of that execution into validate() then it should never fail. But it can fail, as shown below. The key to making this interesting is to drive this class from the following main() method:

    public static void main (String [] args) throws Exception
    {
        File loaderClasspathDir = new File ("data");
        loaderClasspathDir.mkdir ();
        // move Enum.class and EnumConsumer.class from "./out/" to "./data/":
        String [] classNames = new String [] {"Enum.class", "EnumConsumer.class"};
        for (int c = 0; c < classNames.length; c ++)
        {
            File source = new File ("out", classNames [c]);
            File target = new File (loaderClasspathDir, classNames [c]);
            if (! target.exists () || (source.lastModified () > target.lastModified ()))
            {
                if (target.exists ()) target.delete ();
                source.renameTo (target);
            }
        }
        URL [] URLlist = new URL [] {loaderClasspathDir.toURL ()};
        // simulate 2 different classloader namespaces: this is namespace #1
        URLClassLoader l1 = new URLClassLoader (URLlist);
        Class c1 = l1.loadClass ("EnumConsumer");
        IEnumConsumer obj1 = (IEnumConsumer) c1.newInstance ();
        // ... and this is namespace #2:
        URLClassLoader l2 = new URLClassLoader (URLlist);
        Class c2 = l2.loadClass ("EnumConsumer");
        IEnumConsumer obj2 = (IEnumConsumer) c2.newInstance ();
        // get data to pass between obj1 and obj2:
        Vector objects = obj1.getObjects ();
        // this works as expected:
        obj1.validate (objects);
        // this fails:
        obj2.validate (objects);
    }

In the above code, I assume that all classes in the project compile into the out directory, which will be in the system classloader’s classpath when the program runs. I first execute a loop that moves the Enum and EnumConsumer classes into a separate data directory, which classloaders l1 and l2 will use. This simulates Enum and EnumConsumer being packaged together in a deployment unit that l1 and l2 each load, and it is necessary to prevent the two loaders from delegating to their common parent, the system classloader. IEnumConsumer is left alone so that instances created from the result of ClassLoader.loadClass() can be cast to it. l1 and l2 are then each asked to create an EnumConsumer instance, obj1 and obj2 respectively, and both instances validate the result of obj1.getObjects():

>java -cp out Main
element 0 Ok
element 1 Ok
element 0 [FALSE] != Enum.FALSE
element 1 [TRUE] != Enum.TRUE

The last two lines of output show that obj2 will never be able to use Enum values that obj1 creates. Effectively, obj1 and obj2 have different views of what the Enum class and its values are. Neither the reference comparison (==) nor the equals() method will work, because the equals() inherited from Object is itself an identity comparison.

In fact, fixing the Enum class so it works in this case would require a nontrivial amount of effort. You can override Object.equals() for Enum to use reflection in order to compare class names and m_value values. This will of course eliminate the speed advantage of a typesafe enum type. Besides, it would definitely be too much work for something that has no issues with the old typeunsafe construct in the first place.
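
For illustration only, such an equals() override might look like the sketch below. LenientEnum and LenientEnumDemo are hypothetical names, and to keep the example self-contained the "foreign" instance is produced by deserialization (the class deliberately has no readResolve()) rather than by a second classloader; the reflective lookup is what would let the same method cope with an instance of a same-named class from another loader.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Field;

// Hypothetical enum-like class that compares by class name and value.
final class LenientEnum implements Serializable
{
    static final LenientEnum TRUE = new LenientEnum (true);
    static final LenientEnum FALSE = new LenientEnum (false);

    private final boolean m_value;

    private LenientEnum (boolean value) { m_value = value; }

    public boolean equals (Object other)
    {
        if (this == other) return true;  // fast path: same instance
        if (other == null) return false;
        if (! getClass ().getName ().equals (other.getClass ().getName ()))
            return false;
        try
        {
            // 'other' may belong to a foreign copy of this class and so cannot
            // be cast to LenientEnum; read its field reflectively instead.
            Field field = other.getClass ().getDeclaredField ("m_value");
            field.setAccessible (true);
            return field.getBoolean (other) == m_value;
        }
        catch (Exception e)
        {
            return false; // field missing or inaccessible: treat as unequal
        }
    }

    public int hashCode ()
    {
        return m_value ? 1 : 0; // consistent with equals()
    }
}

class LenientEnumDemo
{
    // Produces a distinct instance with the same state via serialization.
    static Object copyOf (Object obj)
    {
        try
        {
            ByteArrayOutputStream bout = new ByteArrayOutputStream ();
            ObjectOutputStream out = new ObjectOutputStream (bout);
            out.writeObject (obj);
            out.flush ();
            ObjectInputStream in = new ObjectInputStream (
                new ByteArrayInputStream (bout.toByteArray ()));
            return in.readObject ();
        }
        catch (Exception e) { throw new IllegalStateException (e); }
    }

    public static void main (String [] args)
    {
        Object copy = copyOf (LenientEnum.TRUE);
        System.out.println ("same reference: " + (copy == LenientEnum.TRUE));
        System.out.println ("equals TRUE:    " + LenientEnum.TRUE.equals (copy));
        System.out.println ("equals FALSE:   " + LenientEnum.FALSE.equals (copy));
    }
}
```

Every non-identical comparison now pays for a string comparison and a reflective field read, and callers must remember to use equals() instead of ==, which is exactly the overhead and discipline the typesafe enum pattern was supposed to avoid.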

Ironically, core JDK classes are in little danger of this happening because they always load precisely once from the same bootstrap classloader. However, we, the developers, can run into this issue with custom-loaded code.

Proceed with caution

The typesafe enum pattern may require too much work to be truly safe in all situations, especially if your runtime involves serialization or a complex classloader structure — typical elements of Java 2 Platform, Enterprise Edition (J2EE) applications. In certain cases, it won’t work at all. As such, the pattern is an unreliable substitute for a basic feature that the Java language lacks: a compiler-supported enum feature. In many cases, you’re better off using the good old set-of-static-primitive-values enumeration.

Vladimir Roubtsov has programmed in a variety of procedural languages for more than 12 years and Java since 1995. Currently, he develops enterprise software as a senior developer for Trilogy in Austin, Texas. In his spare time, Vladimir develops software tools based on Java byte code or source code instrumentation.

Source: www.infoworld.com