Adding generic types to Java could mean less coding and fewer
Suppose you want to implement a list class in Java. You start with an abstract class, List
, and two subclasses, Empty
and Cons
, representing empty and nonempty lists, respectively. Since you plan to extend the functionality of these lists, you design a ListVisitor
interface, and provide accept(...)
hooks for ListVisitor
s in each of your subclasses. Furthermore, your Cons
class has two fields, first
and rest
, with corresponding accessor methods.
What will the types of these fields be? Clearly, rest
should be of type List
. If you know in advance that your lists will always contain elements of a given class, the task of coding will be considerably easier at this point. If you know that your list elements will all be integer
s, for instance, you can assign first
to be of type integer
.
However, if, as is often the case, you don’t know this information in advance, you must settle for the least common superclass that has all possible elements contained in your lists, which typically is the universal reference type Object
. Hence, your code for lists of varying type elements has the following form:
abstract class List {
public abstract Object accept(ListVisitor that);
}
interface ListVisitor {
public Object _case(Empty that);
public Object _case(Cons that);
}
class Empty extends List {
public Object accept(ListVisitor that) {
return that._case(this);
}
}
class Cons extends List {
private Object first;
private List rest;
Cons(Object _first, List _rest) {
first = _first;
rest = _rest;
}
public Object first() {return first;}
public List rest() {return rest;}
public Object accept(ListVisitor that) {
return that._case(this);
}
}
Although Java programmers often use the least common superclass for a field in this way, the approach has its disadvantages. Suppose you create a ListVisitor
that adds all the elements of a list of Integer
s and returns the result, as illustrated below:
class AddVisitor implements ListVisitor {
private Integer zero = new Integer(0);
public Object _case(Empty that) {return zero;}
public Object _case(Cons that) {
return new Integer(((Integer) that.first()).intValue() +
((Integer) that.rest().accept(this)).intValue());
}
}
Note the explicit casts to Integer
in the second _case(...)
method. You are repeatedly performing runtime tests to check properties of the data; ideally, the compiler should perform these tests for you as part of program type checking. But since you are not guaranteed that AddVisitor
will only be applied to List
s of Integer
s, the Java type checker cannot confirm that you are, in fact, adding two Integer
s unless the casts are present.
You could potentially obtain more precise type checking, but only by sacrificing polymorphism and duplicating code. You could, for instance, create a special List
class (with corresponding Cons
and Empty
subclasses, as well as a special Visitor
interface) for each class of element you store in a List
. In the example above, you would create an IntegerList
class whose elements are all Integer
s. But if you wanted to store, say, Boolean
s in some other place in the program, you would have to create a BooleanList
class.
Clearly, the size of a program written using this technique would increase rapidly. There are further stylistic issues, as well; one of the essential principles of good software engineering is to have a single point of control for each functional element of the program, and duplicating code in this copy-and-paste fashion violates that principle. Doing so commonly leads to high software development and maintenance costs. To see why, consider what happens when a bug is found: the programmer would have to go back and correct that bug separately in every copy made. If the programmer forgets to identify all of the duplicated sites, a new bug will be introduced!
But, as the example above illustrates, you will find it difficult to simultaneously keep a single point of control and use static type checkers to guarantee that certain errors will never occur when the program executes. In Java, as it exists today, you often have no choice but to duplicate code if you want precise static type checking. To be sure, you could never eliminate this aspect of Java entirely. Certain postulates of automata theory, taken to their logical conclusion, imply that no sound type system can determine precisely the set of valid inputs (or outputs) for all methods in a program. Consequently, every type system must strike a balance between its own simplicity and the expressiveness of the resulting language; the Java type system leans a bit too much in the direction of simplicity. In the first example, a slightly more expressive type system would have let you maintain precise type checking without having to duplicate code.
Such an expressive type system would add generic types to the language. Generic types are type variables that can be instantiated with an appropriately specific type for each instance of a class. For the purposes of this article, I will declare type variables in angle brackets above class or interface definitions. The scope of a type variable will then consist of the body of the definition at which it was declared (not including the extends
clause). Within this scope, you can use the type variable anywhere that you can use an ordinary type.
For example, with generic types, you could rewrite your List
class as follows:
abstract class List<T> {
public abstract T accept(ListVisitor<T> that);
}
interface ListVisitor<T> {
public T _case(Empty<T> that);
public T _case(Cons<T> that);
}
class Empty<T> extends List<T> {
public T accept(ListVisitor<T> that) {
return that._case(this);
}
}
class Cons<T> extends List<T> {
private T first;
private List<T> rest;
Cons(T _first, List<T> _rest) {
first = _first;
rest = _rest;
}
public T first() {return first;}
public List<T> rest() {return rest;}
public T accept(ListVisitor<T> that) {
return that._case(this);
}
}
Now you can rewrite AddVisitor
to take advantage of the generic types:
class AddVisitor implements ListVisitor<Integer> {
private Integer zero = new Integer(0);
public Integer _case(Empty<Integer> that) {return zero;}
public Integer _case(Cons<Integer> that) {
return new Integer((that.first()).intValue() +
(that.rest().accept(this)).intValue());
}
}
Notice that the explicit casts to Integer
are no longer needed. The argument that
to the second _case(...)
method is declared to be Cons<Integer>
, instantiating the type variable for the Cons
class with Integer
. Therefore, the static type checker can prove that that.first()
will be of type Integer
and that that.rest()
will be of type List<Integer>
. Similar instantiations would be made each time a new instance of Empty
or Cons
is declared.
In the example above, the type variables could be instantiated with any Object
. You could also provide a more specific upper bound to a type variable. In such cases, you could specify this bound at the type variable’s point of declaration with the following syntax:
<var> extends <bound>
For instance, if you wanted your List
s to contain only Comparable
objects, you could define your three classes as follows:
class List<T extends Comparable> {...}
class Cons<T extends Comparable> {...}
class Empty<T extends Comparable> {...}
Although adding parameterized types to Java would give you the benefits shown above, doing so would not be worthwhile if it meant sacrificing compatibility with legacy code in the process. Fortunately, such a sacrifice is not necessary. It is possible to automatically translate code, written in an extension of Java that has generic types, to bytecode for the existing JVM. Several compilers already do this — the Pizza and GJ compilers, written by Martin Odersky, are particularly good examples. Pizza was an experimental language that added several new features to Java, some of which were incorporated into Java 1.2; GJ is a successor to Pizza that adds only generic types. Since this is the only feature added, the GJ compiler can produce bytecode that works smoothly with legacy code. It compiles source to bytecode by means of type erasure, which replaces every instance of each type variable with that variable’s upper bound. It also allows type variables to be declared for specific methods, rather than for whole classes. GJ uses the same syntax for generic types that I use in this article.
Work in progress
At Rice University, the programming languages technology group in which I work is implementing a compiler for an upward-compatible version of GJ, called NextGen. The NextGen language was jointly developed by Professor Robert Cartwright of Rice’s computer science department and Guy Steele of Sun Microsystems; it adds the ability to perform runtime checks of type variables to GJ.
Another potential solution to this problem, called PolyJ, was developed at MIT. It is being extended at Cornell. PolyJ uses a slightly different syntax than GJ/NextGen. It also differs slightly in the use of generic types. For example, it does not support type parameterization of individual methods, and currently, doesn’t support inner classes. But unlike GJ or NextGen, it does allow type variables to be instantiated with primitive types. Also, like NextGen, PolyJ supports runtime operations on generic types.
Sun has released a Java Specification Request (JSR) for adding generic types to the language. Unsurprisingly, one of the key goals listed for any submission is the maintenance of compatibility with existing class libraries. When generic types are added to Java, it is likely that one of the proposals discussed above will serve as the prototype.
There are some programmers who are opposed to adding generic types in any form, despite their advantages. I’ll refer to two common arguments of such opponents as the “templates are evil” argument and the “it’s not object oriented” argument, and address each of them in turn.
Are templates evil?
C++ uses templates to provide a form of generic types. Templates have earned a bad reputation among some C++ developers because their definitions are not type checked in parameterized form. Instead, the code is replicated at each instantiation, and each replication is type checked separately. The problem with this approach is that type errors might exist in the original code that don’t show up in any of the initial instantiations. These errors can manifest themselves later if program revisions or extensions introduce new instantiations. Imagine the frustration of a developer using existing classes that type check when compiled by themselves, but not after he adds a new, perfectly legitimate subclass! Worse still, if the template is not recompiled along with the new classes, such errors will not be detected, but will instead corrupt the executing program.
Because of these problems, some people frown upon bringing templates back, expecting the drawbacks of templates in C++ to apply to a generic type system in Java. This analogy is misleading, because the semantic foundations of Java and C++ are radically different. C++ is an unsafe language, in which static type checking is a heuristic process with no mathematical foundation. In contrast, Java is a safe language, in which the static type checker literally proves that certain errors cannot occur when the code is executed. As a result, C++ programs involving templates suffer from myriad safety problems that cannot occur in Java.
Moreover, all of the prominent proposals for a generic Java perform explicit static type checking of the parameterized classes, rather than just doing so at each instantiation of the class. If you’re worried that such explicit checking would slow down type checking, rest assured that, in fact, the opposite is true: since the type checker makes only one pass over the parameterized code, as opposed to a pass for each instantiation of the parameterized types, the type checking process is expedited. For these reasons, the numerous objections to C++ templates do not apply to the generic type proposals for Java. In fact, if you look beyond what has been widely used in the industry, there are many less popular but very well-designed languages, such as Objective Caml and Eiffel, that support parameterized types to great advantage.
Are generic type systems object oriented?
Finally, some programmers object to any generic type system on the grounds that, because such systems were originally developed for functional languages, they are not object oriented. This objection is spurious. Generic types fit very naturally into a object-oriented framework, as the examples and discussion above demonstrate. But I suspect that this objection is rooted in a lack of understanding of how to integrate generic types with Java’s inheritance polymorphism. In fact, such integration is possible, and is the basis for our implementation of NextGen.
Generic types can be understood as subtrees in the class tree, rooted in the parameterized class, with additional child classes and interfaces for each instantiation. The interfaces are necessary because a particular instantiation of a generic type (for example, Cons<Integer>
) should not only inherit from your parameterized class (Cons<T>
), but also from the same instantiation of any parameterized superclass (List<Integer>
). Fortunately, the multiple inheritance involved in this integration does not require any implementation inheritance, and therefore can be modeled by the existing Java interface inheritance mechanism. Notice, in particular, that, unlike Java arrays, there is no covariant relationship among the instantiated type parameters (for example, List<Cons<Integer>>
is not a subclass of List<List<Integer>>
). It is possible to enhance this integration model to support covariant subtyping. Since covariant subtyping is a useful feature in practice, we plan to add this feature to NextGen after the initial implementation is completed.
Conclusion
Given Sun’s release of a JSR for adding generic types to Java, and the existence of several viable Java extensions that support generic types, it is likely that generic types will be part of Java in the near future. When used judiciously, they will provide programmers with improved safety guarantees and less development and maintenance costs. In a world where Java plays increasingly critical roles in embedded systems, ecommerce technologies, and even good old-fashioned monolithic software apps, these guarantees are sorely needed.