Reveal the magic behind subtype polymorphism

Behold polymorphism from a type-oriented point of view

The word polymorphism comes from the Greek for “many forms.” Most Java developers associate the term with an object’s ability to magically execute correct method behavior at appropriate points in a program. However, that implementation-oriented view leads to images of wizardry, rather than an understanding of fundamental concepts.

Polymorphism in Java is invariably subtype polymorphism. Closely examining the mechanisms that generate that variety of polymorphic behavior requires that we discard our usual implementation concerns and think in terms of type. This article investigates a type-oriented perspective of objects, and how that perspective separates what behavior an object can express from how the object actually expresses that behavior. By freeing our concept of polymorphism from the implementation hierarchy, we also discover how Java interfaces facilitate polymorphic behavior across groups of objects that share no implementation code at all.

Quattro polymorphi

Polymorphism is a broad object-oriented term. Though we usually equate the general concept with the subtype variety, there are actually four different kinds of polymorphism. Before we examine subtype polymorphism in detail, the following section presents a general overview of polymorphism in object-oriented languages.

Luca Cardelli and Peter Wegner, authors of “On Understanding Types, Data Abstraction, and Polymorphism,” (see Resources for link to article) divide polymorphism into two major categories — ad hoc and universal — and four varieties: coercion, overloading, parametric, and inclusion. The classification structure is:

                                 |-- coercion
                 |-- ad hoc    --|
                                 |-- overloading
  polymorphism --|
                                 |-- parametric
                 |-- universal --|
                                 |-- inclusion

In that general scheme, polymorphism represents an entity’s capacity to have multiple forms. Universal polymorphism refers to a uniformity of type structure, in which the polymorphism acts over an infinite number of types that have a common feature. The less structured ad hoc polymorphism acts over a finite number of possibly unrelated types. The four varieties may be described as:

Coercion: a single abstraction serves several types through implicit type conversion
Overloading: a single identifier denotes several abstractions
Parametric: an abstraction operates uniformly across different types
Inclusion: an abstraction operates through an inclusion relation

I will briefly discuss each variety before turning specifically to subtype polymorphism.

Coercion

Coercion represents implicit parameter type conversion to the type expected by a method or an operator, thereby avoiding type errors. For the following expressions, the compiler must determine whether an appropriate binary + operator exists for the types of operands:

  2.0 + 2.0
  2.0 + 2
  2.0 + "2"

The first expression adds two double operands; the Java language specifically defines such an operator.

However, the second expression adds a double and an int; Java does not define an operator that accepts those operand types. Fortunately, the compiler implicitly converts the second operand to double and uses the operator defined for two double operands. That is tremendously convenient for the developer; without the implicit conversion, a compile-time error would result or the programmer would have to explicitly cast the int to double.

The third expression adds a double and a String. Once again, the Java language does not define such an operator. So the compiler coerces the double operand to a String, and the plus operator performs string concatenation.
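The three expressions above can be tried directly. The following is a minimal sketch; the class name CoercionDemo and its method names are illustrative assumptions, not part of the article's example code:

```java
// Hypothetical demo class: each method evaluates one of the three expressions.
class CoercionDemo {
    // double + double: the operator the language defines directly
    static double addDoubles() {
        return 2.0 + 2.0;
    }

    // double + int: the int operand is implicitly converted to double
    static double addDoubleAndInt() {
        return 2.0 + 2;
    }

    // double + String: the double is converted to the String "2.0",
    // and + performs string concatenation
    static String addDoubleAndString() {
        return 2.0 + "2";
    }

    public static void main(String[] args) {
        System.out.println(addDoubles());          // prints 4.0
        System.out.println(addDoubleAndInt());     // prints 4.0
        System.out.println(addDoubleAndString());  // prints 2.02
    }
}
```

Note that the last result is the text "2.02", not a number: once one operand is a String, the plus operator no longer means addition.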

Coercion also occurs at method invocation. Suppose class Derived extends class Base, and class C has a method with signature m(Base). For the method invocation in the code below, the compiler implicitly converts the derived reference variable, which has type Derived, to the Base type prescribed by the method signature. That implicit conversion allows the m(Base) method’s implementation code to use only the type operations defined by Base:

  C c = new C();
  Derived derived = new Derived();
  c.m( derived );

Again, implicit coercion during method invocation obviates a cumbersome type cast or an unnecessary compile-time error. Of course, the compiler still verifies that all type conversions conform to the defined type hierarchy.

Overloading

Overloading permits the use of the same operator or method name to denote multiple, distinct program meanings. The + operator used in the previous section exhibited two forms: one for adding double operands, one for concatenating String objects. Other forms exist for adding two integers, two longs, and so forth. We call the operator overloaded and rely on the compiler to select the appropriate functionality based on program context. As previously noted, if necessary, the compiler implicitly converts the operand types to match the operator’s exact signature. Though Java specifies certain overloaded operators, it does not support user-defined overloading of operators.

Java does permit user-defined overloading of method names. A class may possess multiple methods with the same name, provided that the method signatures are distinct. That means either the number of parameters must differ or at least one parameter position must have a different type. Unique signatures allow the compiler to distinguish between methods that have the same name. The compiler selects one specific signature for each call at compile time, based on the declared types of the arguments, so overloaded methods are effectively distinct methods that merely share a name. In light of that, any apparent polymorphic behavior evaporates upon closer inspection.
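A short sketch may make the compile-time nature of overloading concrete. The class OverloadDemo and its describe methods are hypothetical names chosen for illustration. The same String object is passed twice, once through a String variable and once through an Object variable, and the compiler picks a different method each time:

```java
// Hypothetical demo class: three distinct methods that share one name.
class OverloadDemo {
    static String describe(Object o) {
        return "describe(Object)";
    }

    static String describe(String s) {
        return "describe(String)";
    }

    static String describe(String s, int n) {
        return "describe(String, int)";
    }

    public static void main(String[] args) {
        String text = "hello";
        Object sameObject = text;  // same runtime object, declared type Object

        // The compiler chooses the overload from the declared argument types.
        System.out.println(describe(text));        // prints describe(String)
        System.out.println(describe(sameObject));  // prints describe(Object)
        System.out.println(describe(text, 1));     // prints describe(String, int)
    }
}
```

Because the selection depends only on declared types, the runtime type of the argument plays no role here, which is the opposite of what happens with overridden methods in subtype polymorphism.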

Both coercion and overloading are classified as ad hoc because each provides polymorphic behavior only in a limited sense. Though they fall under a broad definition of polymorphism, these varieties are primarily developer conveniences. Coercion obviates cumbersome explicit type casts or unnecessary compiler type errors. Overloading, on the other hand, provides syntactic sugar, allowing a developer to use the same name for distinct methods.

Parametric

Parametric polymorphism allows the use of a single abstraction across many types. For example, a List abstraction, representing a list of homogeneous objects, could be provided as a generic module. You would reuse the abstraction by specifying the types of objects contained in the list. Since the parameterized type can be any user-defined data type, there are a potentially infinite number of uses for the generic abstraction, making this arguably the most powerful type of polymorphism.

At first glance, the above List abstraction may seem to be exactly what the java.util.List interface provides. However, Java does not support true parametric polymorphism in a type-safe manner, which is why java.util.List and java.util’s other collection types are written in terms of the primordial Java class, java.lang.Object. (See my article “A Primordial Interface?” for more details.) Java’s single-rooted implementation inheritance offers a partial solution, but not the true power of parametric polymorphism. Eric Allen’s excellent article, “Behold the Power of Parametric Polymorphism,” describes the need for generic types in Java and the proposals to address Sun’s Java Specification Request #000014, “Add Generic Types to the Java Programming Language.” (See Resources for a link.)
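To see what "written in terms of java.lang.Object" costs in practice, consider the following sketch. It assumes the Object-based (raw) use of java.util.List that the paragraph above describes; the class ObjectListDemo and its method names are hypothetical. Nothing stops unrelated types from entering the list, and a wrong cast is detected only at runtime:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class using the collection API in terms of Object only.
@SuppressWarnings({"rawtypes", "unchecked"})
class ObjectListDemo {
    // The list accepts any Object, so a String and an Integer can be mixed.
    static List buildList() {
        List list = new ArrayList();
        list.add("text");
        list.add(Integer.valueOf(42));
        return list;
    }

    // Every read returns Object; the caller must cast and hope it is right.
    static String elementAsString(int index) {
        List list = buildList();
        try {
            return (String) list.get(index);
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(elementAsString(0));  // prints text
        System.out.println(elementAsString(1));  // prints ClassCastException
    }
}
```

A type-safe parametric List would let the compiler reject the second add call instead of deferring the failure to the cast.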

Inclusion

Inclusion polymorphism achieves polymorphic behavior through an inclusion relation between types or sets of values. For many object-oriented languages, including Java, the inclusion relation is a subtype relation. So in Java, inclusion polymorphism is subtype polymorphism.

As noted earlier, when Java developers generically refer to polymorphism, they invariably mean subtype polymorphism. Gaining a solid appreciation of subtype polymorphism’s power requires viewing the mechanisms yielding polymorphic behavior from a type-oriented perspective. The rest of this article examines that perspective closely. For brevity and clarity, I use the term polymorphism to mean subtype polymorphism.

Type-oriented view

The UML class diagram in Figure 1 shows the simple type and class hierarchy used to illustrate the mechanics of polymorphism. The model depicts five types: four classes and one interface. Although the model is called a class diagram, I think of it as a type diagram. As detailed in “Thanks Type and Gentle Class,” every Java class and interface declares a user-defined data type. So from an implementation-independent view (i.e., a type-oriented view) each of the five rectangles in the figure represents a type. From an implementation point of view, four of those types are defined using class constructs, and one is defined using an interface.

Figure 1. UML class diagram for the example code

The following code defines and implements each user-defined data type. I purposely keep the implementation as simple as possible:

/* Base.java */
public class Base
{
  public String m1()
  {
    return "Base.m1()";
  }
  public String m2( String s )
  {
    return "Base.m2( " + s + " )";
  }
}
/* IType.java */
interface IType
{
  String m2( String s );
  String m3();
}
/* Derived.java */
public class Derived
  extends Base
  implements IType
{
  public String m1()
  {
    return "Derived.m1()";
  }
  public String m3()
  {
    return "Derived.m3()";
  }
}
/* Derived2.java */
public class Derived2
  extends Derived
{
  public String m2( String s )
  {
    return "Derived2.m2( " + s + " )";
  }
  public String m4()
  {
    return "Derived2.m4()";
  }
}
/* Separate.java */
public class Separate
  implements IType
{
  public String m1()
  {
    return "Separate.m1()";
  }
  public String m2( String s )
  {
    return "Separate.m2( " + s + " )";
  }
  public String m3()
  {
    return "Separate.m3()";
  }
}

Using these type declarations and class definitions, Figure 2 depicts a conceptual view of the Java statement:

Derived2 derived2 = new Derived2();

Figure 2. Derived2 reference attached to Derived2 object

The above statement declares an explicitly typed reference variable, derived2, and attaches that reference to a newly created Derived2 class object. The top panel in Figure 2 depicts the Derived2 reference as a set of portholes, through which the underlying Derived2 object can be viewed. There is one hole for each Derived2 type operation. The actual Derived2 object maps each Derived2 operation to appropriate implementation code, as prescribed by the implementation hierarchy defined in the above code. For example, the Derived2 object maps m1() to implementation code defined in class Derived. Furthermore, that implementation code overrides the m1() method in class Base. A Derived2 reference variable cannot access the overridden m1() implementation in class Base. That does not mean that the actual implementation code in class Derived can’t use the Base class implementation via super.m1(). But as far as the reference variable derived2 is concerned, that code is inaccessible. The mappings of the other Derived2 operations similarly show the implementation code executed for each type operation.

Now that you have a Derived2 object, you can reference it with any variable that conforms to type Derived2. The type hierarchy in Figure 1’s UML diagram reveals that Derived, Base, and IType are all super types of Derived2. So, for example, a Base reference can be attached to the object. Figure 3 depicts the conceptual view of the following Java statement:

Base base = derived2;

Figure 3. Base reference attached to Derived2 object

There is absolutely no change to the underlying Derived2 object or any of the operation mappings, though methods m3() and m4() are no longer accessible through the Base reference. Calling m1() or m2(String) using either variable derived2 or base results in execution of the same implementation code:

String tmp;
// Derived2 reference (Figure 2)
tmp = derived2.m1();             // tmp is "Derived.m1()"
tmp = derived2.m2( "Hello" );    // tmp is "Derived2.m2( Hello )"
// Base reference (Figure 3)
tmp = base.m1();                 // tmp is "Derived.m1()"
tmp = base.m2( "Hello" );        // tmp is "Derived2.m2( Hello )"

Realizing identical behavior through both references makes sense because the Derived2 object does not know what calls each method. The object only knows that when called upon, it follows the marching orders defined by the implementation hierarchy. Those orders stipulate that for method m1(), the Derived2 object executes the code in class Derived, and for method m2(String), it executes the code in class Derived2. The action performed by the underlying object does not depend on the reference variable’s type.

However, all is not equal when you use the reference variables derived2 and base. As depicted in Figure 3, a Base type reference can only see the Base type operations of the underlying object. So although Derived2 has mappings for methods m3() and m4(), variable base can’t access those methods:

String tmp;
// Derived2 reference (Figure 2)
tmp = derived2.m3();             // tmp is "Derived.m3()"
tmp = derived2.m4();             // tmp is "Derived2.m4()"
// Base reference (Figure 3)
tmp = base.m3();                 // Compile-time error
tmp = base.m4();                 // Compile-time error

The runtime Derived2 object remains fully capable of accepting either the m3() or m4() method calls. The type restrictions that disallow those attempted calls through the Base reference actually occur at compile time. That compile-time type checking acts as armor, protecting runtime objects by permitting interaction only through explicitly declared type operations. In that way, types define the boundaries of interaction between objects.
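One way to confirm that the restriction lives in the reference type rather than in the object is to swap the reference type with an explicit cast. The sketch below restates the article's hierarchy as nested classes so that it fits in one file; the wrapper class CastDemo and the method callM4Through are hypothetical names:

```java
// Hypothetical wrapper class; the nested types mirror the article's hierarchy.
class CastDemo {
    interface IType {
        String m2(String s);
        String m3();
    }

    static class Base {
        public String m1() { return "Base.m1()"; }
        public String m2(String s) { return "Base.m2( " + s + " )"; }
    }

    static class Derived extends Base implements IType {
        @Override public String m1() { return "Derived.m1()"; }
        @Override public String m3() { return "Derived.m3()"; }
    }

    static class Derived2 extends Derived {
        @Override public String m2(String s) { return "Derived2.m2( " + s + " )"; }
        public String m4() { return "Derived2.m4()"; }
    }

    static String callM4Through(Base base) {
        // base.m4() would not compile: m4() is not a Base type operation.
        // The downcast attaches a Derived2-typed view to the same object;
        // it throws ClassCastException if the object is not a Derived2.
        return ((Derived2) base).m4();
    }

    public static void main(String[] args) {
        Base base = new Derived2();
        System.out.println(callM4Through(base));  // prints Derived2.m4()
    }
}
```

The object never changed; only the view of it did. The cast is shown purely to make that point, since routine downcasting gives up the compile-time protection described above.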

Polymorphic attachments

Type conformance lies at the heart of polymorphism. For each reference variable attached to an object, the static type-checker verifies that the attachment conforms to the defined type hierarchy. Interesting polymorphic behavior arises when a reference variable successively attaches to a variety of possibly different object types. (Strictly speaking, by object type, I mean the type defined by the object’s class.) You can, however, attach several different reference variables to the same object as well. We’ll look at why the latter scenario does not produce polymorphic behavior before we turn to the more interesting former scenario.

Multiple references attached to an object

Figures 2 and 3 depict examples of attaching two or more reference types to a single object. Though the actual underlying Derived2 object remains unaffected by the attached reference variable’s type, the Base type reference in Figure 3 effectively reduces the underlying object’s capability. That result can be generalized: attaching a super type reference to an object effectively reduces the underlying object’s utility.

Why would a developer choose to lose object functionality? The choice is often indirect. Suppose a reference variable named ref attaches to an object whose class contains the following method definition:

public String poly1( Base base )
{
  return base.m1();
}

Parameter type conformance permits calling poly1(Base) using a Derived2 reference, which points to a Derived2 object:

ref.poly1( derived2 );

Method invocation attaches a local Base type variable to the incoming object. So although the method receives a Derived2 object, it may only access Base type operations. The developer of the implementation code does not necessarily choose to lose functionality. From the perspective of the person passing in the Derived2 object, the implementer’s attachment of a Base type reference results in a loss of functionality. But from the implementer’s point of view, every object passed into poly1(Base) looks like a Base object. The implementer does not care that multiple reference types may point to the same object; to the implementer, a single reference type successively attaches to the different objects passed to the method. That those objects have possibly different types is not a primary concern. The implementer only cares that the runtime object maps all Base type operations to appropriate implementation. That type-oriented perspective reveals the true power of polymorphism.

Reference attached to multiple objects

Let’s look at the polymorphic behavior occurring inside of poly1(Base). The following code creates three objects from different classes and passes a reference to each into poly1(Base):

Derived2 derived2 = new Derived2();
Derived derived = new Derived();
Base base = new Base();
String tmp;
tmp = ref.poly1( derived2 );     // tmp is "Derived.m1()"
tmp = ref.poly1( derived );      // tmp is "Derived.m1()"
tmp = ref.poly1( base );         // tmp is "Base.m1()"

The implementation code in poly1(Base) calls m1() for each passed object, using a local Base type variable. Figures 3 and 4 depict type-oriented views of the conceptual structure that results when the passed reference points to objects of each of the three classes.

Figure 4. Base reference attached to a Derived and a Base object

In each figure, note the mapping of the m1() operation. In Figure 3, m1() maps to implementation code in class Derived; a comment in the above code notes that the poly1(Base) method call returns "Derived.m1()". For the Derived object in Figure 4, method m1() also maps to the implementation in class Derived, again returning "Derived.m1()". Finally, for the Base object in Figure 4, method m1() maps to class Base implementation and returns "Base.m1()".

So where does the power of polymorphism lie? Return to the code for method poly1(Base), which views whatever object it receives through a Base-type lens. Yet when passed a Derived2 object, it actually returns a result from code executed in class Derived! And if you later extend the classes Base, Derived, or Derived2, the method poly1(Base) cheerfully accepts an object of your class and executes the appropriate implementation code. Polymorphism allows you to add those classes long after writing poly1(Base).
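The claim about classes added later can be sketched as follows. Derived3 is a hypothetical class invented for this illustration, and LateExtensionDemo is a hypothetical wrapper; Base is trimmed to the single m1() operation that poly1(Base) uses:

```java
// Hypothetical wrapper class for a one-file illustration.
class LateExtensionDemo {
    static class Base {
        public String m1() { return "Base.m1()"; }
    }

    // Written first: knows only about the Base type.
    static String poly1(Base base) {
        return base.m1();
    }

    // Hypothetical subclass written "long after" poly1(Base);
    // poly1 needs no change to work with it.
    static class Derived3 extends Base {
        @Override public String m1() { return "Derived3.m1()"; }
    }

    public static void main(String[] args) {
        System.out.println(poly1(new Base()));      // prints Base.m1()
        System.out.println(poly1(new Derived3()));  // prints Derived3.m1()
    }
}
```

The method poly1(Base) is compiled against the Base type alone, yet it executes the Derived3 implementation when handed a Derived3 object.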

That certainly seems like magic. However, a fundamental understanding reveals the inner workings of polymorphism. From a type-oriented perspective, the underlying object’s actual implementation code is immaterial. The most important aspect of reference-to-object attachment is that the compile-time type-checker has guaranteed that the underlying object possesses a runtime implementation for each type operation. Polymorphism frees the developer from the implementation details of an object and allows design to occur using a type-oriented perspective. Therein lies a significant benefit in separating type and implementation (often referred to as separating interface and implementation).

The interface to an object

So polymorphism relies on separating the concerns of type and implementation, which is often referred to as separating interface and implementation. But the latter phrase can be confusing in light of the Java keyword interface.

More importantly, what do developers mean by the common phrase the interface to an object? Typically, the statement’s context indicates that the phrase refers to the set of all public methods defined by the object’s class hierarchy — that is, the set of all publicly available methods that may be called on the object. That definition, however, leans toward an implementation-centric view by concentrating our focus on an object’s runtime capability, rather than on a type-oriented view of the object. In Figure 3, the interface to the object refers to the panel labeled “Derived2 Object.” That panel lists all available methods for the Derived2 object. But to understand polymorphism, we must free ourselves from an implementation level and view the object from the perspective of the type-oriented panel labeled “Base Reference.” At that level, the reference variable’s type dictates an interface to the object. That’s an interface, not the interface. Under the guidance of type conformance, we may attach multiple type-oriented views to a single object. There is no singularly specified interface to an object.

So in terms of type, the interface to an object refers to the widest possible type-oriented view of that object — as in Figure 2. A super type reference attached to the same object typically narrows the view — as in Figure 3. The concept of type best captures the spirit of freeing object interactions from the details of object implementation. Rather than refer to the interface of an object, a type-oriented perspective encourages referring to the reference type attached to an object. The reference type dictates the permissible interaction with the object. Think type when you want to know what an object can do, as opposed to how the object implements its responsibilities.
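The idea of several type-oriented views onto one object can be sketched in code. The wrapper class ViewsDemo is a hypothetical name; the nested types restate the article's hierarchy so the example stands alone. Three references of different types are attached to a single Derived2 object:

```java
// Hypothetical wrapper class; the nested types mirror the article's hierarchy.
class ViewsDemo {
    interface IType {
        String m2(String s);
        String m3();
    }

    static class Base {
        public String m1() { return "Base.m1()"; }
        public String m2(String s) { return "Base.m2( " + s + " )"; }
    }

    static class Derived extends Base implements IType {
        @Override public String m1() { return "Derived.m1()"; }
        @Override public String m3() { return "Derived.m3()"; }
    }

    static class Derived2 extends Derived {
        @Override public String m2(String s) { return "Derived2.m2( " + s + " )"; }
        public String m4() { return "Derived2.m4()"; }
    }

    public static void main(String[] args) {
        Derived2 derived2 = new Derived2();  // widest view: m1, m2, m3, m4
        Base base = derived2;                // Base view: m1, m2
        IType iType = derived2;              // IType view: m2, m3

        // All three references point at the same object.
        System.out.println(base == derived2 && iType == derived2);  // prints true

        // m2 is visible through every view and always runs the same code.
        System.out.println(derived2.m2("Hello"));  // prints Derived2.m2( Hello )
        System.out.println(base.m2("Hello"));      // prints Derived2.m2( Hello )
        System.out.println(iType.m2("Hello"));     // prints Derived2.m2( Hello )
    }
}
```

Each reference type offers a different set of operations, but any operation that is visible behaves identically through every view.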

Java interfaces

The previous examples of polymorphic behavior use subtype relationships established through class inheritance. Java interfaces also declare user-defined types, and correspondingly, Java interfaces enable polymorphic behavior by establishing type inheritance structure. Suppose a reference variable named ref attaches to an object whose class contains the following method definition:

public String poly2( IType iType )
{
  return iType.m3();
}

To explore polymorphic behavior inside poly2(IType), the following code creates two objects from different classes and passes a reference to each into poly2(IType):

Derived2 derived2 = new Derived2();
Separate separate = new Separate();
String tmp;
tmp = ref.poly2( derived2 );       // tmp is "Derived.m3()"
tmp = ref.poly2( separate );       // tmp is "Separate.m3()"

The above code resembles the previous discussion of polymorphic behavior inside poly1(Base). The implementation code in poly2(IType) calls method m3() for each object, using a local IType reference. As before, code comments note the String result of each call. Figure 5 shows the conceptual structure of the two calls to poly2(IType):

Figure 5. IType reference attached to a Derived2 and a Separate object

The similarity between the polymorphic behavior occurring inside methods poly1(Base) and poly2(IType) results directly from a type-oriented perspective. Raising our view above the implementation level allows an identical understanding of the two code samples’ mechanics. Local super type references attach to incoming objects and make type-restricted calls to those objects’ methods. Neither reference knows (nor cares) what implementation code actually executes. The subtype relationship verified at compile time guarantees the passed object’s capability to perform appropriate implementation code when called upon.

However, an important distinction manifests itself at the implementation level. In the poly1(Base) example (Figures 3 and 4), the Base-Derived-Derived2 class inheritance chain establishes the requisite subtype relations, and method overriding determines the implementation code mappings. In the poly2(IType) example (Figure 5), a completely different dynamic occurs. Classes Derived2 and Separate do not share any implementation hierarchy, yet objects instantiated from those classes exhibit polymorphic behavior through an IType reference.

Such polymorphic behavior highlights a significant utility of Java interfaces. The UML diagram in Figure 1 shows that type Derived subtypes both Base and IType. By defining a type completely free of implementation, Java interfaces allow multiple type inheritance without the thorny issues of multiple implementation inheritance, which Java prohibits. Classes from completely separate implementation hierarchies may be grouped by a Java interface. In Figure 1, interface IType groups Derived and Separate (and any subtypes of those types).

By grouping objects from disparate implementation hierarchies, Java interfaces facilitate polymorphic behavior even in the absence of any shared implementation or overridden methods. As shown in Figure 5, an IType reference polymorphically accesses the m3() methods of the underlying Derived2 and Separate objects.

The interface to an object (again)

Note that objects Derived2 and Separate in Figure 5 each possess mappings for method m1(). As previously discussed, the interface to each object includes that m1() method. But there is no way, using these two objects, to engage method m1() in polymorphic behavior. It is insufficient that each object possesses an m1() method. A common type must exist with operation m1(), through which to view the objects. The objects may seem to share m1() in their interfaces, but without a common super type, polymorphism is impossible. Thinking in terms of the interface to an object simply confounds this issue.
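As a sketch of what "a common type must exist" would look like, suppose a hypothetical interface HasM1 declared the m1() operation and both classes implemented it. Neither HasM1 nor the method poly3 appears in Figure 1; they are assumptions made only for this illustration, and the classes are trimmed to the one method that matters here:

```java
// Hypothetical wrapper class for a one-file illustration.
class CommonTypeDemo {
    // Hypothetical common super type declaring the m1() operation.
    interface HasM1 {
        String m1();
    }

    // Trimmed stand-ins for the article's classes, now sharing HasM1.
    static class Derived implements HasM1 {
        @Override public String m1() { return "Derived.m1()"; }
    }

    static class Separate implements HasM1 {
        @Override public String m1() { return "Separate.m1()"; }
    }

    // Hypothetical method: possible only because HasM1 exists.
    static String poly3(HasM1 target) {
        return target.m1();
    }

    public static void main(String[] args) {
        System.out.println(poly3(new Derived()));   // prints Derived.m1()
        System.out.println(poly3(new Separate()));  // prints Separate.m1()
    }
}
```

Without such a type, a method parameter that accepts both objects could only be declared as Object, through which m1() is not visible at all.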

Conclusion

Having placed subtype polymorphism in the general context of object-oriented polymorphism, we closely examined that critical variety from a type-oriented perspective. A fundamental understanding of subtype polymorphism requires that you make the shift from implementation concerns to thinking in terms of type. Types define common object groupings and govern permissible object interactions. The hierarchical structure of type inheritance determines the type relationships necessary to achieve polymorphic behavior.

Interestingly, implementation does not affect the hierarchical structure of subtype polymorphism. Types determine what methods the object may perform; implementation determines how the object actually responds to each method. That is, types declare responsibilities, and classes implement those responsibilities. By cleanly separating type and implementation, we find the two governing a grand object dance: types determine permissible partners and the names of the dances, while implementations choreograph the actual steps.

Wm. Paul Rogers is a senior engineering manager and application architect at Lutris Technologies, where he builds computer solutions that utilize Enhydra, the leading open source Java/XML application server. He began using Java in the fall of 1995 in support of oceanographic studies conducted at the Monterey Bay Aquarium Research Institute, where he led the charge to use new technologies to expand the possibilities of ocean science research. Paul has been using object-oriented methods and technologies for more than nine years.

Source: www.infoworld.com