Reflection vs. code generation

Avoid runtime reflection when marshaling data

Data marshaling (pulling data from an outside source and loading it into a Java object) can utilize the benefits of reflection to create a reusable solution. The problem is simple enough: load data from a file into an object’s fields. Now, what if the target Java classes for the data change on a weekly basis? A straightforward solution still works, but you must continually maintain the loading procedures to reflect any changes. To further complicate the situation, the same problem may crosscut the system’s breadth. Anyone who has dealt with a large system using XML has encountered this problem. Coding a load procedure is often tedious and subject to frequent updates and rewrites due to changes to the source data or the target Java class. A solution using reflection, which I’ll describe here, often requires less coding, and updates itself when changes are made to the target Java class.

Originally, I intended to demonstrate a solution using reflection during runtime for data marshaling. Initially a dynamic, reflection-based program was far more attractive than a simpler approach. Over time, the novelty faded to reveal the complexity and risk of runtime reflection. This article charts this evolution from runtime reflection to active code generation.

From simplicity to complexity

My first solution used a loading class to load the objects’ data from a flat file. My source code contained several dozen calls for the next token of a StringTokenizer object. After several refactorings (see Martin Fowler’s Refactoring), my coding logic became straightforward, nearly systematic: the class structure dictated the code. My initial solutions showed me that I needed to account for only three basic kinds of objects:

Strings
Objects
Arrays of objects

You could map the class’s objects to generalized code blocks, as shown in the following table:

Objects mapped to generalized code blocks

Field type	Code block
`String`	`fileIterator.nextString();`
`Object[]`	`Vector collector = new Vector(); while (fileIterator.hasMoreDataForArray()) { Object data = initializeObject(fileIterator); collector.add(data); } Object[] objArray = new Object[collector.size()]; collector.copyInto(objArray);`
`Object`	`initializeObject(fileIterator);`
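
To make the table concrete, here is a minimal sketch of what a hand-written loader built from those three code blocks might look like. The TokenIterator, Order, and Item classes are hypothetical stand-ins (the article's real fileIterator API is not shown), and the "END" token that closes an array is an assumed file convention.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class HandWrittenOrderLoader {
    // Object case: build one Item from the next token.
    static Item loadItem(TokenIterator it) {
        return new Item(it.nextString());
    }

    // Object[] case: collect elements until the array runs out of data.
    static Item[] loadItemArray(TokenIterator it) {
        List<Item> collector = new ArrayList<>();
        while (it.hasMoreDataForArray()) {
            collector.add(loadItem(it));
        }
        return collector.toArray(new Item[0]);
    }

    // String case followed by the Object[] case, in constructor order.
    static Order loadOrder(TokenIterator it) {
        return new Order(it.nextString(), loadItemArray(it));
    }

    public static void main(String[] args) {
        Order order = loadOrder(new TokenIterator("acme", "bolt", "nut", "END"));
        System.out.println(order.customer + " ordered " + order.items.length + " items");
    }
}

// Hypothetical token source standing in for the article's fileIterator.
class TokenIterator {
    private final Iterator<String> tokens;
    private String peeked;

    TokenIterator(String... tokens) {
        this.tokens = Arrays.asList(tokens).iterator();
    }

    String nextString() {
        if (peeked != null) {
            String token = peeked;
            peeked = null;
            return token;
        }
        return tokens.next();
    }

    // Assumed convention: an "END" token closes an array and is consumed here.
    boolean hasMoreDataForArray() {
        if (peeked == null && tokens.hasNext()) {
            peeked = tokens.next();
        }
        if (peeked == null) {
            return false;
        }
        if ("END".equals(peeked)) {
            peeked = null;
            return false;
        }
        return true;
    }
}

class Item {
    final String name;
    Item(String name) { this.name = name; }
}

class Order {
    final String customer;
    final Item[] items;
    Order(String customer, Item[] items) {
        this.customer = customer;
        this.items = items;
    }
}
```

Every new field on Order or Item means another hand edit to a loader like this one, which is the maintenance burden the rest of the article tries to remove.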

Having coded the solution several times, I knew the solution and the code structure before I wrote any of the code. The difficulty arose from the classes’ changing landscape. The class names, compositions, and structures could change at any moment, and any change could force a rewrite. Given these changes, the structure and loading process still remained the same; I still knew the code structure and composition before I wrote the code. I needed a way to translate the coding processes in my head into a reproducible, automated form. Since I am an efficient (i.e., lazy) programmer, I quickly tired of writing nearly identical code. Reflection came to my rescue.

Marshaling usually requires source and target data maps. Maps can take the shape of a schema, DTD (document type definition), file format, and so on. In this case, reflection interprets an object’s class definition as the target map for our mapping process. Reflection can duplicate the code’s functionality during runtime. So during a required rewrite, I replaced the load procedure with reflection in the same amount of time it would have taken me to do the rewrite.

The load process can be summarized in the following steps:

1. Interpret: A map decides what you need to construct an object.
   - If you need to construct other objects first, recurse; repeat step 1.
2. Request data: To fulfill construction requirements, a call is made to obtain data.
3. Pull: Data is extracted from source.
4. Push: Data is stuffed into an object’s new instance.
5. If necessary, repeat step 1.
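
The steps above can be sketched in a few lines of reflective Java. This is not the article's ReflectiveObjectLoader; it is a simplified illustration that assumes each data class has exactly one constructor, that the source is a flat token stream, and that each array is prefixed with its element count. SketchOrder and SketchItem are hypothetical data classes.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Constructor;
import java.util.Arrays;
import java.util.Iterator;

class ReflectiveLoadSketch {
    private final Iterator<String> tokens;

    ReflectiveLoadSketch(String... tokens) {
        this.tokens = Arrays.asList(tokens).iterator();
    }

    Object load(Class<?> type) {
        // Pull: leaf values come straight from the token stream.
        if (type == String.class) return tokens.next();
        if (type == int.class) return Integer.parseInt(tokens.next());
        if (type.isArray()) {
            // Assumed convention: an array is prefixed with its element count.
            int length = Integer.parseInt(tokens.next());
            Class<?> elementType = type.getComponentType();
            Object array = Array.newInstance(elementType, length);
            for (int i = 0; i < length; i++) {
                Array.set(array, i, load(elementType));
            }
            return array;
        }
        try {
            // Interpret: the class's only constructor serves as the map.
            Constructor<?> ctor = type.getDeclaredConstructors()[0];
            ctor.setAccessible(true);
            Class<?>[] needed = ctor.getParameterTypes();
            Object[] args = new Object[needed.length];
            // Request data for each argument, recursing for nested objects.
            for (int i = 0; i < needed.length; i++) {
                args[i] = load(needed[i]);
            }
            // Push: the collected data goes into a new instance.
            return ctor.newInstance(args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        ReflectiveLoadSketch loader =
                new ReflectiveLoadSketch("acme", "2", "bolt", "10", "nut", "25");
        SketchOrder order = (SketchOrder) loader.load(SketchOrder.class);
        System.out.println(order.customer + ": " + order.items.length + " items");
    }
}

class SketchItem {
    final String name;
    final int quantity;
    SketchItem(String name, int quantity) {
        this.name = name;
        this.quantity = quantity;
    }
}

class SketchOrder {
    final String customer;
    final SketchItem[] items;
    SketchOrder(String customer, SketchItem[] items) {
        this.customer = customer;
        this.items = items;
    }
}
```

Note that nothing in load() names SketchOrder or SketchItem: adding a constructor parameter to either class changes the loader's behavior without any change to the loader's source.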

You need the following classes to fulfill the steps above.

Data classes: Instantiate with the data from the ASCII files. The class definitions provide the map for the data. The following must be true of data classes:
- They must contain a constructor that takes all the required arguments to construct the object in a valid state.
- They must be composed of objects that the reflective procedure knows how to handle.
Object loader: Uses reflection and the data class as a map to load the data. Makes the data requests.
Load manager: Acts as an intermediary between the object loader and the source data by translating requests for data into a data-specific call. This enables the object loader to be data-source independent. Communicates through its interface and a loadable class object.
Data iterator interface: The load manager and load class objects use this interface to pull the data from its source.
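
As a rough illustration of how these roles might be separated, the sketch below shows a data iterator interface and a load manager that translates type requests into source-specific calls. The interface shapes, method names, and the ListDataIterator and TokenLoadManager classes are assumptions made for this example; the article's actual interfaces may differ.

```java
import java.util.Arrays;
import java.util.Iterator;

class LoadManagerDemo {
    public static void main(String[] args) {
        LoadManager manager = new TokenLoadManager(new ListDataIterator("widget", "42"));
        // The object loader would issue requests like these while walking a class.
        System.out.println(manager.load(String.class) + " x " + manager.load(int.class));
    }
}

// Data iterator interface: the only type that knows how to pull raw data.
interface DataIterator {
    String nextToken();
}

// Load manager: turns "I need a value of this type" into a source-specific call.
interface LoadManager {
    Object load(Class<?> requestedType);
}

// Hypothetical iterator over an in-memory token list, standing in for a file.
class ListDataIterator implements DataIterator {
    private final Iterator<String> tokens;

    ListDataIterator(String... tokens) {
        this.tokens = Arrays.asList(tokens).iterator();
    }

    public String nextToken() {
        return tokens.next();
    }
}

class TokenLoadManager implements LoadManager {
    private final DataIterator source;

    TokenLoadManager(DataIterator source) {
        this.source = source;
    }

    public Object load(Class<?> requestedType) {
        if (requestedType == String.class) return source.nextToken();
        if (requestedType == int.class) return Integer.parseInt(source.nextToken());
        throw new IllegalArgumentException("No mapping for " + requestedType.getName());
    }
}
```

Because the object loader only ever talks to LoadManager, swapping the flat file for another source should only require a different DataIterator and manager pair.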

Once you create the supporting classes, you can create and map an object with the following statements:

FooFileIterator iter = new FooFileIterator(fileLocation, log);
LoadManager manager = new FooFileLoadManager(iter);
SubFooObject obj = (SubFooObject)
        ReflectiveObjectLoader.initializeInstance(SubFooObject.class, manager, log);

With just this bit of magic, you create a new instance of a SubFooObject containing the file contents.

Limitations

Developers must decide on how best to solve a problem; often, the toughest part is deciding which solution to use. Below are some limitations to keep in mind when considering using reflection for data marshaling:

Do not make a simple problem complex. Reflection can be a hairy beast, so only use it when necessary. Once a developer understands reflection’s power, he or she may want to use it to solve every problem. Be careful not to solve a problem with reflection that you can solve more easily, and more quickly, without it (even if it means writing more code). Reflection is as dangerous as it is powerful.
Consider performance. Reflection can be a performance hog because it takes time and memory to discover and manipulate class properties during runtime.

Reassess the solution

As described above, the number one limitation to using runtime reflection is “Do not make a simple problem complex.” With reflection, this is unavoidable. Coupling reflection with recursion is one serious headache; reviewing the code is a nightmare; and determining exactly what the code is doing is an intricate process. The only way to truly determine the code’s behavior is to step through it, as it would behave at runtime, with sample data. However, to do this for every possible data combination is nearly impossible. Unit testing code helps the situation, but still cannot quell the fears of a developer who sees a possibility for failure. Fortunately, there is an alternative.

The alternative

Given the limitations listed earlier, there are definitely situations where using a reflective load procedure is more trouble than it’s worth. Code generation provides a versatile alternative. You can also use reflection to inspect a class and produce code for the load procedure. Andrew Hunt and David Thomas describe two types of code generators in The Pragmatic Programmer.

Passive: A passive code generator requires some human intervention to implement code. Many IDEs provide wizards that take this approach.
Active: Active code generation involves creating code that never has to be modified once it is generated. If a problem arises, the problem should be fixed in the code generator, and not in the generated source files. Ideally, this process should be included in the build process to ensure that the classes never become out of date.

The pros and cons associated with code generators include the following:

Pros:
- Simplicity: Generated code is generally more readable and debuggable for developers.
- Compile-time errors: Reflective procedures fail more often at runtime than during compilation. For instance, changing the object to be loaded will probably cause the generated loading class to produce a compilation error, but the reflective procedure won’t see any difference until the class is encountered during runtime.
Cons:
- Maintenance: With passive code generation, changing the objects to be loaded may require the loading class to be updated or regenerated. If the classes are regenerated, then customizations may be lost.
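
The compile-time point can be illustrated with a small sketch. The Customer class and both load methods are hypothetical. The direct call is checked by the compiler, so changing the constructor's parameters breaks the build; the reflective lookup compiles regardless and reports a mismatch only when it runs.

```java
class FailureTimingDemo {
    // Direct call: if Customer's constructor gains a parameter,
    // this line stops compiling and the problem is caught at build time.
    static Customer loadDirect(String token) {
        return new Customer(token);
    }

    // Reflective lookup: the expected parameter list is only checked at runtime.
    static String lookUpConstructor(Class<?> type, Class<?>... expectedParams) {
        try {
            type.getDeclaredConstructor(expectedParams);
            return "ok";
        } catch (NoSuchMethodException e) {
            return "runtime failure: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(loadDirect("acme").name);
        // Matches the constructor that exists today.
        System.out.println(lookUpConstructor(Customer.class, String.class));
        // Simulates a loader that expects a signature the class does not have.
        System.out.println(lookUpConstructor(Customer.class, String.class, int.class));
    }
}

class Customer {
    final String name;

    Customer(String name) {
        this.name = name;
    }
}
```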

Take a step back and try again

At this point we can see that using reflection during runtime is not acceptable. Active code generation gives us all the benefits of reflection, but none of its limitations. Reflection will still be used, but only during the code generation process, and not during runtime. The reasons are outlined below:

Less risk. Runtime reflection is definitely high risk, especially as the problem grows more complex.
Unit tests lie, but the compiler doesn’t.
Versatility. Generated code can realize all the benefits of runtime reflection, while gaining other benefits elusive in runtime reflection.
Understandability. Even with several more refactorings, the complexity of combining recursion with reflection is inescapable. Generated source code is much easier to explain and understand. The code generation process requires recursion and reflection, but the result is viewable source code, rather than a magically created object.

Write code that writes code

To write a code generator, you must think beyond simply coding a solution to solve a problem. A code generator (and reflection) recreate the mental gymnastics that coding often requires. If you use purely runtime reflection, you are forced to conceptualize the problem at runtime, rather than attack the problem with simple, compilable source code. Code generation lets you view the problem from both perspectives. The code generation process actually converts abstract ideas into concrete source code. Runtime reflection never escapes from abstraction.

The code generation process lets you commit your thought process to code, then generate and compile the code. The compiler lets you know if your thought process is syntactically incorrect; unit tests validate the code’s runtime behavior. Given its dynamic nature, runtime reflection cannot give this level of security.

The code generator

After experimenting with several failed designs, I decided that the easiest approach was to generate one method for every type of class that must be instantiated during the load procedure. A method factory produces the right method for a particular class. A code builder object caches the requests for methods from the code factory to produce the final source file’s contents.

The MethodCode objects are the heart of the code generation process. Here is an example of the code generation object for an int:

public class MethodForInt extends MethodCode {
    // The parameter of the generated method: the file iterator, named "parser".
    private final static MethodParameter param =
            new MethodParameter(SimpleFileIterator.class, "parser");

    public MethodForInt(Class type, CodeBuilder builder) {
        super(type, builder);
    }

    public MethodParameter[] getInputParameters() {
        return new MethodParameter[] { param };
    }

    public MethodParameter[] getInstanceParameters() {
        return getInputParameters();
    }

    // Body of the generated method: read the next int from the parser.
    protected String getImplBody(CodeBuilder builder) {
        return "return " + param.getName() + ".nextInt();\n";
    }
}

The base class MethodCode does all the work. During the code generation process, the MethodCode class determines the method name and the skeleton code for the implementation. The MethodForInt class simply needs to define the data specific to its method. The most important part is the getImplBody(CodeBuilder builder) method, which defines the body of the generated function. The two methods getInputParameters() and getInstanceParameters() determine the function signature, which is used both to declare the function and to call it from other functions. The MethodForInt class produces the following code during generation:

/** Generated Load method for int**/
final public static int loadint(com.thoughtworks.rettig.util.SimpleFileIterator parser){
return parser.nextInt();
}
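
For readers who want a feel for the overall idea without the MethodCode hierarchy, here is a much smaller sketch of reflection used at generation time. It is not the article's generator: LoaderSourceSketch and Part are hypothetical, it assumes each class has a single constructor, and it emits an unqualified SimpleFileIterator parameter type for brevity. The method naming (loadint, loadLineItemArray, and so on) follows the pattern visible in the generated code.

```java
import java.lang.reflect.Constructor;

class LoaderSourceSketch {
    // Name of the generated load method for a type, e.g. loadint or loadPartArray.
    static String methodName(Class<?> type) {
        if (type.isArray()) {
            return methodName(type.getComponentType()) + "Array";
        }
        return "load" + type.getSimpleName();
    }

    // Emits source for a method that builds 'type' by calling the load
    // method of each constructor parameter, in declaration order.
    static String generateMethod(Class<?> type) {
        Constructor<?> ctor = type.getDeclaredConstructors()[0];
        StringBuilder args = new StringBuilder();
        for (Class<?> paramType : ctor.getParameterTypes()) {
            if (args.length() > 0) {
                args.append(", ");
            }
            args.append(methodName(paramType)).append("(parser)");
        }
        String name = type.getSimpleName();
        return "final public static " + name + " " + methodName(type)
                + "(SimpleFileIterator parser){\n"
                + "return new " + name + "(" + args + ");\n"
                + "}\n";
    }

    public static void main(String[] args) {
        System.out.print(generateMethod(Part.class));
    }
}

// Hypothetical data class used only to drive the generator.
class Part {
    final String name;
    final int count;

    Part(String name, int count) {
        this.name = name;
        this.count = count;
    }
}
```

The reflection happens once, while the generator runs; the text it produces is ordinary source that the compiler checks like any hand-written loader.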

Seamless generation

Code generation adds the extra burden of generating source code during the build process. You can handle this using an easily configurable build tool such as Ant. For example, to generate the code for this article’s example, I created the following task:

<target name="GenerateLoad">
    <java classname="com.thoughtworks.rettig.loadGenerator.LoadWriter"
          dir="."
          fork="yes">
        <arg value="com.thoughtworks.rettig.example.generated"/>
        <arg value="com.thoughtworks.rettig.example.PurchaseOrder"/>
    </java>
</target>

The two arguments specify the source code’s destination package, and the class for which to create a load procedure. Once the task is defined and integrated into the build procedure, code generation becomes a seamless part of the build process.

Compare working solutions

With two working implementations, retrospective analysis may help those considering similar paths.

The real difference in these solutions is apparent when you encounter a runtime problem. With runtime reflection, you probably get an indecipherable stack trace due to the extensive use of reflection and recursion. The generated code gives you a simple stack trace that you can trace back to the generated source code for debugging.

Here is an example of two stack traces that were generated from the same error. I’ll let you judge which one you’d rather debug. (Note that I’ve removed the com.thoughtworks.rettig package qualifier for readability.)

Runtime Reflection Exception:
java.lang.NumberFormatException: itemName
at java.lang.Integer.parseInt(Integer.java:409)
at java.lang.Integer.parseInt(Integer.java:458)
at ...util.SimpleFileIterator.nextInt(SimpleFileIterator.java:82)
at ...dataLoader.SimpleFileLoadManager.load(SimpleFileLoadManager.java:44)
at ...dataLoader.ReflectiveObjectLoader.initializeInstance(ReflectiveObjectLoader.java:129)
at ...dataLoader.ReflectiveObjectLoader.constructObject(ReflectiveObjectLoader.java, Compiled Code)
at ...dataLoader.ReflectiveObjectLoader.initializeInstance(ReflectiveObjectLoader.java:134)
at ...dataLoader.ReflectiveObjectLoader.constructObjectArray(ReflectiveObjectLoader.java, Compiled Code)
at ...dataLoader.ReflectiveObjectLoader.initializeArray(ReflectiveObjectLoader.java:39)
at ...dataLoader.ReflectiveObjectLoader.initializeInstance(ReflectiveObjectLoader.java:123)
at ...dataLoader.ReflectiveObjectLoader.constructObject(ReflectiveObjectLoader.java, Compiled Code)
at ...dataLoader.ReflectiveObjectLoader.initializeInstance(ReflectiveObjectLoader.java:134)
at ...dataLoader.ReflectiveObjectLoader.initializeInstance(ReflectiveObjectLoader.java:103)

Here is the generated code exception:

java.lang.NumberFormatException: itemName
at java.lang.Integer.parseInt(Integer.java:409)
at java.lang.Integer.parseInt(Integer.java:458)
at ...util.SimpleFileIterator.nextInt(SimpleFileIterator.java:82)
at ....example.generated.PurchaseOrderLoader.loadint(PurchaseOrderLoader.java:32)
at ....example.generated.PurchaseOrderLoader.loadLineItem(PurchaseOrderLoader.java:22)
at ....example.generated.PurchaseOrderLoader.loadLineItemArray(PurchaseOrderLoader.java, Compiled Code)
at ....example.generated.PurchaseOrderLoader.loadPurchaseOrder(PurchaseOrderLoader.java:27)

For runtime reflection, heavy logging is required to isolate the problem. The heavy use of logging during the load procedure is a code smell (a term used by those in XP circles to identify code that may need a healthy dose of refactoring) that indicates that perhaps a different approach may be necessary. Using reflection, you can make the stack trace more meaningful, but this further complicates an already complicated situation. In the generated code approach, the resulting code just logs how runtime reflection would handle the situation.

The two implementations provided an ample testing ground for performance. (Thanks to Josh MacKenzie for pointing this out as a potential concern.) I was surprised to find that my example load was typically four to seven times slower using runtime reflection.

A typical run produces results like the following:

java com.thoughtworks.rettig.example.TestPerformance
Number of Iterations: 100000
Generated
Total time: 14481
Max Memory Used: 1337672
Reflection
Total time: 89219
Max Memory Used: 1407944

This delay can be attributed to the time reflection requires to discover the class attributes during runtime, whereas the generated code simply consists of explicit calls. Runtime reflection also uses slightly, but not significantly, more memory. Of course, the reflection could probably be better optimized, but such an optimization would be enormously complex, and the result would probably still lag far behind an explicit solution.
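
A rough way to observe this kind of difference yourself is to time an explicit constructor call against a reflective lookup and invocation of the same constructor. The sketch below is not the article's TestPerformance class; Point and TimingSketch are hypothetical, and the numbers it prints depend heavily on the JVM, warm-up, and hardware, so treat them as indicative only.

```java
import java.lang.reflect.Constructor;

class TimingSketch {
    // Explicit calls, as generated code would make them.
    static long direct(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += new Point(i).x;
        }
        return sum;
    }

    // Discovers the constructor on every iteration, as a naive reflective loader might.
    static long reflective(int iterations) {
        long sum = 0;
        try {
            for (int i = 0; i < iterations; i++) {
                Constructor<?> ctor = Point.class.getDeclaredConstructor(int.class);
                sum += ((Point) ctor.newInstance(i)).x;
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        int iterations = 100000;

        long start = System.nanoTime();
        long directSum = direct(iterations);
        long directNanos = System.nanoTime() - start;

        start = System.nanoTime();
        long reflectiveSum = reflective(iterations);
        long reflectiveNanos = System.nanoTime() - start;

        System.out.println("Same result: " + (directSum == reflectiveSum));
        System.out.println("Direct ns:     " + directNanos);
        System.out.println("Reflective ns: " + reflectiveNanos);
    }
}

class Point {
    final int x;

    Point(int x) {
        this.x = x;
    }
}
```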

Conversely, the generated code was a breeze to optimize. With a similar code generator on a previous project, I optimized the load procedure to operate in a smaller memory footprint. I was able to make the changes in the code generator in just a few minutes. The optimization introduced a bug, but the stack trace pointed me directly to the offending method generation procedure and I fixed it within minutes. I would not try this same optimization with runtime reflection because it would require mental backflips.

Running the source code

If you look over the source code, you may be able to better grasp some of the topics covered here. To compile and run the source code, just unzip the contents to an empty folder, then run ant Install at the command line. This will use the Ant build script to generate, compile, and unit test all the source code. (This installation assumes you already have Ant and JUnit 3.7 installed.)

I created a highly contrived example using a simple purchase order that contains several types of objects. The JUnit test cases demonstrate how you create a purchase order object from a file using each method. The test cases then validate the objects’ contents to ensure that the data is properly loaded. You can find the contents of the tests and all supporting classes in the following packages:

com.thoughtworks.rettig.example
com.thoughtworks.rettig.example.reflection
com.thoughtworks.rettig.example.generated

The most notable difference between the two test cases is that runtime reflection requires no supporting code to load the data. This is the magic of reflection. It only requires the class definition and location of the source data to load the data. The generated code example relies on a generated loading class before it can create the test case.

The object creation process is very similar for both implementations. Here is the reflection code:

SimpleFileIterator iter = new SimpleFileIterator(fileLocation);
LoadManager manager = new SimpleFileLoadManager(iter);
PurchaseOrder obj = (PurchaseOrder) ReflectiveObjectLoader.initializeInstance(PurchaseOrder.class, manager);

Here is the generated code:

SimpleFileIterator iter = new SimpleFileIterator(file);
PurchaseOrder po = PurchaseOrderLoader.loadPurchaseOrder(iter);

Rule of thumb

The benefits of reflection are obvious. When coupled with code generation it becomes an invaluable and, more importantly, a safe tool. There is often no other way to escape many seemingly redundant tasks. As for code generation: the more I work with it, the more I like it. With every refactoring and increase in functionality, the code becomes clearer and more understandable. However, runtime reflection has the opposite effect. The more I increase its functionality, the more it increases in complexity.

So, in the future, if you feel you need to conquer a complicated problem using reflection, just remember one rule: don’t do it at runtime.

Michael J. Rettig currently works as a software developer at ThoughtWorks, a consulting firm specializing in enterprise-transforming software. He holds a BS in computer science from Valparaiso University. Michael is a Java 2 Certified Programmer and Java 2 Certified Developer. He would like to thank Martin Fowler, whose guidance made this article possible.

Source: www.infoworld.com