Diagnose common runtime problems with hprof

Track down the culprits behind your application failures

Memory leaks and deadlocks and CPU hogs, oh my! Java application developers often face these runtime problems. They can be particularly daunting in a complex application with multiple threads running through hundreds of thousands of lines of code — an application you can’t ship because it grows in memory, becomes inactive, or gobbles up more CPU cycles than it should.

It’s no secret that Java profiling tools have had a lot of catching up to do compared with their counterparts for other languages. Many powerful tools now exist to help us track down the culprits behind those common problems. But how do you develop confidence in your ability to use these tools effectively? After all, you’re using the tools to diagnose complex behavior you don’t understand. To compound your plight, the data these tools provide is fairly complex, and it isn’t always obvious what you are looking at, or looking for.

When faced with similar problems in my previous incarnation as an experimental physicist, I created control experiments with predictable results. This helped me gain confidence in the measurement system I used in experiments that generated less predictable results. Similarly, this article uses the hprof profiling tool to examine three simple control applications that exhibit the three common problem behaviors listed above. While not as user-friendly as some commercial tools on the market, hprof is included with the Java 2 JDK and, as I’ll demonstrate, can effectively diagnose these behaviors.

Run with hprof

Running your program with hprof is easy. Simply invoke the Java runtime with the following command-line option, as described in the JDK tool documentation for the Java application launcher:

java -Xrunhprof[:help][:<suboption>=<value>,...] MyMainClass

You can list the available suboptions with the :help option shown above. I generated the examples in this article using the Blackdown port of JDK 1.3-RC1 for Linux with the following launch command:

java -classic -Xrunhprof:heap=sites,cpu=samples,depth=10,monitor=y,thread=y,doe=y MemoryLeak

The following list explains the function of each suboption used in the previous command:

  • heap=sites: Tells hprof to generate stack traces indicating where memory was allocated
  • cpu=samples: Tells hprof to use statistical sampling to determine where the CPU spends its time
  • depth=10: Tells hprof to show stack traces 10 levels deep, at most
  • monitor=y: Tells hprof to generate information on contention for the monitors used to synchronize the work of multiple threads
  • thread=y: Tells hprof to identify threads in stack traces
  • doe=y: Tells hprof to produce a dump of profiling data upon exit

If you use JDK 1.3, you need to turn off the default HotSpot compiler with the -classic option. HotSpot has its own profiler, invoked through an -Xprof option, that uses an output format different from the one I’ll describe here.
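For example, to try HotSpot’s own profiler instead (its output isn’t covered in this article), you would launch the program with something like:

java -Xprof MyMainClass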

Running your program with hprof will leave a file called java.hprof.txt in your working directory; this file contains the profiling information collected while your program runs. You can also generate a dump at any time while your program is running by pressing Ctrl-\ in your Java console window on Unix or Ctrl-Break on Windows.
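If you prefer a different output file name, hprof also accepts a file suboption (it appears in the :help list); for example, something like:

java -Xrunhprof:heap=sites,cpu=samples,file=memleak.hprof.txt MemoryLeak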

Anatomy of an hprof output file

An hprof output file includes sections describing various characteristics of the profiled Java program. It starts with a header that describes its format, which the header claims is subject to change without notice.

The output file’s Thread and Trace sections help you figure out what threads were active when your program ran and what they did. The Thread section provides a list of all threads started and terminated during the program’s life. The Trace section includes a list of numbered stack traces for some threads. These stack trace numbers are cross-referenced in other file sections.

The Heap Dump and Sites sections help you analyze memory usage. Depending on the heap suboption you choose when you start the virtual machine (VM), you can get a dump of all live objects in the Java heap (heap=dump) and/or a sorted list of allocation sites that identifies the most heavily allocated objects (heap=sites).

The CPU Samples and CPU Time sections help you understand CPU utilization; the section you get depends on your cpu suboption (cpu=samples or cpu=time). CPU Samples provides a statistical execution profile. CPU Time includes measurements of how many times a given method was called and how long each method took to execute.

The Monitor Time and Monitor Dump sections help you understand how synchronization affects your program’s performance. Monitor Time shows how long your threads spend waiting to acquire resources locked by other threads. Monitor Dump is a snapshot of the monitors currently in use. As you’ll see, Monitor Dump is useful for finding deadlocks.

Diagnose a memory leak

In Java, I define a memory leak as a (usually unintentional) failure to release references to objects you no longer need, so that the garbage collector cannot reclaim the memory they use. The MemoryLeak program in Listing 1 is simple:

Listing 1. MemoryLeak program

01 import java.util.Vector;
02 
03 public class MemoryLeak {
04 
05    public static void main(String[] args) {
06 
07       int    MAX_CONSUMERS        = 10000;
08       int    SLEEP_BETWEEN_ALLOCS = 5;
09 
10       ConsumerContainer objectHolder = new ConsumerContainer();
11 
12       while(objectHolder.size() < MAX_CONSUMERS) {
13           System.out.println("Allocating object " + 
14                              Integer.toString(objectHolder.size())
15                             );
16           objectHolder.add(new MemoryConsumer());
17           try {
18               Thread.currentThread().sleep(SLEEP_BETWEEN_ALLOCS);
19           } catch (InterruptedException ie) {
20               // Do nothing.
21           }
22       } // while.
23    } // main.
24 
25 } // End of MemoryLeak.
26 
27 /** Named container class to hold object references. */
28 class ConsumerContainer extends Vector {}
29 
30 /** Class that consumes a fixed amount of memory. */
31 class MemoryConsumer {
32   public static final int MEMORY_BLOCK        = 1024;
33   public byte[]           memoryHoldingArray;
34 
35   MemoryConsumer() {
36       memoryHoldingArray = new byte[MEMORY_BLOCK];
37   }
38 } // End MemoryConsumer.

When the program runs, it creates a ConsumerContainer object, then repeatedly creates MemoryConsumer objects, each of which allocates a 1 KB byte array, and adds them to that ConsumerContainer. Because the container keeps the objects reachable, the garbage collector can never reclaim them, which simulates a memory leak.
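For contrast, here is a minimal sketch (my own illustration, not part of Listing 1) of the same allocation pattern with the references released afterward; run it under hprof and the byte arrays should largely disappear from the live columns of the Sites section:

import java.util.Vector;

/** Hypothetical counterpart to MemoryLeak: the references are released,
 *  so the garbage collector can reclaim the allocated arrays. */
public class MemoryRelease {
   public static void main(String[] args) {
      Vector objectHolder = new Vector();
      for (int i = 0; i < 10000; i++) {
         objectHolder.add(new byte[1024]); // Stand-in for MemoryConsumer.
      }
      objectHolder.clear();  // Drop the references to the arrays...
      objectHolder = null;   // ...and to the container itself.
      System.gc();           // The arrays are now eligible for collection.
   }
}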

Let’s look at select parts of the profile file. The Sites section’s first few lines clearly show what is happening:

SITES BEGIN (ordered by live bytes) Mon Sep  3 19:16:29 2001
          percent         live     alloc'ed       stack class
 rank   self  accum    bytes objs     bytes  objs trace name
    1 97.31% 97.31% 10280000 10000 10280000 10000  1995 [B
    2  0.39% 97.69%    40964     1    81880    10  1996 [L<Unknown>;
    3  0.38% 98.07%    40000 10000    40000 10000  1994 MemoryConsumer
    4  0.16% 98.23%    16388     1    16388     1  1295 [C
    5  0.16% 98.38%    16388     1    16388     1  1304 [C
    ...

There are 10,000 objects of type byte[] ([B in VM-speak) as well as 10,000 MemoryConsumer objects. The byte arrays account for 10,280,000 bytes, or 1,028 bytes apiece, so each array apparently carries a few bytes of overhead beyond the 1,024 raw bytes it holds. Since the number of objects allocated equals the number of live objects, we can conclude that none of these objects could be garbage collected. This is consistent with our expectations.

Another interesting point: the memory reported for the MemoryConsumer objects (40,000 bytes, just 4 bytes per object) does not include the memory consumed by the byte arrays they hold. This shows that hprof reports class-by-class statistics rather than hierarchical containment relationships, which is important to understand when you use hprof to pinpoint a memory leak.

Now, where did those leaky byte arrays come from? Notice that the MemoryConsumer objects and the byte arrays reference traces 1994 and 1995, respectively, in the following Trace section. Lo and behold, these traces tell us that the MemoryConsumer objects were created in the MemoryLeak class’s main() method and that the byte arrays were created in the MemoryConsumer constructor (the <init>() method in VM-speak). We’ve found our memory leak, line numbers and all:

TRACE 1994: (thread=1)
        MemoryLeak.main(MemoryLeak.java:16)
TRACE 1995: (thread=1)
        MemoryConsumer.<init>(MemoryLeak.java:36)
        MemoryLeak.main(MemoryLeak.java:16)

Diagnose a CPU hog

In Listing 2, a BusyWork class has each thread call a method that regulates how much the thread works by varying its sleep time in between bouts of performing CPU-intensive calculations:

Listing 2. CPUHog program

01 /** Main class for control test. */
02 public class CPUHog {
03    public static void main(String[] args) {
04 
05       Thread slouch, workingStiff, workaholic;
06       slouch       = new Slouch();
07       workingStiff = new WorkingStiff();
08       workaholic   = new Workaholic();
09 
10       slouch.start();
11       workingStiff.start();
12       workaholic.start();
13    }
14 }
15 
16 /** Low CPU utilization thread. */
17 class Slouch extends Thread {
18   public Slouch() {
19      super("Slouch");
20   }
21   public void run() {
22      BusyWork.slouch();
23   }
24 }
25 
26 /** Medium CPU utilization thread. */
27 class WorkingStiff extends Thread {
28   public WorkingStiff() {
29       super("WorkingStiff");
30   }
31   public void run() {
32      BusyWork.workNormally();
33   }
34 }
35 
36 /** High CPU utilization thread. */
37 class Workaholic extends Thread {
38   public Workaholic() {
39       super("Workaholic");
40   }
41   public void run() {
42      BusyWork.workTillYouDrop();
43   }
44 }
45 
46 /** Class with static methods to consume varying amounts
47  *  of CPU time. */
48 class BusyWork {
49 
50   public static int callCount = 0;
51 
52   public static void slouch() {
53     int SLEEP_INTERVAL = 1000;
54     computeAndSleepLoop(SLEEP_INTERVAL); 
55   }
56 
57   public static void workNormally() {
58     int SLEEP_INTERVAL = 100;
59     computeAndSleepLoop(SLEEP_INTERVAL); 
60   }
61 
62   public static void workTillYouDrop() {
63     int SLEEP_INTERVAL = 10;
64     computeAndSleepLoop(SLEEP_INTERVAL); 
65   }
66   
67   private static void computeAndSleepLoop(int sleepInterval) {
68      int MAX_CALLS = 10000;
69      while (callCount < MAX_CALLS) {
70         computeAndSleep(sleepInterval);
71      }
72   } 
73 
74   private static void computeAndSleep(int sleepInterval) {
75         int    COMPUTATIONS = 1000;
76         double result;
77         
78         // Compute.
79         callCount++;
80         for (int i = 0; i < COMPUTATIONS; i++) {
81            result = Math.atan(callCount * Math.random());         
82         }
83 
84         // Sleep.
85         try {
86            Thread.currentThread().sleep(sleepInterval);
87         } catch (InterruptedException ie) {
88            // Do nothing.
89         }
90 
91   } // End computeAndSleep.
92 } // End BusyWork.

There are three threads — Workaholic, WorkingStiff, and Slouch — whose work ethics vary by orders of magnitude: they sleep for 10, 100, and 1,000 milliseconds, respectively, between bursts of computation. Examine the profile’s CPU Samples section shown below. The three highest-ranked traces show that the CPU spent most of its time calculating random numbers and arc tangents, as we would expect:

CPU SAMPLES BEGIN (total = 935) Tue Sep  4 20:44:49 2001
rank   self  accum   count trace method
   1 39.04% 39.04%     365  2040 java/util/Random.next
   2 26.84% 65.88%     251  2042 java/util/Random.nextDouble
   3 10.91% 76.79%     102  2041 java/lang/StrictMath.atan
   4  8.13% 84.92%      76  2046 BusyWork.computeAndSleep
   5  4.28% 89.20%      40  2050 java/lang/Math.atan
   6  3.21% 92.41%      30  2045 java/lang/Math.random
   7  2.25% 94.65%      21  2051 java/lang/Math.random
   8  1.82% 96.47%      17  2044 java/util/Random.next
   9  1.50% 97.97%      14  2043 java/util/Random.nextDouble
  10  0.43% 98.40%       4  2047 BusyWork.computeAndSleep
  11  0.21% 98.61%       2  2048 java/lang/StrictMath.atan
  12  0.11% 98.72%       1  1578 java/io/BufferedReader.readLine
  13  0.11% 98.82%       1  2054 java/lang/Thread.sleep
  14  0.11% 98.93%       1  1956 java/security/PermissionCollection.setReadOnly
  15  0.11% 99.04%       1  2055 java/lang/Thread.sleep
  16  0.11% 99.14%       1  1593 java/lang/String.valueOf
  17  0.11% 99.25%       1  2052 java/lang/Math.random
  18  0.11% 99.36%       1  2049 java/util/Random.nextDouble
  19  0.11% 99.47%       1  2031 BusyWork.computeAndSleep
  20  0.11% 99.57%       1  1530 sun/io/CharToByteISO8859_1.convert
  ...

Note that calls to the BusyWork.computeAndSleep() method take up 8.13 percent, 0.43 percent, and 0.11 percent for the Workaholic, WorkingStiff, and Slouch threads, respectively. We can tell which threads these are by examining the traces referenced in the CPU Samples section’s trace column above (ranks 4, 10, and 19) in the following Trace section:

TRACE 2040: (thread=8)
        java/util/Random.next(Random.java:Unknown line)
        java/util/Random.nextDouble(Random.java:373)
        java/lang/Math.random(Math.java:557)
        BusyWork.computeAndSleep(CPUHog.java:81)
        BusyWork.computeAndSleepLoop(CPUHog.java:70)
        BusyWork.workTillYouDrop(CPUHog.java:64)
        Workaholic.run(CPUHog.java:42)
TRACE 2046: (thread=8)
        BusyWork.computeAndSleep(CPUHog.java:81)
        BusyWork.computeAndSleepLoop(CPUHog.java:70)
        BusyWork.workTillYouDrop(CPUHog.java:64)
        Workaholic.run(CPUHog.java:42)
TRACE 2047: (thread=7)
        BusyWork.computeAndSleep(CPUHog.java:81)
        BusyWork.computeAndSleepLoop(CPUHog.java:70)
        BusyWork.workNormally(CPUHog.java:59)
        WorkingStiff.run(CPUHog.java:32)
TRACE 2031: (thread=6)
        BusyWork.computeAndSleep(CPUHog.java:81)
        BusyWork.computeAndSleepLoop(CPUHog.java:70)
        BusyWork.slouch(CPUHog.java:54)
        Slouch.run(CPUHog.java:22)

Perhaps just as interesting as what the profiler shows us is what it doesn’t clearly show. Note that thread 8 (the Workaholic) is responsible for the top CPU-hogging call to the Random.next() method. The workTillYouDrop() method ultimately calls Random.next(), yet workTillYouDrop() itself never shows up as a top CPU hog on a line of its own, as we might expect it to.

The profiler flattens the hierarchy of method calls the same way it flattened the hierarchy of object containment in the MemoryLeak example: it does not roll the CPU time spent in called methods up into the totals of the methods lower on the stack trace. The BusyWork.computeAndSleep() method shows up because each thread calls it many times; it appears in many of the sampled stack traces, so it frequently gets caught in the act. As was the case in the Sites section, we can only discern what is really happening by looking at the stack traces.
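To make the sampling idea concrete, here is a toy sketch of a statistical sampler (my own illustration, not hprof’s implementation; it relies on Thread.getAllStackTraces(), which appeared in JDKs later than the one used in this article). It charges each sample only to the method on top of a stack, which is exactly why callers such as workTillYouDrop() rarely rank highly on their own:

import java.util.HashMap;
import java.util.Map;

/** Toy statistical CPU sampler: periodically capture every thread's stack
 *  and charge one hit to the topmost method. */
public class ToySampler implements Runnable {
   private final Map<String, Integer> hits = new HashMap<String, Integer>();

   public void run() {
      try {
         for (int sample = 0; sample < 500; sample++) {
            for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
               if (stack.length == 0) continue;
               String top = stack[0].getClassName() + "." + stack[0].getMethodName();
               Integer old = hits.get(top);
               hits.put(top, old == null ? 1 : old + 1);
            }
            Thread.sleep(10); // Sampling interval.
         }
      } catch (InterruptedException ie) {
         // Stop sampling.
      }
      System.out.println(hits); // A crude, unsorted "CPU SAMPLES" report.
   }
}

Started as a daemon thread alongside the CPUHog threads, a sampler like this should show the hits concentrating in the Random and StrictMath methods, much as hprof’s report does.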

Note: hprof’s cpu=time mode should show CPU usage hierarchically, but this feature appears to have a bug that leads to a segmentation fault in the 1.3 VMs I worked with.

Diagnose a deadlock

The third control program exhibits a classic deadlock, as shown in Listing 3:

Listing 3. Deadlock program

01 /** Main class for the control test. */
02 public class ClassicDeadlock {
03 
04    public static void main(String[] args) {
05       SynchronizedObject left;
06       SynchronizedObject right;
07 
08       LeftToRightGrabber leftToRight;
09       RightToLeftGrabber rightToLeft;
10 
11       left  = new SynchronizedObject("left");
12       right = new SynchronizedObject("right");
13 
14       leftToRight = new LeftToRightGrabber(left,right);
15       rightToLeft = new RightToLeftGrabber(left,right);
16 
17       leftToRight.start();
18       rightToLeft.start();
19    }
20 
21 } // End ClassicDeadlock.
22 
23 /** An object containing a synchronized method. */
24 class SynchronizedObject {
25   public String name;
26   public SynchronizedObject(String aName) {
27      name = aName;
28   }
29   public synchronized void synchronizedMethod() {
30       String message = name                        + 
31                        " lock acquired by thread " +
32                        Thread.currentThread().getName(); 
33       System.out.println(message);
34       System.out.flush();
35   }
36 } // End of SynchronizedObject.
37 
38 /** Base class for the Left and Right Grabber. */
39 abstract class GrabberBase extends Thread {
40    protected SynchronizedObject leftObject;
41    protected SynchronizedObject rightObject;
42 
43    public GrabberBase(SynchronizedObject left,
44                       SynchronizedObject right,
45                       String             name
46                      ) {
47      super(name);
48      leftObject  = left;
49      rightObject = right; 
50    } 
51 } // End of GrabberBase.
52 
53 /** Class to grab locks right and then left. */
54 class RightToLeftGrabber extends GrabberBase {
55       public RightToLeftGrabber( SynchronizedObject left,
56                                  SynchronizedObject right) {
57           super(left, right, "RightToLeft");
58       }
59 
60       public void run() {
61          while(true) {
62            grabRightToLeft();
63          }
64       }
65 
66       private void grabRightToLeft() {
67           synchronized (rightObject) {
68              leftObject.synchronizedMethod();
69           }
70       }
71 } // End of RightToLeftGrabber.
72 
73 /** Class to grab locks left and then right. */
74 class LeftToRightGrabber extends GrabberBase {
75       public LeftToRightGrabber( SynchronizedObject left,
76                                  SynchronizedObject right) {
77           super(left, right, "LeftToRight");
78       }
79 
80       public void run() {
81          while(true) {
82             grabLeftToRight();
83          }
84       }
85  
86       private void grabLeftToRight() {
87           synchronized (leftObject) {
88              rightObject.synchronizedMethod();
89           }
90       }
91 } // End of LeftToRightGrabber.

This program creates two equivalent objects, left and right, each with its own synchronization monitor. Two threads then grab locks on these objects in opposite orders: each enters a block synchronized on one object and then calls a synchronized method on the other. If the LeftToRightGrabber thread enters its block on left, and the RightToLeftGrabber thread is scheduled and enters its block on right before LeftToRightGrabber finishes its work, each thread ends up waiting forever for the lock the other holds, and a deadlock occurs. How long the program must run before it deadlocks depends on the JVM’s thread-scheduling algorithm; for me, it typically happened on the third switch between thread contexts.

Once the deadlock occurs, pressing Ctrl-\ (or Ctrl-Break on Windows) generates a profile file containing a Monitor Dump section, like so:

MONITOR DUMP BEGIN
    THREAD 8, trace 2018, status: CW
    THREAD 7, trace 2027, status: MW
    THREAD 6, trace 2016, status: MW
    THREAD 5, trace 2036, status: MW
    THREAD 4, trace 2037, status: CW
    THREAD 3, trace 2038, status: CW
    THREAD 2, trace 2039, status: R
    MONITOR java/lang/ref/Reference$Lock(80ddd80) unowned
        waiting to be notified: thread 3
    MONITOR SynchronizedObject(81ece98)
        owner: thread 7, entry count: 1
        waiting to enter: thread 6
    MONITOR SynchronizedObject(81ecd40)
        owner: thread 6, entry count: 1
        waiting to enter: thread 7
    MONITOR java/lang/ref/ReferenceQueue$Lock(80e7010) unowned
        waiting to be notified: thread 4
    ...

We can see that threads 6 and 7 (LeftToRight and RightToLeft from the Thread section) both have status MW, meaning they are waiting to enter a monitor. Thread 7 holds one of the SynchronizedObject instance locks while thread 6 waits to enter it, and thread 6 holds the other lock while thread 7 waits for it. The traces below confirm that each thread dutifully waits to obtain the lock held by the other thread — exactly where we expect them to wait:

TRACE 2027: (thread=7)
        RightToLeftGrabber.grabRightToLeft(ClassicDeadlock.java:68)
        RightToLeftGrabber.run(ClassicDeadlock.java:62)
TRACE 2016: (thread=6)
        LeftToRightGrabber.grabLeftToRight(ClassicDeadlock.java:88)
        LeftToRightGrabber.run(ClassicDeadlock.java:82)

hprof has shown us what stopped this program in its tracks.
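As an aside, the usual cure for this kind of deadlock is to make every thread acquire the locks in the same global order. Here is a minimal sketch of the idea (my own illustration, not part of Listing 3):

/** Every thread takes the two locks in the same order, so the circular
 *  wait that hprof exposed above can never form. */
class OrderedLocking {
   private static final Object FIRST_LOCK  = new Object();
   private static final Object SECOND_LOCK = new Object();

   static void useBothResources() {
      synchronized (FIRST_LOCK) {        // Always taken first...
         synchronized (SECOND_LOCK) {    // ...then second, in every thread.
            // Work that needs both resources goes here.
         }
      }
   }
}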

Tackle uncharted territory

By writing simple control experiment programs and running them under your chosen profiling tool, you can gain experience with the tool and confidence to use it when you venture into uncharted territory. The programs examined in this article model three common problems. You can use them as-is with other profilers and let them serve as a starting point for your own experimentation.

hprof may not be the easiest tool with which to work, but it’s freely available to anyone using the Java 2 Platform and can effectively track down runtime problems like these.

Bill Pierce is a senior software engineer at Seattle-based Vixel Corp., where he has helped ship eight releases of an all-Java network management application called SAN InSite. Previously, he spent 10 years pulling results from measurement systems as an experimental physicist.

Source: www.infoworld.com