Diagnose common runtime problems with hprof
Track down the culprits behind your application failures
Memory leaks and deadlocks and CPU hogs, oh my! Java application developers often face these runtime problems. They can be particularly daunting in a complex application with multiple threads running through hundreds of thousands of lines of code — an application you can’t ship because it grows in memory, becomes inactive, or gobbles up more CPU cycles than it should.
It’s no secret that Java profiling tools have had a long way to catch up to their alternate-language counterparts. Many powerful tools now exist to help us track down the culprits behind those common problems. But how do you develop confidence in your ability to use these tools effectively? After all, you’re using the tools to diagnose complex behavior you don’t understand. To compound your plight, the data provided by the tools is reasonably complex and the information you are looking at or for is not always clear.
When faced with similar problems in my previous incarnation as an experimental physicist, I created control experiments with predictable results. This helped me gain confidence in the measurement system I used in experiments that generated less predictable results. Similarly, this article uses the hprof profiling tool to examine three simple control applications that exhibit the three common problem behaviors listed above. While not as user-friendly as some commercial tools on the market, hprof is included with the Java 2 JDK and, as I’ll demonstrate, can effectively diagnose these behaviors.
Run with hprof
Running your program with hprof is easy. Simply invoke the Java runtime with the following command-line option, as described in the JDK tool documentation for the Java application launcher:
java -Xrunhprof[:help][:<suboption>=<value>,...] MyMainClass
A list of suboptions is available with the [:help]
option shown. I generated the examples in this article using the Blackdown port of the JDK 1.3-RC1 for Linux with the following launch command:
java -classic -Xrunhprof:heap=sites,cpu=samples,depth=10,monitor=y,thread=y,doe=y MemoryLeak
The following list explains the function of each suboption used in the previous command:
heap=sites
: Tells hprof to generate stack traces indicating where memory was allocatedcpu=samples
: Tells hprof to use statistical sampling to determine where the CPU spends its timedepth=10
: Tells hprof to show stack traces 10 levels deep, at mostmonitor=y
: Tells hprof to generate information on contention monitors used to synchronize the work of multiple threadsthread=y
: Tells hprof to identify threads in stack tracesdoe=y
: Tells hprof to produce a dump of profiling data upon exit
If you use JDK 1.3, you need to turn off the default HotSpot compiler with the -classic
option. HotSpot has its own profiler, invoked through an -Xprof
option, that uses an output format different from the one I’ll describe here.
Running your program with hprof will leave a file called java.hprof.txt
in your working directory; this file contains the profiling information collected while your program runs. You can also generate a dump at any time while your program is running by pressing Ctrl- in your Java console window on Unix or Ctrl-Break on Windows.
Anatomy of an hprof output file
An hprof output file includes sections describing various characteristics of the profiled Java program. It starts with a header that describes its format, which the header claims is subject to change without notice.
The output file’s Thread and Trace sections help you figure out what threads were active when your program ran and what they did. The Thread section provides a list of all threads started and terminated during the program’s life. The Trace section includes a list of numbered stack traces for some threads. These stack trace numbers are cross-referenced in other file sections.
The Heap Dump and Sites sections help you analyze memory usage. Depending on the heap
suboption you choose when you start the virtual machine (VM), you can get a dump of all live objects in the Java heap (heap=dump
) and/or a sorted list of allocation sites that identifies the most heavily allocated objects (heap=sites
).
The CPU Samples and CPU Time sections help you understand CPU utilization; the section you get depends on your cpu
suboption (cpu=samples
or cpu=time
). CPU Samples provides a statistical execution profile. CPU Time includes measurements of how many times a given method was called and how long each method took to execute.
The Monitor Time and Monitor Dump sections help you understand how synchronization affects your program’s performance. Monitor Time shows how much time your threads experience contention for locked resources. Monitor Dump is a snapshot of monitors currently in use. As you’ll see, Monitor Dump is useful for finding deadlocks.
Diagnose a memory leak
In Java, I define a memory leak as a (usually) unintentional failure to dereference discarded objects so that the garbage collector cannot reclaim the memory they use. The MemoryLeak
program in Listing 1 is simple:
Listing 1. MemoryLeak program
01 import java.util.Vector;
02
03 public class MemoryLeak {
04
05 public static void main(String[] args) {
06
07 int MAX_CONSUMERS = 10000;
08 int SLEEP_BETWEEN_ALLOCS = 5;
09
10 ConsumerContainer objectHolder = new ConsumerContainer();
11
12 while(objectHolder.size() < MAX_CONSUMERS) {
13 System.out.println("Allocating object " +
14 Integer.toString(objectHolder.size())
15 );
16 objectHolder.add(new MemoryConsumer());
17 try {
18 Thread.currentThread().sleep(SLEEP_BETWEEN_ALLOCS);
19 } catch (InterruptedException ie) {
20 // Do nothing.
21 }
22 } // while.
23 } // main.
24
25 } // End of MemoryLeak.
26
27 /** Named container class to hold object references. */
28 class ConsumerContainer extends Vector {}
29
30 /** Class that consumes a fixed amount of memory. */
31 class MemoryConsumer {
32 public static final int MEMORY_BLOCK = 1024;
33 public byte[] memoryHoldingArray;
34
35 MemoryConsumer() {
36 memoryHoldingArray = new byte[MEMORY_BLOCK];
37 }
38 } // End MemoryConsumer.
When the program runs, it creates a ConsumerContainer
object, then starts creating and adding MemoryConsumer
objects at least 1 KB in size to that ConsumerContainer
object. Keeping the objects accessible makes them unavailable for garbage collection, simulating a memory leak.
Let’s look at select parts of the profile file. The Sites section’s first few lines clearly show what is happening:
SITES BEGIN (ordered by live bytes) Mon Sep 3 19:16:29 2001
percent live alloc'ed stack class
rank self accum bytes objs bytes objs trace name
1 97.31% 97.31% 10280000 10000 10280000 10000 1995 [B
2 0.39% 97.69% 40964 1 81880 10 1996 [L<Unknown>;
3 0.38% 98.07% 40000 10000 40000 10000 1994 MemoryConsumer
4 0.16% 98.23% 16388 1 16388 1 1295 [C
5 0.16% 98.38% 16388 1 16388 1 1304 [C
...
There are 10,000 objects of type byte[]
([B
in VM-speak) as well as 10,000 MemoryConsumer
objects. The byte arrays take up 10,280,000 bytes, so apparently there is overhead just above the raw bytes that each array consumes. Since the number of objects allocated equals the number of live objects, we can conclude that none of these objects could be garbage collected. This is consistent with our expectations.
Another interesting point: the memory reported to be consumed by the MemoryConsumer
objects does not include the memory consumed by the byte arrays. This shows that our profiling tool does not expose hierarchical containment relationships, but rather class-by-class statistics. This is important to understand when using hprof to pinpoint a memory leak.
Now, where did those leaky byte arrays come from? Notice that the MemoryConsumer
objects and the byte arrays reference traces 1994
and 1995
in the following Trace section. Lo and behold, these traces tell us that the MemoryConsumer
objects were created in the MemoryLeak
class’s main()
method and that the byte arrays were created in the constructor (<init>()
method in VM-speak). We’ve found our memory leak, line numbers and all:
TRACE 1994: (thread=1)
MemoryLeak.main(MemoryLeak.java:16)
TRACE 1995: (thread=1)
MemoryConsumer.<init>(MemoryLeak.java:36)
MemoryLeak.main(MemoryLeak.java:16)
Diagnose a CPU hog
In Listing 2, a BusyWork
class has each thread call a method that regulates how much the thread works by varying its sleep time in between bouts of performing CPU-intensive calculations:
Listing 2. CPUHog program
01 /** Main class for control test. */
02 public class CPUHog {
03 public static void main(String[] args) {
04
05 Thread slouch, workingStiff, workaholic;
06 slouch = new Slouch();
07 workingStiff = new WorkingStiff();
08 workaholic = new Workaholic();
09
10 slouch.start();
11 workingStiff.start();
12 workaholic.start();
13 }
14 }
15
16 /** Low CPU utilization thread. */
17 class Slouch extends Thread {
18 public Slouch() {
19 super("Slouch");
20 }
21 public void run() {
22 BusyWork.slouch();
23 }
24 }
25
26 /** Medium CPU utilization thread. */
27 class WorkingStiff extends Thread {
28 public WorkingStiff() {
29 super("WorkingStiff");
30 }
31 public void run() {
32 BusyWork.workNormally();
33 }
34 }
35
36 /** High CPU utilization thread. */
37 class Workaholic extends Thread {
38 public Workaholic() {
39 super("Workaholic");
40 }
41 public void run() {
42 BusyWork.workTillYouDrop();
43 }
44 }
45
46 /** Class with static methods to consume varying amounts
47 * of CPU time. */
48 class BusyWork {
49
50 public static int callCount = 0;
51
52 public static void slouch() {
53 int SLEEP_INTERVAL = 1000;
54 computeAndSleepLoop(SLEEP_INTERVAL);
55 }
56
57 public static void workNormally() {
58 int SLEEP_INTERVAL = 100;
59 computeAndSleepLoop(SLEEP_INTERVAL);
60 }
61
62 public static void workTillYouDrop() {
63 int SLEEP_INTERVAL = 10;
64 computeAndSleepLoop(SLEEP_INTERVAL);
65 }
66
67 private static void computeAndSleepLoop(int sleepInterval) {
68 int MAX_CALLS = 10000;
69 while (callCount < MAX_CALLS) {
70 computeAndSleep(sleepInterval);
71 }
72 }
73
74 private static void computeAndSleep(int sleepInterval) {
75 int COMPUTATIONS = 1000;
76 double result;
77
78 // Compute.
79 callCount++;
80 for (int i = 0; i < COMPUTATIONS; i++) {
81 result = Math.atan(callCount * Math.random());
82 }
83
84 // Sleep.
85 try {
86 Thread.currentThread().sleep(sleepInterval);
87 } catch (InterruptedException ie) {
88 // Do nothing.
89 }
90
91 } // End computeAndSleep.
92 } // End BusyWork.
There are three threads — Workaholic
, WorkingStiff
, and Slouch
— whose work ethics vary by orders of magnitude, judging by the work they choose to do. Examine the profile’s CPU Samples section shown below. The three highest ranked traces show that the CPU spent most of its time calculating random numbers and arc tangents, as we would expect:
CPU SAMPLES BEGIN (total = 935) Tue Sep 4 20:44:49 2001
rank self accum count trace method
1 39.04% 39.04% 365 2040 java/util/Random.next
2 26.84% 65.88% 251 2042 java/util/Random.nextDouble
3 10.91% 76.79% 102 2041 java/lang/StrictMath.atan
4 8.13% 84.92% 76 2046 BusyWork.computeAndSleep
5 4.28% 89.20% 40 2050 java/lang/Math.atan
6 3.21% 92.41% 30 2045 java/lang/Math.random
7 2.25% 94.65% 21 2051 java/lang/Math.random
8 1.82% 96.47% 17 2044 java/util/Random.next
9 1.50% 97.97% 14 2043 java/util/Random.nextDouble
10 0.43% 98.40% 4 2047 BusyWork.computeAndSleep
11 0.21% 98.61% 2 2048 java/lang/StrictMath.atan
12 0.11% 98.72% 1 1578 java/io/BufferedReader.readLine
13 0.11% 98.82% 1 2054 java/lang/Thread.sleep
14 0.11% 98.93% 1 1956 java/security/PermissionCollection.setReadOnly
15 0.11% 99.04% 1 2055 java/lang/Thread.sleep
16 0.11% 99.14% 1 1593 java/lang/String.valueOf
17 0.11% 99.25% 1 2052 java/lang/Math.random
18 0.11% 99.36% 1 2049 java/util/Random.nextDouble
19 0.11% 99.47% 1 2031 BusyWork.computeAndSleep
20 0.11% 99.57% 1 1530 sun/io/CharToByteISO8859_1.convert
...
Note that calls to the BusyWork.computeAndSleep()
method take up 8.13 percent, 0.43 percent, and 0.11 percent for the Workaholic
, WorkingStiff
, and Slouch
threads, respectively. We can tell which threads these are by examining the traces referenced in the CPU Samples section’s trace column above (ranks 4, 10, and 19) in the following Trace section:
TRACE 2040: (thread=8)
java/util/Random.next(Random.java:Unknown line)
java/util/Random.nextDouble(Random.java:373)
java/lang/Math.random(Math.java:557)
BusyWork.computeAndSleep(CPUHog.java:81)
BusyWork.computeAndSleepLoop(CPUHog.java:70)
BusyWork.workTillYouDrop(CPUHog.java:64)
Workaholic.run(CPUHog.java:42)
TRACE 2046: (thread=8)
BusyWork.computeAndSleep(CPUHog.java:81)
BusyWork.computeAndSleepLoop(CPUHog.java:70)
BusyWork.workTillYouDrop(CPUHog.java:64)
Workaholic.run(CPUHog.java:42)
TRACE 2047: (thread=7)
BusyWork.computeAndSleep(CPUHog.java:81)
BusyWork.computeAndSleepLoop(CPUHog.java:70)
BusyWork.workNormally(CPUHog.java:59)
WorkingStiff.run(CPUHog.java:32)
TRACE 2031: (thread=6)
BusyWork.computeAndSleep(CPUHog.java:81)
BusyWork.computeAndSleepLoop(CPUHog.java:70)
BusyWork.slouch(CPUHog.java:54)
Slouch.run(CPUHog.java:22)
Perhaps just as interesting as what the profiler shows us is what it doesn’t clearly show. Note that thread 8 (the Workaholic
) is responsible for the top CPU hog call to the Random.next()
method. The workTillYouDrop()
method ultimately calls the Random.next()
method, but that does not mean that it shows up as a top CPU hog on a line by itself, as we might expect.
The profiler flattens the hierarchy of method calls the way it flattened the hierarchy of object containment in the MemoryLeak
example. It does not count CPU utilization by internal methods toward the total of a method lower on the stack trace. The calls to the BusyWork.computeAndSleep()
method show up because this method is called many times by each thread. BusyWork.computeAndSleep()
is often part of the sampled stack traces, so it frequently gets caught in the act. As was the case in the Sites section, here we can only discern what is really happening when we look at the stack traces.
Note: hprof’s cpu=time
mode should show CPU usage hierarchically, but this feature appears to have a bug that leads to a segmentation fault in the 1.3 VMs I worked with.
Diagnose a deadlock
The third control program exhibits a classic deadlock, as shown in Listing 3:
Listing 3. Deadlock program
01 /** Main class for the control test. */
02 public class ClassicDeadlock {
03
04 public static void main(String[] args) {
05 SynchronizedObject left;
06 SynchronizedObject right;
07
08 LeftToRightGrabber leftToRight;
09 RightToLeftGrabber rightToLeft;
10
11 left = new SynchronizedObject("left");
12 right = new SynchronizedObject("right");
13
14 leftToRight = new LeftToRightGrabber(left,right);
15 rightToLeft = new RightToLeftGrabber(left,right);
16
17 leftToRight.start();
18 rightToLeft.start();
19 }
20
21 } // End ClassicDeadlock.
22
23 /** An object containing a synchronized method. */
24 class SynchronizedObject {
25 public String name;
26 public SynchronizedObject(String aName) {
27 name = aName;
28 }
29 public synchronized void synchronizedMethod() {
30 String message = name +
31 " lock acquired by thread " +
32 Thread.currentThread().getName();
33 System.out.println(message);
34 System.out.flush();
35 }
36 } // End of SynchronizedObject.
37
38 /** Base class for the Left and Right Grabber. */
39 abstract class GrabberBase extends Thread {
40 protected SynchronizedObject leftObject;
41 protected SynchronizedObject rightObject;
42
43 public GrabberBase(SynchronizedObject left,
44 SynchronizedObject right,
45 String name
46 ) {
47 super(name);
48 leftObject = left;
49 rightObject = right;
50 }
51 } // End of GrabberBase.
52
53 /** Class to grab locks right and then left. */
54 class RightToLeftGrabber extends GrabberBase {
55 public RightToLeftGrabber( SynchronizedObject left,
56 SynchronizedObject right) {
57 super(left, right, "RightToLeft");
58 }
59
60 public void run() {
61 while(true) {
62 grabRightToLeft();
63 }
64 }
65
66 private void grabRightToLeft() {
67 synchronized (rightObject) {
68 leftObject.synchronizedMethod();
69 }
70 }
71 } // End of RightToLeftGrabber.
72
73 /** Class to grab locks left and then right. */
74 class LeftToRightGrabber extends GrabberBase {
75 public LeftToRightGrabber( SynchronizedObject left,
76 SynchronizedObject right) {
77 super(left, right, "LeftToRight");
78 }
79
80 public void run() {
81 while(true) {
82 grabLeftToRight();
83 }
84 }
85
86 private void grabLeftToRight() {
87 synchronized (leftObject) {
88 rightObject.synchronizedMethod();
89 }
90 }
91 } // End of LeftToRightGrabber.
This program creates two equivalent objects, left
and right
, each having its own synchronization monitor. Two threads then try to grab locks on these objects in a particular order (by entering a block synchronized on one object and then calling a synchronized method on the other). If the LeftToRightGrabber
thread can enter its synchronized block and the RightToLeftGrabber
thread is scheduled to run before LeftToRightGrabber
completes its work, a deadlock will occur. How long the program must run to hit a deadlock depends on the JVM’s thread scheduling algorithm. For me, this typically occurred during the third switch between thread contexts.
Once the deadlock occurs, hitting Ctrl- generates a profile file containing a Monitor Dump section, like so:
MONITOR DUMP BEGIN
THREAD 8, trace 2018, status: CW
THREAD 7, trace 2027, status: MW
THREAD 6, trace 2016, status: MW
THREAD 5, trace 2036, status: MW
THREAD 4, trace 2037, status: CW
THREAD 3, trace 2038, status: CW
THREAD 2, trace 2039, status: R
MONITOR java/lang/ref/Reference$Lock(80ddd80) unowned
waiting to be notified: thread 3
MONITOR SynchronizedObject(81ece98)
owner: thread 7, entry count: 1
waiting to enter: thread 6
MONITOR SynchronizedObject(81ecd40)
owner: thread 6, entry count: 1
waiting to enter: thread 7
MONITOR java/lang/ref/ReferenceQueue$Lock(80e7010) unowned
waiting to be notified: thread 4
...
We can see that threads 6 and 7 (LeftToRight
and RightToLeft
from the Thread section) are both in waiting states. Thread 7 holds one of the SynchronizedObject
instance locks while thread 6 waits for that lock, and thread 6 holds the other lock while thread 7 waits for it. The traces below confirm that each thread dutifully waits to obtain the lock held by the other thread — exactly where we expect them to wait:
TRACE 2027: (thread=7)
RightToLeftGrabber.grabRightToLeft(ClassicDeadlock.java:68)
RightToLeftGrabber.run(ClassicDeadlock.java:62)
TRACE 2016: (thread=6)
LeftToRightGrabber.grabLeftToRight(ClassicDeadlock.java:88)
LeftToRightGrabber.run(ClassicDeadlock.java:82)
hprof has shown us what stopped this program in its tracks.
Tackle uncharted territory
By writing simple control experiment programs and running them under your chosen profiling tool, you can gain experience with the tool and confidence to use it when you venture into uncharted territory. The programs examined in this article model three common problems. You can use them as-is with other profilers and let them serve as a starting point for your own experimentation.
hprof may not be the easiest tool with which to work, but it’s freely available to anyone using the Java 2 Platform and can effectively track down runtime problems like these.