Can double-checked locking be fixed?

No matter how you rig it, double-checked locking still fails

In “Double-Checked Locking: Clever but Broken,” I looked at the double-checked locking (DCL) idiom, which is recommended by a number of Java books and articles as a way to reduce synchronization overhead. Unfortunately, DCL isn’t guaranteed to work under the current Java Memory Model (JMM).

Judging from the many interesting letters and suggestions I received, readers certainly connected with that article. Many readers were surprised, incredulous, and sometimes even angered to learn of some of the stranger aspects of the Java Memory Model (JMM). While the JMM is complicated and full of surprises, the good news is that if you follow the rules — namely, synchronize whenever you read data that might have been written by a different thread or write data that will be read by a different thread — you have nothing to fear from the JMM. But if you want to understand more about what’s going on behind the scenes with concurrent programming, read on. Lots of readers tried to patch the holes in DCL; in this article I’ll show why those holes are harder to patch than they might first appear.

Quick review: The double-checked locking idiom

DCL is a lazy initialization technique that attempts to eliminate the synchronization overhead on the most common code path. Here is an example of the DCL idiom:

Listing 1: The double-checked locking idiom (DCL)

class SomeClass {
  private Resource resource = null;
  public Resource getResource() {
    // First check, made without holding the lock
    if (resource == null) {
      synchronized (this) {
        // Second check, made while holding the lock
        if (resource == null) 
          resource = new Resource();
      }
    }
    return resource;
  }
}

Is something broken?

When first confronted with the possibility of unpredictable behavior associated with DCL, many programmers worry that Java might be broken somehow. One reader wrote:

Everything that I have heard about DCL gives me the creepy feeling that all sorts of strange and unpredictable things can happen when compiler optimizations interact with multithreading. Isn’t Java supposed to protect us from accessing uninitialized objects and other unpredictable phenomena?

The good news is no, Java is not broken, but multithreading and memory coherency are more complicated subjects than they might appear. Fortunately, you don’t have to become an expert on the JMM. You can ignore all of this complexity if you just use the tool that Java provides for exactly this purpose: synchronization. If you synchronize every access to a variable that might have been written by, or could be read by, another thread, you will have no memory coherency problems.
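As a minimal sketch of that rule, the class below guards every read and write of the lazily created field with the same monitor. Only the names SomeClass, Resource, and getResource come from Listing 1; the wrapper class SynchronizedLazyInit, the body of the Resource stub, and the demo main method are assumptions added for illustration.

```java
// Illustrative sketch only; the wrapper class and the Resource stub are hypothetical.
class SynchronizedLazyInit {

  // Stand-in for the article's Resource type.
  static class Resource {
    int value = 42;
  }

  static class SomeClass {
    private Resource resource = null;

    // The whole method runs while holding this object's monitor, so every
    // thread that reads 'resource' synchronizes with the thread that wrote it.
    public synchronized Resource getResource() {
      if (resource == null) {
        resource = new Resource();
      }
      return resource;
    }
  }

  public static void main(String[] args) {
    SomeClass holder = new SomeClass();
    Resource first = holder.getResource();
    Resource second = holder.getResource();
    System.out.println(first == second); // prints true
    System.out.println(first.value);     // prints 42
  }
}
```

Every caller pays for the lock, which is exactly the overhead DCL tries to avoid, but no thread can observe a partially constructed Resource.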

The Java architects strove to allow Java to perform well on cutting-edge hardware — at the cost of a somewhat heavyweight and hard-to-understand synchronization model. This complicated model has led people to try and outsmart the system by concocting clever schemes to avoid synchronization, such as DCL. But the problems with DCL result from the failure to use synchronization, not with the Java Memory Model itself.

Don’t try to fool the compiler

By far the most common suggested DCL fixes are those that try to fool the compiler into performing certain operations in a specific order. Attempting to trick the compiler proves dangerous for many reasons, the most obvious being that you might succeed in only fooling yourself into thinking that you’ve fooled the compiler. Believing that you’ve tricked the compiler can give you a false sense of confidence, when maybe you’ve only fooled this version of this compiler in this case.

The Java Language Specification (JLS) gives Java compilers and JVMs a good deal of latitude to reorder or optimize away operations. Java compilers are only required to maintain within-thread as-if-serial semantics, which means that the executing thread must not be able to detect any of these optimizations or reorderings. However, the JLS makes it clear that in the absence of synchronization, other threads might perceive memory updates in an order that “may be surprising.”

One commonly suggested fix for DCL is to use a temporary variable to try to force the constructor to execute before its reference is assigned:

Listing 2: Using a temporary variable

  public Resource getResource() {
    if (resource == null) {
      synchronized (this) {
        if (resource == null) {
          Resource temp = new Resource();
          resource = temp;
        }
      }
    }
    return resource;
  }

In Listing 2, you might think that as-if-serial semantics requires that the construction complete before resource is set, but that view is only from the perspective of the executing thread. Actually, the compiler is free to completely optimize away the temporary variable. Though numerous tricks have been suggested to prevent the compiler from optimizing away the temporary variable, such as making it public, the compiler can still vary the order in which assignments are made inside the synchronized block as long as the executing thread can’t tell the difference.

Another common suggestion is to use a “guard” variable to indicate that the initialization has completed. An example of this technique:

Listing 3: Using a guard variable

  private volatile boolean initialized = false;
  public Resource getResource() {
    if (resource == null || !initialized) {
      synchronized (this) {
        if (resource == null)
          resource = new Resource();
      }
      initialized = (resource != null);
    }
    return resource;
  }

At first, the approach taken in Listing 3 looks promising. Because initialized is set after the synchronized block exits, it appears that resource will have been fully written to memory before initialized is set. Listing 3 even attempts to ensure that initialized is not set until resource is set by making initialized’s value depend on resource’s value. However, synchronization doesn’t work quite so literally. The compiler or JVM can move statements into synchronized blocks to reduce the cache-flush penalties associated with synchronization. This means that from the perspective of other threads, initialized could still appear to be set before resource and all of resource’s fields are flushed to main memory.

Even if you can fool the compiler, you can’t fool the cache

It’s hard to fool the compiler, even if you can think like one. But if you do succeed in tricking the compiler, you still aren’t guaranteed correct programs with respect to the JMM. Compiler optimizations are only one source of potential reorderings; the processor and the cache can also affect the order in which other threads perceive memory updates. A modern processor can execute multiple instructions simultaneously, or out of order, as long as it can determine that the results of one operation are not required for another operation. Also, write-back caches might vary the order in which memory writes are committed to main memory.

As an example, consider this simple class:

public class SomeObject {
  int a;
  public SomeObject() {
    a = 1;
  }
}

Suppose your program instantiates a SomeObject and stores the resulting reference in a field called someField of a MyObject. A Java compiler would generate something like the Java byte code in Listing 4. In Listing 4, I’ve inlined the constructor for SomeObject and eliminated the stack-management instructions (dup, aload) for clarity:

Listing 4: Simplified Java byte code for creating a new SomeObject

   new <Class SomeObject> ; Allocate memory for a SomeObject
   invokespecial <Method java.lang.Object()>
                          ; Call the constructor for Object()
   iconst_1               ; Load the constant 1
                          ; Call this next operation FirstWrite
   putfield <Field int a> ; Store it in SomeObject.a
                          ; Call this next operation SecondWrite
   putfield <Field MyObject someField>
                          ; Store the reference somewhere

Now, suppose that a JIT (just-in-time) compiler translates the byte code from Listing 4 into machine code. The JIT would likely generate a call to the JVM’s equivalent of malloc(), a call to the constructor for Object, and two store-to-memory instructions: one for SomeObject’s field a (FirstWrite) and one to store the resulting reference in someField (SecondWrite).

Even if you were assured that the processor would execute these instructions in exactly this order, other threads — running on other processors — examining main memory might not see them happen in that order. Even though FirstWrite executes before SecondWrite, the cache on the executing processor could flush the results of SecondWrite to main memory before it flushes the results of FirstWrite. As a result, another thread could see someField initialized to a partially constructed SomeObject.
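To make the flush-order problem concrete, here is a toy, single-threaded model of a write-back cache. It is a sketch under loud assumptions: real caches work nothing like a Java deque, and the class WriteBackCacheToy, its methods, and the integer encoding of memory locations are all hypothetical. It shows only that the order in which writes are committed to main memory need not match the order in which they were executed.

```java
// Toy model only: a real cache is far more complex, and all names here are hypothetical.
class WriteBackCacheToy {
  static final int A = 0;          // memory location of SomeObject.a
  static final int SOME_FIELD = 1; // memory location of someField (0 = null, 1 = reference)

  // What other processors can see.
  final int[] mainMemory = new int[2];
  // Writes the executing processor has performed but not yet committed: {location, value}.
  final java.util.Deque<int[]> pendingWrites = new java.util.ArrayDeque<>();

  // The executing processor stores into its own cache.
  void write(int location, int value) {
    pendingWrites.addLast(new int[] {location, value});
  }

  // Commits the most recent pending write first. This toy picks that order on
  // purpose; the point is only that commit order need not match program order.
  void flushOne() {
    int[] w = pendingWrites.removeLast();
    mainMemory[w[0]] = w[1];
  }

  public static void main(String[] args) {
    WriteBackCacheToy cache = new WriteBackCacheToy();
    cache.write(A, 1);          // FirstWrite
    cache.write(SOME_FIELD, 1); // SecondWrite
    cache.flushOne();           // SecondWrite reaches main memory first

    // Another processor looking at main memory now sees the reference but not the field.
    System.out.println("someField=" + cache.mainMemory[SOME_FIELD] + " a=" + cache.mainMemory[A]);
    // prints someField=1 a=0

    cache.flushOne();           // FirstWrite finally arrives
    System.out.println("someField=" + cache.mainMemory[SOME_FIELD] + " a=" + cache.mainMemory[A]);
    // prints someField=1 a=1
  }
}
```

Between the two flushes, main memory holds a non-null someField whose a field is still 0, which is the partially constructed object described above.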

This example might shed some light on a common misperception about the JMM and memory access reordering: that reorderings occur at the statement, method, or byte-code level. In reality, you should be more concerned about reorderings at the memory-fetch level. After the Java compiler emits its byte code, and the JIT compiles it to machine code, the machine code will execute on a real processor with a real cache. The JMM specifies what sort of hardware-based reorderings it will tolerate, and it is simply not the case that the JMM expects memory coherency across threads.

If you are not familiar with the specifics of what happens in modern processors and caches, you might find this sort of nondeterminism surprising and even disturbing. But the compiler and JVM can hide all this complexity from you — if you follow the rules embodied in the JLS. And when it comes to sharing memory between threads, the rule is simple: synchronize.

Why are these sorts of nondeterminism in processors and caches tolerated? Because they provide us with better performance. Many of the recent advances in computing performance have come through increased parallelism. That is why the JMM doesn’t assume that memory operations performed by one thread will be perceived as happening in the same order by another thread — so as not to hamstring Java’s performance on modern hardware. Only when two threads synchronize on the same monitor (or lock) can they rely on the ordering of memory operations.
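The following sketch shows two threads that do follow the rule by synchronizing on the same monitor. The class SameMonitorDemo and everything in it are hypothetical names chosen for illustration. Because the writer and the reader both go through synchronized methods of the same SharedState object, a reader that sees published as true can rely on also seeing a as 1.

```java
// Illustrative sketch only; all names are hypothetical.
class SameMonitorDemo {

  static class SharedState {
    private int a;
    private boolean published;

    // Writer side: both writes happen while holding this object's monitor.
    synchronized void publish() {
      a = 1;
      published = true;
    }

    // Reader side: acquires the same monitor before looking at either field.
    // Returns -1 until the state has been published.
    synchronized int readIfPublished() {
      return published ? a : -1;
    }
  }

  // Starts a writer thread and polls, under the lock, until the value is visible.
  static int runOnce() {
    final SharedState state = new SharedState();
    Thread writer = new Thread(new Runnable() {
      public void run() {
        state.publish();
      }
    });
    writer.start();

    int seen = state.readIfPublished();
    while (seen == -1) {
      Thread.yield();
      seen = state.readIfPublished();
    }
    return seen;
  }

  public static void main(String[] args) {
    System.out.println(runOnce()); // prints 1
  }
}
```

If readIfPublished were not synchronized, nothing in the JMM would oblige the reading thread to see the writes in the order the writer made them.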

Even if you fool the compiler and the cache…

Even if you were able to guarantee that writes to memory are completed in the desired order, that still might not be enough to render your programs correct with respect to the JMM. It is actually possible, under the current JMM, to force Java to update several variables to main memory in a specific order. You could do this:

class FullMemoryBarrierSingleton {
  private static boolean initialized = false;
  private static Resource resource = null;
  private static Object lock = new Object();
  public static Resource getResource() {
    if (!initialized) {           
      synchronized (lock) {
        if (!initialized && resource == null) 
          resource = new Resource();
      }
      synchronized (lock) {
        initialized = true;
      }
    }
    return resource;
  }
}

The current JMM does not permit the JVM to merge two synchronized blocks. So another thread could not possibly see initialized set before resource and all its fields are fully initialized and written to main memory. So have we solved the DCL problem? FullMemoryBarrierSingleton appears to avoid synchronization on the most common code path, without any obvious memory model hazards. Unfortunately, this is not exactly true.

Don’t forget the read barrier

Some processors exhibit cache coherency, which means that regardless of the contents of any given processor’s cache, each processor sees the same values of memory. (Of course, depending on what is cached where, access to certain memory locations might be faster from some processors than from others.) While cache coherency certainly makes life easier for programmers, it substantially complicates the hardware and limits how many processors can connect to the same memory bus.

An alternative architectural approach to cache coherency is to allow each processor to update its own cache independently, but offer some sort of synchronization mechanism to make sure that updates by one processor become visible to other processors in a deterministic manner. That mechanism is generally the memory barrier, and many processors, such as Alpha, PowerPC, and Sparc, offer explicit memory barrier instructions. Generally, there are two types of memory barrier instructions:

A read barrier, which invalidates the contents of the executing processor’s cache so that changes made by other processors can become visible to the executing processor
A write barrier, which writes out the contents of the executing processor’s cache, so that changes made by the executing processor can become visible to others

On architectures without cache coherency, two processors can potentially fetch a value from the same memory address and see different results. To avoid this problem, either the programmer or compiler must use memory barrier instructions.

The JMM was designed to support architectures both with and without cache coherency. The JMM requires that a thread perform a read barrier after monitor entry and a write barrier before monitor exit. FullMemoryBarrierSingleton does indeed force the initializing thread to perform two write barriers so that resource and initialized are written to main memory in the proper order. So what could be wrong? The problem is that the other threads don’t necessarily perform a read barrier after determining that initialized is set, so they could possibly see stale values of resource or resource’s fields.

To see how a thread could see stale values for resource, don’t think in terms of objects and fields, but instead in terms of memory locations and their contents. Perhaps the memory location corresponding to the field resource was already in the current processor’s cache before another processor initialized resource. Since the current processor has not performed a read barrier, it would see that the address of resource was already in its cache and just use the cached value, which is now stale. The same could happen with any nonvolatile field of resource. So by not synchronizing before acquiring the reference to resource, a thread might see a stale or garbage value for resource (or one of its fields).
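The stale-read scenario can be sketched with a toy model of two processors that each keep a private cache over a shared main memory. This is an illustration under stated assumptions, not a description of any real processor: the class BarrierToy, its Processor type, and the readBarrier and writeBarrier methods are all hypothetical, and the value 7 merely stands in for a reference to a Resource.

```java
// Toy model only: real hardware differs, and every name here is hypothetical.
class BarrierToy {
  static final int RESOURCE = 0; // memory location of the field 'resource' (0 = null)

  static class Processor {
    private final int[] mainMemory;                   // shared by all processors
    private final int[] cached = new int[1];          // this processor's private copies
    private final boolean[] present = new boolean[1]; // is the location in this cache?
    private final boolean[] dirty = new boolean[1];   // written locally, not yet flushed?

    Processor(int[] mainMemory) {
      this.mainMemory = mainMemory;
    }

    int read(int location) {
      if (!present[location]) {  // cache miss: fetch from main memory
        cached[location] = mainMemory[location];
        present[location] = true;
      }
      return cached[location];   // cache hit: the value may be stale
    }

    void write(int location, int value) {
      cached[location] = value;
      present[location] = true;
      dirty[location] = true;
    }

    // Write barrier: push this processor's changes out to main memory.
    void writeBarrier() {
      for (int i = 0; i < cached.length; i++) {
        if (dirty[i]) {
          mainMemory[i] = cached[i];
          dirty[i] = false;
        }
      }
    }

    // Read barrier: discard clean cached copies so the next read refetches them.
    void readBarrier() {
      for (int i = 0; i < cached.length; i++) {
        if (!dirty[i]) {
          present[i] = false;
        }
      }
    }
  }

  public static void main(String[] args) {
    int[] mainMemory = new int[1];
    Processor initializer = new Processor(mainMemory);
    Processor reader = new Processor(mainMemory);

    System.out.println(reader.read(RESOURCE)); // prints 0; null is now cached
    initializer.write(RESOURCE, 7);            // 7 stands in for a reference
    initializer.writeBarrier();                // main memory now holds 7
    System.out.println(reader.read(RESOURCE)); // prints 0; stale cached value
    reader.readBarrier();
    System.out.println(reader.read(RESOURCE)); // prints 7
  }
}
```

In the model, the initializing processor does everything right, yet the reader keeps seeing the old value until it performs its own read barrier, which in Java terms means entering a synchronized block.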

Is this just theory, or could it happen to my code?

Most Java applications are hosted on Intel or Sparc systems, which offer stronger memory models than required by the Java Memory Model. (Sparc processors actually offer multiple memory models with varying levels of cache coherency.) And many systems have only a single processor. It might be tempting to dismiss these concerns as being only of theoretical value and think, “That couldn’t happen to us because we only use Solaris,” or, “All our systems have single processors.”

The danger behind dismissing these concerns is that these assumptions get buried in the code, where no one knows about them. Programs have a tendency to live much longer than expected. The Y2K phenomenon was dramatic evidence of that fact — programmers made memory optimizations 20 or 30 years ago, fully convinced that future programmers would certainly replace the code by the year 2000. Even if you’re sure that your program is only going to run on Linux/Intel in the foreseeable future, how do you know the same program won’t get rehosted to another platform 10 years from now? Will anyone remember that you assumed otherwise when you wrote some unsynchronized cache class deeply buried inside your application?

Conclusion

DCL, and other techniques for avoiding synchronization, expose many of the complexities of the JMM. The issues surrounding synchronization are subtle and complicated — so it is no surprise that many intelligent programmers have tried, but failed, to fix DCL.

The original goal of the Java Memory Model was to enable programmers to write concurrent programs in Java that would run efficiently on modern hardware, while still guaranteeing the Write Once, Run Anywhere behavior across a variety of computing architectures. Since there is now a JVM for nearly every conceivable processor, it is not unreasonable to expect that your code will eventually run on a different architecture than the one on which it was developed. So follow the rules now (synchronize!), and you can avoid a concurrency crisis in the future.

Brian Goetz is a professional software developer with more than 15 years of experience. He is a principal consultant at Quiotix, a software development and consulting firm located in Los Altos, Calif.

Source: www.infoworld.com