Can ThreadLocal solve the double-checked locking problem?

ThreadLocal appears to fix the thread-safety issues behind double-checked locking

In my previous article “Double-Checked Locking: Clever, but Broken,” I looked at some of the problems with double-checked locking (DCL), an idiom recommended by a number of Java books and articles as a way to reduce synchronization overhead when performing lazy initialization. Unfortunately, DCL isn’t guaranteed to work under the current Java Memory Model (JMM). The reasons why are subtle, and nearly all the proposed “solutions” are as flawed as the DCL idiom itself — see “Can Double-Checked Locking Be Fixed?” for some explanations of why this is so. The simple truth is that if threads must share data in a Java program, then you must use synchronization to guarantee that all threads have a consistent view of the data.

The ThreadLocal class, introduced in JDK 1.2, can help reduce some of the complexities involved in developing thread-safe classes by reducing the amount of data shared between threads. But as we shall see, sometimes this simplicity results in performance costs. Let’s review the DCL problem and then look at how you can use ThreadLocal to solve part of that problem.

What’s wrong with DCL again?

DCL is a technique for lazy initialization; it attempts to eliminate the synchronization overhead on the most common code path when fetching a reference to the lazily initialized object. Developers often try to avoid synchronizing on the common code path because of efficiency issues — synchronized operations run more slowly than unsynchronized ones. Here is an example of the (incorrect) DCL idiom and the (correct) single-check idiom it was intended to replace:

Listing 1. The double-checked locking (DCL) idiom

// This class is not thread-safe
class DoubleCheckExample {
  private static Resource resource = null;
  public static Resource getResource() {
    // Unsynchronized first check: this read is what breaks the idiom
    if (resource == null) {
      synchronized (DoubleCheckExample.class) {
        if (resource == null)
          resource = new Resource();
      }
    }
    return resource;
  }
}
// This class is thread-safe
class SingleCheckExample {
  private static Resource resource = null;
  public static Resource getResource() {
    // Every call acquires the class lock before touching resource
    synchronized (SingleCheckExample.class) {
      if (resource == null)
        resource = new Resource();
    }
    return resource;
  }
}

Note that in SingleCheckExample, you must execute a synchronized block every time getResource() is called, whereas in DoubleCheckExample, you have to synchronize only the first time. While it appears harmless, DCL doesn’t work because the JMM doesn’t guarantee that one thread will see updates to variables made by another thread unless both threads synchronize on the same monitor. Without synchronizing when you access the shared variable (the resource field), under some architectures and with some unlucky timing, another thread could see a partially constructed Resource returned from getResource(). DCL falls afoul of the synchronization rules because the first reference to resource, the check to see whether it is null, appears outside the synchronized block. To guarantee that the Resource object is fully constructed before it is made visible to other threads, you must synchronize.

The DCL idiom appeals to the belief that Java programs execute sequentially in a predictable order of execution; in reality many operations can occur in parallel or in an order other than the obvious one. Most of the time, this parallelism is undetectable and desirable — it allows JVMs and hardware to execute Java programs faster. But sometimes the inherent nonsequentiality of modern computing hardware shows through in unexpected ways, such as when you try to bend the rules requiring synchronization when accessing a shared variable.

But do you have to synchronize every time?

As long as the object being retrieved does not change its state once constructed, the risks associated with DCL are present only the first time a thread accesses the shared Resource object. Because resource and the object(s) it references will not change once initialized, after the Resource object and the objects it references are made visible to a given thread the first time, they should remain visible and valid on subsequent invocations of getResource(). So if each thread synchronizes the first time it calls getResource(), its subsequent accesses to the fully constructed Resource will be thread-safe.

Introducing ThreadLocal

Is there an easy way in Java to maintain per-thread state information so you can efficiently store the answer to the question “Has this thread synchronized on this monitor yet?” As of Java 1.2, there is: through the ThreadLocal class.

A thread-local variable is one that has a separate copy of its value for each thread that uses it. Each thread can manipulate its copy of the variable’s value independently of other threads and in fact, knows nothing about the existence or values of other threads’ copies of that variable. The ThreadLocal class first appeared in the Java Class Library in JDK 1.2. It receives relatively little attention, partially because initial implementations performed poorly, but it can be quite useful. Most threading facilities support thread-local variables; the omission of their support from the initial Java Thread API is a surprising one.

Because it was not implemented as part of the language, but instead as a class, manipulating a thread-local variable in Java is not as transparent as it would be in a language that supports thread-locals directly (such as the __declspec(thread) language extension offered by Microsoft Visual C++). ThreadLocal has the following simple interface, similar to java.lang.Reference; the interface functions as an indirect handle to the per-thread value:

// API summary only; method bodies omitted
public class ThreadLocal {
  public Object get();
  public void set(Object newValue);
  protected Object initialValue();
}

The get() and set() methods act as accessors for the current thread’s version of the variable’s value, and the optional initialValue() method acts like a constructor, which initializes the variable’s value on a per-thread basis.
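
As a rough illustration of those three methods working together, here is a minimal sketch of a per-thread counter. The class and method names (PerThreadCounterDemo, increment, countOnNewThread) are hypothetical and not from any library, and the sketch assumes the generic ThreadLocal<T> form available in later JDKs rather than the Object-based interface shown above.

```java
// Illustrative sketch; all names here are hypothetical.
// Assumes the generic ThreadLocal<T> of later JDKs.
class PerThreadCounterDemo {
  // Each thread gets its own copy; initialValue() seeds it on that thread's first get().
  private static final ThreadLocal<Integer> counter = new ThreadLocal<Integer>() {
    @Override
    protected Integer initialValue() {
      return Integer.valueOf(0);
    }
  };

  // Increments the calling thread's copy and returns the new value.
  static int increment() {
    int next = counter.get().intValue() + 1;
    counter.set(Integer.valueOf(next));
    return next;
  }

  // Calls increment() `times` times on a fresh thread and returns the last value it saw.
  static int countOnNewThread(final int times) {
    final int[] result = new int[1];
    Thread t = new Thread(new Runnable() {
      public void run() {
        for (int i = 0; i < times; i++) {
          result[0] = increment();
        }
      }
    });
    t.start();
    try {
      t.join(); // join() makes the worker's write to result[0] visible here
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    }
    return result[0];
  }

  public static void main(String[] args) {
    increment();                              // main thread's copy is now 1
    System.out.println(countOnNewThread(3));  // new thread starts from 0: prints 3
    System.out.println(increment());          // main thread's copy is unaffected: prints 2
  }
}
```

The worker thread never sees the main thread's count, and vice versa, even though both go through the same static ThreadLocal object.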

Can ThreadLocal help fix DCL?

We can use ThreadLocal to achieve the DCL idiom’s explicit goal — lazy initialization without synchronization on the common code path. Consider this (thread-safe) version of DCL:

Listing 2. DCL using ThreadLocal

class ThreadLocalDCL {
  // Per-thread flag: non-null once this thread has been through the synchronized block
  private static ThreadLocal initHolder = new ThreadLocal();
  private static Resource resource = null;
  public static Resource getResource() {
    if (initHolder.get() == null) {
      synchronized (ThreadLocalDCL.class) {
        if (resource == null)
          resource = new Resource();
        initHolder.set(Boolean.TRUE);
      }
    }
    return resource;
  }
}

How does this version differ from the classic DCL implementation? Instead of checking to see if the shared resource field is nonnull, we use a ThreadLocal to store the answer to the question “Has this thread been through the synchronized block yet?” The ThreadLocal.get() method is thread-safe, so calling it outside the synchronized block is safe. Since the thread-local operations do not involve sharing data between threads, we have none of the reordering problems that we would have had with a shared initialized variable. The resource field does not get referenced unless the thread has already executed the synchronized block, guaranteeing that if the Resource has been constructed by another thread, it and all the objects it references are visible to the executing thread. And on the common code path, where the Resource has already been initialized and the executing thread has a consistent view of the shared object, no synchronization is required. It appears that this meets the requirements that DCL was intended to address — lazy initialization without synchronization.
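
To see that behavior in action, the following sketch starts several threads against a holder with the same shape as Listing 2 and checks that they all receive one instance. DemoResource, ThreadLocalLazyHolder, and ThreadLocalLazyDemo are hypothetical names introduced for this illustration; the construction counter uses AtomicInteger, which comes from a later JDK than the ones discussed here. Passing such a test shows only that the code behaves as expected on one run; it does not prove the absence of a memory-model bug.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the article's Resource class; counts how often it is built.
class DemoResource {
  static final AtomicInteger constructed = new AtomicInteger();
  DemoResource() { constructed.incrementAndGet(); }
}

// Same shape as Listing 2, locking on the holder's class object.
class ThreadLocalLazyHolder {
  private static final ThreadLocal<Boolean> beenThrough = new ThreadLocal<Boolean>();
  private static DemoResource resource = null;

  static DemoResource getResource() {
    if (beenThrough.get() == null) {           // per-thread check, no shared data read
      synchronized (ThreadLocalLazyHolder.class) {
        if (resource == null)
          resource = new DemoResource();
        beenThrough.set(Boolean.TRUE);
      }
    }
    return resource;
  }
}

class ThreadLocalLazyDemo {
  // Starts n threads that each call getResource() twice; returns what each saw last.
  static DemoResource[] fetchFromThreads(int n) {
    final DemoResource[] results = new DemoResource[n];
    Thread[] threads = new Thread[n];
    for (int i = 0; i < n; i++) {
      final int slot = i;
      threads[i] = new Thread(new Runnable() {
        public void run() {
          ThreadLocalLazyHolder.getResource();                 // first call takes the lock
          results[slot] = ThreadLocalLazyHolder.getResource(); // second call skips it
        }
      });
      threads[i].start();
    }
    for (int i = 0; i < n; i++) {
      try {
        threads[i].join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException(e);
      }
    }
    return results;
  }

  public static void main(String[] args) {
    DemoResource[] seen = fetchFromThreads(8);
    boolean allSame = true;
    for (DemoResource r : seen) {
      allSame &= (r != null && r == seen[0]);
    }
    System.out.println("all threads saw one instance: " + allSame);
    System.out.println("constructions: " + DemoResource.constructed.get());
  }
}
```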

The proof is in the performance

But the real motivation behind DCL was to eliminate the synchronization on the common code path because synchronization is expensive. So to say that this technique “solves” the DCL problem, ThreadLocal would have to be faster than a synchronized lazy initialization like SingleCheckExample. Unfortunately, this is not yet the case.

The initial version of ThreadLocal performed poorly. It was implemented with a synchronized WeakHashMap, using the Thread object as the key. Thus executing ThreadLocal.get() or ThreadLocal.set() required not only synchronization but also de-referencing a weak reference. Needless to say, that would be even slower than SingleCheckExample.
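
To make the cost concrete, here is a simplified sketch of what a thread-local built on a synchronized WeakHashMap keyed by Thread might look like. This is an illustration of the approach described above and not the actual JDK source; MapBackedThreadLocal and MapBackedDemo are hypothetical names, and the generics are a later-JDK convenience.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Simplified illustration only; NOT the real JDK implementation.
class MapBackedThreadLocal<T> {
  // One shared map for all threads: every access locks the map's monitor,
  // and keys are held through weak references to the Thread objects.
  private final Map<Thread, T> values =
      Collections.synchronizedMap(new WeakHashMap<Thread, T>());

  T get() {
    return values.get(Thread.currentThread());
  }

  void set(T value) {
    values.put(Thread.currentThread(), value);
  }
}

class MapBackedDemo {
  // On a fresh thread: reads the thread's initial value, then stores toStore.
  // Returns what the fresh thread read before it stored anything.
  static String firstReadOnNewThread(final MapBackedThreadLocal<String> local,
                                     final String toStore) {
    final String[] firstRead = new String[1];
    Thread t = new Thread(new Runnable() {
      public void run() {
        firstRead[0] = local.get();
        local.set(toStore);
      }
    });
    t.start();
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    }
    return firstRead[0];
  }

  public static void main(String[] args) {
    MapBackedThreadLocal<String> local = new MapBackedThreadLocal<String>();
    local.set("main");
    System.out.println(firstReadOnNewThread(local, "worker")); // prints null
    System.out.println(local.get());                           // prints main
  }
}
```

Each get() and set() in this sketch acquires the shared map's lock, so a DCL replacement built on it pays for synchronization on every call anyway.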

JDK 1.3, the current JDK as of this writing, features a substantially improved ThreadLocal implementation. The Thread class was modified to provide support for thread-local variables, and the ThreadLocal accessor methods run without synchronization, requiring only a HashMap lookup to find the current thread’s copy of the variable. However, the Thread.currentThread() method, which is used to find the current thread and thus the current thread’s thread-local variables, is surprisingly expensive.

Even under JDK 1.3, executing ThreadLocal.get() is about twice as slow as an uncontended synchronization. This means that while ThreadLocalDCL is an elegant and correct replacement for the incorrect DCL idiom, using ThreadLocal to implement DCL is still slower than SingleCheckExample — and the whole point was to be faster. If it’s not faster than synchronizing, we really can’t say that ThreadLocal solves the DCL problem.
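
If you want to repeat this kind of informal comparison on your own JVM, a crude timing loop along the following lines is one way to start. It is a sketch and not a rigorous benchmark: AccessCostSketch and its methods are hypothetical names, System.nanoTime() comes from a later JDK than the ones discussed here, and JIT compilation, lock optimizations, and hardware can all change the outcome, so treat the numbers as indicative at best.

```java
// Rough timing sketch; names are hypothetical and results vary widely by JVM.
class AccessCostSketch {
  private static final Object lock = new Object();
  private static final ThreadLocal<Boolean> flag = new ThreadLocal<Boolean>();
  private static int shared = 0;

  // Elapsed nanoseconds for `iterations` uncontended synchronized blocks.
  static long timeSynchronized(int iterations) {
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      synchronized (lock) {
        shared++;
      }
    }
    return System.nanoTime() - start;
  }

  // Elapsed nanoseconds for `iterations` calls to ThreadLocal.get().
  static long timeThreadLocalGet(int iterations) {
    flag.set(Boolean.TRUE);
    int hits = 0;
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      if (flag.get() != null) {
        hits++;
      }
    }
    long elapsed = System.nanoTime() - start;
    if (hits != iterations) {
      throw new IllegalStateException("thread-local value went missing");
    }
    return elapsed;
  }

  public static void main(String[] args) {
    int n = 5000000;
    // Warm-up pass so the timed pass is more likely to run compiled code.
    timeSynchronized(n);
    timeThreadLocalGet(n);
    System.out.println("synchronized, ns:    " + timeSynchronized(n));
    System.out.println("ThreadLocal.get, ns: " + timeThreadLocalGet(n));
  }
}
```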

What about JDK 1.4?

In JDK 1.4, ThreadLocal and Thread.currentThread() have been rewritten again to be much faster. The improved versions were included in the JDK 1.4 Beta 2 release, and informal tests show that the ThreadLocal accessors (at least when using the -server JVM option, but not the -client JVM option) are substantially faster than either the previous version or an uncontended synchronization. So once JDK 1.4 becomes widely deployed, ThreadLocal will, in some environments, meet the performance goals of DCL without compromising thread safety.

We’re almost out of the woods

The 1.4 (Merlin) JDK is still in beta and, even when released, might not be widely deployed and used for some time. So for the present, using ThreadLocal to solve the DCL problem still falls short of the original goal of DCL — performance. However, after Merlin is released and more widely deployed, performance problems will no longer hobble ThreadLocal; it will become a practical, efficient, and clean solution to many problems in concurrent programming, not just lazy initialization.

Brian Goetz is a professional software developer with more than 15 years of experience. He is a principal consultant at Quiotix, a software development and consulting firm located in Los Altos, Calif.

Source: www.infoworld.com