Can ThreadLocal solve the double-checked locking problem?
ThreadLocal appears to fix the thread-safety issues behind double-checked locking
In my previous article “Double-Checked Locking: Clever, but Broken,” I looked at some of the problems with double-checked locking (DCL), an idiom recommended by a number of Java books and articles as a way to reduce synchronization overhead when performing lazy initialization. Unfortunately, DCL isn’t guaranteed to work under the current Java Memory Model (JMM). The reasons why are subtle, and nearly all the proposed “solutions” are as flawed as the DCL idiom itself — see “Can Double-Checked Locking Be Fixed?” for some explanations of why this is so. The simple truth is that if threads must share data in a Java program, then you must use synchronization to guarantee that all threads have a consistent view of the data.
The ThreadLocal class, introduced in JDK 1.2, can help reduce some of the complexities involved in developing thread-safe classes by reducing the amount of data shared between threads. But as we shall see, this simplicity sometimes comes at a performance cost. Let's review the DCL problem and then look at how you can use ThreadLocal to solve part of that problem.
What’s wrong with DCL again?
DCL is a technique for lazy initialization; it attempts to eliminate the synchronization overhead on the most common code path when fetching a reference to the lazily initialized object. Developers often try to avoid synchronizing on the common code path because of efficiency issues — synchronized operations run more slowly than unsynchronized ones. Here is an example of the (incorrect) DCL idiom and the (correct) single-check idiom it was intended to replace:
Listing 1. The double-checked locking (DCL) idiom
// This class is not thread-safe
class DoubleCheckExample {
    private static Resource resource = null;

    public static Resource getResource() {
        if (resource == null) {
            synchronized (DoubleCheckExample.class) {
                if (resource == null)
                    resource = new Resource();
            }
        }
        return resource;
    }
}
// This class is thread-safe
class SingleCheckExample {
    private static Resource resource = null;

    public static Resource getResource() {
        synchronized (SingleCheckExample.class) {
            if (resource == null)
                resource = new Resource();
        }
        return resource;
    }
}
Note that in SingleCheckExample, you must execute a synchronized block every time getResource() is called, whereas in DoubleCheckExample, you synchronize only the first time. While it appears harmless, DCL doesn't work because the JMM doesn't guarantee that one thread will see updates to variables made by another thread unless both threads synchronize on the same monitor. Without synchronizing when you access the shared variable (the resource field), under some architectures and with some unlucky timing, another thread could see a partially constructed Resource returned from getResource(). DCL falls afoul of the synchronization rules by having the first reference to resource, the check to see whether it is null, appear outside the synchronized block. To guarantee that the Resource object is fully constructed before it is made visible to other threads, you must synchronize.
The DCL idiom appeals to the belief that Java programs execute sequentially in a predictable order of execution; in reality many operations can occur in parallel or in an order other than the obvious one. Most of the time, this parallelism is undetectable and desirable — it allows JVMs and hardware to execute Java programs faster. But sometimes the inherent nonsequentiality of modern computing hardware shows through in unexpected ways, such as when you try to bend the rules requiring synchronization when accessing a shared variable.
But do you have to synchronize every time?
However, as long as the object being retrieved will not change its state once constructed, the risks associated with DCL are present only the first time a thread accesses the shared Resource object. Because resource and the object(s) it references will not change once initialized, after the Resource object and the objects it references are made visible to a given thread the first time, they should remain visible and valid on subsequent invocations of getResource(). As long as each thread has synchronized the first time it calls getResource(), subsequent accesses to resource will be thread-safe once the Resource is fully constructed.
Introducing ThreadLocal
Is there an easy way in Java to maintain per-thread state information so you can efficiently store the answer to the question "Has this thread synchronized on this monitor yet?" As of Java 1.2, there is: through the ThreadLocal class.
A thread-local variable is one that has a separate copy of its value for each thread that uses it. Each thread can manipulate its copy of the variable's value independently of other threads and, in fact, knows nothing about the existence or values of other threads' copies of that variable. The ThreadLocal class first appeared in the Java Class Library in JDK 1.2. It receives relatively little attention, partly because the initial implementation performed poorly, but it can be quite useful. Most threading facilities support thread-local variables; their omission from the initial Java Thread API is a surprising one.
Because it was implemented not as part of the language but as a class, manipulating a thread-local variable in Java is not as transparent as it would be in a language that supports thread-locals directly (such as the __declspec(thread) language extension offered by Microsoft Visual C++). ThreadLocal has the following simple interface, similar to java.lang.ref.Reference; the interface functions as an indirect handle to the per-thread value:
public class ThreadLocal {
    public Object get();
    public void set(Object newValue);
    protected Object initialValue();
}
The get() and set() methods act as accessors for the current thread's copy of the variable's value, and the overridable initialValue() method acts like a per-thread constructor, lazily initializing the value the first time a given thread calls get().
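As a quick illustration of this interface (the class and field names here are my own, not from the ThreadLocal API), a ThreadLocal subclass can hand out per-thread serial numbers by overriding initialValue(), written below in the pre-generics style of the JDK 1.2-era API:

```java
// Hands out a distinct serial number per thread. Each thread sees a
// stable value because ThreadLocal keeps a separate copy per thread.
class SerialNumber {
    // Shared counter, guarded by the lock on the ThreadLocal instance
    private static int nextSerial = 0;

    private static ThreadLocal serial = new ThreadLocal() {
        // Invoked at most once per thread, on that thread's first get()
        protected synchronized Object initialValue() {
            return new Integer(nextSerial++);
        }
    };

    public static int get() {
        return ((Integer) serial.get()).intValue();
    }
}
```

Repeated calls to SerialNumber.get() on the same thread return the same number, while each new thread receives a fresh one.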
Can ThreadLocal help fix DCL?
We can use ThreadLocal to achieve the DCL idiom's explicit goal: lazy initialization without synchronization on the common code path. Consider this (thread-safe) version of DCL:
Listing 2. DCL using ThreadLocal
class ThreadLocalDCL {
    private static ThreadLocal initHolder = new ThreadLocal();
    private static Resource resource = null;

    public static Resource getResource() {
        if (initHolder.get() == null) {
            synchronized (ThreadLocalDCL.class) {
                if (resource == null)
                    resource = new Resource();
                initHolder.set(Boolean.TRUE);
            }
        }
        return resource;
    }
}
How does this version differ from the classic DCL implementation? Instead of checking to see if the shared resource field is nonnull, we use a ThreadLocal to store the answer to the question "Has this thread been through the synchronized block yet?" The ThreadLocal.get() method is thread-safe, so calling it outside the synchronized block is safe. Since the thread-local operations do not involve sharing data between threads, we have none of the reordering problems that we would have had with a shared initialized variable. The resource field does not get referenced unless the thread has already executed the synchronized block, guaranteeing that if the Resource has been constructed by another thread, it and all the objects it references are visible to the executing thread. And on the common code path, where the Resource has already been initialized and the executing thread has a consistent view of the shared object, no synchronization is required. This appears to meet the requirements that DCL was intended to address: lazy initialization without synchronization.
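As a quick sanity check, a sketch like the following exercises the same idiom from multiple threads and confirms they all observe one shared instance. The Resource stand-in and the LazyHolder name are mine, repeated here only so the sketch is self-contained:

```java
// Trivial stand-in for the article's Resource class (demonstration only)
class Resource {
}

// Same idiom as Listing 2, under a different name to keep this sketch
// separate from the listing above
class LazyHolder {
    private static ThreadLocal initHolder = new ThreadLocal();
    private static Resource resource = null;

    public static Resource getResource() {
        if (initHolder.get() == null) {        // no synchronization on the common path
            synchronized (LazyHolder.class) {
                if (resource == null)
                    resource = new Resource();
                initHolder.set(Boolean.TRUE);  // remember: this thread has synchronized
            }
        }
        return resource;
    }
}
```

Every thread that calls LazyHolder.getResource() pays for the synchronized block once, then takes the unsynchronized path on every later call.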
The proof is in the performance
But the real motivation behind DCL was to eliminate synchronization on the common code path because synchronization is expensive. So for this technique to "solve" the DCL problem, ThreadLocal would have to be faster than a synchronized lazy initialization like SingleCheckExample. Unfortunately, this is not yet the case.
The initial version of ThreadLocal performed poorly. It was implemented with a synchronized WeakHashMap, using the Thread object as the key. Thus executing ThreadLocal.get() or ThreadLocal.set() required not only synchronization but also dereferencing a weak reference. Needless to say, that is even slower than SingleCheckExample.
JDK 1.3, the current JDK as of this writing, features a substantially improved ThreadLocal implementation. The Thread class was modified to provide direct support for thread-local variables, and the ThreadLocal accessor methods run without synchronization, requiring only a HashMap lookup to find the current thread's copy of the variable. However, the Thread.currentThread() method, which is used to find the current thread and thus the current thread's thread-local variables, is surprisingly expensive.
Even under JDK 1.3, executing ThreadLocal.get() is about twice as slow as an uncontended synchronization. This means that while ThreadLocalDCL is an elegant and correct replacement for the incorrect DCL idiom, using ThreadLocal to implement DCL is still slower than SingleCheckExample, and the whole point was to be faster. If it's not faster than synchronizing, we really can't say that ThreadLocal solves the DCL problem.
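If you want to check this trade-off on your own JVM, a rough timing sketch might look like the following. This is not a rigorous benchmark (results vary widely across JVMs, JIT settings, and hardware, and a good JIT may optimize away parts of loops like these), and all names here are mine:

```java
// Roughly compares ThreadLocal.get() against an uncontended
// synchronized getter by timing a large number of calls to each.
class GetTimings {
    static final Object shared = new Object();
    private static ThreadLocal local = new ThreadLocal();

    // Uncontended synchronized access to a shared field
    static synchronized Object syncGet() {
        return shared;
    }

    public static void main(String[] args) {
        local.set(shared);             // give the current thread a value
        int iters = 10000000;

        long start = System.currentTimeMillis();
        for (int i = 0; i < iters; i++)
            syncGet();
        long syncMillis = System.currentTimeMillis() - start;

        start = System.currentTimeMillis();
        for (int i = 0; i < iters; i++)
            local.get();
        long localMillis = System.currentTimeMillis() - start;

        System.out.println("synchronized: " + syncMillis + " ms, "
                + "ThreadLocal: " + localMillis + " ms");
    }
}
```

On a JDK 1.3-era VM you would expect the ThreadLocal loop to take roughly twice as long as the synchronized loop; on newer VMs the ratio shifts in ThreadLocal's favor.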
What about JDK 1.4?
In JDK 1.4, ThreadLocal and Thread.currentThread() have been rewritten again to be much faster. The improved versions were included in the JDK 1.4 Beta 2 release, and informal tests show that ThreadLocal's performance (at least under the -server JVM option, though not the -client option) is substantially faster than either the previous version or an uncontended synchronization. So once JDK 1.4 becomes widely deployed, ThreadLocal will, in some environments, meet the performance goals of DCL without compromising thread safety.
We’re almost out of the woods
The 1.4 (Merlin) JDK is still in beta and, even when released, might not be widely deployed and used for some time. So for the present, using ThreadLocal to solve the DCL problem still falls short of DCL's original goal: performance. However, after Merlin is released and more widely deployed, performance problems will no longer hobble ThreadLocal; it will become a practical, efficient, and clean solution to many problems in concurrent programming, not just lazy initialization.