Race condition vs. data race
It may seem that the terms "race condition" and "data race" have the same meaning, while in fact, they are different. "Java Concurrency in Practice" ISBN: 0321349601 book says:
"Not all race conditions are data races, and not all data races are race conditions, but they both can cause concurrent programs to fail in unpredictable ways."
The Java memory model (JMM) does not directly mention race conditions, but there is a phrase:
"freedom from data races still allows errors arising from groups of operations that need to be perceived atomically and are not"
which is in essence an example of a race condition.
In order to demonstrate this difference, I am going to show a situation where there is a race condition but there is no data race, and a situation where there is a data race but there is no race condition.
Contents
Race condition
Race condition—a property of an algorithm (or a program, a system, etc.) manifested in displaying anomalous outcomes / behaviour because of unfortunate ordering / relative timing of events.
Data race
Data race—a property of an execution of a program.
According to the JMM, an execution is said to contain a data race if it contains at least two conflicting accesses (reads of or writes to the same variable)
that are not ordered by a happens-before (hb
) relation1.
Two accesses to the same variable are said to be conflicting if at least one of the accesses is a write.
This definition can probably be generalized by saying that an execution contains a data race if it contains at least two conflicting accesses that are not properly coordinated, a.k.a synchronized, but I am going to talk about data races as they are defined by the JMM. Unfortunately, the above definition has a significant flaw in it, which was pointed out many times by different people, though the problem has not been fixed as of the Java Language Specification (JLS) for the Java Platform, Standard Edition (Java SE) 17:
"JLS3 seems to contain a glitch that prevents me from proving that my program is free of data races"
Java Memory Model discussions list, the answer, 2005"I was wondering if there was a happens before guarantee for reads of volatiles relative to later writes."
concurrency-interest discussion list, the answer, 2012"Is volatile read happens-before volatile write?"
stackoverflow.com, 2013"The way "data race" and "correctly synchronized" programs are defined in JMM continue bothering me. It makes completely legit programs producing consistent results with all shared variables declared volatile … to be "incorrectly synchronized" because data race definition is equally applicable to plain and volatile accesses. So my question is: shouldn't data race be specified in JMM for non-volatile conflicting accesses only?"
I asked this question in concurrency-interest discussion list and recommend reading it for those who are interested in a formal explanation of the problem, 2017
Despite the incorrect definition stated by the JMM stays unchanged, I am going to use a fixed version:
An execution is said to contain a data race if it contains at least two conflicting non-volatile2 accesses
to a shared variable that are not ordered by an hb
relation.
Despite data race is a property of an execution and not a program, you may hear people saying that a program has a data race. And we do not have to look far to find such situations because even the JMM does so in some places. A phrase "the program has a data race" means that there are executions of the program, which are allowed3 by the JMM and contain a data race (hereafter in this post I will refer to an execution allowed by the JMM as just an execution). A phrase "the program / algorithm is racy" means that it has a race condition.
Examples
I will try to give links to the JMM sections needed to understand the explanations below, but it is still better if the reader is familiar with the JMM. If you feel a bit scared reading the JMM, maybe reading "Close Encounters of The Java Memory Model Kind" by Aleksey Shipilëv is going to be more fun.4
Race condition example
While I could have described an algorithm with a race condition in plain English, I will show a source code of a program in Java which has this condition, just to emphasize that data race and race condition do not necessarily imply one another even in Java programs.
class RaceConditionExample {
static volatile boolean flag = false;
static void raiseFlag() {
flag = true; // w1
}
public static void main(String... args) {
ForkJoinPool.commonPool().execute(RaceConditionExample::raiseFlag);
System.out.print(flag); // r
}
}
The only shared variable in the program is flag
5, and it is marked as volatile, hence by definition,
there are no data races in any execution of this program.
And yet it is obvious that the program may print not only "true" (let us say the desired outcome), but also "false",
because we do not impose any order between the action r
reading flag
for printing and the action w1
writing true to flag
in the method raiseFlag
.
More specifically, these two actions are synchronization actions,
thus they are totally ordered with synchronization order (so
).
But both orders
so1
:r
,w1
so2
:w1
,r
are allowed3 and lead to different program outputs—"false" and "true" respectively.
Moreover, not all the allowed executions even have the w1
action, as the raiseFlag
method
is not guaranteed to be called.
So the program clearly has a race condition despite having no data race.
We can get rid of the race condition in our program by waiting until flag
becomes true
(the approach is only used for demonstration purposes, so please do not copy it blindly to a real code because there are more sane ways
of accomplishing the same goal):
class FixedRaceConditionExample {
static volatile boolean flag = false; // w0
static void raiseFlag() {
flag = true; // w1
}
public static void main(String... args) {
ForkJoinPool.commonPool().execute(FixedRaceConditionExample::raiseFlag);
while (!flag); // r_i, where i ∈ [1, k], k is finite
System.out.print(flag); // r
}
}
For this modified program the JMM allows executions with only the following sets of so
:
-
so1
:w0
,w1
,r_1
,r
which gives synchronized-with (
sw
) relationsw(w1, r)
becausew1
andr
both affect the samevolatile
variableflag
, and hencehb(w1, r)
, thusr == true
. -
so2
:w0
,r_1
, … ,w1
, …,r_k
,r
which gives
r == true
for the same reasons as above.
We hereby presented a proof that all executions of this modified program produce the same result: they print "true", hence this program does not have a race condition.
Data race example
Let us change the example by getting rid of the volatile
modifier.
class DataRaceExample {
static boolean flag = false; // w0
static void raiseFlag() {
flag = true; // w1
}
public static void main(String... args) {
ForkJoinPool.commonPool().execute(DataRaceExample::raiseFlag);
while (!flag); // r_i, where i ∈ [1, k), k may be infinite
System.out.print(flag); // r
}
}
Now all executions have data races because flag
is not volatile
, and there are no sources for hb
relations between w
and any r_i
,
or between w
and r
. Therefore by definition, all executions of this program contain data races.
In this situation the JMM allows either true or false being read by any of the reads r_i
, r
.
So the program may not only print "true" or "false" but may also hang infinitely executing while (!flag)
,
thus never performing the action r
. In other words this program produces anomalous outcomes which are not caused by any unfortunate ordering or timing
but rather by data races in the executions of the program.
For the sake of completeness, I need to say that a data race does not always lead to unexpected outcomes, while a race condition by definition does. Sometimes data races are used to allow the program to perform faster; these are so-called benign data races. Examples of such benign cases can be found in the source code of the OpenJDK Java Development Kit (JDK)6:
// java.lang.String from OpenJDK JDK 17
// https://github.com/openjdk/jdk/blob/jdk-17-ga/src/java.base/share/classes/java/lang/String.java#L2328-L2348
/** Cache the hash code for the string */
private int hash; // Default to 0
/**
* Cache if the hash has been calculated as actually being zero, enabling
* us to avoid recalculating this.
*/
private boolean hashIsZero; // Default to false;
public int hashCode() {
// The hash or hashIsZero fields are subject to a benign data race,
// making it crucial to ensure that any observable result of the
// calculation in this method stays correct under any possible read of
// these fields. Necessary restrictions to allow this to be correct
// without explicit memory fences or similar concurrency primitives is
// that we can ever only write to one of these two fields for a given
// String instance, and that the computation is idempotent and derived
// from immutable state
int h = hash;
if (h == 0 && !hashIsZero) {
h = isLatin1() ? StringLatin1.hashCode(value)
: StringUTF16.hashCode(value);
if (h == 0) {
hashIsZero = true;
} else {
hash = h;
}
}
return h;
}
Another example known to me was in OpenJDK JDK 6 and is long gone, but one still may see a corresponding question on stackoverflow.com and a concurrency-interest discussion.
-
Relations and partial/total orders mentioned in the JMM are all binary relations.
-
We can do volatile read/write actions in Java either by accessing a
volatile
field or by usingVarHandle.AccessMode.GET_VOLATILE
/VarHandle.AccessMode.SET_VOLATILE
. -
An execution is allowed by the JMM iff it
- is well-formed,
- satisfies the causality requirements,
- satisfies the requirements for observable behavior,
- obeys the
final
field semantics, - does not display word tearing.
-
More links to resources about the JMM and its new developments like access modes may be found here.
-
Here we are ignoring any shared variables used inside
ForkJoinPool.commonPool()
without loss of generality. .execute( DataRaceExample::raiseFlag) -
See this footnote for an introduction to some Java-related terms (Java SE/EE/ME, JDK, JRE, JVM, OpenJDK, Jakarta EE, etc.), which may be useful to orient rookie engineers/recruiters.