
Fix your damned data races

Nathan Froyd recently wrote about how he has been using ThreadSanitizer to find data races in Firefox, and how a number of Firefox developers, particularly in the networking and JS GC teams, have been fixing them.

This is great news. I want to re-emphasise and re-state one of the points from Nathan’s post, which is that data races are a class of bug that almost everybody underestimates. Unless you have, say, authored a specification of the memory model for a systems programming language, your intuition about the potential impact of many data races is probably wrong. And I’m going to give you three links to explain why.

Hans Boehm’s paper How to miscompile programs with “benign” data races explains very clearly that it’s possible to correctly conclude that a data race is benign at the level of machine code, but it’s almost impossible at the level of C or C++ code. And if you try to do the latter by inspecting the code generated by a C or C++ compiler, you are not allowing for the fact that other compilers (including future versions of the compiler you used) can and will generate different code, and so your conclusion is both incomplete and temporary.
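To make this concrete, here is a minimal sketch of the kind of code people often label “benign”: a statistics counter bumped from several threads. It is not taken from the paper, and the names `hits`, `record_events` and `count_events` are made up for illustration. Written with a plain `int` and `++hits`, this would be a data race; the version below uses `std::atomic` so the behaviour is defined regardless of which compiler, or which version of it, builds the code.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical statistics counter shared by several threads.
// Declaring this as a plain `int` and writing `++hits` would be a data
// race, and therefore undefined behaviour, however harmless it looks.
std::atomic<int> hits{0};

// Each worker records `n` events.
void record_events(int n) {
    for (int i = 0; i < n; ++i) {
        // Relaxed ordering is enough for a pure counter: we need the
        // increment to be atomic, not ordered against other memory.
        hits.fetch_add(1, std::memory_order_relaxed);
    }
}

// Runs `workers` threads and returns the final count.
int count_events(int workers, int per_worker) {
    assert(workers >= 0 && per_worker >= 0);
    hits.store(0, std::memory_order_relaxed);
    std::vector<std::thread> threads;
    for (int i = 0; i < workers; ++i) {
        threads.emplace_back(record_events, per_worker);
    }
    // join() synchronizes with each thread's completion, so the load
    // below observes every increment.
    for (auto& t : threads) {
        t.join();
    }
    return hits.load(std::memory_order_relaxed);
}
```

Calling `count_events(4, 10000)` returns exactly 40000. If you swap the counter back to a plain `int` and build with Clang or GCC using `-fsanitize=thread`, TSan should report the race.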

Dmitri Vyukov’s blog post Benign data races: what could possibly go wrong? covers similar ground, giving more examples of how compilers can legitimately compile things in surprising ways. For example, at any point the storage used by a local variable can be temporarily used to hold a different variable’s value (think register spilling). If another thread reads this storage in a racy fashion, it could read the value of an unrelated variable.
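One defensive pattern that follows from this is to do all intermediate work in private storage and hand the result over with a single atomic store. The sketch below is my own illustration rather than code from Dmitri’s post, and `published_total`, `compute_and_publish` and `reader_saw_only_valid_values` are hypothetical names. Because the shared variable is a `std::atomic`, the compiler may not use it as scratch space, so a concurrent reader sees either the initial value or the final one.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Shared result. As a std::atomic it can only ever hold values that the
// program explicitly stored into it.
std::atomic<long> published_total{0};

// Computes 1 + 2 + ... + n in a private local, then publishes the result
// with one atomic store.
void compute_and_publish(long n) {
    long total = 0;
    for (long i = 1; i <= n; ++i) {
        total += i;
    }
    published_total.store(total, std::memory_order_release);
}

// Runs the writer concurrently with a polling reader. Returns true if the
// reader only ever observed 0 (not yet published) or the final sum.
bool reader_saw_only_valid_values(long n) {
    const long expected = n * (n + 1) / 2;
    published_total.store(0);
    bool ok = true;
    std::thread writer(compute_and_publish, n);
    long seen = 0;
    while (seen != expected) {
        seen = published_total.load(std::memory_order_acquire);
        if (seen != 0 && seen != expected) {
            ok = false;  // an intermediate or unrelated value leaked out
        }
    }
    writer.join();
    return ok;
}
```

With a plain `long` in place of the atomic, the reader’s loop would itself be a data race, and nothing in the language would stop it from observing some other value entirely.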

Finally, John Regehr’s blog has many posts that show how C and C++ compilers take advantage of undefined behaviour to do surprising (even shocking) program transformations, and how the extent of these transformations has steadily increased over time. Compilers genuinely are getting smarter, and are increasingly pushing the envelope of what a language will let them get away with. And the behaviour of a C or C++ program is undefined in the presence of data races. (This is sometimes called “catch-fire” semantics, for reasons you should be able to guess.)
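A classic illustration of why this matters is a thread that spins on a flag set by another thread. If the flag is a plain `bool`, the racy read is undefined behaviour, so the compiler may assume the flag never changes inside the loop, hoist the load out of it, and leave you with an infinite loop. The sketch below is a hypothetical example of my own (the names `ready`, `payload`, `producer`, `consume` and `run_handoff` are invented) showing the well-defined version, where release/acquire operations on an atomic flag also make the ordinary write to `payload` visible to the consumer.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> ready{false};
int payload = 0;  // plain data, published via the release/acquire pair on `ready`

// Writes the data first, then sets the flag with release ordering.
void producer() {
    payload = 42;
    ready.store(true, std::memory_order_release);
}

// Spins until the flag is set. The acquire load pairs with the release
// store above, so reading `payload` afterwards is not a data race.
int consume() {
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}

// Resets the state, runs one producer thread, and returns what the
// consumer (the calling thread) read.
int run_handoff() {
    ready.store(false);
    payload = 0;
    std::thread t(producer);
    int value = consume();
    t.join();
    return value;
}
```

Here `run_handoff()` returns 42. The atomic load must be performed on every iteration, which is exactly the guarantee the plain `bool` version lacks.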

So, in summary: if you write C or C++ code that deals with threads in Firefox — and that probably includes everybody who writes C or C++ code in Firefox — you should have read at least the first two links I’ve given above. If you haven’t, please do so now. If you have read them and they haven’t made you at least slightly terrified, you should read them again. And if TSan identifies a data race in code that you are familiar with, please take it seriously, and fix it. Thank you.