XNU use of Atomics and Memory Barriers
======================================

How to use atomics and memory barriers in xnu.

Goal
----

This document discusses the use of atomics and memory barriers in XNU. It is
meant as a guide to best practices, and warns against a variety of possible
pitfalls in the handling of atomics in C.

It is assumed that the reader has a decent understanding of
the [C11 memory model](https://en.cppreference.com/w/c/atomic/memory_order)
as this document builds on it, and explains the liberties XNU takes with said
model.

All the interfaces discussed in this document are available through
the `<os/atomic_private.h>` header.

Note: Linux has thorough documentation around memory barriers
(Documentation/memory-barriers.txt), some of which is Linux specific,
but most is not and is a valuable read.


Vocabulary
----------

In the rest of this document we'll refer to the various memory orderings defined
by C11 as relaxed, consume, acquire, release, acq\_rel and seq\_cst.

`os_atomic` also tries to make the distinction between compiler **barriers**
(which limit how much the compiler can reorder code), and memory **fences**.


The dangers and pitfalls of C11's `<stdatomic.h>`
-------------------------------------------------

While the C11 memory model has likely been one of the most important additions
to modern C, in the purest C tradition, it is a sharp tool.

By default, C11 comes with two variants of each atomic "operation":

- an *explicit* variant where memory orderings can be specified,
- a regular variant which is equivalent to the former with the *seq_cst*
  memory ordering.

When an `_Atomic` qualified variable is accessed directly without using
any `atomic_*_explicit()` operation, then the compiler will generate the
matching *seq_cst* atomic operations on your behalf.

The sequentially consistent world is extremely safe from a lot of compiler
and hardware reorderings and optimizations, which is great, but comes with
a huge cost in terms of memory barriers.

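To make the variants concrete, here is a minimal C11 sketch. The names
`counter` and `bump_three_ways` are illustrative only and do not come from XNU.
All three statements increment the same variable, but only the last one lets
the caller pick something weaker than *seq_cst*:

```c
    #include <assert.h>
    #include <stdatomic.h>

    /* Hypothetical counter, zero-initialized, for illustration only. */
    static _Atomic long counter;

    /* Increments `counter` three times and returns its final value. */
    long
    bump_three_ways(void)
    {
        counter++;                      /* direct access: implicit seq_cst RMW */
        atomic_fetch_add(&counter, 1);  /* regular variant: seq_cst as well */

        /* explicit variant: the ordering is chosen by the caller */
        long before = atomic_fetch_add_explicit(&counter, 1,
            memory_order_relaxed);
        assert(before == 2);            /* fetch_add returns the previous value */

        return atomic_load_explicit(&counter, memory_order_relaxed);
    }
```
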

It seems very tempting to use `atomic_*_explicit()` functions with explicit
memory orderings. However, the compiler is entitled to perform a number of
optimizations with relaxed atomics that most developers will not expect.
Indeed, the compiler is perfectly allowed to perform the same optimizations it
applies to plain memory accesses, such as coalescing, reordering, hoisting
out of loops, ...

For example, when the compiler can know what `doit` is doing (which due to LTO
is almost always the case for XNU), it is allowed to transform this code:

```c
    void
    perform_with_progress(int steps, long _Atomic *progress)
    {
        for (int i = 0; i < steps; i++) {
            doit(i);
            atomic_store_explicit(progress, i, memory_order_relaxed);
        }
    }
```

Into this, which obviously defeats the entire purpose of `progress`:

```c
    void
    perform_with_progress(int steps, long _Atomic *progress)
    {
        for (int i = 0; i < steps; i++) {
            doit(i);
        }
        if (steps > 0) {
            /* only the last store of the original loop survives */
            atomic_store_explicit(progress, steps - 1, memory_order_relaxed);
        }
    }
```

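One standard C way to rule out this transformation is to perform the store
through a `volatile`-qualified atomic lvalue, since volatile accesses are
observable behavior that the compiler may neither merge nor hoist. The sketch
below assumes a stub `doit` and a `work_done` variable that exist only to make
the snippet self-contained:

```c
    #include <assert.h>
    #include <stdatomic.h>
    #include <stddef.h>

    /* Stand-in for the real work; hypothetical. */
    static long work_done;

    void
    doit(int i)
    {
        work_done += i;
    }

    void
    perform_with_progress(int steps, long _Atomic *progress)
    {
        assert(progress != NULL);

        /* Adding `volatile` makes every store an observable side effect. */
        long _Atomic volatile *p = progress;

        for (int i = 0; i < steps; i++) {
            doit(i);
            atomic_store_explicit(p, i, memory_order_relaxed);
        }
    }
```
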

How `os_atomic_*` tries to address `<stdatomic.h>` pitfalls
-----------------------------------------------------------

1. the memory locations passed to the various `os_atomic_*`
   functions do not need to be marked `_Atomic` or `volatile`
   (or `_Atomic volatile`), which allows for use of atomic
   operations in code written before C11 was even a thing.

   It is however recommended in new code to use the `_Atomic`
   specifier.

2. `os_atomic_*` cannot be coalesced by the compiler:
   all accesses are performed on the specified locations
   as if their type was `_Atomic volatile` qualified.

3. `os_atomic_*` only comes with the explicit variants:
   orderings must be provided and can express either memory orders
   where the name is the same as in C11 without the `memory_order_` prefix,
   or a compiler barrier ordering `compiler_acquire`, `compiler_release`,
   `compiler_acq_rel`.

4. `os_atomic_*` emits the proper compiler barriers that
   correspond to the requested memory ordering (using
   `atomic_signal_fence()`).

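Points 3 and 4 can be pictured as a thin macro layer over C11. This is only a
sketch of the idea, not the actual `<os/atomic_private.h>` implementation: the
`my_` names are hypothetical, and only the real memory orders are mapped here,
not the `compiler_*` ones.

```c
    #include <assert.h>
    #include <stdatomic.h>

    /* Hypothetical shim: `relaxed` becomes `memory_order_relaxed`, etc. */
    #define my_order(m)             memory_order_##m
    #define my_atomic_load(p, m)    atomic_load_explicit((p), my_order(m))
    #define my_atomic_store(p, v, m) \
        atomic_store_explicit((p), (v), my_order(m))
    /* Compiler-only barrier: constrains the compiler, no hardware fence. */
    #define my_compiler_barrier(m)  atomic_signal_fence(my_order(m))

    int
    store_then_load(int _Atomic *p, int v)
    {
        my_atomic_store(p, v, release);
        my_compiler_barrier(acq_rel);
        int seen = my_atomic_load(p, acquire);
        assert(seen == v);  /* single-threaded here, so the value round-trips */
        return seen;
    }
```
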

Best practices for the use of atomics in XNU
--------------------------------------------

For most generic code, the `os_atomic_*` functions from
`<os/atomic_private.h>` are the preferred interfaces.

`__sync_*`, `__c11_*` and `__atomic_*` compiler builtins should not be used.

`<stdatomic.h>` functions may be used if:

- compiler coalescing / reordering is desired (refcounting
  implementations may desire this for example).


Qualifying atomic variables with `_Atomic` or even
`_Atomic volatile` is encouraged, however authors must
be aware that a direct access to such a variable is a
*seq_cst* operation and will result in quite heavy memory barriers.

The *consume* memory ordering should not be used
(see *dependency* memory order later in this documentation).

**Note**: `<libkern/OSAtomic.h>` provides a bunch of legacy
atomic interfaces, but this header is considered obsolete
and these functions should not be used in new code.


High level overview of `os_atomic_*` interfaces
-----------------------------------------------

### Compiler barriers and memory fences

`os_compiler_barrier(mem_order?)` provides a compiler barrier,
with an optional barrier ordering. It is implemented with C11's
`atomic_signal_fence()`. The barrier ordering argument is optional
and defaults to the `acq_rel` compiler barrier (which prevents the
compiler from reordering code in any direction around this barrier).

`os_atomic_thread_fence(mem_order)` provides a memory barrier
according to the semantics of `atomic_thread_fence()`. It always
implies the equivalent `os_compiler_barrier()` even on uniprocessor (UP)
systems.

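In plain C11 terms, the difference between the two underlying primitives looks
like the sketch below. The `mailbox_*` names are hypothetical.
`atomic_thread_fence()` orders accesses against other threads, whereas
`atomic_signal_fence()` only constrains the compiler, which is enough against a
signal handler running on the same thread:

```c
    #include <assert.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    static long        mailbox_data;   /* plain payload */
    static _Atomic int mailbox_ready;  /* relaxed flag, ordered by fences */

    void
    mailbox_post(long value)
    {
        mailbox_data = value;
        atomic_thread_fence(memory_order_release);  /* hardware + compiler */
        atomic_store_explicit(&mailbox_ready, 1, memory_order_relaxed);
    }

    /* Same shape, but compiler-only: no hardware fence is requested. */
    void
    mailbox_post_same_thread(long value)
    {
        mailbox_data = value;
        atomic_signal_fence(memory_order_release);
        atomic_store_explicit(&mailbox_ready, 1, memory_order_relaxed);
    }

    bool
    mailbox_take(long *out)
    {
        assert(out != NULL);
        if (!atomic_load_explicit(&mailbox_ready, memory_order_relaxed)) {
            return false;
        }
        atomic_thread_fence(memory_order_acquire);  /* pairs with the release */
        *out = mailbox_data;
        return true;
    }
```
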
### Init, load and store

`os_atomic_init`, `os_atomic_load` and `os_atomic_store` provide
facilities equivalent to `atomic_init`, `atomic_load_explicit`
and `atomic_store_explicit` respectively.

Note that `os_atomic_load` and `os_atomic_store` promise that they will
compile to a plain load or store. `os_atomic_load_wide` and
`os_atomic_store_wide` can be used to have access to atomic loads and stores
that involve more costly codegen (such as compare exchange loops).

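As a rough picture of the three operations, here they are in their C11 form.
`struct stats` and `stats_reset_and_read` are hypothetical, and the `os_atomic_*`
names in the comments are the equivalents listed above rather than calls this
snippet compiles against:

```c
    #include <assert.h>
    #include <stdatomic.h>
    #include <stddef.h>

    struct stats {
        _Atomic unsigned long packets;
    };

    unsigned long
    stats_reset_and_read(struct stats *s, unsigned long initial)
    {
        assert(s != NULL);

        /* counterpart of os_atomic_init: non-atomic initialization */
        atomic_init(&s->packets, initial);

        /* counterpart of os_atomic_store with the release ordering */
        atomic_store_explicit(&s->packets, initial + 1, memory_order_release);

        /* counterpart of os_atomic_load with the acquire ordering */
        return atomic_load_explicit(&s->packets, memory_order_acquire);
    }
```
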
### Basic RMW (read/modify/write) atomic operations

The following basic atomic RMW operations exist:

- `inc`: atomic increment (equivalent to an atomic add of `1`),
- `dec`: atomic decrement (equivalent to an atomic sub of `1`),
- `add`: atomic add,
- `sub`: atomic sub,
- `or`: atomic bitwise or,
- `xor`: atomic bitwise xor,
- `and`: atomic bitwise and,
- `andnot`: atomic bitwise andnot (equivalent to an atomic and of `~value`),
- `min`: atomic min,
- `max`: atomic max.

For any such operation, two variants exist:

- `os_atomic_${op}_orig` (for example `os_atomic_add_orig`)
  which returns the value stored at the specified location
  *before* the atomic operation took place
- `os_atomic_${op}` (for example `os_atomic_add`) which
  returns the value stored at the specified location
  *after* the atomic operation took place

This convention is picked for two reasons:

1. `os_atomic_add(p, value, ...)` is essentially equivalent to the C
   in place addition `(*p += value)` which returns the result of the
   operation and not the original value of `*p`.

2. Most subtle atomic algorithms do actually require the original value
   stored at the location, especially for bit manipulations:
   `(os_atomic_or_orig(p, bit, relaxed) & bit)` will atomically perform
   `*p |= bit` but also tell you whether `bit` was set in the original value.

   Making it more explicit that the original value is used is hence
   important for readers and worth the extra five keystrokes.

Typically:

```c
    static int _Atomic i = 0;

    printf("%d\n", os_atomic_inc_orig(&i, relaxed)); // prints 0
    printf("%d\n", os_atomic_inc(&i, relaxed)); // prints 2
```

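C11 only offers the `_orig` flavor (`atomic_fetch_*` returns the previous
value), so the relationship between the two variants can be sketched as
follows. The `my_` helpers are hypothetical, fixed to `unsigned` and to the
relaxed ordering for brevity:

```c
    #include <assert.h>
    #include <stdatomic.h>
    #include <stdbool.h>

    /* Like the `_orig` variant: returns the value before the addition. */
    unsigned
    my_add_orig(_Atomic unsigned *p, unsigned v)
    {
        return atomic_fetch_add_explicit(p, v, memory_order_relaxed);
    }

    /* Like the plain variant: returns the value after the addition. */
    unsigned
    my_add(_Atomic unsigned *p, unsigned v)
    {
        return my_add_orig(p, v) + v;
    }

    /* Sets `bit` and reports whether it was already set beforehand. */
    bool
    my_test_and_set_bit(_Atomic unsigned *p, unsigned bit)
    {
        assert(bit != 0);
        return (atomic_fetch_or_explicit(p, bit, memory_order_relaxed) & bit) != 0;
    }
```
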
### Atomic swap / compare and swap

`os_atomic_xchg` is a simple wrapper around `atomic_exchange_explicit`.

There are two variants of `os_atomic_cmpxchg` which are wrappers around
`atomic_compare_exchange_strong_explicit`. Both of these variants will
return false/0 if the compare exchange failed, and true/1 if the expected
value was found at the specified location and the new value was stored.

1. `os_atomic_cmpxchg(address, expected, new_value, mem_order)` which
   will atomically store `new_value` at `address` if the current value
   is equal to `expected`.

2. `os_atomic_cmpxchgv(address, expected, new_value, orig_value, mem_order)`
   which has an extra `orig_value` argument which must be a pointer to a local
   variable and will be filled with the current value at `address` whether the
   compare exchange was successful or not. In case of success, the loaded value
   will always be `expected`, however in case of failure it will be filled with
   the current value, which is helpful to redrive compare exchange loops.

Unlike `atomic_compare_exchange_strong_explicit`, a single ordering is
specified, which only takes effect in case of a successful compare exchange.
In C11 speak, `os_atomic_cmpxchg*` always specifies `memory_order_relaxed`
for the failure case ordering, as it is what is used most of the time.

There is no wrapper around `atomic_compare_exchange_weak_explicit`,
as `os_atomic_rmw_loop` offers a much better alternative for CAS-loops.

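The semantics described above can be approximated in portable C11 as shown
below. This is a sketch, not the XNU implementation: the `my_` functions are
hypothetical, fixed to `int`, and hard-code the acquire ordering for the
success case, whereas the real macros are type-generic and take the ordering
as an argument.

```c
    #include <assert.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    bool
    my_cmpxchgv(_Atomic int *address, int expected, int new_value,
        int *orig_value)
    {
        assert(orig_value != NULL);

        /* On failure, C11 writes the value it observed into `expected`. */
        bool ok = atomic_compare_exchange_strong_explicit(address, &expected,
            new_value, memory_order_acquire, memory_order_relaxed);

        /* Either way, report what was found at `address`. */
        *orig_value = expected;
        return ok;
    }

    bool
    my_cmpxchg(_Atomic int *address, int expected, int new_value)
    {
        int ignored;

        return my_cmpxchgv(address, expected, new_value, &ignored);
    }
```
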
### `os_atomic_rmw_loop`

This expressive and versatile construct allows for really terse and
way more readable compare exchange loops. It also uses LL/SC constructs more
efficiently than a compare exchange loop would allow.

Instead of a typical CAS-loop in C11:

```c
    int _Atomic *address;
    int old_value, new_value;
    bool success = false;

    old_value = atomic_load_explicit(address, memory_order_relaxed);
    do {
        if (!validate(old_value)) {
            break;
        }
        new_value = compute_new_value(old_value);
        success = atomic_compare_exchange_weak_explicit(address, &old_value,
                new_value, memory_order_acquire, memory_order_relaxed);
    } while (__improbable(!success));
```

`os_atomic_rmw_loop` allows this form:

```c
    int _Atomic *address;
    int old_value, new_value;
    bool success;

    success = os_atomic_rmw_loop(address, old_value, new_value, acquire, {
        if (!validate(old_value)) {
            os_atomic_rmw_loop_give_up(break);
        }
        new_value = compute_new_value(old_value);
    });
```

Unlike the C11 variant, it lets the reader know in program order that this will
be a CAS loop, and exposes the ordering upfront, while for traditional CAS loops
one has to jump to the end of the code to understand what it does.

Any control flow that attempts to exit the scope of the loop needs to be
wrapped with `os_atomic_rmw_loop_give_up` (so that LL/SC architectures can
abort their opened LL/SC transaction).

Because these loops are LL/SC transactions, it is undefined to perform
any store to memory (register operations are fine) within these loops,
as these may cause the store-conditional to always fail.
In particular nesting of `os_atomic_rmw_loop` is invalid.

Use of `continue` within an `os_atomic_rmw_loop` is also invalid. Instead, an
`os_atomic_rmw_loop_give_up(goto again)` jumping to an `again:` label placed
before the loop should be used in this way:

```c
    int _Atomic *address;
    int old_value, new_value;
    bool success;

again:
    success = os_atomic_rmw_loop(address, old_value, new_value, acquire, {
        if (needs_some_store_that_can_thwart_the_transaction(old_value)) {
            os_atomic_rmw_loop_give_up({
                // Do whatever you need to do/store to central memory
                // that would cause the loop to always fail
                do_my_rmw_loop_breaking_store();

                // And only then redrive.
                goto again;
            });
        }
        if (!validate(old_value)) {
            os_atomic_rmw_loop_give_up(break);
        }
        new_value = compute_new_value(old_value);
    });
```

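For a concrete instance of `validate` and `compute_new_value`, consider a
reference count that must never be resurrected from zero. The sketch below
uses the portable C11 form so that it stands alone, and `my_try_retain` is a
hypothetical name. With `os_atomic_rmw_loop`, the early `return` would instead
have to be wrapped in `os_atomic_rmw_loop_give_up`.

```c
    #include <assert.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Takes a reference unless the count already dropped to zero. */
    bool
    my_try_retain(_Atomic unsigned *refcount)
    {
        assert(refcount != NULL);

        unsigned old_value = atomic_load_explicit(refcount,
            memory_order_relaxed);

        do {
            if (old_value == 0) {
                return false;   /* "validate" failed: give up */
            }
            /* a failed CAS refreshes old_value, so the check is redone */
        } while (!atomic_compare_exchange_weak_explicit(refcount, &old_value,
            old_value + 1, memory_order_acquire, memory_order_relaxed));

        return true;
    }
```
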
### The *dependency* memory order

Because the C11 *consume* memory order is broken in various ways,
most compilers, clang included, implement it as an equivalent
for `memory_order_acquire`. However, its concept is useful
for certain algorithms.

As an attempt to provide a replacement for this, `<os/atomic_private.h>`
implements an entirely new *dependency* memory ordering.

The purpose of this ordering is to provide a relaxed load followed by an
implicit compiler barrier, that can be used as a root for a chain of hardware
dependencies that would otherwise pair with store-releases done at this address,
very much like the *consume* memory order is intended to provide.

However, unlike the *consume* memory ordering where the compiler had to follow
the dependencies, the *dependency* memory ordering relies on explicit
annotations of when the dependencies are expected:

- loads through a pointer loaded with a *dependency* memory ordering
  will provide a hardware dependency,

- dependencies may be injected into other loads not performed through this
  particular pointer with the `os_atomic_load_with_dependency_on` and
  `os_atomic_inject_dependency` interfaces.

Here is an example of how it is meant to be used:

```c
    struct foo {
        long value;
        long _Atomic flag;
    };

    void
    publish(struct foo *p, long value)
    {
        p->value = value;
        os_atomic_store(&p->flag, 1, release);
    }


    bool
    broken_read(struct foo *p, long *value)
    {
        /*
         * This isn't safe, as there's absolutely no hardware dependency involved.
         * Using an acquire barrier would of course fix it but is quite expensive...
         */
        if (os_atomic_load(&p->flag, relaxed)) {
            *value = p->value;
            return true;
        }
        return false;
    }

    bool
    valid_read(struct foo *p, long *value)
    {
        long flag = os_atomic_load(&p->flag, dependency);
        if (flag) {
            /*
             * Further the chain of dependency to any loads through `p`
             * which properly pair with the release barrier in `publish`.
             */
            *value = os_atomic_load_with_dependency_on(&p->value, flag);
            return true;
        }
        return false;
    }
```

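For comparison, the acquire-based fix mentioned in the comment of `broken_read`
looks as follows in portable C11. It is correct, but pays for the acquire
barrier that the *dependency* ordering tries to avoid. `acquire_read` is a
hypothetical name, and the snippet restates `struct foo` and `publish` with
`<stdatomic.h>` calls only so that it stands alone:

```c
    #include <assert.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct foo {
        long value;
        long _Atomic flag;
    };

    void
    publish(struct foo *p, long value)
    {
        p->value = value;
        atomic_store_explicit(&p->flag, 1, memory_order_release);
    }

    bool
    acquire_read(struct foo *p, long *value)
    {
        assert(p != NULL && value != NULL);

        /* The acquire load pairs with the release store in `publish`. */
        if (atomic_load_explicit(&p->flag, memory_order_acquire)) {
            *value = p->value;
            return true;
        }
        return false;
    }
```
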
There are 4 interfaces involved with hardware dependencies:

1. `os_atomic_load(..., dependency)` to initiate roots of hardware dependencies,
   that should pair with a store or rmw with release semantics or stronger
   (release, acq\_rel or seq\_cst),

2. `os_atomic_inject_dependency` can be used to inject the dependency provided
   by a *dependency* load, or any other value that has had a dependency
   injected,

3. `os_atomic_load_with_dependency_on` to do an otherwise unrelated relaxed load
   that still prolongs a dependency chain,

4. `os_atomic_make_dependency` to create an opaque token out of a given
   dependency root to inject into multiple loads.


**Note**: this technique is NOT safe when the compiler can reason about the
pointers that you are manipulating, for example if the compiler can know that
the pointer can only take a couple of values and can therefore ditch all these
manually crafted dependency chains. Hopefully there will be a future C2Y
standard that provides a similar construct as a language feature instead.