xref: /xnu-10002.61.3/doc/atomics.md (revision 0f4c859e951fba394238ab619495c4e1d54d0f34)
XNU use of Atomics and Memory Barriers
======================================

Goal
----

This document discusses the use of atomics and memory barriers in XNU. It is
meant as a guide to best practices, and warns against a variety of possible
pitfalls in the handling of atomics in C.

It is assumed that the reader has a decent understanding of
the [C11 memory model](https://en.cppreference.com/w/c/atomic/memory_order)
as this document builds on it, and explains the liberties XNU takes with said
model.

All the interfaces discussed in this document are available through
the `<os/atomic_private.h>` header.

Note: Linux has thorough documentation around memory barriers
(Documentation/memory-barriers.txt), some of which is Linux-specific,
but most of it is not and is a valuable read.


Vocabulary
----------

In the rest of this document, we'll refer to the various memory orderings
defined by C11 as relaxed, consume, acquire, release, acq\_rel and seq\_cst.

`os_atomic` also tries to make the distinction between compiler **barriers**
(which limit how much the compiler can reorder code) and memory **fences**.


The dangers and pitfalls of C11's `<stdatomic.h>`
-------------------------------------------------

While the C11 memory model has likely been one of the most important additions
to modern C, in the purest C tradition, it is a sharp tool.

By default, C11 comes with two variants of each atomic "operation":

- an *explicit* variant where memory orderings can be specified,
- a regular variant which is equivalent to the former with the *seq_cst*
  memory ordering.

When an `_Atomic` qualified variable is accessed directly without using
any `atomic_*_explicit()` operation, the compiler will generate the
matching *seq_cst* atomic operations on your behalf.

The sequentially consistent world is extremely safe from a lot of compiler
and hardware reordering and optimization, which is great, but comes with
a huge cost in terms of memory barriers.

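For instance, a plain access to an `_Atomic` variable and its `*_explicit` *seq_cst* spelling generate the same, fully fenced code. A minimal sketch, using only standard `<stdatomic.h>`:

```c
    #include <stdatomic.h>

    static _Atomic int counter;

    /* A plain access to an _Atomic variable... */
    int
    plain_increment(void)
    {
        return ++counter; /* compiles to a seq_cst atomic RMW */
    }

    /* ...is equivalent to the explicit seq_cst spelling, with the
     * full memory barriers that this ordering implies. */
    int
    explicit_increment(void)
    {
        return atomic_fetch_add_explicit(&counter, 1,
            memory_order_seq_cst) + 1;
    }
```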

It seems very tempting to use the `atomic_*_explicit()` functions with
explicit memory orderings; however, the compiler is entitled to perform a
number of optimizations with relaxed atomics that most developers will not
expect. Indeed, the compiler is perfectly allowed to perform the various
optimizations it does with other plain memory accesses, such as coalescing,
reordering, or hoisting out of loops.

For example, when the compiler can know what `doit` is doing (which due to LTO
is almost always the case for XNU), it is allowed to transform this code:

```c
    void
    perform_with_progress(int steps, long _Atomic *progress)
    {
        for (int i = 0; i < steps; i++) {
            doit(i);
            atomic_store_explicit(progress, i, memory_order_relaxed);
        }
    }
```

Into this, which obviously defeats the entire purpose of `progress`:

```c
    void
    perform_with_progress(int steps, long _Atomic *progress)
    {
        for (int i = 0; i < steps; i++) {
            doit(i);
        }
        atomic_store_explicit(progress, steps, memory_order_relaxed);
    }
```


How `os_atomic_*` tries to address `<stdatomic.h>` pitfalls
-----------------------------------------------------------

1. the memory locations passed to the various `os_atomic_*`
   functions do not need to be marked `_Atomic` or `volatile`
   (or `_Atomic volatile`), which allows for the use of atomic
   operations in code written before C11 was even a thing.

   It is however recommended that new code use the `_Atomic`
   specifier.

2. `os_atomic_*` cannot be coalesced by the compiler:
   all accesses are performed on the specified locations
   as if their type was `_Atomic volatile` qualified.

3. `os_atomic_*` only comes in the explicit variants:
   an ordering must be provided, which is either a memory order
   named as in C11 without the `memory_order_` prefix,
   or a compiler barrier ordering: `compiler_acquire`,
   `compiler_release`, `compiler_acq_rel`.

4. `os_atomic_*` emits the proper compiler barriers that
   correspond to the requested memory ordering (using
   `atomic_signal_fence()`).

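One way to picture points 1 and 2 is that each access treats the location as if it were `_Atomic volatile` qualified. A hypothetical sketch in plain C11 (not XNU's actual implementation; the macro name is invented for illustration):

```c
    #include <stdatomic.h>

    /*
     * Hypothetical sketch (not XNU's actual implementation): a plain
     * location is accessed as if it were "_Atomic volatile" qualified,
     * so no _Atomic marking is needed, and the access cannot be
     * coalesced or elided by the compiler.
     */
    #define sketch_atomic_load(p, mo) \
        atomic_load_explicit( \
            (volatile _Atomic __typeof__(*(p)) *)(p), mo)

    static int plain_value = 42; /* note: not _Atomic */

    int
    read_once(void)
    {
        return sketch_atomic_load(&plain_value, memory_order_relaxed);
    }
```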

Best practices for the use of atomics in XNU
--------------------------------------------

For most generic code, the `os_atomic_*` functions from
`<os/atomic_private.h>` are the preferred interfaces.

The `__sync_*`, `__c11_*` and `__atomic_*` compiler builtins should not be
used.

`<stdatomic.h>` functions may be used if:

- compiler coalescing / reordering is desired (refcounting
  implementations may desire this for example).


Qualifying atomic variables with `_Atomic` or even
`_Atomic volatile` is encouraged; however, authors must
be aware that a direct access to such a variable will
result in quite heavy memory barriers.

The *consume* memory ordering should not be used
(see the *dependency* memory order later in this document).

**Note**: `<libkern/OSAtomic.h>` provides a number of legacy
atomic interfaces, but this header is considered obsolete
and these functions should not be used in new code.


High level overview of `os_atomic_*` interfaces
-----------------------------------------------

### Compiler barriers and memory fences

`os_compiler_barrier(mem_order?)` provides a compiler barrier,
with an optional barrier ordering. It is implemented with C11's
`atomic_signal_fence()`. The barrier ordering argument is optional
and defaults to the `acq_rel` compiler barrier (which prevents the
compiler from reordering code in either direction around this barrier).

`os_atomic_thread_fence(mem_order)` provides a memory barrier
according to the semantics of `atomic_thread_fence()`. It always
implies the equivalent `os_compiler_barrier()`, even on UP systems.

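In plain C11 terms, these map onto `atomic_signal_fence()` and `atomic_thread_fence()`. A sketch of the classic fence-based publication pattern they enable, written with the standard spellings rather than the `os_*` names (function names are invented for illustration):

```c
    #include <stdatomic.h>

    static int payload;
    static _Atomic int ready;

    void
    publish_with_fence(int value)
    {
        payload = value;
        /* what os_atomic_thread_fence(release) boils down to */
        atomic_thread_fence(memory_order_release);
        atomic_store_explicit(&ready, 1, memory_order_relaxed);
    }

    int
    consume_with_fence(void)
    {
        if (atomic_load_explicit(&ready, memory_order_relaxed)) {
            /* what os_atomic_thread_fence(acquire) boils down to */
            atomic_thread_fence(memory_order_acquire);
            return payload;
        }
        return -1; /* not published yet */
    }
```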
### Init, load and store

`os_atomic_init`, `os_atomic_load` and `os_atomic_store` provide
facilities equivalent to `atomic_init`, `atomic_load_explicit`
and `atomic_store_explicit` respectively.

Note that `os_atomic_load` and `os_atomic_store` promise that they will
compile to a plain load or store. `os_atomic_load_wide` and
`os_atomic_store_wide` can be used to get atomic loads and stores
that involve more costly codegen (such as compare exchange loops).

### Basic RMW (read/modify/write) atomic operations

The following basic atomic RMW operations exist:

- `inc`: atomic increment (equivalent to an atomic add of `1`),
- `dec`: atomic decrement (equivalent to an atomic sub of `1`),
- `add`: atomic add,
- `sub`: atomic sub,
- `or`: atomic bitwise or,
- `xor`: atomic bitwise xor,
- `and`: atomic bitwise and,
- `andnot`: atomic bitwise andnot (equivalent to an atomic and of `~value`),
- `min`: atomic min,
- `max`: atomic max.

For any such operation, two variants exist:

- `os_atomic_${op}_orig` (for example `os_atomic_add_orig`)
  which returns the value stored at the specified location
  *before* the atomic operation took place,
- `os_atomic_${op}` (for example `os_atomic_add`) which
  returns the value stored at the specified location
  *after* the atomic operation took place.

This convention is picked for two reasons:

1. `os_atomic_add(p, value, ...)` is essentially equivalent to the C
   in-place addition `(*p += value)`, which returns the result of the
   operation and not the original value of `*p`.

2. Most subtle atomic algorithms do actually require the original value
   stored at the location, especially for bit manipulations:
   `(os_atomic_or_orig(p, bit, relaxed) & bit)` will atomically perform
   `*p |= bit` but also tell you whether `bit` was set in the original value.

   Making it more explicit that the original value is used is hence
   important for readers and worth the extra five keystrokes.

Typically:

```c
    static int _Atomic i = 0;

    printf("%d\n", os_atomic_inc_orig(&i)); // prints 0
    printf("%d\n", os_atomic_inc(&i)); // prints 2
```

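The `or_orig` bit-test idiom from point 2 above can be written in standard C11 as follows, a sketch using `atomic_fetch_or_explicit`, which likewise returns the original value:

```c
    #include <stdatomic.h>
    #include <stdbool.h>

    static _Atomic unsigned flags;

    /* Atomically set `bit` and report whether it was already set,
     * mirroring (os_atomic_or_orig(p, bit, relaxed) & bit). */
    bool
    test_and_set_bit(unsigned bit)
    {
        return (atomic_fetch_or_explicit(&flags, bit,
            memory_order_relaxed) & bit) != 0;
    }
```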
### Atomic swap / compare and swap

`os_atomic_xchg` is a simple wrapper around `atomic_exchange_explicit`.

There are two variants of `os_atomic_cmpxchg`, which are wrappers around
`atomic_compare_exchange_strong_explicit`. Both variants return
false/0 if the compare exchange failed, and true/1 if the expected
value was found at the specified location and the new value was stored.

1. `os_atomic_cmpxchg(address, expected, new_value, mem_order)`, which
   will atomically store `new_value` at `address` if the current value
   is equal to `expected`.

2. `os_atomic_cmpxchgv(address, expected, new_value, orig_value, mem_order)`,
   which has an extra `orig_value` argument that must be a pointer to a local
   variable and will be filled with the current value at `address` whether the
   compare exchange was successful or not. In case of success, the loaded value
   will always be `expected`; in case of failure it will be filled with
   the current value, which is helpful to redrive compare exchange loops.

Unlike `atomic_compare_exchange_strong_explicit`, a single ordering is
specified, which only takes effect in case of a successful compare exchange.
In C11 speak, `os_atomic_cmpxchg*` always specifies `memory_order_relaxed`
for the failure-case ordering, as it is what is used most of the time.

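As an illustration of these semantics only (not XNU's actual implementation), `os_atomic_cmpxchgv` with an `acquire` ordering behaves like this C11 sketch, specialized to `int`:

```c
    #include <stdatomic.h>
    #include <stdbool.h>

    /*
     * Sketch of the os_atomic_cmpxchgv semantics, specialized to int
     * and the acquire ordering: a single (success) ordering, relaxed
     * on failure, and *orig_value filled in either way.
     */
    bool
    cmpxchgv_sketch(int _Atomic *address, int expected, int new_value,
        int *orig_value)
    {
        int e = expected;
        bool ok = atomic_compare_exchange_strong_explicit(address, &e,
            new_value, memory_order_acquire, memory_order_relaxed);
        *orig_value = e;
        return ok;
    }
```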
There is no wrapper around `atomic_compare_exchange_weak_explicit`,
as `os_atomic_rmw_loop` offers a much better alternative for CAS loops.

### `os_atomic_rmw_loop`

This expressive and versatile construct allows for much terser and
more readable compare exchange loops. It also uses LL/SC constructs more
efficiently than a compare exchange loop would allow.

Instead of a typical CAS loop in C11:

```c
    int _Atomic *address;
    int old_value, new_value;
    bool success = false;

    old_value = atomic_load_explicit(address, memory_order_relaxed);
    do {
        if (!validate(old_value)) {
            break;
        }
        new_value = compute_new_value(old_value);
        success = atomic_compare_exchange_weak_explicit(address, &old_value,
                new_value, memory_order_acquire, memory_order_relaxed);
    } while (__improbable(!success));
```

`os_atomic_rmw_loop` allows this form:

```c
    int _Atomic *address;
    int old_value, new_value;
    bool success;

    success = os_atomic_rmw_loop(address, old_value, new_value, acquire, {
        if (!validate(old_value)) {
            os_atomic_rmw_loop_give_up(break);
        }
        new_value = compute_new_value(old_value);
    });
```

Unlike the C11 variant, it lets the reader know in program order that this will
be a CAS loop, and exposes the ordering upfront, while for traditional CAS loops
one has to jump to the end of the code to understand what it does.

Any control flow that attempts to exit the scope of the loop needs to be
wrapped with `os_atomic_rmw_loop_give_up` (so that LL/SC architectures can
abort their open LL/SC transaction).

Because these loops are LL/SC transactions, it is undefined to perform
any store to memory (register operations are fine) within these loops,
as such stores may cause the store-conditional to always fail.
In particular, nesting of `os_atomic_rmw_loop` is invalid.

Use of `continue` within an `os_atomic_rmw_loop` is also invalid; instead, an
`os_atomic_rmw_loop_give_up(goto again)` jumping to an `again:` label placed
before the loop should be used, in this way:

```c
    int _Atomic *address;
    int old_value, new_value;
    bool success;

again:
    success = os_atomic_rmw_loop(address, old_value, new_value, acquire, {
        if (needs_some_store_that_can_thwart_the_transaction(old_value)) {
            os_atomic_rmw_loop_give_up({
                // Do whatever you need to do/store to central memory
                // that would cause the loop to always fail
                do_my_rmw_loop_breaking_store();

                // And only then redrive.
                goto again;
            });
        }
        if (!validate(old_value)) {
            os_atomic_rmw_loop_give_up(break);
        }
        new_value = compute_new_value(old_value);
    });
```

### The *dependency* memory order

Because the C11 *consume* memory order is broken in various ways,
most compilers, clang included, implement it as equivalent
to `memory_order_acquire`. However, the concept is useful
for certain algorithms.

As an attempt to provide a replacement for this, `<os/atomic_private.h>`
implements an entirely new *dependency* memory ordering.

The purpose of this ordering is to provide a relaxed load followed by an
implicit compiler barrier, which can be used as the root of a chain of hardware
dependencies that would otherwise pair with store-releases done at this address,
very much like what the *consume* memory order is intended to provide.

However, unlike the *consume* memory ordering, where the compiler had to follow
the dependencies, the *dependency* memory ordering relies on explicit
annotations of where the dependencies are expected:

- loads through a pointer loaded with a *dependency* memory ordering
  will provide a hardware dependency,

- dependencies may be injected into other loads not performed through this
  particular pointer with the `os_atomic_load_with_dependency_on` and
  `os_atomic_inject_dependency` interfaces.

Here is an example of how it is meant to be used:

```c
    struct foo {
        long value;
        long _Atomic flag;
    };

    void
    publish(struct foo *p, long value)
    {
        p->value = value;
        os_atomic_store(&p->flag, 1, release);
    }


    bool
    broken_read(struct foo *p, long *value)
    {
        /*
         * This isn't safe, as there's absolutely no hardware dependency involved.
         * Using an acquire barrier would of course fix it but is quite expensive...
         */
        if (os_atomic_load(&p->flag, relaxed)) {
            *value = p->value;
            return true;
        }
        return false;
    }

    bool
    valid_read(struct foo *p, long *value)
    {
        long flag = os_atomic_load(&p->flag, dependency);
        if (flag) {
            /*
             * Further the chain of dependency to any loads through `p`
             * which properly pair with the release barrier in `publish`.
             */
            *value = os_atomic_load_with_dependency_on(&p->value, flag);
            return true;
        }
        return false;
    }
```

There are four interfaces involved with hardware dependencies:

1. `os_atomic_load(..., dependency)` to initiate roots of hardware dependencies,
   which should pair with a store or RMW with release semantics or stronger
   (release, acq\_rel or seq\_cst),

2. `os_atomic_inject_dependency`, which can be used to inject the dependency
   provided by a *dependency* load, or by any other value that has had a
   dependency injected,

3. `os_atomic_load_with_dependency_on` to do an otherwise unrelated relaxed
   load that still prolongs a dependency chain,

4. `os_atomic_make_dependency` to create an opaque token out of a given
   dependency root to inject into multiple loads.

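As an illustration only (not XNU's implementation), a dependency token can be pictured as a value the optimizer cannot see is zero, folded into the address of a later load. A hedged sketch, using a GCC/clang empty asm to hide the value (function names are invented):

```c
    #include <stdint.h>

    /*
     * Illustrative sketch only (not XNU's implementation): fold a
     * value the CPU must have computed into the address of a later
     * load. `dep ^ value` is always 0, but the empty asm hides that
     * from the optimizer, so the address keeps a data dependency
     * on `value` in the generated code.
     */
    static inline uintptr_t
    make_dependency_sketch(unsigned long value)
    {
        uintptr_t dep = (uintptr_t)value;
        __asm__ __volatile__("" : "+r"(dep));
        return dep ^ (uintptr_t)value;
    }

    long
    load_with_dependency_sketch(long *target, unsigned long root)
    {
        /* the address now depends on `root` in the generated code */
        return *(long *)((uintptr_t)target +
            make_dependency_sketch(root));
    }
```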

**Note**: this technique is NOT safe when the compiler can reason about the
pointers that you are manipulating: if, for example, the compiler can prove
that a pointer can only take a couple of values, it may ditch all these
manually crafted dependency chains. Hopefully a future C2Y standard will
provide a similar construct as a language feature instead.
418