XNU use of Atomics and Memory Barriers
======================================

Goal
----

This document discusses the use of atomics and memory barriers in XNU. It is
meant as a guide to best practices, and warns against a variety of possible
pitfalls in the handling of atomics in C.

It is assumed that the reader has a decent understanding of
the [C11 memory model](https://en.cppreference.com/w/c/atomic/memory_order)
as this document builds on it, and explains the liberties XNU takes with said
model.

All the interfaces discussed in this document are available through
the `<os/atomic_private.h>` header.

Note: Linux has thorough documentation around memory barriers
(Documentation/memory-barriers.txt), some of which is Linux specific,
but most is not and is a valuable read.


Vocabulary
----------

In the rest of this document we'll refer to the various memory orderings defined
by C11 as relaxed, consume, acquire, release, acq\_rel and seq\_cst.

`os_atomic` also tries to make the distinction between compiler **barriers**
(which limit how much the compiler can reorder code), and memory **fences**.


The dangers and pitfalls of C11's `<stdatomic.h>`
-------------------------------------------------

While the C11 memory model has likely been one of the most important additions
to modern C, in the purest C tradition, it is a sharp tool.

By default, C11 comes with two variants of each atomic "operation":

- an *explicit* variant where memory orderings can be specified,
- a regular variant which is equivalent to the former with the *seq_cst*
  memory ordering.

When an `_Atomic` qualified variable is accessed directly without using
any `atomic_*_explicit()` operation, then the compiler will generate the
matching *seq_cst* atomic operations on your behalf.

The sequentially consistent world is extremely safe from a lot of compiler
and hardware reorderings and optimizations, which is great, but comes with
a huge cost in terms of memory barriers.
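
As a minimal sketch in plain C11 (this is not XNU code, and the `counter`
variable and the `bump_*` function names are made up for illustration), the
three functions below perform the same atomic increment. The first two are
sequentially consistent even though no ordering appears in the source, only
the third one opts into a weaker ordering:

```c
#include <stdatomic.h>

/* Hypothetical counter, for illustration only. */
static _Atomic int counter;

/* Direct access: the compiler emits a seq_cst read-modify-write. */
int
bump_direct(void)
{
    return counter++;
}

/* Regular variant: same as the explicit one with memory_order_seq_cst. */
int
bump_regular(void)
{
    return atomic_fetch_add(&counter, 1);
}

/* Explicit variant: the ordering is chosen by the caller. */
int
bump_relaxed(void)
{
    return atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
}
```

All three return the value held before the increment.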


It seems very tempting to use `atomic_*_explicit()` functions with explicit
memory orderings, however, the compiler is entitled to perform a number of
optimizations with relaxed atomics, that most developers will not expect.
Indeed, the compiler is perfectly allowed to perform various optimizations it
does with other plain memory accesses such as coalescing, reordering, hoisting
out of loops, ...

For example, when the compiler can know what `doit` is doing (which due to LTO
is almost always the case for XNU), it is allowed to transform this code:

```c
void
perform_with_progress(int steps, long _Atomic *progress)
{
    for (int i = 0; i < steps; i++) {
        doit(i);
        atomic_store_explicit(progress, i, memory_order_relaxed);
    }
}
```

Into this, which obviously defeats the entire purpose of `progress`:

```c
void
perform_with_progress(int steps, long _Atomic *progress)
{
    for (int i = 0; i < steps; i++) {
        doit(i);
    }
    if (steps > 0) {
        /* Only the last store of the loop survives. */
        atomic_store_explicit(progress, steps - 1, memory_order_relaxed);
    }
}
```


How `os_atomic_*` tries to address `<stdatomic.h>` pitfalls
-----------------------------------------------------------

1. the memory locations passed to the various `os_atomic_*`
   functions do not need to be marked `_Atomic` or `volatile`
   (or `_Atomic volatile`), which allows for the use of atomic
   operations in code written before C11 was even a thing.

   It is however recommended in new code to use the `_Atomic`
   specifier.

2. `os_atomic_*` cannot be coalesced by the compiler:
   all accesses are performed on the specified locations
   as if their type was `_Atomic volatile` qualified.

3. `os_atomic_*` only comes with the explicit variants:
   orderings must be provided and can express either memory orders
   where the name is the same as in C11 without the `memory_order_` prefix,
   or a compiler barrier ordering `compiler_acquire`, `compiler_release`,
   `compiler_acq_rel`.

4. `os_atomic_*` emits the proper compiler barriers that
   correspond to the requested memory ordering (using
   `atomic_signal_fence()`).


Best practices for the use of atomics in XNU
--------------------------------------------

For most generic code, the `os_atomic_*` functions from
`<os/atomic_private.h>` are the preferred interfaces.

`__sync_*`, `__c11_*` and `__atomic_*` compiler builtins should not be used.

`<stdatomic.h>` functions may be used if:

- compiler coalescing / reordering is desired (refcounting
  implementations may desire this for example).


Qualifying atomic variables with `_Atomic` or even
`_Atomic volatile` is encouraged, however authors must
be aware that a direct access to such a variable will
result in quite heavy memory barriers.

The *consume* memory ordering should not be used
(see the *dependency* memory order later in this document).
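
To illustrate, in plain C11, the "as if `_Atomic volatile` qualified" behavior
that this document attributes to `os_atomic_*`, here is a sketch of the
`perform_with_progress` example from earlier where the store goes through a
volatile-qualified pointer. This is only an illustration of the idea, not a
description of how `<os/atomic_private.h>` is implemented, and `doit` is a
stand-in stub:

```c
#include <stdatomic.h>

/* Stand-in for the real work, hypothetical. */
static void
doit(int step)
{
    (void)step;
}

void
perform_with_progress(int steps, long _Atomic *progress)
{
    /*
     * Accesses through a volatile-qualified lvalue are observable side
     * effects, so the compiler has to perform each store of the loop
     * rather than coalescing them into one.
     */
    long _Atomic volatile *p = progress;

    for (int i = 0; i < steps; i++) {
        doit(i);
        atomic_store_explicit(p, i, memory_order_relaxed);
    }
}
```

The store is still relaxed: volatile only constrains the compiler here, it
does not add any hardware ordering.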

**Note**: `<libkern/OSAtomic.h>` provides a bunch of legacy
atomic interfaces, but this header is considered obsolete
and these functions should not be used in new code.


High level overview of `os_atomic_*` interfaces
-----------------------------------------------

### Compiler barriers and memory fences

`os_compiler_barrier(mem_order?)` provides a compiler barrier,
with an optional barrier ordering. It is implemented with C11's
`atomic_signal_fence()`. The barrier ordering argument is optional
and defaults to the `acq_rel` compiler barrier (which prevents the
compiler from reordering code in any direction around this barrier).

`os_atomic_thread_fence(mem_order)` provides a memory barrier
according to the semantics of `atomic_thread_fence()`. It always
implies the equivalent `os_compiler_barrier()` even on UP systems.

### Init, load and store

`os_atomic_init`, `os_atomic_load` and `os_atomic_store` provide
facilities equivalent to `atomic_init`, `atomic_load_explicit`
and `atomic_store_explicit` respectively.
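
The C11 primitives that these interfaces are described in terms of can be
sketched as follows, with a simple message-passing pattern built from relaxed
accesses and explicit fences. All the names below (`payload`, `ready`,
`channel_init`, `producer`, `consumer`, `compiler_barrier`) are hypothetical,
and XNU code would use the `os_*` spellings instead:

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;
static _Atomic int ready;

void
channel_init(void)
{
    /* Non-atomic initialization, before the variable is shared. */
    atomic_init(&ready, 0);
}

void
producer(int value)
{
    payload = value;
    /* Memory fence: orders the payload store before the flag store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

bool
consumer(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed)) {
        return false;
    }
    /* Pairs with the release fence in producer(). */
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}

void
compiler_barrier(void)
{
    /* Compiler-only barrier: constrains the compiler, not the hardware. */
    atomic_signal_fence(memory_order_acq_rel);
}
```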

Note that `os_atomic_load` and `os_atomic_store` promise that they will
compile to a plain load or store. `os_atomic_load_wide` and
`os_atomic_store_wide` can be used to have access to atomic loads and stores
that involve more costly codegen (such as compare exchange loops).

### Basic RMW (read/modify/write) atomic operations

The following basic atomic RMW operations exist:

- `inc`: atomic increment (equivalent to an atomic add of `1`),
- `dec`: atomic decrement (equivalent to an atomic sub of `1`),
- `add`: atomic add,
- `sub`: atomic sub,
- `or`: atomic bitwise or,
- `xor`: atomic bitwise xor,
- `and`: atomic bitwise and,
- `andnot`: atomic bitwise andnot (equivalent to an atomic and of `~value`),
- `min`: atomic min,
- `max`: atomic max.
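
C11 has no direct counterpart for `andnot`, `min` and `max`. As a rough sketch
of their semantics only (the `sketch_*` helpers are hypothetical, the ordering
is hard-coded to relaxed, and this is not the XNU implementation), they could
be expressed with `<stdatomic.h>` like this:

```c
#include <stdatomic.h>

/* Clears the bits of `mask` in *p, returns the value held before. */
unsigned
sketch_andnot(_Atomic unsigned *p, unsigned mask)
{
    return atomic_fetch_and_explicit(p, ~mask, memory_order_relaxed);
}

/* Stores max(*p, value) in *p, returns the value held before. */
int
sketch_max(_Atomic int *p, int value)
{
    int old = atomic_load_explicit(p, memory_order_relaxed);

    while (old < value &&
        !atomic_compare_exchange_weak_explicit(p, &old, value,
        memory_order_relaxed, memory_order_relaxed)) {
        /* `old` was refreshed by the failed exchange, try again. */
    }
    return old;
}
```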

For any such operation, two variants exist:

- `os_atomic_${op}_orig` (for example `os_atomic_add_orig`)
  which returns the value stored at the specified location
  *before* the atomic operation took place
- `os_atomic_${op}` (for example `os_atomic_add`) which
  returns the value stored at the specified location
  *after* the atomic operation took place

This convention is picked for two reasons:

1. `os_atomic_add(p, value, ...)` is essentially equivalent to the C
   in place addition `(*p += value)` which returns the result of the
   operation and not the original value of `*p`.

2. Most subtle atomic algorithms do actually require the original value
   stored at the location, especially for bit manipulations:
   `(os_atomic_or_orig(p, bit, relaxed) & bit)` will atomically perform
   `*p |= bit` but also tell you whether `bit` was set in the original value.

   Making it more explicit that the original value is used is hence
   important for readers and worth the extra five keystrokes.
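
In plain C11 terms, and only as a sketch (the `sketch_*` helpers are
hypothetical and the ordering is fixed to relaxed), the two conventions relate
to `atomic_fetch_*` as follows: the `_orig` flavor is what `atomic_fetch_*`
returns, and the other flavor re-applies the operation to that result:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Like the non-`_orig` flavor: returns the value after the addition. */
int
sketch_add(_Atomic int *p, int value)
{
    return atomic_fetch_add_explicit(p, value, memory_order_relaxed) + value;
}

/*
 * Sets `bit` and reports whether it was already set, in the spirit of
 * (os_atomic_or_orig(p, bit, relaxed) & bit).
 */
bool
sketch_test_and_set_bit(_Atomic unsigned *p, unsigned bit)
{
    return (atomic_fetch_or_explicit(p, bit, memory_order_relaxed) & bit) != 0;
}
```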

Typically:

```c
static int _Atomic i = 0;

printf("%d\n", os_atomic_inc_orig(&i, relaxed)); // prints 0
printf("%d\n", os_atomic_inc(&i, relaxed));      // prints 2
```

### Atomic swap / compare and swap

`os_atomic_xchg` is a simple wrapper around `atomic_exchange_explicit`.

There are two variants of `os_atomic_cmpxchg` which are wrappers around
`atomic_compare_exchange_strong_explicit`. Both of these variants will
return false/0 if the compare exchange failed, and true/1 if the expected
value was found at the specified location and the new value was stored.

1. `os_atomic_cmpxchg(address, expected, new_value, mem_order)` which
   will atomically store `new_value` at `address` if the current value
   is equal to `expected`.

2. `os_atomic_cmpxchgv(address, expected, new_value, orig_value, mem_order)`
   which has an extra `orig_value` argument which must be a pointer to a local
   variable and will be filled with the current value at `address` whether the
   compare exchange was successful or not. In case of success, the loaded value
   will always be `expected`, however in case of failure it will be filled with
   the current value, which is helpful to redrive compare exchange loops.

Unlike `atomic_compare_exchange_strong_explicit`, a single ordering is
specified, which only takes effect in case of a successful compare exchange.
In C11 speak, `os_atomic_cmpxchg*` always specifies `memory_order_relaxed`
for the failure case ordering, as it is what is used most of the time.

There is no wrapper around `atomic_compare_exchange_weak_explicit`,
as `os_atomic_rmw_loop` offers a much better alternative for CAS-loops.

### `os_atomic_rmw_loop`

This expressive and versatile construct allows for really terse and
way more readable compare exchange loops. It also uses LL/SC constructs more
efficiently than a compare exchange loop would allow.

Instead of a typical CAS-loop in C11:

```c
int _Atomic *address;
int old_value, new_value;
bool success = false;

old_value = atomic_load_explicit(address, memory_order_relaxed);
do {
    if (!validate(old_value)) {
        break;
    }
    new_value = compute_new_value(old_value);
    success = atomic_compare_exchange_weak_explicit(address, &old_value,
        new_value, memory_order_acquire, memory_order_relaxed);
} while (__improbable(!success));
```

`os_atomic_rmw_loop` allows this form:

```c
int _Atomic *address;
int old_value, new_value;
bool success;

success = os_atomic_rmw_loop(address, old_value, new_value, acquire, {
    if (!validate(old_value)) {
        os_atomic_rmw_loop_give_up(break);
    }
    new_value = compute_new_value(old_value);
});
```

Unlike the C11 variant, it lets the reader know in program order that this will
be a CAS loop, and exposes the ordering upfront, while for traditional CAS loops
one has to jump to the end of the code to understand what it does.

Any control flow that attempts to exit the scope of the loop needs to be
wrapped with `os_atomic_rmw_loop_give_up` (so that LL/SC architectures can
abort their opened LL/SC transaction).

Because these loops are LL/SC transactions, it is undefined to perform
any store to memory (register operations are fine) within these loops,
as these may cause the store-conditional to always fail.
In particular nesting of `os_atomic_rmw_loop` is invalid.

Use of `continue` within an `os_atomic_rmw_loop` is also invalid. Instead, an
`os_atomic_rmw_loop_give_up(goto again)` jumping to an `again:` label placed
before the loop should be used, in this way:

```c
    int _Atomic *address;
    int old_value, new_value;
    bool success;

again:
    success = os_atomic_rmw_loop(address, old_value, new_value, acquire, {
        if (needs_some_store_that_can_thwart_the_transaction(old_value)) {
            os_atomic_rmw_loop_give_up({
                // Do whatever you need to do/store to central memory
                // that would cause the loop to always fail
                do_my_rmw_loop_breaking_store();

                // And only then redrive.
                goto again;
            });
        }
        if (!validate(old_value)) {
            os_atomic_rmw_loop_give_up(break);
        }
        new_value = compute_new_value(old_value);
    });
```

### The *dependency* memory order

Because the C11 *consume* memory order is broken in various ways,
most compilers, clang included, implement it as an equivalent
for `memory_order_acquire`. However, its concept is useful
for certain algorithms.

As an attempt to provide a replacement for this, `<os/atomic_private.h>`
implements an entirely new *dependency* memory ordering.

The purpose of this ordering is to provide a relaxed load followed by an
implicit compiler barrier, that can be used as a root for a chain of hardware
dependencies that would otherwise pair with store-releases done at this address,
very much like the *consume* memory order is intended to provide.

However, unlike the *consume* memory ordering where the compiler had to follow
the dependencies, the *dependency* memory ordering relies on explicit
annotations of when the dependencies are expected:

- loads through a pointer loaded with a *dependency* memory ordering
  will provide a hardware dependency,

- dependencies may be injected into other loads not performed through this
  particular pointer with the `os_atomic_load_with_dependency_on` and
  `os_atomic_inject_dependency` interfaces.

Here is an example of how it is meant to be used:

```c
struct foo {
    long value;
    long _Atomic flag;
};

void
publish(struct foo *p, long value)
{
    p->value = value;
    os_atomic_store(&p->flag, 1, release);
}


bool
broken_read(struct foo *p, long *value)
{
    /*
     * This isn't safe, as there's absolutely no hardware dependency involved.
     * Using an acquire barrier would of course fix it but is quite expensive...
     */
    if (os_atomic_load(&p->flag, relaxed)) {
        *value = p->value;
        return true;
    }
    return false;
}

bool
valid_read(struct foo *p, long *value)
{
    long flag = os_atomic_load(&p->flag, dependency);
    if (flag) {
        /*
         * Further the chain of dependency to any loads through `p`
         * which properly pair with the release barrier in `publish`.
         */
        *value = os_atomic_load_with_dependency_on(&p->value, flag);
        return true;
    }
    return false;
}
```

There are 4 interfaces involved with hardware dependencies:

1. `os_atomic_load(..., dependency)` to initiate roots of hardware dependencies,
   that should pair with a store or rmw with release semantics or stronger
   (release, acq\_rel or seq\_cst),

2. `os_atomic_inject_dependency` can be used to inject the dependency provided
   by a *dependency* load, or any other value that has had a dependency
   injected,

3. `os_atomic_load_with_dependency_on` to do a relaxed load that does not
   otherwise depend on the root, but still prolongs its dependency chain,

4. `os_atomic_make_dependency` to create an opaque token out of a given
   dependency root to inject into multiple loads.


**Note**: this technique is NOT safe when the compiler can reason about the
pointers that you are manipulating, for example if the compiler can know that
the pointer can only take a couple of values and ditch all these manually
crafted dependency chains. Hopefully there will be a future C2Y standard that
provides a similar construct as a language feature instead.
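
For comparison with `valid_read`, the portable fallback that does not rely on
hand-built dependency chains is a plain acquire load, at the barrier cost that
the *dependency* ordering tries to avoid. A minimal C11 sketch, where the
struct mirrors the example above and the `_c11` function names are made up
for this illustration:

```c
#include <stdatomic.h>
#include <stdbool.h>

struct foo {
    long value;
    long _Atomic flag;
};

void
publish_c11(struct foo *p, long value)
{
    p->value = value;
    atomic_store_explicit(&p->flag, 1, memory_order_release);
}

bool
acquire_read_c11(struct foo *p, long *value)
{
    /* The acquire load pairs with the release store in publish_c11(). */
    if (atomic_load_explicit(&p->flag, memory_order_acquire)) {
        *value = p->value;
        return true;
    }
    return false;
}
```

This version stays correct even when the compiler can reason about the
pointers involved, since it does not depend on any address dependency.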