XNU use of Atomics and Memory Barriers
======================================

How to use atomics and memory barriers in xnu.

Goal
----

This document discusses the use of atomics and memory barriers in XNU. It is
meant as a guide to best practices, and warns against a variety of possible
pitfalls in the handling of atomics in C.

It is assumed that the reader has a decent understanding of
the [C11 memory model](https://en.cppreference.com/w/c/atomic/memory_order)
as this document builds on it, and explains the liberties XNU takes with said
model.

All the interfaces discussed in this document are available through
the `<os/atomic_private.h>` header.

Note: Linux has thorough documentation around memory barriers
(Documentation/memory-barriers.txt), some of which is Linux specific,
but most is not and is a valuable read.


Vocabulary
----------

In the rest of this document we'll refer to the various memory orderings defined
by C11 as relaxed, consume, acquire, release, acq\_rel and seq\_cst.

`os_atomic` also tries to make the distinction between compiler **barriers**
(which limit how much the compiler can reorder code), and memory **fences**.


The dangers and pitfalls of C11's `<stdatomic.h>`
-------------------------------------------------

While the C11 memory model has likely been one of the most important additions
to modern C, in the purest C tradition, it is a sharp tool.

By default, C11 comes with two variants of each atomic "operation":

- an *explicit* variant where memory orderings can be specified,
- a regular variant which is equivalent to the former with the *seq_cst*
  memory ordering.

When an `_Atomic` qualified variable is accessed directly without using
any `atomic_*_explicit()` operation, then the compiler will generate the
matching *seq_cst* atomic operations on your behalf.
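
As a minimal sketch of that difference, written in plain C11 with hypothetical
names (`counter`, `bump_implicit` and `bump_relaxed` are not XNU interfaces),
the two functions below perform the same atomic increment, but only the first
one implicitly requests the *seq_cst* ordering:

```c
#include <stdatomic.h>

static _Atomic int counter;

/*
 * Direct access to an `_Atomic` variable: the compiler emits a
 * seq_cst read-modify-write, as if atomic_fetch_add() had been called.
 */
int
bump_implicit(void)
{
    return ++counter;
}

/*
 * Explicit variant: the increment is just as atomic, but only the
 * relaxed ordering is requested.
 */
int
bump_relaxed(void)
{
    return atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed) + 1;
}
```

Both functions return the incremented value; they only differ in the ordering
guarantees (and hence barriers) the compiler has to provide.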

The sequentially consistent world is extremely safe from a lot of compiler
and hardware reorderings and optimizations, which is great, but comes with
a huge cost in terms of memory barriers.


It seems very tempting to use `atomic_*_explicit()` functions with explicit
memory orderings. However, the compiler is entitled to perform a number of
optimizations with relaxed atomics that most developers will not expect.
Indeed, the compiler is perfectly allowed to perform various optimizations it
does with other plain memory accesses such as coalescing, reordering, hoisting
out of loops, ...

For example, when the compiler can know what `doit` is doing (which due to LTO
is almost always the case for XNU), it is allowed to transform this code:

```c
    void
    perform_with_progress(int steps, long _Atomic *progress)
    {
        for (int i = 0; i < steps; i++) {
            doit(i);
            atomic_store_explicit(progress, i, memory_order_relaxed);
        }
    }
```

Into this, which obviously defeats the entire purpose of `progress`:

```c
    void
    perform_with_progress(int steps, long _Atomic *progress)
    {
        for (int i = 0; i < steps; i++) {
            doit(i);
        }
        atomic_store_explicit(progress, steps, memory_order_relaxed);
    }
```


How `os_atomic_*` tries to address `<stdatomic.h>` pitfalls
-----------------------------------------------------------

1. The memory locations passed to the various `os_atomic_*`
   functions do not need to be marked `_Atomic` or `volatile`
   (or `_Atomic volatile`), which allows for use of atomic
   operations in code written before C11 was even a thing.

   It is however recommended in new code to use the `_Atomic`
   specifier.

2. `os_atomic_*` cannot be coalesced by the compiler:
   all accesses are performed on the specified locations
   as if their type was `_Atomic volatile` qualified.

3. `os_atomic_*` only comes with the explicit variants:
   orderings must be provided and can express either memory orders
   where the name is the same as in C11 without the `memory_order_` prefix,
   or a compiler barrier ordering `compiler_acquire`, `compiler_release`,
   `compiler_acq_rel`.

4. `os_atomic_*` emits the proper compiler barriers that
   correspond to the requested memory ordering (using
   `atomic_signal_fence()`).
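
Point 2 can be approximated in plain C11 by performing the access through a
`volatile`-qualified pointer. The sketch below revisits the progress example
under that assumption; it only illustrates the "as if `_Atomic volatile`"
behavior and is not the actual `os_atomic_store` implementation, and `doit`
is a hypothetical stub:

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the real work done at each step. */
static void
doit(int step)
{
    (void)step;
}

void
perform_with_progress(int steps, long _Atomic *progress)
{
    /*
     * Accesses through a volatile-qualified pointer are observable
     * side effects: the compiler must emit one store per iteration
     * and cannot merge them or sink them out of the loop.
     */
    volatile long _Atomic *p = progress;

    for (int i = 0; i < steps; i++) {
        doit(i);
        atomic_store_explicit(p, i, memory_order_relaxed);
    }
}
```

The stores remain relaxed, so no hardware barrier is added; only the compiler
transformations are constrained.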


Best practices for the use of atomics in XNU
--------------------------------------------

For most generic code, the `os_atomic_*` functions from
`<os/atomic_private.h>` are the preferred interfaces.

`__sync_*`, `__c11_*` and `__atomic_*` compiler builtins should not be used.

`<stdatomic.h>` functions may be used if:

- compiler coalescing / reordering is desired (refcounting
  implementations may desire this for example).


Qualifying atomic variables with `_Atomic` or even
`_Atomic volatile` is encouraged, however authors must
be aware that a direct access to this variable will
result in quite heavy memory barriers.

The *consume* memory ordering should not be used
(See *dependency* memory order later in this documentation).

**Note**: `<libkern/OSAtomic.h>` provides a bunch of legacy
atomic interfaces, but this header is considered obsolete
and these functions should not be used in new code.
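
To illustrate the refcounting case mentioned earlier in this section, here is
a minimal sketch of a reference count built directly on `<stdatomic.h>`.
`struct obj`, `obj_retain` and `obj_release` are hypothetical names used for
illustration only and are not XNU interfaces. Because the accesses are not
`volatile`, the compiler stays free to merge adjacent retains, which is the
kind of coalescing this use case may want:

```c
#include <stdatomic.h>
#include <stdbool.h>

struct obj {
    unsigned _Atomic refs;
};

/* Taking a reference needs atomicity but no ordering. */
void
obj_retain(struct obj *o)
{
    atomic_fetch_add_explicit(&o->refs, 1, memory_order_relaxed);
}

/* Returns true when the caller dropped the last reference. */
bool
obj_release(struct obj *o)
{
    /* release: prior writes to the object happen before the drop */
    if (atomic_fetch_sub_explicit(&o->refs, 1, memory_order_release) != 1) {
        return false;
    }
    /* acquire: see every other thread's writes before tearing down */
    atomic_thread_fence(memory_order_acquire);
    return true;
}
```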


High level overview of `os_atomic_*` interfaces
-----------------------------------------------

### Compiler barriers and memory fences

`os_compiler_barrier(mem_order?)` provides a compiler barrier.
It is implemented with C11's `atomic_signal_fence()`.
The barrier ordering argument is optional
and defaults to the `acq_rel` compiler barrier (which prevents the
compiler from reordering code in any direction around this barrier).

`os_atomic_thread_fence(mem_order)` provides a memory barrier
according to the semantics of `atomic_thread_fence()`. It always
implies the equivalent `os_compiler_barrier()` even on UP systems.

### Init, load and store

`os_atomic_init`, `os_atomic_load` and `os_atomic_store` provide
facilities equivalent to `atomic_init`, `atomic_load_explicit`
and `atomic_store_explicit` respectively.

Note that `os_atomic_load` and `os_atomic_store` promise that they will
compile to a plain load or store. `os_atomic_load_wide` and
`os_atomic_store_wide` can be used to have access to atomic loads and stores
that involve more costly codegen (such as compare exchange loops).

### Basic RMW (read/modify/write) atomic operations

The following basic atomic RMW operations exist:

- `inc`: atomic increment (equivalent to an atomic add of `1`),
- `dec`: atomic decrement (equivalent to an atomic sub of `1`),
- `add`: atomic add,
- `sub`: atomic sub,
- `or`: atomic bitwise or,
- `xor`: atomic bitwise xor,
- `and`: atomic bitwise and,
- `andnot`: atomic bitwise andnot (equivalent to atomic and of ~value),
- `min`: atomic min,
- `max`: atomic max.
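
Some of these operations (`andnot`, `min`, `max`) have no direct
`<stdatomic.h>` counterpart. The sketch below shows one way their semantics
could be expressed in plain C11; `andnot_sketch` and `min_sketch` are
hypothetical helpers written for this document, fixed to the relaxed
ordering, and say nothing about how `os_atomic_*` is actually implemented:

```c
#include <stdatomic.h>

/* andnot: atomically clear the bits of `mask`, return the new value. */
unsigned
andnot_sketch(unsigned _Atomic *p, unsigned mask)
{
    return atomic_fetch_and_explicit(p, ~mask, memory_order_relaxed) & ~mask;
}

/* min: atomically store min(*p, value), return the new value. */
int
min_sketch(int _Atomic *p, int value)
{
    int old = atomic_load_explicit(p, memory_order_relaxed);

    while (value < old &&
        !atomic_compare_exchange_weak_explicit(p, &old, value,
        memory_order_relaxed, memory_order_relaxed)) {
        /* `old` was refreshed by the failed compare exchange: retry */
    }
    return value < old ? value : old;
}
```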

For any such operation, two variants exist:

- `os_atomic_${op}_orig` (for example `os_atomic_add_orig`)
  which returns the value stored at the specified location
  *before* the atomic operation took place
- `os_atomic_${op}` (for example `os_atomic_add`) which
  returns the value stored at the specified location
  *after* the atomic operation took place

This convention is picked for two reasons:

1. `os_atomic_add(p, value, ...)` is essentially equivalent to the C
   in place addition `(*p += value)` which returns the result of the
   operation and not the original value of `*p`.

2. Most subtle atomic algorithms do actually require the original value
   stored at the location, especially for bit manipulations:
   `(os_atomic_or_orig(p, bit, relaxed) & bit)` will atomically perform
   `*p |= bit` but also tell you whether `bit` was set in the original value.

   Making it more explicit that the original value is used is hence
   important for readers and worth the extra five keystrokes.
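
In `<stdatomic.h>` terms, the `atomic_fetch_*` functions always return the
original value, so the two flavors can be sketched as follows. The helper
names (`add_orig_sketch`, `add_sketch`, `set_bit_was_clear`) are hypothetical
and the ordering is fixed to relaxed for brevity:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Like the `_orig` flavor: returns the value before the addition. */
long
add_orig_sketch(long _Atomic *p, long value)
{
    return atomic_fetch_add_explicit(p, value, memory_order_relaxed);
}

/* Like the plain flavor: returns the value after the addition. */
long
add_sketch(long _Atomic *p, long value)
{
    return atomic_fetch_add_explicit(p, value, memory_order_relaxed) + value;
}

/*
 * Sets `bit` and uses the original value to report whether this
 * caller is the one that flipped it from clear to set.
 */
bool
set_bit_was_clear(unsigned _Atomic *p, unsigned bit)
{
    return (atomic_fetch_or_explicit(p, bit, memory_order_relaxed) & bit) == 0;
}
```

Only the first caller of `set_bit_was_clear` for a given bit gets `true`,
which is the kind of decision that needs the original value.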

Typically:

```c
    static int _Atomic i = 0;

    printf("%d\n", os_atomic_inc_orig(&i, relaxed)); // prints 0
    printf("%d\n", os_atomic_inc(&i, relaxed)); // prints 2
```

### Atomic swap / compare and swap

`os_atomic_xchg` is a simple wrapper around `atomic_exchange_explicit`.

There are two variants of `os_atomic_cmpxchg` which are wrappers around
`atomic_compare_exchange_strong_explicit`. Both of these variants will
return false/0 if the compare exchange failed, and true/1 if the expected
value was found at the specified location and the new value was stored.

1. `os_atomic_cmpxchg(address, expected, new_value, mem_order)` which
   will atomically store `new_value` at `address` if the current value
   is equal to `expected`.

2. `os_atomic_cmpxchgv(address, expected, new_value, orig_value, mem_order)`
   which has an extra `orig_value` argument which must be a pointer to a local
   variable and will be filled with the current value at `address` whether the
   compare exchange was successful or not. In case of success, the loaded value
   will always be `expected`, however in case of failure it will be filled with
   the current value, which is helpful to redrive compare exchange loops.

Unlike `atomic_compare_exchange_strong_explicit`, a single ordering is
specified, which only takes effect in case of a successful compare exchange.
In C11 speak, `os_atomic_cmpxchg*` always specifies `memory_order_relaxed`
for the failure case ordering, as it is what is used most of the time.

There is no wrapper around `atomic_compare_exchange_weak_explicit`,
as `os_atomic_rmw_loop` offers a much better alternative for CAS-loops.

### `os_atomic_rmw_loop`

This expressive and versatile construct allows for really terse and
way more readable compare exchange loops. It also uses LL/SC constructs more
efficiently than a compare exchange loop would allow.

Instead of a typical CAS-loop in C11:

```c
    int _Atomic *address;
    int old_value, new_value;
    bool success = false;

    old_value = atomic_load_explicit(address, memory_order_relaxed);
    do {
        if (!validate(old_value)) {
            break;
        }
        new_value = compute_new_value(old_value);
        success = atomic_compare_exchange_weak_explicit(address, &old_value,
                new_value, memory_order_acquire, memory_order_relaxed);
    } while (__improbable(!success));
```

`os_atomic_rmw_loop` allows this form:

```c
    int _Atomic *address;
    int old_value, new_value;
    bool success;

    success = os_atomic_rmw_loop(address, old_value, new_value, acquire, {
        if (!validate(old_value)) {
            os_atomic_rmw_loop_give_up(break);
        }
        new_value = compute_new_value(old_value);
    });
```

Unlike the C11 variant, it lets the reader know in program order that this will
be a CAS loop, and exposes the ordering upfront, while for traditional CAS loops
one has to jump to the end of the code to understand what it does.

Any control flow that attempts to exit the scope of the loop needs to be
wrapped with `os_atomic_rmw_loop_give_up` (so that LL/SC architectures can
abort their opened LL/SC transaction).

Because these loops are LL/SC transactions, it is undefined to perform
any store to memory (register operations are fine) within these loops,
as these may cause the store-conditional to always fail.
In particular nesting of `os_atomic_rmw_loop` is invalid.

Use of `continue` within an `os_atomic_rmw_loop` is also invalid. Instead, an
`os_atomic_rmw_loop_give_up(goto again)` jumping to an `again:` label placed
before the loop should be used in this way:

```c
    int _Atomic *address;
    int old_value, new_value;
    bool success;

again:
    success = os_atomic_rmw_loop(address, old_value, new_value, acquire, {
        if (needs_some_store_that_can_thwart_the_transaction(old_value)) {
            os_atomic_rmw_loop_give_up({
                // Do whatever you need to do/store to central memory
                // that would cause the loop to always fail
                do_my_rmw_loop_breaking_store();

                // And only then redrive.
                goto again;
            });
        }
        if (!validate(old_value)) {
            os_atomic_rmw_loop_give_up(break);
        }
        new_value = compute_new_value(old_value);
    });
```

### the *dependency* memory order

Because the C11 *consume* memory order is broken in various ways,
most compilers, clang included, implement it as an equivalent
for `memory_order_acquire`. However, its concept is useful
for certain algorithms.

As an attempt to provide a replacement for this, `<os/atomic_private.h>`
implements an entirely new *dependency* memory ordering.

The purpose of this ordering is to provide a relaxed load followed by an
implicit compiler barrier, that can be used as a root for a chain of hardware
dependencies that would otherwise pair with store-releases done at this address,
very much like the *consume* memory order is intended to provide.

However, unlike the *consume* memory ordering where the compiler had to follow
the dependencies, the *dependency* memory ordering relies on explicit
annotations of when the dependencies are expected:

- loads through a pointer loaded with a *dependency* memory ordering
  will provide a hardware dependency,

- dependencies may be injected into other loads not performed through this
  particular pointer with the `os_atomic_load_with_dependency_on` and
  `os_atomic_inject_dependency` interfaces.

Here is an example of how it is meant to be used:

```c
    struct foo {
        long value;
        long _Atomic flag;
    };

    void
    publish(struct foo *p, long value)
    {
        p->value = value;
        os_atomic_store(&p->flag, 1, release);
    }


    bool
    broken_read(struct foo *p, long *value)
    {
        /*
         * This isn't safe, as there's absolutely no hardware dependency involved.
         * Using an acquire barrier would of course fix it but is quite expensive...
         */
        if (os_atomic_load(&p->flag, relaxed)) {
            *value = p->value;
            return true;
        }
        return false;
    }

    bool
    valid_read(struct foo *p, long *value)
    {
        long flag = os_atomic_load(&p->flag, dependency);
        if (flag) {
            /*
             * Further the chain of dependency to any loads through `p`
             * which properly pair with the release barrier in `publish`.
             */
            *value = os_atomic_load_with_dependency_on(&p->value, flag);
            return true;
        }
        return false;
    }
```

There are 4 interfaces involved with hardware dependencies:

1. `os_atomic_load(..., dependency)` to initiate roots of hardware dependencies,
   that should pair with a store or rmw with release semantics or stronger
   (release, acq\_rel or seq\_cst),

2. `os_atomic_inject_dependency` can be used to inject the dependency provided
   by a *dependency* load, or any other value that has had a dependency
   injected,

3. `os_atomic_load_with_dependency_on` to do an otherwise unrelated relaxed load
   that still prolongs a dependency chain,

4. `os_atomic_make_dependency` to create an opaque token out of a given
   dependency root to inject into multiple loads.


**Note**: this technique is NOT safe when the compiler can reason about the
pointers that you are manipulating, for example if the compiler can know that
the pointer can only take a couple of values and ditch all these manually
crafted dependency chains. Hopefully there will be a future C2Y standard that
provides a similar construct as a language feature instead.