1# Guard-object allocation policy 2 3The contemporary techniques we use to group and protect smaller allocations, 4such as zalloc and kalloc_type, are premised around type-isolation and VA 5sequestering. These techniques are impractical for larger allocations because 6they consume outsized chunks of virtual address space, and could lead to 7exhaustion. We therefore use a different strategy for larger allocations, which 8we call guard objects. 9 10 11## Algorithm 12 13Allocation policies are assigned to particular regions of kernel address space. 14The strategy specified by this document is used by regions such as the various 15pointer ranges of `kmem_alloc()`, which previously used a first-fit allocator. 16Under the guard-objects policy, the virtual address space is divided in 17*chunks*. A chunk has a size class of the form $2^k \times \mathtt{PAGE\_SIZE}$, 18and is made of $\mathcal{S}$ *slots* of that size. $\mathcal{S}$ varies with the 19size class: smaller size-classes will have more slots, and larger size classes 20will have fewer. Chunks are also configured to have $\mathcal{G}$ _guards_ and 21up to $\mathcal{Q}$ _quarantines_. 22 23Chunks maintain several other important pieces of information, such as: 24 25* which slots are allocated and which slots are free; 26* a dynamic count of quarantined slots within $[0, \mathcal{Q})$; 27* a count of *available* slots, which are the number of free slots in excess 28 of guards $\mathcal{G}$ and the number of currently quarantined slots. 29 30A chunk can be in three states: *empty* if all its slots are free, *partial* if 31it has at least one available slot, *full* if it has no available slots. Chunks 32are maintained in lists segregated by size class and state. 33 34### Allocating memory 35 36Memory requests for a given size are rounded up to the next size class of the 37form $2^k$. The allocator must first select an appropriate chunk. Partial chunks 38are preferred, and if no partial chunk exists, an empty chunk is allocated from 39unused virtual address space. 40 41A random slot is then chosen from any of the free slots in that chunk, and the 42available count of the chunk is decremented. If the chunk has now exhausted its 43available slots — only quarantined slots and guards are left — it's placed on 44its corresponding size class's full list. 45 46Visually, let’s consider an example with two partial chunks A and B, with 8 47slots each, for $\mathcal{G} = \mathcal{Q} = 2$: 48 49``` 50 A───────────────┬─┬─┬─┬─┐ B───────────────┬─┬─┬─┬─┐ 51 │ Available: 1 │X│ │ │X│ │ Available: 4 │ │ │X│ │ 52 ┌─▶│ Quarantine: 0 ├─┼─┼─┼─┤────▶│ Quarantine: 0 ├─┼─┼─┼─┤ 53┌─────────┐ │ │ Guards: 2 │X│X│X│ │ │ Guards: 2 │ │X│ │ │ 54│ partial │──┘ └───────────────┴─┴─┴─┴─┘ └───────────────┴─┴─┴─┴─┘ 55├─────────┤ 56│ full │ 57└─────────┘ 58 59Legend: 60┌─┐ ┌─┐ 61│ │ free slot │X│ allocated slot 62└─┘ └─┘ 63``` 64 65The first allocation will be performed from chunk A, using its last available 66slot and moving it to the full list: 67 68``` 69 B───────────────┬─┬─┬─┬─┐ 70 │ Available: 4 │ │ │X│ │ 71 ┌─▶│ Quarantine: 0 ├─┼─┼─┼─┤ 72┌─────────┐ │ │ Guards: 2 │ │X│ │ │ 73│ partial │──┘ └───────────────┴─┴─┴─┴─┘ 74├─────────┤ 75│ full │──┐ A───────────────┬─┬─┬─┬─┐ 76└─────────┘ │ │ Available: 0 │X│ │X│X│ 77 └─▶│ Quarantine: 0 ├─┼─┼─┼─┤ 78 │ Guards: 2 │X│X│X│ │ 79 └───────────────┴─┴─┴─┴─┘ 80``` 81 82When the next allocation request in this size class arrives, the allocator will 83select B because A is now full: 84 85``` 86 B───────────────┬─┬─┬─┬─┐ 87 │ Available: 3 │ │ │X│ │ 88 ┌─▶│ Quarantine: 0 ├─┼─┼─┼─┤ 89┌─────────┐ │ │ Guards: 2 │ │X│X│ │ 90│ partial │──┘ └───────────────┴─┴─┴─┴─┘ 91├─────────┤ 92│ full │──┐ A───────────────┬─┬─┬─┬─┐ 93└─────────┘ │ │ Available: 0 │X│ │X│X│ 94 └─▶│ Quarantine: 0 ├─┼─┼─┼─┤ 95 │ Guards: 2 │X│X│X│ │ 96 └───────────────┴─┴─┴─┴─┘ 97``` 98 99### Deallocating memory 100 101Deallocating a virtual memory range works in reverse. First, we evaluate which 102chunk and slot correspond to the range being freed. Since the semantics of the 103virtual memory subsystem mandate that we must support partial deallocations, we 104next consider whether the slot has become only partially free. If so, we have 105nothing more to do for now; the slot remains in use. 106 107If however the slot is now entirely free, then the quarantine count of the chunk 108is incremented. If at least $\mathcal{G} + \mathcal{Q}$ are free, then the 109quarantine is cleared. The idea behind this policy is that maintaining a good 110entropy requires enough free slots to choose from. As a result, once the free 111slot count dips below $\mathcal{G} + \mathcal{Q}$, freed slots are quarantined 112rather than made immediately available. 113 114Finally, we evaluate whether the chunk needs to be moved: 115 116* if a chunk's slots are all fully free, the chunk is marked as empty, and is 117 typically returned to the system as free space; 118* if the chunk previously had no slots available, but has any available now, 119 the chunk is moved to the partially-free chunks list. 120 121Visually, let’s consider an example with a chunk using 16 slots, 122under a configuration in which $\mathcal{G} = \mathcal{Q} = 4$: 123 124``` 125 A───────────────┬─┬─┬─┬─┬─┬─┬─┬─┐ 126┌─────────┐ │ Available: 1 │ │X│X│X│ │ │X│ │ 127│ partial │─────▶│ Quarantine: 1 ├─┼─┼─┼─┼─┼─┼─┼─┤ 128├─────────┤ │ Guards: 4 │ │X│X│X│X│X│ │X│ 129│ full │ └───────────────┴─┴─┴─┴─┴─┴─┴─┴─┘ 130└─────────┘ 131 132Legend: 133┌─┐ ┌─┐ 134│ │ free slot │X│ allocated slot 135└─┘ └─┘ 136``` 137 138If we now free an element, its slot is marked free, and the quarantine count 139is increased: 140 141``` 142 A───────────────┬─┬─┬─┬─┬─┬─┬─┬─┐ 143┌─────────┐ │ Available: 1 │ │X│X│X│ │ │X│ │ 144│ partial │─────▶│ Quarantine: 2 ├─┼─┼─┼─┼─┼─┼─┼─┤ 145├─────────┤ │ Guards: 4 │ │X│X│ │X│X│ │X│ 146│ full │ └───────────────┴─┴─┴─┴─┴─┴─┴─┴─┘ 147└─────────┘ 148``` 149 150If we allocate the last available element, a slot is now marked used, 151the available count drops to 0, and causes the chunk to now be full, 152and the quarantine stays untouched: 153 154``` 155┌─────────┐ 156│ partial │ A───────────────┬─┬─┬─┬─┬─┬─┬─┬─┐ 157├─────────┤ │ Available: 0 │X│X│X│X│ │ │X│ │ 158│ full │─────▶│ Quarantine: 2 ├─┼─┼─┼─┼─┼─┼─┼─┤ 159└─────────┘ │ Guards: 4 │ │X│X│ │X│X│ │X│ 160 └───────────────┴─┴─┴─┴─┴─┴─┴─┴─┘ 161``` 162 163Freeing just one element would return just one slot, and bump the quarantine 164count to 3. It takes freeing two elements for more than $\mathcal{G} + 165\mathcal{Q}$ slots to be free, leading to clearing the quarantine: 166 167``` 168 A───────────────┬─┬─┬─┬─┬─┬─┬─┬─┐ 169┌─────────┐ │ Available: 4 │X│X│X│X│ │ │X│ │ 170│ partial │─────▶│ Quarantine: 0 ├─┼─┼─┼─┼─┼─┼─┼─┤ 171├─────────┤ │ Guards: 4 │ │X│X│ │ │ │ │X│ 172│ full │ └───────────────┴─┴─┴─┴─┴─┴─┴─┴─┘ 173└─────────┘ 174``` 175 176### Operation clamping to a slot 177 178As long as a VM operation does not exceed slot bounds, any VM call is permitted, 179whether it is configuration altering calls such as `vm_protect()` or 180`vm_inherit()`, taking copies of the range with `mach_make_memory_entry()`, or 181replacing part of the mapping with `vm_allocate(..., VM_FLAGS_FIXED | 182VM_FLAGS_OVERWRITE)`. 183 184However, operations that cross a slot boundary are not permitted, and lead to 185termination. When guard object policies are in effect, allocations are 186randomized, and even in a single threaded context, clients can't assume that 187consecutive allocations will be served in address order, and as a result, 188operations crossing slot boundaries are always bugs. 189 190 191## Security motivation 192 193With the algorithm explanation, the “guard object” moniker should start to make 194sense: the goal is that unused free slots are object-sized guards — in the 195guard-page sense. Unlike usage of traditional guard pages, which only protect 196against linear buffer overflows, this scheme also adds a probabilistic 197mitigation against use-after-free or non-linear buffer overflows, forcing 198attackers to contend with a high probability of hitting these guards. 199 200Due to visible timing differences, it is assumed that an attacker can observe 201when a chunk is being allocated or freed. However, this doesn't give them an 202edge because the system maintains a guarantee that at least 203$\mathcal{G}/\mathcal{S}$ of the address space causes a crash when accessed at 204any given time. This is why we can let go of any form of sequestering of the 205address space for ranges managed with the guard-object allocation policy. 206 207### Use-after-Free exploitation strategies and failure rates 208 209Attackers attempting to exploit a use-after-free will be able to cause an 210element to be freed, knowing that a dangling pointer to it remains. They will 211then try to replace this freed element with another one of a different type 212that they control, creating a type confusion. 213 214Because allocating new chunks causes visible timing differences, we can assume 215that attackers are able to fill chunks with all slots corresponding to elements 216that they control. They also know which elements are in the same chunk, but 217don't know which element corresponds to which slot. 218 219**In the absence of the quarantine**, the best strategy for attackers trying to 220exploit a use-after free is to perform $\mathcal{S} − \mathcal{G}$ rounds 221of freeing then reallocating each element in the chunk, where the first free is 222to the element they are trying to use-after-free, so that they retain a dangling 223pointer to the original slot. Each round, the allocator will choose one slot 224among $\mathcal{G} + 1$, when only one corresponds to the slot that was freed by 225triggering the bug. The failure rate the attackers face with this strategy is 226 227$$failure\_rate = \left( 228\frac{\mathcal{G}}{\mathcal{G+1}}\right)^{\mathcal{S} - \mathcal{G}}$$ 229 230* $\mathcal{S} = 8, \mathcal{G} = 2$ yields a 8.8% failure rate; 231* $\mathcal{S} = 16, \mathcal{G} = 4$ yields a 6.9 failure rate. 232 233**Using the quarantine** further reduces an attacker's odds of success. Unlike 234before, they need to free at least $\mathcal{Q}$ elements before elements are 235made available for allocations again. As a result, a round now needs to be 236$\mathcal{Q}$ deallocations followed by $\mathcal{Q}$ allocations. As before, 237the very first deallocation is to the element the attacker tries to 238use-after-free. A round gives attackers $\frac{\mathcal{G}}{ 239\mathcal{G}+\mathcal{Q}}$ chances of failure. The aggregate failure rate for 240this strategy is 241 242$$failure\_rate =\left( 1 - \frac{(\mathcal{S} - \mathcal{G)} \bmod \mathcal{Q} 243}{\mathcal{G} + \mathcal{Q}} \right) 244\left(\frac{\mathcal{G}}{\mathcal{G} + \mathcal{Q}}\right)^{ 245\left\lfloor \frac{\mathcal{S} -\mathcal{G}}{\mathcal{Q}} \right\rfloor}$$ 246 247* $\mathcal{S}=8, \mathcal{G}=1, \mathcal{Q}=4$ yields a 8% failure rate; 248* $\mathcal{S}=8, \mathcal{G}=2, \mathcal{Q}=2$ yields a 12.5% failure rate; 249* $\mathcal{S}=16, \mathcal{G}=4, \mathcal{Q}=3$ yields a 10.7% failure rate; 250* $\mathcal{S}=16, \mathcal{G}=4, \mathcal{Q}=4$ yields a 12.5% failure rate. 251 252### Out-of-bound exploitation strategies 253 254Exploiting out-of-bound bugs requires knowing the distance between allocated 255objects, which an attacker a priori doesn’t know without some information leak. 256The regions protected by guard objects are coarsely randomized in the address 257space, so that attackers can’t predict how far an allocation is from other juicy 258exploitation targets in the address space such as the `__DATA` segment. Lastly, 259guard objects are combined in XNU with type isolation and allocation fronts. It 260makes cross-type attacks unreliable and unpredictable — as the various buckets 261of types are randomized per boot. 262 263The last remaining avenue of attack targets types that fall in the same 264allocation front. However, attackers still have to contend with the uniform 265$\mathcal{G}/\mathcal{S}$ density of holes, making out-of-bound unreliable with 266a probability of failure close to $\mathcal{G}/\mathcal{S}$. This probability of 267failure is typically worse than use-after-free for attackers, and intuitively, 268this is precisely because they need to know more information — the distance 269between objects — unlike use-after-free where that distance is obviously known 270to be zero. 271 272### Choice of parameters 273 274In the actual implementation, $\mathcal{S}$ scales up with the size of the 275allocations going up — as a way to limit the amount of metadata needed. 276Maintaining the $\mathcal{G}/\mathcal{S}$ and $\mathcal{Q}/\mathcal{S}$ ratios 277constant for any $\mathcal{S}$ allows for all probabilities to become 278independent of $\mathcal{S}$. 279 280Our goal is to make attackers face at least 10% chance of failure — in the 281absence of information disclosure — which is why we chose numbers where 282$\mathcal{G} = \mathcal{Q} = \mathcal{S}/4$, yielding: 283 284* 25% failure rates for out-of-bound exploitation; 285* 12.5% failure rates for use-after-free exploitation. 286 287