xref: /xnu-12377.81.4/doc/allocators/guard-objects.md (revision 043036a2b3718f7f0be807e2870f8f47d3fa0796)
1# Guard-object allocation policy
2
3The contemporary techniques we use to group and protect smaller allocations,
4such as zalloc and kalloc_type, are premised around type-isolation and VA
5sequestering. These techniques are impractical for larger allocations because
6they consume outsized chunks of virtual address space, and could lead to
7exhaustion. We therefore use a different strategy for larger allocations, which
8we call guard objects.
9
10
11## Algorithm
12
13Allocation policies are assigned to particular regions of kernel address space.
14The strategy specified by this document is used by regions such as the various
15pointer ranges of `kmem_alloc()`, which previously used a first-fit allocator.
16Under the guard-objects policy, the virtual address space is divided in
17*chunks*. A chunk has a size class of the form $2^k \times \mathtt{PAGE\_SIZE}$,
18and is made of $\mathcal{S}$ *slots* of that size. $\mathcal{S}$ varies with the
19size class: smaller size-classes will have more slots, and larger size classes
20will have fewer. Chunks are also configured to have $\mathcal{G}$ _guards_ and
21up to $\mathcal{Q}$ _quarantines_.
22
23Chunks maintain several other important pieces of information, such as:
24
25* which slots are allocated and which slots are free;
26* a dynamic count of quarantined slots within $[0, \mathcal{Q})$;
27* a count of *available* slots, which are the number of free slots in excess
28  of guards $\mathcal{G}$ and the number of currently quarantined slots.
29
30A chunk can be in three states: *empty* if all its slots are free, *partial* if
31it has at least one available slot, *full* if it has no available slots. Chunks
32are maintained in lists segregated by size class and state.
33
34### Allocating memory
35
36Memory requests for a given size are rounded up to the next size class of the
37form $2^k$. The allocator must first select an appropriate chunk. Partial chunks
38are preferred, and if no partial chunk exists, an empty chunk is allocated from
39unused virtual address space.
40
41A random slot is then chosen from any of the free slots in that chunk, and the
42available count of the chunk is decremented. If the chunk has now exhausted its
43available slots — only quarantined slots and guards are left — it's placed on
44its corresponding size class's full list.
45
46Visually, let’s consider an example with two partial chunks A and B, with 8
47slots each, for $\mathcal{G} = \mathcal{Q} = 2$:
48
49```
50                A───────────────┬─┬─┬─┬─┐     B───────────────┬─┬─┬─┬─┐
51                │ Available:  1 │X│ │ │X│     │ Available:  4 │ │ │X│ │
52             ┌─▶│ Quarantine: 0 ├─┼─┼─┼─┤────▶│ Quarantine: 0 ├─┼─┼─┼─┤
53┌─────────┐  │  │ Guards:     2 │X│X│X│ │     │ Guards:     2 │ │X│ │ │
54│ partial │──┘  └───────────────┴─┴─┴─┴─┘     └───────────────┴─┴─┴─┴─┘
55├─────────┤
56│  full   │
57└─────────┘
58
59Legend:
60┌─┐                  ┌─┐
61│ │ free slot        │X│ allocated slot
62└─┘                  └─┘
63```
64
65The first allocation will be performed from chunk A, using its last available
66slot and moving it to the full list:
67
68```
69                B───────────────┬─┬─┬─┬─┐
70                │ Available:  4 │ │ │X│ │
71             ┌─▶│ Quarantine: 0 ├─┼─┼─┼─┤
72┌─────────┐  │  │ Guards:     2 │ │X│ │ │
73│ partial │──┘  └───────────────┴─┴─┴─┴─┘
74├─────────┤
75│  full   │──┐  A───────────────┬─┬─┬─┬─┐
76└─────────┘  │  │ Available:  0 │X│ │X│X│
77             └─▶│ Quarantine: 0 ├─┼─┼─┼─┤
78                │ Guards:     2 │X│X│X│ │
79                └───────────────┴─┴─┴─┴─┘
80```
81
82When the next allocation request in this size class arrives, the allocator will
83select B because A is now full:
84
85```
86                B───────────────┬─┬─┬─┬─┐
87                │ Available:  3 │ │ │X│ │
88             ┌─▶│ Quarantine: 0 ├─┼─┼─┼─┤
89┌─────────┐  │  │ Guards:     2 │ │X│X│ │
90│ partial │──┘  └───────────────┴─┴─┴─┴─┘
91├─────────┤
92│  full   │──┐  A───────────────┬─┬─┬─┬─┐
93└─────────┘  │  │ Available:  0 │X│ │X│X│
94             └─▶│ Quarantine: 0 ├─┼─┼─┼─┤
95                │ Guards:     2 │X│X│X│ │
96                └───────────────┴─┴─┴─┴─┘
97```
98
99### Deallocating memory
100
101Deallocating a virtual memory range works in reverse. First, we evaluate which
102chunk and slot correspond to the range being freed. Since the semantics of the
103virtual memory subsystem mandate that we must support partial deallocations, we
104next consider whether the slot has become only partially free. If so, we have
105nothing more to do for now; the slot remains in use.
106
107If however the slot is now entirely free, then the quarantine count of the chunk
108is incremented. If at least $\mathcal{G} + \mathcal{Q}$ are free, then the
109quarantine is cleared. The idea behind this policy is that maintaining a good
110entropy requires enough free slots to choose from. As a result, once the free
111slot count dips below $\mathcal{G} + \mathcal{Q}$, freed slots are quarantined
112rather than made immediately available.
113
114Finally, we evaluate whether the chunk needs to be moved:
115
116* if a chunk's slots are all fully free, the chunk is marked as empty, and is
117  typically returned to the system as free space;
118* if the chunk previously had no slots available, but has any available now,
119  the chunk is moved to the partially-free chunks list.
120
121Visually, let’s consider an example with a chunk using 16 slots,
122under a configuration in which $\mathcal{G} = \mathcal{Q} = 4$:
123
124```
125                 A───────────────┬─┬─┬─┬─┬─┬─┬─┬─┐
126┌─────────┐      │ Available:  1 │ │X│X│X│ │ │X│ │
127│ partial │─────▶│ Quarantine: 1 ├─┼─┼─┼─┼─┼─┼─┼─┤
128├─────────┤      │ Guards:     4 │ │X│X│X│X│X│ │X│
129│  full   │      └───────────────┴─┴─┴─┴─┴─┴─┴─┴─┘
130└─────────┘
131
132Legend:
133┌─┐                  ┌─┐
134│ │ free slot        │X│ allocated slot
135└─┘                  └─┘
136```
137
138If we now free an element, its slot is marked free, and the quarantine count
139is increased:
140
141```
142                 A───────────────┬─┬─┬─┬─┬─┬─┬─┬─┐
143┌─────────┐      │ Available:  1 │ │X│X│X│ │ │X│ │
144│ partial │─────▶│ Quarantine: 2 ├─┼─┼─┼─┼─┼─┼─┼─┤
145├─────────┤      │ Guards:     4 │ │X│X│ │X│X│ │X│
146│  full   │      └───────────────┴─┴─┴─┴─┴─┴─┴─┴─┘
147└─────────┘
148```
149
150If we allocate the last available element, a slot is now marked used,
151the available count drops to 0, and causes the chunk to now be full,
152and the quarantine stays untouched:
153
154```
155┌─────────┐
156│ partial │      A───────────────┬─┬─┬─┬─┬─┬─┬─┬─┐
157├─────────┤      │ Available:  0 │X│X│X│X│ │ │X│ │
158│  full   │─────▶│ Quarantine: 2 ├─┼─┼─┼─┼─┼─┼─┼─┤
159└─────────┘      │ Guards:     4 │ │X│X│ │X│X│ │X│
160                 └───────────────┴─┴─┴─┴─┴─┴─┴─┴─┘
161```
162
163Freeing just one element would return just one slot, and bump the quarantine
164count to 3. It takes freeing two elements for more than $\mathcal{G} +
165\mathcal{Q}$ slots to be free, leading to clearing the quarantine:
166
167```
168                 A───────────────┬─┬─┬─┬─┬─┬─┬─┬─┐
169┌─────────┐      │ Available:  4 │X│X│X│X│ │ │X│ │
170│ partial │─────▶│ Quarantine: 0 ├─┼─┼─┼─┼─┼─┼─┼─┤
171├─────────┤      │ Guards:     4 │ │X│X│ │ │ │ │X│
172│  full   │      └───────────────┴─┴─┴─┴─┴─┴─┴─┴─┘
173└─────────┘
174```
175
176### Operation clamping to a slot
177
178As long as a VM operation does not exceed slot bounds, any VM call is permitted,
179whether it is configuration altering calls such as `vm_protect()` or
180`vm_inherit()`, taking copies of the range with `mach_make_memory_entry()`, or
181replacing part of the mapping with `vm_allocate(..., VM_FLAGS_FIXED |
182VM_FLAGS_OVERWRITE)`.
183
184However, operations that cross a slot boundary are not permitted, and lead to
185termination. When guard object policies are in effect, allocations are
186randomized, and even in a single threaded context, clients can't assume that
187consecutive allocations will be served in address order, and as a result,
188operations crossing slot boundaries are always bugs.
189
190
191## Security motivation
192
193With the algorithm explanation, the “guard object” moniker should start to make
194sense: the goal is that unused free slots are object-sized guards — in the
195guard-page sense. Unlike usage of traditional guard pages, which only protect
196against linear buffer overflows, this scheme also adds a probabilistic
197mitigation against use-after-free or non-linear buffer overflows, forcing
198attackers to contend with a high probability of hitting these guards.
199
200Due to visible timing differences, it is assumed that an attacker can observe
201when a chunk is being allocated or freed. However, this doesn't give them an
202edge because the system maintains a guarantee that at least
203$\mathcal{G}/\mathcal{S}$ of the address space causes a crash when accessed at
204any given time. This is why we can let go of any form of sequestering of the
205address space for ranges managed with the guard-object allocation policy.
206
207### Use-after-Free exploitation strategies and failure rates
208
209Attackers attempting to exploit a use-after-free will be able to cause an
210element to be freed, knowing that a dangling pointer to it remains. They will
211then try to replace this freed element with another one of a different type
212that they control, creating a type confusion.
213
214Because allocating new chunks causes visible timing differences, we can assume
215that attackers are able to fill chunks with all slots corresponding to elements
216that they control. They also know which elements are in the same chunk, but
217don't know which element corresponds to which slot.
218
219**In the absence of the quarantine**, the best strategy for attackers trying to
220exploit a use-after free is to perform $\mathcal{S} − \mathcal{G}$ rounds
221of freeing then reallocating each element in the chunk, where the first free is
222to the element they are trying to use-after-free, so that they retain a dangling
223pointer to the original slot. Each round, the allocator will choose one slot
224among $\mathcal{G} + 1$, when only one corresponds to the slot that was freed by
225triggering the bug. The failure rate the attackers face with this strategy is
226
227$$failure\_rate = \left(
228\frac{\mathcal{G}}{\mathcal{G+1}}\right)^{\mathcal{S} - \mathcal{G}}$$
229
230* $\mathcal{S} = 8, \mathcal{G} = 2$ yields a 8.8% failure rate;
231* $\mathcal{S} = 16, \mathcal{G} = 4$ yields a 6.9 failure rate.
232
233**Using the quarantine** further reduces an attacker's odds of success. Unlike
234before, they need to free at least $\mathcal{Q}$ elements before elements are
235made available for allocations again. As a result, a round now needs to be
236$\mathcal{Q}$ deallocations followed by $\mathcal{Q}$ allocations. As before,
237the very first deallocation is to the element the attacker tries to
238use-after-free. A round gives attackers $\frac{\mathcal{G}}{
239\mathcal{G}+\mathcal{Q}}$ chances of failure. The aggregate failure rate for
240this strategy is
241
242$$failure\_rate =\left( 1 - \frac{(\mathcal{S} - \mathcal{G)} \bmod \mathcal{Q}
243}{\mathcal{G} + \mathcal{Q}} \right)
244\left(\frac{\mathcal{G}}{\mathcal{G} + \mathcal{Q}}\right)^{
245\left\lfloor \frac{\mathcal{S} -\mathcal{G}}{\mathcal{Q}} \right\rfloor}$$
246
247* $\mathcal{S}=8, \mathcal{G}=1, \mathcal{Q}=4$ yields a 8% failure rate;
248* $\mathcal{S}=8, \mathcal{G}=2, \mathcal{Q}=2$ yields a 12.5% failure rate;
249* $\mathcal{S}=16, \mathcal{G}=4, \mathcal{Q}=3$ yields a 10.7% failure rate;
250* $\mathcal{S}=16, \mathcal{G}=4, \mathcal{Q}=4$ yields a 12.5% failure rate.
251
252### Out-of-bound exploitation strategies
253
254Exploiting out-of-bound bugs requires knowing the distance between allocated
255objects, which an attacker a priori doesn’t know without some information leak.
256The regions protected by guard objects are coarsely randomized in the address
257space, so that attackers can’t predict how far an allocation is from other juicy
258exploitation targets in the address space such as the `__DATA` segment.  Lastly,
259guard objects are combined in XNU with type isolation and allocation fronts. It
260makes cross-type attacks unreliable and unpredictable — as the various buckets
261of types are randomized per boot.
262
263The last remaining avenue of attack targets types that fall in the same
264allocation front. However, attackers still have to contend with the uniform
265$\mathcal{G}/\mathcal{S}$ density of holes, making out-of-bound unreliable with
266a probability of failure close to $\mathcal{G}/\mathcal{S}$. This probability of
267failure is typically worse than use-after-free for attackers, and intuitively,
268this is precisely because they need to know more information — the distance
269between objects — unlike use-after-free where that distance is obviously known
270to be zero.
271
272### Choice of parameters
273
274In the actual implementation, $\mathcal{S}$ scales up with the size of the
275allocations going up — as a way to limit the amount of metadata needed.
276Maintaining the $\mathcal{G}/\mathcal{S}$ and $\mathcal{Q}/\mathcal{S}$ ratios
277constant for any $\mathcal{S}$ allows for all probabilities to become
278independent of $\mathcal{S}$.
279
280Our goal is to make attackers face at least 10% chance of failure — in the
281absence of information disclosure — which is why we chose numbers where
282$\mathcal{G} = \mathcal{Q} = \mathcal{S}/4$, yielding:
283
284* 25% failure rates for out-of-bound exploitation;
285* 12.5% failure rates for use-after-free exploitation.
286
287