# XNU Allocators best practices

## Introduction

XNU provides two ways to allocate memory:

- the VM subsystem, which provides allocations at the granularity of pages (with
  `kernel_memory_allocate` and similar interfaces);
- the zone allocator subsystem (`<kern/zalloc.h>`), which is a slab allocator of
  objects of fixed size.

In addition, `<kern/kalloc.h>` provides a variable-size general purpose
allocator implemented as a collection of zones of fixed size, and overflowing to
`kernel_memory_allocate` for allocations larger than a few pages (32KB when this
document was being written, but this is subject to change/tuning in the future).


The Core Kernel allocators rely on the following headers:

- `<kern/zalloc.h>` and `<kern/kalloc.h>` for their API surface, which most
  clients should find sufficient,
- `<kern/zalloc_internal.h>` for interfaces that need to be exported
  for introspection and implementation purposes, and that are not meant
  for general consumption.

This document presents the best practices for allocating memory
in the kernel, from a security perspective.

## Permanent allocations

The kernel sometimes needs to provide persistent allocations that depend on
parameters that aren't compile-time constants, but will not vary over time (NCPU
is an obvious example here).

The zone subsystem provides a `zalloc_permanent*` family of functions that help
allocate memory in such a fashion in a very compact way.

Unlike the typical zone allocators, this allows for arbitrary sizes, in a
similar fashion to `kalloc`. These functions never fail (if the allocation
fails, the kernel will panic), and always return zeroed memory. Trying to free
these allocations results in a kernel panic.

## Allocation flags

Most `zalloc` or `kalloc` functions take `zalloc_flags_t` typed flags.
When flags are expected, exactly one of `Z_WAITOK`, `Z_NOWAIT` or `Z_NOPAGEWAIT`
must be passed:

- `Z_WAITOK` means that the zone allocator can wait and block;
- `Z_NOWAIT` requires fully non-blocking behavior, which makes it usable
  for allocations under spinlocks and in other preemption-disabled contexts;
- `Z_NOPAGEWAIT` allows the allocator to block (typically on mutexes),
  but not to wait for available pages if there are none. This is only useful
  for the buffer cache; most clients should use either `Z_NOWAIT` or `Z_WAITOK`.

Other important flags:

- `Z_ZERO` if zeroed memory is expected (nowadays most allocations will
  be zeroed regardless, but it's always clearer to specify it). Note that it is
  often more efficient than calling `bzero`, as the allocator tends to maintain
  freed memory as zeroed in the first place;
- `Z_NOFAIL` if the caller knows the allocation can't fail: allocations that are
  made with `Z_WAITOK` from regular (non-exhaustible) zones, or from `kalloc*`
  interfaces with a size smaller than `KALLOC_SAFE_ALLOC_SIZE`,
  will never fail (the kernel will instead panic if no memory can be found).
  `Z_NOFAIL` can be used to denote that the caller knows about this.
  If `Z_NOFAIL` is used incorrectly, the zone allocator will panic at runtime.

## Zones (`zalloc`)

The first blessed way to allocate memory in the kernel is by using zones.
Zones are mostly meant to be used in Core XNU and some "BSD" kexts.

It is generally recommended to create zones early and to store the `zone_t`
pointer in read-only memory (using `SECURITY_READ_ONLY_LATE` storage).
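
The recommendation above can be sketched as follows. This is a minimal illustration, not kernel code: `zone_create`, `zalloc_flags`, `zfree` and `ZC_NONE` are assumed to be the relevant `<kern/zalloc.h>` interfaces, and they are replaced here by user-space stand-ins (with placeholder flag values) so that the sketch builds outside the kernel. `struct widget` and the `widget_*` functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * User-space stand-ins (assumptions): in XNU these come from
 * <kern/zalloc.h>. The flag values and bodies are placeholders.
 */
typedef unsigned int zalloc_flags_t;
#define Z_WAITOK 0x0000u
#define Z_ZERO   0x0004u
#define Z_NOFAIL 0x8000u
#define ZC_NONE  0u
#define SECURITY_READ_ONLY_LATE(type_t) type_t

typedef struct zone {
	const char *name;
	size_t      elem_size;
} *zone_t;

static zone_t
zone_create(const char *name, size_t size, unsigned int flags)
{
	zone_t z = malloc(sizeof(*z));

	(void)flags;
	if (z == NULL) {
		abort();
	}
	z->name = name;
	z->elem_size = size;
	return z;
}

static void *
zalloc_flags(zone_t z, zalloc_flags_t flags)
{
	void *p = (flags & Z_ZERO) ? calloc(1, z->elem_size) : malloc(z->elem_size);

	if (p == NULL && (flags & Z_NOFAIL)) {
		abort(); /* the kernel would panic instead */
	}
	return p;
}

static void
zfree(zone_t z, void *elem)
{
	(void)z;
	free(elem);
}

/* Hypothetical object type that gets its own zone. */
struct widget {
	struct widget *next;
	int            id;
};

/* The zone pointer is written once at init time, then treated as read-only. */
static SECURITY_READ_ONLY_LATE(zone_t) widget_zone;

static void
widget_subsystem_init(void)
{
	widget_zone = zone_create("example.widget", sizeof(struct widget), ZC_NONE);
}

static struct widget *
widget_alloc(int id)
{
	/* Exactly one wait flag (Z_WAITOK), plus Z_ZERO and Z_NOFAIL. */
	struct widget *w = zalloc_flags(widget_zone, Z_WAITOK | Z_ZERO | Z_NOFAIL);

	w->id = id;
	return w;
}

static void
widget_free(struct widget *w)
{
	zfree(widget_zone, w);
}
```

With `Z_NOFAIL`, the caller does not check the result for `NULL`; that choice is only valid under the conditions listed in the flags section.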

Zones are more feature-rich than `kalloc`, and some features can only be
used when making a zone:

- the object type being allocated requires extremely strong segregation
  from other types (typically `zone_require` will be used with this zone),
- the object type implements some form of security boundary and wants to adopt
  the read-only allocator (see `ZC_READONLY`),
- the allocation must be per-cpu,
- ...

In the vast majority of cases however, using `kalloc_type` (or `IOMallocType`)
is preferred.


## The Typed allocator

Ignoring VM allocations (or wrappers like `IOMemoryDescriptor`), the only
blessed way to allocate typed memory in XNU is the typed allocator
`kalloc_type` or one of its variants (like IOKit's `IOMallocType`), and the only
blessed way to allocate untyped memory that doesn't contain pointers is the
data API `kalloc_data` or one of its variants (like IOKit's `IOMallocData`).
However, this comes with additional requirements.

Note that at this time, those interfaces aren't exported to third parties,
as their ABI has not yet converged.

### A word about types

The typed allocators assume that allocated types fit a very precise model.
If the allocations you perform do not fit the model, then your types
must be restructured to fit, for security reasons.

A general theme will be the separation of data/primitive types from pointers,
as attackers tend to use data/pointer overlaps to carry out their exploits.

The typed allocators use compiler support to infer signatures
of the types being allocated. Because some scalars actually represent
kernel pointers (like `vm_offset_t`, `vm_address_t`, `uintptr_t`, ...),
types or structure members can be decorated with `__kernel_ptr_semantics`
to denote when a data-looking type is actually a pointer.

Do note that `__kernel_data_semantics` and `__kernel_dual_semantics`
are also provided but should rarely be needed.

#### Fixed-sized types

The first case is fixed-size types: typically a `struct`, `union`
or C++ `class`. Fixed-size types must follow certain rules:

- types should be small enough to fit in the zone allocator:
  smaller than `KALLOC_SAFE_ALLOC_SIZE`. When this is not the case,
  we have typically found that the type contains a large array of data
  or some buffer; the solution is to outline this allocation.
- for union types, data/pointer overlaps should be avoided if possible.
  When this isn't possible, a zone should be considered.

#### Variable-sized types

These come in two variants: arrays, and arrays prefixed with a header.
Any other case must be reduced to those, possibly by making more allocations.

An array is simply an allocation of several fixed-size types,
and the rules of "Fixed-sized types" above apply to them.

The following rules are expected when dealing with variable-sized allocations:

- variable-sized allocations should have a single owner and not be refcounted;
- under the header-prefixed form, if the header contains pointers,
  then the array element type **must not** be only data.

If those rules can't be followed, then the allocation must be split, with
the header becoming a fixed-sized type that is the single owner
of an array.

#### Untyped memory

When allocating untyped memory with the data APIs, ensure that it doesn't
contain kernel pointers. If your untyped allocation contains kernel pointers,
consider splitting the allocation into two: one part that is typed and contains
the kernel pointers, and a second part that is untyped and data-only.
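
The split described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `kalloc_type`, `kfree_type`, `kalloc_data` and `kfree_data` macros below are user-space stand-ins that only mirror the call shapes listed in this document, the flag values are placeholders, and `struct msg` with its `msg_*` functions is a hypothetical example type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * User-space stand-ins (assumptions) so the sketch builds outside the
 * kernel. The real interfaces live in <kern/kalloc.h>.
 */
typedef unsigned int zalloc_flags_t;
#define Z_WAITOK 0x0000u
#define Z_ZERO   0x0004u

static void *
demo_alloc(size_t size, zalloc_flags_t flags)
{
	return (flags & Z_ZERO) ? calloc(1, size) : malloc(size);
}

#define kalloc_type(type_t, flags) ((type_t *)demo_alloc(sizeof(type_t), (flags)))
#define kfree_type(type_t, ptr)    free(ptr)
#define kalloc_data(size, flags)   demo_alloc((size), (flags))
#define kfree_data(ptr, size)      ((void)(size), free(ptr))

/* Typed part: holds the pointers and is the single owner of the buffer. */
struct msg {
	struct msg *next;        /* kernel pointer, stays in typed memory */
	uint8_t    *payload;     /* separate data-only allocation */
	size_t      payload_len;
};

static struct msg *
msg_create(const void *bytes, size_t len)
{
	struct msg *m = kalloc_type(struct msg, Z_WAITOK | Z_ZERO);

	if (m == NULL) {
		return NULL;
	}
	/* Untyped part: raw bytes only, no kernel pointers. */
	m->payload = kalloc_data(len, Z_WAITOK);
	if (m->payload == NULL) {
		kfree_type(struct msg, m);
		return NULL;
	}
	memcpy(m->payload, bytes, len);
	m->payload_len = len;
	return m;
}

static void
msg_destroy(struct msg *m)
{
	kfree_data(m->payload, m->payload_len);
	kfree_type(struct msg, m);
}
```

The same shape applies when a fixed-size type grows past `KALLOC_SAFE_ALLOC_SIZE` because of an embedded buffer: the buffer is outlined into its own data allocation, and the typed object keeps a pointer and a length.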

### API surface

<table>
  <tr>
    <th>Interface</th>
    <th>API</th>
    <th>Notes</th>
  </tr>
  <tr>
    <td>Data/Primitive types</td>
    <td>
      <p>
      <b>Core Kernel</b>:<br/>
      <tt>kalloc_data(size, flags)</tt><br/>
      <tt>krealloc_data(ptr, old_size, new_size, flags)</tt><br/>
      <tt>kfree_data(ptr, size)</tt><br/>
      <tt>kfree_data_addr(ptr)</tt>
      </p>
      <p>
      <b>IOKit untyped variant (returns <tt>void *</tt>)</b>:<br/>
      <tt>IOMallocData(size)</tt><br/>
      <tt>IOMallocZeroData(size)</tt><br/>
      <tt>IOFreeData(ptr, size)</tt>
      </p>
      <p>
      <b>IOKit typed variant (returns <tt>type_t *</tt>)</b>:<br/>
      <tt>IONewData(type_t, count)</tt><br/>
      <tt>IONewZeroData(type_t, count)</tt><br/>
      <tt>IODeleteData(ptr, type_t, count)</tt>
      </p>
    </td>
    <td>This should be used only when the allocated type contains no kernel pointers.</td>
  </tr>
  <tr>
    <td>Fixed-sized type</td>
    <td>
      <p>
      <b>Core Kernel</b>:<br/>
      <tt>kalloc_type(type_t, flags)</tt><br/>
      <tt>kfree_type(type_t, ptr)</tt>
      </p>
      <p>
      <b>IOKit:</b><br/>
      <tt>IOMallocType(type_t)</tt><br/>
      <tt>IOFreeType(ptr, type_t)</tt>
      </p>
    </td>
    <td>
      <p>
      Note that it is absolutely OK to use this variant
      for data/primitive types; it will be redirected to <tt>kalloc_data</tt>
      (or <tt>IOMallocData</tt>).
      </p>
    </td>
  </tr>
  <tr>
    <td>Arrays of fixed-sized type</td>
    <td>
      <p>
      <b>Core Kernel</b>:<br/>
      <tt>kalloc_type(type_t, count, flags)</tt><br/>
      <tt>kfree_type(type_t, count, ptr)</tt>
      </p>
      <p>
      <b>IOKit:</b><br/>
      <tt>IONew(type_t, count)</tt><br/>
      <tt>IONewZero(type_t, count)</tt><br/>
      <tt>IODelete(ptr, type_t, count)</tt>
      </p>
    </td>
    <td>
      <p>
      <tt>kalloc_type(type_t, ...)</tt> (resp. <tt>IOMallocType(type_t)</tt>)
      <b>isn't</b> equivalent to <tt>kalloc_type(type_t, 1, ...)</tt>
      (resp. <tt>IONew(type_t, 1)</tt>). Mixing and matching interfaces
      will result in panics.
      </p>
      <p>
      Note that it is absolutely OK to use this variant
      for data/primitive types; it will be redirected to <tt>kalloc_data</tt>.
      </p>
    </td>
  </tr>
  <tr>
    <td>Header-prefixed arrays of fixed-sized type</td>
    <td>
      <p>
      <b>Core Kernel</b>:<br/>
      <tt>kalloc_type(hdr_type_t, type_t, count, flags)</tt><br/>
      <tt>kfree_type(hdr_type_t, type_t, count, ptr)</tt>
      </p>
      <p>
      <b>IOKit:</b><br/>
      <tt>IONew(hdr_type_t, type_t, count)</tt><br/>
      <tt>IONewZero(hdr_type_t, type_t, count)</tt><br/>
      <tt>IODelete(ptr, hdr_type_t, type_t, count)</tt>
      </p>
    </td>
    <td>
      <p>
      <tt>hdr_type_t</tt> can't contain a refcount,
      and <tt>type_t</tt> can't be a primitive type.
      </p>
    </td>
  </tr>
</table>

## C++ classes and operator new.

### `OSObject` subclasses

All subclasses of `OSObject` must be declared and defined with one of IOKit's
`OSDeclare*` and `OSDefine*` macros. As part of those, an `operator new` and
`operator delete` are injected that force objects to enroll into `kalloc_type`.

Note that idiomatic IOKit is supposed to use `OSTypeAlloc(Class)`.

### Other classes

Unlike `OSObject` subclasses, regular C++ classes must adopt typed allocators
manually. If your struct or class is POD, then replacing usage of `new/delete`
with `IOMallocType/IOFreeType` is safe. However, if your class/struct or any of
its members has non-default constructors or destructors, then you
must override operator new/delete as follows, which lets you continue to use
C++'s new and delete keywords to allocate/deallocate instances.

```cpp
#include <IOKit/IOLib.h> /* IOMallocType / IOFreeType */

struct Type {
public:
    /* The typed allocator derives the size from Type itself. */
    void *operator new(size_t size __unused)
    {
        return IOMallocType(Type);
    }

    void operator delete(void *mem, size_t size __unused)
    {
        IOFreeType(mem, Type);
    }
};
```

When operator new/delete is overridden for a specific class, all its subclasses
must also redefine their operator new/delete to use the typed allocators.

### The case of `operator new[]`

The ABI of `operator new[]` is unfortunate, as it denormalizes
data that we prefer to be known by the owning object
(the element sizes and array element count).

It also makes those allocations ripe for abuse in an adversarial
context, as this denormalized information is at the beginning
of the structure, making it relatively easy to attack with
out-of-bounds bugs.

However, if those must be used, the following can be used
to adopt typed allocators:

```cpp
struct Type {
    /* C++ ABI for operator new[] */
    struct cpp_array_hdr {
        size_t esize;
        size_t count;
    };

public:
    void *operator new[](size_t count)
    {
        struct cpp_array_hdr *hdr;

        /* Header-prefixed array: one header followed by `count` elements. */
        hdr = IONew(struct cpp_array_hdr, Type, count);
        if (hdr) {
            hdr->esize = sizeof(Type);
            hdr->count = count;
            /* The caller sees the memory right after the header. */
            return (void *)(&hdr[1]);
        }
        return nullptr;
    }

    void operator delete[](void *ptr)
    {
        struct cpp_array_hdr *hdr;

        /* Step back from the first element to recover the header. */
        hdr = (struct cpp_array_hdr *)((uintptr_t)ptr - sizeof(*hdr));
        IODelete(hdr, struct cpp_array_hdr, Type, hdr->count);
    }
};
```

### Wrapping C++ type allocation in container OSObjects

The blessed way of wrapping and passing a C++ type allocation for use in the
libkern collections is using `OSValueObject`. Please do not use `OSData` for this
purpose, as its backing store should not contain kernel pointers.