# VM API parameter sanitization

Validating parameter values passed to virtual memory APIs, primarily from user
space.

## Overview

VM parameter sanitization aims to eliminate shallow input validation
bugs, like overflows caused by rounding addresses to the required page size,
by providing a set of APIs that can be used to perform consistent, thorough
mathematical checks on the input. This allows the rest of the subsystem to
freely operate on the input without worrying that future computations may
overflow. Note that these APIs are meant to primarily catch issues with
mathematical computation and are not responsible for checking if the input
value is within certain expected bounds or valid in the context of a specific
VM API.

## Semantic types

To enforce that sanitization is performed on input prior to use,
unsafe input types are encapsulated as opaque types (i.e. wrapped inside a
transparent union) to the internal implementation of the VM APIs. Performing
mathematical operations on these opaque values without calling the
respective sanitization functions (which validate and unwrap them)
will generate a compiler error.

Types that are typically considered unsafe (i.e. require sanitization) include:
- Address/offset, for example `vm_offset_t` and `vm_address_t`
- Size, for example `vm_size_t`
- Various flags like `vm_prot_t` and `vm_inherit_t`

## Sanitizer functions

The functions that sanitize various types of input values are implemented
in `vm_sanitize.c` and documented in their corresponding header
`vm_sanitize_internal.h`.

## VM API boundary

VM functions can be called from three places: userspace, kexts, and xnu itself.
Functions callable from userspace should be fully sanitized. Functions
callable from kexts and xnu are less thoroughly covered today.
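
As a rough illustration of the ideas in the Overview and Semantic types
sections, the sketch below wraps an address and a size in opaque types and
unwraps them only after checking that adding and page-rounding them cannot
overflow. This is not the xnu implementation: every `sketch_*` name, the plain
struct wrapper (standing in for the transparent union), and the 16 KiB page
mask are assumptions made for the example. It relies on the GCC/Clang builtin
`__builtin_add_overflow`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opaque wrappers: arithmetic on them does not compile. */
typedef struct { uint64_t unsafe_value; } sketch_addr_ut;
typedef struct { uint64_t unsafe_value; } sketch_size_ut;

/* Assumption for this sketch only: 16 KiB pages. */
#define SKETCH_PAGE_MASK 0x3fffULL

/*
 * Validate [addr, addr + size) and round it outward to page boundaries.
 * On success, writes the unwrapped page-aligned start/end and returns true.
 * On failure, writes zeros so the caller never sees unchecked values.
 */
static bool
sketch_sanitize_addr_size(sketch_addr_ut addr_u, sketch_size_ut size_u,
    uint64_t *start, uint64_t *end)
{
	uint64_t addr = addr_u.unsafe_value;
	uint64_t size = size_u.unsafe_value;
	uint64_t raw_end, rounded_end;

	assert(start != NULL && end != NULL);
	*start = 0;
	*end = 0;

	/* addr + size must not wrap around. */
	if (__builtin_add_overflow(addr, size, &raw_end)) {
		return false;
	}
	/* Rounding the end up to a page boundary must not wrap either. */
	if (__builtin_add_overflow(raw_end, SKETCH_PAGE_MASK, &rounded_end)) {
		return false;
	}

	*start = addr & ~SKETCH_PAGE_MASK;
	*end = rounded_end & ~SKETCH_PAGE_MASK;
	return true;
}
```

With these wrappers, an expression such as `addr_u + 1` is rejected by the
compiler, so the only way to reach the raw value is through the sanitizer.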

## Telemetry and error code compatibility

When VM parameter sanitization finds a problem, it does the following:
- returns an error to the API's caller
- optionally *rewrites* that error first, either to a different
  error code or to `KERN_SUCCESS`.
- optionally *telemeters* that error, sending it to CoreAnalytics and ktriage.

The option to rewrite and/or telemeter is chosen based on the sanitizer
type and on the identity of the VM API that called the sanitizer.
The VM API identity is the `vm_sanitize_caller_t` passed to the sanitizer
function. This identity contains function pointers that override the
default behavior (i.e. no rewrite, no telemetry). The overrides, if any, are set
by `VM_SANITIZE_DEFINE_CALLER` in `vm_sanitize_error_compat.c`.

Error code rewrites change the error code to better match historical
behavior for binary compatibility purposes. There are two possible rewrites:
1. rewrite an error code to be a different error code.
2. rewrite an error code to be `KERN_SUCCESS`. The VM API returns success
   immediately without executing the rest of its implementation.

Not all changed error codes are (or could be) rewritten.

Telemetry similarly may record two cases:
1. The error code being returned differs from its historical value.
2. The error code being returned would be different from its historical
   value, but a rewrite has changed it to match the historical value instead.

Not all changed error codes are (or could be) telemetered. Currently all
rewrites performed are telemetered.

An outline of the sequence:
1. VM API calls a sanitizer function, passing its own identity in `vms_caller`.
2. `vm_sanitize_<kind>` looks for invalid parameters.
3. If an invalid parameter is found, the sanitizer calls
   `vm_sanitize_err_compat_<kind>` to handle any rewrites or telemetry.
4. `vm_sanitize_err_compat_<kind>` looks for an override handler
   for that type in the caller's identity, and calls it if present.
5. `vm_sanitize_err_compat_<kind>_<caller>`, the override handler, examines the
   parameters and chooses whether to rewrite and/or telemeter this error.
   It returns a `vm_sanitize_compat_rewrite_t` containing its decision.
6. `vm_sanitize_err_compat_<kind>` applies any requested error code rewrite
   and sends any requested telemetry.
7. The VM API receives the error from the sanitizer and returns it.

There is a complication in step #7: how do the error compat and
the sanitizer tell the VM API that it should halt and return `KERN_SUCCESS`
immediately, distinct from the sanitizer telling the VM API that
sanitization succeeded and the VM API should proceed normally?
The scheme looks like this:
- sanitizer returns `KERN_SUCCESS`: VM API may proceed normally
- sanitizer returns not-`KERN_SUCCESS`: VM API shall return immediately
  - sanitizer returns `VM_ERR_RETURN_NOW`: VM API shall return `KERN_SUCCESS` now
  - sanitizer returns any other error code: VM API shall return that error now

The mapping of `VM_ERR_RETURN_NOW` to `KERN_SUCCESS` is performed by
`vm_sanitize_get_kern_return`.

## How to: add a new sanitizer or sanitized type

When a new type needs sanitization, use one of the following macros to declare
and define the encapsulated opaque version:
- `VM_GENERATE_UNSAFE_ADDR`: Should be used for a new variant that represents
  address or offset
- `VM_GENERATE_UNSAFE_SIZE`: Should be used for a new variant that represents
  size
- `VM_GENERATE_UNSAFE_TYPE`: Should be used for other types that are not
  address or size. For example, this macro is currently used to define the
  opaque protections type `vm_prot_ut`.

These opaque types are declared in `vm_types_unsafe.h`.
There are also some
variants of these macros for specific purposes:
- 32-bit variants like `VM_GENERATE_UNSAFE_ADDR32` should be used for 32-bit
  variants of address, offset and size.
- BSD variants like `VM_GENERATE_UNSAFE_BSD_ADDR` should be used for types
  that are specifically used in the BSD subsystem and not in mach (for example:
  `caddr_t`).
- EXT variants like `VM_GENERATE_UNSAFE_EXT` should not be used directly. They
  are intermediate implementation macros.
- `VM_GENERATE_UNSAFE_WRAPPER` is a special macro that is needed to avoid
  compiler errors when pointers to opaque types of a specific kind are
  used interchangeably as pointers to another opaque type of the same kind,
  for example:
  ```
  mach_vm_offset_ut *offset;
  ...
  mach_vm_address_ut *ptr = offset;
  ```
  These macros define a common opaque type for the entire kind that other
  `_ADDR`/`_SIZE` macros redirect to.
  ```
  VM_GENERATE_UNSAFE_WRAPPER(uint64_t, vm_addr_struct_t);
  ```
  generates the common opaque type for address and offset. All the `_ADDR`
  macros define respective opaque types as a typedef of
  `vm_addr_struct_t`.
  ```
  VM_GENERATE_UNSAFE_ADDR(mach_vm_address_t, mach_vm_address_ut);
  ```
  typedefs `mach_vm_address_ut` as a `vm_addr_struct_t`.

## How to: add sanitization to new VM API

Once the opaque type is available to use, modify the respective
declaration/definition of the entry point to use the opaque types.

### Opaque types in function prototype

#### Adoption in MIG

For APIs that are exposed via MIG, adopting the new opaque type in the
API requires some additional steps as we want the opaque types to only appear
in the kernel headers, leaving the userspace headers unchanged.
- Associate the safe type with its unsafe type using `VM_UNSAFE_TYPE` or
  `VM_TYPE_SAFE_UNSAFE` macros.
  For example:
  ```
  type mach_vm_address_t = uint64_t VM_UNSAFE_TYPE(mach_vm_address_ut);
  ```
  will cause MIG to use the original type `mach_vm_address_t` in the userspace
  headers that are generated by MIG, but overload with the unsafe type
  `mach_vm_address_ut` for kernel headers.
  Similarly,
  ```
  type pointer_t = ^array[] of MACH_MSG_TYPE_BYTE
    VM_TYPE_SAFE_UNSAFE(vm_offset_t, pointer_ut);
  ```
  replaces `pointer_t` with `vm_offset_t` in userspace headers
  and `pointer_ut` in kernel headers.
- Ensure that `VM_KERNEL_SERVER` is defined at the top of the defs file before
  any includes.
- Adopt the opaque types in the function definition present in the `.c` file.
  ```
  kern_return_t
  mach_vm_read(
          vm_map_t                map,
          mach_vm_address_ut      addr,
          mach_vm_size_ut         size,
          pointer_ut             *data_u,
          mach_msg_type_number_t *data_size)
  ```

#### Adoption in syscalls

- Ensure that you have created the opaque types needed by the BSD subsystem
  using `VM_GENERATE_UNSAFE_BSD_*` in `osfmk/mach/vm_types_unsafe.h`.
- Add the new opaque type to `sys/_types/*` or `bsd/<arm or i386>/types.h`.
  `caddr_ut` was added to `bsd/sys/_types/_caddr_t.h` and `user_addr_ut` was
  added to `bsd/arm/types.h` and `bsd/i386/types.h`. When adding an opaque type
  for `caddr_t` you may also need to add opaque types for corresponding types
  like `user_addr_t`, as the generated syscall code uses those types.
- Also add the types to `libsyscall/xcodescripts/create-syscalls.pl`.
- Adopt the opaque type in the API in `syscalls.master`.
  ```
  203	AUE_MLOCK	ALL	{ int mlock(caddr_ut addr, size_ut len); }
  ```
  `mlock` uses opaque type `caddr_ut` for its address and `size_ut` for its
  size.
- Modify `bsd/kern/makesyscalls.sh` to handle the new types added.

#### Adoption in mach traps

Function prototypes aren't generated automatically for mach traps as is the
case for syscalls.
Therefore we need to modify the mach trap manually to use
the opaque type in `osfmk/mach/mach_traps.h`.
```
struct _kernelrpc_mach_vm_deallocate_args {
	PAD_ARG_(mach_port_name_t, target);     /* 1 word */
	PAD_ARG_(mach_vm_address_ut, address);  /* 2 words */
	PAD_ARG_(mach_vm_size_ut, size);        /* 2 words */
};                                              /* Total: 5 */
extern kern_return_t _kernelrpc_mach_vm_deallocate_trap(
	struct _kernelrpc_mach_vm_deallocate_args *args);
```

### Perform sanitization

Now that the internal function definitions see the opaque types, we need to
perform the required sanitization. If multiple entry points call the same
internal function, pass along the unsafe value and perform the check at the
best choke point further down. For example, the best choke point for the
following APIs was `vm_map_copyin_internal`:
- `mach_vm_read`
- `vm_read`
- `mach_vm_read_list`
- `vm_read_list`
- `vm_map_copyin`
- `mach_vm_read_overwrite`
- `mach_vm_copy`

Once you have determined the right choke point, create a
`<function name>_sanitize` function that will sanitize all opaque types and
return their unwrapped safe values. In this function you should call the
sanitization functions provided in `vm_sanitize.c` to validate all opaque
types adopted by the API. If you added a new type that doesn't have a
corresponding sanitization function in `vm_sanitize.c`, please add one.
For existing types, try to reuse the functions provided instead of
writing new ones with specific purposes. `vm_sanitize.c` is meant to
contain the basic building blocks that can be chained to meet your specific
requirements.

#### Adding new functions to `vm_sanitize.c`

- Mark the function with `__attribute__((always_inline,
  warn_unused_result))`.
- Ensure that you return safe values on failure for all opaque types that
  were supposed to be sanitized by the function.
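
The pattern described in this section can be sketched roughly as follows:
small building-block sanitizers carrying the two attributes, chained by a
`<function name>_sanitize` style wrapper that hands back safe values on any
failure. Everything named `sketch_*` or `SKETCH_*` here (the types, the error
codes, the page mask, and the choice of zero as the "safe" value) is
hypothetical and chosen for illustration; the real building blocks live in
`vm_sanitize.c` and may look quite different.

```c
#include <assert.h>
#include <stdint.h>

typedef int sketch_kern_return_t;
#define SKETCH_SUCCESS          0
#define SKETCH_INVALID_ARGUMENT 4

#define SKETCH_PAGE_MASK 0x3fffULL /* assumption: 16 KiB pages */
#define SKETCH_PROT_NONE 0x0u
#define SKETCH_PROT_ALL  0x7u      /* read | write | execute */

/* Hypothetical opaque wrappers for a size and a protection value. */
typedef struct { uint64_t unsafe_value; } sketch_size_ut;
typedef struct { uint32_t unsafe_value; } sketch_prot_ut;

/* Building block: round a size up to a page multiple, rejecting overflow. */
__attribute__((always_inline, warn_unused_result))
static inline sketch_kern_return_t
sketch_sanitize_size(sketch_size_ut size_u, uint64_t *size)
{
	uint64_t rounded;

	if (__builtin_add_overflow(size_u.unsafe_value, SKETCH_PAGE_MASK, &rounded)) {
		*size = 0; /* safe value on failure */
		return SKETCH_INVALID_ARGUMENT;
	}
	*size = rounded & ~SKETCH_PAGE_MASK;
	return SKETCH_SUCCESS;
}

/* Building block: reject protection values with unknown bits set. */
__attribute__((always_inline, warn_unused_result))
static inline sketch_kern_return_t
sketch_sanitize_prot(sketch_prot_ut prot_u, uint32_t *prot)
{
	if (prot_u.unsafe_value & ~SKETCH_PROT_ALL) {
		*prot = SKETCH_PROT_NONE; /* safe value on failure */
		return SKETCH_INVALID_ARGUMENT;
	}
	*prot = prot_u.unsafe_value;
	return SKETCH_SUCCESS;
}

/* Choke-point wrapper: chains the blocks and unwraps every opaque input. */
static sketch_kern_return_t
sketch_api_sanitize(sketch_size_ut size_u, sketch_prot_ut prot_u,
    uint64_t *size, uint32_t *prot)
{
	sketch_kern_return_t kr;

	kr = sketch_sanitize_size(size_u, size);
	if (kr == SKETCH_SUCCESS) {
		kr = sketch_sanitize_prot(prot_u, prot);
	}
	if (kr != SKETCH_SUCCESS) {
		/* On any failure, every output gets a safe value. */
		*size = 0;
		*prot = SKETCH_PROT_NONE;
	}
	return kr;
}
```

Because of `warn_unused_result`, a caller that ignores the return value of
either building block gets a compiler warning, which helps keep error paths
from being silently dropped.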

### Enforcement

For files outside `osfmk/vm` and `bsd/vm` that need to see the opaque type,
add the following to their `conf/Makefile.template`:
```
kern_mman.o_CFLAGS_ADD += -DVM_UNSAFE_TYPES
```

## Tests

Most VM APIs callable from userspace or kexts have tests that pass correct and
incorrect input values, to verify that the functions return the expected error
codes. These tests run every VM function that has sanitized parameters dozens,
hundreds, or thousands of times.

The code for these tests is:
- `tests/vm/vm_parameter_validation.c` (test `vm_parameter_validation_user`
  for userspace call sites)
- `osfmk/tests/vm_parameter_validation_kern.c` (test
  `vm_parameter_validation_kern` for kernel or kext call sites)

The expected error codes returned by these calls are stored in "golden" result
files. If you change the error code of a VM API, or define a new flag bit that
was previously unused, you may need to update the golden results.
See `tests/vm/vm_parameter_validation.c` for instructions.

You can run these tests locally. See `tests/vm/vm_parameter_validation.c`
for instructions.

A *trial* is a single VM function called with a single set of argument values.
For example, `mach_vm_protect(VM_PROT_READ)` with address=0 and size=0 is a
single trial.

A *test* is made up of multiple trials: a single VM function called many
times with many values for one sanitized parameter (or group of related
parameters). For example, `mach_vm_protect(VM_PROT_READ)` with many different
pairs of address and size is a single test. `mach_vm_protect` with a single
valid address+size and many different `vm_prot_t` values is another test.

The trial values in these tests are generally intended to provoke bugs
that the sanitizers are supposed to catch.
The list of trial values for
address+size provokes various integer overflows if they are added and/or
rounded. The list of trial values for flags like `vm_prot_t` includes at
least one trial for every possible set bit. The list of trial values for
a sanitized type or group of types is produced by a "generator". Each
trial generator is in `osfmk/tests/vm_parameter_validation.h`.

A test *harness* or *runner* is the loop that runs a VM function with
every trial value, performing any setup necessary and collecting the results.
These function names start with `test_`. For example,
`test_mach_with_allocated_vm_prot_t` runs `vm_prot_t` trials of a VM API,
each time passing it the address and size of a valid allocation and a
different `vm_prot_t` value. This particular runner is used by some tests of
`mach_vm_protect`, `mach_vm_wire`, and others.

The output of all trials in one test is collected as `results_t`, storing the
name of the test, the name of each trial, and the error code from each trial.
The "error code" is also used for trial outcomes that are not return values
from the VM API. For example, value `PANIC` means the trial was deliberately
not executed because if it were it would have panicked and the test machinery
can't handle that.

After each test the collected results are processed. Normally this means
comparing them to the expected results from the golden files. Test results
may also be used to generate new golden files. Test results may also be
dumped to console in their entirety. You can pipe dumped output to
`tools/format_vm_parameter_validation.py`, which knows how to pretty-print
some things.

These tests are intended to exercise every kernel entry point from userspace
directly, both MIG and syscall, even for functions that have no access via
Libsystem or that Libsystem intercepts.
For MIG entry points we generate our
own MIG call sites; see `tests/Makefile` for details. For syscall entry points
we sometimes call a `__function_name` entry point exported by Libsystem that
is more direct than `function_name` would be. Examples: `__mmap`, `__msync`,
`__msync_nocancel`.

There are two sets of kernel entrypoints that are not exercised by these tests
today:
1. the MIG entrypoints that use 32-bit addresses, on platforms other than
   watchOS. These kernels respond to these MIG messages but Libsystem never
   sends them. We reviewed the vm32 implementations and decided they were safe
   and unlikely to do unsanitary things with the input values before passing
   them to VM APIs that perform sanitization. These entrypoints should be
   disabled (rdar://124030574).
2. the `kernelrpc` trap alternatives to some MIG entrypoints. We reviewed
   the trap implementations and decided they were safe and unlikely to do
   unsanitary things with the input values before passing them to VM APIs that
   perform sanitization.

## How to: add a new test

You may need to write new tests in `vm_parameter_validation` if you do
one of the following:
- write a new VM API function (for userspace or kexts) that has parameters of
  sanitized types
- implement sanitization in an existing VM API function for a parameter that
  was not previously sanitized

Step 1: are you testing userspace callers (`tests/vm/vm_parameter_validation.c`),
kernel/kext callers (`osfmk/tests/vm_parameter_validation_kern.c`), or both?
If you are testing both kernel and userspace you may be able to share much of
the implementation in the common file `osfmk/tests/vm_parameter_validation.h`.

Step 2: decide what functions you are testing. Each API function with sanitized
parameters gets at least one test.
Some functions are divided into multiple
independent tests because the function has multiple modes of operation that
use different parameter validation paths internally. For example,
`mach_vm_allocate(VM_FLAGS_FIXED)` and `mach_vm_allocate(VM_FLAGS_ANYWHERE)`
each get their own set of tests as if they were two different functions,
because each handles its `addr/size` parameters differently.

Step 3: decide what parameters you are testing. Each sanitized parameter or
group of related parameters gets its own test. For example, `mach_vm_protect`
has two parameter tests to perform, one for the protection parameter and one
for the address and size parameters together. The sanitization of address and
size are intertwined (we check for overflow of address+size), so they are
tested together. The sanitization of the protection parameter is independent
of the address and size, so it is tested separately.

Step 4: for each parameter or group of parameters, decide what trial values
should be tested. The trials should be exhaustive for small values, and
exercise edge cases and invalid state for large values and interconnected
values. `vm_prot_t` is exhaustive at the bit level (each bit is set in at
least one trial) and probes edge cases like `rwx`. Address and size trials
probe for overflows when the values are added and/or rounded to page sizes.
Choose existing trial value generators for your parameters, or write new
generators if you want a new type or different values for an existing type.
Note that the trial name strings produced by the generator are used by
`tools/format_vm_parameter_validation.py` to pretty-print your output;
you may even want to edit that script to recognize new things from your
code. The trial names are also used in the golden files; each trial
name must be unique within a single test.
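
To make Step 4 more concrete, here is a minimal sketch of what a trial
generator for a 32-bit flags type could look like: one trial with no bits set,
one trial per bit (so every bit is set in at least one trial), and one edge
case combining the low three bits. The `sketch_*` names, the trial struct, and
the name format are assumptions for this example only; the real generators in
`osfmk/tests/vm_parameter_validation.h` are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_FLAG_BITS  32u
#define SKETCH_MAX_TRIALS (SKETCH_FLAG_BITS + 2u)

/* Hypothetical trial record: the value to pass and a unique label. */
typedef struct {
	uint32_t value;
	char     name[32];
} sketch_trial_t;

/* Store one trial; the name embeds the value, so distinct values give
 * distinct names. */
static void
sketch_set_trial(sketch_trial_t *trial, uint32_t value)
{
	trial->value = value;
	snprintf(trial->name, sizeof(trial->name), "flags 0x%08x",
	    (unsigned)value);
}

/* Fill trials[] and return how many trials were generated. */
static unsigned
sketch_generate_flag_trials(sketch_trial_t trials[SKETCH_MAX_TRIALS])
{
	unsigned count = 0;

	/* No bits set. */
	sketch_set_trial(&trials[count++], 0);

	/* Each single bit, so every bit is set in at least one trial. */
	for (unsigned bit = 0; bit < SKETCH_FLAG_BITS; bit++) {
		sketch_set_trial(&trials[count++], UINT32_C(1) << bit);
	}

	/* Edge case: the low three bits together (rwx for a protection-like type). */
	sketch_set_trial(&trials[count++], 0x7);

	assert(count <= SKETCH_MAX_TRIALS);
	return count;
}
```

A runner would then loop over the returned trials, call the function under
test with each `value`, and record the result under the matching `name`.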

Step 5: for each test, decide what setup is necessary for the test or for
each trial in the test. Choose an existing test runner or write a new
runner with that setup and those trials. The test runner loops through
the trial values produced by the trial generators above, performs the
required setup for the test or for each trial, and calls the function
to be tested. If there is an existing VM API with similar setup or
similar parameters to yours then you can use the same runner or implement
a variation on that runner.

Step 6: if your VM API function has out parameters, test that they are
modified or not modified as expected. This is not strictly related to
parameter sanitization, but the sanitization error paths often have
inconsistent out parameter handling so these tests are a convenient
place to verify the desired behavior.

Step 7: call all of your new tests from the top-level test functions
`vm_parameter_validation_kern_test` and `vm_parameter_validation_user`.
Wrap your calls in the same processing and deallocation functions as the
other tests. You should not need to modify either of them. Note that the
string used to label the test (with the function and parameters being tested)
is used by the pretty-printing in `tools/format_vm_parameter_validation.py`
so choose it wisely; you may even want to edit that script to recognize
new things from your code. The test name is also recorded in the golden
files; each test name must be unique.

Step 8: run your new tests and verify that the patterns of success and
error are what you want. `tools/format_vm_parameter_validation.py` can
pretty-print some of these outputs, which makes them easier to examine.
Make sure you test the platforms with unusual behavior, such as Intel
and Rosetta where page sizes are different. See
`tests/vm/vm_parameter_validation.c` for instructions on how to run your
tests in BATS or locally.

Step 9: if you are adding sanitization to an existing VM API, decide if
you need error code compatibility handling. Run your new test before and
after your new sanitization code is in place and compare the output from
`DUMP_RESULTS=1`. If your new sanitization has changed the function's
error code behavior then you may want to write error code compatibility
rewrites and/or telemetry for binary compatibility.

Step 10: update the "golden" files of expected results. This is done last
when you are confident that your sanitization and tests are complete and
stable. See `tests/vm/vm_parameter_validation.c` for instructions.