# Bounds checking

The goal of -fbounds-safety is to prevent buffer overflows from escalating into
security issues. However, that escalation is prevented by crashing the program,
which, in the case of the kernel, means panicking the system. While panicking
is a lesser evil than allowing an attacker to compromise the system, it is
still a drastic measure.

xnu's build system supports several options controlling the enforcement of
bounds checks via clang's -fbounds-safety extension. This document describes a
process that implements our best practices to adopt -fbounds-safety in existing
code.

# Controllable aspects of -fbounds-safety

-fbounds-safety is enabled at a file granularity in the xnu build system.
Whether a given file builds with -fbounds-safety is controlled by the build
system's configuration `files` under each kernel component. For instance, one of
the first components in xnu to enable -fbounds-safety is bsd/net: as a result,
bsd/conf/files is where build system modifications were made.

There are five options that control aspects related to -fbounds-safety:

* whether -fbounds-safety is enabled at all;
* when it is disabled, whether we should allow `__indexable` and
  `__bidi_indexable` in source (or emit a compile-time error if they're used);
* when it is enabled, whether a trap should be a panic, or whether it should
  only report a telemetry event;
* when it is set to panic, whether we should optimize for code size at the
  expense of the quality of debug information;
* which set of new bounds checks (`-fbounds-safety-bringup-missing-checks`) is
  enabled.

## Code size tradeoffs

We can ask clang to give us one trap instruction per function, which can have
significant positive effects on code size and performance. However, every bounds
check in that function will jump to that trap instruction when it fails.
Debug information on the trap instruction will be meaningless and the debugger won't
know where we came from. This manifests as a "function(), file.c:0" call stack
entry in the backtrace.

On the other hand, we can ask clang to give us one trap instruction per bounds
check. In that configuration, we get arguably bad codegen, but the backtrace is
always immediately readable and the trap location shows correctly in the
debugger.

To debug a panic in a build optimizing for code size, we can read disassembly
and make inferences based on register values. For instance, if we look at one
bounds check failing if register `x8` is greater than register `x9`, and in the
context of our panic we know that `x8` is 0x0 and `x9` is 0x1000, then we know
we can't possibly have failed because of that bounds check. There are scripts
to automate this reasoning; ask the -fbounds-safety DRIs for help if you run into
this situation.

## Bounds checking adoption level options

* (nothing): -fbounds-safety is disabled; it is an error to use `__indexable`
  and `__bidi_indexable` in source.
* `bound-checks-pending`: -fbounds-safety is disabled, but `__indexable` and
  `__bidi_indexable` are defined to nothing instead of causing compile-time
  errors.
* `bound-checks`: -fbounds-safety is enabled; failing bounds checks panic;
  optimize for code size at the detriment of debuggability.
* `bound-checks-debug`: -fbounds-safety is enabled; failing bounds checks panic;
  optimize for debug information at the detriment of efficient code.
* `bound-checks-soft`: -fbounds-safety is disabled for RELEASE kernels;
  for all other kernel configurations failing bounds checks generate a telemetry event
  instead of panicking; optimize for debug information at the detriment of efficient code.
* `bound-checks-seed`: -fbounds-safety is enabled.
  For RELEASE kernels, failing bounds checks
  generate a telemetry event instead of panicking; for all other kernel configurations
  failing bounds checks panic.

These options are mutually exclusive.

### Bounds checking adoption level modifier options

In addition to the bounds checking adoption level options (e.g.
`bound-checks-debug`), modifier options can be added to the selected adoption
level. Note that it is invalid to use these options without first specifying a
`bound-check*` level option (i.e. any level except "nothing").
Furthermore, the bound-check level option must appear before any modifiers (see examples below).

* `bound-checks-new-checks`: If building with `-fbounds-safety`, this causes
  `-fbounds-safety-bringup-missing-checks` to be added to the compiler flags.

Examples:

```
# ok: `-fbounds-safety -fbounds-safety-bringup-missing-checks` passed to compiler
test.c optional bound-checks bound-checks-new-checks

# invalid: An adoption level that's not "nothing" needs to be specified
test.c optional bound-checks-new-checks

# invalid: `bound-checks` needs to be specified first
test.c optional bound-checks-new-checks bound-checks
```

# The process of enabling bound-checks

`bound-checks` is the final, desirable bounds checking adoption level
configuration option. We do not enable `bound-checks` lightly, as it can
introduce new reasons that xnu panics. We have found that the following process
consistently helps land code changes that stick, and helps reduce the likelihood
of introducing problems that turn into bad kernels.

## Step 1: adopt -fbounds-safety at desk

When enabling -fbounds-safety, clang generates new diagnostics that ensure at
compile-time that bounds could be known at runtime (if necessary) for all
pointers, and new diagnostics for when a bounds check is likely (or guaranteed)
to fail at runtime.
The first step to adopting -fbounds-safety is making code
changes to xnu such that it builds without any diagnostics, and testing at desk
that your changes did not impact kernel functionality.

For this step, you use `bound-checks-debug`. `bound-checks-debug` enables the
entire breadth of -fbounds-safety diagnostics and gives you the most easily
debugged bounds checks. You should also use `bound-checks-debug` for xnu changes
that you send to integration testing.

## Step 2: separate adoption from enablement

Once you're confident in your code changes, everything builds, at-desk testing
is successful and integration testing is happy, you start two pull requests:

* one pull request with the necessary adoption code changes, configuring the
  file to build with `bound-checks-pending`;
* one pull request that changes `bound-checks-pending` to `bound-checks-soft`.

This strategy can save your change and other people's changes even in the face
of small errors. Read on to "where bound-checks-soft comes in" for more details.

### Where bound-checks-pending comes in

The configuration status quo of any file in xnu is to build with no options
relating to -fbounds-safety. In this mode, -fbounds-safety's `__indexable` and
`__bidi_indexable` keywords are **undefined**. It is a syntax error to use them.
This is because `__indexable` and `__bidi_indexable` pointers are not
ABI-compatible with plain C: if they were defined to nothing instead, and a use
of `__indexable` or `__bidi_indexable` slipped into a header used by a set of
files heterogeneously enabling -fbounds-safety, they could cause ABI breaks that
would manifest as opaque runtime crashes instead of compile-time errors.

However, adopting -fbounds-safety may require the explicit use of `__indexable`
or `__bidi_indexable` pointers that are confined to the file being modified.
Until `bound-checks-soft` is enabled, it must still be possible to build that
file without -fbounds-safety. This is where `bound-checks-pending` comes in:
this flag causes `__indexable` and `__bidi_indexable` to expand to nothing, and
it disables warnings that will frequently trip in plain C files that are
compatible with -fbounds-safety (such as -Wself-assign). This allows files that
are compatible with -fbounds-safety to continue to build without it, while
minimizing the risk of ABI incompatibilities.

### Where bound-checks-soft comes in

Using `bound-checks-soft` means that if a problem slips through qualification,
the kernel is still probably livable. A kernel that is unlivable due to panics
creates significant drag over the entire software development organization, and
fixing it will be a same-day emergency that you will need to firefight and then
root-cause. This **will** take precedence over any other work that you would
rather be doing. On the other hand, "soft traps" generate telemetry without
panicking. Kernels with known soft trap triggers are unshippable, but they may
still be livable. As a result, fixing these problems is merely very important.

`bound-checks-soft` is enabled separately from the code change because even
though `bound-checks-soft` is ideally non-fatal, failing a bounds check in
certain conditions can still result in an unlivable kernel (for instance,
if a check fails in a long, tight loop). If such a serious issue slips into
qualification, integrators only need to back out the `bound-checks-soft` change
(falling back to `bound-checks-pending`) instead of reverting your entire
change. Reverting entire changes is a very destructive integration action: any
_other_ changes that rely on your modifications may need to be cascaded out of
the build as well.
Given unfortunate-enough timing, there _may not be time_ to
re-nominate feature work that must be backed out. Significant -fbounds-safety
adoption experience in xnu and other projects has taught us that bundling in
non-trivial code changes with the enablement of -fbounds-safety is a recipe for
sadness and reprised work.

### Where bound-checks-seed comes in

If you want to enable `bound-checks` for internal users but want to use
`bound-checks-soft` for external users in order to collect telemetry
(e.g. during seeding), use `bound-checks-seed`.
The expectation is that, once the telemetry is collected, you will change the
file to `bound-checks` or disable -fbounds-safety.
Due to security concerns, namely non-fatal traps, `bound-checks-seed`
is not meant to be shipped to customers outside of seeding.

## Step 3: enable bound-checks

We let changes with `bound-checks-soft` steep in internal releases to build up
confidence that bounds checks don't trip during regular operations. During this
period, failing bounds checks create telemetry events that are collected by
XNU engineers instead of bringing down the system. Although failing bounds
checks are never desirable, it is better to catch them at that stage than at any
point after.

Once we have confidence that a file doesn't cause issues when -fbounds-safety is
enabled, we can change `bound-checks-soft` to the plain `bound-checks`. This is
simply done with another pull request.

Read "where bound-checks-seed comes in" for a different approach if you need
a higher confidence level before enabling `bound-checks`.
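
Taken together, the three steps move a single entry in the component's `files`
configuration through the adoption levels in order. The fragment below is a
sketch of that progression, reusing the `optional` syntax from the modifier
examples earlier in this document. The file name `example.c` is hypothetical,
and the lines show the same entry at successive points in time, not entries
that coexist in one file.

```
# Step 1: at-desk work and integration testing (not landed in this state)
example.c optional bound-checks-debug

# Step 2, first pull request: adoption code changes only
example.c optional bound-checks-pending

# Step 2, second pull request: soft enablement, easy to back out on its own
example.c optional bound-checks-soft

# Step 3: final enablement, once telemetry shows no failing bounds checks
example.c optional bound-checks
```

If more confidence is needed before the last change, `bound-checks-seed` can
stand in between the `bound-checks-soft` and `bound-checks` states for the
duration of seeding.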