# Bounds checking

The goal of -fbounds-safety is to prevent buffer overflows from escalating into
security issues. However, that escalation is prevented by crashing the program,
which, in the case of the kernel, means panicking the system. While panicking
is a lesser evil than allowing an attacker to compromise the system, it is
still a drastic measure.

xnu's build system supports several options controlling the enforcement of
bounds checks via clang's -fbounds-safety extension. This document describes a
process that implements our best practices to adopt -fbounds-safety in existing
code.

# Controllable aspects of -fbounds-safety

-fbounds-safety is enabled at a file granularity in the xnu build system.
Whether a given file builds with -fbounds-safety is controlled by the build
system's configuration `files` under each kernel component. For instance, one of
the first components in xnu to enable -fbounds-safety is bsd/net: as a result,
bsd/conf/files is where build system modifications were made.

There are four options that control aspects related to -fbounds-safety:

* whether -fbounds-safety is enabled at all;
* when it is disabled, whether we should allow `__indexable` and
  `__bidi_indexable` in source (or emit a compile-time error if they're used);
* when it is enabled, whether a trap should be a panic, or whether it should
  only report a telemetry event;
* when it is set to panic, whether we should optimize for code size at the
  expense of the quality of debug information.

## Code size tradeoffs

We can ask clang to give us one trap instruction per function, which can have
significant positive effects on code size and performance. However, every bounds
check in that function will jump to that trap instruction when it fails. Debug
information on the trap instruction will be meaningless and the debugger won't
know where we came from.
This manifests as a "function(), file.c:0" call stack
entry in the backtrace.

On the other hand, we can ask clang to give us one trap instruction per bounds
check. In that configuration, we get arguably bad codegen, but the backtrace is
always immediately readable and the trap location shows correctly in the
debugger.

To debug a panic in a build optimizing for code size, we can read disassembly
and make inferences based on register values. For instance, if we look at one
bounds check failing if register `x8` is greater than register `x9`, and in the
context of our panic we know that `x8` is 0x0 and `x9` is 0x1000, then we know
we can't possibly have failed because of that bounds check. There are scripts
to automate this reasoning; ask the -fbounds-safety DRIs for help if you run
into this situation.

## Bounds checking options

* (nothing): -fbounds-safety is disabled; it is an error to use `__indexable`
  and `__bidi_indexable` in source.
* `bound-checks-pending`: -fbounds-safety is disabled, but `__indexable` and
  `__bidi_indexable` are defined to nothing instead of causing compile-time
  errors.
* `bound-checks`: -fbounds-safety is enabled; failing bounds checks panic;
  optimize for code size to the detriment of debuggability.
* `bound-checks-debug`: -fbounds-safety is enabled; failing bounds checks panic;
  optimize for debug information to the detriment of efficient code.
* `bound-checks-soft`: -fbounds-safety is disabled for RELEASE kernels;
  for all other kernel configurations failing bounds checks generate a telemetry
  event instead of panicking; optimize for debug information to the detriment of
  efficient code.
* `bound-checks-seed`: -fbounds-safety is enabled. For RELEASE kernels, failing
  bounds checks generate a telemetry event instead of panicking; for all other
  kernel configurations failing bounds checks panic.
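
The option list above can be read as a small decision table. The following is a minimal C sketch of that table, not code from xnu or its build system: the names `trap_action`, `bounds_policy` and `bounds_policy_for` are hypothetical, and the code size versus debug information axis is left out. It also assumes that the annotations remain legal for `bound-checks-soft` on RELEASE kernels, which the list does not state explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the option list; these names are not xnu identifiers. */
enum trap_action {
    TRAP_NONE,       /* no bounds checks are emitted */
    TRAP_TELEMETRY,  /* "soft trap": report an event and keep running */
    TRAP_PANIC       /* a failing check panics the system */
};

struct bounds_policy {
    bool known;               /* false if the option is not in the list */
    bool enabled;             /* -fbounds-safety is on for this build */
    bool indexable_is_error;  /* __indexable/__bidi_indexable are rejected */
    enum trap_action action;  /* what a failing bounds check does */
};

struct bounds_policy
bounds_policy_for(const char *option, bool release_kernel)
{
    /* Default: known option, checks disabled, annotations accepted. */
    struct bounds_policy p = { true, false, false, TRAP_NONE };

    assert(option != NULL);
    if (strcmp(option, "") == 0) {
        /* No option: annotations are a compile-time error. */
        p.indexable_is_error = true;
    } else if (strcmp(option, "bound-checks-pending") == 0) {
        /* Disabled, and annotations expand to nothing: the default applies. */
    } else if (strcmp(option, "bound-checks") == 0 ||
               strcmp(option, "bound-checks-debug") == 0) {
        /* These two differ only in code size versus debug information. */
        p.enabled = true;
        p.action = TRAP_PANIC;
    } else if (strcmp(option, "bound-checks-soft") == 0) {
        /* Assumption: annotations stay legal on RELEASE, where checks are off. */
        if (!release_kernel) {
            p.enabled = true;
            p.action = TRAP_TELEMETRY;
        }
    } else if (strcmp(option, "bound-checks-seed") == 0) {
        p.enabled = true;
        p.action = release_kernel ? TRAP_TELEMETRY : TRAP_PANIC;
    } else {
        p.known = false;
    }
    return p;
}
```

In this model, `bounds_policy_for("bound-checks-seed", true)` reports a telemetry-only trap, while the same option on any other kernel configuration reports a panic. That asymmetry is the only difference between `bound-checks-seed` and plain `bound-checks` in the list above.
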

# The process of enabling bound-checks

`bound-checks` is the final, desirable bounds checking configuration option. We
do not enable `bound-checks` lightly, as it can introduce new reasons that xnu
panics. We have found that the following process consistently helps land code
changes that stick, and helps reduce the likelihood of introducing problems that
turn into bad kernels.

## Step 1: adopt -fbounds-safety at desk

When -fbounds-safety is enabled, clang generates new diagnostics that ensure at
compile-time that bounds could be known at runtime (if necessary) for all
pointers, and new diagnostics for when a bounds check is likely (or guaranteed)
to fail at runtime. The first step to adopting -fbounds-safety is making code
changes to xnu such that it builds without any diagnostics, and testing at desk
that your changes did not impact kernel functionality.

For this step, you use `bound-checks-debug`. `bound-checks-debug` enables the
entire breadth of -fbounds-safety diagnostics and gives you the most easily
debugged bounds checks. You should also use `bound-checks-debug` for xnu changes
that you send to integration testing.

## Step 2: separate adoption from enablement

Once you're confident in your code changes, everything builds, at-desk testing
is successful and integration testing is happy, you start two pull requests:

* one pull request with the necessary adoption code changes, configuring the
  file to build with `bound-checks-pending`;
* one pull request that changes `bound-checks-pending` to `bound-checks-soft`.

This strategy can save your change and other people's changes even in the face
of small errors. Read on to "where bound-checks-soft comes in" for more details.

### Where bound-checks-pending comes in

The configuration status quo of any file in xnu is to build with no options
relating to -fbounds-safety.
In this mode, -fbounds-safety's `__indexable` and
`__bidi_indexable` keywords are **undefined**. It is a syntax error to use them.
This is because `__indexable` and `__bidi_indexable` pointers are not
ABI-compatible with plain C: if they were defined to nothing instead, and a use
of `__indexable` or `__bidi_indexable` slipped into a header used by a set of
files heterogeneously enabling -fbounds-safety, they could cause ABI breaks that
would manifest as opaque runtime crashes instead of compile-time errors.

However, adopting -fbounds-safety may require the explicit use of `__indexable`
or `__bidi_indexable` pointers that are confined to the file being modified.
Until `bound-checks-soft` is enabled, it must still be possible to build that
file without -fbounds-safety. This is where `bound-checks-pending` comes in:
this flag causes `__indexable` and `__bidi_indexable` to expand to nothing, and
it disables warnings that will frequently trip in plain C files that are
compatible with -fbounds-safety (such as -Wself-assign). This allows files that
are compatible with -fbounds-safety to continue to build without it, while
minimizing the risk of ABI incompatibilities.

### Where bound-checks-soft comes in

Using `bound-checks-soft` means that if a problem slips through qualification,
the kernel is still probably livable. A kernel that is unlivable due to panics
creates significant drag over the entire software development organization, and
fixing it will be a same-day emergency that you will need to firefight and then
root-cause. This **will** take precedence over any other work that you could
rather be doing. On the other hand, "soft traps" generate telemetry without
panicking. Kernels with known soft trap triggers are un-shippable, but they may
still be livable. As a result, fixing these problems is merely very important.


`bound-checks-soft` is enabled separately from the code change because even
though `bound-checks-soft` is ideally non-fatal, failing a bounds check in
certain conditions can still result in an unlivable kernel (for instance,
if a check fails in a long, tight loop). If such a serious issue slips into
qualification, integrators only need to back out the `bound-checks-soft` change
(falling back to `bound-checks-pending`) instead of reverting your entire
change. Reverting entire changes is a very destructive integration action: any
_other_ changes that rely on your modifications may need to be cascaded out of
the build as well. Given unfortunate-enough timing, there _may not be time_ to
re-nominate feature work that must be backed out. Significant -fbounds-safety
adoption experience in xnu and other projects has taught us that bundling
non-trivial code changes with the enablement of -fbounds-safety is a recipe for
sadness and repeated work.

### Where bound-checks-seed comes in

If you want to enable `bound-checks` for internal users but want to use
`bound-checks-soft` for external users in order to collect telemetry
(e.g. during seeding), use `bound-checks-seed`.
The expectation is that, once the telemetry is collected, you will change the
file to `bound-checks` or disable -fbounds-safety.
Because non-fatal traps are a security concern, `bound-checks-seed`
is not meant to be shipped to customers outside of seeding.

## Step 3: enable bound-checks

We let changes with `bound-checks-soft` steep in internal releases to build up
confidence that bounds checks don't trip during regular operations. During this
period, failing bounds checks create telemetry events that are collected by
xnu engineers instead of bringing down the system. Although failing bounds
checks are never desirable, it is better to catch them at that stage than at any
point after.


Once we have confidence that a file doesn't cause issues when -fbounds-safety is
enabled, we can change `bound-checks-soft` to the plain `bound-checks`. This is
simply done with another pull request.

Read "where bound-checks-seed comes in" for a different approach if you need
a higher confidence level before enabling `bound-checks`.