1*4f1223e8SApple OSS Distributions# Bounds checking 2*4f1223e8SApple OSS Distributions 3*4f1223e8SApple OSS DistributionsThe goal of -fbounds-safety is to prevent buffer overflows from escalating into 4*4f1223e8SApple OSS Distributionssecurity issues. However, that escalation is prevented by crashing the program, 5*4f1223e8SApple OSS Distributionswhich, in the case of the kernel, means panicking the system. While panicking 6*4f1223e8SApple OSS Distributionsis a lesser evil than allowing an attacker to compromise the system, it is 7*4f1223e8SApple OSS Distributionsstill a drastic measure. 8*4f1223e8SApple OSS Distributions 9*4f1223e8SApple OSS Distributionsxnu's build system supports several options controlling the enforcement of 10*4f1223e8SApple OSS Distributionsbounds checks via clang's -fbounds-safety extension. This document describes a 11*4f1223e8SApple OSS Distributionsprocess that implements our best practices to adopt -fbounds-safety in existing 12*4f1223e8SApple OSS Distributionscode. 13*4f1223e8SApple OSS Distributions 14*4f1223e8SApple OSS Distributions# Controllable aspects of -fbounds-safety 15*4f1223e8SApple OSS Distributions 16*4f1223e8SApple OSS Distributions-fbounds-safety is enabled at a file granularity in the xnu build system. 17*4f1223e8SApple OSS DistributionsWhether a given file builds with -fbounds-safety is controlled by the build 18*4f1223e8SApple OSS Distributionssystem's configuration `files` under each kernel component. For instance, one of 19*4f1223e8SApple OSS Distributionsthe first components in xnu to enable -fbounds-safety is bsd/net: as a result, 20*4f1223e8SApple OSS Distributionsbsd/conf/files is where build system modifications were made. 21*4f1223e8SApple OSS Distributions 22*4f1223e8SApple OSS DistributionsThere are four options that control aspects related to -fbounds-safety: 23*4f1223e8SApple OSS Distributions 24*4f1223e8SApple OSS Distributions* whether -fbounds-safety is enabled at all; 25*4f1223e8SApple OSS Distributions* when it is disabled, whether we should allow `__indexable` and 26*4f1223e8SApple OSS Distributions `__bidi_indexable` in source (or emit a compile-time error if they're used); 27*4f1223e8SApple OSS Distributions* when it is enabled, whether a trap should be a panic, or whether it should 28*4f1223e8SApple OSS Distributions only report a telemetry event; 29*4f1223e8SApple OSS Distributions* when it is set to panic, whether we should optimize for code size at the 30*4f1223e8SApple OSS Distributions expense of the quality of debug information. 31*4f1223e8SApple OSS Distributions 32*4f1223e8SApple OSS Distributions## Code size tradeoffs 33*4f1223e8SApple OSS Distributions 34*4f1223e8SApple OSS DistributionsWe can ask clang to give us one trap instruction per function, which can have 35*4f1223e8SApple OSS Distributionssignificant positive effects on code size and performance. However, every bounds 36*4f1223e8SApple OSS Distributionscheck in that function will jump to that trap instruction when they fail. Debug 37*4f1223e8SApple OSS Distributionsinformation on the trap instruction will be meaningless and the debugger won't 38*4f1223e8SApple OSS Distributionsknow where we came from. This manifests as a "function(), file.c:0" call stack 39*4f1223e8SApple OSS Distributionsentry in the backtrace. 40*4f1223e8SApple OSS Distributions 41*4f1223e8SApple OSS DistributionsOn the other hand, we can ask clang to give us one trap instruction per bounds 42*4f1223e8SApple OSS Distributionscheck. In that configuration, we get arguably bad codegen, but the backtrace is 43*4f1223e8SApple OSS Distributionsalways immediately readable and the trap location shows correctly in the 44*4f1223e8SApple OSS Distributionsdebugger. 45*4f1223e8SApple OSS Distributions 46*4f1223e8SApple OSS DistributionsTo debug a panic in a build optimizing for code size, we can read disassembly 47*4f1223e8SApple OSS Distributionsand make inferences based on register values. For instance, if we look at one 48*4f1223e8SApple OSS Distributionsbounds check failing if register `x8` is greater than register `x9`, and in the 49*4f1223e8SApple OSS Distributionscontext of our panic we know that `x8` is 0x0 and `x9` is 0x1000, then we know 50*4f1223e8SApple OSS Distributionswe can't possibly have failed because of that bounds check. There are scripts 51*4f1223e8SApple OSS Distributionsto automate this reasoning–ask the -fbounds-safety DRIs for help if you run into 52*4f1223e8SApple OSS Distributionsthis situation. 53*4f1223e8SApple OSS Distributions 54*4f1223e8SApple OSS Distributions## Bounds checking options 55*4f1223e8SApple OSS Distributions 56*4f1223e8SApple OSS Distributions* (nothing): -fbounds-safety is disabled; it is an error to use `__indexable` 57*4f1223e8SApple OSS Distributions and `__bidi_indexable` in source. 58*4f1223e8SApple OSS Distributions* `bound-checks-pending`: -fbounds-safety is disabled, but `__indexable` and 59*4f1223e8SApple OSS Distributions `__bidi_indexable` are defined to nothing instead of causing compile-time 60*4f1223e8SApple OSS Distributions errors. 61*4f1223e8SApple OSS Distributions* `bound-checks`: -fbounds-safety is enabled; failing bounds checks panic; 62*4f1223e8SApple OSS Distributions optimize for code size at the detriment of debuggability. 63*4f1223e8SApple OSS Distributions* `bound-checks-debug`: -fbounds-safety is enabled; failing bounds checks panic; 64*4f1223e8SApple OSS Distributions optimize for debug information at the detriment of efficient code. 65*4f1223e8SApple OSS Distributions* `bound-checks-soft`: -fbounds-safety is disabled for RELEASE kernels; 66*4f1223e8SApple OSS Distributions for all other kernel configurations failing bounds checks generate a telemetry event 67*4f1223e8SApple OSS Distributions instead of panicking; optimize for debug information at the detriment of efficient code. 68*4f1223e8SApple OSS Distributions* `bound-checks-seed`: -fbounds-safety is enabled. For RELEASE kernels, failing checks 69*4f1223e8SApple OSS Distributions generate a telemetry event instead of panicking; for all other kernel configurations 70*4f1223e8SApple OSS Distributions failing bound checks panic. 71*4f1223e8SApple OSS Distributions 72*4f1223e8SApple OSS Distributions# The process of enabling bound-checks 73*4f1223e8SApple OSS Distributions 74*4f1223e8SApple OSS Distributions`bound-checks` is the final, desirable bounds checking configuration option. We 75*4f1223e8SApple OSS Distributionsdo not enable `bound-checks` lightly, as it can introduce new reasons that xnu 76*4f1223e8SApple OSS Distributionspanics. We have found that the following process consistently helps land code 77*4f1223e8SApple OSS Distributionschanges that stick, and help reduce the likelihood of introducing problems that 78*4f1223e8SApple OSS Distributionsturn into bad kernels. 79*4f1223e8SApple OSS Distributions 80*4f1223e8SApple OSS Distributions## Step 1: adopt -fbounds-safety at desk 81*4f1223e8SApple OSS Distributions 82*4f1223e8SApple OSS DistributionsWhen enabling -fbounds-safety, clang generates new diagnostics that ensure at 83*4f1223e8SApple OSS Distributionscompile-time that bounds could be known at runtime (if necessary) for all 84*4f1223e8SApple OSS Distributionspointers, and new diagnostics for when a bounds check is likely (or guaranteed) 85*4f1223e8SApple OSS Distributionsto fail at runtime. The first step to adopting -fbounds-safety is making code 86*4f1223e8SApple OSS Distributionschanges to xnu such that it builds without any diagnostics, and testing at desk 87*4f1223e8SApple OSS Distributionsthat your changes did not impact kernel functionality. 88*4f1223e8SApple OSS Distributions 89*4f1223e8SApple OSS DistributionsFor this step, you use `bound-checks-debug`. `bound-checks-debug` enables the 90*4f1223e8SApple OSS Distributionsentire breadth of -fbounds-safety diagnostics and gives you the most easily 91*4f1223e8SApple OSS Distributionsdebugged bounds checks. You should also use bound-checks-debug for xnu changes 92*4f1223e8SApple OSS Distributionsthat you send to integration testing. 93*4f1223e8SApple OSS Distributions 94*4f1223e8SApple OSS Distributions## Step 2: separate adoption from enablement 95*4f1223e8SApple OSS Distributions 96*4f1223e8SApple OSS DistributionsOnce you're confident in your code changes, everything builds, at-desk testing 97*4f1223e8SApple OSS Distributionsis successful and integration testing is happy, you start two pull requests: 98*4f1223e8SApple OSS Distributions 99*4f1223e8SApple OSS Distributions* one pull request with the necessary adoption code changes, configuring the 100*4f1223e8SApple OSS Distributions file to build with `bound-checks-pending`; 101*4f1223e8SApple OSS Distributions* one pull request that changes `bound-checks-pending` to `bound-checks-soft`. 102*4f1223e8SApple OSS Distributions 103*4f1223e8SApple OSS DistributionsThis strategy can save your change and other people's changes even in the face 104*4f1223e8SApple OSS Distributionsof small errors. Read on to "where bound-checks-soft comes in" for more details. 105*4f1223e8SApple OSS Distributions 106*4f1223e8SApple OSS Distributions### Where bound-checks-pending comes in 107*4f1223e8SApple OSS Distributions 108*4f1223e8SApple OSS DistributionsThe configuration status quo of any file in xnu is to build with no options 109*4f1223e8SApple OSS Distributionsrelating to -fbounds-safety. In this mode, -fbounds-safety's `__indexable` and 110*4f1223e8SApple OSS Distributions`__bidi_indexable` keywords are **undefined**. It is a syntax error to use them. 111*4f1223e8SApple OSS DistributionsThis is because `__indexable` and `__bidi_indexable` pointers are not 112*4f1223e8SApple OSS DistributionsABI-compatible with plain C: if they were defined to nothing instead, and a use 113*4f1223e8SApple OSS Distributionsof `__indexable` or `__bidi_indexable` slipped into a header used by a set of 114*4f1223e8SApple OSS Distributionsfiles heterogeneously enabling -fbounds-safety, they could cause ABI breaks that 115*4f1223e8SApple OSS Distributionswould manifest as opaque runtime crashes instead of compile-time errors. 116*4f1223e8SApple OSS Distributions 117*4f1223e8SApple OSS DistributionsHowever, adopting -fbounds-safety may require the explicit use of `__indexable` 118*4f1223e8SApple OSS Distributionsor `__bidi_indexable` pointers that are confined to the file being modified. 119*4f1223e8SApple OSS DistributionsUntil `bound-checks-soft` is enabled, it must still be possible to build that 120*4f1223e8SApple OSS Distributionsfile without -fbounds-safety. This is where `bound-checks-pending` comes in: 121*4f1223e8SApple OSS Distributionsthis flag causes `__indexable` and `__bidi_indexable` to expand to nothing, and 122*4f1223e8SApple OSS Distributionsit disables warnings that will frequently trip in plain C files that are 123*4f1223e8SApple OSS Distributionscompatible with -fbounds-safety (such as -Wself-assign). This allows files that 124*4f1223e8SApple OSS Distributionsare compatible with -fbounds-safety to continue to build without it, while 125*4f1223e8SApple OSS Distributionsminimizing the risk of ABI incompatibilities. 126*4f1223e8SApple OSS Distributions 127*4f1223e8SApple OSS Distributions### Where bound-checks-soft comes in 128*4f1223e8SApple OSS Distributions 129*4f1223e8SApple OSS DistributionsUsing `bound-checks-soft` means that if a problem slips through qualification, 130*4f1223e8SApple OSS Distributionsthe kernel is still probably livable. A kernel that is unlivable due to panics 131*4f1223e8SApple OSS Distributionscreates significant drag over the entire software development organization, and 132*4f1223e8SApple OSS Distributionsfixing it will be a same-day emergency that you will need to firefight and then 133*4f1223e8SApple OSS Distributionsroot-cause. This **will** take precedence over any other work that you could 134*4f1223e8SApple OSS Distributionsrather be doing. On the other hand, "soft traps" generate telemetry without 135*4f1223e8SApple OSS Distributionspanicking. Kernels with known soft trap triggers are un-shippable, but they may 136*4f1223e8SApple OSS Distributionsstill be livable. As a result, fixing these problems is merely very important. 137*4f1223e8SApple OSS Distributions 138*4f1223e8SApple OSS Distributions`bound-checks-soft` is enabled separately from the code change because even 139*4f1223e8SApple OSS Distributionsthough `bound-checks-soft` is ideally non-fatal, failing a bounds check in 140*4f1223e8SApple OSS Distributionscertain conditions can still result in an un-livable kernel (for instance, 141*4f1223e8SApple OSS Distributionsif a check fails in a long, tight loop). If such a serious issue slips into 142*4f1223e8SApple OSS Distributionsqualification, integrators only need to back out the `bound-checks-soft` change 143*4f1223e8SApple OSS Distributions(falling back to `bound-checks-pending`) instead of reverting your entire 144*4f1223e8SApple OSS Distributionschange. Reverting entire changes is a very destructive integration action: any 145*4f1223e8SApple OSS Distributions_other_ changes that rely on your modifications may need to be cascaded out of 146*4f1223e8SApple OSS Distributionsthe build as well. Given unfortunate-enough timing, there _may not be time_ to 147*4f1223e8SApple OSS Distributionsre-nominate feature work that must be backed out. Significant -fbounds-safety 148*4f1223e8SApple OSS Distributionsadoption experience in xnu and other projects has taught us that bundling in 149*4f1223e8SApple OSS Distributionsnon-trivial code changes with the enablement of -fbounds-safety is a recipe for 150*4f1223e8SApple OSS Distributionssadness and reprised work. 151*4f1223e8SApple OSS Distributions 152*4f1223e8SApple OSS Distributions### Where bound-checks-seed comes in 153*4f1223e8SApple OSS Distributions 154*4f1223e8SApple OSS DistributionsIf you want to enable `bound-checks` for internal users but want to use 155*4f1223e8SApple OSS Distributions`bound-checks-soft` for external users in order to collect telemetry 156*4f1223e8SApple OSS Distributions(e.g. during seeding), use `bound-checks-seed`. 157*4f1223e8SApple OSS DistributionsThe expectation is that, once the telemetry is collected, you will change the 158*4f1223e8SApple OSS Distributionsfile to `bound-checks` or disable -fbounds-safety. 159*4f1223e8SApple OSS DistributionsDue to security concerns, namely non fatal traps, `bound-checks-seed` 160*4f1223e8SApple OSS Distributionsis not meant to be shipped to customers outside of seeding. 161*4f1223e8SApple OSS Distributions 162*4f1223e8SApple OSS Distributions## Step 3: enable bound-checks 163*4f1223e8SApple OSS Distributions 164*4f1223e8SApple OSS DistributionsWe let changes with `bound-checks-soft` steep in internal releases to build up 165*4f1223e8SApple OSS Distributionsconfidence that bounds checks don't trip during regular operations. During this 166*4f1223e8SApple OSS Distributionsperiod, failing bounds checks create telemetry events that are collected by 167*4f1223e8SApple OSS DistributionsXNU engineers instead of bringing down the system. Although failing bounds 168*4f1223e8SApple OSS Distributionschecks are never desirable, it is better to catch them at that stage than at any 169*4f1223e8SApple OSS Distributionspoint after. 170*4f1223e8SApple OSS Distributions 171*4f1223e8SApple OSS DistributionsOnce we have confidence that a file doesn't cause issues when -fbounds-safety is 172*4f1223e8SApple OSS Distributionsenabled, we can change `bound-checks-soft` to the plain `bound-checks`. This is 173*4f1223e8SApple OSS Distributionssimply done with another pull request. 174*4f1223e8SApple OSS Distributions 175*4f1223e8SApple OSS DistributionsRead "where bound-checks-seed comes in" for a different approach if you need 176*4f1223e8SApple OSS Distributionsa higher confidence level before enabling `bound-checks`. 177