# Bounds checking The goal of -fbounds-safety is to prevent buffer overflows from escalating into security issues. However, that escalation is prevented by crashing the program, which, in the case of the kernel, means panicking the system. While panicking is a lesser evil than allowing an attacker to compromise the system, it is still a drastic measure. xnu's build system supports several options controlling the enforcement of bounds checks via clang's -fbounds-safety extension. This document describes a process that implements our best practices to adopt -fbounds-safety in existing code. # Controllable aspects of -fbounds-safety -fbounds-safety is enabled at a file granularity in the xnu build system. Whether a given file builds with -fbounds-safety is controlled by the build system's configuration `files` under each kernel component. For instance, one of the first components in xnu to enable -fbounds-safety is bsd/net: as a result, bsd/conf/files is where build system modifications were made. There are four options that control aspects related to -fbounds-safety: * whether -fbounds-safety is enabled at all; * when it is disabled, whether we should allow `__indexable` and `__bidi_indexable` in source (or emit a compile-time error if they're used); * when it is enabled, whether a trap should be a panic, or whether it should only report a telemetry event; * when it is set to panic, whether we should optimize for code size at the expense of the quality of debug information. ## Code size tradeoffs We can ask clang to give us one trap instruction per function, which can have significant positive effects on code size and performance. However, every bounds check in that function will jump to that trap instruction when they fail. Debug information on the trap instruction will be meaningless and the debugger won't know where we came from. This manifests as a "function(), file.c:0" call stack entry in the backtrace. On the other hand, we can ask clang to give us one trap instruction per bounds check. In that configuration, we get arguably bad codegen, but the backtrace is always immediately readable and the trap location shows correctly in the debugger. To debug a panic in a build optimizing for code size, we can read disassembly and make inferences based on register values. For instance, if we look at one bounds check failing if register `x8` is greater than register `x9`, and in the context of our panic we know that `x8` is 0x0 and `x9` is 0x1000, then we know we can't possibly have failed because of that bounds check. There are scripts to automate this reasoning–ask the -fbounds-safety DRIs for help if you run into this situation. ## Bounds checking options * (nothing): -fbounds-safety is disabled; it is an error to use `__indexable` and `__bidi_indexable` in source. * `bound-checks-pending`: -fbounds-safety is disabled, but `__indexable` and `__bidi_indexable` are defined to nothing instead of causing compile-time errors. * `bound-checks`: -fbounds-safety is enabled; failing bounds checks panic; optimize for code size at the detriment of debuggability. * `bound-checks-debug`: -fbounds-safety is enabled; failing bounds checks panic; optimize for debug information at the detriment of efficient code. * `bound-checks-soft`: -fbounds-safety is disabled for RELEASE kernels; for all other kernel configurations failing bounds checks generate a telemetry event instead of panicking; optimize for debug information at the detriment of efficient code. * `bound-checks-seed`: -fbounds-safety is enabled. For RELEASE kernels, failing checks generate a telemetry event instead of panicking; for all other kernel configurations failing bound checks panic. # The process of enabling bound-checks `bound-checks` is the final, desirable bounds checking configuration option. We do not enable `bound-checks` lightly, as it can introduce new reasons that xnu panics. We have found that the following process consistently helps land code changes that stick, and help reduce the likelihood of introducing problems that turn into bad kernels. ## Step 1: adopt -fbounds-safety at desk When enabling -fbounds-safety, clang generates new diagnostics that ensure at compile-time that bounds could be known at runtime (if necessary) for all pointers, and new diagnostics for when a bounds check is likely (or guaranteed) to fail at runtime. The first step to adopting -fbounds-safety is making code changes to xnu such that it builds without any diagnostics, and testing at desk that your changes did not impact kernel functionality. For this step, you use `bound-checks-debug`. `bound-checks-debug` enables the entire breadth of -fbounds-safety diagnostics and gives you the most easily debugged bounds checks. You should also use bound-checks-debug for xnu changes that you send to integration testing. ## Step 2: separate adoption from enablement Once you're confident in your code changes, everything builds, at-desk testing is successful and integration testing is happy, you start two pull requests: * one pull request with the necessary adoption code changes, configuring the file to build with `bound-checks-pending`; * one pull request that changes `bound-checks-pending` to `bound-checks-soft`. This strategy can save your change and other people's changes even in the face of small errors. Read on to "where bound-checks-soft comes in" for more details. ### Where bound-checks-pending comes in The configuration status quo of any file in xnu is to build with no options relating to -fbounds-safety. In this mode, -fbounds-safety's `__indexable` and `__bidi_indexable` keywords are **undefined**. It is a syntax error to use them. This is because `__indexable` and `__bidi_indexable` pointers are not ABI-compatible with plain C: if they were defined to nothing instead, and a use of `__indexable` or `__bidi_indexable` slipped into a header used by a set of files heterogeneously enabling -fbounds-safety, they could cause ABI breaks that would manifest as opaque runtime crashes instead of compile-time errors. However, adopting -fbounds-safety may require the explicit use of `__indexable` or `__bidi_indexable` pointers that are confined to the file being modified. Until `bound-checks-soft` is enabled, it must still be possible to build that file without -fbounds-safety. This is where `bound-checks-pending` comes in: this flag causes `__indexable` and `__bidi_indexable` to expand to nothing, and it disables warnings that will frequently trip in plain C files that are compatible with -fbounds-safety (such as -Wself-assign). This allows files that are compatible with -fbounds-safety to continue to build without it, while minimizing the risk of ABI incompatibilities. ### Where bound-checks-soft comes in Using `bound-checks-soft` means that if a problem slips through qualification, the kernel is still probably livable. A kernel that is unlivable due to panics creates significant drag over the entire software development organization, and fixing it will be a same-day emergency that you will need to firefight and then root-cause. This **will** take precedence over any other work that you could rather be doing. On the other hand, "soft traps" generate telemetry without panicking. Kernels with known soft trap triggers are un-shippable, but they may still be livable. As a result, fixing these problems is merely very important. `bound-checks-soft` is enabled separately from the code change because even though `bound-checks-soft` is ideally non-fatal, failing a bounds check in certain conditions can still result in an un-livable kernel (for instance, if a check fails in a long, tight loop). If such a serious issue slips into qualification, integrators only need to back out the `bound-checks-soft` change (falling back to `bound-checks-pending`) instead of reverting your entire change. Reverting entire changes is a very destructive integration action: any _other_ changes that rely on your modifications may need to be cascaded out of the build as well. Given unfortunate-enough timing, there _may not be time_ to re-nominate feature work that must be backed out. Significant -fbounds-safety adoption experience in xnu and other projects has taught us that bundling in non-trivial code changes with the enablement of -fbounds-safety is a recipe for sadness and reprised work. ### Where bound-checks-seed comes in If you want to enable `bound-checks` for internal users but want to use `bound-checks-soft` for external users in order to collect telemetry (e.g. during seeding), use `bound-checks-seed`. The expectation is that, once the telemetry is collected, you will change the file to `bound-checks` or disable -fbounds-safety. Due to security concerns, namely non fatal traps, `bound-checks-seed` is not meant to be shipped to customers outside of seeding. ## Step 3: enable bound-checks We let changes with `bound-checks-soft` steep in internal releases to build up confidence that bounds checks don't trip during regular operations. During this period, failing bounds checks create telemetry events that are collected by XNU engineers instead of bringing down the system. Although failing bounds checks are never desirable, it is better to catch them at that stage than at any point after. Once we have confidence that a file doesn't cause issues when -fbounds-safety is enabled, we can change `bound-checks-soft` to the plain `bound-checks`. This is simply done with another pull request. Read "where bound-checks-seed comes in" for a different approach if you need a higher confidence level before enabling `bound-checks`.