# Bounds checking

The goal of -fbounds-safety is to prevent buffer overflows from escalating into
security issues. However, that escalation is prevented by crashing the program,
which, in the case of the kernel, means panicking the system. While panicking
is a lesser evil than allowing an attacker to compromise the system, it is
still a drastic measure.

xnu's build system supports several options controlling the enforcement of
bounds checks via clang's -fbounds-safety extension. This document describes a
process that implements our best practices to adopt -fbounds-safety in existing
code.

# Controllable aspects of -fbounds-safety

-fbounds-safety is enabled at a file granularity in the xnu build system.
Whether a given file builds with -fbounds-safety is controlled by the build
system's configuration `files` under each kernel component. For instance, one of
the first components in xnu to enable -fbounds-safety is bsd/net: as a result,
bsd/conf/files is where build system modifications were made.

There are five options that control aspects related to -fbounds-safety:

* whether -fbounds-safety is enabled at all;
* when it is disabled, whether we should allow `__indexable` and
  `__bidi_indexable` in source (or emit a compile-time error if they're used);
* when it is enabled, whether a trap should be a panic, or whether it should
  only report a telemetry event;
* when it is set to panic, whether we should optimize for code size at the
  expense of the quality of debug information;
* which set of new bounds checks (`-fbounds-safety-bringup-missing-checks`) is
  enabled.

## Code size tradeoffs

We can ask clang to give us one trap instruction per function, which can have
significant positive effects on code size and performance. However, every bounds
check in that function will jump to that trap instruction when it fails. Debug
information on the trap instruction will be meaningless and the debugger won't
know where we came from. This manifests as a "function(), file.c:0" call stack
entry in the backtrace.

On the other hand, we can ask clang to give us one trap instruction per bounds
check. In that configuration, we get arguably bad codegen, but the backtrace is
always immediately readable and the trap location shows correctly in the
debugger.
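
The difference between the two layouts can be modeled in plain C. The sketch
below is not compiler output: the function names are hypothetical, and instead
of trapping, each function returns a value that stands in for the location a
debugger would attribute the trap to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: the return value stands in for the trap site that a
 * debugger would report. NO_TRAP means every check passed. */
enum { NO_TRAP = 0, UNKNOWN_SITE = -1 };

/* One shared trap per function: every failing check branches to the same
 * place, so the origin of the failure is lost (the "file.c:0" case). */
int copy_shared_trap(int *dst, size_t dst_len, const int *src, size_t src_len,
                     size_t i, size_t j)
{
    assert(dst != NULL && src != NULL);
    if (i >= dst_len) goto trap;    /* check 1 */
    if (j >= src_len) goto trap;    /* check 2 */
    dst[i] = src[j];
    return NO_TRAP;
trap:
    return UNKNOWN_SITE;
}

/* One trap per check: more code, but each failure keeps its own identity. */
int copy_per_check_trap(int *dst, size_t dst_len, const int *src,
                        size_t src_len, size_t i, size_t j)
{
    assert(dst != NULL && src != NULL);
    if (i >= dst_len) return 1;     /* trap site for check 1 */
    if (j >= src_len) return 2;     /* trap site for check 2 */
    dst[i] = src[j];
    return NO_TRAP;
}
```

With the shared trap, an out-of-bounds `i` and an out-of-bounds `j` are
indistinguishable to the caller; with one trap per check, the result identifies
which check failed.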

To debug a panic in a build optimizing for code size, we can read disassembly
and make inferences based on register values. For instance, if we look at one
bounds check failing if register `x8` is greater than register `x9`, and in the
context of our panic we know that `x8` is 0x0 and `x9` is 0x1000, then we know
we can't possibly have failed because of that bounds check. There are scripts
to automate this reasoning; ask the -fbounds-safety DRIs for help if you run into
this situation.
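
The elimination reasoning can be sketched as a small C helper. This is only an
illustration, not the scripts mentioned above: `struct check_candidate` and both
functions are hypothetical, and the sketch assumes that each check traps when
one register is greater than another (unsigned) and that the registers still
hold the compared values at panic time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS 31 /* x0..x30 */

/* Hypothetical description of one bounds check read from the disassembly:
 * the check traps when regs[lhs] > regs[rhs]. */
struct check_candidate {
    unsigned lhs; /* register index, e.g. 8 for x8 */
    unsigned rhs; /* register index, e.g. 9 for x9 */
};

/* True if this check could have trapped given the panic-time registers. */
bool check_could_have_failed(const struct check_candidate *c,
                             const uint64_t regs[NUM_REGS])
{
    assert(c->lhs < NUM_REGS && c->rhs < NUM_REGS);
    return regs[c->lhs] > regs[c->rhs];
}

/* Stores the indices of the checks that remain plausible into `out`
 * (which must have room for n entries) and returns how many there are. */
size_t filter_candidates(const struct check_candidate *checks, size_t n,
                         const uint64_t regs[NUM_REGS], size_t *out)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (check_could_have_failed(&checks[i], regs)) {
            out[kept++] = i;
        }
    }
    return kept;
}
```

With `x8` at 0x0 and `x9` at 0x1000, a candidate that traps on `x8 > x9` is
ruled out, which narrows the search to the remaining checks in the function.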

## Bounds checking adoption level options

* (nothing): -fbounds-safety is disabled; it is an error to use `__indexable`
  and `__bidi_indexable` in source.
* `bound-checks-pending`: -fbounds-safety is disabled, but `__indexable` and
  `__bidi_indexable` are defined to nothing instead of causing compile-time
  errors.
* `bound-checks`: -fbounds-safety is enabled; failing bounds checks panic;
  optimize for code size at the detriment of debuggability.
* `bound-checks-debug`: -fbounds-safety is enabled; failing bounds checks panic;
  optimize for debug information at the detriment of efficient code.
* `bound-checks-soft`: -fbounds-safety is disabled for RELEASE kernels;
  for all other kernel configurations failing bounds checks generate a telemetry event
  instead of panicking; optimize for debug information at the detriment of efficient code.
* `bound-checks-seed`: -fbounds-safety is enabled. For RELEASE kernels, failing checks
  generate a telemetry event instead of panicking; for all other kernel configurations
  failing bounds checks panic.

These options are mutually exclusive.
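
The list above can be restated as a lookup, which makes the RELEASE versus
non-RELEASE differences easier to compare. This is a sketch of the documented
behavior only; the enum, struct, and function names are hypothetical and do not
exist in the build system.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the adoption levels described above. */
enum level {
    LEVEL_NONE,         /* no option given */
    LEVEL_PENDING,      /* bound-checks-pending */
    LEVEL_BOUND_CHECKS, /* bound-checks */
    LEVEL_DEBUG,        /* bound-checks-debug */
    LEVEL_SOFT,         /* bound-checks-soft */
    LEVEL_SEED          /* bound-checks-seed */
};

enum on_failure { FAIL_NOT_CHECKED, FAIL_PANIC, FAIL_TELEMETRY };

struct behavior {
    bool bounds_safety_enabled;
    enum on_failure on_failure;
};

/* Maps an adoption level and kernel configuration to the behavior that the
 * list above documents. */
struct behavior resolve_level(enum level level, bool release_kernel)
{
    assert(level <= LEVEL_SEED);
    switch (level) {
    case LEVEL_BOUND_CHECKS:
    case LEVEL_DEBUG:
        return (struct behavior){ true, FAIL_PANIC };
    case LEVEL_SOFT:
        /* Disabled for RELEASE; telemetry everywhere else. */
        return release_kernel
            ? (struct behavior){ false, FAIL_NOT_CHECKED }
            : (struct behavior){ true, FAIL_TELEMETRY };
    case LEVEL_SEED:
        /* Always enabled; telemetry for RELEASE, panic everywhere else. */
        return (struct behavior){ true,
            release_kernel ? FAIL_TELEMETRY : FAIL_PANIC };
    case LEVEL_NONE:
    case LEVEL_PENDING:
    default:
        return (struct behavior){ false, FAIL_NOT_CHECKED };
    }
}
```

Note how `bound-checks-soft` and `bound-checks-seed` differ on RELEASE kernels:
the former performs no checks at all, while the latter checks and reports.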

### Bounds checking adoption level modifier options

In addition to the bounds checking adoption level options (e.g.
`bound-checks-debug`), modifier options can be added to the selected adoption
level. Note that it is invalid to use these options without first specifying a
`bound-checks*` level option (i.e. any level except "nothing").
Furthermore, the `bound-checks*` level option must appear before any modifiers
(see examples below).

* `bound-checks-new-checks`: If building with `-fbounds-safety`, this causes
  `-fbounds-safety-bringup-missing-checks` to be added to the compiler flags.

Examples:

```
# ok: `-fbounds-safety -fbounds-safety-bringup-missing-checks` passed to compiler
test.c optional bound-checks bound-checks-new-checks

# invalid: An adoption level that's not "nothing" needs to be specified
test.c optional bound-checks-new-checks

# invalid: `bound-checks` needs to be specified first
test.c optional bound-checks-new-checks bound-checks
```
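
The two rules (level options are mutually exclusive, and a modifier is only
valid after a level option) can be expressed as a small checker. This is a
sketch for illustration, not how the build system parses `files`; the function
names are hypothetical and the input is assumed to be the already-split option
tokens of one line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True for the adoption level options listed in this document. */
static bool is_level_option(const char *tok)
{
    static const char *const levels[] = {
        "bound-checks-pending", "bound-checks", "bound-checks-debug",
        "bound-checks-soft", "bound-checks-seed",
    };
    for (size_t i = 0; i < sizeof(levels) / sizeof(levels[0]); i++) {
        if (strcmp(tok, levels[i]) == 0) {
            return true;
        }
    }
    return false;
}

/* True for the modifier options listed in this document. */
static bool is_modifier_option(const char *tok)
{
    return strcmp(tok, "bound-checks-new-checks") == 0;
}

/* Returns true if the tokens name at most one level option and every
 * modifier appears after that level option. Other tokens are ignored. */
bool bound_checks_options_valid(const char *const *tokens, size_t n)
{
    size_t levels_seen = 0;

    assert(tokens != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (is_level_option(tokens[i])) {
            if (++levels_seen > 1) {
                return false; /* level options are mutually exclusive */
            }
        } else if (is_modifier_option(tokens[i]) && levels_seen == 0) {
            return false; /* modifier without a preceding level option */
        }
    }
    return true;
}
```

Applied to the three example lines above, the checker accepts the first and
rejects the other two.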

# The process of enabling bound-checks

`bound-checks` is the final, desirable bounds checking adoption level
configuration option. We do not enable `bound-checks` lightly, as it can
introduce new reasons that xnu panics. We have found that the following process
consistently helps land code changes that stick, and helps reduce the likelihood
of introducing problems that turn into bad kernels.

## Step 1: adopt -fbounds-safety at desk

When enabling -fbounds-safety, clang generates new diagnostics that ensure at
compile-time that bounds could be known at runtime (if necessary) for all
pointers, and new diagnostics for when a bounds check is likely (or guaranteed)
to fail at runtime. The first step to adopting -fbounds-safety is making code
changes to xnu such that it builds without any diagnostics, and testing at desk
that your changes did not impact kernel functionality.

For this step, you use `bound-checks-debug`. `bound-checks-debug` enables the
entire breadth of -fbounds-safety diagnostics and gives you the most easily
debugged bounds checks. You should also use `bound-checks-debug` for xnu changes
that you send to integration testing.

## Step 2: separate adoption from enablement

Once you're confident in your code changes, everything builds, at-desk testing
is successful and integration testing is happy, you start two pull requests:

* one pull request with the necessary adoption code changes, configuring the
  file to build with `bound-checks-pending`;
* one pull request that changes `bound-checks-pending` to `bound-checks-soft`.

This strategy can save your change and other people's changes even in the face
of small errors. Read on to "where bound-checks-soft comes in" for more details.

### Where bound-checks-pending comes in

The configuration status quo of any file in xnu is to build with no options
relating to -fbounds-safety. In this mode, -fbounds-safety's `__indexable` and
`__bidi_indexable` keywords are **undefined**. It is a syntax error to use them.
This is because `__indexable` and `__bidi_indexable` pointers are not
ABI-compatible with plain C: if they were defined to nothing instead, and a use
of `__indexable` or `__bidi_indexable` slipped into a header used by a set of
files heterogeneously enabling -fbounds-safety, they could cause ABI breaks that
would manifest as opaque runtime crashes instead of compile-time errors.
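
To make the ABI hazard concrete, the sketch below models the two views of one
shared header. It assumes, following clang's -fbounds-safety documentation, that
an `__indexable` pointer carries an upper bound next to the pointer and a
`__bidi_indexable` pointer carries both bounds. The structs are hypothetical
stand-ins: the real layout is produced by the compiler, not written by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the compiler's wide pointer representations. */
struct indexable_model {
    int *ptr;
    int *upper_bound;
};

struct bidi_indexable_model {
    int *ptr;
    int *upper_bound;
    int *lower_bound;
};

/* The same header-declared struct, as two differently built files would see
 * it if `__indexable` silently expanded to nothing in one of them. */
struct shared_plain {               /* file built without -fbounds-safety */
    int *buf;
    int flags;
};

struct shared_wide {                /* file built with -fbounds-safety */
    struct indexable_model buf;
    int flags;
};

/* The wide forms are larger than a plain pointer, so layouts diverge. */
static_assert(sizeof(struct indexable_model) > sizeof(int *),
    "modeled __indexable pointer is wider than a plain pointer");
static_assert(sizeof(struct bidi_indexable_model) > sizeof(struct indexable_model),
    "modeled __bidi_indexable pointer is wider still");

size_t flags_offset_plain(void) { return offsetof(struct shared_plain, flags); }
size_t flags_offset_wide(void)  { return offsetof(struct shared_wide, flags); }
```

In this model the two files disagree on where `flags` lives, so one of them
would read or write the wrong bytes at runtime without any compile-time
diagnostic.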

However, adopting -fbounds-safety may require the explicit use of `__indexable`
or `__bidi_indexable` pointers that are confined to the file being modified.
Until `bound-checks-soft` is enabled, it must still be possible to build that
file without -fbounds-safety. This is where `bound-checks-pending` comes in:
this flag causes `__indexable` and `__bidi_indexable` to expand to nothing, and
it disables warnings that will frequently trip in plain C files that are
compatible with -fbounds-safety (such as -Wself-assign). This allows files that
are compatible with -fbounds-safety to continue to build without it, while
minimizing the risk of ABI incompatibilities.
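
A minimal sketch of what that expansion looks like from the point of view of
one source file. The `#define` lines only stand in for what
`bound-checks-pending` arranges (how the build system actually provides the
definitions is an assumption here), and `sum_table_prefix` is a hypothetical
file-local helper.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: stand-ins for the empty definitions that bound-checks-pending
 * provides when the file is built without -fbounds-safety. */
#ifndef __indexable
#define __indexable
#endif
#ifndef __bidi_indexable
#define __bidi_indexable
#endif

/* The annotated pointer is a local variable, so it never appears in a header
 * or in any interface shared with other files. */
int sum_table_prefix(size_t count)
{
    static const int table[4] = { 1, 2, 3, 4 };
    const int *__indexable cursor = table; /* plain pointer when pending */
    int total = 0;

    assert(count <= sizeof(table) / sizeof(table[0]));
    for (size_t i = 0; i < count; i++) {
        total += cursor[i];
    }
    return total;
}
```

Because the annotation stays inside the function, building this file as plain
C cannot change any layout that another file depends on.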

### Where bound-checks-soft comes in

Using `bound-checks-soft` means that if a problem slips through qualification,
the kernel is still probably livable. A kernel that is unlivable due to panics
creates significant drag over the entire software development organization, and
fixing it will be a same-day emergency that you will need to firefight and then
root-cause. This **will** take precedence over any other work that you could
rather be doing. On the other hand, "soft traps" generate telemetry without
panicking. Kernels with known soft trap triggers are un-shippable, but they may
still be livable. As a result, fixing these problems is merely very important.

`bound-checks-soft` is enabled separately from the code change because even
though `bound-checks-soft` is ideally non-fatal, failing a bounds check in
certain conditions can still result in an unlivable kernel (for instance,
if a check fails in a long, tight loop). If such a serious issue slips into
qualification, integrators only need to back out the `bound-checks-soft` change
(falling back to `bound-checks-pending`) instead of reverting your entire
change. Reverting entire changes is a very destructive integration action: any
_other_ changes that rely on your modifications may need to be cascaded out of
the build as well. Given unfortunate-enough timing, there _may not be time_ to
re-nominate feature work that must be backed out. Significant -fbounds-safety
adoption experience in xnu and other projects has taught us that bundling in
non-trivial code changes with the enablement of -fbounds-safety is a recipe for
sadness and reprised work.
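
The tight-loop case is easy to picture with a user-space model. Everything in
the sketch below is hypothetical: `abort()` stands in for a panic, a counter
stands in for telemetry, and the model skips the out-of-bounds access after a
soft trap so that the example itself stays well-defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

enum trap_mode { TRAP_HARD, TRAP_SOFT };

/* Stand-in for the telemetry pipeline: counts reported soft traps. */
static unsigned long telemetry_events;

/* Returns true if the access is in bounds. On failure, a hard trap stops the
 * program (modeling a panic); a soft trap records one event and continues. */
static bool bounds_ok(size_t index, size_t length, enum trap_mode mode)
{
    if (index < length) {
        return true;
    }
    if (mode == TRAP_HARD) {
        abort();
    }
    telemetry_events++;
    return false;
}

/* A loop whose iteration count is wrong: every iteration past `length`
 * fails the same check again. */
int sum_with_checks(const int *buf, size_t length, size_t iterations,
                    enum trap_mode mode)
{
    int total = 0;

    assert(buf != NULL);
    for (size_t i = 0; i < iterations; i++) {
        if (bounds_ok(i, length, mode)) {
            total += buf[i];
        }
    }
    return total;
}

unsigned long telemetry_event_count(void)
{
    return telemetry_events;
}
```

Summing a 3-element buffer with an iteration count of 1000 reports 997 events
from a single call in soft mode. Scaled up to a hot path, that volume of
reporting is how a nominally non-fatal check can still degrade the system.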

### Where bound-checks-seed comes in

If you want to enable `bound-checks` for internal users but want to use
`bound-checks-soft` for external users in order to collect telemetry
(e.g. during seeding), use `bound-checks-seed`.
The expectation is that, once the telemetry is collected, you will change the
file to `bound-checks` or disable -fbounds-safety.
Due to security concerns, namely non-fatal traps, `bound-checks-seed`
is not meant to be shipped to customers outside of seeding.

## Step 3: enable bound-checks

We let changes with `bound-checks-soft` steep in internal releases to build up
confidence that bounds checks don't trip during regular operations. During this
period, failing bounds checks create telemetry events that are collected by
xnu engineers instead of bringing down the system. Although failing bounds
checks are never desirable, it is better to catch them at that stage than at any
point after.

Once we have confidence that a file doesn't cause issues when -fbounds-safety is
enabled, we can change `bound-checks-soft` to the plain `bound-checks`. This is
simply done with another pull request.

Read "where bound-checks-seed comes in" for a different approach if you need
a higher confidence level before enabling `bound-checks`.