# Bounds checking

The goal of -fbounds-safety is to prevent buffer overflows from escalating into
security issues. However, that escalation is prevented by crashing the program,
which, in the case of the kernel, means panicking the system. While panicking
is a lesser evil than allowing an attacker to compromise the system, it is
still a drastic measure.

xnu's build system supports several options controlling the enforcement of
bounds checks via clang's -fbounds-safety extension. This document describes a
process that implements our best practices to adopt -fbounds-safety in existing
code.

# Controllable aspects of -fbounds-safety

-fbounds-safety is enabled at a file granularity in the xnu build system.
Whether a given file builds with -fbounds-safety is controlled by the build
system's configuration `files` under each kernel component. For instance, one of
the first components in xnu to enable -fbounds-safety is bsd/net: as a result,
bsd/conf/files is where build system modifications were made.

There are four options that control aspects related to -fbounds-safety:

* whether -fbounds-safety is enabled at all;
* when it is disabled, whether we should allow `__indexable` and
  `__bidi_indexable` in source (or emit a compile-time error if they're used);
* when it is enabled, whether a trap should be a panic, or whether it should
  only report a telemetry event;
* when it is set to panic, whether we should optimize for code size at the
  expense of the quality of debug information.

## Code size tradeoffs

We can ask clang to give us one trap instruction per function, which can have
significant positive effects on code size and performance. However, every bounds
check in that function will jump to that trap instruction when it fails. Debug
information on the trap instruction will be meaningless and the debugger won't
know where we came from. This manifests as a "function(), file.c:0" call stack
entry in the backtrace.

On the other hand, we can ask clang to give us one trap instruction per bounds
check. In that configuration, the generated code is larger and less efficient,
but the backtrace is always immediately readable and the trap location shows
correctly in the debugger.

To debug a panic in a build optimizing for code size, we can read disassembly
and make inferences based on register values. For instance, suppose one bounds
check fails when register `x8` is greater than register `x9`. If, in the
context of our panic, we know that `x8` is 0x0 and `x9` is 0x1000, then we know
we can't possibly have failed because of that bounds check. There are scripts
to automate this reasoning; ask the -fbounds-safety DRIs for help if you run
into this situation.

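The register-based elimination described above can be sketched in a few lines
of C. This is only an illustration of the reasoning, not one of the scripts
mentioned above: `struct bounds_check`, `check_could_have_failed` and
`filter_candidates` are hypothetical names, and the sketch assumes that every
check is an unsigned "trap if left register is greater than right register"
comparison whose operands are still intact in the saved register state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one bounds check read from the disassembly:
 * the check branches to the shared trap when regs[lhs] > regs[rhs]. */
struct bounds_check {
    const char *label; /* free-form note, e.g. where the check sits */
    unsigned lhs;      /* register index on the left of the compare */
    unsigned rhs;      /* register index on the right of the compare */
};

/* Returns nonzero if the saved register values are consistent with this
 * check having failed. */
static int
check_could_have_failed(const struct bounds_check *check, const uint64_t *regs)
{
    return regs[check->lhs] > regs[check->rhs];
}

/* Stores in `out` the checks that remain plausible and returns how many
 * there are. `out` must have room for `count` pointers. */
static size_t
filter_candidates(const struct bounds_check *checks, size_t count,
    const uint64_t *regs, const struct bounds_check **out)
{
    size_t kept = 0;

    assert(checks != NULL && regs != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        if (check_could_have_failed(&checks[i], regs)) {
            out[kept++] = &checks[i];
        }
    }
    return kept;
}
```

With `x8` = 0x0 and `x9` = 0x1000, a check on `x8 > x9` is filtered out, which
matches the manual reasoning in the paragraph above.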
## Bounds checking options

* (nothing): -fbounds-safety is disabled; it is an error to use `__indexable`
  and `__bidi_indexable` in source.
* `bound-checks-pending`: -fbounds-safety is disabled, but `__indexable` and
  `__bidi_indexable` are defined to nothing instead of causing compile-time
  errors.
* `bound-checks`: -fbounds-safety is enabled; failing bounds checks panic;
  optimize for code size to the detriment of debuggability.
* `bound-checks-debug`: -fbounds-safety is enabled; failing bounds checks panic;
  optimize for debug information to the detriment of efficient code.
* `bound-checks-soft`: -fbounds-safety is disabled for RELEASE kernels; for all
  other kernel configurations, failing bounds checks generate a telemetry event
  instead of panicking; optimize for debug information to the detriment of
  efficient code.
* `bound-checks-seed`: -fbounds-safety is enabled. For RELEASE kernels, failing
  bounds checks generate a telemetry event instead of panicking; for all other
  kernel configurations, failing bounds checks panic.

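The list above can be read as a lookup from an option and a kernel
configuration to a runtime behavior. The sketch below restates it as a small C
function. It is a reading aid, not build system code: `enum check_behavior` and
`check_behavior_for` are hypothetical names, and the model deliberately ignores
how `__indexable` and `__bidi_indexable` are treated and which code size
tradeoff is selected.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical summary of what happens when a bounds check fails. */
enum check_behavior {
    CHECKS_OFF,       /* -fbounds-safety disabled, nothing is checked */
    CHECKS_PANIC,     /* failing checks panic */
    CHECKS_TELEMETRY, /* failing checks report a telemetry event */
    CHECKS_UNKNOWN,   /* option not described in this document */
};

/* `option` is the keyword from the list above (NULL or "" for no option);
 * `is_release` says whether the kernel configuration is RELEASE. */
static enum check_behavior
check_behavior_for(const char *option, bool is_release)
{
    if (option == NULL || option[0] == '\0') {
        return CHECKS_OFF;
    }
    if (strcmp(option, "bound-checks-pending") == 0) {
        return CHECKS_OFF;
    }
    if (strcmp(option, "bound-checks") == 0 ||
        strcmp(option, "bound-checks-debug") == 0) {
        return CHECKS_PANIC;
    }
    if (strcmp(option, "bound-checks-soft") == 0) {
        return is_release ? CHECKS_OFF : CHECKS_TELEMETRY;
    }
    if (strcmp(option, "bound-checks-seed") == 0) {
        return is_release ? CHECKS_TELEMETRY : CHECKS_PANIC;
    }
    return CHECKS_UNKNOWN;
}
```

Note how `bound-checks-soft` and `bound-checks-seed` are the only options whose
behavior depends on the kernel configuration.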
# The process of enabling bound-checks

`bound-checks` is the final, desirable bounds checking configuration option. We
do not enable `bound-checks` lightly, as it can introduce new reasons for xnu
to panic. We have found that the following process consistently helps land code
changes that stick, and helps reduce the likelihood of introducing problems
that turn into bad kernels.

## Step 1: adopt -fbounds-safety at desk

When enabling -fbounds-safety, clang generates new diagnostics that ensure at
compile-time that bounds can be known at runtime (if necessary) for all
pointers, and new diagnostics for when a bounds check is likely (or guaranteed)
to fail at runtime. The first step to adopting -fbounds-safety is making code
changes to xnu such that it builds without any diagnostics, and testing at desk
that your changes did not impact kernel functionality.

For this step, you use `bound-checks-debug`. `bound-checks-debug` enables the
entire breadth of -fbounds-safety diagnostics and gives you the most easily
debugged bounds checks. You should also use `bound-checks-debug` for xnu
changes that you send to integration testing.

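As a rough idea of what an adoption change can look like, the sketch below
annotates a pointer parameter with its element count. `__counted_by` comes from
clang's -fbounds-safety documentation rather than from this document, and
`sum_values` is a hypothetical function. The fallback macro is an assumption
made so that the example compiles with any C compiler; without -fbounds-safety
the annotation expands to nothing and no checks are emitted.

```c
#include <assert.h>
#include <stddef.h>

/* Fallback for compilers or builds where the annotation is not provided.
 * In a -fbounds-safety build the real definition is used instead. */
#ifndef __counted_by
#define __counted_by(count)
#endif

/* `values` points to exactly `count` elements. With -fbounds-safety enabled,
 * the compiler can use this to check the indexing below. */
static int
sum_values(const int *__counted_by(count) values, size_t count)
{
    int total = 0;

    assert(values != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```

Callers do not change: they keep passing a pointer and a count, but the
relationship between the two is now visible to the compiler.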
## Step 2: separate adoption from enablement

Once you're confident in your code changes, everything builds, at-desk testing
is successful and integration testing is happy, you start two pull requests:

* one pull request with the necessary adoption code changes, configuring the
  file to build with `bound-checks-pending`;
* one pull request that changes `bound-checks-pending` to `bound-checks-soft`.

This strategy can save your change and other people's changes even in the face
of small errors. See "Where bound-checks-soft comes in" below for more details.

### Where bound-checks-pending comes in

The configuration status quo of any file in xnu is to build with no options
relating to -fbounds-safety. In this mode, -fbounds-safety's `__indexable` and
`__bidi_indexable` keywords are **undefined**. It is a syntax error to use them.
This is because `__indexable` and `__bidi_indexable` pointers are not
ABI-compatible with plain C: if they were defined to nothing instead, and a use
of `__indexable` or `__bidi_indexable` slipped into a header used by a set of
files heterogeneously enabling -fbounds-safety, they could cause ABI breaks that
would manifest as opaque runtime crashes instead of compile-time errors.

However, adopting -fbounds-safety may require the explicit use of `__indexable`
or `__bidi_indexable` pointers that are confined to the file being modified.
Until `bound-checks-soft` is enabled, it must still be possible to build that
file without -fbounds-safety. This is where `bound-checks-pending` comes in:
this flag causes `__indexable` and `__bidi_indexable` to expand to nothing, and
it disables warnings that will frequently trip in plain C files that are
compatible with -fbounds-safety (such as -Wself-assign). This allows files that
are compatible with -fbounds-safety to continue to build without it, while
minimizing the risk of ABI incompatibilities.

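The ABI concern can be made concrete with plain C structs standing in for the
wide pointers. This is a model under assumptions, not compiler output: per
clang's -fbounds-safety documentation, an `__indexable` pointer carries an
upper bound and a `__bidi_indexable` pointer carries both bounds, but the
struct names, the field order and the `packet_*` types below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for how wide pointers are laid out in a -fbounds-safety build. */
struct model_indexable {
    int *ptr;
    int *upper;
};

struct model_bidi_indexable {
    int *ptr;
    int *upper;
    int *lower;
};

/* A struct from a shared header, as seen by a file built with
 * -fbounds-safety where `data` is declared `__indexable`. */
struct packet_checked {
    struct model_indexable data;
    int length;
};

/* The same struct as seen by a file where `__indexable` expands to nothing. */
struct packet_plain {
    int *data;
    int length;
};

/* Returns nonzero when the two views disagree on where `length` lives. */
static int
shared_struct_layout_differs(void)
{
    return offsetof(struct packet_checked, length) !=
           offsetof(struct packet_plain, length);
}
```

Two files that disagree on the offset of `length` would silently read and write
different memory, which is the kind of opaque runtime failure described above.
Keeping the keywords undefined by default turns that mistake into a
compile-time error.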
### Where bound-checks-soft comes in

Using `bound-checks-soft` means that if a problem slips through qualification,
the kernel is still probably livable. A kernel that is unlivable due to panics
creates significant drag over the entire software development organization, and
fixing it will be a same-day emergency that you will need to firefight and then
root-cause. This **will** take precedence over any other work that you would
rather be doing. On the other hand, "soft traps" generate telemetry without
panicking. Kernels with known soft trap triggers are un-shippable, but they may
still be livable. As a result, fixing these problems is merely very important.

`bound-checks-soft` is enabled separately from the code change because even
though `bound-checks-soft` is ideally non-fatal, failing a bounds check in
certain conditions can still result in an unlivable kernel (for instance,
if a check fails in a long, tight loop). If such a serious issue slips into
qualification, integrators only need to back out the `bound-checks-soft` change
(falling back to `bound-checks-pending`) instead of reverting your entire
change. Reverting entire changes is a very destructive integration action: any
_other_ changes that rely on your modifications may need to be cascaded out of
the build as well. Given unfortunate-enough timing, there _may not be time_ to
re-nominate feature work that must be backed out. Significant -fbounds-safety
adoption experience in xnu and other projects has taught us that bundling
non-trivial code changes with the enablement of -fbounds-safety is a recipe for
sadness and reprised work.

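The difference between a panic and a soft trap, and the tight-loop caveat
above, can be illustrated with a small user-space model. Nothing here is xnu
code: `enum trap_mode`, `struct trap_stats`, `bounds_check` and
`walk_with_wrong_limit` are hypothetical, and the "panic" is modeled as a flag
that stops the loop rather than an actual crash.

```c
#include <assert.h>
#include <stddef.h>

enum trap_mode { TRAP_MODE_PANIC, TRAP_MODE_SOFT };

struct trap_stats {
    unsigned long telemetry_events; /* soft traps reported so far */
    int panicked;                   /* set once a hard trap fires */
};

/* Returns 1 if execution continues after checking `index` against `count`. */
static int
bounds_check(size_t index, size_t count, enum trap_mode mode,
    struct trap_stats *stats)
{
    assert(stats != NULL);
    if (index < count) {
        return 1;
    }
    if (mode == TRAP_MODE_PANIC) {
        stats->panicked = 1;
        return 0;
    }
    stats->telemetry_events++; /* soft trap: report and keep going */
    return 1;
}

/* Walks `wrong_limit` indices over a buffer that only has `count` elements. */
static void
walk_with_wrong_limit(size_t count, size_t wrong_limit, enum trap_mode mode,
    struct trap_stats *stats)
{
    for (size_t i = 0; i < wrong_limit; i++) {
        if (!bounds_check(i, count, mode, stats)) {
            return;
        }
        /* a real loop would access element i here */
    }
}
```

In this model, a single off-by-many loop bound produces one panic in panic
mode, but one telemetry event per failing iteration in soft mode. A long loop
can therefore still degrade a kernel that never panics.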
### Where bound-checks-seed comes in

If you want to enable `bound-checks` for internal users but want to use
`bound-checks-soft` for external users in order to collect telemetry
(e.g. during seeding), use `bound-checks-seed`.
The expectation is that, once the telemetry is collected, you will change the
file to `bound-checks` or disable -fbounds-safety.
Due to security concerns, namely non-fatal traps, `bound-checks-seed`
is not meant to be shipped to customers outside of seeding.

## Step 3: enable bound-checks

We let changes with `bound-checks-soft` steep in internal releases to build up
confidence that bounds checks don't trip during regular operations. During this
period, failing bounds checks create telemetry events that are collected by
xnu engineers instead of bringing down the system. Although failing bounds
checks are never desirable, it is better to catch them at that stage than at any
point after.

Once we have confidence that a file doesn't cause issues when -fbounds-safety is
enabled, we can change `bound-checks-soft` to the plain `bound-checks`. This is
simply done with another pull request.

See "Where bound-checks-seed comes in" for a different approach if you need
a higher confidence level before enabling `bound-checks`.