xref: /xnu-11215.61.5/doc/building/bound_checks.md (revision 4f1223e81cd707a65cc109d0b8ad6653699da3c4)
1*4f1223e8SApple OSS Distributions# Bounds checking
2*4f1223e8SApple OSS Distributions
3*4f1223e8SApple OSS DistributionsThe goal of -fbounds-safety is to prevent buffer overflows from escalating into
4*4f1223e8SApple OSS Distributionssecurity issues. However, that escalation is prevented by crashing the program,
5*4f1223e8SApple OSS Distributionswhich, in the case of the kernel, means panicking the system. While panicking
6*4f1223e8SApple OSS Distributionsis a lesser evil than allowing an attacker to compromise the system, it is
7*4f1223e8SApple OSS Distributionsstill a drastic measure.
8*4f1223e8SApple OSS Distributions
9*4f1223e8SApple OSS Distributionsxnu's build system supports several options controlling the enforcement of
10*4f1223e8SApple OSS Distributionsbounds checks via clang's -fbounds-safety extension. This document describes a
11*4f1223e8SApple OSS Distributionsprocess that implements our best practices to adopt -fbounds-safety in existing
12*4f1223e8SApple OSS Distributionscode.
13*4f1223e8SApple OSS Distributions
14*4f1223e8SApple OSS Distributions# Controllable aspects of -fbounds-safety
15*4f1223e8SApple OSS Distributions
16*4f1223e8SApple OSS Distributions-fbounds-safety is enabled at a file granularity in the xnu build system.
17*4f1223e8SApple OSS DistributionsWhether a given file builds with -fbounds-safety is controlled by the build
18*4f1223e8SApple OSS Distributionssystem's configuration `files` under each kernel component. For instance, one of
19*4f1223e8SApple OSS Distributionsthe first components in xnu to enable -fbounds-safety is bsd/net: as a result,
20*4f1223e8SApple OSS Distributionsbsd/conf/files is where build system modifications were made.
21*4f1223e8SApple OSS Distributions
22*4f1223e8SApple OSS DistributionsThere are four options that control aspects related to -fbounds-safety:
23*4f1223e8SApple OSS Distributions
24*4f1223e8SApple OSS Distributions* whether -fbounds-safety is enabled at all;
25*4f1223e8SApple OSS Distributions* when it is disabled, whether we should allow `__indexable` and
26*4f1223e8SApple OSS Distributions  `__bidi_indexable` in source (or emit a compile-time error if they're used);
27*4f1223e8SApple OSS Distributions* when it is enabled, whether a trap should be a panic, or whether it should
28*4f1223e8SApple OSS Distributions  only report a telemetry event;
29*4f1223e8SApple OSS Distributions* when it is set to panic, whether we should optimize for code size at the
30*4f1223e8SApple OSS Distributions  expense of the quality of debug information.
31*4f1223e8SApple OSS Distributions
32*4f1223e8SApple OSS Distributions## Code size tradeoffs
33*4f1223e8SApple OSS Distributions
34*4f1223e8SApple OSS DistributionsWe can ask clang to give us one trap instruction per function, which can have
35*4f1223e8SApple OSS Distributionssignificant positive effects on code size and performance. However, every bounds
36*4f1223e8SApple OSS Distributionscheck in that function will jump to that trap instruction when they fail. Debug
37*4f1223e8SApple OSS Distributionsinformation on the trap instruction will be meaningless and the debugger won't
38*4f1223e8SApple OSS Distributionsknow where we came from. This manifests as a "function(), file.c:0" call stack
39*4f1223e8SApple OSS Distributionsentry in the backtrace.
40*4f1223e8SApple OSS Distributions
41*4f1223e8SApple OSS DistributionsOn the other hand, we can ask clang to give us one trap instruction per bounds
42*4f1223e8SApple OSS Distributionscheck. In that configuration, we get arguably bad codegen, but the backtrace is
43*4f1223e8SApple OSS Distributionsalways immediately readable and the trap location shows correctly in the
44*4f1223e8SApple OSS Distributionsdebugger.
45*4f1223e8SApple OSS Distributions
46*4f1223e8SApple OSS DistributionsTo debug a panic in a build optimizing for code size, we can read disassembly
47*4f1223e8SApple OSS Distributionsand make inferences based on register values. For instance, if we look at one
48*4f1223e8SApple OSS Distributionsbounds check failing if register `x8` is greater than register `x9`, and in the
49*4f1223e8SApple OSS Distributionscontext of our panic we know that `x8` is 0x0 and `x9` is 0x1000, then we know
50*4f1223e8SApple OSS Distributionswe can't possibly have failed because of that bounds check. There are scripts
51*4f1223e8SApple OSS Distributionsto automate this reasoning–ask the -fbounds-safety DRIs for help if you run into
52*4f1223e8SApple OSS Distributionsthis situation.
53*4f1223e8SApple OSS Distributions
54*4f1223e8SApple OSS Distributions## Bounds checking options
55*4f1223e8SApple OSS Distributions
56*4f1223e8SApple OSS Distributions* (nothing): -fbounds-safety is disabled; it is an error to use `__indexable`
57*4f1223e8SApple OSS Distributions  and `__bidi_indexable` in source.
58*4f1223e8SApple OSS Distributions* `bound-checks-pending`: -fbounds-safety is disabled, but `__indexable` and
59*4f1223e8SApple OSS Distributions  `__bidi_indexable` are defined to nothing instead of causing compile-time
60*4f1223e8SApple OSS Distributions  errors.
61*4f1223e8SApple OSS Distributions* `bound-checks`: -fbounds-safety is enabled; failing bounds checks panic;
62*4f1223e8SApple OSS Distributions  optimize for code size at the detriment of debuggability.
63*4f1223e8SApple OSS Distributions* `bound-checks-debug`: -fbounds-safety is enabled; failing bounds checks panic;
64*4f1223e8SApple OSS Distributions  optimize for debug information at the detriment of efficient code.
65*4f1223e8SApple OSS Distributions* `bound-checks-soft`: -fbounds-safety is disabled for RELEASE kernels;
66*4f1223e8SApple OSS Distributions  for all other kernel configurations failing bounds checks generate a telemetry event
67*4f1223e8SApple OSS Distributions  instead of panicking; optimize for debug information at the detriment of efficient code.
68*4f1223e8SApple OSS Distributions* `bound-checks-seed`: -fbounds-safety is enabled. For RELEASE kernels, failing checks
69*4f1223e8SApple OSS Distributions  generate a telemetry event instead of panicking; for all other kernel configurations
70*4f1223e8SApple OSS Distributions  failing bound checks panic.
71*4f1223e8SApple OSS Distributions
72*4f1223e8SApple OSS Distributions# The process of enabling bound-checks
73*4f1223e8SApple OSS Distributions
74*4f1223e8SApple OSS Distributions`bound-checks` is the final, desirable bounds checking configuration option. We
75*4f1223e8SApple OSS Distributionsdo not enable `bound-checks` lightly, as it can introduce new reasons that xnu
76*4f1223e8SApple OSS Distributionspanics. We have found that the following process consistently helps land code
77*4f1223e8SApple OSS Distributionschanges that stick, and help reduce the likelihood of introducing problems that
78*4f1223e8SApple OSS Distributionsturn into bad kernels.
79*4f1223e8SApple OSS Distributions
80*4f1223e8SApple OSS Distributions## Step 1: adopt -fbounds-safety at desk
81*4f1223e8SApple OSS Distributions
82*4f1223e8SApple OSS DistributionsWhen enabling -fbounds-safety, clang generates new diagnostics that ensure at
83*4f1223e8SApple OSS Distributionscompile-time that bounds could be known at runtime (if necessary) for all
84*4f1223e8SApple OSS Distributionspointers, and new diagnostics for when a bounds check is likely (or guaranteed)
85*4f1223e8SApple OSS Distributionsto fail at runtime. The first step to adopting -fbounds-safety is making code
86*4f1223e8SApple OSS Distributionschanges to xnu such that it builds without any diagnostics, and testing at desk
87*4f1223e8SApple OSS Distributionsthat your changes did not impact kernel functionality.
88*4f1223e8SApple OSS Distributions
89*4f1223e8SApple OSS DistributionsFor this step, you use `bound-checks-debug`. `bound-checks-debug` enables the
90*4f1223e8SApple OSS Distributionsentire breadth of -fbounds-safety diagnostics and gives you the most easily
91*4f1223e8SApple OSS Distributionsdebugged bounds checks. You should also use bound-checks-debug for xnu changes
92*4f1223e8SApple OSS Distributionsthat you send to integration testing.
93*4f1223e8SApple OSS Distributions
94*4f1223e8SApple OSS Distributions## Step 2: separate adoption from enablement
95*4f1223e8SApple OSS Distributions
96*4f1223e8SApple OSS DistributionsOnce you're confident in your code changes, everything builds, at-desk testing
97*4f1223e8SApple OSS Distributionsis successful and integration testing is happy, you start two pull requests:
98*4f1223e8SApple OSS Distributions
99*4f1223e8SApple OSS Distributions* one pull request with the necessary adoption code changes, configuring the
100*4f1223e8SApple OSS Distributions  file to build with `bound-checks-pending`;
101*4f1223e8SApple OSS Distributions* one pull request that changes `bound-checks-pending` to `bound-checks-soft`.
102*4f1223e8SApple OSS Distributions
103*4f1223e8SApple OSS DistributionsThis strategy can save your change and other people's changes even in the face
104*4f1223e8SApple OSS Distributionsof small errors. Read on to "where bound-checks-soft comes in" for more details.
105*4f1223e8SApple OSS Distributions
106*4f1223e8SApple OSS Distributions### Where bound-checks-pending comes in
107*4f1223e8SApple OSS Distributions
108*4f1223e8SApple OSS DistributionsThe configuration status quo of any file in xnu is to build with no options
109*4f1223e8SApple OSS Distributionsrelating to -fbounds-safety. In this mode, -fbounds-safety's `__indexable` and
110*4f1223e8SApple OSS Distributions`__bidi_indexable` keywords are **undefined**. It is a syntax error to use them.
111*4f1223e8SApple OSS DistributionsThis is because `__indexable` and `__bidi_indexable` pointers are not
112*4f1223e8SApple OSS DistributionsABI-compatible with plain C: if they were defined to nothing instead, and a use
113*4f1223e8SApple OSS Distributionsof `__indexable` or `__bidi_indexable` slipped into a header used by a set of
114*4f1223e8SApple OSS Distributionsfiles heterogeneously enabling -fbounds-safety, they could cause ABI breaks that
115*4f1223e8SApple OSS Distributionswould manifest as opaque runtime crashes instead of compile-time errors.
116*4f1223e8SApple OSS Distributions
117*4f1223e8SApple OSS DistributionsHowever, adopting -fbounds-safety may require the explicit use of `__indexable`
118*4f1223e8SApple OSS Distributionsor `__bidi_indexable` pointers that are confined to the file being modified.
119*4f1223e8SApple OSS DistributionsUntil `bound-checks-soft` is enabled, it must still be possible to build that
120*4f1223e8SApple OSS Distributionsfile without -fbounds-safety. This is where `bound-checks-pending` comes in:
121*4f1223e8SApple OSS Distributionsthis flag causes `__indexable` and `__bidi_indexable` to expand to nothing, and
122*4f1223e8SApple OSS Distributionsit disables warnings that will frequently trip in plain C files that are
123*4f1223e8SApple OSS Distributionscompatible with -fbounds-safety (such as -Wself-assign). This allows files that
124*4f1223e8SApple OSS Distributionsare compatible with -fbounds-safety to continue to build without it, while
125*4f1223e8SApple OSS Distributionsminimizing the risk of ABI incompatibilities.
126*4f1223e8SApple OSS Distributions
127*4f1223e8SApple OSS Distributions### Where bound-checks-soft comes in
128*4f1223e8SApple OSS Distributions
129*4f1223e8SApple OSS DistributionsUsing `bound-checks-soft` means that if a problem slips through qualification,
130*4f1223e8SApple OSS Distributionsthe kernel is still probably livable. A kernel that is unlivable due to panics
131*4f1223e8SApple OSS Distributionscreates significant drag over the entire software development organization, and
132*4f1223e8SApple OSS Distributionsfixing it will be a same-day emergency that you will need to firefight and then
133*4f1223e8SApple OSS Distributionsroot-cause. This **will** take precedence over any other work that you could
134*4f1223e8SApple OSS Distributionsrather be doing. On the other hand, "soft traps" generate telemetry without
135*4f1223e8SApple OSS Distributionspanicking. Kernels with known soft trap triggers are un-shippable, but they may
136*4f1223e8SApple OSS Distributionsstill be livable. As a result, fixing these problems is merely very important.
137*4f1223e8SApple OSS Distributions
138*4f1223e8SApple OSS Distributions`bound-checks-soft` is enabled separately from the code change because even
139*4f1223e8SApple OSS Distributionsthough `bound-checks-soft` is ideally non-fatal, failing a bounds check in
140*4f1223e8SApple OSS Distributionscertain conditions can still result in an un-livable kernel (for instance,
141*4f1223e8SApple OSS Distributionsif a check fails in a long, tight loop). If such a serious issue slips into
142*4f1223e8SApple OSS Distributionsqualification, integrators only need to back out the `bound-checks-soft` change
143*4f1223e8SApple OSS Distributions(falling back to `bound-checks-pending`) instead of reverting your entire
144*4f1223e8SApple OSS Distributionschange. Reverting entire changes is a very destructive integration action: any
145*4f1223e8SApple OSS Distributions_other_ changes that rely on your modifications may need to be cascaded out of
146*4f1223e8SApple OSS Distributionsthe build as well. Given unfortunate-enough timing, there _may not be time_ to
147*4f1223e8SApple OSS Distributionsre-nominate feature work that must be backed out. Significant -fbounds-safety
148*4f1223e8SApple OSS Distributionsadoption experience in xnu and other projects has taught us that bundling in
149*4f1223e8SApple OSS Distributionsnon-trivial code changes with the enablement of -fbounds-safety is a recipe for
150*4f1223e8SApple OSS Distributionssadness and reprised work.
151*4f1223e8SApple OSS Distributions
152*4f1223e8SApple OSS Distributions### Where bound-checks-seed comes in
153*4f1223e8SApple OSS Distributions
154*4f1223e8SApple OSS DistributionsIf you want to enable `bound-checks` for internal users but want to use
155*4f1223e8SApple OSS Distributions`bound-checks-soft` for external users in order to collect telemetry
156*4f1223e8SApple OSS Distributions(e.g. during seeding), use `bound-checks-seed`.
157*4f1223e8SApple OSS DistributionsThe expectation is that, once the telemetry is collected, you will change the
158*4f1223e8SApple OSS Distributionsfile to `bound-checks` or disable -fbounds-safety.
159*4f1223e8SApple OSS DistributionsDue to security concerns, namely non fatal traps, `bound-checks-seed`
160*4f1223e8SApple OSS Distributionsis not meant to be shipped to customers outside of seeding.
161*4f1223e8SApple OSS Distributions
162*4f1223e8SApple OSS Distributions## Step 3: enable bound-checks
163*4f1223e8SApple OSS Distributions
164*4f1223e8SApple OSS DistributionsWe let changes with `bound-checks-soft` steep in internal releases to build up
165*4f1223e8SApple OSS Distributionsconfidence that bounds checks don't trip during regular operations. During this
166*4f1223e8SApple OSS Distributionsperiod, failing bounds checks create telemetry events that are collected by
167*4f1223e8SApple OSS DistributionsXNU engineers instead of bringing down the system. Although failing bounds
168*4f1223e8SApple OSS Distributionschecks are never desirable, it is better to catch them at that stage than at any
169*4f1223e8SApple OSS Distributionspoint after.
170*4f1223e8SApple OSS Distributions
171*4f1223e8SApple OSS DistributionsOnce we have confidence that a file doesn't cause issues when -fbounds-safety is
172*4f1223e8SApple OSS Distributionsenabled, we can change `bound-checks-soft` to the plain `bound-checks`. This is
173*4f1223e8SApple OSS Distributionssimply done with another pull request.
174*4f1223e8SApple OSS Distributions
175*4f1223e8SApple OSS DistributionsRead "where bound-checks-seed comes in" for a different approach if you need
176*4f1223e8SApple OSS Distributionsa higher confidence level before enabling `bound-checks`.
177