# String handling in xnu

xnu implements most POSIX C string functions, including the inherited subset of
standard C string functions. Unfortunately, poor design choices have made many
of these functions, including the more modern `strl` functions, confusing or
unsafe. In addition, the advent of -fbounds-safety support in xnu is forcing
some string handling practices to be revisited. This document explains the
failings of POSIX C string functions, xnu's `strbuf` functions, and their
intersection with the -fbounds-safety C extension.

## The short-form guidance

* Use `strbuf*` when you have the length for all the strings;
* use `strl*` when you have the length of _one_ string, and the other is
  guaranteed to be NUL-terminated;
* use `str*` when you don't have the length for any of the strings, and they
  are all guaranteed to be NUL-terminated;
* stop using `strn*` functions.

## Replacing `strncmp`

`strncmp` is always wrong with -fbounds-safety, and it's unavailable as a
result. Given `strncmp(first, secnd, n)`, you need to know the types of `first`
and `secnd` to pick a replacement. Choose according to this table, where `n1`
and `n2` are the known lengths of `first` and `secnd` respectively:

| strncmp(first, secnd, n) | __null_terminated first   | __indexable first               |
| ------------------------ | ------------------------- | ------------------------------- |
| __null_terminated secnd  | n/a                       | strlcmp(first, secnd, n1)       |
| __indexable secnd        | strlcmp(secnd, first, n2) | strbufcmp(first, n1, secnd, n2) |

Using `strncmp` with two NUL-terminated strings is uncommon, and it has no
direct replacement. The first person who needs to use -fbounds-safety in a file
that does this might need to write that string function.

If you try to use `strlcmp` and you get a diagnostic like this:

> passing 'const char *__indexable' to parameter of incompatible type
> 'const char *__null_terminated' is an unsafe operation ...

then you might need to swap the two string arguments. `strlcmp` is sensitive to
the argument order: just like for `strlcpy`, the indexable string goes first.
Swapping the arguments also flips the sign of the result, which matters only if
you use it for ordering rather than for equality.
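
To see how a `strlcmp`-style comparison differs from `strncmp`, here is a minimal userspace sketch. `sketch_strlcmp` is a hypothetical stand-in, not xnu's implementation; it assumes `strlcmp` treats the buffer as the string formed by its bytes up to the first NUL or `alen`, whichever comes first, and compares that against the whole NUL-terminated string.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch of strlcmp-like semantics (not xnu's code):
 * `a` is a buffer of `alen` bytes that may lack a NUL terminator, and
 * `b` is a NUL-terminated string. `a` is treated as ending at its first
 * NUL or after `alen` bytes, whichever comes first.
 */
static int sketch_strlcmp(const char *a, const char *b, size_t alen) {
    size_t i;

    for (i = 0; i < alen && a[i] != '\0'; i++) {
        if (a[i] != b[i]) {
            /* The strings differ here (possibly because `b` ended). */
            return (unsigned char)a[i] - (unsigned char)b[i];
        }
    }
    /* `a` has ended; the strings are equal only if `b` ends here too. */
    return -(int)(unsigned char)b[i];
}
```

With a three-byte buffer holding `abc` and no terminator, `strncmp(buf, "abcdef", 3)` returns 0 because it stops after three bytes, while `sketch_strlcmp(buf, "abcdef", 3)` returns a negative value because the buffer's string ends where `"abcdef"` continues.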

# The problems with string functions

POSIX/BSD string handling functions come in many variants:

* `str` functions (`strlen`, `strcat`, etc.), unsafe for writing;
* `strn` functions (`strnlen`, `strncat`, etc.), unsafe for writing;
* `strl` functions (`strlcpy`, `strlcat`, etc.), safe but easily misunderstood.

`str` functions for writing (`strcpy`, `strcat`, etc.) are **all** unsafe
because they ignore the bounds of the output buffer. Most or all of these
functions have been deprecated or outright removed from xnu. You should never
use `str` functions to write to strings. Functions that only read strings
(`strlen`, `strcmp`, `strchr`, etc.) are generally safe because there is no
confusion that their input must be NUL-terminated, and there is no danger of
writing out of bounds (since they do not write at all).

`strn` functions for writing (`strncpy`, `strncat`, etc.) are **all** unsafe.
`strncpy` doesn't NUL-terminate the output buffer when the input is at least as
long as the given size, and `strncat` doesn't accept a length for the output
buffer. **All** new string buffers should include space for a NUL terminator.
`strn` functions for reading (`strncmp`, `strnlen`) are _generally_ safe, but
`strncmp` can cause confusion over which string is bound by the given size. In
extreme cases, this can create information disclosure bugs or stability issues.
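
The `strncpy` pitfall is easy to reproduce in userspace with standard C alone. `strncpy_terminated` is a hypothetical helper name; it copies with `strncpy` and reports whether the output ended up with a NUL anywhere within its bounds.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy `in` into `out` with strncpy, then report
 * whether `out` holds a NUL terminator within its `outlen` bytes.
 */
static bool strncpy_terminated(char *out, size_t outlen, const char *in) {
    strncpy(out, in, outlen);
    return memchr(out, '\0', outlen) != NULL;
}
```

Copying `"abcdefgh"` into a 4-byte buffer leaves the bytes `abcd` with no terminator, so the helper returns false; copying `"ab"` pads the rest of the buffer with NULs, so it returns true.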

`strl` functions, from OpenBSD, only come in writing variants, and they always
NUL-terminate their output. This makes the writing part safe. (xnu adds `strl`
comparison functions, which do no writing and are also safe.) However, these
functions assume the output pointer is a buffer and the input is a
NUL-terminated string. Because `strn` functions, which make no such assumption,
coexist with them, many users haven't fully adopted this mental model. For
instance, the following code is buggy:

```c
char output[4];
char input[8] = "abcdefgh"; /* fills the array exactly: not NUL-terminated */
strlcpy(output, input, sizeof(output));
```

`strlcpy` returns the length of the input string; xnu's implementation computes
it by literally calling `strlen(input)`. Even though only 3 characters are
written to `output` (plus a NUL), `input` is read until a NUL character is
found, past the end of the array. This is always a problem from the perspective
of memory disclosures, and in some cases, it can also lead to stability issues.
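
The over-read goes away when the copy also receives the bound of its input. The sketch below is a hypothetical userspace function, `sketch_bufcpy`, modelled on the behavior this document describes for `strbufcpy`: it copies at most the input's string length (never looking past the input's bound), always NUL-terminates a non-empty destination, and returns NULL when the destination size is 0. It is not xnu's implementation.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical bounded copy (not xnu's strbufcpy): `src` is a buffer of
 * `srclen` bytes that need not be NUL-terminated.
 */
static const char *sketch_bufcpy(char *dst, size_t dstlen,
        const char *src, size_t srclen) {
    size_t n = 0;

    if (dstlen == 0) {
        return NULL;            /* no room even for a terminator */
    }
    /* Find the source string's length without reading past `srclen`. */
    while (n < srclen && src[n] != '\0') {
        n++;
    }
    if (n > dstlen - 1) {
        n = dstlen - 1;         /* truncate, keeping room for the NUL */
    }
    memcpy(dst, src, n);
    dst[n] = '\0';
    return dst;
}
```

Applied to the buggy example, `sketch_bufcpy(output, sizeof(output), input, sizeof(input))` stores `"abc"` in `output` and never reads beyond the eight bytes of `input`.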

`strlcpy_ret` is a convenience wrapper around `strlcpy` that returns a
`__null_terminated` pointer to the output string instead of the length of the
input string. Like `strlcpy`, `strlcpy_ret` searches the input string for its
NUL character.

# Changes with -fbounds-safety

When -fbounds-safety is enabled, character buffers and NUL-terminated strings
are two distinct types, and they do not implicitly convert to each other. This
prevents confusing the two in the way that is problematic with `strlcpy` and
`strlcpy_ret`, for instance. However, it creates new problems:

* What is the correct way to transform a character buffer into a NUL-terminated
  string?
* When -fbounds-safety flags that the use of a string function was improper,
  what is the solution?

The most common use of character buffers is to build a string, which is then
passed without bounds as a NUL-terminated string to downstream users.
-fbounds-safety and xnu enshrine this practice with the following additions:

* `tsnprintf`: like `snprintf`, but it returns a NUL-terminated string;
* `strbuf` functions, explicitly accepting character buffers and a distinct
  count for each:
  * `strbuflen(buffer, length)`: like `strnlen`;
  * `strbufcmp(a, alen, b, blen)`: like `strcmp`;
  * `strbufcasecmp(a, alen, b, blen)`: like `strcasecmp`;
  * `strbufcpy(a, alen, b, blen)`: like `strlcpy`, but returns `a` as a
    NUL-terminated string;
  * `strbufcat(a, alen, b, blen)`: like `strlcat`, but returns `a` as a
    NUL-terminated string;
* `strlcpy_ret(dst, src, n)`: like `strlcpy`, but returns `dst` as a
  NUL-terminated string;
* new `strl` functions, accepting _one_ character buffer of a known size and
  _one_ NUL-terminated string:
  * `strlcmp(a, b, alen)`: like `strcmp`;
  * `strlcasecmp(a, b, alen)`: like `strcasecmp`.

`strbuf` functions additionally all have overloads accepting character arrays
in lieu of a pointer+length pair: `strbuflen(array)`, `strbufcmp(a, b)`,
`strbufcasecmp(a, b)`, `strbufcpy(a, b)`, `strbufcat(a, b)`.
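
To make the pointer+length and array forms concrete, here is a userspace sketch of comparable behavior. `sketch_strbuflen`, `sketch_strbufcmp` and the `SKETCH_STRBUFCMP_ARRAYS` macro are hypothetical names. The sketch assumes each buffer's string ends at its first NUL or at the end of the buffer, and it uses a `sizeof`-based macro to mimic the array overloads, which is not necessarily how xnu provides them.

```c
#include <stddef.h>
#include <string.h>

/* Like strnlen: the string length, never looking past `len` bytes. */
static size_t sketch_strbuflen(const char *buf, size_t len) {
    size_t n = 0;

    while (n < len && buf[n] != '\0') {
        n++;
    }
    return n;
}

/* Like strcmp, but each string is bounded by its own buffer length. */
static int sketch_strbufcmp(const char *a, size_t alen,
        const char *b, size_t blen) {
    size_t la = sketch_strbuflen(a, alen);
    size_t lb = sketch_strbuflen(b, blen);
    int r = memcmp(a, b, la < lb ? la : lb);

    if (r != 0) {
        return r;
    }
    /* The common prefix is equal: the shorter string sorts first. */
    return (la > lb) - (la < lb);
}

/* Array form: both bounds come from sizeof, so they cannot be mismatched. */
#define SKETCH_STRBUFCMP_ARRAYS(a, b) \
    sketch_strbufcmp((a), sizeof(a), (b), sizeof(b))
```

The macro only works when `a` and `b` are true arrays; applied to pointers, `sizeof` would silently yield the size of a pointer instead.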

If the destination array of `strbufcpy` or `strbufcat` has a size of 0, they
return NULL without doing anything else. Otherwise, the destination is always
NUL-terminated and returned as a NUL-terminated string pointer.

While you are modifying a string, you should reference its data as some flavor
of indexable pointer, and only once you're done should you convert it to a
NUL-terminated string. NUL-terminated character pointers are generally not
suitable for modifications, because their bounds are determined by their
contents: overwriting any NUL character found through a `__null_terminated`
pointer access results in a trap. For instance:

```c
void my_string_consuming_func(const char *);

// lots of __unsafe!
char *__null_terminated my_string = __unsafe_forge_null_terminated(char *,
  kalloc_data(my_string_size, Z_WAITOK));
memcpy(
  __unsafe_forge_bidi_indexable(void *, my_string, my_string_size),
  my_data,
  my_string_size);
my_string_consuming_func(my_string);
```

This code converts the string pointer to a NUL-terminated string too early,
while it's still being modified. Keeping `my_string` a `__null_terminated`
pointer while it's being modified requires more forging, which creates more
opportunities for errors and is less ergonomic. Consider this instead:

```c
void my_string_consuming_func(const char *);

char *my_buffer = kalloc_data(my_string_size, Z_WAITOK);
const char *__null_terminated finished_string =
  strbufcpy(my_buffer, my_string_size, my_data, my_string_size);
my_string_consuming_func(finished_string);
```

This example has two views of the same data: `my_buffer` (through which the
string is being modified) and `finished_string` (which is `const` and
NUL-terminated). Using `my_buffer` as an indexable pointer allows you to modify
it ergonomically and, importantly, without forging. You turn it into a
NUL-terminated string at the same time you turn it into a `const` reference,
signalling that you're done making changes.

With -fbounds-safety enabled, you should structure the final operation modifying
a character array such that you get a NUL-terminated view of it. For instance,
this plain C code:

```c
char thread_name[MAXTHREADNAMESIZE];
(void) snprintf(thread_name, sizeof(thread_name),
        "dlil_input_%s", ifp->if_xname);
thread_set_thread_name(inp->dlth_thread, thread_name);
```

becomes:

```c
char thread_name_buf[MAXTHREADNAMESIZE];
const char *__null_terminated thread_name;
thread_name = tsnprintf(thread_name_buf, sizeof(thread_name_buf),
        "dlil_input_%s", ifp->if_xname);
thread_set_thread_name(inp->dlth_thread, thread_name);
```

Although `tsnprintf` and `strbuf` functions return a `__null_terminated`
pointer to you for convenience, not all use cases are resolved by calling
`tsnprintf` or `strbufcpy` once. As a quick reference, with -fbounds-safety
enabled, you can use `__unsafe_null_terminated_from_indexable(p_start, p_nul)`
to convert a character array to a `__null_terminated` string if you need to
perform more manipulations. (`p_start` is a pointer to the first character, and
`p_nul` is a pointer to the NUL character in that string.) For instance, if you
build a string with successive calls to `scnprintf`, you would use
`__unsafe_null_terminated_from_indexable` at the end of the sequence to get your
NUL-terminated string pointer.
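
The successive-`scnprintf` pattern boils down to tracking where the NUL ends up. The userspace sketch below is hypothetical: `sketch_scnprintf` stands in for `scnprintf` using `vsnprintf`, assuming the usual `scnprintf` convention of returning the number of characters actually stored, and `build_label` shows how the start pointer and the pointer to the NUL fall out of the sequence. In xnu, those are the two values you would pass to `__unsafe_null_terminated_from_indexable`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for scnprintf: formats into `buf` and returns the
 * number of characters actually stored (excluding the NUL), which is at
 * most size - 1. Returns 0 when size is 0.
 */
static size_t sketch_scnprintf(char *buf, size_t size, const char *fmt, ...) {
    va_list ap;
    int n;

    if (size == 0) {
        return 0;
    }
    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    if (n < 0) {
        buf[0] = '\0';          /* treat formatting errors as empty output */
        return 0;
    }
    return (size_t)n < size ? (size_t)n : size - 1;
}

/*
 * Hypothetical example: builds "if=<name> unit=<unit>" piecewise. Returns
 * the start of the string (p_start) and stores the address of its NUL in
 * *p_nul. Requires size > 0.
 */
static char *build_label(char *buf, size_t size, const char *name, int unit,
        char **p_nul) {
    size_t used = 0;

    assert(size > 0);
    used += sketch_scnprintf(buf + used, size - used, "if=%s", name);
    used += sketch_scnprintf(buf + used, size - used, " unit=%d", unit);
    *p_nul = buf + used;        /* used <= size - 1, so this is in bounds */
    return buf;
}
```

Because each call reports only what it actually stored, `used` never runs past the buffer even when the output is truncated, and `buf + used` always points at the terminator.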

Occasionally, you need to turn a NUL-terminated string back into a character
buffer (usually to interoperate with copy APIs that need a pointer and a byte
count). When possible, prefer APIs that copy NUL-terminated strings (like
`strlcpy`). Otherwise, convert the NUL-terminated string to an indexable buffer
using `__null_terminated_to_indexable` (if you don't need the NUL terminator to
be in bounds of the result pointer) or `__unsafe_null_terminated_to_indexable`
(if you need it). Also keep in mind that in code that pervasively deals with
length-delimited buffers, only some of which happen to also hold NUL-terminated
strings, it may simply be more convenient to keep string buffers as some flavor
of indexable pointer instead of converting to and from NUL-terminated strings.
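
The difference between the two conversions is whether the terminator is inside the bounds; in plain C terms, that is a byte count of `strlen(s)` versus `strlen(s) + 1`. The sketch below is hypothetical: `copy_bytes` stands in for a pointer-and-byte-count copy API, and `copy_string` picks the count the way each conversion would bound the resulting pointer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pointer-and-byte-count copy API; copies at most dstlen bytes. */
static size_t copy_bytes(char *dst, size_t dstlen, const char *src, size_t n) {
    if (n > dstlen) {
        n = dstlen;
    }
    memcpy(dst, src, n);
    return n;
}

/*
 * Copy a NUL-terminated string through the byte-count API. Without the
 * terminator, the bound is strlen(s), as with __null_terminated_to_indexable;
 * with it, the bound is strlen(s) + 1, as with
 * __unsafe_null_terminated_to_indexable.
 */
static size_t copy_string(char *dst, size_t dstlen, const char *s,
        bool include_nul) {
    size_t n = strlen(s) + (include_nul ? 1 : 0);

    return copy_bytes(dst, dstlen, s, n);
}
```

If the receiver expects a C string, copying only `strlen(s)` bytes leaves it without a terminator, which is the situation the NUL-inclusive conversion avoids.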

# I have a choice between `strn*`, `strl*`, `strbuf*`. Which one do I use?

You might come across cases where functions from different families all seem
to do the trick. For instance:

```c
struct foo {
    char buf1[10];
    char buf2[16];
};

void bar(struct foo *f) {
    /* how do I test whether buf1 and buf2 contain the same string? */
    if (strcmp(f->buf1, f->buf2) == 0) { /* ... */ }
    if (strncmp(f->buf1, f->buf2, sizeof(f->buf1)) == 0) { /* ... */ }
    if (strlcmp(f->buf1, f->buf2, sizeof(f->buf1)) == 0) { /* ... */ }
    if (strbufcmp(f->buf1, f->buf2) == 0) { /* ... */ }
}
```

Without -fbounds-safety, these all compile, but when you enable it, `strbufcmp`
is likely the only one that builds. If you do not have the privilege of
-fbounds-safety to guide you to the best choice, as a rule of thumb, you should
prefer APIs in the following order:

1. `strbuf*` APIs;
2. `strl*` APIs;
3. `str*` APIs.

That is, to implement `bar`, you have a choice of `strcmp`, `strncmp`,
`strlcmp` and `strbufcmp`, and you should prefer `strbufcmp`.
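
Even without -fbounds-safety, you can see why `strncmp` with `sizeof(f->buf1)` is the wrong tool for `bar`: the size bounds only `buf1`, so the comparison stops after 10 characters. The userspace sketch below is hypothetical; `same_string` follows the `strbufcmp` pattern of passing both bounds, under the assumption that each string ends at its first NUL or at the end of its array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct foo {
    char buf1[10];
    char buf2[16];
};

/* Length of the string in `buf`, never looking past `len` bytes. */
static size_t bounded_len(const char *buf, size_t len) {
    size_t n = 0;

    while (n < len && buf[n] != '\0') {
        n++;
    }
    return n;
}

/* Hypothetical strbufcmp-style equality test: both bounds are passed. */
static bool same_string(const char *a, size_t alen, const char *b, size_t blen) {
    size_t la = bounded_len(a, alen);
    size_t lb = bounded_len(b, blen);

    return la == lb && memcmp(a, b, la) == 0;
}

static bool bar_same(const struct foo *f) {
    return same_string(f->buf1, sizeof(f->buf1), f->buf2, sizeof(f->buf2));
}
```

If `buf1` holds exactly the ten characters `0123456789` (no terminator) and `buf2` holds `0123456789AB`, `strncmp(f->buf1, f->buf2, sizeof(f->buf1))` reports a match, while `bar_same` reports that the strings differ.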

`strn` functions are **never** recommended. You should use `strbuflen` over
`strnlen` (they do the same thing, but having a separate `strbuflen` function
makes the guidance to avoid `strn` functions easier to follow), and you should
use `strbufcmp`, `strlcmp` or even `strcmp` over `strncmp` (depending on
whether you know the length of each string, of just one, or of neither).