# String handling in xnu

xnu implements most POSIX C string functions, including the inherited subset of
standard C string functions. Unfortunately, poor design choices have made many
of these functions, including the more modern `strl` functions, confusing or
unsafe. In addition, the advent of -fbounds-safety support in xnu is forcing
some string handling practices to be revisited. This document explains the
failings of POSIX C string functions, xnu's `strbuf` functions, and their
intersection with the -fbounds-safety C extension.

## The short-form guidance

* Use `strbuf*` when you have the length for all the strings;
* use `strl*` when you have the length of _one_ string, and the other is
  guaranteed to be NUL-terminated;
* use `str*` when you don't have the length for any of the strings, and they
  are all guaranteed to be NUL-terminated;
* stop using `strn*` functions.

# The problems with string functions

POSIX string handling functions come in many variants:

* `str` functions (`strlen`, `strcat`, etc.), unsafe for writing;
* `strn` functions (`strnlen`, `strncat`, etc.), unsafe for writing;
* `strl` functions (`strlcpy`, `strlcat`, etc.), safe but easily misunderstood.

`str` functions for writing (`strcpy`, `strcat`, etc.) are **all** unsafe
because they have no way to know the bounds of the output buffer. Most or all
of these functions have been deprecated or outright removed from xnu. You
should never use `str` functions to write to strings. Functions that simply
read strings (`strlen`, `strcmp`, `strchr`, etc.) are generally found to be
safe because there is no confusion that their input must be NUL-terminated, and
there is no danger of writing out of bounds (by virtue of not writing at all).

`strn` functions for writing (`strncpy`, `strncat`, etc.) are **all** unsafe.
`strncpy` doesn't NUL-terminate the output buffer when the input is at least as
long as the given size, and the size that `strncat` accepts limits how much of
the input is appended rather than describing the output buffer. **All** new
string buffers should include space for a NUL terminator. `strn` functions for
reading (`strncmp`, `strnlen`) are _generally_ safe, but `strncmp` can cause
confusion over which string is bound by the given size. In extreme cases, this
can create information disclosure bugs or stability issues.
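
Both `strn` writing pitfalls can be reproduced with the standard C library. The
following is a minimal user-space sketch, not xnu code; the helper names
`strncpy_pitfall` and `strncat_bounded` are hypothetical.

```c
#include <assert.h>
#include <string.h>

/*
 * strncpy pitfall: when the input is at least as long as the size argument,
 * strncpy fills the whole buffer and writes no NUL terminator.
 */
void strncpy_pitfall(char out[4])
{
    strncpy(out, "abcdefgh", 4);    /* stores 'a', 'b', 'c', 'd' and no NUL */
}

/*
 * strncat pitfall: the size argument limits how many input characters are
 * appended, not the size of the output buffer. The caller has to work out
 * the remaining space itself, leaving room for the NUL that strncat appends.
 */
void strncat_bounded(char *dst, size_t dstsize, const char *src)
{
    size_t used = strlen(dst);      /* dst must already be NUL-terminated */

    assert(used < dstsize);
    strncat(dst, src, dstsize - used - 1);
}
```

Passing `sizeof(dst)` directly as the last argument of `strncat`, a common
mistake, would let it write past the end of a non-empty `dst`.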

`strl` functions, in POSIX, only come in writing variants, and they always
NUL-terminate their output. This makes the writing part safe. (xnu adds `strl`
comparison functions, which do no writing and are also safe.) However, these
functions assume that the output pointer is a buffer and that the input is a
NUL-terminated string. Because they coexist with `strn` functions that make no
such assumption, many users have not fully adopted this mental model. For
instance, the following code is buggy:

```c
char output[4];
char input[8] = "abcdefgh"; /* not NUL-terminated */
strlcpy(output, input, sizeof(output));
```

`strlcpy` returns the length of the input string; in xnu's implementation, it
computes that length literally by calling `strlen(input)`. Even though only 3
characters are written to `output` (plus a NUL), `input` is read until a NUL
character is reached, which here means reading past the end of the array. This
is always a problem from the perspective of memory disclosures, and in some
cases, it can also lead to stability issues.
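
To make the cost concrete, here is a user-space sketch of `strlcpy`'s documented
behavior. It is an illustration rather than xnu's implementation, and
`sketch_strlcpy` is a hypothetical name chosen to avoid clashing with a C
library that already provides `strlcpy`. Note that the return value requires
`strlen(src)` no matter how small `dstsize` is.

```c
#include <assert.h>
#include <string.h>

/*
 * Copies at most dstsize - 1 characters of src into dst and NUL-terminates it.
 * Returns strlen(src), so src must be NUL-terminated even when only a few of
 * its characters end up in dst.
 */
size_t sketch_strlcpy(char *dst, const char *src, size_t dstsize)
{
    size_t srclen = strlen(src);    /* reads src up to its NUL, however far */

    if (dstsize != 0) {
        size_t n = srclen < dstsize - 1 ? srclen : dstsize - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}
```

A return value greater than or equal to `dstsize` tells the caller that the
copy was truncated. In the buggy example, `input` has no NUL within its 8 bytes,
so the `strlen` call keeps reading whatever memory follows it.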

# Changes with -fbounds-safety

With -fbounds-safety enabled, character buffers and NUL-terminated strings are
two distinct types, and they do not implicitly convert to each other. This
prevents confusing the two in the way that is problematic with `strlcpy`, for
instance. However, it creates new problems:

* What is the correct way to transform a character buffer into a NUL-terminated
  string?
* When -fbounds-safety flags that the use of a string function was improper,
  what is the solution?

The most common use of character buffers is to build a string, which is then
passed without bounds, as a NUL-terminated string, to downstream users.
-fbounds-safety and xnu enshrine this practice with the following additions:

* `tsnprintf`: like `snprintf`, but it returns a NUL-terminated string;
* `strbuf` functions, explicitly accepting character buffers and a distinct
  count for each:
  * `strbuflen(buffer, length)`: like `strnlen`;
  * `strbufcmp(a, alen, b, blen)`: like `strcmp`;
  * `strbufcasecmp(a, alen, b, blen)`: like `strcasecmp`;
  * `strbufcpy(a, alen, b, blen)`: like `strlcpy` but returns `a` as a
    NUL-terminated string;
  * `strbufcat(a, alen, b, blen)`: like `strlcat` but returns `a` as a
    NUL-terminated string;
* `strl` (new) functions, accepting _one_ character buffer of a known size and
  _one_ NUL-terminated string:
  * `strlcmp(a, b, alen)`: like `strcmp`;
  * `strlcasecmp(a, b, alen)`: like `strcasecmp`.

`strbuf` functions additionally all have overloads accepting character arrays
in lieu of a pointer+length pair: `strbuflen(array)`, `strbufcmp(a, b)`,
`strbufcasecmp(a, b)`, `strbufcpy(a, b)`, `strbufcat(a, b)`.
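
The comparison semantics can be sketched in plain C. In this hypothetical
user-space version, each buffer's string is assumed to end at its first NUL or
at the end of the buffer, whichever comes first, and the end of a string
compares like a NUL byte. The names `sketch_strbufcmp`, `sketch_strlcmp` and
`SKETCH_STRBUFCMP_ARRAYS` are made up for illustration; xnu's real functions
carry -fbounds-safety annotations, and the macro only imitates the effect of
the array overloads in plain C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compares two strings, each stored in a buffer of known length. */
int sketch_strbufcmp(const char *a, size_t alen, const char *b, size_t blen)
{
    for (size_t i = 0;; i++) {
        /* Past the end of a buffer, treat its string as terminated. */
        unsigned char ca = i < alen ? (unsigned char)a[i] : '\0';
        unsigned char cb = i < blen ? (unsigned char)b[i] : '\0';

        if (ca != cb) {
            return (int)ca - (int)cb;
        }
        if (ca == '\0') {
            return 0;
        }
    }
}

/* Compares a buffer of known length with a NUL-terminated string. */
int sketch_strlcmp(const char *a, const char *b, size_t alen)
{
    for (size_t i = 0;; i++) {
        unsigned char ca = i < alen ? (unsigned char)a[i] : '\0';
        unsigned char cb = (unsigned char)b[i];  /* b stops at its own NUL */

        if (ca != cb) {
            return (int)ca - (int)cb;
        }
        if (ca == '\0') {
            return 0;
        }
    }
}

/* Array form: both lengths come from the array types. */
#define SKETCH_STRBUFCMP_ARRAYS(a, b) \
    sketch_strbufcmp((a), sizeof(a), (b), sizeof(b))
```

Because neither function reads more than `alen` (or `blen`) bytes of a buffer,
a completely full buffer with no NUL, such as `char x[4] = "abcd";`, is still
compared safely.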

If the destination array of `strbufcpy` or `strbufcat` has a size of 0, they
return NULL without doing anything else. Otherwise, the destination is always
NUL-terminated and returned as a NUL-terminated string pointer.
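
A plain C sketch of these copy semantics follows. `sketch_strbuflen` and
`sketch_strbufcpy` are hypothetical user-space stand-ins for `strbuflen` and
`strbufcpy`; they assume the source string ends at its first NUL or at the end
of its buffer, and that the two buffers do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the string in buf, never looking past len bytes. */
size_t sketch_strbuflen(const char *buf, size_t len)
{
    const char *nul = memchr(buf, '\0', len);

    return nul != NULL ? (size_t)(nul - buf) : len;
}

/*
 * Copies the string in src (a buffer of srclen bytes) into dst (a buffer of
 * dstlen bytes), truncating if needed. Returns dst, now NUL-terminated, or
 * NULL without touching anything if dstlen is 0.
 */
const char *sketch_strbufcpy(char *dst, size_t dstlen,
    const char *src, size_t srclen)
{
    size_t n;

    if (dstlen == 0) {
        return NULL;
    }
    n = sketch_strbuflen(src, srclen);
    if (n > dstlen - 1) {
        n = dstlen - 1;             /* keep room for the NUL terminator */
    }
    memcpy(dst, src, n);
    dst[n] = '\0';
    return dst;
}
```

Unlike `strlcpy`, the sketch never scans beyond `srclen` bytes of a source that
may not be terminated, and the pointer it returns can be handed directly to
code that expects a NUL-terminated string.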

With -fbounds-safety enabled, the final operation modifying the character array
should always return a NUL-terminated version of it. For instance, this plain C
code:

```c
char thread_name[MAXTHREADNAMESIZE];
(void) snprintf(thread_name, sizeof(thread_name),
        "dlil_input_%s", ifp->if_xname);
thread_set_thread_name(inp->dlth_thread, thread_name);
```

becomes:

```c
char thread_name_buf[MAXTHREADNAMESIZE];
const char *__null_terminated thread_name;
thread_name = tsnprintf(thread_name_buf, sizeof(thread_name_buf),
        "dlil_input_%s", ifp->if_xname);
thread_set_thread_name(inp->dlth_thread, thread_name);
```
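
Outside the kernel, the shape of `tsnprintf` can be approximated with
`vsnprintf`. This is a hypothetical sketch: `sketch_tsnprintf` is not an xnu
function, and because the zero-size behavior of the real `tsnprintf` is not
described here, the sketch simply requires a non-zero buffer size.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Formats into buf like snprintf, then returns buf itself so that callers
 * receive the result as a NUL-terminated string rather than as a buffer.
 */
const char *sketch_tsnprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;

    assert(size > 0);                       /* need room for at least the NUL */
    va_start(ap, fmt);
    if (vsnprintf(buf, size, fmt, ap) < 0) {  /* truncates and NUL-terminates */
        buf[0] = '\0';                      /* encoding error: fall back to "" */
    }
    va_end(ap);
    return buf;
}
```

The point of the design is that callers obtain the string pointer only through
the formatting call, so the buffer is terminated before anyone reads it as a
string; for example, `strlen(sketch_tsnprintf(name, sizeof(name), "%s", "en0"))`
is always bounded by `sizeof(name) - 1`.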

Although `tsnprintf` and `strbuf` functions return a `__null_terminated`
pointer to you for convenience, not all use cases are resolved by calling
`tsnprintf` or `strbufcpy` once. As a quick reference, with -fbounds-safety
enabled, you can use `__unsafe_null_terminated_from_indexable(p_start, p_nul)`
to convert a character array to a `__null_terminated` string if you need to
perform more manipulations. (`p_start` is a pointer to the first character, and
`p_nul` is a pointer to the NUL character in that string.) For instance, if you
build a string with successive calls to `scnprintf`, you would use
`__unsafe_null_terminated_from_indexable` at the end of the sequence to get your
NUL-terminated string pointer.
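
The successive-append pattern can be sketched in plain C. `sketch_scnprintf` is
a hypothetical stand-in that, like xnu's `scnprintf`, is assumed to return the
number of characters actually stored (never counting the NUL); `build_list` is
also hypothetical. Plain C has no `__null_terminated` type, so the sketch
returns the pointer that would serve as `p_nul`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* scnprintf-style helper: returns the number of characters actually stored. */
size_t sketch_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int ret;

    if (size == 0) {
        return 0;
    }
    va_start(ap, fmt);
    ret = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    if (ret < 0) {
        buf[0] = '\0';
        return 0;
    }
    return (size_t)ret < size ? (size_t)ret : size - 1;
}

/*
 * Appends the values as "1,22,333" into buf. Since off never exceeds
 * size - 1, buf + off always points at the terminating NUL.
 */
char *build_list(char *buf, size_t size, const int *vals, size_t count)
{
    size_t off = 0;

    assert(size > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        off += sketch_scnprintf(buf + off, size - off, "%s%d",
            i != 0 ? "," : "", vals[i]);
    }
    /* With -fbounds-safety, this is where one would call
     * __unsafe_null_terminated_from_indexable(buf, buf + off). */
    return buf + off;
}
```

Because each call reports only what it stored, the running offset stays valid
even after truncation, which is what makes `buf + off` a correct `p_nul`.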

# I have a choice between `strn*`, `strl*`, `strbuf*`. Which one do I use?

You might come across cases where functions from different families all seem
to do the trick. For instance:

```c
struct foo {
    char buf1[10];
    char buf2[16];
};

void bar(struct foo *f) {
    /* how do I test whether buf1 and buf2 contain the same string? */
    if (strcmp(f->buf1, f->buf2) == 0) { /* ... */ }
    if (strncmp(f->buf1, f->buf2, sizeof(f->buf1)) == 0) { /* ... */ }
    if (strlcmp(f->buf1, f->buf2, sizeof(f->buf1)) == 0) { /* ... */ }
    if (strbufcmp(f->buf1, f->buf2) == 0) { /* ... */ }
}
```

Without -fbounds-safety, these all compile and appear to work the same, but when
you enable it, `strbufcmp` could be the only one that builds. If you do not have
the privilege of -fbounds-safety to guide you to the best choice, as a rule of
thumb, you should prefer APIs in the following order:

1. `strbuf*` APIs;
2. `strl*` APIs;
3. `str*` APIs.

That is, to implement `bar`, you have a choice of `strcmp`, `strncmp`, `strlcmp`
and `strbufcmp`, and you should prefer `strbufcmp`.

`strn` functions are **never** recommended. You should use `strbuflen` over
`strnlen` (they do the same thing, but having a separate `strbuflen` function
makes the guidance to avoid `strn` functions easier to follow), and you should
use `strbufcmp`, `strlcmp` or even `strcmp` over `strncmp` (depending on whether
you know the length of each string, of just one, or of neither).
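
To see why `strncmp` is a poor fit for `bar`, consider a `buf1` that is
completely full. The sketch below reuses `struct foo` from the example; the
helpers `strncmp_says_equal` and `same_string` are hypothetical, and
`same_string` only approximates, with standard C calls, the kind of bounded
comparison that `strbufcmp` provides.

```c
#include <assert.h>
#include <string.h>

struct foo {
    char buf1[10];
    char buf2[16];
};

/* The strncmp variant only ever looks at the first sizeof(buf1) bytes. */
int strncmp_says_equal(const struct foo *f)
{
    return strncmp(f->buf1, f->buf2, sizeof(f->buf1)) == 0;
}

/* Length of the string in buf, never looking past len bytes. */
static size_t bounded_len(const char *buf, size_t len)
{
    const char *nul = memchr(buf, '\0', len);

    return nul != NULL ? (size_t)(nul - buf) : len;
}

/* Bounded comparison in the spirit of strbufcmp: lengths and bytes must match. */
int same_string(const struct foo *f)
{
    size_t len1 = bounded_len(f->buf1, sizeof(f->buf1));
    size_t len2 = bounded_len(f->buf2, sizeof(f->buf2));

    return len1 == len2 && memcmp(f->buf1, f->buf2, len1) == 0;
}
```

With `buf1` holding `0123456789` (no NUL) and `buf2` holding `0123456789abc`,
`strncmp` reports a match although the strings differ, while `strcmp` on the
same `buf1` would read past its end looking for a NUL.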
171*2c2f96dcSApple OSS Distributionsyou know the length of each string, of just one, or of neither).
172