# String handling in xnu

xnu implements most POSIX C string functions, including the inherited subset of
standard C string functions. Unfortunately, poor design choices have made many
of these functions, including the more modern `strl` functions, confusing or
unsafe. In addition, the advent of -fbounds-safety support in xnu is forcing
some string handling practices to be revisited. This document explains the
failings of POSIX C string functions, introduces xnu's `strbuf` functions, and
describes how both interact with the -fbounds-safety C extension.

## The short-form guidance

* Use `strbuf*` when you have the length for all the strings;
* use `strl*` when you have the length of _one_ string, and the other is
  guaranteed to be NUL-terminated;
* use `str*` when you don't have the length for any of the strings, and they
  are all guaranteed to be NUL-terminated;
* stop using `strn*` functions.

# The problems with string functions

POSIX string handling functions come in many variants:

* `str` functions (strlen, strcat, etc), unsafe for writing;
* `strn` functions (strnlen, strncat, etc), unsafe for writing;
* `strl` functions (strlcpy, strlcat, etc), safe but easily misunderstood.

`str` functions for writing (`strcpy`, `strcat`, etc) are **all** unsafe
because they don't care about the bounds of the output buffer. Most or all of
these functions have been deprecated or outright removed from xnu. You should
never use `str` functions to write to strings. Functions that simply read
strings (`strlen`, `strcmp`, `strchr`, etc) are generally considered safe: it
is clear that their input must be NUL-terminated, and there is no danger of
writing out of bounds since they do not write at all.

`strn` functions for writing (`strncpy`, `strncat`, etc) are **all** unsafe.
`strncpy` doesn't NUL-terminate the output buffer when the input fills it, and
`strncat` doesn't accept a length for the output buffer. **All** new string
buffers should include space for a NUL terminator.
`strn` functions for reading (`strncmp`, `strnlen`) are _generally_ safe, but
`strncmp` can cause confusion over which string is bounded by the given size.
In extreme cases, this can create information disclosure bugs or stability
issues.

`strl` functions, in POSIX, only come in writing variants, and they always
NUL-terminate their output. This makes the writing part safe. (xnu adds `strl`
comparison functions, which do no writing and are also safe.) However, these
functions assume the output pointer is a buffer and the input is a
NUL-terminated string. Because they coexist with `strn` functions that make no
such assumption, many users have not fully adopted this mental model. For
instance, the following code is buggy:

```c
char output[4];
char input[8] = "abcdefgh"; /* not NUL-terminated */
strlcpy(output, input, sizeof(output));
```

`strlcpy` returns the length of the input string; in xnu's implementation,
literally by calling `strlen(input)`. Even though only 3 characters are written
to `output` (plus a NUL), `input` is read until reaching a NUL character.
This is always a problem from the perspective of memory disclosures, and in
some cases, it can also lead to stability issues.

# Changes with -fbounds-safety

When enabling -fbounds-safety, character buffers and NUL-terminated strings are
two distinct types, and they do not implicitly convert to each other. This
prevents confusing the two in the way that is problematic with `strlcpy`, for
instance. However, it creates new problems:

* What is the correct way to transform a character buffer into a NUL-terminated
  string?
* When -fbounds-safety flags that the use of a string function was improper,
  what is the solution?

The most common use of character buffers is to build a string, and then this
string is passed without bounds as a NUL-terminated string to downstream users.
-fbounds-safety and xnu enshrine this practice with the following additions:

* `tsnprintf`: like `snprintf`, but it returns a NUL-terminated string;
* `strbuf` functions, explicitly accepting character buffers and a distinct
  count for each:
  * `strbuflen(buffer, length)`: like `strnlen`;
  * `strbufcmp(a, alen, b, blen)`: like `strcmp`;
  * `strbufcasecmp(a, alen, b, blen)`: like `strcasecmp`;
  * `strbufcpy(a, alen, b, blen)`: like `strlcpy` but returns `a` as a
    NUL-terminated string;
  * `strbufcat(a, alen, b, blen)`: like `strlcat` but returns `a` as a
    NUL-terminated string;
* `strl` (new) functions, accepting _one_ character buffer of a known size and
  _one_ NUL-terminated string:
  * `strlcmp(a, b, alen)`: like `strcmp`;
  * `strlcasecmp(a, b, alen)`: like `strcasecmp`.

`strbuf` functions additionally all have overloads accepting character arrays
in lieu of a pointer+length pair: `strbuflen(array)`, `strbufcmp(a, b)`,
`strbufcasecmp(a, b)`, `strbufcpy(a, b)`, `strbufcat(a, b)`.

If the destination array of `strbufcpy` or `strbufcat` has a size of 0, they
return NULL without doing anything else.
Otherwise, the destination is always NUL-terminated and returned as a
NUL-terminated string pointer.

With -fbounds-safety enabled, the final operation modifying the character array
should always return a NUL-terminated version of it. For instance, this plain C
code:

```c
char thread_name[MAXTHREADNAMESIZE];
(void) snprintf(thread_name, sizeof(thread_name),
    "dlil_input_%s", ifp->if_xname);
thread_set_thread_name(inp->dlth_thread, thread_name);
```

becomes:

```c
char thread_name_buf[MAXTHREADNAMESIZE];
const char *__null_terminated thread_name;
thread_name = tsnprintf(thread_name_buf, sizeof(thread_name_buf),
    "dlil_input_%s", ifp->if_xname);
thread_set_thread_name(inp->dlth_thread, thread_name);
```

Although `tsnprintf` and `strbuf` functions return a `__null_terminated`
pointer to you for convenience, not all use cases are resolved by calling
`tsnprintf` or `strbufcpy` once.
As a quick reference, with -fbounds-safety enabled, you can use
`__unsafe_null_terminated_from_indexable(p_start, p_nul)` to convert a
character array to a `__null_terminated` string if you need to perform more
manipulations. (`p_start` is a pointer to the first character, and `p_nul` is a
pointer to the NUL character in that string.) For instance, if you build a
string with successive calls to `scnprintf`, you would use
`__unsafe_null_terminated_from_indexable` at the end of the sequence to get your
NUL-terminated string pointer.

# I have a choice between `strn*`, `strl*`, `strbuf*`. Which one do I use?

You might come across cases where equivalent functions from different families
all seem to do the trick. For instance:

```c
struct foo {
    char buf1[10];
    char buf2[16];
};

void bar(struct foo *f) {
    /* how do I test whether buf1 and buf2 contain the same string? */
    if (strcmp(f->buf1, f->buf2) == 0) { /* ... */ }
    if (strncmp(f->buf1, f->buf2, sizeof(f->buf1)) == 0) { /* ... */ }
    if (strlcmp(f->buf1, f->buf2, sizeof(f->buf1)) == 0) { /* ... */ }
    if (strbufcmp(f->buf1, f->buf2) == 0) { /* ... */ }
}
```

Without -fbounds-safety, these all work the same, but when you enable it,
`strbufcmp` could be the only one that builds. If you do not have the privilege
of -fbounds-safety to guide you to the best choice, as a rule of thumb, you
should prefer APIs in the following order:

1. `strbuf*` APIs;
2. `strl*` APIs;
3. `str*` APIs.

That is, to implement `bar`, you have a choice of `strcmp`, `strncmp`,
`strlcmp` and `strbufcmp`, and you should prefer `strbufcmp`.

`strn` functions are **never** recommended. You should use `strbuflen` over
`strnlen` (they do the same thing, but having a separate `strbuflen` function
makes the guidance to avoid `strn` functions easier), and you should use
`strbufcmp`, `strlcmp` or even `strcmp` over `strncmp` (depending on whether
you know the length of each string, of just one, or of neither).
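
To make the preference for a two-bound comparison concrete, here is a minimal user-space sketch. It is not xnu's implementation: `sketch_buflen`, `sketch_bufcmp` and `fill_example` are hypothetical stand-ins, written on the assumption that the string held in a buffer ends at the first NUL or at the end of the buffer, whichever comes first. It shows a case where `buf1` is completely full and has no NUL terminator, so that `strncmp` bounded by `sizeof(buf1)` reports a match even though `buf2` holds a longer string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct foo {
    char buf1[10];
    char buf2[16];
};

/* Hypothetical stand-in: length of the string held in buf, stopping at the
 * first NUL or at the end of the buffer, whichever comes first. */
static size_t sketch_buflen(const char *buf, size_t size)
{
    const char *nul = memchr(buf, '\0', size);
    return nul ? (size_t)(nul - buf) : size;
}

/* Hypothetical stand-in for a two-bound comparison: each operand is only
 * read within its own bound. Returns <0, 0 or >0, like strcmp. */
static int sketch_bufcmp(const char *a, size_t asize,
    const char *b, size_t bsize)
{
    size_t alen = sketch_buflen(a, asize);
    size_t blen = sketch_buflen(b, bsize);
    size_t common = alen < blen ? alen : blen;
    int diff = memcmp(a, b, common);
    if (diff != 0) {
        return diff;
    }
    /* Equal prefix: the shorter string sorts first. */
    return (alen > blen) - (alen < blen);
}

/* buf1 is filled completely (no NUL); buf2 holds a longer string that
 * starts with the same 10 characters. */
static void fill_example(struct foo *f)
{
    memcpy(f->buf1, "0123456789", sizeof(f->buf1));
    memset(f->buf2, 0, sizeof(f->buf2));
    memcpy(f->buf2, "0123456789abc", 13);
}

static void demo(void)
{
    struct foo f;
    fill_example(&f);
    /* strncmp stops after sizeof(buf1) bytes and reports a match. */
    assert(strncmp(f.buf1, f.buf2, sizeof(f.buf1)) == 0);
    /* The two-bound comparison sees that buf2 holds a longer string. */
    assert(sketch_bufcmp(f.buf1, sizeof(f.buf1),
        f.buf2, sizeof(f.buf2)) < 0);
}
```

Calling `strcmp` on the same input would read past the end of `buf1`, which is why it is left out of the sketch. Passing one size per buffer removes the question of which operand the bound applies to.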