# String handling in xnu

xnu implements most POSIX C string functions, including the inherited subset of
standard C string functions. Unfortunately, poor design choices have made many
of these functions, including the more modern `strl` functions, confusing or
unsafe. In addition, the advent of -fbounds-safety support in xnu is forcing
some string handling practices to be revisited. This document explains the
failings of POSIX C string functions, xnu's `strbuf` functions, and their
intersection with the -fbounds-safety C extension.

## The short-form guidance

* Use `strbuf*` when you have the length for all the strings;
* use `strl*` when you have the length of _one_ string, and the other is
  guaranteed to be NUL-terminated;
* use `str*` when you don't have the length for any of the strings, and they
  are all guaranteed to be NUL-terminated;
* stop using `strn*` functions.

## Replacing `strncmp`

`strncmp` is always wrong with -fbounds-safety, and it's unavailable as a
result. Given `strncmp(first, secnd, n)`, you need to know the types of `first`
and `secnd` to pick a replacement.
Choose according to this table, where `n1` and `n2` stand for the known lengths
of `first` and `secnd` respectively:

| strncmp(first, secnd, n) | __null_terminated first   | __indexable first               |
| ------------------------ | ------------------------- | ------------------------------- |
| __null_terminated secnd  | n/a                       | strlcmp(first, secnd, n1)       |
| __indexable secnd        | strlcmp(secnd, first, n2) | strbufcmp(first, n1, secnd, n2) |

Using `strncmp` with two NUL-terminated strings is uncommon and it has no
direct replacement. The first person who needs to use -fbounds-safety in a file
that does this might need to write the string function.

If you try to use `strlcmp` and you get a diagnostic like this:

> passing 'const char *__indexable' to parameter of incompatible type
> 'const char *__null_terminated' is an unsafe operation ...

then you might need to swap the two string arguments. `strlcmp` is sensitive to
the argument order: just like for `strlcpy`, the indexable string goes first.
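
To make the `strlcmp` cells of the table concrete, here is a minimal userland
sketch of the comparison they describe: one side is a buffer of known size that
may lack a NUL terminator, the other is a NUL-terminated string.
`sketch_strlcmp` is a hypothetical stand-in written for this illustration, not
xnu's implementation, and it assumes the buffer's string ends at its first NUL
or at the given size, whichever comes first.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: compare the string held in buf (bounded by buflen,
 * possibly not NUL-terminated) with the NUL-terminated string str.
 * Returns <0, 0 or >0, like strcmp.
 */
static int
sketch_strlcmp(const char *buf, const char *str, size_t buflen)
{
    size_t i;

    assert(buf != NULL && str != NULL);
    for (i = 0; i < buflen && buf[i] != '\0'; i++) {
        unsigned char a = (unsigned char)buf[i];
        unsigned char b = (unsigned char)str[i];

        /* Also covers str ending first, since then b == 0 and a != 0. */
        if (a != b) {
            return a < b ? -1 : 1;
        }
    }
    /* buf ended (by size or by NUL): equal only if str ends here too. */
    return str[i] == '\0' ? 0 : -1;
}
```

With a 4-byte buffer holding `abcd` and no terminator,
`strncmp(buf, "abcde", 4)` returns 0 because the size bounds both sides, while
the sketch reports a mismatch because only the buffer is bounded by the size.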

# The problems with string functions

POSIX/BSD string handling functions come in many variants:

* `str` functions (strlen, strcat, etc), unsafe for writing;
* `strn` functions (strnlen, strncat, etc), unsafe for writing;
* `strl` functions (strlcpy, strlcat, etc), safe but easily misunderstood.

`str` functions for writing (`strcpy`, `strcat`, etc) are **all** unsafe
because they don't care about the bounds of the output buffer. Most or all of
these functions have been deprecated or outright removed from xnu. You should
never use `str` functions to write to strings. Functions that simply read
strings (`strlen`, `strcmp`, `strchr`, etc) are generally found to be safe
because there is no confusion that their input must be NUL-terminated and there
is no danger of writing out of bounds (on account of not writing at all).

`strn` functions for writing (`strncpy`, `strncat`, etc) are **all** unsafe.
`strncpy` doesn't NUL-terminate the output buffer when the input is at least as
long as the given size, and `strncat` doesn't accept a length for the output
buffer. **All** new string buffers should include space for a NUL terminator.
`strn` functions for reading (`strncmp`, `strnlen`) are
_generally_ safe, but `strncmp` can cause confusion over which string is bounded
by the given size. In extreme cases, this can create information disclosure
bugs or stability issues.

`strl` functions, from OpenBSD, only come in writing variants, and they always
NUL-terminate their output. This makes the writing part safe. (xnu adds `strl`
comparison functions, which do no writing and are also safe.) However, these
functions assume the output pointer is a buffer and the input is a
NUL-terminated string. Because of coexistence with `strn` functions that make no
such assumption, this mental model isn't entirely adopted by many users. For
instance, the following code is buggy:

```c
char output[4];
char input[8] = "abcdefgh"; /* not NUL-terminated */
strlcpy(output, input, sizeof(output));
```

`strlcpy` returns the length of the input string; in xnu's implementation,
literally by calling `strlen(input)`. Even though only 3 characters are written
to `output` (plus a NUL), `input` is read until reaching a NUL character.
This
is always a problem from the perspective of memory disclosures, and in some
cases, it can also lead to stability issues.

`strlcpy_ret` is a convenience wrapper around `strlcpy`, which returns
a `__null_terminated` pointer to the output string instead of the length of the
input string. Like `strlcpy`, `strlcpy_ret` will search for the NUL character
in the input string.

# Changes with -fbounds-safety

When enabling -fbounds-safety, character buffers and NUL-terminated strings are
two distinct types, and they do not implicitly convert to each other. This
prevents confusing the two in the way that is problematic with
`strlcpy`/`strlcpy_ret`, for instance. However, it creates new problems:

* What is the correct way to transform a character buffer into a NUL-terminated
  string?
* When -fbounds-safety flags that the use of a string function was improper,
  what is the solution?

The most common use of character buffers is to build a string, and then this
string is passed without bounds as a NUL-terminated string to downstream users.
-fbounds-safety and xnu enshrine this practice with the following additions:

* `tsnprintf`: like `snprintf`, but it returns a NUL-terminated string;
* `strbuf` functions, explicitly accepting character buffers and a distinct
  count for each:
  * `strbuflen(buffer, length)`: like `strnlen`;
  * `strbufcmp(a, alen, b, blen)`: like `strcmp`;
  * `strbufcasecmp(a, alen, b, blen)`: like `strcasecmp`;
  * `strbufcpy(a, alen, b, blen)`: like `strlcpy` but returns `a` as a
    NUL-terminated string;
  * `strlcpy_ret(dst, src, n)`: like `strlcpy`, but returns `dst` as a
    NUL-terminated string;
  * `strbufcat(a, alen, b, blen)`: like `strlcat` but returns `a` as a
    NUL-terminated string;
* `strl` (new) functions, accepting _one_ character buffer of a known size and
  _one_ NUL-terminated string:
  * `strlcmp(a, b, alen)`: like `strcmp`;
  * `strlcasecmp(a, b, alen)`: like `strcasecmp`.

`strbuf` functions additionally all have overloads accepting character arrays
in lieu of a pointer+length pair: `strbuflen(array)`, `strbufcmp(a, b)`,
`strbufcasecmp(a, b)`, `strbufcpy(a, b)`, `strbufcat(a, b)`.
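
As a rough illustration of what "a distinct count for each" buys you, the
following userland sketch mimics the behavior described for `strbuflen` and
`strbufcpy`. The `sketch_` names are hypothetical and this is not xnu's code.
It assumes each count is the full size of its buffer, that the source string
ends at its first NUL or at its count, and that a zero-size destination yields
NULL, as this document describes for `strbufcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the string held in buf, never reading more than buflen bytes. */
static size_t
sketch_strbuflen(const char *buf, size_t buflen)
{
    const char *nul = memchr(buf, '\0', buflen);

    return nul != NULL ? (size_t)(nul - buf) : buflen;
}

/*
 * Copy the string held in src (bounded by srclen) into dst (bounded by
 * dstlen), truncating if needed and always NUL-terminating the result.
 * Returns dst, or NULL when dst has no room at all.
 */
static const char *
sketch_strbufcpy(char *dst, size_t dstlen, const char *src, size_t srclen)
{
    size_t n;

    assert(dst != NULL && src != NULL);
    if (dstlen == 0) {
        return NULL;
    }
    n = sketch_strbuflen(src, srclen);
    if (n > dstlen - 1) {
        n = dstlen - 1;         /* leave room for the terminator */
    }
    memmove(dst, src, n);
    dst[n] = '\0';
    return dst;
}
```

Run against the buggy `strlcpy` example (a 4-byte output and an unterminated
8-byte input), this sketch writes `abc` plus a NUL and never reads past the
eighth byte of the input, because the input's size is passed explicitly.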

If the destination array of `strbufcpy` or `strbufcat` has a size of 0, they
return NULL without doing anything else. Otherwise, the destination is always
NUL-terminated and returned as a NUL-terminated string pointer.

While you are modifying a string, you should reference its data as some flavor
of indexable pointer, and only once you're done should you convert it to a
NUL-terminated string. NUL-terminated character pointers are generally not
suitable for modifications as bounds are determined by contents. Overwriting
any NUL character found through a `__null_terminated` pointer access will result
in a trap. For instance:

```c
void my_string_consuming_func(const char *);

// lots of __unsafe!
char *__null_terminated my_string = __unsafe_forge_null_terminated(
    kalloc_data(my_string_size, Z_WAITOK));
memcpy(
    __unsafe_forge_bidi_indexable(void *, my_string, my_string_size),
    my_data,
    my_string_size);
my_string_consuming_func(my_string);
```

This code converts the string pointer to a NUL-terminated string too early,
while it's still being modified. Keeping `my_string` a `__null_terminated`
pointer while it's being modified leads to more forging, which has more chances
of introducing errors, and is less ergonomic. Consider this instead:

```c
void my_string_consuming_func(const char *);

char *my_buffer = kalloc_data(my_string_size, Z_WAITOK);
const char *__null_terminated finished_string =
    strbufcpy(my_buffer, my_string_size, my_data, my_string_size);
my_string_consuming_func(finished_string);
```

This example has two views of the same data: `my_buffer` (through which the
string is being modified) and `finished_string` (which is `const` and
NUL-terminated).
Using `my_buffer` as an indexable pointer allows you to modify
it ergonomically, and importantly, without forging. You turn it into a
NUL-terminated string at the same time you turn it into a `const` reference,
signalling that you're done making changes.

With -fbounds-safety enabled, you should structure the final operation modifying
a character array such that you get a NUL-terminated view of it. For instance,
this plain C code:

```c
char thread_name[MAXTHREADNAMESIZE];
(void) snprintf(thread_name, sizeof(thread_name),
    "dlil_input_%s", ifp->if_xname);
thread_set_thread_name(inp->dlth_thread, thread_name);
```

becomes:

```c
char thread_name_buf[MAXTHREADNAMESIZE];
const char *__null_terminated thread_name;
thread_name = tsnprintf(thread_name_buf, sizeof(thread_name_buf),
    "dlil_input_%s", ifp->if_xname);
thread_set_thread_name(inp->dlth_thread, thread_name);
```

Although `tsnprintf` and `strbuf` functions return a `__null_terminated`
pointer to you for convenience, not all use cases are resolved by calling
`tsnprintf` or `strbufcpy` once. As a quick reference, with -fbounds-safety
enabled, you can use `__unsafe_null_terminated_from_indexable(p_start, p_nul)`
to convert a character array to a `__null_terminated` string if you need to
perform more manipulations. (`p_start` is a pointer to the first character, and
`p_nul` is a pointer to the NUL character in that string.) For instance, if you
build a string with successive calls to `scnprintf`, you would use
`__unsafe_null_terminated_from_indexable` at the end of the sequence to get your
NUL-terminated string pointer.

Occasionally, you need to turn a NUL-terminated string back into a "char buffer"
(usually to interoperate with copy APIs that need a pointer and a byte count).
When possible, it's advised to use APIs that copy NUL-terminated pointers (like
`strlcpy`). Otherwise, convert the NUL-terminated string to an indexable buffer
using `__null_terminated_to_indexable` (if you don't need the NUL terminator to
be in bounds of the result pointer) or `__unsafe_null_terminated_to_indexable`
(if you need it).
Also keep in mind that in code which pervasively deals with
buffers that have lengths, some of which happen to also be NUL-terminated
strings, it could simply be more convenient to keep string buffers as some
flavor of indexable pointer instead of converting to and from
NUL-terminated strings.

# I have a choice between `strn*`, `strl*`, `strbuf*`. Which one do I use?

You might come across cases where the same function from different families
all seem to do the trick. For instance:

```c
struct foo {
    char buf1[10];
    char buf2[16];
};

void bar(struct foo *f) {
    /* how do I test whether buf1 and buf2 contain the same string? */
    if (strcmp(f->buf1, f->buf2) == 0) { /* ... */ }
    if (strncmp(f->buf1, f->buf2, sizeof(f->buf1)) == 0) { /* ... */ }
    if (strlcmp(f->buf1, f->buf2, sizeof(f->buf1)) == 0) { /* ... */ }
    if (strbufcmp(f->buf1, f->buf2) == 0) { /* ... */ }
}
```

Without -fbounds-safety, these all work the same, but when you enable it,
`strbufcmp` could be the only one that builds.
If you do not have the privilege
of -fbounds-safety to guide you to the best choice, as a rule of thumb, you
should prefer APIs in the following order:

1. `strbuf*` APIs;
2. `strl*` APIs;
3. `str*` APIs.

That is, to implement `bar`, you have a choice of `strcmp`, `strncmp`,
`strlcmp` and `strbufcmp`, and you should prefer `strbufcmp`.

`strn` functions are **never** recommended. You should use `strbuflen` over
`strnlen` (they do the same thing, but having a separate `strbuflen` function
makes the guidance to avoid `strn` functions easier), and you should use
`strbufcmp`, `strlcmp` or even `strcmp` over `strncmp` (depending on whether
you know the length of each string, of just one, or of neither).
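
To see why `strbufcmp` is the preferred choice for `bar`, consider a sketch of
a comparison that bounds both sides. `sketch_strbufcmp`, `sketch_bounded_len`
and `sketch_same_string` are hypothetical names for this illustration, not
xnu's implementation; the sketch assumes each buffer's string ends at its first
NUL or at the buffer size, whichever comes first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct foo {
    char buf1[10];
    char buf2[16];
};

/* Length of the string held in buf, never reading more than buflen bytes. */
static size_t
sketch_bounded_len(const char *buf, size_t buflen)
{
    const char *nul = memchr(buf, '\0', buflen);

    return nul != NULL ? (size_t)(nul - buf) : buflen;
}

/* Compare two bounded strings; returns -1, 0 or 1. */
static int
sketch_strbufcmp(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t an, bn, n;
    int r;

    assert(a != NULL && b != NULL);
    an = sketch_bounded_len(a, alen);
    bn = sketch_bounded_len(b, blen);
    n = an < bn ? an : bn;
    r = memcmp(a, b, n);        /* compare the common prefix */
    if (r != 0) {
        return r < 0 ? -1 : 1;
    }
    return (an > bn) - (an < bn);   /* shorter string sorts first */
}

/* Returns 1 when buf1 and buf2 hold the same string, 0 otherwise. */
static int
sketch_same_string(const struct foo *f)
{
    return sketch_strbufcmp(f->buf1, sizeof(f->buf1),
        f->buf2, sizeof(f->buf2)) == 0;
}
```

If `buf1` is completely filled with `0123456789` (no terminator) and `buf2`
holds `0123456789A`, `strncmp(f->buf1, f->buf2, sizeof(f->buf1))` returns 0
because it only looks at the first ten bytes, whereas the sketch reports the
strings as different. `strcmp` would read past the end of `buf1` in the same
situation.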