# XNU debugging

xnu’s debugging macros are compatible with both Python 2 and 3. In practice, this means that Python 3
features are unavailable and some Python 2 syntax is not allowed. Unfortunately, any syntax error will
prevent use of all the macros, as they’re all imported into the same scripting environment.

## Compatibility

Avoid introducing your own compatibility shims; there are existing libraries that bridge
Python 2 and 3:

* **six** has helpers that work in both Python 2 and 3, for things like the string type change
* **future** backports features from Python 3 to Python 2

For example, Python 2 contains **range** and **xrange**, while Python 3 contains only **range**, which has
**xrange** semantics. The simplest solution is to port your code and use the Python 3 way:

```
# Use backported range from Python 3
from builtins import range

# Use range on both Python 2/3 runtimes
for x in range(....):
    ....
```

Be very careful about using imports from the 'future' library. Some of them are **very invasive** and change
the behavior of your code, which may cause strange runtime errors. For example:

```
# Changes module handling logic to make your code work with the std library reorg (PEP 3108)
from future import standard_library
standard_library.install_aliases()

# Replaces a lot of common types, like str, with future's Python 3 backports.
from builtins import *
```

## Handling strings

Macros use strings produced by the LLDB runtime. They must use **six** when doing certain operations
to avoid exceptions.
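As an illustration of what that normalization does, here is a simplified Python 3 re-implementation of `six.ensure_str`'s behavior (a sketch for illustration only, not the library's actual code; the function name is made up):

```python
def ensure_native_str(s, encoding='utf-8', errors='strict'):
    """Coerce bytes or str to the native str type (Python 3 semantics of six.ensure_str)."""
    if isinstance(s, bytes):
        # Raw data coming back from the target must be decoded explicitly.
        return s.decode(encoding, errors)
    if isinstance(s, str):
        return s
    raise TypeError("not expecting type '%s'" % type(s))
```

With this in hand, both `ensure_native_str(b'thread_t')` and `ensure_native_str('thread_t')` yield the same native string, which is why the macros funnel LLDB-produced values through such a helper.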
Until the transition is done, these canonical ways of dealing with strings cannot be used:

* Using Unicode literals by default:
  `from __future__ import unicode_literals`
* **f-strings**

Some advice:

* Use byte strings explicitly when dealing with memory rather than text:
  `b'string'`
* Always properly encode/decode raw data to/from strings before passing it around, with `six.ensure_str` or
  `six.ensure_binary`.

Improperly-typed strings will raise *different* exceptions on each runtime:

* Python 2 raises codec exceptions when printing strings.
* Python 3 complains about concatenation of objects of incompatible types (bytes and strings).

### No convenient, common string type

While it is possible to use future’s **newstr** to backport the new string type to Python 2, there are
issues with the Scripting Bridge (SB) API from LLDB. Python 3 will work out of the box, but Python 2
will complain because **newstr** maps to **unicode**. SB exposes **const char \*** as a native string,
or just **str** in Python 2. For Python 2 we would have to explicitly encode all Unicode strings
before calling the API.

Another problem is that literals in the form `'string'` are no longer compatible with unicode and need
to be switched to `u'string'`. This can be changed with a single import at the top of the file, but
in some scenarios byte strings are expected. That change would require checking all strings in the
code and changing some back to `b'string'`.

Here’s an example of just how pervasive such a change would be. With Unicode literals in effect, this
code would break in Python 2:

```
from xnu import *

@lldb_type_summary(['type'])
def print_summary():
    ....
```

Changing the literal to `b'type'` fixes Python 2 but leaves a non-unicode literal registered with a
unicode API in Python 3. Unfortunately `'type' != b'type'`, and thus LLDB will never match the type
when printing summaries.
Using native strings and literals allows for only minimal code changes to the macros, which then remain
compatible with other projects using Python 2.

### Check that an object is a string

Avoid testing for `str` explicitly, like `type(obj) == str`. This won’t work correctly, as Python 2
has multiple string types (`unicode`, `str`). Additionally, compatibility shims might introduce new
string types.

Instead, always use an inheritance-sensitive check like `isinstance(obj, six.string_types)`.

### Dealing with binary data

Python 2 bytes and strings are the same thing. This was the wrong design decision, and Python 3
(wisely) switched to using a separate type for human text. This lack of distinction in Python 2
caused many programming errors, so it’s recommended to use **bytearray**, **bytes**, and
**memoryview** instead of a string. If a string is really required, encode the raw data explicitly
using an escape method.

### Encoding data to strings and back

The simplest solution is to use the **six** library and one of its functions, like:

```
mystring = six.ensure_str(object)
```

This ensures the resulting value is a native string. It deals with Unicode in Python 2 automatically.
The six library is still required even if data is decoded manually, since it converts types:

```
from builtins import bytes

# Decode the raw data first, then normalize to the native string type.
mystring = six.ensure_str(raw_data.decode('utf-8'))
```

When converting data to a string, add an encoding type so Python knows how to handle raw bytes. In most
cases **utf-8** will work, but be careful to make sure that the encoding matches your data.
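As a minimal sketch of that round trip (the `raw` value here is a made-up example, not real target data):

```python
# Raw bytes as they might be read from target memory (hypothetical value,
# valid UTF-8 including a multi-byte character).
raw = b'com.apple.driver\xc2\xae'

# Decode with an explicit encoding...
text = raw.decode('utf-8')

# ...and encoding back with the same encoding yields the original bytes.
assert text.encode('utf-8') == raw
```

If the assertion fails for your data, the encoding does not match and a different codec (or an `errors` handler, described below) is needed.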
There are two options to consider when trying to get a string out of raw data without knowing whether
it is a valid string or not:

* **lossy conversion** - escapes all non-standard characters in the form `\xNN`
* **lossless conversion** - maps invalid characters to a special Unicode range so the string can be
reconstructed precisely

Which to use depends on the transformation goals. The lossy conversion produces a printable string
with strange characters in it. The lossless option is meant to be used when a string is only a transport
mechanism and needs to be converted back to the original values later.

Switch the method by using the `errors` handler during conversion:

```
# Lossy: escapes invalid chars
b.decode('utf-8', errors='backslashreplace')
# Lossy: removes invalid chars
b.decode('utf-8', errors='ignore')
# Lossless, but the result may fail to print()
b.decode('utf-8', errors='surrogateescape')
```

## Handling numbers

Numeric types are incompatible between Python 2 and 3:

* **long** is not available in Python 3.
* **int** is the only integral type in Python 3 and has unlimited precision (while Python 2’s **int**
is a fixed-width machine integer).

This creates all sorts of issues with macros. Follow these rules to make integral types compatible
in both modes:

* Do not use **long**; replace it with **int**.
* When using the **value** class, types will be promoted to **long**, as there is special number
handling in the xnu macro library. Remaining code should be reviewed and fixed, if appropriate.
* Avoid relying on sign extension.
* Always switch Python to Python 3 division, where `/` converts to floating point and does
fractional division, and `//` is floor division (like integer division in C):
  `from __future__ import division`
* Use division operators according to Python 3 rules.
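For instance, when a raw 64-bit value must be reinterpreted as signed, do the two's-complement conversion explicitly rather than relying on the runtime. A hypothetical helper (not part of the xnu macro library):

```python
def to_signed64(value):
    """Reinterpret the low 64 bits of an integer as a signed value."""
    value &= (1 << 64) - 1      # Keep only the low 64 bits.
    if value & (1 << 63):       # Sign bit set?
        value -= 1 << 64        # Two's-complement wrap-around.
    return value
```

Written this way, the conversion behaves identically under Python 2's fixed-width **int**, **long**, and Python 3's arbitrary-precision **int**.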
### Common integer representation

The goal is to always use Python 3’s integer handling, which means using **int** everywhere.

xnu’s macros provide a custom integer type called **valueint** that is a replacement for **int**
in the Python 2 runtime. That means it behaves almost like **int** from Python 3. When importing
from the macros, this type replaces any use of **int**:

```
# Replaces all int()s with valueint
from xnu import *
from xnu import int

# Does not replace int()s
import xnu
from xnu import a, b, c
```

Avoid using the `from builtins import int` suggested on the internet. It does not work correctly with
xnu’s **value** class. The **valueint** class inherits from **newint** and fixes the problematic behavior.

This impacts the way an object is checked for being an integer. Be careful about the following constructs:

```
# BAD: generally not a good way to do type checking in Python
if type(obj) is int:

# BAD: int may have been replaced with valueint.
if isinstance(obj, int):
```

Instead, use the base integral type:

```
if isinstance(obj, numbers.Integral):
```

### Dealing with signed numbers

The original code used two operators to convert a **value** class instance to a number:

* **__int__** produced **int** and was either signed or unsigned based on the underlying SBType.
* **__long__** was always signed.

This is confusing when dealing with types. Always use **unsigned()** or **signed()**, regardless of
the actual underlying type, to ensure that macros use the correct semantics.

### Dividing numbers

Python 2’s **/** operator has two behaviors depending on the types of its arguments (**float** vs. **int**).
Always use Python 3’s division operator:

```
# Switch the compiler to Python 3 semantics
from __future__ import division

float_val = a / b   # True fractional division that yields a float
floor_div = a // b  # Floor division, like C
```

If the original behavior is required, use **old_div** to get Python 2 semantics:

```
from past.utils import old_div

value = old_div(a, b)  # Matches Python 2 semantics
```

If this isn’t handled correctly, `format` will complain that a float value is being passed to
a non-float formatting character. Automated scripts that convert from Python 2 to 3 tend to use
**old_div** during porting. In most cases that is not required. For kernel debugging and integer
types, `//` is commonly used to match C’s division behavior for integers.

## Testing changes

There is no perfect test suite to check that macros are producing correct values compared to what
the debugger sees in a target.

Be careful when touching common framework code. For larger changes, ask the Platform Triage team to
validate that the changes work in their environment before integration.

### Coding style

Use a static analyzer like **pylint** or **flake8** to check the macro source code:

```
# Python 2
$ pip install --user pylint flake8

# Python 3
$ pip3 install --user pylint flake8

# Run the lint either by setting your path to point to one of the runtimes
# or through python
$ python2 -m pylint <src files/dirs>
$ python3 -m pylint <src files/dirs>
$ python2 -m flake8 <src files/dirs>
$ python3 -m flake8 <src files/dirs>
```

### Correctness

Ensure the macro matches what LLDB returns from the REPL. For example, compare the output of `showproc(xxx)` with `p/x *(proc_t)xxx`.

```
# 1. Run LLDB with debug options set
$ DEBUG_XNU_LLDBMACROS=1 LLDB_DEFAULT_PYTHON_VERSION=2 xcrun -sdk <sdk> lldb -c core <dsympath>/mach_kernel

# 2. Optionally load a modified operating system plugin
(lldb) settings set target.process.python-os-plugin-path <srcpath>/tools/lldbmacros/core/operating_system.py

# 3. Load the modified scripts
(lldb) command script import <srcpath>/tools/lldbmacros/xnu.py

# 4. Exercise the macros
```

Depending on the change, test other targets and architectures (for instance, both Astris and KDP).

### Regression

This is simpler than the previous step because the goal is to ensure behavior has not changed.
You can speed things up by using local symbols:

```
# 1. Get a coredump from a device and the kernel UUID
# 2. Grab symbols with dsymForUUID
$ dsymForUUID --nocache --copyExecutable --copyDestination <dsym path>

# 3. Run lldb with local symbols to avoid dsymForUUID NFS access
$ xcrun -sdk <sdk> lldb -c core <dsym_path>/<kernel image>
```

The actual steps are identical to the previous testing. Redirect each macro run to a different file with the
`-o <outfile>` option, then run `diff` on the outputs of the baseline and both Python runtimes:

* No environment variables, to get the baseline
* Python 2 with the changes
* Python 3 with the changes

There may be different ordering of elements based on internal implementation differences of each
Python runtime. Some macros produce files; check the actual file contents.

It’s difficult to automate this:

* Some macros need arguments which must be found in a core file.
* Some macros take a long time to run against a target (more than 30 minutes). Instead, a core dump
  should be taken and then inspected afterwards, but this ties up a lab device for the duration of the
  test.
* Even with coredumps, testing the macros takes too long in our automation system and triggers the
  failsafe timeout.
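Even so, the diff step itself is easy to script. A sketch using the standard difflib module (the function name and file paths are placeholders):

```python
import difflib

def diff_macro_outputs(baseline_path, candidate_path):
    """Return unified-diff lines between two macro output files (empty list if identical)."""
    with open(baseline_path) as f:
        baseline = f.readlines()
    with open(candidate_path) as f:
        candidate = f.readlines()
    return list(difflib.unified_diff(baseline, candidate,
                                     fromfile=baseline_path,
                                     tofile=candidate_path))
```

A wrapper like this only flags textual differences; ordering differences between the runtimes still need to be reviewed by hand.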
### Code coverage

Use code coverage to check which parts of the macros have actually been tested.
Install the **coverage** library with:

```
$ pip install --user coverage
$ pip3 install --user coverage
```

Then collect coverage:

```
# 1. Start LLDB with your macros as described above.

# 2. Load and start code coverage recording.
(lldb) script import coverage
(lldb) script cov = coverage.Coverage()
(lldb) script cov.start()

# 3. Do the testing.

# 4. Collect the coverage.
(lldb) script cov.stop()
(lldb) script cov.save()
```

You can override the default file (*.coverage*) by adding an environment variable to the LLDB invocation:

```
$ env COVERAGE_FILE="${OUTDIR}/.coverage.mytest.py2" # usual LLDB command line
```

Combine coverage from multiple files:

```
# Point PATH to the local Python where coverage is installed.
$ export PATH="$HOME/Library/Python/3.8/bin:$PATH"

# Use --keep to avoid deletion of the input files after the merge.
$ coverage combine --keep <list of .coverage files or dirs to scan>

# Get an HTML report, or use other subcommands to inspect.
$ coverage html
```

It is possible to start coverage collection **before** importing the operating system library and
loading the macros, to check code run during bootstrapping.

### Performance testing

Some macros can run for a long time. Some code may be costly even if it looks simple, because objects
aren’t cached or too many temporary objects are created. Simple profiling is similar to collecting
code coverage.
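Profiling is built on the standard cProfile and pstats modules. A self-contained Python 3 helper might look like the sketch below (the function name is made up; note that Python 3 uses `io.StringIO` where Python 2 had the `StringIO` module):

```python
import cProfile
import io
import pstats

def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return its result and a cumulative-time report string."""
    pr = cProfile.Profile()
    pr.enable()
    result = func(*args, **kwargs)
    pr.disable()
    out = io.StringIO()
    pstats.Stats(pr, stream=out).sort_stats('cumulative').print_stats()
    return result, out.getvalue()
```

The report string can then be printed or written to a file for later comparison between runs.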
Run this in LLDB to get a profile:

```
# Python 2 example (Python 3 is slightly different: use io.StringIO)
(lldb) script import cProfile, pstats, StringIO
(lldb) script pr = cProfile.Profile()
(lldb) script pr.enable()
# Run the macro here:
(lldb) showcurrentstacks
(lldb) script pr.disable()
(lldb) script s = StringIO.StringIO()
(lldb) script ps = pstats.Stats(pr, stream=s).sort_stats('cumulative')
(lldb) script ps.print_stats()
(lldb) script print(s.getvalue())
```

This uses a function call profiler to collect information about which functions took the most
time during the macro’s execution. For example:

```
         1292170 function calls (1291646 primitive calls) in 3.425 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.424    3.424 <src>/tools/lldbmacros/xnu.py:104(_internal_command_function)
        1    0.000    0.000    3.424    3.424 <src>/tools/lldbmacros/process.py:1389(ShowCurrentStacks)
        6    0.002    0.000    2.031    0.338 <src>/tools/lldbmacros/xnu.py:358(GetThreadBackTrace)
      467    0.003    0.000    1.969    0.004 <src>/tools/lldbmacros/core/cvalue.py:464(cast)
        1    0.000    0.000    1.757    1.757 <src>/tools/lldbmacros/xnu.py:323(GetKextSymbolInfo)
        1    0.006    0.006    1.756    1.756 <src>/tools/lldbmacros/memory.py:2181(GetKextLoadInformation)
      256    0.000    0.000    1.711    0.007 <src>/tools/lldbmacros/utils.py:142(Cast)
      473    0.002    0.000    1.365    0.003 <src>/tools/lldbmacros/core/cvalue.py:500(gettype)
       30    0.000    0.000    1.342    0.045 .../LLDB.framework/Resources/Python2/lldb/__init__.py:10442(FindTypes)
       30    1.342    0.045    1.342    0.045 {lldb._lldb.SBTarget_FindTypes}
        6    0.000    0.000    1.129    0.188 <src>/tools/lldbmacros/process.py:324(GetThreadSummary)
        6    0.000    0.000    1.106    0.184 <src>/tools/lldbmacros/process.py:312(GetThreadName)
      210    0.027    0.000    0.634    0.003 <src>/tools/lldbmacros/memory.py:2123(GetKmodWithAddr)
      467    0.005    0.000    0.600    0.001 <src>/tools/lldbmacros/core/cvalue.py:343(_GetValueAsCast)
```

## Debugging your changes

Yes, it is possible to use a debugger to debug your code!

The steps are similar to the testing techniques described above (use scripting interactive mode). There is no point in
documenting the debugger itself, so let’s focus on how to use it on a real-life example. The debugger used here is PDB,
which is part of the Python installation, so it works out of the box.

Problem: Something is wrong with the addkext macro. What now?

    (lldb) addkext -N com.apple.driver.AppleT8103PCIeC
    Failed to read MachO for address 18446741875027613136 errormessage: seek to offset 2169512 is outside window [0, 1310]
    Failed to read MachO for address 18446741875033537424 errormessage: seek to offset 8093880 is outside window [0, 1536]
    Failed to read MachO for address 18446741875033568304 errormessage: seek to offset 8124208 is outside window [0, 1536]
    ...
    Fetching dSYM for 049b9a29-2efc-32c0-8a7f-5f29c12b870c
    Adding dSYM (049b9a29-2efc-32c0-8a7f-5f29c12b870c) for /Library/Caches/com.apple.bni.symbols/bursar.apple.com/dsyms/StarE/AppleEmbeddedPCIE/AppleEmbeddedPCIE-502.100.35~3/049B9A29-2EFC-32C0-8A7F-5F29C12B870C/AppleT8103PCIeC
    section '__TEXT' loaded at 0xfffffe001478c780

There is no exception, just a lot of errors and no output. So what next?
Try to narrow the problem down to an isolated piece of macro code:

 1. Try to get the values of globals through regular LLDB commands.
 2. Use interactive mode and invoke functions with arguments directly.

After inspecting the addkext macro code and calling a few functions with arguments directly, we can see that there is an
exception in the end. It was just captured in a try/except block.
So the simplified reproducer is:

    (lldb) script
    >>> import lldb
    >>> import xnu
    >>> import macho
    >>> err = lldb.SBError()
    >>> data = xnu.LazyTarget.GetProcess().ReadMemory(0xfffffe0014c0f3f0, 0x000000000001b5d0, err)
    >>> m = macho.MemMacho(data, len(data))
    Traceback (most recent call last):
      File "<console>", line 1, in <module>
      File ".../lldbmacros/macho.py", line 91, in __init__
        self.load(fp)
      File ".../site-packages/macholib/MachO.py", line 133, in load
        self.load_header(fh, 0, size)
      File ".../site-packages/macholib/MachO.py", line 168, in load_header
        hdr = MachOHeader(self, fh, offset, size, magic, hdr, endian)
      File ".../site-packages/macholib/MachO.py", line 209, in __init__
        self.load(fh)
      File ".../lldbmacros/macho.py", line 23, in new_load
        _old_MachOHeader_load(s, fh)
      File ".../site-packages/macholib/MachO.py", line 287, in load
        fh.seek(seg.offset)
      File ".../site-packages/macholib/util.py", line 91, in seek
        self._checkwindow(seekto, "seek")
      File ".../site-packages/macholib/util.py", line 76, in _checkwindow
        raise IOError(
    OSError: seek to offset 9042440 is outside window [0, 112080]

Clearly an external library is involved, and the execution flow jumps between the dSYM and the library a few times.
Let’s try to look around with a debugger.

    (lldb) script
    # Prepare the data variable as described above.

    # Run the last statement under the debugger.
    >>> import pdb
    >>> pdb.run('m = macho.MemMacho(data, len(data))', globals(), locals())
    > <string>(1)<module>()

    # Show the debugger's help
    (Pdb) help

It is not possible to break on an exception. Python uses exceptions a lot, so it is better to put a breakpoint in the
source code. This puts a breakpoint on the IOError exception mentioned above.
    (Pdb) break ~/Library/Python/3.8/lib/python/site-packages/macholib/util.py:76
    Breakpoint 4 at ~/Library/Python/3.8/lib/python/site-packages/macholib/util.py:76

You can now single-step or continue the execution as usual for a debugger.

    (Pdb) cont
    > /Users/tjedlicka/Library/Python/3.8/lib/python/site-packages/macholib/util.py(76)_checkwindow()
    -> raise IOError(
    (Pdb) bt
      /Volumes/.../Python3.framework/Versions/3.8/lib/python3.8/bdb.py(580)run()
    -> exec(cmd, globals, locals)
      <string>(1)<module>()
      /Volumes/...dSYM/Contents/Resources/Python/lldbmacros/macho.py(91)__init__()
    -> self.load(fp)
      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(133)load()
    -> self.load_header(fh, 0, size)
      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(168)load_header()
    -> hdr = MachOHeader(self, fh, offset, size, magic, hdr, endian)
      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(209)__init__()
    -> self.load(fh)
      /Volumes/...dSYM/Contents/Resources/Python/lldbmacros/macho.py(23)new_load()
    -> _old_MachOHeader_load(s, fh)
      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(287)load()
    -> fh.seek(seg.offset)
      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/util.py(91)seek()
    -> self._checkwindow(seekto, "seek")
    > /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/util.py(76)_checkwindow()
    -> raise IOError(

Now we can move one frame up and inspect the stopped target:

    # Show current frame arguments
    (Pdb) up
    (Pdb) a
    self = <fileview [0, 112080] <macho.MemFile object at 0x1075cafd0>>
    offset = 9042440
    whence = 0

    # Globals, locals, or expressions
    (Pdb) p type(seg.offset)
    <class 'macholib.ptypes.p_uint32'>
    (Pdb) p hex(seg.offset)
    '0x89fa08'

    # Find attributes of a Python object.
    (Pdb) p dir(section_cls)
    ['__class__', '__cmp__', ... ,'reserved3', 'sectname', 'segname', 'size', 'to_fileobj', 'to_mmap', 'to_str']
    (Pdb) p section_cls.sectname
    <property object at 0x1077bbef0>

Unfortunately everything looks correct, but there is actually one interesting frame in the stack: the one that
provides the offset to the seek method. Let’s see where we are in the source code.

    (Pdb) up
    > /Users/tjedlicka/Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(287)load()
    -> fh.seek(seg.offset)
    (Pdb) list
    282             not_zerofill = (seg.flags & S_ZEROFILL) != S_ZEROFILL
    283             if seg.offset > 0 and seg.size > 0 and not_zerofill:
    284                 low_offset = min(low_offset, seg.offset)
    285             if not_zerofill:
    286                 c = fh.tell()
    287  ->             fh.seek(seg.offset)
    288                 sd = fh.read(seg.size)
    289                 seg.add_section_data(sd)
    290                 fh.seek(c)
    291             segs.append(seg)
    292         # data is a list of segments

Running the debugger on a working case and stepping through the load() method shows that this code is not present there.
That means we are broken by a library update! Older versions of the library do not load data for a section.
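When a regression like this appears, it helps to record which version of the dependency is actually loaded so that known-good and broken environments can be compared. A sketch using the standard importlib.metadata module (Python 3.8+; the helper name is made up):

```python
from importlib import metadata

def package_version(name):
    """Return the installed version string of a package, or None when it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None
```

For example, `package_version('macholib')` can be logged alongside test output and compared against the version used in the last known-good run.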