# XNU debugging

xnu's debugging macros are compatible with both Python 2 and 3. In practice, this means that Python 3 features are unavailable and some Python 2 syntax is not allowed. Unfortunately, any syntax error will prevent use of all the macros, as they're all imported into the same scripting environment.

## Compatibility

Avoid writing your own compatibility shims; a few existing libraries already bridge Python 2 and 3:

* **six** has helpers that work in both Python 2 and 3, for things like the string type change
* **future** backports features from Python 3 to Python 2

For example, Python 2 contains **range** and **xrange**. Python 3 contains only **range**, which has **xrange** semantics. The simplest solution is to port your code and use the Python 3 way:

```
# Use backported range from Python 3
from builtins import range

# Use range on both Python 2/3 runtimes
for x in range(....):
    ....
```

Be very careful about using imports from the **future** library. Some of them are **very invasive** and change the behavior of your code, which may cause strange runtime errors. For example:

```
# Changes module handling logic to make your code work with the
# standard library reorganization (PEP 3108)
from future import standard_library
standard_library.install_aliases()

# Replaces a lot of common types, such as str, with future's Python 3 backports.
from builtins import *
```

## Handling strings

Macros use strings produced from the LLDB runtime. They must use **six** when doing certain operations to avoid exceptions.
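As a quick illustration of the kind of failure **six** helps avoid, here is a minimal, standalone sketch; the byte string is an invented example, not actual LLDB output:

```python
import six

raw = b"kernel_task"           # e.g. bytes read out of target memory

# "Thread: " + raw raises TypeError on Python 3 (str + bytes),
# while six.ensure_str() yields the native str type on both runtimes.
label = six.ensure_str(raw)
print("Thread: " + label)      # works on Python 2 and 3
```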
Until the transition is done, these canonical ways of dealing with strings cannot be used:

* Using Unicode literals by default:
  `from __future__ import unicode_literals`
* **f-strings**

Some advice:

* Use byte strings explicitly when dealing with memory rather than text:
  `b'string'`
* Always properly encode/decode raw data to/from strings before passing it around, with `six.ensure_str` or
  `six.ensure_binary`.

Improperly-typed strings will raise *different* exceptions on each runtime:

* Python 2 raises codec exceptions when printing strings.
* Python 3 complains about concatenation of objects of incompatible types (bytes and strings).

### No convenient, common string type

While it is possible to use future's **newstr** to backport the new string type to Python 2, there are issues with the Scripting Bridge (SB) API from LLDB. Python 3 will work out of the box, but Python 2 will complain because **newstr** maps to **unicode**. SB exposes **const char \*** as a native string, or just **str** in Python 2. For Python 2 we would have to explicitly encode all Unicode strings before calling the API.

Another problem is that literals in the form `'string'` are no longer compatible with unicode and need to be switched to `u'string'`. This can be changed with a single import at the top of the file, but in some scenarios byte strings are expected. That change would require checking all strings in the code and changing some back to `b'string'`.

Here's an example of just how pervasive such a change would be, because this code would break in Python 2:

```
from xnu import *

@lldb_type_summary(['type'])
def print_summary():
    ....
```

The result is that a non-unicode literal gets registered with a unicode API in Python 3. Unfortunately `'type' != b'type'`, and thus LLDB will never match the type when printing summaries.
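The underlying mismatch is easy to reproduce in a plain interpreter, independent of LLDB:

```python
# Under unicode_literals, 'type' means u'type'. On Python 3, text and
# bytes never compare equal, so a summary registered under one form is
# never found when looked up under the other.
print(u'type' == b'type')   # False on Python 3
print(u'type' == 'type')    # True
```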
Using native strings and literals allows for only minimal code changes to the macros, which stay compatible with other projects using Python 2.

### Check that an object is a string

Avoid testing for `str` explicitly like `type(obj) == str`. This won't work correctly, as Python 2 has multiple string types (`unicode`, `str`). Additionally, compatibility shims might introduce new string types.

Instead, always use an inheritance-aware check like `isinstance(obj, six.string_types)`.

### Dealing with binary data

Python 2 bytes and strings are the same thing. This was the wrong design decision, and Python 3 (wisely) switched to using a separate type for human text. This lack of distinction in Python 2 caused many programming errors, so it's recommended to use **bytearray**, **bytes**, and **memoryview** instead of a string. If a string is really required, encode the raw data explicitly using an escape method.

### Accessing large amounts of binary data (or accessing small amounts frequently)

If you're planning on accessing large contiguous blocks of memory (e.g. reading a whole 10KB of memory), or you're accessing small semi-contiguous chunks (e.g. if you're parsing large structured data), then it might be hugely beneficial performance-wise to make use of the `io.SBProcessRawIO` class. If you're in a hurry and just want to read one specific chunk once, then it might be easier to use `LazyTarget.GetProcess().ReadMemory()` directly.
In other words, avoid the following:

```
data_ptr = kern.GetValueFromAddress(start_addr, 'uint8_t *')
with open(filepath, 'wb') as f:
    f.write(data_ptr[:4096])
```

And instead use:

```
from core.io import SBProcessRawIO
import shutil

io_access = SBProcessRawIO(LazyTarget.GetProcess(), start_addr, 4096)
with open(filepath, 'wb') as f:
    shutil.copyfileobj(io_access, f)
```

Or, if you're in a hurry:

```
err = lldb.SBError()
my_data = LazyTarget.GetProcess().ReadMemory(start_addr, length, err)
if err.Success():
    # Use my precious data
    pass
```

For small semi-contiguous chunks, you can map the whole region and access random chunks from it like so:

```
from core.io import SBProcessRawIO

io_access = SBProcessRawIO(LazyTarget.GetProcess(), start_addr, size)
io_access.seek(my_struct_offset)
my_struct_contents = io_access.read(my_struct_size)
```

Not only that, but you can also stack a `BufferedRandom` class on top of the `SBProcessRawIO` instance, which provides buffering (i.e. caching) in case your random small chunk accesses are repeated:

```
from core.io import SBProcessRawIO
from io import BufferedRandom

io_access = SBProcessRawIO(LazyTarget.GetProcess(), start_addr, size)
buffered_io = BufferedRandom(io_access)
# And then use buffered_io for your accesses
```

### Encoding data to strings and back

The simplest solution is to use the **six** library and one of its functions, such as:

```
mystring = six.ensure_str(object)
```

This ensures the resulting value is a native string. It deals with Unicode in Python 2 automatically. The six library is still required even if data is encoded manually, since it converts types.
```
import six
from builtins import bytes

# raw_data holds the bytes read from the target.
# Decode the raw data first, then make sure the result is a native string.
mystring = six.ensure_str(bytes(raw_data).decode('utf-8'))
```

When converting data to a string, add an encoding type so Python knows how to handle raw bytes. In most cases **utf-8** will work, but be careful to ensure that the encoding matches your data.

There are two options to consider when trying to get a string out of raw data without knowing whether it is a valid string or not:

* **lossy conversion** - escapes all non-standard characters in the form `'\xNN'`
* **lossless conversion** - maps invalid characters to a special unicode range so it can reconstruct
the string precisely

Which to use depends on the transformation goals. The lossy conversion produces a printable string with strange characters in it. The lossless option is meant to be used when a string is only a transport mechanism and needs to be converted back to the original values later.

Switch the method by using the `errors` handler during conversion:

```
# Lossy - escapes invalid chars
b.decode('utf-8', errors='backslashreplace')
# Lossy - removes invalid chars
b.decode('utf-8', errors='ignore')
# Lossless, but the result will likely fail to print()
b.decode('utf-8', errors='surrogateescape')
```

## Handling numbers

Numeric types are incompatible between Python 2 and 3:

* **long** is not available in Python 3.
* **int** is the only integral type in Python 3 and has unlimited precision (while Python 2's **int** is a fixed-width machine integer).

This creates all sorts of issues with macros. Follow these rules to make integral types compatible in both modes:

* Do not use **long** — replace it with **int**.
* When using the **value** class, types will be promoted to **long** as there is special number
handling in the xnu macro library. Remaining code should be reviewed and fixed, if appropriate.
* Avoid relying on sign extension.
* Always switch Python to use Python 3 division, where `/` performs true, fractional division and
`//` performs floor division (like C integer division):
  `from __future__ import division`
* Use division operators according to Python 3 rules.

### Common integer representation

The goal is to always use Python 3's integer handling, which means using **int** everywhere.

xnu's macros provide a custom integer type called **valueint** that is a replacement for **int** in the Python 2 runtime. That means it behaves almost like **int** from Python 3. When importing from the macros, this type replaces any use of **int**:

```
# Replaces all int()s with valueint
from xnu import *
from xnu import int

# Does not replace int()s
import xnu
from xnu import a, b, c
```

Avoid using the `from builtins import int` suggested on the internet. It does not work correctly with xnu's **value** class. The **valueint** class inherits from **newint** and fixes the problematic behavior.

This impacts the way an object is checked for being an integer. Be careful about the following constructs:

```
# BAD: generally not a good way to do type checking in Python
if type(obj) is int:

# BAD: int may have been replaced with valueint.
if isinstance(obj, int):
```

Instead, use the base integral type:

```
import numbers

if isinstance(obj, numbers.Integral):
```

### Dealing with signed numbers

The original code used two operators to convert a **value** class instance to a number:

* **__int__** produced **int** and was either signed or unsigned based on the underlying SBType.
* **__long__** was always signed.

This is confusing when dealing with types. Always use **unsigned()** or **signed()**, regardless of what the actual underlying type is, to ensure that macros use the correct semantics.
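To see why the distinction matters, here is a minimal, self-contained sketch of the two's-complement reinterpretation that a signed conversion has to perform. The helper name and the fixed default width are illustrative only; xnu's actual `signed()` derives the width from the underlying SBType:

```python
def as_signed(raw, bits=64):
    """Reinterpret an unsigned bit pattern as a two's-complement value."""
    mask = (1 << bits) - 1
    raw &= mask
    if raw & (1 << (bits - 1)):   # sign bit set
        return raw - (1 << bits)
    return raw

print(as_signed(0xFFFFFFFFFFFFFFFF))  # -1
print(as_signed(0x7F, bits=8))        # 127
print(as_signed(0x80, bits=8))        # -128
```

Relying on the ambient `__int__`/`__long__` behavior instead of an explicit conversion like this is exactly how sign-extension bugs crept into the macros.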
### Dividing numbers

Python 2's **/** operator has two behaviors depending on the types of its arguments (**float** vs. **int**). Always use Python 3's division operators:

```
# Switch the compiler to Python 3 semantics
from __future__ import division

float_val = a / b   # True, fractional division that yields a float
floor_div = a // b  # Floor division, like C
```

If the original behavior is required, use **old_div** to get Python 2 behavior:

```
from past.utils import old_div

value = old_div(a, b)  # Matches Python 2 semantics
```

If this isn't handled correctly, `format` will complain that a float value is being passed to a non-float formatting character. Automated scripts that convert from Python 2 to 3 tend to use **old_div** during porting. In most cases that is not required. For kernel debugging and integer types, `//` is commonly used to match C's division behavior for integers.

## Testing changes

There is no perfect test suite to check that macros are producing a correct value compared to what the debugger sees in a target.

Be careful when touching common framework code. For larger changes, ask the Platform Triage team to validate that the changes work in their environment before integration.

### Coding style

Use a static analyzer like **pylint** or **flake8** to check the macro source code:

```
# Python 2
$ pip install --user pylint flake8

# Python 3
$ pip3 install --user pylint flake8

# Run the lint either by setting your path to point to one of the runtimes
# or through python
$ python2 -m pylint <src files/dirs>
$ python3 -m pylint <src files/dirs>
$ python2 -m flake8 <src files/dirs>
$ python3 -m flake8 <src files/dirs>
```

### Correctness

Ensure the macro matches what LLDB returns from the REPL.
For example, compare `showproc(xxx)` with `p/x *(proc_t)xxx`.

```
# 1. Run LLDB with debug options set
$ DEBUG_XNU_LLDBMACROS=1 LLDB_DEFAULT_PYTHON_VERSION=2 xcrun -sdk <sdk> lldb -c core <dsympath>/mach_kernel

# 2. Optionally load a modified operating system plugin
(lldb) settings set target.process.python-os-plugin-path <srcpath>/tools/lldbmacros/core/operating_system.py

# 3. Load modified scripts
(lldb) command script import <srcpath>/tools/lldbmacros/xnu.py

# 4. Exercise macros
```

Depending on the change, test other targets and architectures (for instance, both Astris and KDP).

### Regression

This is simpler than the previous step because the goal is to ensure behavior has not changed. You can speed a few things up by using local symbols:

```
# 1. Get a coredump from a device and the kernel UUID
# 2. Grab symbols with dsymForUUID
$ dsymForUUID --nocache --copyExecutable --copyDestination <dsym path>

# 3. Run lldb with local symbols to avoid dsymForUUID NFS
$ xcrun -sdk <sdk> lldb -c core <dsym_path>/<kernel image>
```

The actual steps are identical to the previous testing. Redirect each macro run to a different file with the `-o <outfile>` option. Then run `diff` on the outputs of the baseline and both Python 2 and 3:

* No environment variables to get the baseline
* Python 2 with changes
* Python 3 with changes

There may be different ordering of elements based on internal implementation differences of each Python runtime. Some macros produce files — check the actual file contents.

It's difficult to automate this:

* Some macros need arguments which must be found in a core file.
* Some macros take a long time to run against a target (more than 30 minutes). Instead, a core dump
  should be taken and then inspected afterwards, but this ties up a lab device for the duration of the
  test.
* Even with coredumps, testing the macros takes too long in our automation system and triggers the
  failsafe timeout.

### Code coverage

Use code coverage to check which parts of the macros have actually been tested. Install the **coverage** library with:

```
$ pip install --user coverage
$ pip3 install --user coverage
```

Then collect coverage:

```
# 1. Start LLDB with your macros as described above.

# 2. Load and start code coverage recording.
(lldb) script import coverage
(lldb) script cov = coverage.Coverage()
(lldb) script cov.start()

# 3. Do the testing.

# 4. Collect the coverage.
(lldb) script cov.stop()
(lldb) script cov.save()
```

You can override the default file (*.coverage*) by adding an additional environment variable to LLDB:

```
$ env COVERAGE_FILE="${OUTDIR}/.coverage.mytest.py2" # usual LLDB command line
```

Combine coverage from multiple files:

```
# Point PATH to the local python where coverage is installed.
$ export PATH="$HOME/Library/Python/3.8/bin:$PATH"

# Use --keep to avoid deletion of input files after the merge.
$ coverage combine --keep <list of .coverage files or dirs to scan>

# Get an HTML report or use other subcommands to inspect.
$ coverage html
```

It is possible to start coverage collection **before** importing the operating system library and loading macros, to check code run during bootstrapping.

### Performance testing

Some macros can run for a long time. Some code may be costly even if it looks simple, because objects aren't cached or too many temporary objects are created. Simple profiling is similar to collecting code coverage.
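The cProfile/pstats pattern used below can be tried in a plain interpreter first, which is handy for profiling a helper function in isolation; `busy()` here is a stand-in workload, not a real macro:

```python
import cProfile
import io
import pstats

def busy():
    # Stand-in workload; substitute the macro helper you want to profile.
    return sum(i * i for i in range(100000))

pr = cProfile.Profile()
pr.enable()
busy()
pr.disable()

# Render the top entries sorted by cumulative time, as in the LLDB session.
s = io.StringIO()
pstats.Stats(pr, stream=s).sort_stats('cumulative').print_stats(5)
report = s.getvalue()
print(report)
```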
Run this in LLDB to get a profile:

```
# Python 2 example (on Python 3, use io.StringIO instead of StringIO)
(lldb) script import cProfile, pstats, StringIO
(lldb) script pr = cProfile.Profile()
(lldb) script pr.enable()
# Run the macro here:
(lldb) showcurrentstacks
(lldb) script pr.disable()
(lldb) script s = StringIO.StringIO()
(lldb) script ps = pstats.Stats(pr, stream=s).sort_stats('cumulative')
(lldb) script ps.print_stats()
(lldb) script print(s.getvalue())
```

This will use a function call profiler to collect information about which functions took the most time during the macro's execution. For example:

```
   1292170 function calls (1291646 primitive calls) in 3.425 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.424    3.424 <src>/tools/lldbmacros/xnu.py:104(_internal_command_function)
        1    0.000    0.000    3.424    3.424 <src>/tools/lldbmacros/process.py:1389(ShowCurrentStacks)
        6    0.002    0.000    2.031    0.338 <src>/tools/lldbmacros/xnu.py:358(GetThreadBackTrace)
      467    0.003    0.000    1.969    0.004 <src>/tools/lldbmacros/core/cvalue.py:464(cast)
        1    0.000    0.000    1.757    1.757 <src>/tools/lldbmacros/xnu.py:323(GetKextSymbolInfo)
        1    0.006    0.006    1.756    1.756 <src>/tools/lldbmacros/memory.py:2181(GetKextLoadInformation)
      256    0.000    0.000    1.711    0.007 <src>/tools/lldbmacros/utils.py:142(Cast)
      473    0.002    0.000    1.365    0.003 <src>/tools/lldbmacros/core/cvalue.py:500(gettype)
       30    0.000    0.000    1.342    0.045 .../LLDB.framework/Resources/Python2/lldb/__init__.py:10442(FindTypes)
       30    1.342    0.045    1.342    0.045 {lldb._lldb.SBTarget_FindTypes}
        6    0.000    0.000    1.129    0.188 <src>/tools/lldbmacros/process.py:324(GetThreadSummary)
        6    0.000    0.000    1.106    0.184 <src>/tools/lldbmacros/process.py:312(GetThreadName)
      210    0.027    0.000    0.634    0.003 <src>/tools/lldbmacros/memory.py:2123(GetKmodWithAddr)
      467    0.005    0.000    0.600    0.001 <src>/tools/lldbmacros/core/cvalue.py:343(_GetValueAsCast)
```

## Debugging your changes

Yes, it is possible to use a debugger to debug your code!

The steps are similar to the testing techniques described above (use the scripting interactive mode). There is no point in documenting the debugger itself; let's focus on how to use it on a real-life example. The debugger used here is PDB, which is part of the Python installation, so it works out of the box.

Problem: Something wrong is going on with the addkext macro. What now?

    (lldb) addkext -N com.apple.driver.AppleT8103PCIeC
    Failed to read MachO for address 18446741875027613136 errormessage: seek to offset 2169512 is outside window [0, 1310]
    Failed to read MachO for address 18446741875033537424 errormessage: seek to offset 8093880 is outside window [0, 1536]
    Failed to read MachO for address 18446741875033568304 errormessage: seek to offset 8124208 is outside window [0, 1536]
    ...
    Fetching dSYM for 049b9a29-2efc-32c0-8a7f-5f29c12b870c
    Adding dSYM (049b9a29-2efc-32c0-8a7f-5f29c12b870c) for /Library/Caches/com.apple.bni.symbols/bursar.apple.com/dsyms/StarE/AppleEmbeddedPCIE/AppleEmbeddedPCIE-502.100.35~3/049B9A29-2EFC-32C0-8A7F-5F29C12B870C/AppleT8103PCIeC
    section '__TEXT' loaded at 0xfffffe001478c780

There is no exception, just a lot of errors and no output. So what next? Try to narrow the problem down to an isolated piece of macro code:

 1. Try to get the values of globals through regular LLDB commands.
 2. Use interactive mode and invoke functions with arguments directly.

After inspecting the addkext macro code and calling a few functions with arguments directly, we can see that there is an exception at the end. It was just captured in a try/except block.
So the simplified reproducer is:

    (lldb) script
    >>> import lldb
    >>> import macho
    >>> import xnu
    >>> err = lldb.SBError()
    >>> data = xnu.LazyTarget.GetProcess().ReadMemory(0xfffffe0014c0f3f0, 0x000000000001b5d0, err)
    >>> m = macho.MemMacho(data, len(data))
    Traceback (most recent call last):
      File "<console>", line 1, in <module>
      File ".../lldbmacros/macho.py", line 91, in __init__
        self.load(fp)
      File ".../site-packages/macholib/MachO.py", line 133, in load
        self.load_header(fh, 0, size)
      File ".../site-packages/macholib/MachO.py", line 168, in load_header
        hdr = MachOHeader(self, fh, offset, size, magic, hdr, endian)
      File ".../site-packages/macholib/MachO.py", line 209, in __init__
        self.load(fh)
      File ".../lldbmacros/macho.py", line 23, in new_load
        _old_MachOHeader_load(s, fh)
      File ".../site-packages/macholib/MachO.py", line 287, in load
        fh.seek(seg.offset)
      File ".../site-packages/macholib/util.py", line 91, in seek
        self._checkwindow(seekto, "seek")
      File ".../site-packages/macholib/util.py", line 76, in _checkwindow
        raise IOError(
    OSError: seek to offset 9042440 is outside window [0, 112080]

Clearly an external library is involved, and the execution flow jumps between the dSYM and the library a few times. Let's try to look around with a debugger.

    (lldb) script
    # Prepare the data variable as described above.

    # Run the last statement under the debugger.
    >>> import pdb
    >>> pdb.run('m = macho.MemMacho(data, len(data))', globals(), locals())
    > <string>(1)<module>()

    # Show the debugger's help
    (Pdb) help

It is not possible to break on an exception. Python uses exceptions a lot, so it is better to put a breakpoint in the source code. The following puts a breakpoint on the IOError exception mentioned above.
    (Pdb) break ~/Library/Python/3.8/lib/python/site-packages/macholib/util.py:76
    Breakpoint 4 at ~/Library/Python/3.8/lib/python/site-packages/macholib/util.py:76

You can now single-step or continue the execution, as usual for a debugger.

    (Pdb) cont
    > /Users/tjedlicka/Library/Python/3.8/lib/python/site-packages/macholib/util.py(76)_checkwindow()
    -> raise IOError(
    (Pdb) bt
    /Volumes/.../Python3.framework/Versions/3.8/lib/python3.8/bdb.py(580)run()
    -> exec(cmd, globals, locals)
    <string>(1)<module>()
    /Volumes/...dSYM/Contents/Resources/Python/lldbmacros/macho.py(91)__init__()
    -> self.load(fp)
    /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(133)load()
    -> self.load_header(fh, 0, size)
    /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(168)load_header()
    -> hdr = MachOHeader(self, fh, offset, size, magic, hdr, endian)
    /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(209)__init__()
    -> self.load(fh)
    /Volumes/...dSYM/Contents/Resources/Python/lldbmacros/macho.py(23)new_load()
    -> _old_MachOHeader_load(s, fh)
    /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(287)load()
    -> fh.seek(seg.offset)
    /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/util.py(91)seek()
    -> self._checkwindow(seekto, "seek")
    > /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/util.py(76)_checkwindow()
    -> raise IOError(

Now we can move a frame up and inspect the stopped target:

    # Show current frame arguments
    (Pdb) up
    (Pdb) a
    self = <fileview [0, 112080] <macho.MemFile object at 0x1075cafd0>>
    offset = 9042440
    whence = 0

    # globals, locals, or expressions
    (Pdb) p type(seg.offset)
    <class 'macholib.ptypes.p_uint32'>
    (Pdb) p hex(seg.offset)
    '0x89fa08'

    # Find attributes of a Python object.
    (Pdb) p dir(section_cls)
    ['__class__', '__cmp__', ... ,'reserved3', 'sectname', 'segname', 'size', 'to_fileobj', 'to_mmap', 'to_str']
    (Pdb) p section_cls.sectname
    <property object at 0x1077bbef0>

Unfortunately, everything looks correct, but there is actually one interesting frame in the stack: the one which provides the offset to the seek method. Let's see where we are in the source code.

    (Pdb) up
    > /Users/tjedlicka/Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(287)load()
    -> fh.seek(seg.offset)
    (Pdb) list
    282                 not_zerofill = (seg.flags & S_ZEROFILL) != S_ZEROFILL
    283                 if seg.offset > 0 and seg.size > 0 and not_zerofill:
    284                     low_offset = min(low_offset, seg.offset)
    285                 if not_zerofill:
    286                     c = fh.tell()
    287  ->                 fh.seek(seg.offset)
    288                     sd = fh.read(seg.size)
    289                     seg.add_section_data(sd)
    290                     fh.seek(c)
    291             segs.append(seg)
    292         # data is a list of segments

Running the debugger on a working case and stepping through the load() method shows that this code is not present there. That means we are broken by a library update! Older versions of the library do not load data for a section.