# XNU debugging

xnu’s debugging macros are compatible with both Python 2 and 3. In practice, this means that Python 3
features are unavailable and some Python 2 syntax is not allowed. Unfortunately, any syntax error will
prevent use of all the macros, as they’re all imported into the same scripting environment.

## Compatibility

Avoid introducing your own compatibility shims, as there are a few existing libraries that support both
Python 2 and 3:

* **six** has helpers that work in both Python 2 and 3, for things like the string type change
* **future** backports features from Python 3 to Python 2

For example, Python 2 contains **range** and **xrange**. Python 3 contains only **range**, which has
**xrange** semantics. The simplest solution is to port your code and use the Python 3 way:

```
# Use backported range from Python 3
from builtins import range

# Use range on both Python 2/3 runtimes
for x in range(....):
    ....
```

Be very careful about using imports from the 'future' library. Some of them are **very invasive** and
change the behavior of your code. This may cause strange runtime errors. For example:

```
# Changes module handling logic to make your code work with the std library reorg (PEP 3108)
from future import standard_library
standard_library.install_aliases()

# Replaces a lot of common types like str with future's Python 3 backports.
from builtins import *
```

## Handling strings

Macros use strings produced from the LLDB runtime. They must use **six** when doing certain operations
to avoid exceptions.
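As a minimal sketch of the failure mode (plain Python, no LLDB required; the `raw` value is just an illustrative stand-in for data a memory read might return):

```python
# Python 3 refuses to mix bytes and str, which is the class of exception
# the macros hit when raw LLDB output is passed around untranslated.
raw = b"thread_call"  # illustrative stand-in for raw LLDB data

try:
    message = "macro output: " + raw  # implicit mixing worked on Python 2
except TypeError:
    # Explicit decoding is what six.ensure_str does on Python 3.
    message = "macro output: " + raw.decode("utf-8")

print(message)  # → macro output: thread_call
```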
Until the transition is done, these canonical ways of dealing with strings cannot be used:

* Using Unicode literals by default:
  `from __future__ import unicode_literals`
* **f-strings**

Some advice:

* Use byte strings explicitly when dealing with memory rather than text:
  `b'string'`
* Always properly encode/decode raw data to/from strings before passing it around, with `six.ensure_str` or
  `six.ensure_binary`.

Improperly-typed strings will raise *different* exceptions on each runtime:

* Python 2 raises codec exceptions when printing strings.
* Python 3 complains about concatenation of objects of incompatible types (bytes and strings).

### No convenient, common string type

While it is possible to use future’s **newstr** to backport the Python 3 string type to Python 2, there are
issues with the Scripting Bridge (SB) API from LLDB. Python 3 will work out of the box, but Python 2
will complain because **newstr** maps to **unicode**. SB exposes **const char \*** as a native string,
or just **str** in Python 2. For Python 2 we would have to explicitly encode all Unicode strings
before calling the API.

Another problem is that literals of the form `'string'` are no longer compatible with unicode and need
to be switched to `u'string'`. This can be changed with a single import at the top of the file, but
in some scenarios byte strings are expected. That change would require checking all strings in the
code and changing some back to `b'string'`.

Here’s an example of just how pervasive the change would be, because this code would break in Python 2:

```
from xnu import *

@lldb_type_summary(['type'])
def print_summary():
    ....
```

The result is that we would have a non-unicode literal being registered with a unicode API in Python 3.
Unfortunately `'type' != b'type'`, and thus LLDB will never match the type when printing summaries.
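The mismatch is easy to reproduce with plain literals (an illustrative sketch, no LLDB required):

```python
# On Python 3, a text literal never compares equal to its bytes counterpart,
# so a summary registered under one spelling is never found under the other.
registered = b'type'   # what a byte-string literal would register
requested = u'type'    # what a unicode-based lookup asks for

print(registered == requested)  # → False on Python 3 (but True on Python 2!)
```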
Using native strings and literals allows for only minimal code changes to the macros, which remain
compatible with other projects using Python 2.

### Check that an object is a string

Avoid testing for `str` explicitly, like `type(obj) == str`. This won’t work correctly, as Python 2
has multiple string types (`unicode`, `str`). Additionally, compatibility shims might introduce new
string types.

Instead, always use an inheritance-aware check like `isinstance(obj, six.string_types)`.

### Dealing with binary data

Python 2 bytes and strings are the same thing. This was the wrong design decision, and Python 3
(wisely) switched to using a separate type for human text. This lack of distinction in Python 2
caused many programming errors, so it’s recommended to use **bytearray**, **bytes**, and
**memoryview** instead of a string. If a string is really required, encode the raw data explicitly
using an escape method.

### Accessing large amounts of binary data (or accessing small amounts frequently)

If you're planning on accessing large contiguous blocks of memory (e.g. reading a whole 10 KB of memory),
or you're accessing small semi-contiguous chunks (e.g. parsing large structured data), it can be hugely
beneficial performance-wise to make use of the `io.SBProcessRawIO` class. On the other hand, if you're in
a hurry and just want to read one specific chunk once, it might be easier to use `LazyTarget.GetProcess().ReadMemory()`
directly.
In other words, avoid the following:

```
data_ptr = kern.GetValueFromAddress(start_addr, 'uint8_t *')
with open(filepath, 'wb') as f:
    f.write(data_ptr[:4096])
```

And instead use:

```
from core.io import SBProcessRawIO
import shutil

io_access = SBProcessRawIO(LazyTarget.GetProcess(), start_addr, 4096)
with open(filepath, 'wb') as f:
    shutil.copyfileobj(io_access, f)
```

Or, if you're in a hurry:

```
err = lldb.SBError()
my_data = LazyTarget.GetProcess().ReadMemory(start_addr, length, err)
if err.Success():
    # Use my precious data
    pass
```

For small semi-contiguous chunks, you can map the whole region and access random chunks from it like so:

```
from core.io import SBProcessRawIO

io_access = SBProcessRawIO(LazyTarget.GetProcess(), start_addr, size)
io_access.seek(my_struct_offset)
my_struct_contents = io_access.read(my_struct_size)
```

Not only that, but you can also tack a BufferedRandom class on top of the SBProcessRawIO instance, which
provides buffering (a.k.a. caching) in case your small random-chunk accesses are repeated:

```
from core.io import SBProcessRawIO
from io import BufferedRandom

io_access = SBProcessRawIO(LazyTarget.GetProcess(), start_addr, size)
buffered_io = BufferedRandom(io_access)
# And then use buffered_io for your accesses
```

### Encoding data to strings and back

The simplest solution is to use the **six** library and one of its functions, like:

```
mystring = six.ensure_str(obj)
```

This ensures the resulting value is a native string. It deals with Unicode in Python 2 automatically.
The six library is still required even if data is encoded manually, since it converts types.
```
from builtins import bytes

# Decode an instance (not the bytes type) and avoid shadowing the built-in str
mystring = six.ensure_str(bytes(data).decode('utf-8'))
```

When converting data to a string, add an encoding type so Python knows how to handle raw bytes. In most
cases **utf-8** will work, but be careful to be sure that the encoding matches your data.

There are two options to consider when trying to get a string out of raw data without knowing whether
it is a valid string or not:

* **lossy conversion** - escapes all non-standard characters in the form ‘\xNN’
* **lossless conversion** - maps invalid characters to a special Unicode range so it can reconstruct
  the string precisely

Which to use depends on the transformation goals. The lossy conversion produces a printable string
with strange characters in it. The lossless option is meant to be used when a string is only a transport
mechanism and needs to be converted back to the original values later.

Switch the method by using an `errors` handler during conversion:

```
# Lossy; escapes invalid chars
b.decode('utf-8', errors='backslashreplace')
# Lossy; removes invalid chars
b.decode('utf-8', errors='ignore')
# Lossless, but may fail to print()
b.decode('utf-8', errors='surrogateescape')
```

## Handling numbers

Numeric types are incompatible between Python 2 and 3:

* **long** is not available in Python 3.
* **int** is the only integral type in Python 3 and has unlimited precision (but fixed, machine-word
  precision in Python 2).

This creates all sorts of issues with macros. Follow these rules to make integral types compatible
in both modes:

* Do not use **long** — replace it with **int**.
* When using the **value** class, types will be promoted to **long**, as there is special number
  handling in the xnu macro library. Remaining code should be reviewed and fixed, if appropriate.
* Avoid relying on sign extension.
* Always switch Python to use Python 3 division, where `/` converts to floating point and does
  fractional division, and `//` is floor division (like integer division in C):
  `from __future__ import division`
* Use division operators according to Python 3 rules.

### Common integer representation

The goal is to always use Python 3’s integer handling, which means using **int** everywhere.

xnu’s macros provide a custom integer type called **valueint** that is a replacement for **int**
in the Python 2 runtime. That means it behaves almost like **int** from Python 3. When importing
from the macros, this type replaces any use of **int**:

```
# Replaces all int()s to be valueint
from xnu import *
from xnu import int

# Does not replace int()s
import xnu
from xnu import a, b, c
```

Avoid using `from builtins import int`, as suggested on the internet. It does not work correctly with
xnu’s **value** class. The **valueint** class inherits from **newint** and fixes the problematic behavior.

This impacts the way an object is checked for being an integer. Be careful about the following constructs:

```
# BAD: generally not a good way to do type checking in Python
if type(obj) is int:

# BAD: int may have been replaced with valueint.
if isinstance(obj, int):
```

Instead, use the base integral type:

```
if isinstance(obj, numbers.Integral):
```

### Dealing with signed numbers

The original code used two operators to convert a **value** class instance to a number:

* **__int__** produced **int** and was either signed or unsigned based on the underlying SBType.
* **__long__** was always signed.

This is confusing when dealing with types. Always use **unsigned()** or **signed()**, regardless of
what the actual underlying type is, to ensure that macros use the correct semantics.
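The difference between the two interpretations can be sketched in plain Python; the helpers below are illustrative stand-ins, not the macro library’s actual `unsigned()`/`signed()` implementation:

```python
# Illustrative two's-complement interpretation of the same 64-bit pattern.
RAW = 0xFFFFFFFFFFFFFFF6  # bit pattern a debugger might read from memory

def as_unsigned(v, bits=64):
    # Mask to the type's width; never sign-extends.
    return v & ((1 << bits) - 1)

def as_signed(v, bits=64):
    # Reinterpret the top bit as a sign bit.
    v &= (1 << bits) - 1
    return v - (1 << bits) if v & (1 << (bits - 1)) else v

print(as_unsigned(RAW))  # → 18446744073709551606
print(as_signed(RAW))    # → -10
```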
272 273### Dividing numbers 274 275Python 2’s **/** operator has two behaviors depending on the types of its arguments (**float** vs. **int**). 276Always use Python 3’s division operator: 277 278``` 279# Switch compiler to use Python 3 semantics 280from __future__ import division 281 282float_val = a / b # This becomes true, fractional division that yields float 283floor_div = a // b # This is floor division, like C 284``` 285 286If the original behavior is required, use **old_div** to get Python 2 behavior: 287 288``` 289from past.utils import old_div 290 291value = old_div(a, b) # Matches Python 2 semantics 292``` 293 294If this isn’t handled correctly, `format` will complain that a float value is being passed to 295a non-float formatting character. Automated scripts that convert from Python 2 to 3 tend to use 296**old_div** during porting. In most cases that is not required. For kernel debugging and integer 297types, `//` is used commonly to match the C’s division behavior for integers. 298 299## Testing changes 300 301There is no perfect test suite to check that macros are producing a correct value compared to what 302the debugger sees in a target. 303 304Be careful when touching common framework code. For larger changes, ask the Platform Triage team to 305validate that the changes work in their environment before integration. 306 307### Coding style 308 309Use a static analyzer like **pylint** or **flake8** to check the macro source code: 310 311``` 312# Python 2 313$ pip install --user pylint flake8 314 315# Python 3 316$ pip install --user pylint flake8 317 318# Run the lint either by setting your path to point to one of the runtimes 319# or through python 320$ python2 -m pylint <src files/dirs> 321$ python3 -m pylint <src files/dirs> 322$ python2 -m flake8 <src files/dirs> 323$ python3 -m flake8 <src files/dirs> 324``` 325 326### Correctness 327 328Ensure the macro matches what LLDB returns from the REPL. 
For example, compare `showproc(xxx)` with `p/x *(proc_t)xxx`.

```
# 1. Run LLDB with debug options set
$ DEBUG_XNU_LLDBMACROS=1 LLDB_DEFAULT_PYTHON_VERSION=2 xcrun -sdk <sdk> lldb -c core <dsympath>/mach_kernel

# 2. Optionally load a modified operating system plugin
(lldb) settings set target.process.python-os-plugin-path <srcpath>/tools/lldbmacros/core/operating_system.py

# 3. Load modified scripts
(lldb) command script import <srcpath>/tools/lldbmacros/xnu.py

# 4. Exercise macros
```

Depending on the change, test other targets and architectures (for instance, both Astris and KDP).

### Regression

This is simpler than the previous step because the goal is to ensure behavior has not changed.
You can speed a few things up by using local symbols:

```
# 1. Get a coredump from a device and the kernel UUID
# 2. Grab symbols with dsymForUUID
$ dsymForUUID --nocache --copyExecutable --copyDestination <dsym path>

# 3. Run lldb with local symbols to avoid dsymForUUID NFS access
$ xcrun -sdk <sdk> lldb -c core <dsym_path>/<kernel image>
```

The actual steps are identical to the previous testing. Redirect each macro run to a different file with
the `-o <outfile>` option. Then run `diff` on the outputs of the baseline and both Python 2 and 3:

* No environment variables to get the baseline
* Python 2 with changes
* Python 3 with changes

There may be different ordering of elements based on internal implementation differences of each
Python runtime. Some macros produce files — check the actual file contents.

It’s difficult to automate this:

* Some macros need arguments which must be found in a core file.
* Some macros take a long time to run against a target (more than 30 minutes). Instead, a core dump
  should be taken and then inspected afterwards, but this ties up a lab device for the duration of the
  test.
* Even with coredumps, testing the macros takes too long in our automation system and triggers the
  failsafe timeout.

### Code coverage

Use code coverage to check which parts of the macros have actually been tested.
Install the **coverage** lib with:

```
$ pip install --user coverage
$ pip3 install --user coverage
```

Then collect coverage:

```
# 1. Start LLDB with your macros as described above.

# 2. Load and start code coverage recording.
(lldb) script import coverage
(lldb) script cov = coverage.Coverage()
(lldb) script cov.start()

# 3. Do the testing.

# 4. Collect the coverage.
(lldb) script cov.stop()
(lldb) script cov.save()
```

You can override the default file (*.coverage*) by adding an additional environment variable to LLDB:

```
$ env COVERAGE_FILE="${OUTDIR}/.coverage.mytest.py2" # usual LLDB command line
```

Combine coverage from multiple files:

```
# Point PATH to the local python where coverage is installed.
$ export PATH="$HOME/Library/Python/3.8/bin:$PATH"

# Use --keep to avoid deletion of the input files after the merge.
$ coverage combine --keep <list of .coverage files or dirs to scan>

# Get an HTML report, or use other subcommands to inspect.
$ coverage html
```

It is possible to start coverage collection **before** importing the operating system library and
loading the macros, to check code run during bootstrapping.

### Performance testing

Some macros can run for a long time. Some code may be costly even if it looks simple, because objects
aren’t cached or too many temporary objects are created. Simple profiling is similar to collecting
code coverage.
First, set up your environment:

```
# Install gprof2dot
$ python3 -m pip install gprof2dot
# Install graphviz
$ brew install graphviz
```

Then, to profile commands, follow this sequence:

```
(lldb) xnudebug profile /tmp/macro.prof showcurrentstacks
[... command outputs ...]

   Ordered by: cumulative time
   List reduced from 468 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   [... profiling output ...]

Profile info saved to "/tmp/macro.prof"
```

Then, to visualize call graphs in context, in a separate shell:

```
# Now convert the file to a colored SVG call graph
$ python3 -m gprof2dot -f pstats /tmp/macro.prof -o /tmp/call.dot
$ dot -O -T svg /tmp/call.dot

# and view it in your favourite viewer
$ open /tmp/call.dot.svg
```

## Debugging your changes

Yes, it is possible to use a debugger to debug your code!

The steps are similar to the testing techniques described above (use the interactive scripting mode). There is
no point in documenting the debugger itself; let’s focus on how to use it on a real-life example. The debugger
used here is PDB, which is part of the Python installation, so it works out of the box.

Problem: Something wrong is going on with the addkext macro. What now?

    (lldb) addkext -N com.apple.driver.AppleT8103PCIeC
    Failed to read MachO for address 18446741875027613136 errormessage: seek to offset 2169512 is outside window [0, 1310]
    Failed to read MachO for address 18446741875033537424 errormessage: seek to offset 8093880 is outside window [0, 1536]
    Failed to read MachO for address 18446741875033568304 errormessage: seek to offset 8124208 is outside window [0, 1536]
    ...
    Fetching dSYM for 049b9a29-2efc-32c0-8a7f-5f29c12b870c
    Adding dSYM (049b9a29-2efc-32c0-8a7f-5f29c12b870c) for /Library/Caches/com.apple.bni.symbols/bursar.apple.com/dsyms/StarE/AppleEmbeddedPCIE/AppleEmbeddedPCIE-502.100.35~3/049B9A29-2EFC-32C0-8A7F-5F29C12B870C/AppleT8103PCIeC
    section '__TEXT' loaded at 0xfffffe001478c780

There is no exception, just a lot of errors and no output. So what next?
Try to narrow the problem down to an isolated piece of macro code:

 1. Try to get the values of globals through regular LLDB commands.
 2. Use interactive mode and invoke functions with arguments directly.

After inspecting the addkext macro code and calling a few functions with arguments directly, we can see that
there is an exception at the end. It was just captured in a try/except block. So the simplified reproducer is:

    (lldb) script
    >>> import lldb
    >>> import xnu
    >>> import macho
    >>> err = lldb.SBError()
    >>> data = xnu.LazyTarget.GetProcess().ReadMemory(0xfffffe0014c0f3f0, 0x000000000001b5d0, err)
    >>> m = macho.MemMacho(data, len(data))
    Traceback (most recent call last):
      File "<console>", line 1, in <module>
      File ".../lldbmacros/macho.py", line 91, in __init__
        self.load(fp)
      File ".../site-packages/macholib/MachO.py", line 133, in load
        self.load_header(fh, 0, size)
      File ".../site-packages/macholib/MachO.py", line 168, in load_header
        hdr = MachOHeader(self, fh, offset, size, magic, hdr, endian)
      File ".../site-packages/macholib/MachO.py", line 209, in __init__
        self.load(fh)
      File ".../lldbmacros/macho.py", line 23, in new_load
        _old_MachOHeader_load(s, fh)
      File ".../site-packages/macholib/MachO.py", line 287, in load
        fh.seek(seg.offset)
      File ".../site-packages/macholib/util.py", line 91, in seek
        self._checkwindow(seekto, "seek")
      File ".../site-packages/macholib/util.py", line 76, in _checkwindow
        raise IOError(
    OSError: seek to offset 9042440 is outside window [0, 112080]
Clearly an external library is involved, and the execution flow jumps between the dSYM and the library a few
times. Let’s try to look around with a debugger.

    (lldb) script
    # Prepare the data variable as described above.

    # Run the last statement under the debugger.
    >>> import pdb
    >>> pdb.run('m = macho.MemMacho(data, len(data))', globals(), locals())
    > <string>(1)<module>()

    # Show the debugger's help
    (Pdb) help

It is not possible to break on an exception. Python uses exceptions a lot, so it is better to put a breakpoint
in the source code. This puts a breakpoint on the IOError exception mentioned above:

    (Pdb) break ~/Library/Python/3.8/lib/python/site-packages/macholib/util.py:76
    Breakpoint 4 at ~/Library/Python/3.8/lib/python/site-packages/macholib/util.py:76

You can now single-step or continue the execution, as usual for a debugger.

    (Pdb) cont
    > /Users/tjedlicka/Library/Python/3.8/lib/python/site-packages/macholib/util.py(76)_checkwindow()
    -> raise IOError(
    (Pdb) bt
      /Volumes/.../Python3.framework/Versions/3.8/lib/python3.8/bdb.py(580)run()
    -> exec(cmd, globals, locals)
      <string>(1)<module>()
      /Volumes/...dSYM/Contents/Resources/Python/lldbmacros/macho.py(91)__init__()
    -> self.load(fp)
      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(133)load()
    -> self.load_header(fh, 0, size)
      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(168)load_header()
    -> hdr = MachOHeader(self, fh, offset, size, magic, hdr, endian)
      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(209)__init__()
    -> self.load(fh)
      /Volumes/...dSYM/Contents/Resources/Python/lldbmacros/macho.py(23)new_load()
    -> _old_MachOHeader_load(s, fh)
      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(287)load()
    -> fh.seek(seg.offset)
      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/util.py(91)seek()
    -> self._checkwindow(seekto, "seek")
    > /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/util.py(76)_checkwindow()
    -> raise IOError(

Now we can move one frame up and inspect the stopped target:

    # Show the current frame's arguments
    (Pdb) up
    (Pdb) a
    self = <fileview [0, 112080] <macho.MemFile object at 0x1075cafd0>>
    offset = 9042440
    whence = 0

    # globals, locals, or expressions
    (Pdb) p type(seg.offset)
    <class 'macholib.ptypes.p_uint32'>
    (Pdb) p hex(seg.offset)
    '0x89fa08'

    # Find attributes of a Python object.
    (Pdb) p dir(section_cls)
    ['__class__', '__cmp__', ... ,'reserved3', 'sectname', 'segname', 'size', 'to_fileobj', 'to_mmap', 'to_str']
    (Pdb) p section_cls.sectname
    <property object at 0x1077bbef0>

Unfortunately everything looks correct, but there is actually one interesting frame in the stack: the one which
provides the offset to the seek method. Let’s see where we are in the source code.

    (Pdb) up
    > /Users/tjedlicka/Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(287)load()
    -> fh.seek(seg.offset)
    (Pdb) list
    282                 not_zerofill = (seg.flags & S_ZEROFILL) != S_ZEROFILL
    283                 if seg.offset > 0 and seg.size > 0 and not_zerofill:
    284                     low_offset = min(low_offset, seg.offset)
    285                 if not_zerofill:
    286                     c = fh.tell()
    287  ->                 fh.seek(seg.offset)
    288                     sd = fh.read(seg.size)
    289                     seg.add_section_data(sd)
    290                     fh.seek(c)
    291             segs.append(seg)
    292         # data is a list of segments

Running the debugger on a working case and stepping through the load() method shows that this code is not
present there. That means we are broken by a library update! Older versions of the library do not load data
for a section.
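As a closing tip: when the failing statement has just raised in the script console, pdb’s post-mortem mode can drop you straight into the raising frame without re-running anything under `pdb.run`. A self-contained sketch (`failing_load` is a stand-in for the `macho.MemMacho(...)` reproducer above):

```python
import pdb
import sys

def failing_load():
    # Stand-in for the macho.MemMacho(data, len(data)) call.
    raise IOError("seek to offset 9042440 is outside window [0, 112080]")

try:
    failing_load()
except IOError:
    tb = sys.exc_info()[2]
    # pdb.post_mortem(tb) would open an interactive (Pdb) prompt in the frame
    # that raised; it is commented out here to keep the sketch non-interactive.
    # pdb.post_mortem(tb)
    print(tb.tb_next.tb_frame.f_code.co_name)  # → failing_load
```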