xref: /xnu-8792.41.9/doc/debugging.md (revision 5c2921b07a2480ab43ec66f5b9e41cb872bc554f)
1# XNU debugging
2
3xnu’s debugging macros are compatible with both Python 2 and 3. In practice, this means that Python 3
4features are unavailable and some Python 2 syntax is not allowed. Unfortunately, any syntax error will
5prevent use of all the macros, as they’re all imported into the same scripting environment.
6
7## Compatibility
8
9Avoid introducing specific compatibility shims, as there are a few existing ones that come with
10Python 2 and 3:
11
12* **six** has helpers that work in both Python 2 and 3, for things like the string type change
13* **future** backports features from Python 3 to Python 2
14
15For example, Python 2 contains **range** and **xrange**. Python 3 contains only **range** which has
16**xrange** semantics. The simplest solution is to port your code and use Python 3 way:
17
18```
19# Use backported range from Python 3
20from builtins import range
21
22# Use range on both Python 2/3 runtimes
23for x in range(....):
24   ....
25```
26
27Be very careful about using imports from 'future' library. Some of them are **very invasive** and change
28behavior of your code. This may cause strange runtime errors. For example:
29
30```
31# Changes modules handling logic to make your code working with std library reorg (PEP 3108)
32from future import standard_library
33standard_library.install_aliases()
34
35# Replaces lot of common types like str with future's Python 3 backports.
36from builtins import *
37```
38
39## Handling strings
40
41Macros use strings produced from the LLDB runtime. They must use **six** when doing certain operations
42to avoid exceptions. Until the transition is done, these canonical ways of dealing with strings cannot
43be used:
44
45* Using Unicode literals by default:
46   `from __future__ import unicode_literals`
47* **f-strings**
48
49Some advice:
50
51* Use byte strings explicitly when dealing with memory and not strings:
52  `b'string'`
53* Always properly encode/decode raw data to/from strings before passing it around, with `six.ensure_str` or
54  `six.ensure_bytes`.
55
56Improperly-typed strings will raise *different* exceptions on each runtime.
57
58* Python 2 raises codec exceptions when printing strings.
59* Python 3 complains about concatenation of objects of incompatible types (bytes and strings).
60
61### No convenient, common string type
62
63While it is possible to use future’s **newstr** to backport new string type to Python 3, there are
64issues with the Scripting Bridge (SB) API from LLDB. Python 3 will work out of the box but Python 2
65will complain because **newstr** maps to **unicode**. SB exposes **const char \*** as a native string,
66or just **str** in Python 2. For Python 2 we would have to explicitly encode all Unicode strings
67before calling the API.
68
69Another problem is that literals in form `'string'` are no longer compatible with unicode and need
70to be switched to `u'string'`. This can be changed with single import at the top of the file, but
71in some scenarios byte strings are expected. That change would require checking all strings in the
72code and changing some back to  `b'string'`.
73
74Here’s an example of just how pervasive a change would be because this code would break in Python 2:
75
76```
77from xnu import *
78
79@lldb_type_summary(['type'])
80def print_summary():
81   ....
82```
83
84The result is that we have non-unicode literal being registered with unicode API in Python 3.
85Unfortunately `'type' != b'type'` and thus LLDB will never match the type when printing summaries.
86
87Using native strings and literals allows for only minimal code changes to the macros that are still
88compatible with other projects using Python 2.
89
90### Check that an object is a string
91
92Avoid testing for `str` explicitly like `type(obj) == str`. This won’t work correctly as Python 2
93has multiple string types (`unicode`, `str`). Additionally, compatibility shims might introduce new
94string types.
95
96Instead, always use an inheritance-sensitive like like `isinstance(obj, six.string_types)`.
97
98### Dealing with binary data
99
100Python 2 bytes and strings are the same thing. This was the wrong design decision and Python 3
101(wisely) switched to using a separate type for human text. This lack of distinction in Python 2
102caused many programming errors, so it’s recommended to use **bytearray**, **bytes**, and
103**memoryviews** instead of a string. If a string is really required, encode the raw data explicitly
104using an escape method.
105
106### Accessing large amounts of binary data (or accessing small amounts frequently)
107
108In case you're planning on accessing large contiguous blocks of memory (e.g. reading a whole 10KB of memory),
109or you're accessing small semi-contiguous chunks (e.g. if you're parsing large structured data), then it might
110be hugely beneficial performance-wise to make use of the `io.SBProcessRawIO` class. Furthermore, if you're in
111a hurry and just want to read one specific chunk once, then it might be easier to use `LazyTarget.GetProcess().ReadMemory()`
112directly.
113
114In other words, avoid the following:
115
116```
117data_ptr = kern.GetValueFromAddress(start_addr, 'uint8_t *')
118with open(filepath, 'wb') as f:
119    f.write(data_ptr[:4096])
120```
121
122And instead use:
123
124```
125from core.io import SBProcessRawIO
126import shutil
127
128io_access = SBProcessRawIO(LazyTarget.GetProcess(), start_addr, 4096)
129with open(filepath, 'wb') as f:
130    shutil.copyfileobj(io_access, f)
131```
132
133Or, if you're in a hurry:
134
135```
136err = lldb.SBError()
137my_data = LazyTarget.GetProcess().ReadMemory(start_addr, length, err)
138if err.Success():
139    # Use my precious data
140    pass
141```
142
143For small semi-contiguous chunks, you can map the whole region and access random chunks from it like so:
144
145```
146from core.io import SBProcessRawIO
147
148io_access = SBProcessRawIO(LazyTarget.GetProcess(), start_addr, size)
149io_access.seek(my_struct_offset)
150my_struct_contents = io_access.read(my_struct_size)
151```
152
153Not only that, but you can also tack on a BufferedRandom class on top of the SBProcessRawIO instance, which
154provides you with buffering (aka caching) in case your random small chunk accesses are repeated:
155
156```
157from core.io import SBProcessRawIO
158from io import BufferedRandom
159
160io_access = SBProcessRawIO(LazyTarget.GetProcess(), start_addr, size)
161buffered_io = BufferedRandom(io_access)
162# And then use buffered_io for your accesses
163```
164
165### Encoding data to strings and back
166
167The simplest solution is to use **six** library and one of the functions like:
168
169```
170mystring = six.ensure_str(object)
171```
172
173This ensures the resulting value is a native string. It deals with Unicode in Python 2 automatically.
174The six library is still required even if data is encoding manually, since it converts types.
175
176```
177from builtins import bytes
178str = six.ensure_str(bytes.decode('utf-8'))
179```
180
181When converting data to a string, add an encoding type so Python knows how handle raw bytes. In most
182cases **utf-8** will work but be careful to be sure that the encoding matches your data.
183
184There are two options to consider when trying to get a string out of the raw data without knowing if
185they are valid string or not:
186
187* **lossy conversion** - escapes all non-standard characters in form of ‘\xNNN’
188* **lossless conversion** - maps invalid characters to special unicode range so it can reconstruct
189the string precisely
190
191Which to use depends on the transformation goals. The lossy conversion produces a printable string
192with strange characters in it. The lossless option is meant to be used when a string is only a transport
193mechanism and needs to be converted back to original values later.
194
195Switch the method by using `errors` handler during conversion:
196
197```
198# Lossy escapes invalid chars
199b.decode('utf-8', errors='`backslashreplace'`)
200# Lossy removes invalid chars
201b.decode('utf-8', errors='ignore')
202# Loss-less but may likely fail to print()
203b.decode('utf-8', errors='surrogateescape')
204```
205
206## Handling numbers
207
208Numeric types are incompatible between Python 2 and 3:
209
210* **long** is not available in Python 3.
211* **int** is the only integral type in Python 3 and hasunlimited precission (but 32-bits in Python 2).
212
213This creates all sorts of issues with macros. Follow these rules to make integral types compatible
214in both modes:
215
216* Do not use **long** — replace it with **int**.
217* When using the **value** class, types will be promoted to **long** as there is special number
218handling in the xnu macro library. Remaining code should be reviewed and fixed, if appropriate.
219* Avoid relying on sign extension.
220* Always switch Python to use Python 3 division, where `/` converts to floating point and does
221a fractional division `//` is a floor division (like integers in C):
222   `from __future__ import division
223   `
224* Use division operators according to Python 3 rules.
225
226### Common integer representation
227
228The goal is to always use Python 3’s integer handling, which means using **int** everywhere.
229
230xnu’s macros provide a custom integer type called **valueint** that is a replacement for **int**
231in the Python 2 runtime. That means it behaves almost like **int** from Python 3. When importing
232from macros this type replaces any use of **int**:
233
234```
235# Replaces all int()s to be valueint
236from xnu import *
237from xnu import int
238
239# Does not replace int()s
240import xnu
241from xnu import a, b, c
242```
243
244Avoid using `from builtins import int` suggested on the internet. It does not work correctly with
245xnu’s **value** class. The **valueint** class inherits from **newint** and fixes problematic behavior.
246
247This impacts the way an object is checked for being an integer. Be careful about following constructs:
248
249```
250# BAD: generally not a good way to do type checking in Python
251if type(obj) is int:
252
253# BAD: int may have been replaced with valueint.
254if isinstance(obj, int):
255```
256
257Instead, use the base integral type:
258
259```
260if isinstance(obj, numbers.Integral):
261```
262
263### Dealing with signed numbers
264
265Original code was using two operators to convert **value** class instance to number:
266
267* **__int__** produced **int** and was either signed or unsigned based on underlying SBType.
268* **__long__** was always signed.
269
270This is confusing when dealing with types. Always use **unsigned()** or **signed()** regardless of
271what the actual underlying type is to ensure that macros use the correct semantics.
272
273### Dividing numbers
274
275Python 2’s **/** operator has two behaviors depending on the types of its arguments (**float** vs. **int**).
276Always use Python 3’s division operator:
277
278```
279# Switch compiler to use Python 3 semantics
280from __future__ import division
281
282float_val = a / b  # This becomes true, fractional division that yields float
283floor_div = a // b # This is floor division, like C
284```
285
286If the original behavior is required, use **old_div** to get Python 2 behavior:
287
288```
289from past.utils import old_div
290
291value = old_div(a, b)     # Matches Python 2 semantics
292```
293
294If this isn’t handled correctly, `format` will complain that a float value is being passed to
295a non-float formatting character. Automated scripts that convert from Python 2 to 3 tend to use
296**old_div** during porting. In most cases that is not required. For kernel debugging and integer
297types, `//` is used commonly to match the C’s division behavior for integers.
298
299## Testing changes
300
301There is no perfect test suite to check that macros are producing a correct value compared to what
302the debugger sees in a target.
303
304Be careful when touching common framework code. For larger changes, ask the Platform Triage team to
305validate that the changes work in their environment before integration.
306
307### Coding style
308
309Use a static analyzer like **pylint** or **flake8** to check the macro source code:
310
311```
312# Python 2
313$ pip install --user pylint flake8
314
315# Python 3
316$ pip install --user pylint flake8
317
318# Run the lint either by setting your path to point to one of the runtimes
319# or through python
320$ python2 -m pylint <src files/dirs>
321$ python3 -m pylint <src files/dirs>
322$ python2 -m flake8 <src files/dirs>
323$ python3 -m flake8 <src files/dirs>
324```
325
326### Correctness
327
328Ensure the macro matches what LLDB returns from the REPL. For example, compare `showproc(xxx)` with `p/x *(proc_t)xxx`.
329
330```
331# 1. Run LLDB with debug options set
332$ DEBUG_XNU_LLDBMACROS=1 LLDB_DEFAULT_PYTHON_VERSION=2 xcrun -sdk <sdk> lldb -c core <dsympath>/mach_kernel
333
334# 2. Optionally load modified operating system plugin
335(lldb) settings set target.process.python-os-plugin-path <srcpath>/tools/lldbmacros/core/operating_system.py
336
337# 3. Load modified scripts
338(lldb) command script import <srcpath>/tools/lldbmacros/xnu.py
339
340# 4. Exercise macros
341```
342
343Depending on the change, test other targets and architectures (for instance, both Astris and KDP).
344
345### Regression
346
347This is simpler than previous step because the goal is to ensure behavior has not changed.
348You can speed up few things by using local symbols:
349
350```
351# 1. Get a coredump from a device and kernel UUID
352# 2. Grab symbols with dsymForUUID
353$ dsymForUUID --nocache --copyExecutable --copyDestination <dsym path>
354
355# 3. Run lldb with local symbols to avoid dsymForUUID NFS
356
357$ xcrun -sdk <sdk> lldb -c core <dsym_path>/<kernel image>
358```
359
360The actual steps are identical to previous testing. Run of a macro to different file with `-o <outfile>`
361option. Then run `diff` on the outputs of the baseline and both Python 2 and 3:
362
363* No environment variables to get baseline
364* Python 2 with changes
365* Python 3 with changes
366
367There may be different ordering of elements based on internal implementation differences of each
368Python runtime. Some macros produce files — check the actual file contents.
369
370It’s difficult to make this automated:
371
372* Some macros needs arguments which must be found in a core file.
373* Some macros take a long time to run against a target (more than 30 minutes). Instead, a core dump
374  should be taken and then inspected afterwards, but this ties up a lab device for the duration of the
375  test.
376* Even with coredumps, testing the macros takes too long in our automation system and triggers the
377  failsafe timeout.
378
379### Code coverage
380
381Use code coverage to check which parts of macros have actually been tested.
382Install **coverage** lib with:
383
384```
385$ pip install --user coverage
386$ pip3 install --user coverage
387```
388
389Then collect coverage:.
390
391```
392# 1. Start LLDB with your macros as described above.
393
394# 2. Load and start code coverage recording.
395(lldb) script import coverage
396(lldb) script cov = coverage.Coverage()
397(lldb) script cov.start()
398
399# 3. Do the testing.
400
401# 4. Collect the coverage.
402(lldb) script cov.stop()
403(lldb) script cov.save()
404```
405
406You can override the default file (*.coverage*) by adding an additional environment variable to LLDB:
407
408```
409$ env COVERAGE_FILE="${OUTDIR}/.coverage.mytest.py2" # usual LLDB command line
410```
411
412Combine coverage from multiple files:
413
414```
415# Point PATH to local python where coverage is installed.
416$ export PATH="$HOME/Library/Python/3.8/bin:$PATH"
417
418# Use --keep to avoid deletion of input files after merge.
419$ coverage combine --keep <list of .coverage files or dirs to scan>
420
421# Get HTML report or use other subcommands to inspect.
422$ coverage html
423```
424
425It is possible to start coverage collection **before** importing the operating system library and
426loading macros to check code run during bootstrapping.
427
428### Performance testing
429
430Some macros can run for a long time. Some code may be costly even if it looks simple because objects
431aren’t cached or too many temporary objects are created. Simple profiling is similar to collecting
432code coverage.
433
434Run this in LLDB to get a profile:
435
436```
437# Python 2 example (Python 3 is slightly different)
438(lldb) script import cProfile, pstats, StringIO
439(lldb) script pr = cProfile.Profile()
440(lldb) script pr.enable()
441# Run the macro here:
442(lldb) showcurrentstacks
443(lldb) script pr.disable()
444(lldb) script s = StringIO.StringIO()
445(lldb) script ps = pstats.Stats(pr, stream=s).sort_stats('cumulative')
446(lldb) script ps.print_stats()
447(lldb) script print(s.getvalue())
448```
449
450This will use a function call profiler to collect information about which functions took the most
451time during the macro’s execution. For example:
452
453```
454
455        1292170 function calls (1291646 primitive calls) in 3.425 seconds
456
457  Ordered by: cumulative time
458
459  ncalls  tottime  percall  cumtime  percall filename:lineno(function)
460       1    0.000    0.000    3.424    3.424 <src>/tools/lldbmacros/xnu.py:104(_internal_command_function)
461       1    0.000    0.000    3.424    3.424 <src>/tools/lldbmacros/process.py:1389(ShowCurrentStacks)
462       6    0.002    0.000    2.031    0.338 <src>/tools/lldbmacros/xnu.py:358(GetThreadBackTrace)
463     467    0.003    0.000    1.969    0.004 <src>/tools/lldbmacros/core/cvalue.py:464(cast)
464       1    0.000    0.000    1.757    1.757 <src>/tools/lldbmacros/xnu.py:323(GetKextSymbolInfo)
465       1    0.006    0.006    1.756    1.756 <src>/tools/lldbmacros/memory.py:2181(GetKextLoadInformation)
466     256    0.000    0.000    1.711    0.007 <src>/tools/lldbmacros/utils.py:142(Cast)
467     473    0.002    0.000    1.365    0.003 <src>/tools/lldbmacros/core/cvalue.py:500(gettype)
468      30    0.000    0.000    1.342    0.045 .../LLDB.framework/Resources/Python2/lldb/__init__.py:10442(FindTypes)
469      30    1.342    0.045    1.342    0.045 {lldb._lldb.SBTarget_FindTypes}
470       6    0.000    0.000    1.129    0.188 <src>/tools/lldbmacros/process.py:324(GetThreadSummary)
471       6    0.000    0.000    1.106    0.184 <src>/tools/lldbmacros/process.py:312(GetThreadName)
472     210    0.027    0.000    0.634    0.003 <src>/tools/lldbmacros/memory.py:2123(GetKmodWithAddr)
473     467    0.005    0.000    0.600    0.001 <src>/tools/lldbmacros/core/cvalue.py:343(_GetValueAsCast)
474```
475
476## Debugging your changes
477
478YES, It is possible to use a debugger to debug your code!
479
480The steps are similar to testing techniques described above (use scrpting interactive mode). There is no point to
481document the debugger itself. Lets focus on how to use it on a real life example. The debugger used here is PDB which
482is part of Python installation so works out of the box.
483
484Problem: Something wrong is going on with addkext macro. What now?
485
486    (lldb) addkext -N com.apple.driver.AppleT8103PCIeC
487    Failed to read MachO for address 18446741875027613136 errormessage: seek to offset 2169512 is outside window [0, 1310]
488    Failed to read MachO for address 18446741875033537424 errormessage: seek to offset 8093880 is outside window [0, 1536]
489    Failed to read MachO for address 18446741875033568304 errormessage: seek to offset 8124208 is outside window [0, 1536]
490	...
491	Fetching dSYM for 049b9a29-2efc-32c0-8a7f-5f29c12b870c
492    Adding dSYM (049b9a29-2efc-32c0-8a7f-5f29c12b870c) for /Library/Caches/com.apple.bni.symbols/bursar.apple.com/dsyms/StarE/AppleEmbeddedPCIE/AppleEmbeddedPCIE-502.100.35~3/049B9A29-2EFC-32C0-8A7F-5F29C12B870C/AppleT8103PCIeC
493    section '__TEXT' loaded at 0xfffffe001478c780
494
495There is no exception, lot of errors and no output. So what next?
496Try to narrow the problem down to an isolated piece of macro code:
497
498  1. Try to get values of globals through regular LLDB commands
499  2. Use interactive mode and invoke functions with arguments directly.
500
501After inspecting addkext macro code and calling few functions with arguments directly we can see that there is an
502exception in the end. It was just captured in try/catch block. So the simplified reproducer is:
503
504    (lldb) script
505	>>> import lldb
506	>>> import xnu
507	>>> err = lldb.SBError()
508	>>> data = xnu.LazyTarget.GetProcess().ReadMemory(0xfffffe0014c0f3f0, 0x000000000001b5d0, err)
509    >>> m = macho.MemMacho(data, len(data))
510    Traceback (most recent call last):
511      File "<console>", line 1, in <module>
512      File ".../lldbmacros/macho.py", line 91, in __init__
513        self.load(fp)
514      File ".../site-packages/macholib/MachO.py", line 133, in load
515        self.load_header(fh, 0, size)
516      File ".../site-packages/macholib/MachO.py", line 168, in load_header
517        hdr = MachOHeader(self, fh, offset, size, magic, hdr, endian)
518      File ".../site-packages/macholib/MachO.py", line 209, in __init__
519        self.load(fh)
520      File ".../lldbmacros/macho.py", line 23, in new_load
521        _old_MachOHeader_load(s, fh)
522      File ".../site-packages/macholib/MachO.py", line 287, in load
523        fh.seek(seg.offset)
524      File ".../site-packages/macholib/util.py", line 91, in seek
525        self._checkwindow(seekto, "seek")
526      File ".../site-packages/macholib/util.py", line 76, in _checkwindow
527        raise IOError(
528    OSError: seek to offset 9042440 is outside window [0, 112080]
529
530Clearly an external library is involved and execution flow jumps between dSYM and the library few times.
531Lets try to look around with a debugger.
532
533    (lldb) script
534	# Prepare data variable as described above.
535
536	# Run last statement with debugger.
537	>>> import pdb
538	>>> pdb.run('m = macho.MemMacho(data, len(data))', globals(), locals())
539	> <string>(1)<module>()
540
541	# Show debugger's help
542	(Pdb) help
543
544It is not possible to break on exception. Python uses them a lot so it is better to put a breakpoint to source
545code. This puts breakpoint on the IOError exception mentioned above.
546
547	(Pdb) break ~/Library/Python/3.8/lib/python/site-packages/macholib/util.py:76
548    Breakpoint 4 at ~/Library/Python/3.8/lib/python/site-packages/macholib/util.py:76
549
550You can now single step or continue the execution as usuall for a debugger.
551
552    (Pdb) cont
553    > /Users/tjedlicka/Library/Python/3.8/lib/python/site-packages/macholib/util.py(76)_checkwindow()
554    -> raise IOError(
555    (Pdb) bt
556      /Volumes/.../Python3.framework/Versions/3.8/lib/python3.8/bdb.py(580)run()
557    -> exec(cmd, globals, locals)
558      <string>(1)<module>()
559      /Volumes/...dSYM/Contents/Resources/Python/lldbmacros/macho.py(91)__init__()
560    -> self.load(fp)
561      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(133)load()
562    -> self.load_header(fh, 0, size)
563      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(168)load_header()
564    -> hdr = MachOHeader(self, fh, offset, size, magic, hdr, endian)
565      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(209)__init__()
566    -> self.load(fh)
567      /Volumes/...dSYM/Contents/Resources/Python/lldbmacros/macho.py(23)new_load()
568    -> _old_MachOHeader_load(s, fh)
569      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(287)load()
570    -> fh.seek(seg.offset)
571      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/util.py(91)seek()
572    -> self._checkwindow(seekto, "seek")
573    > /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/util.py(76)_checkwindow()
574    -> raise IOError(
575
576
577Now we can move a frame above and inspect stopped target:
578
579    # Show current frame arguments
580    (Pdb) up
581    (Pdb) a
582    self = <fileview [0, 112080] <macho.MemFile object at 0x1075cafd0>>
583    offset = 9042440
584    whence = 0
585
586    # globals, local or expressons
587    (Pdb) p type(seg.offset)
588    <class 'macholib.ptypes.p_uint32'>
589    (Pdb) p hex(seg.offset)
590    '0x89fa08'
591
592    # Find attributes of a Python object.
593    (Pdb) p dir(section_cls)
594    ['__class__', '__cmp__', ... ,'reserved3', 'sectname', 'segname', 'size', 'to_fileobj', 'to_mmap', 'to_str']
595    (Pdb) p section_cls.sectname
596    <property object at 0x1077bbef0>
597
598Unfortunately everything looks correct but there is actually one ineteresting frame in the stack. The one which
599provides the offset to the seek method. Lets see where we are in the source code.
600
601    (Pdb) up
602    > /Users/tjedlicka/Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(287)load()
603    -> fh.seek(seg.offset)
604    (Pdb) list
605    282  	                        not_zerofill = (seg.flags & S_ZEROFILL) != S_ZEROFILL
606    283  	                        if seg.offset > 0 and seg.size > 0 and not_zerofill:
607    284  	                            low_offset = min(low_offset, seg.offset)
608    285  	                        if not_zerofill:
609    286  	                            c = fh.tell()
610    287  ->	                            fh.seek(seg.offset)
611    288  	                            sd = fh.read(seg.size)
612    289  	                            seg.add_section_data(sd)
613    290  	                            fh.seek(c)
614    291  	                        segs.append(seg)
615    292  	                # data is a list of segments
616
617Running debugger on working case and stepping through the load() method shows that this code is not present.
618That means we are broken by a library update! Older versions of library do not load data for a section.
619