xref: /xnu-8020.101.4/doc/debugging.md (revision e7776783b89a353188416a9a346c6cdb4928faad)
1# XNU debugging
2
3xnu’s debugging macros are compatible with both Python 2 and 3. In practice, this means that Python 3
4features are unavailable and some Python 2 syntax is not allowed. Unfortunately, any syntax error will
5prevent use of all the macros, as they’re all imported into the same scripting environment.
6
7## Compatibility
8
9Avoid introducing specific compatibility shims, as there are a few existing ones that come with
10Python 2 and 3:
11
12* **six** has helpers that work in both Python 2 and 3, for things like the string type change
13* **future** backports features from Python 3 to Python 2
14
15For example, Python 2 contains **range** and **xrange**. Python 3 contains only **range** which has
16**xrange** semantics. The simplest solution is to port your code and use Python 3 way:
17
18```
19# Use backported range from Python 3
20from builtins import range
21
22# Use range on both Python 2/3 runtimes
23for x in range(....):
24   ....
25```
26
27Be very careful about using imports from 'future' library. Some of them are **very invasive** and change
28behavior of your code. This may cause strange runtime errors. For example:
29
30```
31# Changes modules handling logic to make your code working with std library reorg (PEP 3108)
32from future import standard_library
33standard_library.install_aliases()
34
35# Replaces lot of common types like str with future's Python 3 backports.
36from builtins import *
37```
38
39## Handling strings
40
41Macros use strings produced from the LLDB runtime. They must use **six** when doing certain operations
42to avoid exceptions. Until the transition is done, these canonical ways of dealing with strings cannot
43be used:
44
45* Using Unicode literals by default:
46   `from __future__ import unicode_literals`
47* **f-strings**
48
49Some advice:
50
51* Use byte strings explicitly when dealing with memory and not strings:
52  `b'string'`
53* Always properly encode/decode raw data to/from strings before passing it around, with `six.ensure_str` or
54  `six.ensure_bytes`.
55
56Improperly-typed strings will raise *different* exceptions on each runtime.
57
58* Python 2 raises codec exceptions when printing strings.
59* Python 3 complains about concatenation of objects of incompatible types (bytes and strings).
60
61### No convenient, common string type
62
63While it is possible to use future’s **newstr** to backport new string type to Python 3, there are
64issues with the Scripting Bridge (SB) API from LLDB. Python 3 will work out of the box but Python 2
65will complain because **newstr** maps to **unicode**. SB exposes **const char \*** as a native string,
66or just **str** in Python 2. For Python 2 we would have to explicitly encode all Unicode strings
67before calling the API.
68
69Another problem is that literals in form `'string'` are no longer compatible with unicode and need
70to be switched to `u'string'`. This can be changed with single import at the top of the file, but
71in some scenarios byte strings are expected. That change would require checking all strings in the
72code and changing some back to  `b'string'`.
73
74Here’s an example of just how pervasive a change would be because this code would break in Python 2:
75
76```
77from xnu import *
78
79@lldb_type_summary(['type'])
80def print_summary():
81   ....
82```
83
84The result is that we have non-unicode literal being registered with unicode API in Python 3.
85Unfortunately `'type' != b'type'` and thus LLDB will never match the type when printing summaries.
86
87Using native strings and literals allows for only minimal code changes to the macros that are still
88compatible with other projects using Python 2.
89
90### Check that an object is a string
91
92Avoid testing for `str` explicitly like `type(obj) == str`. This won’t work correctly as Python 2
93has multiple string types (`unicode`, `str`). Additionally, compatibility shims might introduce new
94string types.
95
96Instead, always use an inheritance-sensitive like like `isinstance(obj, six.string_types)`.
97
98### Dealing with binary data
99
100Python 2 bytes and strings are the same thing. This was the wrong design decision and Python 3
101(wisely) switched to using a separate type for human text. This lack of distinction in Python 2
102caused many programming errors, so it’s recommended to use **bytearray**, **bytes**, and
103**memoryviews** instead of a string. If a string is really required, encode the raw data explicitly
104using an escape method.
105
106### Encoding data to strings and back
107
108The simplest solution is to use **six** library and one of the functions like:
109
110```
111mystring = six.ensure_str(object)
112```
113
114This ensures the resulting value is a native string. It deals with Unicode in Python 2 automatically.
115The six library is still required even if data is encoding manually, since it converts types.
116
117```
118from builtins import bytes
119str = six.ensure_str(bytes.decode('utf-8'))
120```
121
122When converting data to a string, add an encoding type so Python knows how handle raw bytes. In most
123cases **utf-8** will work but be careful to be sure that the encoding matches your data.
124
125There are two options to consider when trying to get a string out of the raw data without knowing if
126they are valid string or not:
127
128* **lossy conversion** - escapes all non-standard characters in form of ‘\xNNN’
129* **lossless conversion** - maps invalid characters to special unicode range so it can reconstruct
130the string precisely
131
132Which to use depends on the transformation goals. The lossy conversion produces a printable string
133with strange characters in it. The lossless option is meant to be used when a string is only a transport
134mechanism and needs to be converted back to original values later.
135
136Switch the method by using `errors` handler during conversion:
137
138```
139# Lossy escapes invalid chars
140b.decode('utf-8', errors='`backslashreplace'`)
141# Lossy removes invalid chars
142b.decode('utf-8', errors='ignore')
143# Loss-less but may likely fail to print()
144b.decode('utf-8', errors='surrogateescape')
145```
146
147## Handling numbers
148
149Numeric types are incompatible between Python 2 and 3:
150
151* **long** is not available in Python 3.
152* **int** is the only integral type in Python 3 and hasunlimited precission (but 32-bits in Python 2).
153
154This creates all sorts of issues with macros. Follow these rules to make integral types compatible
155in both modes:
156
157* Do not use **long** — replace it with **int**.
158* When using the **value** class, types will be promoted to **long** as there is special number
159handling in the xnu macro library. Remaining code should be reviewed and fixed, if appropriate.
160* Avoid relying on sign extension.
161* Always switch Python to use Python 3 division, where `/` converts to floating point and does
162a fractional division `//` is a floor division (like integers in C):
163   `from __future__ import division
164   `
165* Use division operators according to Python 3 rules.
166
167### Common integer representation
168
169The goal is to always use Python 3’s integer handling, which means using **int** everywhere.
170
171xnu’s macros provide a custom integer type called **valueint** that is a replacement for **int**
172in the Python 2 runtime. That means it behaves almost like **int** from Python 3. When importing
173from macros this type replaces any use of **int**:
174
175```
176# Replaces all int()s to be valueint
177from xnu import *
178from xnu import int
179
180# Does not replace int()s
181import xnu
182from xnu import a, b, c
183```
184
185Avoid using `from builtins import int` suggested on the internet. It does not work correctly with
186xnu’s **value** class. The **valueint** class inherits from **newint** and fixes problematic behavior.
187
188This impacts the way an object is checked for being an integer. Be careful about following constructs:
189
190```
191# BAD: generally not a good way to do type checking in Python
192if type(obj) is int:
193
194# BAD: int may have been replaced with valueint.
195if isinstance(obj, int):
196```
197
198Instead, use the base integral type:
199
200```
201if isinstance(obj, numbers.Integral):
202```
203
204### Dealing with signed numbers
205
206Original code was using two operators to convert **value** class instance to number:
207
208* **__int__** produced **int** and was either signed or unsigned based on underlying SBType.
209* **__long__** was always signed.
210
211This is confusing when dealing with types. Always use **unsigned()** or **signed()** regardless of
212what the actual underlying type is to ensure that macros use the correct semantics.
213
214### Dividing numbers
215
216Python 2’s **/** operator has two behaviors depending on the types of its arguments (**float** vs. **int**).
217Always use Python 3’s division operator:
218
219```
220# Switch compiler to use Python 3 semantics
221from __future__ import division
222
223float_val = a / b  # This becomes true, fractional division that yields float
224floor_div = a // b # This is floor division, like C
225```
226
227If the original behavior is required, use **old_div** to get Python 2 behavior:
228
229```
230from past.utils import old_div
231
232value = old_div(a, b)     # Matches Python 2 semantics
233```
234
235If this isn’t handled correctly, `format` will complain that a float value is being passed to
236a non-float formatting character. Automated scripts that convert from Python 2 to 3 tend to use
237**old_div** during porting. In most cases that is not required. For kernel debugging and integer
238types, `//` is used commonly to match the C’s division behavior for integers.
239
240## Testing changes
241
242There is no perfect test suite to check that macros are producing a correct value compared to what
243the debugger sees in a target.
244
245Be careful when touching common framework code. For larger changes, ask the Platform Triage team to
246validate that the changes work in their environment before integration.
247
248### Coding style
249
250Use a static analyzer like **pylint** or **flake8** to check the macro source code:
251
252```
253# Python 2
254$ pip install --user pylint flake8
255
256# Python 3
257$ pip install --user pylint flake8
258
259# Run the lint either by setting your path to point to one of the runtimes
260# or through python
261$ python2 -m pylint <src files/dirs>
262$ python3 -m pylint <src files/dirs>
263$ python2 -m flake8 <src files/dirs>
264$ python3 -m flake8 <src files/dirs>
265```
266
267### Correctness
268
269Ensure the macro matches what LLDB returns from the REPL. For example, compare `showproc(xxx)` with `p/x *(proc_t)xxx`.
270
271```
272# 1. Run LLDB with debug options set
273$ DEBUG_XNU_LLDBMACROS=1 LLDB_DEFAULT_PYTHON_VERSION=2 xcrun -sdk <sdk> lldb -c core <dsympath>/mach_kernel
274
275# 2. Optionally load modified operating system plugin
276(lldb) settings set target.process.python-os-plugin-path <srcpath>/tools/lldbmacros/core/operating_system.py
277
278# 3. Load modified scripts
279(lldb) command script import <srcpath>/tools/lldbmacros/xnu.py
280
281# 4. Exercise macros
282```
283
284Depending on the change, test other targets and architectures (for instance, both Astris and KDP).
285
286### Regression
287
288This is simpler than previous step because the goal is to ensure behavior has not changed.
289You can speed up few things by using local symbols:
290
291```
292# 1. Get a coredump from a device and kernel UUID
293# 2. Grab symbols with dsymForUUID
294$ dsymForUUID --nocache --copyExecutable --copyDestination <dsym path>
295
296# 3. Run lldb with local symbols to avoid dsymForUUID NFS
297
298$ xcrun -sdk <sdk> lldb -c core <dsym_path>/<kernel image>
299```
300
301The actual steps are identical to previous testing. Run of a macro to different file with `-o <outfile>`
302option. Then run `diff` on the outputs of the baseline and both Python 2 and 3:
303
304* No environment variables to get baseline
305* Python 2 with changes
306* Python 3 with changes
307
308There may be different ordering of elements based on internal implementation differences of each
309Python runtime. Some macros produce files — check the actual file contents.
310
311It’s difficult to make this automated:
312
313* Some macros needs arguments which must be found in a core file.
314* Some macros take a long time to run against a target (more than 30 minutes). Instead, a core dump
315  should be taken and then inspected afterwards, but this ties up a lab device for the duration of the
316  test.
317* Even with coredumps, testing the macros takes too long in our automation system and triggers the
318  failsafe timeout.
319
320### Code coverage
321
322Use code coverage to check which parts of macros have actually been tested.
323Install **coverage** lib with:
324
325```
326$ pip install --user coverage
327$ pip3 install --user coverage
328```
329
330Then collect coverage:.
331
332```
333# 1. Start LLDB with your macros as described above.
334
335# 2. Load and start code coverage recording.
336(lldb) script import coverage
337(lldb) script cov = coverage.Coverage()
338(lldb) script cov.start()
339
340# 3. Do the testing.
341
342# 4. Collect the coverage.
343(lldb) script cov.stop()
344(lldb) script cov.save()
345```
346
347You can override the default file (*.coverage*) by adding an additional environment variable to LLDB:
348
349```
350$ env COVERAGE_FILE="${OUTDIR}/.coverage.mytest.py2" # usual LLDB command line
351```
352
353Combine coverage from multiple files:
354
355```
356# Point PATH to local python where coverage is installed.
357$ export PATH="$HOME/Library/Python/3.8/bin:$PATH"
358
359# Use --keep to avoid deletion of input files after merge.
360$ coverage combine --keep <list of .coverage files or dirs to scan>
361
362# Get HTML report or use other subcommands to inspect.
363$ coverage html
364```
365
366It is possible to start coverage collection **before** importing the operating system library and
367loading macros to check code run during bootstrapping.
368
369### Performance testing
370
371Some macros can run for a long time. Some code may be costly even if it looks simple because objects
372aren’t cached or too many temporary objects are created. Simple profiling is similar to collecting
373code coverage.
374
375Run this in LLDB to get a profile:
376
377```
378# Python 2 example (Python 3 is slightly different)
379(lldb) script import cProfile, pstats, StringIO
380(lldb) script pr = cProfile.Profile()
381(lldb) script pr.enable()
382# Run the macro here:
383(lldb) showcurrentstacks
384(lldb) script pr.disable()
385(lldb) script s = StringIO.StringIO()
386(lldb) script ps = pstats.Stats(pr, stream=s).sort_stats('cumulative')
387(lldb) script ps.print_stats()
388(lldb) script print(s.getvalue())
389```
390
391This will use a function call profiler to collect information about which functions took the most
392time during the macro’s execution. For example:
393
394```
395
396        1292170 function calls (1291646 primitive calls) in 3.425 seconds
397
398  Ordered by: cumulative time
399
400  ncalls  tottime  percall  cumtime  percall filename:lineno(function)
401       1    0.000    0.000    3.424    3.424 <src>/tools/lldbmacros/xnu.py:104(_internal_command_function)
402       1    0.000    0.000    3.424    3.424 <src>/tools/lldbmacros/process.py:1389(ShowCurrentStacks)
403       6    0.002    0.000    2.031    0.338 <src>/tools/lldbmacros/xnu.py:358(GetThreadBackTrace)
404     467    0.003    0.000    1.969    0.004 <src>/tools/lldbmacros/core/cvalue.py:464(cast)
405       1    0.000    0.000    1.757    1.757 <src>/tools/lldbmacros/xnu.py:323(GetKextSymbolInfo)
406       1    0.006    0.006    1.756    1.756 <src>/tools/lldbmacros/memory.py:2181(GetKextLoadInformation)
407     256    0.000    0.000    1.711    0.007 <src>/tools/lldbmacros/utils.py:142(Cast)
408     473    0.002    0.000    1.365    0.003 <src>/tools/lldbmacros/core/cvalue.py:500(gettype)
409      30    0.000    0.000    1.342    0.045 .../LLDB.framework/Resources/Python2/lldb/__init__.py:10442(FindTypes)
410      30    1.342    0.045    1.342    0.045 {lldb._lldb.SBTarget_FindTypes}
411       6    0.000    0.000    1.129    0.188 <src>/tools/lldbmacros/process.py:324(GetThreadSummary)
412       6    0.000    0.000    1.106    0.184 <src>/tools/lldbmacros/process.py:312(GetThreadName)
413     210    0.027    0.000    0.634    0.003 <src>/tools/lldbmacros/memory.py:2123(GetKmodWithAddr)
414     467    0.005    0.000    0.600    0.001 <src>/tools/lldbmacros/core/cvalue.py:343(_GetValueAsCast)
415```
416
417## Debugging your changes
418
419YES, It is possible to use a debugger to debug your code!
420
421The steps are similar to testing techniques described above (use scrpting interactive mode). There is no point to
422document the debugger itself. Lets focus on how to use it on a real life example. The debugger used here is PDB which
423is part of Python installation so works out of the box.
424
425Problem: Something wrong is going on with addkext macro. What now?
426
427    (lldb) addkext -N com.apple.driver.AppleT8103PCIeC
428    Failed to read MachO for address 18446741875027613136 errormessage: seek to offset 2169512 is outside window [0, 1310]
429    Failed to read MachO for address 18446741875033537424 errormessage: seek to offset 8093880 is outside window [0, 1536]
430    Failed to read MachO for address 18446741875033568304 errormessage: seek to offset 8124208 is outside window [0, 1536]
431	...
432	Fetching dSYM for 049b9a29-2efc-32c0-8a7f-5f29c12b870c
433    Adding dSYM (049b9a29-2efc-32c0-8a7f-5f29c12b870c) for /Library/Caches/com.apple.bni.symbols/bursar.apple.com/dsyms/StarE/AppleEmbeddedPCIE/AppleEmbeddedPCIE-502.100.35~3/049B9A29-2EFC-32C0-8A7F-5F29C12B870C/AppleT8103PCIeC
434    section '__TEXT' loaded at 0xfffffe001478c780
435
436There is no exception, lot of errors and no output. So what next?
437Try to narrow the problem down to an isolated piece of macro code:
438
439  1. Try to get values of globals through regular LLDB commands
440  2. Use interactive mode and invoke functions with arguments directly.
441
442After inspecting addkext macro code and calling few functions with arguments directly we can see that there is an
443exception in the end. It was just captured in try/catch block. So the simplified reproducer is:
444
445    (lldb) script
446	>>> import lldb
447	>>> import xnu
448	>>> err = lldb.SBError()
449	>>> data = xnu.LazyTarget.GetProcess().ReadMemory(0xfffffe0014c0f3f0, 0x000000000001b5d0, err)
450    >>> m = macho.MemMacho(data, len(data))
451    Traceback (most recent call last):
452      File "<console>", line 1, in <module>
453      File ".../lldbmacros/macho.py", line 91, in __init__
454        self.load(fp)
455      File ".../site-packages/macholib/MachO.py", line 133, in load
456        self.load_header(fh, 0, size)
457      File ".../site-packages/macholib/MachO.py", line 168, in load_header
458        hdr = MachOHeader(self, fh, offset, size, magic, hdr, endian)
459      File ".../site-packages/macholib/MachO.py", line 209, in __init__
460        self.load(fh)
461      File ".../lldbmacros/macho.py", line 23, in new_load
462        _old_MachOHeader_load(s, fh)
463      File ".../site-packages/macholib/MachO.py", line 287, in load
464        fh.seek(seg.offset)
465      File ".../site-packages/macholib/util.py", line 91, in seek
466        self._checkwindow(seekto, "seek")
467      File ".../site-packages/macholib/util.py", line 76, in _checkwindow
468        raise IOError(
469    OSError: seek to offset 9042440 is outside window [0, 112080]
470
471Clearly an external library is involved and execution flow jumps between dSYM and the library few times.
472Lets try to look around with a debugger.
473
474    (lldb) script
475	# Prepare data variable as described above.
476
477	# Run last statement with debugger.
478	>>> import pdb
479	>>> pdb.run('m = macho.MemMacho(data, len(data))', globals(), locals())
480	> <string>(1)<module>()
481
482	# Show debugger's help
483	(Pdb) help
484
485It is not possible to break on exception. Python uses them a lot so it is better to put a breakpoint to source
486code. This puts breakpoint on the IOError exception mentioned above.
487
488	(Pdb) break ~/Library/Python/3.8/lib/python/site-packages/macholib/util.py:76
489    Breakpoint 4 at ~/Library/Python/3.8/lib/python/site-packages/macholib/util.py:76
490
491You can now single step or continue the execution as usuall for a debugger.
492
493    (Pdb) cont
494    > /Users/tjedlicka/Library/Python/3.8/lib/python/site-packages/macholib/util.py(76)_checkwindow()
495    -> raise IOError(
496    (Pdb) bt
497      /Volumes/.../Python3.framework/Versions/3.8/lib/python3.8/bdb.py(580)run()
498    -> exec(cmd, globals, locals)
499      <string>(1)<module>()
500      /Volumes/...dSYM/Contents/Resources/Python/lldbmacros/macho.py(91)__init__()
501    -> self.load(fp)
502      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(133)load()
503    -> self.load_header(fh, 0, size)
504      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(168)load_header()
505    -> hdr = MachOHeader(self, fh, offset, size, magic, hdr, endian)
506      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(209)__init__()
507    -> self.load(fh)
508      /Volumes/...dSYM/Contents/Resources/Python/lldbmacros/macho.py(23)new_load()
509    -> _old_MachOHeader_load(s, fh)
510      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(287)load()
511    -> fh.seek(seg.offset)
512      /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/util.py(91)seek()
513    -> self._checkwindow(seekto, "seek")
514    > /Users/.../Library/Python/3.8/lib/python/site-packages/macholib/util.py(76)_checkwindow()
515    -> raise IOError(
516
517
518Now we can move a frame above and inspect stopped target:
519
520    # Show current frame arguments
521    (Pdb) up
522    (Pdb) a
523    self = <fileview [0, 112080] <macho.MemFile object at 0x1075cafd0>>
524    offset = 9042440
525    whence = 0
526
527    # globals, local or expressons
528    (Pdb) p type(seg.offset)
529    <class 'macholib.ptypes.p_uint32'>
530    (Pdb) p hex(seg.offset)
531    '0x89fa08'
532
533    # Find attributes of a Python object.
534    (Pdb) p dir(section_cls)
535    ['__class__', '__cmp__', ... ,'reserved3', 'sectname', 'segname', 'size', 'to_fileobj', 'to_mmap', 'to_str']
536    (Pdb) p section_cls.sectname
537    <property object at 0x1077bbef0>
538
539Unfortunately everything looks correct but there is actually one ineteresting frame in the stack. The one which
540provides the offset to the seek method. Lets see where we are in the source code.
541
542    (Pdb) up
543    > /Users/tjedlicka/Library/Python/3.8/lib/python/site-packages/macholib/MachO.py(287)load()
544    -> fh.seek(seg.offset)
545    (Pdb) list
546    282  	                        not_zerofill = (seg.flags & S_ZEROFILL) != S_ZEROFILL
547    283  	                        if seg.offset > 0 and seg.size > 0 and not_zerofill:
548    284  	                            low_offset = min(low_offset, seg.offset)
549    285  	                        if not_zerofill:
550    286  	                            c = fh.tell()
551    287  ->	                            fh.seek(seg.offset)
552    288  	                            sd = fh.read(seg.size)
553    289  	                            seg.add_section_data(sd)
554    290  	                            fh.seek(c)
555    291  	                        segs.append(seg)
556    292  	                # data is a list of segments
557
558Running debugger on working case and stepping through the load() method shows that this code is not present.
559That means we are broken by a library update! Older versions of library do not load data for a section.
560