# Recount

Recount is a resource accounting subsystem in the kernel that tracks the CPU resources consumed by threads, tasks, coalitions, and processors.
It supports attributing counts to a specific level of the CPU topology (per-CPU and per-CPU kind).
ARM64 devices with a fast timebase read and Intel devices can track time spent in the kernel (system) separately from user space.
64-bit, non-virtualized (i.e. _not_ running under a hypervisor) devices also accumulate instructions and cycles at each context switch.
Together, these two metrics are referred to as cycles-per-instruction, or CPI, for brevity.
ARM64 devices can also track task and thread energy in nanojoules.

By default, Recount tracks its counters per-CPU kind (e.g. performance or efficiency) for threads, per-CPU for tasks, and per-CPU kind for coalitions.

## High-Level Interfaces

These interfaces report counter data to user space and are backed by Recount.

| Interface                   | Entity      | Target        | Tests | Time | CPI | Energy | Perf Levels |
| --------------------------: | ----------- | ------------- | :---: | :--: | :-: | :----: | :---------: |
|                 `getrusage` | task        | self/children |  FP   |  ✓¹  |     |        |             |
|           `proc_pid_rusage` | task        | pid           |  FP   |  ✓   |  ✓  |   ✓    |     ✓²      |
|          `PROC_PIDTASKINFO` | task        | pid           |  FP   |  ✓   |  ✓  |        |     ✓²      |
|           `TASK_BASIC_INFO` | task        | task port     |  FP   |  ✓¹  |     |        |             |
|    `TASK_ABSOLUTETIME_INFO` | task        | task port     |  FP   |  ✓   |     |        |             |
|           `TASK_POWER_INFO` | task        | task port     |  FP   |  ✓   |     |        |             |
| `TASK_INSPECT_BASIC_COUNTS` | task        | task inspect  |   P   |      |  ✓  |        |             |
|         `THREAD_BASIC_INFO` | thread      | thread port   |   P   |  ✓   |     |        |             |
|      `THREAD_EXTENDED_INFO` | thread      | thread port   |       |  ✓   |     |        |             |
|           `proc_threadinfo` | thread      | thread ID     |       |  ✓   |     |        |             |
|         `proc_threadcounts` | thread      | thread ID     |   F   |  ✓   |  ✓  |   ✓    |      ✓      |
|         `thread_selfcounts` | thread      | self          |  FP   |  ✓   |  ✓  |   ✓    |      ✓      |
|          `thread_selfusage` | thread      | self          |  FP   |  ✓   |     |        |             |
|            `coalition_info` | coalition   | coalition ID  |   F   |  ✓   |  ✓  |   ✓    |     ✓²      |
|        `HOST_CPU_LOAD_INFO` | system      | all           |       |  ✓   |     |        |             |
|   `PROCESSOR_CPU_LOAD_INFO` | processor   | port          |       |  ✓   |     |        |             |
|                 `stackshot` | task/thread | all           |   P   |  ✓   |  ✓  |        |     ✓²      |
|                      DTrace | thread      | any           |       |  ✓   |  ✓  |        |             |
|                       kperf | task/thread | any           |       |  ✓   |  ✓  |        |     ✓²      |

- Under Tests, "F" is functional and "P" is performance.
- ¹ Time precision is microseconds.
- ² These return overall totals and hard-code a separate, P-core-only value.
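As an illustration of the simplest interface in the table, the following sketch reads the calling process's user and system time with the POSIX `getrusage` call. The helper names `timeval_to_usec` and `self_cpu_time_usec` are hypothetical and exist only for this example; the microsecond granularity noted in footnote ¹ shows up in the `tv_usec` field of `struct timeval`.

```c
#include <stdint.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Hypothetical helper: convert a timeval (seconds + microseconds) to microseconds. */
uint64_t
timeval_to_usec(struct timeval tv)
{
	return (uint64_t)tv.tv_sec * 1000000ULL + (uint64_t)tv.tv_usec;
}

/*
 * Hypothetical helper: fetch the calling process's user and system CPU time.
 * Returns 0 on success and -1 if getrusage fails.
 */
int
self_cpu_time_usec(uint64_t *user_usec, uint64_t *system_usec)
{
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru) != 0) {
		return -1;
	}
	*user_usec = timeval_to_usec(ru.ru_utime);
	*system_usec = timeval_to_usec(ru.ru_stime);
	return 0;
}
```

A caller that needs CPI, energy, or per-perf-level values would have to use one of the richer interfaces listed in the table instead, since `getrusage` only reports time.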

## LLDB

The `recount` macro inspects counters in an LLDB session and is generally useful for retrospective analysis of CPU usage.
Its subcommands print each metric as a column and use rows for the groupings, such as per-CPU or per-CPU kind values.
Tables also include formulaic columns that can be derived from two metrics, like CPI or power.
By default, it prints the times in seconds, but the `-M` flag switches the output to Mach time values.

- `recount thread <thread-ptr> [...]` prints a table of per-CPU kind counts for threads.

- `recount task <task-ptr> [...]` prints a table of per-CPU counts for tasks.
	- `-T` prints the task's active thread counters in additional tables.
	- `-F <name>` finds the task matching the provided name instead of using a task pointer.

- `recount coalition <coalition-ptr>` prints a table of per-CPU kind counts for each coalition, not including the currently-active tasks.
Coalition pointers can be found with the `showtaskcoalitions` macro, and should be _resource_ coalitions.

- `recount processor <processor-ptr-or-cpu-id>` prints a table of counts for a processor.
	- `-T` prints the processor's active thread counters in an additional table.
	- `-A` includes all processors in the output.

- `recount diagnose` prints information useful for debugging the Recount subsystem itself.

- `recount triage` is meant to be used by the automated panic debug scripts.

## Internals

Accounting for groups of entities like threads and tasks starts with a `recount_plan_t`, declared using `RECOUNT_PLAN_DECLARE` and defined with `RECOUNT_PLAN_DEFINE`, which takes the topology, or granularity, of the counting.
The plan topology defines how many `recount_usage` structures are needed.
To count CPU resource usage, a `struct recount_usage` has the following fields:

- `ru_system_time_mach`: the total time spent in the kernel, in Mach time units
- `ru_user_time_mach`: the total time spent in user space, in Mach time units
- `ru_cycles`: the cycles run by a CPU, available with `CONFIG_PERVASIVE_CPI`
- `ru_instructions`: the instructions retired by a CPU, available with `CONFIG_PERVASIVE_CPI`
- `ru_energy_nj`: the energy consumed by a CPU, in nanojoules, available with `CONFIG_PERVASIVE_ENERGY`

At context switch, `recount_switch_thread` captures the hardware counters with `recount_snapshot` into a `struct recount_snap`.
The CPU's previous snapshot, stored in the `_snaps_percpu` per-CPU variable, is subtracted from the new one to get a delta to add to the currently-executing entity's usage structure.
The per-CPU variable is then updated with the current snapshot for the next switch.
The user/kernel transition code calls `recount_leave_user` or `recount_enter_user`, which perform the same operation, except with `recount_snapshot_speculative`.
It relies on other synchronization barriers in the transition code to keep the snapshot precise.

Processors also track their idle time separately from the usage structure with paired calls to `recount_processor_idle` and `recount_processor_run`.
Idle time has no user component and doesn't consume instructions or cycles, so a full usage structure isn't necessary.
It stores the last update time in a 64-bit value combined with a state stored in the top two bits to determine whether the processor is currently idle or active.

A `struct recount_track` is the primary data structure found in threads, tasks, and processors.
Tracks include a `recount_usage` structure and ensure that each is updated atomically with respect to readers.
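The snapshot-delta accounting and the packed idle-time word described in this section can be sketched in plain C. This is a minimal illustration rather than the kernel's code: every type, function, and constant name below (`usage_sketch`, `snap_sketch`, `usage_add_delta`, `idle_pack`, and so on) is hypothetical, the user/system split is omitted, and the numeric values chosen for the idle and active states are assumptions.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct recount_usage. */
struct usage_sketch {
	uint64_t time_mach;
	uint64_t cycles;
	uint64_t instructions;
};

/* Hypothetical stand-in for struct recount_snap: raw counters at one instant. */
struct snap_sketch {
	uint64_t time_mach;
	uint64_t cycles;
	uint64_t instructions;
};

/*
 * Charge everything counted since the CPU's previous snapshot to `usage`,
 * then keep `now` as the baseline for the next switch.
 */
void
usage_add_delta(struct usage_sketch *usage, struct snap_sketch *last,
    const struct snap_sketch *now)
{
	usage->time_mach += now->time_mach - last->time_mach;
	usage->cycles += now->cycles - last->cycles;
	usage->instructions += now->instructions - last->instructions;
	*last = *now;
}

/* The top two bits hold the state; the low 62 bits hold the last update time. */
#define IDLE_STATE_SHIFT 62
#define IDLE_TIME_MASK   ((UINT64_C(1) << IDLE_STATE_SHIFT) - 1)

/* Assumed encoding of the processor state, for illustration only. */
enum idle_state_sketch { STATE_IDLE = 0, STATE_ACTIVE = 1 };

uint64_t
idle_pack(uint64_t time_mach, enum idle_state_sketch state)
{
	return ((uint64_t)state << IDLE_STATE_SHIFT) | (time_mach & IDLE_TIME_MASK);
}

uint64_t
idle_time(uint64_t packed)
{
	return packed & IDLE_TIME_MASK;
}

enum idle_state_sketch
idle_state(uint64_t packed)
{
	return (enum idle_state_sketch)(packed >> IDLE_STATE_SHIFT);
}
```

Packing the state and the timestamp into one 64-bit word means both can be read or replaced with a single load or store, so a reader never sees a timestamp paired with the wrong state.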

### Track Atomicity

To ensure the accuracy of formulas involving multiple metrics, like CPI, all metrics must be updated atomically from the perspective of the reader.
A traditional locking mechanism would prevent the writer from updating the counts while readers are present, so Recount uses a sequence lock instead.
Writers make a generation count odd before updating any of the values and then set it back to even when all values are updated.
Readers wait until the generation count becomes even before trying to read the values, and if the count has changed by the time they're done reading, they retry the read.
Since three entities need to be updated at once (thread, task, and processor), only the last update has a release barrier to publish the writes.
When reporting just user and system time, taking the sequence lock as a reader introduced unacceptable overhead.
The sequence lock doesn't need to be taken for these metrics since they're never updated simultaneously.

The coalition counters are not updated by threads switching off-CPU and are instead protected by the coalition lock while a task exits and rolls up its counters to the coalition.
Reading the counters requires holding the lock and iterating the constituent tasks, grouping their per-CPU counters into per-CPU kind ones.
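The sequence-lock protocol used for tracks can be sketched with C11 atomics. This is a generic single-writer sequence lock under stated assumptions, not Recount's implementation: the `track_sketch` structure and the `track_add` and `track_read` functions are hypothetical, only two counters are guarded, and the optimization of sharing one release barrier across the thread, task, and processor updates is not shown.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical track: a generation count guarding two related counters. */
struct track_sketch {
	_Atomic uint64_t gen;          /* odd while an update is in progress */
	_Atomic uint64_t cycles;
	_Atomic uint64_t instructions;
};

/* Writer side; assumes only one writer updates a given track at a time. */
void
track_add(struct track_sketch *t, uint64_t cycles, uint64_t instructions)
{
	uint64_t gen = atomic_load_explicit(&t->gen, memory_order_relaxed);

	/* Make the generation odd so readers can tell an update is in flight. */
	atomic_store_explicit(&t->gen, gen + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_fetch_add_explicit(&t->cycles, cycles, memory_order_relaxed);
	atomic_fetch_add_explicit(&t->instructions, instructions,
	    memory_order_relaxed);

	/* Publish the new values and make the generation even again. */
	atomic_store_explicit(&t->gen, gen + 2, memory_order_release);
}

/* Reader side: retry until both counters come from the same generation. */
void
track_read(struct track_sketch *t, uint64_t *cycles, uint64_t *instructions)
{
	uint64_t before, after;

	do {
		before = atomic_load_explicit(&t->gen, memory_order_acquire);
		*cycles = atomic_load_explicit(&t->cycles, memory_order_relaxed);
		*instructions = atomic_load_explicit(&t->instructions,
		    memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		after = atomic_load_explicit(&t->gen, memory_order_relaxed);
		/* An odd or changed generation means the values may be torn. */
	} while ((before & 1) != 0 || before != after);
}
```

The writer never blocks on readers, which is the property that matters on the context-switch path; the cost is shifted to readers, who may have to repeat their loads.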

### Energy

The energy counters on ARM systems count a custom unit of energy that needs to be scaled to nanojoules.
Because this unit can be very small and may overflow a 64-bit counter, it's scaled to nanojoules at each context switch.
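A rough sketch of that scaling step follows. The function name `energy_delta_to_nj` and the idea of expressing the conversion as a numerator/denominator pair are assumptions made for illustration; the real conversion factor is hardware-specific and is not given in this document.

```c
#include <stdint.h>

/*
 * Hypothetical conversion of a raw energy counter delta to nanojoules.
 * The factor nj_numer / nj_denom (nanojoules per raw unit) is an assumed
 * parameter, not a value taken from any real hardware.
 */
uint64_t
energy_delta_to_nj(uint64_t raw_prev, uint64_t raw_now,
    uint64_t nj_numer, uint64_t nj_denom)
{
	/* Unsigned subtraction still yields the right delta if the raw counter wrapped once. */
	uint64_t delta = raw_now - raw_prev;

	/* Assumes the per-switch delta is small enough that the product fits in 64 bits. */
	return delta * nj_numer / nj_denom;
}
```

Scaling each small delta and accumulating the result in nanojoules keeps the running total in a coarser unit, which is what lets it fit in 64 bits over long uptimes.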
108