# Recount

Recount is a resource accounting subsystem in the kernel that tracks the CPU resources consumed by threads, tasks, coalitions, and processors.
It supports attributing counts to a specific level of the CPU topology (per-CPU and per-CPU kind).
ARM64 devices with a fast timebase read and Intel devices can track time spent in the kernel (system) separately from user space.
64-bit, non-virtualized (i.e. _not_ running under a hypervisor) devices also accumulate instructions and cycles at each context switch.
Together, these two metrics are abbreviated to cycles-per-instruction, or CPI, for brevity.
ARM64 devices can also track task and thread energy in nanojoules.

By default, Recount tracks its counters per-CPU kind (e.g. performance or efficiency) for threads, per-CPU for tasks, and per-CPU kind for coalitions.

## High-Level Interfaces

These interfaces report counter data to user space and are backed by Recount.

| Interface                   | Entity      | Target        | Tests | Time | CPI | Energy | Perf Levels |
| --------------------------: | ----------- | ------------- | :---: | :--: | :-: | :----: | :---------: |
|                 `getrusage` | task        | self/children |  FP   |  ✓¹  |     |        |             |
|           `proc_pid_rusage` | task        | pid           |  FP   |  ✓   |  ✓  |   ✓    |     ✓²      |
|          `PROC_PIDTASKINFO` | task        | pid           |  FP   |  ✓   |  ✓  |        |     ✓²      |
|           `TASK_BASIC_INFO` | task        | task port     |  FP   |  ✓¹  |     |        |             |
|    `TASK_ABSOLUTETIME_INFO` | task        | task port     |  FP   |  ✓   |     |        |             |
|           `TASK_POWER_INFO` | task        | task port     |  FP   |  ✓   |     |        |             |
| `TASK_INSPECT_BASIC_COUNTS` | task        | task inspect  |   P   |      |  ✓  |        |             |
|         `THREAD_BASIC_INFO` | thread      | thread port   |   P   |  ✓   |     |        |             |
|      `THREAD_EXTENDED_INFO` | thread      | thread port   |       |  ✓   |     |        |             |
|           `proc_threadinfo` | thread      | thread ID     |       |  ✓   |     |        |             |
|         `proc_threadcounts` | thread      | thread ID     |   F   |  ✓   |  ✓  |   ✓    |      ✓      |
|         `thread_selfcounts` | thread      | self          |  FP   |  ✓   |  ✓  |   ✓    |      ✓      |
|          `thread_selfusage` | thread      | self          |  FP   |  ✓   |     |        |             |
|            `coalition_info` | coalition   | coalition ID  |   F   |  ✓   |  ✓  |   ✓    |     ✓²      |
|        `HOST_CPU_LOAD_INFO` | system      | all           |       |  ✓   |     |        |             |
|   `PROCESSOR_CPU_LOAD_INFO` | processor   | port          |       |  ✓   |     |        |             |
|                 `stackshot` | task/thread | all           |   P   |  ✓   |  ✓  |        |     ✓²      |
|                      DTrace | thread      | any           |       |  ✓   |  ✓  |        |             |
|                       kperf | task/thread | any           |       |  ✓   |  ✓  |        |     ✓²      |

- Under Tests, "F" is functional and "P" is performance.
- ¹ Time precision is microseconds.
- ² These return overall totals and hard-code a separate, P-core-only value.

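As a small illustration of the first table row, the sketch below reads the calling task's user and system time through the POSIX `getrusage` call. The times arrive as `struct timeval` values, which matches the microsecond precision noted in footnote ¹. The helper names `timeval_to_usec` and `self_cpu_time_usec` are hypothetical and exist only for this example.

```c
#include <stdint.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Hypothetical helper: flatten a timeval into a count of microseconds. */
static uint64_t
timeval_to_usec(struct timeval tv)
{
    return (uint64_t)tv.tv_sec * 1000000ULL + (uint64_t)tv.tv_usec;
}

/*
 * Hypothetical helper: read the calling task's user and system CPU time.
 * Returns 0 on success and -1 if getrusage fails.
 */
static int
self_cpu_time_usec(uint64_t *user_usec, uint64_t *system_usec)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0) {
        return -1;
    }
    *user_usec = timeval_to_usec(ru.ru_utime);
    *system_usec = timeval_to_usec(ru.ru_stime);
    return 0;
}
```

A caller passes two `uint64_t` out-parameters and checks the return value before using them. The interfaces that report CPI, energy, or perf levels are platform-specific and are not shown here.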
## LLDB

The `recount` macro inspects counters in an LLDB session and is generally useful for retrospective analysis of CPU usage.
Its subcommands print each metric as a column and use rows for the groupings, like per-CPU or per-CPU kind values.
Tables also include formulaic columns that can be derived from two metrics, like CPI or power.
By default, the macro prints times in seconds, but the `-M` flag switches the output to Mach time values.

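The derived columns are plain ratios of two raw metrics. The sketch below shows one way such values could be computed. It is not the macro's actual code: the `example_row` structure, its field names, and the choice of nanoseconds for time are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical container for the raw metrics behind one table row. */
struct example_row {
    uint64_t cycles;
    uint64_t instructions;
    uint64_t energy_nj; /* nanojoules */
    uint64_t time_ns;   /* nanoseconds; an assumption of this sketch */
};

/* Cycles per instruction, or 0.0 when no instructions were retired. */
static double
example_cpi(const struct example_row *row)
{
    if (row->instructions == 0) {
        return 0.0;
    }
    return (double)row->cycles / (double)row->instructions;
}

/* Average power in watts: nanojoules per nanosecond is joules per second. */
static double
example_power_watts(const struct example_row *row)
{
    if (row->time_ns == 0) {
        return 0.0;
    }
    return (double)row->energy_nj / (double)row->time_ns;
}
```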
- `recount thread <thread-ptr> [...]` prints a table of per-CPU kind counts for threads.

- `recount task <task-ptr> [...]` prints a table of per-CPU counts for tasks.
	- `-T` prints the task's active thread counters in additional tables.
	- `-F <name>` finds the task matching the provided name instead of using a task pointer.

- `recount coalition <coalition-ptr>` prints a table of per-CPU kind counts for each coalition, not including the currently-active tasks.
Coalition pointers can be found with the `showtaskcoalitions` macro, and should be _resource_ coalitions.

- `recount processor <processor-ptr-or-cpu-id>` prints a table of counts for a processor.
	- `-T` prints the processor's active thread counters in an additional table.
	- `-A` includes all processors in the output.

- `recount diagnose` prints information useful for debugging the Recount subsystem itself.

- `recount triage` is meant to be used by the automated panic debug scripts.

## Internals

Accounting for groups of entities like threads and tasks starts with a `recount_plan_t`, declared using `RECOUNT_PLAN_DECLARE` and defined with `RECOUNT_PLAN_DEFINE`, which takes the topology, or granularity, of the counting.
The plan topology defines how many `recount_usage` structures are needed.
To count CPU resource usage, a `struct recount_usage` has the following fields:

- `ru_system_time_mach`: the total time spent in the kernel, in Mach time units
- `ru_user_time_mach`: the total time spent in user space, in Mach time units
- `ru_cycles`: the cycles run by a CPU, with `CONFIG_PERVASIVE_CPI`
- `ru_instructions`: the instructions retired by a CPU, with `CONFIG_PERVASIVE_CPI`
- `ru_energy_nj`: the energy consumed by a CPU, in nanojoules, with `CONFIG_PERVASIVE_ENERGY`

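The field list above might translate into a structure along these lines. This is a sketch rather than the kernel's definition: the field names come from the list, while the `uint64_t` types, the field order, the `_sketch` suffix, and the helper function are assumptions.

```c
#include <stdint.h>

/* Assumption for this sketch: both optional features are enabled. */
#define CONFIG_PERVASIVE_CPI 1
#define CONFIG_PERVASIVE_ENERGY 1

/* Sketch of a usage structure; types and layout are assumed, not confirmed. */
struct recount_usage_sketch {
    uint64_t ru_system_time_mach;
    uint64_t ru_user_time_mach;
#if CONFIG_PERVASIVE_CPI
    uint64_t ru_cycles;
    uint64_t ru_instructions;
#endif
#if CONFIG_PERVASIVE_ENERGY
    uint64_t ru_energy_nj;
#endif
};

/* Hypothetical helper: total CPU time, user plus system, in Mach time units. */
static uint64_t
usage_sketch_time_mach(const struct recount_usage_sketch *usage)
{
    return usage->ru_system_time_mach + usage->ru_user_time_mach;
}
```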
At context switch, `recount_switch_thread` captures the hardware counters with `recount_snapshot` into a `struct recount_snap`.
The CPU's previous snapshot, stored in the `_snaps_percpu` per-CPU variable, is subtracted from the new one to get a delta to add to the currently-executing entity's usage structure.
The per-CPU variable is then updated with the current snapshot for the next switch.
The user/kernel transition code calls `recount_leave_user` or `recount_enter_user`, which perform the same operation, except with `recount_snapshot_speculative`.
These calls rely on other synchronization barriers in the transition code to keep the snapshot precise.

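The snapshot-and-delta step can be sketched as follows. This is a simplified model, not the kernel code: the structures, the fixed-size array standing in for `_snaps_percpu`, and the `sketch_switch` function are all hypothetical, and the snapshot is passed in rather than read from hardware.

```c
#include <stdint.h>

/* Hypothetical snapshot of monotonically increasing hardware counters. */
struct snap_sketch {
    uint64_t time_mach;
    uint64_t cycles;
    uint64_t instructions;
};

/* Hypothetical accumulated usage for one entity (for example, a thread). */
struct usage_sketch {
    uint64_t time_mach;
    uint64_t cycles;
    uint64_t instructions;
};

/* One previous snapshot per CPU, standing in for the per-CPU variable. */
#define SKETCH_MAX_CPUS 4
static struct snap_sketch sketch_last_snap[SKETCH_MAX_CPUS];

/*
 * Charge everything that happened on `cpu` since its last snapshot to the
 * entity going off-CPU, then remember `now` for the next switch.
 */
static void
sketch_switch(unsigned int cpu, const struct snap_sketch *now,
    struct usage_sketch *outgoing)
{
    struct snap_sketch *last = &sketch_last_snap[cpu];

    outgoing->time_mach += now->time_mach - last->time_mach;
    outgoing->cycles += now->cycles - last->cycles;
    outgoing->instructions += now->instructions - last->instructions;

    *last = *now;
}
```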
Processors also track their idle time separately from the usage structure with paired calls to `recount_processor_idle` and `recount_processor_run`.
Idle time has no user component and doesn't consume instructions or cycles, so a full usage structure isn't necessary.
Each processor stores the last update time in a 64-bit value, with a state in the top two bits that indicates whether the processor is currently idle or active.

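One way to pack a state into the top two bits of a 64-bit timestamp is sketched below. The state values, macro names, and the `sketch_processor` structure are hypothetical; only the idea of sharing one 64-bit word between a two-bit state and a time comes from the text above.

```c
#include <stdint.h>

#define SKETCH_STATE_SHIFT 62
#define SKETCH_TIME_MASK ((UINT64_C(1) << SKETCH_STATE_SHIFT) - 1)

/* Hypothetical state encoding; the kernel's actual values may differ. */
enum sketch_state {
    SKETCH_STATE_ACTIVE = 0,
    SKETCH_STATE_IDLE = 1,
};

static uint64_t
sketch_pack(enum sketch_state state, uint64_t time_mach)
{
    return ((uint64_t)state << SKETCH_STATE_SHIFT) | (time_mach & SKETCH_TIME_MASK);
}

static enum sketch_state
sketch_state_of(uint64_t packed)
{
    return (enum sketch_state)(packed >> SKETCH_STATE_SHIFT);
}

static uint64_t
sketch_time_of(uint64_t packed)
{
    return packed & SKETCH_TIME_MASK;
}

struct sketch_processor {
    uint64_t state_and_time; /* two state bits plus the last update time */
    uint64_t idle_time_mach; /* accumulated idle time */
};

/* The processor is going idle at `now`. */
static void
sketch_processor_idle(struct sketch_processor *p, uint64_t now)
{
    p->state_and_time = sketch_pack(SKETCH_STATE_IDLE, now);
}

/* The processor is resuming at `now`; charge the elapsed interval as idle. */
static void
sketch_processor_run(struct sketch_processor *p, uint64_t now)
{
    if (sketch_state_of(p->state_and_time) == SKETCH_STATE_IDLE) {
        p->idle_time_mach += now - sketch_time_of(p->state_and_time);
    }
    p->state_and_time = sketch_pack(SKETCH_STATE_ACTIVE, now);
}
```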
A `struct recount_track` is the primary data structure found in threads, tasks, and processors.
Each track includes a `recount_usage` structure and ensures that it is updated atomically with respect to readers.

### Track Atomicity

To ensure the accuracy of formulas involving multiple metrics, like CPI, all metrics must be updated atomically from the perspective of the reader.
A traditional locking mechanism would prevent the writer from updating the counts while readers are present, so Recount uses a sequence lock instead.
Writers make a generation count odd before updating any of the values and then set it back to even when all values are updated.
Readers wait until the generation count becomes even before trying to read the values, and if the count changes by the time they're done reading, they retry the read.
Since three entities need to be updated at once (thread, task, and processor), only the last update has a release barrier to publish the writes.
When reporting just user and system time, taking the sequence lock as a reader introduced unacceptable overhead.
The sequence lock doesn't need to be taken for these metrics since they're never updated simultaneously.

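The generation-count protocol can be sketched with C11 atomics as shown below. This is a generic sequence lock for a single track with a single writer, not Recount's implementation: all names are hypothetical, and the optimization of sharing one release barrier across the thread, task, and processor updates is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical track: a generation count guarding two metrics. */
struct track_sketch {
    _Atomic uint32_t gen; /* odd while a write is in progress */
    _Atomic uint64_t cycles;
    _Atomic uint64_t instructions;
};

static void
track_sketch_init(struct track_sketch *t)
{
    atomic_init(&t->gen, 0);
    atomic_init(&t->cycles, 0);
    atomic_init(&t->instructions, 0);
}

/* Writer side; assumes only one writer updates a track at a time. */
static void
track_sketch_add(struct track_sketch *t, uint64_t cycles, uint64_t instructions)
{
    uint32_t gen = atomic_load_explicit(&t->gen, memory_order_relaxed);

    /* Make the generation odd so readers know an update is in flight. */
    atomic_store_explicit(&t->gen, gen + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);

    uint64_t c = atomic_load_explicit(&t->cycles, memory_order_relaxed);
    uint64_t i = atomic_load_explicit(&t->instructions, memory_order_relaxed);
    atomic_store_explicit(&t->cycles, c + cycles, memory_order_relaxed);
    atomic_store_explicit(&t->instructions, i + instructions, memory_order_relaxed);

    /* Back to even; the release store publishes the updated values. */
    atomic_store_explicit(&t->gen, gen + 2, memory_order_release);
}

/* One read attempt; returns false if a write overlapped the read. */
static bool
track_sketch_try_read(struct track_sketch *t, uint64_t *cycles, uint64_t *instructions)
{
    uint32_t before = atomic_load_explicit(&t->gen, memory_order_acquire);
    if (before & 1) {
        return false; /* a writer is mid-update */
    }

    uint64_t c = atomic_load_explicit(&t->cycles, memory_order_relaxed);
    uint64_t i = atomic_load_explicit(&t->instructions, memory_order_relaxed);

    atomic_thread_fence(memory_order_acquire);
    uint32_t after = atomic_load_explicit(&t->gen, memory_order_relaxed);
    if (before != after) {
        return false; /* the values may be from different updates */
    }

    *cycles = c;
    *instructions = i;
    return true;
}

/* Reader side: retry until a consistent pair of values is observed. */
static void
track_sketch_read(struct track_sketch *t, uint64_t *cycles, uint64_t *instructions)
{
    while (!track_sketch_try_read(t, cycles, instructions)) {
        /* spin; the writer's critical section is short */
    }
}
```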
The coalition counters are not updated by threads switching off-CPU and are instead protected by the coalition lock while a task exits and rolls up its counters to the coalition.
Reading the counters requires holding the lock and iterating the constituent tasks, grouping their per-CPU counters into per-CPU kind ones.

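The grouping step when reading coalition counters can be sketched as a fold from per-CPU values into per-CPU kind buckets. The four-CPU topology, the enum, and the function name are hypothetical, and the coalition lock that a real reader would hold is omitted.

```c
#include <stdint.h>

enum sketch_cpu_kind {
    SKETCH_KIND_EFFICIENCY,
    SKETCH_KIND_PERFORMANCE,
    SKETCH_KIND_COUNT,
};

/* Hypothetical topology: two efficiency CPUs followed by two performance CPUs. */
#define SKETCH_CPU_COUNT 4
static const enum sketch_cpu_kind sketch_cpu_kinds[SKETCH_CPU_COUNT] = {
    SKETCH_KIND_EFFICIENCY,
    SKETCH_KIND_EFFICIENCY,
    SKETCH_KIND_PERFORMANCE,
    SKETCH_KIND_PERFORMANCE,
};

/*
 * Add one task's per-CPU counts into the per-kind totals. Calling this for
 * each constituent task accumulates a coalition-wide, per-kind view.
 */
static void
sketch_group_by_kind(const uint64_t per_cpu[SKETCH_CPU_COUNT],
    uint64_t per_kind[SKETCH_KIND_COUNT])
{
    for (unsigned int cpu = 0; cpu < SKETCH_CPU_COUNT; cpu++) {
        per_kind[sketch_cpu_kinds[cpu]] += per_cpu[cpu];
    }
}
```

The totals array must start zeroed, because the function adds to it on every call rather than overwriting it.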
### Energy

The energy counters on ARM systems count a custom unit of energy that needs to be scaled to nanojoules.
Because this unit can be very small and may overflow a 64-bit counter, it's scaled to nanojoules at each context switch.