# XNU Build Consolidation

Making XNU build more efficiently when targeting several devices at once.

## Introduction and motivation

XNU is supported on approximately 20 different targets. Whilst in some cases the differences between two
given targets are small (e.g. when they both support the same ISA), XNU has traditionally required
separate builds in cases where the topology of the targets differs (for example, when they feature different
core/cluster counts or cache sizes). Similarly, SoC-specific fix-ups are usually conditionally compiled
based on the target.

Given the time it takes to compile all three different variants (release, debug and development) for each
supported SoC, usually several times a day for various teams across Apple, the goal of this project was to
reduce the number of existing builds, as well as to set up a simple framework that makes it easier to share
builds across different SoCs moving forward.

Although this effort could be extended to KEXTs, and hence lead to shared KernelCaches across devices, the
scope of this document only includes XNU. In cases where KEXTs also differ across targets, or perhaps the
required KEXTs are completely different in the first place, the kernel still needs to be linked
appropriately with different sets of KEXTs and hence KernelCaches cannot be shared.


## Changes required in XNU

The kernel itself is relatively SoC-agnostic, although strongly architecture-dependent; this is because most
of the SoC-specific aspects of the KernelCache are abstracted by the KEXTs. Things that pertain to the
kernel include:

* Number of cores/clusters in the system, their physical IDs and type.
* Addresses of PIO registers that are to be accessed from the XNU side.
* L1/L2 cache geometry parameters (e.g. size, number of sets/ways).
* Setup of HID registers and application of fix-ups at various points during boot or elsewhere at runtime,
for which the kernel, just like other components, has its share of responsibility.
* Optional kernel-visible architectural features, which means that two same-generation SoCs may
still differ in their feature set.

All of these problems can be solved through a mix of relying more heavily on device tree information and
performing runtime checks. The latter is possible because both the ARM architecture and Apple's
extensions provide read-only registers that can be checked at runtime to discover supported features as well as
various CPU-specific parameters.

### Obtaining cache geometry parameters at runtime

Although not often, the kernel may still need to derive parameters such as cache sizes and the number of
sets/ways. XNU needs most of this information to perform set/way clean/invalidate operations.
Prior to this work, these values were hardcoded for each supported target in `proc_reg.h`, and used in
`caches_asm.s`. The ARM architecture provides the `CSSELR_EL1` register to select the target cache, and the
`CCSIDR_EL1` register to then read back the geometry information of the selected cache.
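As an illustration of what the geometry readout involves, the following is a minimal C sketch that decodes a raw `CCSIDR_EL1` value. It assumes the 32-bit field layout used when `FEAT_CCIDX` is not implemented, and it takes the register value as a parameter rather than reading it, so it is not the code in `caches_asm.s`. The type and function names (`cache_geometry`, `decode_ccsidr`, `cache_size_bytes`) are made up for this example.

```c
#include <stdint.h>

/* Decoded geometry of one cache level (illustrative type, not an XNU structure). */
struct cache_geometry {
    uint32_t line_size;  /* bytes per cache line */
    uint32_t ways;       /* associativity */
    uint32_t sets;       /* number of sets */
};

/*
 * Decode a raw CCSIDR_EL1 value, assuming the 32-bit field layout
 * (FEAT_CCIDX not implemented):
 *   [2:0]   LineSize      = log2(bytes per line) - 4
 *   [12:3]  Associativity = ways - 1
 *   [27:13] NumSets       = sets - 1
 * In the kernel the value would be obtained by writing CSSELR_EL1 to select
 * the cache, issuing an ISB, and reading CCSIDR_EL1; here it is a parameter.
 */
static struct cache_geometry
decode_ccsidr(uint64_t ccsidr)
{
    struct cache_geometry g;
    g.line_size = 1u << ((ccsidr & 0x7) + 4);
    g.ways = (uint32_t)((ccsidr >> 3) & 0x3FF) + 1;
    g.sets = (uint32_t)((ccsidr >> 13) & 0x7FFF) + 1;
    return g;
}

/* Total cache capacity in bytes. */
static uint64_t
cache_size_bytes(struct cache_geometry g)
{
    return (uint64_t)g.line_size * g.ways * g.sets;
}
```

For example, a raw value with `LineSize = 2`, `Associativity = 7` and `NumSets = 255` decodes to 64-byte lines, 8 ways and 256 sets, i.e. a 128 KiB cache.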


### Performing CPU/Revision-specific checks at runtime

CPU and revision checks may be required in various places, although the focus here has been the application
of tunables at boot time.

Tunables are often applied:

* On a specific core type of a specific SoC.
* On a subset of all of the CPU revisions.
* On all P-cores or all E-cores.

In the past, this has led to a number of nested, conditionally-compiled blocks of code that are not easy to
understand or manage as new tunables are added or SoCs/revisions are deprecated.

The changes applied as part of this work focus mainly on:

1. Decoupling the tunable-application code from `start.s`.
2. Splitting the tunable-application code across different files, one per supported architecture (e.g.
`tunables_h7.h`, or `tunables_h11.h`).
3. Providing "templates" for the most commonly-used combinations of tunables.
4. Providing a family of assembly macros that can be used to conditionally execute code on a specific core
type, CPU ID, revision(s), or a combination of these.

All of the macros live in the 64-bit version of `proc_reg.h`, and are SoC-agnostic; they simply check the
`MIDR_EL1` register against a CPU revision that is passed as a parameter to the macro, where applicable.
Similarly, where a block of code is to be executed on a core type, rather than a specific core ID, a couple
of the provided macros can check this against `MPIDR_EL1`.
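The macros themselves are written in assembly, but the kind of comparison they perform can be sketched in C. The example below decodes the architecturally defined `MIDR_EL1` fields and checks a part number and revision range. The helper name `cpu_matches`, the `MIDR_*` accessor names and the way variant and revision are packed into one number are assumptions made for this sketch, not the definitions in `proc_reg.h`. The core-type check against `MPIDR_EL1` is left out because this document does not say which bits encode the core type.

```c
#include <stdbool.h>
#include <stdint.h>

/* MIDR_EL1 field layout as defined by the ARM architecture. */
#define MIDR_REVISION(midr)    ((uint32_t)((midr) & 0xF))          /* bits [3:0]   */
#define MIDR_PARTNUM(midr)     ((uint32_t)(((midr) >> 4) & 0xFFF)) /* bits [15:4]  */
#define MIDR_VARIANT(midr)     ((uint32_t)(((midr) >> 20) & 0xF))  /* bits [23:20] */
#define MIDR_IMPLEMENTER(midr) ((uint32_t)(((midr) >> 24) & 0xFF)) /* bits [31:24] */

/*
 * Hypothetical helper: true when the CPU described by `midr` has the given
 * part number and its (variant, revision) pair lies in [min_rev, max_rev].
 * Revisions are packed as (variant << 4) | revision; this packing is an
 * assumption of the sketch.
 */
static bool
cpu_matches(uint64_t midr, uint32_t part, uint32_t min_rev, uint32_t max_rev)
{
    uint32_t rev = (MIDR_VARIANT(midr) << 4) | MIDR_REVISION(midr);
    return MIDR_PARTNUM(midr) == part && rev >= min_rev && rev <= max_rev;
}
```

With a made-up `MIDR_EL1` value of `0x611F0221` (part number `0x022`, variant 1, revision 1), `cpu_matches(midr, 0x022, 0x10, 0x1F)` is true, while the same call with a range of `0x20` to `0x2F` is false.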


### Checking for feature compatibility at runtime

Some architectural features are optional, which means that disabling them at compile time may cause
the builds for two same-generation SoCs to diverge.

Rather than disabling features, and assuming this does not pose security risks or performance regressions,
the preferred approach is to compile them in, but perform runtime checks to enable/disable them, possibly in
early boot. The way these checks are performed varies from feature to feature (for example, VHE is an ARM
feature, and the ARM ARM specifies how it can be discovered). Apple-specific features are all
advertised through the `AIDR_EL1` register. One of the changes is the addition of a function,
`ml_feature_supported()`, that may be used to check for the presence of a feature at runtime.
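The two kinds of check can be sketched as follows. The VHE test follows the ARM architecture, which reports VHE through the `VH` field of `ID_AA64MMFR1_EL1`. The second helper is only a stand-in for an `AIDR_EL1`-based check: `feature_bit_set` is a hypothetical name, the bit positions are invented, and the actual signature and behaviour of `ml_feature_supported()` are not described in this document. Both helpers take the register value as a parameter instead of reading it.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * VHE discovery as specified by the ARM architecture: the VH field,
 * bits [11:8] of ID_AA64MMFR1_EL1, is non-zero when VHE is implemented.
 */
static bool
vhe_supported(uint64_t id_aa64mmfr1)
{
    return ((id_aa64mmfr1 >> 8) & 0xF) != 0;
}

/*
 * Hypothetical stand-in for an AIDR_EL1-based check. AIDR_EL1 is
 * implementation defined and its layout is not given here, so the
 * one-bit-per-feature encoding is an assumption of the sketch.
 */
static bool
feature_bit_set(uint64_t aidr, unsigned int feature_bit)
{
    return feature_bit < 64 && ((aidr >> feature_bit) & 1) != 0;
}
```

Early boot code could then enable a compiled-in feature only when the corresponding check succeeds, instead of compiling the feature out for some SoCs.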


### Deriving core/cluster counts from device tree

One of the aspects that until now has been hardcoded in XNU is the system topology: the number of
cores/clusters and their physical IDs. This effort piggybacks on other recent XNU changes which aimed to
consolidate topology-related information into XNU, by parsing it from the device tree and exporting it to
KEXTs through well-defined APIs.

Changes applied as part of the XNU consolidation project include:

* Extending the `ml_*` API to extract cluster information from the topology parser. New APIs include the following:
    * `ml_get_max_cluster_number()`
    * `ml_get_cluster_count()`
    * `ml_get_first_cpu_id()`
* Removing hardcoded core counts (`CPU_COUNT`) and cluster counts (`ARM_CLUSTER_COUNT`) from XNU, and
replacing them with `ml_*` calls.
* Similarly, deriving CPU physical IDs from the topology parser.
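To make the idea of deriving topology from parsed data concrete, here is a small C sketch that computes the same kind of answers from a table of CPUs. Everything in it is illustrative: the `cpu_entry` record, the example table and the function names are invented, and the semantics are only what the names of the `ml_get_*` functions above suggest, not their actual implementation or signatures.

```c
#include <stdint.h>

/* Illustrative record for one CPU as it might be parsed from the device tree. */
struct cpu_entry {
    uint32_t cpu_id;      /* logical CPU ID */
    uint32_t cluster_id;  /* cluster the CPU belongs to */
};

/* Made-up topology: cluster 0 holds CPUs 0-1, cluster 1 holds CPUs 2-5. */
static const struct cpu_entry cpus[] = {
    {0, 0}, {1, 0}, {2, 1}, {3, 1}, {4, 1}, {5, 1},
};
static const unsigned int cpu_count = sizeof(cpus) / sizeof(cpus[0]);

/* Highest cluster ID present in the table. */
static uint32_t
max_cluster_number(void)
{
    uint32_t max = 0;
    for (unsigned int i = 0; i < cpu_count; i++) {
        if (cpus[i].cluster_id > max) {
            max = cpus[i].cluster_id;
        }
    }
    return max;
}

/* Lowest CPU ID in `cluster`, or UINT32_MAX if the cluster has no CPUs. */
static uint32_t
first_cpu_id(uint32_t cluster)
{
    uint32_t first = UINT32_MAX;
    for (unsigned int i = 0; i < cpu_count; i++) {
        if (cpus[i].cluster_id == cluster && cpus[i].cpu_id < first) {
            first = cpus[i].cpu_id;
        }
    }
    return first;
}

/* Number of clusters that contain at least one CPU. */
static unsigned int
cluster_count(void)
{
    unsigned int count = 0;
    for (uint32_t c = 0; c <= max_cluster_number(); c++) {
        if (first_cpu_id(c) != UINT32_MAX) {
            count++;
        }
    }
    return count;
}
```

Code that previously looped up to a compile-time `ARM_CLUSTER_COUNT` would instead loop up to a value obtained this way at runtime.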


### Allocating memory that is core size/cluster size/cache size aligned

In some cases, certain statically-allocated arrays/structures need to be cache line-aligned, or have one
element per core or cluster. Whilst this information is no longer known precisely at compile time, the
following macros have been added to provide a reasonably close upper bound:

* `MAX_CPUS`
* `MAX_CPU_CLUSTERS`
* `MAX_L2_CLINE`

These macros are defined in `board_config.h`, and should be set to the same value for a group of targets
sharing a single build. Note that these no longer reflect actual counts and sizes, and the real values need
to be queried at runtime through the `ml_*` API.

The L1 cache line size is still hardcoded, and defined as `MMU_CLINE`. Since this value is always the same
and is checked very often in various places across XNU and elsewhere, it made sense to keep it as a
compile-time macro rather than relying on runtime checks.
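The pattern described in this section, sizing static storage by the upper bound and iterating by the runtime count, might look like the sketch below. The macro values are invented for the example, `MAX_L2_CLINE` is treated as the log2 of the line size in bytes (an assumption of the sketch), and `runtime_cluster_count()` is a stand-in for `ml_get_cluster_count()` that returns a fixed number.

```c
#include <stdint.h>

/* Illustrative upper bounds; the real values come from board_config.h. */
#define MAX_CPU_CLUSTERS 6
#define MAX_L2_CLINE     7   /* assumed here to be log2(line size in bytes) */

/* One counter per possible cluster, each padded to its own cache line. */
struct cluster_stat {
    _Alignas(1 << MAX_L2_CLINE) uint64_t events;
};

/* Sized by the compile-time upper bound, not by the actual cluster count. */
static struct cluster_stat cluster_stats[MAX_CPU_CLUSTERS];

/* Stand-in for ml_get_cluster_count(); a fixed value replaces the real query. */
static unsigned int
runtime_cluster_count(void)
{
    return 2;
}

/* Sum only over the clusters that actually exist on this system. */
static uint64_t
total_events(void)
{
    uint64_t total = 0;
    for (unsigned int i = 0; i < runtime_cluster_count(); i++) {
        total += cluster_stats[i].events;
    }
    return total;
}
```

The array wastes a few entries on systems with fewer clusters than the bound, which is the cost of sharing one build across several targets.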

### Restrictions on conditional compilation

Currently, a family of per-SoC macros is defined at build time to enable XNU to conditionally compile code
for different targets. These are named `ARM[64]_BOARD_CONFIG_[TARGET_NAME]`, and have historically been used
in different places across the kernel; for example, when applying tunables or various fixes, or when enabling
or disabling features. In order not to create divergences in the future across same-generation SoCs, but also
to keep the codebase consistent, the recommendation is to avoid the use of these macros whenever possible.

Instead, XNU itself defines another family of macros, each of which is defined for all targets of a particular
generation. These are named after the P-core introduced by each generation (for example, `APPLEMONSOON`, or
`APPLEVORTEX`), and are preferred over the SoC-specific ones. Where a generation macro is not enough to
provide correctness (which happens, for example, when the code block at hand should not be executed on a
given SoC of the same family), appropriate runtime checks can be performed inside the conditionally-compiled
code block. `machine_read_midr()` and `get_arm_cpu_version()` may be used for this purpose.
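A minimal sketch of this recommendation is shown below: the block is compiled in for the whole generation and narrowed down at runtime. The part number is made up, the macro is defined locally only so that the example compiles, and the `MIDR_EL1` value is passed in as a parameter; in the kernel it would presumably come from `machine_read_midr()`, whose exact signature is not given in this document.

```c
#include <stdbool.h>
#include <stdint.h>

/* Normally supplied by the build for every SoC of the generation. */
#define APPLEVORTEX 1

/* Made-up part number standing for the one SoC that must skip the block. */
#define EXCLUDED_PART_NUM 0x123u

/* True when the generation-specific block should run on the CPU described by `midr`. */
static bool
should_apply_fixup(uint32_t midr)
{
#if defined(APPLEVORTEX)
    /* Compiled in for the whole generation, then narrowed down at runtime. */
    uint32_t part_num = (midr >> 4) & 0xFFF;  /* MIDR_EL1 PartNum, bits [15:4] */
    return part_num != EXCLUDED_PART_NUM;
#else
    (void)midr;
    return false;
#endif
}
```

Compared with guarding the block with an `ARM[64]_BOARD_CONFIG_[TARGET_NAME]` macro, this keeps a single binary valid for every SoC of the generation.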