ARM Scalable Matrix Extension
=============================

Managing hardware resources related to SME state.

Introduction
------------

This document describes how xnu manages the hardware resources associated with
ARM's Scalable Matrix Extension (SME).

SME is an ARMv9 extension intended to accelerate matrix math operations.  SME
builds on top of ARM's previous Scalable Vector Extension (SVE), which extends
the length of the FPSIMD register files and adds new 1D vector-math
instructions.  SME extends SVE by adding a matrix register file and associated
2D matrix-math instructions.  SME2 further extends SME with additional
instructions and register state.

This document summarizes SVE, SME, and SME2 hardware features that are relevant
to xnu.  It is not intended as a full programming guide for SVE or SME: readers
may find a full description of these ISAs in the
[SVE supplement to the ARM ARM](https://developer.arm.com/documentation/ddi0584/latest/)
and [SME supplement to the ARM ARM](https://developer.arm.com/documentation/ddi0616/latest/),
respectively.



Hardware overview
-----------------

### EL0-accessible state

SVE, SME, and SME2 introduce four new EL0-accessible register
files<sup>[1](#feat_sve_footnote)</sup>:

- vector registers `Z0`-`Z31`
- predicate registers `P0`-`P15`
- matrix data `ZA` (SME/SME2 only)
- look-up table `ZT0` (SME2 only)

These register files are unbanked, i.e., their contents are shared across all
exception levels.  Data can be copied between these registers and system memory
using specialized `ldr` and `str` variants.  SME also adds `mov` variants that
can directly copy data between the vector and matrix register files.

Most of these register files supplement, rather than replace, the existing ARM
register files.  However the `Z` register file effectively extends the length of
the existing FPSIMD `V` register file.  Instructions targeting the `V` register
file will now access the lower 128 bits of the corresponding `Z` register.

The size of most of these files is defined by the *streaming vector length*
(SVL), a power-of-two between 128 and 2048 inclusive.  Each `Z` register is SVL
bits in size; each `P` register is SVL / 8 bits in size; and `ZA` is
(SVL / 8) x (SVL / 8) bytes in size.  The value of SVL is determined by both
hardware and software.  Hardware places an implementation-defined cap on SVL,
and privileged software can further reduce SVL for itself and lower exception
levels.

In contrast, `ZT0` is fixed at 512 bits, independent of SVL.
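
As a rough illustration of how these sizes scale with SVL, the sketch below
computes the storage needed for each register file.  It is not taken from xnu:
the function names are hypothetical, and it takes `ZA` to be an
(SVL / 8) x (SVL / 8) byte array, as in Arm's SME supplement.

```c
#include <assert.h>
#include <stddef.h>

/* ZT0 is fixed at 512 bits (64 bytes), independent of SVL. */
#define ZT0_BYTES 64u

/* Nonzero if svl_bits is a legal SVL: a power of two in [128, 2048]. */
int
svl_is_valid(unsigned svl_bits)
{
	return svl_bits >= 128 && svl_bits <= 2048 &&
	       (svl_bits & (svl_bits - 1)) == 0;
}

/* Each Z register is SVL bits wide. */
size_t
z_reg_bytes(unsigned svl_bits)
{
	assert(svl_is_valid(svl_bits));
	return svl_bits / 8;
}

/* Each P register is SVL / 8 bits wide, i.e. SVL / 64 bytes. */
size_t
p_reg_bytes(unsigned svl_bits)
{
	assert(svl_is_valid(svl_bits));
	return svl_bits / 64;
}

/* ZA is a square array with one row and one column per byte of SVL. */
size_t
za_bytes(unsigned svl_bits)
{
	size_t svl_bytes = z_reg_bytes(svl_bits);

	return svl_bytes * svl_bytes;
}

/* Hypothetical total for Z0-Z31, P0-P15, ZA, and ZT0. */
size_t
sme_state_bytes(unsigned svl_bits)
{
	return 32 * z_reg_bytes(svl_bits) + 16 * p_reg_bytes(svl_bits) +
	       za_bytes(svl_bits) + ZT0_BYTES;
}
```

For example, with SVL = 512 this gives 64-byte `Z` registers, 8-byte `P`
registers, a 4096-byte `ZA` array, and 6336 bytes in total.  `ZA` dominates
because it grows quadratically with SVL.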

SME also adds a single EL0-accessible system register `TPIDR2_EL0`.  Like
`TPIDR_EL0`, `TPIDR2_EL0` is officially reserved for ABI use, but its contents
have no particular meaning to hardware.

### `PSTATE` changes

SME adds two orthogonal states to `PSTATE`.

`PSTATE.SM` moves the CPU in and out of a special execution mode called
*streaming SVE mode*.  Software must enter streaming SVE mode to execute most
SME instructions.  However software must then exit streaming SVE mode to execute
many legacy SIMD instructions<sup>[2](#feat_sme_fa64_footnote)</sup>.  To make
things even more complicated, these transitions cause the CPU to zero out the
`V`/`Z` and `P` register files, and to set all `FPSR` flags.  When software
needs to retain this state across `PSTATE.SM` transitions, it must manually
stash the state in memory.

`PSTATE.ZA` independently controls whether the contents of `ZA` and `ZT0` are
valid.  Setting `PSTATE.ZA` zeroes out both register files, and enables
instructions that access them.  Clearing `PSTATE.ZA` causes `ZA` and `ZT0`
accesses to trap.

Most SME instructions require both `PSTATE.SM` and `PSTATE.ZA` to be
set, so software usually toggles both bits at the same time.  However setting
these bits independently can be useful when software needs to interleave SME and
FPSIMD instructions.  If software needs to temporarily exit streaming SVE mode
to execute FPSIMD instructions, setting `PSTATE.{SM,ZA} = {0,1}` will do so
without clobbering the `ZA` or `ZT0` array.

`PSTATE.{SM,ZA} = {0,0}` acts as a hint to the CPU that it may power down
SME-related hardware.  Hence software should clear these bits as soon as
SME state can be discarded.

These `PSTATE` bits are accessible to software in several ways:

- Reads or writes to the `SVCR` system register, which packs both bits into
  a single register
- Writes to the `SVCRSM`, `SVCRZA`, or `SVCRSMZA` system registers with the
  immediate values `0` or `1`, which directly modify the specified bit(s)
- `sm{start,stop} (sm|za)` pseudo-instructions, which are assembler aliases for
  the above `msr` instructions

Regardless of which method is used to access these bits, software generally does
not need explicit barriers.  Specifically, ARM guarantees that all direct and
indirect reads from these bits will appear in program order relative to any
direct writes.
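
The side effects of these transitions can be summarized with a small software
model.  The sketch below is purely illustrative and does not touch real
hardware: `struct sme_model` and both functions are hypothetical, the array
sizes are arbitrary stand-ins, and the `FPSR` side effect is reduced to a
boolean.  The only architectural detail assumed is that `SVCR` holds `SM` in
bit 0 and `ZA` in bit 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SVCR_SM (1ull << 0)	/* SVCR.SM */
#define SVCR_ZA (1ull << 1)	/* SVCR.ZA */

/* Toy CPU state; not a real register layout. */
struct sme_model {
	uint64_t svcr;
	uint8_t  zp[16];		/* stands in for the V/Z and P files */
	uint8_t  za[16];		/* stands in for ZA and ZT0 */
	bool     fpsr_flags_set;	/* "all FPSR flags set" marker */
};

/* Model a write to SVCR, applying the side effects described above. */
void
model_write_svcr(struct sme_model *cpu, uint64_t new_svcr)
{
	assert(cpu != NULL);
	uint64_t changed = cpu->svcr ^ new_svcr;

	if (changed & SVCR_SM) {
		/* Entering or exiting streaming SVE mode clobbers Z/P. */
		memset(cpu->zp, 0, sizeof(cpu->zp));
		cpu->fpsr_flags_set = true;
	}
	if ((changed & SVCR_ZA) && (new_svcr & SVCR_ZA)) {
		/* Enabling PSTATE.ZA zeroes ZA and ZT0. */
		memset(cpu->za, 0, sizeof(cpu->za));
	}
	cpu->svcr = new_svcr & (SVCR_SM | SVCR_ZA);
}

/* True when PSTATE.{SM,ZA} = {0,0}, i.e. the power-down hint is active. */
bool
model_power_down_hint(const struct sme_model *cpu)
{
	assert(cpu != NULL);
	return (cpu->svcr & (SVCR_SM | SVCR_ZA)) == 0;
}
```

In this model, writing `SVCR_ZA` alone while both bits are set corresponds to
`PSTATE.{SM,ZA} = {0,1}`: the `zp` stand-in is zeroed but `za` survives.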

### Other hardware resources

An implementation may share SME compute resources across multiple CPUs.  In this
case, the per-CPU `SMPRI_EL1` controls the relative priority of the SME
instructions issued by that CPU.  ARM guarantees that higher `SMPRI_EL1` values
indicate higher priorities, and that setting `SMPRI_EL1 = 0` on all CPUs is a
safe way to disable SME prioritization.  Otherwise the exact meaning of
`SMPRI_EL1` is implementation-defined.

EL2 may trap guest reads and writes to `SMPRI_EL1` using the fine-grained trap
controls `HFGRTR_EL2.nSMPRI_EL1` and `HFGWTR_EL2.nSMPRI_EL1`, respectively.
Alternatively, EL2 may adjust the effective SME priority at EL0 and EL1 without
trapping, by populating the lookup table register `SMPRIMAP_EL2` and setting the
control bit `HCRX_EL2.SMPME`.  When `HCRX_EL2.SMPME` is set, SME instructions
executed at EL0 and EL1 will interpret `SMPRI_EL1` as an index into
`SMPRIMAP_EL2` rather than as a raw priority value.
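
The lookup can be sketched as follows.  This is an illustrative model rather
than xnu code: the function name is hypothetical, and it assumes a 4-bit
priority field in `SMPRI_EL1` and sixteen 4-bit entries packed into the 64-bit
`SMPRIMAP_EL2`, which should be checked against Arm's register descriptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return the priority that SME instructions at EL0/EL1 would use.
 * Assumed layout: entry i of SMPRIMAP_EL2 occupies bits [4*i+3 : 4*i].
 */
unsigned
effective_sme_priority(unsigned smpri_el1, uint64_t smprimap_el2, bool smpme)
{
	assert(smpri_el1 < 16);

	if (!smpme) {
		/* Mapping disabled: SMPRI_EL1 is used as a raw priority. */
		return smpri_el1;
	}
	/* Mapping enabled: SMPRI_EL1 indexes into the lookup table. */
	return (unsigned)((smprimap_el2 >> (4 * smpri_el1)) & 0xf);
}
```

Under these assumptions, a table of all zeroes maps every guest-visible
priority to `0`, whatever value the guest writes to `SMPRI_EL1`.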

`SMIDR_EL1` advertises hardware properties about the SME implementation,
including whether SME execution priority is implemented.

`CPACR_EL1` and `CPTR_ELx` have controls that can trap SVE and SME operations.
Two of these are relevant to Apple's SME
implementation<sup>[3](#cpacr_zen_footnote)</sup>:

- `SMEN`: trap SME instructions and register accesses, including SVE
  instructions executed during streaming SVE mode.
- `FPEN`: trap FPSIMD, SME, and SVE instructions and most register accesses, but
  *not* `SVCR` accesses.  Lower priority than `SMEN`.

Several SME registers aren't affected by these controls, since they have their
own trapping mechanisms.  `SMPRI_EL1` has fine-grained hypervisor trap controls
as described above.  `SMIDR_EL1` accesses can trap to the hypervisor using the
existing `HCR_EL2.TID1` control bit.  Finally `TPIDR2_EL0` has a dedicated
control bit `SCTLR_ELx.EnTP2` along with fine-grained trap controls
`HFG{R,W}TR_EL2.nTPIDR2_EL0`.


Software usage
--------------

### SME `PSTATE` management

xnu has in-kernel SIMD instructions<sup>[4](#xnu_simd_footnote)</sup> which
become illegal while the CPU is in streaming SVE mode.  This poses a problem if
xnu interrupts EL0 while it is in the middle of executing SME-accelerated code.

Hence, anytime xnu enters the kernel with `PSTATE.SM` set, it saves the current
`Z`, `P`, and `SVCR` values and then clears `PSTATE.SM`.  xnu later restores
these values during kernel exit.  These operations occur in an assembly-only
module (`locore.s`) where we have strict control over code generation, and can
guarantee that no problematic SIMD instructions are executed while `PSTATE.SM`
is set.

Since the kernel may interrupt *itself*, kernel code is forbidden from entering
streaming SVE mode.  This policy means that xnu does not need to preserve
`TPIDR2_EL0`, `ZA`, or `ZT0` during kernel entry and exit, since there are no
in-kernel SME operations that could clobber them.

### Context switching

xnu saves and restores `TPIDR2_EL0`, `ZA`, and `ZT0` inside the ARM64
implementation of `machine_switch_context()`, specifically as the routines
`machine_{save,restore}_sme_context()` in `osfmk/arm64/pcb.c`.  These in turn
build on lower-level routines to save and load SME register state, located in
`osfmk/arm64/sme.c`.  The low-level routines are built on top of the SME `str`
and `ldr` instructions, which can be executed outside of streaming SVE mode.

`machine_{save,restore}_sme_context()` unconditionally save and restore
`TPIDR2_EL0`, since its contents are valid even when EL0 isn't actually using
SME.  However `ZA`'s and `ZT0`'s contents are often invalid and hence do not
require context-switching.  `machine_save_sme_context()` reads `SVCR.ZA`
to determine if the `ZA` and `ZT0` arrays were actually valid at context-switch
time.  If not, it skips saving the invalid `ZA` and `ZT0` contents.

Likewise, when context-switching back to a thread where the saved-state
`SVCR.ZA` is cleared, `machine_restore_sme_context()` simply ensures that the
CPU's `PSTATE.ZA` bit is cleared (executing `smstop za` if necessary).  xnu does
not need to manually invalidate any `ZA` or `ZT0` state left by a previous
thread: the next time `PSTATE.ZA` is enabled, the CPU is architecturally
guaranteed to zero out both register files.

As noted above, xnu saves `SVCR` on kernel entry and uses it to restore
`PSTATE.SM` on kernel exit.  Hence `machine_restore_sme_context()` updates
`PSTATE.ZA` to match the new process's saved state, but doesn't update
`PSTATE.SM`.  Likewise `machine_restore_sme_context()` doesn't manipulate the `Z`
or `P` register files, since these will be updated on kernel exit.
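
The save and restore policy described in this section can be modeled in plain
C.  The sketch below is not the code in `osfmk/arm64/pcb.c`: the `model_*`
names and both structures are hypothetical, `ZA` and `ZT0` are collapsed into
one small array, and the saved `svcr` field is assumed to already hold the
value captured at kernel entry.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SVCR_SM (1ull << 0)
#define SVCR_ZA (1ull << 1)
#define MODEL_ZA_BYTES 16	/* arbitrary stand-in for ZA + ZT0 */

struct model_cpu {
	uint64_t svcr;
	uint64_t tpidr2_el0;
	uint8_t  za[MODEL_ZA_BYTES];
};

struct model_sme_state {
	uint64_t svcr;		/* SM bit assumed captured at kernel entry */
	uint64_t tpidr2_el0;
	uint8_t  za[MODEL_ZA_BYTES];
};

/* Save the outgoing thread's state; skip ZA/ZT0 when they are invalid. */
void
model_save_sme_context(const struct model_cpu *cpu, struct model_sme_state *st)
{
	assert(cpu != NULL && st != NULL);
	st->tpidr2_el0 = cpu->tpidr2_el0;	/* always valid */
	if (cpu->svcr & SVCR_ZA) {
		st->svcr |= SVCR_ZA;
		memcpy(st->za, cpu->za, sizeof(st->za));
	} else {
		st->svcr &= ~SVCR_ZA;
	}
}

/* Load the incoming thread's state; PSTATE.SM and Z/P are left alone. */
void
model_restore_sme_context(struct model_cpu *cpu, const struct model_sme_state *st)
{
	assert(cpu != NULL && st != NULL);
	cpu->tpidr2_el0 = st->tpidr2_el0;
	if (st->svcr & SVCR_ZA) {
		cpu->svcr |= SVCR_ZA;
		memcpy(cpu->za, st->za, sizeof(cpu->za));
	} else {
		/* Equivalent of "smstop za"; stale ZA data is not scrubbed. */
		cpu->svcr &= ~SVCR_ZA;
	}
}
```

Note that the restore path never copies the saved `SM` bit into the model CPU,
mirroring the split of responsibilities between context switch and kernel exit.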

Since SME thread state (`thread->machine.usme`) is large, and won't be used by
most threads, xnu lazily allocates the backing memory the first time a thread
encounters an SME instruction.  This is implemented by clearing `CPACR_EL1.SMEN`
inside `machine_restore_sme_context()`, then performing the allocation during
the subsequent SME trap.
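
A minimal sketch of this lazy-allocation pattern follows.  It is a user-space
model, not xnu's implementation: `struct model_thread` and both functions are
hypothetical, `calloc()` stands in for the kernel allocator, and the rule that
`SMEN` stays disabled exactly while the thread has no SME state is an
assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the per-thread pointer thread->machine.usme. */
struct model_thread {
	void *usme;
};

/* Decide, at context-switch time, whether SME accesses should be untrapped. */
bool
model_smen_enabled(const struct model_thread *t)
{
	assert(t != NULL);
	return t->usme != NULL;
}

/*
 * Model of the SME trap path: allocate state on first use.
 * Returns true if the faulting instruction can be retried.
 */
bool
model_handle_sme_trap(struct model_thread *t, size_t state_bytes)
{
	assert(t != NULL && state_bytes > 0);
	if (t->usme == NULL) {
		t->usme = calloc(1, state_bytes);
	}
	return t->usme != NULL;
}
```

A thread that never executes an SME instruction never reaches the trap path in
this model, so it never pays for the allocation.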

### Execution priority

xnu does not currently have an API for changing SME execution priority.
Accordingly xnu resets `SMPRI_EL1` to `0` during CPU initialization, and
otherwise does not modify it at runtime.

### Power management

xnu updates `PSTATE.ZA` during `machine_switch_sme_context()` using the `SVCR`
value stashed in the new thread's SME state.  If the new process has never used
SME, and hence doesn't have saved `ZA` state, xnu unconditionally clears
`PSTATE.ZA`.  This policy means that xnu issues the power-down hint
`PSTATE.{SM,ZA} = {0,0}` on every context-switch, unless the new thread has live
`ZA` state.  (Recall that `PSTATE.SM` was previously cleared on kernel entry.)

By extension, xnu will always issue this hint before entering WFI.  In order to
reach `arm64_retention_wfi()`, xnu must first context-switch to the idle thread,
which never has `ZA` state.

### Virtualizing SME

SME introduces a number of new registers that the hypervisor needs to manage.
`SMCR_ELx` is the only one of these that's banked between EL1 and EL2.  The
`SVCR`, `SMPRI_EL1`, and `TPIDR2_EL0` system registers are all shared between
the host and guest, and must be managed by the host hypervisor accordingly.

More critically, the `Z`, `P`, `ZA`, and `ZT0` register files are also shared
across all exception levels.  To minimize the cost of managing this unbanked SME
register state, xnu tries to keep the guest matrix state resident in the CPU as
long as possible, even when the guest traps to EL2.  xnu will only spill the `ZA`
and `ZT0` state back to memory when one of two things happens:

(1) The `hv_vcpu_run` trap handler returns control all the way back to the VMM
    thread at host EL0

(2) xnu needs to context-switch the host VMM thread that owns the vCPU

In these cases xnu will spill the guest `ZA` and `ZT0` state back to memory,
then replace them with the VMM thread's or new thread's state (respectively).

Unfortunately since xnu has to disable streaming SVE mode to handle traps, it's
still forced to spill `Z` and `P` state to memory anytime the guest traps to EL2
with `PSTATE.SM` set.


Since xnu doesn't currently support SME prioritization, it sets `HCRX_EL2.SMPME`
and populates all `SMPRIMAP_EL2` entries with a value of `0`.  Guest OSes are
still allowed to write to `SMPRI_EL1`, but currently this has no effect on
the actual hardware priority.



Footnotes
---------

<a name="feat_sve_footnote"></a>1. For simplicity, this section describes the
behavior on Apple CPUs.  Details like register length and accessibility may
depend on whether the CPU is in streaming SVE mode (described later in the
document).  Apple's current SME implementation simply makes SVE features
inaccessible outside this mode.

<a name="feat_sme_fa64_footnote"></a>2. The optional CPU feature FEAT_SME_FA64
allows full use of the SIMD instruction set inside streaming SVE mode.
However xnu does not currently support any CPUs which implement FEAT_SME_FA64.

<a name="cpacr_zen_footnote"></a>3. `CPACR_EL1` and `CPTR_ELx` also have a
discrete trap control `ZEN` for SVE instruction and register accesses performed
outside streaming SVE mode.  This trap control isn't currently relevant to Apple
CPUs, since Apple's current SME implementation only allows SVE accesses inside
streaming SVE mode.

<a name="xnu_simd_footnote"></a>4. LLVM is surprisingly aggressive about
emitting SIMD instructions unless explicitly inhibited by compiler flags.  Even
if the xnu build started inhibiting these instructions for targets that support
SME, they could still appear in existing kext binaries.