# Protecting Kernel-Private Memory

## Intro

We can classify all kernel-allocated memory into two high-level categories:

1. *Kernel-private memory*
2. *Kernel-shareable memory*

*Kernel-private memory* covers all memory that is used exclusively by the kernel and is never meant to be shared with external domains. Such memory should therefore never be mapped into other address spaces, neither to userspace nor to coprocessors via IOMMUs/DARTs. All zone/`kalloc_type()` managed memory that contains pointers is de facto kernel-private, because sharing kernel pointers with other domains would be a security violation. Note, however, that some pure data allocations are never going to be shared either, and can also be considered kernel-private memory.

*Kernel-shareable memory* covers allocations made by the kernel that are meant to be shared with external address spaces by design. Such memory must not contain kernel pointers or any kernel-private information, and as a result always consists of pure data allocations.

The work already done in our type-segregated allocators can be leveraged so that the kernel enforces appropriate mapping policies, making sure that kernel-private memory actually stays private even in the presence of bugs. Without such enforcement, attackers could attempt to exploit various kernel interfaces to map kernel-private memory into their own address space, which would bypass most of the state-of-the-art mitigations in the kernel.

This document covers the problem space, the security boundaries we defend, and the technical details of the mitigation.

## Problem space

The security boundaries we consider here are:

1. **user → kernel**: we consider attackers that have successfully compromised a userspace process, and attempt to compromise the kernel via any form of kernel vulnerability, including Mach VM logic bugs;
2. **coprocessors → kernel**: we consider attackers that have successfully compromised a coprocessor, and attempt to compromise the kernel via any RPC interface exposed by kernel extensions to these coprocessors.

These boundaries are special because they often include APIs to map or share memory between the kernel and userspace or coprocessors, and these APIs could be misused:

* The Mach VM subsystem manages virtual address spaces; therefore, bugs in this subsystem could be abused to create illegal mappings to kernel-private memory.
* Many coprocessors operate on memory shared with the Application Processor, and need to access memory owned by userspace tasks as well as memory managed by the kernel. Because of that, some kernel extensions expose RPC interfaces to their counterpart coprocessor that allow mapping memory via their IOMMU/DART. This exposes a wide (and usually bespoke) attack surface that can lead to the creation of illegal mappings to kernel-private memory.

If attackers gain the ability to map kernel-private memory into an address space they control, they effectively defeat the boundary. This allows them to access kernel pointers freely, which at the very least gives them a way to guide attacks (if the mapping is read-only), but could even give them arbitrary kernel read-write right away. At this point, most kernel mitigations can be bypassed more easily, and exploitation becomes significantly easier.

## XNU_KERNEL_RESTRICTED

The Secure Page Table Monitor (SPTM) is a highly privileged component that defines and enforces all the policies that govern page table management, for both the kernel and user applications, on behalf of XNU. Its goal is to protect the overall system by securing the page tables against bad actors, even in the presence of a compromised kernel.

The SPTM has a *type system*, which sits at the heart of the SPTM security policies and primarily comprises the *frame table*, a data structure that stores metadata associated with every managed physical page in the system, alongside immutable security policies that describe what is allowed or disallowed for that specific physical page at any given time. For each frame type, there is a very clear set of policies that governs the permitted states for a given physical page and restricts which transitions are allowed.

To address the problem described above, the SPTM introduced a dedicated frame type for kernel-private memory: `XNU_KERNEL_RESTRICTED` (X_K_R). This type has three special policies that the SPTM enforces even in the presence of an XNU compromise:

1. `XNU_KERNEL_RESTRICTED` pages can only be mapped in the kernel address space, and hence never in any user process.
2. `XNU_KERNEL_RESTRICTED` pages are not allowed to be mapped via IOMMU/DART.
3. `XNU_KERNEL_RESTRICTED` pages are only allowed a single mapping beyond the static one in the physical aperture.

Because all transitions that would affect mappings have to go through the SPTM, these policies can be enforced, and any illegal transition involving an `XNU_KERNEL_RESTRICTED` page leads to a panic.

```

 ┌──────────────────────────────────────────────────────┐
 │                                                      │
 │                                                      │
 │                     ┌────────────┐   ┌────────────┐  │
 │                     │            │   │            │  │
 │   userspace         │   Task A   │   │   Task B   │  │
 │                     │            │   │            │  │
 │                     └──────▲─────┘   └─────▲──────┘  │
 │                                            │         │
 │                            │               │         │
 │                                            │         │
 │                            │               │         │
 │                                            │         │
 ├────────────────────────────┼───────────────┼─────────┤
 │                                            │         │
 │                   ┌────────┴────────┐      │         │        ┌─────────────┐
 │                   │   X_K_R page    │      │         │        │             │
 │                   │   refcnt == 1   │─ ─ ─ ┼ ─ ─ ─ ─ ┼ ─ ─ ─ ▶│ Coprocessor │
 │                   │                 │      │         │  ┌────▶│      C      │
 │  kernelspace      └─────────────────┘      │         │  │     │             │
 │                                            │         │  │     └─────────────┘
 │                                  ┌─────────┴───────┐ │  │
 │                                  │ non-X_K_R page  │ │  │
 │                                  │   refcnt == 3   │─┼──┘
 │                                  │                 │ │
 │                                  └─────────────────┘ │
 └──────────────────────────────────────────────────────┘


  ─────────▶    Legal mapping


  ─ ─ ─ ─ ─▶   Illegal mapping

```
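
The three policies can be pictured as a small decision function. The following is only a toy model written for this document, not SPTM code: every name in it (`frame_t`, `frame_map()`, the `DOMAIN_*` values) is hypothetical, `refcnt` counts mappings beyond the static physical aperture one as in the diagram, and a `false` return stands in for the panic the SPTM would raise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; these are not SPTM definitions. */
typedef enum { FRAME_OTHER, FRAME_XNU_KERNEL_RESTRICTED } frame_type_t;
typedef enum { DOMAIN_KERNEL, DOMAIN_USER, DOMAIN_IOMMU } map_domain_t;

typedef struct {
    frame_type_t type;
    unsigned     refcnt; /* mappings beyond the static physical aperture one */
} frame_t;

/* Returns true if mapping `frame` into `domain` respects the three policies. */
bool
frame_mapping_allowed(const frame_t *frame, map_domain_t domain)
{
    assert(frame != NULL);
    if (frame->type != FRAME_XNU_KERNEL_RESTRICTED) {
        return true; /* policies of other frame types are not modeled here */
    }
    if (domain == DOMAIN_USER) {
        return false; /* policy 1: never mapped in a user process */
    }
    if (domain == DOMAIN_IOMMU) {
        return false; /* policy 2: never mapped via IOMMU/DART */
    }
    return frame->refcnt == 0; /* policy 3: a single kernel mapping */
}

/* Records the mapping if it is legal; false models the SPTM panic. */
bool
frame_map(frame_t *frame, map_domain_t domain)
{
    if (!frame_mapping_allowed(frame, domain)) {
        return false;
    }
    frame->refcnt++;
    return true;
}
```

In this model, a restricted frame accepts exactly one `frame_map(&frame, DOMAIN_KERNEL)` call and rejects every later request, whichever domain it targets.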

## Security value

### Deterministic runtime mitigation

This mitigation stops **any** exploitation technique that involves mapping kernel-private memory outside of the kernel address space, and forces attackers to go down the path of full classic kernel exploitation. This means facing all the kernel mitigations, including MTE. On top of mitigating all the attacks that rely on sharing/mapping interfaces, there is an immediate impact on another class of Mach VM security bugs: *Physical Use-after-free*.

The Mach VM manages the lifecycle of physical pages on the system. VM maps are the source of truth of the system, and the pmap and page tables are a live cache of that state. The Mach VM has had bugs where the page tables would have dangling page table entries (PTEs), that is, PTEs representing mappings that should no longer exist and that the VM lost track of.

We call this class of inconsistency bugs *Physical Use-after-free (PUAF)*. When the VM thinks a page became unused, it adds it to a freelist of physical pages in order to repurpose it to hold new content, possibly for a completely different security domain. In the case of a PUAF, the VM leaves a dangling mapping that an attacker can abuse to still access the content of a page after it has been repurposed.

`XNU_KERNEL_RESTRICTED` forms a guarantee around this bug class. The SPTM requires that a page has no active mappings when it is retyped, and while the VM has lost track of the dangling mapping, the SPTM has not. As a result, it becomes impossible for an attacker to maintain access via a dangling PTE to a page that was or would become `XNU_KERNEL_RESTRICTED`: the SPTM would detect the illegal retyping operation and would panic the system immediately. Gaining access to `XNU_KERNEL_RESTRICTED` memory via PUAF is hence deterministically stopped.

However, attackers can try to exploit PUAFs on the same frame type, which would not go through an SPTM retyping operation. For example, a page that was used for user data in a task A and gets reused to hold completely different data in a task B is such a scenario, and leads to an attacker breaking the process address space isolation the VM is meant to provide. To address that, we use a runtime check each time the Mach VM moves a physical page into a “freed” state. We simply utilize SPTM’s precise tracking of mappings and use it to assert that the page indeed has no active mapping. As a result, any direct attempt to recycle a physical page with active mappings deterministically panics the system.
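
The two checks described above (no active mappings at retype time, and no active mappings when a page is freed) can be sketched as follows. This is an illustrative model only: `phys_page_t`, `page_retype()`, `page_free()` and the `PAGE_OP_*` results are hypothetical names, and `PAGE_OP_PANIC` stands in for the system panic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; these are not the kernel's or the SPTM's. */
typedef enum { PAGE_USER_DATA, PAGE_XNU_KERNEL_RESTRICTED } page_type_t;
typedef enum { PAGE_OP_OK, PAGE_OP_PANIC } page_op_result_t;

typedef struct {
    page_type_t type;
    unsigned    active_mappings; /* tracked by the monitor, not by the VM */
} phys_page_t;

/* Retyping is refused while any mapping, dangling or not, still exists. */
page_op_result_t
page_retype(phys_page_t *page, page_type_t new_type)
{
    assert(page != NULL);
    if (page->active_mappings != 0) {
        return PAGE_OP_PANIC;
    }
    page->type = new_type;
    return PAGE_OP_OK;
}

/* Same-type reuse never retypes, so the check is repeated on free. */
page_op_result_t
page_free(const phys_page_t *page)
{
    assert(page != NULL);
    return page->active_mappings == 0 ? PAGE_OP_OK : PAGE_OP_PANIC;
}
```

A page with `active_mappings == 1` fails both operations in this model, which mirrors the point made above: the count the checks rely on is kept independently of the VM state that the bug corrupted.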

### Protecting MTE

In the kernel, we apply MTE to any dynamic memory that contains kernel pointers, in order to mitigate use-after-free and out-of-bounds bugs. This can also be extended to all kernel-private memory, not just the part that contains kernel pointers (although it is not at this time). The more memory we tag, the larger the attack surface we protect.

However, if there is any way to access tagged memory without going through tag checks, MTE is bypassed. This is why we have to disallow any attempt to map MTE-tagged pages outside of the kernel address space, which completely coincides with the purpose of `XNU_KERNEL_RESTRICTED`. In the end, this is no different from the motivation described above; it is just that MTE makes it even more appealing for attackers to reach for said primitives, which makes `XNU_KERNEL_RESTRICTED` even more important.

## Typing memory

To know what is kernel-private with pointers, kernel-private without pointers, and kernel-shareable, we use the type-based segregation provided by [kalloc_type](https://security.apple.com/blog/towards-the-next-generation-of-xnu-memory-safety/). The zone allocator (*zalloc*) and *kmem* already propagate metadata describing the allocation, via the `KMEM_DATA` or `KMEM_DATA_SHARED` flags:

* All *kmem* based allocations, besides `KMEM_COMPRESSOR` and `KMEM_DATA_SHARED` pages, are typed as `XNU_KERNEL_RESTRICTED`.
* All *zalloc* allocations, besides the shareable data heap and ROAllocator, are typed `XNU_KERNEL_RESTRICTED`.
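
The two rules above amount to a default of "restricted" with a short list of exceptions. A minimal sketch of that classification, assuming simplified enumerations in place of the real allocator metadata (the `sketch_` types and functions are invented for this example; the actual `KMEM_*` flags are defined differently in XNU):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the allocation classes named above. */
typedef enum {
    SKETCH_KMEM_DEFAULT,     /* kernel memory that may hold pointers */
    SKETCH_KMEM_DATA,        /* kernel-private pure data */
    SKETCH_KMEM_DATA_SHARED, /* data meant to be shared by design */
    SKETCH_KMEM_COMPRESSOR   /* compressor pages */
} sketch_kmem_class_t;

typedef enum {
    SKETCH_ZONE_TYPED,       /* kalloc_type / regular zones */
    SKETCH_ZONE_DATA,        /* kernel-private data heap */
    SKETCH_ZONE_DATA_SHARED, /* shareable data heap */
    SKETCH_ZONE_READ_ONLY    /* ROAllocator */
} sketch_zone_class_t;

/* kmem rule: restricted unless compressor or shared data. */
bool
sketch_kmem_is_restricted(sketch_kmem_class_t c)
{
    switch (c) {
    case SKETCH_KMEM_DATA_SHARED:
    case SKETCH_KMEM_COMPRESSOR:
        return false;
    case SKETCH_KMEM_DEFAULT:
    case SKETCH_KMEM_DATA:
        return true;
    }
    assert(!"unknown kmem class");
    return false;
}

/* zalloc rule: restricted unless shareable data heap or ROAllocator. */
bool
sketch_zone_is_restricted(sketch_zone_class_t c)
{
    switch (c) {
    case SKETCH_ZONE_DATA_SHARED:
    case SKETCH_ZONE_READ_ONLY:
        return false;
    case SKETCH_ZONE_TYPED:
    case SKETCH_ZONE_DATA:
        return true;
    }
    assert(!"unknown zone class");
    return false;
}
```

Writing the exceptions as an explicit deny-list keeps the default on the safe side: a class that is not listed as shareable ends up restricted.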