# Protecting Kernel-Private Memory

## Intro

We can classify all kernel-allocated memory into two high-level categories:

1. *Kernel-private memory*
2. *Kernel-shareable memory*

*Kernel-private memory* covers all memory used exclusively by the kernel that is never meant to be shared with external domains. Such memory should therefore never be mapped into other address spaces, neither into userspace nor into coprocessors via IOMMUs/DARTs. All zone/`kalloc_type()` managed memory that contains pointers is de facto kernel-private, as sharing kernel pointers with other domains would be a security violation. It is worth noting, however, that some data allocations are never going to be shared either, and can also be considered kernel-private memory.

*Kernel-shareable memory* covers allocations made by the kernel that are meant to be shared with external address spaces by design. Such memory is not allowed to contain kernel pointers or any kernel-private information, and as a result always consists of pure data allocations.

A lot of work has gone into our type-segregated allocators, which we can leverage so that the kernel enforces appropriate mapping policies and kernel-private memory actually stays private even in the presence of bugs. Without such enforcement, attackers could attempt to exploit various kernel interfaces to map kernel-private memory into their address space, which would bypass most of the state-of-the-art mitigations in the kernel.

This document covers the problem space, the security boundaries we defend, and the technical details of the mitigation.

## Problem space

The security boundaries we consider here are:

1. **user → kernel**: we consider attackers that have successfully compromised a userspace process and attempt to compromise the kernel via any form of kernel vulnerability, including Mach VM logic bugs;
2. **coprocessors → kernel**: we consider attackers that have successfully compromised a coprocessor and attempt to compromise the kernel via any RPC interface exposed by kernel extensions to these coprocessors.

These boundaries are special because they often include APIs that map or share memory between the kernel and userspace or coprocessors, and these APIs could be misused:

* The Mach VM subsystem manages virtual address spaces; therefore, bugs in this subsystem could be abused to create illegal mappings of kernel-private memory.
* Many coprocessors operate on memory shared with the Application Processor, and need to access memory owned by userspace tasks as well as memory managed by the kernel. Because of that, some kernel extensions expose RPC interfaces to their counterpart coprocessor that allow memory to be mapped via its IOMMU/DART. This exposes a wide, and usually bespoke, attack surface that can lead to illegal mappings of kernel-private memory being created.

If attackers gain the ability to map kernel-private memory into an address space they control, they effectively defeat the boundary. This gives them free access to kernel pointers, which at the very least (if the mapping is read-only) gives them a way to guide attacks, and could even give them arbitrary kernel read-write right away. At this point, most kernel mitigations can be bypassed more easily, and exploitation becomes significantly easier.

## XNU_KERNEL_RESTRICTED

The Secure Page Table Monitor (SPTM) is a highly privileged component that defines and enforces all the policies that govern page table management, for both the kernel and user applications, on behalf of XNU. Its goal is to protect the overall system by securing the page tables against bad actors, even in the presence of a compromised kernel.

The SPTM has a *type system*, which sits at the heart of the SPTM security policies. It primarily comprises the *frame table*, a data structure that stores metadata associated with every managed physical page in the system, alongside immutable security policies that describe what is allowed or disallowed for that specific physical page at any given time. For each frame type, there is a very clear set of policies that governs the permitted states for a given physical page and restricts which transitions are allowed.

To address the above, the SPTM introduced a dedicated frame type for kernel-private memory: `XNU_KERNEL_RESTRICTED` (X_K_R).
This type has three special policies that the SPTM enforces even in the presence of an XNU compromise:

1. `XNU_KERNEL_RESTRICTED` pages can only be mapped in the kernel address space, hence never in any user process.
2. `XNU_KERNEL_RESTRICTED` pages are not allowed to be mapped via IOMMU/DART.
3. `XNU_KERNEL_RESTRICTED` pages are only allowed a single mapping beyond the static physical aperture one.

Because all transitions that affect mappings have to go through the SPTM, these policies can be enforced: any illegal transition involving an `XNU_KERNEL_RESTRICTED` page leads to a panic.


```
┌──────────────────────────────────────────────────────┐
│                                                      │
│                                                      │
│                     ┌────────────┐   ┌────────────┐  │
│                     │            │   │            │  │
│    userspace        │   Task A   │   │   Task B   │  │
│                     │            │   │            │  │
│                     └──────▲─────┘   └─────▲──────┘  │
│                                            │         │
│                            │               │         │
│                                            │         │
│                            │               │         │
│                                            │         │
├────────────────────────────┼───────────────┼─────────┤
│                                            │         │
│                   ┌────────┴────────┐      │         │        ┌─────────────┐
│                   │   X_K_R page    │      │         │        │             │
│                   │   refcnt == 1   │─ ─ ─ ┼ ─ ─ ─ ─ ┼ ─ ─ ─ ▶│ Coprocessor │
│                   │                 │      │         │  ┌────▶│      C      │
│    kernelspace    └─────────────────┘      │         │  │     │             │
│                                            │         │  │     └─────────────┘
│                                  ┌─────────┴───────┐ │  │
│                                  │ non-X_K_R page  │ │  │
│                                  │   refcnt == 3   │─┼──┘
│                                  │                 │ │
│                                  └─────────────────┘ │
└──────────────────────────────────────────────────────┘


     ─────────▶   Legal mapping


     ─ ─ ─ ─ ─▶   Illegal mapping

```

## Security value

### Deterministic runtime mitigation

This mitigation stops **any** exploitation technique that involves mapping kernel-private memory outside of the kernel address space, and forces attackers to go down the path of full classic kernel exploitation. This means facing all the kernel mitigations, including MTE. On top of mitigating all the attacks that rely on sharing/mapping interfaces, there is an immediate impact on another class of Mach VM security bugs: *Physical Use-after-free*.

The Mach VM manages the lifecycle of physical pages on the system. VM maps are the source of truth of the system, and the pmap and page tables are a live cache of that state.
The Mach VM has had bugs where the page tables retained dangling page table entries (PTEs): entries representing mappings that should no longer exist and that the VM had lost track of.

We call this class of inconsistency bugs *Physical Use-after-free (PUAF)*. When the VM thinks a page has become unused, it adds it to a freelist of physical pages in order to repurpose it to hold new content, possibly for a completely different security domain. In the case of a PUAF, the VM leaves a dangling mapping that an attacker can abuse to keep accessing the content of a page after it has been repurposed.

`XNU_KERNEL_RESTRICTED` forms a guarantee around this bug class. The SPTM requires that a page has no active mappings when it is retyped, and while the VM has lost track of the dangling mapping, the SPTM has not. As a result, it becomes impossible for an attacker to maintain access via a dangling PTE to a page that was or would become `XNU_KERNEL_RESTRICTED`: the SPTM would detect the illegal retyping operation and panic the system immediately. Gaining access to `XNU_KERNEL_RESTRICTED` memory via PUAF is hence deterministically stopped.

However, attackers can try to exploit PUAFs within the same frame type, which would not go through an SPTM retyping operation. For example, a page that was used for user data in a task A and gets reused to hold completely different data in a task B is such a scenario, and lets an attacker break the process address space isolation the VM is meant to provide. To address that, we use a runtime check each time the Mach VM moves a physical page into a “freed” state. We simply rely on the SPTM’s precise tracking of mappings to assert that the page indeed has no active mapping.
As a result, any direct attempt to recycle a physical page with active mappings deterministically panics the system.

### Protecting MTE

In the kernel, we apply MTE to any dynamic memory that contains kernel pointers, in order to mitigate use-after-free and out-of-bounds bugs. This could also be extended to all kernel-private memory, not just the part that contains kernel pointers, but it is not at this time. The more memory we tag, the larger the attack surface we protect.

However, if there is any way to access tagged memory without going through tag checks, MTE is bypassed. This is why we have to disallow any attempt to map MTE-tagged pages outside of the kernel address space, which completely coincides with the purpose of `XNU_KERNEL_RESTRICTED`. In the end, this is no different from the motivation described above; it is just that MTE makes it even more appealing for attackers to reach for said primitives, which makes `XNU_KERNEL_RESTRICTED` even more important.

## Typing memory

To know what is kernel-private with pointers, kernel-private without pointers, and kernel-shareable, we use the type-based segregation provided by [kalloc_type](https://security.apple.com/blog/towards-the-next-generation-of-xnu-memory-safety/). The zone allocator (*zalloc*) and *kmem* already propagate metadata describing the allocation, via the `KMEM_DATA` or `KMEM_DATA_SHARED` flags:

* All *kmem* based allocations, besides `KMEM_COMPRESSOR` and `KMEM_DATA_SHARED` pages, are typed as `XNU_KERNEL_RESTRICTED`.
* All *zalloc* allocations, besides the shareable data heap and ROAllocator, are typed `XNU_KERNEL_RESTRICTED`.
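
The two typing rules above, together with the three mapping policies listed in the `XNU_KERNEL_RESTRICTED` section, can be modeled as a handful of predicates. The following is only a minimal sketch for illustration: every identifier in it (the `sketch_*` enums and functions) is hypothetical and does not correspond to actual XNU or SPTM definitions, and frame types other than `XNU_KERNEL_RESTRICTED` are deliberately not modeled.

```c
#include <stdbool.h>

/* Hypothetical names throughout: none of these are XNU or SPTM definitions. */

/* Origin of a kmem allocation, reduced to what matters for typing. */
typedef enum {
    SKETCH_KMEM_OTHER,        /* any kmem allocation not listed below */
    SKETCH_KMEM_DATA,         /* pure data that is never shared */
    SKETCH_KMEM_DATA_SHARED,  /* pure data that is shareable by design */
    SKETCH_KMEM_COMPRESSOR    /* compressor pages */
} sketch_kmem_kind_t;

/* Origin of a zalloc allocation, reduced to what matters for typing. */
typedef enum {
    SKETCH_ZONE_OTHER,        /* any zone not listed below */
    SKETCH_ZONE_DATA_SHARED,  /* the shareable data heap */
    SKETCH_ZONE_READ_ONLY     /* the read-only allocator */
} sketch_zone_kind_t;

/* Where a new mapping of a page is being requested. */
typedef enum {
    SKETCH_MAP_KERNEL,        /* kernel address space */
    SKETCH_MAP_USER,          /* any user process */
    SKETCH_MAP_IOMMU          /* a coprocessor, via IOMMU/DART */
} sketch_map_target_t;

/* kmem rule: restricted unless compressor or shared data. */
static inline bool
sketch_kmem_is_restricted(sketch_kmem_kind_t kind)
{
    return kind != SKETCH_KMEM_COMPRESSOR && kind != SKETCH_KMEM_DATA_SHARED;
}

/* zalloc rule: restricted unless shareable data heap or read-only allocator. */
static inline bool
sketch_zone_is_restricted(sketch_zone_kind_t kind)
{
    return kind != SKETCH_ZONE_DATA_SHARED && kind != SKETCH_ZONE_READ_ONLY;
}

/*
 * Mapping policies for a restricted page: kernel address space only,
 * never via IOMMU/DART, and at most one mapping beyond the static
 * physical aperture one. `existing_mappings` counts mappings other
 * than the physical aperture mapping. A `false` result stands for the
 * panic that an illegal transition would trigger.
 */
static inline bool
sketch_restricted_mapping_allowed(sketch_map_target_t target,
    unsigned int existing_mappings)
{
    return target == SKETCH_MAP_KERNEL && existing_mappings == 0;
}
```

In this model, a page backing a `SKETCH_KMEM_DATA` allocation is restricted, so a request to map it with `SKETCH_MAP_USER` or `SKETCH_MAP_IOMMU` is rejected regardless of how many mappings it already has, while a first `SKETCH_MAP_KERNEL` mapping is accepted.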