# Memorystatus Subsystem

1. [Overview](#overview)
1. [Code layout](#code-layout)
1. [Design](#design)
1. [Threads](#threads)
1. [Snapshots](#snapshots)
1. [Dumping Caches](#dumping-caches)

## Overview
<a name="overview"></a>

The xnu memorystatus subsystem is responsible for recovering the system when we're running dangerously low
on certain resources. Currently it monitors the following resources:

- memory
- vnodes
- compressor space
- swap space
- zone map VA

Depending on the resource, there are a variety of actions that memorystatus might take.
One of the most common actions is to kill one or more processes in an attempt to recover the system.
In addition to monitoring system level resources, the memorystatus code is also responsible
for killing processes that go over their per-process memory limits.

The memorystatus subsystem contains code to perform four actions in response to resource shortages:
- Kill Processes
- Freeze Processes
- Send warning notifications
- Swap memory from apps

Each of these actions is covered in its own document in this folder.

## Code Layout
<a name="code-layout"></a>

The memorystatus code lives on the BSD side of xnu. It consists of the following C files:

- `bsd/kern/kern_memorystatus_policy.c`
  Contains the policy decisions around when to perform which action.
- `bsd/kern/kern_memorystatus_freeze.c`
  Implementation of the freezer. See `doc/memorystatus/freezer.md` for details.
- `bsd/kern/kern_memorystatus.c`
  Contains mechanical code to implement the kill and swap actions. It should not contain any policy
  (that belongs in `bsd/kern/kern_memorystatus_policy.c`), but the split is a recent refactor and
  is still a work in progress.
- `bsd/kern/kern_memorystatus_notify.c`
  Contains both the policy and mechanical bits to send out memory pressure notifications. See `doc/memorystatus/notify.md` for details.

And the following headers:
- `bsd/kern/kern_memorystatus_internal.h`
- `bsd/sys/kern_memorystatus_notify.h`
- `bsd/sys/kern_memorystatus_freeze.h`
- `bsd/sys/kern_memorystatus.h`

## Design
<a name="design"></a>

The memorystatus subsystem is designed around a central health check.
All of the fields in this health check are defined in the `memorystatus_system_health_t` struct. See `bsd/kern/kern_memorystatus_internal.h` for the struct definition.

Most of the monitoring and actions taken by the memorystatus subsystem happen in the `memorystatus_thread` (`bsd/kern/kern_memorystatus.c`). However, there are some synchronous actions that happen on other threads. See `doc/memorystatus/kill.md` for more documentation on specific kill types.

Whenever it's woken up, the memorystatus thread does the following:
1. Fill in the system health state by calling `memorystatus_health_check`.
1. Log this state to the os log (or serial if we're early in boot).
1. Check if the system is healthy via `memorystatus_is_system_healthy`.
1. If the system is unhealthy, pick a recovery action and perform it. See `memorystatus_pick_action` (in `bsd/kern/kern_memorystatus_policy.c`) for the conditions that trigger specific actions. Note that we sometimes do pre-emptive actions on a healthy system if we're somewhat low on a specific resource. For example, we'll kill procs over their soft limit if we're under 15% available pages even if the system is otherwise healthy.
1. Go back to step 1 until the system is healthy and the thread can block.

Notice that the memorystatus thread does not explicitly check why it was woken up.
To keep the synchronization simple, any time a resource shortage is detected the memorystatus
thread is woken up *blindly* and it will do a full system health check.

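The wakeup loop described above can be sketched in user-space C roughly as follows. This is only an illustration under stated assumptions, not the kernel code: `health_t`, `system_state_t`, `sample_health`, `is_healthy`, `pick_action`, and `run_until_healthy` are hypothetical stand-ins for `memorystatus_system_health_t`, `memorystatus_health_check`, `memorystatus_is_system_healthy`, and `memorystatus_pick_action`. The recovery action is modeled as a kill that simply returns a fixed number of pages, and the pre-emptive actions taken on a healthy system are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, heavily reduced stand-in for memorystatus_system_health_t. */
typedef struct {
    bool low_on_pages;       /* available pages below the critical threshold */
    bool low_on_compressor;  /* compressor space nearly exhausted */
} health_t;

typedef enum { ACTION_NONE, ACTION_KILL } action_t;

/* Hypothetical system state that the health check samples. */
typedef struct {
    unsigned available_pages;
    unsigned critical_threshold;
    bool compressor_full;
} system_state_t;

/* Step 1: fill in the health state from scratch on every pass. */
static health_t sample_health(const system_state_t *sys) {
    health_t h = {
        .low_on_pages = sys->available_pages < sys->critical_threshold,
        .low_on_compressor = sys->compressor_full,
    };
    return h;
}

/* Step 3: the system is healthy only if no monitored resource is short. */
static bool is_healthy(const health_t *h) {
    return !h->low_on_pages && !h->low_on_compressor;
}

/* Step 4: policy decision; in this sketch any shortage maps to a kill. */
static action_t pick_action(const health_t *h) {
    return is_healthy(h) ? ACTION_NONE : ACTION_KILL;
}

/* One blind wakeup: loop until healthy, return the number of actions taken.
 * Each simulated kill frees pages_per_kill pages and relieves the compressor. */
static unsigned run_until_healthy(system_state_t *sys, unsigned pages_per_kill) {
    assert(pages_per_kill > 0);  /* otherwise the loop could not make progress */
    unsigned actions = 0;
    for (;;) {
        health_t h = sample_health(sys);                    /* step 1 */
        printf("health: low_pages=%d low_compressor=%d\n",  /* step 2 */
               h.low_on_pages, h.low_on_compressor);
        if (pick_action(&h) == ACTION_NONE) {               /* steps 3-4 */
            return actions;  /* healthy: the real thread would block here */
        }
        sys->available_pages += pages_per_kill;             /* simulated kill */
        sys->compressor_full = false;
        actions++;                                          /* step 5: loop */
    }
}
```

Note that the loop never looks at a wakeup reason: it re-samples the whole health state on each pass, which is the property the blind wakeup relies on.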
### Jetsam Bands

The memorystatus subsystem has 210 priority levels. Every process in the system (except launchd) has a jetsam priority level. Higher numbers are more important.

Each priority level is tracked as a TAILQ linked list. There is one global array, `memstat_bucket`, containing all of these TAILQ lists.
A process's priority is tracked in the proc structure (see `bsd/sys/proc_internal.h`). `p_memstat_effective_priority` stores the proc's current jetsam priority, and `p_memstat_list` stores the TAILQ linkage. All lists are protected by the `proc_list_mlock`. (Yes, this is bad for scalability. Ideally we'd use finer-grained locking or at least not share the global lock with the scheduler. See [rdar://36390487](rdar://36390487).)

Many kill types kill in ascending order of jetsam priority level. See `doc/memorystatus/kill.md` for more details.
The jetsam band is either asserted by [RunningBoard](https://stashweb.sd.apple.com/projects/COREOS/repos/runningboard/browse) (apps and RunningBoard-managed daemons) or determined by the jetsam priority set in the [JetsamProperties](https://stashweb.sd.apple.com/projects/COREOS/repos/jetsamproperties/browse) database.

For reference, here are some of the band numbers:

| Band Number | Name | Description |
| ----------- | ---- | ----------- |
| 0 | `JETSAM_PRIORITY_IDLE` | Idle processes |
| 30 | `JETSAM_PRIORITY_BACKGROUND` | Docked apps on iOS. Some active daemons on other platforms. |
| 40 | `JETSAM_PRIORITY_MAIL` | Docked apps on watchOS. Some active daemons on other platforms. |
| 75 | `JETSAM_PRIORITY_FREEZER` | Suspended & frozen processes |
| 100 | `JETSAM_PRIORITY_FOREGROUND` | Foreground app processes |
| 140 | - | mediaserverd |
| 160 | `JETSAM_PRIORITY_HOME` | SpringBoard |
| 180 | `JETSAM_PRIORITY_IMPORTANT` | RunningBoard, watchdogd, thermalmonitord, etc. |
| 190 | `JETSAM_PRIORITY_CRITICAL` | CommCenter |

See the full jetsam band reference on [confluence](https://confluence.sd.apple.com/display/allOSSystemsInternals/Jetsam#Jetsam-JetsamPriorities).

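The bucket layout described in this section can be sketched with the BSD `<sys/queue.h>` TAILQ macros. This is a minimal user-space model, not the kernel's data structures: `demo_proc_t`, `demo_buckets`, `demo_set_priority`, and `demo_next_victim` are hypothetical names, `DEMO_BAND_COUNT` simply reuses the 210 figure quoted above, and locking (the role of `proc_list_mlock`) is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

#define DEMO_BAND_COUNT 210  /* hypothetical: the level count quoted in the text */

typedef struct demo_proc {
    int pid;
    int effective_priority;               /* models p_memstat_effective_priority */
    TAILQ_ENTRY(demo_proc) memstat_list;  /* models p_memstat_list */
} demo_proc_t;

TAILQ_HEAD(demo_bucket, demo_proc);

/* One list per band; models the global memstat_bucket array. */
static struct demo_bucket demo_buckets[DEMO_BAND_COUNT];

static void demo_buckets_init(void) {
    for (int i = 0; i < DEMO_BAND_COUNT; i++) {
        TAILQ_INIT(&demo_buckets[i]);
    }
}

/* Move a process into a band. The kernel would hold proc_list_mlock here. */
static void demo_set_priority(demo_proc_t *p, int new_priority, bool already_linked) {
    assert(new_priority >= 0 && new_priority < DEMO_BAND_COUNT);
    if (already_linked) {
        TAILQ_REMOVE(&demo_buckets[p->effective_priority], p, memstat_list);
    }
    p->effective_priority = new_priority;
    TAILQ_INSERT_TAIL(&demo_buckets[new_priority], p, memstat_list);
}

/* Pick the next victim for a kill type that walks bands in ascending order. */
static demo_proc_t *demo_next_victim(void) {
    for (int band = 0; band < DEMO_BAND_COUNT; band++) {
        demo_proc_t *p = TAILQ_FIRST(&demo_buckets[band]);
        if (p != NULL) {
            return p;
        }
    }
    return NULL;  /* no processes are linked into any band */
}
```

With one process in band 100 and one in band 0, `demo_next_victim` returns the band 0 process; once that process is raised to band 160, the band 100 process becomes the next candidate.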
### Daemon lifecycle

The memorystatus subsystem is heavily intertwined with daemon lifecycle. A full discussion of daemon lifecycle is outside the scope of this document. If you're curious, here are some good resources:
- [Daemon Overview](https://confluence.sd.apple.com/display/allOSSystemsInternals/Daemons#)
- [RunningBoard's Process Management Documentation](https://confluence.sd.apple.com/display/allOSSystemsInternals/Process+Management+Paradigms)
- [PressuredExit (A.K.A. activity tracking)](https://confluence.sd.apple.com/display/allOSSystemsInternals/Pressured+Exit)

From the perspective of memorystatus there are essentially two kinds of processes: managed and unmanaged. Managed processes have their lifecycle managed by RunningBoard and have the `P_MEMSTAT_MANAGED` bit set on the `p_memstat_state` field. RunningBoard moves these processes between different jetsam bands based on their open assertions.

Unmanaged processes go into their active jetsam band when they take out transactions.

Daemons have different memory limits when they're inactive (in band 0) vs. active (above band 0). The inactive memory limit, active memory limit, and active jetsam band are determined via [JetsamProperties](https://stashweb.sd.apple.com/projects/COREOS/repos/jetsamproperties/browse). [Launchd](https://stashweb.sd.apple.com/projects/COREOS/repos/libxpc/browse) reads the JetsamProperties database and passes these values down to the kernel via posix_spawn(2) attributes. memorystatus stashes these values on the proc structure (`p_memstat_memlimit_active`, `p_memstat_memlimit_inactive`, `p_memstat_requestedpriority`), and applies them as daemons move between states.

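As a rough illustration of how the stashed values might be applied, the sketch below switches a daemon's enforced limit whenever it crosses band 0. Everything here is hypothetical: `demo_daemon_t` and the `demo_*` functions are invented for the example, the limits are treated as plain megabyte counts, and returning to band 0 when a transaction ends is an assumption of the sketch rather than something this document specifies.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int32_t memlimit_active;     /* models p_memstat_memlimit_active (MB) */
    int32_t memlimit_inactive;   /* models p_memstat_memlimit_inactive (MB) */
    int32_t requested_priority;  /* models p_memstat_requestedpriority */
    int32_t effective_priority;  /* current jetsam band */
    int32_t current_limit;       /* limit currently enforced (MB) */
} demo_daemon_t;

/* Band 0 is the inactive (idle) band; anything above it counts as active. */
static void demo_apply_band(demo_daemon_t *d, int32_t band) {
    assert(band >= 0);
    d->effective_priority = band;
    d->current_limit = (band > 0) ? d->memlimit_active : d->memlimit_inactive;
}

/* Taking out a transaction moves an unmanaged daemon to its active band. */
static void demo_on_transaction_begin(demo_daemon_t *d) {
    demo_apply_band(d, d->requested_priority);
}

/* Assumption of this sketch: ending the transaction returns it to band 0. */
static void demo_on_transaction_end(demo_daemon_t *d) {
    demo_apply_band(d, 0);
}
```

A daemon configured with a 50 MB active limit, a 20 MB inactive limit, and an active band of 30 would be held to 50 MB while its transaction is open and to 20 MB once it is back in band 0.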
### Memory Monitoring

Memorystatus makes most memory decisions based on the `memorystatus_available_pages` metric. This metric reflects the number of pages that memorystatus thinks could quickly be made free. This metric is defined in the `VM_CHECK_MEMORYSTATUS` macro in `osfmk/vm/vm_page.h`.

Currently on non-macOS systems, it's defined as `pageable_external + free + secluded_over_target + purgeable`. Breaking that down:
- pageable_external: file backed page count
- free: free page count
- secluded_over_target: `(vm_page_secluded_count - vm_page_secluded_target)`. This target comes from the device tree `kern.secluded_mem_mb`. Secluded memory is a special pool of memory that's intended for the camera so that it can start up faster on memory-constrained systems.
- purgeable: The number of purgeable volatile pages in the system. Purgeable memory is an API for clients to specify that the VM can treat the contents of a range of pages as volatile and quickly free the backing pages under pressure. See `osfmk/mach/vm_purgable.h` for the API. Note that the API was accidentally exported with incorrect spelling ("purgable" instead of "purgeable").

Since we purge purgeable memory and trim the secluded pool quickly under memory pressure, this can generally be approximated to `free + file_backed` for a system under pressure.

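The non-macOS formula above can be written out as a small C function. This is a sketch with made-up inputs, not the `VM_CHECK_MEMORYSTATUS` macro itself: `demo_page_counts_t` and `demo_available_pages` are hypothetical names, and clamping `secluded_over_target` at zero when the pool is at or below its target is an assumption of the sketch. The example function shows both the full sum and the `free + file_backed` approximation once purgeable memory is gone and the secluded pool has been trimmed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of the page counters that feed the metric. */
typedef struct {
    uint64_t pageable_external;  /* file backed pages */
    uint64_t free;               /* free pages */
    uint64_t secluded_count;     /* models vm_page_secluded_count */
    uint64_t secluded_target;    /* models vm_page_secluded_target */
    uint64_t purgeable;          /* purgeable volatile pages */
} demo_page_counts_t;

static uint64_t demo_available_pages(const demo_page_counts_t *c) {
    /* Assumption: a pool at or below its target contributes nothing. */
    uint64_t secluded_over_target =
        (c->secluded_count > c->secluded_target)
            ? c->secluded_count - c->secluded_target
            : 0;
    return c->pageable_external + c->free + secluded_over_target + c->purgeable;
}

/* Usage example with invented counts. */
void demo_available_pages_example(void) {
    demo_page_counts_t c = {
        .pageable_external = 1000, .free = 200,
        .secluded_count = 150, .secluded_target = 100, .purgeable = 50,
    };
    assert(demo_available_pages(&c) == 1300);  /* 1000 + 200 + 50 + 50 */

    c.secluded_count = 80;  /* secluded pool trimmed below target */
    c.purgeable = 0;        /* purgeable memory already purged */
    assert(demo_available_pages(&c) == 1200);  /* free + file backed */
}
```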
The `VM_CHECK_MEMORYSTATUS` macro is called whenever a page is allocated, wired, freed, etc. Basically, `memorystatus_available_pages` is supposed to always be accurate down to a page level. On our larger memory systems (8 and 16GB iPads in particular) this might be overkill.
The macro calls into `memorystatus_pages_update` to actually update `memorystatus_available_pages` and issue the blind wakeup of the memorystatus thread if necessary. `memorystatus_pages_update` is also responsible for waking the freezer and memory pressure notification threads.

## Threads
<a name="threads"></a>

This section lists the threads that comprise the memorystatus subsystem. More details on each thread are below.

| Thread name | Main function | Wake event |
| ----------- | ------------- | ---------- |
| VM\_memorystatus\_1 | `memorystatus_thread` | `jt_wakeup_cond` in `jetsam_thread_state_t` |
| VM\_freezer | `memorystatus_freeze_thread` | `memorystatus_freeze_wakeup` |
| VM\_pressure | `vm_pressure_thread` | `vm_pressure_thread` |

### VM\_memorystatus\_1

This is the jetsam thread. It's responsible for running the system health check and performing most jetsam kills (see `doc/memorystatus/kill.md` for a kill breakdown).

It's woken up via a call to `memorystatus_thread_wake` whenever any subsystem determines we're running low on a monitored resource. The wakeup is blind and the thread will immediately do a health check to determine what's wrong with the system.

NB: There are technically three memorystatus threads: `VM_memorystatus_1`, `VM_memorystatus_2`, and `VM_memorystatus_3`. But we currently only use `VM_memorystatus_1`. At one point we tried to parallelize jetsam to speed it up, but this effort was unsuccessful. The other threads are just dead code at this point.

### VM\_freezer

This is the freezer thread. It's responsible for freezing processes under memory pressure and demoting processes when the freezer is full. See `doc/memorystatus/freeze.md` for more details on the freezer.

It's woken up by issuing a `thread_wakeup` call to the `memorystatus_freeze_wakeup` global. This is done in `memorystatus_pages_update` if `memorystatus_freeze_thread_should_run` returns true. It's also done whenever `memorystatus_on_inactivity` runs.

Upon wakeup the freezer thread will call `memorystatus_pick_freeze_count_for_wakeup` and attempt
to freeze up to that many processes before blocking. `memorystatus_pick_freeze_count_for_wakeup` returns 1 on most platforms. But if app swap is enabled (M1 and later iPad Pros) it will return the total number of procs in all eligible bands.

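The per-wakeup budget described above can be modeled as follows. This is a hedged sketch rather than the kernel logic: `demo_pick_freeze_count` only imitates the documented behavior of `memorystatus_pick_freeze_count_for_wakeup`, and the per-band arrays, the `demo_try_freeze_fn` callback, and `demo_freeze_one` are hypothetical devices for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Budget for one wakeup: 1 normally, or every proc in the eligible bands
 * when app swap is enabled. The eligibility flags are supplied by the caller. */
static size_t demo_pick_freeze_count(bool app_swap_enabled,
                                     const size_t *procs_per_band,
                                     const bool *band_eligible,
                                     size_t band_count) {
    if (!app_swap_enabled) {
        return 1;
    }
    size_t total = 0;
    for (size_t i = 0; i < band_count; i++) {
        if (band_eligible[i]) {
            total += procs_per_band[i];
        }
    }
    return total;
}

/* Hypothetical callback: returns true if one more process was frozen. */
typedef bool (*demo_try_freeze_fn)(void *ctx);

/* One wakeup: freeze up to `budget` processes, then the thread would block. */
static size_t demo_freezer_wakeup(size_t budget, demo_try_freeze_fn try_freeze, void *ctx) {
    assert(try_freeze != NULL);
    size_t frozen = 0;
    while (frozen < budget && try_freeze(ctx)) {
        frozen++;
    }
    return frozen;
}

/* Example callback: ctx points at a count of remaining freezable candidates. */
static bool demo_freeze_one(void *ctx) {
    size_t *remaining = ctx;
    if (*remaining == 0) {
        return false;
    }
    (*remaining)--;
    return true;
}
```

The budget is an upper bound, not a target: the wakeup ends early as soon as no further candidate can be frozen.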
### VM\_pressure

This is the memorystatus notification thread. It's woken up by the pageout thread via `vm_pressure_response`. `vm_pressure_response` is also called in `memorystatus_pages_update`.

When awoken it calls `consider_vm_pressure_events` which winds its way to `memorystatus_update_vm_pressure`. This routine checks if the pressure level has changed and issues memory pressure notifications. It also schedules the thread call for sustained pressure kills.

On macOS this thread also does idle exit kills.

## Snapshots
<a name="snapshots"></a>

The memorystatus subsystem provides a snapshot mechanism so that
ReportCrash can generate JetsamEvent.ips files. These files contain
a snapshot of the system at the time that memorystatus performed
some kills. The snapshot data structure is `memorystatus_jetsam_snapshot_t` defined in `bsd/sys/kern_memorystatus.h`. Generally speaking the snapshot contains system level memory statistics along with entries for each process in the system. Since we do not want to wake up ReportCrash while the system is low on memory, we maintain one global snapshot (`memorystatus_jetsam_snapshot` in `bsd/kern/kern_memorystatus.c`) while we're performing kills and only wake up ReportCrash once the system is healthy again. See `memorystatus_post_snapshot` in `bsd/kern/kern_memorystatus.c` which is called right before the jetsam thread blocks.

**NB**: Posting the snapshot just means sending a notification to userspace that the snapshot is ready. Userspace (currently OSAnalytics) must make the `memorystatus_control` syscall with the `MEMORYSTATUS_CMD_GET_JETSAM_SNAPSHOT` subcommand to retrieve the snapshot. See `memorystatus_cmd_get_jetsam_snapshot` in `bsd/kern/kern_memorystatus.c` for details. Since we only have one global snapshot, it's cleared on read and thus can only have one consumer in userspace.

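The accumulate, post, then clear-on-read flow can be illustrated with a small user-space model. None of this is the kernel implementation: `demo_snapshot_t` is a drastically reduced stand-in for `memorystatus_jetsam_snapshot_t` that records only killed pids, and the `demo_snapshot_*` functions are hypothetical names that mimic recording during kills, posting once healthy, and retrieval via the `MEMORYSTATUS_CMD_GET_JETSAM_SNAPSHOT` path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_ENTRIES 8  /* hypothetical capacity for the sketch */

typedef struct {
    int pid;
    bool killed;
} demo_entry_t;

typedef struct {
    size_t entry_count;
    demo_entry_t entries[DEMO_MAX_ENTRIES];
} demo_snapshot_t;

static demo_snapshot_t demo_global_snapshot;  /* the single global snapshot */
static bool demo_snapshot_posted;             /* "snapshot ready" notification */

/* Record a kill while the system is still unhealthy. Returns false if full. */
static bool demo_snapshot_record_kill(int pid) {
    if (demo_global_snapshot.entry_count == DEMO_MAX_ENTRIES) {
        return false;
    }
    demo_global_snapshot.entries[demo_global_snapshot.entry_count++] =
        (demo_entry_t){ .pid = pid, .killed = true };
    return true;
}

/* Called once the system is healthy: only notifies, copies nothing. */
static void demo_snapshot_post(void) {
    if (demo_global_snapshot.entry_count > 0) {
        demo_snapshot_posted = true;
    }
}

/* Reader side: copy the snapshot out, then clear it. Returns entries copied. */
static size_t demo_snapshot_get(demo_snapshot_t *out) {
    assert(out != NULL);
    memcpy(out, &demo_global_snapshot, sizeof(*out));
    memset(&demo_global_snapshot, 0, sizeof(demo_global_snapshot));
    demo_snapshot_posted = false;
    return out->entry_count;
}
```

Because the read clears the global state, a second reader calling `demo_snapshot_get` immediately afterwards sees zero entries, which is why a single global snapshot supports only one consumer.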
### Freezer Snapshot
The freezer snapshot, `memorystatus_jetsam_snapshot_freezer`, is a second global jetsam snapshot object. It reuses the snapshot struct definition but only contains apps that have been jetsammed.
dasd reads this snapshot and uses it as an input for its freezer recommendation algorithm. However, we're not currently using the dasd recommendation algorithm for the freezer so this snapshot really only serves a diagnostic purpose today.
This snapshot is also reset when dasd reads it. Note that it has to be separate from the OSAnalytics snapshot so that these daemons can read the snapshots independently.

## Dumping Caches
<a name="dumping-caches"></a>

In general, system caches should be cleared before we do higher band jetsams. Userspace entities should do this via purgeable memory if possible, or memory pressure notifications if not. In the kernel, memorystatus calls `memorystatus_approaching_fg_band` when we're about to do a fg band kill. This in turn calls `memorystatus_dump_caches` to clear the PPLs cache and purge all task corpses. This also sends out a notification to other entities to clear their caches (see `memorystatus_issue_fg_band_notify`). To avoid unnecessary corpse forking and purging, memorystatus blocks all additional corpse creation after it purges them until the system returns to a healthy state.