x86/mm: Eliminate window where TLB flushes may be inadvertently skipped
tl;dr: There is a window in the mm switching code where the new CR3 is
set and the CPU should be getting TLB flushes for the new mm. But
should_flush_tlb() has a bug and suppresses the flush. Fix it by
widening the window where should_flush_tlb() sends an IPI.
Long Version:
=== History ===
There were a few things leading up to this.
First, updating mm_cpumask() was observed to be too expensive, so it was
made lazier. But being lazy caused too many unnecessary IPIs to CPUs
due to the now-lazy mm_cpumask(). So code was added to cull
mm_cpumask() periodically[2]. But that culling was a bit too aggressive
and skipped sending TLB flushes to CPUs that need them. So here we are
again.
=== Problem ===
The too-aggressive code in should_flush_tlb() strikes in this window:
// Turn on IPIs for this CPU/mm combination, but only
// if should_flush_tlb() agrees:
cpumask_set_cpu(cpu, mm_cpumask(next));
next_tlb_gen = atomic64_read(&next->context.tlb_gen);
choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush);
load_new_mm_cr3(need_flush);
// ^ After 'need_flush' is set to false, IPIs *MUST*
// be sent to this CPU and not be ignored.
this_cpu_write(cpu_tlbstate.loaded_mm, next);
// ^ Not until this point does should_flush_tlb()
// become true!
should_flush_tlb() will suppress TLB flushes between load_new_mm_cr3()
and writing to 'loaded_mm', which is a window where they should not be
suppressed. Whoops.
=== Solution ===
Thankfully, the fuzzy "just about to write CR3" window is already marked
with loaded_mm==LOADED_MM_SWITCHING. Simply checking for that state in
should_flush_tlb() is sufficient to ensure that the CPU is targeted with
an IPI.
This will cause more TLB flush IPIs. But the window is relatively small
and I do not expect this to cause any kind of measurable performance
impact.
Update the comment where LOADED_MM_SWITCHING is written since it grew
yet another user.
Peter Z also raised a concern that should_flush_tlb() might not observe
'loaded_mm' and 'is_lazy' in the same order that switch_mm_irqs_off()
writes them. Add a barrier to ensure that they are observed in the
order they are written.
Analysis and contextual insights are available on OpenCVE Cloud.
No vendor fix or workaround currently provided.
Additional remediation guidance may be available on OpenCVE Cloud.
Tracking
Sign in to view the affected projects.
| Source | ID | Title |
|---|---|---|
Debian DLA |
DLA-4271-1 | linux-6.1 security update |
Debian DSA |
DSA-5925-1 | linux security update |
EUVD |
EUVD-2025-15884 | In the Linux kernel, the following vulnerability has been resolved: x86/mm: Eliminate window where TLB flushes may be inadvertently skipped tl;dr: There is a window in the mm switching code where the new CR3 is set and the CPU should be getting TLB flushes for the new mm. But should_flush_tlb() has a bug and suppresses the flush. Fix it by widening the window where should_flush_tlb() sends an IPI. Long Version: === History === There were a few things leading up to this. First, updating mm_cpumask() was observed to be too expensive, so it was made lazier. But being lazy caused too many unnecessary IPIs to CPUs due to the now-lazy mm_cpumask(). So code was added to cull mm_cpumask() periodically[2]. But that culling was a bit too aggressive and skipped sending TLB flushes to CPUs that need them. So here we are again. === Problem === The too-aggressive code in should_flush_tlb() strikes in this window: // Turn on IPIs for this CPU/mm combination, but only // if should_flush_tlb() agrees: cpumask_set_cpu(cpu, mm_cpumask(next)); next_tlb_gen = atomic64_read(&next->context.tlb_gen); choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush); load_new_mm_cr3(need_flush); // ^ After 'need_flush' is set to false, IPIs *MUST* // be sent to this CPU and not be ignored. this_cpu_write(cpu_tlbstate.loaded_mm, next); // ^ Not until this point does should_flush_tlb() // become true! should_flush_tlb() will suppress TLB flushes between load_new_mm_cr3() and writing to 'loaded_mm', which is a window where they should not be suppressed. Whoops. === Solution === Thankfully, the fuzzy "just about to write CR3" window is already marked with loaded_mm==LOADED_MM_SWITCHING. Simply checking for that state in should_flush_tlb() is sufficient to ensure that the CPU is targeted with an IPI. This will cause more TLB flush IPIs. But the window is relatively small and I do not expect this to cause any kind of measurable performance impact. Update the comment where LOADED_MM_SWITCHING is written since it grew yet another user. Peter Z also raised a concern that should_flush_tlb() might not observe 'loaded_mm' and 'is_lazy' in the same order that switch_mm_irqs_off() writes them. Add a barrier to ensure that they are observed in the order they are written. |
Ubuntu USN |
USN-7654-1 | Linux kernel vulnerabilities |
Ubuntu USN |
USN-7654-2 | Linux kernel (Real-time) vulnerabilities |
Ubuntu USN |
USN-7654-3 | Linux kernel (FIPS) vulnerabilities |
Ubuntu USN |
USN-7654-4 | Linux kernel (KVM) vulnerabilities |
Ubuntu USN |
USN-7654-5 | Linux kernel (Xilinx ZynqMP) vulnerabilities |
Ubuntu USN |
USN-7655-1 | Linux kernel (Intel IoTG) vulnerabilities |
Ubuntu USN |
USN-7686-1 | Linux kernel (Raspberry Pi) vulnerabilities |
Ubuntu USN |
USN-7699-1 | Linux kernel vulnerabilities |
Ubuntu USN |
USN-7699-2 | Linux kernel (HWE) vulnerabilities |
Ubuntu USN |
USN-7711-1 | Linux kernel (Azure) vulnerabilities |
Ubuntu USN |
USN-7712-1 | Linux kernel (Azure FIPS) vulnerabilities |
Ubuntu USN |
USN-7712-2 | Linux kernel (Azure) vulnerabilities |
Ubuntu USN |
USN-7721-1 | Linux kernel (Azure) vulnerabilities |
Tue, 16 Dec 2025 20:45:00 +0000
| Type | Values Removed | Values Added |
|---|---|---|
| First Time appeared |
Debian
Debian debian Linux |
|
| Weaknesses | NVD-CWE-noinfo | |
| CPEs | cpe:2.3:o:debian:debian_linux:11.0:*:*:*:*:*:*:* cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:* cpe:2.3:o:linux:linux_kernel:6.15:rc1:*:*:*:*:*:* cpe:2.3:o:linux:linux_kernel:6.15:rc2:*:*:*:*:*:* cpe:2.3:o:linux:linux_kernel:6.15:rc3:*:*:*:*:*:* cpe:2.3:o:linux:linux_kernel:6.15:rc4:*:*:*:*:*:* cpe:2.3:o:linux:linux_kernel:6.15:rc5:*:*:*:*:*:* |
|
| Vendors & Products |
Debian
Debian debian Linux |
|
| Metrics |
cvssV3_1
|
cvssV3_1
|
Mon, 03 Nov 2025 20:30:00 +0000
| Type | Values Removed | Values Added |
|---|---|---|
| References |
|
Mon, 14 Jul 2025 13:45:00 +0000
| Type | Values Removed | Values Added |
|---|---|---|
| Metrics |
epss
|
epss
|
Thu, 22 May 2025 02:45:00 +0000
| Type | Values Removed | Values Added |
|---|---|---|
| References |
| |
| Metrics |
threat_severity
|
cvssV3_1
|
Tue, 20 May 2025 16:15:00 +0000
| Type | Values Removed | Values Added |
|---|---|---|
| Description | In the Linux kernel, the following vulnerability has been resolved: x86/mm: Eliminate window where TLB flushes may be inadvertently skipped tl;dr: There is a window in the mm switching code where the new CR3 is set and the CPU should be getting TLB flushes for the new mm. But should_flush_tlb() has a bug and suppresses the flush. Fix it by widening the window where should_flush_tlb() sends an IPI. Long Version: === History === There were a few things leading up to this. First, updating mm_cpumask() was observed to be too expensive, so it was made lazier. But being lazy caused too many unnecessary IPIs to CPUs due to the now-lazy mm_cpumask(). So code was added to cull mm_cpumask() periodically[2]. But that culling was a bit too aggressive and skipped sending TLB flushes to CPUs that need them. So here we are again. === Problem === The too-aggressive code in should_flush_tlb() strikes in this window: // Turn on IPIs for this CPU/mm combination, but only // if should_flush_tlb() agrees: cpumask_set_cpu(cpu, mm_cpumask(next)); next_tlb_gen = atomic64_read(&next->context.tlb_gen); choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush); load_new_mm_cr3(need_flush); // ^ After 'need_flush' is set to false, IPIs *MUST* // be sent to this CPU and not be ignored. this_cpu_write(cpu_tlbstate.loaded_mm, next); // ^ Not until this point does should_flush_tlb() // become true! should_flush_tlb() will suppress TLB flushes between load_new_mm_cr3() and writing to 'loaded_mm', which is a window where they should not be suppressed. Whoops. === Solution === Thankfully, the fuzzy "just about to write CR3" window is already marked with loaded_mm==LOADED_MM_SWITCHING. Simply checking for that state in should_flush_tlb() is sufficient to ensure that the CPU is targeted with an IPI. This will cause more TLB flush IPIs. But the window is relatively small and I do not expect this to cause any kind of measurable performance impact. Update the comment where LOADED_MM_SWITCHING is written since it grew yet another user. Peter Z also raised a concern that should_flush_tlb() might not observe 'loaded_mm' and 'is_lazy' in the same order that switch_mm_irqs_off() writes them. Add a barrier to ensure that they are observed in the order they are written. | |
| Title | x86/mm: Eliminate window where TLB flushes may be inadvertently skipped | |
| References |
|
|
Status: PUBLISHED
Assigner: Linux
Published:
Updated: 2026-05-11T21:18:38.788Z
Reserved: 2025-04-16T04:51:23.974Z
Link: CVE-2025-37964
No data.
Status : Analyzed
Published: 2025-05-20T16:15:34.683
Modified: 2025-12-16T20:30:11.323
Link: CVE-2025-37964
OpenCVE Enrichment
Updated: 2025-06-24T09:44:23Z
Debian DLA
Debian DSA
EUVD
Ubuntu USN