drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
Currently amdgpu calls drm_sched_fini() from the fence driver sw fini
routine - such function is expected to be called only after the
respective init function - drm_sched_init() - was executed successfully.
Happens that we faced a driver probe failure in the Steam Deck
recently, and the function drm_sched_fini() was called even without
its counter-part had been previously called, causing the following oops:
amdgpu: probe of 0000:04:00.0 failed with error -110
BUG: kernel NULL pointer dereference, address: 0000000000000090
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
[...]
Call Trace:
<TASK>
amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
devm_drm_dev_init_release+0x49/0x70
[...]
To prevent that, check if the drm_sched was properly initialized for a
given ring before calling its fini counter-part.
Notice ideally we'd use sched.ready for that; such field is set as the latest
thing on drm_sched_init(). But amdgpu seems to "override" the meaning of such
field - in the above oops for example, it was a GFX ring causing the crash, and
the sched.ready field was set to true in the ring init routine, regardless of
the state of the DRM scheduler. Hence, we ended-up using sched.ops as per
Christian's suggestion [0], and also removed the no_scheduler check [1].
[0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb136@amd.com/
[1] https://lore.kernel.org/amd-gfx/cd0e2994-f85f-d837-609f-7056d5fb7231@amd.com/
Analysis and contextual insights are available on OpenCVE Cloud.
No vendor fix or workaround currently provided.
Additional remediation guidance may be available on OpenCVE Cloud.
Tracking
Sign in to view the affected projects.
No advisories yet.
Wed, 02 Apr 2025 15:15:00 +0000
| Type | Values Removed | Values Added |
|---|---|---|
| First Time appeared |
Linux
Linux linux Kernel |
|
| CPEs | cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:* cpe:2.3:o:linux:linux_kernel:6.2:rc1:*:*:*:*:*:* cpe:2.3:o:linux:linux_kernel:6.2:rc2:*:*:*:*:*:* cpe:2.3:o:linux:linux_kernel:6.2:rc3:*:*:*:*:*:* cpe:2.3:o:linux:linux_kernel:6.2:rc4:*:*:*:*:*:* cpe:2.3:o:linux:linux_kernel:6.2:rc5:*:*:*:*:*:* cpe:2.3:o:linux:linux_kernel:6.2:rc6:*:*:*:*:*:* cpe:2.3:o:linux:linux_kernel:6.2:rc7:*:*:*:*:*:* |
|
| Vendors & Products |
Linux
Linux linux Kernel |
Wed, 06 Nov 2024 21:15:00 +0000
| Type | Values Removed | Values Added |
|---|---|---|
| Metrics |
cvssV3_1
|
cvssV3_1
|
Status: PUBLISHED
Assigner: Linux
Published:
Updated: 2026-05-11T19:32:14.970Z
Reserved: 2024-05-21T15:19:24.233Z
Link: CVE-2023-52738
Updated: 2024-08-02T23:11:36.070Z
Status : Analyzed
Published: 2024-05-21T16:15:13.740
Modified: 2025-04-02T14:51:01.783
Link: CVE-2023-52738
OpenCVE Enrichment
No data.