Andi Shyti [Mon, 17 Jun 2024 18:42:42 +0000 (20:42 +0200)]
drm/i915/gem: Return NULL instead of '0'
Commit 05da7d9f717b ("drm/i915/gem: Downgrade stolen lmem setup
warning") returns '0' from i915_gem_stolen_lmem_setup(), but it's
supposed to return a pointer to the intel_memory_region
structure.
Sparse complains with the following message:
>> drivers/gpu/drm/i915/gem/i915_gem_stolen.c:943:32: sparse: sparse:
Using plain integer as NULL pointer
In the past, there were similar failures reported by CI from other IGT
tests, observed on other platforms.
Before commit 63baf4f3d587 ("drm/i915/gt: Only wait for GPU activity
before unbinding a GGTT fence"), i915_vma_revoke_fence() was waiting for
idleness of vma->active via fence_update(). That commit introduced
vma->fence->active in order for the fence_update() to be able to wait
selectively on that one instead of vma->active since only idleness of
fence registers was needed. But then, another commit 0d86ee35097a
("drm/i915/gt: Make fence revocation unequivocal") replaced the call to
fence_update() in i915_vma_revoke_fence() with only fence_write(), and
also added that GEM_BUG_ON(!i915_active_is_idle(&fence->active)) in front.
No justification was provided on why we might then expect idleness of
vma->fence->active without first waiting on it.
The issue can be potentially caused by a race among revocation of fence
registers on one side and sequential execution of signal callbacks invoked
on completion of a request that was using them on the other, still
processed in parallel to revocation of those fence registers. Fix it by
waiting for idleness of vma->fence->active in i915_vma_revoke_fence().
Jonathan Cavitt [Mon, 22 Apr 2024 13:59:59 +0000 (06:59 -0700)]
drm/i915/gem: Downgrade stolen lmem setup warning
In the case where lmem_size < dsm_base, hardware is reporting that
stolen lmem is unusable. In this case, instead of throwing a warning,
we can continue execution as normal by disabling stolen LMEM support.
For example, this change will allow the following error report from
ATS-M to no longer apply:
drm/i915/gt: Delete the live_hearbeat_fast selftest
The test is trying to push the heartbeat frequency to the limit, which
might sometimes fail. Such a failure does not provide valuable
information, because it does not indicate that there is something
necessarily wrong with either the driver or the hardware.
Remove the test to prevent random, unnecessary failures from appearing
in CI.
Andi Shyti [Thu, 23 May 2024 23:58:53 +0000 (01:58 +0200)]
drm/i915: Increase FLR timeout from 3s to 9s
Following the guidelines it takes 3 seconds to perform an FLR
reset. Let's give it a bit more slack because this time can
change depending on the platform and on the firmware
Andi Shyti [Fri, 17 May 2024 09:06:16 +0000 (11:06 +0200)]
drm/i915/gt: Fix CCS id's calculation for CCS mode setting
The whole point of the previous fixes has been to change the CCS
hardware configuration to generate only one stream available to
the compute users. We did this by changing the info.engine_mask
that is set during device probe, reset during the detection of
the fused engines, and finally reset again when choosing the CCS
mode.
We can't use the engine_mask variable anymore, as with the
current configuration, it imposes only one CCS no matter what the
hardware configuration is.
Before changing the engine_mask for the third time, save it and
use it for calculating the CCS mode.
After the previous changes, the user reported a performance drop
to around 1/4. We have tested that the compute operations, with
the current patch, have improved by the same factor.
Fixes: 6db31251bb26 ("drm/i915/gt: Enable only one CCS for compute workload") Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Cc: Chris Wilson <chris.p.wilson@linux.intel.com> Cc: Gnattu OC <gnattuoc@me.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Tested-by: Jian Ye <jian.ye@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Tested-by: Gnattu OC <gnattuoc@me.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240517090616.242529-1-andi.shyti@linux.intel.com
With gcc-7 and earlier, there are lots of warnings like
In file included from <command-line>:0:0:
In function '__guc_context_policy_add_priority.isra.66',
inlined from '__guc_context_set_prio.isra.67' at drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3292:3,
inlined from 'guc_context_set_prio' at drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3320:2:
include/linux/compiler_types.h:399:38: error: call to '__compiletime_assert_631' declared with attribute error: FIELD_PREP: mask is not constant
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
...
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:2422:3: note: in expansion of macro 'FIELD_PREP'
FIELD_PREP(GUC_KLV_0_KEY, GUC_CONTEXT_POLICIES_KLV_ID_##id) | \
^~~~~~~~~~
Make sure that GUC_KLV_0_KEY is an unsigned value to avoid the warning.
Fixes: 77b6f79df66e ("drm/i915/guc: Update to GuC version 69.0.3") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240430164809.482131-1-julia.filipchuk@intel.com
Tvrtko Ursulin [Tue, 14 May 2024 14:59:39 +0000 (15:59 +0100)]
drm/i915: Support replaying GPU hangs with captured context image
When debugging GPU hangs Mesa developers are finding it useful to replay
the captured error state against the simulator. But due various simulator
limitations which prevent replicating all hangs, one step further is being
able to replay against a real GPU.
This is almost doable today with the missing part being able to upload the
captured context image into the driver state prior to executing the
uploaded hanging batch and all the buffers.
To enable this last part we add a new context parameter called
I915_CONTEXT_PARAM_CONTEXT_IMAGE. It follows the existing SSEU
configuration pattern of being able to select which context to apply
against, paired with the actual image and its size.
Since this is adding a new concept of debug only uapi, we hide it behind
a new kconfig option and also require activation with a module parameter.
Together with a warning banner printed at driver load, all those combined
should be sufficient to guard against inadvertently enabling the feature.
In terms of implementation we allow the legacy context set param to be
used since that removes the need to record the per context data in the
proto context, while still allowing flexibility of specifying context
images for any context.
Mesa MR using the uapi can be seen at:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27594
v2:
* Fix whitespace alignment as per checkpatch.
* Added warning on userspace misuse.
* Rebase for extracting ce->default_state shadowing.
v3:
* Rebase for I915_CONTEXT_PARAM_LOW_LATENCY.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Carlos Santa <carlos.santa@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Tested-by: Carlos Santa <carlos.santa@intel.com> Signed-off-by: Tvrtko Ursulin <tursulin@igalia.com> Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net> Link: https://patchwork.freedesktop.org/patch/msgid/20240514145939.87427-2-tursulin@igalia.com
drm/tests: Add a unit test for range bias allocation
Allocate cleared blocks in the bias range when the DRM
buddy's clear avail is zero. This will validate the bias
range allocation in scenarios like system boot when no
cleared blocks are available and exercise the fallback
path too. The resulting blocks should always be dirty.
v1:(Matthew)
- move the size to the variable declaration section.
- move the mm.clear_avail init to allocator init.
drm/buddy: Fix the range bias clear memory allocation issue
Problem statement: During the system boot time, an application request
for the bulk volume of cleared range bias memory when the clear_avail
is zero, we dont fallback into normal allocation method as we had an
unnecessary clear_avail check which prevents the fallback method leads
to fb allocation failure following system goes into unresponsive state.
Solution: Remove the unnecessary clear_avail check in the range bias
allocation function.
v2: add a kunit for this corner case (Daniel Vetter)
Chris Wilson [Tue, 23 Apr 2024 16:23:10 +0000 (18:23 +0200)]
drm/i915/gt: Disarm breadcrumbs if engines are already idle
The breadcrumbs use a GT wakeref for guarding the interrupt, but are
disarmed during release of the engine wakeref. This leaves a hole where
we may attach a breadcrumb just as the engine is parking (after it has
parked its breadcrumbs), execute the irq worker with some signalers still
attached, but never be woken again.
That issue manifests itself in CI with IGT runner timeouts while tests
are waiting indefinitely for release of all GT wakerefs.
Dave Airlie [Fri, 10 May 2024 00:22:58 +0000 (10:22 +1000)]
Merge tag 'drm-msm-next-2024-05-07' of https://gitlab.freedesktop.org/drm/msm into drm-next
Updates for v6.10
Core:
- Switched to generating register header files during build process
instead of shipping pre-generated headers
- Merged DPU and MDP4 format databases.
DP:
- Stop using compat string to distinguish DP and eDP cases
- Added support for X Elite platform (X1E80100)
- Reworked DP aux/audio support
- Added SM6350 DP to the bindings (no driver changes, using SM8350
as a fallback compat)
Lucas De Marchi [Mon, 6 May 2024 14:19:17 +0000 (07:19 -0700)]
drm/xe/ads: Use flexible-array
Zero-length arrays are deprecated and flexible arrays should be
used instead: https://www.kernel.org/doc/html/v6.9-rc7/process/deprecated.html#zero-length-and-one-element-arrays
Reported-by: kernel test robot <lkp@intel.com> Reported-by: Julia Lawall <julia.lawall@inria.fr> Closes: https://lore.kernel.org/r/202405051824.AmjAI5Pg-lkp@intel.com/ Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240506141917.205714-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit ee7284230644e21fef0e38fc5bf8f907b6bb7f7c) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Ville Syrjälä [Thu, 2 May 2024 12:14:21 +0000 (15:14 +0300)]
drm/i915: Fix HAS_REGION() usage in intel_gt_probe_lmem()
HAS_REGION() takes a bitmask, not the region ID. This causes the
GEM_BUG_ON() to assert that the SMEM region is available rather
than the intended LMEM region. No real harm since SMEM is always
available, but also not checking what was intended.
There was a patch supposed to fix an issue of illegal attempts to free a
still active i915 VMA object when parking a GT believed to be idle,
reported by CI on 2-GT Meteor Lake. As a solution, an extra wakeref for
a Primary GT was acquired from i915_gem_do_execbuffer() -- see commit f56fe3e91787 ("drm/i915: Fix a VMA UAF for multi-gt platform").
However, that fix occurred insufficient -- the issue was still reported by
CI. That wakeref was released on exit from i915_gem_do_execbuffer(), then
potentially before completion of the request and deactivation of its
associated VMAs. Moreover, CI reports indicated that single-GT platforms
also suffered sporadically from the same race.
Since that issue was fixed by another commit f3c71b2ded5c ("drm/i915/vma:
Fix UAF on destroy against retire race"), the changes introduced by that
insufficient fix were dropped as no longer useful. However, that series
resulted in another VMA UAF scenario now being triggered in CI.
We defer actually closing, unbinding and destroying a VMA until next idle
point, or until the object is freed in the meantime. By postponing the
unbind, we allow for the VMA to be reopened by the client, avoiding the
work required to rebind the VMA.
Starting from commit b0647a5e79b1 ("drm/i915: Avoid live-lock with
i915_vma_parked()"), we assume that as long as a GT is held idle, no VMA
would be reopened while we destroy them. That assumption is no longer
true in multi-GT configurations, where a VMA we reopen may be handled by a
GT different from the one that we already keep active via its engine while
we set up an execbuf request.
Restoring the extra GT0 PM wakeref removed from i915_gem_do_execbuffer()
processing path seems to fix this issue.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10608 Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Nirmoy Das <nirmoy.das@linux.intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@linux.intel.com> Fixes: 1f33dc0c1189 ("drm/i915: Remove extra multi-gt pm-references") Link: https://patchwork.freedesktop.org/patch/msgid/20240506180253.96858-2-janusz.krzysztofik@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We don't need to run the validation of the XML files if we are just
compiling the kernel. Skip the validation unless the user enables
corresponding Kconfig option. This removes a warning from gen_header.py
about lxml being not installed.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/all/20240409120108.2303d0bd@canb.auug.org.au/ Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
Patchwork: https://patchwork.freedesktop.org/patch/592558/ Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Sat, 4 May 2024 16:31:13 +0000 (09:31 -0700)]
drm/msm/a6xx: Cleanup indexed regs const'ness
These tables were made non-const in commit 3cba4a2cdff3 ("drm/msm/a6xx:
Update ROQ size in coredump") in order to avoid powering up the GPU when
reading back a devcoredump. Instead let's just stash the count that is
potentially read from hw in struct a6xx_gpu_state_obj, and make the
tables const again.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/592699/
Connor Abbott [Fri, 3 May 2024 13:42:34 +0000 (14:42 +0100)]
drm/msm: Add devcoredump support for a750
Add an a750 case to the various places where we choose a list of
registers.
Patchwork: https://patchwork.freedesktop.org/patch/592519/
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/592519 Signed-off-by: Rob Clark <robdclark@chromium.org>
Connor Abbott [Fri, 3 May 2024 13:42:33 +0000 (14:42 +0100)]
drm/msm: Adjust a7xx GBIF debugbus dumping
Use the kgsl-style list of indices, because this is about to change for
a750 and we want to reuse the downstream header directly.
Patchwork: https://patchwork.freedesktop.org/patch/592520/
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/592520 Signed-off-by: Rob Clark <robdclark@chromium.org>
Connor Abbott [Fri, 3 May 2024 13:42:32 +0000 (14:42 +0100)]
drm/msm: Update a6xx registers XML
Update to Mesa commit e82d70d472cc ("freedreno/a7xx: Add
A7XX_HLSQ_DP_STR location from kgsl").
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/592518/ Signed-off-by: Rob Clark <robdclark@chromium.org>
Connor Abbott [Fri, 3 May 2024 13:42:31 +0000 (14:42 +0100)]
drm/msm: Fix imported a750 snapshot header for upstream
Add A7XX prefixes necessary because we use the same code for dumping
a6xx and a7xx, fix register name prefixes for upstream, and use the
upstream header.
Patchwork: https://patchwork.freedesktop.org/patch/592517/
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/592517 Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/592516/ Signed-off-by: Rob Clark <robdclark@chromium.org>
Konrad Dybcio [Mon, 22 Apr 2024 22:41:32 +0000 (00:41 +0200)]
MAINTAINERS: Add Konrad Dybcio as a reviewer for the Adreno driver
Add myself as a reviewer for Adreno driver changes.
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/590705/ Signed-off-by: Rob Clark <robdclark@chromium.org>
Konrad Dybcio [Mon, 22 Apr 2024 22:41:31 +0000 (00:41 +0200)]
MAINTAINERS: Add a separate entry for Qualcomm Adreno GPU drivers
The msm driver is.. gigantic and covers display hardware (incl. things
concerning (e)DP, DSI, HDMI), as well as the entire lineup of Adreno
GPUs (with hw bringup, memory mappings, userspace interaction etc.).
Because of that, people listed as M:/R: receive patches concerning
drivers for any part of the display block OR the GPU. Separate the
latter, as it's both a functionally separate block and is of
interest to different folks.
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/590704/ Signed-off-by: Rob Clark <robdclark@chromium.org>
Instead of relying on handwavy null checks down the cleanup chain,
explicitly de-allocate the LLC data and free a6xx_gpu instead.
Fixes: 76efc2453d0e ("drm/msm/gpu: Fix crash during system suspend after unbind") Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/588919/ Signed-off-by: Rob Clark <robdclark@chromium.org>
drm/msm/adreno: fix CP cycles stat retrieval on a7xx
a7xx_submit() should use the a7xx variant of the RBBM_PERFCTR_CP register
for retrieving the CP cycles value before and after the submitted command
stream execution.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com> Fixes: af66706accdf ("drm/msm/a6xx: Add skeleton A7xx support")
Patchwork: https://patchwork.freedesktop.org/patch/588445/ Signed-off-by: Rob Clark <robdclark@chromium.org>
Zan Dobersek [Thu, 29 Feb 2024 07:49:11 +0000 (08:49 +0100)]
drm/msm/a7xx: allow writing to CP_BV counter selection registers
In addition to the CP_PERFCTR_CP_SEL register range, allow writes to the
CP_BV_PERFCTR_CP_SEL registers in the 0x8e0-0x8e6 range for profiling
purposes of tools like fdperf and perfetto.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Patchwork: https://patchwork.freedesktop.org/patch/580548/
[fixup a730_protect size] Signed-off-by: Rob Clark <robdclark@chromium.org>
Sean Anderson [Fri, 8 Mar 2024 20:47:41 +0000 (15:47 -0500)]
drm: zynqmp_dpsub: Always register bridge
We must always register the DRM bridge, since zynqmp_dp_hpd_work_func
calls drm_bridge_hpd_notify, which in turn expects hpd_mutex to be
initialized. We do this before zynqmp_dpsub_drm_init since that calls
drm_bridge_attach. This fixes the following lockdep warning:
The regulator_disable() added by the original commit solves one kind of
regulator imbalance but adds another one as it allows the regulator to be
disabled one more time than it is enabled in the following scenario:
The use case for introducing the additional regulator_disable() was
removing the module for debugging (see link below for the discussion). If
the module is removed after a .atomic_pre_enable, i.e. with an active
pipeline from the DRM point of view, .atomic_disable is not called and thus
the regulator would not be disabled.
According to the discussion however there is no actual use case for
removing the module with an active pipeline, except for
debugging/development.
On the other hand, the occurrence of a PLL lock failure is possible due to
any physical reason (e.g. a temporary hardware failure for electrical
reasons) so handling it gracefully should be supported. As there is no way
for .atomic[_pre]_enable to report an error to the core, the only clean way
to support it is calling regulator_disabled() only in .atomic_disable,
unconditionally, as it was before.
drm/fbdev-generic: Do not set physical framebuffer address
Framebuffer memory is allocated via vzalloc() from non-contiguous
physical pages. The physical framebuffer start address is therefore
meaningless. Do not set it.
The value is not used within the kernel and only exported to userspace
on dedicated ARM configs. No functional change is expected.
v2:
- refer to vzalloc() in commit message (Javier)
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Fixes: a5b44c4adb16 ("drm/fbdev-generic: Always use shadow buffering") Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Javier Martinez Canillas <javierm@redhat.com> Cc: Zack Rusin <zackr@vmware.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: <stable@vger.kernel.org> # v6.4+ Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Reviewed-by: Zack Rusin <zack.rusin@broadcom.com> Reviewed-by: Sui Jingfeng <sui.jingfeng@linux.dev> Tested-by: Sui Jingfeng <sui.jingfeng@linux.dev> Acked-by: Maxime Ripard <mripard@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240419083331.7761-2-tzimmermann@suse.de
(cherry picked from commit 73ef0aecba78aa9ebd309b10b6cd17d94e632892) Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Boris Brezillon [Tue, 30 Apr 2024 11:37:27 +0000 (13:37 +0200)]
drm/panthor: Fix the FW reset logic
In the post_reset function, if the fast reset didn't succeed, we
are not clearing the fast_reset flag, which prevents firmware
sections from being reloaded. While at it, use panthor_fw_stop()
instead of manually writing DISABLE to the MCU_CONTROL register.
Boris Brezillon [Thu, 2 May 2024 15:52:48 +0000 (17:52 +0200)]
drm/panthor: Make sure we handle 'unknown group state' case properly
When we check for state values returned by the FW, we only cover part of
the 0:7 range. Make sure we catch FW inconsistencies by adding a default
to the switch statement, and flagging the group state as unknown in that
case.
When an unknown state is detected, we trigger a reset, and consider the
group as unusable after that point, to prevent the potential corruption
from creeping in other places if we continue executing stuff on this
context.
v2:
- Add Steve's R-b
- Fix commit message
Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/dri-devel/3b7fd2f2-679e-440c-81cd-42fc2573b515@moroto.mountain/T/#u Suggested-by: Steven Price <steven.price@arm.com> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240502155248.1430582-1-boris.brezillon@collabora.com
drm: move DRM-related CONFIG options into DRM submenu
When you create a submenu using the 'menu' syntax, there is no
ambiguity about its end because the code between 'menu' and 'endmenu'
becomes the submenu.
In contrast, 'menuconfig' does not have the corresponding end marker.
Instead, the end of the submenu is inferred from symbol dependencies.
This is detailed in Documentation/kbuild/kconfig-language.rst, starting
line 348. It outlines two methods to place the code under the submenu:
(1) Open an if-block immediately after 'menuconfig', enclosing the
submenu content within it
(2) Add 'depends on' to every symbol intended for the submenu
Many subsystems opt for (1) because it reliably maintains the submenu
structure.
The DRM subsystem adopts (2). The submenu ends when the sequence of
'depends on DRM' breaks. It can be confirmed by running a GUI frontend
such as 'make menuconfig' and visiting the DRM menu:
< > Direct Rendering Manager (XFree86 4.1.0 and higher DRI support) ----
If you toggle this, you will notice most of the DRM-related options
appear below it, not in the submenu.
I highly recommend the approach (1). Obviously, (2) is not reliable,
as the submenu breaks whenever someone forgets to add 'depends on DRM'.
This commit encloses the entire DRM configuration with 'if DRM' and
'endif', except for DRM_PANEL_ORIENTATION_QUIRKS.
Note:
Now, 'depends on DRM' properties inside the if-block are all redundant.
I leave it as follow-up cleanups.
Revert "drm/display: Make all helpers visible and switch to depends on"
This reverts commit d674858ff979550a0e97b4ac766f2640f0d9d7e7, as helper
code should always be selected by the driver that needs it, for the
convenience of the final user configuring a kernel.
The user who configures a kernel should not need to know which helpers
are needed for the driver he is interested in. Making a driver depend
on helper code means that the user needs to know which helpers to enable
first, which is very user-unfriendly.
This reverts commit c0e0f139354c01e0213204e4a96e7076e5a3e396, as helper
code should always be selected by the driver that needs it, for the
convenience of the final user configuring a kernel.
The user who configures a kernel should not need to know which helpers
are needed for the driver he is interested in. Making a driver depend
on helper code means that the user needs to know which helpers to enable
first, which is very user-unfriendly.
Revert "drm: Switch DRM_DISPLAY_HELPER to depends on"
This reverts commit e075e496f516bf92bc0cbaf94d64e8d4a6b58321, as helper
code should always be selected by the driver that needs it, for the
convenience of the final user configuring a kernel.
The user who configures a kernel should not need to know which helpers
are needed for the driver he is interested in. Making a driver depend
on helper code means that the user needs to know which helpers to enable
first, which is very user-unfriendly.
Revert "drm: Switch DRM_DISPLAY_DP_AUX_BUS to depends on"
This reverts commit 4d15125d7fe637f401e64e33c99513adf6586fdd, as helper
code should always be selected by the driver that needs it, for the
convenience of the final user configuring a kernel.
The user who configures a kernel should not need to know which helpers
are needed for the driver he is interested in. Making a driver depend
on helper code means that the user needs to know which helpers to enable
first, which is very user-unfriendly.
Revert "drm: Switch DRM_DISPLAY_DP_HELPER to depends on"
This reverts commit 0323287de87d7e6e9c22c57d7440aa353a2298d0, as helper
code should always be selected by the driver that needs it, for the
convenience of the final user configuring a kernel.
The user who configures a kernel should not need to know which helpers
are needed for the driver he is interested in. Making a driver depend
on helper code means that the user needs to know which helpers to enable
first, which is very user-unfriendly.
Revert "drm: Switch DRM_DISPLAY_HDCP_HELPER to depends on"
This reverts commit 3166e7e6d935caaef07605a5c90773fbf9ffeaf4, as helper
code should always be selected by the driver that needs it, for the
convenience of the final user configuring a kernel.
The user who configures a kernel should not need to know which helpers
are needed for the driver he is interested in. Making a driver depend
on helper code means that the user needs to know which helpers to enable
first, which is very user-unfriendly.
Revert "drm: Switch DRM_DISPLAY_HDMI_HELPER to depends on"
This reverts commit f6d2dc03fa8546b284dd8c1af027d9fac5725921, as helper
code should always be selected by the driver that needs it, for the
convenience of the final user configuring a kernel.
The user who configures a kernel should not need to know which helpers
are needed for the driver he is interested in. Making a driver depend
on helper code means that the user needs to know which helpers to enable
first, which is very user-unfriendly.
In order to detect duplicate implementations for the same workaround,
early in the implementation of RTP it was decided to error out even if
the values set are exactly the same. With the introduction of 18034896535
in commit 74671d23ca18 ("drm/xe/xe2: Add workaround 18034896535"), LNL
stepping with graphics stepping A1 now gives the following error on
module load:
Matthew Auld [Tue, 23 Apr 2024 07:47:23 +0000 (08:47 +0100)]
drm/xe/vm: prevent UAF in rebind_work_func()
We flush the rebind worker during the vm close phase, however in places
like preempt_fence_work_func() we seem to queue the rebind worker
without first checking if the vm has already been closed. The concern
here is the vm being closed with the worker flushed, but then being
rearmed later, which looks like potential uaf, since there is no actual
refcounting to track the queued worker. We can't take the vm->lock here
in preempt_rebind_work_func() to first check if the vm is closed since
that will deadlock, so instead flush the worker again when the vm
refcount reaches zero.
v2:
- Grabbing vm->lock in the preempt worker creates a deadlock, so
checking the closed state is tricky. Instead flush the worker when
the refcount reaches zero. It should be impossible to queue the
preempt worker without already holding vm ref.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1676 Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1591 Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1364 Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1304 Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1249 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240423074721.119633-4-matthew.auld@intel.com
(cherry picked from commit 3d44d67c441a9fe6f81a1d705f7de009a32a5b35) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Thomas Hellström [Tue, 23 Apr 2024 12:11:14 +0000 (14:11 +0200)]
drm/xe: Fix unexpected backmerge results
The recent backmerge from drm-next to drm-xe-next brought with it
some silent unexpected results. One code snippet was added twice
and a partial revert had merge errors. Fix that up to
reinstate the affected code as it was before the backmerge.
Display i915:
- More initial work to make display code more independent from i915 (Jani)
- Convert i915/xe fbdev to DRM client (Thomas Zimmermann)
- VLV/CHV DPIO register cleanup (Ville)
Ville Syrjälä [Mon, 22 Apr 2024 08:34:56 +0000 (11:34 +0300)]
drm/i915/dpio: Clean up the vlv/chv PHY register bits
Use REG_BIT() & co. for the vlv/chv DPIO PHY registers.
Note that DPIO_BIAS_CURRENT_CTL_SHIFT was incorrectly defined
to be 21 wheres 20 is the correct value. It is not used in the
code though so didn't bother splitting to a separate patch.
Ville Syrjälä [Mon, 22 Apr 2024 08:34:54 +0000 (11:34 +0300)]
drm/i915/dpio: Rename a few CHV DPIO PHY registers
Drop the leading underscore from the CHV PHY common lane
register definitions. We use these directly from actual
code so the underscore here is misleading as usually it indicates
an intermediate define that shouldn't be used directly.
Ville Syrjälä [Mon, 22 Apr 2024 08:34:53 +0000 (11:34 +0300)]
drm/i915/dpio: Give VLV DPIO group register a clearer name
Include _GRP in VLV DPIO PHY group access register define
names. Makes it more obvious where the accesses will land.
Also matches the naming used by BXT already.
Ville Syrjälä [Mon, 22 Apr 2024 08:34:52 +0000 (11:34 +0300)]
drm/i915/dpio: Derive the phy from the port rather than pipe in encoder hooks
In the encoder hooks we are dealing primarily with the encoder,
so derive the DPIO PHY from the encoder rather than the pipe.
Technically this doesn't matter as we can't cross connect
pipes<->port across PHY boundaries, but it does conveny the
intention more accurately.
Ville Syrjälä [Mon, 22 Apr 2024 08:34:51 +0000 (11:34 +0300)]
drm/i915/dpio: s/pipe/ch/
Stop using 'pipe' directly as the DPIO PHY channel. This
does happen to work on VLV since it just has the one PHY
with CH0==pipe A and CH1==pipe B. But explicitly converting
the thing to the right enum makes the whole thing less
confusing.
VLV_PLL_DW9_BCAST is actually VLV_PCS_DW17_BCAST. The address
does kinda look like it goes to the PLL block on a first glance,
but broadcast is special and doesn't even exist for the PLL
(only PCS and TX have it).
The fact that we use a broadcast write here is a bit sketchy
IMO since we're now blasting the register to all PCS splines
across the whole PHY. So the PCS registers in the other channel
(ie. other pipe/port) will also be written. But I guess the
fact that we always write the same value should make this a nop
even if the other channel is already enabled (assuming the VBIOS/GOP
didn't screw up and use some other value...).
drm/i915/gt: Automate CCS Mode setting during engine resets
We missed setting the CCS mode during resume and engine resets.
Create a workaround to be added in the engine's workaround list.
This workaround sets the XEHP_CCS_MODE value at every reset.
The issue can be reproduced by running:
$ clpeak --kernel-latency
Without resetting the CCS mode, we encounter a fence timeout:
Fence expiration time out i915-0000:03:00.0:clpeak[2387]:2!
Fixes: 2bebae0112b1 ("drm/i915/gt: Enable only one CCS for compute workload") Reported-by: Gnattu OC <gnattuoc@me.com> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10895 Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Cc: Chris Wilson <chris.p.wilson@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: <stable@vger.kernel.org> # v6.2+ Tested-by: Gnattu OC <gnattuoc@me.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Tested-by: Krzysztof Gibala <krzysztof.gibala@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240426000723.229296-1-andi.shyti@linux.intel.com
Dave Airlie [Tue, 30 Apr 2024 04:20:31 +0000 (14:20 +1000)]
Merge tag 'drm-intel-gt-next-2024-04-26' of https://anongit.freedesktop.org/git/drm/drm-intel into drm-next
UAPI Changes:
- drm/i915/guc: Use context hints for GT frequency
Allow user to provide a low latency context hint. When set, KMD
sends a hint to GuC which results in special handling for this
context. SLPC will ramp the GT frequency aggressively every time
it switches to this context. The down freq threshold will also be
lower so GuC will ramp down the GT freq for this context more slowly.
We also disable waitboost for this context as that will interfere with
the strategy.
We need to enable the use of SLPC Compute strategy during init, but
it will apply only to contexts that set this bit during context
creation.
Userland can check whether this feature is supported using a new param-
I915_PARAM_HAS_CONTEXT_FREQ_HINT. This flag is true for all guc submission
enabled platforms as they use SLPC for frequency management.
The Mesa usage model for this flag is here -
https://gitlab.freedesktop.org/sushmave/mesa/-/commits/compute_hint
- drm/i915/gt: Enable only one CCS for compute workload
Enable only one CCS engine by default with all the compute sices
allocated to it.
While generating the list of UABI engines to be exposed to the
user, exclude any additional CCS engines beyond the first
instance
***
NOTE: This W/A will make all DG2 SKUs appear like single CCS SKUs by
default to mitigate a hardware bug. All the EUs will still remain
usable, and all the userspace drivers have been confirmed to be able
to dynamically detect the change in number of CCS engines and adjust.
For the smaller percent of applications that get perf benefit from
letting the userspace driver dispatch across all 4 CCS engines we will
be introducing a sysfs control as a later patch to choose 4 CCS each
with 25% EUs (or 50% if 2 CCS).
- Add new and fix to existing workarounds: Wa_14018575942 (MTL),
Wa_16019325821 (Gen12.70), Wa_14019159160 (MTL), Wa_16015675438,
Wa_14020495402 (Gen12.70) (Tejas, John, Lucas)
- Fix UAF on destroy against retire race and remove two earlier
partial fixes (Janusz)
- Limit the reserved VM space to only the platforms that need it (Andi)
- Reset queue_priority_hint on parking for execlist platforms (Chris)
- Fix gt reset with GuC submission is disabled (Nirmoy)
- Correct capture of EIR register on hang (John)
- Remove usage of the deprecated ida_simple_xx() API
- Refactor confusing __intel_gt_reset() (Nirmoy)
- Fix the fix for GuC reset lock confusion (John)
- Simplify/extend platform check for Wa_14018913170 (John)
- Replace dev_priv with i915 (Andi)
- Add and use gt_to_guc() wrapper (Andi)
- Remove bogus null check (Rodrigo, Dan)
Merge tag 'sched-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
- Fix EEVDF corner cases
- Fix two nohz_full= related bugs that can cause boot crashes
and warnings
* tag 'sched-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/isolation: Fix boot crash when maxcpus < first housekeeping CPU
sched/isolation: Prevent boot crash when the boot CPU is nohz_full
sched/eevdf: Prevent vlag from going out of bounds in reweight_eevdf()
sched/eevdf: Fix miscalculation in reweight_entity() when se is not curr
sched/eevdf: Always update V if se->on_rq when reweighting
Merge tag 'x86-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Make the CPU_MITIGATIONS=n interaction with conflicting
mitigation-enabling boot parameters a bit saner.
- Re-enable CPU mitigations by default on non-x86
- Fix TDX shared bit propagation on mprotect()
- Fix potential show_regs() system hang when PKE initialization
is not fully finished yet.
- Add the 0x10-0x1f model IDs to the Zen5 range
- Harden #VC instruction emulation some more
* tag 'x86-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu: Ignore "mitigations" kernel parameter if CPU_MITIGATIONS=n
cpu: Re-enable CPU mitigations by default for !X86 architectures
x86/tdx: Preserve shared bit on mprotect()
x86/cpu: Fix check for RDPKRU in __show_regs()
x86/CPU/AMD: Add models 0x10-0x1f to the Zen5 range
x86/sev: Check for MWAITX and MONITORX opcodes in the #VC handler
sched/isolation: Fix boot crash when maxcpus < first housekeeping CPU
housekeeping_setup() checks cpumask_intersects(present, online) to ensure
that the kernel will have at least one housekeeping CPU after smp_init(),
but this doesn't work if the maxcpus= kernel parameter limits the number of
processors available after bootup.
For example, a kernel with "maxcpus=2 nohz_full=0-2" parameters crashes at
boot time on a virtual machine with 4 CPUs.
Change housekeeping_setup() to use cpumask_first_and() and check that the
returned CPU number is valid and less than setup_max_cpus.
Another corner case is "nohz_full=0" on a machine with a single CPU or with
the maxcpus=1 kernel argument. In this case non_housekeeping_mask is empty
and tick_nohz_full_setup() makes no sense. And indeed, the kernel hits the
WARN_ON(tick_nohz_full_running) in tick_sched_do_timer().
And how should the kernel interpret the "nohz_full=" parameter? It should
be silently ignored, but currently cpulist_parse() happily returns the
empty cpumask and this leads to the same problem.
Change housekeeping_setup() to check cpumask_empty(non_housekeeping_mask)
and do nothing in this case.
sched/isolation: Prevent boot crash when the boot CPU is nohz_full
Documentation/timers/no_hz.rst states that the "nohz_full=" mask must not
include the boot CPU, which is no longer true after:
08ae95f4fd3b ("nohz_full: Allow the boot CPU to be nohz_full").
However after:
aae17ebb53cd ("workqueue: Avoid using isolated cpus' timers on queue_delayed_work")
the kernel will crash at boot time in this case; housekeeping_any_cpu()
returns an invalid CPU number until smp_init() brings the first
housekeeping CPU up.
Change housekeeping_any_cpu() to check the result of cpumask_any_and() and
return smp_processor_id() in this case.
This is just the simple and backportable workaround which fixes the
symptom, but smp_processor_id() at boot time should be safe at least for
type == HK_TYPE_TIMER, this more or less matches the tick_do_timer_boot_cpu
logic.
There is no worry about cpu_down(); tick_nohz_cpu_down() will not allow to
offline tick_do_timer_cpu (the 1st online housekeeping CPU).
Fixes: aae17ebb53cd ("workqueue: Avoid using isolated cpus' timers on queue_delayed_work") Reported-by: Chris von Recklinghausen <crecklin@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Phil Auld <pauld@redhat.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20240411143905.GA19288@redhat.com Closes: https://lore.kernel.org/all/20240402105847.GA24832@redhat.com/
Merge tag 'rust-fixes-6.9' of https://github.com/Rust-for-Linux/linux
Pull Rust fixes from Miguel Ojeda:
- Soundness: make internal functions generated by the 'module!' macro
inaccessible, do not implement 'Zeroable' for 'Infallible' and
require 'Send' for the 'Module' trait.
- Build: avoid errors with "empty" files and workaround 'rustdoc' ICE.
- Kconfig: depend on '!CFI_CLANG' and avoid selecting 'CONSTRUCTORS'.
- Code docs: remove non-existing key from 'module!' macro example.
- Docs: trivial rendering fix in arch table.
* tag 'rust-fixes-6.9' of https://github.com/Rust-for-Linux/linux:
rust: remove `params` from `module` macro example
kbuild: rust: force `alloc` extern to allow "empty" Rust files
kbuild: rust: remove unneeded `@rustc_cfg` to avoid ICE
rust: kernel: require `Send` for `Module` implementations
rust: phy: implement `Send` for `Registration`
rust: make mutually exclusive with CFI_CLANG
rust: macros: fix soundness issue in `module!` macro
rust: init: remove impl Zeroable for Infallible
docs: rust: fix improper rendering in Arch Support page
rust: don't select CONSTRUCTORS