]> git.dujemihanovic.xyz Git - linux.git/log
linux.git
8 weeks agodrm/amdkfd: APIs to stop/start KFD scheduling
Amber Lin [Mon, 29 Jul 2024 18:22:30 +0000 (14:22 -0400)]
drm/amdkfd: APIs to stop/start KFD scheduling

Provide amdgpu_amdkfd_stop_sched() for amdgpu to stop KFD scheduling
compute work on HIQ. amdgpu_amdkfd_start_sched() resumes the scheduling.
When amdgpu_amdkfd_stop_sched is called, KFD will unmap queues from
runlist. If users send ioctls to KFD to create queues, they'll be added
but those queues won't be mapped to runlist (so not scheduled) until
amdgpu_amdkfd_start_sched is called.

v2: fix build (Alex)

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
8 weeks agodrm/amdgpu/gfx9: Add cleaner shader support for GFX9.4.4 hardware
Srinivasan Shanmugam [Mon, 29 Jul 2024 16:48:45 +0000 (22:18 +0530)]
drm/amdgpu/gfx9: Add cleaner shader support for GFX9.4.4 hardware

This commit extends the cleaner shader feature to support GFX9.4.4
hardware.

The cleaner shader feature is used to clear or initialize certain GPU
resources, such as Local Data Share (LDS), Vector General Purpose
Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). This
operation needs to be performed in isolation, while no other tasks
should be running on the GPU at the same time.

Previously, the cleaner shader feature was implemented for GFX9.4.3
hardware. This commit adds support for GFX9.4.4 hardware by allowing the
cleaner shader to be used with this hardware version.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
8 weeks agodrm/amdgpu/gfx9: Add cleaner shader for GFX9.4.3
Srinivasan Shanmugam [Mon, 29 Jul 2024 16:44:41 +0000 (22:14 +0530)]
drm/amdgpu/gfx9: Add cleaner shader for GFX9.4.3

This commit adds the cleaner shader microcode for GFX9.4.3 GPUs. The
cleaner shader is a piece of GPU code that is used to clear or
initialize certain GPU resources, such as Local Data Share (LDS), Vector
General Purpose Registers (VGPRs), and Scalar General Purpose Registers
(SGPRs).

Clearing these resources is important for ensuring data isolation
between different workloads running on the GPU. Without the cleaner
shader, residual data from a previous workload could potentially be
accessed by a subsequent workload, leading to data leaks and incorrect
computation results.

The cleaner shader microcode is represented as an array of 32-bit words
(`gfx_9_4_3_cleaner_shader_hex`). This array is the binary
representation of the cleaner shader code, which is written in a
low-level GPU instruction set.

When the cleaner shader feature is enabled, the AMDGPU driver loads this
array into a specific location in the GPU memory. The GPU then reads
this memory location to fetch and execute the cleaner shader
instructions.

The cleaner shader is executed automatically by the GPU at the end of
each workload, before the next workload starts. This ensures that all
GPU resources are in a clean state before the start of each workload.

This addition is part of the cleaner shader feature implementation. The
cleaner shader feature helps improve GPU performance and resource
utilization by cleaning up GPU resources after they are used. It also
enhances security and reliability by preventing data leaks between
workloads.

v2: fix copyright date (Alex)

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
8 weeks agodrm/amdgpu/gfx9: Implement cleaner shader support for GFX9.4.3 hardware
Srinivasan Shanmugam [Mon, 29 Jul 2024 16:42:02 +0000 (22:12 +0530)]
drm/amdgpu/gfx9: Implement cleaner shader support for GFX9.4.3 hardware

The patch modifies the gfx_v9_4_3_kiq_set_resources function to write
the cleaner shader's memory controller address to the ring buffer. It
also adds a new function, gfx_v9_4_3_ring_emit_cleaner_shader, which
emits the PACKET3_RUN_CLEANER_SHADER packet to the ring buffer.

This patch adds support for the PACKET3_RUN_CLEANER_SHADER packet in the
gfx_v9_4_3 module. This packet is used to emit the cleaner shader, which
is used to clear GPU memory before it's reused, helping to prevent data
leakage between different processes.

Finally, the patch updates the ring function structures to include the
new gfx_v9_4_3_ring_emit_cleaner_shader function. This allows the
cleaner shader to be emitted as part of the ring's operations.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
8 weeks agodrm/amdgpu/gfx9: Implement cleaner shader support for GFX9 hardware
Srinivasan Shanmugam [Mon, 29 Jul 2024 16:26:57 +0000 (21:56 +0530)]
drm/amdgpu/gfx9: Implement cleaner shader support for GFX9 hardware

The patch modifies the gfx_v9_0_kiq_set_resources function to write
the cleaner shader's memory controller address to the ring buffer. It
also adds a new function, gfx_v9_0_ring_emit_cleaner_shader, which
emits the PACKET3_RUN_CLEANER_SHADER packet to the ring buffer.

This patch adds support for the PACKET3_RUN_CLEANER_SHADER packet in the
gfx_v9_0 module. This packet is used to emit the cleaner shader, which
is used to clear GPU memory before it's reused, helping to prevent data
leakage between different processes.

Finally, the patch updates the ring function structures to include the
new gfx_v9_0_ring_emit_cleaner_shader function. This allows the
cleaner shader to be emitted as part of the ring's operations.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
8 weeks agodrm/amdgpu: Add PACKET3_RUN_CLEANER_SHADER for cleaner shader execution
Srinivasan Shanmugam [Sun, 7 Jul 2024 03:24:04 +0000 (08:54 +0530)]
drm/amdgpu: Add PACKET3_RUN_CLEANER_SHADER for cleaner shader execution

This commit adds the PACKET3_RUN_CLEANER_SHADER definition. This packet
is a command packet used to instruct the GPU to execute the cleaner
shader.

The cleaner shader is a piece of GPU code that is used to clear or
initialize certain GPU resources, such as Local Data Share (LDS), Vector
General Purpose Registers (VGPRs), and Scalar General Purpose Registers
(SGPRs). Clearing these resources is important for ensuring data
isolation between different workloads running on the GPU.

The PACKET3_RUN_CLEANER_SHADER packet is used to trigger the execution
of the cleaner shader on the GPU. The packet consists of a header
followed by a RESERVED field, which is programmed to zero. When the GPU
receives this packet, it fetches and executes the cleaner shader
instructions from the location specified in the packet.

The cleaner shader feature helps to enhances security and reliability by
preventing data leaks between workloads.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
8 weeks agodrm/amdgpu: Add sysfs interface for running cleaner shader
Srinivasan Shanmugam [Mon, 27 May 2024 02:08:21 +0000 (07:38 +0530)]
drm/amdgpu: Add sysfs interface for running cleaner shader

This patch adds a new sysfs interface for running the cleaner shader on
AMD GPUs. The cleaner shader is used to clear GPU memory before it's
reused, which can help prevent data leakage between different processes.

The new sysfs file is write-only and is named `run_cleaner_shader`.
Write the number of the partition to this file to trigger the cleaner shader
on that partition. There is only one partition on GPUs which do not
support partitioning.

Changes made in this patch:

- Added `amdgpu_set_run_cleaner_shader` function to handle writes to the
  `run_cleaner_shader` sysfs file.
- Added `run_cleaner_shader` to the list of device attributes in
  `amdgpu_device_attrs`.
- Updated `default_attr_update` to handle `run_cleaner_shader`.
- Added `AMDGPU_DEVICE_ATTR_WO` macro to create write-only device
  attributes.

v2: fix error handling (Alex)

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
8 weeks agodrm/amdgpu: Add enforce_isolation sysfs attribute
Srinivasan Shanmugam [Mon, 27 May 2024 02:00:47 +0000 (07:30 +0530)]
drm/amdgpu: Add enforce_isolation sysfs attribute

This commit adds a new sysfs attribute 'enforce_isolation' to control
the 'enforce_isolation' setting per GPU. The attribute can be read and
written, and accepts values 0 (disabled) and 1 (enabled).

When 'enforce_isolation' is enabled, reserved VMIDs are allocated for
each ring. When it's disabled, the reserved VMIDs are freed.

The set function locks a mutex before changing the 'enforce_isolation'
flag and the VMIDs, and unlocks it afterwards. This ensures that these
operations are atomic and prevents race conditions and other concurrency
issues.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
8 weeks agodrm/amdgpu: Enforce isolation as part of the job
Srinivasan Shanmugam [Wed, 20 Mar 2024 01:12:38 +0000 (06:42 +0530)]
drm/amdgpu: Enforce isolation as part of the job

This patch adds a new parameter 'enforce_isolation' to the amdgpu_job
structure. This parameter is used to determine whether shader isolation
should be enforced for a job. The enforce_isolation parameter is then
stored in the amdgpu_job structure and used when flushing the VM.

The enforce_isolation field of the amdgpu_job structure is set directly
after the job is allocated

This change allows more fine-grained control over shader isolation,
making it possible to enforce isolation on a per-job basis rather than
globally. This can be useful in scenarios where only certain jobs
require isolation.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
2 months agodrm/amdgpu: abort KIQ waits when there is a pending reset
Victor Skvortsov [Fri, 2 Aug 2024 18:22:26 +0000 (14:22 -0400)]
drm/amdgpu: abort KIQ waits when there is a pending reset

Stop waiting for the KIQ to return back when there is a reset pending.
It's quite likely that the KIQ will never response.

Signed-off-by: Koenig Christian <Christian.Koenig@amd.com>
Suggested-by: Lazar Lijo <Lijo.Lazar@amd.com>
Tested-by: Victor Skvortsov <victor.skvortsov@amd.com>
Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: Make enforce_isolation setting per GPU
Srinivasan Shanmugam [Mon, 29 Jul 2024 16:05:26 +0000 (21:35 +0530)]
drm/amdgpu: Make enforce_isolation setting per GPU

This commit makes enforce_isolation setting to be per GPU and per
partition by adding the enforce_isolation array to the adev structure.
The adev variable is set based on the global enforce_isolation module
parameter during device initialization.

In amdgpu_ids.c, the adev->enforce_isolation value for the current GPU
is used to determine whether to enforce isolation between graphics and
compute processes on that GPU.

In amdgpu_ids.c, the adev->enforce_isolation value for the current GPU
and partition is used to determine whether to enforce isolation between
graphics and compute processes on that GPU and partition.

This allows the enforce_isolation setting to be controlled individually
for each GPU and each partition, which is useful in a system with
multiple GPUs and partitions where different isolation settings might be
desired for different GPUs and partitions.

v2: fix loop in amdgpu_vmid_mgr_init() (Alex)

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
2 months agodrm/amdgpu: Emit cleaner shader at end of IB submission
Alex Deucher [Tue, 12 Mar 2024 18:22:26 +0000 (14:22 -0400)]
drm/amdgpu: Emit cleaner shader at end of IB submission

This commit introduces the emission of a cleaner shader at the end of
the IB submission process. This is achieved by adding a new function
pointer, `emit_cleaner_shader`, to the `amdgpu_ring_funcs` structure. If
the `emit_cleaner_shader` function is set in the ring functions, it is
called during the VM flush process.

The cleaner shader is only emitted if the `enable_cleaner_shader` flag
is set in the `amdgpu_device` structure. This allows the cleaner shader
emission to be controlled on a per-device basis.

By emitting a cleaner shader at the end of the IB submission, we can
ensure that the VM state is properly cleaned up after each submission.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
2 months agodrm/amdgpu: Add infrastructure for Cleaner Shader feature
Srinivasan Shanmugam [Thu, 6 Jun 2024 07:42:40 +0000 (13:12 +0530)]
drm/amdgpu: Add infrastructure for Cleaner Shader feature

The cleaner shader is used by the CP firmware to clean LDS and GPRs
between processes on the CUs.

This adds an internal API for GFX IP code to allocate and initialize the
cleaner shader.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
2 months agodrm/amdgpu: handle enforce isolation on non-0 gfxhub
Alex Deucher [Wed, 14 Aug 2024 23:06:36 +0000 (19:06 -0400)]
drm/amdgpu: handle enforce isolation on non-0 gfxhub

Some chips have more than one gfxhub so check if we
are a gfxhub rather than just gfxhub 0.

Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/sdma5.2: limit wptr workaround to sdma 5.2.1
Alex Deucher [Wed, 14 Aug 2024 14:28:24 +0000 (10:28 -0400)]
drm/amdgpu/sdma5.2: limit wptr workaround to sdma 5.2.1

The workaround seems to cause stability issues on other
SDMA 5.2.x IPs.

Fixes: a03ebf116303 ("drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3556
Acked-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: add vcn ip dump support for vcn_v2_6
Sunil Khatri [Mon, 5 Aug 2024 11:57:58 +0000 (17:27 +0530)]
drm/amdgpu: add vcn ip dump support for vcn_v2_6

Add support for logging the registers in devcoredump
buffer for vcn_v2_6.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: add print support for vcn_v2_5 ip dump
Sunil Khatri [Mon, 5 Aug 2024 11:56:31 +0000 (17:26 +0530)]
drm/amdgpu: add print support for vcn_v2_5 ip dump

Add support for logging the registers in devcoredump
buffer for vcn_v2_5.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: add vcn_v2_5 ip dump support
Sunil Khatri [Mon, 5 Aug 2024 11:53:55 +0000 (17:23 +0530)]
drm/amdgpu: add vcn_v2_5 ip dump support

Add support of vcn ip dump in the devcoredump
for vcn_v2_5.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: add print support for vcn_v2_0 ip dump
Sunil Khatri [Mon, 5 Aug 2024 11:49:42 +0000 (17:19 +0530)]
drm/amdgpu: add print support for vcn_v2_0 ip dump

Add support for logging the registers in devcoredump
buffer for vcn_v2_0.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: add vcn_v2_0 ip dump support
Sunil Khatri [Mon, 5 Aug 2024 11:48:09 +0000 (17:18 +0530)]
drm/amdgpu: add vcn_v2_0 ip dump support

Add support of vcn ip dump in the devcoredump
for vcn_v2_0.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: add print support for vcn_v1_0 ip dump
Sunil Khatri [Mon, 5 Aug 2024 11:38:22 +0000 (17:08 +0530)]
drm/amdgpu: add print support for vcn_v1_0 ip dump

Add support for logging the registers in devcoredump
buffer for vcn_v1_0.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: add vcn_v1_0 ip dump support
Sunil Khatri [Mon, 5 Aug 2024 11:36:24 +0000 (17:06 +0530)]
drm/amdgpu: add vcn_v1_0 ip dump support

Add support of vcn ip dump in the devcoredump
for vcn_v1_0.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: add print support for vcn_v4_0_5 ip dump
Sunil Khatri [Mon, 5 Aug 2024 07:28:46 +0000 (12:58 +0530)]
drm/amdgpu: add print support for vcn_v4_0_5 ip dump

Add support for logging the registers in devcoredump
buffer for vcn_v4_0_5.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: add print support for vcn_v4_0 ip dump
Sunil Khatri [Mon, 5 Aug 2024 07:13:07 +0000 (12:43 +0530)]
drm/amdgpu: add print support for vcn_v4_0 ip dump

Add support for logging the registers in devcoredump
buffer for vcn_v4_0.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: add print support for vcn_v4_0_3 ip dump
Sunil Khatri [Mon, 5 Aug 2024 07:22:27 +0000 (12:52 +0530)]
drm/amdgpu: add print support for vcn_v4_0_3 ip dump

Add support for logging the registers in devcoredump
buffer for vcn_v4_0_3.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: add vcn_v4_0_5 ip dump support
Sunil Khatri [Mon, 5 Aug 2024 07:27:09 +0000 (12:57 +0530)]
drm/amdgpu: add vcn_v4_0_5 ip dump support

Add support of vcn ip dump in the devcoredump
for vcn_v4_0_5.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: add vcn_v4_0 ip dump support
Sunil Khatri [Mon, 5 Aug 2024 07:01:21 +0000 (12:31 +0530)]
drm/amdgpu: add vcn_v4_0 ip dump support

Add support of vcn ip dump in the devcoredump
for vcn_v4_0.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: add vcn_v4_0_3 ip dump support
Sunil Khatri [Mon, 5 Aug 2024 07:19:20 +0000 (12:49 +0530)]
drm/amdgpu: add vcn_v4_0_3 ip dump support

Add support of vcn ip dump in the devcoredump
for vcn_v4_0_3.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: add print support for vcn_v5_0 ip dump
Sunil Khatri [Thu, 1 Aug 2024 13:49:27 +0000 (19:19 +0530)]
drm/amdgpu: add print support for vcn_v5_0 ip dump

Add support for logging the registers in devcoredump
buffer for vcn_v5_0.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/mes12: add API for user queue reset
Alex Deucher [Mon, 3 Jun 2024 17:48:40 +0000 (13:48 -0400)]
drm/amdgpu/mes12: add API for user queue reset

Add API for resetting user queues.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/mes11: add API for user queue reset
Alex Deucher [Mon, 3 Jun 2024 17:48:07 +0000 (13:48 -0400)]
drm/amdgpu/mes11: add API for user queue reset

Add API for resetting user queues.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/mes: add API for user queue reset
Alex Deucher [Mon, 3 Jun 2024 17:35:05 +0000 (13:35 -0400)]
drm/amdgpu/mes: add API for user queue reset

Add API for resetting user queues.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gfx11: export gfx_v11_0_request_gfx_index_mutex()
Alex Deucher [Fri, 12 Jul 2024 20:39:30 +0000 (16:39 -0400)]
drm/amdgpu/gfx11: export gfx_v11_0_request_gfx_index_mutex()

It will be used by the queue reset code.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gfx11: add a mutex for the gfx semaphore
Alex Deucher [Fri, 12 Jul 2024 20:37:33 +0000 (16:37 -0400)]
drm/amdgpu/gfx11: add a mutex for the gfx semaphore

This will be used in more places in the future so
add a mutex.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gfx11: enter safe mode before touching CP_INT_CNTL
Alex Deucher [Fri, 12 Jul 2024 19:36:19 +0000 (15:36 -0400)]
drm/amdgpu/gfx11: enter safe mode before touching CP_INT_CNTL

Need to enter safe mode before touching GC MMIO.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gfx7: add ring reset callback for gfx
Alex Deucher [Thu, 18 Jul 2024 19:59:20 +0000 (15:59 -0400)]
drm/amdgpu/gfx7: add ring reset callback for gfx

Add ring reset callback for gfx.

v2: fix operator precedence (kernel test robot)

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gfx8: add ring reset callback for gfx
Alex Deucher [Thu, 18 Jul 2024 19:50:23 +0000 (15:50 -0400)]
drm/amdgpu/gfx8: add ring reset callback for gfx

Add ring reset callback for gfx.

v2: fix operator precedence (kernel test robot)

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: add vcn_v5_0 ip dump support
Sunil Khatri [Thu, 1 Aug 2024 13:47:11 +0000 (19:17 +0530)]
drm/amdgpu: add vcn_v5_0 ip dump support

Add support of vcn ip dump in the devcoredump
for vcn_v5_0.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: add print support for vcn_v3_0 ip dump
Sunil Khatri [Wed, 24 Jul 2024 11:18:28 +0000 (16:48 +0530)]
drm/amdgpu: add print support for vcn_v3_0 ip dump

Add support for logging the registers in devcoredump
buffer for vcn_v3_0.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: add vcn_v3_0 ip dump support
Sunil Khatri [Wed, 24 Jul 2024 11:05:41 +0000 (16:35 +0530)]
drm/amdgpu: add vcn_v3_0 ip dump support

Add support of vcn ip dump in the devcoredump
for vcn_v3_0.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: add vcn ip dump ptr in vcn global struct
Sunil Khatri [Tue, 23 Jul 2024 07:38:55 +0000 (13:08 +0530)]
drm/amdgpu: add vcn ip dump ptr in vcn global struct

Add pointer to the vcn ip dump in the vcn global structure
to be accessible for all vcn version via global adev.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd: Remove unused declarations
Zhang Zekun [Mon, 12 Aug 2024 12:24:15 +0000 (20:24 +0800)]
drm/amd: Remove unused declarations

amdgpu_gart_table_vram_pin() and amdgpu_gart_table_vram_unpin() has
been removed since commit 575e55ee4fbc ("drm/amdgpu: recover gart table
at resume") remain the declarations untouched in the header files.

Besides, amdgpu_dm_display_resume() has also beed removed since
commit a80aa93de1a0 ("drm/amd/display: Unify dm resume sequence into a
single call"). So, let's remove this unused declarations.

Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/radeon/evergreen_cs: fix int overflow errors in cs track offsets
Nikita Zhandarovich [Tue, 6 Aug 2024 17:19:04 +0000 (10:19 -0700)]
drm/radeon/evergreen_cs: fix int overflow errors in cs track offsets

Several cs track offsets (such as 'track->db_s_read_offset')
either are initialized with or plainly take big enough values that,
once shifted 8 bits left, may be hit with integer overflow if the
resulting values end up going over u32 limit.

Same goes for a few instances of 'surf.layer_size * mslice'
multiplications that are added to 'offset' variable - they may
potentially overflow as well and need to be validated properly.

While some debug prints in this code section take possible overflow
issues into account, simply casting to (unsigned long) may be
erroneous in its own way, as depending on CPU architecture one is
liable to get different results.

Fix said problems by:
 - casting 'offset' to fixed u64 data type instead of
 ambiguous unsigned long.
 - casting one of the operands in vulnerable to integer
 overflow cases to u64.
 - adjust format specifiers in debug prints to properly
 represent 'offset' values.

Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.

Fixes: 285484e2d55e ("drm/radeon: add support for evergreen/ni tiling informations v11")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: fixing rlc firmware loading failure issue
Yang Wang [Tue, 13 Aug 2024 05:51:48 +0000 (13:51 +0800)]
drm/amdgpu: fixing rlc firmware loading failure issue

Skip rlc firmware validation to ignore firmware header size mismatch issues.
This restores the workaround added in
commit 849e133c973c ("drm/amdgpu: Fix the null pointer when load rlc firmware")

Fixes: 3af2c80ae2f5 ("drm/amdgpu: refine gfx10 firmware loading")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3551
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: remove ME0 registers from mi300 dump
Sunil Khatri [Tue, 13 Aug 2024 17:04:26 +0000 (22:34 +0530)]
drm/amdgpu: remove ME0 registers from  mi300 dump

Remove ME0 registers from MI300 gfx_9_4_3 ipdump
MI300 does not have  gfx ME and hence those register
are just empty one and could be dropped.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gfx9: use rlc safe mode for soft recovery
Alex Deucher [Wed, 24 Jul 2024 22:20:57 +0000 (18:20 -0400)]
drm/amdgpu/gfx9: use rlc safe mode for soft recovery

Protect the MMIO access with safe mode.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gfx9.4.3: use rlc safe mode for soft recovery
Alex Deucher [Wed, 24 Jul 2024 22:20:44 +0000 (18:20 -0400)]
drm/amdgpu/gfx9.4.3: use rlc safe mode for soft recovery

Protect the MMIO access with safe mode.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gfx9.4.3: use proper rlc safe mode helpers
Alex Deucher [Wed, 24 Jul 2024 22:04:44 +0000 (18:04 -0400)]
drm/amdgpu/gfx9.4.3: use proper rlc safe mode helpers

Rather than open coding it for the queue reset.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gfx9: use proper rlc safe mode helpers
Alex Deucher [Wed, 24 Jul 2024 21:59:47 +0000 (17:59 -0400)]
drm/amdgpu/gfx9: use proper rlc safe mode helpers

Rather than open coding it for the queue reset.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gfx9: add ring reset callback for gfx
Alex Deucher [Wed, 17 Jul 2024 23:02:50 +0000 (19:02 -0400)]
drm/amdgpu/gfx9: add ring reset callback for gfx

Add ring reset callback for gfx.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gfx9: per queue reset only on bare metal
Alex Deucher [Thu, 18 Jul 2024 14:20:56 +0000 (10:20 -0400)]
drm/amdgpu/gfx9: per queue reset only on bare metal

It's not supported under SR-IOV at the moment.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gfx9.4.3: implement reset_hw_queue for gfx9.4.3
Jiadong Zhu [Thu, 4 Jul 2024 06:51:58 +0000 (14:51 +0800)]
drm/amdgpu/gfx9.4.3: implement reset_hw_queue for gfx9.4.3

Using mmio to do queue reset. Enter safe mode
before writing mmio registers.

v2: set register instance offset according to xcc id.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gfx9: implement reset_hw_queue for gfx9
Jiadong Zhu [Thu, 4 Jul 2024 04:24:31 +0000 (12:24 +0800)]
drm/amdgpu/gfx9: implement reset_hw_queue for gfx9

Using mmio to do queue reset. Enter safe mode
when writing registers.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gfx: add a new kiq_pm4_funcs callback for reset_hw_queue
Jiadong Zhu [Thu, 4 Jul 2024 04:12:42 +0000 (12:12 +0800)]
drm/amdgpu/gfx: add a new kiq_pm4_funcs callback for reset_hw_queue

Add reset_hw_queue in kiq_pm4_funcs callbacks.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gfx_9.4.3: wait for reset done before remap
Jiadong Zhu [Fri, 28 Jun 2024 03:48:22 +0000 (11:48 +0800)]
drm/amdgpu/gfx_9.4.3: wait for reset done before remap

There is a racing condition that cp firmware modifies
MQD in reset sequence after driver updates it for
remapping. We have to wait till CP_HQD_ACTIVE becoming
false then remap the queue.

v2: fix KIQ locking (Alex)
v3: fix KIQ locking harder

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gfx9.4.3: remap queue after reset successfully
Jiadong Zhu [Fri, 14 Jun 2024 05:05:32 +0000 (13:05 +0800)]
drm/amdgpu/gfx9.4.3: remap queue after reset successfully

Kiq command unmap_queues only does the dequeueing action.
We have to map the queue back with clean mqd.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gfx9.4.3: add ring reset callback
Alex Deucher [Mon, 3 Jun 2024 21:24:03 +0000 (17:24 -0400)]
drm/amdgpu/gfx9.4.3: add ring reset callback

Add ring reset callback for compute.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gfx9: wait for reset done before remap
Jiadong Zhu [Tue, 2 Jul 2024 01:03:49 +0000 (09:03 +0800)]
drm/amdgpu/gfx9: wait for reset done before remap

There is a racing condition that cp firmware modifies
MQD in reset sequence after driver updates it for
remapping. We have to wait till CP_HQD_ACTIVE becoming
false then remap the queue.

v2: fix KIQ locking (Alex)
v3: fix KIQ locking harder

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gfx9: remap queue after reset successfully
Jiadong Zhu [Tue, 11 Jun 2024 10:06:44 +0000 (18:06 +0800)]
drm/amdgpu/gfx9: remap queue after reset successfully

Kiq command unmap_queues only does the dequeueing action.
We have to map the queue back with clean mqd.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gfx9: add ring reset callback
Alex Deucher [Mon, 3 Jun 2024 21:23:14 +0000 (17:23 -0400)]
drm/amdgpu/gfx9: add ring reset callback

Add ring reset callback for compute.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: increase the reset counter for the queue reset
Prike Liang [Wed, 12 Jun 2024 07:49:38 +0000 (15:49 +0800)]
drm/amdgpu: increase the reset counter for the queue reset

Update the reset counter for the amdgpu_cs_query_reset_state()

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: add per ring reset support (v5)
Alex Deucher [Mon, 3 Jun 2024 18:38:20 +0000 (14:38 -0400)]
drm/amdgpu: add per ring reset support (v5)

If a specific job is hung, try and reset just the
ring associated with the job.

v2: move to amdgpu_job.c
v3: fix drm_sched_stop() handling when ring reset fails
v4: drop unnecessary amdgpu_fence_driver_clear_job_fences() and
    drm_sched_increase_karma()
v5: rework sched_stop handling

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: add new ring reset callback
Alex Deucher [Tue, 9 Apr 2024 17:11:53 +0000 (13:11 -0400)]
drm/amdgpu: add new ring reset callback

Use this to reset just a single ring.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: Return earlier in amdgpu_sw_ring_ib_end if mcbp is off
Soham Dandapat [Mon, 29 Jul 2024 06:29:11 +0000 (11:59 +0530)]
drm/amdgpu: Return earlier in amdgpu_sw_ring_ib_end if mcbp is off

As we don't trigger preemption is sw ring muxer when mcbp is
disabled,so return earlier in amdgpu_sw_ring_ib_end function
if mcbp is disabled ,not required to call amdgpu_ring_mux_end_ib

Signed-off-by: Soham Dandapat <sdandapa@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: add cp queue registers print for gfx9_4_3
Sunil Khatri [Thu, 8 Aug 2024 08:53:07 +0000 (14:23 +0530)]
drm/amdgpu: add cp queue registers print for gfx9_4_3

Add gfx9_4_3 print support of CP queue registers
for all queues to be used by devcoredump.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: add cp queue registers for gfx9_4_3 ipdump
Sunil Khatri [Thu, 8 Aug 2024 08:39:06 +0000 (14:09 +0530)]
drm/amdgpu: add cp queue registers for gfx9_4_3 ipdump

Add gfx9 support of CP queue registers for all queues
to be used by devcoredump.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: Align hwss_wait_for_all_blank_complete descriptor with implementation
Srinivasan Shanmugam [Mon, 12 Aug 2024 10:23:36 +0000 (15:53 +0530)]
drm/amd/display: Align hwss_wait_for_all_blank_complete descriptor with implementation

The descriptor for `hwss_wait_for_all_blank_complete` was previously
misaligned with the actual implementation. This commit refines the
descriptor to reflect the implementation of
`hwss_wait_for_all_blank_complete`

Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_hw_sequencer.c:991: warning: expecting prototype for hwss_wait_for_blank_complete(). Prototype was for hwss_wait_for_all_blank_complete() instead

Cc: Tom Chung <chiahsuan.chung@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: add print support for gfx9_4_3 ipdump
Sunil Khatri [Thu, 8 Aug 2024 06:58:59 +0000 (12:28 +0530)]
drm/amdgpu: add print support for gfx9_4_3 ipdump

Add support of gfx9_4_3 ipdump print so devcoredump
could trigger it to dump the captured registers
in devcoredump.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: add gfx9_4_3 register support in ipdump
Sunil Khatri [Thu, 8 Aug 2024 06:40:18 +0000 (12:10 +0530)]
drm/amdgpu: add gfx9_4_3 register support in ipdump

Add general registers of gfx9_4_3 in ipdump for
devcoredump support.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/amdgpu: cleanup parse_cs callbacks
David (Ming Qiang) Wu [Fri, 2 Aug 2024 18:29:41 +0000 (14:29 -0400)]
drm/amd/amdgpu: cleanup parse_cs callbacks

Because gpu_addr is updated in the calling routine
(amdgpu_cs_patch_ibs()),it is removed in the callback.

Use .patch_cs_in_place instead of .parse_cs for
amdgpu_vce_ring_parse_cs_vm() as there is no need for keeping
a temporary IB, therefore ib->sa_bo is NULL and amdgpu_ib_free()
is removed.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/amdgpu: command submission parser for JPEG
David (Ming Qiang) Wu [Thu, 8 Aug 2024 16:19:50 +0000 (12:19 -0400)]
drm/amd/amdgpu: command submission parser for JPEG

Add JPEG IB command parser to ensure registers
in the command are within the JPEG IP block.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/mes12: fix suspend issue
Jack Xiao [Wed, 7 Aug 2024 04:03:11 +0000 (12:03 +0800)]
drm/amdgpu/mes12: fix suspend issue

Use mes pipe to unmap kcq and kgq.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/mes12: sw/hw fini for unified mes
Jack Xiao [Wed, 7 Aug 2024 07:23:16 +0000 (15:23 +0800)]
drm/amdgpu/mes12: sw/hw fini for unified mes

Free memory for two pipes and unmap pipe0 via pipe1.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/mes12: configure two pipes hardware resources
Jack Xiao [Wed, 7 Aug 2024 06:49:30 +0000 (14:49 +0800)]
drm/amdgpu/mes12: configure two pipes hardware resources

Configure two pipes with different hardware resources.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/mes12: adjust mes12 sw/hw init for multiple pipes
Jack Xiao [Wed, 7 Aug 2024 06:44:07 +0000 (14:44 +0800)]
drm/amdgpu/mes12: adjust mes12 sw/hw init for multiple pipes

Adjust mes12 sw/hw initiailization for both pipe0 and pipe1
enablement. The two pipes are almost identical pipe. Pipe0
behaves like schq and pipe1 like kiq, pipe0 was mapped by pipe1.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/mes12: add mes pipe switch support
Jack Xiao [Wed, 7 Aug 2024 06:15:48 +0000 (14:15 +0800)]
drm/amdgpu/mes12: add mes pipe switch support

Add mes pipe switch to let caller choose pipe
to submit packet.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: Block MMR_READ IOCTL in reset
Victor Skvortsov [Thu, 8 Aug 2024 17:40:23 +0000 (13:40 -0400)]
drm/amdgpu: Block MMR_READ IOCTL in reset

Register access from userspace should be blocked until
reset is complete.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdkfd: fallback to pipe reset on queue reset fail for gfx9
Jonathan Kim [Tue, 30 Jul 2024 16:52:20 +0000 (12:52 -0400)]
drm/amdkfd: fallback to pipe reset on queue reset fail for gfx9

If queue reset fails, tell the CP to reset the pipe.
Since queues multiplex context per pipe and we've issued a device wide
preemption prior to the hang, we can assume the hung pipe only has one
queue to reset on pipe reset.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: Reorder to read EFI exported ROM first
Lijo Lazar [Mon, 12 Aug 2024 03:32:57 +0000 (09:02 +0530)]
drm/amdgpu: Reorder to read EFI exported ROM first

On EFI BIOSes, PCI ROM may be exported through EFI_PCI_IO_PROTOCOL and
expansion ROM BARs may not be enabled. Choose to read from EFI exported
ROM data before reading PCI Expansion ROM BAR.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/mes12: load unified mes fw on pipe0 and pipe1
Jack Xiao [Wed, 7 Aug 2024 05:19:59 +0000 (13:19 +0800)]
drm/amdgpu/mes12: load unified mes fw on pipe0 and pipe1

Enable unified mes firmware to load on pipe0 and pipe1.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: Disable dpm_enabled flag while VF is in reset
Victor Skvortsov [Thu, 8 Aug 2024 17:22:34 +0000 (13:22 -0400)]
drm/amdgpu: Disable dpm_enabled flag while VF is in reset

VFs do not perform HW fini/suspend in FLR, so the dpm_enabled
is incorrectly kept enabled. Add interface to disable it in
virt_pre_reset call.

v2: Made implementation generic for all asics
v3: Re-order conditionals so PP_MP1_STATE_FLR is only evaluated on VF

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agoRevert "drm/amdgpu: Extend KIQ reg polling wait for VF"
Victor Skvortsov [Thu, 25 Jul 2024 13:51:56 +0000 (09:51 -0400)]
Revert "drm/amdgpu: Extend KIQ reg polling wait for VF"

KIQ timeouts no longer seen.

This reverts commit 3a19a8af64eaff8a8b230796741a1a8277205344.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: Update kmd_fw_shared for VCN5
Yinjie Yao [Fri, 9 Aug 2024 21:20:26 +0000 (17:20 -0400)]
drm/amdgpu: Update kmd_fw_shared for VCN5

kmd_fw_shared changed in VCN5

Signed-off-by: Yinjie Yao <yinjie.yao@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdkfd: Add node_id to location_id generically
Lijo Lazar [Wed, 7 Aug 2024 16:11:59 +0000 (21:41 +0530)]
drm/amdkfd: Add node_id to location_id generically

If there are multiple nodes per kfd device, add nodeid to location_id to
differentiate.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/amdgpu: add HDP_SD support on gc 12.0.0/1
Kenneth Feng [Thu, 8 Aug 2024 04:19:22 +0000 (12:19 +0800)]
drm/amd/amdgpu: add HDP_SD support on gc 12.0.0/1

add HDP_SD support on gc 12.0.0/1

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/sriov: extend NV_MAILBOX_POLL_MSG_TIMEDOUT
Victor Zhao [Wed, 7 Aug 2024 09:32:27 +0000 (17:32 +0800)]
drm/amd/sriov: extend NV_MAILBOX_POLL_MSG_TIMEDOUT

on MI300/MI308 UBB products, when doing mode1 reset, since 1 gpu need to
wait all 8 gpus finish mode1 reset and then do re-init. As observed,
sometimes the gpu which triggered the reset need to wait 15s for all
gpus to finish.

If poll msg timeout, guest driver will send the reset message again, and
may mess up the following reinit sequence on other gpus.

So extend the time to cover the maximum time needed to recover.

Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: Promote DAL to 3.2.296
Martin Leung [Mon, 5 Aug 2024 15:00:14 +0000 (11:00 -0400)]
drm/amd/display: Promote DAL to 3.2.296

This version brings along following fixes:
- Fix some cursor issue
- Fix print format specifiers in DC_LOG_IPS
- Fix minor coding errors in dml21 phase 5
- Fix MST BW calculation Regression
- Improve FAM control for DCN401
- Add null pointer checks for some code
- Refactor 3DLUT for non-DMA
- Optimize vstartup position for AS-SDP
- Update to using new dccg callbacks
- Enable otg synchronization logic for DCN321
- Disable DCN401 UCLK P-State support on full updates

Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Martin Leung <Martin.Leung@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: Remove unnecessary call to REG_SEQ_SUBMIT|WAIT_DONE
Rodrigo Siqueira [Fri, 2 Aug 2024 18:33:20 +0000 (12:33 -0600)]
drm/amd/display: Remove unnecessary call to REG_SEQ_SUBMIT|WAIT_DONE

[why & how]
Remove unnecessary call to REG_SEQ_SUBMIT and REG_SEQ_WAIT_DONE, since
those macros are not necessary anymore at the dpp1 set degamma. Those
are part of an old implementation.

Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: Adjust cursor position
Rodrigo Siqueira [Thu, 1 Aug 2024 22:16:35 +0000 (16:16 -0600)]
drm/amd/display: Adjust cursor position

[why & how]
When the commit 9d84c7ef8a87 ("drm/amd/display: Correct cursor position
on horizontal mirror") was introduced, it used the wrong calculation for
the position copy for X. This commit uses the correct calculation for that
based on the original patch.

Fixes: 9d84c7ef8a87 ("drm/amd/display: Correct cursor position on horizontal mirror")
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: fix cursor offset on rotation 180
Melissa Wen [Tue, 31 Jan 2023 16:05:46 +0000 (15:05 -0100)]
drm/amd/display: fix cursor offset on rotation 180

[why & how]
Cursor gets clipped off in the middle of the screen with hw
rotation 180. Fix a miscalculation of cursor offset when it's
placed near the edges in the pipe split case.

Cursor bugs with hw rotation were reported on AMD issue
tracker:
https://gitlab.freedesktop.org/drm/amd/-/issues/2247

The issues on rotation 270 was fixed by:
https://lore.kernel.org/amd-gfx/20221118125935.4013669-22-Brian.Chang@amd.com/
that partially addressed the rotation 180 too. So, this patch is the
final bits for rotation 180.

Reported-by: Xaver Hugl <xaver.hugl@gmail.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2247
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Fixes: 9d84c7ef8a87 ("drm/amd/display: Correct cursor position on horizontal mirror")
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: Improve FAM control for DCN401
Rodrigo Siqueira [Mon, 29 Jul 2024 17:59:25 +0000 (11:59 -0600)]
drm/amd/display: Improve FAM control for DCN401

[why & how]
When the commit 5324e2b205a2 ("drm/amd/display: Add driver support for
future FAMS versions") was introduced, it missed some of the FAM2 code.
This commit introduces the code that control the FAM enable and disable.

Fixes: 5324e2b205a2 ("drm/amd/display: Add driver support for future FAMS versions")
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: Remove unused field
Rodrigo Siqueira [Fri, 2 Aug 2024 18:31:42 +0000 (12:31 -0600)]
drm/amd/display: Remove unused field

[why & how]
Remove force_backlight_start_level since it is never used.

Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: Fix MST BW calculation Regression
Fangzhi Zuo [Mon, 29 Jul 2024 14:23:03 +0000 (10:23 -0400)]
drm/amd/display: Fix MST BW calculation Regression

[Why & How]
Revert commit 8b2cb32cf0c6
("drm/amd/display: FEC overhead should be checked once for mst slot nums")
Because causes bw calculation regression

Cc: mario.limonciello@amd.com
Cc: alexander.deucher@amd.com
Reported-by: jirislaby@kernel.org
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3495
Closes: https://bugzilla.suse.com/show_bug.cgi?id=1228093
Reviewed-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: Enable otg synchronization logic for DCN321
Loan Chen [Fri, 2 Aug 2024 05:57:40 +0000 (13:57 +0800)]
drm/amd/display: Enable otg synchronization logic for DCN321

[Why]
Tiled display cannot synchronize properly after S3.
The fix for commit 5f0c74915815 ("drm/amd/display: Fix for otg
synchronization logic") is not enable in DCN321, which causes
the otg is excluded from synchronization.

[How]
Enable otg synchronization logic in dcn321.

Fixes: 5f0c74915815 ("drm/amd/display: Fix for otg synchronization logic")
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Loan Chen <lo-an.chen@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: remove redundant msg to pmfw at boot/resume
Charlene Liu [Thu, 1 Aug 2024 22:18:20 +0000 (18:18 -0400)]
drm/amd/display: remove redundant msg to pmfw at boot/resume

[why & how]
this is to remove redundant msg to pmfw at boot/resume
since bios already power up dcn.

Reviewed-by: Chris Park <chris.park@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: Set max VTotal cap for dcn401
Dillon Varone [Fri, 2 Aug 2024 17:50:10 +0000 (13:50 -0400)]
drm/amd/display: Set max VTotal cap for dcn401

[WHY&HOW]
Set max VTotal cap for dcn401 because VTotal
register is only 16 bits wide on dcn401.

Reviewed-by: Chris Park <chris.park@amd.com>
Signed-off-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: Perform outstanding programming on full updates
Dillon Varone [Thu, 1 Aug 2024 19:38:34 +0000 (15:38 -0400)]
drm/amd/display: Perform outstanding programming on full updates

[WHY]
In certain scenarios DC can internally trigger back to back full updates
which will miss some required programming that is normally deferred
until post update via optimize_bandwidth.

[HOW]
In back to back update scenarios, wait for pending updates to complete
and perform any strictly required outstanding programming.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: Disable DCN401 UCLK P-State support on full updates
Dillon Varone [Thu, 1 Aug 2024 19:35:51 +0000 (15:35 -0400)]
drm/amd/display: Disable DCN401 UCLK P-State support on full updates

[WHY&HOW]
It is not guaranteed even for HW exclusive P-State methods (like
VActive) that P-state will be supported properly until optimize
bandwidth is called, so unconditionally disable it on full updates.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: Reduce redundant minimal transitions due to SubVP
Dillon Varone [Mon, 29 Jul 2024 22:17:55 +0000 (18:17 -0400)]
drm/amd/display: Reduce redundant minimal transitions due to SubVP

[WHY]
Stream ID's associated with phantom pipes can change often as they
are reconstructed on full updates, however they can remain identical
depending on the required update.

[HOW]
In the case phantom streams and pipe topologies remain the same
between updates, mark the transition as seamless.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: Add null check for 'afb' in amdgpu_dm_plane_handle_cursor_update...
Srinivasan Shanmugam [Fri, 2 Aug 2024 07:05:13 +0000 (12:35 +0530)]
drm/amd/display: Add null check for 'afb' in amdgpu_dm_plane_handle_cursor_update (v2)

This commit adds a null check for the 'afb' variable in the
amdgpu_dm_plane_handle_cursor_update function. Previously, 'afb' was
assumed to be null, but was used later in the code without a null check.
This could potentially lead to a null pointer dereference.

Changes since v1:
- Moved the null check for 'afb' to the line where 'afb' is used. (Alex)

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_plane.c:1298 amdgpu_dm_plane_handle_cursor_update() error: we previously assumed 'afb' could be null (see line 1252)

Cc: Tom Chung <chiahsuan.chung@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Co-developed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>