Stephen Brennan [Wed, 1 May 2024 16:29:56 +0000 (09:29 -0700)]
kprobe/ftrace: bail out if ftrace was killed
If an error happens in ftrace, ftrace_kill() will prevent disarming
kprobes. Eventually, the ftrace_ops associated with the kprobes will be
freed, yet the kprobes will still be active, and when triggered, they
will use the freed memory, likely resulting in a page fault and panic.
This behavior can be reproduced quite easily, by creating a kprobe and
then triggering a ftrace_kill(). For simplicity, we can simulate an
ftrace error with a kernel module like [1]:
sudo perf probe --add commit_creds
sudo perf trace -e probe:commit_creds
# In another terminal
make
sudo insmod ftrace_killer.ko # calls ftrace_kill(), simulating bug
# Back to perf terminal
# ctrl-c
sudo perf probe --del commit_creds
After a short period, a page fault and panic would occur as the kprobe
continues to execute and uses the freed ftrace_ops. While ftrace_kill()
is supposed to be used only in extreme circumstances, it is invoked in
FTRACE_WARN_ON() and so there are many places where an unexpected bug
could be triggered, yet the system may continue operating, possibly
without the administrator noticing. If ftrace_kill() does not panic the
system, then we should do everything we can to continue operating,
rather than leave a ticking time bomb.
selftests/ftrace: Fix required features for VFS type test case
Since the VFS type argument test case uses fprobe events, it must
check the availablity of dynamic_events file and fprobe events syntax
in README. Without this fix, the test fails if CONFIG_FPROBE_EVENTS=n.
objpool: cache nr_possible_cpus() and avoid caching nr_cpu_ids
Profiling shows that calling nr_possible_cpus() in objpool_pop() takes
a noticeable amount of CPU (when profiled on 80-core machine), as we
need to recalculate number of set bits in a CPU bit mask. This number
can't change, so there is no point in paying the price for recalculating
it. As such, cache this value in struct objpool_head and use it in
objpool_pop().
On the other hand, cached pool->nr_cpus isn't necessary, as it's not
used in hot path and is also a pretty trivial value to retrieve. So drop
pool->nr_cpus in favor of using nr_cpu_ids everywhere. This way the size
of struct objpool_head remains the same, which is a nice bonus.
Same BPF selftests benchmarks were used to evaluate the effect. Using
changes in previous patch (inlining of objpool_pop/objpool_push) as
baseline, here are the differences:
objpool: enable inlining objpool_push() and objpool_pop() operations
objpool_push() and objpool_pop() are very performance-critical functions
and can be called very frequently in kretprobe triggering path.
As such, it makes sense to allow compiler to inline them completely to
eliminate function calls overhead. Luckily, their logic is quite well
isolated and doesn't have any sprawling dependencies.
This patch moves both objpool_push() and objpool_pop() into
include/linux/objpool.h and marks them as static inline functions,
enabling inlining. To avoid anyone using internal helpers
(objpool_try_get_slot, objpool_try_add_slot), rename them to use leading
underscores.
We used kretprobe microbenchmark from BPF selftests (bench trig-kprobe
and trig-kprobe-multi benchmarks) running no-op BPF kretprobe/kretprobe.multi
programs in a tight loop to evaluate the effect. BPF own overhead in
this case is minimal and it mostly stresses the rest of in-kernel
kretprobe infrastructure overhead. Results are in millions of calls per
second. This is not super scientific, but shows the trend nevertheless.
rethook: honor CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING in rethook_try_get()
Take into account CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING when validating
that RCU is watching when trying to setup rethooko on a function entry.
One notable exception when we force rcu_is_watching() check is
CONFIG_KPROBE_EVENTS_ON_NOTRACE=y case, in which case kretprobes will use
old-style int3-based workflow instead of relying on ftrace, making RCU
watching check important to validate.
This further (in addition to improvements in the previous patch)
improves BPF multi-kretprobe (which rely on rethook) runtime throughput
by 2.3%, according to BPF benchmarks ([0]).
ftrace: make extra rcu_is_watching() validation check optional
Introduce CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING config option to
control whether ftrace low-level code performs additional
rcu_is_watching()-based validation logic in an attempt to catch noinstr
violations.
This check is expected to never be true and is mostly useful for
low-level validation of ftrace subsystem invariants. For most users it
should probably be kept disabled to eliminate unnecessary runtime
overhead.
This improves BPF multi-kretprobe (relying on ftrace and rethook
infrastructure) runtime throughput by 2%, according to BPF benchmarks ([0]).
Jonathan Haslam [Mon, 22 Apr 2024 10:23:05 +0000 (03:23 -0700)]
uprobes: reduce contention on uprobes_tree access
Active uprobes are stored in an RB tree and accesses to this tree are
dominated by read operations. Currently these accesses are serialized by
a spinlock but this leads to enormous contention when large numbers of
threads are executing active probes.
This patch converts the spinlock used to serialize access to the
uprobes_tree RB tree into a reader-writer spinlock. This lock type
aligns naturally with the overwhelmingly read-only nature of the tree
usage here. Although the addition of reader-writer spinlocks are
discouraged [0], this fix is proposed as an interim solution while an
RCU based approach is implemented (that work is in a nascent form). This
fix also has the benefit of being trivial, self contained and therefore
simple to backport.
We have used a uprobe benchmark from the BPF selftests [1] to estimate
the improvements. Each block of results below show 1 line per execution
of the benchmark ("the "Summary" line) and each line is a run with one
more thread added - a thread is a "producer". The lines are edited to
remove extraneous output.
The tests were executed with this driver script:
for num_threads in {1..20}
do
sudo ./bench -a -p $num_threads trig-uprobe-nop | grep Summary
done
The numbers in parenthesis give averaged throughput per thread which is
of greatest interest here as a measure of scalability. Improvements are
in the order of 22 - 68% with this particular benchmark (mean = 43%).
V2:
- Updated commit message to include benchmark results.
Kui-Feng Lee [Mon, 8 Apr 2024 17:51:40 +0000 (10:51 -0700)]
rethook: Remove warning messages printed for finding return address of a frame.
The function rethook_find_ret_addr() prints a warning message and returns 0
when the target task is running and is not the "current" task in order to
prevent the incorrect return address, although it still may return an
incorrect address.
However, the warning message turns into noise when BPF profiling programs
call bpf_get_task_stack() on running tasks in a firm with a large number of
hosts.
The callers should be aware and willing to take the risk of receiving an
incorrect return address from a task that is currently running other than
the "current" one. A warning is not needed here as the callers are intent
on it.
Link: https://lore.kernel.org/all/20240408175140.60223-1-thinker.li@gmail.com/ Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Ye Bin [Fri, 22 Mar 2024 06:43:08 +0000 (14:43 +0800)]
selftests/ftrace: add fprobe test cases for VFS type "%pd" and "%pD"
This patch adds fprobe test cases for new print format type "%pd/%pD".The
test cases test the following items:
1. Test "%pd" type for dput();
2. Test "%pD" type for vfs_read();
This test case require enable CONFIG_HAVE_FUNCTION_ARG_ACCESS_API configuration.
Ye Bin [Fri, 22 Mar 2024 06:43:07 +0000 (14:43 +0800)]
selftests/ftrace: add kprobe test cases for VFS type "%pd" and "%pD"
This patch adds test cases for new print format type "%pd/%pD".The test cases
test the following items:
1. Test README if add "%pd/%pD" type;
2. Test "%pd" type for dput();
3. Test "%pD" type for vfs_read();
This test case require enable CONFIG_HAVE_FUNCTION_ARG_ACCESS_API configuration.
Ye Bin [Fri, 22 Mar 2024 06:43:05 +0000 (14:43 +0800)]
tracing/probes: support '%pD' type for print struct file's name
As like '%pd' type, this patch supports print type '%pD' for print file's
name. For example "name=$arg1:%pD" casts the `$arg1` as (struct file*),
dereferences the "file.f_path.dentry.d_name.name" field and stores it to
"name" argument as a kernel string.
Here is an example:
[tracing]# echo 'p:testprobe vfs_read name=$arg1:%pD' > kprobe_event
[tracing]# echo 1 > events/kprobes/testprobe/enable
[tracing]# grep -q "1" events/kprobes/testprobe/enable
[tracing]# echo 0 > events/kprobes/testprobe/enable
[tracing]# grep "vfs_read" trace | grep "enable"
grep-15108 [003] ..... 5228.328609: testprobe: (vfs_read+0x4/0xbb0) name="enable"
Note that this expects the given argument (e.g. $arg1) is an address of struct
file. User must ensure it.
Ye Bin [Fri, 22 Mar 2024 06:43:04 +0000 (14:43 +0800)]
tracing/probes: support '%pd' type for print struct dentry's name
During fault locating, the file name needs to be printed based on the
dentry address. The offset needs to be calculated each time, which
is troublesome. Similar to printk, kprobe support print type '%pd' for
print dentry's name. For example "name=$arg1:%pd" casts the `$arg1`
as (struct dentry *), dereferences the "d_name.name" field and stores
it to "name" argument as a kernel string.
Here is an example:
[tracing]# echo 'p:testprobe dput name=$arg1:%pd' > kprobe_events
[tracing]# echo 1 > events/kprobes/testprobe/enable
[tracing]# grep -q "1" events/kprobes/testprobe/enable
[tracing]# echo 0 > events/kprobes/testprobe/enable
[tracing]# cat trace | grep "enable"
bash-14844 [002] ..... 16912.889543: testprobe: (dput+0x4/0x30) name="enable"
grep-15389 [003] ..... 16922.834182: testprobe: (dput+0x4/0x30) name="enable"
grep-15389 [003] ..... 16922.836103: testprobe: (dput+0x4/0x30) name="enable"
bash-14844 [001] ..... 16931.820909: testprobe: (dput+0x4/0x30) name="enable"
Note that this expects the given argument (e.g. $arg1) is an address of struct
dentry. User must ensure it.
It's very common with BPF-based uprobe/uretprobe use cases to have
a system-wide (not PID specific) probes used. In this case uprobe's
trace_uprobe_filter->nr_systemwide counter is bumped at registration
time, and actual filtering is short circuited at the time when
uprobe/uretprobe is triggered.
This is a great optimization, and the only issue with it is that to even
get to checking this counter uprobe subsystem is taking
read-side trace_uprobe_filter->rwlock. This is actually noticeable in
profiles and is just another point of contention when uprobe is
triggered on multiple CPUs simultaneously.
This patch moves this nr_systemwide check outside of filter list's
rwlock scope, as rwlock is meant to protect list modification, while
nr_systemwide-based check is speculative and racy already, despite the
lock (as discussed in [0]). trace_uprobe_filter_remove() and
trace_uprobe_filter_add() already check for filter->nr_systewide
explicitly outside of __uprobe_perf_filter, so no modifications are
required there.
AFTER
=====
uprobe-nop : 2.878 ± 0.017M/s (+5.5%, total +8.3%)
uprobe-push : 2.753 ± 0.013M/s (+5.3%, total +10.2%)
uprobe-ret : 1.142 ± 0.010M/s (+3.8%, total +3.8%)
uretprobe-nop : 1.444 ± 0.008M/s (+3.5%, total +6.5%)
uretprobe-push : 1.410 ± 0.010M/s (+4.8%, total +7.1%)
uretprobe-ret : 0.816 ± 0.002M/s (+2.0%, total +3.9%)
In the above, first percentage value is based on top of previous patch
(lazy uprobe buffer optimization), while the "total" percentage is
based on kernel without any of the changes in this patch set.
As can be seen, we get about 4% - 10% speed up, in total, with both lazy
uprobe buffer and speculative filter check optimizations.
Andrii Nakryiko [Mon, 18 Mar 2024 18:17:27 +0000 (11:17 -0700)]
uprobes: prepare uprobe args buffer lazily
uprobe_cpu_buffer and corresponding logic to store uprobe args into it
are used for uprobes/uretprobes that are created through tracefs or
perf events.
BPF is yet another user of uprobe/uretprobe infrastructure, but doesn't
need uprobe_cpu_buffer and associated data. For BPF-only use cases this
buffer handling and preparation is a pure overhead. At the same time,
BPF-only uprobe/uretprobe usage is very common in practice. Also, for
a lot of cases applications are very senstivie to performance overheads,
as they might be tracing a very high frequency functions like
malloc()/free(), so every bit of performance improvement matters.
All that is to say that this uprobe_cpu_buffer preparation is an
unnecessary overhead that each BPF user of uprobes/uretprobe has to pay.
This patch is changing this by making uprobe_cpu_buffer preparation
optional. It will happen only if either tracefs-based or perf event-based
uprobe/uretprobe consumer is registered for given uprobe/uretprobe. For
BPF-only use cases this step will be skipped.
We used uprobe/uretprobe benchmark which is part of BPF selftests (see [0])
to estimate the improvements. We have 3 uprobe and 3 uretprobe
scenarios, which vary an instruction that is replaced by uprobe: nop
(fastest uprobe case), `push rbp` (typical case), and non-simulated
`ret` instruction (slowest case). Benchmark thread is constantly calling
user space function in a tight loop. User space function has attached
BPF uprobe or uretprobe program doing nothing but atomic counter
increments to count number of triggering calls. Benchmark emits
throughput in millions of executions per second.
Andrii Nakryiko [Mon, 18 Mar 2024 18:17:26 +0000 (11:17 -0700)]
uprobes: encapsulate preparation of uprobe args buffer
Move the logic of fetching temporary per-CPU uprobe buffer and storing
uprobes args into it to a new helper function. Store data size as part
of this buffer, simplifying interfaces a bit, as now we only pass single
uprobe_cpu_buffer reference around, instead of pointer + dsize.
This logic was duplicated across uprobe_dispatcher and uretprobe_dispatcher,
and now will be centralized. All this is also in preparation to make
this uprobe_cpu_buffer handling logic optional in the next patch.
Merge tag 'sched-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
- Fix EEVDF corner cases
- Fix two nohz_full= related bugs that can cause boot crashes
and warnings
* tag 'sched-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/isolation: Fix boot crash when maxcpus < first housekeeping CPU
sched/isolation: Prevent boot crash when the boot CPU is nohz_full
sched/eevdf: Prevent vlag from going out of bounds in reweight_eevdf()
sched/eevdf: Fix miscalculation in reweight_entity() when se is not curr
sched/eevdf: Always update V if se->on_rq when reweighting
Merge tag 'x86-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Make the CPU_MITIGATIONS=n interaction with conflicting
mitigation-enabling boot parameters a bit saner.
- Re-enable CPU mitigations by default on non-x86
- Fix TDX shared bit propagation on mprotect()
- Fix potential show_regs() system hang when PKE initialization
is not fully finished yet.
- Add the 0x10-0x1f model IDs to the Zen5 range
- Harden #VC instruction emulation some more
* tag 'x86-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu: Ignore "mitigations" kernel parameter if CPU_MITIGATIONS=n
cpu: Re-enable CPU mitigations by default for !X86 architectures
x86/tdx: Preserve shared bit on mprotect()
x86/cpu: Fix check for RDPKRU in __show_regs()
x86/CPU/AMD: Add models 0x10-0x1f to the Zen5 range
x86/sev: Check for MWAITX and MONITORX opcodes in the #VC handler
sched/isolation: Fix boot crash when maxcpus < first housekeeping CPU
housekeeping_setup() checks cpumask_intersects(present, online) to ensure
that the kernel will have at least one housekeeping CPU after smp_init(),
but this doesn't work if the maxcpus= kernel parameter limits the number of
processors available after bootup.
For example, a kernel with "maxcpus=2 nohz_full=0-2" parameters crashes at
boot time on a virtual machine with 4 CPUs.
Change housekeeping_setup() to use cpumask_first_and() and check that the
returned CPU number is valid and less than setup_max_cpus.
Another corner case is "nohz_full=0" on a machine with a single CPU or with
the maxcpus=1 kernel argument. In this case non_housekeeping_mask is empty
and tick_nohz_full_setup() makes no sense. And indeed, the kernel hits the
WARN_ON(tick_nohz_full_running) in tick_sched_do_timer().
And how should the kernel interpret the "nohz_full=" parameter? It should
be silently ignored, but currently cpulist_parse() happily returns the
empty cpumask and this leads to the same problem.
Change housekeeping_setup() to check cpumask_empty(non_housekeeping_mask)
and do nothing in this case.
sched/isolation: Prevent boot crash when the boot CPU is nohz_full
Documentation/timers/no_hz.rst states that the "nohz_full=" mask must not
include the boot CPU, which is no longer true after:
08ae95f4fd3b ("nohz_full: Allow the boot CPU to be nohz_full").
However after:
aae17ebb53cd ("workqueue: Avoid using isolated cpus' timers on queue_delayed_work")
the kernel will crash at boot time in this case; housekeeping_any_cpu()
returns an invalid CPU number until smp_init() brings the first
housekeeping CPU up.
Change housekeeping_any_cpu() to check the result of cpumask_any_and() and
return smp_processor_id() in this case.
This is just the simple and backportable workaround which fixes the
symptom, but smp_processor_id() at boot time should be safe at least for
type == HK_TYPE_TIMER, this more or less matches the tick_do_timer_boot_cpu
logic.
There is no worry about cpu_down(); tick_nohz_cpu_down() will not allow to
offline tick_do_timer_cpu (the 1st online housekeeping CPU).
Fixes: aae17ebb53cd ("workqueue: Avoid using isolated cpus' timers on queue_delayed_work") Reported-by: Chris von Recklinghausen <crecklin@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Phil Auld <pauld@redhat.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20240411143905.GA19288@redhat.com Closes: https://lore.kernel.org/all/20240402105847.GA24832@redhat.com/
Merge tag 'rust-fixes-6.9' of https://github.com/Rust-for-Linux/linux
Pull Rust fixes from Miguel Ojeda:
- Soundness: make internal functions generated by the 'module!' macro
inaccessible, do not implement 'Zeroable' for 'Infallible' and
require 'Send' for the 'Module' trait.
- Build: avoid errors with "empty" files and workaround 'rustdoc' ICE.
- Kconfig: depend on '!CFI_CLANG' and avoid selecting 'CONSTRUCTORS'.
- Code docs: remove non-existing key from 'module!' macro example.
- Docs: trivial rendering fix in arch table.
* tag 'rust-fixes-6.9' of https://github.com/Rust-for-Linux/linux:
rust: remove `params` from `module` macro example
kbuild: rust: force `alloc` extern to allow "empty" Rust files
kbuild: rust: remove unneeded `@rustc_cfg` to avoid ICE
rust: kernel: require `Send` for `Module` implementations
rust: phy: implement `Send` for `Registration`
rust: make mutually exclusive with CFI_CLANG
rust: macros: fix soundness issue in `module!` macro
rust: init: remove impl Zeroable for Infallible
docs: rust: fix improper rendering in Arch Support page
rust: don't select CONSTRUCTORS
Merge tag 'riscv-for-linus-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix for TASK_SIZE on rv64/NOMMU, to reflect the lack of user/kernel
separation
- A fix to avoid loading rv64/NOMMU kernel past the start of RAM
- A fix for RISCV_HWPROBE_EXT_ZVFHMIN on ilp32 to avoid signed integer
overflow in the bitmask
- The sud_test kselftest has been fixed to properly swizzle the syscall
number into the return register, which are not the same on RISC-V
- A fix for a build warning in the perf tools on rv32
- A fix for the CBO selftests, to avoid non-constants leaking into the
inline asm
- A pair of fixes for T-Head PBMT errata probing, which has been
renamed MAE by the vendor
* tag 'riscv-for-linus-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: selftests: cbo: Ensure asm operands match constraints, take 2
perf riscv: Fix the warning due to the incompatible type
riscv: T-Head: Test availability bit before enabling MAE errata
riscv: thead: Rename T-Head PBMT to MAE
selftests: sud_test: return correct emulated syscall value on RISC-V
riscv: hwprobe: fix invalid sign extension for RISCV_HWPROBE_EXT_ZVFHMIN
riscv: Fix loading 64-bit NOMMU kernels past the start of RAM
riscv: Fix TASK_SIZE on 64-bit NOMMU
Merge tag 'i2c-for-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Fix a race condition in the at24 eeprom handler, a NULL pointer
exception in the I2C core for controllers only using target modes,
drop a MAINTAINERS entry, and fix an incorrect DT binding for at24"
* tag 'i2c-for-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: smbus: fix NULL function pointer dereference
MAINTAINERS: Drop entry for PCA9541 bus master selector
eeprom: at24: fix memory corruption race condition
dt-bindings: eeprom: at24: Fix ST M24C64-D compatible schema
Merge tag 'soundwire-6.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire
Pull soundwire fix from Vinod Koul:
- Single AMD driver fix for wake interrupt handling in clockstop mode
* tag 'soundwire-6.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: amd: fix for wake interrupt handling for clockstop mode
Wolfram Sang [Fri, 26 Apr 2024 06:44:08 +0000 (08:44 +0200)]
i2c: smbus: fix NULL function pointer dereference
Baruch reported an OOPS when using the designware controller as target
only. Target-only modes break the assumption of one transfer function
always being available. Fix this by always checking the pointer in
__i2c_transfer.
Reported-by: Baruch Siach <baruch@tkos.co.il> Closes: https://lore.kernel.org/r/4269631780e5ba789cf1ae391eec1b959def7d99.1712761976.git.baruch@tkos.co.il Fixes: 4b1acc43331d ("i2c: core changes for slave support")
[wsa: dropped the simplification in core-smbus to avoid theoretical regressions] Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Tested-by: Baruch Siach <baruch@tkos.co.il>
Merge tag 'soc-fixes-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"There are a lot of minor DT fixes for Mediatek, Rockchip, Qualcomm and
Microchip and NXP, addressing both build-time warnings and bugs found
during runtime testing.
Most of these changes are machine specific fixups, but there are a few
notable regressions that affect an entire SoC:
- The Qualcomm MSI support that was improved for 6.9 ended up being
wrong on some chips and now gets fixed.
- The i.MX8MP camera interface broke due to a typo and gets updated
again.
The main driver fix is also for Qualcomm platforms, rewriting an
interface in the QSEECOM firmware support that could lead to crashing
the kernel from a trusted application.
The only other code changes are minor fixes for Mediatek SoC drivers"
* tag 'soc-fixes-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (50 commits)
ARM: dts: imx6ull-tarragon: fix USB over-current polarity
soc: mediatek: mtk-socinfo: depends on CONFIG_SOC_BUS
soc: mediatek: mtk-svs: Append "-thermal" to thermal zone names
arm64: dts: imx8mp: Fix assigned-clocks for second CSI2
ARM: dts: microchip: at91-sama7g54_curiosity: Replace regulator-suspend-voltage with the valid property
ARM: dts: microchip: at91-sama7g5ek: Replace regulator-suspend-voltage with the valid property
arm64: dts: rockchip: Fix USB interface compatible string on kobol-helios64
arm64: dts: qcom: sc8180x: Fix ss_phy_irq for secondary USB controller
arm64: dts: qcom: sm8650: Fix the msi-map entries
arm64: dts: qcom: sm8550: Fix the msi-map entries
arm64: dts: qcom: sm8450: Fix the msi-map entries
arm64: dts: qcom: sc8280xp: add missing PCIe minimum OPP
arm64: dts: qcom: x1e80100: Fix the compatible for cluster idle states
arm64: dts: qcom: Fix type of "wdog" IRQs for remoteprocs
arm64: dts: rockchip: regulator for sd needs to be always on for BPI-R2Pro
dt-bindings: rockchip: grf: Add missing type to 'pcie-phy' node
arm64: dts: rockchip: drop redundant disable-gpios in Lubancat 2
arm64: dts: rockchip: drop redundant disable-gpios in Lubancat 1
arm64: dts: rockchip: drop redundant pcie-reset-suspend in Scarlet Dumo
arm64: dts: rockchip: mark system power controller and fix typo on orangepi-5-plus
...
Merge tag 'mm-hotfixes-stable-2024-04-26-13-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"11 hotfixes. 8 are cc:stable and the remaining 3 (nice ratio!) address
post-6.8 issues or aren't considered suitable for backporting.
All except one of these are for MM. I see no particular theme - it's
singletons all over"
* tag 'mm-hotfixes-stable-2024-04-26-13-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/hugetlb: fix DEBUG_LOCKS_WARN_ON(1) when dissolve_free_hugetlb_folio()
selftests: mm: protection_keys: save/restore nr_hugepages value from launch script
stackdepot: respect __GFP_NOLOCKDEP allocation flag
hugetlb: check for anon_vma prior to folio allocation
mm: zswap: fix shrinker NULL crash with cgroup_disable=memory
mm: turn folio_test_hugetlb into a PageType
mm: support page_mapcount() on page_has_type() pages
mm: create FOLIO_FLAG_FALSE and FOLIO_TYPE_OPS macros
mm/hugetlb: fix missing hugetlb_lock for resv uncharge
selftests: mm: fix unused and uninitialized variable warning
selftests/harness: remove use of LINE_MAX
Merge tag 'mtd/fixes-for-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull MTD fixes from Miquel Raynal:
"There has been OTP support improvements in the NVMEM subsystem, and
later also improvements of OTP support in the NAND subsystem. This
lead to situations that we currently cannot handle, so better prevent
this situation from happening in order to avoid canceling device's
probe.
In the raw NAND subsystem, two runtime fixes have been shared, one
fixing two important commands in the Qcom driver since it got reworked
and a NULL pointer dereference happening on STB chips.
Arnd also fixed a UBSAN link failure on diskonchip"
* tag 'mtd/fixes-for-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: limit OTP NVMEM cell parse to non-NAND devices
mtd: diskonchip: work around ubsan link failure
mtd: rawnand: qcom: Fix broken OP_RESET_DEVICE command in qcom_misc_cmd_type_exec()
mtd: rawnand: brcmnand: Fix data access violation for STB chip
Merge tag 'gpio-fixes-for-v6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix a regression in pin access control in gpio-tegra186
- make data pointer dereference robust in Intel Tangier driver
* tag 'gpio-fixes-for-v6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: tegra186: Fix tegra186_gpio_is_accessible() check
gpio: tangier: Use correct type for the IRQ chip data
Merge tag 'cxl-fixes-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl fix from Dave Jiang:
- Fix potential payload size confusion in cxl_mem_get_poison()
* tag 'cxl-fixes-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/core: Fix potential payload size confusion in cxl_mem_get_poison()
Merge tag 'for-6.9/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix 6.9 regression so that DM device removal is performed
synchronously by default.
Asynchronous removal has always been possible but it isn't the
default. It is important that synchronous removal be preserved,
otherwise it is an interface change that breaks lvm2.
- Remove errant semicolon in drivers/md/dm-vdo/murmurhash3.c
* tag 'for-6.9/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: restore synchronous close of device mapper block device
dm vdo murmurhash: remove unneeded semicolon
Merge tag 'vfs-6.9-rc6.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"This contains a few small fixes for this merge window and the attempt
to handle the ntfs removal regression that was reported a little while
ago:
- After the removal of the legacy ntfs driver we received reports
about regressions for some people that do mount "ntfs" explicitly
and expect the driver to be available. Since ntfs3 is a drop-in for
legacy ntfs we alias legacy ntfs to ntfs3 just like ext3 is aliased
to ext4.
We also enforce legacy ntfs is always mounted read-only and give it
custom file operations to ensure that ioctl()'s can't be abused to
perform write operations.
- Fix an unbalanced module_get() in bdev_open().
- Two smaller fixes for the netfs work done earlier in this cycle.
- Fix the errno returned from the new FS_IOC_GETUUID and
FS_IOC_GETFSSYSFSPATH ioctls. Both commands just pull information
out of the superblock so there's no need to call into the actual
ioctl handlers.
So instead of returning ENOIOCTLCMD to indicate to fallback we just
return ENOTTY directly avoiding that indirection"
* tag 'vfs-6.9-rc6.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
netfs: Fix the pre-flush when appending to a file in writethrough mode
netfs: Fix writethrough-mode error handling
ntfs3: add legacy ntfs file operations
ntfs3: enforce read-only when used as legacy ntfs driver
ntfs3: serve as alias for the legacy ntfs driver
block: fix module reference leakage from bdev_open_by_dev error path
fs: Return ENOTTY directly if FS_IOC_GETUUID or FS_IOC_GETFSSYSFSPATH fail
Merge tag 'loongarch-fixes-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Fix some build errors and some trivial runtime bugs"
* tag 'loongarch-fixes-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: Lately init pmu after smp is online
LoongArch: Fix callchain parse error with kernel tracepoint events
LoongArch: Fix access error when read fault on a write-only VMA
LoongArch: Fix a build error due to __tlb_remove_tlb_entry()
LoongArch: Fix Kconfig item and left code related to CRASH_CORE
Merge tag 'pwm/for-6.9-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux
Pull maintainer entry update from Uwe Kleine-König:
"This is just an update to my maintainer entries as I will switch jobs
soon. Getting a contact email address into the MAINTAINERS file that
will work also after my switch will hopefully reduce people mailing to
the then non-existing address.
I also drop my co-maintenance for SIOX, but that continues to be in
good hands"
* tag 'pwm/for-6.9-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
MAINTAINERS: Update Uwe's email address, drop SIOX maintenance
Merge tag 'drm-fixes-2024-04-26' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Regular weekly merge request, mostly amdgpu and misc bits in
xe/etnaviv/gma500 and some core changes. Nothing too outlandish, seems
to be about normal for this time of release.
atomic-helpers:
- Fix memory leak in drm_format_conv_state_copy()
fbdev:
- fbdefio: Fix address calculation
amdgpu:
- Suspend/resume fix
- Don't expose gpu_od directory if it's empty
- SDMA 4.4.2 fix
- VPE fix
- BO eviction fix
- UMSCH fix
- SMU 13.0.6 reset fixes
- GPUVM flush accounting fix
- SDMA 5.2 fix
- Fix possible UAF in mes code
* tag 'drm-fixes-2024-04-26' of https://gitlab.freedesktop.org/drm/kernel: (23 commits)
Revert "drm/etnaviv: Expose a few more chipspecs to userspace"
drm/etnaviv: fix tx clock gating on some GC7000 variants
drm/xe/guc: Fix arguments passed to relay G2H handlers
drm/xe: call free_gsc_pkt only once on action add failure
drm/xe: Remove sysfs only once on action add failure
fbdev: fix incorrect address computation in deferred IO
drm/amdgpu/mes: fix use-after-free issue
drm/amdgpu/sdma5.2: use legacy HDP flush for SDMA2/3
drm/amdgpu: Fix the ring buffer size for queue VM flush
drm/amdkfd: Add VRAM accounting for SVM migration
drm/amd/pm: Restore config space after reset
drm/amdgpu/umsch: don't execute umsch test when GPU is in reset/suspend
drm/amdkfd: Fix rescheduling of restore worker
drm/amdgpu: Update BO eviction priorities
drm/amdgpu/vpe: fix vpe dpm setup failed
drm/amdgpu: Assign correct bits for SDMA HDP flush
drm/amdgpu/pm: Remove gpu_od if it's an empty directory
drm/amdkfd: make sure VM is ready for updating operations
drm/amdgpu: Fix leak when GPU memory allocation fails
drm/amdkfd: Fix eviction fence handling
...
Merge patch series "RISC-V: Test th.sxstatus.MAEE bit before enabling MAEE"
Christoph Müllner <christoph.muellner@vrull.eu> says:
Currently, the Linux kernel suffers from a boot regression when running
on the c906 QEMU emulation. Details have been reported here by Björn Töpel:
https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg04766.html
The main issue is, that Linux enables XTheadMae for CPUs that have a T-Head
mvendorid but QEMU maintainers don't want to emulate a CPU that uses
reserved bits in PTEs. See also the following discussion for more
context:
https://lists.gnu.org/archive/html/qemu-devel/2024-02/msg00775.html
This series renames "T-Head PBMT" to "MAE"/"XTheadMae" and only enables
it if the th.sxstatus.MAEE bit is set.
The th.sxstatus CSR is documented here:
https://github.com/T-head-Semi/thead-extension-spec/blob/master/xtheadsxstatus.adoc
XTheadMae is documented here:
https://github.com/T-head-Semi/thead-extension-spec/blob/master/xtheadmae.adoc
The QEMU patch to emulate th.sxstatus with the MAEE bit not set is here:
https://lore.kernel.org/all/20240329120427.684677-1-christoph.muellner@vrull.eu/
After applying the referenced QEMU patch, this patchset allows to
successfully boot a C906 QEMU system emulation ("-cpu thead-c906").
* b4-shazam-lts:
riscv: T-Head: Test availability bit before enabling MAE errata
riscv: thead: Rename T-Head PBMT to MAE
Andrew Jones [Fri, 22 Mar 2024 13:47:28 +0000 (14:47 +0100)]
RISC-V: selftests: cbo: Ensure asm operands match constraints, take 2
Commit 0de65288d75f ("RISC-V: selftests: cbo: Ensure asm operands
match constraints") attempted to ensure MK_CBO() would always
provide to a compile-time constant when given a constant, but
cpu_to_le32() isn't necessarily going to do that. Switch to manually
shifting the bytes, when needed, to finally get this right.
perf riscv: Fix the warning due to the incompatible type
In the 32-bit platform, the second argument of getline is expectd to be
'size_t *'(aka 'unsigned int *'), but line_sz is of type
'unsigned long *'. Therefore, declare line_sz as size_t.
Merge tag 'qcom-drivers-fixes-for-6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into for-next
Qualcomm driver fix for v6.9
This reworks the memory layout of the argument buffers passed to trusted
applications in QSEECOM, to avoid failures and system crashes.
* tag 'qcom-drivers-fixes-for-6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
firmware: qcom: uefisecapp: Fix memory related IO errors and crashes
Merge tag 'imx-fixes-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into for-next
i.MX fixes for 6.9, round 2:
- Fix i.MX8MP the second CSI2 assigned-clock property which got wrong by
commit f78835d1e616 ("arm64: dts: imx8mp: reparent MEDIA_MIPI_PHY1_REF
to CLK_24M")
- Correct USB over-current polarity for imx6ull-tarragon board
* tag 'imx-fixes-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: dts: imx6ull-tarragon: fix USB over-current polarity
arm64: dts: imx8mp: Fix assigned-clocks for second CSI2
Merge tag 'mtk-dts64-fixes-for-v6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux into for-next
MediaTek ARM64 DTS fixes for v6.9
This fixes some dts validation issues against bindings for multiple SoCs,
GPU voltage constraints for Chromebook devices, missing gce-client-reg
on various nodes (performance issues) on MT8183/92/95, and also fixes
boot issues on MT8195 when SPMI is built as module.
* tag 'mtk-dts64-fixes-for-v6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux:
arm64: dts: mediatek: mt2712: fix validation errors
arm64: dts: mediatek: mt7986: prefix BPI-R3 cooling maps with "map-"
arm64: dts: mediatek: mt7986: drop invalid thermal block clock
arm64: dts: mediatek: mt7986: drop "#reset-cells" from Ethernet controller
arm64: dts: mediatek: mt7986: drop invalid properties from ethsys
arm64: dts: mediatek: mt7622: drop "reset-names" from thermal block
arm64: dts: mediatek: mt7622: fix ethernet controller "compatible"
arm64: dts: mediatek: mt7622: fix IR nodename
arm64: dts: mediatek: mt7622: fix clock controllers
arm64: dts: mediatek: mt8186-corsola: Update min voltage constraint for Vgpu
arm64: dts: mediatek: mt8183-kukui: Use default min voltage for MT6358
arm64: dts: mediatek: mt8195-cherry: Update min voltage constraint for MT6315
arm64: dts: mediatek: mt8192-asurada: Update min voltage constraint for MT6315
arm64: dts: mediatek: cherry: Describe CPU supplies
arm64: dts: mediatek: mt8195: Add missing gce-client-reg to mutex1
arm64: dts: mediatek: mt8195: Add missing gce-client-reg to mutex
arm64: dts: mediatek: mt8195: Add missing gce-client-reg to vpp/vdosys
arm64: dts: mediatek: mt8192: Add missing gce-client-reg to mutex
arm64: dts: mediatek: mt8183: Add power-domains properity to mfgcfg
Merge tag 'at91-fixes-6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into for-next
AT91 fixes for 6.9
It contains:
- fixes for regulator nodes on SAMA7G5 based boards: proper DT property is used
to setup regulators suspend voltage.
* tag 'at91-fixes-6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
ARM: dts: microchip: at91-sama7g54_curiosity: Replace regulator-suspend-voltage with the valid property
ARM: dts: microchip: at91-sama7g5ek: Replace regulator-suspend-voltage with the valid property
Merge tag 'qcom-arm64-fixes-for-6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into for-next
Qualcomm Arm64 DeviceTree fixes for v6.9
This corrects the watchdog IRQ flags for a number of remoteproc
instances, which otherwise prevents the driver from probe in the face of
a probe deferral.
Improvements in other areas, such as USB, have made it possible for CX
rail voltage on SC8280XP to be lowered, no longer meeting requirements
of active PCIe controllers. Necessary votes are added to these
controllers.
The MSI definitions for PCIe controllers in SM8450, SM8550, and SM8650
was incorrect, due to a bug in the driver. As this has now been fixed
the definition needs to be corrected.
Lastly, the SuperSpeed PHY irq of the second USB controller in SC8180x,
and the compatible string for X1 Elite domain idle states are corrected.
* tag 'qcom-arm64-fixes-for-6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
arm64: dts: qcom: sc8180x: Fix ss_phy_irq for secondary USB controller
arm64: dts: qcom: sm8650: Fix the msi-map entries
arm64: dts: qcom: sm8550: Fix the msi-map entries
arm64: dts: qcom: sm8450: Fix the msi-map entries
arm64: dts: qcom: sc8280xp: add missing PCIe minimum OPP
arm64: dts: qcom: x1e80100: Fix the compatible for cluster idle states
arm64: dts: qcom: Fix type of "wdog" IRQs for remoteprocs
Merge branch 'v6.9-armsoc/dtsfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into for-next
* 'v6.9-armsoc/dtsfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
arm64: dts: rockchip: Fix USB interface compatible string on kobol-helios64
arm64: dts: rockchip: regulator for sd needs to be always on for BPI-R2Pro
dt-bindings: rockchip: grf: Add missing type to 'pcie-phy' node
arm64: dts: rockchip: drop redundant disable-gpios in Lubancat 2
arm64: dts: rockchip: drop redundant disable-gpios in Lubancat 1
arm64: dts: rockchip: drop redundant pcie-reset-suspend in Scarlet Dumo
arm64: dts: rockchip: mark system power controller and fix typo on orangepi-5-plus
arm64: dts: rockchip: Designate the system power controller on QuartzPro64
arm64: dts: rockchip: drop panel port unit address in GRU Scarlet
arm64: dts: rockchip: Remove unsupported node from the Pinebook Pro dts
arm64: dts: rockchip: Fix the i2c address of es8316 on Cool Pi CM5
arm64: dts: rockchip: add regulators for PCIe on RK3399 Puma Haikou
arm64: dts: rockchip: enable internal pull-up on PCIE_WAKE# for RK3399 Puma
arm64: dts: rockchip: enable internal pull-up on Q7_USB_ID for RK3399 Puma
arm64: dts: rockchip: fix alphabetical ordering RK3399 puma
arm64: dts: rockchip: enable internal pull-up for Q7_THRM# on RK3399 Puma
arm64: dts: rockchip: set PHY address of MT7531 switch to 0x1f
David Howells [Fri, 26 Apr 2024 11:15:15 +0000 (12:15 +0100)]
netfs: Fix the pre-flush when appending to a file in writethrough mode
In netfs_perform_write(), when the file is marked NETFS_ICTX_WRITETHROUGH
or O_*SYNC or RWF_*SYNC was specified, write-through caching is performed
on a buffered file. When setting up for write-through, we flush any
conflicting writes in the region and wait for the write to complete,
failing if there's a write error to return.
The issue arises if we're writing at or above the EOF position because we
skip the flush and - more importantly - the wait. This becomes a problem
if there's a partial folio at the end of the file that is being written out
and we want to make a write to it too. Both the already-running write and
the write we start both want to clear the writeback mark, but whoever is
second causes a warning looking something like:
------------[ cut here ]------------
R=00000012: folio 11 is not under writeback
WARNING: CPU: 34 PID: 654 at fs/netfs/write_collect.c:105
...
CPU: 34 PID: 654 Comm: kworker/u386:27 Tainted: G S ...
...
Workqueue: events_unbound netfs_write_collection_worker
...
RIP: 0010:netfs_writeback_lookup_folio
Fix this by making the flush-and-wait unconditional. It will do nothing if
there are no folios in the pagecache and will return quickly if there are
no folios in the region specified.
Further, move the WBC attachment above the flush call as the flush is going
to attach a WBC and detach it again if it is not present - and since we
need one anyway we might as well share it.
Fixes: 41d8e7673a77 ("netfs: Implement a write-through caching option") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202404161031.468b84f-oliver.sang@intel.com Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/2150448.1714130115@warthog.procyon.org.uk Reviewed-by: Jeffrey Layton <jlayton@kernel.org>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: linux-cifs@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
MAINTAINERS: Update Uwe's email address, drop SIOX maintenance
In the context of changing my career path, my Pengutronix email address
will soon stop to be available to me. Update the PWM maintainer entry to
my kernel.org identity.
I drop my co-maintenance of SIOX. Thorsten will continue to care for
it with the support of the Pengutronix kernel team.
MAINTAINERS: Drop entry for PCA9541 bus master selector
I no longer have access to PCA9541 hardware, and I am no longer involved
in related development. Listing me as PCA9541 maintainer does not make
sense anymore. Remove PCA9541 from MAINTAINERS to let its support default
to the generic I2C multiplexer entry.
Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Peter Rosin <peda@axentia.se> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Merge tag '9p-for-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
Pull 9p fix from Eric Van Hensbergen:
"This contains a single mitigation to help deal with an apparent race
condition between client and server having to deal with inode number
collisions"
* tag '9p-for-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
fs/9p: mitigate inode collisions
Merge tag 'acpi-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix three recent regressions, one introduced while enabling a
new platform firmware feature for power management, and two introduced
by a recent CPPC library update.
Specifics:
- Allow two overlapping Low-Power S0 Idle _DSM function sets to be
used at the same time (Rafael Wysocki)
- Fix bit offset computation in MASK_VAL() macro used for applying a
bitmask to a new CPPC register value (Jarred White)
- Fix access width field usage for PCC registers in CPPC (Vanshidhar
Konda)"
* tag 'acpi-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: PM: s2idle: Evaluate all Low-Power S0 Idle _DSM functions
ACPI: CPPC: Fix access width used for PCC registers
ACPI: CPPC: Fix bit_offset shift in MASK_VAL() macro
Merge tag 'net-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, wireless and bluetooth.
Nothing major, regression fixes are mostly in drivers, two more of
those are flowing towards us thru various trees. I wish some of the
changes went into -rc5, we'll try to keep an eye on frequency of PRs
from sub-trees.
Also disproportional number of fixes for bugs added in v6.4, strange
coincidence.
Current release - regressions:
- igc: fix LED-related deadlock on driver unbind
- wifi: mac80211: small fixes to recent clean up of the connection
process
- Revert "wifi: iwlwifi: bump FW API to 90 for BZ/SC devices", kernel
doesn't have all the code to deal with that version, yet
- Bluetooth:
- set power_ctrl_enabled on NULL returned by gpiod_get_optional()
- qca: fix invalid device address check, again
- Bluetooth:
- lots of fixes for the command submission rework from v6.4
- qca: fix NULL-deref on non-serdev suspend
Misc:
- tools: ynl: don't ignore errors in NLMSG_DONE messages"
* tag 'net-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().
net: b44: set pause params only when interface is up
tls: fix lockless read of strp->msg_ready in ->poll
dpll: fix dpll_pin_on_pin_register() for multiple parent pins
net: ravb: Fix registered interrupt names
octeontx2-af: fix the double free in rvu_npc_freemem()
net: ethernet: ti: am65-cpts: Fix PTPv1 message type on TX packets
ice: fix LAG and VF lock dependency in ice_reset_vf()
iavf: Fix TC config comparison with existing adapter TC config
i40e: Report MFS in decimal base instead of hex
i40e: Do not use WQ_MEM_RECLAIM flag for workqueue
net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()
net/mlx5e: Advertise mlx5 ethernet driver updates sk_buff md_dst for MACsec
macsec: Detect if Rx skb is macsec-related for offloading devices that update md_dst
ethernet: Add helper for assigning packet type when dest address does not match device address
macsec: Enable devices to advertise whether they update sk_buff md_dst during offloads
net: phy: dp83869: Fix MII mode failure
netfilter: nf_tables: honor table dormant flag from netdev release event path
eth: bnxt: fix counting packets discarded due to OOM and netpoll
igc: Fix LED-related deadlock on driver unbind
...
riscv: T-Head: Test availability bit before enabling MAE errata
T-Head's memory attribute extension (XTheadMae) (non-compatible
equivalent of RVI's Svpbmt) is currently assumed for all T-Head harts.
However, QEMU recently decided to drop acceptance of guests that write
reserved bits in PTEs.
As XTheadMae uses reserved bits in PTEs and Linux applies the MAE errata
for all T-Head harts, this broke the Linux startup on QEMU emulations
of the C906 emulation.
This patch attempts to address this issue by testing the MAE-enable bit
in the th.sxstatus CSR. This CSR is available in HW and can be
emulated in QEMU.
This patch also makes the XTheadMae probing mechanism reliable, because
a test for the right combination of mvendorid, marchid, and mimpid
is not sufficient to enable MAE.
After git bisecting and digging into the code, I believe the root cause is
that _deferred_list field of folio is unioned with _hugetlb_subpool field.
In __update_and_free_hugetlb_folio(), folio->_deferred_list is
initialized leading to corrupted folio->_hugetlb_subpool when folio is
hugetlb. Later free_huge_folio() will use _hugetlb_subpool and above
warning happens.
But it is assumed hugetlb flag must have been cleared when calling
folio_put() in update_and_free_hugetlb_folio(). This assumption is broken
due to below race:
CPU1 CPU2
dissolve_free_huge_page update_and_free_pages_bulk
update_and_free_hugetlb_folio hugetlb_vmemmap_restore_folios
folio_clear_hugetlb_vmemmap_optimized
clear_flag = folio_test_hugetlb_vmemmap_optimized
if (clear_flag) <-- False, it's already cleared.
__folio_clear_hugetlb(folio) <-- Hugetlb is not cleared.
folio_put
free_huge_folio <-- free_the_page is expected.
list_for_each_entry()
__folio_clear_hugetlb <-- Too late.
Fix this issue by checking whether folio is hugetlb directly instead of
checking clear_flag to close the race window.
Link: https://lkml.kernel.org/r/20240419085819.1901645-1-linmiaohe@huawei.com Fixes: 32c877191e02 ("hugetlb: do not clear hugetlb dtor until allocating vmemmap") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
selftests: mm: protection_keys: save/restore nr_hugepages value from launch script
The save/restore of nr_hugepages was added to the test itself by using the
atexit() functionality. But it is broken as parent exits after creating
child. Hence calling the atexit() function early. That's not it. The
child exits after creating its child and so on.
The parent cannot wait to get the termination status for its children as
it'll keep on holding the resources until the new pkey allocation fails.
It is impossible to wait for exits of all the grand and great grand
children. Hence the restoring of nr_hugepages value from parent is wrong.
Let's save/restore the nr_hugepages settings in the launch script
instead of doing it in the test.
Link: https://lkml.kernel.org/r/20240419115027.3848958-1-usama.anjum@collabora.com Fixes: c52eb6db7b7d ("selftests: mm: restore settings from only parent process") Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Reported-by: Joey Gouly <joey.gouly@arm.com> Closes: https://lore.kernel.org/all/20240418125250.GA2941398@e124191.cambridge.arm.com Cc: Joey Gouly <joey.gouly@arm.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Clément Léger [Wed, 6 Dec 2023 13:44:37 +0000 (14:44 +0100)]
selftests: sud_test: return correct emulated syscall value on RISC-V
Currently, the sud_test expects the emulated syscall to return the
emulated syscall number. This assumption only works on architectures
were the syscall calling convention use the same register for syscall
number/syscall return value. This is not the case for RISC-V and thus
the return value must be also emulated using the provided ucontext.
Merge tag 'nfsd-6.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Revert some backchannel fixes that went into v6.9-rc
* tag 'nfsd-6.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
Revert "NFSD: Convert the callback workqueue to use delayed_work"
Revert "NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down"
Merge tag 'for-linus-2024042501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Benjamin Tissoires:
- A couple of i2c-hid fixes (Kenny Levinsen & Nam Cao)
- A config issue with mcp-2221 when CONFIG_IIO is not enabled
(Abdelrahman Morsy)
- A dev_err fix in intel-ish-hid (Zhang Lixu)
- A couple of mouse fixes for both nintendo and Logitech-dj (Nuno
Pereira and Yaraslau Furman)
- I'm changing my main kernel email address as it's way simpler for me
than the Red Hat one (Benjamin Tissoires)
* tag 'for-linus-2024042501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: mcp-2221: cancel delayed_work only when CONFIG_IIO is enabled
HID: logitech-dj: allow mice to use all types of reports
HID: i2c-hid: Revert to await reset ACK before reading report descriptor
HID: nintendo: Fix N64 controller being identified as mouse
MAINTAINERS: update Benjamin's email address
HID: intel-ish-hid: ipc: Fix dev_err usage with uninitialized dev->devc
HID: i2c-hid: remove I2C_HID_READ_PENDING flag to prevent lock-up
Sergei Antonov [Mon, 22 Apr 2024 15:36:07 +0000 (18:36 +0300)]
mmc: moxart: fix handling of sgm->consumed, otherwise WARN_ON triggers
When e.g. 8 bytes are to be read, sgm->consumed equals 8 immediately after
sg_miter_next() call. The driver then increments it as bytes are read,
so sgm->consumed becomes 16 and this warning triggers in sg_miter_stop():
WARN_ON(miter->consumed > miter->length);
WARNING: CPU: 0 PID: 28 at lib/scatterlist.c:925 sg_miter_stop+0x2c/0x10c
CPU: 0 PID: 28 Comm: kworker/0:2 Tainted: G W 6.9.0-rc5-dirty #249
Hardware name: Generic DT based system
Workqueue: events_freezable mmc_rescan
Call trace:.
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x44/0x5c
dump_stack_lvl from __warn+0x78/0x16c
__warn from warn_slowpath_fmt+0xb0/0x160
warn_slowpath_fmt from sg_miter_stop+0x2c/0x10c
sg_miter_stop from moxart_request+0xb0/0x468
moxart_request from mmc_start_request+0x94/0xa8
mmc_start_request from mmc_wait_for_req+0x60/0xa8
mmc_wait_for_req from mmc_app_send_scr+0xf8/0x150
mmc_app_send_scr from mmc_sd_setup_card+0x1c/0x420
mmc_sd_setup_card from mmc_sd_init_card+0x12c/0x4dc
mmc_sd_init_card from mmc_attach_sd+0xf0/0x16c
mmc_attach_sd from mmc_rescan+0x1e0/0x298
mmc_rescan from process_scheduled_works+0x2e4/0x4ec
process_scheduled_works from worker_thread+0x1ec/0x24c
worker_thread from kthread+0xd4/0xe0
kthread from ret_from_fork+0x14/0x38
This patch adds initial zeroing of sgm->consumed. It is then incremented
as bytes are read or written.
Signed-off-by: Sergei Antonov <saproj@gmail.com> Cc: Linus Walleij <linus.walleij@linaro.org> Fixes: 3ee0e7c3e67c ("mmc: moxart-mmc: Use sg_miter for PIO") Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20240422153607.963672-1-saproj@gmail.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Jakub Kicinski [Thu, 25 Apr 2024 15:46:53 +0000 (08:46 -0700)]
Merge tag 'nf-24-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains two Netfilter/IPVS fixes for net:
Patch #1 fixes SCTP checksumming for IPVS with gso packets,
from Ismael Luceno.
Patch #2 honor dormant flag from netdev event path to fix a possible
double hook unregistration.
* tag 'nf-24-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: honor table dormant flag from netdev release event path
ipvs: Fix checksumming on GSO of SCTP packets
====================
Remove argument `params` from the `module` macro example, because the
macro does not currently support module parameters since it was not sent
with the initial merge.
Miguel Ojeda [Mon, 22 Apr 2024 09:06:44 +0000 (11:06 +0200)]
kbuild: rust: force `alloc` extern to allow "empty" Rust files
If one attempts to build an essentially empty file somewhere in the
kernel tree, it leads to a build error because the compiler does not
recognize the `new_uninit` unstable feature:
The reason is that we pass `-Zcrate-attr='feature(new_uninit)'` (together
with `-Zallow-features=new_uninit`) to let non-`rust/` code use that
unstable feature.
However, the compiler only recognizes the feature if the `alloc` crate
is resolved (the feature is an `alloc` one). `--extern alloc`, which we
pass, is not enough to resolve the crate.
Introducing a reference like `use alloc;` or `extern crate alloc;`
solves the issue, thus this is not seen in normal files. For instance,
`use`ing the `kernel` prelude introduces such a reference, since `alloc`
is used inside.
While normal use of the build system is not impacted by this, it can still
be fairly confusing for kernel developers [1], thus use the unstable
`force` option of `--extern` [2] (added in Rust 1.71 [3]) to force the
compiler to resolve `alloc`.
This new unstable feature is only needed meanwhile we use the other
unstable feature, since then we will not need `-Zcrate-attr`.
Peter Münster [Wed, 24 Apr 2024 13:51:52 +0000 (15:51 +0200)]
net: b44: set pause params only when interface is up
b44_free_rings() accesses b44::rx_buffers (and ::tx_buffers)
unconditionally, but b44::rx_buffers is only valid when the
device is up (they get allocated in b44_open(), and deallocated
again in b44_close()), any other time these are just a NULL pointers.
So if you try to change the pause params while the network interface
is disabled/administratively down, everything explodes (which likely
netifd tries to do).
Link: https://github.com/openwrt/openwrt/issues/13789 Fixes: 1da177e4c3f4 (Linux-2.6.12-rc2) Cc: stable@vger.kernel.org Reported-by: Peter Münster <pm@a16n.net> Suggested-by: Jonas Gorski <jonas.gorski@gmail.com> Signed-off-by: Vaclav Svoboda <svoboda@neng.cz> Tested-by: Peter Münster <pm@a16n.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Peter Münster <pm@a16n.net> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/87y192oolj.fsf@a16n.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
tls: fix lockless read of strp->msg_ready in ->poll
tls_sk_poll is called without locking the socket, and needs to read
strp->msg_ready (via tls_strp_msg_ready). Convert msg_ready to a bool
and use READ_ONCE/WRITE_ONCE where needed. The remaining reads are
only performed when the socket is locked.
dpll: fix dpll_pin_on_pin_register() for multiple parent pins
In scenario where pin is registered with multiple parent pins via
dpll_pin_on_pin_register(..), all belonging to the same dpll device.
A second call to dpll_pin_on_pin_unregister(..) would cause a call trace,
as it tries to use already released registration resources (due to fix
introduced in b446631f355e). In this scenario pin was registered twice,
so resources are not yet expected to be release until each registered
pin/pin pair is unregistered.
Currently, the following crash/call trace is produced when ice driver is
removed on the system with installed E810T NIC which includes dpll device:
Fix by adding a parent pointer as a cookie when creating a registration,
also when searching for it. For the regular pins pass NULL, this allows to
create separated registration for each parent the pin is registered with.
As interrupts are now requested from ravb_probe(), before calling
register_netdev(), ndev->name still contains the template "eth%d",
leading to funny names in /proc/interrupts. E.g. on R-Car E3:
Jason Reeder [Wed, 24 Apr 2024 07:16:26 +0000 (12:46 +0530)]
net: ethernet: ti: am65-cpts: Fix PTPv1 message type on TX packets
The CPTS, by design, captures the messageType (Sync, Delay_Req, etc.)
field from the second nibble of the PTP header which is defined in the
PTPv2 (1588-2008) specification. In the PTPv1 (1588-2002) specification
the first two bytes of the PTP header are defined as the versionType
which is always 0x0001. This means that any PTPv1 packets that are
tagged for TX timestamping by the CPTS will have their messageType set
to 0x0 which corresponds to a Sync message type. This causes issues
when a PTPv1 stack is expecting a Delay_Req (messageType: 0x1)
timestamp that never appears.
Fix this by checking if the ptp_class of the timestamped TX packet is
PTP_CLASS_V1 and then matching the PTP sequence ID to the stored
sequence ID in the skb->cb data structure. If the sequence IDs match
and the packet is of type PTPv1 then there is a chance that the
messageType has been incorrectly stored by the CPTS so overwrite the
messageType stored by the CPTS with the messageType from the skb->cb
data structure. This allows the PTPv1 stack to receive TX timestamps
for Delay_Req packets which are necessary to lock onto a PTP Leader.
Signed-off-by: Jason Reeder <jreeder@ti.com> Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com> Tested-by: Ed Trexel <ed.trexel@hp.com> Fixes: f6bd59526ca5 ("net: ethernet: ti: introduce am654 common platform time sync driver") Link: https://lore.kernel.org/r/20240424071626.32558-1-r-gunasekaran@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
Fix isolation of broadcast traffic and unmatched unicast traffic with MACsec offload
Some device drivers support devices that enable them to annotate whether a
Rx skb refers to a packet that was processed by the MACsec offloading
functionality of the device. Logic in the Rx handling for MACsec offload
does not utilize this information to preemptively avoid forwarding to the
macsec netdev currently. Because of this, things like multicast messages or
unicast messages with an unmatched destination address such as ARP requests
are forwarded to the macsec netdev whether the message received was MACsec
encrypted or not. The goal of this patch series is to improve the Rx
handling for MACsec offload for devices capable of annotating skbs received
that were decrypted by the NIC offload for MACsec.
Here is a summary of the issue that occurs with the existing logic today.
* The current design of the MACsec offload handling path tries to use
"best guess" mechanisms for determining whether a packet associated
with the currently handled skb in the datapath was processed via HW
offload
* The best guess mechanism uses the following heuristic logic (in order of
precedence)
- Check if header destination MAC address matches MACsec netdev MAC
address -> forward to MACsec port
- Check if packet is multicast traffic -> forward to MACsec port
- MACsec security channel was able to be looked up from skb offload
context (mlx5 only) -> forward to MACsec port
* Problem: plaintext traffic can potentially solicit a MACsec encrypted
response from the offload device
- Core aspect of MACsec is that it identifies unauthorized LAN connections
and excludes them from communication
+ This behavior can be seen when not enabling offload for MACsec
- The offload behavior violates this principle in MACsec
I believe this behavior is a security bug since applications utilizing
MACsec could be exploited using this behavior, and the correct way to
resolve this is by having the hardware correctly indicate whether MACsec
offload occurred for the packet or not. In the patches in this series, I
leave a warning for when the problematic path occurs because I cannot
figure out a secure way to fix the security issue that applies to the core
MACsec offload handling in the Rx path without breaking MACsec offload for
other vendors.
Shown at the bottom is an example use case where plaintext traffic sent to
a physical port of a NIC configured for MACsec offload is unable to be
handled correctly by the software stack when the NIC provides awareness to
the kernel about whether the received packet is MACsec traffic or not. In
this specific example, plaintext ARP requests are being responded with
MACsec encrypted ARP replies (which leads to routing information being
unable to be built for the requester).
Side 1
ip link del macsec0
ip address flush mlx5_1
ip address add 1.1.1.1/24 dev mlx5_1
ip link set dev mlx5_1 up
ip link add link mlx5_1 macsec0 type macsec sci 1 encrypt on
ip link set dev macsec0 address 00:11:22:33:44:66
ip macsec offload macsec0 mac
ip macsec add macsec0 tx sa 0 pn 1 on key 00 dffafc8d7b9a43d5b9a3dfbbf6a30c16
ip macsec add macsec0 rx sci 2 on
ip macsec add macsec0 rx sci 2 sa 0 pn 1 on key 00 ead3664f508eb06c40ac7104cdae4ce5
ip address flush macsec0
ip address add 2.2.2.1/24 dev macsec0
ip link set dev macsec0 up
# macsec0 enters promiscuous mode.
# This enables all traffic received on macsec_vlan to be processed by
# the macsec offload rx datapath. This however means that traffic
# meant to be received by mlx5_1 will be incorrectly steered to
# macsec0 as well.
ip link add link macsec0 name macsec_vlan type vlan id 1
ip link set dev macsec_vlan address 00:11:22:33:44:88
ip address flush macsec_vlan
ip address add 3.3.3.1/24 dev macsec_vlan
ip link set dev macsec_vlan up
Side 2
ip link del macsec0
ip address flush mlx5_1
ip address add 1.1.1.2/24 dev mlx5_1
ip link set dev mlx5_1 up
ip link add link mlx5_1 macsec0 type macsec sci 2 encrypt on
ip link set dev macsec0 address 00:11:22:33:44:77
ip macsec offload macsec0 mac
ip macsec add macsec0 tx sa 0 pn 1 on key 00 ead3664f508eb06c40ac7104cdae4ce5
ip macsec add macsec0 rx sci 1 on
ip macsec add macsec0 rx sci 1 sa 0 pn 1 on key 00 dffafc8d7b9a43d5b9a3dfbbf6a30c16
ip address flush macsec0
ip address add 2.2.2.2/24 dev macsec0
ip link set dev macsec0 up
# macsec0 enters promiscuous mode.
# This enables all traffic received on macsec_vlan to be processed by
# the macsec offload rx datapath. This however means that traffic
# meant to be received by mlx5_1 will be incorrectly steered to
# macsec0 as well.
ip link add link macsec0 name macsec_vlan type vlan id 1
ip link set dev macsec_vlan address 00:11:22:33:44:99
ip address flush macsec_vlan
ip address add 3.3.3.2/24 dev macsec_vlan
ip link set dev macsec_vlan up
Side 1
ping -I mlx5_1 1.1.1.2
PING 1.1.1.2 (1.1.1.2) from 1.1.1.1 mlx5_1: 56(84) bytes of data.
From 1.1.1.1 icmp_seq=1 Destination Host Unreachable
ping: sendmsg: No route to host
From 1.1.1.1 icmp_seq=2 Destination Host Unreachable
From 1.1.1.1 icmp_seq=3 Destination Host Unreachable
Changes:
v2->v3:
* Made dev paramater const for eth_skb_pkt_type helper as suggested by Sabrina
Dubroca <sd@queasysnail.net>
v1->v2:
* Fixed series subject to detail the issue being fixed
* Removed strange characters from cover letter
* Added comment in example that illustrates the impact involving
promiscuous mode
* Added patch for generalizing packet type detection
* Added Fixes: tags and targeting net
* Removed pointless warning in the heuristic Rx path for macsec offload
* Applied small refactor in Rx path offload to minimize scope of rx_sc
local variable
Jacob Keller [Tue, 23 Apr 2024 18:27:20 +0000 (11:27 -0700)]
ice: fix LAG and VF lock dependency in ice_reset_vf()
9f74a3dfcf83 ("ice: Fix VF Reset paths when interface in a failed over
aggregate"), the ice driver has acquired the LAG mutex in ice_reset_vf().
The commit placed this lock acquisition just prior to the acquisition of
the VF configuration lock.
If ice_reset_vf() acquires the configuration lock via the ICE_VF_RESET_LOCK
flag, this could deadlock with ice_vc_cfg_qs_msg() because it always
acquires the locks in the order of the VF configuration lock and then the
LAG mutex.
Lockdep reports this violation almost immediately on creating and then
removing 2 VF:
======================================================
WARNING: possible circular locking dependency detected
6.8.0-rc6 #54 Tainted: G W O
------------------------------------------------------
kworker/60:3/6771 is trying to acquire lock: ff40d43e099380a0 (&vf->cfg_lock){+.+.}-{3:3}, at: ice_reset_vf+0x22f/0x4d0 [ice]
but task is already holding lock: ff40d43ea1961210 (&pf->lag_mutex){+.+.}-{3:3}, at: ice_reset_vf+0xb7/0x4d0 [ice]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&pf->lag_mutex);
lock(&vf->cfg_lock);
lock(&pf->lag_mutex);
lock(&vf->cfg_lock);
To avoid deadlock, we must acquire the LAG mutex only after acquiring the
VF configuration lock. Fix the ice_reset_vf() to acquire the LAG mutex only
after we either acquire or check that the VF configuration lock is held.
Fixes: 9f74a3dfcf83 ("ice: Fix VF Reset paths when interface in a failed over aggregate") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Dave Ertman <david.m.ertman@intel.com> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Tested-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20240423182723.740401-5-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
iavf: Fix TC config comparison with existing adapter TC config
Same number of TCs doesn't imply that underlying TC configs are
same. The config could be different due to difference in number
of queues in each TC. Add utility function to determine if TC
configs are same.
Fixes: d5b33d024496 ("i40evf: add ndo_setup_tc callback to i40evf") Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Mineri Bhange <minerix.bhange@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20240423182723.740401-4-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the MFS is set below the default (0x2600), a warning message is
reported like the following :
MFS for port 1 has been set below the default: 600
This message is a bit confusing as the number shown here (600) is in
fact an hexa number: 0x600 = 1536
Without any explicit "0x" prefix, this message is read like the MFS is
set to 600 bytes.
MFS, as per MTUs, are usually expressed in decimal base.
This commit reports both current and default MFS values in decimal
so it's less confusing for end-users.
A typical warning message looks like the following :
MFS for port 1 (1536) has been set below the default (9728)
Signed-off-by: Erwan Velu <e.velu@criteo.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Fixes: 3a2c6ced90e1 ("i40e: Add a check to see if MFS is set") Link: https://lore.kernel.org/r/20240423182723.740401-3-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
i40e: Do not use WQ_MEM_RECLAIM flag for workqueue
Issue reported by customer during SRIOV testing, call trace:
When both i40e and the i40iw driver are loaded, a warning
in check_flush_dependency is being triggered. This seems
to be because of the i40e driver workqueue is allocated with
the WQ_MEM_RECLAIM flag, and the i40iw one is not.
Similar error was encountered on ice too and it was fixed by
removing the flag. Do the same for i40e too.
Dan Carpenter [Tue, 23 Apr 2024 16:15:22 +0000 (19:15 +0300)]
net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()
The rx_chn->irq[] array is unsigned int but it should be signed for the
error handling to work. Also if k3_udma_glue_rx_get_irq() returns zero
then we should return -ENXIO instead of success.
Fixes: 128d5874c082 ("net: ti: icssg-prueth: Add ICSSG ethernet driver") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Link: https://lore.kernel.org/r/05282415-e7f4-42f3-99f8-32fde8f30936@moroto.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net/mlx5e: Advertise mlx5 ethernet driver updates sk_buff md_dst for MACsec
mlx5 Rx flow steering and CQE handling enable the driver to be able to
update an skb's md_dst attribute as MACsec when MACsec traffic arrives when
a device is configured for offloading. Advertise this to the core stack to
take advantage of this capability.
Cc: stable@vger.kernel.org Fixes: b7c9400cbc48 ("net/mlx5e: Implement MACsec Rx data path using MACsec skb_metadata_dst") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Benjamin Poirier <bpoirier@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/20240423181319.115860-5-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
macsec: Detect if Rx skb is macsec-related for offloading devices that update md_dst
Can now correctly identify where the packets should be delivered by using
md_dst or its absence on devices that provide it.
This detection is not possible without device drivers that update md_dst. A
fallback pattern should be used for supporting such device drivers. This
fallback mode causes multicast messages to be cloned to both the non-macsec
and macsec ports, independent of whether the multicast message received was
encrypted over MACsec or not. Other non-macsec traffic may also fail to be
handled correctly for devices in promiscuous mode.
macsec: Enable devices to advertise whether they update sk_buff md_dst during offloads
Cannot know whether a Rx skb missing md_dst is intended for MACsec or not
without knowing whether the device is able to update this field during an
offload. Assume that an offload to a MACsec device cannot support updating
md_dst by default. Capable devices can advertise that they do indicate that
an skb is related to a MACsec offloaded packet using the md_dst.
Derek Foreman [Mon, 18 Mar 2024 12:32:07 +0000 (07:32 -0500)]
drm/etnaviv: fix tx clock gating on some GC7000 variants
commit 4bce244272513 ("drm/etnaviv: disable tx clock gating for GC7000
rev6203") accidentally applied the fix for i.MX8MN errata ERR050226 to
GC2000 instead of GC7000, failing to disable tx clock gating for GC7000
rev 0x6023 as intended.
Additional clean-up further propagated this issue, partially breaking
the clock gating fixes added for GC7000 rev 6202 in commit 432f51e7deeda
("drm/etnaviv: add clock gating workaround for GC7000 r6202").
Signed-off-by: Derek Foreman <derek.foreman@collabora.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>