Joshua Hay [Wed, 4 Sep 2024 15:47:48 +0000 (17:47 +0200)]
idpf: enable WB_ON_ITR
Tell hardware to write back completed descriptors even when interrupts
are disabled. Otherwise, descriptors might not be written back until
the hardware can flush a full cacheline of descriptors. This can cause
unnecessary delays when traffic is light (or even trigger Tx queue
timeout).
The example scenario to reproduce the Tx timeout if the fix is not
applied:
- configure at least 2 Tx queues to be assigned to the same q_vector,
- generate a huge Tx traffic on the first Tx queue
- try to send a few packets using the second Tx queue.
In such a case Tx timeout will appear on the second Tx queue because no
completion descriptors are written back for that queue while interrupts
are disabled due to NAPI polling.
Fixes: c2d548cad150 ("idpf: add TX splitq napi poll support") Fixes: a5ab9ee0df0b ("idpf: add singleq start_xmit and napi poll") Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Michal Kubiak [Wed, 4 Sep 2024 15:47:47 +0000 (17:47 +0200)]
idpf: fix netdev Tx queue stop/wake
netif_txq_maybe_stop() returns -1, 0, or 1, while
idpf_tx_maybe_stop_common() says it returns 0 or -EBUSY. As a result,
there sometimes are Tx queue timeout warnings despite that the queue
is empty or there is at least enough space to restart it.
Make idpf_tx_maybe_stop_common() inline and returning true or false,
handling the return of netif_txq_maybe_stop() properly. Use a correct
goto in idpf_tx_maybe_stop_splitq() to avoid stopping the queue or
incrementing the stops counter twice.
Fixes: 6818c4d5b3c2 ("idpf: add splitq start_xmit") Fixes: a5ab9ee0df0b ("idpf: add singleq start_xmit and napi poll") Cc: stable@vger.kernel.org # 6.7+ Signed-off-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Joshua Hay [Wed, 4 Sep 2024 15:47:46 +0000 (17:47 +0200)]
idpf: refactor Tx completion routines
Add a mechanism to guard against stashing partial packets into the hash
table to make the driver more robust, with more efficient decision
making when cleaning.
Don't stash partial packets. This can happen when an RE (Report Event)
completion is received in flow scheduling mode, or when an out of order
RS (Report Status) completion is received. The first buffer with the skb
is stashed, but some or all of its frags are not because the stack is
out of reserve buffers. This leaves the ring in a weird state since
the frags are still on the ring.
Use the field libeth_sqe::nr_frags to track the number of
fragments/tx_bufs representing the packet. The clean routines check to
make sure there are enough reserve buffers on the stack before stashing
any part of the packet. If there are not, next_to_clean is left pointing
to the first buffer of the packet that failed to be stashed. This leaves
the whole packet on the ring, and the next time around, cleaning will
start from this packet.
An RS completion is still expected for this packet in either case. So
instead of being cleaned from the hash table, it will be cleaned from
the ring directly. This should all still be fine since the DESC_UNUSED
and BUFS_UNUSED will reflect the state of the ring. If we ever fall
below the thresholds, the TxQ will still be stopped, giving the
completion queue time to catch up. This may lead to stopping the queue
more frequently, but it guarantees the Tx ring will always be in a good
state.
Also, always use the idpf_tx_splitq_clean function to clean descriptors,
i.e. use it from clean_buf_ring as well. This way we avoid duplicating
the logic and make sure we're using the same reserve buffers guard rail.
This does require a switch from the s16 next_to_clean overflow
descriptor ring wrap calculation to u16 and the normal ring size check.
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
&idpf_tx_buffer is almost identical to the previous generations, as well
as the way it's handled. Moreover, relying on dma_unmap_addr() and
!!buf->skb instead of explicit defining of buffer's type was never good.
Use the newly added libeth helpers to do it properly and reduce the
copy-paste around the Tx code.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Software-side Tx buffers for storing DMA, frame size, skb pointers etc.
are pretty much generic and every driver defines them the same way. The
same can be said for software Tx completions -- same napi_consume_skb()s
and all that...
Add a couple simple wrappers for doing that to stop repeating the old
tale at least within the Intel code. Drivers are free to use 'priv'
member at the end of the structure.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
David S. Miller [Mon, 9 Sep 2024 13:14:54 +0000 (14:14 +0100)]
Merge branch 'unmask-dscp-part-four'
Ido Schimmel says:
====================
Unmask upper DSCP bits - part 4 (last)
tl;dr - This patchset finishes to unmask the upper DSCP bits in the IPv4
flow key in preparation for allowing IPv4 FIB rules to match on DSCP. No
functional changes are expected.
The TOS field in the IPv4 flow key ('flowi4_tos') is used during FIB
lookup to match against the TOS selector in FIB rules and routes.
It is currently impossible for user space to configure FIB rules that
match on the DSCP value as the upper DSCP bits are either masked in the
various call sites that initialize the IPv4 flow key or along the path
to the FIB core.
In preparation for adding a DSCP selector to IPv4 and IPv6 FIB rules, we
need to make sure the entire DSCP value is present in the IPv4 flow key.
This patchset finishes to unmask the upper DSCP bits by adjusting all
the callers of ip_route_output_key() to properly initialize the full
DSCP value in the IPv4 flow key.
No functional changes are expected as commit 1fa3314c14c6 ("ipv4:
Centralize TOS matching") moved the masking of the upper DSCP bits to
the core where 'flowi4_tos' is matched against the TOS selector.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Unmask the upper DSCP bits when calling ip_route_output_key() so that in
the future it could perform the FIB lookup according to the full DSCP
value.
Note that the 'tos' variable holds the full DS field.
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
netfilter: nft_flow_offload: Unmask upper DSCP bits in nft_flow_route()
Unmask the upper DSCP bits when calling nf_route() which eventually
calls ip_route_output_key() so that in the future it could perform the
FIB lookup according to the full DSCP value.
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv4: ip_tunnel: Unmask upper DSCP bits in ip_tunnel_xmit()
Unmask the upper DSCP bits when initializing an IPv4 flow key via
ip_tunnel_init_flow() before passing it to ip_route_output_key() so that
in the future we could perform the FIB lookup according to the full DSCP
value.
Note that the 'tos' variable includes the full DS field. Either the one
specified as part of the tunnel parameters or the one inherited from the
inner packet.
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv4: ip_tunnel: Unmask upper DSCP bits in ip_md_tunnel_xmit()
Unmask the upper DSCP bits when initializing an IPv4 flow key via
ip_tunnel_init_flow() before passing it to ip_route_output_key() so that
in the future we could perform the FIB lookup according to the full DSCP
value.
Note that the 'tos' variable includes the full DS field. Either the one
specified via the tunnel key or the one inherited from the inner packet.
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv4: ip_tunnel: Unmask upper DSCP bits in ip_tunnel_bind_dev()
Unmask the upper DSCP bits when initializing an IPv4 flow key via
ip_tunnel_init_flow() before passing it to ip_route_output_key() so that
in the future we could perform the FIB lookup according to the full DSCP
value.
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The usage looks weird since it checks @ns_type but calls namespace()
it is found for all existing class definitions that the other filed is
also assigned once one is assigned in current kernel tree, so fix this
weird usage by checking @namespace to call namespace().
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 9 Sep 2024 09:29:05 +0000 (10:29 +0100)]
Merge branch 'fs_enet-cleanup'
Maxime Chevallier says:
====================
net: ethernet: fs_enet: Cleanup and phylink conversion
This is V3 of a series that cleans-up fs_enet, with the ultimate goal of
converting it to phylink (patch 8).
The main changes compared to V2 are :
- Reviewed-by tags from Andrew were gathered
- Patch 5 now includes the removal of now unused includes, thanks
Andrew for spotting this
- Patch 4 is new, it reworks the adjust_link to move the spinlock
acquisition to a more suitable location. Although this dissapears in
the actual phylink port, it makes the phylink conversion clearer on
that point
- Patch 8 includes fixes in the tx_timeout cancellation, to prevent
taking rtnl twice when canceling a pending tx_timeout. Thanks Jakub
for spotting this.
Link to V2: https://lore.kernel.org/netdev/20240829161531.610874-1-maxime.chevallier@bootlin.com/
Link to V1: https://lore.kernel.org/netdev/20240828095103.132625-1-maxime.chevallier@bootlin.com/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
fs_enet is a quite old but still used Ethernet driver found on some NXP
devices. It has support for 10/100 Mbps ethernet, with half and full
duplex. Some variants of it can use RMII, while other integrations are
MII-only.
This also allows removing some internal flags such as the use_rmii flag.
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ethernet: fs_enet: simplify clock handling with devm accessors
devm_clock_get_enabled() can be used to simplify clock handling for the
PER register clock.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ethernet: fs_enet: use macros for speed and duplex values
The PHY speed and duplex should be manipulated using the SPEED_XXX and
DUPLEX_XXX macros available. Use it in the fcc, fec and scc MAC for
fs_enet.
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ethernet: fs_enet: drop unused phy_info and mii_if_info
There's no user of the struct phy_info, the 'phy' field and the
mii_if_info in the fs_enet driver, probably dating back when phylib
wasn't as widely used. Drop these from the driver code.
As the definition for struct mii_if_info is no longer required, drop the
include for linux/mii.h altogether in the driver.
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ethernet: fs_enet: only protect the .restart() call in .adjust_link
When .adjust_link() gets called, it runs in thread context, with the
phydev->lock held. We only need to protect the fep->fecp/fccp/sccp
register that are accessed within the .restart() function from
concurrent access from the interrupts.
These registers are being protected by the fep->lock spinlock, so we can
move the spinlock protection around the .restart() call instead of the
entire adjust_link() call. By doing so, we can simplify further the
.adjust_link() callback and avoid the intermediate helper.
Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ethernet: fs_enet: drop the .adjust_link custom fs_ops
There's no in-tree user for the fs_ops .adjust_link() function, so we
can always use the generic one in fe_enet-main.
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Due to the age of the driver and the slow recent activity on it, the code
has taken some layers of dust. Clean the main driver file up so that it
passes checkpatch and also conforms with the net coding style.
Changes include :
- Re-ordering of the variable declarations for RCT
- Fixing the comment styles to either one-line comments, or net-style
comments
- Adding braces around single-statement 'else' clauses
- Aligning function/macro parameters on the opening parenthesis
- Simplifying checks for NULL pointers
- Splitting cascaded assignments into individual assignments
- Fixing some typos
- Fixing whitespace issues
This is a cosmetic change and doesn't introduce any change in behaviour.
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The ENET driver has SPDX tags in the header files, but they were missing
in the C files. Change the licence information to SPDX format.
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ptp/ioctl: support MONOTONIC{,_RAW} timestamps for PTP_SYS_OFFSET_EXTENDED
The ability to read the PHC (Physical Hardware Clock) alongside
multiple system clocks is currently dependent on the specific
hardware architecture. This limitation restricts the use of
PTP_SYS_OFFSET_PRECISE to certain hardware configurations.
The generic soultion which would work across all architectures
is to read the PHC along with the latency to perform PHC-read as
offered by PTP_SYS_OFFSET_EXTENDED which provides pre and post
timestamps. However, these timestamps are currently limited
to the CLOCK_REALTIME timebase. Since CLOCK_REALTIME is affected
by NTP (or similar time synchronization services), it can
experience significant jumps forward or backward. This hinders
the precise latency measurements that PTP_SYS_OFFSET_EXTENDED
is designed to provide.
This problem could be addressed by supporting MONOTONIC_RAW
timestamps within PTP_SYS_OFFSET_EXTENDED. Unlike CLOCK_REALTIME
or CLOCK_MONOTONIC, the MONOTONIC_RAW timebase is unaffected
by NTP adjustments.
This enhancement can be implemented by utilizing one of the three
reserved words within the PTP_SYS_OFFSET_EXTENDED struct to pass
the clock-id for timestamps. The current behavior aligns with
clock-id for CLOCK_REALTIME timebase (value of 0), ensuring
backward compatibility of the UAPI.
Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: sched: consistently use rcu_replace_pointer() in taprio_change()
According to Vinicius (and carefully looking through the whole
https://syzkaller.appspot.com/bug?extid=b65e0af58423fc8a73aa
once again), txtime branch of 'taprio_change()' is not going to
race against 'advance_sched()'. But using 'rcu_replace_pointer()'
in the former may be a good idea as well.
Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 7 Sep 2024 01:39:31 +0000 (18:39 -0700)]
Merge tag 'nf-next-24-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
Patch #1 adds ctnetlink support for kernel side filtering for
deletions, from Changliang Wu.
Patch #2 updates nft_counter support to Use u64_stats_t,
from Sebastian Andrzej Siewior.
Patch #3 uses kmemdup_array() in all xtables frontends,
from Yan Zhen.
Patch #4 is a oneliner to use ERR_CAST() in nf_conntrack instead
opencoded casting, from Shen Lichuan.
Patch #5 removes unused argument in nftables .validate interface,
from Florian Westphal.
Patch #6 is a oneliner to correct a typo in nftables kdoc,
from Simon Horman.
Patch #7 fixes missing kdoc in nftables, also from Simon.
Patch #8 updates nftables to handle timeout less than CONFIG_HZ.
Patch #9 rejects element expiration if timeout is zero,
otherwise it is silently ignored.
Patch #10 disallows element expiration larger than timeout.
Patch #11 removes unnecessary READ_ONCE annotation while mutex is held.
Patch #12 adds missing READ_ONCE/WRITE_ONCE annotation in dynset.
Patch #13 annotates data-races around element expiration.
Patch #14 allocates timeout and expiration in one single set element
extension, they are tighly couple, no reason to keep them
separated anymore.
Patch #15 updates nftables to interpret zero timeout element as never
times out. Note that it is already possible to declare sets
with elements that never time out but this generalizes to all
kind of set with timeouts.
Patch #16 supports for element timeout and expiration updates.
* tag 'nf-next-24-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nf_tables: set element timeout update support
netfilter: nf_tables: zero timeout means element never times out
netfilter: nf_tables: consolidate timeout extension for elements
netfilter: nf_tables: annotate data-races around element expiration
netfilter: nft_dynset: annotate data-races around set timeout
netfilter: nf_tables: remove annotation to access set timeout while holding lock
netfilter: nf_tables: reject expiration higher than timeout
netfilter: nf_tables: reject element expiration with no timeout
netfilter: nf_tables: elements with timeout below CONFIG_HZ never expire
netfilter: nf_tables: Add missing Kernel doc
netfilter: nf_tables: Correct spelling in nf_tables.h
netfilter: nf_tables: drop unused 3rd argument from validate callback ops
netfilter: conntrack: Convert to use ERR_CAST()
netfilter: Use kmemdup_array instead of kmemdup for multiple allocation
netfilter: nft_counter: Use u64_stats_t for statistic.
netfilter: ctnetlink: support CTA_FILTER for flush
====================
The PCIe bus can be pretty busy during boot and probe function can
see excessive delays. Let's find the minimal value out of several
tests and use it as estimated value.
Jakub Kicinski [Sat, 7 Sep 2024 01:23:51 +0000 (18:23 -0700)]
Merge branch 'octeontx2-address-some-warnings'
Simon Horman says:
====================
octeontx2: Address some warnings
This patchset addresses some warnings flagged by Sparse, gcc-14, and
clang-18 in files touched by recent patch submissions.
Although these changes do not alter the functionality of the code, by
addressing them real problems introduced in future which are flagged by
Sparse will stand out more readily.
Simon Horman [Wed, 4 Sep 2024 18:29:36 +0000 (19:29 +0100)]
octeontx2-af: Pass string literal as format argument of alloc_workqueue()
Recently I noticed that both gcc-14 and clang-18 report that passing
a non-string literal as the format argument of alloc_workqueue()
is potentially insecure.
E.g. clang-18 says:
.../rvu.c:2493:32: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
2493 | mw->mbox_wq = alloc_workqueue(name,
| ^~~~
.../rvu.c:2493:32: note: treat the string as an argument to avoid this
2493 | mw->mbox_wq = alloc_workqueue(name,
| ^
| "%s",
It is always the case where the contents of name is safe to pass as the
format argument. That is, in my understanding, it never contains any
format escape sequences.
But, it seems better to be safe than sorry. And, as a bonus, compiler
output becomes less verbose by addressing this issue as suggested by
clang-18.
Edward Cree [Wed, 4 Sep 2024 18:11:56 +0000 (19:11 +0100)]
sfc: siena: rip out rss-context dead code
Siena hardware does not support custom RSS contexts, but when the
driver was forked from sfc.ko, some of the plumbing for them was
copied across from the common code. Actually trying to use them
would lead to EOPNOTSUPP as the relevant efx_nic_type methods were
not populated.
Remove this dead code from the Siena driver.
net: tls: wait for async completion on last message
When asynchronous encryption is used KTLS sends out the final data at
proto->close time. This becomes problematic when the task calling
close() receives a signal. In this case it can happen that
tcp_sendmsg_locked() called at close time returns -ERESTARTSYS and the
final data is not sent.
The described situation happens when KTLS is used in conjunction with
io_uring, as io_uring uses task_work_add() to add work to the current
userspace task. A discussion of the problem along with a reproducer can
be found in [1] and [2]
Fix this by waiting for the asynchronous encryption to be completed on
the final message. With this there is no data left to be sent at close
time.
====================
make use of the helper macro LIST_HEAD()
The macro LIST_HEAD() declares a list variable and
initializes it, which can be used to simplify the steps
of list initialization, thereby simplifying the code.
These serials just do some equivalatent substitutions,
and with no functional modifications.
====================
Chen Ni [Wed, 4 Sep 2024 08:49:51 +0000 (16:49 +0800)]
sfc: convert comma to semicolon
Replace comma between expressions with semicolons.
Using a ',' in place of a ';' can have unintended side effects.
Although that is not the case here, it is seems best to use ';'
unless ',' is intended.
Found by inspection.
No functional change intended.
Compile tested only.
Chen Ni [Wed, 4 Sep 2024 08:40:34 +0000 (16:40 +0800)]
sfc/siena: Convert comma to semicolon
Replace comma between expressions with semicolons.
Using a ',' in place of a ';' can have unintended side effects.
Although that is not the case here, it is seems best to use ';'
unless ',' is intended.
Found by inspection.
No functional change intended.
Compile tested only.
Chen Ni [Wed, 4 Sep 2024 08:17:28 +0000 (16:17 +0800)]
ionic: Convert comma to semicolon
Replace comma between expressions with semicolons.
Using a ',' in place of a ';' can have unintended side effects.
Although that is not the case here, it is seems best to use ';'
unless ',' is intended.
Found by inspection.
No functional change intended.
Compile tested only.
Chen Ni [Wed, 4 Sep 2024 08:08:45 +0000 (16:08 +0800)]
net: atlantic: convert comma to semicolon
Replace comma between expressions with semicolons.
Using a ',' in place of a ';' can have unintended side effects.
Although that is not the case here, it is seems best to use ';'
unless ',' is intended.
Found by inspection.
No functional change intended.
Compile tested only.
Gal Pressman [Wed, 4 Sep 2024 07:49:22 +0000 (10:49 +0300)]
bnx2x: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:21 +0000 (10:49 +0300)]
cxgb4: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:20 +0000 (10:49 +0300)]
ixgbe: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:19 +0000 (10:49 +0300)]
igc: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:18 +0000 (10:49 +0300)]
igb: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:17 +0000 (10:49 +0300)]
ice: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:16 +0000 (10:49 +0300)]
i40e: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:15 +0000 (10:49 +0300)]
net: netcp: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:14 +0000 (10:49 +0300)]
net: ti: icssg-prueth: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:13 +0000 (10:49 +0300)]
net: ethernet: ti: cpsw_ethtool: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:12 +0000 (10:49 +0300)]
net: ethernet: ti: am65-cpsw-ethtool: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:11 +0000 (10:49 +0300)]
mlxsw: spectrum: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:10 +0000 (10:49 +0300)]
net: sparx5: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:09 +0000 (10:49 +0300)]
net: lan966x: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Wed, 4 Sep 2024 07:49:08 +0000 (10:49 +0300)]
lan743x: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 6 Sep 2024 07:41:36 +0000 (08:41 +0100)]
Merge branch 'microchip=ksz8-cleanup'
Pieter Van Trappen says:
====================
net: dsa: microchip: rename and clean ksz8 series files
The first KSZ8 series implementation was done for a KSZ8795 device but
since several other KSZ8 devices have been added. Rename these files
to adhere to the ksz8 naming convention as already used in most
functions and the existing ksz8.h; add an explanatory note.
In addition, clean the files by removing macros that are defined at
more than one place and remove confusion by renaming the KSZ8830
string which in fact is not an existing KSZ series switch.
Signed-off-by: Pieter Van Trappen <pieter.van.trappen@cern.ch>
---
v4:
- correct once more Kconfig list of supported switches
v3: https://lore.kernel.org/netdev/20240903072946.344507-1-vtpieter@gmail.com/
- rename all KSZ8830 to KSZ88X3 only (not KSZ8863)
- update Kconfig as per Arun's suggestion
v2: https://lore.kernel.org/netdev/20240830141250.30425-1-vtpieter@gmail.com/
- more finegrained description in Kconfig and ksz8.c header
- add KSZ8830/ksz8830 to KSZ8863/ksz88x3 renaming
Replace ksz8830 with ksz88x3 for CHIP_ID definition and other
strings. This due to KSZ8830 not being an actual switch but the Chip
ID shared among KSZ8863/8873 switches, impossible to differentiate
from their Chip ID or Revision ID registers.
Now all KSZ*_CHIP_ID macros refer to actual, existing switches which
removes confusion.
Signed-off-by: Pieter Van Trappen <pieter.van.trappen@cern.ch> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: microchip: clean up ksz8_reg definition macros
Remove macros that are already defined at more appropriate places.
Signed-off-by: Pieter Van Trappen <pieter.van.trappen@cern.ch> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The first KSZ8 series implementation was done for a KSZ8795 device but
since several other KSZ8 devices have been added. Rename these files
to adhere to the ksz8 naming convention as already used in most
functions and the existing ksz8.h; add an explanatory note.
Signed-off-by: Pieter Van Trappen <pieter.van.trappen@cern.ch> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This series includes adding realtek automotive ethernet driver
and adding rtase ethernet driver entry in MAINTAINERS file.
This ethernet device driver for the PCIe interface of
Realtek Automotive Ethernet Switch,applicable to
RTL9054, RTL9068, RTL9072, RTL9075, RTL9068, RTL9071.
====================
Justin Lai [Wed, 4 Sep 2024 03:21:11 +0000 (11:21 +0800)]
rtase: Implement ethtool function
Implement the ethtool function to support users to obtain network card
information, including obtaining various device settings, Report whether
physical link is up, Report pause parameters, Set pause parameters,
Return extended statistics about the device.
Justin Lai [Wed, 4 Sep 2024 03:21:09 +0000 (11:21 +0800)]
rtase: Implement net_device_ops
1. Implement .ndo_set_rx_mode so that the device can change address
list filtering.
2. Implement .ndo_set_mac_address so that mac address can be changed.
3. Implement .ndo_change_mtu so that mtu can be changed.
4. Implement .ndo_tx_timeout to perform related processing when the
transmitter does not make any progress.
5. Implement .ndo_get_stats64 to provide statistics that are called
when the user wants to get network device usage.
6. Implement .ndo_vlan_rx_add_vid to register VLAN ID when the device
supports VLAN filtering.
7. Implement .ndo_vlan_rx_kill_vid to unregister VLAN ID when the device
supports VLAN filtering.
8. Implement the .ndo_setup_tc to enable setting any "tc" scheduler,
classifier or action on dev.
9. Implement .ndo_fix_features enables adjusting requested feature flags
based on device-specific constraints.
10. Implement .ndo_set_features enables updating device configuration to
new features.
Justin Lai [Wed, 4 Sep 2024 03:21:08 +0000 (11:21 +0800)]
rtase: Implement a function to receive packets
Implement rx_handler to read the information of the rx descriptor,
thereby checking the packet accordingly and storing the packet
in the socket buffer to complete the reception of the packet.
Justin Lai [Wed, 4 Sep 2024 03:21:07 +0000 (11:21 +0800)]
rtase: Implement .ndo_start_xmit function
Implement .ndo_start_xmit function to fill the information of the packet
to be transmitted into the tx descriptor, and then the hardware will
transmit the packet using the information in the tx descriptor.
In addition, we also implemented the tx_handler function to enable the
tx descriptor to be reused.
Justin Lai [Wed, 4 Sep 2024 03:21:06 +0000 (11:21 +0800)]
rtase: Implement hardware configuration function
Implement rtase_hw_config to set default hardware settings, including
setting interrupt mitigation, tx/rx DMA burst, interframe gap time,
rx packet filter, near fifo threshold and fill descriptor ring and
tally counter address, and enable flow control. When filling the
rx descriptor ring, the first group of queues needs to be processed
separately because the positions of the first group of queues are not
regular with other subsequent groups. The other queues are all newly
added features, but we want to retain the original design. So they were
not put together.
Justin Lai [Wed, 4 Sep 2024 03:21:05 +0000 (11:21 +0800)]
rtase: Implement the interrupt routine and rtase_poll
1. Implement rtase_interrupt to handle txQ0/rxQ0, txQ4~txQ7 interrupts,
and implement rtase_q_interrupt to handle txQ1/rxQ1, txQ2/rxQ2 and
txQ3/rxQ3 interrupts.
2. Implement rtase_poll to call ring_handler to process the tx or
rx packet of each ring. If the returned value is budget,it means that
there is still work of a certain ring that has not yet been completed.
Justin Lai [Wed, 4 Sep 2024 03:21:03 +0000 (11:21 +0800)]
rtase: Implement the .ndo_open function
Implement the .ndo_open function to set default hardware settings
and initialize the descriptor ring and interrupts. Among them,
when requesting interrupt, because the first group of interrupts
needs to process more events, the overall structure and interrupt
handler will be different from other groups of interrupts, so it
needs to be handled separately. The first set of interrupt handlers
need to handle the interrupt status of RXQ0 and TXQ0, TXQ4~7,
while other groups of interrupt handlers will handle the interrupt
status of RXQ1&TXQ1 or RXQ2&TXQ2 or RXQ3&TXQ3 according to the
interrupt vector.
Joe Damato [Wed, 4 Sep 2024 15:34:30 +0000 (15:34 +0000)]
net: napi: Prevent overflow of napi_defer_hard_irqs
In commit 6f8b12d661d0 ("net: napi: add hard irqs deferral feature")
napi_defer_irqs was added to net_device and napi_defer_irqs_count was
added to napi_struct, both as type int.
This value never goes below zero, so there is not reason for it to be a
signed int. Change the type for both from int to u32, and add an
overflow check to sysfs to limit the value to S32_MAX.
The limit of S32_MAX was chosen because the practical limit before this
patch was S32_MAX (anything larger was an overflow) and thus there are
no behavioral changes introduced. If the extra bit is needed in the
future, the limit can be raised.
Merge tag 'net-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from can, bluetooth and wireless.
No known regressions at this point. Another calm week, but chances are
that has more to do with vacation season than the quality of our work.
Current release - new code bugs:
- smc: prevent NULL pointer dereference in txopt_get
- eth: ti: am65-cpsw: number of XDP-related fixes
Previous releases - regressions:
- Revert "Bluetooth: MGMT/SMP: Fix address type when using SMP over
BREDR/LE", it breaks existing user space
- Bluetooth: qca: if memdump doesn't work, re-enable IBS to avoid
later problems with suspend
- can: mcp251x: fix deadlock if an interrupt occurs during
mcp251x_open
- eth: r8152: fix the firmware communication error due to use of bulk
write
- ptp: ocp: fix serial port information export
- eth: igb: fix not clearing TimeSync interrupts for 82580
- Revert "wifi: ath11k: support hibernation", fix suspend on Lenovo
Previous releases - always broken:
- eth: intel: fix crashes and bugs when reconfiguration and resets
happening in parallel
- wifi: ath11k: fix NULL dereference in ath11k_mac_get_eirp_power()
Misc:
- docs: netdev: document guidance on cleanup.h"
* tag 'net-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
ila: call nf_unregister_net_hooks() sooner
tools/net/ynl: fix cli.py --subscribe feature
MAINTAINERS: fix ptp ocp driver maintainers address
selftests: net: enable bind tests
net: dsa: vsc73xx: fix possible subblocks range of CAPT block
sched: sch_cake: fix bulk flow accounting logic for host fairness
docs: netdev: document guidance on cleanup.h
net: xilinx: axienet: Fix race in axienet_stop
net: bridge: br_fdb_external_learn_add(): always set EXT_LEARN
r8152: fix the firmware doesn't work
fou: Fix null-ptr-deref in GRO.
bareudp: Fix device stats updates.
net: mana: Fix error handling in mana_create_txq/rxq's NAPI cleanup
bpf, net: Fix a potential race in do_sock_getsockopt()
net: dqs: Do not use extern for unused dql_group
sch/netem: fix use after free in netem_dequeue
usbnet: modern method to get random MAC
MAINTAINERS: wifi: cw1200: add net-cw1200.h
ice: do not bring the VSI up, if it was down before the XDP setup
ice: remove ICE_CFG_BUSY locking from AF_XDP code
...
Merge tag 'spi-fix-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few small driver specific fixes (including some of the widespread
work on fixing missing ID tables for module autoloading and the revert
of some problematic PM work in spi-rockchip), some improvements to the
MAINTAINERS information for the NXP drivers and the addition of a new
device ID to spidev"
* tag 'spi-fix-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
MAINTAINERS: SPI: Add mailing list imx@lists.linux.dev for nxp spi drivers
MAINTAINERS: SPI: Add freescale lpspi maintainer information
spi: spi-fsl-lpspi: Fix off-by-one in prescale max
spi: spidev: Add missing spi_device_id for jg10309-01
spi: bcm63xx: Enable module autoloading
spi: intel: Add check devm_kasprintf() returned value
spi: spidev: Add an entry for elgin,jg10309-01
spi: rockchip: Resolve unbalanced runtime PM / system PM handling
Merge tag 'regulator-fix-v6.11-stub' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"A fix from Doug Anderson for a missing stub, required to fix the build
for some newly added users of devm_regulator_bulk_get_const() in
!REGULATOR configurations"
* tag 'regulator-fix-v6.11-stub' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: core: Stub devm_regulator_bulk_get_const() if !CONFIG_REGULATOR
Merge tag 'rust-fixes-6.11-2' of https://github.com/Rust-for-Linux/linux
Pull Rust fixes from Miguel Ojeda:
"Toolchain and infrastructure:
- Fix builds for nightly compiler users now that 'new_uninit' was
split into new features by using an alternative approach for the
code that used what is now called the 'box_uninit_write' feature
- Allow the 'stable_features' lint to preempt upcoming warnings about
them, since soon there will be unstable features that will become
stable in nightly compilers
- Export bss symbols too
'kernel' crate:
- 'block' module: fix wrong usage of lockdep API
'macros' crate:
- Provide correct provenance when constructing 'THIS_MODULE'
Documentation:
- Remove unintended indentation (blockquotes) in generated output
- Fix a couple typos
MAINTAINERS:
- Remove Wedson as Rust maintainer
- Update Andreas' email"
* tag 'rust-fixes-6.11-2' of https://github.com/Rust-for-Linux/linux:
MAINTAINERS: update Andreas Hindborg's email address
MAINTAINERS: Remove Wedson as Rust maintainer
rust: macros: provide correct provenance when constructing THIS_MODULE
rust: allow `stable_features` lint
docs: rust: remove unintended blockquote in Quick Start
rust: alloc: eschew `Box<MaybeUninit<T>>::write`
rust: kernel: fix typos in code comments
docs: rust: remove unintended blockquote in Coding Guidelines
rust: block: fix wrong usage of lockdep API
rust: kbuild: fix export of bss symbols
Merge tag 'trace-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix adding a new fgraph callback after function graph tracing has
already started.
If the new caller does not initialize its hash before registering the
fgraph_ops, it can cause a NULL pointer dereference. Fix this by
adding a new parameter to ftrace_graph_enable_direct() passing in the
newly added gops directly and not rely on using the fgraph_array[],
as entries in the fgraph_array[] must be initialized.
Assign the new gops to the fgraph_array[] after it goes through
ftrace_startup_subops() as that will properly initialize the
gops->ops and initialize its hashes.
- Fix a memory leak in fgraph storage memory test.
If the "multiple fgraph storage on a function" boot up selftest fails
in the registering of the function graph tracer, it will not free the
memory it allocated for the filter. Break the loop up into two where
it allocates the filters first and then registers the functions where
any errors will do the appropriate clean ups.
- Only clear the timerlat timers if it has an associated kthread.
In the rtla tool that uses timerlat, if it was killed just as it was
shutting down, the signals can free the kthread and the timer. But
the closing of the timerlat files could cause the hrtimer_cancel() to
be called on the already freed timer. As the kthread variable is is
set to NULL when the kthreads are stopped and the timers are freed it
can be used to know not to call hrtimer_cancel() on the timer if the
kthread variable is NULL.
- Use a cpumask to keep track of osnoise/timerlat kthreads
The timerlat tracer can use user space threads for its analysis. With
the killing of the rtla tool, the kernel can get confused between if
it is using a user space thread to analyze or one of its own kernel
threads. When this confusion happens, kthread_stop() can be called on
a user space thread and bad things happen. As the kernel threads are
per-cpu, a bitmask can be used to know when a kernel thread is used
or when a user space thread is used.
- Add missing interface_lock to osnoise/timerlat stop_kthread()
The stop_kthread() function in osnoise/timerlat clears the osnoise
kthread variable, and if it was a user space thread does a put_task
on it. But this can race with the closing of the timerlat files that
also does a put_task on the kthread, and if the race happens the task
will have put_task called on it twice and oops.
- Add cond_resched() to the tracing_iter_reset() loop.
The latency tracers keep writing to the ring buffer without resetting
when it issues a new "start" event (like interrupts being disabled).
When reading the buffer with an iterator, the tracing_iter_reset()
sets its pointer to that start event by walking through all the
events in the buffer until it gets to the time stamp of the start
event. In the case of a very large buffer, the loop that looks for
the start event has been reported taking a very long time with a non
preempt kernel that it can trigger a soft lock up warning. Add a
cond_resched() into that loop to make sure that doesn't happen.
- Use list_del_rcu() for eventfs ei->list variable
It was reported that running loops of creating and deleting kprobe
events could cause a crash due to the eventfs list iteration hitting
a LIST_POISON variable. This is because the list is protected by SRCU
but when an item is deleted from the list, it was using list_del()
which poisons the "next" pointer. This is what list_del_rcu() was to
prevent.
* tag 'trace-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/timerlat: Add interface_lock around clearing of kthread in stop_kthread()
tracing/timerlat: Only clear timer if a kthread exists
tracing/osnoise: Use a cpumask to know what threads are kthreads
eventfs: Use list_del_rcu() for SRCU protected list variable
tracing: Avoid possible softlockup in tracing_iter_reset()
tracing: Fix memory leak in fgraph storage selftest
tracing: fgraph: Fix to add new fgraph_ops to array after ftrace_startup_subops()
Memory state around the buggy address: ffff88806461ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88806461ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888064620000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
^ ffff888064620080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff888064620100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
Fixes: 7f00feaf1076 ("ila: Add generic ILA translation facility") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Reviewed-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20240904144418.1162839-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Execution of command:
./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml /
--subscribe "monitor" --sleep 10
fails with:
File "/repo/./tools/net/ynl/cli.py", line 109, in main
ynl.check_ntf()
File "/repo/tools/net/ynl/lib/ynl.py", line 924, in check_ntf
op = self.rsp_by_value[nl_msg.cmd()]
KeyError: 19
Parsing Generic Netlink notification messages performs lookup for op in
the message. The message was not yet decoded, and is not yet considered
GenlMsg, thus msg.cmd() returns Generic Netlink family id (19) instead of
proper notification command id (i.e.: DPLL_CMD_PIN_CHANGE_NTF=13).
Allow the op to be obtained within NetlinkProtocol.decode(..) itself if the
op was not passed to the decode function, thus allow parsing of Generic
Netlink notifications without causing the failure.