]> git.dujemihanovic.xyz Git - linux.git/log
linux.git
5 weeks agoeth: fbnic: Add devlink firmware version info
Lee Trager [Thu, 5 Sep 2024 23:37:51 +0000 (16:37 -0700)]
eth: fbnic: Add devlink firmware version info

This adds support to show firmware version information for both stored and
running firmware versions. The version and commit is displayed separately
to aid monitoring tools which only care about the version.

Example output:
  # devlink dev info
  pci/0000:01:00.0:
    driver fbnic
    serial_number 88-25-08-ff-ff-01-50-92
    versions:
        running:
          fw 24.07.15-017
          fw.commit h999784ae9df0
          fw.bootloader 24.07.10-000
          fw.bootloader.commit hfef3ac835ce7
        stored:
          fw 24.07.24-002
          fw.commit hc9d14a68b3f2
          fw.bootloader 24.07.22-000
          fw.bootloader.commit h922f8493eb96
          fw.undi 01.00.03-000

Signed-off-by: Lee Trager <lee@trager.us>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240905233820.1713043-1-lee@trager.us
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agoMerge branch 'net-lan966x-use-the-newly-introduced-fdma-library'
Paolo Abeni [Tue, 10 Sep 2024 09:04:18 +0000 (11:04 +0200)]
Merge branch 'net-lan966x-use-the-newly-introduced-fdma-library'

Daniel Machon says:

====================
net: lan966x: use the newly introduced FDMA library

This patch series is the second of a 2-part series [1], that adds a new
common FDMA library for Microchip switch chips Sparx5 and lan966x. These
chips share the same FDMA engine, and as such will benefit from a common
library with a common implementation.  This also has the benefit of
removing a lot of open-coded bookkeeping and duplicate code for the two
drivers.

In this second series, the FDMA library will be taken into use by the
lan966x switch driver.

 ###################
 # Example of use: #
 ###################

- Initialize the rx and tx fdma structs with values for: number of
  DCB's, number of DB's, channel ID, DB size (data buffer size), and
  total size of the requested memory. Also provide two callbacks:
  nextptr_cb() and dataptr_cb() for getting the nextptr and dataptr.

- Allocate memory using fdma_alloc_phys() or fdma_alloc_coherent().

- Initialize the DCB's with fdma_dcb_init().

- Add new DCB's with fdma_dcb_add().

- Free memory with fdma_free_phys() or fdma_free_coherent().

 #####################
 # Patch  breakdown: #
 #####################

Patch #1:  select FDMA library for lan966x.

Patch #2:  includes the fdma_api.h header and removes old symbols.

Patch #3:  replaces old rx and tx variables with equivalent ones from the
           fdma struct. Only the variables that can be changed without
           breaking traffic is changed in this patch.

Patch #4:  uses the library for allocation of rx buffers. This requires
           quite a bit of refactoring in this single patch.

Patch #5:  uses the library for adding DCB's in the rx path.

Patch #6:  uses the library for freeing rx buffers.

Patch #7:  uses the library for allocation of tx buffers. This requires
           quite a bit of refactoring in this single patch.

Patch #8:  uses the library for adding DCB's in the tx path.

Patch #9:  uses the library helpers in the tx path.

Patch #10: ditch last_in_use variable and use library instead.

Patch #11: uses library helpers throughout.

Patch #12: refactor lan966x_fdma_reload() function.

[1] https://lore.kernel.org/netdev/20240902-fdma-sparx5-v1-0-1e7d5e5a9f34@microchip.com/

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
====================

Link: https://patch.msgid.link/20240905-fdma-lan966x-v1-0-e083f8620165@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agonet: lan966x: refactor buffer reload function
Daniel Machon [Thu, 5 Sep 2024 08:06:40 +0000 (10:06 +0200)]
net: lan966x: refactor buffer reload function

Now that we store everything in the fdma structs, refactor
lan966x_fdma_reload() to store and restore the entire struct.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agonet: lan966x: use a few FDMA helpers throughout
Daniel Machon [Thu, 5 Sep 2024 08:06:39 +0000 (10:06 +0200)]
net: lan966x: use a few FDMA helpers throughout

The library provides helpers for a number of DCB and DB operations. Use
these throughout the code and remove the old ones.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agonet: lan966x: ditch tx->last_in_use variable
Daniel Machon [Thu, 5 Sep 2024 08:06:38 +0000 (10:06 +0200)]
net: lan966x: ditch tx->last_in_use variable

This variable is used in the tx path to determine the last used DCB. The
library has the variable last_dcb for the exact same purpose. Ditch the
last_in_use variable throughout.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agonet: lan966x: use library helper for freeing tx buffers
Daniel Machon [Thu, 5 Sep 2024 08:06:37 +0000 (10:06 +0200)]
net: lan966x: use library helper for freeing tx buffers

The library has the helper fdma_free_phys() for freeing physical FDMA
memory. Use it in the exit path.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agonet: lan966x: use FDMA library for adding DCB's in the tx path
Daniel Machon [Thu, 5 Sep 2024 08:06:36 +0000 (10:06 +0200)]
net: lan966x: use FDMA library for adding DCB's in the tx path

Use the fdma_dcb_add() function to add DCB's in the tx path. This gets
rid of the open-coding of nextptr and dataptr handling and leaves it to
the library.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agonet: lan966x: use the FDMA library for allocation of tx buffers
Daniel Machon [Thu, 5 Sep 2024 08:06:35 +0000 (10:06 +0200)]
net: lan966x: use the FDMA library for allocation of tx buffers

Use the two functions: fdma_alloc_phys() and fdma_dcb_init() for rx
buffer allocation and use the new buffers throughout.

In order to replace the old buffers with the new ones, we have to do the
following refactoring:

    - use fdma_alloc_phys() and fdma_dcb_init()

    - replace the variables: tx->dma, tx->dcbs and tx->curr_entry
      with the equivalents from the FDMA struct.

    - add lan966x_fdma_tx_dataptr_cb callback for obtaining the dataptr.

    - Initialize FDMA struct values.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agonet: lan966x: use library helper for freeing rx buffers
Daniel Machon [Thu, 5 Sep 2024 08:06:34 +0000 (10:06 +0200)]
net: lan966x: use library helper for freeing rx buffers

The library has the helper fdma_free_phys() for freeing physical FDMA
memory. Use it in the exit path.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agonet: lan966x: use FDMA library for adding DCB's in the rx path
Daniel Machon [Thu, 5 Sep 2024 08:06:33 +0000 (10:06 +0200)]
net: lan966x: use FDMA library for adding DCB's in the rx path

Use the fdma_dcb_add() function to add DCB's in the rx path. This gets
rid of the open-coding of nextptr and dataptr handling and the functions
for adding DCB's.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agonet: lan966x: use the FDMA library for allocation of rx buffers
Daniel Machon [Thu, 5 Sep 2024 08:06:32 +0000 (10:06 +0200)]
net: lan966x: use the FDMA library for allocation of rx buffers

Use the two functions: fdma_alloc_phys() and fdma_dcb_init() for rx
buffer allocation and use the new buffers throughout.

In order to replace the old buffers with the new ones, we have to do the
following refactoring:

    - use fdma_alloc_phys() and fdma_dcb_init()

    - replace the variables: rx->dma, rx->dcbs and rx->last_entry
      with the equivalents from the FDMA struct.

    - make use of fdma->db_size for rx buffer size.

    - add lan966x_fdma_rx_dataptr_cb callback for obtaining the dataptr.

    - Initialize FDMA struct values.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agonet: lan966x: replace a few variables with new equivalent ones
Daniel Machon [Thu, 5 Sep 2024 08:06:31 +0000 (10:06 +0200)]
net: lan966x: replace a few variables with new equivalent ones

Replace the old rx and tx variables: channel_id, FDMA_DCB_MAX,
FDMA_RX_DCB_MAX_DBS, FDMA_TX_DCB_MAX_DBS, dcb_index and db_index with
the equivalents from the FDMA rx and tx structs. These variables are not
entangled in any buffer allocation and can therefore be replaced in
advance.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agonet: lan966x: use FDMA library symbols
Daniel Machon [Thu, 5 Sep 2024 08:06:30 +0000 (10:06 +0200)]
net: lan966x: use FDMA library symbols

Include and use the new FDMA header, which now provides the required
masks and bit offsets for operating on the DCB's and DB's.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agonet: lan966x: select FDMA library
Daniel Machon [Thu, 5 Sep 2024 08:06:29 +0000 (10:06 +0200)]
net: lan966x: select FDMA library

Select the newly introduced FDMA library.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agoMerge branch 'ionic-convert-rx-queue-buffers-to-use-page_pool'
Jakub Kicinski [Tue, 10 Sep 2024 02:20:42 +0000 (19:20 -0700)]
Merge branch 'ionic-convert-rx-queue-buffers-to-use-page_pool'

Brett Creeley says:

====================
ionic: convert Rx queue buffers to use page_pool

Our home-grown buffer management needs to go away and we need to play
nicely with the page_pool infrastructure.  This patchset cleans up some
of our API use and converts the Rx traffic queues to use page_pool.
The first few patches are for tidying up things, then a small XDP
configuration refactor, adding page_pool support, and finally adding
support to hot swap an XDP program without having to reconfigure
anything.

The result is code that more closely follows current patterns, as well as
a either a performance boost or equivalent performance as seen with
iperf testing:

  mss   netio    tx_pps      rx_pps      total_pps      tx_bw    rx_bw    total_bw
  ----  -------  ----------  ----------  -----------  -------  -------  ----------
Before:
  256   bidir    13,839,293  15,515,227  29,354,520        34       38          71
  512   bidir    13,913,249  14,671,693  28,584,942        62       65         127
  1024  bidir    13,006,189  13,695,413  26,701,602       109      115         224
  1448  bidir    12,489,905  12,791,734  25,281,639       145      149         294
  2048  bidir    9,195,622   9,247,649   18,443,271       148      149         297
  4096  bidir    5,149,716   5,247,917   10,397,633       160      163         323
  8192  bidir    3,029,993   3,008,882   6,038,875        179      179         358
  9000  bidir    2,789,358   2,800,744   5,590,102        181      180         361

After:
  256   bidir    21,540,037  21,344,644  42,884,681        52       52         104
  512   bidir    23,170,014  19,207,260  42,377,274       103       85         188
  1024  bidir    17,934,280  17,819,247  35,753,527       150      149         299
  1448  bidir    15,242,515  14,907,030  30,149,545       167      174         341
  2048  bidir    10,692,542  10,663,023  21,355,565       177      176         353
  4096  bidir    6,024,977   6,083,580   12,108,557       187      180         367
  8192  bidir    3,090,449   3,048,266   6,138,715        180      176         356
  9000  bidir    2,859,146   2,864,226   5,723,372        178      180         358

v2: https://lore.kernel.org/20240826184422.21895-1-brett.creeley@amd.com
v1: https://lore.kernel.org/20240625165658.34598-1-shannon.nelson@amd.com
====================

Link: https://patch.msgid.link/20240906232623.39651-1-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoionic: Allow XDP program to be hot swapped
Brett Creeley [Fri, 6 Sep 2024 23:26:23 +0000 (16:26 -0700)]
ionic: Allow XDP program to be hot swapped

Using examples of other driver(s), add the ability to hot-swap an XDP
program without having to reconfigure the queues. To prevent the
q->xdp_prog to be read/written more than once use READ_ONCE() and
WRITE_ONCE() on the q->xdp_prog.

The q->xdp_prog was being checked in multiple different for loops in the
hot path. The change to allow xdp_prog hot swapping created the
possibility for many READ_ONCE(q->xdp_prog) calls during a single napi
callback. Refactor the Rx napi handling to allow a previous
READ_ONCE(q->xdp_prog) (or NULL for hwstamp_rxq) to be passed into the
relevant functions.

Also, move other Rx related hotpath handling into the newly created
ionic_rx_cq_service() function to reduce the scope of the xdp_prog
local variable and put all Rx handling in one function similar to Tx.

Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-8-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoionic: convert Rx queue buffers to use page_pool
Shannon Nelson [Fri, 6 Sep 2024 23:26:22 +0000 (16:26 -0700)]
ionic: convert Rx queue buffers to use page_pool

Our home-grown buffer management needs to go away and we need
to be playing nicely with the page_pool infrastructure.  This
converts the Rx traffic queues to use page_pool.

Also, since ionic_rx_buf_size() was removed, redefine
IONIC_PAGE_SIZE to account for IONIC_MAX_BUF_LEN being the
largest allowed buffer to prevent overflowing u16 variables,
which could happen when PAGE_SIZE is defined as >= 64KB.

include/linux/minmax.h:93:37: warning: conversion from 'long unsigned int' to 'u16' {aka 'short unsigned int'} changes value from '65536' to '0' [-Woverflow]

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-7-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoionic: Fully reconfigure queues when going to/from a NULL XDP program
Brett Creeley [Fri, 6 Sep 2024 23:26:21 +0000 (16:26 -0700)]
ionic: Fully reconfigure queues when going to/from a NULL XDP program

Currently when going to/from a NULL XDP program the driver uses
ionic_stop_queues_reconfig() and then ionic_start_queues_reconfig() in
order to re-register the xdp_rxq_info and re-init the queues. This is
fine until page_pool(s) are used in an upcoming patch.

In preparation for adding page_pool support make sure to completely
rebuild the queues when going to/from a NULL XDP program. Without this
change the call to mem_allocator_disconnect() never happens when going
to a NULL XDP program, which eventually results in
xdp_rxq_info_reg_mem_model() failing with -ENOSPC due to the mem_id_pool
ida having no remaining space.

Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-6-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoionic: always use rxq_info
Shannon Nelson [Fri, 6 Sep 2024 23:26:20 +0000 (16:26 -0700)]
ionic: always use rxq_info

Instead of setting up and tearing down the rxq_info only when the XDP
program is loaded or unloaded, we will build the rxq_info whether or not
XDP is in use.  This is the more common use pattern and better supports
future conversion to page_pool.  Since the rxq_info wants the napi_id
we re-order things slightly to tie this into the queue init and deinit
functions where we do the add and delete of napi.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-5-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoionic: use per-queue xdp_prog
Shannon Nelson [Fri, 6 Sep 2024 23:26:19 +0000 (16:26 -0700)]
ionic: use per-queue xdp_prog

We originally were using a per-interface xdp_prog variable to track
a loaded XDP program since we knew there would never be support for a
per-queue XDP program.  With that, we only built the per queue rxq_info
struct when an XDP program was loaded and removed it on XDP program unload,
and used the pointer as an indicator in the Rx hotpath to know to how build
the buffers.  However, that's really not the model generally used, and
makes a conversion to page_pool Rx buffer cacheing a little problematic.

This patch converts the driver to use the more common approach of using
a per-queue xdp_prog pointer to work out buffer allocations and need
for bpf_prog_run_xdp().  We jostle a couple of fields in the queue struct
in order to keep the new xdp_prog pointer in a warm cacheline.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-4-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoionic: rename ionic_xdp_rx_put_bufs
Shannon Nelson [Fri, 6 Sep 2024 23:26:18 +0000 (16:26 -0700)]
ionic: rename ionic_xdp_rx_put_bufs

We aren't "putting" buf, we're just unlinking them from our tracking in
order to let the XDP_TX and XDP_REDIRECT tx clean paths take care of the
pages when they are done with them.  This rename clears up the intent.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-3-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoionic: debug line for Tx completion errors
Shannon Nelson [Fri, 6 Sep 2024 23:26:17 +0000 (16:26 -0700)]
ionic: debug line for Tx completion errors

Here's a little debugging aid in case the device starts throwing
Tx completion errors.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-2-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'rx-software-timestamp-for-all-round-3'
Jakub Kicinski [Tue, 10 Sep 2024 00:44:52 +0000 (17:44 -0700)]
Merge branch 'rx-software-timestamp-for-all-round-3'

Gal Pressman says:

====================
RX software timestamp for all - round 3

Rounds 1 & 2 of drivers conversion were merged [1][2], this round will
complete the work.

[1] https://lore.kernel.org/netdev/20240901112803.212753-1-gal@nvidia.com/
[2] https://lore.kernel.org/netdev/20240904074922.256275-1-gal@nvidia.com/
====================

Link: https://patch.msgid.link/20240906144632.404651-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoptp: ptp_ines: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:32 +0000 (17:46 +0300)]
ptp: ptp_ines: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-17-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoixp4xx_eth: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:31 +0000 (17:46 +0300)]
ixp4xx_eth: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://patch.msgid.link/20240906144632.404651-16-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: stmmac: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:30 +0000 (17:46 +0300)]
net: stmmac: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-15-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agosfc/siena: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:29 +0000 (17:46 +0300)]
sfc/siena: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://patch.msgid.link/20240906144632.404651-14-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agosfc: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:28 +0000 (17:46 +0300)]
sfc: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://patch.msgid.link/20240906144632.404651-13-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoqede: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:27 +0000 (17:46 +0300)]
qede: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-12-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: mscc: ocelot: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:26 +0000 (17:46 +0300)]
net: mscc: ocelot: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-11-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet/funeth: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:25 +0000 (17:46 +0300)]
net/funeth: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-10-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoenic: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:24 +0000 (17:46 +0300)]
enic: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-9-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: thunderx: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:23 +0000 (17:46 +0300)]
net: thunderx: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-8-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoliquidio: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:22 +0000 (17:46 +0300)]
liquidio: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-7-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: macb: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:21 +0000 (17:46 +0300)]
net: macb: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Link: https://patch.msgid.link/20240906144632.404651-6-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoamd-xgbe: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:20 +0000 (17:46 +0300)]
amd-xgbe: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Link: https://patch.msgid.link/20240906144632.404651-5-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agobonding: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:19 +0000 (17:46 +0300)]
bonding: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-4-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agotg3: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:18 +0000 (17:46 +0300)]
tg3: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20240906144632.404651-3-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agobnxt_en: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:17 +0000 (17:46 +0300)]
bnxt_en: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20240906144632.404651-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: ti: icssg-prueth: Make pa_stats optional
MD Danish Anwar [Fri, 6 Sep 2024 09:36:49 +0000 (15:06 +0530)]
net: ti: icssg-prueth: Make pa_stats optional

pa_stats is optional in dt bindings, make it optional in driver as well.
Currently if pa_stats syscon regmap is not found driver returns -ENODEV.
Fix this by not returning an error in case pa_stats is not found and
continue generating ethtool stats without pa_stats.

Fixes: 550ee90ac61c ("net: ti: icssg-prueth: Add support for PA Stats")
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20240906093649.870883-1-danishanwar@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: ibm: emac: Use __iomem annotation for emac_[xg]aht_base
Simon Horman [Fri, 6 Sep 2024 07:36:09 +0000 (08:36 +0100)]
net: ibm: emac: Use __iomem annotation for emac_[xg]aht_base

dev->emacp contains an __iomem pointer and values derived
from it are used as __iomem pointers. So use this annotation
in the return type for helpers that derive pointers from dev->emacp.

Flagged by Sparse as:

.../core.c:444:36: warning: incorrect type in argument 1 (different address spaces)
.../core.c:444:36:    expected unsigned int volatile [noderef] [usertype] __iomem *addr
.../core.c:444:36:    got unsigned int [usertype] *
.../core.c: note: in included file:
.../core.h:416:25: warning: cast removes address space '__iomem' of expression

Compile tested only.
No functional change intended.

Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240906-emac-iomem-v1-1-207cc4f3fed0@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'selftests-net-add-packetdrill'
Jakub Kicinski [Tue, 10 Sep 2024 00:38:03 +0000 (17:38 -0700)]
Merge branch 'selftests-net-add-packetdrill'

Willem de Bruijn says:

====================
selftests/net: add packetdrill

Lay the groundwork to import into kselftests the over 150 packetdrill
TCP/IP conformance tests on github.com/google/packetdrill.

1/2: add kselftest infra for TEST_PROGS that need an interpreter

2/2: add the specific packetdrill tests
====================

Link: https://patch.msgid.link/20240905231653.2427327-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoselftests/net: integrate packetdrill with ksft
Willem de Bruijn [Thu, 5 Sep 2024 23:15:52 +0000 (19:15 -0400)]
selftests/net: integrate packetdrill with ksft

Lay the groundwork to import into kselftests the over 150 packetdrill
TCP/IP conformance tests on github.com/google/packetdrill.

Florian recently added support for packetdrill tests in nf_conntrack,
in commit a8a388c2aae49 ("selftests: netfilter: add packetdrill based
conntrack tests").

This patch takes a slightly different approach. It relies on
ksft_runner.sh to run every *.pkt file in the directory.

Any future imports of packetdrill tests should require no additional
coding. Just add the *.pkt files.

Initially import only two features/directories from github. One with a
single script, and one with two. This was the only reason to pick
tcp/inq and tcp/md5.

The path replaces the directory hierarchy in github with a flat space
of files: $(subst /,_,$(wildcard tcp/**/*.pkt)). This is the most
straightforward option to integrate with kselftests. The Linked thread
reviewed two ways to maintain the hierarchy: TEST_PROGS_RECURSE and
PRESERVE_TEST_DIRS. But both introduce significant changes to
kselftest infra and with that risk to existing tests.

Implementation notes:
- restore alphabetical order when adding the new directory to
  tools/testing/selftests/Makefile
- imported *.pkt files and support verbatim from the github project,
  except for
    - update `source ./defaults.sh` path (to adjust for flat dir)
    - add SPDX headers
    - remove one author statement
    - Acknowledgment: drop an e (checkpatch)

Tested:
make -C tools/testing/selftests \
  TARGETS=net/packetdrill \
  run_tests

make -C tools/testing/selftests \
  TARGETS=net/packetdrill \
  install INSTALL_PATH=$KSFT_INSTALL_PATH

# in virtme-ng
./run_kselftest.sh -c net/packetdrill
./run_kselftest.sh -t net/packetdrill:tcp_inq_client.pkt

Link: https://lore.kernel.org/netdev/20240827193417.2792223-1-willemdebruijn.kernel@gmail.com/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240905231653.2427327-3-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoselftests: support interpreted scripts with ksft_runner.sh
Willem de Bruijn [Thu, 5 Sep 2024 23:15:51 +0000 (19:15 -0400)]
selftests: support interpreted scripts with ksft_runner.sh

Support testcases that are themselves not executable, but need an
interpreter to run them.

If a test file is not executable, but an executable file
ksft_runner.sh exists in the TARGET dir, kselftest will run

    ./ksft_runner.sh ./$BASENAME_TEST

Packetdrill may add hundreds of packetdrill scripts for testing. These
scripts must be passed to the packetdrill process.

Have kselftest run each test directly, as it already solves common
runner requirements like parallel execution and isolation (netns).
A previous RFC added a wrapper in between, which would have to
reimplement such functionality.

Link: https://lore.kernel.org/netdev/66d4d97a4cac_3df182941a@willemb.c.googlers.com.notmuch/T/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240905231653.2427327-2-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'various-cleanups'
Jakub Kicinski [Tue, 10 Sep 2024 00:17:45 +0000 (17:17 -0700)]
Merge branch 'various-cleanups'

Rosen Penev says:

====================
various cleanups

Allow CI to build. Also a bugfix for dual GMAC devices.
====================

Link: https://patch.msgid.link/20240905194938.8453-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: ag71xx: disable napi interrupts during probe
Sven Eckelmann [Thu, 5 Sep 2024 19:49:38 +0000 (12:49 -0700)]
net: ag71xx: disable napi interrupts during probe

ag71xx_probe is registering ag71xx_interrupt as handler for gmac0/gmac1
interrupts. The handler is trying to use napi_schedule to handle the
processing of packets. But the netif_napi_add for this device is
called a lot later in ag71xx_probe.

It can therefore happen that a still running gmac0/gmac1 is triggering the
interrupt handler with a bit from AG71XX_INT_POLL set in
AG71XX_REG_INT_STATUS. The handler will then call napi_schedule and the
napi code will crash the system because the ag->napi is not yet
initialized.

The gmcc0/gmac1 must be brought in a state in which it doesn't signal a
AG71XX_INT_POLL related status bits as interrupt before registering the
interrupt handler. ag71xx_hw_start will take care of re-initializing the
AG71XX_REG_INT_ENABLE.

This will become relevant when dual GMAC devices get added here.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20240905194938.8453-8-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: ag71xx: remove always true branch
Rosen Penev [Thu, 5 Sep 2024 19:49:37 +0000 (12:49 -0700)]
net: ag71xx: remove always true branch

The opposite of this condition is checked above and if true, function
returns. Which means this can never be false.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20240905194938.8453-7-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: ag71xx: get reset control using devm api
Rosen Penev [Thu, 5 Sep 2024 19:49:36 +0000 (12:49 -0700)]
net: ag71xx: get reset control using devm api

Currently, the of variant is missing reset_control_put in error paths.
The devm variant does not require it.

Allows removing mdio_reset from the struct as it is not used outside the
function.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20240905194938.8453-6-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: ag71xx: use ethtool_puts
Rosen Penev [Thu, 5 Sep 2024 19:49:35 +0000 (12:49 -0700)]
net: ag71xx: use ethtool_puts

Allows simplifying get_strings and avoids manual pointer manipulation.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20240905194938.8453-5-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: ag71xx: update FIFO bits and descriptions
Rosen Penev [Thu, 5 Sep 2024 19:49:34 +0000 (12:49 -0700)]
net: ag71xx: update FIFO bits and descriptions

Taken from QCA SDK. No functional difference as same bits get applied.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20240905194938.8453-4-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: ag71xx: add MODULE_DESCRIPTION
Rosen Penev [Thu, 5 Sep 2024 19:49:33 +0000 (12:49 -0700)]
net: ag71xx: add MODULE_DESCRIPTION

Now that COMPILE_TEST is enabled, it gets flagged when building with
allmodconfig W=1 builds. Text taken from the beginning of the file.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240905194938.8453-3-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: ag71xx: add COMPILE_TEST to test compilation
Rosen Penev [Thu, 5 Sep 2024 19:49:32 +0000 (12:49 -0700)]
net: ag71xx: add COMPILE_TEST to test compilation

While this driver is meant for MIPS only, it can be compiled on x86 just
fine. Remove pointless parentheses while at it.

Enables CI building of this driver.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240905194938.8453-2-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'af_unix-correct-manage_oob-when-oob-follows-a-consumed-oob'
Jakub Kicinski [Tue, 10 Sep 2024 00:14:28 +0000 (17:14 -0700)]
Merge branch 'af_unix-correct-manage_oob-when-oob-follows-a-consumed-oob'

Kuniyuki Iwashima says:

====================
af_unix: Correct manage_oob() when OOB follows a consumed OOB.

Recently syzkaller reported UAF of OOB skb.

The bug was introduced by commit 93c99f21db36 ("af_unix: Don't stop
recv(MSG_DONTWAIT) if consumed OOB skb is at the head.") but uncovered
by another recent commit 8594d9b85c07 ("af_unix: Don't call skb_get()
for OOB skb.").

[0]: https://lore.kernel.org/netdev/00000000000083b05a06214c9ddc@google.com/
====================

Link: https://patch.msgid.link/20240905193240.17565-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoaf_unix: Don't return OOB skb in manage_oob().
Kuniyuki Iwashima [Thu, 5 Sep 2024 19:32:40 +0000 (12:32 -0700)]
af_unix: Don't return OOB skb in manage_oob().

syzbot reported use-after-free in unix_stream_recv_urg(). [0]

The scenario is

  1. send(MSG_OOB)
  2. recv(MSG_OOB)
     -> The consumed OOB remains in recv queue
  3. send(MSG_OOB)
  4. recv()
     -> manage_oob() returns the next skb of the consumed OOB
     -> This is also OOB, but unix_sk(sk)->oob_skb is not cleared
  5. recv(MSG_OOB)
     -> unix_sk(sk)->oob_skb is used but already freed

The recent commit 8594d9b85c07 ("af_unix: Don't call skb_get() for OOB
skb.") uncovered the issue.

If the OOB skb is consumed and the next skb is peeked in manage_oob(),
we still need to check if the skb is OOB.

Let's do so by falling back to the following checks in manage_oob()
and add the test case in selftest.

Note that we need to add a similar check for SIOCATMARK.

[0]:
BUG: KASAN: slab-use-after-free in unix_stream_read_actor+0xa6/0xb0 net/unix/af_unix.c:2959
Read of size 4 at addr ffff8880326abcc4 by task syz-executor178/5235

CPU: 0 UID: 0 PID: 5235 Comm: syz-executor178 Not tainted 6.11.0-rc5-syzkaller-00742-gfbdaffe41adc #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:93 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
 print_address_description mm/kasan/report.c:377 [inline]
 print_report+0x169/0x550 mm/kasan/report.c:488
 kasan_report+0x143/0x180 mm/kasan/report.c:601
 unix_stream_read_actor+0xa6/0xb0 net/unix/af_unix.c:2959
 unix_stream_recv_urg+0x1df/0x320 net/unix/af_unix.c:2640
 unix_stream_read_generic+0x2456/0x2520 net/unix/af_unix.c:2778
 unix_stream_recvmsg+0x22b/0x2c0 net/unix/af_unix.c:2996
 sock_recvmsg_nosec net/socket.c:1046 [inline]
 sock_recvmsg+0x22f/0x280 net/socket.c:1068
 ____sys_recvmsg+0x1db/0x470 net/socket.c:2816
 ___sys_recvmsg net/socket.c:2858 [inline]
 __sys_recvmsg+0x2f0/0x3e0 net/socket.c:2888
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f5360d6b4e9
Code: 48 83 c4 28 c3 e8 37 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff29b3a458 EFLAGS: 00000246 ORIG_RAX: 000000000000002f
RAX: ffffffffffffffda RBX: 00007fff29b3a638 RCX: 00007f5360d6b4e9
RDX: 0000000000002001 RSI: 0000000020000640 RDI: 0000000000000003
RBP: 00007f5360dde610 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00007fff29b3a628 R14: 0000000000000001 R15: 0000000000000001
 </TASK>

Allocated by task 5235:
 kasan_save_stack mm/kasan/common.c:47 [inline]
 kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
 unpoison_slab_object mm/kasan/common.c:312 [inline]
 __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:338
 kasan_slab_alloc include/linux/kasan.h:201 [inline]
 slab_post_alloc_hook mm/slub.c:3988 [inline]
 slab_alloc_node mm/slub.c:4037 [inline]
 kmem_cache_alloc_node_noprof+0x16b/0x320 mm/slub.c:4080
 __alloc_skb+0x1c3/0x440 net/core/skbuff.c:667
 alloc_skb include/linux/skbuff.h:1320 [inline]
 alloc_skb_with_frags+0xc3/0x770 net/core/skbuff.c:6528
 sock_alloc_send_pskb+0x91a/0xa60 net/core/sock.c:2815
 sock_alloc_send_skb include/net/sock.h:1778 [inline]
 queue_oob+0x108/0x680 net/unix/af_unix.c:2198
 unix_stream_sendmsg+0xd24/0xf80 net/unix/af_unix.c:2351
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg+0x221/0x270 net/socket.c:745
 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2597
 ___sys_sendmsg net/socket.c:2651 [inline]
 __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2680
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 5235:
 kasan_save_stack mm/kasan/common.c:47 [inline]
 kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579
 poison_slab_object+0xe0/0x150 mm/kasan/common.c:240
 __kasan_slab_free+0x37/0x60 mm/kasan/common.c:256
 kasan_slab_free include/linux/kasan.h:184 [inline]
 slab_free_hook mm/slub.c:2252 [inline]
 slab_free mm/slub.c:4473 [inline]
 kmem_cache_free+0x145/0x350 mm/slub.c:4548
 unix_stream_read_generic+0x1ef6/0x2520 net/unix/af_unix.c:2917
 unix_stream_recvmsg+0x22b/0x2c0 net/unix/af_unix.c:2996
 sock_recvmsg_nosec net/socket.c:1046 [inline]
 sock_recvmsg+0x22f/0x280 net/socket.c:1068
 __sys_recvfrom+0x256/0x3e0 net/socket.c:2255
 __do_sys_recvfrom net/socket.c:2273 [inline]
 __se_sys_recvfrom net/socket.c:2269 [inline]
 __x64_sys_recvfrom+0xde/0x100 net/socket.c:2269
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The buggy address belongs to the object at ffff8880326abc80
 which belongs to the cache skbuff_head_cache of size 240
The buggy address is located 68 bytes inside of
 freed 240-byte region [ffff8880326abc80ffff8880326abd70)

The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x326ab
ksm flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
page_type: 0xfdffffff(slab)
raw: 00fff00000000000 ffff88801eaee780 ffffea0000b7dc80 dead000000000003
raw: 0000000000000000 00000000800c000c 00000001fdffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x52cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP), pid 4686, tgid 4686 (udevadm), ts 32357469485, free_ts 28829011109
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1493
 prep_new_page mm/page_alloc.c:1501 [inline]
 get_page_from_freelist+0x2e4c/0x2f10 mm/page_alloc.c:3439
 __alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4695
 __alloc_pages_node_noprof include/linux/gfp.h:269 [inline]
 alloc_pages_node_noprof include/linux/gfp.h:296 [inline]
 alloc_slab_page+0x5f/0x120 mm/slub.c:2321
 allocate_slab+0x5a/0x2f0 mm/slub.c:2484
 new_slab mm/slub.c:2537 [inline]
 ___slab_alloc+0xcd1/0x14b0 mm/slub.c:3723
 __slab_alloc+0x58/0xa0 mm/slub.c:3813
 __slab_alloc_node mm/slub.c:3866 [inline]
 slab_alloc_node mm/slub.c:4025 [inline]
 kmem_cache_alloc_node_noprof+0x1fe/0x320 mm/slub.c:4080
 __alloc_skb+0x1c3/0x440 net/core/skbuff.c:667
 alloc_skb include/linux/skbuff.h:1320 [inline]
 alloc_uevent_skb+0x74/0x230 lib/kobject_uevent.c:289
 uevent_net_broadcast_untagged lib/kobject_uevent.c:326 [inline]
 kobject_uevent_net_broadcast+0x2fd/0x580 lib/kobject_uevent.c:410
 kobject_uevent_env+0x57d/0x8e0 lib/kobject_uevent.c:608
 kobject_synth_uevent+0x4ef/0xae0 lib/kobject_uevent.c:207
 uevent_store+0x4b/0x70 drivers/base/bus.c:633
 kernfs_fop_write_iter+0x3a1/0x500 fs/kernfs/file.c:334
 new_sync_write fs/read_write.c:497 [inline]
 vfs_write+0xa72/0xc90 fs/read_write.c:590
page last free pid 1 tgid 1 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 free_pages_prepare mm/page_alloc.c:1094 [inline]
 free_unref_page+0xd22/0xea0 mm/page_alloc.c:2612
 kasan_depopulate_vmalloc_pte+0x74/0x90 mm/kasan/shadow.c:408
 apply_to_pte_range mm/memory.c:2797 [inline]
 apply_to_pmd_range mm/memory.c:2841 [inline]
 apply_to_pud_range mm/memory.c:2877 [inline]
 apply_to_p4d_range mm/memory.c:2913 [inline]
 __apply_to_page_range+0x8a8/0xe50 mm/memory.c:2947
 kasan_release_vmalloc+0x9a/0xb0 mm/kasan/shadow.c:525
 purge_vmap_node+0x3e3/0x770 mm/vmalloc.c:2208
 __purge_vmap_area_lazy+0x708/0xae0 mm/vmalloc.c:2290
 _vm_unmap_aliases+0x79d/0x840 mm/vmalloc.c:2885
 change_page_attr_set_clr+0x2fe/0xdb0 arch/x86/mm/pat/set_memory.c:1881
 change_page_attr_set arch/x86/mm/pat/set_memory.c:1922 [inline]
 set_memory_nx+0xf2/0x130 arch/x86/mm/pat/set_memory.c:2110
 free_init_pages arch/x86/mm/init.c:924 [inline]
 free_kernel_image_pages arch/x86/mm/init.c:943 [inline]
 free_initmem+0x79/0x110 arch/x86/mm/init.c:970
 kernel_init+0x31/0x2b0 init/main.c:1476
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Memory state around the buggy address:
 ffff8880326abb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8880326abc00: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
>ffff8880326abc80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                           ^
 ffff8880326abd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc
 ffff8880326abd80: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb

Fixes: 93c99f21db36 ("af_unix: Don't stop recv(MSG_DONTWAIT) if consumed OOB skb is at the head.")
Reported-by: syzbot+8811381d455e3e9ec788@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=8811381d455e3e9ec788
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20240905193240.17565-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoaf_unix: Move spin_lock() in manage_oob().
Kuniyuki Iwashima [Thu, 5 Sep 2024 19:32:39 +0000 (12:32 -0700)]
af_unix: Move spin_lock() in manage_oob().

When OOB skb has been already consumed, manage_oob() returns the next
skb if exists.  In such a case, we need to fall back to the else branch
below.

Then, we want to keep holding spin_lock(&sk->sk_receive_queue.lock).

Let's move it out of if-else branch and add lightweight check before
spin_lock() for major use cases without OOB skb.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20240905193240.17565-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoaf_unix: Rename unlinked_skb in manage_oob().
Kuniyuki Iwashima [Thu, 5 Sep 2024 19:32:38 +0000 (12:32 -0700)]
af_unix: Rename unlinked_skb in manage_oob().

When OOB skb has been already consumed, manage_oob() returns the next
skb if exists.  In such a case, we need to fall back to the else branch
below.

Then, we need to keep two skbs and free them later with consume_skb()
and kfree_skb().

Let's rename unlinked_skb accordingly.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20240905193240.17565-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoaf_unix: Remove single nest in manage_oob().
Kuniyuki Iwashima [Thu, 5 Sep 2024 19:32:37 +0000 (12:32 -0700)]
af_unix: Remove single nest in manage_oob().

This is a prep for the later fix.

No functional change intended.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20240905193240.17565-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge tag 'linux-can-next-for-6.12-20240909' of git://git.kernel.org/pub/scm/linux...
Jakub Kicinski [Tue, 10 Sep 2024 00:11:05 +0000 (17:11 -0700)]
Merge tag 'linux-can-next-for-6.12-20240909' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2024-09-09

The first patch is by Rob Herring and simplifies the DT parsing in the
cc770 driver.

The next 2 patches both target the rockchip_canfd driver added in the
last pull request to net-next. The first one is by Nathan Chancellor
and fixes the return type of rkcanfd_start_xmit(), the second one is
by me and fixes a 64 bit division on 32 bit platforms in
rkcanfd_timestamp_init().

* tag 'linux-can-next-for-6.12-20240909' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
  can: rockchip_canfd: rkcanfd_timestamp_init(): fix 64 bit division on 32 bit platforms
  can: rockchip_canfd: fix return type of rkcanfd_start_xmit()
  net: can: cc770: Simplify parsing DT properties
====================

Link: https://patch.msgid.link/20240909063657.2287493-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: remove dev_pick_tx_cpu_id()
Jakub Kicinski [Fri, 6 Sep 2024 16:10:59 +0000 (09:10 -0700)]
net: remove dev_pick_tx_cpu_id()

dev_pick_tx_cpu_id() has been introduced with two users by
commit a4ea8a3dacc3 ("net: Add generic ndo_select_queue functions").
The use in AF_PACKET has been removed in 2019 by
commit b71b5837f871 ("packet: rework packet_pick_tx_queue() to use common code selection")
The other user was a Netlogic XLP driver, removed in 2021 by
commit 47ac6f567c28 ("staging: Remove Netlogic XLP network driver").

It's relatively unlikely that any modern driver will need an
.ndo_select_queue implementation which picks purely based on CPU ID
and skips XPS, delete dev_pick_tx_cpu_id()

Found by code inspection.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240906161059.715546-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'selftests-mptcp-add-time-per-subtests-in-tap-output'
Jakub Kicinski [Mon, 9 Sep 2024 23:52:06 +0000 (16:52 -0700)]
Merge branch 'selftests-mptcp-add-time-per-subtests-in-tap-output'

Matthieu Baerts says:

====================
selftests: mptcp: add time per subtests in TAP output

Patches here add 'time=<N>ms' in the diagnostic data of the TAP output,
e.g.

  ok 1 - pm_netlink: defaults addr list # time=9ms

This addition is useful to quickly identify which subtests are taking a
longer time than the others, or more than expected.

Note that there are no specific formats to follow to show this time
according to the TAP 13, TAP 14 and KTAP specifications, but we follow
the format being parsed by NIPA [1].

Patch 1 modifies mptcp_lib.sh to add this support to all MPTCP
selftests.

Patch 2 removes the now duplicated info in mptcp_connect.sh

Patch 3 slightly improves the precision of the first subtests in all
MPTCP subtests.

Patches 4 and 5 remove duplicated spaces in TAP output, for the TAP
parsers that cannot handle them properly.

v1: https://lore.kernel.org/20240902-net-next-mptcp-ksft-subtest-time-v1-0-f1ed499a11b1@kernel.org
Link: https://github.com/linux-netdev/nipa/pull/36
====================

Link: https://patch.msgid.link/20240906-net-next-mptcp-ksft-subtest-time-v2-0-31d5ee4f3bdf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoselftests: mptcp: connect: remove duplicated spaces in TAP output
Matthieu Baerts (NGI0) [Fri, 6 Sep 2024 18:46:11 +0000 (20:46 +0200)]
selftests: mptcp: connect: remove duplicated spaces in TAP output

It is nice to have a visual alignment in the test output to present the
different results, but it makes less sense in the TAP output that is
there for computers.

It sounds then better to remove the duplicated whitespaces in the TAP
output, also because it can cause some issues with TAP parsers expecting
only one space around the directive delimiter (#).

While at it, change the variable name (result_msg) to something more
explicit.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240906-net-next-mptcp-ksft-subtest-time-v2-5-31d5ee4f3bdf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoselftests: mptcp: diag: remove trailing whitespace
Matthieu Baerts (NGI0) [Fri, 6 Sep 2024 18:46:10 +0000 (20:46 +0200)]
selftests: mptcp: diag: remove trailing whitespace

It doesn't need to be there, and it can cause some issues with TAP
parsers expecting only one space around the directive delimiter (#).

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240906-net-next-mptcp-ksft-subtest-time-v2-4-31d5ee4f3bdf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoselftests: mptcp: reset the last TS before the first test
Matthieu Baerts (NGI0) [Fri, 6 Sep 2024 18:46:09 +0000 (20:46 +0200)]
selftests: mptcp: reset the last TS before the first test

Just to slightly improve the precision of the duration of the first
test.

In mptcp_join.sh, the last append_prev_results is now done as soon as
the last test is over: this will add the last result in the list, and
get a more precise time for this last test.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240906-net-next-mptcp-ksft-subtest-time-v2-3-31d5ee4f3bdf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoselftests: mptcp: connect: remote time in TAP output
Matthieu Baerts (NGI0) [Fri, 6 Sep 2024 18:46:08 +0000 (20:46 +0200)]
selftests: mptcp: connect: remote time in TAP output

It is now added by the MPTCP lib automatically, see the parent commit.

The time in the TAP output might be slightly different from the one
displayed before, but that's OK.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240906-net-next-mptcp-ksft-subtest-time-v2-2-31d5ee4f3bdf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoselftests: mptcp: lib: add time per subtests in TAP output
Matthieu Baerts (NGI0) [Fri, 6 Sep 2024 18:46:07 +0000 (20:46 +0200)]
selftests: mptcp: lib: add time per subtests in TAP output

It adds 'time=<N>ms' in the diagnostic data of the TAP output, e.g.

  ok 1 - pm_netlink: defaults addr list # time=9ms

This addition is useful to quickly identify which subtests are taking a
longer time than the others, or more than expected.

Note that there are no specific formats to follow to show this time
according to the TAP 13 [1], TAP 14 [2] and KTAP [3] specifications.
Let's then define this one here.

Link: https://testanything.org/tap-version-13-specification.html
Link: https://testanything.org/tap-version-14-specification.html
Link: https://docs.kernel.org/dev-tools/ktap.html
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240906-net-next-mptcp-ksft-subtest-time-v2-1-31d5ee4f3bdf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoselftests: return failure when timestamps can't be reported
Jason Xing [Thu, 5 Sep 2024 16:00:35 +0000 (00:00 +0800)]
selftests: return failure when timestamps can't be reported

When I was trying to modify the tx timestamping feature, I found that
running "./txtimestamp -4 -C -L 127.0.0.1" didn't reflect the error:
I succeeded to generate timestamp stored in the skb but later failed
to report it to the userspace (which means failed to put css into cmsg).
It can happen when someone writes buggy codes in __sock_recv_timestamp(),
for example.

After adding the check so that running ./txtimestamp will reflect the
result correctly like this if there is a bug in the reporting phase:
protocol:     TCP
payload:      10
server port:  9000

family:       INET
test SND
    USR: 1725458477 s 667997 us (seq=0, len=0)
Failed to report timestamps
    USR: 1725458477 s 718128 us (seq=0, len=0)
Failed to report timestamps
    USR: 1725458477 s 768273 us (seq=0, len=0)
Failed to report timestamps
    USR: 1725458477 s 818416 us (seq=0, len=0)
Failed to report timestamps
...

In the future, it will help us detect whether the new coming patch has
bugs or not.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240905160035.62407-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'unmask-dscp-part-four'
David S. Miller [Mon, 9 Sep 2024 13:14:54 +0000 (14:14 +0100)]
Merge branch 'unmask-dscp-part-four'

Ido Schimmel says:

====================
Unmask upper DSCP bits - part 4 (last)

tl;dr - This patchset finishes to unmask the upper DSCP bits in the IPv4
flow key in preparation for allowing IPv4 FIB rules to match on DSCP. No
functional changes are expected.

The TOS field in the IPv4 flow key ('flowi4_tos') is used during FIB
lookup to match against the TOS selector in FIB rules and routes.

It is currently impossible for user space to configure FIB rules that
match on the DSCP value as the upper DSCP bits are either masked in the
various call sites that initialize the IPv4 flow key or along the path
to the FIB core.

In preparation for adding a DSCP selector to IPv4 and IPv6 FIB rules, we
need to make sure the entire DSCP value is present in the IPv4 flow key.
This patchset finishes to unmask the upper DSCP bits by adjusting all
the callers of ip_route_output_key() to properly initialize the full
DSCP value in the IPv4 flow key.

No functional changes are expected as commit 1fa3314c14c6 ("ipv4:
Centralize TOS matching") moved the masking of the upper DSCP bits to
the core where 'flowi4_tos' is matched against the TOS selector.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agosctp: Unmask upper DSCP bits in sctp_v4_get_dst()
Ido Schimmel [Thu, 5 Sep 2024 16:51:40 +0000 (19:51 +0300)]
sctp: Unmask upper DSCP bits in sctp_v4_get_dst()

Unmask the upper DSCP bits when calling ip_route_output_key() so that in
the future it could perform the FIB lookup according to the full DSCP
value.

Note that the 'tos' variable holds the full DS field.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agoipv4: udp_tunnel: Unmask upper DSCP bits in udp_tunnel_dst_lookup()
Ido Schimmel [Thu, 5 Sep 2024 16:51:39 +0000 (19:51 +0300)]
ipv4: udp_tunnel: Unmask upper DSCP bits in udp_tunnel_dst_lookup()

Unmask the upper DSCP bits when calling ip_route_output_key() so that in
the future it could perform the FIB lookup according to the full DSCP
value.

Note that callers of udp_tunnel_dst_lookup() pass the entire DS field in
the 'tos' argument.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agonetfilter: nf_dup4: Unmask upper DSCP bits in nf_dup_ipv4_route()
Ido Schimmel [Thu, 5 Sep 2024 16:51:38 +0000 (19:51 +0300)]
netfilter: nf_dup4: Unmask upper DSCP bits in nf_dup_ipv4_route()

Unmask the upper DSCP bits when calling ip_route_output_key() so that in
the future it could perform the FIB lookup according to the full DSCP
value.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agonetfilter: nft_flow_offload: Unmask upper DSCP bits in nft_flow_route()
Ido Schimmel [Thu, 5 Sep 2024 16:51:37 +0000 (19:51 +0300)]
netfilter: nft_flow_offload: Unmask upper DSCP bits in nft_flow_route()

Unmask the upper DSCP bits when calling nf_route() which eventually
calls ip_route_output_key() so that in the future it could perform the
FIB lookup according to the full DSCP value.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agoipv4: netfilter: Unmask upper DSCP bits in ip_route_me_harder()
Ido Schimmel [Thu, 5 Sep 2024 16:51:36 +0000 (19:51 +0300)]
ipv4: netfilter: Unmask upper DSCP bits in ip_route_me_harder()

Unmask the upper DSCP bits when calling ip_route_output_key() so that in
the future it could perform the FIB lookup according to the full DSCP
value.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agoipv4: ip_tunnel: Unmask upper DSCP bits in ip_tunnel_xmit()
Ido Schimmel [Thu, 5 Sep 2024 16:51:35 +0000 (19:51 +0300)]
ipv4: ip_tunnel: Unmask upper DSCP bits in ip_tunnel_xmit()

Unmask the upper DSCP bits when initializing an IPv4 flow key via
ip_tunnel_init_flow() before passing it to ip_route_output_key() so that
in the future we could perform the FIB lookup according to the full DSCP
value.

Note that the 'tos' variable includes the full DS field. Either the one
specified as part of the tunnel parameters or the one inherited from the
inner packet.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agoipv4: ip_tunnel: Unmask upper DSCP bits in ip_md_tunnel_xmit()
Ido Schimmel [Thu, 5 Sep 2024 16:51:34 +0000 (19:51 +0300)]
ipv4: ip_tunnel: Unmask upper DSCP bits in ip_md_tunnel_xmit()

Unmask the upper DSCP bits when initializing an IPv4 flow key via
ip_tunnel_init_flow() before passing it to ip_route_output_key() so that
in the future we could perform the FIB lookup according to the full DSCP
value.

Note that the 'tos' variable includes the full DS field. Either the one
specified via the tunnel key or the one inherited from the inner packet.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agoipv4: ip_tunnel: Unmask upper DSCP bits in ip_tunnel_bind_dev()
Ido Schimmel [Thu, 5 Sep 2024 16:51:33 +0000 (19:51 +0300)]
ipv4: ip_tunnel: Unmask upper DSCP bits in ip_tunnel_bind_dev()

Unmask the upper DSCP bits when initializing an IPv4 flow key via
ip_tunnel_init_flow() before passing it to ip_route_output_key() so that
in the future we could perform the FIB lookup according to the full DSCP
value.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agoipv4: icmp: Unmask upper DSCP bits in icmp_reply()
Ido Schimmel [Thu, 5 Sep 2024 16:51:32 +0000 (19:51 +0300)]
ipv4: icmp: Unmask upper DSCP bits in icmp_reply()

Unmask the upper DSCP bits when calling ip_route_output_key() so that in
the future it could perform the FIB lookup according to the full DSCP
value.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agobpf: lwtunnel: Unmask upper DSCP bits in bpf_lwt_xmit_reroute()
Ido Schimmel [Thu, 5 Sep 2024 16:51:31 +0000 (19:51 +0300)]
bpf: lwtunnel: Unmask upper DSCP bits in bpf_lwt_xmit_reroute()

Unmask the upper DSCP bits when calling ip_route_output_key() so that in
the future it could perform the FIB lookup according to the full DSCP
value.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agoipv4: ip_gre: Unmask upper DSCP bits in ipgre_open()
Ido Schimmel [Thu, 5 Sep 2024 16:51:30 +0000 (19:51 +0300)]
ipv4: ip_gre: Unmask upper DSCP bits in ipgre_open()

Unmask the upper DSCP bits when calling ip_route_output_gre() so that in
the future it could perform the FIB lookup according to the full DSCP
value.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agonetfilter: br_netfilter: Unmask upper DSCP bits in br_nf_pre_routing_finish()
Ido Schimmel [Thu, 5 Sep 2024 16:51:29 +0000 (19:51 +0300)]
netfilter: br_netfilter: Unmask upper DSCP bits in br_nf_pre_routing_finish()

Unmask upper DSCP bits when calling ip_route_output() so that in the
future it could perform the FIB lookup according to the full DSCP value.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agonet: sysfs: Fix weird usage of class's namespace relevant fields
Zijun Hu [Wed, 4 Sep 2024 23:35:38 +0000 (07:35 +0800)]
net: sysfs: Fix weird usage of class's namespace relevant fields

Device class has two namespace relevant fields which are associated by
the following usage:

struct class {
...
const struct kobj_ns_type_operations *ns_type;
const void *(*namespace)(const struct device *dev);
...
}
if (dev->class && dev->class->ns_type)
dev->class->namespace(dev);

The usage looks weird since it checks @ns_type but calls namespace()
it is found for all existing class definitions that the other filed is
also assigned once one is assigned in current kernel tree, so fix this
weird usage by checking @namespace to call namespace().

Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agoMerge branch 'fs_enet-cleanup'
David S. Miller [Mon, 9 Sep 2024 09:29:05 +0000 (10:29 +0100)]
Merge branch 'fs_enet-cleanup'

Maxime Chevallier says:

====================
net: ethernet: fs_enet: Cleanup and phylink conversion

This is V3 of a series that cleans-up fs_enet, with the ultimate goal of
converting it to phylink (patch 8).

The main changes compared to V2 are :
 - Reviewed-by tags from Andrew were gathered
 - Patch 5 now includes the removal of now unused includes, thanks
   Andrew for spotting this
 - Patch 4 is new, it reworks the adjust_link to move the spinlock
   acquisition to a more suitable location. Although this dissapears in
   the actual phylink port, it makes the phylink conversion clearer on
   that point
 - Patch 8 includes fixes in the tx_timeout cancellation, to prevent
   taking rtnl twice when canceling a pending tx_timeout. Thanks Jakub
   for spotting this.

Link to V2: https://lore.kernel.org/netdev/20240829161531.610874-1-maxime.chevallier@bootlin.com/
Link to V1: https://lore.kernel.org/netdev/20240828095103.132625-1-maxime.chevallier@bootlin.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agonet: ethernet: fs_enet: phylink conversion
Maxime Chevallier [Wed, 4 Sep 2024 17:18:21 +0000 (19:18 +0200)]
net: ethernet: fs_enet: phylink conversion

fs_enet is a quite old but still used Ethernet driver found on some NXP
devices. It has support for 10/100 Mbps ethernet, with half and full
duplex. Some variants of it can use RMII, while other integrations are
MII-only.

Add phylink support, thus removing custom fixed-link hanldling.

This also allows removing some internal flags such as the use_rmii flag.

Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agonet: ethernet: fs_enet: simplify clock handling with devm accessors
Maxime Chevallier [Wed, 4 Sep 2024 17:18:20 +0000 (19:18 +0200)]
net: ethernet: fs_enet: simplify clock handling with devm accessors

devm_clock_get_enabled() can be used to simplify clock handling for the
PER register clock.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agonet: ethernet: fs_enet: use macros for speed and duplex values
Maxime Chevallier [Wed, 4 Sep 2024 17:18:19 +0000 (19:18 +0200)]
net: ethernet: fs_enet: use macros for speed and duplex values

The PHY speed and duplex should be manipulated using the SPEED_XXX and
DUPLEX_XXX macros available. Use it in the fcc, fec and scc MAC for
fs_enet.

Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agonet: ethernet: fs_enet: drop unused phy_info and mii_if_info
Maxime Chevallier [Wed, 4 Sep 2024 17:18:18 +0000 (19:18 +0200)]
net: ethernet: fs_enet: drop unused phy_info and mii_if_info

There's no user of the struct phy_info, the 'phy' field and the
mii_if_info in the fs_enet driver, probably dating back when phylib
wasn't as widely used.  Drop these from the driver code.

As the definition for struct mii_if_info is no longer required, drop the
include for linux/mii.h altogether in the driver.

Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agonet: ethernet: fs_enet: only protect the .restart() call in .adjust_link
Maxime Chevallier [Wed, 4 Sep 2024 17:18:17 +0000 (19:18 +0200)]
net: ethernet: fs_enet: only protect the .restart() call in .adjust_link

When .adjust_link() gets called, it runs in thread context, with the
phydev->lock held. We only need to protect the fep->fecp/fccp/sccp
register that are accessed within the .restart() function from
concurrent access from the interrupts.

These registers are being protected by the fep->lock spinlock, so we can
move the spinlock protection around the .restart() call instead of the
entire adjust_link() call. By doing so, we can simplify further the
.adjust_link() callback and avoid the intermediate helper.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agonet: ethernet: fs_enet: drop the .adjust_link custom fs_ops
Maxime Chevallier [Wed, 4 Sep 2024 17:18:16 +0000 (19:18 +0200)]
net: ethernet: fs_enet: drop the .adjust_link custom fs_ops

There's no in-tree user for the fs_ops .adjust_link() function, so we
can always use the generic one in fe_enet-main.

Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agonet: ethernet: fs_enet: cosmetic cleanups
Maxime Chevallier [Wed, 4 Sep 2024 17:18:15 +0000 (19:18 +0200)]
net: ethernet: fs_enet: cosmetic cleanups

Due to the age of the driver and the slow recent activity on it, the code
has taken some layers of dust. Clean the main driver file up so that it
passes checkpatch and also conforms with the net coding style.

Changes include :
 - Re-ordering of the variable declarations for RCT
 - Fixing the comment styles to either one-line comments, or net-style
   comments
 - Adding braces around single-statement 'else' clauses
 - Aligning function/macro parameters on the opening parenthesis
 - Simplifying checks for NULL pointers
 - Splitting cascaded assignments into individual assignments
 - Fixing some typos
 - Fixing whitespace issues

This is a cosmetic change and doesn't introduce any change in behaviour.

Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agonet: ethernet: fs_enet: convert to SPDX
Maxime Chevallier [Wed, 4 Sep 2024 17:18:14 +0000 (19:18 +0200)]
net: ethernet: fs_enet: convert to SPDX

The ENET driver has SPDX tags in the header files, but they were missing
in the C files. Change the licence information to SPDX format.

Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agocan: rockchip_canfd: rkcanfd_timestamp_init(): fix 64 bit division on 32 bit platforms
Marc Kleine-Budde [Sun, 8 Sep 2024 15:00:00 +0000 (17:00 +0200)]
can: rockchip_canfd: rkcanfd_timestamp_init(): fix 64 bit division on 32 bit platforms

On some 32-bit platforms (at least on parisc), the compiler generates
a call to __divdi3() from the u32 by 3 division in
rkcanfd_timestamp_init(), which results in the following linker
error:

| ERROR: modpost: "__divdi3" [drivers/net/can/rockchip/rockchip_canfd.ko] undefined!

As this code doesn't run in the hot path, a 64 bit by 32 bit division
is OK, even on 32 bit platforms. Use an explicit call to div_u64() to
fix linking.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202409072304.lCQWyNLU-lkp@intel.com/
Link: https://patch.msgid.link/20240909-can-rockchip_canfd-fix-64-bit-division-v1-1-2748d9422b00@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
5 weeks agocan: rockchip_canfd: fix return type of rkcanfd_start_xmit()
Nathan Chancellor [Fri, 6 Sep 2024 20:26:41 +0000 (13:26 -0700)]
can: rockchip_canfd: fix return type of rkcanfd_start_xmit()

With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
warning in clang aims to catch these at compile time, which reveals:

  drivers/net/can/rockchip/rockchip_canfd-core.c:770:20: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
    770 |         .ndo_start_xmit = rkcanfd_start_xmit,
        |                           ^~~~~~~~~~~~~~~~~~

->ndo_start_xmit() in 'struct net_device_ops' expects a return type of
'netdev_tx_t', not 'int' (although the types are ABI compatible). Adjust
the return type of rkcanfd_start_xmit() to match the prototype's to
resolve the warning.

Fixes: ff60bfbaf67f ("can: rockchip_canfd: add driver for Rockchip CAN-FD controller")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20240906-rockchip-canfd-wifpts-v1-1-b1398da865b7@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
5 weeks agonet: can: cc770: Simplify parsing DT properties
Rob Herring (Arm) [Tue, 3 Sep 2024 13:57:30 +0000 (08:57 -0500)]
net: can: cc770: Simplify parsing DT properties

Use of the typed property accessors is preferred over of_get_property().
The existing code doesn't work on little endian systems either. Replace
the of_get_property() calls with of_property_read_bool() and
of_property_read_u32().

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20240903135731.405635-1-robh@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
5 weeks agoptp/ioctl: support MONOTONIC{,_RAW} timestamps for PTP_SYS_OFFSET_EXTENDED
Mahesh Bandewar [Wed, 4 Sep 2024 14:13:05 +0000 (07:13 -0700)]
ptp/ioctl: support MONOTONIC{,_RAW} timestamps for PTP_SYS_OFFSET_EXTENDED

The ability to read the PHC (Physical Hardware Clock) alongside
multiple system clocks is currently dependent on the specific
hardware architecture. This limitation restricts the use of
PTP_SYS_OFFSET_PRECISE to certain hardware configurations.

The generic soultion which would work across all architectures
is to read the PHC along with the latency to perform PHC-read as
offered by PTP_SYS_OFFSET_EXTENDED which provides pre and post
timestamps.  However, these timestamps are currently limited
to the CLOCK_REALTIME timebase. Since CLOCK_REALTIME is affected
by NTP (or similar time synchronization services), it can
experience significant jumps forward or backward. This hinders
the precise latency measurements that PTP_SYS_OFFSET_EXTENDED
is designed to provide.

This problem could be addressed by supporting MONOTONIC_RAW
timestamps within PTP_SYS_OFFSET_EXTENDED. Unlike CLOCK_REALTIME
or CLOCK_MONOTONIC, the MONOTONIC_RAW timebase is unaffected
by NTP adjustments.

This enhancement can be implemented by utilizing one of the three
reserved words within the PTP_SYS_OFFSET_EXTENDED struct to pass
the clock-id for timestamps.  The current behavior aligns with
clock-id for CLOCK_REALTIME timebase (value of 0), ensuring
backward compatibility of the UAPI.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agonet: sched: consistently use rcu_replace_pointer() in taprio_change()
Dmitry Antipov [Wed, 4 Sep 2024 11:54:01 +0000 (14:54 +0300)]
net: sched: consistently use rcu_replace_pointer() in taprio_change()

According to Vinicius (and carefully looking through the whole
https://syzkaller.appspot.com/bug?extid=b65e0af58423fc8a73aa
once again), txtime branch of 'taprio_change()' is not going to
race against 'advance_sched()'. But using 'rcu_replace_pointer()'
in the former may be a good idea as well.

Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agoMerge tag 'nf-next-24-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilt...
Jakub Kicinski [Sat, 7 Sep 2024 01:39:31 +0000 (18:39 -0700)]
Merge tag 'nf-next-24-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

Patch #1 adds ctnetlink support for kernel side filtering for
 deletions, from Changliang Wu.

Patch #2 updates nft_counter support to Use u64_stats_t,
 from Sebastian Andrzej Siewior.

Patch #3 uses kmemdup_array() in all xtables frontends,
 from Yan Zhen.

Patch #4 is a oneliner to use ERR_CAST() in nf_conntrack instead
 opencoded casting, from Shen Lichuan.

Patch #5 removes unused argument in nftables .validate interface,
 from Florian Westphal.

Patch #6 is a oneliner to correct a typo in nftables kdoc,
 from Simon Horman.

Patch #7 fixes missing kdoc in nftables, also from Simon.

Patch #8 updates nftables to handle timeout less than CONFIG_HZ.

Patch #9 rejects element expiration if timeout is zero,
 otherwise it is silently ignored.

Patch #10 disallows element expiration larger than timeout.

Patch #11 removes unnecessary READ_ONCE annotation while mutex is held.

Patch #12 adds missing READ_ONCE/WRITE_ONCE annotation in dynset.

Patch #13 annotates data-races around element expiration.

Patch #14 allocates timeout and expiration in one single set element
  extension, they are tighly couple, no reason to keep them
  separated anymore.

Patch #15 updates nftables to interpret zero timeout element as never
  times out. Note that it is already possible to declare sets
  with elements that never time out but this generalizes to all
  kind of set with timeouts.

Patch #16 supports for element timeout and expiration updates.

* tag 'nf-next-24-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables: set element timeout update support
  netfilter: nf_tables: zero timeout means element never times out
  netfilter: nf_tables: consolidate timeout extension for elements
  netfilter: nf_tables: annotate data-races around element expiration
  netfilter: nft_dynset: annotate data-races around set timeout
  netfilter: nf_tables: remove annotation to access set timeout while holding lock
  netfilter: nf_tables: reject expiration higher than timeout
  netfilter: nf_tables: reject element expiration with no timeout
  netfilter: nf_tables: elements with timeout below CONFIG_HZ never expire
  netfilter: nf_tables: Add missing Kernel doc
  netfilter: nf_tables: Correct spelling in nf_tables.h
  netfilter: nf_tables: drop unused 3rd argument from validate callback ops
  netfilter: conntrack: Convert to use ERR_CAST()
  netfilter: Use kmemdup_array instead of kmemdup for multiple allocation
  netfilter: nft_counter: Use u64_stats_t for statistic.
  netfilter: ctnetlink: support CTA_FILTER for flush
====================

Link: https://patch.msgid.link/20240905232920.5481-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoptp: ocp: Improve PCIe delay estimation
Vadim Fedorenko [Thu, 5 Sep 2024 14:00:28 +0000 (14:00 +0000)]
ptp: ocp: Improve PCIe delay estimation

The PCIe bus can be pretty busy during boot and probe function can
see excessive delays. Let's find the minimal value out of several
tests and use it as estimated value.

Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20240905140028.560454-1-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonetpoll: remove netpoll_srcu
Eric Dumazet [Thu, 5 Sep 2024 08:49:09 +0000 (08:49 +0000)]
netpoll: remove netpoll_srcu

netpoll_srcu is currently used from netpoll_poll_disable() and
__netpoll_cleanup()

Both functions run under RTNL, using netpoll_srcu adds confusion
and no additional protection.

Moreover the synchronize_srcu() call in __netpoll_cleanup() is
performed before clearing np->dev->npinfo, which violates RCU rules.

After this patch, netpoll_poll_disable() and netpoll_poll_enable()
simply use rtnl_dereference().

This saves a big chunk of memory (more than 192KB on platforms
with 512 cpus)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20240905084909.2082486-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'octeontx2-address-some-warnings'
Jakub Kicinski [Sat, 7 Sep 2024 01:23:51 +0000 (18:23 -0700)]
Merge branch 'octeontx2-address-some-warnings'

Simon Horman says:

====================
octeontx2: Address some warnings

This patchset addresses some warnings flagged by Sparse, gcc-14, and
clang-18 in files touched by recent patch submissions.

Although these changes do not alter the functionality of the code, by
addressing them real problems introduced in future which are flagged by
Sparse will stand out more readily.

v1: https://lore.kernel.org/20240903-octeontx2-sparse-v1-0-f190309ecb0a@kernel.org
====================

Link: https://patch.msgid.link/20240904-octeontx2-sparse-v2-0-14f2305fe4b2@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoocteontx2-pf: Make iplen __be16 in otx2_sqe_add_ext()
Simon Horman [Wed, 4 Sep 2024 18:29:37 +0000 (19:29 +0100)]
octeontx2-pf: Make iplen __be16 in otx2_sqe_add_ext()

In otx2_sqe_add_ext() iplen is used to hold a 16-bit big-endian value,
but it's type is u16, indicating a host byte order integer.

Address this mismatch by changing the type of iplen to __be16.

Flagged by Sparse as:

.../otx2_txrx.c:699:31: warning: incorrect type in assignment (different base types)
.../otx2_txrx.c:699:31:    expected unsigned short [usertype] iplen
.../otx2_txrx.c:699:31:    got restricted __be16 [usertype]
.../otx2_txrx.c:701:54: warning: incorrect type in assignment (different base types)
.../otx2_txrx.c:701:54:    expected restricted __be16 [usertype] tot_len
.../otx2_txrx.c:701:54:    got unsigned short [usertype] iplen
.../otx2_txrx.c:704:60: warning: incorrect type in assignment (different base types)
.../otx2_txrx.c:704:60:    expected restricted __be16 [usertype] payload_len
.../otx2_txrx.c:704:60:    got unsigned short [usertype] iplen

Introduced in
commit dc1a9bf2c816 ("octeontx2-pf: Add UDP segmentation offload support")

No functional change intended.
Compile tested only by author.

Tested-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240904-octeontx2-sparse-v2-2-14f2305fe4b2@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoocteontx2-af: Pass string literal as format argument of alloc_workqueue()
Simon Horman [Wed, 4 Sep 2024 18:29:36 +0000 (19:29 +0100)]
octeontx2-af: Pass string literal as format argument of alloc_workqueue()

Recently I noticed that both gcc-14 and clang-18 report that passing
a non-string literal as the format argument of alloc_workqueue()
is potentially insecure.

E.g. clang-18 says:

.../rvu.c:2493:32: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
 2493 |         mw->mbox_wq = alloc_workqueue(name,
      |                                       ^~~~
.../rvu.c:2493:32: note: treat the string as an argument to avoid this
 2493 |         mw->mbox_wq = alloc_workqueue(name,
      |                                       ^
      |                                       "%s",

It is always the case where the contents of name is safe to pass as the
format argument. That is, in my understanding, it never contains any
format escape sequences.

But, it seems better to be safe than sorry. And, as a bonus, compiler
output becomes less verbose by addressing this issue as suggested by
clang-18.

Compile tested only by author.

Tested-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240904-octeontx2-sparse-v2-1-14f2305fe4b2@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>