Pavel Begunkov [Fri, 23 Jun 2023 12:38:55 +0000 (13:38 +0100)]
net/tcp: optimise locking for blocking splice
Even when tcp_splice_read() reads all it was asked for, for blocking
sockets it'll release and immediately regrab the socket lock, loop
around and break on the while check.
Check tss.len right after we adjust it, and return if we're done.
That saves us one release_sock(); lock_sock(); pair per successful
blocking splice read.
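The saving can be illustrated with a small userspace model. This is only a sketch: model_splice(), struct splice_model and the fixed per-pass chunk size are hypothetical and do not mirror the real tcp_splice_read() code. It counts how many release/regrab pairs a blocking read performs with and without the early tss.len-style check.

```c
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's tcp_splice_read(). */
struct splice_model {
	size_t lock_cycles;	/* number of release_sock()/lock_sock() pairs */
};

/*
 * Pretend each pass reads up to `chunk` bytes (chunk must be non-zero).
 * With early_exit set, return as soon as the request is satisfied
 * instead of cycling the lock once more and breaking on the loop check.
 */
size_t model_splice(struct splice_model *m, size_t len, size_t chunk,
		    int early_exit)
{
	size_t done = 0;

	while (len) {
		size_t n = chunk < len ? chunk : len;

		done += n;
		len -= n;

		/* The optimisation: check the remaining length right away. */
		if (early_exit && !len)
			break;

		/* Blocking sockets drop and retake the lock between passes. */
		m->lock_cycles++;
	}
	return done;
}
```

In this model a request satisfied in a single pass costs one lock cycle without the check and none with it.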
Jakub Kicinski [Sat, 24 Jun 2023 22:12:05 +0000 (15:12 -0700)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-06-22 (iavf)
This series contains updates to iavf driver only.
Przemek defers removing the previous primary MAC address until after
getting the result of adding its replacement. He also does some cleanup by
removing unused functions and making applicable functions static.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
iavf: make functions static where possible
iavf: remove some unused functions and pointless wrappers
iavf: fix err handling for MAC replace
====================
Randy Dunlap [Thu, 22 Jun 2023 15:54:09 +0000 (08:54 -0700)]
revert "s390/net: lcs: use IS_ENABLED() for kconfig detection"
The referenced patch is causing build errors when ETHERNET=y and
FDDI=m. While we work out the preferred patch(es), revert this patch
to make the pain go away.
Fixes: 128272336120 ("s390/net: lcs: use IS_ENABLED() for kconfig detection")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Link: lore.kernel.org/r/202306202129.pl0AqK8G-lkp@intel.com
Cc: Alexandra Winter <wintera@linux.ibm.com>
Cc: Wenjia Zhang <wenjia@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230622155409.27311-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We've added 49 non-merge commits during the last 24 day(s) which contain
a total of 70 files changed, 1935 insertions(+), 442 deletions(-).
The main changes are:
1) Extend bpf_fib_lookup helper to allow passing the route table ID,
from Louis DeLosSantos.
2) Fix regsafe() in verifier to call check_ids() for scalar registers,
from Eduard Zingerman.
3) Extend the set of cpumask kfuncs with bpf_cpumask_first_and()
and a rework of bpf_cpumask_any*() kfuncs. Additionally,
add selftests, from David Vernet.
4) Fix socket lookup BPF helpers for tc/XDP to respect VRF bindings,
from Gilad Sever.
5) Change bpf_link_put() to use workqueue unconditionally to fix it
under PREEMPT_RT, from Sebastian Andrzej Siewior.
6) Follow-ups to address issues in the bpf_refcount shared ownership
implementation, from Dave Marchevsky.
7) A few general refactorings to BPF map and program creation permissions
checks which were part of the BPF token series, from Andrii Nakryiko.
8) Various fixes for benchmark framework and add a new benchmark
for BPF memory allocator to BPF selftests, from Hou Tao.
9) Documentation improvements around iterators and trusted pointers,
from Anton Protopopov.
10) Small cleanup in verifier to improve allocated object check,
from Daniel T. Lee.
11) Improve performance of bpf_xdp_pointer() by avoiding access
to shared_info when XDP packet does not have frags,
from Jesper Dangaard Brouer.
12) Silence a harmless syzbot-reported warning in btf_type_id_size(),
from Yonghong Song.
13) Remove duplicate bpfilter_umh_cleanup in favor of umd_cleanup_helper,
from Jarkko Sakkinen.
14) Fix BPF selftests build for resolve_btfids under custom HOSTCFLAGS,
from Viktor Malik.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (49 commits)
bpf, docs: Document existing macros instead of deprecated
bpf, docs: BPF Iterator Document
selftests/bpf: Fix compilation failure for prog vrf_socket_lookup
selftests/bpf: Add vrf_socket_lookup tests
bpf: Fix bpf socket lookup from tc/xdp to respect socket VRF bindings
bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC hookpoint
bpf: Factor out socket lookup functions for the TC hookpoint.
selftests/bpf: Set the default value of consumer_cnt as 0
selftests/bpf: Ensure that next_cpu() returns a valid CPU number
selftests/bpf: Output the correct error code for pthread APIs
selftests/bpf: Use producer_cnt to allocate local counter array
xsk: Remove unused inline function xsk_buff_discard()
bpf: Keep BPF_PROG_LOAD permission checks clear of validations
bpf: Centralize permissions checks for all BPF map types
bpf: Inline map creation logic in map_create() function
bpf: Move unprivileged checks into map_create() and bpf_prog_load()
bpf: Remove in_atomic() from bpf_link_put().
selftests/bpf: Verify that check_ids() is used for scalars in regsafe()
bpf: Verify scalar ids mapping in regsafe() using check_ids()
selftests/bpf: Check if mark_chain_precision() follows scalar ids
...
====================
The mlxsw driver currently makes the assumption that the user applies
configuration in a bottom-up manner. Thus netdevices need to be added to
the bridge before IP addresses are configured on that bridge or SVI added
on top of it. Enslaving a netdevice to another netdevice that already has
uppers is in fact forbidden by mlxsw for this reason. Despite this safety,
it is rather easy to get into situations where the offloaded configuration
is just plain wrong.
As an example, take a front panel port, configure an IP address: it gets a
RIF. Now enslave the port to the bridge, and the RIF is gone. Remove the
port from the bridge again, but the RIF never comes back. There are a number
of similar situations, where changing the configuration there and back
utterly breaks the offload.
The situation is going to be made better by implementing a range of replays
and post-hoc offloads.
This patch set lays the ground for replay of next hops. The particular
issue that it deals with is that currently, driver-specific bookkeeping for
next hops is hooked off RIF objects, which come and go across the lifetime
of a netdevice. We would rather keep these objects at an entity that
mirrors the lifetime of the netdevice itself. That way they are at hand and
can be offloaded when a RIF is eventually created.
To that end, with this patchset, mlxsw keeps a hash table of CRIFs:
candidate RIFs, persistent handles for netdevices that mlxsw deems
potentially interesting. The lifetime of a CRIF matches that of the
underlying netdevice, and thus a RIF can always assume a CRIF exists. A
CRIF is where next hops are kept, and when RIF is created, these next hops
can be easily offloaded. (Previously only the next hops created after the
RIF was created were offloaded.)
- Patches #1 and #2 are minor adjustments.
- In patches #3 and #4, add CRIF bookkeeping.
- In patch #5, link CRIFs to RIFs such that given a netdevice-backed RIF,
the corresponding CRIF is easy to look up.
- Patch #6 is a clean-up allowed by the previous patches
- Patches #7 and #8 move next hop tracking to CRIFs
No observable effects are intended as of yet. This will be useful once
there is support for RIF creation for netdevices that become mlxsw uppers,
which will come in following patch sets.
====================
Petr Machata [Thu, 22 Jun 2023 13:33:09 +0000 (15:33 +0200)]
mlxsw: spectrum_router: Track next hops at CRIFs
Move the list of next hops from struct mlxsw_sp_rif to mlxsw_sp_crif. The
reason is that eventually, next hops for mlxsw uppers should be offloaded
and unoffloaded on demand as a netdevice becomes an upper, or stops being
one. Currently, next hops are tracked at RIFs, but RIFs do not exist when a
netdevice is not an mlxsw upper. CRIFs are kept track of throughout the
netdevice lifetime.
Correspondingly, track at each next hop not its RIF, but its CRIF (from
which a RIF can always be deduced).
Note that now that next hops are tracked at a CRIF, it is not necessary to
move each over to a new RIF when it is necessary to edit a RIF. Therefore
drop mlxsw_sp_nexthop_rif_migrate() and have mlxsw_sp_rif_migrate_destroy()
call mlxsw_sp_nexthop_rif_update() directly.
Petr Machata [Thu, 22 Jun 2023 13:33:08 +0000 (15:33 +0200)]
mlxsw: spectrum_router: Split nexthop finalization to two stages
Nexthop finalization consists of two steps: the part where the offload is
removed, because the backing RIF is now gone; and the part where the
association to the RIF is severed.
Extract from mlxsw_sp_nexthop_type_fini() a helper that covers the
unoffloading part, mlxsw_sp_nexthop_type_rif_gone(), so that it can later
be called independently.
Note that this swaps around the ordering of mlxsw_sp_nexthop_ipip_fini()
vs. mlxsw_sp_nexthop_rif_fini(). The current ordering is more of a
historical happenstance than a conscious decision. The two cleanups do not
depend on each other, and this change should have no observable effects.
Petr Machata [Thu, 22 Jun 2023 13:33:07 +0000 (15:33 +0200)]
mlxsw: spectrum_router: Use router.lb_crif instead of .lb_rif_index
A previous patch added a pointer to loopback CRIF to the router data
structure. That makes the loopback RIF index redundant, as everything
necessary can be derived from the CRIF. Drop the field and adjust the code
accordingly.
Petr Machata [Thu, 22 Jun 2023 13:33:06 +0000 (15:33 +0200)]
mlxsw: spectrum_router: Link CRIFs to RIFs
When a RIF is about to be created, the registration of the netdevice that
it should be associated with must have been seen in the past, and a CRIF
created. Therefore make this a hard requirement by looking up the CRIF
during RIF creation, and complaining loudly when there isn't one.
This then allows keeping a link between a RIF and its corresponding
CRIF (and back, as the relationship is one-to-at-most-one), so do that.
The CRIF will later be useful as the objects tracked there will be
offloaded lazily as a result of RIF creation.
CRIFs are created when an "interesting" netdevice is registered, and
destroyed after such device is unregistered. CRIFs are supposed to already
exist when a RIF creation request arises, and exist at least as long as
that RIF exists. This makes for a simple invariant: it is always safe to
dereference CRIF pointer from "its" RIF.
To guarantee this, CRIFs cannot be removed immediately when the UNREGISTER
event is delivered. The reason is that if a RIF's netdevice has an IPv6
address, removal of this address is notified in an atomic block. To remove
the RIF, the IPv6 removal handler schedules a work item. It must be safe
for this work item to access the associated CRIF as well.
Thus when a netdevice that backs the CRIF is removed, if it still has a
RIF, do not actually free the CRIF, only toggle its can_destroy flag, which
this patch adds. Later on, mlxsw_sp_rif_destroy() collects the CRIF.
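The deferred-free rule described above can be sketched as a small model. All names here (model_crif, model_rif, the `freed` flag standing in for the actual free) are hypothetical and only illustrate the can_destroy idea; this is not the mlxsw code.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_rif;

/* Hypothetical stand-in for a CRIF. */
struct model_crif {
	struct model_rif *rif;	/* NULL while no RIF exists */
	bool can_destroy;	/* netdevice is gone, free once the RIF goes */
	bool freed;		/* stands in for actually freeing the object */
};

/* Hypothetical stand-in for a RIF. */
struct model_rif {
	struct model_crif *crif;	/* valid for the RIF's whole lifetime */
};

/* Netdevice unregistered: free now, or defer if a RIF still uses the CRIF. */
void model_netdev_unregister(struct model_crif *crif)
{
	if (crif->rif) {
		crif->can_destroy = true;
		return;
	}
	crif->freed = true;
}

/* RIF destroyed: unlink, then collect the CRIF if it was left behind. */
void model_rif_destroy(struct model_rif *rif)
{
	struct model_crif *crif = rif->crif;

	crif->rif = NULL;
	rif->crif = NULL;
	if (crif->can_destroy)
		crif->freed = true;
}
```

With this ordering, a work item that still holds the RIF can dereference rif->crif safely until model_rif_destroy() runs.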
Petr Machata [Thu, 22 Jun 2023 13:33:05 +0000 (15:33 +0200)]
mlxsw: spectrum_router: Maintain CRIF for fallback loopback RIF
CRIFs are generally not maintained for loopback RIFs. However, the RIF for
the default VRF is used for offloading of blackhole nexthops. Nexthops
expect to have a valid CRIF. Therefore in this patch, add code to maintain
CRIF for the loopback RIF as well.
Petr Machata [Thu, 22 Jun 2023 13:33:04 +0000 (15:33 +0200)]
mlxsw: spectrum_router: Maintain a hash table of CRIFs
CRIFs are objects that mlxsw maintains for netdevices that may not have an
associated RIF (i.e. they may not have been instantiated in the ASIC), but
if indeed they do not, it is quite possible they will in the future. These
netdevices are candidate RIFs, hence CRIFs. Netdevices for which CRIFs are
created include e.g. bridges, LAGs, or front panel ports. The idea is that
next hops would be kept at CRIFs, not RIFs, and thus it would be easier to
offload and unoffload the entities that have been added before the RIF was
created.
In this patch, add the code for low-level CRIF maintenance: create and
destroy, and keep in a table keyed by the netdevice pointer for easy
recall.
Petr Machata [Thu, 22 Jun 2023 13:33:03 +0000 (15:33 +0200)]
mlxsw: spectrum_router: Use mlxsw_sp_ul_rif_get() to get main VRF LB RIF
The current function, mlxsw_sp_router_ul_rif_get(), is a wrapper around the
function mentioned in the subject. As such it forms an external interface
of the router code.
In future patches we will want to maintain connection between RIFs and the
CRIFs (introduced in the next patch) that back them. That will not hold
for the VRF-based loopback netdevices, so the whole CRIF business can be
kept hidden from the rest of mlxsw.
But for the main VRF loopback RIF we do want to keep the RIF-CRIF
connection, because that RIF is used for blackhole next hops, and the next
hop code can be kept simpler by assuming rif->crif is valid.
Hence, instead, call mlxsw_sp_ul_rif_get() to create the main VRF loopback
RIF. Being an internal function, it will take the CRIF argument anyway.
Furthermore, the function does not lock, which is not necessary at this
point in the code yet.
Thorsten Winkler [Wed, 21 Jun 2023 13:49:21 +0000 (15:49 +0200)]
s390/ctcm: Convert sprintf/snprintf to scnprintf
This LWN article explains why scnprintf is preferred over snprintf
in general:
https://lwn.net/Articles/69419/
I.e. snprintf() returns what *would* be the resulting length, while
scnprintf() returns the actual length.
Note that ctcm_print_statistics() writes the data into the kernel log
and is therefore not suitable for sysfs_emit(). Observable behavior is
not changed, as there may be dependencies.
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Thorsten Winkler <twinkler@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
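To illustrate the difference in return values, here is a userspace sketch. model_scnprintf() is a hypothetical stand-in built on vsnprintf(); it is not the kernel implementation, and returning 0 on a formatting error is an assumption of this sketch.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's scnprintf(). */
int model_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	if (size == 0)
		return 0;

	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);	/* the would-be length */
	va_end(args);

	if (n < 0)
		return 0;		/* assumption: report nothing on error */
	if ((size_t)n >= size)
		return (int)(size - 1);	/* truncated: what actually fits */
	return n;
}
```

With an 8-byte buffer and a 10-character string, snprintf() reports 10 while the sketch reports 7, so only the latter is safe to use as an offset for the next write into the same buffer.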
Thorsten Winkler [Wed, 21 Jun 2023 13:49:20 +0000 (15:49 +0200)]
s390/ctcm: Convert sysfs sprintf to sysfs_emit
Following the advice of Documentation/filesystems/sysfs.rst:
All sysfs related show()-functions should only use sysfs_emit() or
sysfs_emit_at() when formatting the value to be returned to user space.
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Thorsten Winkler <twinkler@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Thorsten Winkler [Wed, 21 Jun 2023 13:49:19 +0000 (15:49 +0200)]
s390/lcs: Convert sprintf to scnprintf
This LWN article explains why scnprintf is preferred over snprintf
in general:
https://lwn.net/Articles/69419/
I.e. snprintf() returns what *would* be the resulting length, while
scnprintf() returns the actual length.
Reported-by: Jules Irenge <jbi.octave@gmail.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Thorsten Winkler <twinkler@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Thorsten Winkler [Wed, 21 Jun 2023 13:49:18 +0000 (15:49 +0200)]
s390/lcs: Convert sysfs sprintf to sysfs_emit
Following the advice of Documentation/filesystems/sysfs.rst:
All sysfs related show()-functions should only use sysfs_emit() or
sysfs_emit_at() when formatting the value to be returned to user space.
While at it, follow Linux kernel coding style and unify indentation.
Reported-by: Jules Irenge <jbi.octave@gmail.com>
Reported-by: Joe Perches <joe@perches.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Thorsten Winkler <twinkler@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
hclge_debugfs.c:90:25: warning: 'strncpy' output truncated before
terminating nul copying as many bytes from a string as its length
[-Wstringop-truncation]
strncpy(pos, result[i], strlen(result[i]));
strncpy() uses the source length as the copy length, so it may result in
a destination buffer overflow.
So this patch adds some value checks to avoid this issue.
Signed-off-by: Hao Chen <chenhao418@huawei.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/lkml/202207170606.7WtHs9yS-lkp@intel.com/T/
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
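The problem with bounding a copy by strlen(src) is that the destination size never enters the picture. A minimal sketch of a checked append follows; append_checked() is a hypothetical helper written for illustration and is not the code this patch adds to hclge_debugfs.c.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: append `src` at offset *pos in `dst`, which holds
 * dst_size bytes. Returns 0 on success, or -1 if the string plus its
 * terminating NUL would not fit; in that case dst and *pos are untouched.
 */
int append_checked(char *dst, size_t dst_size, size_t *pos, const char *src)
{
	size_t len = strlen(src);

	/* Check against the space left in dst, not the length of src. */
	if (*pos >= dst_size || len >= dst_size - *pos)
		return -1;

	memcpy(dst + *pos, src, len);
	*pos += len;
	dst[*pos] = '\0';
	return 0;
}
```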
Jian Shen [Wed, 21 Jun 2023 12:33:07 +0000 (20:33 +0800)]
net: hns3: refine the tcam key convert handle
The result of the expression '(k ^ ~v) & k' is exactly
the same as 'k & v', so simplify it.
(k ^ ~v) & k == k & v
The truth table (in non table form):
k == 0, v == 0:
(k ^ ~v) & k == (0 ^ ~0) & 0 == (0 ^ 1) & 0 == 1 & 0 == 0
k & v == 0 & 0 == 0
k == 0, v == 1:
(k ^ ~v) & k == (0 ^ ~1) & 0 == (0 ^ 0) & 0 == 0 & 0 == 0
k & v == 0 & 1 == 0
k == 1, v == 0:
(k ^ ~v) & k == (1 ^ ~0) & 1 == (1 ^ 1) & 1 == 0 & 1 == 0
k & v == 1 & 0 == 0
k == 1, v == 1:
(k ^ ~v) & k == (1 ^ ~1) & 1 == (1 ^ 0) & 1 == 1 & 1 == 1
k & v == 1 & 1 == 1
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
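Since both expressions are purely bitwise, the single-bit table extends to wider values. As a sanity check, the identity can also be verified exhaustively for all 8-bit pairs; tcam_identity_holds() is a hypothetical helper written for this note, not part of the driver.

```c
#include <stdint.h>

/* Returns 1 if (k ^ ~v) & k == k & v holds for every pair of 8-bit values. */
int tcam_identity_holds(void)
{
	unsigned int k, v;

	for (k = 0; k <= 0xff; k++) {
		for (v = 0; v <= 0xff; v++) {
			uint8_t lhs = (uint8_t)((k ^ ~v) & k);
			uint8_t rhs = (uint8_t)(k & v);

			if (lhs != rhs)
				return 0;
		}
	}
	return 1;
}
```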
other
- fix the last few W=1 warnings from GCC 13
- merged wireless tree to avoid conflicts
* tag 'wireless-next-2023-06-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (245 commits)
wifi: ieee80211: fix erroneous NSTR bitmap size checks
wifi: rtlwifi: cleanup USB interface
wifi: rtlwifi: simplify LED management
wifi: ath10k: improve structure padding
wifi: ath9k: convert msecs to jiffies where needed
wifi: iwlwifi: mvm: Add support for IGTK in D3 resume flow
wifi: iwlwifi: mvm: update two most recent GTKs on D3 resume flow
wifi: iwlwifi: mvm: Refactor security key update after D3
wifi: mac80211: mark keys as uploaded when added by the driver
wifi: iwlwifi: remove support of A0 version of FM RF
wifi: iwlwifi: cfg: clean up Bz module firmware lines
wifi: iwlwifi: pcie: add device id 51F1 for killer 1675
wifi: iwlwifi: bump FW API to 83 for AX/BZ/SC devices
wifi: iwlwifi: cfg: remove trailing dash from FW_PRE constants
wifi: iwlwifi: also unify Ma device configurations
wifi: iwlwifi: also unify Sc device configurations
wifi: iwlwifi: unify Bz/Gl device configurations
wifi: iwlwifi: pcie: also drop jacket from info macro
wifi: iwlwifi: remove support for *nJ devices
wifi: iwlwifi: don't load old firmware for 22000
...
====================
The first patch is by Carsten Schmidt, targets the kvaser_usb driver
and adds len8_dlc support.
Marcel Hellwig's patch for the xilinx_can driver adds support for CAN
transceivers via the PHY framework.
Frank Jungclaus contributes 6+2 patches for the esd_usb driver in
preparation for the upcoming CAN-USB/3 support.
The 2 patches by Miquel Raynal for the sja1000 driver work around
overrun stalls on the Renesas SoCs.
The next 3 patches are by me and fix the coding style in the
rx-offload helper and in the m_can and ti_hecc drivers.
Vincent Mailhol contributes 3 patches to fix and update the
calculation of the length of CAN frames on the wire.
Oliver Hartkopp's patch moves the CAN_RAW_FILTER_MAX definition into
the correct header.
The remaining 14 patches are by Jimmy Assarsson, target the
kvaser_pciefd driver and bring various updates and improvements.
* tag 'linux-can-next-for-6.5-20230622' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (33 commits)
can: kvaser_pciefd: Use TX FIFO size read from CAN controller
can: kvaser_pciefd: Refactor code
can: kvaser_pciefd: Add len8_dlc support
can: kvaser_pciefd: Use FIELD_{GET,PREP} and GENMASK where appropriate
can: kvaser_pciefd: Sort register definitions
can: kvaser_pciefd: Change return type for kvaser_pciefd_{receive,transmit,set_tx}_irq()
can: kvaser_pciefd: Rename device ID defines
can: kvaser_pciefd: Sort includes in alphabetic order
can: kvaser_pciefd: Remove SPI flash parameter read functionality
can: uapi: move CAN_RAW_FILTER_MAX definition to raw.h
can: kvaser_pciefd: Define unsigned constants with type suffix 'U'
can: kvaser_pciefd: Set hardware timestamp on transmitted packets
can: kvaser_pciefd: Add function to set skb hwtstamps
can: kvaser_pciefd: Remove handler for unused KVASER_PCIEFD_PACK_TYPE_EFRAME_ACK
can: kvaser_pciefd: Remove useless write to interrupt register
can: length: refactor frame lengths definition to add size in bits
can: length: fix bitstuffing count
can: length: fix description of the RRS field
can: m_can: fix coding style
can: ti_hecc: fix coding style
...
====================
Piotr Gardocki [Wed, 21 Jun 2023 13:21:06 +0000 (15:21 +0200)]
net: fix net device address assign type
Commit ad72c4a06acc introduced an optimization to return from the function
quickly if the MAC address is not changing at all. It was reported
that this change prevents dev->addr_assign_type from changing
to NET_ADDR_SET from _PERM or _RANDOM.
Restore the old behavior and skip only the call to ndo_set_mac_address.
Fixes: ad72c4a06acc ("net: add check for current MAC address in dev_set_mac_address")
Reported-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230621132106.991342-1-piotrx.gardocki@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
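The restored behavior can be sketched with a small model. struct model_dev, model_set_mac() and the MODEL_ADDR_* values are hypothetical; the ndo_calls counter stands in for invoking ndo_set_mac_address. The point is that only the driver call is skipped when the address is unchanged, while the assign type is always updated.

```c
#include <string.h>

/* Hypothetical stand-ins for the NET_ADDR_* assign types. */
enum { MODEL_ADDR_PERM, MODEL_ADDR_RANDOM, MODEL_ADDR_SET };

struct model_dev {
	unsigned char addr[6];
	int addr_assign_type;
	int ndo_calls;		/* counts simulated ndo_set_mac_address calls */
};

int model_set_mac(struct model_dev *dev, const unsigned char *new_addr)
{
	/* Skip only the driver call when the address does not change. */
	if (memcmp(dev->addr, new_addr, sizeof(dev->addr)) != 0) {
		dev->ndo_calls++;
		memcpy(dev->addr, new_addr, sizeof(dev->addr));
	}

	/* Always record that the address was explicitly set. */
	dev->addr_assign_type = MODEL_ADDR_SET;
	return 0;
}
```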
Edward Cree [Wed, 21 Jun 2023 12:15:04 +0000 (13:15 +0100)]
sfc: keep alive neighbour entries while a TC encap action is using them
When processing counter updates, if any action set using the newly
incremented counter includes an encap action, prod the corresponding
neighbour entry to indicate to the neighbour cache that the entry
is still in use and passing traffic.
Ying Hsu [Tue, 20 Jun 2023 17:47:32 +0000 (10:47 -0700)]
igb: Fix igb_down hung on surprise removal
In a setup where a Thunderbolt hub connects to Ethernet and a display
through USB Type-C, users may experience a hung task timeout when they
remove the cable between the PC and the Thunderbolt hub.
This is because the igb_down function is called multiple times when
the Thunderbolt hub is unplugged. For example, the igb_io_error_detected
triggers the first call, and the igb_remove triggers the second call.
The second call to igb_down will block at napi_synchronize.
Here's the call trace:
__schedule+0x3b0/0xddb
? __mod_timer+0x164/0x5d3
schedule+0x44/0xa8
schedule_timeout+0xb2/0x2a4
? run_local_timers+0x4e/0x4e
msleep+0x31/0x38
igb_down+0x12c/0x22a [igb 6615058754948bfde0bf01429257eb59f13030d4]
__igb_close+0x6f/0x9c [igb 6615058754948bfde0bf01429257eb59f13030d4]
igb_close+0x23/0x2b [igb 6615058754948bfde0bf01429257eb59f13030d4]
__dev_close_many+0x95/0xec
dev_close_many+0x6e/0x103
unregister_netdevice_many+0x105/0x5b1
unregister_netdevice_queue+0xc2/0x10d
unregister_netdev+0x1c/0x23
igb_remove+0xa7/0x11c [igb 6615058754948bfde0bf01429257eb59f13030d4]
pci_device_remove+0x3f/0x9c
device_release_driver_internal+0xfe/0x1b4
pci_stop_bus_device+0x5b/0x7f
pci_stop_bus_device+0x30/0x7f
pci_stop_bus_device+0x30/0x7f
pci_stop_and_remove_bus_device+0x12/0x19
pciehp_unconfigure_device+0x76/0xe9
pciehp_disable_slot+0x6e/0x131
pciehp_handle_presence_or_link_change+0x7a/0x3f7
pciehp_ist+0xbe/0x194
irq_thread_fn+0x22/0x4d
? irq_thread+0x1fd/0x1fd
irq_thread+0x17b/0x1fd
? irq_forced_thread_fn+0x5f/0x5f
kthread+0x142/0x153
? __irq_get_irqchip_state+0x46/0x46
? kthread_associate_blkcg+0x71/0x71
ret_from_fork+0x1f/0x30
In this case, igb_io_error_detected detaches the network interface
and requests a PCIe slot reset. However, the PCIe reset callback is
not invoked, and thus the Ethernet connection breaks down.
As the PCIe error in this case is a non-fatal one, requesting a
slot reset can be avoided.
This patch fixes the hung task issue and preserves the Ethernet
connection by ignoring non-fatal PCIe errors.
Signed-off-by: Ying Hsu <yinghsu@chromium.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230620174732.4145155-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
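The decision described above, leaving the interface alone for non-fatal errors, can be sketched as follows. The enums, struct model_adapter and model_io_error_detected() are hypothetical stand-ins chosen for illustration; they are not the PCI error-recovery API or the igb handler itself.

```c
/* Hypothetical stand-ins for the channel state and recovery result. */
enum model_channel_state { MODEL_IO_NONFATAL, MODEL_IO_FATAL };
enum model_ers_result { MODEL_ERS_CAN_RECOVER, MODEL_ERS_NEED_RESET };

struct model_adapter {
	int down_calls;	/* how many times the interface was torn down */
};

enum model_ers_result
model_io_error_detected(struct model_adapter *adapter,
			enum model_channel_state state)
{
	/* Non-fatal error: keep the interface up and skip the slot reset. */
	if (state == MODEL_IO_NONFATAL)
		return MODEL_ERS_CAN_RECOVER;

	/* Otherwise tear the interface down and ask for a reset. */
	adapter->down_calls++;
	return MODEL_ERS_NEED_RESET;
}
```

In this model a later removal path would be the only caller tearing the interface down after a non-fatal error, which avoids the double teardown shown in the trace.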
Patch 1 is just a simplification, technically unrelated to the other
two patches. But it would be a bit inconsistent to have the new
ksz_prmw32() introduced in patch 2 use ksz_rmw32() while leaving
ksz_prmw8() as-is.
The actual fix is of course patch 3. I can definitely see some weird
behaviour on our ksz9567 when writing to phy registers 0x1e and 0x1f
(with phytool from userspace), though it does not seem that the effect
is always to write zeroes to the buddy register as the errata sheet
says would be the case. In our case, the switch is connected via i2c;
I hope somebody with other switches and/or the SPI variants can test
this.
====================
Rasmus Villemoes [Tue, 20 Jun 2023 11:38:54 +0000 (13:38 +0200)]
net: dsa: microchip: fix writes to phy registers >= 0x10
According to the errata sheets for ksz9477 and ksz9567, writes to the
PHY registers 0x10-0x1f (i.e. those located at addresses 0xN120 to
0xN13f) must be done as a 32 bit write to the 4-byte aligned address
containing the register, hence requires a RMW in order not to change
the adjacent PHY register.
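The read-modify-write can be sketched as two small helpers. Both functions are hypothetical, and the layout is an assumption of this sketch: registers are taken to be big-endian, so the 16-bit register at the lower address occupies the upper half of the 32-bit word. The driver's own accessors are not reproduced here.

```c
#include <stdint.h>

/* 4-byte aligned address of the word containing the 16-bit register. */
uint32_t phy_word_addr(uint32_t addr)
{
	return addr & ~(uint32_t)3;
}

/*
 * Merge a new 16-bit value into the 32-bit word read from
 * phy_word_addr(addr), leaving the adjacent register untouched.
 * Assumption: big-endian layout, lower address == upper half.
 */
uint32_t phy_merge_reg16(uint32_t word, uint32_t addr, uint16_t val)
{
	unsigned int shift = (addr & 2) ? 0 : 16;
	uint32_t mask = (uint32_t)0xffff << shift;

	return (word & ~mask) | ((uint32_t)val << shift);
}
```

A caller would read the word at phy_word_addr(addr), pass it through phy_merge_reg16(), and write the result back as a single 32-bit access.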
Jakub Kicinski [Wed, 21 Jun 2023 23:17:19 +0000 (16:17 -0700)]
tools: ynl: improve the direct-include header guard logic
Przemek suggests that I shouldn't accuse GCC of witchcraft,
there is a simpler explanation for why we need manual define.
scripts/headers_install.sh modifies the guard, removing _UAPI.
That's why including a kernel header from the tree and from
/usr leads to duplicate definitions.
This also solves the mystery of why I needed to include
the header conditionally. I had the wrong guards for most
cases but ethtool.
Zhengchao Shao [Tue, 20 Jun 2023 06:25:19 +0000 (14:25 +0800)]
net: txgbe: remove unused buffer in txgbe_calc_eeprom_checksum
Half a year has passed since commit 049fe5365324c ("net: txgbe: Add operations
to interact with firmware") was submitted, and the buffer in
txgbe_calc_eeprom_checksum is still not used. So remove it and the related
branch code.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306200242.FXsHokaJ-lkp@intel.com/
Reviewed-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230620062519.1575298-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
Add and use helper for PCS negotiation modes
Earlier this month, I proposed a helper for deciding whether a PCS
should use inband negotiation modes or not. There was some discussion
around this topic, and I believe there was no disagreement about
providing the helper.
that added a helper, phylink_pcs_neg_mode() which PCS drivers could use
to parse the state, and updated a bunch of drivers to use it. I got
a couple of bits of feedback to it, including some ACKs.
However, I've decided to take this slightly further and change the
"mode" parameter to both the pcs_config() and pcs_link_up() methods
when a PCS driver opts in to this (by setting "neg_mode" in the
phylink_pcs structure.) If this is not set, we default to the old
behaviour. That said, this series converts all the PCS implementations
I can find currently in net-next.
Doing this has the added benefit that the negotiation mode parameter
is also available to the pcs_link_up() function, which can now know
whether inband negotiation was in fact enabled or not at pcs_config()
time.
and received one reply, thanks Elad, which is a similar amount of
interest to previous postings. Let's post it as non-RFC and see
whether we get more reaction.
====================
Update macb's embedded PCS drivers to use neg_mode, even though it
makes no use of it or the "mode" argument. This makes the driver
consistent with converted drivers.
net: dsa: mt7530: update PCS driver to use neg_mode
Update mt7530's embedded PCS driver to use neg_mode, even though it
makes no use of it or the "mode" argument. This makes the driver
consistent with converted drivers.
Update B53's embedded PCS driver to use neg_mode, even though it makes
no use of it or the "mode" argument. This makes the driver consistent
with converted drivers.
Update Sparx5's embedded PCS driver to use neg_mode rather than the
mode argument. As there is no pcs_link_up() method, this only affects
the pcs_config() method.
Update qca8k's embedded PCS driver to use neg_mode rather than the
mode argument. As there is no pcs_link_up() method, this only affects
the pcs_config() method.
Update prestera's embedded PCS driver to use neg_mode rather than the
mode argument. As there is no pcs_link_up() method, this only affects
the pcs_config() method.
Update mvpp2's embedded PCS drivers to use neg_mode rather than the
mode argument, remembering to update the ACPI path as well. As there
are no pcs_link_up() methods, this only affects the two pcs_config()
methods.
Update mvneta's embedded PCS driver to use neg_mode rather than the
mode argument. As there is no pcs_link_up() method, this only affects
the pcs_config() method.
Update lan966x's embedded PCS driver to use neg_mode rather than the
mode argument. As there is no pcs_link_up() method, this only affects
the pcs_config() method.
Update the Lynx PCS driver to use neg_mode rather than the mode
argument. This ensures that the link_up() method will always program
the speed and duplex when negotiation is disabled.
net: pcs: lynxi: update PCS driver to use neg_mode
Update the Lynxi PCS driver to use neg_mode rather than the mode
argument. This ensures that the link_up() method will always program
the speed and duplex when negotiation is disabled.
Update xpcs to use neg_mode to configure whether inband negotiation
should be used. We need to update sja1105 as well as that directly
calls into the XPCS driver's config function.
net: phylink: pass neg_mode into phylink_mii_c22_pcs_config()
Convert fman_dtsec, xilinx_axienet and pcs-lynx to pass the neg_mode
into phylink_mii_c22_pcs_config(). Where appropriate, drivers are
updated to have neg_mode passed into their pcs_config() and
pcs_link_up() functions. For other drivers, we just hoist the call
to phylink_pcs_neg_mode() to their pcs_config() method out of
phylink_mii_c22_pcs_config().
PCS have to work out whether they should enable PCS negotiation by
looking at the "mode" and "interface" arguments, and the Autoneg bit
in the advertising mask.
This leads to some complex logic, so let's pull that out into phylink
and pass a "neg_mode" argument to the PCS configuration and
link up methods, instead of the "mode" argument.
In order to transition drivers, add a "neg_mode" flag to the phylink
PCS structure so PCS can indicate whether they want to be passed the
neg_mode or the old mode argument.
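The kind of decision being centralised can be sketched as below. This is a simplified, hypothetical model: the enum names, the two interface classes and the exact rules are assumptions made for illustration, and the authoritative logic is the phylink_pcs_neg_mode() helper in phylink itself.

```c
#include <stdbool.h>

/* Hypothetical interface classes; real code uses phy_interface_t. */
enum model_iface {
	MODEL_IFACE_SGMII_LIKE,
	MODEL_IFACE_BASEX_LIKE,
	MODEL_IFACE_OTHER,
};

/* Hypothetical negotiation modes handed to the PCS methods. */
enum model_neg {
	MODEL_NEG_NONE,
	MODEL_NEG_OUTBAND,
	MODEL_NEG_INBAND_DISABLED,
	MODEL_NEG_INBAND_ENABLED,
};

/*
 * Combine the three inputs a PCS previously had to inspect itself:
 * whether in-band mode was requested, the interface type, and whether
 * the Autoneg bit is set in the advertising mask.
 */
enum model_neg model_neg_mode(bool inband, enum model_iface iface,
			      bool autoneg_advertised)
{
	switch (iface) {
	case MODEL_IFACE_SGMII_LIKE:
		return inband ? MODEL_NEG_INBAND_ENABLED : MODEL_NEG_OUTBAND;
	case MODEL_IFACE_BASEX_LIKE:
		if (!inband)
			return MODEL_NEG_OUTBAND;
		return autoneg_advertised ? MODEL_NEG_INBAND_ENABLED
					  : MODEL_NEG_INBAND_DISABLED;
	default:
		return MODEL_NEG_NONE;
	}
}
```

A PCS that receives a single value like this can, for example, program speed and duplex in its link-up method whenever the value is anything other than the in-band-enabled case.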
tools/testing/selftests/net/fcnal-test.sh d7a2fc1437f7 ("selftests: net: fcnal-test: check if FIPS mode is enabled") dd017c72dde6 ("selftests: fcnal: Test SO_DONTROUTE on TCP sockets.")
https://lore.kernel.org/all/5007b52c-dd16-dbf6-8d64-b9701bfa498b@tessares.net/
https://lore.kernel.org/all/20230619105427.4a0df9b3@canb.auug.org.au/
Linus Torvalds [Fri, 23 Jun 2023 00:59:51 +0000 (17:59 -0700)]
Merge tag 'net-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from ipsec, bpf, mptcp and netfilter.
Current release - regressions:
- netfilter: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
- eth: mlx5e:
- fix scheduling of IPsec ASO query while in atomic
- free IRQ rmap and notifier on kernel shutdown
Current release - new code bugs:
- phy: manually remove LEDs to ensure correct ordering
Previous releases - regressions:
- mptcp: fix possible divide by zero in recvmsg()
- dsa: revert "net: phy: dp83867: perform soft reset and retain
established link"
Previous releases - always broken:
- sched: netem: acquire qdisc lock in netem_change()
- bpf:
- fix verifier id tracking of scalars on spill
- fix NULL dereference on exceptions
- accept function names that contain dots
- netfilter: disallow element updates of bound anonymous sets
- mptcp: ensure listener is unhashed before updating the sk status
- xfrm:
- add missed call to delete offloaded policies
- fix inbound ipv4/udp/esp packets to UDPv6 dualstack sockets
- selftests: fixes for FIPS mode
- dsa: mt7530: fix multiple CPU ports, BPDU and LLDP handling
- eth: sfc: use budget for TX completions
Misc:
- wifi: iwlwifi: add support for SO-F device with PCI id 0x7AF0"
* tag 'net-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (74 commits)
revert "net: align SO_RCVMARK required privileges with SO_MARK"
net: wwan: iosm: Convert single instance struct member to flexible array
sch_netem: acquire qdisc lock in netem_change()
selftests: forwarding: Fix race condition in mirror installation
wifi: mac80211: report all unusable beacon frames
mptcp: ensure listener is unhashed before updating the sk status
mptcp: drop legacy code around RX EOF
mptcp: consolidate fallback and non fallback state machine
mptcp: fix possible list corruption on passive MPJ
mptcp: fix possible divide by zero in recvmsg()
mptcp: handle correctly disconnect() failures
bpf: Force kprobe multi expected_attach_type for kprobe_multi link
bpf/btf: Accept function names that contain dots
Revert "net: phy: dp83867: perform soft reset and retain established link"
net: mdio: fix the wrong parameters
netfilter: nf_tables: Fix for deleting base chains with payload
netfilter: nfnetlink_osf: fix module autoload
netfilter: nf_tables: drop module reference after updating chain
netfilter: nf_tables: disallow timeout for anonymous sets
netfilter: nf_tables: disallow updates of anonymous sets
...
Linus Torvalds [Fri, 23 Jun 2023 00:38:11 +0000 (17:38 -0700)]
Merge tag 'platform-drivers-x86-v6.4-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fix from Hans de Goede:
"One small fix for an AMD PMF driver issue which is causing issues for
users of just released AMD laptop models"
* tag 'platform-drivers-x86-v6.4-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86/amd/pmf: Register notify handler only if SPS is enabled
Linus Torvalds [Fri, 23 Jun 2023 00:32:34 +0000 (17:32 -0700)]
Merge tag 'io_uring-6.4-2023-06-21' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
"A fix for a race condition with poll removal and linked timeouts, and
then a few followup fixes/tweaks for the msg_control patch from last
week.
Not super important, particularly the sparse fixup, as it was broken
before that recent commit. But let's get it sorted for real for this
release, rather than just have it broken a bit differently"
* tag 'io_uring-6.4-2023-06-21' of git://git.kernel.dk/linux:
io_uring/net: use the correct msghdr union member in io_sendmsg_copy_hdr
io_uring/net: disable partial retries for recvmsg with cmsg
io_uring/net: clear msg_controllen on partial sendmsg retry
io_uring/poll: serialize poll linked timer start with poll removal
Linus Torvalds [Fri, 23 Jun 2023 00:27:16 +0000 (17:27 -0700)]
Merge tag 'cgroup-for-6.4-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"It's late but here are two bug fixes. Both fix problems which can be
severe but are very confined in scope. The risk to most use cases
should be minimal.
- Fix for an old bug which triggers if a cgroup subsystem is
remounted to a different hierarchy while someone is reading its
cgroup.procs/tasks file. The risk is pretty low given how seldom
cgroup subsystems are moved across hierarchies.
- We moved cpus_read_lock() outside of cgroup internal locks a while
ago but forgot to update the legacy_freezer leading to lockdep
triggers. Fixed"
* tag 'cgroup-for-6.4-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Do not corrupt task iteration when rebinding subsystem
cgroup,freezer: hold cpu_hotplug_lock before freezer_mutex in freezer_css_{online,offline}()
Gavin Shan [Thu, 15 Jun 2023 05:42:59 +0000 (15:42 +1000)]
KVM: Avoid illegal stage2 mapping on invalid memory slot
We run into a guest hang in edk2 firmware when KSM is kept running on
the host. The edk2 firmware is waiting for status 0x80 from QEMU's pflash
device (TYPE_PFLASH_CFI01) during the operation of sector erasing or
buffered write. The status is returned by reading the memory region of
the pflash device and the read request should have been forwarded to QEMU
and emulated by it. Unfortunately, the read request is covered by an
illegal stage2 mapping when the guest hang issue occurs. The read request
is completed with QEMU bypassed and wrong status is fetched. The edk2
firmware runs into an infinite loop with the wrong status.
The illegal stage2 mapping is populated due to same page sharing by KSM
at (C) even though the associated memory slot has been marked as invalid at (B)
when the memory slot is requested to be deleted. It's notable that the
active and inactive memory slots can't be swapped when we're in the middle
of kvm_mmu_notifier_change_pte() because kvm->mn_active_invalidate_count
is elevated, and kvm_swap_active_memslots() will busy loop until it reaches
zero again. Besides, the swapping from the active to the inactive memory
slots is also avoided by holding &kvm->srcu in __kvm_handle_hva_range(),
corresponding to synchronize_srcu_expedited() in kvm_swap_active_memslots().
Fix the issue by skipping the invalid memory slot at (C) to avoid the
illegal stage2 mapping so that the read request for the pflash's status
is forwarded to QEMU and emulated by it. In this way, the correct pflash's
status can be returned from QEMU to break the infinite loop in the edk2
firmware.
We tried a git-bisect and the first problematic commit is cd4c71835228
("KVM: arm64: Convert to the gfn-based MMU notifier callbacks"). With this,
clean_dcache_guest_page() is called after the memory slots are iterated
in kvm_mmu_notifier_change_pte(). clean_dcache_guest_page() is called
before the iteration on the memory slots before this commit. This change
literally enlarges the racy window between kvm_mmu_notifier_change_pte()
and memory slot removal so that we're able to reproduce the issue in a
practical test case. However, the issue exists since commit d5d8184d35c9
("KVM: ARM: Memory virtualization setup").
Cc: stable@vger.kernel.org # v3.9+ Fixes: d5d8184d35c9 ("KVM: ARM: Memory virtualization setup") Reported-by: Shuai Hu <hshuai@redhat.com> Reported-by: Zhenyu Zhang <zhenyzha@redhat.com> Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Message-Id: <20230615054259.14911-1-gshan@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
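A minimal userspace sketch of the fix, assuming simplified types: struct
memslot, SLOT_INVALID and both helpers are hypothetical and only model the
rule "do not install a mapping for a slot that is marked invalid"; they are
not the KVM code itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified types; not the kernel's struct kvm_memory_slot. */
#define SLOT_INVALID (1u << 0)

struct memslot {
	unsigned long base_gfn;
	unsigned long npages;
	unsigned int flags;
	unsigned long mapped;	/* counts stage2 mappings installed */
};

/* Install a mapping for one gfn unless the slot is being deleted. */
static bool change_pte_on_slot(struct memslot *slot, unsigned long gfn)
{
	if (gfn < slot->base_gfn || gfn >= slot->base_gfn + slot->npages)
		return false;
	/* The fix: never populate a mapping for an invalid slot. */
	if (slot->flags & SLOT_INVALID)
		return false;
	slot->mapped++;
	return true;
}

/* Walk all slots, as a change_pte-style notifier would. */
static size_t change_pte(struct memslot *slots, size_t n, unsigned long gfn)
{
	size_t i, done = 0;

	for (i = 0; i < n; i++)
		if (change_pte_on_slot(&slots[i], gfn))
			done++;
	return done;
}
```

Because the invalid slot is skipped, a later guest access to that range
faults and can be forwarded to userspace instead of hitting a stale mapping.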
ice_change_mtu() currently uses separate ice_down() and ice_up()
calls to reflect the changed MTU. ice_down_up() serves this purpose, so do
the refactoring here.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The complete profile bit together with the NSTR link pair
present bit indicate whether or not the NSTR bitmap is present;
the NSTR bitmap size just indicates how big it is.
Fixes: 7b6f08771bf6 ("wifi: ieee80211: Support validating ML station profile length") Fixes: 5c1f97537bfb ("wifi: mac80211: store BSS param change count from assoc response") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Anton Protopopov [Thu, 22 Jun 2023 09:54:24 +0000 (09:54 +0000)]
bpf, docs: Document existing macros instead of deprecated
The BTF_TYPE_SAFE_NESTED macro was replaced by the BTF_TYPE_SAFE_TRUSTED,
BTF_TYPE_SAFE_RCU, and BTF_TYPE_SAFE_RCU_OR_NULL macros. Fix the docs
correspondingly.
Fixes: 6fcd486b3a0a ("bpf: Refactor RCU enforcement in the verifier.") Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20230622095424.1024244-1-aspsk@isovalent.com
Anton Protopopov [Thu, 22 Jun 2023 09:54:07 +0000 (09:54 +0000)]
bpf, docs: BPF Iterator Document
Fix the description of the seq_info field of the bpf_iter_reg structure which
was wrong due to an accidental copy/paste of the previous field's description.
Przemek Kitszel [Wed, 31 May 2023 12:38:40 +0000 (14:38 +0200)]
ice: remove null checks before devm_kfree() calls
We all know they are redundant.
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Michal Wilczynski <michal.wilczynski@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Przemek Kitszel [Wed, 31 May 2023 12:36:42 +0000 (14:36 +0200)]
ice: clean up freeing SR-IOV VFs
The check for existing VFs has been redundant since the very
inception of the SR-IOV sysfs interface in the kernel,
see commit 1789382a72a5 ("PCI: SRIOV control and status via sysfs").
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently ice driver's .ndo_bpf callback brings interface down and up
independently of XDP resources' presence. This is only needed when
either these resources have to be configured or removed. It means that
if one is switching XDP programs on-the-fly with running traffic,
packets will be dropped.
To avoid this, compare early in ice_xdp_setup_prog() the incoming
bpf_prog pointer with the bpf_prog pointer that is already assigned to
the VSI. Do a plain swap when both the VSI's bpf_prog and the incoming one
are non-NULL.
Lastly, while at it, put old bpf_prog *after* the update of Rx ring's
bpf_prog pointer. In theory previous code could expose us to a state
where Rx ring's bpf_prog would still be referring to old_prog that got
released with earlier bpf_prog_put().
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
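The ordering described above can be sketched like this. All types and
helpers (struct prog, struct rx_ring, struct vsi, prog_put(),
xdp_setup_prog()) are hypothetical stand-ins for illustration, and the
caller is assumed to already hold a reference on the new program.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a refcounted program and an Rx ring. */
struct prog { int refcnt; };
struct rx_ring { struct prog *prog; };
struct vsi { struct prog *prog; struct rx_ring *rings; size_t nrings; };

static void prog_put(struct prog *p)
{
	if (p)
		p->refcnt--;	/* a real implementation frees at zero */
}

/*
 * Returns 1 when the program was hot-swapped without a down/up cycle,
 * 0 when the caller has to reconfigure resources (attach or detach).
 */
static int xdp_setup_prog(struct vsi *vsi, struct prog *new_prog)
{
	struct prog *old = vsi->prog;
	int hot_swap = old && new_prog;
	size_t i;

	vsi->prog = new_prog;
	/* Publish the new pointer to every ring first ... */
	for (i = 0; i < vsi->nrings; i++)
		vsi->rings[i].prog = new_prog;
	/* ... and only then drop the reference on the old program. */
	prog_put(old);
	return hot_swap;
}
```

Dropping the old reference last means no ring can still point at a program
whose reference count has already been released.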
Jacob Keller [Tue, 13 Jun 2023 20:40:53 +0000 (13:40 -0700)]
ice: reduce initial wait for control queue messages
The ice_sq_send_cmd() function is used to send messages to the control
queues used to communicate with firmware, virtual functions, and even some
hardware.
When sending a control queue message, the driver is designed to
synchronously wait for a response from the queue. Currently it waits
between checks for 100 to 150 microseconds.
Commit f86d6f9c49f6 ("ice: sleep, don't busy-wait, for
ICE_CTL_Q_SQ_CMD_TIMEOUT") did recently change the behavior from an
unnecessary delay into a sleep which is a significant improvement over the
old behavior of polling using udelay.
Because of the nature of PCIe transactions, the hardware won't be informed
about a new message until the write to the tail register posts. This is
only guaranteed to occur at the next register read. In ice_sq_send_cmd(),
this happens at the ice_sq_done() call. Because of this, the driver
essentially forces a minimum of one full wait time regardless of how fast
the response is.
For the hardware-based sideband queue, this is especially slow. It is
expected that the hardware will respond within 2 or 3 microseconds, an
order of magnitude faster than the 100-150 microsecond sleep.
Allow such fast completions to occur without delay by introducing a small 5
microsecond delay first before entering the sleeping timeout loop. Ensure
the tail write has been posted by using ice_flush(hw) first.
While at it, let's also remove the ICE_CTL_Q_SQ_CMD_USEC macro as it
obscures the sleep time in the inner loop. It was likely introduced to
avoid "magic numbers", but in practice sleep and delay values are easier to
read and understand when using actual numbers instead of a named constant.
This change should allow the fast hardware based control queue messages to
complete quickly without delay, while slower firmware queue response times
will sleep while waiting for the response.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
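A rough model of the resulting wait loop, using a simulated clock so it runs
in userspace. struct fake_hw and every helper here are hypothetical; only
the shape (flush, short 5 microsecond delay, then the sleeping loop) follows
the description above.

```c
#include <stdbool.h>

/* Hypothetical model: time is a counter in microseconds, not a real clock. */
struct fake_hw {
	unsigned long now_us;		/* simulated current time */
	unsigned long done_at_us;	/* when the response becomes visible */
	bool tail_posted;
};

static void hw_flush(struct fake_hw *hw) { hw->tail_posted = true; }
static void hw_wait(struct fake_hw *hw, unsigned long us) { hw->now_us += us; }

static bool sq_done(const struct fake_hw *hw)
{
	return hw->tail_posted && hw->now_us >= hw->done_at_us;
}

/* Returns the simulated time spent waiting, or -1 on timeout. */
static long send_cmd_wait(struct fake_hw *hw, unsigned long timeout_us)
{
	unsigned long start = hw->now_us;

	hw_flush(hw);		/* make sure the tail write has posted */
	hw_wait(hw, 5);		/* short delay: catches 2-3 us responses */

	while (!sq_done(hw)) {
		if (hw->now_us - start >= timeout_us)
			return -1;
		hw_wait(hw, 100);	/* stands in for a 100-150 us sleep */
	}
	return (long)(hw->now_us - start);
}
```

In this model a response that is ready after 3 microseconds costs 5
microseconds of waiting instead of a full sleep period.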
Przemek Kitszel [Mon, 19 Jun 2023 08:06:35 +0000 (04:06 -0400)]
iavf: fix err handling for MAC replace
Defer removal of current primary MAC until a replacement is successfully
added. The previous implementation would leave the filter list with no primary MAC.
This was found while reading the code.
The patch takes advantage of the fact that there can only be a single primary
MAC filter at any time ([1] by Piotr)
Piotr has also applied some review suggestions during our internal patch
submittal process.
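The deferred removal can be illustrated with a small sketch. struct
filter_list, add_filter() and replace_primary() are hypothetical names, and
the "add_ok" flag merely simulates whether adding the new address succeeds.

```c
#include <stdbool.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical filter list with at most one primary MAC at any time. */
struct filter_list {
	unsigned char primary[MAC_LEN];
	bool has_primary;
};

/* Stand-in for the request that may fail; "ok" models its result. */
static int add_filter(const unsigned char *mac, bool ok)
{
	(void)mac;
	return ok ? 0 : -1;
}

/*
 * Replace the primary MAC. The old entry is dropped only after the new
 * one was added successfully, so a failure leaves the list unchanged.
 */
static int replace_primary(struct filter_list *fl,
			   const unsigned char *new_mac, bool add_ok)
{
	int err = add_filter(new_mac, add_ok);

	if (err)
		return err;	/* old primary MAC is still in place */

	memcpy(fl->primary, new_mac, MAC_LEN);	/* old one removed here */
	fl->has_primary = true;
	return 0;
}
```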
Paolo Abeni [Thu, 22 Jun 2023 12:39:06 +0000 (14:39 +0200)]
Merge tag 'nf-23-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
This is v3, including a crash fix for patch 01/14.
The following patchset contains Netfilter/IPVS fixes for net:
1) Fix UDP segmentation with IPVS tunneled traffic, from Terin Stock.
2) Fix chain binding transaction logic, add a bound flag to rule
transactions. Remove incorrect logic in nft_data_hold() and
nft_data_release().
3) Add a NFT_TRANS_PREPARE_ERROR deactivate state to deal with releasing
the set/chain as a follow up to 1240eb93f061 ("netfilter: nf_tables:
incorrect error path handling with NFT_MSG_NEWRULE")
4) Drop map element references from preparation phase instead of
set destroy path, otherwise bogus EBUSY with transactions such as:
flush chain ip x y
delete chain ip x w
where chain ip x y contains jump/goto from set elements.
5) Pipapo set type does not regard generation mask from the walk
iteration.
6) Fix reference count underflow in set element reference to
stateful object.
7) Several patches to tighten the nf_tables API:
- disallow set element updates of bound anonymous set
- disallow unbound anonymous set/chain at the end of transaction.
- disallow updates of anonymous set.
- disallow timeout configuration for anonymous sets.
8) Fix module reference leak in chain updates.
9) Fix nfnetlink_osf module autoload.
10) Fix deletion of basechain when NFTA_CHAIN_HOOK is specified as
in iptables-nft.
This Netfilter batch is larger than usual at this stage, I am aware we
are fairly late in the -rc cycle, if you prefer to route them through
net-next, please let me know.
netfilter pull request 23-06-21
* tag 'nf-23-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: Fix for deleting base chains with payload
netfilter: nfnetlink_osf: fix module autoload
netfilter: nf_tables: drop module reference after updating chain
netfilter: nf_tables: disallow timeout for anonymous sets
netfilter: nf_tables: disallow updates of anonymous sets
netfilter: nf_tables: reject unbound chain set before commit phase
netfilter: nf_tables: reject unbound anonymous set before commit phase
netfilter: nf_tables: disallow element updates of bound anonymous sets
netfilter: nf_tables: fix underflow in object reference counter
netfilter: nft_set_pipapo: .walk does not deal with generations
netfilter: nf_tables: drop map element references from preparation phase
netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
netfilter: nf_tables: fix chain binding transaction logic
ipvs: align inner_mac_header for encapsulation
====================
Yonghong Song [Thu, 22 Jun 2023 06:19:21 +0000 (23:19 -0700)]
selftests/bpf: Fix compilation failure for prog vrf_socket_lookup
When building the latest kernel/selftest with clang17 compiler:
make LLVM=1 -j <== for kernel
make -C tools/testing/selftests/bpf LLVM=1 -j <== for selftest
I hit the following compilation error:
[...]
In file included from progs/vrf_socket_lookup.c:3:
In file included from /usr/include/linux/ip.h:21:
In file included from /usr/include/asm/byteorder.h:5:
In file included from /usr/include/linux/byteorder/little_endian.h:13:
/usr/include/linux/swab.h:136:8: error: unknown type name '__always_inline'
136 | static __always_inline unsigned long __swab(const unsigned long y)
| ^
/usr/include/linux/swab.h:171:8: error: unknown type name '__always_inline'
171 | static __always_inline __u16 __swab16p(const __u16 *p)
| ^
/usr/include/linux/swab.h:171:29: error: expected ';' after top level declarator
171 | static __always_inline __u16 __swab16p(const __u16 *p)
| ^
[...]
Basically, with header files in my local host which is based on 5.12 kernel,
__always_inline is not defined and this caused compilation failure.
Since __always_inline is defined in bpf_helpers.h, let us move bpf_helpers.h
to an earlier position, which fixes the problem.
revert "net: align SO_RCVMARK required privileges with SO_MARK"
This reverts commit 1f86123b9749 ("net: align SO_RCVMARK required
privileges with SO_MARK") because the reasoning in the commit message
is not really correct:
SO_RCVMARK is used for 'reading' incoming skb mark (via cmsg), as such
it is more equivalent to 'getsockopt(SO_MARK)' which has no priv check
and retrieves the socket mark, rather than 'setsockopt(SO_MARK)' which
sets the socket mark and does require privs.
Additionally incoming skb->mark may already be visible if
sysctl_fwmark_reflect and/or sysctl_tcp_fwmark_accept are enabled.
Furthermore, it is easier to block the getsockopt via bpf
(either cgroup setsockopt hook, or via syscall filters)
than to unblock it if it requires CAP_NET_RAW/ADMIN.
On Android the socket mark is (among other things) used to store
the network identifier a socket is bound to. Setting it is privileged,
but retrieving it is not. We'd like unprivileged userspace to be able
to read the network id of incoming packets (where mark is set via
iptables [to be moved to bpf])...
An alternative would be to add another sysctl to control whether
setting SO_RCVMARK is privileged or not.
(or even a MASK of which bits in the mark can be exposed)
But this seems like over-engineering...
Note: This is a non-trivial revert, due to later merged commit e42c7beee71d
("bpf: net: Consider has_current_bpf_ctx() when testing capable() in sk_setsockopt()")
which changed both 'ns_capable' into 'sockopt_ns_capable' calls.
Fixes: 1f86123b9749 ("net: align SO_RCVMARK required privileges with SO_MARK") Cc: Larysa Zaremba <larysa.zaremba@intel.com> Cc: Simon Horman <simon.horman@corigine.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Eyal Birger <eyal.birger@gmail.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Patrick Rohr <prohr@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230618103130.51628-1-maze@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Fix this by registering the source change notify handler only
when SPS (Static Slider) is advertised as supported.
Reported-by: Allen Zhong <allen@atr.me> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217571 Fixes: 4c71ae414474 ("platform/x86/amd/pmf: Add support SPS PMF feature") Tested-by: Patil Rajesh Reddy <Patil.Reddy@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Link: https://lore.kernel.org/r/20230622060309.310001-1-Shyam-sundar.S-k@amd.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Danielle Ratson [Tue, 20 Jun 2023 12:45:15 +0000 (14:45 +0200)]
selftests: forwarding: Fix race condition in mirror installation
When mirroring to a gretap in hardware the device expects to be
programmed with the egress port and all the encapsulating headers. This
requires the driver to resolve the path the packet will take in the
software data path and program the device accordingly.
If the path cannot be resolved (in this case because of an unresolved
neighbor), then mirror installation fails until the path is resolved.
This results in a race that causes the test to sometimes fail.
Fix this by setting the neighbor's state to permanent in a couple of
tests, so that it is always valid.
Fixes: 35c31d5c323f ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1d") Fixes: 239e754af854 ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1q") Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/268816ac729cb6028c7a34d4dda6f4ec7af55333.1687264607.git.petrm@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jimmy Assarsson [Mon, 29 May 2023 13:42:45 +0000 (15:42 +0200)]
can: kvaser_pciefd: Use FIELD_{GET,PREP} and GENMASK where appropriate
Replace open-coded masking and shifting with the GENMASK, FIELD_GET and
FIELD_PREP macros.
Suggested-by: Vincent MAILHOL <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Link: https://lore.kernel.org/all/20230529134248.752036-12-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
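To show what this kind of conversion looks like, here is a userspace sketch.
The GENMASK32, FIELD_GET32 and FIELD_PREP32 macros are simplified 32-bit
stand-ins written for this example (they rely on the GCC/Clang builtin
__builtin_ctz), and the DEMO_* register layout is invented; neither is taken
from the kvaser_pciefd driver.

```c
#include <stdint.h>

/* Simplified 32-bit stand-ins for the kernel macros; the real ones in
 * <linux/bits.h> and <linux/bitfield.h> add compile-time checks. */
#define GENMASK32(h, l) \
	((UINT32_MAX << (l)) & (UINT32_MAX >> (31 - (h))))
#define FIELD_GET32(mask, reg) \
	(((uint32_t)(reg) & (mask)) >> __builtin_ctz(mask))
#define FIELD_PREP32(mask, val) \
	(((uint32_t)(val) << __builtin_ctz(mask)) & (mask))

/* Hypothetical register layout, for illustration only. */
#define DEMO_SPEED_MASK	GENMASK32(7, 4)
#define DEMO_PORT_MASK	GENMASK32(3, 0)

/* Open-coded form that such a cleanup replaces. */
static uint32_t speed_open_coded(uint32_t reg)
{
	return (reg >> 4) & 0xf;
}

/* Same extraction, with the field described by its mask alone. */
static uint32_t speed_field_get(uint32_t reg)
{
	return FIELD_GET32(DEMO_SPEED_MASK, reg);
}

/* Build a register value from its two fields. */
static uint32_t pack(uint32_t speed, uint32_t port)
{
	return FIELD_PREP32(DEMO_SPEED_MASK, speed) |
	       FIELD_PREP32(DEMO_PORT_MASK, port);
}
```

The mask is the single source of truth for a field's position and width, so
the shift amount and the mask can no longer drift apart.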
Jimmy Assarsson [Mon, 29 May 2023 13:42:44 +0000 (15:42 +0200)]
can: kvaser_pciefd: Sort register definitions
Sort the register defines, in the same order as the register bits/fields
are defined.
Sort register bits/fields in MSB-to-LSB order.
Update and add comments.
Jimmy Assarsson [Mon, 29 May 2023 13:42:43 +0000 (15:42 +0200)]
can: kvaser_pciefd: Change return type for kvaser_pciefd_{receive,transmit,set_tx}_irq()
Change return type to void for kvaser_pciefd_transmit_irq(),
kvaser_pciefd_receive_irq() and kvaser_pciefd_set_tx_irq().
These functions always return zero.
Jimmy Assarsson [Mon, 29 May 2023 13:42:42 +0000 (15:42 +0200)]
can: kvaser_pciefd: Rename device ID defines
Rename device ID defines to better match the product name of the supported
device.
Use 16 bit hexadecimal values for device IDs.
And format kvaser_pciefd_id_table using clang-format.
Remove SPI flash parameter read functionality, since it's only used for
reading the interface CAN controller count.
This information is already read from a register, making the information
redundant.