Or Gerlitz [Sun, 16 Oct 2011 08:26:21 +0000 (10:26 +0200)]
mlx4_core: Deprecate log_num_vlan module param
Enable the maximum size (128) supported by the device for the shadow
vlans table, ignoring the module parameter that overrides it. This
table is only used by the IBoE control plane for setting a vlan index
into an RC/UC QP context or UD Address Handle.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Or Gerlitz [Mon, 10 Oct 2011 08:54:42 +0000 (10:54 +0200)]
IB/mlx4: Don't set VLAN in IBoE WQEs' control segment
There's no need to set the vlan-related fields in an IBoE send WQE
control segment:
- the vlan to be used by a UD QP is set in the datagram segment.
- for GSI (CM) QP, all the headers down to 8021q and MAC are built by
the software anyway.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Or Gerlitz [Mon, 10 Oct 2011 08:53:41 +0000 (10:53 +0200)]
IB/mlx4: Enable 4K mtu for IBoE
The IBoE port MTU is derived from the corresponding Ethernet netdevice
MTU, which can support jumbo frames of 9K, and hence surely supports
the max IB mtu of 4K.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Tom Tucker [Tue, 25 Oct 2011 11:08:30 +0000 (16:38 +0530)]
RDMA/cxgb4: Mark QP in error before disabling the queue in firmware
QPs need to be moved to error before telling the firwmare to shutdown
the queue. Otherwise, the application can submit WRs that will never
get fetched by the hardware and never flushed by the driver.
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Acked-by: Steve Wise <swsie@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Kumar Sanghvi [Mon, 24 Oct 2011 15:50:21 +0000 (21:20 +0530)]
RDMA/cxgb4: Serialize calls to CQ's comp_handler
Commit 01e7da6ba53c ("RDMA/cxgb4: Make sure flush CQ entries are
collected on connection close") introduced a potential problem where a
CQ's comp_handler can get called simultaneously from different places
in the iw_cxgb4 driver. This does not comply with
Documentation/infiniband/core_locking.txt, which states that at a
given point of time, there should be only one callback per CQ should
be active.
This problem was reported by Parav Pandit <Parav.Pandit@Emulex.Com>.
Based on discussion between Parav Pandit and Steve Wise, this patch
fixes the above problem by serializing the calls to a CQ's
comp_handler using a spin_lock.
Reported-by: Parav Pandit <Parav.Pandit@Emulex.Com> Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Kumar Sanghvi [Mon, 24 Oct 2011 15:50:22 +0000 (21:20 +0530)]
RDMA/cxgb3: Serialize calls to CQ's comp_handler
iw_cxgb3 has a potential problem where a CQ's comp_handler can get
called simultaneously from different places in iw_cxgb3 driver. This
does not comply with Documentation/infiniband/core_locking.txt, which
states that at a given point of time, there should be only one
callback per CQ should be active.
Such problem was reported by Parav Pandit <Parav.Pandit@Emulex.Com>
for iw_cxgb4 driver. Based on discussion between Parav Pandit and
Steve Wise, this patch fixes the above problem by serializing the
calls to a CQ's comp_handler using a spin_lock.
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Marcel Apfelbaum [Mon, 24 Oct 2011 09:02:34 +0000 (11:02 +0200)]
mlx4_core: Add extended port capabilities support
An Extended Port Info packet is sent to each hw port during HCA init.
If it returns without error, we assume the port supports extended port
capabilities.
Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Mitko Haralanov [Wed, 19 Oct 2011 22:46:40 +0000 (18:46 -0400)]
IB/qib: Hold links until tuning data is available
Hold the link state machine until the tuning data is read from the
QSFP EEPROM so correct tuning settings are applied before the state
machine attempts to bring the link up. Link is also held on cable
unplug in case a different cable is used.
Signed-off-by: Mitko Haralanov <mitko@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Mike Marciniszyn [Fri, 23 Sep 2011 17:17:00 +0000 (13:17 -0400)]
IB/qib: Remove s_lock around header validation
Review of qib_ruc_check_hdr() shows that the s_lock is not required in
the normal case. The r_lock is held in all cases, and protects the qp
fields that are read.
The s_lock will be needed to around the call to qib_migrate_qp() to
insure that the send engine sees a consistent set of fields.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Mike Marciniszyn [Fri, 23 Sep 2011 17:16:44 +0000 (13:16 -0400)]
IB/qib: Use RCU for qpn lookup
The heavy weight spinlock in qib_lookup_qpn() is replaced with RCU.
The hash list itself is now accessed via jhash functions instead of mod.
The changes should benefit multiple receive contexts in different
processors by not contending for the lock just to read the hash
structures.
The patch also adds a lookaside_qp (pointer) and a lookaside_qpn in
the context. The interrupt handler will test the current packet's qpn
against lookaside_qpn if the lookaside_qp pointer is non-NULL. The
pointer is NULL'ed when the interrupt handler exits.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Mike Marciniszyn [Fri, 23 Sep 2011 17:16:29 +0000 (13:16 -0400)]
IB/qib: Optimize RC/UC code by IB operation
The memset for zeroing work completions had been unconditional.
This patch removes the memset and moves the zeroing into the work
completion with a more explicit field by field set. With this patch,
non-ONLY/non-LAST packets will avoid the overhead since they will not
generate a completion.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
When creating flushed receive CQEs, set the QPID field in the t4_cqe
to the SQ QID and not the RQ QID. Otherwise the poll code will not
find the correct QP context.
Signed-off by: Jonathan Lallinger <jonathan@ogc.us>
Signed-off by: Steve Wise <swise@ogc.us>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Kumar Sanghvi [Thu, 13 Oct 2011 08:21:30 +0000 (13:51 +0530)]
RDMA/cxgb4: Make sure flush CQ entries are collected on connection close
At the time when a peer closes the connection, iw_cxgb4 will not send
a cq event if ibqp.uobject exists. In that case, its possible for a
user application to get blocked in ibv_get_cq_event().
To resolve this, call the cq's comp_handler to unblock any read from
ibv_get_cq_event(). This will trigger userspace to poll the cq and
collect flush status completions for any pending work requests.
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Sean Hefty [Thu, 11 Aug 2011 20:57:43 +0000 (13:57 -0700)]
RDMA/uverbs: Export ib_open_qp() capability to user space
Allow processes that share the same XRC domain to open an existing
shareable QP. This permits those processes to receive events on the
shared QP and transfer ownership, so that any process may modify the
QP. The latter allows the creating process to exit, while a remaining
process can still transition it for path migration purposes.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Sean Hefty [Mon, 8 Aug 2011 22:31:51 +0000 (15:31 -0700)]
RDMA/core: Export ib_open_qp() to share XRC TGT QPs
XRC TGT QPs are shared resources among multiple processes. Since the
creating process may exit, allow other processes which share the same
XRC domain to open an existing QP. This allows us to transfer
ownership of an XRC TGT QP to another process.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Sean Hefty [Tue, 31 May 2011 05:30:46 +0000 (22:30 -0700)]
IB/cm: Do not automatically disconnect XRC TGT QPs
Because an XRC TGT QP can end up being shared among multiple
processes, don't have the ib_cm automatically send a DREQ when the
userspace process that owns the ib_cm_id exits. Disconnect can be
initiated by the user directly; otherwise, the owner of the XRC INI QP
controls the connection.
Note that as a result of the process exiting, the ib_cm will stop
tracking the XRC connection on the target side. For the purposes of
disconnecting, this isn't a big deal. The ib_cm will respond to the
DREQ appropriately. For other messages, mainly LAP, the CM will
reject the request, since there's no one available to route the
request to.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Sean Hefty [Tue, 14 Jun 2011 23:31:53 +0000 (16:31 -0700)]
RDMA/ucm: Allow user to specify QP type when creating id
Allow the user to indicate the QP type separately from the port space
when allocating an rdma_cm_id. With RDMA_PS_IB, there is no longer a
1:1 relationship between the QP type and port space, so we need to
switch on the QP type to select between UD and connected QPs.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Sean Hefty [Sun, 29 May 2011 04:56:39 +0000 (21:56 -0700)]
RDMA/cm: Define new RDMA port space specific to IB
Add RDMA_PS_IB. XRC QP types will use the IB port space when operating
over the RDMA CM. For the 'IP protocol' field value, we select 0x3F,
which is listed as being for 'any local network'.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Sean Hefty [Tue, 2 Aug 2011 18:08:22 +0000 (11:08 -0700)]
IB/cm: Update XRC support based on XRC annex errata
The XRC annex was updated to have XRC behave more like RD. Specifically,
the XRC TGT QPN moves from the local QPN to local EECN field. Lookup of
SRQN is done using the REQ/REP protocol.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Sean Hefty [Fri, 13 May 2011 17:46:20 +0000 (10:46 -0700)]
IB/cm: Update protocol to support XRC
Update the REQ and REP messages to support XRC connection setup
according to the XRC Annex. Several existing fields must be set to 0 or
1 when connecting XRC QPs, and a reserved field is changed to an
extended transport type.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Sean Hefty [Fri, 27 May 2011 07:00:12 +0000 (00:00 -0700)]
RDMA/uverbs: Export XRC TGT QPs to user space
Allow user space to operate on XRC TGT QPs the same way as other types
of QPs, with one notable exception: since XRC TGT QPs may be shared
among multiple processes, the XRC TGT QP is allowed to exist beyond the
lifetime of the creating process.
The process that creates the QP is allowed to destroy it, but if the
process exits without destroying the QP, then the QP will be left bound
to the lifetime of the XRCD.
TGT QPs are not associated with CQs or a PD.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Sean Hefty [Thu, 26 May 2011 00:08:38 +0000 (17:08 -0700)]
RDMA/uverbs: Export XRC SRQs to user space
We require additional information to create XRC SRQs than we can
exchange using the existing create SRQ ABI. Provide an enhanced create
ABI for extended SRQ types.
Based on patches by Jack Morgenstein <jackm@dev.mellanox.co.il>
and Roland Dreier <roland@purestorage.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Sean Hefty [Fri, 27 May 2011 06:06:44 +0000 (23:06 -0700)]
RDMA/verbs: Cleanup XRC TGT QPs when destroying XRCD
XRC TGT QPs are intended to be shared among multiple users and
processes. Allow the destruction of an XRC TGT QP to be done explicitly
through ib_destroy_qp() or when the XRCD is destroyed.
To support destroying an XRC TGT QP, we need to track TGT QPs with the
XRCD. When the XRCD is destroyed, all tracked XRC TGT QPs are also
cleaned up.
To avoid stale reference issues, if a user is holding a reference on a
TGT QP, we increment a reference count on the QP. The user releases the
reference by calling ib_release_qp. This releases any access to the QP
from a user above verbs, but allows the QP to continue to exist until
destroyed by the XRCD.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Sean Hefty [Tue, 24 May 2011 02:59:25 +0000 (19:59 -0700)]
RDMA/core: Add XRC QPs
XRC ("eXtended reliable connected") is an IB transport that provides
better scalability by allowing senders to specify which shared receive
queue (SRQ) should be used to receive a message, which essentially
allows one transport context (QP connection) to serve multiple
destinations (as long as they share an adapter, of course).
XRC communication is between an initiator (INI) QP and a target (TGT)
QP. Target QPs are associated with SRQs through an XRCD. An XRC TGT QP
behaves like a receive-only RD QP. XRC INI QPs behave similarly to RC
QPs, except that work requests posted to an XRC INI QP must specify the
remote SRQ that is the target of the work request.
We define two new QP types for XRC, to distinguish between INI and TGT
QPs, and update the core layer to support XRC QPs.
This patch is derived from work by Jack Morgenstein
<jackm@dev.mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Sean Hefty [Tue, 24 May 2011 02:42:29 +0000 (19:42 -0700)]
RDMA/core: Add XRC SRQ type
XRC ("eXtended reliable connected") is an IB transport that provides
better scalability by allowing senders to specify which shared receive
queue (SRQ) should be used to receive a message, which essentially
allows one transport context (QP connection) to serve multiple
destinations (as long as they share an adapter, of course).
XRC defines SRQs that are specifically used by XRC connections. Expand
the SRQ code to support XRC SRQs. An XRC SRQ is currently restricted to
only XRC use according to the IB XRC Annex.
Portions of this patch were derived from work by
Jack Morgenstein <jackm@dev.mellanox.co.il>.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Sean Hefty [Mon, 23 May 2011 23:31:36 +0000 (16:31 -0700)]
RDMA/core: Add SRQ type field
Currently, there is only a single ("basic") type of SRQ, but with XRC
support we will add a second. Prepare for this by defining an SRQ type
and setting all current users to IB_SRQT_BASIC.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Sean Hefty [Tue, 24 May 2011 00:52:46 +0000 (17:52 -0700)]
RDMA/core: Add XRC domain support
XRC ("eXtended reliable connected") is an IB transport that provides
better scalability by allowing senders to specify which shared receive
queue (SRQ) should be used to receive a message, which essentially
allows one transport context (QP connection) to serve multiple
destinations (as long as they share an adapter, of course).
A few new concepts are introduced to support this. This patch adds:
- A new device capability flag, IB_DEVICE_XRC, which low-level
drivers set to indicate that a device supports XRC.
- A new object type, XRC domains (struct ib_xrcd), and new verbs
ib_alloc_xrcd()/ib_dealloc_xrcd(). XRCDs are used to limit which
XRC SRQs an incoming message can target.
This patch is derived from work by Jack Morgenstein <jackm@dev.mellanox.co.il>.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Introduce support for the following extended speeds:
FDR-10: a Mellanox proprietary link speed which is 10.3125 Gbps with
64b/66b encoding rather than 8b/10b encoding.
FDR: IBA extended speed 14.0625 Gbps.
EDR: IBA extended speed 25.78125 Gbps.
Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Randy Dunlap [Fri, 7 Oct 2011 20:59:41 +0000 (13:59 -0700)]
IB/ipath: Add missing <linux/stat.h> in ipath_chip_init.c
Fix build errors:
drivers/infiniband/hw/ipath/ipath_init_chip.c:54:1: error: 'S_IRUGO' undeclared here (not in a function)
drivers/infiniband/hw/ipath/ipath_init_chip.c:54:1: error: bit-field '<anonymous>' width not an integer constant
drivers/infiniband/hw/ipath/ipath_init_chip.c:67:1: error: 'S_IWUSR' undeclared here (not in a function)
drivers/infiniband/hw/ipath/ipath_init_chip.c:67:1: error: bit-field '<anonymous>' width not an integer constant
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Roland Dreier <roland@purestorage.com>
Support for Packed and Unaligned (PAU) FPDUs is needed for
interoperability between NES and non-NES nodes. When the NES hardware
detects a PAU frame, it will pass it to the driver to process the
frame. NES driver creates a new frame for each FPDU and forwards it
to the hardware to be sent to its associated qp.
Fixes a crash that occurs during close when error async event is received.
Terminate message is not sent to the remote node if already processing close.
RDMA/nes: Add support for MPAv2 Enhanced RDMA Negotiation
This patch adds support for Enhanced RDMA Connection Establishment
(draft-ietf-storm-mpa-peer-connect-06), aka MPAv2. Details of draft
can be obtained from:
<http://www.ietf.org/id/draft-ietf-storm-mpa-peer-connect-06.txt>
For backwards compatibility, the MPAv2 enabled driver reverts to MPAv1
if the remote node doesn't support MPAv2.
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Faisal Latif <Faisal.Latif@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
RDMA/cxgb4: Add support for MPAv2 Enhanced RDMA Negotiation
This patch adds support for Enhanced RDMA Connection Establishment
(draft-ietf-storm-mpa-peer-connect-06), aka MPAv2. Details of draft
can be obtained from:
<http://www.ietf.org/id/draft-ietf-storm-mpa-peer-connect-06.txt>
The patch updates the following functions for initiator perspective:
- send_mpa_request
- process_mpa_reply
- post_terminate for TERM error codes
- destroy_qp for TERM related change
- adds layer/etype/ecode to c4iw_qp_attrs for sending with TERM
- peer_abort for retrying connection attempt with MPA_v1 message
- added c4iw_reconnect function
The patch updates the following functions for responder perspective:
- process_mpa_request
- send_mpa_reply
- c4iw_accept_cr
- passes ird/ord to upper layers
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
The code that was recently introduced to report the number
of free contexts is flawed for multiple HCAs:
/* Return the number of free user ports (contexts) available. */
return scnprintf(buf, PAGE_SIZE, "%u\n", dd->cfgctxts -
dd->first_user_ctxt - (u32)qib_stats.sps_ctxts);
The qib_stats is global to the module, not per HCA, so the code is broken
for multiple HCAs.
This patch adds a qib_devdata field, freectxts, that reflects the free
contexts for this HCA.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Reviewed-by: Ram Vepa <ram.vepa@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Eli Cohen [Thu, 6 Oct 2011 16:33:12 +0000 (09:33 -0700)]
mlx4_core: Fix buddy->num_free allocation size
The num_free field of mlx4_buddy has a type of array of unsigned int
while it was allocated as an array of pointers. On 64-bit platforms
this allocates twice more than required. Fix this by allocating the
correct size for the type.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
[ Convert to kcalloc(). - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
Dotan Barak [Thu, 6 Oct 2011 16:33:12 +0000 (09:33 -0700)]
mlx4_core: Use the right function to free eq->page_list entries
Fix the memory release function to be consistent with the memory
allocation one to prevent problems if the implementation of
pci_free_consistent() and dma_free_coherent() are different.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Reviewed-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>
Hefty, Sean [Thu, 6 Oct 2011 16:33:05 +0000 (09:33 -0700)]
IB/mad: Verify mgmt class in received MADs
If a received MAD contains an invalid or reserved mgmt class, we will
attempt to access method_table outside of its range. Add a check to
ensure that mgmt class falls within the handled range.
Found by code inspection.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Hefty, Sean [Thu, 6 Oct 2011 16:33:04 +0000 (09:33 -0700)]
RDMA/cma: Check for NULL conn_param in rdma_accept
Check that conn_param is not null before dereferencing it when
processing rdma_accept(). This is necessary to prevent a possible
system crash, which can be caused by user space.
Problem found by code inspection.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Yong Zhang [Thu, 6 Oct 2011 16:33:04 +0000 (09:33 -0700)]
IB/ehca: Remove IRQF_DISABLED, since it's a no-op
Since commit e58aa3d2d0cc ("genirq: Run irq handlers with interrupts
disabled"), we run all interrupt handlers with interrupts disabled and
we even check and yell when an interrupt handler returns with
interrupts enabled -- cf commit b738a50a2026 ("genirq: Warn when
handler enables interrupts").
Hefty, Sean [Thu, 6 Oct 2011 16:32:33 +0000 (09:32 -0700)]
RDMA/cma: Fix crash in cma_req_handler
The RDMA CM uses the local qp_type to determine how to process an
incoming request. This can result in an incoming REQ being treated as
a SIDR REQ and vice versa. Fix this by switching off the event type
instead, and for good measure verify that the listener supports the
incoming connection request.
This problem showed up when a user space application mismatched the QP
types between a client and server app.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Andy Shevchenko [Thu, 6 Oct 2011 16:32:24 +0000 (09:32 -0700)]
RDMA/amso1100: Use '%pM' format option to print MAC
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Linus Torvalds [Tue, 4 Oct 2011 17:37:06 +0000 (10:37 -0700)]
Merge git://github.com/davem330/net
* git://github.com/davem330/net:
pch_gbe: Fixed the issue on which a network freezes
pch_gbe: Fixed the issue on which PC was frozen when link was downed.
make PACKET_STATISTICS getsockopt report consistently between ring and non-ring
net: xen-netback: correctly restart Tx after a VM restore/migrate
bonding: properly stop queuing work when requested
can bcm: fix incomplete tx_setup fix
RDSRDMA: Fix cleanup of rds_iw_mr_pool
net: Documentation: Fix type of variables
ibmveth: Fix oops on request_irq failure
ipv6: nullify ipv6_ac_list and ipv6_fl_list when creating new socket
cxgb4: Fix EEH on IBM P7IOC
can bcm: fix tx_setup off-by-one errors
MAINTAINERS: tehuti: Alexander Indenbaum's address bounces
dp83640: reduce driver noise
ptp: fix L2 event message recognition
Linus Torvalds [Tue, 4 Oct 2011 16:59:22 +0000 (09:59 -0700)]
Merge branch 'fix/asoc' of git://github.com/tiwai/sound
* 'fix/asoc' of git://github.com/tiwai/sound:
ASoC: omap_mcpdm_remove cannot be __devexit
ASoC: Fix setting update bits for WM8753_LADC and WM8753_RADC
ASoC: use a valid device for dev_err() in Zylonite
Jon Mason [Mon, 3 Oct 2011 14:50:20 +0000 (09:50 -0500)]
PCI: Disable MPS configuration by default
Add the ability to disable PCI-E MPS turning and using the BIOS
configured MPS defaults. Due to the number of issues recently
discovered on some x86 chipsets, make this the default behavior.
Also, add the option for peer to peer DMA MPS configuration. Peer to
peer DMA is outside the scope of this patch, but MPS configuration could
prevent it from working by having the MPS on one root port different
than the MPS on another. To work around this, simply make the system
wide MPS the smallest possible value (128B).
Signed-off-by: Jon Mason <mason@myri.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alex Deucher [Tue, 4 Oct 2011 14:46:34 +0000 (10:46 -0400)]
drm/radeon/kms: fix channel_remap setup (v2)
Most asics just use the hw default value which requires
no explicit programming. For those that need a different
value, the vbios will program it properly. As such,
there's no need to program these registers explicitly
in the driver. Changing MC_SHARED_CHREMAP requires a reload
of all data in vram otherwise its contents will be scambled.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Cc: stable@kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com>
We found that adding load, Rx data sometimes drops.(with DMA transfer mode)
The cause is that before starting Rx-DMA processing, Tx-DMA processing starts.
This causes FIFO overrun occurs.
This patch fixes the issue by modifying FIFO tx-threshold and DMA descriptor
size like below.
Current this patch
Rx-descriptor 4Byte+12Byte*341 --> 12Byte*340-4Byte-12Byte
Rx-threshold (Not modified)
Tx-descriptor 4Byte+12Byte*341 --> 16Byte-12Byte*340
Rx-threshold 12Byte --> 2Byte
Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
spi-topcliff-pch: add tx-memory clear after complete transmitting
Currently, in case of reading date from SPI flash,
command is sent twice.
The cause is that tx-memory clear processing is missing .
This patch adds the tx-momory clear processing.
Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Takashi Iwai [Tue, 4 Oct 2011 01:09:14 +0000 (18:09 -0700)]
lis3: fix regression of HP DriveGuard with 8bit chip
Commit 2a7fade7e03 ("hwmon: lis3: Power on corrections") caused a
regression on HP laptops with 8bit chip. Writing CTRL2_BOOT_8B bit seems
clearing the BIOS setup, and no proper interrupt for DriveGuard will be
triggered any more.
Since the init code there is basically only for embedded devices, put a
pdata check so that the problematic initialization will be skipped for
hp_accel stuff.
Borislav Petkov [Mon, 3 Oct 2011 18:28:18 +0000 (14:28 -0400)]
ide-disk: Fix request requeuing
Simon Kirby reported that on his RAID setup with idedisk underneath
the box OOMs after a couple of days of runtime. Running with
CONFIG_DEBUG_KMEMLEAK pointed to idedisk_prep_fn() which unconditionally
allocates an ide_cmd struct. However, ide_requeue_and_plug() can be
called more than once per request, either from the request issue or the
IRQ handler path and do blk_peek_request() ends up in idedisk_prep_fn()
repeatedly, allocating a struct ide_cmd everytime and "forgetting" the
previous pointer.
Make sure the code reuses the old allocated chunk.
pch_gbe: Fixed the issue on which a network freezes
The pch_gbe driver has an issue which a network stops,
when receiving traffic is high.
In the case, The link down and up are necessary to return a network.
This patch fixed this issue.
Signed-off-by: Toshiharu Okada <toshiharu-linux@dsn.okisemi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Fri, 30 Sep 2011 10:38:28 +0000 (10:38 +0000)]
make PACKET_STATISTICS getsockopt report consistently between ring and non-ring
This is a minor change.
Up until kernel 2.6.32, getsockopt(fd, SOL_PACKET, PACKET_STATISTICS,
...) would return total and dropped packets since its last invocation. The
introduction of socket queue overflow reporting [1] changed drop
rate calculation in the normal packet socket path, but not when using a
packet ring. As a result, the getsockopt now returns different statistics
depending on the reception method used. With a ring, it still returns the
count since the last call, as counts are incremented in tpacket_rcv and
reset in getsockopt. Without a ring, it returns 0 if no drops occurred
since the last getsockopt and the total drops over the lifespan of
the socket otherwise. The culprit is this line in packet_rcv, executed
on a drop:
As it shows, the new drop number it taken from the socket drop counter,
which is not reset at getsockopt. I put together a small example
that demonstrates the issue [2]. It runs for 10 seconds and overflows
the queue/ring on every odd second. The reported drop rates are:
ring: 16, 0, 16, 0, 16, ...
non-ring: 0, 15, 0, 30, 0, 46, 0, 60, 0 , 74.
Note how the even ring counts monotonically increase. Because the
getsockopt adds tp_drops to tp_packets, total counts are similarly
reported cumulatively. Long story short, reinstating the original code, as
the below patch does, fixes the issue at the cost of additional per-packet
cycles. Another solution that does not introduce per-packet overhead
is be to keep the current data path, record the value of sk_drops at
getsockopt() at call N in a new field in struct packetsock and subtract
that when reporting at call N+1. I'll be happy to code that, instead,
it's just more messy.
David Vrabel [Fri, 30 Sep 2011 06:37:51 +0000 (06:37 +0000)]
net: xen-netback: correctly restart Tx after a VM restore/migrate
If a VM is saved and restored (or migrated) the netback driver will no
longer process any Tx packets from the frontend. xenvif_up() does not
schedule the processing of any pending Tx requests from the front end
because the carrier is off. Without this initial kick the frontend
just adds Tx requests to the ring without raising an event (until the
ring is full).
This was caused by 47103041e91794acdbc6165da0ae288d844c820b (net:
xen-netback: convert to hw_features) which reordered the calls to
xenvif_up() and netif_carrier_on() in xenvif_connect().
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Gospodarek [Fri, 23 Sep 2011 10:53:34 +0000 (10:53 +0000)]
bonding: properly stop queuing work when requested
During a test where a pair of bonding interfaces using ARP monitoring
were both brought up and torn down (with an rmmod) repeatedly, a panic
in the timer code was noticed. I tracked this down and determined that
any of the bonding functions that ran as workqueue handlers and requeued
more work might not properly exit when the module was removed.
There was a flag protected by the bond lock called kill_timers that is
set when the interface goes down or the module is removed, but many of
the functions that monitor link status now unlock the bond lock to take
rtnl first. There is a chance that another CPU running the rmmod could
get the lock and set kill_timers after the first check has passed.
This patch does not allow any function to queue work that will make
itself run unless kill_timers is not set. I also noticed while doing
this work that bond_resend_igmp_join_requests did not have a check for
kill_timers, so I added the needed call there as well.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Reported-by: Liang Zheng <lzheng@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michel Dänzer [Fri, 30 Sep 2011 15:16:53 +0000 (17:16 +0200)]
drm/radeon: Set cursor x/y to 0 when x/yorigin > 0.
Apart from the obvious cleanup, this should make the line
cursor_end = x - xorigin + w;
correct now.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Michel Dänzer [Fri, 30 Sep 2011 15:16:52 +0000 (17:16 +0200)]
drm/radeon: Update AVIVO cursor coordinate origin before x/yorigin calculation.
Fixes cursor disappearing prematurely when moving off a top/left edge which
is not located at the desktop top/left edge.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Cc: stable@kernel.org Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Nicholas Miell <nmiell@gmail.com> Reviewed-by: Michel Dänzer <michel@daenzer.net> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Alex Deucher [Mon, 3 Oct 2011 13:13:46 +0000 (09:13 -0400)]
drm/radeon/kms: add retry limits for native DP aux defer
The previous code could potentially loop forever. Limit
the number of DP aux defer retries to 4 for native aux
transactions, same as i2c over aux transactions.
Alex Deucher [Mon, 3 Oct 2011 13:13:45 +0000 (09:13 -0400)]
drm/radeon/kms: fix regression in DP aux defer handling
An incorrect ordering in the error checking code lead
to DP aux defer being skipped in the aux native write
path. Move the bytes transferred check (ret == 0)
below the defer check.
Tracked down by: Brad Campbell <brad@fnarfbargle.com>
Linus Torvalds [Sat, 1 Oct 2011 15:37:25 +0000 (08:37 -0700)]
Merge branches 'irq-urgent-for-linus', 'x86-urgent-for-linus' and 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip
* 'irq-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
irq: Fix check for already initialized irq_domain in irq_domain_add
irq: Add declaration of irq_domain_simple_ops to irqdomain.h
* 'x86-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
x86/rtc: Don't recursively acquire rtc_lock
* 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
posix-cpu-timers: Cure SMP wobbles
sched: Fix up wchan borkage
sched/rt: Migrate equal priority tasks to available CPUs
Josef Bacik [Fri, 30 Sep 2011 19:23:54 +0000 (15:23 -0400)]
Btrfs: force a page fault if we have a shorty copy on a page boundary
A user reported a problem where ceph was getting into 100% cpu usage while doing
some writing. It turns out it's because we were doing a short write on a not
uptodate page, which means we'd fall back at one page at a time and fault the
page in. The problem is our position is on the page boundary, so our fault in
logic wasn't actually reading the page, so we'd just spin forever or until the
page got read in by somebody else. This will force a readpage if we end up
doing a short copy. Alexandre could reproduce this easily with ceph and reports
it fixes his problem. I also wrote a reproducer that no longer hangs my box
with this patch. Thanks,
Reported-and-tested-by: Alexandre Oliva <aoliva@redhat.com> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>