What is the role of the fabric extender (FEX) in Cisco UCS QoS? This question was posted as a comment to my recent article VMware 10GE QoS Design Deep Dive with Cisco UCS, Nexus — as well as here, at the Cisco Support community forums.
A comment to test my understanding, and then a follow-up question…
The UCS ecosystem leverages a port aggregation solution for chassis I/O, namely, the FEX modules.
The FEX modules are not fully featured switches. Nor do they possess any forwarding policy intelligence at all. Instead, the FEX modules deploy a “pinning” approach in which downlinks (those that face the blade server’s NICs, LOMs, mezzanine cards) are mapped to an uplink port (those that face a 6100 Fabric Interconnect) to form what can be described as an aggregator group.
The result is a simplified approach to blade I/O in which the traffic patterns are predictable and failover is deterministic. Moreover, there is no need to configure STP because the ports are uplinked in such a manner as to preclude any possibility of a bridging loop.
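To make the pinning model concrete, here is a minimal sketch in Python. This is purely illustrative, not Cisco code: the function names, port labels, and round-robin assignment are my own assumptions about how a deterministic downlink-to-uplink mapping could be expressed.

```python
# Hypothetical sketch of FEX static pinning (not Cisco code): each blade-facing
# downlink is deterministically mapped to one FI-facing uplink, forming an
# aggregator group. Predictable mapping means predictable traffic patterns.

def pin_downlinks(downlinks, uplinks):
    """Map each downlink to an uplink round-robin (illustrative policy)."""
    if not uplinks:
        raise ValueError("at least one active uplink is required")
    return {dl: uplinks[i % len(uplinks)] for i, dl in enumerate(downlinks)}

def repin_on_failure(downlinks, uplinks, failed):
    """Deterministic failover: re-pin to the surviving uplinks -- no loops,
    no spanning tree required."""
    surviving = [u for u in uplinks if u != failed]
    return pin_downlinks(downlinks, surviving)

# Eight blade slots pinned across two uplinks to the 6100 Fabric Interconnect
blades = [f"eth1/{n}" for n in range(1, 9)]
pinning = pin_downlinks(blades, ["fabric1/1", "fabric1/2"])
```

Because every downlink has exactly one active uplink at any moment, there is no forwarding topology in which a frame could loop back on itself.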
This having been said, is there some merit to the argument that this port aggregation design creates a hole — a discontinuity — in the middle of a QoS deployment because the scheduling of packets on the uplink ports facing the 6100 Fabric Interconnect is not performed in a manner that recognizes priority? In other words, no QoS on the FEX.
To elaborate a bit more, one can have a VMware deployment and leverage NetIOC or perhaps configure QoS on a 1000v switch (whose uplink ports are mapped to a port on the Palo VIC) and configure QoS on the VIC, and then on the 6100 Fabric Interconnect. But, since the FEX is not scheduling traffic to the 6100 Fabric Interconnect according to any priority, the QoS deployment has a hole in the middle, so to speak.
Unlike the Cisco UCS Fabric Interconnect and the Virtual Interface Card (Palo) that each have (8) COS-based queues, the FEX has (4) queues, of which only (3) are used. One FEX queue is used for strict priority Control traffic for the FI to manage the FEX and adapters. The second FEX queue is for No Drop traffic classes such as FCoE. The third FEX queue is used for Drop classes (all the other stuff). Each queue independently empties traffic in the order it was received (FIFO); the No Drop queue carrying FCoE, while also FIFO, is serviced for transmission on the wire with a guaranteed bandwidth weighting.
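The three active FEX queues can be sketched as follows. This is a hedged simplification: the CoS markings and the "No Drop before Drop" servicing order below are assumptions for illustration (the real scheduler applies a bandwidth weighting between the No Drop and Drop queues rather than strict ordering).

```python
# Illustrative model of the three active FEX queues (assumed CoS markings):
# Control is strict priority; No Drop (e.g. FCoE) gets a bandwidth guarantee;
# Drop holds everything else. Each queue is itself plain FIFO.

from collections import deque

FCOE_COS = 3      # FCoE is typically marked CoS 3 in UCS
CONTROL_COS = 7   # assumed marking for FI-to-FEX control traffic

class FexQueues:
    def __init__(self):
        self.control = deque()   # strict priority
        self.no_drop = deque()   # no-drop classes such as FCoE
        self.drop = deque()      # all the other stuff

    def enqueue(self, frame, cos):
        if cos == CONTROL_COS:
            self.control.append(frame)
        elif cos == FCOE_COS:
            self.no_drop.append(frame)
        else:
            self.drop.append(frame)

    def dequeue(self):
        # Control always wins. The real FEX weights no_drop vs drop for
        # wire bandwidth; simplified here to a fixed order for illustration.
        for q in (self.control, self.no_drop, self.drop):
            if q:
                return q.popleft()
        return None
```

The key contrast with the FI and Palo VIC is visible in the structure itself: three coarse buckets instead of eight per-CoS queues.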
One could look at that and say: between the FI, FEX, and Adapter, the FEX is the odd device out sitting in the middle with inconsistent QoS capabilities from the other two, creating a “hole” or “discontinuity” in the Cisco UCS end-to-end QoS capabilities. That’s a fair observation to make.
However, before we stop here, there is one very interesting and unique behavior the FEX exhibits that’s entirely applicable to this conversation:
When the FEX gets congested on any interface (facing the FI or Adapters), it will push that congestion back to the source, rather than dropping the traffic. The FEX does this for both the Drop and No Drop traffic classes. The FEX will send 802.1Qbb PFC pause messages to the FI and NIV capable adapters (such as Menlo or Palo). For non-NIV capable adapters such as the standard Intel Oplin, the FEX will send a standard 802.3X pause message.
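The difference between the two pause mechanisms is worth making explicit, since it drives the adapter behavior discussed below. A minimal sketch (function and field names are my own, purely for illustration):

```python
# Illustrative sketch of the FEX's congestion pushback choice: NIV-capable
# peers (the FI, Menlo, Palo) receive per-class 802.1Qbb PFC, while non-NIV
# adapters such as the standard Intel Oplin receive a link-level 802.3x pause.

def pause_message(peer_is_niv_capable, congested_cos):
    if peer_is_niv_capable:
        # PFC pauses only the congested class; other classes keep flowing
        return {"type": "802.1Qbb PFC", "paused_classes": [congested_cos]}
    # 802.3x has no notion of classes -- it pauses the entire link
    return {"type": "802.3x PAUSE", "paused_classes": list(range(8))}
```

The practical consequence: with a non-NIV adapter, congestion in one class briefly stalls all traffic on that link, whereas PFC lets the peer keep transmitting its other classes.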
At this point it’s up to the device receiving the pause message to react to it by allowing its buffers to fill up and apply its more intelligent QoS scheduling scheme from there. For example, both the Fabric Interconnect and Palo adapter would treat the pause message as if its own link were congested and apply the QoS bandwidth policy defined in the “QoS System Class” settings in UCS Manager.
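A rough sketch of what “apply the QoS bandwidth policy” could mean in practice. The weights and the proportional-drain logic below are illustrative assumptions, not UCS Manager defaults: once backlogged, the FI or Palo VIC drains its per-class buffers in proportion to the bandwidth weights configured under QoS System Class.

```python
# Minimal sketch (assumed weights, not UCS defaults): drain a transmit budget
# across backlogged classes in proportion to their configured QoS weights.

def weighted_drain(backlogs, weights, budget):
    """Drain `budget` bytes from per-class backlogs proportionally to the
    weights of classes that actually have traffic queued."""
    total = sum(w for cls, w in weights.items() if backlogs.get(cls))
    sent = {}
    for cls, w in weights.items():
        if backlogs.get(cls):
            share = budget * w // total
            sent[cls] = min(backlogs[cls], share)
    return sent

# e.g. FCoE weighted 40, best-effort 60, over a 1000-byte transmit opportunity
sent = weighted_drain({"fcoe": 10_000, "best_effort": 10_000},
                      {"fcoe": 40, "best_effort": 60}, 1000)
# → {"fcoe": 400, "best_effort": 600}
```

This is the sense in which the pause pushes the scheduling decision upstream: the device with eight COS-based queues makes the bandwidth trade-off, not the FEX.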
Side note: The Gen2 Emulex and QLogic adapters are NIV capable; however, they do not honor the PFC pause messages sent by the FEX for the Drop classes and will keep sending traffic that may be dropped in the fabric. The Gen1 Menlo and Palo adapters do honor the PFC message for all classes.
What this means is that while the FEX does not have the same (8) queues of the Fabric Interconnect or Palo adapter, the FEX aims to remove itself from the equation by placing more of the QoS burden on these more capable devices. From both a QoS and networking perspective, the FEX behaves like a transparent no-drop bump in the wire.
Is it perfect? No. In the ideal situation the FEX, in addition to pushing the congestion back, would also have (8) COS-based queues for a consistent QoS bandwidth policy at every point.
Is it pretty darn good? Yes! Especially when compared to the alternative 10GE blade server solutions that have no concept of QoS to begin with.