Cisco UCS Fabric Extender (FEX) QoS

Filed in Cisco UCS, FCoE, FEX, NIV, Q&A, QoS on December 8, 2010

What is the role of the fabric extender (FEX) in Cisco UCS QoS? This question was posted as a comment to my recent article VMware 10GE QoS Design Deep Dive with Cisco UCS, Nexus — as well as here, at the Cisco Support community forums.

Brad:

A comment to test my understanding and then a question to follow-up…

The UCS ecosystem leverages a port aggregation solution to chassis I/O, namely, the FEX modules.

The FEX modules are not fully featured switches, nor do they possess any forwarding policy intelligence at all. Instead, the FEX modules deploy a “pinning” approach in which downlinks (those facing the blade servers’ NICs, LOMs, and mezzanine cards) are mapped to uplink ports (those facing a 6100 Fabric Interconnect) to form what can be described as an aggregator group.

The result is a simplified approach to blade I/O in which the traffic patterns are predictable and failover is deterministic. Moreover, there is no need to configure STP because the ports are uplinked in such a way as to preclude any possibility of a bridging loop.
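
As a rough sketch of the pinning idea just described (the interface names, counts, and the round-robin assignment below are hypothetical, not taken from an actual UCS configuration):

    # Illustrative model of FEX "pinning": each server-facing downlink is statically
    # mapped to one fabric-facing uplink; the FEX makes no per-packet forwarding decisions.

    def pin_downlinks(downlinks, uplinks):
        """Round-robin static pinning of downlinks to the active uplinks."""
        if not uplinks:
            raise ValueError("at least one active uplink is required")
        return {dl: uplinks[i % len(uplinks)] for i, dl in enumerate(downlinks)}

    # Hypothetical: 8 blade-facing ports pinned across 4 uplinks to one Fabric Interconnect.
    downlinks = [f"backplane1/{n}" for n in range(1, 9)]
    uplinks = [f"fabric1/{n}" for n in range(1, 5)]

    for dl, ul in sorted(pin_downlinks(downlinks, uplinks).items()):
        print(f"{dl} -> {ul}")

    # If an uplink fails, the affected downlinks are re-pinned deterministically,
    # which is why failover in this model is predictable rather than negotiated.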

This having been said, is there some merit to the argument that this port aggregation design creates a hole — a discontinuity — in the middle of a QoS deployment because the scheduling of packets on the uplink ports facing the 6100 Fabric Interconnect is not performed in a manner that recognizes priority? In other words, no QoS on the FEX.

To elaborate a bit more, one can have a VMware deployment and leverage NetIOC or perhaps configure QoS on a 1000v switch (whose uplink ports are mapped to a port on the Palo VIC) and configure QoS on the VIC, and then on the 6100 Fabric Interconnect. But, since the FEX is not scheduling traffic to the 6100 Fabric Interconnect according to any priority, the QoS deployment has a hole in the middle, so to speak.

Thoughts?

My Answer:

Unlike the Cisco UCS Fabric Interconnect and the Virtual Interface Card (Palo), which each have (8) COS-based queues, the FEX has (4) queues, of which only (3) are used.  One FEX queue is used for strict priority Control traffic for the FI to manage the FEX and adapters.  The second FEX queue is for No Drop traffic classes such as FCoE.  The third FEX queue is used for Drop classes (all the other stuff).  Each queue independently empties traffic in the order it was received (FIFO); the No Drop queue carrying FCoE is FIFO as well, but it is serviced for transmission on the wire with a guaranteed bandwidth weighting.
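
To make that queue structure concrete, here is a toy classifier for the three queues in use. This is my own illustration, not actual FEX code; only the COS 3 / FCoE default reflects UCS behavior, and the frame fields are made up:

    # Toy model of the three FEX queues in use: strict-priority control traffic,
    # a No Drop queue (FCoE by default), and a Drop queue for everything else.
    from collections import deque

    CONTROL = "control"   # FI-to-FEX/adapter management traffic, strict priority
    NO_DROP = "no_drop"   # COS values configured as No Drop (COS 3 / FCoE by default)
    DROP    = "drop"      # all other traffic, FIFO, may be dropped under congestion

    def classify(frame, no_drop_cos=frozenset({3})):
        """Pick a queue for a frame based on its type and COS marking."""
        if frame.get("is_control"):
            return CONTROL
        if frame.get("cos") in no_drop_cos:
            return NO_DROP
        return DROP

    queues = {CONTROL: deque(), NO_DROP: deque(), DROP: deque()}
    for frame in [{"is_control": True}, {"cos": 3}, {"cos": 0}, {"cos": 5}]:
        queues[classify(frame)].append(frame)

    print({name: len(q) for name, q in queues.items()})   # {'control': 1, 'no_drop': 1, 'drop': 2}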

One could look at that and say: between the FI, FEX, and adapter, the FEX is the odd device out sitting in the middle with QoS capabilities inconsistent with the other two, creating a “hole” or “discontinuity” in the Cisco UCS end-to-end QoS capabilities.  That’s a fair observation to make.

However, before we stop here, there is one very interesting and unique behavior the FEX exhibits that’s entirely applicable to this conversation:

When the FEX gets congested on any interface (facing the FI or adapters), it will push that congestion back to the source rather than dropping the traffic.  The FEX does this for both the Drop and No Drop traffic classes.  The FEX will send 802.1Qbb PFC pause messages to the FI and NIV-capable adapters (such as Menlo or Palo).  For non-NIV-capable adapters such as the standard Intel Oplin, the FEX will send a standard 802.3x pause message.
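
A small sketch of that pushback decision, purely to illustrate the behavior described above (the device names and the niv_capable flags are placeholders):

    # Illustrative model of how the FEX pushes congestion back to the source:
    # per-priority PFC (802.1Qbb) toward NIV-capable devices, a standard 802.3x
    # whole-link pause toward non-NIV adapters.

    def pause_message(peer, congested_cos):
        """Return the kind of pause the FEX would send toward a congested peer."""
        if peer["niv_capable"]:
            # Pause only the congested priorities (e.g. COS 3 for FCoE).
            return {"type": "802.1Qbb PFC", "paused": sorted(congested_cos)}
        # A non-NIV adapter (e.g. a standard Intel Oplin) gets a whole-link pause.
        return {"type": "802.3x PAUSE", "paused": "entire link"}

    peers = [
        {"name": "Fabric Interconnect", "niv_capable": True},
        {"name": "Palo VIC",            "niv_capable": True},
        {"name": "Intel Oplin",         "niv_capable": False},
    ]
    for peer in peers:
        print(peer["name"], "->", pause_message(peer, congested_cos={3}))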

At this point it’s up to the device receiving the pause message to react to it by allowing its buffers to fill up and applying its more intelligent QoS scheduling scheme from there.  For example, both the Fabric Interconnect and Palo adapter would treat the pause message as if its own link were congested and apply the QoS bandwidth policy defined in the “QoS System Class” settings in UCS Manager.
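
As a loose illustration of what applying that bandwidth policy looks like on the paused device, here is a toy weighted scheduling pass over per-class queues. The class names and weights are example values, with the 40% FCoE weight taken from the UCS default:

    # Toy weighted scheduling pass: when the FI or Palo adapter is paused and its
    # buffers fill, it drains traffic according to the per-class bandwidth weights
    # defined in the "QoS System Class" settings (weights here are example values).
    from collections import deque

    class_weights = {"fcoe": 40, "platinum": 30, "best_effort": 30}   # percent of link
    queues = {name: deque(f"{name}-{i}" for i in range(5)) for name in class_weights}

    def drain_one_round(queues, weights):
        """One round: each class may send weight // 10 frames (toy granularity)."""
        sent = {}
        for name, q in queues.items():
            budget = max(1, weights[name] // 10)
            sent[name] = [q.popleft() for _ in range(min(budget, len(q)))]
        return sent

    print(drain_one_round(queues, class_weights))
    # fcoe sends 4 frames for every 3 from the other classes, roughly honoring 40/30/30.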

Side note: The Gen2 Emulex and Qlogic adapters are NIV capable; however, they do not honor the PFC pause messages sent by the FEX for the Drop classes and will keep sending traffic that may be dropped in the fabric.  The Gen1 Menlo and Palo adapters do honor the PFC messages for all classes.

What this means is that while the FEX does not have the same (8) queues as the Fabric Interconnect or Palo adapter, the FEX aims to remove itself from the equation by placing more of the QoS burden on these more capable devices.  From both a QoS and networking perspective, the FEX behaves like a transparent no-drop bump in the wire.

Is it perfect? No.  In the ideal situation the FEX, in addition to pushing the congestion back, would also have (8) COS-based queues for a consistent QoS bandwidth policy at every point.

Is it pretty darn good? Yes! :-)  Especially when compared to the alternative 10GE blade server solutions that have no concept of QoS to begin with.

About the Author

Brad Hedlund (CCIE Emeritus #5530) is an Engineering Architect in the CTO office of VMware’s Networking and Security Business Unit (NSBU). Brad’s background in data center networking begins in the mid-1990s with a variety of experience in roles such as IT customer, value added reseller, and vendor, including Cisco and Dell. Brad also writes at the VMware corporate networking virtualization blog at blogs.vmware.com/networkvirtualization

Comments (14)


  1. Victor says:

    Nice feedback, Brad. Please allow me to ask some follow-up questions.

    How does one go about mapping the 8 separate COS-based traffic classes in the NIV-enabled Palo CNA and the FI to the 2 queues in the UCS FEX? Is this something that is preconfigured or is manual intervention necessary?

    Does the mapping methodology — whether manual or automatic — go something like this: packets with COS values of 1-4 are mapped to the No-Drop queue while packets with COS values 5-8 are mapped to the Drop Queue?

    Is there a scheduling algorithm that gives priority to the No-Drop queue? If not, what is the value in using separate queues if the traffic that occupies them is not given differentiated treatment?

    Are these 2 queues located on the FEX uplink ports or the downlink ports?

    Pardon all the questions, but I am trying to fully understand the operation of the FEX. I don’t know of anyone who is more qualified to do so than you.

    By the way, I am not asking you these questions so I can say “gotcha.” Cisco is not the only vendor that leverages a “pinning” methodology to chassis server I/O. So these questions and concerns I raise are applicable to other vendor solutions, too. I do agree that overall, the UCS has a pretty robust QoS architecture.

    • Brad Hedlund says:

      Victor,
      See my responses below:

      How does one go about mapping the 8 separate COS-based traffic classes in the NIV-enabled Palo CNA and the FI to the 2 queues in the UCS FEX?

      On the FEX, traffic falls into two categories: Drop and No Drop. You define which COS values receive No Drop service in the UCS Manager “QoS System Class” settings under the LAN tab. From there, UCS Manager programs the FEX with the information necessary to determine which queue each COS value belongs in: No Drop, or Drop (everything else). Unlike other 10GE blade solutions, under no circumstances do you ever need to configure the FEX individually yourself.
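
      A sketch of that relationship, with the table below standing in for the “QoS System Class” settings (the class names and weights other than the COS 3 / FCoE default are made-up examples):

        # Illustrative "QoS System Class" table: UCS Manager derives the COS -> Drop/No Drop
        # mapping from it and pushes that to every FEX; nothing is configured on the FEX itself.
        system_classes = [
            {"name": "Fibre Channel", "cos": 3, "drop": False, "weight": 40},  # default No Drop class
            {"name": "Platinum",      "cos": 5, "drop": True,  "weight": 30},  # example values
            {"name": "Best Effort",   "cos": 0, "drop": True,  "weight": 30},  # example values
        ]

        def fex_no_drop_cos(system_classes):
            """The only classification the FEX needs: which COS values are No Drop."""
            return {c["cos"] for c in system_classes if not c["drop"]}

        print("No Drop COS values programmed on every FEX:", fex_no_drop_cos(system_classes))   # {3}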

      Does the mapping methodology — whether manual or automatic — go something like this: packets with COS values of 1-4 are mapped to the No-Drop queue while packets with COS values 5-8 are mapped to the Drop Queue?

      No. See my answer above. You define which COS values are No Drop as a global setting in UCS Manager. By default, COS 3 is for FCoE traffic and pre-configured for No Drop. This default configuration works for every customer I have worked with thus far. I have yet to see a customer configure another No Drop class in addition to FCoE, but I’m sure it’s only a matter of time.

      Is there a scheduling algorithm that gives priority to the No-Drop queue? If not, what is the value in using separate queues if the traffic that occupies them is not given differentiated treatment?

      Yes. I mention this in the article. The No Drop class receives weighted bandwidth scheduling above the rest of the Drop traffic. In the same “QoS System Class” settings in UCS Manager where you defined which classes are Drop and No Drop, you can also define the bandwidth weight of each class. The default setting for COS 3 traffic (FCoE) is 40% – meaning a minimum bandwidth guarantee of 4 Gbps on a 10GE interface. You can increase or decrease this setting per your preference.
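
      The arithmetic behind that guarantee is simply the class weight applied to the link speed; a quick sketch:

        # Minimum bandwidth guarantee for a class = (its weight %) x link speed.
        def min_guarantee_gbps(weight_percent, link_gbps=10):
            return link_gbps * weight_percent / 100

        print(min_guarantee_gbps(40))   # 4.0 Gbps for the default FCoE class on a 10GE link
        print(min_guarantee_gbps(60))   # 6.0 Gbps if you raise the weight to 60%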

      Are these 2 queues located on the FEX uplink ports or the downlink ports?

      These are ingress queues on all ports.

      Cisco is not the only vendor that leverages a “pinning” methodology to chassis server I/O

      Not sure what other vendors you are referring to here, but the big advantage of the Cisco UCS FEX is the simplicity of a zero-touch deployment and, more importantly, *consistency* — every FEX is configured the same way and behaves the same based on the system-wide policy settings in UCS Manager. As I mentioned above, under no circumstances does the customer ever need to configure the FEX, load software on the FEX, set up an IP address on the FEX, etc. The FEX is also a stateless device; it doesn’t carry any identity, software, or configuration that would need to be manually transferred onto a replacement FEX.

      If a FEX dies, you:
      1) pull out the bad FEX
      2) put in the new FEX
      3) Walk away

      Cheers,
      Brad

  2. VIA says:

    Hi Brad,
    I want to know the difference between implementing VN-Link in hardware (6100 with Palo) and in software (Nexus 1000v) with vCenter.
    What are the advantages and disadvantages of each?
    Thank you.

  3. Tommy says:

    Hey Brad, curious how we can determine how many links we need between the IOM and the FIC. We’ve been battling some performance congestion issues and can’t seem to determine where our resource contention is. Your QoS article really got me thinking about how much bandwidth is really available to FC per host. Right now we’ve only been using 2 cables per fabric, but I’m thinking that leaves us pretty oversubscribed considering the workloads.

    • Rob Taylor says:

      Hi Tommy. I am new to UCS as well. What version of UCS Manager are you on?
      Are you running bare metal servers or a hypervisor like VMware or Hyper-V? What kind of SAN? Do you have any kind of instrumentation in place to see how much traffic you are moving over your links? I use SolarWinds Orion for SNMP, and I monitor the uplinks from our FIs to our FC switches, as well as our 10gig uplinks.
      SolarWinds can also hook into UCS (I haven’t done this yet, but am looking to do so), so that you can monitor the links from FI to FEX.
      There is also a screen in UCSM that shows you the traffic volume that you are moving on your links. Tools like those might give you some insight into what’s going on if you haven’t already looked at them.
      Also, are you running port channels to the FEX from the FIs?
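
      For anyone who would rather script that kind of polling than use Orion, a minimal sketch with the classic pysnmp hlapi (assuming pysnmp is installed; the host, community string, and interface index are placeholders):

        # Poll a 64-bit interface byte counter (IF-MIB ifHCInOctets) over SNMPv2c.
        from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                                  ContextData, ObjectType, ObjectIdentity, getCmd)

        error_indication, error_status, error_index, var_binds = next(getCmd(
            SnmpEngine(),
            CommunityData('public'),                      # placeholder community string
            UdpTransportTarget(('192.0.2.10', 161)),      # placeholder device address
            ContextData(),
            ObjectType(ObjectIdentity('IF-MIB', 'ifHCInOctets', 1)),   # placeholder ifIndex
        ))

        if error_indication or error_status:
            print("SNMP query failed:", error_indication or error_status)
        else:
            for name, value in var_binds:
                print(name.prettyPrint(), "=", value.prettyPrint())

        # Poll twice, a known interval apart; the counter delta divided by the interval gives bytes/sec.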

      • Tommy says:

        We run firmware 2.1(1d) today, which is actually pretty damn buggy. We have a mixed generation of hardware, but we select port channels as the preferred mode, so when we happen to have both a Gen2 FIC and a Gen2 FEX it does port-channel. I have SolarWinds watching the links today, which doesn’t indicate an issue, but when you dig in and look at the QoS classes you can see the best-effort class tail dropping. We could increase the polling to 1-minute intervals, but you’re still not going to see the spikes. I think it’s important here to demonstrate that if you’re going to pin a large portion of your traffic to one side or the other, you should consider doubling the number of chassis links in use.
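
        A back-of-the-envelope sketch of that oversubscription math (the blade count and link speeds below are illustrative assumptions, not a description of this environment):

          # Rough oversubscription estimate for FEX-to-FI uplinks (illustrative numbers only).
          def oversubscription(blades, blade_gbps, uplinks, uplink_gbps):
              """Potential server bandwidth divided by available uplink bandwidth."""
              return (blades * blade_gbps) / (uplinks * uplink_gbps)

          # Hypothetical: 8 blades at 10GE each, pinned to one fabric.
          print(oversubscription(blades=8, blade_gbps=10, uplinks=2, uplink_gbps=10))   # 4.0, i.e. 4:1
          print(oversubscription(blades=8, blade_gbps=10, uplinks=4, uplink_gbps=10))   # 2.0, i.e. 2:1 after doubling the links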

        • Rob Taylor says:

          Hi Tommy. Do you also have the SolarWinds Engineer’s toolkit?
          They have some bandwidth graphing tools which, if pointed at specific interfaces, can get you stats every 10 to 15 seconds or so. Not quite real time, but better than once a minute. Also, have you tried switching off port channels to see if it makes any difference?
          Also, what kind of apps are you running?
          How many blades per chassis?

  4. Boudewijn Plomp says:

    Brad, very nice article! Thanks a lot for sharing this with us.

    I do have a question. Many network devices (e.g. Nexus switches) use DCBX to share QoS/PFC information. I can see the Nexus switches and Fabric Interconnects are willing to do so. But on a blade server with Windows Server 2012 R2 you can install the Data Center Bridging feature, which implements the LLDP protocol. The server is then willing to communicate. But… it does not receive any DCB exchange information.

    Can a vNIC (on a VIC within a Blade Server) exchange DCB information by LLDP with a vEth port (on a Cisco UCS Fabric Interconnect)?
