Cisco UCS Fabric Failover: Slam Dunk? or So What?

Filed in Cisco UCS, NIV, VMware on September 23, 2010

Fabric Failover is a capability unique to Cisco UCS that gives a server adapter a highly available connection to two redundant network switches without any NIC teaming drivers or NIC failover configuration in the OS, hypervisor, or virtual machine.  In this article we will take a brief look at some common use cases for Fabric Failover and the UCS Manager software versions that support each implementation.

With Fabric Failover, the intelligent network provides the server adapter with a virtual cable that (because it’s virtual) can be quickly and automatically moved from one upstream switch to another.  Furthermore, the upstream switch that this virtual cable connects to doesn’t have to be the first switch inside a blade chassis; rather, it can be extended through a “fabric extender” and connected to a common system-wide upstream switch (in UCS this would be the fabric interconnect), as shown in the diagram below.

Cisco UCS Fabric Failover

When the intelligent network moves a virtual cable from one physical switch to another, the virtual cable lands on an identically configured virtual Ethernet port with the exact same numerical interface identifier.  We should also consider the other state that should move along with the virtual cable, such as the adapter’s MAC address.  In Cisco UCS, the adapter’s MAC address is implicitly known (meaning it does not need to be learned) because both the adapter MAC address and the network port it connects to were provisioned by UCS Manager.  When Fabric Failover is enabled, the implicit MAC address of the adapter is synchronized with the second fabric switch in preparation for a failure.  This capability has been present since UCS Manager version 1.0.  Below is an example of a Windows or Linux OS loaded on a bare metal server with Fabric Failover enabled.  The OS has a simple redundant connection to the network with a single adapter and no requirement for a NIC teaming configuration.

Bare Metal OS with single adapter and Fabric Failover
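
To make the mechanics concrete, here is a minimal Python sketch of the provisioning model described above.  It is a simplified model of my own, not actual UCS Manager code, and the class names, vEth number, and MAC value are purely illustrative: because UCS Manager assigns both the adapter MAC and the vEth port, it can pre-stage the same MAC on the standby fabric as soon as Fabric Failover is enabled.

class Fabric:
    """Stands in for one fabric interconnect (e.g. 6100-A or 6100-B)."""
    def __init__(self, name):
        self.name = name
        self.veth_ports = {}                      # veth_id -> set of MACs known on that port

    def program_veth(self, veth_id, mac):
        self.veth_ports.setdefault(veth_id, set()).add(mac)


def provision_vnic(primary, standby, veth_id, mac, fabric_failover=True):
    # UCS Manager assigned this MAC, so it is "implicit": no learning needed.
    primary.program_veth(veth_id, mac)
    if fabric_failover:
        # Pre-stage the same MAC on an identically numbered vEth on the standby fabric.
        standby.program_veth(veth_id, mac)


fabric_a, fabric_b = Fabric("6100-A"), Fabric("6100-B")
provision_vnic(fabric_a, fabric_b, veth_id=688, mac="00:25:b5:00:00:1f")
print(fabric_b.veth_ports)                        # the MAC is already waiting on fabric B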

In addition to synchronizing the implicit MAC, there may be other information we need to migrate, depending on our implementation.  For example, if our server is running a hypervisor it may be hosting many virtual machines behind a software switch, with each VM having its own MAC address, and each VM using the same server adapter and virtual cable for connectivity outside of the server.  When the virtual machines are connected to a software-based hypervisor switch, the MAC addresses of the individual virtual machines are not implicitly known, because the VM MAC was provisioned by vCenter, not UCS Manager.  As a result, each VM MAC will need to be learned by the virtual Ethernet port on the upstream switch (UCS Fabric Interconnect).  If a fabric failure should occur, we would want these learned MAC addresses to be ready on the second fabric as well.  This capability of synchronizing the learned MAC addresses between fabrics is not available in UCS Manager version 1.3; rather, it is planned for UCSM version 1.4.
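
A quick sketch of that distinction, again as a simplified model rather than UCS code (the class, port, and MAC values are illustrative): a vEth port knows its implicit MAC in advance, while the vCenter-assigned VM MACs only show up in its learned table once the VMs transmit, and it is that learned table that has to survive a failover.

class VethPort:
    """Model of one virtual Ethernet port on a fabric interconnect."""
    def __init__(self, implicit_mac):
        self.implicit_macs = {implicit_mac}       # provisioned by UCS Manager, known up front
        self.learned_macs = set()                 # populated only by observed traffic

    def receive_frame(self, src_mac):
        if src_mac not in self.implicit_macs:
            # vCenter-assigned VM MACs land here; without syncing this table
            # to the other fabric, it must be re-learned after a failover.
            self.learned_macs.add(src_mac)


veth = VethPort(implicit_mac="00:25:b5:00:00:1f")                 # the hypervisor uplink vNIC
for vm_mac in ("00:50:56:aa:bb:01", "00:50:56:aa:bb:02"):         # VM MACs from vCenter
    veth.receive_frame(vm_mac)
print(veth.learned_macs)                                          # entries that need syncing to fabric B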

The diagram below shows the use of Fabric Failover in conjunction with a hypervisor switch, running UCS Manager v1.3 or below.  As you can see, the implicit MAC of the NIC used by the hypervisor is in sync, but the more important learned MACs of the virtual machines and Service Console are not.  Hence, in a failed-fabric scenario the 6100-B would need to re-learn these MAC addresses, which could take an unpredictable amount of time depending on how long it takes for these end points to send another packet (preferably a packet that travels to the upstream L3 switch for full bi-directional convergence).  Therefore, the design combination shown below is not supported in UCSM version 1.3 or below.

Fabric Failover with Hypervisor switch, UCSM v1.3 or below

With the release of UCS Manager v1.4, support will be added for synchronizing not only the implicit MACs but also any learned MACs.  This will provide the flexibility to use Fabric Failover in any virtualization deployment scenario, such as when a hypervisor switch is present.  Should a fabric failure occur, all affected MACs will already be available on the second fabric, and the failure event will trigger the second UCS Fabric Interconnect to send gratuitous ARPs upstream to aid the upstream network’s fast re-learning of the new location of the affected MACs.  The Fabric Failover implementation with a hypervisor switch and UCSM v1.4 is shown in the diagram below.

Fabric Failover with Hypervisor switch and UCS Manager v1.4
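
Conceptually, the v1.4 failover behavior looks something like the sketch below.  This is an assumed model of the process just described, not Cisco’s implementation; the vEth number, MAC values, and the send_garp callback are illustrative.  Because both the implicit and learned MACs were synchronized in advance, the surviving fabric only has to announce them upstream.

def fail_over(failed_fabric, surviving_fabric, send_garp):
    for veth_id, macs in failed_fabric.items():
        staged = surviving_fabric.get(veth_id, set())
        assert macs <= staged, "FabricSync should have pre-staged these MACs"
        for mac in macs:
            send_garp(mac)            # gratuitous ARP so the upstream network re-learns the new path


fabric_a = {688: {"00:25:b5:00:00:1f", "00:50:56:aa:bb:01"}}      # implicit + learned MACs
fabric_b = {688: {"00:25:b5:00:00:1f", "00:50:56:aa:bb:01"}}      # synced to fabric B in advance
fail_over(fabric_a, fabric_b, send_garp=lambda mac: print("GARP for", mac))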

Note: With the hypervisor switch examples shown above, it would certainly be possible to provide a redundant connection to the hypervisor management console with a single adapter.  However, some hypervisors, most notably VMware ESX, will complain and bombard you with warnings that you only have a single adapter provisioned to the management network and therefore no redundancy.  This is perfectly understandable, because the hypervisor has no awareness of the Fabric Failover services provisioned to it.  After all, that is the goal of Fabric Failover: to be transparent.  Furthermore, if you wish to use hypervisor switch NIC load-sharing mechanisms such as VMware’s IP Hash or Load Based Teaming, or the Nexus 1000V’s MAC Pinning or vPC Host-Mode, these technologies fundamentally require multiple adapters provisioned to the Port Group or Port Profile, and they inherently provide redundancy themselves.  In a nutshell, Fabric Failover with a hypervisor switch is certainly an intriguing design choice, but it may not be a slam dunk in every situation.
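
As a rough rule of thumb, the decision boils down to a check like this (a hypothetical helper of my own, not a VMware or Cisco API; the policy names are just labels): if the port group’s load-sharing policy inherently needs two or more uplinks, the teaming itself already provides the redundancy, while Fabric Failover is most compelling when a single adapter is all you want to provision.

MULTI_UPLINK_POLICIES = {"ip-hash", "load-based-teaming", "mac-pinning", "vpc-host-mode"}

def rely_on_fabric_failover(policy, uplink_count):
    if policy in MULTI_UPLINK_POLICIES:
        if uplink_count < 2:
            raise ValueError(policy + " requires at least two uplinks")
        return False                  # the teaming policy already provides redundancy
    return uplink_count == 1          # single uplink: Fabric Failover is the redundancy


print(rely_on_fabric_failover("ip-hash", 2))             # False
print(rely_on_fabric_failover("explicit-failover", 1))   # True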

There is one very intriguing virtualization-based design where Fabric Failover IS a slam dunk every time: the hypervisor pass-through and hypervisor bypass designs available in Cisco UCS (commonly known as HW VN-Link).  When Cisco UCS is configured for HW VN-Link it presents itself to VMware vCenter as a distributed virtual switch (DVS), much like the Nexus 1000V does.  Furthermore, Cisco UCS dynamically provisions virtual adapters in the Cisco VIC for each VM as it is powered on or moved to a host server.  These virtual adapters are essentially a hardware implementation of a DVS port.  One method of getting the VM connected to its hardware DVS port is to pass its packets through the hypervisor with a software implementation of a pass-through module (rather than a switch).  The software pass-through module requires fewer CPU cycles than a software switch, yet the hypervisor still has visibility into network I/O and therefore vMotion works as expected.  This unparalleled integration with VMware has been in UCS Manager since version 1.1.
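
Here is a simplified Python model of that flow.  It is illustrative only; the class, the pool size, and the IDs are assumptions of mine, not the UCSM or vCenter API.  The point is that at VM power-on a dynamic vNIC is carved out of the Cisco VIC and the VM’s vCenter-assigned MAC is bound to its own vEth port, which is exactly what makes it implicitly known to UCS.

import itertools

class CiscoVIC:
    """Toy model of the dynamic vNICs a Cisco VIC exposes for HW VN-Link."""
    def __init__(self, dynamic_vnic_pool=54):                  # pool size is illustrative
        self.free_vnics = list(range(dynamic_vnic_pool))
        self.bindings = {}                                     # vm name -> (vnic id, veth id, mac)

    def vm_powered_on(self, vm_name, vcenter_mac, veth_allocator):
        vnic_id = self.free_vnics.pop(0)                       # the VM's hardware DVS port
        veth_id = next(veth_allocator)                         # its own vEth on the fabric interconnect
        self.bindings[vm_name] = (vnic_id, veth_id, vcenter_mac)
        return self.bindings[vm_name]


veth_ids = itertools.count(1000)
vic = CiscoVIC()
print(vic.vm_powered_on("web01", "00:50:56:aa:bb:01", veth_ids))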

So where does Fabric Failover have a slam dunk role?  Simply put, Fabric Failover provides a redundant network connection to the VM without the need to provision a second NIC in the VM settings.  Most VMs are provisioned with one NIC simply because the hypervisor switch provided the redundancy.  With the hypervisor switch gone, Fabric Failover provides the redundancy to the VM without requiring any changes to the VM configuration.  This makes UCS HW VN-Link very easy to consider as a design choice, because minimal changes afford an easy migration.

In a design with hypervisor pass-through (HW VN-Link), every MAC address is implicitly known, including those of the virtual machines.  This is the result of the programmatic API connection between Cisco UCS and vCenter.  Every time a new VM is provisioned in vCenter and powered on in the DVS, Cisco UCS knows about it, including the MAC address provisioned to the VM.  Therefore, the MAC addresses are implicitly known through a programming interface and assigned to a unique virtual Ethernet port for every VM.  Because UCS Fabric Failover has worked with implicit MACs since version 1.0, there are none of the caveats discussed above for the hypervisor switch designs.

The diagram below depicts Fabric Failover in conjunction with hypervisor pass-through.

Fabric Failover with Hypervisor Pass through

Below is a variation of HW VN-Link that uses a complete hypervisor bypass approach, rather than the pass-through approach shown above.  This approach requires no CPU cycles to get the VM’s I/O to the network because the VM’s network driver writes directly to the physical adapter.  Given that network I/O completely bypasses all hypervisor layers, there is an immediate benefit in throughput, lower latency, and fewer (if any) CPU cycles required for networking.  The challenge with this approach is getting vMotion to work.  Ed Bugnion (CTO of Cisco’s SAVBU) presented a solution to this challenge at VMworld 2010, in which he showed the VM temporarily moving to pass-through mode for the duration of the vMotion, then back to bypass at completion.  But I digress.
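
That temporary mode flip can be sketched roughly as follows.  This is an assumed flow based on the description of the VMworld demo, not shipping code, and the dictionary-based VM and its io_mode field are purely illustrative: the VM drops back to pass-through so the hypervisor regains visibility into its I/O for the migration, then returns to bypass.

from contextlib import contextmanager

@contextmanager
def passthrough_for_vmotion(vm):
    vm["io_mode"] = "pass-through"    # hypervisor can see the VM's network I/O again
    try:
        yield vm
    finally:
        vm["io_mode"] = "bypass"      # restore direct VM-to-VIC I/O after the migration


vm = {"name": "web01", "io_mode": "bypass", "host": "esx-01"}
with passthrough_for_vmotion(vm):
    vm["host"] = "esx-02"             # the vMotion itself
print(vm)                             # back in bypass mode on the new host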

Again, Fabric Failover has a slam dunk role to play here as well.  Rather than retooling every VM with a second NIC (just for the sake of redundancy), Fabric Failover will take care of it :-)

Fabric Failover with Hypervisor Bypass

Fabric Failover is so unique to Cisco UCS that for HP, IBM, or Dell to implement the same capability would require a fundamental overhaul of their architectures into a more tightly integrated network + compute system like that of UCS.

Without Fabric Failover, how are the other compute vendors going to offer a hypervisor pass-through or bypass design choice without asking you to retool all of your VMs with a second NIC?

Even if you did add a second NIC to every VM, how are you going to ensure consistency?  How will you know for sure that each VM has a second NIC for redundancy, and that the second NIC is properly associated with a redundant path?

How long will it be before the others (HP, IBM, Dell) have an adapter that can provision enough virtual adapters to assign one to each VM?

When that adapter is finally available, will the virtual adapters be dynamically provisioned via VMware vCenter? Or will you need to statically configure each adapter?  How will vMotion work with statically configured adapters?

Of course none of this is a concern with Cisco UCS, because all of this technology is already there, built in, ready to use. :-)  Whether or not you use hypervisor bypass in your design is certainly your choice — but at least with Cisco UCS you had that choice to make!

What do you think?  Is Fabric Failover a “Slam Dunk”? Or a “So What”?

About the Author

Brad Hedlund is an Engineering Architect with the CTO office of VMware’s Networking and Security Business Unit (NSBU), focused on network & security virtualization (NSX) and the software-defined data center. Brad’s background in data center networking begins in the mid-1990s and spans roles as an IT customer and systems integrator, architecture and technical strategy roles at Cisco and Dell, and speaking at industry conferences. CCIE Emeritus #5530.

Comments (27)


  1. pattho says:

    Also, I would recommend hardware failover in Microsoft NLB setups (yes, people really do use this).

    Glad there are plans to send those ARPs for the hypervisor in the future.

    My current complaint is that there is no NIC teaming software available for Windows for the VIC adapter, so hardware failover is the only option for redundancy on bare-metal Windows installs.

    In VMware and Unix environments this is not a problem, but for Windows/Hyper-V environments it definitely is.

    • Basio says:

      Hello Pattho, I am working on one big project now to do a cloud solution for my Intel hardware and my IBM Power servers as well.

      EMC & Cisco came to offer me the Vblock as an integrated solution to run my virtual infrastructure under one cloud.
      My concern here is that my Intel servers are running Windows Hyper-V, and I heard some rumors that Hyper-V is not supported by Cisco Nexus for moving a virtual server between different locations, and that it only supports IBM AIX virtualization and VMware vMotion.

      I would appreciate it if you can help me with more info if you have any…

      Thank you

      • Dan Hiss says:

        Cisco UCS does not support IBM’s AIX platform. Only IBM supports AIX servers in their Blade environment.

  2. Phil Lowden says:

    Great stuff, Brad, thanks! Any idea how much CPU efficiency increases with hypervisor bypass? 1%? 10%?

    • Brad Hedlund says:

      Phil,
      I have heard anecdotal reports of ~10% CPU savings with hypervisor bypass. While that is nice, the other attractive advantages are the low latency and throughput similar to the OS/App running on bare metal, while still maintaining the familiar 1-VM to 1-switchport service provisioning and network management model.

      Cheers,
      Brad

  3. Mike says:

    Regarding Fabric Failover with a hypervisor switch and UCS Manager v1.4:
    my thought was that this option requires you to plan the throughput of the A/B fabrics more than I would like. Binding two NICs to the uplink (eth) of the 1KV and choosing one of the 17 load-balancing options (vPC Host Mode) offered by the 1KV seems better, as I would not need to decide which fabric to use as my primary. I would also like to hear more about QoS suggestions as 1.4 (N1KV) comes out, as well as how vPC Host Mode should be configured.

    • Sal Collora says:

      Mike,

      The fabric interconnects are not vPC peers, which is the reason LACP channels are not supported down at the host. Cisco is investigating the ability to choose the fabric on the dynamic vNICs that are created in the service profile. This will make the system a little more predictable. What I am finding in my setups is that there is more than enough bandwidth to go around, and yes, while determinism is a good thing, in the end, it’s either one fabric interconnect or the other. UCS Manager tells you where the traffic is going so when it comes to troubleshooting time, you should be fine.

      I have several customers who are worried about bandwidth, and then they monitor the links and see that the 40G on one side is plenty! With an A/B setup (each dynamic vNIC is staggered), you should be fine. I don’t think vPC hashing would give you any more real visibility. Sure, it would be automatic in terms of the path, but you still wouldn’t have any real determinism. You’d have to go find what path was being taken at any given time.

      Make sense?

  4. Carl Skow says:

    Hey Brad, I was the guy you met standing in line for labs at VMworld, thanks for churning out good posts! I’m struggling with the implications to Palo a little bit, particularly on the pass-through side. Any ideas on how we should be looking at implementing VMware with Palo on 1.4?

  5. Ling says:

    Hi Brad, I was just wondering whether there is any restriction regarding the adapter you use in this failover case, since I just read this failover description from the link below regarding the M81KR card:

    http://www.cisco.com/en/US/prod/collateral/ps10265/ps10276/solution_overview_c22-555987_ps10278_Product_Solution_Overview.html
    //
    Resilience is very important to an agile and flexible data center. Most customers use some form of NIC teaming software that needs to be provided by the NIC vendor for every OS and hypervisor. NIC teaming also requires certification for every application environment. The Cisco UCS M81KR offers fabric failover, which enables interface failover at a physical level without involving the OS or hypervisor or certification overhead.
    //

    Does the Menlo card support this failover? Thank you very much.
    Brs.
    Ling

    • Brad Hedlund says:

      Yes, in addition to the Cisco VIC (M81KR), the Emulex & QLogic cards with the Menlo chip also support fabric failover.

      Note that the newer Gen2 Emulex & Qlogic cards do not have a Menlo chip, and as a result do NOT support fabric failover.

      The Broadcom & Intel adapters do NOT support fabric failover either.

      Given that the Cisco VIC is the same price as the other adapters, you should always default to using the Cisco VIC, unless your specific application support matrix requires the Emulex or QLogic drivers in the stack.

  6. Ling says:

    Hi Brad, I’m still considering the scenario in your failover case. Does this cover all of the faults below, or is there any difference in the failover process between them?
    1. One 6100 fails;
    2. One FEX fails;
    3. The link between the 6100 and the FEX fails (the one carrying the server’s traffic);
    4. One port on a CNA adapter fails.
    To my understanding, they are basically the same. Your comments are welcome. Thank you very much.

    • Brad Hedlund says:

      Ling,
      All four failures represent a fabric failure, just to varying degrees.
      1) 6100 fail — fabric failure for all UCS chassis
      2) FEX fail — fabric failure for one UCS chassis
      3) 6100-FEX link fail — fabric failure for some number of servers within a UCS chassis (depending on number of servers and uplinks)
      4) One CNA port fail — fabric failure for one server

      Cheers,
      Brad

      • Ling says:

        Thanks for your quick reply, that’s what I totally agree with. ~ Have a good weekend! ;)

      • Joe Keegan says:

        Hi Brad,

        Great blog! I’ve recently started blogging and it’s quite an inspiration.

        I just want to make sure I understand things correctly for scenario 3), the 6100-FEX link failure. In this case, wouldn’t the CNA port mapped to the failed FEX port just get remapped to another FEX port?

        I.e., if a FEX had two uplinks and one failed, then all the CNAs mapped to the second FEX port would just get remapped to the first FEX port. This wouldn’t cause the CNAs to fail over to fabric B, right?

        I think you allude to this with your note “depending on number of servers and uplinks”, but I just wanted to make sure I understood.

        Thanks,

        Joe

        • Brad Hedlund says:

          Joe,
          In short, NO, when a FEX uplink fails the CNA ports are not dynamically re-pinned to another port on the same FEX. Any CNA port that was pinned to the failed FEX port will see its link go down, which will engage Fabric Failover.

          The comment “depending on number of servers and uplinks” was alluding to things such as: if you only had 1 FEX uplink and 8 servers, and that FEX uplink fails, all 8 servers see that as a fabric failure; if you had 4 FEX uplinks and 4 servers, and 1 uplink fails, only 1 server sees a fabric failure; if you had 4 FEX uplinks and 8 servers, and 1 uplink fails, 2 servers see that as a fabric failure. Etc.
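
          To put rough numbers on that, here is a quick Python sketch of the pinning arithmetic (assuming a simple round-robin pinning of the eight blade slots to the active FEX uplinks; the actual pinning order is a UCS implementation detail, so treat the slot numbers as illustrative):

          def affected_servers(num_uplinks, failed_uplink, num_servers=8):
              # Pin each server slot to an uplink round-robin, then list the
              # slots whose pinned uplink just failed (they see a fabric failure).
              pinning = {slot: slot % num_uplinks for slot in range(1, num_servers + 1)}
              return [slot for slot, uplink in pinning.items() if uplink == failed_uplink]

          print(affected_servers(1, 0))   # 1 uplink: all 8 servers see a fabric failure
          print(affected_servers(4, 0))   # 4 uplinks, 8 servers: 2 servers affected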

          Good luck with the blogging! Look forward to reading your posts.

          Cheers,
          Brad

          • Joe Keegan says:

            Great, thanks for the clarification. I’m currently taking a DCUCD class and this is contrary to what the instructor stated, but he has been wrong on a number of things (like FCoE using CoS 5).

            I’ve learned way more from reading your blog.

            Thanks,

            Joe

      • Yury Magalif says:

        Brad,

        In the situation you described:
        —————
        1) 6100 fail — fabric failure for all UCS chassis
        2) FEX fail — fabric failure for one UCS chassis
        3) 6100-FEX link fail — fabric failure for some number of servers within a UCS chassis (depending on number of servers and uplinks)
        4) One CNA port fail — fabric failure for one server
        —————
        Quest 1 — For number 4 above, Fabric Failover will happen, correct?
        Quest 2 — If I have 2 CNAs, and one whole CNA fails, will Fabric Failover switch traffic to the other CNA?
        Quest 3 — If I have 2 VICs, and one whole VIC fails, will Fabric Failover switch traffic to the other VIC?
        Quest 4 — If the answer is NO for Questions 2 and 3 above, will only an OS based teaming driver be able to fail over traffic to the other CNA/VIC?
        Quest 5 — Is the Windows teaming driver available for CNAs or VICs with 1.4 release?

        Sincerely,
        Yury Magalif

  7. jerry says:

    Hi Brad – great stuff… I have a question for you. If you were starting from scratch with a new UCS setup running ESXi 4.1+ and Palo, with a vDS everywhere, would you use UCS fabric failover for Ethernet, or would you just let ESXi take care of the teaming/failover for Ethernet as it will need to do for FC? I’m leaning towards having one throat to choke in terms of letting ESXi handle it, as it seems to do it well, rather than splitting my failover between UCS and ESXi. I realize there is a double vNIC config to do if you use the ESXi failover method (and I’d do it once and make templates), but I’m wondering if there’s anything else I’m missing. Would love to hear a response either way.
    Thanks!

  8. Shahid Shafi says:

    Brad,

    I attended your breakout session today in London. I was sitting right in the front and asked a couple of questions as well. I wanted to drop you a line to say you were just awesome. You clearly articulated everything, and a lot of topics became clearer to me.

    thanks for the magic,
    Shahid

  9. Don says:

    Brad – did 1.4 introduce fabric failover with hypervisors? I don’t see mention of it in the release notes.

    Thanks,

    -don

    • Brad Hedlund says:

      Hi Don,
      UCSM version 1.4 brought in a capability called “FabricSync”, which effectively makes fabric failover in conjunction with hypervisors possible. Basically, any MAC addresses of VMs handled by a hypervisor switch are now protected by Fabric Failover, whereas prior to FabricSync only the MACs of the physical server itself, or of VMs using VM-FEX, were protected (replicated to the other fabric in advance, to be specific).

      Cheers,
      Brad

  10. Sean says:

    Just wondering if there was any timeline around when version 1.4 will be available.

    Cheers,
    Sean
