Great questions on FCoE, VN-Tag, FEX, and vPC

Filed in FCoE, FEX, Nexus, NIV, Q&A, vPC by Brad Hedlund on December 9, 2010

I received some really good questions about FCoE, VN-Tag, FEX, and vPC from a reader named Lucas.  Although I had 10 other things to do, I just couldn’t resist highlighting these questions, and my answers, in a new post that I thought my readers would enjoy!

Brad,
You have amazing information about Nexus and UCS on your website. Please keep up the good work. I have a few queries; I would appreciate it if you could please point me in the right direction.

1) FCoE with vPC: how does this work? For FCoE we must log in to one fabric only; how will the load balancing offered by vPC affect it?

From the perspective of the CNA installed in the server, you have to keep in mind that it's really two different logical adapters hosted on one physical adapter: Ethernet and FC.  The FC logical adapter on the CNA has no visibility or awareness of vPC – it still sees two individual paths, one left, one right, and doesn't behave any differently with or without vPC.  More specifically, a dual-port CNA will actually have (2) logical FC adapters, each one with its own port, and each port typically connected to a separate fabric.  The Ethernet logical adapters, however, are vPC aware and will treat the two paths as a single logical 20GE pipe for all the traffic they handle (all the non-FCoE stuff).
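To make that concrete, here is a minimal NX-OS sketch of the FC half of the picture (VLAN, VSAN, and interface numbers are mine, purely illustrative). Each fabric's Nexus 5000 maps its own FCoE VLAN to its own VSAN and binds a virtual FC interface to the one physical port facing the CNA; fabric B mirrors this with its own numbers:

    ! N5K-A (Fabric A) -- a sketch, not a complete config
    feature fcoe
    vlan 100
      fcoe vsan 100                  ! dedicated FCoE VLAN, Fabric A only
    vsan database
      vsan 100
    interface vfc1
      bind interface ethernet 1/1    ! the CNA port cabled to this switch
      no shutdown
    vsan database
      vsan 100 interface vfc1        ! place the logical FC port in VSAN 100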

UPDATE: Diagram below added for visual aid.


2) VN-Tag: is the tag applied by a VMware machine, or can only a FEX (the I/O module in UCS, or a Nexus 2000 FEX) apply this tag?

VN-Tag is a modern-day version of a virtual cable that connects a virtual NIC, or virtual HBA, hosted on an NIV-capable adapter to an upstream virtual Ethernet or virtual FC port on an NIV-capable switch.  In the case where the server does not have an NIV-capable adapter, VN-Tag can also be used to connect a physical port on a fabric extender (FEX) to an upstream virtual Ethernet port.

In a nutshell, an NIV-capable adapter will apply the VN-Tag as traffic egresses from one of its virtual adapters.  Any FEX in the path will just pass that traffic upstream to the switch terminating the other end of the virtual cable (VN-Tag).  In this case you could think of the FEX as a virtual patch panel for virtual cables.

If you connect a plain non-NIV adapter to a fabric extender (FEX), it will be the FEX that applies the VN-Tag.  In this sense, you still have a cable, but half of it is physical (server-to-FEX), and the other half is virtual (FEX-to-switch).  In this case, you could think of the FEX as a physical to virtual media converter.

3) In FIP, why do we need multiple MACs (FPMA)? I understand that FPMA relieves the switch from creating a mapping between FCID and MAC, but other than that, why does the standard talk about multiple FC_LEPs on a single port? I am assuming each LEP would need a separate MAC; I am having a hard time visualizing it in real life.

Similar to Question #1 above, it helps to understand that the server CNA is really hosting two (or more) different logical adapters, Ethernet and FC.  Each logical adapter will have its own link identity (MAC/WWN).  Given that the actual physical medium is Ethernet, the logical FC adapter can't use a WWN on the medium, so it uses an Ethernet MAC instead, which becomes the FC_LEP (FC link end point).  As you point out, it's most efficient when the FCoE switch can automatically provide this MAC to the server, for administrative ease.  This is known as FPMA (fabric-provided MAC address).

The same concepts hold true for the upstream switch.  The FCoE switch is really two different switches hosted on one hardware platform: an Ethernet switch and an FC switch.  The FC logical switch needs to look at the FC frames carried within the FCoE packets to process fabric logins and make a forwarding decision.  In order to do that, it needs to receive the FCoE frames, decapsulate them, make a decision, and re-encapsulate into FCoE again if necessary.  The FC logical switch has an FC_LEP for this very reason: so that it can send and receive Ethernet frames carrying FC payload (FCoE).
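To make the FPMA construction concrete, here is a worked example with made-up values: the fabric simply concatenates its 24-bit FC-MAP prefix (default 0E:FC:00) with the 24-bit FC_ID it assigned at fabric login.

    FC-MAP (fabric prefix, default)   0E:FC:00
    FC_ID assigned at FLOGI           01:0A:00
    FPMA = FC-MAP + FC_ID             0E:FC:00:01:0A:00

Each FC_LEP performs its own login and receives its own FC_ID, and therefore its own FPMA, which is how one physical port ends up hosting multiple MACs.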

If you could only read one book on FCoE to better understand these concepts, it would certainly be this one:  http://www.ciscopress.com/bookstore/product.asp?isbn=158705888X

4) With a 2232PP FEX, why is the straight-through design preferred? Will FCoE break if we did an active/active design?
Please assist.
Thanks,
Lucas

As we discussed in Question #1, the logical FC adapters on the server CNA are oblivious to vPC.  As a result, attaching a server CNA with vPC makes no difference to how FCoE is forwarded via two separate paths to two separate fabrics.  However, this is not the case with a FEX or a switch, which will forward the FCoE traffic on whatever Ethernet topology you place it on.  If this topology includes a vPC that spans two different fabrics, then you will have FCoE traffic from one of your logical FC adapters landing on both fabrics.  This makes it confusing to determine where FCoE traffic is going, and it breaks the traditional FC best practice of SAN A/B isolation.  Although you certainly could do this, it's just not a supported design right now.

As a result, as of right now Cisco does not recommend that you place FCoE traffic on an Ethernet topology spanning two fabrics (A/B, Left/Right, etc.).  Therefore, if your Nexus 2232 FEX will be carrying FCoE traffic from CNAs, you should NOT vPC attach the 2232 FEX to two different upstream Nexus 5000s.  Additionally, if your two upstream Nexus 5000s are connected together for vPC, you should NOT forward FCoE VLANs on the vPC peer link.  This will keep your FCoE forwarding deterministic and preserve two separate SAN fabrics.  You haven't lost any redundancy, because your servers are all dual-attached to separate 2232 FEXs which are each attached to separate Nexus 5000s.
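In configuration terms, the straight-through attachment looks roughly like this (a sketch with made-up FEX, interface, and VLAN numbers): each 2232 is single-homed to one Nexus 5000, and the FCoE VLAN is kept off the vPC peer link.

    ! N5K-A -- a sketch; N5K-B mirrors this with its own FEX and FCoE VLAN
    feature fex
    fex 100
      pinning max-links 1
    interface ethernet 1/1-4
      switchport mode fex-fabric
      fex associate 100
      channel-group 100              ! fabric links go to this one FEX only
    interface port-channel 1         ! the vPC peer link between the 5000s
      switchport trunk allowed vlan except 100   ! keep FCoE VLAN 100 off of it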

In the end you have something that looks like this:


The above diagram was taken from the Data Center Access Design Guide, Chapter 6

Make sense?

Thanks for the great questions!  Keep’em coming :-)

About the Author

Brad Hedlund is an Engineering Architect with the CTO office of VMware's Networking and Security Business Unit (NSBU), focused on network & security virtualization (NSX) and the software-defined data center. Brad's background in data center networking begins in the mid-1990s, with experience as an IT customer, as a systems integrator, in architecture and technical strategy roles at Cisco and Dell, and as a speaker at industry conferences. CCIE Emeritus #5530.

Comments (42)


  1. Brad,

    Are you really sure about the #1 (FCoE over vPC)? FCoE is just one application on top of Ethernet (whereas LAG is within the MAC layer), so FCoE should be load-balanced across LAG (and vPC is just a special instance of a LAG). Obviously each FCoE session will be pinned to a particular port of LAG, but I don’t think FCoE forwarding should be any different from any other frame forwarding … or maybe I’m missing something?

    Thanks,
    Ivan

    • Brad Hedlund says:

      Ivan,
      Your question inspired me to update the post with a diagram, see above.

      The view that “FCoE is just one application on top of Ethernet” is a fair and accurate observation to make as FCoE frames *traverse the network*. However, that is not an accurate statement for FCoE frames sent directly to or from the server CNA.

      It helps to visualize the dual-port server CNA as having (2) logical FC adapters each owning one CNA port (see the newly added diagram). Therefore, any frames egressing a logical FC adapter can only be placed on one physical CNA port. So what you have is two logical FC adapters each singly homed to a separate switch. Also consider that LAG has never been a concept for FC adapter drivers and software.
      Again, for a dual-port CNA, the same concept of (2) single-port logical Ethernet adapters also exists. However, the concept of LAG for Ethernet adapters is well known and available for many popular drivers and OS packages, and configuring the 802.3ad LAG team on these logical Ethernet adapters is fairly easy. Any traffic egressing the logical Ethernet adapters is load-balanced as you describe. However, this is not true of FCoE traffic, because the logical FC adapters are not using the services of the logical Ethernet adapters; rather, they are using the services of the physical CNA ports. Therefore, despite having configured a LAG team on the logical Ethernet drivers, it has no effect on how FCoE frames exit the CNA.

      On the upstream switch, the virtual FC port is associated to a virtual Port Channel. However, when binding a virtual FC port to a Port Channel, the system expects to see only one local physical member port, or the virtual FC interface will not come up. As a result the virtual FC port can only place FCoE frames on one Ethernet link facing the CNA. Therefore, the presence of vPC on the switch has no effect in how FCoE frames are transmitted to the server CNA.
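      As a rough example of that binding (interface and channel numbers are hypothetical), each Nexus 5000 ends up with a port channel that has exactly one local member port facing the CNA, and the vfc is bound to that port channel:

        ! N5K-A -- a sketch, not a full config
        interface ethernet 1/1
          channel-group 10 mode active     ! the one local member toward the CNA
        interface port-channel 10
          switchport mode trunk
          vpc 10                           ! pairs with Po10 on N5K-B
        interface vfc10
          bind interface port-channel 10   ! only comes up with one local member
          no shutdown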

      Hope that clarifies?

  2. Makes perfect sense. Not that I would agree with how it’s done, but one can’t change the facts of life ;)

    Thanks for the explanation and the extra diagram!
    Ivan

    • Brad Hedlund says:

      Ivan,
      This was the easy part … the server CNA to first access switch. The real fun begins when we start to discuss whether it makes sense, or not, to have FCoE traverse vPC or FabricPath uplinks to the upstream network. :-)

      Cheers,
      Brad

      • Emmanuel Garcia says:

        Hi Brad, I know this topic is 3 years old but I’m having some doubts about vPC and FCoE between 2 N5k and 2 FI. Hope you can help me to understand.

        I already understand how the CNA sees 2 individual paths for FCoE, and in a dual-port CNA one vHBA is associated with one vFC on FI-A and something similar happens with FI-B… but here comes the question.

        How does FI-A handle the multiple vFCs (associated with the vHBAs of many blade servers) on the uplink to the vPC formed by the 2 N5Ks?

        • Brad Hedlund says:

          Hi Emmanuel,
          By default the UCS FI will pin each vFC to an FCoE uplink, and act as an N_port proxy (NPV). So the upstream switch (N5K in this case) will see multiple N_port FLOGIs on a single FC/FCoE interface. That's OK; presuming your N5K has NPIV enabled (which is the default, I believe), it will allow that.
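          If you want to sanity-check that on the N5K (a hypothetical verification, not something from this thread), two commands cover it:

            feature npiv          ! permit multiple FLOGIs on a single F_port
            show flogi database   ! each vHBA behind the FI appears as its own login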

          • Emmanuel Garcia says:

            Thanks for your quick answer.

            But if the FCoE uplink is attached to a vPC, which physical link will be used?

          • Brad Hedlund says:

            You have FCoE uplinks from the UCS FI as a vPC (a single port channel with member links landing on two different switches)?
            That's not a supported design. At least, last time I checked…

  3. Joe Smith says:

    Brad, in one of the Nexus white papers you mention in one of your blog posts (can't remember which one), it says the following:

    “vPCs enable full, cross-sectional bandwidth utilization among LAN switches, as well as between servers and LAN switches.”

    This has always confused me. Can't you run active/active NIC teaming without vPC? That would give you full cross-sectional bandwidth.

    My reasoning is the following:

    A server with dual uplinks to 2 separate switches is not like a switch with dual uplinks. There's no loop with a server, because a server does not forward broadcasts the way an Ethernet switch does. Also, with SLB server NIC teaming from Broadcom, only one uplink sends broadcasts. Nonetheless, even if both uplinks did, there still wouldn't be a loop, because a server doesn't behave like a switch.

    Am I wrong?

    Thanks

    • Brad Hedlund says:

      Joe,
      In cases where you have non-virtual servers (Windows or Linux on the bare metal), without vPC between the server and LAN switches you don’t have true active/active forwarding. The server may transmit on both links, but will only receive traffic on one link. This is because the server MAC address can only be present on one link at a time. vPC takes those two links and makes it one logical link, therefore traffic can be received on both physical links.

      In cases where you have virtualization hosts (VMware for example), you can obtain active/active forwarding without vPC, because some virtual machines will be pinned to one link while other virtual machines are pinned to the other link, failing over to the remaining link if one link fails. Keep in mind that if one link fails here, the virtualization host will need to alert the upstream network that the affected virtual machines are now using a new link. This is accomplished through gratuitous ARP messages, as many as are required (one message for each VM).

      However, vPC to a virtualization host provides benefits too, in that each individual virtual machine can use each physical link, rather than just one. Furthermore, when one of those links fails, the virtualization host does not need to alert the upstream network with gratuitous ARP messages, because the single logical link created by vPC remains intact. The network topology did not change despite a link failure; vPC hides link failures from the network forwarding tables. Hence, with vPC to a virtualization host, the bandwidth of all links is better utilized and link failures are handled much more efficiently (overall fewer moving parts).
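      For reference, a host-facing vPC is just an ordinary port channel whose member links land on two switches. A minimal sketch (interface and channel numbers are mine), configured identically on both N5Ks:

        interface ethernet 1/10
          channel-group 20 mode active   ! one member per switch, facing the server
        interface port-channel 20
          switchport mode trunk
          vpc 20                         ! stitches the two halves into one logical link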

      Hope that clarifies.

      Cheers,
      Brad

      • Joe Smith says:

        Brad, thank you kindly for that informative answer. Would you believe that I have posed that very question to server administrators and they couldn’t answer it? I also posted the question on Cisco’s NetPro forum, but all I heard were crickets chirping. lol…

        Outstanding blog!

  4. Brad,
    I have actually been working on getting a lab set up, in cooperation with Cisco, with this configuration, and one question that has come up that hasn't been answered is whether Boot from SAN works with vPC.

    So far, we have been unable to get the Qlogic 81xx series adapters to see any storage from the BIOS to enable Boot from SAN.

    Would love to hear any experiences or thoughts you have on this “little” additional layer of complexity.

    Thank you.

    • Brad Hedlund says:

      William,
      Boot from SAN works with or without vPC. As pointed out in this article, vPC is not exposed to the virtual FC adapters hosted on the CNA. More often than not, Boot from SAN problems are narrowed down to firmware on the storage array or the CNA adapter. When you get it resolved, I hope you'll come back and let us know what it was.

      Cheers,
      Brad

  5. Andre Gustavo Lomonaco says:

    Hi Brad,

    Can I use the same topology as Figure 27 when I have servers with 2 CNAs connected to the Nexus 5000 using vPC, without a Nexus 2000 (only a Nexus 1000V)?

    My Best Regards,

    Andre Lomonaco

  6. Joe Smith says:

    Brad, is the VN-Tag number a function of the vNIC or is it married to the source MAC address and configured policies of the traffic originated by the VM? In other words, the NIV’s vNIC will tag VM traffic with a VN-Tag number on its way to the network, but what happens when the VM is migrated to another machine – will that VN-Tag number follow the VM or will the VM’s traffic get tagged with a different tag number?

    I have a hunch that the vNIC to vETH port mapping is static – meaning that the VN-Tag number is static and so the virtual infrastructure (virtual cable) that is created is also static. But if that’s the case, what happens when you vMotion the VM to another physical server – how are the policies migrated to the new vETH port?

    I hope I’m not rambling…. :-)

    • Brad Hedlund says:

      Joe,
      The VN-Tag number is not at all associated to any MAC address. The actual VN-Tag number used is negotiated between the controlling switch and port extender on a per physical switch port basis. Meaning, the VN-Tag only needs to be unique per physical port, or a grouping of physical ports (depending on hardware).

      So, if you’re using HW VN-LINK with UCS, (we’re now calling it VM FEX), and you migrate that VM from one host to another – it may or may not have the same VN-Tag at the destination host — but that doesn’t matter. The VN-Tag is of no significance to anything you care about.

      Think of VN-Tag as the virtual representation of a cable. If you move a server from point A to point B, does it make any difference if you use the same old cable, or grab a new one? Of course not. The same is true for VN-Tag. There is no reason a network or server admin would ever care about VN-Tag numbers. All of that is managed under-the-hood by the system for provisioning purposes.

      How are policies migrated? With VM FEX, the virtual machine is associated to a Port Group with policies and settings, defined in UCS Manager. When the VM moves to a new host it remains associated to the same Port Group. Pretty straight forward stuff.

      -Brad

      • Joe Smith says:

        Brad, thanks for the answers. I figured there isn’t much value in knowing the actual VN-Tag number since my assumption was that the virtual infrastructure that’s created using this tagging construct is static anyway. I asked, though, because I want to understand how things work, not just what it does. Perhaps it’s just the engineer in me. Unfortunately, I do not have a UCS at my disposal to configure and experiment with.

        I'm still a bit unclear in terms of policy migration in HW VN-Link, i.e. VM FEX. A port group is a product of the vSwitch construct, correct? If, say, a 1000v has a port profile configured with all its associated security and VLAN characteristics, that profile is translated as a port group in vCenter. Moreover, the VM is associated to that port group. Simple enough. When a VM is migrated from one host to another in the same vMotion cluster, the VM will remain attached (bound) to the same vethernet port on the 1000v. Therefore, the port group to which that vethernet is bound also remains the same.

        But when one performs HW VN-Link (VM FEX), the NIV capabilities of Palo are leveraged. My understanding is that the hypervisor is either bypassed altogether (VMDirectPath I/O), in which case vMotion is not possible because the hypervisor no longer has authoritative dominion over the VM, OR the 1000v simply acts as a pass-through that does nothing more than aggregate the traffic from the downlinks to the uplinks, which are attached to the vNICs on the Palo. So, with the absence of a port profile and its associated port group (no vSwitch construct being leveraged anymore), where do the VM's policies reside?

  7. Joe Smith says:

    Don't worry about answering this. I got the answers I need. Thanks.

  8. Joe Smith says:

    Sure, it's best to simply provide the link to the thread on the Cisco NetPro forum where I asked the same question.

    https://supportforums.cisco.com/message/3268434#3268434

  9. Joe Smith says:

    Brad, care to comment on the info? Anything to add?

    • Brad Hedlund says:

      Joe,
      After reading through the forum thread it looks like Manish was able to answer all of your questions and I am in full agreement. No surprise there, Manish is the guy I go to when I have questions as well.

      I encourage everybody to read that forum thread.

      Sorry if I’m slow to respond to questions sometimes. Unfortunately, blogging is not my day job that pays the bills — but I’ll do my best to respond within a few days even during the busy times.

      Thanks again for being an awesome contributor here.

      Cheers,
      Brad

  10. Eddy says:

    Hi Brad,
    I wonder why, in your last picture there, Cisco recommends only a straight port-channel connection from the Nexus 5K to the SAN switch/MDS, not a cross connection for redundancy purposes. Is the behavior different between SAN and LAN, or is there a limitation in the MDS such that Cisco recommends the straight connection?
    thx

    • Brad Hedlund says:

      Eddy,
      You could do that if you want, but you wouldn’t have any SAN A/B isolation anymore. The reason for deploying two separate SAN fabrics is to prevent problems in one switch or fabric from affecting all storage connectivity.

      • Eddy says:

        Hi Brad, thank you for your response.
        But can you explain more about the SAN A and B isolation? I'm not quite clear on your statement.
        Even if I cross-connect the links from the fabric to the MDS, the MDSs are still 2 separate devices, is this right?
        MDS1        MDS2
         |  \      /  |
         |   \    /   |
         |    \  /    |
         |     \/     |
         |     /\     |
         |    /  \    |
        FIA        FIB

        Let's say I assign 4 vHBAs: 2 vHBAs to FIA and 2 vHBAs to FIB, all using dynamic pinning. From the host perspective, when FIA fails, connectivity should be all right because there are 2 vHBAs on FIB. If MDS1 is the one that fails, connectivity should fail over to the straight link from FIB to MDS2. Am I missing something here with the concept?
        Note: the connections from MDS1 and MDS2 to the storage ports are also crossed.
        thank you

        • Brad Hedlund says:

          Eddy,
          The design you sketched would work, but it has some added complexity that leads to more difficult troubleshooting and doesn't increase your level of redundancy. What am I referring to? You don't need to connect each FI to each MDS. You don't gain any redundancy from doing that, and you'll need to figure out which vHBA has been pinned to which MDS when troubleshooting a performance issue.

          If each server has (2) vHBAs, one associated to each FI, you already have redundancy. So keep things simple and deterministic by connecting each FI to only one MDS, not both.

          Cheers,
          Brad

  11. Jean Kobben says:

    Hello Brad,

    I'm racking my brain trying to figure out whether it's possible to make FCoE work in a Cisco UCS 1.4 environment connected to Nexus 5000 switches, with NetApp storage attached to the Nexus switches. The goal is to use FCoE at all infrastructure levels.
    Can I use FCoE northbound to the Nexus switches and connect them to the SAN with FCoE, or do I have to connect the Fabric Interconnects directly to the SAN if I want to use the FCoE protocol?

    Regards
    Jean

  12. Jacob says:

    I'm in the middle of an FCoE implementation where we are using the Nexus 5596UP, 2232PP, and Qlogic CNAs. Depending on whom you ask, some people will tell you that utilizing vPC to a VMware host is best practice, and some will say absolutely not. I was trying to read the support forum URL included above, but it's not working. Does anyone have the updated link?

  13. Brad:

    When doing FCoE with vPC, is it best to bind the vfc to the port-channel? That's the only way we can get it to work, but when we do it, ESXi no longer sees paths on one of the HBAs.

    Thanks!

    Brandon

    • Brad Hedlund says:

      Brandon,
      If I recall correctly, yes, you bind the vfc to the port-channel. Is your vPC configuration OK? If you type 'show vpc brief' on both switches, does it show all port channels up?

    • Iain says:

      Is your channel mode set to "on"? You can't use LACP with boot from SAN because the vfc interface is bound to the port channel. This presents a bootstrapping issue: LACP cannot negotiate and bring up the Po interface until the server has booted, and the server cannot boot (from SAN) because the vfc never comes up.
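      In configuration terms (numbers illustrative), the difference comes down to the channel mode on the member port:

        interface ethernet 1/1
          channel-group 10 mode on     ! static: Po10, and the vfc bound to it,
                                       ! can come up before the OS boots
        ! with "channel-group 10 mode active" (LACP), Po10 waits on negotiation,
        ! so a vfc bound to it stays down until the host OS is running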

      Now, this being said, I’m currently investigating a new feature which was supposed to be added with N5K version 5.1(3)N1(1). I *think* this may add support for Boot from SAN and vPC (with LACP). From what I can tell they’re calling it “Enhanced vPC”, but I’m still a little foggy on the details. What I really want is boot from SAN with a true 5K LACP port-channel to Nexus 1000V on my hosts (with Qlogic 8152).

      http://www.cisco.com/en/US/docs/switches/datacenter/nexus5000/sw/fcoe/513_n1_1/b_Cisco_n5k_fcoe_config_gd_re_513_n1_1_chapter_0100.html#concept_373ABC38E1B64D629AC6D06B90A6BCE3

      • Pat Colbeck says:

        It is designed to do exactly that. It lets you bind the VFC to a physical Ethernet NIC even if that NIC is part of a port channel. Thus you get rid of the problem of trying to pull yourself up by your own boot laces with boot from SAN and LACP.

  14. Pat Colbeck says:

    This does seem to remove one level of redundancy. If you have two two-port Ethernet adapters, you can connect one port from each adapter to each Nexus 5K in a left/right pair and run a port channel across all four links. This protects you from the failure of a NIC, or of a Nexus 5K, or even from the failure of a 5K and an entire two-port NIC at the same time. Once you are limited to one port-channel member per 5K, then if you get a NIC failure and a 5K failure, boom, no connectivity at all. Not that likely, but some customers like a lot of redundancy.

  15. Ryan Kather says:

    Brad, excellent article. I have a question, which is perhaps just me missing a step, because I don't see how it could be an issue. If you have vPC from your server to the fabric extenders, and those are uplinked to one Nexus 5K, how does the server know to fail over connections in the event of a 5K failure? It seems that carrier sense would persist on the CNA with the extender still online.

    Thank you for any clarification on redundancy impact from a lost 5K.

    • Brad Hedlund says:

      Ryan,
      When you connect your server with vPC to two fabric extenders, each fabric extender will be connected to a different 5K. Or, each fabric extender will be connected to both 5Ks.

  16. Rid says:

    Dear Brad

    We have a NetApp storage system with two controllers.
    Each contains two FCoE ports and two FC ports.
    Also we have an IBM BladeCenter with FCoE adapters, and another IBM P series BladeCenter with FC adapters.

    We are trying to connect them to Nexus 5548 switches,
    so we are trying to connect the two FCoE ports from each controller as a vPC.

    1) Do we need to have two VSANs for the two Nexus 5548s?
    2) When we connect the FC ports, what is the best way of connecting them?

    Thanks

    • Fardzly says:

      Hi Brad, just a simple question. Can I connect a Cisco 2K 10GbE to a Dell Force10? I know that if I connect it to a Cisco 5K it can work. I just want to test the connection from my SAN storage to the server through the Cisco 2K 10GbE -> Dell Force10. Tq

      • Brad Hedlund says:

        No, the uplink protocols implemented by the Cisco Nexus 2000 series are Cisco proprietary. And as such you can only connect the Nexus 2K to a Cisco Nexus 5000 or 7000.

        Cheers,
        Brad

        • Fardzly says:

          Tq brad for the answer.

          What if:

          Dell server 10Gb -> Dell Force10 -> Cisco 2K 10Gb -> Cisco 5K -> storage

          Can the FCoE connection be made?

          I've tried connecting directly from the server -> 2K10 -> 5K -> storage, and the test succeeded.

          Tq
