Cisco UCS Q&A #2 – End Host Mode forwarding behavior

Filed in Cisco UCS, Q&A by Brad Hedlund on October 22, 2010

This question comes from a reader named Wei about how Cisco UCS behaves in End Host Mode with respect to MAC learning and flooding.  Wei paints a scenario of two servers in the same VLAN, one inside Cisco UCS, the other outside of Cisco UCS.  With the Fabric Interconnect in End Host Mode, what happens when the server inside UCS tries to communicate with the server outside UCS, starting with ARP and continuing with the subsequent unicast conversation?

Question:

I recently came across your website; it is really an awesome resource.  Thanks for sharing your data center knowledge.  I learned a lot about UCS networking from the HD videos you posted on http://bradhedlund.com/2010/06/22/cisco-ucs-networking-best-practices/.  I do have a question regarding End Host Mode and am hoping you can help me with it.

Here is the scenario: the communication is between two devices on the same VLAN but connected to different devices.
NIC 10 is on VLAN 10 and is connected to a server port on the FI 6100.  NIC 20 is on VLAN 10 and is connected to an upstream switch that connects to the uplink ports on the same FI 6100.  Let’s say NIC 10 ARPs for NIC 20: the frame is sent to the upstream switch via the uplink that handles the broadcast, and also to the other local VLAN 10 server ports.  Since there is no MAC learning on the uplink ports, when the ARP reply comes back down the uplink, does this mean NIC 20’s MAC will not be recorded in the MAC table on the FI 6100?  What happens to subsequent unicast frames from NIC 10 to NIC 20?  I know they will go up the uplink, but will they also be flooded to the local VLAN 10 server ports?

Thanks, any help would be greatly appreciated.

My Answer:

Here’s the setup: I’ll refer to the server inside Cisco UCS as Server A, and the server outside UCS as Server B.  The Cisco UCS Fabric is in End Host Mode.  Server A will initiate a conversation with Server B.  Both servers are in VLAN 10.  Server A’s primary NIC for VLAN 10 is connected to 6100-A.

When Server A issues an ARP message to learn Server B’s MAC address, this is a broadcast message that will be sent to all servers inside UCS on VLAN 10, connected to 6100-A.  Additionally, the ARP message will also be sent out the UCS uplink Server A is pinned to.  Note that 6100-A’s “Broadcast” link is an uplink it has chosen for *receiving* broadcasts, not sending broadcasts.  When a server sends a broadcast message, that broadcast will always exit UCS on the same uplink designated for all other traffic from that server, the uplink automatically chosen via dynamic pinning or statically via LAN Pin Groups.

The upstream switch will receive the broadcast on the uplink from UCS pinned to Server A, and if the upstream switch has no prior knowledge of Server A it will use this broadcast message to learn Server A’s MAC address on this interface.  The upstream switch will flood this broadcast ARP message on all other interfaces forwarding for VLAN 10 (including links connected to 6100-B).  Note that the upstream switch will also send this broadcast back to 6100-A on all the other interfaces it has facing 6100-A.  However, only the link 6100-A has picked as its “Broadcast link” will actually process the broadcast.  6100-A will notice that the broadcast was originated from one of its own servers, and just drop it.
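To make the upstream switch's role concrete, here is a minimal Python sketch of classic learn-and-flood behavior for a single VLAN. It is only an illustration, not the behavior of any specific switch model: the class name `LearningSwitch` and the port names are hypothetical, and it assumes each link to 6100-A is an individual (non-port-channel) interface.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Simplified transparent bridge for one VLAN (hypothetical model)."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}  # source MAC -> port it was last seen on

    def receive(self, in_port, src, dst):
        """Learn the source MAC, then return the set of egress ports."""
        self.mac_table[src] = in_port
        if dst != BROADCAST and dst in self.mac_table:
            # Known unicast: forward out exactly one port.
            return {self.mac_table[dst]}
        # Broadcast (or unknown unicast): flood everywhere except the ingress port.
        return self.ports - {in_port}


# Hypothetical port names for the scenario in this post.
upstream = LearningSwitch({"to-6100A-1", "to-6100A-2", "to-6100B-1", "to-serverB"})

# Server A's ARP request arrives on the uplink Server A is pinned to.
arp_request_out = upstream.receive("to-6100A-1", "mac-A", BROADCAST)

# Server B's unicast ARP reply follows the entry learned from the request.
arp_reply_out = upstream.receive("to-serverB", "mac-B", "mac-A")
```

In this sketch the request is flooded to the other 6100-A link, to 6100-B, and to Server B, while the reply goes only to the port where Server A's MAC was learned.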

Server B will receive the broadcast ARP message and respond directly (unicast) back to Server A.  The upstream switch receives Server B’s (unicast) ARP response to Server A’s MAC address on VLAN 10.  Because the upstream switch has already learned which of its interfaces leads to Server A, it sends Server B’s response directly to this interface connected to 6100-A.

6100-A receives Server B’s unicast ARP response from the upstream switch on the uplink pinned to Server A.  Because Cisco UCS is in End Host Mode, it will NOT learn the MAC address & location of Server B like the upstream switch did for Server A.

6100-A knows the server port and logical interface Server A is located on because of the authoritative knowledge provided by UCS Manager.  Hence, 6100-A transmits Server B’s unicast response directly to Server A.
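The difference in how 6100-A comes to know a MAC address can be sketched in a few lines of Python. This is a rough model under stated assumptions, not UCS internals: the class `EndHostMacTable`, its method names, and the port names are all hypothetical. It captures two points from this post: entries for the servers are provided authoritatively, and frames arriving on uplinks never create entries. The learning on server ports reflects the note in the comments below that MACs behind a hypervisor switch are learned like a typical switch would.

```python
class EndHostMacTable:
    """Hypothetical MAC table for one fabric interconnect in End Host Mode."""

    def __init__(self, uplinks):
        self.uplinks = set(uplinks)
        self.entries = {}  # MAC -> (server-facing port, how the entry was obtained)

    def register(self, mac, port):
        """Authoritative entry supplied by the management plane (assumed interface)."""
        self.entries[mac] = (port, "authoritative")

    def observe(self, in_port, src_mac):
        """Inspect a received frame; return True only if a new entry was learned."""
        if in_port in self.uplinks or src_mac in self.entries:
            return False  # never learn on uplinks; keep existing entries
        self.entries[src_mac] = (in_port, "learned")
        return True

    def lookup(self, mac):
        """Return (port, origin) for a known MAC, or None."""
        return self.entries.get(mac)


table = EndHostMacTable(uplinks={"uplink-1", "uplink-2"})
table.register("mac-A", "veth-A")            # Server A, known authoritatively

learned = table.observe("uplink-1", "mac-B")  # Server B's ARP reply on an uplink
server_b = table.lookup("mac-B")              # stays unknown to the FI
```

Here `learned` is False and `server_b` is None, which is why the next step has nothing to look up for Server B.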

Server A receives the ARP response and sends a unicast message to Server B.

6100-A receives the unicast message destined to Server B.  However, 6100-A does not have any knowledge of Server B’s location on the network.  Server B is outside of UCS so there is no authoritative knowledge, no MAC table information for Server B.  Because of this, 6100-A makes the assumption that Server B must be accessible in the upstream network and sends the unicast message for Server B out of the uplink pinned to Server A (because it was sent from Server A).  The upstream switch receives the unicast message and simply transmits it directly to Server B.

In summary, Cisco UCS in End Host Mode uses this simple logic:

If I receive traffic from one of my servers destined to something I don’t know about, it must be out in the network somewhere, so I’ll just send it out the uplink that server is pinned to.

If I receive unicast traffic from an uplink destined to something I don’t know about, I will just drop this traffic.

I will only pay attention to broadcasts received from my servers or received on my designated Broadcast uplink.

If I receive broadcast traffic from my designated Broadcast link, I will send the broadcast to my servers but not to my other uplinks.

If I receive broadcast traffic from my designated Broadcast link that originated from one of my own servers, I will just drop this traffic.
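The rules above can be written down as one small decision function. The following Python sketch is a simplified model of a single VLAN, not actual Fabric Interconnect code: the class `FabricInterconnect`, its fields, and the port names are hypothetical. It also assumes that traffic to a known local MAC is delivered straight to that server's port, which the rules imply but do not state outright. An empty result means the frame is dropped.

```python
from dataclasses import dataclass

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class FabricInterconnect:
    """Hypothetical End Host Mode model for one VLAN."""
    local_macs: dict       # MAC -> server port, for MACs the FI knows about
    pinned_uplink: dict    # server port -> uplink that port is pinned to
    broadcast_uplink: str  # the one uplink designated to receive broadcasts

    def forward(self, in_port, src, dst):
        """Return the set of egress ports; an empty set means drop."""
        server_ports = set(self.pinned_uplink)

        if in_port in server_ports:  # frame came from one of my servers
            if dst == BROADCAST:
                # Other local servers, plus the sender's own pinned uplink.
                return (server_ports - {in_port}) | {self.pinned_uplink[in_port]}
            if dst in self.local_macs:
                return {self.local_macs[dst]}      # assumed local delivery
            return {self.pinned_uplink[in_port]}   # unknown: must be upstream

        # Frame came from an uplink.
        if dst == BROADCAST:
            if in_port != self.broadcast_uplink:
                return set()   # ignore broadcasts on non-designated uplinks
            if src in self.local_macs:
                return set()   # my own server's broadcast coming back
            return server_ports  # to servers only, never to other uplinks
        if dst in self.local_macs:
            return {self.local_macs[dst]}
        return set()           # unknown unicast from an uplink is dropped


fi = FabricInterconnect(
    local_macs={"mac-A": "veth-A", "mac-C": "veth-C"},
    pinned_uplink={"veth-A": "uplink-1", "veth-C": "uplink-2"},
    broadcast_uplink="uplink-2",
)

arp_request = fi.forward("veth-A", "mac-A", BROADCAST)   # veth-C and uplink-1
echoed_back = fi.forward("uplink-2", "mac-A", BROADCAST)  # dropped
arp_reply = fi.forward("uplink-1", "mac-B", "mac-A")      # veth-A
to_server_b = fi.forward("veth-A", "mac-A", "mac-B")      # uplink-1
```

The four calls at the end replay Wei's scenario in order: the ARP request, its echo from the upstream switch, the unicast ARP reply, and the first unicast frame to Server B. Nothing is flooded to the other server ports for the unicast frame.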

Make sense?

About the Author

Brad Hedlund is an Engineering Architect with the CTO office of VMware’s Networking and Security Business Unit (NSBU), focused on network & security virtualization (NSX) and the software-defined data center. Brad’s background in data center networking begins in the mid-1990s with a variety of experience in roles such as IT customer, systems integrator, architecture and technical strategy roles at Cisco and Dell, and speaker at industry conferences. CCIE Emeritus #5530.

Comments (20)

  1. Ambi says:

    Great explanation

    After reading this, however, I have a doubt. Say, for example, the link connecting the FI to the upstream switch is not a trunk but an access link, and I have two such access links, one carrying VLAN 10 and another carrying VLAN 20.

    Now let’s say Server B in VLAN 20 needs to send traffic to UCS Server A, also in VLAN 20. The ARP is flooded toward the FI ports (only on the link configured with VLAN 20), but it would be dropped by UCS because the broadcast link chosen was the one carrying VLAN 10.

    Is my understanding correct, or will there be two incoming broadcast links chosen (one per VLAN)?

    Ambi

    • Brad Hedlund says:

      Ambi,
      What you have described is not a valid setup for this simple reason:
      All UCS uplinks are trunks that carry all VLAN’s defined in the Fabric Interconnect.
      Therefore, it doesn’t matter which uplink the FI chooses as the link to receive broadcasts, because all uplinks are exposed to all VLANs relevant to UCS.

      Good question!

      Cheers,
      Brad

  2. Dan says:

    I wonder why this is a UCS-specific feature. Why can’t it be implemented on the Nx5K?

    Anyway, it looks very much like vSwitch or HP VirtualConnect.

  3. Dani says:

    Hi,

    What about Pin Groups?
    If one interconnect is dual-attached to uplink switches (for resiliency, of course) and the uplink in the Pin Group fails, what happens to the traffic (assuming no HW failover)?
    Is the traffic re-pinned automatically, or have I just lost connectivity?
    Maybe it is better not to create a Pin Group and to let UCS pin automatically?

    tnx
    Dan

    • Brad Hedlund says:

      Dani,
      If an uplink in a Pin Group fails, vNICs assigned to that PIN group will:
      A) see their vNIC link state down
      … or
      B) if Fabric Failover is enabled on the vNIC it will be pinned to the same Pin Group on the second Fabric Interconnect

      The reason for Pin Groups is to provide partitioned bandwidth for the vNICs you choose. If you had an application that needed its own bandwidth, not shared with other traffic, you would use Pin Groups for that.

      If you don’t have any such requirements, I agree, there’s no need to configure Pin Groups. Just let UCS do dynamic pinning.

      Cheers,
      Brad

      • Dani says:

        Brad, I have to thank you for answering.
        It would not be bad to have the choice of forcing pinning and also have a kind of fallback if the ‘primary’ Pin Group fails.
        Of course, this is just a suggestion for a future implementation, to give better and better control of UCS.

        I must say that UCS is incredibly flexible, and I think it is really a good product.

        tnx
        Dan

    • Troy says:

      Thanks for all of your help!

      I’m replying to Dan’s question because it is similar to the setup we’d like to put forward.

      The FIs would both have port channels to our primary datacenter switch, but we’d like to run a single long-range connection for each FI to a 2nd switch for resiliency. We would only want to use the 2nd switch in the event of a primary switch failure. I thought we could possibly assign everything to a “DC-switch” pin group, but in the event of failure of every member of that pin group it would possibly take the only path left via the 2nd switch. It would be ideal if we could assign a “cost” to the uplinks. I’m new to UCS so perhaps this functionality exists, and I just haven’t heard about it.

      For example, in your video series, Part 7a: End Host mode Pinning (http://www.youtube.com/watch?feature=player_embedded&v=Ru-1TztpfU8), I’m thinking of the scenario where uplink number 4 (and its counterpart on 6100 B), which is manually pinned for Oracle, goes down. Would vEth1 be down, or would it just use uplink 1?

      Thanks again!
      Troy

  4. VIJAY SHEKHAR says:

    HI Brad,

    I posted a very similar question a few days back on another article of yours, however this article is much closer to my question.

    From your following statement, it looks like the FI relies on UCS Manager to let it know about southbound MACs. Is that the only method it uses, or does it also learn MAC addresses like a switch does?

    Statement:
    “6100-A knows the server port and logical interface Server A is located on because of the authoritative knowledge provided by UCS Manager. Hence, 6100-A transmits Server B’s unicast response directly to Server A.”

    Q: How do the FI (and UCS Manager) know about the MAC addresses that are dynamically assigned to the VMs running on ESX hosts, which are connected via Nexus 1000V?

    My subsequent question is going to be based on my understanding of this piece.
    Thanks!

    • Brad Hedlund says:

      Hi Vijay,
      MAC addresses from VM’s running on Nexus 1000V (or any other hypervisor switch) are learned by the FI, like a typical switch would do.

      MAC addresses of the physical server — or MAC addresses of VM’s connected via HW VN-Link (VM FEX) — these MACs are known via authoritative knowledge provided by UCS Manager and VMware vCenter, respectively.

  5. VIJAY SHEKHAR says:

    Thanks Brad,
    Hmm,
    Okay, so if the FI in End Host Mode is learning southbound MACs like a switch, then frames for unknown destinations would be flooded on all the ports, right?
    Combine that with the fact that the FI does not learn MACs for upstream devices: how does northbound frame forwarding happen?
    -Vijay Shekhar

    • Brad Hedlund says:

      Vijay,
      Nope, still no flooding. If the FI receives a frame on an uplink destined to an unknown MAC, that traffic will be dropped.
      If the FI receives a frame on a Server port or Appliance port destined to an unknown MAC, that traffic will be forwarded upstream on an uplink.

      • VIJAY SHEKHAR says:

        Brad,
        If the FI (in End Host Mode) receives a frame for an unknown destination, it could mean two things:
        a) the MAC address of another ESX VM that it does not know
        OR
        b) the MAC address of the upstream L3 hop, for which it does not learn the MAC.

        How does the FI differentiate between the two and decide whether to flood the frame to all server ports or forward it to the pinned uplink port?

        Thanks!
        -Vijay Shekhar.

        • Mohan says:

          Depends on which port the broadcast frame comes in on.

          • Mohan says:

            I have to clarify:

            In the case of unknown unicast traffic:

            1) If the frame enters from the north, it is dropped.
            2) If the frame enters from the south, it is flooded on the designated broadcast uplink port and all server ports except the one it was received on.

            In the case of broadcast traffic, the rules are:

            1) For traffic ingressing from the north, an additional check is performed on the source address, and if it is one of the servers attached to the FI, then this traffic is dropped. Otherwise, it is broadcast toward the south (servers).
            2) For traffic ingressing from the south, it is sent to all server ports except the one that it was received on, and to the designated broadcast uplink port to the north.

  6. joe says:

    Hi,
    My 6100-A has two FC uplinks (port 1 and port 2) connected to a Brocade switch. When I disable port 1, re-pinning doesn’t work, and all servers lose their LUNs.

  7. Nagaraj Dhandapani says:

    Hi Brad,
    I have a query on a UCS + ESX setup. We have one UCS chassis connecting to FI-A and FI-B, and each FI has one uplink toward the LAN (connected to a Cisco 6500). Four ports from FI-A are connected to the FEX and configured as server ports (the same is done for FI-B). Two FC cables are connected to SAN switch A, and connectivity ends at a CLARiiON 120 (the same is done for SAN switch B, running in parallel). This is a boot-from-SAN setup, and we are able to boot ESX from the LUN; I am able to see ESX booted up in the KVM session of server 1/1. The management IP addresses for the service profile are in the range 172.20.24.XX/24, and we have used 8 IPs for that. Here is the issue: I have assigned the 9th IP, 172.24.20.9, to my ESX host, and the gateway is 172.24.20.254. Local LAN connectivity is fine, as I can reach any of the IPs from 172.24.20.1 through .8, and traceroute is also fine. But 172.24.20.9 is not reachable (ping fails). I have only one uplink toward the Cisco 6500, and it is configured as a trunk on the 6500 side. The management VLAN is configured as 20 and it is NOT the native VLAN. I am using a Pin Group that is pinned to the uplink toward the Cisco 6500 (this is not required, but I tried it to get things working), without success. The ping to ESX should be simple and straightforward. Am I missing anything? Thanks in advance.

    • Brad Hedlund says:

      The management IP address range for the blades maps to the out of band management Ethernet port on the FI. Therefore, you shouldn’t assign server OS IP addresses to that same subnet. Try assigning your ESX host an IP address that belongs to a VLAN configured in UCS, not the management network.

      • Nagaraj Dhandapani says:

        Hi Brad, thanks for your reply. But we have another UCS setup aligned with Vblock 0, in which both the management IP and the ESX host IP are in the same range/subnet and under VLAN 20. We don’t find any issues with that. In addition, that design document was verified by the VCE team. Please throw some light on this issue. I need a strong reason to ask the client for another subnet to resolve the issue. Thanks.

        • Brad Hedlund says:

          Wait a minute, are you talking about the management IP address for the ESX host, or the management IP address for the blade server BMCs assigned in UCSM?

          • nagarajdhandapani says:

            I was talking about both IP addresses. I have assigned 172.24.20.x/24 for the BMC and 172.24.20.y/24 for the ESXi host. However, I was able to ping the BMC but not the ESX host IP from the LAN network.

            Good news: the issue is resolved now. The problem was with the LAN switch connecting to the FI. The mgmt VLAN was blocked on the FC port at the switch side.

            Thanks for your comment which gave an idea to test further on this issue.
