Cisco UCS Q&A #3 – flexible configuration

This is a follow-up question from the same reader, Geoff, whose original question about traffic steering was discussed here.  Geoff responded to my original answer by bringing up my "Folly in HP vs UCS Tolly" article and doubting that Cisco UCS really has active/active fabrics.

Follow-up Question:

Hi Brad,

thank you very much for your comprehensive answer. However, I still have a couple of questions (sorry).  In Cisco documents I see only a heartbeat and synchronisation link between the two 6100s in a UCS. In your own article http://www.bradhedlund.com/2010/03/02/the-folly-in-hp-vs-ucs-tolly/ you show four 10Gb links between the two 6100s. Which is correct? By the way, this is where I got the interleaving idea, but now I see it was blades 1-4 going to Fabric A and 5-8 going to Fabric B, and not odds and evens.

In the same article you also mention an active/active fabric configuration, but as far as each server blade is concerned it sees an active/failover configuration. There is no possibility for a dual-channel server adapter to drive 20Gb, which is what I would call active/active. (But did you say this in your UCS networking best practices talk? I will have to listen again.)  I really wonder why UCS forces separate fabrics. It might make sense for Fibre Channel storage, where this is best practice, but for a pure IP environment would it not make sense to have a single fabric? But maybe it is not possible to set up a vPC cluster with a pair of 6100s.

My Follow-up Answer:

Cisco UCS provides tremendous flexibility in how you architect the system to deploy server bandwidth to meet your specific application needs.  Remember that UCS has two fabrics, that the fabric each server vNIC uses is determined by vNIC settings in the Service Profile (not by the hardwiring), and that each server can have multiple vNICs, each using one fabric or the other as its primary path.

Speaking of specific application needs, the Tolly Group test involved a single chassis with (6) blades, with pairs of blades sending a full 10G load to each other.  The Tolly Group tried to show that a Cisco UCS chassis was not capable of 60 Gbps of throughput.  However, they made the unfortunate and fatal mistake of believing that only one fabric was active and the other fabric was for failover only.  Wrong! Consequently, they set up each server with one vNIC using just one fabric (in fact, the second fabric may not have been present at all).  Given that one fabric extender is 40 Gbps, of course (6) servers are not going to get 60 Gbps. Duh!
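
Just to make the arithmetic explicit, here is a rough back-of-the-envelope sketch; it considers only the fabric extender capacity and the figures cited above, and ignores any other limits in the path:

```python
# Back-of-the-envelope throughput ceiling for the Tolly single-fabric setup.
# Figures from the article: one fabric extender is 40 Gbps to the 6100,
# and six blades each try to push a full 10 Gbps.

FEX_CAPACITY_GBPS = 40          # one fabric extender's capacity
BLADES = 6
DEMAND_PER_BLADE_GBPS = 10      # load each blade offers in the test

offered_load = BLADES * DEMAND_PER_BLADE_GBPS                   # 60 Gbps
one_fabric_ceiling = min(offered_load, FEX_CAPACITY_GBPS)       # 40 Gbps
two_fabric_ceiling = min(offered_load, 2 * FEX_CAPACITY_GBPS)   # 60 Gbps

print(f"Offered load:        {offered_load} Gbps")
print(f"One fabric in use:   {one_fabric_ceiling} Gbps ceiling")
print(f"Both fabrics in use: {two_fabric_ceiling} Gbps ceiling")
```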

My response to Tolly’s flawed testing was simply an education for the Tolly Group in how they should have set up Cisco UCS to meet the criteria of their own test.  This is not necessarily how every Cisco UCS configuration should be deployed.  In fact, I have yet to see any customer set up their UCS in such a manner.  A testament to how unrealistic Tolly’s botched “gotcha” testing really was.

Most customers set up their Cisco UCS servers with (2) or more vNICs, each using alternating fabrics.  So, YES, you absolutely CAN have a server send 20 Gbps: one vNIC sending 10 Gbps on Fabric A, another vNIC sending 10 Gbps on Fabric B, with both fabrics handling traffic for all blades.
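
If it helps to see that layout written down, here is a minimal sketch; the vNIC names and the two-vNICs-per-blade arrangement are illustrative, not the only way a Service Profile can be built:

```python
# A minimal model of the common layout: two vNICs per blade, primary
# fabrics alternated, fabric failover enabled.  Names are hypothetical.

blades = {
    f"blade-{slot}": [
        {"vnic": "eth0", "fabric": "A", "failover": True},
        {"vnic": "eth1", "fabric": "B", "failover": True},
    ]
    for slot in range(1, 9)          # an 8-slot chassis
}

# Each vNIC rides a 10 Gbps path on its primary fabric, so one blade can
# drive 10 Gbps on Fabric A and 10 Gbps on Fabric B at the same time.
per_blade_gbps = 10 * len(blades["blade-1"])
print(f"Per-blade potential: {per_blade_gbps} Gbps")

# And both fabrics carry traffic for every blade:
for fabric in ("A", "B"):
    count = sum(any(v["fabric"] == fabric for v in vnics)
                for vnics in blades.values())
    print(f"Fabric {fabric} is the primary path for a vNIC on {count} of 8 blades")
```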

Cisco UCS has separate fabrics for the robust high availability customers expect from a mission-critical, enterprise-class platform.  What’s the downside of that? Especially when both fabrics are indeed ACTIVE/ACTIVE.

Disagree?

Cisco UCS Q&A #2 – End Host Mode forwarding behavior

This question comes from a reader named Wei about how Cisco UCS behaves in End Host Mode with respect to MAC learning and flooding.  Wei paints a scenario of two servers in the same VLAN, one inside Cisco UCS, the other outside of Cisco UCS.  With the Fabric Interconnect in End Host Mode, what happens when the server inside UCS tries to communicate with the server outside UCS, starting with the ARP exchange and the subsequent unicast conversation?

Question:

I recently came across your website; it is really an awesome resource.  Thanks for sharing your data center knowledge.  I learned a lot about UCS networking from the HD videos you posted on http://www.bradhedlund.com/2010/06/22/cisco-ucs-networking-best-practices/.  I do have a question regarding End Host Mode and am hoping you can help me with it.

Here is the scenario: the communication is between two devices on the same VLAN but connected to different devices.
NIC 10 is on VLAN 10 and is connected to a server port on the FI 6100; NIC 20 is on VLAN 10 and is connected to an upstream switch that connects to the uplink ports on the same FI 6100.  Let’s say NIC 10 ARPs for NIC 20. The frame is sent to the upstream switch via the uplink that handles the broadcast, and also to the other local VLAN 10 server ports.  Since there is no MAC learning on the uplink ports, when the ARP reply comes back down the uplink, does this mean NIC 20’s MAC will not be recorded in the MAC table on the FI 6100?  What happens to subsequent unicast frames from NIC 10 to NIC 20?  I know they will go up the uplink, but will they also be flooded to the local VLAN 10 server ports?

Thanks, any help would be greatly appreciated.

My Answer:

Here’s the setup: I’ll refer to the server inside Cisco UCS as Server A and the server outside UCS as Server B.  The Cisco UCS Fabric Interconnects are in End Host Mode.  Server A will initiate a conversation with Server B.  Both servers are in VLAN 10.  Server A’s primary NIC for VLAN 10 is connected to 6100-A.

When Server A issues an ARP message to learn Server B’s MAC address, this is a broadcast message that will be sent to all servers inside UCS on VLAN 10 connected to 6100-A.  Additionally, the ARP message will also be sent out the UCS uplink Server A is pinned to.  Note that 6100-A’s “Broadcast” link is an uplink it has chosen for *receiving* broadcasts, not sending them.  When a server sends a broadcast message, that broadcast will always exit UCS on the same uplink used for all other traffic from that server, the uplink chosen automatically via dynamic pinning or statically via LAN Pin Groups.
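
Here is a tiny sketch of that send vs. receive distinction; the names are hypothetical, and "pinned uplink" simply means whichever uplink UCS chose for the server, dynamically or via a LAN Pin Group:

```python
# Sketch of how End Host Mode treats broadcasts differently on the way
# out versus on the way in.  Illustrative names, not UCS Manager objects.

def broadcast_egress_uplink(server_port, pinned_uplink):
    """A server's own broadcast leaves on that server's pinned uplink,
    the same uplink used for all of its other traffic."""
    return pinned_uplink[server_port]

def accept_inbound_broadcast(arrival_uplink, designated_broadcast_uplink):
    """Broadcasts arriving from the upstream network are processed only
    if they arrive on the one uplink chosen as the Broadcast link."""
    return arrival_uplink == designated_broadcast_uplink
```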

The upstream switch will receive the broadcast on the uplink from UCS pinned to Server A, and if the upstream switch has no prior knowledge of Server A, it will use this broadcast message to learn Server A’s MAC address on this interface.  The upstream switch will flood this broadcast ARP message on all other interfaces forwarding for VLAN 10 (including links connected to 6100-B).  Note that the upstream switch will also send this broadcast back to 6100-A on all the other interfaces it has facing 6100-A.  However, only the link 6100-A has picked as its “Broadcast” link will actually process the broadcast.  6100-A will notice that the broadcast originated from one of its own servers and simply drop it.

Server B will receive the broadcast ARP message and respond directly (unicast) back to Server A.  The upstream switch receives Server B’s unicast ARP response destined to Server A’s MAC address on VLAN 10.  Because the upstream switch has already learned which of its interfaces leads to Server A, it sends Server B’s response directly out that interface connected to 6100-A.

6100-A receives Server B’s unicast ARP response from the upstream switch on the uplink pinned to Server A.  Because Cisco UCS is in End Host Mode, it will NOT learn the MAC address & location of Server B like the upstream switch did for Server A.

6100-A knows the server port and logical interface Server A is located on because of the authoritative knowledge provided by UCS Manager.  Hence, 6100-A transmits Server B’s unicast response directly to Server A.

Server A receives the ARP response and sends a unicast message to Server B.

6100-A receives the unicast message destined to Server B.  However, 6100-A does not have any knowledge of Server B’s location on the network; Server B is outside of UCS, so there is no authoritative knowledge and no MAC table entry for Server B.  Because of this, 6100-A assumes that Server B must be reachable in the upstream network and sends the unicast message out the uplink pinned to Server A (because Server A sent it).  The upstream switch receives the unicast message and simply transmits it directly to Server B.

In summary, Cisco UCS in End Host Mode uses this simple logic (sketched in code after the list):

If I receive traffic from one of my servers destined to something I don’t know about, it must be out in the network somewhere, so I’ll just send it out a pinned uplink.

If I receive unicast traffic from an uplink destined to something I don’t know about, I will just drop this traffic.

I will only pay attention to broadcasts received from my servers or received on my designated Broadcast uplink.

If I receive broadcast traffic on my designated Broadcast link, I will send the broadcast to my servers but not to my other uplinks.

If I receive broadcast traffic on my designated Broadcast link that originated from one of my own servers, I will just drop this traffic.
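
If it’s easier to read as code, here is the same logic as a single decision function.  This is a rough sketch of the rules above, not actual Fabric Interconnect software, and every name in it is hypothetical:

```python
# Sketch of End Host Mode forwarding on one Fabric Interconnect,
# following the rules listed above.  Illustrative only.

def forward(frame, came_from, local_macs, pinned_uplink, broadcast_uplink):
    """came_from is 'server' or the name of the uplink the frame arrived on.
    local_macs maps MAC -> server port (authoritative, from UCS Manager).
    pinned_uplink maps server port -> that server's pinned uplink.
    Returns the list of places the frame is sent; an empty list means drop."""

    if came_from == "server":
        if frame["dst"] == "broadcast":
            # To local servers in the VLAN, and out the sender's own
            # pinned uplink (not the designated Broadcast uplink).
            return ["local servers", pinned_uplink[frame["src_port"]]]
        if frame["dst"] in local_macs:
            return [local_macs[frame["dst"]]]      # local L2 switching
        # Unknown destination from a server: assume it lives upstream.
        return [pinned_uplink[frame["src_port"]]]

    # Frame arrived on an uplink.
    if frame["dst"] == "broadcast":
        if came_from != broadcast_uplink:
            return []                              # wrong uplink: ignore
        if frame["src"] in local_macs:
            return []                              # my own server's broadcast: drop
        return ["local servers"]                   # never re-flooded to other uplinks
    if frame["dst"] in local_macs:
        return [local_macs[frame["dst"]]]          # deliver to the server
    return []                                      # unknown unicast from an uplink: drop
```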

Make sense?

Cisco UCS Q&A #1 – traffic steering

This question about Cisco UCS traffic steering comes from a reader named Geoff:

There is one aspect of UCS which I have been trying to understand and have not found a good explanation. Perhaps you can point me in the right direction and maybe if it is of general interest it might form the basis for one of your great articles.
As far as I understand it, the ports from the virtualization adapters of different blades are interleaved to the fabric switches, e.g., all “A” ports from even blades go to the left fabric switch (via the fabric extender), but the “A” ports from odd blades go to the right fabric switch, and similarly for the “B” ports. This seems to result in an active/active switch configuration with traffic evenly spread across both switches, but how does an odd blade talk to an even blade? The two Nexus fabric switches are separate with no vPC link between them (as far as I know). Does this blade-to-blade traffic have to take place via the Nexus 7000? I hope not! Probably the answer is that I have misunderstood something.

My answer:

Every dual-port adapter is hard-wired the same way.  One port is hard-wired to Fabric A, the other port is hard-wired to Fabric B.  This is the case regardless of slot placement.

For every vNIC you create, you can decide whether it uses Fabric A or Fabric B, and whether you want fabric failover.  The OS sees the vNIC, not the physical adapter.  In theory you could have all the vNICs on all the blades configured to use just Fabric A, with Fabric B used only for failover.  In that case, all *Layer 2* blade-to-blade traffic would be switched locally by the Fabric A 6100.  Any Layer 3 traffic would need to be handled by an upstream L3 switch (Nexus 7000 / Catalyst 6500).

In most cases customers will use both fabrics by steering some vNICs out Fabric A and others out Fabric B.  Should any vNIC on Fabric A need to send traffic to a vNIC on Fabric B on the same VLAN (Layer 2), yes, that traffic would need to traverse an upstream switch to get from Fabric A to Fabric B.  If you know ahead of time that a group of servers will send a lot of Layer 2 traffic between each other, you could certainly set up their vNIC profiles such that they communicate on the same fabric.  It wouldn’t matter which slot you placed each server in, because you are steering traffic via the vNIC settings, not via the hardwiring.
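
Here is a quick sketch of the resulting Layer 2 path, given only the two vNICs’ fabric choices; the names are illustrative, and "upstream switch" stands for whatever L2 device sits above both Fabric Interconnects:

```python
# Where does Layer 2 traffic between two UCS vNICs in the same VLAN get
# switched?  Same fabric: locally, on that 6100.  Different fabrics: via
# the upstream switch.

def l2_path(src_fabric, dst_fabric):
    if src_fabric == dst_fabric:
        return [f"6100-{src_fabric}"]     # switched locally, never leaves the fabric
    return [f"6100-{src_fabric}", "upstream switch", f"6100-{dst_fabric}"]

print(l2_path("A", "A"))   # ['6100-A']
print(l2_path("A", "B"))   # ['6100-A', 'upstream switch', '6100-B']
```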

Make sense?