Cisco UCS criticism and FUD: Answered

One of my readers recently submitted a comment asking me to respond to some criticisms he frequently hears about Cisco UCS.  This is a pretty typical request I get from partners and prospective customers, and it’s a list of stuff I’ve seen many times before, so I thought it would be fun to address these and other common criticisms and FUD against Cisco UCS in one consolidated post.  We’ll start with the list submitted by the reader and let the discussion continue in the comments section.  Sounds like fun, right? :-)

Brad,

I regularly hear a few specific arguments critiquing the UCS that I would like you to respond to, please.

1. The Cisco UCS system is a totally proprietary and closed system, meaning:

a) the Cisco UCS chassis cannot support other vendor’s blades. For example, you can’t place an HP, IBM or Dell blade in a Cisco UCS 5100 chassis.

b) The Cisco UCS can only be managed by the Cisco UCS manager – no 3rd party management tool can be leveraged.

c) Two Cisco 6100 Fabric Interconnects can indeed support 320 server blades (as Cisco claims), but only with an unreasonable amount of oversubscription. The more accurate number is two 6100s for every four (4) 5100 UCS chassis (32 servers), which will yield a more reasonable oversubscription ratio of 4:1.

d) A maximum of 14 UCS chassis can be managed by the UCS manager, which resides in the 6100 Fabric Interconnects. Therefore, this creates islands of management domains, especially if you are planning on managing 40 UCS chassis (320 servers) with the same pair of Fabric Interconnects.

e) The UCS blade servers can only use Cisco NIC cards (Palo).

f) Cisco Palo cards use a proprietary version of interface virtualization and cannot support the open SR-IOV standard.

I would really appreciate it if you can give us bulleted responses in the usual perspicacious Brad Hedlund fashion. :-)

Thanks!

This is a good list to start with.  But before we begin, let’s define what constitutes valid criticism in the context of this discussion.

Criticism: something pointed out as lacking or deficient when compared to what is typically found and expected in other comparable and “acceptable” solutions.  For example, if my new commuter car didn’t have anti-lock brakes this would be a valid criticism, as anti-lock brakes are a feature commonly found and expected in most newer commuter cars today.  However, if my car didn’t transform into a jet plane and fly with the press of a button, is that a valid criticism? No.  This is not a capability typically expected of any automobile.  Such a “criticism” is pointless.

OK, let’s get started…

1) “Cisco UCS chassis cannot support other vendor’s blades”

This is one of my favorites.  If someone brings this up you know right away you’re dealing with someone who is either A) joking, or B) has no idea what they’re talking about.  Anybody who has set foot in a data center in the last 7 years knows that Vendor X’s blade chassis are only populated with Vendor X’s blade servers, and … <GASP> yes! Cisco UCS chassis are only populated with Cisco UCS blades. Shame on Cisco! LOL.

Before the IBM guys jump out of their seats: yes, I am aware that 3rd party blade servers can be made to fit into an IBM blade chassis.  While that’s a cute little check box to have on your data sheet, the actual implementation of this is extremely rare.  Why? It just doesn’t make any sense to do this, especially with commodity x86 hardware.

When was the last time you saw Vendor X’s blade server in Vendor Y’s blade chassis?  Exactly.  This is not a valid criticism.  Case closed.

2) “Cisco UCS can only be managed by the Cisco UCS manager – no 3rd party management tool can be leveraged.”

If “managed” means the basic baseboard-level management of the blade itself (BIOS settings, firmware, iLO, KVM, virtual media, etc.), in other words everything needed to get the blade up and functionally booting an OS, then yes, this of course is true, and again it’s no different than the other market-leading vendors.  For example, the HP c7000 chassis requires that you have at least one HP management module present in every chassis to manage the blades (HP Onboard Administrator).  Furthermore, to aggregate management across multiple c7000 chassis you are required to have HP management software performing that function as well: HP Systems Insight Manager.  This is true of the other blade vendors as well (DELL, IBM).  You have their management software and modules managing their hardware.  So help me understand, how is this a valid criticism against Cisco UCS?

If “managed” means a higher-level capability set such as auditing, provisioning, historical statistics, life cycle management, alerts and monitoring, etc., this is actually where Cisco UCS sets itself apart from the other vendors in being more “open” and ecosystem friendly.  Unlike the others, Cisco UCS provides an extremely powerful and open XML API that any 3rd party developer can customize their solution to.  Consider the fact that the UCS Manager GUI is just a browser-based front-end to the same XML API that 3rd party developers are interfacing with.  It’s entirely possible to provision and manage an entire UCS system with 3rd party software without ever using the UCS Manager GUI.

There are many examples of this open XML API management integration with Cisco UCS, but here are just a few:

Why isn’t there an iPhone app yet to manage HP BladeSystem, or DELL, or IBM? Answer: Without a consolidated and open API to interface with this would be a tremendously complex effort. Compare that to the Cisco UCS iPhone app that was developed by just one Cisco SE (Tige Phillips) in his spare time!

If an amateur programmer can write an iPhone app to manage Cisco UCS in his spare time, imagine what a team of cloud savvy programmers can accomplish?  Example: Check out what newScale is doing with Cisco UCS.

So, as for the claim: “… no 3rd party management tool can be leveraged.”?  We can dismiss that one as being totally false.
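To give a feel for what a 3rd party tool actually exchanges with the system, here is a minimal Python sketch that builds two XML API requests and parses a login reply. The method and attribute names (`aaaLogin`, `configResolveClass`, `outCookie`, the `computeBlade` class) are written from memory of the UCS Manager XML API and should be verified against Cisco's API reference. The credentials, cookie value, and sample reply are made up, and nothing is sent over the network.

```python
import xml.etree.ElementTree as ET


def build_login(user: str, password: str) -> str:
    """Build the XML body of a login request (assumed method name: aaaLogin)."""
    request = ET.Element("aaaLogin", {"inName": user, "inPassword": password})
    return ET.tostring(request, encoding="unicode")


def build_class_query(cookie: str, class_id: str) -> str:
    """Build a query for every object of one class, e.g. all blades."""
    request = ET.Element(
        "configResolveClass",
        {"cookie": cookie, "classId": class_id, "inHierarchical": "false"},
    )
    return ET.tostring(request, encoding="unicode")


def extract_cookie(login_reply: str) -> str:
    """Pull the session cookie out of a login reply (assumed attribute: outCookie)."""
    cookie = ET.fromstring(login_reply).get("outCookie")
    if cookie is None:
        raise ValueError("login reply carries no outCookie attribute")
    return cookie


# Hypothetical reply, shaped like a successful login response.
SAMPLE_REPLY = '<aaaLogin response="yes" outCookie="1234/abcd" />'

if __name__ == "__main__":
    print(build_login("admin", "secret"))
    print(build_class_query(extract_cookie(SAMPLE_REPLY), "computeBlade"))
```

A real client would POST these bodies over HTTPS to UCS Manager and reuse the cookie on every later call, which is the same kind of conversation the UCS Manager GUI has with the system.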

3) “Two Cisco 6100 Fabric Interconnects can indeed support 320 server blades (as Cisco claims), but only with an unreasonable amount of oversubscription. The more accurate number is two 6100s for every four (4) 5100 UCS chassis (32 servers), which will yield a more reasonable oversubscription ratio of 4:1”

The statement “unreasonable amount of oversubscription” is pure speculation.  Oversubscription requirements will vary per customer deployment depending on factors of desired scale, bandwidth, and cost.  The trade-off between bandwidth and scale is no secret and is simply a fact of life with any vendor solution; it’s not something unique to UCS.  More bandwidth means more network ports, more switches, higher cost.  More oversubscription means higher scale at lower cost.

Next, what does “oversubscription” really mean anyway?  For some, it might be very blade chassis centric, where they calculate the ratio of total bandwidth provisioned to the servers in a chassis compared to the total uplink bandwidth available to that chassis.  In these calculations, each Cisco UCS chassis of 8 servers can be provisioned for a maximum of 80 Gbps of uplink bandwidth, or a minimum of 20 Gbps.  When you provision the minimum 20 Gbps of uplink bandwidth per chassis you can in theory*** achieve the scale of 320 servers per pair of Fabric Interconnects (40 chassis dual-homed to a pair of 40-port Fabric Interconnects).

Example: If I have a modest provisioning of 10 Gbps per server (that’s a lot, actually), and the minimum of 20 Gbps of chassis uplink bandwidth — that’s a <GASP> “more reasonable” 4:1 oversubscription ratio for 320 servers! ;-)
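The chassis-level arithmetic is easy to check in a few lines of Python. This is only a sketch: the function names are mine, it assumes 10 Gbps links split evenly across the two Fabric Interconnects, and it ignores the ports you would reserve for LAN and SAN uplinks (which is why the 320 figure is "in theory").

```python
SERVERS_PER_CHASSIS = 8
LINK_GBPS = 10  # every chassis uplink is a 10GE link


def chassis_oversubscription(server_gbps: float, uplink_gbps: float) -> float:
    """Bandwidth provisioned to the servers divided by chassis uplink bandwidth."""
    return SERVERS_PER_CHASSIS * server_gbps / uplink_gbps


def max_servers(fi_ports: int, uplink_gbps: float) -> int:
    """Servers that fit under one pair of Fabric Interconnects.

    Each chassis splits its uplinks evenly across the two Fabric Interconnects,
    so it consumes uplink_gbps / (2 * LINK_GBPS) ports on each of them.
    Ports needed for LAN/SAN uplinks are ignored here.
    """
    ports_per_fi = uplink_gbps / (2 * LINK_GBPS)
    chassis = int(fi_ports // ports_per_fi)
    return chassis * SERVERS_PER_CHASSIS


if __name__ == "__main__":
    print(chassis_oversubscription(server_gbps=10, uplink_gbps=20))  # minimum wiring
    print(chassis_oversubscription(server_gbps=10, uplink_gbps=80))  # maximum wiring
    print(max_servers(fi_ports=40, uplink_gbps=20))
```

With the minimum 20 Gbps per chassis this gives the 4:1 ratio and the 320-server scale quoted above; wiring every chassis for the full 80 Gbps gives 1:1 but far fewer chassis per pair.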

For others, “oversubscription” might mean the ratio of bandwidth a server must share not only to exit the chassis, but rather the total amount of bandwidth each server shares to reach the Layer 3 core switch.  Again, this is a universal bandwidth/scale/cost design trade-off across all vendors, not just Cisco.  This kind of exercise requires taking a look at the total solution including servers, chassis, access switches, core switches, LAN and SAN.

Here’s a simple example of achieving 4:1 oversubscription from every server to the LAN core*, and 8:1 to the SAN core**.  You could have (8) UCS chassis each with 8 servers provisioned for 10 Gbps of LAN bandwidth, 4 Gbps of SAN bandwidth.  Each chassis is wired for the maximum of 80 Gbps providing 1:1 at the chassis  uplink level.  So, now we have 64 servers at 1:1 that we need to uplink to the SAN and LAN core.  To get 4:1 to the LAN core*, and 8:1 to the SAN core**, we need to have (16) 10GE uplinks and (8) 4G FC uplinks from each Fabric Interconnect.  We’ll take those uplinks and connect them to 1:1 non-oversubscribed ports at the SAN and LAN core.

The result: 64 servers each provisioned for 10GE with 4:1 oversubscription to the LAN core*, and 8:1 to the SAN core**.  All of this fits into a single pair of UCS 6140 Fabric Interconnects.  You could treat this as a discrete “Pod”.  As you need to scale out more servers at similar oversubscription, you stamp out more similarly equipped pods.
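The pod numbers can be sanity-checked the same way. This sketch assumes the ratio is simply total provisioned server bandwidth divided by uplink bandwidth, and that the headline figures count the uplinks of one Fabric Interconnect; the function and parameter names are illustrative only.

```python
def core_oversubscription(servers: int, gbps_per_server: float,
                          uplinks_per_fi: int, gbps_per_uplink: float,
                          fabrics: int = 1) -> float:
    """Provisioned server bandwidth divided by uplink bandwidth to the core.

    fabrics=1 counts the uplinks of a single Fabric Interconnect,
    fabrics=2 counts the uplinks of both.
    """
    demand = servers * gbps_per_server
    capacity = fabrics * uplinks_per_fi * gbps_per_uplink
    return demand / capacity


if __name__ == "__main__":
    # 64 servers at 10 Gbps LAN, (16) 10GE uplinks per Fabric Interconnect
    print("LAN, one fabric :", core_oversubscription(64, 10, 16, 10))
    print("LAN, two fabrics:", core_oversubscription(64, 10, 16, 10, fabrics=2))
    # 64 servers at 4 Gbps SAN, (8) 4G FC uplinks per Fabric Interconnect
    print("SAN, one fabric :", core_oversubscription(64, 4, 8, 4))
    print("SAN, two fabrics:", core_oversubscription(64, 4, 8, 4, fabrics=2))
```

Under the same assumption, a pod of 48 servers would need 12 of the 4G FC uplinks per Fabric Interconnect to bring the SAN ratio down to 4:1 (`core_oversubscription(48, 4, 12, 4)`); that uplink count is derived from this sketch, not from a Cisco sizing guide.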

Want 4:1 to both the LAN and SAN core?  Scale back to (6) UCS chassis and (48) servers per Fabric Interconnect, and provision more FC uplinks to the SAN core.  It’s the classic scale vs. bandwidth design trade-off applicable to any vendor solution.

*Side note: The LAN oversubscription from Server to Core is actually 2:1 with both fabrics available, and 8:1 with one fabric completely offline.  For the sake of discussion let’s just average it out to 4:1.

**Side note: The SAN oversubscription from Server to Core is actually 4:1 with both fabrics available, and 8:1 with one fabric completely offline.

***Side note: The current hardware architecture of Cisco UCS can fit 40 chassis and 320 servers underneath a single pair of 6140 fabric interconnects. However, the number of chassis per fabric interconnect officially supported by Cisco at this time is 20. This number started at 5 and continues to go up with each new major firmware release.  Reaching 40 supported chassis is only a matter of time.

4) “A maximum of 14 UCS chassis can be managed by the UCS manager, which resides in the 6100 Fabric Interconnects. Therefore, this creates islands of management domains, especially if you are planning on managing 40 UCS chassis (320 servers)”

Correction: as of the most recent UCS Manager 1.4 release you can now manage a maximum of 20 chassis.

This one always cracks me up because it somehow tries to say that a single point of management for 14 or even 20 chassis is somehow a BAD thing? LOL! :-D

What’s the alternative?  With HP, IBM, or DELL, (20) chassis is exactly (20) islands of management, and each island has multiple things you need to manage on it (chassis switches and management modules).  What about the LAN and SAN access switches connecting the (20) chassis? Yep, you need to manage those too.

Compare that to the (1) management island per (20) chassis from a single interface and single data set managing settings and policies for all of the servers including LAN & SAN. ;-)

5) “The UCS blade servers can only use Cisco NIC cards (Palo)”

This is simply not true.  From the very beginning customers have had the choice of several non-Cisco adapters.  In fact, the Cisco adapter (Palo) wasn’t available for almost a year after the initial release of Cisco UCS.  As of the recent UCS Manager 1.4 release, several more adapters have been added to the portfolio of choices.

The adapters Cisco UCS customers can choose from:

Case closed.

6) “Cisco Palo cards use a proprietary version of interface virtualization and cannot support the open SR-IOV standard”

The Cisco Palo card accomplishes interface virtualization in a way that’s completely transparent to the OS.  This is done through simple standards-based PCIe.  There’s nothing proprietary happening here at all.  When installed into the server, the Cisco Palo card appears to the system like a PCIe riser hosting multiple standard PCIe adapters.  In other words, Cisco has effectively obsoleted the need for SR-IOV with the design of the Cisco VIC (Palo).  There’s nothing stopping any other vendor from using the same transparent PCIe-based approach to interface virtualization.

With SR-IOV, on the other hand, the OS needs to be SR-IOV aware.  You need to have the proper SR-IOV drivers and extensions loaded, etc.  Why complicate the solution with additional complexity when you can achieve the same goal (interface virtualization) in a way that’s completely transparent to the OS and adapter drivers?  This obviates the need for any additional “standard” layered into the solution.

By the way, there’s nothing preventing you from using an SR-IOV adapter with Cisco UCS.  For example, the new Intel 82599 adapter for UCS supports PCI SIG SR-IOV.  If you want SR-IOV really bad, use that adapter.


OK! That was a good round. Now go ahead and hit me with your best shot in the comments section. Please, keep your comments to one or two concise paragraphs. If you have a whole bunch of stuff to throw at me, break it up into multiple comments if you can.  If you submit a really good one, I’ll promote it into the article content.

For my HP, IBM, and DELL friends out there (you know who you are): guys, there’s no need to submit comments pretending to be a disappointed customer.  Just cite your real name and vendor disclosure and let’s have an honest and forthright discussion like the gentlemen we all are.  No need for games.

Also, please keep in mind that I am employed by Cisco Systems, so I do need to exercise discretion in what I can and cannot say.  I’m sure you can understand.

Cheers & Happy New Year!
-Brad



Disclaimer:  The views and opinions expressed are those of the author, and not necessarily the views and opinions of the author’s employer.  The author is not an official media spokesperson for Cisco Systems, Inc.  For design guidance that best suits your needs, please consult your local Cisco representative.

Routing over Nexus 7000 vPC peer-link? Yes and No.

This is a Nexus 7000 design question that comes up from time to time:

In a Nexus 7000 vPC environment, how can I form a Layer 3 adjacency between the two switches? Let’s say I want to run OSPF and want to create two SVIs on the two switches connected via vPC. Will the neighbor relationship be formed over the vPC peer link, or is the peer link only designed for vPC control traffic?

Some people believe that in order to form an L3 adjacency between two Nexus 7000 vPC peer switches you must provision a separate link (other than the peer link) to use for L3 routing.  This is not true.  You absolutely can use the existing vPC peer link to form a routing adjacency between two vPC peer Nexus 7000s.

I believe the source of confusion comes from a vPC design caveat that gets condensed and passed on as simply: “No L3 routing over vPC”.

Not knowing any better, if you take that statement at face value it’s easy to see how one might believe that you cannot form a routing adjacency between the two Nexus 7000s over the vPC peer link, and therefore seek to provision a new “non-vPC” inter-switch link for that purpose.

Let’s take a minute to look at what works, and what doesn’t.  I’m not going to bore you with all the technical details.  Instead we’ll take a look at six simple diagrams with brief explanations.

Diagram #1 below shows two Nexus 7000s configured as vPC peers with a single inter-switch link between them, the vPC peer link.  The two Nexus 7000s are configured for OSPF and are using an SVI associated to a VLAN on the peer-link to form the L3 adjacency.  This VLAN for the L3 adjacency should only be forwarded on the peer-link.  Do not forward this routing VLAN on the vPC member ports (such as toward the L2 Switch shown in the diagram).

L3 peering over vPC peer-link

In the diagram above, we have a Layer 3 switch or router upstream configured for OSPF.  This L3 switch is attached to each Nexus 7000 with two normal point to point Layer 3 interfaces, no port channels, no vPC.

This design works.  This design has been discussed in official Cisco design documents and is supported by Cisco TAC.


So where does the “No L3 routing over vPC” come from anyway? Simply put, this comes from the vPC design caveat that you should NOT have an external device (firewall, router, switch) forming a routing protocol adjacency with the Nexus 7000s over the vPC peer link.

Diagram #2 below shows an L3 switch running OSPF attached with a vPC to the two Nexus 7000s and attempting to form an OSPF adjacency with each Nexus 7000. This design does NOT work.

Router attached via vPC - peering with Nexus 7000

This design does NOT work because some of the OSPF routed traffic from the L3 switch to the Nexus 7000s will traverse the vPC peer-link (even when no ports or links are failed). This traffic will be dropped by the loop prevention logic in the Nexus 7000 hardware.

Side note: For the more vPC savvy folks out there who might be wondering “Does the new vPC Peer Gateway feature make this design work?” The answer is No.


Let’s look at Diagram #3 below. Here’s another example of an external device building a routing protocol adjacency with the Nexus 7000s; this time it’s firewalls. The firewalls are singly attached (no vPC) to a VLAN that is forwarded on the Nexus 7000s’ vPC peer link. The firewalls are running OSPF and attempting to form an adjacency with each Nexus 7000.  This design too does NOT work.

Firewalls running OSPF attached to Nexus 7000's running vPC

This design does NOT work for the same reason as Diagram #2. Each firewall will form an OSPF adjacency with both Nexus 7000s. This means that some OSPF routed traffic will traverse the vPC peer-link (even when no ports or links are failed). As a result, this traffic will be dropped.

Each firewall sees both 7K-1 and 7K-2 as directly adjacent OSPF neighbors. If Firewall-1 chooses 7K-2 as the next-hop, this traffic will traverse the peer-link with the loop prevention bit set. If 7K-2 realizes the packet is destined for a vPC member port it will drop the traffic if the corresponding vPC member port on 7K-1 is up. If the packet was not destined for a vPC member port it would be forwarded normally.
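The drop rule described above boils down to a three-way condition. Here is a deliberately simplified Python model of it, based only on the description in this post; the class and field names are my own and the real hardware logic is certainly more involved.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    arrived_on_peer_link: bool   # the frame crossed the vPC peer-link to get here
    egress_is_vpc_member: bool   # the frame must leave through a vPC member port


def forward(frame: Frame, peer_member_port_up: bool) -> bool:
    """Return True if the frame is forwarded, False if loop prevention drops it.

    peer_member_port_up: state of the corresponding vPC member port on the
    other Nexus 7000 (the one the frame came from).
    """
    if (frame.arrived_on_peer_link
            and frame.egress_is_vpc_member
            and peer_member_port_up):
        return False
    return True


if __name__ == "__main__":
    # Routed traffic that crossed the peer-link toward a healthy vPC: dropped.
    print(forward(Frame(True, True), peer_member_port_up=True))
    # Same traffic after the peer's member port has failed: forwarded.
    print(forward(Frame(True, True), peer_member_port_up=False))
```

The model makes the failure mode easy to see: nothing is wrong with the OSPF adjacency itself, the problem is the data traffic that ends up crossing the peer-link while both sides of the vPC are still up.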


To make the above Diagram #3 work, we can provision a new inter-switch link between the Nexus 7000s that is NOT a vPC peer-link. On this new link we will forward a VLAN that is not forwarding on the vPC peer-link. This makes it a non-vPC VLAN. From here we can attach our firewalls running OSPF to the Nexus 7000s on the non-vPC VLAN. See Diagram #3B below.

Firewalls with OSPF attached to non-vPC VLANs

This design works because, although each firewall will form an adjacency with both Nexus 7000s, the OSPF routed traffic will not traverse the vPC peer-link and will not be subject to any loop prevention logic.
Keep in mind that non-vPC VLANs cannot be forwarded on vPC member ports. So if you have another device that is vPC attached and needs to be Layer 2 adjacent to the firewalls, that will not work. You will need to get that other device attached to the non-vPC VLAN as well on a normal non-vPC connection.


If your firewalls are not running a routing protocol there is no problem at all. Simply have static routes, or a static default route, pointing to the HSRP VIP address on the Nexus 7000s.
See Diagram 4 below. This design works.

Firewall with static routes connected to Nexus 7000's running vPC

In this case each Nexus 7000 will locally forward traffic sent to the HSRP VIP address (even if it’s the “Standby” switch). Therefore no static routed traffic will ever traverse the vPC peer-link, and there will be no problems.


Finally, let’s look at Diagram #5 below where we have external devices running a routing protocol attached to the Nexus 7000s. Moreover, these devices are attached with a vPC. That shouldn’t work, right? Well, let’s take a look. This design DOES work.

Routing peers using vPC for transit only

This design works because these devices are not attempting to form an OSPF adjacency with the Nexus 7000s. Rather, these external routing devices are forming an adjacency only with each other, and simply using the vPC topology as a transit. Under normal conditions when no links or ports are failed, there will be no traffic traversing the peer link and no problems.


Summary

The Nexus 7000 hardware has loop prevention logic that drops traffic traversing the peer link (destined for a vPC member port) when there are no failed vPC ports or links.  In a healthy network such peer-link traffic is normally non-existent, so this is never a problem for attaching normal Layer 2 switches or servers.  However, when external devices attempt to form a routing adjacency with the Nexus 7000s over the vPC peer-link, some traffic destined for a vPC member port can be forced over the peer-link even when the network is healthy, causing traffic to be dropped by the loop prevention logic.

Best practice:

  • Attach external routers or L3 switches with L3 routed interfaces.
  • It’s OK to use the vPC peer-link to form a routing adjacency between the two Nexus 7000s.  Use a VLAN dedicated to the routing adjacency and only forward this VLAN on the peer-link, not on the vPC member ports.
  • Use the ‘passive-interface default’ command in your routing protocol to prevent a routing adjacency on all the other VLANs.
  • If attaching external devices on a Layer 2 port running a routing protocol with the Nexus 7000s (e.g. firewall running OSPF), provision a new non-vPC inter-switch link, and attach the device to non-vPC VLANs.
  • Use static routes to the HSRP gateway address on external devices such as firewalls and load balancers.  Do not run routing protocols on these devices unless absolutely necessary.
  • Read the Cisco vPC best practices design guides
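As a quick recap of the diagrams, the whole decision can be squeezed into one rule of thumb. The Python sketch below is my own condensation of this post, not an official Cisco support matrix; the attachment labels and the function name are made up for illustration.

```python
L3_ROUTED = "l3_routed"        # point-to-point routed interfaces, no vPC
VPC_VLAN = "vpc_vlan"          # Layer 2 attachment on a VLAN carried on the peer-link
NON_VPC_VLAN = "non_vpc_vlan"  # Layer 2 attachment on a VLAN kept off the peer-link


def design_works(attachment: str, peers_with_nexus: bool) -> bool:
    """True if the external device's attachment is expected to work.

    peers_with_nexus: the device forms a routing protocol adjacency
    with the Nexus 7000s themselves.
    """
    if attachment not in (L3_ROUTED, VPC_VLAN, NON_VPC_VLAN):
        raise ValueError(f"unknown attachment type: {attachment}")
    # The only failing combination: a routing adjacency with the Nexus 7000s
    # on a VLAN that is also carried on the vPC peer-link.
    return not (peers_with_nexus and attachment == VPC_VLAN)


if __name__ == "__main__":
    print("Diagram 1 :", design_works(L3_ROUTED, True))
    print("Diagram 2 :", design_works(VPC_VLAN, True))
    print("Diagram 3 :", design_works(VPC_VLAN, True))
    print("Diagram 3B:", design_works(NON_VPC_VLAN, True))
    print("Diagram 4 :", design_works(VPC_VLAN, False))
    print("Diagram 5 :", design_works(VPC_VLAN, False))
```

Diagrams 4 and 5 pass for the same reason: the device is attached on a vPC VLAN, but it never treats a Nexus 7000 as a routing protocol neighbor.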

Great questions on FCoE, VN-Tag, FEX, and vPC

I received some really good questions about FCoE, VN-Tag, FEX, and vPC from a reader named Lucas.  Although I had 10 other things to do, I just couldn’t resist highlighting these questions, and my answers, in a new post that I thought my readers would enjoy!

Brad,
You have amazing information about Nexus and UCS on your website. Please keep up the good work. I have a few queries and would appreciate it if you could please point me in the right direction.

1) FCoE with vPC: how does this work? For FCoE we must log in to one fabric only, so how will the load balancing offered by vPC affect it?

From the perspective of the CNA installed in the server, you have to keep in mind that it’s really two different logical adapters hosted on one physical adapter, Ethernet & FC.  The FC logical adapter on the CNA has no visibility or awareness of vPC. It still sees two individual paths, one left, one right, and doesn’t behave any differently with or without vPC.  More specifically, a dual-port CNA will actually have (2) logical FC adapters, each one with its own port, and each port typically connected to a separate fabric.  The Ethernet logical adapters however are vPC aware and will treat the two paths as a single logical 20GE pipe for all the traffic they are handling (all the non-FCoE stuff).

UPDATE: Diagram below added for visual aid.


2) VN-Tag: is the tag applied by a VMware machine, or can only a FEX (I/O module in UCS or FEX 2000) apply this tag?

VN-Tag is a modern day version of a virtual cable that connects a virtual NIC, or virtual HBA, hosted on an NIV capable adapter to an upstream virtual ethernet or virtual FC port on an NIV capable switch.  In the case where the server does not have an NIV capable adapter, VN-Tag can also be used to connect a physical port on a fabric extender (FEX) to an upstream virtual ethernet port.

In a nutshell, an NIV capable adapter will apply the VN-Tag as traffic egresses from one of its virtual adapters.  Any FEX in the path will just pass that traffic upstream to the switch terminating the other end of the virtual cable (VN-Tag).  In this case you could think of the FEX as a virtual patch panel for virtual cables.

If you connect a plain non-NIV adapter to a fabric extender (FEX), it will be the FEX that applies the VN-Tag.  In this sense, you still have a cable, but half of it is physical (server-to-FEX), and the other half is virtual (FEX-to-switch).  In this case, you could think of the FEX as a physical to virtual media converter.
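The two cases above fit in a tiny lookup. This Python sketch only restates the answer in code form; the function names are mine, and the third case (a plain adapter with no FEX in the path) is not covered in the answer above, so treating it as untagged is my assumption.

```python
def vntag_applied_by(adapter_is_niv_capable: bool, attached_to_fex: bool) -> str:
    """Which device applies the VN-Tag on traffic leaving the server."""
    if adapter_is_niv_capable:
        return "adapter"   # the tag is applied as traffic leaves a virtual adapter
    if attached_to_fex:
        return "fex"       # the FEX tags on behalf of the plain adapter
    return "none"          # assumption: no NIV adapter and no FEX, so no VN-Tag


def fex_role(adapter_is_niv_capable: bool) -> str:
    """How to think of a FEX sitting in the path, using the post's analogies."""
    if adapter_is_niv_capable:
        return "virtual patch panel"
    return "physical-to-virtual media converter"


if __name__ == "__main__":
    for niv in (True, False):
        print(niv, vntag_applied_by(niv, attached_to_fex=True), "/", fex_role(niv))
```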

3) In FIP, why do we need multiple MACs (FPMA)? I understand that FPMA will relieve the switch from creating a mapping between FCID and MAC, but other than that, why does the standard talk about multiple FC_LEPs on a single port? I am assuming each LEP would need a separate MAC; I am having a hard time visualizing it in real life.

Similar to Question #1 above, it helps to understand that the server CNA is really hosting two (or more) different logical adapters, Ethernet and FC.  Each logical adapter will have its own link identity (MAC/WWN).  Given that the actual physical medium is Ethernet, the logical FC adapter can’t use a WWN on the medium, so it uses an Ethernet MAC instead, which will be the FC_LEP (link end point).  As you point out, it’s most efficient when the FCoE switch can automatically provide this MAC to the server, for administrative ease.  This is known as FPMA (fabric provided MAC address).

The same concepts hold true for the upstream switch.  The FCoE switch is really two different switches hosted on one hardware platform, an Ethernet switch and a FC switch.  The FC logical switch needs to look at the FC frames carried within the FCoE packets to process fabric logins and make a forwarding decision.  In order to do that, it needs to receive the FCoE frames, decapsulate, make a decision, and re-encapsulate into FCoE again if necessary.  The FC logical switch has an FC_LEP for this very reason, so that it can send and receive Ethernet frames carrying FC payload (FCoE).
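For the curious, here is how a fabric provided MAC address is put together. This detail comes from the FC-BB-5 standard rather than from anything above: the switch prepends a 24-bit FC-MAP prefix (default 0E-FC-00) to the 24-bit FC_ID it assigns at fabric login. The Python sketch below just does the bit arithmetic; the FC_ID is an arbitrary example value.

```python
DEFAULT_FC_MAP = 0x0EFC00  # default FC-MAP prefix defined by FC-BB-5


def fpma(fc_id: int, fc_map: int = DEFAULT_FC_MAP) -> str:
    """Build a fabric provided MAC address from a 24-bit FC-MAP and FC_ID."""
    if not 0 <= fc_id <= 0xFFFFFF or not 0 <= fc_map <= 0xFFFFFF:
        raise ValueError("FC_ID and FC-MAP are 24-bit values")
    value = (fc_map << 24) | fc_id          # FC-MAP in the upper 3 bytes
    return ":".join(f"{byte:02x}" for byte in value.to_bytes(6, "big"))


if __name__ == "__main__":
    print(fpma(0x010203))  # example FC_ID, not taken from a real fabric
```

Because the FC_ID is embedded in the MAC, the switch never has to keep a separate FC_ID-to-MAC table, which is the saving the question refers to.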

If you could only read one book on FCoE to better understand these concepts, it would certainly be this one:  http://www.ciscopress.com/bookstore/product.asp?isbn=158705888X

4) In a 2232PP FEX, why is the straight-through design preferred? Will FCoE break if we did an active/active design?
Please assist.
Thanks,
Lucas

As we discussed in Question #1, the logical FC adapters on the server CNA are oblivious to vPC.  As a result, attaching a server CNA with vPC makes no difference to how FCoE is forwarded via two separate paths to two separate fabrics.  However, this is not the case with a FEX or a switch, which will forward the FCoE traffic on whatever Ethernet topology you place it on.  If this topology includes a vPC that spans two different fabrics, then you will have FCoE traffic from one of your logical FC adapters landing on both fabrics.  That makes it hard to determine where FCoE traffic is going, and it breaks the traditional FC best practice of SAN A/B isolation.  Although you certainly could do this, it’s just not a supported design right now.

As a result, as of right now Cisco does not recommend that you place FCoE traffic on an Ethernet topology spanning two fabrics (A/B, Left/Right, etc.).  Therefore, if your Nexus 2232 FEX will be carrying FCoE traffic from CNAs, you should NOT vPC attach the 2232 FEX to two different upstream Nexus 5000s.  Additionally, if your two upstream Nexus 5000s are connected together for vPC, you should NOT forward FCoE VLANs on the vPC peer link.  This will keep your FCoE forwarding deterministic and preserve two separate SAN fabrics.  You haven’t lost any redundancy because your servers are all dual-attached to separate 2232 FEXs which are each attached to separate Nexus 5000s.

In the end you have something that looks like this:

Image from: Data Center Access Design Guide, Chapter 6

The above diagram was taken from the Data Center Access Design Guide, Chapter 6
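The two rules from this answer (no dual-homed FEX carrying FCoE, no FCoE VLANs on the vPC peer link) are simple enough to check mechanically. Here is a small Python sketch of such a check; the data model, device names, and VLAN numbers are all hypothetical and it is not tied to any Cisco tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fex:
    name: str
    carries_fcoe: bool
    upstream_switches: List[str]


def check_fcoe_design(fexes: List[Fex], fcoe_vlans: List[int],
                      peer_link_vlans: List[int]) -> List[str]:
    """Return a list of rule violations; an empty list means the design passes."""
    problems = []
    for fex in fexes:
        # Rule 1: a FEX carrying FCoE must not be vPC attached to two switches.
        if fex.carries_fcoe and len(set(fex.upstream_switches)) > 1:
            problems.append(f"{fex.name} carries FCoE but is attached to more than one switch")
    # Rule 2: FCoE VLANs must stay off the vPC peer link.
    leaked = sorted(set(fcoe_vlans) & set(peer_link_vlans))
    if leaked:
        problems.append(f"FCoE VLANs forwarded on the vPC peer link: {leaked}")
    return problems


if __name__ == "__main__":
    straight_through = [Fex("fex-a", True, ["n5k-1"]), Fex("fex-b", True, ["n5k-2"])]
    print(check_fcoe_design(straight_through, fcoe_vlans=[100, 200], peer_link_vlans=[10, 20]))
    active_active = [Fex("fex-a", True, ["n5k-1", "n5k-2"])]
    print(check_fcoe_design(active_active, fcoe_vlans=[100], peer_link_vlans=[10, 100]))
```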

Make sense?

Thanks for the great questions!  Keep ’em coming :-)

Cisco UCS Fabric Extender (FEX) QoS

What is the role of the fabric extender (FEX) in Cisco UCS QoS? This question was posted as a comment to my recent article VMware 10GE QoS Design Deep Dive with Cisco UCS, Nexus — as well as here, at the Cisco Support community forums.

Brad:

A comment to test my understanding and then a question to follow-up…

The UCS ecosystem leverages a port aggregation solution to chassis I/O, namely, the FEX modules.

The FEX modules are not fully featured switches. Nor do they possess any forwarding policy intelligence at all. Instead, the FEX modules deploy a “pinning” approach in which downlinks (those that face the blade server’s NIC’s, LOMs, mezzanine cards) are mapped to an uplink port (those that face a 6100 Fabric Interconnect) to form what can be described as an aggregator group.

The result is a simplified approach to blade I/O in which the traffic patterns are predictable and failover is deterministic. Moreover, there is no need to configure STP because the ports are uplinked in a manner as to preclude any possibility of a bridging loop.

This having been said, is there some merit to the argument that this port aggregation design creates a hole — a discontinuity — in the middle of a QoS deployment because the scheduling of packets on the uplink ports facing the 6100 Fabric Interconnect is not performed in a manner that recognizes priority? In other words, no QoS on the FEX.

To elaborate a bit more, one can have a VMware deployment and leverage NetIOC or perhaps configure QoS on a 1000v switch (whose uplink ports are mapped to a port on the Palo VIC) and configure QoS on the VIC, and then on the 6100 Fabric Interconnect. But, since the FEX is not scheduling traffic to the 6100 Fabric Interconnect according to any priority, the QoS deployment has a hole in the middle, so to speak.

Thoughts?

My Answer:

Unlike the Cisco UCS Fabric Interconnect and the Virtual Interface Card (Palo) that each have (8) COS-based queues, the FEX has (4) queues, of which only (3) are used.  One FEX queue is used for strict priority Control traffic for the FI to manage the FEX and adapters.  The second FEX queue is for No Drop traffic classes such as FCoE. The third FEX queue is used for Drop classes (all the other stuff).  While each queue independently empties traffic in the order it was received (FIFO), the No Drop queue carrying FCoE is FIFO as well but is serviced for transmission on the wire with a guaranteed bandwidth weighting.

One could look at that and say: between the FI, FEX, and Adapter, the FEX is the odd device out, sitting in the middle with QoS capabilities inconsistent with the other two, creating a “hole” or “discontinuity” in the Cisco UCS end-to-end QoS capabilities.  That’s a fair observation to make.

However, before we stop here, there is one very interesting and unique behavior the FEX exhibits that’s entirely applicable to this conversation:

When the FEX gets congested on any interface (facing the FI or Adapters), it will push that congestion back to the source, rather than dropping the traffic.  The FEX does this for both the Drop and No Drop traffic classes.  The FEX will send 802.1Qbb PFC pause messages to the FI and NIV capable adapters (such as Menlo or Palo).  For non-NIV capable adapters such as the standard Intel Oplin, the FEX will send a standard 802.3X pause message.

At this point it’s up to the device receiving the pause message to react to it by allowing its buffers to fill up and apply its more intelligent QoS scheduling scheme from there.  For example, both the Fabric Interconnect and Palo adapter would treat the pause message as if their own link was congested and apply the QoS bandwidth policy defined in the “QoS System Class” settings in UCS Manager.

Side note: The Gen2 Emulex and Qlogic adapters are NIV capable; however, they do not honor the PFC pause messages sent by the FEX for the Drop classes. They will keep sending traffic that may be dropped in the fabric.  The Gen1 Menlo and Palo adapters do honor the PFC message for all classes.
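The back-pressure behavior described above can be summarized as a small lookup. This Python sketch only encodes what is stated in this post; the adapter labels and function names are mine, and it says nothing about how a non-NIV adapter reacts to the 802.3x pause it receives.

```python
# Adapters named in this post, grouped by what the post says about them.
NIV_CAPABLE = {"Menlo", "Palo", "Gen2 Emulex", "Gen2 Qlogic"}
IGNORES_PFC_FOR_DROP_CLASSES = {"Gen2 Emulex", "Gen2 Qlogic"}


def pause_message(adapter: str) -> str:
    """Type of pause frame a congested FEX sends toward this adapter."""
    return "802.1Qbb PFC" if adapter in NIV_CAPABLE else "802.3x pause"


def drop_class_may_be_dropped(adapter: str) -> bool:
    """True if the adapter keeps sending Drop class traffic despite PFC."""
    return adapter in IGNORES_PFC_FOR_DROP_CLASSES


if __name__ == "__main__":
    for name in ("Palo", "Menlo", "Gen2 Emulex", "Gen2 Qlogic", "Intel Oplin"):
        print(f"{name:12} {pause_message(name):13} "
              f"drop-class loss possible: {drop_class_may_be_dropped(name)}")
```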

What this means is that while the FEX does not have the same (8) queues of the Fabric Interconnect or Palo adapter, the FEX aims to remove itself from the equation by placing more of the QoS burden on these more capable devices.  From both a QoS and networking perspective, the FEX behaves like a transparent no-drop bump in the wire.

Is it perfect? No.  Ideally, in addition to pushing the congestion back, the FEX would also have (8) COS-based queues for a consistent QoS bandwidth policy at every point.

Is it pretty darn good? Yes! :-)  Especially when compared to the alternative 10GE blade server solutions that have no concept of QoS to begin with.

Cisco Nexus 7000 connectivity solutions for Cisco UCS

Last summer I was invited by the Nexus 7000 product management team at Cisco to help co-author a whitepaper covering general guidelines and best practices for network integration of Cisco UCS with Cisco Nexus 7000.  The idea was to take a lot of the content already presented in my video series Cisco UCS Networking Best Practices (in HD), extract the material most relevant to Cisco UCS + Nexus 7000, and publish a narrative with diagrams in a whitepaper format.

I am pleased to announce that as of today this whitepaper is now the official Cisco publication:

Cisco Nexus 7000 Series Connectivity Solutions for the Cisco Unified Computing System

In summary, this whitepaper discusses the following topics:

  • Nexus 7000 bandwidth and density complementing Cisco UCS deployments
  • Cisco UCS network connectivity overview
  • Cisco UCS End Host Mode vs. Switch Mode
  • Why End Host Mode is the preferred (and default) mode of operation
  • Why vPC uplinks from Cisco UCS to Nexus 7000 are preferred
  • Traffic patterns and failure scenarios with vPC uplinks to Nexus 7000
  • Why attaching Cisco UCS without vPC to Nexus 7000s configured for vPC should be avoided
  • No vPC? No problem!  Best practices when connecting Cisco UCS to Nexus 7000 without vPC
  • Connecting Cisco UCS to separated Layer 2 networks
  • Connecting Cisco UCS to networks with Nexus 5000 and Nexus 7000 using vPC
  • Why connecting Cisco UCS to a Spanning Tree influenced Layer 2 access topology should be avoided
  • Summary of Cisco UCS + Nexus 7000 networking best practice recommendations
  • Cisco Nexus 7000 architectural advantages for Cisco UCS connectivity
    • Hitless ISSU, Stateful process restarts, Stateful supervisor switchover
    • N+1 and Grid level power supply redundancy
    • End of row L2/L3 connectivity for high density compute pods
    • Scalability for large deployments, 128,000 MAC addresses – hardware learning
    • Infrastructure consolidation with virtual device contexts (VDC)
    • Support for next generation switching fabrics with FabricPath and TRILL
    • SAN/LAN infrastructure consolidation with future support for FCoE & FCF

What’s NOT covered in this whitepaper:

  • Connecting Cisco UCS to Nexus 7000 FabricPath networks
  • Guidance on choosing Nexus 7000 F1 or M1 series linecards for Cisco UCS connectivity
  • FCoE uplinks from Cisco UCS to Nexus 7000

The above items not covered in this whitepaper may be the subject of future blogs here and/or additional Cisco whitepapers and CVDs.  However, I will take this opportunity to write a few comments on each subject.

Connecting Cisco UCS to Nexus 7000 FabricPath networks

Nexus 7000 switches configured for FabricPath have a new switch port mode available called, you guessed it, a FabricPath port.  These are the ports that directly connect to other FabricPath capable switches and must be explicitly configured as such.

interface Ethernet 1/1
  description Connection to FabricPath network
  switchport mode fabricpath

All other standard non-FabricPath ports are referred to as “Classic Ethernet” ports that normal switches and servers connect to without any knowledge or awareness of FabricPath.  This is the default port setting.

The Cisco UCS fabric interconnect is not a FabricPath aware switch, and as such should be connected to the Nexus 7000 on a normal “Classic Ethernet” port, in either End Host mode or Switch mode (End Host mode is still preferred).  The Nexus 7000 may be participating in a larger FabricPath network upstream, but this fact is completely transparent to Cisco UCS or any other device attached to a normal “Classic Ethernet” port.

interface Ethernet 2/1
  description Connection to Cisco UCS
  switchport mode trunk
  spanning-tree port type edge trunk

The Nexus 7000 “Classic Ethernet” ports can still be configured for vPC, so the best practice recommendation of connecting Cisco UCS to Nexus 7000 with vPC uplinks in End Host mode still applies, with or without FabricPath.

The Nexus 7000 configured for FabricPath has an enhancement to normal vPC called vPC+, which makes the Nexus 7000 vPC domain appear as one Switch ID to the rest of the FabricPath network.  This helps prevent the thrashing of Switch IDs in the FabricPath forwarding tables, but has nothing to do with how Cisco UCS connects to the network.

In a nutshell, connecting Cisco UCS to a Nexus 7000 FabricPath network has little impact on how you would normally connect Cisco UCS.  Just make sure you’re connecting Cisco UCS to a normal “Classic Ethernet” port on the Nexus 7000.

More on this later…

Guidance on choosing Nexus 7000 F1 or M1 series linecards for Cisco UCS connectivity

First, let’s understand some of the key differences in terms of price and capabilities…

The Nexus 7000 M1 series are the normal Layer 2 and Layer 3 capable linecards available since the beginning, with an 80 Gbps connection to the switch fabric and 4:1 oversubscription across the 32 front panel ports.  Additionally, the M1 series linecards support hardware learning for 128,000 MAC addresses, and roughly 1 million IP routes.  The M1 linecard’s Layer 3 capabilities and MAC scalability provide flexibility that is both simple and scalable, but at twice the price of the F1 linecard for an equivalent 32 ports of 10GE.  If price is more important than density, an 8-port non-oversubscribed M1 linecard is available for almost half the price of the 32-port card.
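As a quick sanity check on the M1 numbers, here is a small Python sketch of the oversubscription arithmetic: front panel bandwidth divided by the fabric connection.  The function name `oversubscription` is mine.  The sketch also assumes the 8-port M1 card has the same 80 Gbps fabric connection as the 32-port card, which is implied above by calling it non-oversubscribed but is not stated outright.

```python
from fractions import Fraction


def oversubscription(ports: int, port_gbps: int, fabric_gbps: int) -> Fraction:
    """Ratio of total front panel bandwidth to fabric bandwidth."""
    return Fraction(ports * port_gbps, fabric_gbps)


def as_ratio(value: Fraction) -> str:
    """Format a Fraction as 'N:D'."""
    return f"{value.numerator}:{value.denominator}"


# 32-port M1: 32 x 10GE against an 80 Gbps fabric connection.
m1_32_port = oversubscription(ports=32, port_gbps=10, fabric_gbps=80)

# 8-port M1: assumed to use the same 80 Gbps fabric connection.
m1_8_port = oversubscription(ports=8, port_gbps=10, fabric_gbps=80)

if __name__ == "__main__":
    print("32-port M1:", as_ratio(m1_32_port))  # 4:1
    print("8-port M1:", as_ratio(m1_8_port))    # 1:1
```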

The Nexus 7000 F1 series is a new 32-port 10GE linecard that supports Layer 2 forwarding only, with a 230 Gbps connection to the switch fabric and line rate non-blocking forwarding (320 Gbps) for all Layer 2 flows local to the linecard.  Additionally, the F1 linecard supports FabricPath and is FCoE ready.  Every two front panel ports are serviced by a switch on chip (SoC) that supports hardware learning for 16,000 MAC addresses.  If you simply spread all VLANs across all ports (all SoCs), the entire linecard supports 16,000 MAC addresses.  With careful planning, you can try to isolate VLANs to fewer ports, and therefore expose the MAC addresses in those VLANs to fewer SoCs.  The extreme case would be keeping any given VLAN unique to only one SoC, resulting in the F1 linecard supporting 256,000 unique MAC addresses (16 SoCs each with 16K unique MACs).
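The two extremes above (16,000 and 256,000 MAC addresses) are the end points of a simple trade-off, which can be sketched in Python.  This is a deliberately simplified model of my own, not a description of how the hardware allocates table entries: it assumes every VLAN is present on the same number of SoCs and that MAC addresses are spread evenly, so each unique MAC consumes one entry on every SoC where its VLAN lives.  The names `f1_mac_capacity` and `f1_fabric_ratio` are hypothetical.

```python
from fractions import Fraction

F1_PORTS = 32
SOCS_PER_F1 = F1_PORTS // 2   # one SoC per two front panel ports -> 16
MACS_PER_SOC = 16_000


def f1_mac_capacity(socs_per_vlan: int) -> int:
    """Unique MACs the linecard can hold if each VLAN spans `socs_per_vlan`
    SoCs (simplified, evenly spread model)."""
    if not 1 <= socs_per_vlan <= SOCS_PER_F1:
        raise ValueError("socs_per_vlan must be between 1 and 16")
    total_entries = SOCS_PER_F1 * MACS_PER_SOC
    return total_entries // socs_per_vlan


# Front panel bandwidth versus the 230 Gbps fabric connection, which only
# matters for flows that leave the linecard.
f1_fabric_ratio = Fraction(F1_PORTS * 10, 230)

if __name__ == "__main__":
    print(f1_mac_capacity(16))  # all VLANs on all SoCs: 16000
    print(f1_mac_capacity(1))   # each VLAN on one SoC: 256000
    print(float(f1_fabric_ratio))
```

Anything in between, such as each VLAN spanning four SoCs, lands between those two figures under the same assumptions.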

Side note: When the F1 linecard receives traffic that needs Layer 3 switching, it will forward that traffic across the internal fabric to an M1 linecard (if one exists) for the Layer 3 lookup and forwarding.

Which linecard is best for Cisco UCS connectivity?  Each is a good choice with pros & cons, so it really depends on what’s more important to you: cost, scalability, flexibility, bandwidth, over-subscription, etc.

You might choose the M1 linecard under these criteria:

  • Scalability with simplicity, e.g. 128,000 MAC’s with no special planning.
  • You are linking Cisco UCS to the Aggregation layer Nexus 7000 where Layer 3 switching is required.
  • Consistency and simplicity of local forwarding for Layer 2 and Layer 3 flows.
  • Line rate non-oversubscribed forwarding for all Layer 2 and Layer 3 flows (8-port M1)
  • Low cost & low over-subscription more important than port density (8-port M1)

You might choose the F1 linecard under these criteria:

  • You are linking Cisco UCS to an Access/Edge Nexus 7000 where only Layer 2 switching is required.
  • You are linking Cisco UCS to a Nexus 7000 at the Edge of a FabricPath network.
  • Low over-subscription, low latency, for all end-to-end Layer 2 flows is a concern.
  • Both port density and cost are key concerns
  • MAC scalability is not a concern

In my experience, most customers connect their Cisco UCS to the Aggregation layer (this makes sense if you view the fabric interconnect as the Access layer).  Of those customers, given the choice, most choose the M1 linecard, except for those where cost, low latency, and low over-subscription for Pod-to-Pod Layer 2 forwarding are key concerns.

Some customers are beginning to deploy Nexus 7000 in both the Access (end of row) and Aggregation layers for density requirements and to prepare themselves for FabricPath.  These customers are connecting their Cisco UCS fabric interconnects to the Nexus 7000 Access/Edge switch which is Layer 2 only by design, so the F1 linecard there is a no-brainer.

More on this later…

FCoE uplinks from Cisco UCS to Nexus 7000

There isn’t a lot of detail that can be discussed right now, but I think I can give you a hint of where this is heading.  Two things still need to happen:

  1. Nexus 7000 software (NX-OS) support for Fibre Channel forwarding (FCF)
  2. Cisco UCS Manager software support for FCoE uplinks

The key word in both items is software, meaning no new hardware beyond what is already available today will be required.

When these software capabilities arrive, we will begin to see topologies where Cisco UCS can link to a common pair of Nexus 7000s that provide both the LAN and SAN infrastructure.  The holy grail of unified fabric consolidation at both the access and aggregation layers starts to become a reality.

More on that later too…



Disclaimer:  The views and opinions expressed are those of the author, and not necessarily the views and opinions of the author’s employer.  The author is not an official media spokesperson for Cisco Systems, Inc.  For design guidance that best suits your needs, please consult your local Cisco representative.