Cisco UCS criticism and FUD: Answered

One of my readers recently submitted a comment asking me to respond to some criticisms he frequently hears about Cisco UCS.  This is a pretty typical request I get from partners and prospective customers, and it's a list of stuff I've seen many times before, so I thought it would be fun to address these and other common criticisms and FUD against Cisco UCS in one consolidated post.  We'll start with the list submitted by the reader and let the discussion continue in the comments section.  Sounds like fun, right? :-)


I regularly hear a few specific arguments critiquing the UCS that I would like you to respond to, please.

1. The Cisco UCS system is a totally proprietary and closed system, meaning:

a) the Cisco UCS chassis cannot support other vendor’s blades. For example, you can’t place an HP, IBM or Dell blade in a Cisco UCS 5100 chassis.

b) The Cisco UCS can only be managed by the Cisco UCS manager – no 3rd party management tool can be leveraged.

c) Two Cisco 6100 Fabric Interconnects can indeed support 320 server blades (as Cisco claims), but only with an unreasonable amount of oversubscription. The more accurate number is two 6100s for every four (4) 5100 UCS chassis (32 servers), which will yield a more reasonable oversubscription ratio of 4:1.

d) A maximum of 14 UCS chassis can be managed by the UCS manager, which resides in the 6100 Fabric Interconnects. Therefore, this creates islands of management domains, especially if you are planning on managing 40 UCS chassis (320 servers) with the same pair of Fabric Interconnects.

e) The UCS blade servers can only use Cisco NIC cards (Palo).

f) Cisco Palo cards use a proprietary version of interface virtualization and cannot support the open SR-IOV standard.

I would really appreciate it if you can give us bulleted responses in the usual perspicacious Brad Hedlund fashion. :-)


This is a good list to start with.  But before we begin, let's define what constitutes valid criticism in the context of this discussion.

Criticism: something pointed out as lacking or deficient when compared to what is typically found and expected in other comparable and “acceptable” solutions.  For example, if my new commuter car didn't have anti-lock brakes, that would be a valid criticism, since anti-lock brakes are a feature commonly found and expected in most newer commuter cars today.  However, if my car didn't transform into a jet plane and fly at the press of a button, is that a valid criticism? No.  That is not a capability typically expected of any automobile.  Such a “criticism” is pointless.

OK, let's get started…

1) “Cisco UCS chassis cannot support other vendor’s blades”

This is one of my favorites.  If someone brings this up you know right away you're dealing with someone who A) is joking, or B) has no idea what they're talking about.  Anybody who has set foot in a data center in the last 7 years knows that Vendor X's blade chassis are only populated with Vendor X's blade servers, and … <GASP> yes! Cisco UCS chassis are only populated with Cisco UCS blades. Shame on Cisco! LOL.

Before the IBM guys jump out of their seats: yes, I am aware that 3rd party blade servers can be made to fit into an IBM blade chassis.  While that's a cute little check box to have on your data sheet, actual implementations of this are extremely rare.  Why? It just doesn't make any sense, especially with commodity x86 hardware.

When was the last time you saw Vendor X’s blade server in Vendor Y’s blade chassis?  Exactly.  This is not a valid criticism.  Case closed.

2) “Cisco UCS can only be managed by the Cisco UCS manager – no 3rd party management tool can be leveraged.”

If “managed” means the basic baseboard-level management of the blade itself (BIOS settings, firmware, iLO, KVM, virtual media, etc.), in other words everything needed to get the blade up and functionally booting an OS — well, yes, this of course is true, and again it's no different from the other market-leading vendors.  For example, the HP c7000 chassis requires at least one HP management module (the HP Onboard Administrator) present in every chassis to manage the blades.  Furthermore, to aggregate management across multiple c7000 chassis you need HP management software performing that function as well: HP Systems Insight Manager.  The same is true of the other blade vendors (Dell, IBM): their management software and modules manage their hardware.  So help me understand, how is this a valid criticism against Cisco UCS?

If “managed” means a higher-level capability set, such as auditing, provisioning, historical statistics, life cycle management, alerts and monitoring, etc. — this is actually where Cisco UCS sets itself apart from the other vendors in being more “open” and ecosystem friendly.  Unlike the others, Cisco UCS provides an extremely powerful and open XML API that any 3rd party developer can build against.  Consider the fact that the UCS Manager GUI is just a browser-based front end to the same XML API that 3rd party developers are interfacing with.  It's entirely possible to provision and manage an entire UCS system with 3rd party software without ever opening the UCS Manager GUI.
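To make that concrete, here is a minimal sketch of what a 3rd party tool talks to the UCS Manager XML API with. The method names (`aaaLogin`, `configResolveClass`) come from the published UCS XML API; the credentials and cookie here are placeholder assumptions, and a real client would POST these bodies over HTTPS to the UCS Manager.

```python
# Sketch: building UCS Manager XML API request bodies (aaaLogin opens a
# session and returns a cookie; configResolveClass queries objects by class).
import xml.etree.ElementTree as ET

def build_login(username: str, password: str) -> str:
    """Build the aaaLogin request body used to open a UCSM session."""
    el = ET.Element("aaaLogin", inName=username, inPassword=password)
    return ET.tostring(el, encoding="unicode")

def build_resolve_class(cookie: str, class_id: str) -> str:
    """Build a configResolveClass query, e.g. classId='computeBlade' for all blades."""
    el = ET.Element("configResolveClass", cookie=cookie,
                    classId=class_id, inHierarchical="false")
    return ET.tostring(el, encoding="unicode")

# Placeholder credentials for illustration only.
print(build_login("admin", "password"))
print(build_resolve_class("example-cookie", "computeBlade"))
```

A real integration would parse the `outCookie` from the login response and reuse it on every subsequent query, which is exactly what the UCS Manager GUI itself does behind the scenes.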

There are many examples of this open XML API management integration with Cisco UCS, but here are just a few:

Why isn't there an iPhone app yet to manage HP BladeSystem, or Dell, or IBM? Answer: without a consolidated and open API to interface with, this would be a tremendously complex effort. Compare that to the Cisco UCS iPhone app that was developed by just one Cisco SE (Tige Phillips) in his spare time!

If an amateur programmer can write an iPhone app to manage Cisco UCS in his spare time, imagine what a team of cloud-savvy programmers can accomplish.  Example: check out what newScale is doing with Cisco UCS.

So, as for the claim: “… no 3rd party management tool can be leveraged.”?  We can dismiss that one as being totally false.

3) “Two Cisco 6100 Fabric Interconnects can indeed support 320 server blades (as Cisco claims), but only with an unreasonable amount of oversubscription. The more accurate number is two 6100s for every four (4) 5100 UCS chassis (32 servers), which will yield a more reasonable oversubscription ratio of 4:1”

The statement “unreasonable amount of oversubscription” is pure speculation.  Oversubscription requirements vary per customer deployment depending on desired scale, bandwidth, and cost.  The trade-off between bandwidth and scale is no secret and is simply a fact of life with any vendor's solution; it's not something unique to UCS.  More bandwidth means more network ports, more switches, and higher cost.  More oversubscription means higher scale at lower cost.

Next, what does “oversubscription” really mean, anyway?  For some, it might be very blade-chassis-centric: the ratio of total bandwidth provisioned to the servers in a chassis compared to the total uplink bandwidth available to that chassis.  By that math, each Cisco UCS chassis of 8 servers can be provisioned for a maximum of 80 Gbps, or a minimum of 20 Gbps, of uplink bandwidth.  When you provision the minimum 20 Gbps of uplink bandwidth per chassis you can in theory*** achieve the scale of 320 servers per pair of Fabric Interconnects. (40 chassis dual-homed to a pair of 40-port Fabric Interconnects)

Example: if each server is provisioned for 10 Gbps (that's a lot, actually), and each chassis has the minimum 20 Gbps of uplink bandwidth — that's a <GASP> “more reasonable” 4:1 oversubscription ratio for 320 servers! 😉
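The chassis-level arithmetic above is easy to sanity-check. A quick sketch, using the Gbps figures from the example (not a recommendation):

```python
# Chassis-level oversubscription: bandwidth provisioned to the servers in a
# chassis divided by the uplink bandwidth leaving that chassis.
def oversubscription(server_gbps_total: float, uplink_gbps_total: float) -> float:
    return server_gbps_total / uplink_gbps_total

# 8 servers per chassis at 10 Gbps each, minimum 20 Gbps of chassis uplinks -> 4:1
minimum_wiring = oversubscription(8 * 10, 20)

# Same chassis wired for the maximum 80 Gbps of uplinks -> 1:1
maximum_wiring = oversubscription(8 * 10, 80)

print(minimum_wiring, maximum_wiring)  # 4.0 1.0
```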

For others, “oversubscription” might mean not just the ratio of bandwidth a server must share to exit the chassis, but the total amount of bandwidth each server shares to reach the Layer 3 core switch.  Again, this is a universal bandwidth/scale/cost design trade-off across all vendors, not just Cisco.  This kind of exercise requires looking at the total solution: servers, chassis, access switches, core switches, LAN and SAN.

Here's a simple example of achieving 4:1 oversubscription from every server to the LAN core*, and 8:1 to the SAN core**.  You could have (8) UCS chassis, each with 8 servers provisioned for 10 Gbps of LAN bandwidth and 4 Gbps of SAN bandwidth.  Each chassis is wired for the maximum of 80 Gbps, providing 1:1 at the chassis uplink level.  So now we have 64 servers at 1:1 that we need to uplink to the SAN and LAN core.  To get 4:1 to the LAN core* and 8:1 to the SAN core**, we need (16) 10GE uplinks and (8) 4G FC uplinks from each Fabric Interconnect.  We'll take those uplinks and connect them to 1:1 non-oversubscribed ports at the SAN and LAN core.

The result: 64 servers, each provisioned for 10GE, with 4:1 oversubscription to the LAN core* and 8:1 to the SAN core**.  All of this fits into a single pair of UCS 6140 Fabric Interconnects.  You could treat this as a discrete “Pod”.  As you need to scale out more servers at similar oversubscription, you stamp out more similarly equipped pods.
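The headline ratios for this pod check out, per Fabric Interconnect (see the side notes for how the dual fabrics change the effective numbers):

```python
# Pod example: 8 chassis x 8 servers, each server provisioned for 10 Gbps of
# LAN and 4 Gbps of SAN bandwidth; each Fabric Interconnect has (16) 10GE LAN
# uplinks and (8) 4G FC uplinks to the core.
servers = 8 * 8

lan_ratio = (servers * 10) / (16 * 10)  # 640 Gbps of server LAN bandwidth vs 160 Gbps up
san_ratio = (servers * 4) / (8 * 4)     # 256 Gbps of server SAN bandwidth vs 32 Gbps up

print(f"LAN {lan_ratio}:1, SAN {san_ratio}:1")  # LAN 4.0:1, SAN 8.0:1
```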

Want 4:1 to both the LAN and SAN core?  Scale back to (6) UCS chassis and (48) servers per pair of Fabric Interconnects, and provision more FC uplinks to the SAN core.  It's the classic scale vs. bandwidth design trade-off, applicable to any vendor's solution.

*Side note: The LAN oversubscription from server to core is actually 2:1 with both fabrics available, and 8:1 with one fabric completely offline.  For the sake of discussion let's just average it out to 4:1.

**Side note: The SAN oversubscription from Server to Core is actually 4:1 with both fabrics available, and 8:1 with one fabric completely offline.

***Side note: The current hardware architecture of Cisco UCS can fit 40 chassis and 320 servers underneath a single pair of 6140 fabric interconnects. However, the number of chassis per fabric interconnect officially supported by Cisco at this time is 20. This number started at 5 and continues to go up with each new major firmware release.  Reaching 40 supported chassis is only a matter of time.

4) “A maximum of 14 UCS chassis can be managed by the UCS manager, which resides in the 6100 Fabric Interconnects. Therefore, this creates islands of management domains, especially if you are planning on managing 40 UCS chassis (320 servers)”

Correction: as of the most recent UCS Manager 1.4 release you can now manage a maximum of 20 chassis.

This one always cracks me up, because it tries to say that a single point of management for 14 or even 20 chassis is somehow a BAD thing? LOL! 😀

What's the alternative?  With HP, IBM, or Dell, (20) chassis means exactly (20) islands of management, and each island has multiple things you need to manage (chassis switches and management modules).  What about the LAN and SAN access switches connecting those (20) chassis? Yep, you need to manage those too.

Compare that to (1) management island per (20) chassis: a single interface and a single data set managing settings and policies for all of the servers, including LAN & SAN. 😉

5) “The UCS blade servers can only use Cisco NIC cards (Palo)”

This is simply not true.  From the very beginning customers have had the choice of several non-Cisco adapters.  In fact, the Cisco adapter (Palo) wasn’t available for almost a year after the initial release of Cisco UCS.  As of the recent UCS Manager 1.4 release, several more adapters have been added to the portfolio of choices.

The adapters Cisco UCS customers can choose from:

Case closed.

6) “Cisco Palo cards use a proprietary version of interface virtualization and cannot support the open SR-IOV standard”

The Cisco Palo card accomplishes interface virtualization in a way that's completely transparent to the OS, through simple standards-based PCIe.  There's nothing proprietary happening here at all.  When installed in the server, the Cisco Palo card appears to the system like a PCIe riser hosting multiple standard PCIe adapters.  In other words, Cisco has effectively obsoleted the need for SR-IOV with the design of the Cisco VIC (Palo).  There's nothing stopping any other vendor from using the same transparent PCIe-based approach to interface virtualization.

With SR-IOV, on the other hand, the OS needs to be SR-IOV aware.  You need the proper SR-IOV drivers and extensions loaded, etc.  Why complicate the solution when you can achieve the same goal (interface virtualization) in a way that's completely transparent to the OS and adapter drivers?  This obviates the need for any additional “standard” layered into the solution.

By the way, there's nothing preventing you from using an SR-IOV adapter with Cisco UCS.  For example, the new Intel 82599 adapter for UCS supports PCI-SIG SR-IOV.  If you want SR-IOV that badly, use that adapter.

OK! That was a good round. Now go ahead and hit me with your best shot in the comments section. Please, keep your comments to one or two concise paragraphs. If you have a whole bunch of stuff to throw at me, break it up into multiple comments if you can.  If you submit a really good one, I’ll promote it into the article content.

For my HP, IBM, and Dell friends out there (you know who you are) — guys, there's no need to submit comments pretending to be a disappointed customer.  Just cite your real name and vendor disclosure and let's have an honest and forthright discussion like the gentlemen we all are.  No need for games.

Also, please keep in mind that I am employed by Cisco Systems, so I do need to exercise discretion in what I can and cannot say.  I’m sure you can understand.

Cheers & Happy New Year!

Disclaimer:  The views and opinions expressed are those of the author, and not necessarily the views and opinions of the author's employer.  The author is not an official media spokesperson for Cisco Systems, Inc.  For design guidance that best suits your needs, please consult your local Cisco representative.

Routing over Nexus 7000 vPC peer-link? Yes and No.

This is a Nexus 7000 design question that comes up from time to time:

In a Nexus 7000 vPC environment, how can I form a Layer 3 adjacency between the two switches? Let's say I want to run OSPF and create two SVIs on the two switches connected via vPC. Will the neighbor relationship be formed over the vPC peer link, or is the peer link only designed for vPC control traffic?

Some people believe that in order to form an L3 adjacency between two Nexus 7000 vPC peer switches you must provision a separate link (other than the peer link) to use for L3 routing.  This is not true.  You absolutely can use the existing vPC peer link to form a routing adjacency between two vPC peer Nexus 7000’s.

I believe the source of confusion comes from a vPC design caveat that gets condensed and passed on as simply: “No L3 routing over vPC”.

Not knowing any better, if you take that statement at face value it's easy to see how one might believe that you cannot form a routing adjacency between the two Nexus 7000's over the vPC peer link, and therefore seek to provision a new “non-vPC” inter-switch link for that purpose.

Let’s take a minute to look at what works, and what doesn’t.  I’m not going to bore you with all the technical details.  Instead we’ll take a look at six simple diagrams with brief explanations.

Diagram #1 below shows two Nexus 7000’s configured as vPC peers with a single inter-switch link between them, the vPC peer link.  The two Nexus 7000’s are configured for OSPF and are using an SVI associated to a VLAN on the peer-link to form the L3 adjacency.  This VLAN for the L3 adjacency should only be forwarded on the peer-link.  Do not forward this routing VLAN on the vPC member ports (such as toward the L2 Switch shown in the diagram).

L3 peering over vPC peer-link

In the diagram above, we have a Layer 3 switch or router upstream configured for OSPF.  This L3 switch is attached to each Nexus 7000 with two normal point to point Layer 3 interfaces, no port channels, no vPC.

This design works.  This design has been discussed in official Cisco design documents and is supported by Cisco TAC.
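For illustration, here is a hedged sketch of what the Diagram #1 adjacency might look like in NX-OS-style configuration. The VLAN number, addressing, OSPF process ID, and port-channel number are all placeholder assumptions; verify against the official vPC design guides before using anything like this.

```
vlan 99
  name l3-adjacency          ! VLAN used only for the routing adjacency

interface port-channel10     ! assumed to be the vPC peer-link
  switchport mode trunk
  switchport trunk allowed vlan add 99

interface Vlan99
  no shutdown
  ip address 10.1.99.1/30    ! 10.1.99.2/30 on the vPC peer
  ip router ospf 1 area 0.0.0.0
  no ip ospf passive-interface

router ospf 1
  passive-interface default  ! keep every other SVI from forming adjacencies
```

Note that VLAN 99 is allowed only on the peer-link trunk, never on the vPC member ports, which is the key constraint described above.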

So where does “No L3 routing over vPC” come from, anyway? Simply put, it comes from the vPC design caveat that you should NOT have an external device (firewall, router, switch) forming a routing protocol adjacency with the Nexus 7000's over the vPC peer link.

Diagram #2 below shows an L3 switch running OSPF attached with a vPC to the two Nexus 7000’s and attempting to form an OSPF adjacency with each Nexus 7000. This design does NOT work.

Router attached via vPC - peering with Nexus 7000

This design does NOT work because some of the OSPF-routed traffic from the L3 switch to the Nexus 7000's will traverse the vPC peer-link (even when no ports or links have failed). This traffic is then dropped by the loop prevention logic in the Nexus 7000 hardware.

Side note: for the more vPC-savvy folks out there who might be wondering, “Does the new vPC Peer Gateway feature make this design work?” The answer is no.

Let's look at Diagram #3 below. Here's another example of an external device building a routing protocol adjacency with the Nexus 7000's, this time firewalls. The firewalls are singly attached (no vPC) to a VLAN that is forwarded on the Nexus 7000's vPC peer link. The firewalls are running OSPF and attempting to form an adjacency with each Nexus 7000.  This design too does NOT work.

Firewalls running OSPF attached to Nexus 7000's running vPC

This design does NOT work for the same reason as Diagram #2. Each firewall will form an OSPF adjacency with both Nexus 7000's. This means that some OSPF-routed traffic will traverse the vPC peer-link (even when no ports or links have failed), and as a result this traffic will be dropped.

Each firewall sees both 7K-1 and 7K-2 as directly adjacent OSPF neighbors. If Firewall-1 chooses 7K-2 as the next hop, this traffic will traverse the peer-link with the loop prevention bit set. If 7K-2 realizes the packet is destined for a vPC member port, it will drop the traffic if the corresponding vPC member port on 7K-1 is up. If the packet were not destined for a vPC member port it would be forwarded normally.
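The drop decision just described can be modeled in a few lines. This is an illustration of the rule, not Cisco's actual implementation:

```python
# Simplified model of the vPC loop prevention rule: a frame that arrived over
# the peer-link may not egress a vPC member port if the corresponding member
# port on the peer switch is up (the peer could have delivered it locally).
def forwarded(arrived_on_peer_link: bool,
              egress_is_vpc_member: bool,
              peer_member_port_up: bool) -> bool:
    """Return True if the frame is forwarded, False if dropped."""
    if arrived_on_peer_link and egress_is_vpc_member and peer_member_port_up:
        return False  # dropped by loop prevention
    return True

# Routed traffic crossing the peer-link toward a vPC member port whose
# counterpart on the other peer is up: dropped.
assert forwarded(True, True, True) is False
# Same path, but the counterpart member port has failed: forwarded.
assert forwarded(True, True, False) is True
# Not destined for a vPC member port (e.g. an orphan port): forwarded normally.
assert forwarded(True, False, True) is True
```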

To make Diagram #3 work, we can provision a new inter-switch link between the Nexus 7000's that is NOT a vPC peer-link. On this new link we will forward a VLAN that is not forwarded on the vPC peer-link, making it a non-vPC VLAN. From there we can attach our firewalls running OSPF to the Nexus 7000's on the non-vPC VLAN. See Diagram #3B below.

Firewalls with OSPF attached to non-vPC VLANs

This design works because each firewall will form an adjacency with both Nexus 7000's; however, the OSPF-routed traffic will not traverse the vPC peer-link and is not subject to any loop prevention logic.

Keep in mind that non-vPC VLANs cannot be forwarded on vPC member ports. So if you have another device that is vPC attached and needs to be Layer 2 adjacent to the firewalls, that will not work. You will need to attach that other device to the non-vPC VLAN as well, on a normal non-vPC connection.

If your firewalls are not running a routing protocol there is no problem at all. Simply configure static routes, or a static default route, pointing to the HSRP VIP address on the Nexus 7000's. See Diagram #4 below. This design works.

Firewall with static routes connected to Nexus 7000's running vPC

In this case each Nexus 7000 will locally forward traffic sent to the HSRP VIP address (even if it's the “Standby” switch). Therefore no statically routed traffic will ever traverse the vPC peer-link, and there will be no problems.

Finally, let's look at Diagram #5 below, where we have external devices running a routing protocol attached to the Nexus 7000's. Moreover, these devices are attached with a vPC. That shouldn't work, right? Well, let's take a look. This design DOES work.

Routing peers using vPC for transit only

This design works because these devices are not attempting to form an OSPF adjacency with the Nexus 7000’s. Rather, these external routing devices are forming an adjacency only with each other, and simply using the vPC topology as a transit. Under normal conditions when no links or ports are failed, there will be no traffic traversing the peer link and no problems.


The Nexus 7000 hardware has loop prevention logic that drops traffic traversing the peer link (destined for a vPC member port) when no vPC ports or links have failed.  Peer-link data traffic is virtually non-existent in a healthy network, so this is never a problem when attaching normal Layer 2 switches or servers.  However, when external devices attempt to form a routing adjacency with the Nexus 7000's over the vPC peer-link, some traffic destined for a vPC member port can be forced over the peer-link even when the network is healthy, causing it to be dropped by the loop prevention logic.

Best practice:

  • Attach external routers or L3 switches with L3 routed interfaces.
  • It’s OK to use the vPC peer-link to form a routing adjacency between the two Nexus 7000’s.  Use a VLAN dedicated to the routing adjacency and only forward this VLAN on the peer-link, not on the vPC member ports.
  • Use the ‘passive-interface default’ command in your routing protocol to prevent a routing adjacency on all the other VLANs.
  • If attaching external devices on a Layer 2 port running a routing protocol with the Nexus 7000’s (e.g. firewall running OSPF), provision a new non-vPC inter-switch link, and attach the device to non-vPC VLANs.
  • Use static routes to the HSRP gateway address on external devices such as firewalls and load balancers.  Do not run routing protocols on these devices unless absolutely necessary.
  • Read the Cisco vPC best practices design guides

Great questions on FCoE, VN-Tag, FEX, and vPC

I received some really good questions about FCoE, VN-Tag, FEX, and vPC from a reader named Lucas.  Although I had 10 other things to do, I just couldn’t resist highlighting these questions, and my answers, in a new post that I thought my readers would enjoy!

You have amazing information about Nexus and UCS on your website. Please keep up the good work. I have a few queries; I would appreciate it if you could please point me in the right direction.

1) FCoE with vPC: how does this work? For FCoE we must log in to one fabric only; how will the load balancing offered by vPC affect it?

From the perspective of the CNA installed in the server, you have to keep in mind that it's really two different logical adapters hosted on one physical adapter: Ethernet & FC.  The FC logical adapter on the CNA has no visibility or awareness of vPC.  It still sees two individual paths, one left, one right, and doesn't behave any differently with or without vPC.  More specifically, a dual-port CNA will actually have (2) logical FC adapters, each with its own port, and each port typically connected to a separate fabric.  The Ethernet logical adapters, however, are vPC aware and will treat the two paths as a single logical 20GE pipe for all the traffic they handle (all the non-FCoE stuff).

UPDATE: Diagram below added for visual aid.

2) VN-Tag: is the tag applied by a VMware machine, or can only a FEX (the I/O module in UCS, or a Nexus 2000 FEX) apply this tag?

VN-Tag is a modern-day version of a virtual cable that connects a virtual NIC, or virtual HBA, hosted on an NIV-capable adapter to an upstream virtual Ethernet or virtual FC port on an NIV-capable switch.  In the case where the server does not have an NIV-capable adapter, VN-Tag can also be used to connect a physical port on a fabric extender (FEX) to an upstream virtual Ethernet port.

In a nutshell, an NIV-capable adapter will apply the VN-Tag as traffic egresses from one of its virtual adapters.  Any FEX in the path will just pass that traffic upstream to the switch terminating the other end of the virtual cable (VN-Tag).  In this case you could think of the FEX as a virtual patch panel for virtual cables.

If you connect a plain non-NIV adapter to a fabric extender (FEX), it will be the FEX that applies the VN-Tag.  In this sense you still have a cable, but half of it is physical (server-to-FEX) and the other half is virtual (FEX-to-switch).  In this case, you could think of the FEX as a physical-to-virtual media converter.

3) In FIP, why do we need multiple MACs (FPMA)? I understand that FPMA relieves the switch from creating a mapping between FCID and MAC, but other than that, why does the standard talk about multiple FC_LEPs on a single port? I am assuming each LEP would need a separate MAC; I am having a hard time visualizing it in real life.

Similar to Question #1 above, it helps to understand that the server CNA is really hosting two (or more) different logical adapters: Ethernet and FC.  Each logical adapter will have its own link identity (MAC/WWN).  Given that the actual physical medium is Ethernet, the logical FC adapter can't use a WWN on the medium, so its FC_LEP (link end point) uses an Ethernet MAC instead.  As you point out, it's most efficient when the FCoE switch can automatically provide this MAC to the server, for administrative ease.  This is known as FPMA (Fabric Provided MAC Address).
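The FPMA itself is just a concatenation defined in FC-BB-5: a 24-bit FC-MAP prefix (default 0E:FC:00) in the upper half of the MAC, and the 24-bit FC_ID the fabric assigned at login in the lower half. A quick sketch:

```python
# Build a Fabric Provided MAC Address (FPMA) from the FC-MAP prefix and the
# FC_ID assigned by the fabric at login, per the FC-BB-5 FCoE standard.
def fpma(fc_map: int, fc_id: int) -> str:
    """fc_map and fc_id are 24-bit values; returns the 48-bit MAC as a string."""
    raw = (fc_map << 24) | fc_id
    octets = [(raw >> shift) & 0xFF for shift in range(40, -8, -8)]
    return ":".join(f"{o:02x}" for o in octets)

# An FC_ID of 01:02:03 under the default FC-MAP of 0E:FC:00
print(fpma(0x0EFC00, 0x010203))  # 0e:fc:00:01:02:03
```

Because the FC_ID is unique per fabric login, each FC_LEP on a port automatically gets its own unique MAC this way, with no switch-side mapping table required.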

The same concepts hold true for the upstream switch.  The FCoE switch is really two different switches hosted on one hardware platform: an Ethernet switch and an FC switch.  The FC logical switch needs to look at the FC frames carried within the FCoE packets to process fabric logins and make forwarding decisions.  In order to do that, it needs to receive the FCoE frames, decapsulate them, make a decision, and re-encapsulate into FCoE again if necessary.  The FC logical switch has an FC_LEP for this very reason: so that it can send and receive Ethernet frames carrying FC payload (FCoE).

If you could only read one book on FCoE to better understand these concepts, it would certainly be this one:

4) In a 2232PP FEX, why is the straight-through design preferred? Will FCoE break if we did an active/active design?
Please assist.

As we discussed in Question #1, the logical FC adapters on the server CNA are oblivious to vPC.  As a result, attaching a server CNA with vPC makes no difference to how FCoE is forwarded via two separate paths to two separate fabrics.  However, this is not the case with a FEX or a switch, which will forward the FCoE traffic on whatever Ethernet topology you place it on.  If this topology includes a vPC that spans two different fabrics, then you will have FCoE traffic from one of your logical FC adapters landing on both fabrics.  This makes it hard to determine where FCoE traffic is going, and it breaks the traditional FC best practice of SAN A/B isolation.  Although you certainly could do this, it's just not a supported design right now.

As a result, as of right now Cisco does not recommend that you place FCoE traffic on an Ethernet topology spanning two fabrics (A/B, Left/Right, etc.).  Therefore, if your Nexus 2232 FEX will be carrying FCoE traffic from CNA's, you should NOT vPC-attach the 2232 FEX to two different upstream Nexus 5000's.  Additionally, if your two upstream Nexus 5000's are connected together for vPC, you should NOT forward FCoE VLANs on the vPC peer link.  This will keep your FCoE forwarding deterministic and preserve two separate SAN fabrics.  You haven't lost any redundancy, because your servers are all dual-attached to separate 2232 FEXs, each attached to a separate Nexus 5000.
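As a sketch of that last point, here is what keeping the FCoE VLAN off the peer-link might look like in NX-OS-style configuration. The VLAN/VSAN numbers and the port-channel number are placeholder assumptions:

```
vlan 100
  fcoe vsan 100              ! FCoE VLAN mapped to VSAN 100 (fabric A side)

interface port-channel10     ! assumed to be the vPC peer-link
  switchport mode trunk
  switchport trunk allowed vlan remove 100
```

With the FCoE VLAN pruned from the peer-link, each Nexus 5000 forwards FCoE only within its own fabric, preserving SAN A/B isolation.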

In the end you have something that looks like this:


The above diagram was taken from the Data Center Access Design Guide, Chapter 6

Make sense?

Thanks for the great questions!  Keep’em coming :-)