The FOLLY in the HP vs Cisco UCS Tolly Group report on bandwidth

Filed in Cisco UCS, FUD on March 2, 2010 | 37 Comments

Folly: lack of good sense or normal prudence and foresight

Tolly Group: “Clients work with Tolly Group senior personnel to identify the chief marketing message desired”

HP: Client of Tolly Group with a desired marketing message of “Cisco UCS bandwidth sucks”, but in fact received an embarrassing Folly. (refund?)

By now you may have read or heard about the recent HP-funded Tolly Group report which attempts to position HP BladeSystem as superior to Cisco UCS for blade-to-blade bandwidth scalability in a single blade chassis. Unfortunately for HP, The Tolly Group, and you (who wasted your time reading this report), it contains an egregious FOLLY that renders it useless.

The report begins with a crucial and fatal misunderstanding about Cisco UCS:

Only one fabric extender module was used as the second is only used for fail-over.

WRONG! This is completely untrue. When two fabric extenders are installed in a Cisco UCS chassis they are both ACTIVE, and they provide redundancy. Each fabric extender provides 40 Gbps of I/O to the chassis, so with two active fabrics you have a total of 80 Gbps of active and usable I/O per chassis under normal conditions. If one fabric fails (or is completely missing, as in the Tolly tests), the other fabric provides non-disruptive I/O for all of the server vNICs that were using the failed fabric.

Because of this fatal misunderstanding, the HP-funded Tolly Group tests proceeded with the belief that a Cisco UCS chassis has only 40 Gbps of active I/O under normal operations. How could HP and the Tolly Group miss this simple fact? After all, the Cisco.com data sheet for the Cisco UCS fabric extender clearly states:

Typically configured in pairs for redundancy, two fabric extenders provide up to 80 Gbps of I/O to the chassis.

http://www.cisco.com/en/US/prod/collateral/ps10265/ps10278/data_sheet_c78-524729_ps10276_Products_Data_Sheet.html

Figure 1 below shows normal operations of Cisco UCS with 80 Gbps ACTIVE/ACTIVE redundant fabrics. Each blue line is 10GE.

Figure 1 - Cisco UCS with 80 Gbps ACTIVE/ACTIVE redundant fabrics

Figure 1 above shows the Cisco-recommended configuration for scaling UCS for maximum bandwidth. Servers 1 – 4 can have their vNIC associated with the Fabric A side, which has 40 Gbps of bandwidth, while Servers 5 – 8 can have their vNIC associated with the Fabric B side, which also has 40 Gbps. The vNIC on each server can also be configured to fail over to the other fabric in a failure condition. This failover happens non-disruptively to the OS; the OS never sees a link down event on the adapter. During the fabric failure condition, all (8) blades share the same 40 Gbps of bandwidth on the remaining fabric.

Figure 2 below shows how to select the active fabric for a UCS server vNIC and enable failover.

Figure 2 - Selecting the fabric for a vNIC with failover

Under normal operations each blade has a full, dedicated 10 Gbps of bandwidth. Any server can talk to any server at full 10GE line rate with ZERO oversubscription and ZERO shared bandwidth.

Under a fabric failure condition, each blade shares a 10GE link with another blade, resulting in 2:1 oversubscription.
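
To make the math concrete, here is a minimal Python sketch (illustrative only, not anything from UCS itself) that models the eight blades, the two 40 Gbps fabric extenders, and the recommended A/B split described above, and prints the per-blade bandwidth in both the normal and the failed-fabric cases.

    # Back-of-the-envelope model of the bandwidth math above. All numbers come
    # from the article: two fabric extenders (A and B) at 40 Gbps each, eight
    # blades, blades 1-4 active on Fabric A and blades 5-8 on Fabric B.
    FEX_GBPS = 40                                   # I/O per fabric extender
    blades = {b: ("A" if b <= 4 else "B") for b in range(1, 9)}

    def per_blade_gbps(failed_fabric=None):
        """Gbps available per blade, normally or with one fabric failed."""
        active = {}
        for blade, fabric in blades.items():
            if fabric == failed_fabric:             # non-disruptive failover
                fabric = "B" if fabric == "A" else "A"
            active[blade] = fabric
        return {blade: FEX_GBPS / sum(1 for f in active.values() if f == fabric)
                for blade, fabric in active.items()}

    print(per_blade_gbps())      # normal: 10.0 Gbps per blade (80 Gbps total)
    print(per_blade_gbps("B"))   # Fabric B failed: 5.0 Gbps per blade (2:1)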

The HP-funded Tolly Group tested Cisco UCS in a failed fabric condition, under the false premise of normal operations.

Figure 3 below shows the failed fabric condition as tested by HP and the Tolly Group.

Figure 3 - Cisco UCS with a failed fabric and 1/2 bandwidth

In the failed fabric condition shown above, (8) blades will share 40 Gbps. More specifically, in the HP Tolly Group tests that used 6 servers, Servers 1 & 5 share the same 10GE link, and Servers 2 & 6 share another 10GE link, on the Fabric A side.
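
For readers wondering why it is specifically Servers 1 & 5 and 2 & 6 that pair up, here is a small Python sketch of the fixed FEX pinning. The mapping assumes the commonly documented pattern of slot N pinning to uplink ((N - 1) mod number-of-links) + 1; check the UCS configuration guide for the exact table for your I/O module.

    # Illustrative sketch of fixed FEX pinning (assumed pattern, see above).
    def pinned_uplink(slot, links):
        """Fabric interconnect server port a blade slot is pinned to."""
        return ((slot - 1) % links) + 1

    def sharing_groups(links, populated_slots):
        """Group populated slots by the uplink they share on one fabric."""
        groups = {}
        for slot in populated_slots:
            groups.setdefault(pinned_uplink(slot, links), []).append(slot)
        return groups

    # Six blades in slots 1-6 with all four uplinks cabled, as in the Tolly test:
    print(sharing_groups(links=4, populated_slots=range(1, 7)))
    # -> {1: [1, 5], 2: [2, 6], 3: [3], 4: [4]}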

This is exactly how the Tolly Group tested Cisco UCS under the premise of showing “Bandwidth Scalability”, when in fact they did not provide the full available bandwidth to the Cisco UCS blades. However, the full available bandwidth was provided to the HP blades. Is that a fair test? No way, José!

What is even more interesting is that even with Cisco UCS tested in a failed fabric condition, it still outperformed HP in bandwidth tests using 4 servers:

Aggregate throughput of 4 Servers with HP in normal conditions: 35.83 Gbps

Aggregate throughput of 4 Servers with Cisco UCS under failed fabric conditions: 36.59 Gbps

Cisco UCS with (3) hops outperforms HP with only (1) hop — Ouch! That’s gotta be a tough one for the folks at HP to explain.

The major blow the HP Tolly Report tries to deliver is a test with 6 servers where HP almost doubles the performance of Cisco UCS. Again, this should not come as a surprise to anybody, because Cisco UCS was tested in a failed condition while HP was tested under normal conditions:

Aggregate throughput of 6 servers with HP in normal conditions: 53.65 Gbps

Aggregate throughput of 6 servers with Cisco UCS under failed fabric conditions: 26.28 Gbps

Cisco UCS with (3) hops and half its fabric missing performs at half the speed of HP with (1) hop and a full fabric. Why is that a shocker?

What would have happened if the Tolly Group had actually run a fair 6-server test between HP and Cisco? Is that something the Tolly Group should figure out? After all, the Tolly Group has what it describes as a Fair Testing Charter that states:

With competitive benchmarks, The Tolly Group strives to ensure that all participants are [tested] fairly

http://www.tolly.com/FTC.aspx

That sure sounds nice; I wonder if it actually means anything? Only the Tolly Group can tell us for sure.

Furthermore, I wonder if HP will continue to mislead the public with this unfair testing. Or will HP do the right thing and insist the Tolly Group re-test under fair, apples-to-apples conditions?

At this point the ball is in their court to either disappoint or impress.

###

Disclaimer: The views and opinions are solely those of the author as a private individual and do not necessarily represent those of the author’s employer (Cisco Systems). The author is not an official spokesperson for Cisco Systems, Inc.

About the Author

Brad Hedlund (CCIE Emeritus #5530) is an Engineering Architect in the CTO office of VMware’s Networking and Security Business Unit (NSBU). Brad’s background in data center networking begins in the mid-1990s with a variety of experience in roles such as IT customer, value added reseller, and vendor, including Cisco and Dell. Brad also writes at the VMware corporate networking virtualization blog at blogs.vmware.com/networkvirtualization

Comments (37)


  1. Pete says:

    Thank you Brad. We can always count on you to set the record straight, be it FUD from HP or Egenera.

  2. Tom Howarth says:

    Brad, Interesting synopsis of the sorry affair. Tolly really need to get their act together here; bad testing strategy is like bad science.

  3. Duncan says:

    Excellent article, and it definitely places tests like these in a different perspective. Thanks,

  4. Aaron Delp says:

    Hey Brad – Great Article! So, in this case, only one vNIC would be presented to the VMware vSwitch and failover would happen at the UCS level instead of the VMware level?

    I’m going to test all this out tomorrow to get a better understanding.

    Thanks again!

    • Brad Hedlund says:

      Aaron,
      That is correct. UCS abstracts the server’s adapter from the physical NIC port and backplane traces, hence the term “vNIC”. Therefore, it would be possible to create a design where there is only (1) adapter per VMware Host and let UCS fabric failover hide any path failures from the ESX kernel. However, our current recommendation is to NOT use this vNIC fabric failover setting when connecting the vNIC to a hypervisor switch such as vSwitch or Nexus 1000V. The vNIC fabric failover setting is recommended for VMware designs using hypervisor passthrough switching with the M81KR VIC adapter (Palo). More details on why in a future post perhaps.

      Cheers,
      Brad

  5. Stephen says:

    Hi Brad, perhaps you can clarify for me. Each adapter has 2 ports, so the first 4 blades use port 1 and the 2nd 4 blades use port 2. In this configuration I need two fabric extenders and two UCS 6100s to avoid oversubscription for blade-to-blade traffic. What if my workload requires that I use both ports active at the same time on a given blade for a total of 20Gb of I/O to each blade?

    • Brad Hedlund says:

      Stephen,
      Having (2) fabric extenders and (2) 6100s more importantly provides high availability. If you *need* 20 Gbps I/O per blade you can do that in a UCS chassis populated with (8) servers at a 2:1 oversubscription ratio. If you *need* 20 Gbps I/O, and you *need* ZERO oversubscription, you can do that in a UCS chassis populated with (4) servers. However, customers with those kinds of strict I/O requirements typically use rack mount servers.

  6. Ken Henault says:

    Brad, maybe your UCS customers that need 20Gbps go to rack mount servers, but HP BladeSystem customers can easily get 60Gbps in a half height blade.

    • Brad Hedlund says:

      Ken,
      You would have to tell the 60 Gbps customer they can only populate (8) servers in their (16) slot chassis to get ZERO oversubscription. After running the math for blades vs. rack mounts, which one do you think this customer would choose more often?

  7. Sri says:

    I read this paper; it actually exposes bottlenecks with the FEX “pinning” feature. I guess it’s your job to discredit anything against UCS; I know you get paid for that :)

    Here is my analysis on it:
    This surprised me. As I understand it, Cisco screwed up big time on this pinning feature!!! They have fixed configurations depending on how many uplinks are used to connect the FEX to the 6100 FI.

    From UCS Manager GUI config guide:
    Pinning Server Traffic to Server Ports
    All server traffic travels through the I/O module to server ports on the fabric interconnect. The number of links for which the chassis is configured determines how this traffic is pinned.
    The pinning determines which server traffic goes to which server port on the fabric interconnect. This pinning is fixed. You cannot modify it. As a result, you must consider the server location when you determine the appropriate allocation of bandwidth for a chassis.
    You must review the allocation of ports to links before you allocate servers to slots. The cabled ports are not necessarily port 1 and port 2 on the I/O module. If you change the number of links between the fabric interconnect and the I/O module, you must reacknowledge the chassis to have the traffic rerouted.

    For example: if you use 2 uplinks today and tomorrow you decide to use 2 more, then to optimize bw use across servers you have to physically shuffle servers so that two busy servers don’t pair up. At least move VMs around for bw optimization!!

    So you can have two servers per uplink, leading to 2:1 oversubscription if all four links are used, but the traffic flows are fixed. E.g., traffic from Servers 1 and 5 shares uplink 1. You don’t have a choice to change that; your only choice is, if Servers A and B have higher bw requirements, not to put them together in slots 1 & 5, 2 & 6, 3 & 7, or 4 & 8, to avoid BW starvation. So you may want to pair Server A with Server C because Server C needs little bw.

    This paper exposes this fundamental design problem and highlights it as limited BW aggregation capability. In their enthusiasm to do so, they forgot that Cisco uses two FEX modules and both can be active at the same time. So effectively the aggregate uplink bw will be 9.1 x 2 = 18.2 Gbps per two servers if all uplinks are used.

    So from a server-pair point of view, if each server has a 2x10Gig CNA, then 40Gig of downlink traffic shares 20Gig of uplink bw. That means 2:1 oversubscription….

    If scaled to 320 servers as Cisco UCS claims, then the oversubscription will be 8:1; in other words, if a customer has apps running on these blades that need high bw, then the scalability story runs short quickly..

    I think Cisco “Pinned” themselves wrong!!

    Cheers
    Sri

    I wonder if you are going to publish this :) I will see if you are up to my challenge….

    • Brad Hedlund says:

      Sri,
      Here’s some blogging advice: If you’re going to point out someone’s financial motivations you better be prepared to disclose your own.

      I see you work for Dell. So, yes, while you can say I get paid to refute and defend against Cisco UCS FUD, by that same token, you get paid to create and distribute the Cisco UCS FUD. Hence our conversation here. Touché.

      I think you’re completely missing the big picture. As someone who’s focused on data centers you’re probably aware that bandwidth scalability is not the primary problem facing data centers today. That’s what’s so silly about this HP Tolly Report. It distracts from the real problem facing data centers, and that’s management scalability. The largest cost of a data center is management and operational expenses, and the fastest growing cost is power and cooling. Bandwidth engineering is somewhere at the bottom of the list of things a data center manager or CIO worries about on a daily basis.

      Your complaints about the FEX pinning in Cisco UCS being fixed, predetermined, and un-configurable are exactly its strengths in addressing the real problem of management complexity.

      Here’s how many steps it takes to get a Cisco UCS chassis added to the data center:
      1) connect the power cables
      2) connect the ethernet cables
      3) acknowledge the chassis in UCSM
      –DONE–

      How many configuration steps are required to add a Dell chassis to the data center? That’s my challenge to you.

      Cheers,
      Brad

  8. Dylan says:

    Brad,

    How many steps are required to install the FIRST UCS chassis into a datacenter?

    What do you do with workloads that cannot be virtualized?

    Dylan

    • Brad Hedlund says:

      Dylan,

      After the initial setup of the UCS Fabric Interconnect you add your first and subsequent chassis using the 3 simple steps. Compare that to other systems where you have to set up each new blade chassis as a unique and independent entity with duplicative management.

      Read the Cisco UCS Fabric Interconnect setup guide here: http://bit.ly/9SAvdm

      For workloads that cannot be virtualized you would simply run those on a Cisco UCS blade as well. You can run VMware, Citrix, Hyper-V, Windows, Solaris x86, and Linux operating systems on UCS blade servers.

      Cheers,
      Brad

  9. Adam says:

    (Disclamer I do work for HP)

    Hi Brad,

    So there seems to be a lot of discussion at the moment around how UCS and Flex-10 work with regard to networking and in particular oversubscription. I’m going to be honest, I have been having these conversations with customers for the past few years now and it all comes down to politics (without meaning to degrade your good self, Cisco are almost as bad as IBM when it comes to FUDing something that they perceive to be a “threat” to their install base, and I appreciate that this report could be taken in the same way, but please remember all the comments you have made against Flex-10 in the past as well, before this report came out).

    However, everyone seems to be forgetting that the real reason these systems are bought is to run workloads (we’ll ignore the fact that the only thing you can run at the moment is virtualised workloads (and only in VMware) ;-) even though most of my customers run mixed physical and virtual workloads in a chassis/datacentre, not to mention UNIX and a whole host of other random systems that you will always find in an enterprise DC).

    Unfortunately, when considering this you need to take into account servers, storage, network, application, business needs/wants/risks and a whole host of other aspects; the network actually plays a relatively small part (as do all of the other areas, in relation).

    UCS appears to be very targeted towards the network and solving the “pain” points associated with that; sorry, but that’s not the real problem customers have (with all due respect, I wouldn’t expect any less from a network company). Cisco have spent a lot of time telling us that this is how enterprise computing should be (and that IBM, HP and Dell (not to mention SUN/Oracle and a whole host of other “long timers”) have been getting it wrong for the past 30-odd years or so!)

    To illustrate my point let me take your above post around management scalability being the real DC issue. If you speak to most customers (I’m talking architects, not sys admins here) the problem is not as simple as “how many IPs are consumed” and a single pane of glass; it’s actually much more complicated and takes into account such things as business process and orchestration/workflow tools to address the real pain points around time to market (as in how quickly can I get a new service up and running; that’s not only infrastructure (servers/storage/networking) but apps, people, processes, blah blah blah).

    We can argue till the cows come home about who’s got the better system; to be honest, most people couldn’t care less (as you pointed out above!). What they care about is the big picture, and as far as I can see UCS does not address/solve most of that.

    Thanks,

    Adam

    • Brad Hedlund says:

      Adam,

      Thanks for joining the conversation. You said:

      … we’ll ignore the fact that the only thing you can run at the moment is virtualised workloads (and only in VMware)

      Well, I won’t ignore the fact that you need to do your homework on Cisco UCS because you are dead wrong here. Our customers can run VMware, Citrix, Hyper-V, Windows, Solaris x86, and Linux workloads on UCS.

      the problem is not as simple as “how many IPs are consumed”

      If you think the management capabilities of UCS are simply about reducing the IP addresses to keep track of, again, you need to do some homework. You mentioned turning up new services quickly; let’s work with that scenario…

      Here’s one simple example of many: Imagine a new application to be deployed that requires a new VLAN and a firmware update pushed to the adapters on the perhaps hundreds of servers that will be supporting that application. The Cisco UCS customer simply logs into the Fabric Interconnect and with a few clicks of the mouse updates a single Service Profile template with the new VLAN settings and firmware bundle. -DONE- The system goes out to the hundreds of servers driven by that template and updates the firmware and VLAN settings. Now that’s time to market.

      This time it was VLANs and firmware; the next time it might be QoS or BIOS settings. A process that would otherwise take weeks, several different management platforms, and a team of people is all done in a few minutes after a few clicks of the mouse. All of this with the out-of-the-box capabilities in UCS Manager. This process can also be driven via the XML API integration with 3rd-party automation players such as BMC BladeLogic or EMC Ionix, for example.
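
      Purely as an illustration of that template-driven model (this is NOT the UCS Manager XML API; every class and method name below is invented), the idea boils down to something like this:

      # Conceptual sketch only: not UCS Manager code, all names are made up.
      class ServiceProfileTemplate:
          def __init__(self, name, vlans, firmware):
              self.name = name
              self.vlans = set(vlans)
              self.firmware = firmware
              self.bound_servers = []          # servers driven by this template

          def bind(self, server_name):
              self.bound_servers.append(server_name)

          def update(self, add_vlan=None, firmware=None):
              """One edit here is applied to every bound server."""
              if add_vlan is not None:
                  self.vlans.add(add_vlan)
              if firmware is not None:
                  self.firmware = firmware
              return [f"{s}: VLANs {sorted(self.vlans)}, firmware {self.firmware}"
                      for s in self.bound_servers]

      template = ServiceProfileTemplate("app-hosts", vlans=[100], firmware="bundle-A")
      for i in range(1, 201):                  # hundreds of servers, one template
          template.bind(f"blade-{i:03d}")
      changes = template.update(add_vlan=200, firmware="bundle-B")
      print(len(changes), "servers updated, e.g.", changes[0])

      The point is simply that the change is made once, in one place, and every server bound to the template picks it up.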

      I could go on with 10 other similar examples, but you get the idea, right?

      Do you still think UCS is just solving “pain points” only in the network?

      Cheers,
      Brad

    • Jim H says:

      Adam,
      You are way off here. I work for a company that designs ULL datacenter and HFT solutions for financial firms. With that said, we are completely vendor neutral. And your management scenario is way off. I do agree that companies look at other things before the nuts and bolts of the actual technology. However, that argument is exactly where HP and IBM fall short: for example, the number of resources required to support HP SIM / Opsware, Flex managers, etc. is “big”. With Cisco it is UCS Manager, one view, and done, i.e. “less”… The same rings true for IBM with Tivoli… I hate to burst your bubble, but some of your largest financial customers are seriously looking at UCS or have already decided to forklift; again, not only due to technology but because of the limitations of HP’s white glove service, the inability to do timely firmware upgrades, and the headcount needed to support the management console(s). This is what I heard from HP and IBM customers directly, from their own mouths. A large hospital management company also told me they refit their 800 servers in a weekend with UCS. Cool stuff!

  10. Sri says:

    Brad,
    I guess you didn’t like my answer to your challenge. Didn’t dare to post it :)

    Ok let me throw another challenge:

    What happens if a company needs more than 320 physical servers, say 321? Do they have to run another instance of UCSM and manage that separately…

    I guess Cisco is working hard to spin the scalability story.

    Dell doesn’t have that limitation on management scalability.

    Even HP and IBM have a better story than UCS…
    Cheers again!
    Sri

    • Brad Hedlund says:

      Sri,
      I noticed you had also published your last submitted comment to me on your blog as well. So rather than duplicating it here I had planned on visiting your blog to continue that discussion.

      What happens if a company needs more than 320 physical servers, say 321? Do they have to run another instance of UCSM and manage that separately…

      Let’s say in the case of HP or Dell a blade chassis holds 16 servers. So for 320 servers that would be 20 chassis. Each of the 20 chassis has Ethernet, Fibre Channel, and iLO management, so that’s 3 points of management per chassis. 20 chassis with 3 points of management is 60 management points for 320 servers.

      With Cisco UCS there are no individual per-chassis management points; you only need to manage the single Fabric Interconnect. Furthermore, the Ethernet, FC, and iLO are all managed in a single pane of glass … 1 management point. So for 320 servers that’s 1 point of management in Cisco UCS, and 60 points of management for HP/IBM/Dell.

      Ok, 321 servers? That would be 2 points of management in Cisco UCS, and 63 for HP/IBM/Dell.

      Why stop at 321? How about 640 servers? 2 points of management in Cisco UCS, 120 with HP or DELL.

      You get the idea.
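
      (For anyone who wants to check the arithmetic, here is a quick Python sketch using the same assumptions as above: 16 servers per chassis and 3 management points per chassis in the per-chassis model, and one management domain per 320 servers with UCS.)

      import math

      # Management-point arithmetic from above, using this thread's numbers.
      def per_chassis_points(servers, servers_per_chassis=16, points_per_chassis=3):
          return math.ceil(servers / servers_per_chassis) * points_per_chassis

      def ucs_points(servers, servers_per_domain=320):
          return math.ceil(servers / servers_per_domain)

      for n in (320, 321, 640):
          print(n, "servers:", ucs_points(n), "(UCS) vs", per_chassis_points(n))
      # 320: 1 vs 60   321: 2 vs 63   640: 2 vs 120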

      Cheers,
      Brad

  11. Sri says:

    Brad,
    Are you saying customers adopt Cisco UCS and they end up with 1 management point? I wish that were for real, man!

    Looks like you totally forgot the data center reality! That they have existing infrastructure and management tools. That they have Dell servers (or other vendor servers; really sorry to say, no Cisco servers!)… and relevant management tools in place. That they have trusted with their business all these years. So get this, man, to your surprise! The reality is that by adding UCS, they are adding more management points, even if it is just one :)

    So you need to get real first, and understand that with the management infrastructure that is already in place in every data center with Dell servers, it is just adding server HW. That is one step… pretty simple, and it provides the one major thing customers really care about: “investment protection”. In addition, ease of deployment, etc, etc….

    So, don’t you agree that with UCS you are in fact just adding more complexity to existing data centers, instead of simplifying? Now the customer has to worry about managing two different server platforms…

    I will definitely agree that you have some value proposition if you are building a ground-up new data center for a company that has just started up… otherwise it’s just a “nuke and pave” proposition… and I bet customers don’t agree with it… even if Cisco is giving away the UCS chassis for free..

    Don’t make me post this to my blog by not posting it to yours. Be fair… can you?
    Cheers
    Sri

    • Emre SUMENGEN says:

      I am sorry, but I can’t hold it. This very claim is _ridiculous_ to say the least.

      Assuming adding (320) new servers (as blades) to an existing datacenter…

      With Cisco UCS, you only add 1 additional management point.
      With any other vendor, you ALWAYS add more, unless it is the very brand you already have. Even in that case, you usually add more server islands, server switches, SAN switches, etc. And all these need to be managed, usually independently.

      So, do you suggest customers SHOULD stick to the brand, just because they already have some legacy investment in their products? This really sounds like running away from competition.

      Disclaimer: I work for an integrator company as a Pre-Sales SE, which does mainly Cisco, but also Brocade, Juniper, McAfee, Tandberg (at least we “did” :P), Tellabs, Teldat etc. with respect to customer’s expectations and needs.

  12. Ken says:

    Great to see such passionate debate. It looks like the innovations in Cisco UCS have really caused a stir amongst the establishment.

  13. Really? says:

    All this discussion and no mention of UCS and its ability to use (cheaper) low-density RAM (at no loss of performance), and lots of it… that seems just as important as the other features when you are talking about a virtual environment (that typically needs a lot of RAM).
    It depends on the situation, but is this not a valid reason to migrate to UCS considering the (probable) cost savings?

    The service profiles also look like huge time savers (once configured)…

    And to Sri’s point on management… I do believe that UCS can integrate with BMC, Tivoli, OpenView, and System Center… but I just started looking at UCS as a replacement for an HP infrastructure a few days ago… so I could very well be mistaken.

  14. Craig says:

    BladeLogic from BMC has been integrated with UCS Manager since the early days, when they launched the UCS products.

  15. Vik says:

    Hi Brad,

    Could you please post the link for where you got the icon for the UCS 6100-A labelled in this article? I have searched through the Visio icon library and cannot find the exact icon. Are these made in Visio? thx

  16. Jason says:

    It really shocks me how many people comment on something as ridiculous as “UCS can only run VMware”… Do people really start talking before they actually have a clue as to what they are talking about??

    The other thing that I really don’t understand is how people can argue with the innovation of UCS. The simplicity is very easy to prove, and I would challenge all of you who work for Dell, HP, and IBM, as well as customers like myself who have the ability, to do the following:

    Use a single disk array, attached to your SAN switch of choice (yes, you can run multi-vendor environments with UCS – a shocker to the FUD writers, I know….)

    Build 4 Dell Chassis
    Build 4 HP Chassis
    Build 4 IBM Chassis
    Build 8 Cisco UCS Chassis

    Now just compare the time it took you to configure each of those chassis (yes, just the chassis). If you want to blow your mind, install some SLES servers and some Windows 2008 servers (again a shocker to the FUD writers, but YES you can install Linux, Windows, and just about any OS on UCS; IT IS COMMODITY HARDWARE, PEOPLE) and see how quickly those are up and able to communicate with everything else in the data center or lab or whatever you want.

  17. Jeremy Foster says:

    Sri,
    First off, I work on the UCS team at Cisco; let’s make that clear.
    Second, I think you are missing the point of UCS entirely. This whole management point argument is just not one that really makes sense for you to make. UCS Manager is like a mega blade chassis that holds 320 servers under a single management point, including both LAN and SAN connectivity to any server at any time. UCS Manager is simply a remote control for your TV. The open XML API can be leveraged by the existing tools in the datacenter (BMC, Ionix, Tivoli, SCOM, etc…). The existing tools are the ‘universal remote’ to control the data center home theatre.

    Every other manufacturer scales on a chassis basis (16 or 14 servers). We scale on a datacenter, or mega-chassis, 320-server basis. If you want to incorporate the 321st server then most likely your existing tools already have plug-ins to make them ‘UCS aware’ and they can scale across multiple UCS PODS. Cisco’s approach is about choice.

    Here is a ‘challenge’: go set up a UCS system. Outside of giving the managers IP addresses, out of the box you can control everything else directly from BMC BladeLogic or Ionix or Tivoli or SCOM. Cisco is not in the OS provisioning business. Cisco is not in the orchestration business. The value add of UCS is that the UCS Manager XML API provides the ‘missing link’ between existing customer management stacks and the internal cloud.

    Cisco UCS customer can do more WITH THEIR EXISTING MANAGEMENT TOOLS by using UCS.

    I really encourage you to set up a UCS because it really is something you appreciate once you do. If not, that is just as well; I really like all the FUD that is thrown at UCS because it is so off base it really helps us out. Also, glad to see Dell saved Scalent from bankruptcy, and I look forward to competing with Dell for customers’ compute business.

  18. Dr_V says:

    We are currently undergoing a proof of concept of a 2 chassis UCS configuration. I’m not impressed. While the UCS manager appears functional and well organized, it underscores the sheer complexity and cumbersome configuration effort required to leverage UCS.

    Firmware updating is buggy and unreliable. We also had a bad interconnect, a proprietary and expensive device essential to UCS.

    The fixed I/O pinning issue is the biggest negative of the proprietary UCS platform. While Cisco and its zealots proudly boast about 80Gb of throughput per chassis, this number doesn’t hold up because of the static I/O pinning. I also find it misleading that Cisco advertises a per-blade I/O maximum that can never be achieved because of the chassis limitations.

    Cisco acknowledges the I/O pinning and its consequences via a warning about server placement in its UCS configuration guide. UCS abuses the term “oversubscription”.

    Installation and configuration of a ucs platform is cumbersome and complex. Cisco makes traditional server resource configuration more complex than it ever should be to compensate for the lack of physical redundancy and options.

    I work in an environment where performance and hardware redundancy are paramount for virtualization. While Cisco provides a compelling platform, it falls well short of the overall value that industry leaders IBM, Dell and HP provide.

    In a nutshell, Cisco UCS is currently: proprietary, complex to install and fully configure, prohibitively expensive, and not suitable for high I/O applications or virtualization needs.

    I can stand up 16 traditional servers with 10Gb faster than a UCS 2-chassis deployment (with interconnects), easily.

    • Brad Hedlund says:

      Thanks for the “comment” (rant).
      You sound like a competing vendor rattling off your list of FUD talking points, or at best a customer (as you claim) with an agenda against Cisco.
      Either way, I hope you’ll come back when you’re ready to have a real and genuine substantive discussion.

      Cheers,
      Brad

    • Jason says:

      As a customer that has set up multiple chassis environments (HP, IBM and Cisco) I can say that I don’t believe that you have ever set up all four of the environments that you claim in this.

      Time to deploy doesn’t compare … your complexity argument is just complete ignorance

      Recently setup 2 new systems: Time To Complete 4 hours (UCS System, Core Network Connectivity, Core SAN Connectivity)

      Have added 6 chassis to each system: time to complete each chassis, 15 min, and 95% of that time is unpacking the chassis and taking it out to the data center floor, 3% racking, and 2% logging into UCSM and then clicking to make the server ports.

      Just the other day I had to add another network switch and SAN switch to an HP c7000 (install, configure and cable); this took longer than adding all 12 chassis (minus unpack time) and then installing ESXi on 6 servers….

      You can keep your horse and buggy …. I like my new Under-carriage Combustion System

      Disclaimer … I am an enterprise customer that likes to get work done and not make life more difficult than it needs to be.

  19. Dr_V says:

    Oversubscription is easily one of Cisco’s most abused words in regards to the UCS hyperbole. Because the chassis itself is incapable of dynamic I/O scheduling across blades, I find it hard for any blade vendor to be able to claim oversubscription.

    While the Tolly review clearly has some flaws in it, the review does cast light on the fact that the UCS platform’s I/O capabilities, even if ‘properly’ configured, are severely limited. If the chassis is only capable of 80Gb of I/O, then your blades, best case, under full load, cannot truly leverage over 10Gb per blade.

    And I won’t even get started on the huge costs and extremely proprietary nature of the UCS platform.

    • Brad Hedlund says:

      Here we go again with the Frankenstein bandwidth scenarios. Every server in the chassis sending/receiving more than 10G all at the same time? Fine, let’s have that discussion.

      OK, if you did actually have a niche application requirement like that you CAN in fact do that in Cisco UCS, by simply populating each chassis with (4) compute blades. All blades in the chassis could send/receive 20 Gbps all at the same time. From there, you can scale that kind of bandwidth beyond a single chassis into a larger pod of (10) chassis holding (40) servers — all (40) servers capable of sending/receiving 20G of bandwidth at the same time to any other server in any other chassis, with no over-subscription.

      How would you do that with HP/IBM/Dell chassis without skyrocketing the networking costs? Let’s take HP … You would need to populate each c7000 chassis with no more than (8) servers, and each chassis would need (2) Flex-10 switch modules with all (8) 10GE uplinks connected. Each c7000 chassis with fans and power supplies is ~$8,700 list price. Each Flex-10 switch module is ~$12,000 list price. To manage all those Flex-10 modules you’ll also need the Virtual Connect Enterprise Manager license at $5,000 per chassis.

      With Cisco UCS, on the other hand, you don’t need per-chassis management license costs, and you don’t need the expensive 10GE switch modules in each chassis.

      HP Solution (MSRP):
      (5) c7000 chassis = $43,500
      (5) VCEM licenses = $25,000
      (10) Flex-10 modules = $120,000
      (2) 48-port non-blocking 10GE switches (Arista 7148) = $50,000
      HP Total = $238,500

      Cisco UCS Solution (MSRP):
      (10) chassis = $37,000
      (20) fabric extenders = $40,000
      (2) 40-port fully licensed fabric interconnects = $128,000
      Cisco UCS Total = $205,000

      Result: Cisco UCS is actually over $30,000 less and provides a whole lot more functionality (stateless computing):
      http://www.mseanmcgee.com/2010/04/the-state-of-statelessness-cisco-ucs-vs-hp-virtual-connect/
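
      (If you want to check the totals, here is the same bill-of-materials arithmetic in a few lines of Python; the unit prices are simply the circa-2010 list-price figures quoted above, divided out from the quoted subtotals.)

      # List-price arithmetic from the comparison above; (quantity, unit price).
      hp = {
          "c7000 chassis": (5, 8_700),
          "VCEM license": (5, 5_000),
          "Flex-10 module": (10, 12_000),
          "48-port 10GE switch": (2, 25_000),
      }
      ucs = {
          "UCS chassis": (10, 3_700),
          "fabric extender": (20, 2_000),
          "40-port fabric interconnect": (2, 64_000),
      }

      def total(bom):
          return sum(qty * price for qty, price in bom.values())

      print("HP total:  $", total(hp))                   # 238,500
      print("UCS total: $", total(ucs))                  # 205,000
      print("UCS is $", total(hp) - total(ucs), "less")  # 33,500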

      Help me understand what you see as Cisco UCS “huge costs” — because I must be missing something here.
      The reality is, in your Frankenstein bandwidth scenario, the costs only work favorably for HP/IBM/Dell if you would actually implement a single chassis with no upstream networking, and no customer is going to realistically implement that.

      Cheers,
      Brad

  20. Dave says:

    In the so called Frankenstein IO examples proposed, I think a little thing called ‘reality’ is being lost here.

    In a UCS chassis with two IOMs and 4 links each, you have 80Gb/s (shared by 8 blades) resulting in 1:1 oversubscription.

    In an HP c7000 with two Flex-10s, you have 80Gb/s (shared by 16 blades) resulting in 2:1 oversubscription.

    Both of those examples are cheesy at best. They quote raw numbers without considering operation in a failure.

    To use a bare metal OS example on the UCS chassis, you could set half the blades to be fabric A active, fabric B failover. The other 4 blades could be configured fabric B active, fabric A failover. In this scenario, you _could_ run full-tilt at 80Gb/s. I suspect HP has some similar capability.

    It kind of misses the point in a production data center though. If, in the split-fabric scenario above, you exceeded an average throughput of greater than 20Gb/s on both fabrics, the loss of a fabric interconnect (or Flex-10) would mean demand greater than 40Gb/s on the remaining active fabric and packets being dropped on the floor. Neither Cisco, HP, IBM nor Dell can defy physics. Any sane designer takes this into consideration.
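
    To put that design rule in one line: the surviving fabric has to be able to absorb the steady-state load of both fabrics. A tiny sketch, using the 40Gb/s per-fabric, per-chassis figure used throughout this thread:

    # Design-rule check: after a fabric failure, one fabric carries both loads.
    FABRIC_CAPACITY_GBPS = 40      # per fabric, per chassis (as discussed above)

    def survives_fabric_failure(load_a, load_b, capacity=FABRIC_CAPACITY_GBPS):
        """True if the surviving fabric can absorb both fabrics' load (Gb/s)."""
        return (load_a + load_b) <= capacity

    print(survives_fabric_failure(18, 18))   # True: 36 Gb/s fits on one fabric
    print(survives_fabric_failure(25, 25))   # False: 50 Gb/s means dropped packets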

    Cisco built a better mousetrap. Even if you haven’t drunk the UCS Kool-Aid yet, I suspect you grudgingly will in the coming 2-5 years. I can’t wait to see HP or IBM validate the UCS concept by introducing similar functionality.

    All truth passes through three stages:
    First, it is ridiculed; Second, it is violently opposed; and Third, it is accepted as self-evident.

    – Arthur Schopenhauer, 1788-1860

  21. Billy Contraras says:

    To all of the above commenters: blogs can be a lot easier to get through if you keep your feedback simple and to the point. No one likes to read giant comments when the idea can be conveyed with a shorter remark.
