The FOLLY in the HP vs Cisco UCS Tolly Group report on bandwidth

Folly: lack of good sense or normal prudence and foresight

Tolly Group: “Clients work with Tolly Group senior personnel to identify the chief marketing message desired”

HP: Client of Tolly Group with a desired marketing message of “Cisco UCS bandwidth sucks”, but in fact received an embarrassing Folly. (refund?)

By now you may have read or heard about the recent HP-funded Tolly Group report, which attempts to position HP BladeSystem as superior to Cisco UCS for blade-to-blade bandwidth scalability in a single blade chassis. Unfortunately for HP, The Tolly Group, and You (who wasted your time reading this report), it contains an egregious FOLLY that effectively makes it a useless waste of time.

The report begins with a crucial and fatal misunderstanding about Cisco UCS:

Only one fabric extender module was used as the second is only used for fail-over.

WRONG! This is completely untrue. When two fabric extenders are installed in a Cisco UCS chassis they are both ACTIVE, and they also provide redundancy for each other. Each fabric extender provides 40 Gbps of I/O to the chassis, so with two active fabrics you have a total of 80 Gbps of active and usable I/O per chassis under normal conditions. In the event one of the fabrics fails (or is completely missing, as in the Tolly tests), the other fabric will provide non-disruptive I/O for all of the Server vNICs that were using the failed fabric.

Because of this fatal misunderstanding, the HP Tolly Group tests proceeded with the belief that a Cisco UCS chassis only has 40 Gbps of active I/O under normal operations. How could HP and Tolly Group miss this simple fact? After all, the data sheet for the Cisco UCS fabric extender clearly states:

Typically configured in pairs for redundancy, two fabric extenders provide up to 80 Gbps of I/O to the chassis.

Figure 1 below shows normal operations of Cisco UCS with 80 Gbps ACTIVE/ACTIVE redundant fabrics. Each blue line is 10GE.

Figure 1 - Cisco UCS with 80 Gbps ACTIVE/ACTIVE redundant fabrics

Figure 1 above shows the Cisco-recommended configuration for scaling UCS for maximum bandwidth. Servers 1 – 4 can have their vNIC associated to the Fabric A side with 40 Gbps of bandwidth, while Servers 5 – 8 can have their vNIC associated to the Fabric B side, which also has 40 Gbps. The vNIC on each Server can also be configured to fail over to the other fabric in a failure condition. This failover happens non-disruptively to the OS: the OS never sees a link-down event on the Adapter. During the fabric failure condition, all (8) blades will share the same 40 Gbps of bandwidth on the remaining fabric.

Figure 2 below shows how to select the active fabric for a UCS server vNIC and enable failover

Figure 2 - Selecting the fabric for a vNIC with failover

Under normal operations each blade has full dedicated 10 Gbps of bandwidth. Any server can talk to any server at full line rate 10GE with ZERO oversubscription, ZERO shared bandwidth.

Under a fabric failure condition, each blade shares 10GE with another, resulting in a 2:1 oversubscription.
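
The arithmetic behind those two statements is easy to check. Here is a minimal sketch in Python, assuming 8 blades that can each drive one 10GE adapter at line rate and 4 x 10GE links per fabric extender (the figures used in this post). The function name and constants are only for illustration.

```python
from fractions import Fraction

LINK_GBPS = 10          # each blue line in Figure 1 is 10GE
LINKS_PER_FABRIC = 4    # 4 x 10GE = 40 Gbps per fabric extender
BLADES = 8              # assumed: a fully populated chassis

def oversubscription(active_fabrics: int) -> Fraction:
    """Ratio of total blade demand to available fabric extender capacity."""
    demand = BLADES * LINK_GBPS
    capacity = active_fabrics * LINKS_PER_FABRIC * LINK_GBPS
    return Fraction(demand, capacity)

print(oversubscription(2))  # both fabrics active: 1, i.e. 1:1
print(oversubscription(1))  # one fabric failed or missing: 2, i.e. 2:1
```

With both fabrics present the ratio is 1:1. Remove one fabric and the same 8 blades contend for 40 Gbps, which is the 2:1 case.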

The HP-funded Tolly Group report tested Cisco UCS in a failed fabric condition, under the false premise of normal operations.

Figure 3 below shows the failed fabric condition as tested by HP and Tolly Group

Figure 3 - Cisco UCS with a failed fabric and 1/2 bandwidth

In the failed fabric condition shown above, (8) blades will share 40 Gbps. More specifically with the HP Tolly Group tests that used 6 servers, Servers 1 & 5 will share the same 10GE link, and Servers 2 & 6 will also share the same 10GE link on the Fabric A side.
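
The sharing pattern can be sketched as a simple mapping. This assumes each server slot is pinned to one of the four 10GE fabric extender links in round-robin order (slots 1 and 5 to link 1, slots 2 and 6 to link 2, and so on), which matches the pairs named above. The helper names are hypothetical and not part of any Cisco tool.

```python
from collections import defaultdict

LINKS_PER_FABRIC = 4  # 4 x 10GE links on the single remaining fabric

def link_for_slot(slot: int) -> int:
    """Assumed static pinning: slots 1-8 map round-robin onto links 1-4."""
    return (slot - 1) % LINKS_PER_FABRIC + 1

def servers_per_link(slots):
    """Group server slots by the 10GE link they are pinned to."""
    shared = defaultdict(list)
    for slot in slots:
        shared[link_for_slot(slot)].append(slot)
    return dict(shared)

# The 6-server test with only Fabric A present
print(servers_per_link(range(1, 7)))
# {1: [1, 5], 2: [2, 6], 3: [3], 4: [4]}
```

Two of the four links carry two servers each, so four of the six servers in the test were contending for bandwidth that would have been dedicated had Fabric B been installed.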

This is exactly how the Tolly Group tested Cisco UCS under the premise of showing “Bandwidth Scalability” – when in fact they did not provide the full available bandwidth to the Cisco UCS blades. The full available bandwidth was, however, provided to the HP blades. Is that a fair test? No way, Jose!

What is even more interesting is that, even with Cisco UCS tested in a failed fabric condition, it still outperformed HP in bandwidth tests using 4 servers:

Aggregate throughput of 4 Servers with HP in normal conditions: 35.83 Gbps

Aggregate throughput of 4 Servers with Cisco UCS under failed fabric conditions: 36.59 Gbps

Cisco UCS with (3) hops outperforms HP with only (1) hop — Ouch! That’s gotta be a tough one for the folks at HP to explain.

The major blow the HP Tolly Report tries to deliver is a test with 6 servers where HP almost doubles the performance of Cisco UCS. Again, this should not come as a surprise to anybody because Cisco UCS was tested while in a failed condition, while HP was tested under normal conditions:

Aggregate throughput of 6 servers with HP in normal conditions: 53.65 Gbps

Aggregate throughput of 6 servers with Cisco UCS under failed fabric conditions: 26.28 Gbps

Cisco UCS with (3) hops and half its fabric missing performs at half the speed of HP with (1) hop and a full fabric. Why is that a shocker?
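
Putting the four reported throughput numbers side by side makes the point. The short Python sketch below uses only the figures quoted in this post; the dictionary layout and function name are just for illustration.

```python
# Aggregate throughput in Gbps as quoted above from the Tolly report
results_gbps = {
    4: {"hp": 35.83, "ucs_failed_fabric": 36.59},
    6: {"hp": 53.65, "ucs_failed_fabric": 26.28},
}

def ucs_to_hp_ratio(servers: int) -> float:
    """UCS (one fabric missing) throughput relative to HP (full fabric)."""
    r = results_gbps[servers]
    return r["ucs_failed_fabric"] / r["hp"]

for n in results_gbps:
    print(f"{n} servers: UCS/HP = {ucs_to_hp_ratio(n):.2f}")
# 4 servers: UCS/HP = 1.02
# 6 servers: UCS/HP = 0.49
```

With 4 servers, no two servers share a link and UCS comes out slightly ahead. With 6 servers, the ratio drops to roughly one half, which is what you would expect from a chassis tested with half of its fabric missing.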

What would have happened if the Tolly Group actually provided a fair test between HP and Cisco on the 6 server test? Is that something the Tolly Group should figure out? After all, the Tolly Group has what it describes as a Fair Testing Charter that states:

With competitive benchmarks, The Tolly Group strives to ensure that all participants are [tested] fairly

That sure sounds nice, but I wonder if it actually means anything. Only the Tolly Group can tell us for sure.

Furthermore, I wonder if HP will continue to mislead the public with this unfair testing. Or will HP do the right thing and insist the Tolly Group re-test under fair, apples-to-apples conditions?

At this point the ball is in their court to either disappoint or impress.


Disclaimer: The views and opinions are solely those of the author as a private individual and do not necessarily represent those of the author's employer (Cisco Systems). The author is not an official spokesperson for Cisco Systems, Inc.