I had an interesting discussion with a customer the other day where both the Cisco (myself included) and HP account teams were on the same call to discuss Flex-10, Nexus 1000V, and other approaches that might work better. – Yeah, awkward.

Anyway, for most of the call we (the Cisco team) focused on Flex-10’s total lack of QoS capabilities. Flex-10 requires the customer to carve up bandwidth on the FlexNICs exposed to the vSphere ESX kernel.

For example: one FlexNIC set at 2Gbps for the Service Console, another FlexNIC set at 2Gbps for vMotion, and a third FlexNIC set at 6Gbps for VM data. Even if the 10GE link is completely idle, a FlexNIC doesn’t get any more bandwidth than what’s been manually assigned to it.

The question then becomes … why not just feed the server a 10G interface and let vMotion or the VMs grab as much bandwidth as is available – with minimum guarantees provided by QoS?

The Nexus 1000V can apply the QoS markings and the Nexus 5000 can act upon them. However, with Flex-10 as a man-in-the-middle, the QoS model breaks.
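To make that concrete, here is a rough sketch of what the end-to-end model looks like without Flex-10 in the path. This is illustrative NX-OS-style configuration, not taken from the customer call; the class names, the CoS value of 4, and the bandwidth percentages are all placeholders. The Nexus 1000V marks vMotion traffic at the virtual access layer, and the Nexus 5000 classifies on that marking and guarantees the class a minimum share of the 10GE link:

  ! Nexus 1000V (illustrative): mark vMotion traffic at the virtual access layer
  policy-map type qos mark-vmotion
    class class-default
      set cos 4
  port-profile type vethernet vMotion
    service-policy type qos input mark-vmotion

  ! Nexus 5000 (illustrative): classify on the CoS marking and give the class
  ! a minimum bandwidth guarantee that only kicks in under congestion
  class-map type qos match-all VMOTION
    match cos 4
  policy-map type qos CLASSIFY-COS
    class VMOTION
      set qos-group 4
  class-map type queuing VMOTION-Q
    match qos-group 4
  policy-map type queuing MIN-GUARANTEES
    class type queuing VMOTION-Q
      bandwidth percent 20
    class type queuing class-default
      bandwidth percent 80
  system qos
    service-policy type qos input CLASSIFY-COS
    service-policy type queuing output MIN-GUARANTEES

The point of the sketch is the bandwidth percent statements: unlike a FlexNIC carve, they only take effect when the link is congested, so an idle 10GE link is fully available to whoever needs it.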

What value do the FlexNIC bandwidth assignments have with a 10G attached server, other than being a workaround for the lack of QoS intelligence in Flex-10?

Moving on to operations: with the Nexus 1000V, the server team hands the VM networking configuration and responsibility over to the network team. The Nexus 5000 is obviously managed by the network team as well … so … who manages Flex-10? The server team or the network team?

If the server team manages it, there is no continuity in troubleshooting from the Nexus 5000 to the Nexus 1000V, because you have this other object in the path (Flex-10) managed by a separate team – everybody seems to agree that’s not an ideal situation. If the network team manages Flex-10: Where is the CLI? Where is the QoS? How do you upgrade the code? Do you need to reboot servers, or interact with the server team, for every config change? – nobody is quite sure.

The HP people on the call kept interjecting: “but, but, but … with Flex-10 you can move or replace a server without reconfiguring MACs/WWNs.” That was all they could really say.

The customer didn’t seem to give this argument much weight, because plenty of failover capacity would be built into the VMware environment. If a server fails, its VMs are immediately restarted on another host, and how fast you can reconfigure a physical server becomes much less important.

It would take two or more servers failing at exactly the same time (highly unlikely) for the speed of physical server provisioning to become the bottleneck in restoring capacity to the vSphere environment.

It became quite clear towards the end of the meeting that 10GE pass-through to the Nexus 5000 was a solid approach without much downside … and VMware vSphere + Nexus 1000V had a lot to do with the customer coming to that conclusion. The clean migration path to FCoE with 10GE pass-through to the Nexus 5000 is also a huge plus. The HP 10GE pass-through module supports FCoE today. So, if HP is “going to announce FCoE soon,” why buy a Flex-10 switch today that doesn’t support FCoE? Why not buy a pass-through device that supports FCoE now, and spend less money doing so?
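For reference, enabling FCoE on a Nexus 5000 port that reaches a blade’s CNA through the pass-through module is a short piece of configuration. Again, this is an illustrative sketch; the VLAN/VSAN numbers and interface names are placeholders:

  ! Nexus 5000 (illustrative): enable FCoE and map an FCoE VLAN to a VSAN
  feature fcoe
  vlan 1002
    fcoe vsan 100
  vsan database
    vsan 100
  ! Bind a virtual Fibre Channel interface to the 10GE port
  ! that faces the blade via the pass-through module
  interface vfc11
    bind interface Ethernet1/11
  vsan database
    vsan 100 interface vfc11
  interface Ethernet1/11
    switchport mode trunk
    switchport trunk allowed vlan 1,1002

No forklift, no new blade switch – the same 10GE port that carries VM data and vMotion today can carry the FC traffic when the customer is ready.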

More information on the HP 10GE pass-through module can be found here.

Cheers,
Brad