Cisco UCS and Nexus 1000V design diagram with Palo adapter

Filed in Cisco UCS, Design Diagrams, FCoE, Featured, Nexus, NIV, QoS, VMware on August 11, 2009

This is a follow-up and enhancement of a previous design diagram in which I showed Cisco UCS running the standard VMware vSwitch.  In this post I am once again showing Cisco UCS utilizing the Cisco (Palo) virtualized adapter with an implementation of VMware vSphere 4.0; however, in this design we are running ESXi and the Cisco Nexus 1000V distributed virtual switch (vDS).

The Cisco adapter on the UCS B-200 blade is using its Network Interface Virtualization (NIV) capabilities and presenting (4) virtual Ethernet NICs and (2) virtual Fibre Channel HBAs to the operating system, vSphere 4.0 ESXi.  The vSphere 4.0 hypervisor sees the virtual adapters as unique physical adapters and identifies them as VMNICs and VMHBAs.  The vSphere VMNICs are then associated to the Cisco Nexus 1000V software switch to be used as uplinks.  The NIV capabilities of the Cisco adapter allow the designer to use a familiar VMware multi-NIC design on a server that in reality has (2) 10GE physical interfaces, with complete Quality of Service, bandwidth sharing, and VLAN portability among the virtual adapters.
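
For illustration, here is roughly how you could confirm what the hypervisor sees from the ESXi Tech Support Mode shell (or with the equivalent vSphere CLI vicfg-* commands). Treat this as a sketch, not a verbatim capture; the exact output columns vary by build. The first command lists the virtual Ethernet adapters as vmnic0 through vmnic3, and the second lists the two virtual Fibre Channel HBAs among the vmhba adapters.

~ # esxcfg-nics -l
~ # esxcfg-scsidevs -a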

Aside from visualizing how all the connectivity works, this diagram is also intended to illustrate some key concepts and capabilities.

Cisco Virtualization Adapter preserving familiar ESX multi-NIC network designs

In this design we have used the NIV capabilities of the Cisco “Palo” adapter to present multiple adapters to the vSphere hypervisor in an effort to preserve the familiar and well known (4) NIC design where (2) adapters are dedicated to VMs, and (2) adapters are dedicated to management connections. The vSphere hypervisor scans the PCIe bus and sees what it believes to be (4) discrete physical adapters, when in reality there is only (1) physical dual-port 10GE adapter. Just as we would with a server with (4) physical NICs, we can dedicate (2) virtual Ethernet adapters to the virtual machine traffic by creating a port profile called “VM-Uplink” and associating it to the Cisco adapter vNIC1 and vNIC2. Similarly we can dedicate (2) virtual Ethernet adapters to the management traffic by creating a port profile called “System-Uplink” and associating it to the Cisco adapter vNIC3 and vNIC4.

We will configure the “VM-Uplink” port profile to only forward VLANs belonging to VMs, and configure the “System-Uplink” port profile to only forward VLANs belonging to management traffic.

Creating separate uplink Port Profiles for VMs and Management

Nexus1000V# config
Nexus1000V(config)# port-profile System-Uplink
Nexus1000V(config-port-prof)# capability uplink
Nexus1000V(config-port-prof)# vmware port-group
Nexus1000V(config-port-prof)# switchport mode trunk
Nexus1000V(config-port-prof)# switchport trunk allowed vlan 90,100,260-261
Nexus1000V(config-port-prof)# no shutdown
Nexus1000V(config-port-prof)# state enabled

Nexus1000V(config)# port-profile VM-Uplink
Nexus1000V(config-port-prof)# capability uplink
Nexus1000V(config-port-prof)# vmware port-group
Nexus1000V(config-port-prof)# switchport mode trunk
Nexus1000V(config-port-prof)# switchport trunk allowed vlan 10,20
Nexus1000V(config-port-prof)# no shutdown
Nexus1000V(config-port-prof)# state enabled

The VMware administrator will now be able to associate vmnic0 and vmnic1 to the “VM-Uplink” port group; additionally, vmnic2 and vmnic3 can be associated to the “System-Uplink” port group. This action puts those NICs under the control of the Nexus 1000V, which assigns each of them a physical interface number: Eth1/1 for vmnic0, Eth1/2 for vmnic1, and so on.
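
As a quick sanity check (a sketch; the exact output differs by Nexus 1000V release), the assignments can be verified from the VSM:

Nexus1000V# show module
! Each ESXi host's VEM appears as a module
Nexus1000V# show interface brief
! The vmnics appear as Ethernet interfaces (Eth1/1, Eth1/2, and so on)
Nexus1000V# show port-profile name VM-Uplink
! Shows the profile settings and which interfaces inherit it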

Nexus 1000V VSM running on top of one of its own VEMs

In this diagram the UCS blade is running the Nexus 1000V VSM in a virtual machine connected to a VEM managed by the VSM itself.  Sounds like a chicken and egg brain twister, doesn’t it?  So how does that work?  Well, pretty simple actually.  We use the ‘system vlan’ command on the uplink port profile “System-Uplink”.  This allows the VLANs stated in this command to be up and forwarding prior to connecting with the VSM, for ‘critical connections’ such as those needed to reach the VSM and other critical VMware management ports such as the VMkernel.  We can also use the same ‘system vlan’ command on the port profiles facing the locally hosted VSM on this blade.

Nexus1000V# config
Nexus1000V(config)# port-profile System-Uplink
Nexus1000V(config-port-prof)# capability uplink
Nexus1000V(config-port-prof)# system vlan 90,100,260-261
! These VLANs forward on the uplink prior to locating the VSM

Nexus1000V(config)# port-profile VMKernel
Nexus1000V(config-port-prof)# switchport mode access
Nexus1000V(config-port-prof)# switchport access vlan 100
Nexus1000V(config-port-prof)# system vlan 100
! This allows access to the VMkernel if the VSM is down

Nexus1000V(config)# port-profile N1K-Control
Nexus1000V(config-port-prof)# switchport mode access
Nexus1000V(config-port-prof)# switchport access vlan 260
Nexus1000V(config-port-prof)# system vlan 260
! Allows the VSM’s vNICs to be up prior to connecting to the VSM itself
! Do the same for the N1K-Packet port profile

Virtual Port Channel “Host Mode” on the Nexus 1000V VEM uplinks (vPC-HM)

In this design the uplink port profiles “System-Uplink” and “VM-Uplink” are each establishing a single logical port channel interface to two separate upstream switches.  The two separate upstream switches in this case are (Fabric Interconnect LEFT) and (Fabric Interconnect RIGHT).  While the server adapter is physically wired to the UCS “Fabric Extenders” (aka IOM), the fabric extender is simply providing a remote extension of the upstream master switch (the Fabric Interconnect); therefore the server adapter and Nexus 1000V VEM see themselves as being connected directly to the two Fabric Interconnects.  Having said that, the two Fabric Interconnects are not vPC peers, which would normally be required for them to share a single port channel facing a server or upstream switch.  So how does the Nexus 1000V form a single port channel across two separate switches not enabled for vPC?  This is done with a simple configuration on the Nexus 1000V called vPC-HM.

The Nexus 1000V VEM learns via CDP that Eth 1/1 and Eth 1/2 are connected to separate physical switches and creates a “Sub Group” unique to each physical switch.  If there are multiple links to the same physical switch they will be added to the same Sub Group.  When a virtual machine is sending network traffic the Nexus 1000V will first pick a Sub Group and pin that VM to it.  If there are multiple links within the chosen Sub Group the Nexus 1000V will load balance traffic across those links on a per-flow basis.

Enabling vPC-HM on Nexus 1000V

Nexus1000V# config
Nexus1000V(config)# port-profile VM-Uplink
Nexus1000V(config-port-prof)# channel-group auto mode on sub-group cdp
Nexus1000V(config)# port-profile System-Uplink
Nexus1000V(config-port-prof)# channel-group auto mode on sub-group cdp

With this configuration the Nexus 1000V will automatically create two port channel interfaces and associate them with the chosen port profiles:

Nexus1000V# show run
! unnecessary output omitted
interface port-channel1
  inherit port-profile VM-Uplink
interface port-channel2
  inherit port-profile System-Uplink
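
To confirm that the VEM discovered both Fabric Interconnects via CDP and built the two port channels with their Sub Groups, something along these lines can be used (a sketch; output formats differ by release):

Nexus1000V# show cdp neighbors
Nexus1000V# show port-channel summary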

Cisco Virtualization Adapter per vNIC Quality of Service

Our multi-NIC design is enhanced by the fact that Cisco UCS can apply different Quality of Service (QoS) levels to each individual vNIC on any adapter. In this design, the virtual adapters vNIC3 and vNIC4 dedicated to management connections are given the QoS profile “Gold”. The “Gold” QoS setting can, for example, define a minimum guaranteed bandwidth of 1Gbps.  This works out nicely because it matches the VMware best practice of providing at least 1Gbps of guaranteed bandwidth to the VMkernel interface. Similarly, the “Best Effort” QoS profile assigned to the NICs used by VMs can also be given a minimum guaranteed bandwidth.

It is important to understand that this is NOT rate limiting. Interface rate limiting is an inferior and suboptimal approach that results in wasting unused bandwidth. Rather, if the VMkernel wants 10G of bandwidth it will have access to all 10G of bandwidth if it is available. If the VMs happen to be using all 10G of bandwidth and the VMkernel needs the link, the VMkernel will get its minimum guarantee of 1Gbps and the VMs will be able to use the remaining 9Gbps, and vice versa. The net result is that Cisco UCS provides fair sharing of available bandwidth combined with minimum guarantees.
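
On Cisco UCS these guarantees are expressed as weights in the UCS Manager QoS System Classes and tied to each vNIC with a QoS policy, so there is no switch CLI to type. Purely for illustration, here is a rough sketch of the same bandwidth-sharing idea in NX-OS MQC syntax on a standalone Nexus 5000; the policy name is made up and the percentages are examples, not a recommendation:

N5K(config)# policy-map type queuing uplink-queuing
N5K(config-pmap-que)# class type queuing class-fcoe
N5K(config-pmap-c-que)# bandwidth percent 50
! FCoE class keeps its default 50% bandwidth guarantee
N5K(config-pmap-c-que)# class type queuing class-default
N5K(config-pmap-c-que)# bandwidth percent 50
! Either class may burst above its guarantee when the other is idle;
! this is bandwidth sharing, not rate limiting
N5K(config)# system qos
N5K(config-sys-qos)# service-policy type queuing output uplink-queuing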

QoS policies for the individual adapters are defined and applied centrally in the UCS Manager GUI.

Read the Cisco.com UCS Manager QoS configuration example for more information.

True NIV goes both ways: (Server and Network)

Obtaining true NIV requires virtualizing the adapter towards both the Server and the Network.  In this design we are providing NIV to the Server by means of SR-IOV based PCIe virtualization, which fools the server into seeing more than one adapter, all from a single physical adapter.  So the virtual adapters vNIC1, vNIC2, and so on, are identifying and distinguishing themselves to the server system with PCIe mechanisms.  This accomplishes the goal of adapter consolidation and virtualization from the Server perspective.

The next challenge is differentiating the virtual adapters towards the Network.  Remember that more than one virtual adapter is sharing the same physical cable with other virtual adapters.  In this case vNIC1 and vNIC3 are sharing the same 10GE physical cable.  When traffic is received by the adapter on this shared 10GE cable, how does the physical adapter know to which vNIC the traffic belongs?  Furthermore, when a vNIC transmits traffic towards the Network, how does the upstream network know which vNIC the traffic came from and apply a unique policy to it, such as our “Gold” QoS policy?

Cisco UCS and Nexus 5000 solve this problem with the use of a unique tag dedicated for NIV identification purposes, shown here as a VNTag.  Each virtual adapter has its own unique tag# assigned by UCS Manager.  When traffic is received by the physical adapter on the shared 10GE cable, it simply looks at the NIV tag# to determine which vNIC the traffic belongs to.  When a vNIC is transmitting traffic towards the network it applies its unique NIV tag#, and the upstream switch (Fabric Interconnect) is able to identify which vNIC the traffic was received from and apply a unique policy to it.

Not all implementations of NIV adequately address the Network side of the equation, and as a result can impose some surprising restrictions on the data center designer.  A perfect example of this is Scott Lowe’s discovery that HP Virtual Connect Flex-10 FlexNICs cannot have the same VLAN present on two virtual adapters (FlexNICs) sharing the same LOM.  Because HP did not adequately address the Network side of NIV (such as implementing an NIV tag), HP is forcing the system to use the existing VLAN tag as the means to determine which FlexNIC is receiving or sending traffic on a shared 10GE cable, resulting in the limitation Scott Lowe discovered and wrote about on his blog.  Furthermore, HP’s Flex-10 requires rate limiting, which hard-partitions bandwidth and results in waste and inefficiency.  Each FlexNIC must be given a not-to-exceed rate limit, and the sum of those limits must not exceed 10Gbps.  For example, I could have (4) FlexNICs sharing one 10GE port and give each FlexNIC 2.5Gbps of maximum bandwidth.  However, even if the link were otherwise idle, FlexNIC #1 could not transmit any faster than 2.5Gbps (wasted bandwidth).

Cisco UCS addresses NIV from both the Server side and the Network side, and provides actual Quality of Service with fair sharing of bandwidth secured by minimum guarantees (not maximum limits).  As a result there are no VLAN or bandwidth limitations.  In the design shown here with Cisco UCS and Nexus 1000V, any VLAN can be present on any number of vNICs on any port, and any vNIC can use the full 10GE of link bandwidth, giving the Data Center Architect tremendous virtualization design flexibility and simplicity.

I hope you enjoyed this post.  Feel free to submit any questions or feedback in the comments below.

Other related posts:
Cisco UCS and VMWare vSwitch design with Cisco 10GE Virtual Adapter
Nexus 1000V with FCoE CNA and VMWare ESX 4.0 deployment diagram

###

Disclaimer: This is not an official Cisco publication.  The views and opinions expressed are solely those of the author as a private individual and do not necessarily reflect those of the author’s employer (Cisco Systems, Inc.).  This is not an official Cisco Validated Design.  Contact your local Cisco representative for assistance in designing a data center solution that meets your specific requirements.

About the Author

Brad Hedlund is an Engineering Architect with the CTO office of VMware’s Networking and Security Business Unit (NSBU), focused on network & security virtualization (NSX) and the software-defined data center. Brad’s background in data center networking begins in the mid-1990s and includes experience as an IT customer, a systems integrator, architecture and technical strategy roles at Cisco and Dell, and speaking at industry conferences. CCIE Emeritus #5530.

Comments (46)


  1. Brilliant. Keep it up, Brad! The troops are hungry for this kind of manna :-)

  2. Rodos says:

    Brad, love your work as always. I like the slight change to the storage from the last diagram, or maybe I just understand it better now.

    You use the label Fabric Extender, often referred to as the FEX. As I now understand it the FEX has been officially renamed as the IO Module, or IOM for short. Some of the docs still refer to it as the FEX, but that’s to be updated. As I suspect this diagram will be well used, it could be good to see it use the new name to avoid confusion.

    Great stuff, I look forward to seeing more.

    Rodos

  3. Duncan says:

    Great article Brad!

  4. Mike says:

    Brad, when you talk about QoS, is it also possible to give Fibre Channel a guaranteed minimum bandwidth, and what happens if not enough bandwidth is available because it is needed for networking (pause the FCoE traffic)?

    thx for info,
    Mike

    • Brad Hedlund says:

      Mike,
      In both UCS and Nexus 5000, Fibre Channel has QoS on by default. The default setting for FCoE traffic is a guarantee of 50% link bandwidth (5Gbps) and no packet drops.

      Cheers,
      Brad

  5. HP says:

    Your article is a little biased. It is easy to stress some HP Flex-10 limitations, but the same is true for Cisco:
    * HP Flex-10 bandwidth divisions are fixed today. That is true. But at least it is available now. That can’t be said of the Cisco Palo NIC. HP Flex-10 divisions will be dynamic in future releases also.
    * You go through a lot of effort to set up everything redundantly. However, the hardware is not redundant to chip-level. You mention it yourself: “The vSphere hypervisor scans the PCIe bus and sees what it believes to be (4) discrete physical adapters, when in reality there is *** only (1) physical *** dual-port 10GE adapter” and “….which fools the server into seeing more than one adapter, all from *** a single physical adapter ***”. In a half-width server, the CNA remains a SPOF.
    HP at least provided a fully redundant system up to chip-level for half-height servers with two *** physically separated *** Flex10-10GE interfaces.

    • Brad Hedlund says:

      Calling this article biased is quite funny coming from “HP”, as you proceed to espouse Flex-10 with bias. Furthermore, you are stating the obvious. Any intelligent reader knows that a Cisco employee writing about a Cisco product has bias, and expects it.

      In a half-width server, the CNA remains a SPOF.

      Uh, yeah, that’s obvious. Any customer briefed on UCS understands that. There’s no secret there. More adapters for the sake of redundancy translates into higher cost. The customer will make the judgement call if leveraging the HA capabilities of the virtualization or clustering software can afford savings in infrastructure costs.

      p.s. Don’t be ashamed to use a real name.

      Cheers,
      Brad

  6. Burg Rahja says:

    Brad

    Thanks for this article I’ve learned a lot from reading it.

    I have a follow-up question. How is the live migration of VMs handled so that the minimum bandwidth guarantees are enforced when the VM moves to another host?

    Thx
    Burg

    • Brad Hedlund says:

      Burg,
      An excellent question that highlights the innovation and value of UCS and Nexus 1000V. Any minimum bandwidth guarantees as they existed on the source machine would be preserved at the destination machine provided the destination system was identically configured and the QoS policies followed the VM during vMotion.

      How is UCS and Nexus 1000V special in enabling this?

      Destination system identically configured:
      The complete server and network configuration of this system, from the server blade settings itself, to its Palo adapter, and all of the LAN/SAN settings provisioned on the Fabric Interconnect for this server, are captured in a UCS Service Profile. This Service Profile could be made into a Service Profile Template. Any new blades I bring into the environment can be provisioned with a Service Profile that was cloned from the template. Following this behavior ensures that my configuration is consistent among all blades. The configuration of the Palo adapter and all its QoS settings, and the LAN/SAN settings and QoS on the Fabric Interconnect, are all the same with no configuration drift or inconsistencies.

      QoS policies following the VM:
      This is where the Nexus 1000V shines. When a VM is migrated via vMotion to another system within the Nexus 1000V domain, any QoS settings specific to that VM are migrated along with it, resulting in consistent QoS behavior and policies regardless of the VM’s actual location. This automated migration of network QoS policies is something that was never possible prior to the Nexus 1000V.
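
      As a rough sketch of what that can look like on the Nexus 1000V (the names here are examples, not taken from the design above; check the Nexus 1000V QoS configuration guide for exact syntax), a marking policy attached to the VM-facing port profile travels with the VM’s vEth port during vMotion:

      Nexus1000V(config)# policy-map type qos Mark-Gold
      Nexus1000V(config-pmap-qos)# class class-default
      Nexus1000V(config-pmap-c-qos)# set cos 4
      Nexus1000V(config)# port-profile VM-Production
      Nexus1000V(config-port-prof)# service-policy type qos input Mark-Gold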

      Hope that helps.

      Cheers,
      Brad

  7. Brad, this is a great overview. Thanks.

    Would you elaborate on the “per-flow” hashing that Nexus 1000V performs in vPC-HM? What aspects of a packet/frame are used in identifying a “flow” and how accurately does this result in load-balanced traffic across the redundant virtual adapters?

    I’d also be interested to hear your thoughts on the pros/cons of Nexus 1000V attaching to the Palo NIV devices as you’ve described here, versus PCIe device “pass-through” in the VMM to expose NIV devices directly to each VM instance. I gather that local switching between VMs would be impacted on the one hand, though perhaps hardware-assist features in the NIC would be impacted conversely. I’m not sure how overall management and scale would be affected. What other considerations come to mind?

    Thanks again,
    -Benson

    • Brad Hedlund says:

      Benson,
      The Nexus 1000V has tons of options for hashing what constitutes a flow:
      http://www.cisco.com/en/US/docs/switches/datacenter/nexus1000/sw/4_0/command/reference/n1000v_cmds_p.html#wp1284857

      The more granular your hashing algo is, the more likely you are to get even Steven load balancing. However, before you pick a granular method such as source & dest TCP ports, your member links should be landing on the same physical switch, or a single “logical” switch created by vPC, VSS, or StackWise.

      The pro’s of hypervisor bypass are better I/O performance and lower latency for the VM’s (more like bare metal), nice for high I/O VM’s such as Oracle or Exchange etc. The tradeoff of hypervisor bypass is scalability, as the # of vNIC’s you can provision on your physical adapter (Palo in this case) is hardware limited (128, with realistic numbers in the 50 range or less). The software based approach with Nexus 1000V has no hardware limits and scalability in terms of # of VM’s per blade is much higher.

      Cheers,
      Brad

  8. scott owens says:

    How does the impact of 10Gb with Jumbo frames fit into this?
    The 7K & 5K both support jumbos; should we expect greater performance increases between backup servers and targets, along with iSCSI improvements too?
    Also… does the Palo have iSCSI offloading along with Ethernet offload?

    thanks

    • Brad Hedlund says:

      Scott,

      Jumbo frames can easily be enabled on the Fabric Interconnect, and doing so can certainly only help iSCSI and vMotion performance, for example.
      The Palo adapter does TCP segmentation offloading but does not do any special HW offloading for iSCSI-specific payloads, nor does it support iSCSI booting.

  9. Got on this thread by chance. Interesting stuff.

    BTW

    >HP at least provided a full redundant system up to chip-level for
    >half-height servers with two *** physically separated *** Flex10-10GE
    >interfaces.

    Well that’s what we (IBM) say about our BladeCenter Vs the HP BladeSystem (i.e. we have a redundant backplane while you do not bla bla bla).

    Funny. I guess the glass is always half full for vendors…. isn’t it?

    Massimo.

  10. Josh says:

    My thought would be that most implementations would have either a CNA/1000V or the Palo/bypass arrangement. Does that sound right to you Brad?

    Also, you must be in switching mode with the 6100’s and are you statically pinning with separate VLAN’s per 6100? Why not use the native switching mode and keep the VLAN configs in the 6100 consistent? This config just seems much more involved than it needs to be. Am I missing something?

  11. Jonathan Butz says:

    I see that you have separate uplinks, system-uplink and vm-uplink, which then require dedicated physical links on the host. This is what the Cisco docs recommend.

    I have found that it is not necessary to do this and have provisioned Nexus1000v infrastructure with a single uplink profile that attaches to all of the host physical nics.

    What is Cisco’s perspective on a single uplink?
    If I am building hosts with two 10Gb nics, it seems so silly to dedicate one to management traffic…

  12. VMAdmin says:

    This is brilliant and new stuff to me.

    Really amazing

  13. push bhatkoti says:

    Great article. It has cleared up my mystery!

    -Push

  14. James S says:

    Great article!

    Should we use vPC-HM with mac-pinning instead of sub-group cdp?

    • Brad Hedlund says:

      James,
      Nexus 1000V “MAC Pinning” is the mode that is easiest to configure and “just works”. VPC-HM is the mode that allows for more granular control of steering certain traffic types out of the NICs you want. If you’re not too picky about that, I would recommend MAC Pinning.
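
      For reference, enabling it on an uplink port profile is a one-liner (a sketch, used in place of the ‘sub-group cdp’ option shown in the post above):

      Nexus1000V(config)# port-profile System-Uplink
      Nexus1000V(config-port-prof)# channel-group auto mode on mac-pinning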

      Cheers,
      Brad

  15. IPK says:

    Hi Brad,
    Just a simple query from a design perspective. I want to compare the following designs;

    1) Nexus 1000v with Palo adapter [VN-Link in SW]
    2) Palo adapter with Passthru switching with FI acting as the vDS [VN-Link in HW]

    in a VMware virtualization environment where there is 10:1 VM density across three B-Series HH blades. Workloads are fairly I/O intensive hence I preferred the second approach. Any limitation in terms of extending this design across B-Series blade systems?

    IPK

    • Brad Hedlund says:

      IPK,
      With design #2 (VN-Link in HW) you can have the Palo adapter provide 54 dynamic vNICs per host, and therefore 54 virtual machines per host. Should be no problem with your 10:1 consolidation ratio.
      As far as extending this across multiple B-Series blades, the Fabric Interconnect is acting as a vDS from the perspective of vCenter and therefore has the same configuration maximums as a normal software-based vDS. According to the new vSphere 4.1 Configuration Maximums Guide you can have 350 hosts in one vDS. http://www.vmware.com/pdf/vsphere4/r41/vsp_41_config_max.pdf

      Having said that, the theoretical maximum number of blades under the auspices of a Fabric Interconnect cluster is 320. The current supported number of chassis in one cluster is (14), so the actual maximum as of today would be 112 blades in one HW VN-Link DVS.
      For HA/DRS clusters the VMware maximum is still 32 hosts.

      Cheers,
      Brad

      • rhino says:

        Hello Brad,

        First of all, thank you for your web site.

        I’m also trying to compare these two solutions.

        With the solution #2, the Fabric Interconnect acts as the vDS. I’m ok with that. So what are the advantages of using the Nexus 1000v? Do we have the same functionalities between the Nexus 1000v and the “vDS” FI?

        Maybe if you can provide us with the advantages of each solution, it would help me to understand.

        Rhino

        • rhino says:

          Maybe one difference is the vPath capability of the Nexus 1000v used for “Cisco Unified Network Services”?

          Thank you

          Rhino

  16. LAB says:

    Brad –

    After these last few posts, I’m coming to the conclusion that it is not necessary to purchase and implement 1000V if one plans on solely using Palo and hypervisor bypass.

    Is this correct?

    – LAB

    • Brad Hedlund says:

      LAB,
      Correct, the Nexus 1000V and Hypervisor Bypass solutions are mutually exclusive per ESX host. You certainly could have one cluster of hosts running Nexus 1000V, and another cluster of hosts running Hypervisor Bypass (HW VN-Link).

      Cheers,
      Brad

  17. Morgandechile says:

    Hello Brad

    GREAT JOB …THANKS !!!!

    talking about LAB’s question… are they (Nexus and VN-Link) necessarily mutually exclusive for any ESX host?

    is it possible to still use, from the perspective of the ESX server, some NICs on Nexus and some other NICs on VN-Link, in the same physical server?

    what would be the limitations of that?
    what could be the benefits?

    thanks in advance ….

    Gustavo.

  18. Morgandechile says:

    could you be so kind as to give me some advice?

    I already have two clusters of 8 blade servers (each),
    and a third cluster with 2 blades for admin purposes, and even a fourth cluster of 2 servers for experimental purposes, all with Palo adapters.
    Supposedly this will be very loaded.
    What would be your suggestion for this environment?

    1 cluster with Nexus and 1 cluster with VN-Link, for example? How many NICs do you recommend?

    thanks a lot in advance

  19. Rafael says:

    Morgandechile
    It depends: is it a lab or is it going to be a production environment? If it’s a lab, okay, you can mix it, but in a production environment I wouldn’t mix the technologies; you have to decide on one: Nexus 1000V in software mode, or PTS mode (VN-Link in hardware, Nexus 1000V in HW mode).
    The usages are different. PTS mode you would use for low density (56 vNICs max) deployments and specific network requirements; for example, if you have a VM with 3 NICs (backup, mgmt, production) you could create only 18 VMs per host (56/3 = 18.6), and you also need all 8 uplinks from the chassis to the FI to get the 56 vNICs.
    The Nexus 1000V in software mode, on the other hand, gives you the same limit (VMs per host) as a common DVS switch. It is also network-admin friendly (NX-OS CLI), so you have a management point your network admins are familiar with; NX-OS permits integration with your existing switch monitoring and management tools, and the Nexus 1000V software switch also works with other types of servers (IBM, HP, Dell, etc.). Normally in a datacenter you will have a single type of virtual switch.
    My personal opinion: go for the Nexus 1000V in software mode; PTS is only for very specific, network high-performance use cases.
    For the Nexus 1000V in software mode I use a maximum of 10 vNICs:
    2 for service console and VMkernel, 2 for backup, 2 for Nexus 1000V control, 2 for NFS (if it’s the case), 2 for VM production.

  20. Victor says:

    Brad, perhaps you can clarify something for me…

    Cisco uses VN-Tag to tag VM traffic for the purpose of enabling upstream devices, such as the 5000 or the 6100, to identify the source of the traffic and apply whatever policies have been configured in the 5000 or 6100.

    You mention that HP would like to track VM traffic based on the VLAN to which it belongs.

    Why can’t VM traffic be tracked based on the MAC address of the VM’s vNIC, as switches do under normal circumstances? So basically, why not associate a VM’s MAC to a vEth port on the 5000 or 6100 and be done with it?

    Thanks

    • Brad Hedlund says:

      Victor,
      Good question! That _could_ be done, but there are some challenges with that approach:
      How would the MAC address be populated in the switch? Having the network guy type them all in will simply not work.
      What happens when the VM moves?
      What if the VM is hosting additional MAC addresses unknown to vCenter?
      How do you manage multicast traffic? How do you prevent bothering VMs with multicast traffic they’re not interested in?
      How do you manage broadcast traffic for VMs on different VLANs? How do you prevent VMs from being exposed to all broadcasts on all VLANs?

      When you replicate the familiar model of a server (VM) being connected to a switch port (vEth) with a cable (VN-Tag or N1K port profile), you don’t have to worry about any of these issues because you haven’t fundamentally changed the provisioning model; rather, you have simply taken the same provisioning model and adapted it for virtualization.

      Cheers,
      Brad

  21. victor says:

    Brad, thank you for your time and response. I know some people come on here to challenge/argue/debate you. I’m not here to do any of those things. I just want to learn. So, before I go on, let me inform you that I personally couldn’t care less whether the IEEE adopts a tagged or untagged approach. I have no vested interest. I don’t work for HP or Cisco. As long as I get my paycheck, I’m fine! :-)

    That having been said, let me respond real quick.

    You ask: “How would the MAC address be populated in the switch? Having the network guy type them all in will simply not work.”

    I must be missing something (and I don’t mean that sarcastically). Which switch are you referring to, the vDS/Nexus 1000V or the physical switch that the server is connected to? If it’s the vSwitch, I would say that it would have authoritative knowledge of a VM’s vNIC MAC address. Correct? And as for the physical switch, why can’t it learn the MAC address of an attached VM as it learns the MAC address of any physical server?

    Thank you

    • Brad Hedlund says:

      Victor,
      I think we’re talking about two different things here, hence the confusion.
      When the VM connects to a hypervisor software switch (vSwitch, vDS, N1K), you are right, there is authoritative knowledge from vCenter of the VM’s MAC address which can be associated to a vEth port on the N1K. Cisco refers to this as “Software VN-Link”.

      The other approach is to have the VM logically connected to a physical switch, bypassing the hypervisor for network I/O, and associated to a vEth port on the physical switch. This is the scenario I thought you were asking about. When you asked why a VM can’t be associated to a vEth port simply based on its MAC address, I listed the challenges with that (under the premise that the VM connects to a vEth on a physical switch). Cisco refers to this as “Hardware VN-Link”.

      Does that clarify?

      Cheers,
      Brad

  22. Matt Mc Auley says:

    Brad you are a god!

    I am just starting a number of deployment projects for a large integrator and your posts are making my life much easier!

    • Brad Hedlund says:

      Matt,
      Thanks for the nice feedback. Good luck with your projects and be sure to let me know if there are any gray areas needing clarification. I’m always thinking of new things to write about.

  23. Derek says:

    If you have a multi-node ESX cluster, say 4-8 hosts, and you are using the 1000v on all hosts do you recommend hosting the two VSMs on that same cluster? This is assuming you properly configure the system VLANs. Or should you go the Nexus 1010 route, or configure a 2-node ESX cluster using a standard vSwitch that just hosts the VSMs which in turn manage the VEMs in the other cluster?

    Basically, what’s the current advice on the chicken and the egg issue with VSMs and VEMs?

    Lastly, if your recommendation is that the VSM/VEMs can be located on the same cluster, should all pNICs be assigned to the 1000v or should a couple critical VLANs (say the ESXi console and the VSM management interface) be on a standard vSwitch?

  24. libing says:

    HI

    Thanks for the great information, but from the Cisco official introduction of VN-Tag(http://www.cisco.com/en/US/solutions/collateral/ns340/ns517/ns224/ns892/ns894/white_paper_c11-525307_ps9902_Products_White_Paper.html), the multi-channel NIV is deployed without Nexus1000v, and all the switching work will be done at the upstream switches, which is similar to a VEPA style.

    So, which is the recommended deployment for VN-Tag? thanks!

  25. Thank you very very much for this fantastic summary Brad, I just used it with a customer to illustrate the benefits of MAC pinning and the ins and outs of Nexus 1000v forwarding. Linking it at my site right now!!

    P.

  26. Gabe says:

    Hi Brad,

    Your posts are very informative! I would love to have some of your diagrams blown up and used on a wall at work. Do you make any of these available for such a purpose?

  27. JamesH says:

    Heya Brad,
    So, I just inherited three UCS platforms in three different DC’s that have been built by a guy who really knew what he was doing. He built out most of the last one, and me and the infrastructure guys built out the last of it to include the N1k. I’ve been hammering at the documentation and going over the config’s in this environment, yanno…spending all my free time trying to really ‘learn’ it. I read your posts religiously ( the guy that left recommended them highly), and my comprehension on this is getting better, but I have a lot of work to do.

    Anyway, in your post about the system uplink and vm uplink creating a single port channel… I’m lost. We are using two channels in our environment (four NICs), one for uplink and the other for customer traffic. Based on what I know about this, when the admin selects the profiles for each, they channel based on the configs that are part of the port-profile (we are using mac-pinning in our channel command), and the service profile in UCS is calling the vNIC template that is basically the same config as the port profile on the 1000v. In the UCS portion of this I can see that each set of NICs (vNIC) is pinned to a different FI, and unless I’m mistaken, that ‘pinning’ is done as part of the service profile. So in your scenario, when you say that “The Nexus 1000V VEM learns via CDP that Eth 1/1 and Eth 1/2 are connected to separate physical switches and creates a ‘Sub Group’ unique to each physical switch,” is this something that you configured in UCS (like with the vNIC template scenario I spoke of)?

    The problem is that the documentation for this is horribly confusing (http://www.cisco.com/en/US/docs/switches/datacenter/nexus1000/sw/4_0_4_s_v_1_3/port_profile/configuration/guide/n1000v_portprof_5channel.html#wp1119384) as it says that ‘mac-pinning’ is used when you are connecting to a switch that doesn’t support port channels….first off, the FI’s do support port channels, and in what world today is there a switch that doesn’t support port channels?

    I mean i’m just completely lost on this one :)

    Hopefully, I haven’t just confused the issue, i’m just wondering why in your scenario you are specifying host mode as the solution (because the virtual cables are terminating at different FI’s) and why ours are also terminating at different FI’s but we are using mac-pinning

  28. Manoj says:

    Excellent explanation of the QoS policy and the difference b/w HP FlexNIC and UCS Fabric Interconnect. Simple and straightforward explanation.
