Networking is a Service, and you are the Service Provider

The status quo approach to Networking is the biggest barrier to realizing the full potential of Virtualization and the private, public, or hybrid cloud.  We must re-think how Networking *Services* are delivered, in a way that comports with automation, decoupling, pooling, and abstractions.  I would argue the solution is a more software-centric approach — Network Virtualization.  But more importantly, we must re-think how we view Networking as a career skill set and the value we bring to an organization.

That was the message of two keynote talks I recently gave at the Sydney & Melbourne VMUG user conferences.  The title of the talk was "Three reasons why Networking is a pain in the IaaS, and how to fix it".  I will share the slides and a brief summary of that talk in a subsequent post.  But before I do that, please indulge me in a heart-to-heart chat from one long-time Networking professional (me) to another (you):

I emphasized the word *services* above because if you think about it, that is what Networking really is — Networking is a Service.  It always has been, and it always will be: a service that will always be needed.  To some, that may seem like an obvious statement.  Congratulations, you are enlightened.  But to others, Networking is still viewed as a set of hardware boxes with ports and features.

What box should I buy? What features does it have? How fast is it? How do I configure that box? I better buy a box with all the features, just in case I might need them. I better buy a box with lots of ports, just in case I might need them.  And so on. And you begin to associate your career value with the knowledge you have in evaluating, configuring, and managing these boxes and their complex feature sets.  At this point, the mere thought of a software-centric approach to Networking can be quite unsettling.  If networking moves to software (read: x86 machines, hypervisors, SDN), well, that makes me less relevant and/or I don’t have the skills for that.  And to appeal to your anxieties, the hardware box vendors serve up a healthy plate of Fear, Uncertainty, and Doubt, assuring you that software-centric networking will fail and keeping you comfortably stuck in your Networking-is-a-hardware-box comfort zone.  Meanwhile, the organization continues to see your value as tied to the efficient operation and deployment of its infrastructure *hardware*.  When the platform changes, and it will, where does that leave you?

Your service on any platform

Contrast that with a mindset where you view Networking as a *service* — a service that can be fulfilled by any underlying platform, architecture, or another service (hardware, software, external providers).  You know that the ideal platform will change over time, because it always does (Client-Server, Virtualization, Cloud, Everything as a Service).  You make it your job to recognize when those changes are starting to occur and prepare both yourself and the organization.  You’re able to comfortably adapt to these architecture changes because you own the service of networking — *you* are a Service Provider.  Services such as Connectivity, Routing, Security, High Availability, Access, Performance, Analytics, and Reporting, just to name a few, are perpetual and platform-independent.  You’ve put yourself in a position to help the organization navigate the ever-changing landscape of applications and IT architecture, keeping the business one step ahead of its competitor that’s still stuck on legacy platforms and architectures.

Your value to the organization is much different now.  It’s no longer a situation of “I need this person to configure and manage that gear over there”.  Rather, it’s now in the realm of “I need this person to keep the business competitive and relevant in an ever-changing technology landscape”.

I believe Network Virtualization (e.g. VMware NSX) really enables this shift in platform, architecture, and career value.  Networking *services* (the things we really care about) are finally abstracted and decoupled from infrastructure, and become portable across a variety of architectures, platforms, and for that matter, service providers.  It makes it easier to provide a clean separation of the (more interesting) services that provide value from the (less interesting) infrastructure that supports them.

Over time, everything will change — both the services and the infrastructure, but probably not at the same pace.  The decoupling of services from infrastructure, provided by Network Virtualization, allows us to:

  • Change, add, and optimize services quickly — without changing infrastructure
  • Change, add, and optimize infrastructure — without changing the services

It’s that basic freedom that allows Networking to be elevated and identified as a perpetual and discrete service to which the organization can associate tangible business value.  And the person who owns that service is linked to that value.  There’s a hero waiting to be made here.  Is it going to be you, or someone else?  If you ask me, there’s no more exciting time in Networking than right now.  The opportunity at hand now will not come around again.


New design guide: VMware NSX with Cisco UCS and Nexus 7000

Back in September 2013 I wrote a piece on why you would deploy VMware NSX with your Cisco UCS and Nexus gear.  The gist being that NSX adds business agility, a rich set of virtual network services, and orders of magnitude better performance and scale to these existing platforms.  The response to this piece was phenomenal, with many people asking for more details on the how.

The choice is clear.  To obtain a more agile IT infrastructure you can either:

  1. Rip out every Cisco UCS fabric interconnect and Nexus switch you’ve purchased and installed, then proceed to repurchase and re-install it all over again (ASIC Tax).
  2. Add virtualization software that works on your existing Cisco UCS fabric interconnects and Nexus switches, or any other infrastructure.

To help you execute on choice #2, we decided to write a design guide that provides more technical details on how you would deploy VMware NSX for vSphere with Cisco UCS and Nexus 7000.  In this guide we provide some basic hardware and software requirements and a design starting point.  Then we walk you through how to prepare your infrastructure for NSX, how to design your host networking and bandwidth, how traffic flows, and the recommended settings on both Cisco UCS and VMware NSX.  As a bonus, there is a 48 x 36 poster that includes most of the diagrams from the guide and some extra illustrations.

Download the 48 x 36 poster here (PDF):


Download the full design guide here (PDF):





Distributed virtual and physical routing in VMware NSX for vSphere

This post is intended to be a primer on the distributed routing in VMware NSX for vSphere, using a basic scenario of L3 forwarding between both virtual and physical subnets. I’m not going to bore you with all of the laborious details, just the stuff that matters for the purpose of this discussion.

In VMware NSX for vSphere there are two different types of NSX routers you can deploy in your virtual network.

  1. The NSX Edge Services Router (ESR)
  2. The NSX Distributed Logical Router (DLR)

Both the ESR and DLR can run dynamic routing protocols, or not.  They can just have static/default routes if you like.

The ESR is a router in a VM (it also does other L4-L7 services like FW, LB, NAT, and VPN, if you want).  Both the control plane and data plane of the ESR are in the VM.  This VM establishes routing protocol sessions with other routers, and all of the traffic flows through this VM.  It’s like a router, but in a VM.  This should be straightforward, not requiring much explanation.

The ESR is unique because it’s more than just a router.  It’s also a feature-rich firewall, load balancer, and VPN device.  Because of that, it works well as the device handling the North-South traffic at the perimeter of your virtual network.  You know, the traffic coming from and going to the clients, other applications, other tenants.  And don’t be fooled.  Just because it’s a VM doesn’t mean the performance is lacking.  Layer 4 firewall and load balancer operations can reach and exceed 10 Gbps of throughput, with high connections per second (cps).  Layer 7 operations also perform well compared to hardware counterparts.  And because it’s a VM, well, you can have virtually unlimited ESRs running in parallel, each establishing the secure perimeter for its own “tenant” enclave.

The DLR is a different beast.  With the DLR the data plane is distributed in kernel modules at each vSphere host, while only the control plane exists in a VM.  And that control plane VM also relies on the NSX controller cluster to push routing updates to the kernel modules.

The DLR is unique because it enables each vSphere hypervisor host to perform L3 routing between virtual and physical subnets in the kernel at line rate.  The DLR is configured and managed like one logical router chassis, where each hypervisor host is like a logical line card.  Because of that the DLR works well as the “device” handling the East-West traffic in your virtual network.  You know, the traffic between virtual machines, the traffic between virtual and physical machines, all of that backend traffic that makes your application work.  We want this traffic to have low latency and high throughput, so it just makes sense to do this as close to the workload as possible, hence the DLR.

The ESR and DLR are independent.  You can deploy both in the same virtual network, just one, or none.

Now that we’ve established the basic difference and autonomy between the ESR and DLR, in this blog we’ll focus on the DLR.  Let’s look at a simple scenario where we have just the DLR and no ESR.

Let’s assume a simple situation where our DLR is running on two vSphere hosts (H1 and H2) and has three logical interfaces:

  • Logical Interface 1: VXLAN logical network #1 with VMs (LIF1)
  • Logical Interface 2: VXLAN logical network #2 with VMs (LIF2)
  • Logical Interface 3: VLAN physical network with physical hosts or routers/gateways (LIF3)

Routers have interfaces with IP addresses, and the DLR is no different.  Each vSphere host running the DLR has an identical instance of these three logical interfaces, with identical IP and MAC addresses (with the exception of the MAC address on LIF3).

  • The IP address and MAC address on LIF1 is the same on all vSphere hosts (vMAC)
  • The IP address and MAC address on LIF2 is the same on all vSphere hosts (vMAC)
  • The IP address on LIF3 is the same on all vSphere hosts, however the MAC address on LIF3 is unique per vSphere host (pMAC)

LIFs attached to physical VLAN subnets will have unique MAC addresses per vSphere host.

Side note: the pMAC cited here is not the physical NIC MAC.  It’s different.

The DLR kernel modules will route between VXLAN subnets.  If, for example, VM1 on Logical Network #1 wants to communicate with VM2 on Logical Network #2, VM1 will use the IP address on LIF1 as its default gateway, and the DLR kernel module will route the traffic between LIF1 and LIF2 directly on the vSphere host where VM1 resides.  The traffic is then delivered to VM2, which might be on the same vSphere host, or on another vSphere host, in which case VXLAN encapsulation on Logical Network #2 is used to deliver the traffic to the hypervisor host where VM2 resides.  Pretty straightforward.
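
To make the mechanics a bit more concrete, here is a minimal Python sketch of the idea, a toy model rather than anything NSX-specific.  It shows each host holding an identical copy of the DLR’s LIFs and performing the East-West lookup locally; the names, IP addresses, and MAC values are invented for illustration.

```python
# Toy model (not NSX code): every host holds an identical copy of the DLR's LIFs.
from dataclasses import dataclass
from ipaddress import IPv4Interface, ip_address

@dataclass
class LIF:
    name: str
    kind: str             # "vxlan" or "vlan"
    addr: IPv4Interface   # LIF IP and subnet, identical on every host
    mac: str              # vMAC on VXLAN LIFs, per-host pMAC on the VLAN LIF

VMAC = "02:50:56:56:44:52"  # placeholder value for the shared virtual MAC

def dlr_copy(host_pmac: str):
    """The per-host instance of the DLR: same IPs everywhere, MAC differs only on LIF3."""
    return [
        LIF("LIF1", "vxlan", IPv4Interface("10.1.1.1/24"), VMAC),
        LIF("LIF2", "vxlan", IPv4Interface("10.1.2.1/24"), VMAC),
        LIF("LIF3", "vlan",  IPv4Interface("10.1.3.1/24"), host_pmac),
    ]

h1 = dlr_copy(host_pmac="02:50:56:56:44:01")  # H1's pMAC (made up)
h2 = dlr_copy(host_pmac="02:50:56:56:44:02")  # H2's pMAC (made up)

def egress_lif(lifs, dst_ip):
    """East-West routing: the source VM's own host picks the egress LIF in its kernel."""
    return next((lif.name for lif in lifs if ip_address(dst_ip) in lif.addr.network), None)

# VM1 (10.1.1.10, behind LIF1) sends to VM2 (10.1.2.20, behind LIF2):
# the lookup happens on VM1's host, nothing hairpins through a central router.
print(egress_lif(h1, "10.1.2.20"))  # -> LIF2
```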

VMware NSX Distributed Logical Router for vSphere

The DLR kernel modules can also route between physical and virtual subnets.  Let’s see what happens when a physical host PH1 (or router) on the physical VLAN wants to deliver traffic to a VM on a VXLAN logical network.

  1. PH1 either has a route or default gateway pointing at the IP address of LIF3.
  2. PH1 issues an ARP request for the IP address present on LIF3.
  3. Before any of this happened, the NSX controller cluster picked one vSphere host to be the Designated Instance (DI) for LIF3.

  • The DI is only needed for LIFs attached to physical VLANs.
  • There is only one DI per LIF.
  • The DI host for one LIF might not be the same DI host for another LIF.
  • The DI is responsible for ARP resolution.

  4. Let’s presume H1 is the vSphere host selected as the DI for LIF3, so H1 responds to PH1’s ARP request, replying with its own unique pMAC on its LIF3.
  5. PH1 then delivers the traffic to the DI host, H1.
  6. H1 then performs a routing lookup in its DLR kernel module.
  7. The destination VM may or may not be on H1:
      • If so, the packet is delivered directly. (i)
      • If not, the packet is encapsulated in a VXLAN header and sent directly to the destination vSphere host, H2. (ii)
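
If it helps to see the Designated Instance behavior in code, here is a minimal sketch of the ARP step above, continuing the same toy model.  The host names, addresses, and pMAC values are invented for illustration, and the real election is performed by the NSX controller cluster.

```python
# Toy model (not NSX code): only the DI host answers ARP for a VLAN LIF's IP address.
LIF3_IP = "10.1.3.1"
DI_FOR_LIF3 = "H1"  # picked by the NSX controller cluster; assumed to be H1 here
PMAC = {"H1": "02:50:56:56:44:01", "H2": "02:50:56:56:44:02"}  # made-up per-host pMACs

def answer_arp(host, target_ip):
    """A host replies to an ARP request for LIF3 only if it is the DI, using its own pMAC."""
    if target_ip == LIF3_IP and host == DI_FOR_LIF3:
        return PMAC[host]
    return None  # every other host stays silent

# PH1 broadcasts "who has 10.1.3.1?" on the VLAN:
print(answer_arp("H1", LIF3_IP))  # -> H1's pMAC, so PH1 sends its traffic to H1
print(answer_arp("H2", LIF3_IP))  # -> None, H2 does not reply
```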

For (ii) return traffic, the vSphere host with the VM (H2 in this case) will perform a routing lookup in its DLR kernel module and see that the output interface to reach PH1 is its own LIF3.  Yes, if a DLR has a LIF attached to a physical VLAN, each vSphere host running the DLR had better be attached to that VLAN.

Each LIF on the DLR has its own ARP table.  Consequently, each vSphere host in the DLR carries an ARP table for each LIF.  The ARP table for LIF3 on H2 may be empty or may not contain an entry for PH1, and because H2 is not the DI for LIF3, it’s not allowed to ARP for it.  So instead, H2 sends a UDP message to the DI host (H1) asking it to perform the ARP on its behalf.

Note: The NSX controller cluster, upon picking H1 as the DI, informed all hosts in the DLR that H1 was the DI for LIF3.

The DI host for LIF3 (H1) issues an ARP request for PH1 and subsequently sends a UDP response back to H2 containing the resolved information. H2 now has an entry for PH1 on its LIF3 ARP table and delivers the return traffic directly from the VM to PH1.  The DI host (H1) is not in the return data path.
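
Continuing the toy model, here is a minimal sketch of that return-path ARP delegation, with the UDP exchange between hosts reduced to a plain function call; PH1’s IP and MAC are invented for the example.

```python
# Toy model (not NSX code): a non-DI host may not ARP on a VLAN LIF, so it asks the DI.
arp_tables = {"H1": {"LIF3": {}}, "H2": {"LIF3": {}}}  # per-host, per-LIF ARP tables
DI_FOR_LIF3 = "H1"

def di_resolve(target_ip):
    """Stand-in for the DI (H1) ARPing on the physical VLAN and learning the MAC."""
    return {"10.1.3.50": "00:1b:21:aa:bb:cc"}[target_ip]  # PH1's IP and MAC, both invented

def resolve_on(host, lif, target_ip):
    """H2 can't ARP for PH1 itself; it asks the DI (over UDP in NSX) and caches the answer."""
    table = arp_tables[host][lif]
    if target_ip not in table:
        table[target_ip] = di_resolve(target_ip)  # models the UDP request/response to H1
    return table[target_ip]

# H2 learns PH1's MAC via H1, then sends the return traffic directly; H1 stays out of the data path.
print(resolve_on("H2", "LIF3", "10.1.3.50"))  # -> PH1's MAC
```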

All of that happened with just a DLR and static/default routes (no routing protocols).

The DLR can also run IP routing protocols — both OSPF and BGP.

In the case where the DLR is running routing protocols with an upstream router, the DLR will consume two IP addresses on that subnet: one for the LIF in the DLR kernel module on each vSphere host, and one for the DLR control VM.  The IP address on the DLR control VM is not a LIF and is not present in the DLR kernel modules of the vSphere hosts; it exists only on the control VM and is used for establishing routing protocol sessions with other routers.  This IP address is referred to as the “Protocol Address”.

The IP address on the LIF will be used for the actual traffic forwarding between the DLR kernel modules and the other routers — this IP address is referred to as the “Forwarding Address” — and is used as the next-hop address in routing advertisements.
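
A quick illustrative sketch of which address does what, with the addresses invented and nothing NSX-specific about the code itself:

```python
# Toy model (not NSX code): the two addresses the DLR consumes on an uplink subnet.
FORWARDING_ADDRESS = "192.168.10.2"  # on the LIF, present in every host's kernel module
PROTOCOL_ADDRESS   = "192.168.10.3"  # exists only on the DLR control VM

def peering_source():
    """OSPF/BGP sessions are formed from the control VM's Protocol Address."""
    return PROTOCOL_ADDRESS

def advertised_next_hop():
    """Routes the DLR advertises use the Forwarding Address as the next hop,
    so data traffic never has to pass through the control VM."""
    return FORWARDING_ADDRESS

print(peering_source())       # -> 192.168.10.3
print(advertised_next_hop())  # -> 192.168.10.2
```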

When the DLR has a routing adjacency with another router on a physical VLAN, the same process described earlier concerning Designated Instances happens when the other router ARPs for the DLR’s next-hop forwarding address.  Pretty straightforward.

If, however, the DLR has a routing adjacency with the “other” router on a logical VXLAN network — such as a router VM (e.g. the ESR) running on a vSphere host that is also running the DLR — then no Designated Instance process is needed, because the DLR LIF with the Forwarding Address will always be present on the same host as the “other” router VM.  How’s that for a brain twister? 😉

The basic point here is that the DLR provides optimal routing between virtual and physical subnets, and can establish IP routing sessions with both virtual and physical routers.

One example where this would work might be a three-tier application where each tier is its own subnet.  The Web and App tiers might be virtual machines on VXLAN logical networks, whereas the Database machines might be non-virtualized physical hosts on a VLAN.  The DLR can perform optimal routing between these three subnets, virtual and physical, as well as dynamically advertise new subnets to the data center WAN or Internet routers using OSPF or BGP.
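
And a rough sketch of that three-tier example, with the tier subnets invented for illustration:

```python
# Toy model (not NSX code): the three-tier example as the DLR's connected subnets.
tiers = {
    "web (VMs on VXLAN)":        "172.16.10.0/24",
    "app (VMs on VXLAN)":        "172.16.20.0/24",
    "db (physical hosts, VLAN)": "172.16.30.0/24",
}

# Every host's DLR kernel module routes among these three subnets directly,
# and the control VM advertises them upstream via OSPF or BGP as they appear.
for tier, prefix in tiers.items():
    print(f"advertise {prefix:<16} # {tier}")
```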

Pretty cool, right?

Stay tuned.  More to come…