“What will my next generation data center networking platform look like?”
“How do I describe this platform to IT managers and begin to wrap my arms around it?”
This post attempts to provide a framework for that discussion, in which I’ll argue that the platform for the next generation data center network has already taken shape. It’s called Network Virtualization, and it looks a lot like the networking platforms we’re already familiar with.
Over the last 15 years the networking industry has been challenged with tremendous growth in all areas of the infrastructure, but none more challenging than the data center. As networking professionals we have built data centers to grow from tens, to hundreds, to thousands of servers – all while undergoing a transition to server virtualization and cloud computing. How on earth did we manage to make it this far? Platforms.
More specifically: flexible, scalable, robust, adaptable, modular switching platforms.
A platform for data center networks
As networking professionals we have relied on these modular switching platforms as a foundation to build any network, connect to any network and any host, meet any requirement, and build an architecture that is both scalable and easy to manage with L2-L4 services. The phenomenal success of the modular chassis switch, a marvel of network engineering for architecting the physical data center network, is evidence of this.
There have been many different modular switching platforms over the years, each with its own differentiating features – but the fundamental architecture is always the same. There is a supervisor engine, which provides a single point of configuration and forwarding policy control (the management and control plane). There are linecards (the data plane) with access ports that enact the forwarding policy prescribed by the supervisor engine. And finally there are fabric modules that provide the forwarding bandwidth from one linecard to the next.
In general, we see this very same architecture in almost all network switch platforms (virtual or physical), which boils down to three basic components.
- Control point (switch CPU, or supervisor engine)
- Edge forwarding components (port ASIC, or linecards)
- Fabric (switch ASIC, or fabric modules)
The scale at which this architecture is realized varies based on the implementation of the switch (e.g. fixed, or modular).
Why has this architecture been so successful? I can think of several reasons:
Consistency – A single forwarding policy is defined at one control point, the supervisor engine, which automatically deploys that policy to the appropriate linecards. The supervisor engine ensures that each linecard configuration is correct and consistent with policy. This consistency model also applies to the forwarding tables. As each linecard learns the MAC address connected to a port, the supervisor engine ensures that all other linecards have the same consistent forwarding table.
Simplicity – Each linecard locally implements a forwarding lookup and enforces a policy (security, QoS, etc.) upon receiving traffic — determines a destination linecard — and relies upon the fabric modules to provide the non-blocking bandwidth to the destination linecard. There is no need to re-implement the forwarding policy again on the fabric modules. Additionally, there is no need to populate the full forwarding tables on the fabric modules either, because forwarding in the fabric is based on information added by the source linecard that identifies the destination linecard. In total, this architecture simplifies both the fabric module design and the overall implementation of the modular chassis switch.
Scale – The switch architecture in its entirety represents a single logical forwarding pipeline of input ports, a policy or service to implement, and output ports. After all, this is the very essence of what a network should do:
- Receive traffic
- Apply a service or policy
- Forward to a final destination
The configuration complexity required to implement that basic pipeline across a network of discrete devices impacts the overall scalability of the network. Traffic steering is one of those complexities, where network teams must weave together an intricate tapestry of VLANs, VRFs, VDCs, appliance contexts, etc. across many devices just to establish the logical pipeline with isolation and multi-tenancy. Inside a single switch however, traffic steering is handled by the switch architecture, not the operator of the switch. Hence scalable switch architectures such as a modular chassis switch work to reduce the configuration touch points required to implement traffic steering and the forwarding pipeline across the overall physical network.
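The pipeline described above can be sketched in a few lines of Python. This is only an illustrative model, not any vendor's implementation: the class names (`Frame`, `Linecard`, `Fabric`), the `fabric_tag` field, and the MAC addresses are all hypothetical. It shows the ingress linecard applying policy and resolving the destination linecard, while the fabric forwards on that tag alone and holds neither policy nor MAC tables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Frame:
    dst_mac: str
    payload: str
    fabric_tag: Optional[int] = None  # destination linecard slot, set at ingress


class Linecard:
    def __init__(self, slot: int, mac_table: Dict[str, int],
                 policy: Callable[[Frame], bool]):
        self.slot = slot
        self.mac_table = mac_table      # MAC -> destination linecard slot
        self.policy = policy            # returns True if the frame is permitted
        self.delivered: List[str] = []  # payloads handed to local ports

    def ingress(self, frame: Frame) -> Optional[Frame]:
        # Receive traffic, apply policy, then resolve the destination linecard.
        if not self.policy(frame):
            return None
        frame.fabric_tag = self.mac_table[frame.dst_mac]
        return frame


class Fabric:
    """Forwards on the fabric tag only; it has no MAC table and no policy."""

    def __init__(self, linecards: Dict[int, Linecard]):
        self.linecards = linecards

    def forward(self, frame: Frame) -> None:
        self.linecards[frame.fabric_tag].delivered.append(frame.payload)


# The same table and policy on every linecard stands in for the
# consistency a supervisor engine would enforce.
mac_table = {"aa:aa": 1, "bb:bb": 2}
cards = {slot: Linecard(slot, mac_table, lambda f: True) for slot in (1, 2)}
fabric = Fabric(cards)

tagged = cards[1].ingress(Frame("bb:bb", "hello"))
fabric.forward(tagged)
```

Because the fabric only reads `fabric_tag`, adding a new policy or a new MAC entry never touches it, which is the simplification the chassis design relies on.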
Setting aside for a moment the impracticality of massive sheet metal, cabling, silicon, and vendor lock-in: If it were possible to build and install one large all-encompassing chassis switch for the entire data center, imagine the simplicity it would afford in implementing the basic forwarding pipeline. We would have a single control point for all ports, consistency in policy and forwarding, and no complex traffic steering.
Indeed the massive hardware chassis switch is just a fantasy for the physical network. However the majority of endpoints in the data center are now virtual – attached to a virtual network made up of virtual switches. And unlike a physical switch, the scope of Network Virtualization is not constrained by hardware elements such as sheet metal, cabling, and silicon. Instead, a network virtualization platform is only constrained by software, standard transport interfaces (IP), and open control protocols (API).
Next we’ll explore how the VMware/Nicira network virtualization platform provides a common logical switching architecture at an all-encompassing scale for the data center virtual network.
A platform for Network Virtualization
Now let’s look at how this very same proven and familiar modular switch architecture has manifested itself once again to become a next-generation platform for the data center network. Remember our three basic architecture components: 1) Control point, 2) Edge forwarding, and 3) Fabric. All of these still apply, only now at a much larger, more encompassing scale.
Before we begin, let’s first recognize what it is we are trying to accomplish. Remember the essence of what a network should do: 1) receive traffic, 2) apply a service or policy, 3) forward to the final destination. This, again, is the essence of what we want to accomplish – implement a logical forwarding pipeline for the virtual network – with all the properties of consistency, simplicity, and scale.
Let’s begin with the Edge. This is where traffic is first received on the virtual network – the insertion point of ingress policy in our forwarding pipeline. This, of course, is what we know today as the virtual switch present in hypervisor hosts. Two obvious examples of the virtual Edge are the VMware vSwitch and Open vSwitch (OVS). The virtual edge is effectively the “linecard” of the network virtualization platform (NVP). These edge linecard devices are “wired” to each other as needed with tunnels, configured dynamically by the Controller.
One notable difference from physical switch architecture is that our network virtualization platform is not limited to a small subset of vendor specific linecards, or vendor specific fabric modules. This is because the logical chassis is constructed with open source software at the edge (OVS), linked together with soft cabling (STT, VXLAN, GRE tunnels), over any network fabric, and controlled with open APIs such as OpenFlow and OVSDB. This creates a platform ripe for an ecosystem. For example, in addition to a virtual switch, other possibilities for a virtual Edge linecard include 3rd party Top of Rack switches (for connecting physical hosts to the virtual network), and 3rd party network services appliances for attaching specialized network services to the virtual network and forwarding pipeline.
Just as a supervisor engine programs the linecards of a modular chassis switch, a central controller programs the virtual edge linecards with a forwarding policy. Specifically, this is a scale-out software-defined networking (SDN) controller cluster made up of x86 machines, capable of managing close to a thousand virtual edge linecards. Just as a supervisor engine has a management interface supporting protocols such as SSH and SNMP, the SDN controller cluster has an API for configuring the virtual network and for supporting any potential upstream cloud management platform (CMP) such as OpenStack, VMware vCloud, or CloudStack.
Similar to the supervisor engine, the SDN controller ensures consistent policy and forwarding tables across all virtual linecards. For example, when a virtual machine is powered on or migrated, all linecards requiring knowledge of this event are updated and configured by the controller. Similar to a Linecard in a modular chassis switch, the forwarding table of the virtual linecard maps a destination endpoint to a destination Linecard. In this case, the destination linecard is identified by its IP address in a tunnel header. The controller has a global view of the virtual network. It knows the location and network policy of each virtual machine and is able to program that view when and where needed.
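To make the controller's role concrete, here is a minimal Python sketch of the behavior just described. It is a toy model under stated assumptions, not the NVP Controller or its API: the `Controller` and `VirtualLinecard` classes, the `place_vm` method, and the addresses are all hypothetical. The controller keeps a global view of where each virtual machine lives and, on power-on or migration, reprograms only the linecards that host members of the same logical network.

```python
from typing import Dict


class VirtualLinecard:
    """A hypervisor virtual switch, identified on the fabric by its tunnel IP."""

    def __init__(self, tunnel_ip: str):
        self.tunnel_ip = tunnel_ip
        self.forwarding: Dict[str, str] = {}  # remote VM -> destination tunnel IP


class Controller:
    def __init__(self) -> None:
        self.linecards: Dict[str, VirtualLinecard] = {}
        self.vm_location: Dict[str, str] = {}  # global view: VM -> tunnel IP
        self.vm_network: Dict[str, str] = {}   # VM -> logical network

    def add_linecard(self, linecard: VirtualLinecard) -> None:
        self.linecards[linecard.tunnel_ip] = linecard

    def place_vm(self, vm: str, network: str, tunnel_ip: str) -> None:
        """Record a power-on or migration, then push the resulting state."""
        self.vm_location[vm] = tunnel_ip
        self.vm_network[vm] = network
        self._sync(network)

    def _sync(self, network: str) -> None:
        members = {vm: ip for vm, ip in self.vm_location.items()
                   if self.vm_network[vm] == network}
        hosts = set(members.values())
        for ip, linecard in self.linecards.items():
            for vm, dst in members.items():
                if ip in hosts and dst != ip:
                    linecard.forwarding[vm] = dst      # reachable via tunnel
                else:
                    linecard.forwarding.pop(vm, None)  # local VM or not needed


ctl = Controller()
h1, h2, h3 = (VirtualLinecard(ip) for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3"))
for host in (h1, h2, h3):
    ctl.add_linecard(host)

ctl.place_vm("vm-a", "web", "10.0.0.1")  # power on
ctl.place_vm("vm-b", "web", "10.0.0.2")  # power on
ctl.place_vm("vm-b", "web", "10.0.0.3")  # migrate vm-b from h2 to h3
```

After the migration, `h1` points `vm-b` at `10.0.0.3`, `h3` learns where `vm-a` lives, and `h2` holds no entries because it no longer hosts anything on that logical network.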
And finally we have the Fabric. In the modular chassis switch, the fabric is made up of fabric modules supplied by the switch vendor providing the forwarding bandwidth between linecards. In a network virtualization platform, the fabric is the physical network – which itself could be constructed with modular chassis switches, or perhaps a distributed architecture of fixed switches. Either way, the physical network provides forwarding bandwidth between all of the virtual Edge linecards. And the fabric for network virtualization can be supplied by any switching vendor – similar to how hardware for server virtualization can be supplied by any server vendor.
Similar to the fabric modules of a chassis switch, the physical network fabric is not configured with the same forwarding policy and forwarding tables as deployed in the virtual edge linecards. Consider that fabric modules of a chassis switch have no awareness of linecard configurations such as QoS, VLANs, ACLs, VRFs, NAT, etc. — the same is true for network virtualization. Any network configuration that implements the forwarding pipeline and virtual network viewed by a virtual machine is only necessary at the Edge, and programmed automatically by the Controller.
As a result, network teams do not need to configure the multitude of physical switches with the traffic steering and network configurations that construct the virtual network, such as VLANs, VRF, VDC, QoS, ACL, etc. Consequently, the physical network is free to evolve independently of the virtual network, and can be designed around criteria of scale, throughput, and robust network architecture (Layer 3 ECMP).
The VMware/Nicira Network Virtualization Platform
The network virtualization platform (NVP) from VMware/Nicira is the first solution to deliver full network virtualization, and it is deployed in production at some of the largest service providers and enterprises. NVP is a standalone L2-L7 data center networking platform designed to work on any network fabric, work with any hypervisor, connect to any external network, and deploy with any cloud management platform (CMP).
Through full network virtualization, NVP is able to create a complete multilayer network abstraction exposing logical network devices such as logical switches, logical routers, and more. These logical devices can be configured with security and monitoring policies, and attached to each other in any arbitrary topology through the NVP API. The NVP Controller programs the logical topology at the virtual edge. With this programmatic control, the logical network has the speed of configuration and operational model similar to a virtual machine – create, start, stop, clone, snapshot, audit, migrate, etc.
Looking ahead, NVP will serve as a platform for ecosystem partners to plug physical (or virtual) devices such as Top of Rack switches and network services appliances into the architecture like a “linecard”, based on protocols and APIs such as VXLAN and OVSDB. Network architects will be able to present these 3rd party switches and services as logical devices in the logical network, while NVP systematically implements any necessary traffic steering with tunnels to abstract a simplified view of the forwarding pipeline.
Just as a modular chassis switch can connect to any external network, NVP Gateways provide an edge that connects to any standard Layer 2 or Layer 3 external physical network. Network architects can attach the external physical networks anywhere in the logical network through the NVP API. Gateways can also extend logical networks to a remote site using secure IP tunnels (IPSec + STT). And multiple NVP Gateways can be deployed for scale-out performance and high availability.
While NVP Gateways provide a connection to any external network, NVP Service Nodes allow the platform to run on any network. Service Nodes are x86 machines, managed by the NVP Controller, that are dedicated to additional CPU-intensive packet processing services such as handling broadcast, unknown unicast, and multicast (BUM) frames, and encryption (IPSec), offloading that work from hypervisor hosts. The handling of BUM frames by a scale-out cluster of Service Nodes provides scalable network virtualization on any network, without depending on an IP multicast deployment in the physical network, which is complex and limited in scale.
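The offload can be illustrated with a small Python sketch. The text above does not spell out the replication mechanics, so this is an assumption-laden comparison: the function names, the `(sender, receiver)` packet tuples, and the addresses are hypothetical, and source replication is included only as a baseline. It counts how many unicast tunnel copies of one BUM frame the source hypervisor has to send in each case.

```python
from typing import List, Set, Tuple

Packet = Tuple[str, str]  # (sender tunnel IP, receiver tunnel IP)


def replicate_at_source(src: str, members: Set[str]) -> List[Packet]:
    """Baseline: the source hypervisor sends one copy per remote member."""
    return [(src, dst) for dst in sorted(members) if dst != src]


def replicate_via_service_node(src: str, members: Set[str],
                               service_node: str) -> List[Packet]:
    """Offload: the source sends one copy; the service node fans it out."""
    offload = [(src, service_node)]
    fan_out = [(service_node, dst) for dst in sorted(members) if dst != src]
    return offload + fan_out


def copies_sent_by(sender: str, packets: List[Packet]) -> int:
    return sum(1 for s, _ in packets if s == sender)


# Four hypervisors host VMs on the same logical network.
hosts = {"10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"}
direct = replicate_at_source("10.0.0.1", hosts)
offloaded = replicate_via_service_node("10.0.0.1", hosts, "10.0.1.1")
```

In both cases every packet is plain unicast over the physical network, so no IP multicast is needed in the fabric. The difference is where the replication work lands: on the hypervisor host, or on a Service Node dedicated to it.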
Eschew tradeoffs of Good Design vs. Speed and Flexibility
A network switch is useless until it’s been provided with a configuration, and a network virtualization platform is no different. However, there is one important difference. Physical network switches have always been designed under the assumption that once a switch has been configured, the configuration is not going to change that often, because physical network topology is assumed to stay stable once established, and physical servers are added and moved infrequently. As such, the CLI has been a suitable interface for configuration change that happens over longer time scales.
The virtual network, on the other hand, is completely different. Topology and configuration change is happening all the time – virtual machines are frequently added, removed, and migrated – and the virtual network configuration must move at a similar time scale. If not, overall provisioning speed and accuracy are bottlenecked by the slowest element – the physical network switches, each with its own CLI.
Before network virtualization, the physical network needed to play a role in constructing the end-to-end virtual network used by the virtual machines. A virtual machine was just another host on the physical network. Traffic steering with VLANs, VRF, VDC, ACL, NAT, etc. needed to be configured by hand with a CLI on numerous switches — a time-consuming process prone to error and inconsistency.
As a result, the significant delta in provisioning speed between virtual machines and virtual networks brought about a contentious tradeoff: You can have faster network provisioning with a precarious network design (such as all VLANs preemptively flooded on every port and large Layer 2 domains). Or, you can have a good network design but with slow and limited provisioning (such as virtual machine networks limited to certain racks and services anchored to a physical network chokepoint). You can’t have both good network design and service provisioning speed. Not until you’ve decoupled virtual and physical network configuration through a network virtualization platform.
With the virtual network fully abstracted from physical switch hardware, through network virtualization, we are now free to use a configuration mechanism specifically for the virtual layer that’s better suited to the faster time scale of virtual networks – the API. And the physical switch configurations need only provide a topology to deliver forwarding bandwidth that doesn’t need to change that often — for which the existing CLI is well suited. As such the manner in which network operators configure the physical network today need not change with network virtualization.
The era of software centric networking platforms
The next generation network virtualization platforms such as NVP closely resemble the switch architectures we’ve deployed over the years to build highly scalable and robust physical networks. What’s different is that the primary elements enabling this platform are software driven, such as SDN controllers and virtual switches, connected together via standard transports (tunnels), and controlled via standard API interfaces (OpenFlow and OVSDB).
At VMware we believe that virtualization software providers are best equipped to deliver and package a software driven networking platform for the virtual network. For example, NVP was the first network virtualization platform, and it is already in production at many service provider and enterprise data centers. While we do expect network vendors to deliver similar “network virtualization” platforms aimed at the virtual network, their execution is likely to come with the caveat that it works only with their own physical network hardware.
For the same reason that it makes sense to support server virtualization on any server hardware, network virtualization should follow the same basic principle of deployment on any network hardware. Otherwise, it’s not really virtualization.
Engineering Architect, Virtual Networking