What is Network Virtualization?

Data centers exist for the sole purpose of deploying applications. Applications that automate business processes, serve customers better, enter new markets … you get the idea. It’s all about the Apps.

Server Virtualization

Applications are composed with both Compute and Network resources. It doesn’t make sense to have one without the other; a symbiotic relationship. And for the last decade, one half of that relationship (Compute) has been light years ahead of the other (Network). Compute and Network is a symbiotic relationship lacking any symmetry.

For example, it’s possible to deploy the Compute of an application (virtual servers) within seconds, through powerful automation enabled by software on general purpose hardware: Server Virtualization. The virtual network, on the other hand, is still provisioned manually, on specialized hardware, with keyboards and CLIs. Meanwhile the application deployment drags on for days, weeks, or longer, until the network is finally ready.

Server virtualization also enabled Compute with awesomeness like mobility, snapshots, and push button disaster recovery, to name a few. The network, on the other hand, doesn’t have the same capabilities. There is no mobility – the network configuration is anchored to hardware. Snapshots of the application’s network architecture are next to impossible because the network configuration state is spread across a multitude of disparate network devices (physical and virtual). And recreating the application’s network architecture at a second data center (disaster recovery) is a house of cards (at best), if not impossible, without the same automation, untethered mobility, and snapshots. The Compute portion of the application, with all of its virtualization capabilities, is held back from reaching its full potential, anchored to the non-virtualized network.

Network Virtualization is a solution with products that bring symmetry to the symbiotic relationship of Compute & Network. With network virtualization, the application’s virtual Network is provisioned in lock step with virtual Compute, with the same level of speed, automation, and mobility. With compute and network working in symmetry, through Server & Network Virtualization, compute and network are deployed together – rather than one waiting for the other. Applications are fully decoupled, with fully automated provisioning, and truly mobile.

What is Virtualization?

Virtualization is the basic act of decoupling an infrastructure service from the physical assets on which that service operates. The service we want to consume (such as Compute, or Network) is not described on, identified by, or strictly associated to any physical asset. Instead, the service is described in a data structure, and exists entirely in a software abstraction layer reproducing the service on any physical resource running the virtualization software. The lifecycle, identity, location, and configuration attributes of the service exist in software with APIs, thereby unlocking the full potential of automated provisioning.

The canonical example is Server Virtualization, where the familiar attributes of a physical server are decoupled and reproduced in virtualization software (hypervisor) as vCPU, vRAM, vNIC, etc., and assembled in any arbitrary combination producing a unique virtual server in seconds.
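To make "described in a data structure" a little more concrete, here is a minimal Python sketch. The class and field names (VirtualNic, VirtualServerSpec, vcpus, vram_mb) are hypothetical and invented for illustration; they are not any hypervisor's actual API.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class VirtualNic:
    # Hypothetical vNIC: a name plus the logical network it attaches to.
    name: str
    network: str


@dataclass(frozen=True)
class VirtualServerSpec:
    # Hypothetical description of a virtual server. Nothing here refers
    # to a physical host, which is the point of the decoupling.
    name: str
    vcpus: int
    vram_mb: int
    nics: Tuple[VirtualNic, ...] = ()


web = VirtualServerSpec(
    name="web-01",
    vcpus=2,
    vram_mb=4096,
    nics=(VirtualNic("eth0", "web-tier"),),
)

# Because the server is only data, a second instance is a copy with a new name.
web2 = replace(web, name="web-02")
```

The spec can be stored, copied, or handed to an API. The physical server only shows up later, when some host is chosen to run it.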

The same type of decoupling and automation enabled by server virtualization is made available to the virtual network with Network Virtualization.

What is the Network?

Virtual machines supporting the application often require network connectivity (switching and routing) to other virtual machines and the outside world (WAN/Internet) with security and load balancing. The first network device virtual machines are attached to is a software virtual switch on the hypervisor. The “network” we want to virtualize is the complete L2-L7 services viewed by the virtual machines, and all of the network configuration state necessary to deploy the application’s network architecture (n-tier, etc). The network relevant to the virtual machines is sometimes more specifically referred to as the virtual network.

Virtual servers have been fully decoupled from physical servers by server virtualization. The virtual network, on the other hand, has not been fully decoupled from the physical network. Because of this, the configuration necessary to provision an application’s virtual network must be carefully engineered across many physical and virtual switches, and L4-L7 service appliances. Despite the best efforts of server virtualization, the *application* is still coupled to hardware.

With Network Virtualization, the goal is to take all of the network services, features, and configuration necessary to provision the application’s virtual network (VLANs, VRFs, Firewall rules, Load Balancer pools & VIPs, IPAM, Routing, isolation, multi-tenancy, etc.), decouple them from the physical network, and move them into a virtualization software layer for the express purpose of automation.

With the virtual network fully decoupled, the physical network configuration is simplified to provide packet forwarding service from one hypervisor to the next. The implementation details of physical packet forwarding are separated from, and not complicated by, the virtual network. Both the virtual and physical network can evolve independently. The virtual network features and capabilities evolve at software release cycle speeds (months). The physical network packet forwarding evolves at hardware release cycle speeds (years).

Packet forwarding is not the point of friction in provisioning applications. Current generation physical switches do this quite well with dense line-rate 10/40/100G silicon and standard IP protocols (OSPF, BGP). Packet forwarding is not the problem. The problem addressed by network virtualization is the manual deployment of network policy, features, and services constructing the network architecture viewed by the application’s compute resources (virtual machines).

Network Virtualization

Network Virtualization reproduces the L2-L7 network services necessary to deploy the application’s virtual network at the same software virtualization layer hosting the application’s virtual machines – the hypervisor kernel and its programmable virtual switch. Similar to how server virtualization reproduces vCPU, vRAM, and vNIC – Network Virtualization software reproduces Logical switches, Logical routers (L2-L3), Logical Load Balancers, Logical Firewalls (L4-L7), and more, assembled in any arbitrary topology, thereby presenting the virtual compute a complete L2-L7 virtual network topology.

All of the feature configuration necessary to provision the application’s virtual network can now be provisioned at the software virtual switch layer through APIs. No CLI configuration per application is necessary in the physical network. The physical network provides the common packet forwarding substrate. The programmable software virtual switch layer provides the complete virtual network feature set for each application, with isolation and multi-tenancy.
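As a rough illustration of provisioning a per-application virtual network as data, the sketch below builds the same three-tier logical topology for two tenants. LogicalTopology, add_device, connect, and three_tier are hypothetical names made up for this example. A real platform would expose its own API, and this sketch only models the topology in memory.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

KINDS = {"switch", "router", "firewall", "load_balancer"}


@dataclass
class LogicalTopology:
    # Hypothetical container for one application's virtual network.
    tenant: str
    devices: Dict[str, str] = field(default_factory=dict)  # name -> kind
    links: Set[Tuple[str, str]] = field(default_factory=set)

    def add_device(self, name: str, kind: str) -> None:
        if kind not in KINDS:
            raise ValueError(f"unknown logical device kind: {kind}")
        self.devices[name] = kind

    def connect(self, a: str, b: str) -> None:
        # Both endpoints must already exist; links are stored undirected.
        for name in (a, b):
            if name not in self.devices:
                raise KeyError(name)
        first, second = sorted((a, b))
        self.links.add((first, second))


def three_tier(tenant: str) -> LogicalTopology:
    # One logical router, one logical switch per tier, a load balancer on web.
    topo = LogicalTopology(tenant)
    topo.add_device("router", "router")
    for tier in ("web", "app", "db"):
        topo.add_device(f"{tier}-switch", "switch")
        topo.connect("router", f"{tier}-switch")
    topo.add_device("web-lb", "load_balancer")
    topo.connect("web-lb", "web-switch")
    return topo


topo_a = three_tier("tenant-a")
topo_b = three_tier("tenant-b")
```

Each tenant gets its own isolated copy of the topology from the same template, and nothing in the description mentions a physical switch.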

Server & Network Virtualization

With Network Virtualization the virtual network is entirely provisioned in software, by software, with APIs, at the same speed and agility and in lock step with server virtualization. The same software tools already provisioning the application’s virtual machines can simultaneously provision both compute and network together (with templates), and subsequently validate the complete application architecture — compute and network together.

Next, rather than just taking snapshots of virtual machines, take a snapshot of the complete application architecture (compute and network) and ship a copy off to a disaster recovery site – on standby for push button recovery. The application’s network is finally equally mobile and running as fast as the compute.

Network Virtualization makes sense because of Server Virtualization. Compute and Network, a symbiotic relationship deployed in synchronization, with symmetry.

It’s a no-brainer.

Cheers,
Brad

Reporting from the front lines of network transformation

It’s been a while :-)

So what gives? Well, I’ve been spending most of my time on the front lines: meeting with customers, breaking the ice, laying out the fundamental case for Network Virtualization, face to face, heart to heart. Just a whiteboard, rolled up sleeves, and a room full of intelligent IT conversationalists.

This is, actually, my favorite thing to do.

I’m not a real big fan of the formal presentation, the pomp and pageant of tech conferences, or endless pontificating from atop some ivory tower “Office of the CTO” … “customers want this, customers want that, blah, blah, blah”. Not to minimize that stuff. It’s important too, and there’s always a time and place for that.

But there’s nothing better than having a raw, unscripted conversation, laying out the core concepts of a transformative networking tech and seeing where the dialogue takes you, and learning a few new things with each discussion.

And there’s never a shortage of things to talk about when the topic is Network Virtualization.

When you look at what it takes to deploy an application, all the VMs and network services, you’ll find that network provisioning is a tremendous drag, up and down the stack: the VLANs, Firewalls, Load balancers, Routing (VRF), ACLs, QoS, IP addressing, DNS, Monitoring, NAT, VPN, the list goes on.  Now try to pick that application up (network services and all) and move it to another data center … <pound head here>

The virtual machines are in this 21st century world of software automation, common hardware, APIs, mobility, and rapid provisioning. Provisioning the network, on the other hand, is still stuck in this 1990s era of humans, keyboards, CLIs, specialized hardware, and chokepoints.  Despite the best efforts of server virtualization, the application is still not fully decoupled from hardware.

When you think about it … the problem with networking is NOT packet forwarding.  That’s one thing the networking industry has done really really well.  We have these wonderful line rate 10/40/100G switches running extremely well engineered and robust distributed routing protocols such as OSPF/BGP/ISIS. We don’t need to re-invent that.

The problem with networking is the manual deployment of networking services and policy.  All the stuff you need to configure in network hardware to get a new application online (or moved to another data center).

Contrary to the current SDN hype — we don’t need to decouple network hardware control planes from data planes.  Rather, we need to decouple the network policy from packet forwarding. Network Virtualization.

Networking needs to evolve.  Everybody seems to agree.

How do you do that?  Decouple, Distribute, Automate.

Decouple the application from networking hardware (finally!) — the entire L2-L7 stack.  Move the workload’s network closer to the workload — at the edge software layer.

Distribute networking services at the software edge.  Distributed in-kernel L3 routing.  Distributed in-kernel stateful firewall.  No more chokepoints.  Move the services to the workload.  Stop moving workloads to the services. End the traffic steering madness.

Automate the complete L2-L7 virtual network deployment in lock step with the compute.  The cloud provisioning system should be deploying the entire application stack: the VMs and their complete virtual network.  Throw some API messages at the server virtualization software. Throw some API messages at the network virtualization software.  Validate and snapshot the whole thing.

Now we’re talk’n :)

Cheers,
Brad

On “VMware’s SDN Dilemma: VXLAN or Nicira?”

Some commentary on a blog published by Network Computing titled “VMware’s SDN Dilemma: VXLAN or Nicira?”

VMware has a technology problem: it’s backing two competing standards for overlay networks: Nicira’s STT and the IETF draft standard VXLAN

Nonsense.  As of right now, STT tunneling provides the best performance for network virtualization (wire speed).  If and when VXLAN (or some derivative) becomes the best option, it’s just a matter of adding VXLAN as another choice of tunneling protocol from which to configure the system – if not already there.  That’s not a “technology problem”.  That’s providing the right tools at the right time — facilitating a transition from one generation to the next (from early adopters to wide-spread deployment).

… limited entropy in the STT header means it doesn’t balance loads evenly over Ethernet port bundles in network backbones. Depending on your network design, this could be a significant limitation.

This is just factually incorrect.  The TCP source port in the STT outer header is derived from a hash of the internal frame’s header.  Individual flows carried by STT will have a different TCP source port in the outer header.  This provides maximum flow level granularity (entropy) for optimal load balancing for ECMP/LAG paths on standard hardware in the physical network.  This is discussed in section 2.5 of the STT informational draft. By the way, this is the same method employed by VXLAN.
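The idea of deriving the outer source port from a hash of the inner headers can be sketched in a few lines of Python. This is an illustration only: the choice of CRC32 as the hash, the exact header fields hashed, and the 49152-65535 port range are my assumptions for the example, not the wording of the STT or VXLAN drafts. The pick_uplink function is a stand-in for whatever hash a physical switch applies to the outer header.

```python
import zlib

# Assumed port range for the outer header, chosen for illustration.
PORT_BASE = 49152
PORT_SPAN = 65536 - PORT_BASE


def outer_source_port(src_ip: str, dst_ip: str, proto: int,
                      src_port: int, dst_port: int) -> int:
    # Hash the inner flow's header fields. Every packet of one flow gets
    # the same outer source port, so a flow is never reordered across
    # paths, while different flows tend to get different ports.
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return PORT_BASE + zlib.crc32(key) % PORT_SPAN


def pick_uplink(outer_src_port: int, n_links: int) -> int:
    # Stand-in for an ECMP/LAG hash in the physical network, which sees
    # only the outer header.
    return outer_src_port % n_links


flow_a = ("10.0.0.5", "10.0.1.10", 6, 33001, 443)
port_a = outer_source_port(*flow_a)
link_a = pick_uplink(port_a, 4)
```

The physical switches need no awareness of the tunnel protocol. They hash the outer header as they would for any other traffic, and the per-flow variation in the source port supplies the entropy.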

NVGRE is the tunneling protocol (pushed by Microsoft) with poor handling of flow level granularity.  Section 4.8 of the NVGRE draft states that “NVGRE-Aware” network devices would be required to realize the best flow level entropy and optimal load balancing on ECMP/LAG paths. Perhaps the author confused STT with NVGRE?

Network Virtualization: a next generation modular platform for the data center virtual network

“What will my next generation data center networking platform look like?”

“How do I describe this platform to IT managers and begin to wrap my arms around it?”

This post attempts to provide a framework for that discussion, in which I’ll argue that the platform for the next generation data center network has already taken shape.   It’s called Network Virtualization, and it looks a lot like the networking platforms we’re already familiar with.

Over the last 15 years the networking industry has been challenged with tremendous growth in all areas of the infrastructure, but none more challenging than the data center.  As networking professionals we have built data centers to grow from tens, to hundreds, to thousands of servers – all while undergoing a transition to server virtualization and cloud computing.  How on earth did we manage to make it this far? Platforms.

More specifically: flexible, scalable, robust, adaptable, modular switching platforms.

A platform for data center networks

As networking professionals we have relied on these modular switching platforms as a foundation to build any network, connect to any network and any host, meet any requirement, and build an architecture that is both scalable and easy to manage with L2-L4 services.  This is evident by observing the phenomenal success of the modular chassis switch, a marvel in network engineering for architecting the physical data center network.

There have been many different modular switching platforms over the years, each with their own differentiating features – but the baseline fundamental architecture is always the same.  There is a supervisor engine which provides a single point of configuration and forwarding policy control (the management and control plane).  There are Linecards (the data plane) with access ports that enact a forwarding policy prescribed by the supervisor engine.  And finally there are fabric modules that provide the forwarding bandwidth from one linecard to the next.

In general, we see this very same architecture in almost all network switch platforms (virtual or physical), which boils down to three basic components.

  1. Control point (switch CPU, or supervisor engine)
  2. Edge forwarding components (port ASIC, or linecards)
  3. Fabric (switch ASIC, or fabric modules)

The scale at which this architecture is realized varies based on the implementation of the switch (e.g. fixed, or modular).

Why has this architecture been so successful?  I can think of several reasons:

Consistency – A single forwarding policy is defined at one control point, the supervisor engine, which automatically deploys that policy to the appropriate linecards.  The supervisor engine ensures that each linecard configuration is correct and consistent with policy.  This consistency model also applies to the forwarding tables.  As each linecard learns the MAC address connected to a port, the supervisor engine ensures that all other linecards have the same consistent forwarding table.

Simplicity – Each linecard locally implements a forwarding lookup and enforces a policy (security, QoS, etc.) upon receiving traffic — determines a destination linecard — and relies upon the fabric modules to provide the non-blocking bandwidth to the destination linecard.  There is no need to re-implement the forwarding policy again on the fabric modules.  Additionally, there is no need to populate the full forwarding tables on the fabric modules either, because forwarding in the fabric is based on information added by the source linecard that identifies the destination linecard.  In total, this architecture simplifies both the fabric module design and the overall implementation of the modular chassis switch.

Scale – The switch architecture in its entirety represents a single logical forwarding pipeline of input ports, a policy or service to implement, and output ports.  After all, this is the very essence of what a network should do:

  1. Receive traffic
  2. Apply a service or policy
  3. Forward to a final destination
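The three steps above can be sketched as a single function. Packet, allow_ports, and forward are hypothetical names for this illustration, and the "policy" here is reduced to a simple port filter.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dst_port: int


# A policy is any function that accepts or rejects a packet.
Policy = Callable[[Packet], bool]


def allow_ports(*ports: int) -> Policy:
    allowed = set(ports)
    return lambda pkt: pkt.dst_port in allowed


def forward(pkt: Packet, policies: Iterable[Policy],
            table: Dict[str, str]) -> Optional[str]:
    # 1. Receive traffic: pkt arrives on an input port.
    # 2. Apply a service or policy: every policy must accept the packet.
    if not all(policy(pkt) for policy in policies):
        return None  # dropped by policy
    # 3. Forward to a final destination: look up the output port.
    return table.get(pkt.dst)  # None if the destination is unknown


table = {"10.0.1.10": "port-3"}
web_only = [allow_ports(80, 443)]
```

Inside one switch this whole pipeline is a local function call. The complexity discussed next comes from stretching it across many separately configured devices.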

The configuration complexity required to implement that basic pipeline across a network of discrete devices impacts the overall scalability of the network.  Traffic steering is one of those complexities, where network teams must weave together an intricate tapestry of VLANs, VRFs, VDCs, appliance contexts, etc. across many devices just to establish the logical pipeline with isolation and multi-tenancy.  Inside a single switch however, traffic steering is handled by the switch architecture, not the operator of the switch.  Hence scalable switch architectures such as a modular chassis switch work to reduce the configuration touch points required to implement traffic steering and the forwarding pipeline across the overall physical network.

Setting aside for a moment the impracticality of massive sheet metal, cabling, silicon, and vendor lock-in:  If it were possible to build and install one large all-encompassing chassis switch for the entire data center, imagine the simplicity it would afford in implementing the basic forwarding pipeline.  We would have a single control point for all ports, consistency in policy and forwarding, and no complex traffic steering.

Indeed the massive hardware chassis switch is just a fantasy for the physical network.  However the majority of endpoints in the data center are now virtual – attached to a virtual network made up of virtual switches.  And unlike a physical switch, the scope of Network Virtualization is not constrained by hardware elements such as sheet metal, cabling, and silicon.  Instead, a network virtualization platform is only constrained by software, standard transport interfaces (IP), and open control protocols (API).

Next we’ll explore how the VMware/Nicira network virtualization platform provides a common logical switching architecture at an all-encompassing scale for the data center virtual network.

A platform for Network Virtualization

Now let’s look at how this very same proven and familiar modular switch architecture has manifested itself once again to become a next-generation platform for the data center network.  Remember our three basic architecture components: 1) Control point, 2) Edge forwarding, and 3) Fabric.  All of these still apply, only now at a much larger, more encompassing scale.

Before we begin, let’s first recognize what it is we are trying to accomplish.  Remember the essence of what a network should do: 1) receive traffic, 2) apply a service or policy, 3) forward to the final destination.  This, again, is the essence of what we want to accomplish – implement a logical forwarding pipeline for the virtual network – with all the properties of consistency, simplicity, and scale.

Edge

Let’s begin with the Edge.  This is where traffic is first received on the virtual network – the insertion point of ingress policy in our forwarding pipeline.  And this of course is what we know today to be the virtual switch present in hypervisor hosts.  Two obvious examples of the virtual Edge are the VMware vSwitch, and Open vSwitch (OVS).  The virtual edge is effectively the “Linecard” of the network virtualization platform (NVP).  And these edge Linecard devices are “wired” to each other as needed with tunnels, configured dynamically by the Controller.

One notable difference from physical switch architecture is that our network virtualization platform is not limited to a small subset of vendor specific linecards, or vendor specific fabric modules.  This is because the logical chassis is constructed with open source software at the edge (OVS), linked together with soft cabling (STT, VXLAN, GRE tunnels), over any network fabric, and controlled with open APIs such as OpenFlow and OVSDB.  This creates a platform ripe for an ecosystem.  For example, in addition to a virtual switch, other possibilities for a virtual Edge linecard include 3rd party Top of Rack switches (for connecting physical hosts to the virtual network), and 3rd party network services appliances for attaching specialized network services to the virtual network and forwarding pipeline.

Controller

Similar to a supervisor engine of a modular chassis switch, the virtual edge linecards are programmed with a forwarding policy from a central controller.  Specifically, a scale-out software-defined network (SDN) controller cluster made up of x86 machines capable of managing close to a thousand virtual edge linecards.  Just as a supervisor engine has a management interface supporting protocols such as SSH and SNMP, the SDN controller cluster has an API interface for configuring the virtual network, and supporting any potential upstream cloud management platform (CMP) such as OpenStack, VMware vCloud, CloudStack.

Similar to the supervisor engine, the SDN controller ensures consistent policy and forwarding tables across all virtual linecards.  For example, when a virtual machine is powered on or migrated, all linecards requiring knowledge of this event are updated and configured by the controller.  Similar to a Linecard in a modular chassis switch, the forwarding table of the virtual linecard maps a destination endpoint to a destination Linecard.  In this case, the destination linecard is identified by its IP address in a tunnel header.  The controller has a global view of the virtual network.  It knows the location and network policy of each virtual machine and is able to program that view when and where needed.
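A toy model of that controller behavior might look like the following. VirtualEdge, Controller, and vm_placed are hypothetical names, and the sketch pushes every update to every edge for simplicity, whereas the text above says only the linecards that need to know are updated.

```python
from typing import Dict, Optional


class VirtualEdge:
    """A virtual 'linecard': maps a VM's MAC to the tunnel IP of its host."""

    def __init__(self, tunnel_ip: str) -> None:
        self.tunnel_ip = tunnel_ip
        self.table: Dict[str, str] = {}

    def lookup(self, dst_mac: str) -> Optional[str]:
        return self.table.get(dst_mac)


class Controller:
    """Holds the global view and programs it into the edges."""

    def __init__(self) -> None:
        self.edges: Dict[str, VirtualEdge] = {}
        self.vm_location: Dict[str, str] = {}  # VM MAC -> tunnel IP

    def register_edge(self, edge: VirtualEdge) -> None:
        self.edges[edge.tunnel_ip] = edge
        # Bring a newly added edge up to date with the current view.
        edge.table.update(self.vm_location)

    def vm_placed(self, mac: str, tunnel_ip: str) -> None:
        # Called on power-on or migration. Simplification: update all edges.
        self.vm_location[mac] = tunnel_ip
        for edge in self.edges.values():
            edge.table[mac] = tunnel_ip


ctl = Controller()
edge_a = VirtualEdge("192.0.2.11")
edge_b = VirtualEdge("192.0.2.12")
ctl.register_edge(edge_a)
ctl.register_edge(edge_b)

vm_mac = "00:00:5e:00:53:01"
ctl.vm_placed(vm_mac, "192.0.2.11")  # VM powers on behind edge_a
ctl.vm_placed(vm_mac, "192.0.2.12")  # VM migrates to the host of edge_b
```

After the migration, both edges resolve the VM's MAC to the new tunnel endpoint, and no physical switch configuration was touched along the way.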

Fabric

And finally we have the Fabric.  In the modular chassis switch, the fabric is made up of fabric modules supplied by the switch vendor providing the forwarding bandwidth between linecards.  In a network virtualization platform, the fabric is the physical network – which itself could be constructed with modular chassis switches, or perhaps a distributed architecture of fixed switches.  Either way, the physical network provides forwarding bandwidth between all of the virtual Edge linecards.  And the fabric for network virtualization can be supplied by any switching vendor – similar to how hardware for server virtualization can be supplied by any server vendor.

Similar to the fabric modules of a chassis switch, the physical network fabric is not configured with the same forwarding policy and forwarding tables as deployed in the virtual edge linecards.  Consider that fabric modules of a chassis switch have no awareness of linecard configurations such as QoS, VLANs, ACLs, VRFs, NAT, etc. — the same is true for network virtualization.  Any network configuration that implements the forwarding pipeline and virtual network viewed by a virtual machine is only necessary at the Edge, and programmed automatically by the Controller.

As a result, network teams do not need to configure the multitude of physical switches with traffic steering and network configurations that construct the virtual network, such as VLANs, VRF, VDC, QoS, ACL, etc.  Consequently, the physical network is free to evolve independently of the virtual network, and designed around criteria of scale, throughput, and robust network architecture (Layer 3 ECMP).

The VMware/Nicira Network Virtualization Platform

The network virtualization platform (NVP) from VMware/Nicira is the first solution to deliver full network virtualization, and is deployed in production at some of the largest service providers and enterprises.  NVP is a standalone L2-L7 data center networking platform designed to work on any network fabric, work with any hypervisor, connect to any external network, and deploy with any cloud management platform (CMP).

 

Through full network virtualization, NVP is able to create a complete multilayer network abstraction exposing logical network devices such as logical switches, logical routers, and more.  These logical devices can be configured with security and monitoring policies, and attached to each other in any arbitrary topology through the NVP API.  The NVP Controller programs the logical topology at the virtual edge.  With this programmatic control, the logical network has the speed of configuration and operational model similar to a virtual machine – create, start, stop, clone, snapshot, audit, migrate, etc.
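To show why a logical network can share the operational model of a virtual machine, here is a small Python sketch of snapshot, restore, and clone operating on a network described as plain data. The dictionary layout and the function names are hypothetical and are not the NVP API.

```python
import copy
import json

# Hypothetical description of one application's logical network.
logical_network = {
    "name": "app1-net",
    "switches": ["web", "app", "db"],
    "routers": [{"name": "r1", "attached": ["web", "app", "db"]}],
    "security": [
        {"from": "web", "to": "app", "port": 8080, "action": "allow"},
    ],
}


def snapshot(network: dict) -> str:
    # Serialize the full logical state. The result could be stored or
    # shipped to another site.
    return json.dumps(network, sort_keys=True)


def restore(blob: str) -> dict:
    return json.loads(blob)


def clone(network: dict, new_name: str) -> dict:
    # Deep copy so that later edits to the clone never touch the original.
    dup = copy.deepcopy(network)
    dup["name"] = new_name
    return dup


saved = snapshot(logical_network)
staging = clone(logical_network, "app1-net-staging")
staging["security"].append(
    {"from": "app", "to": "db", "port": 5432, "action": "allow"}
)
```

None of these operations is possible in this form when the same state is scattered across the running configurations of many physical devices.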

Looking ahead, NVP will serve as a platform for ecosystem partners to plug-in physical (or virtual) devices such as Top of Rack switches and Network Services appliances into the architecture like a “Linecard”, based on protocols and APIs such as VXLAN and OVSDB.  Network architects will be able to present these 3rd party switches and services as logical devices in the logical network, while NVP systematically implements any necessary traffic steering with tunnels to abstract a simplified view of the forwarding pipeline.

Just as a modular chassis switch can connect to any external network, NVP Gateways provide an edge that connects to any standard Layer 2 or Layer 3 external physical network.  Network architects can attach the external physical networks anywhere in the logical network through the NVP API.  Gateways can also extend logical networks to a remote site using secure IP tunnels (IPSec + STT).  And multiple NVP Gateways can be deployed for scale-out performance and high availability.

In addition to NVP Gateways providing a connection to any external network, NVP Service Nodes provide a connection on any network.  Service Nodes are x86 machines managed by the NVP Controller dedicated to performing additional CPU intensive packet processing services such as handling broadcast, unknown unicast, multicast (BUM) frames, and encryption (IPSec) — offloading that work from hypervisor hosts.  The handling of BUM frames by a scale-out cluster of Service Nodes provides scalable network virtualization on any network, without requiring the limited scale and complexity of an IP multicast deployment in the physical network.
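The BUM handling described above amounts to turning one broadcast into a set of unicast tunnel copies. A minimal sketch, assuming a hypothetical membership map from logical switch name to the tunnel IPs of the hypervisors hosting its VMs:

```python
from typing import Dict, List, Set, Tuple


def replicate_bum(frame: bytes, sender_ip: str, logical_switch: str,
                  membership: Dict[str, Set[str]]) -> List[Tuple[str, bytes]]:
    # The sending hypervisor hands the service node a single copy. The
    # service node returns one unicast copy per member hypervisor of the
    # logical switch, skipping the sender.
    members = membership.get(logical_switch, set())
    return [(ip, frame) for ip in sorted(members) if ip != sender_ip]


membership = {"web-tier": {"192.0.2.11", "192.0.2.12", "192.0.2.13"}}
copies = replicate_bum(b"arp-request", "192.0.2.11", "web-tier", membership)
```

Because every copy is ordinary unicast IP, the physical network needs no multicast configuration. The replication cost moves to the service nodes instead of the hypervisors.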

Eschew tradeoffs of Good Design vs. Speed and Flexibility

A network switch is useless until it’s been provided with a configuration, and a network virtualization platform is no different.  However there is one important difference.  Physical network switches have always been designed under the assumption that once a switch has been configured, the configuration is not going to change that often, because physical network topology is assumed to stay stable once established, and physical servers are added and moved infrequently.  As such, the CLI has been a suitable interface for configuration change that happens over longer time scales.

The virtual network, on the other hand, is completely different.  Topology and configuration change is happening all the time – virtual machines are frequently added, removed, and migrating about – and the virtual network configuration must move at a similar time scale.  If not, overall provisioning speed and accuracy is bottlenecked by the slowest common denominator – the physical network switches, each with its own CLI.

Before network virtualization, the physical network needed to play a role in constructing the end-to-end virtual network used by the virtual machines.  A virtual machine was just another host on the physical network.  Traffic steering with VLANs, VRF, VDC, ACL, NAT, etc. needed to be configured by hand with a CLI on numerous switches — a time-consuming process prone to error and inconsistency.

As a result, the significant delta in provisioning speed between virtual machines and virtual networks brought about a contentious tradeoff:  You can have faster network provisioning with a precarious network design (such as all VLANs preemptively flooded on every port and large Layer 2 domains).  Or, you can have a good network design but with slow and limited provisioning (such as virtual machine networks limited to certain racks and services anchored to a physical network chokepoint).  You can’t have both good network design and service provisioning speed.  Not until you’ve decoupled virtual and physical network configuration through a network virtualization platform.

With the virtual network fully abstracted from physical switch hardware, through network virtualization, we are now free to use a configuration mechanism specifically for the virtual layer that’s better suited to the faster time scale of virtual networks – the API.  And the physical switch configurations need only provide a topology to deliver forwarding bandwidth that doesn’t need to change that often — for which the existing CLI is well suited.  As such the manner in which network operators configure the physical network today need not change with network virtualization.

The era of software centric networking platforms

The next generation network virtualization platforms such as NVP closely resemble the switch architectures we’ve deployed over the years to build highly scalable and robust physical networks.  What’s different is that the primary elements enabling this platform are software driven, such as SDN controllers and virtual switches, connected together via standard transports (tunnels), and controlled via standard API interfaces (OpenFlow and OVSDB).

At VMware we believe that virtualization software providers are best equipped to deliver and package a software driven networking platform for the virtual network.  For example, NVP was the first network virtualization platform, and is already in production at many service provider and enterprise data centers.  While we do expect network vendors to deliver similar “network virtualization” platforms aimed at the virtual network – their execution is likely to come with caveats that require only their physical network hardware.

For the same reason that it makes sense to support server virtualization on any server hardware, network virtualization should provide the same basic principle of deployment on any network hardware.  Otherwise, it’s not really virtualization.

Cheers,

Brad Hedlund
Engineering Architect, Virtual Networking
VMware, Inc.

The start of an epic adventure with VMware, advancing the software defined virtual network

Today I am excited to write that a page has turned, starting a new chapter in my career, and life.  I’ve concluded an excellent year of service with Dell as “Networking Enterprise Technologist” where we grew Dell networking revenues by 40% Y/Y.  We launched cool networking software products like Dell Fabric Manager (fabric automation) and Active System Manager (converged infrastructure), and we launched the industry’s first 40/10GE converged blade server switches: the MXL and IO Aggregator.  I believe Dell is on a path to become a serious contender in data center fabrics, something you or I would have never imagined just a few years ago.  Along that path Dell has some tough decisions ahead, but I think they can make it happen.

In my time at Dell, I’ve learned to see the data center network from a different perspective.  I observed this space from a bottom-up point of view, looking at the specific needs of big data and private cloud clusters of compute and storage.  This, compared to the usual top-down monolithic network point of view I’ve had most of my career, looking at Core switches and trickling down from there to access layer protocols and port counts. Learning to see things from a different point of view expands your horizon and opens your mind.

Now, on to the next chapter.  I couldn’t be more thrilled to be joining the Networking business unit at VMware (Nicira), as “Engineering Architect, Virtual Networks”, reporting to Martin Casado (need I say more?).  Other members of the team are former Cisco fellow and IP/MPLS guru Bruce Davie, and Teemu Koponen (coding genius behind NVP) who recently won the 2012 SIGCOMM Rising Star award. Surround yourself with the right people and the rest will take care of itself.

Imagine an infrastructure where you can essentially draw and deploy your network topology, including the workloads, L2 segments, load balancers, firewalls, routers, gateways, etc — in any way, in any combination, all without touching any hardware configurations.  And all on common hardware platforms in a cluster of fabric and compute.  That’s a comprehensive L2-L4 network abstraction made possible by networking software built like a distributed system.  Now make a template of that topology for rapid re-provisioning, disaster recovery, auditing and compliance, application portability, etc.  That’s a virtual network.

This is not your Dad’s VLANs.  Not your Uncle’s VRF.  And not your Grandpa’s router CLI.

When the time comes to make a serious career change, you have to follow your passion and let your intuition and core beliefs guide you. That can be hard to do sometimes in an environment thick with hype, money, and start-up allure as we have right now in the networking industry.  It shouldn’t be about picking a winner.  It should be about finding something you really believe in, and making it a winner.

I’m a believer in distributed systems.  Look at how distributed systems radically changed the storage and data analytics industry (eg. Hadoop).  Petabytes of data can now be analyzed for business value in a matter of seconds — all on common hardware platforms in a cluster of fabric and compute.  Can distributed systems bring the same kind of transformation to networking?  I believe so.

I’m a believer in the intelligent edge and packet transport core (fabric).  This is a proven architecture for service oriented networks.  Look at the MPLS architecture of any service provider and this is what you see.  The customer is connected to an ingress “Provider Edge” box where policies are applied, and the traffic is then placed on a label-switched packet transport path through the “Core” to the egress edge.  It doesn’t make sense to re-inspect the same bits of a packet at each hop in the network.  The same example can be found in chassis switch architecture: intelligent edge linecards interconnected by packet transport fabric modules.
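Here is a minimal Python sketch of that idea, under heavy simplification. The prefixes, labels, and node names (P1, P2, PE-west, PE-east) are made up for illustration, and string prefix matching stands in for a real longest-prefix lookup. The point it tries to show is that the packet is classified exactly once, at the ingress edge, and every core hop afterwards only does a label lookup.

```python
from typing import Dict, List, Tuple

# Hypothetical edge policy: destination prefix -> label pushed at ingress.
EDGE_POLICY: Dict[str, int] = {"10.1.": 100, "10.2.": 200}

# Hypothetical core label tables: node -> {in_label: (out_label, next_node)}.
CORE: Dict[str, Dict[int, Tuple[int, str]]] = {
    "P1": {100: (101, "P2"), 200: (201, "PE-east")},
    "P2": {101: (102, "PE-west")},
}


def ingress(dst_ip: str) -> int:
    """Intelligent edge: the only place the IP header is inspected."""
    for prefix, label in EDGE_POLICY.items():
        if dst_ip.startswith(prefix):
            return label
    raise LookupError(f"no edge policy for {dst_ip}")


def transport(label: int, node: str = "P1") -> List[str]:
    """Packet transport core: each hop swaps a label and never re-inspects the packet."""
    path = []
    while node in CORE:
        path.append(node)
        label, node = CORE[node][label]
    path.append(node)  # the egress edge, which is not a core node
    return path


path_west = transport(ingress("10.1.4.7"))
path_east = transport(ingress("10.2.9.9"))
```

The core tables never mention an IP address. All of the policy lives in `EDGE_POLICY`, which is the property the intelligent edge argument relies on.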

I also believe that x86 machines and the hypervisor vswitch are the ideal intelligent edge devices in our data center virtual network.  The hypervisor vswitch is exposed to a much greater set of context than a typical top of rack switch.  For example, it can differentiate VMs grouped together in the same application or tenant and program the vswitch accordingly.  I also consider the first interface between the “outside world” and our virtual network to be an intelligent edge as well — the North/South edge.  Which, again, is ideally x86 machines with the same L2-L4 vswitch programmed from the same context at the workload edge.  And in the middle of it all is a packet transport fabric — the physical network.

With our hypervisor vswitch playing such an important role in our virtual network — the question becomes:  Where is the ideal place to program the networking services and topology for our virtual network? Perhaps the same software managing the deployment and provisioning of the workloads, the VMs?  Or something closely coupled to it? I believe so.  The rationale being that you would want your application architecture defined in one tightly coupled policy engine — Rather than duct taping your VMs in one system to your virtual network in another system (that’s a loosely coupled kludge).  Besides, one workflow is better than two, right?

And finally, I believe in a solution that works on standard, commonly available hardware.  That the virtual and physical networks can and should be independently interchangeable and replaceable.  This of course leaves all of the leverage and control with the customer, not the vendors — and cultivates an ecosystem along the way.

And that’s why I couldn’t be more jazzed to embark on this epic adventure with VMware Networking.  I look forward to meeting you along the way!

Cheers,
Brad

A better fabric with VMware NSXi for your network switch

I’m chewing on a few thoughts today I wanted to jot down here and marinate on for a while.  I’ll use VMware as the straw man for the sake of discussion, simply because — like it or not — they are the household name in virtualization.  Disclaimer: The illustrations here are purely of my own imagination and do not reflect anything more than that.

Does it make sense for the software that controls the host machine to also control the fabric that interconnects those hosts?

Note that the host machine software already has some control of the “fabric” — but not all.  What am I talking about?  The virtual edge.  The hypervisor vswitch — a network device (yes, it’s a network device) providing a network connection for the virtual machine (what we care about).

This brings us to the larger question of: What is the Fabric?  Most people think of “Fabric” as a specially constructed network of physical switches — with all of the emphasis placed on how we should connect these physical switches together, and how they should be configured, etc.

Meanwhile, there’s another fabric to contend with — the virtual fabric – constructed by the host software with virtual switches.  This is the fabric touching the virtual machines at the access edge.

We already know that VMware provides software to load on your favorite server hardware and cool stuff happens, right? Virtualization, multi-tenancy, intelligent resource allocations, QoS, push button automation, etc, etc.

VMware is a software company.  They don’t sell servers.  The model is and probably always will be: “You bring the hardware, we’ll bring the software.”  At least, that’s been the model for the *host* machines in our virtual data center.

The network is a different story though.  Here, the network switch vendor says: “I’ll bring both the hardware and the software — it’s a package deal.”

There has always been this proverbial line in the sand between host machines and network switches.  “You run your software there. I’ll run my software here, and we’ll all play nice together.”  Hence we never know what kind of thing we’ll need to play with on the other side of the line.  So we need to establish some dumbed down and very basic rules of the game that just about anybody can follow.

In our case, those rules would be things such as: “Here’s how the host machine instructs the switch where some data should be delivered to, and the SLA you want.”  Hint: Destination IP address, ToS bits.
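To show how little a host can actually say across that line, here is a small Python sketch of the ToS part of the hint. The helper names are my own. The only facts assumed are the standard layout of the ToS byte (a 6-bit DSCP value followed by 2 ECN bits) and the standard `IP_TOS` socket option, which is available on Linux and most Unix-like platforms.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding, a commonly used low-latency code point


def tos_from_dscp(dscp: int, ecn: int = 0) -> int:
    """Build the 8-bit ToS value: 6 bits of DSCP followed by 2 bits of ECN."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN must fit in 2 bits")
    return (dscp << 2) | ecn


def open_marked_socket(dscp: int) -> socket.socket:
    """Return a UDP socket whose outgoing packets carry the given DSCP marking."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_from_dscp(dscp))
    return sock


ef_tos = tos_from_dscp(DSCP_EF)
```

A destination address plus this one byte is essentially the whole vocabulary the host has for describing what it wants from the physical network.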

What we end up with is a very basic and lowest common denominator interface between the host and physical network — and by consequence this same basic interface applies to the virtual and physical fabrics.  Something just good enough to say “Here’s where I want this data to go and can you please take good care of it for me?”

Instead, what if the rules changed to:  “Here’s how you load and run software on this physical switch.”  Just like we do today with standard x86 servers.

Now you potentially have software in the physical fabric that intimately understands what the attached hosts are attempting to do.  And as a result we can play with a more sophisticated set of interfaces between the host and network, and what the information carried in those interfaces means to this fabric — not the IEEE or IETF.  This doesn’t necessarily mean switches with new special proprietary ASICs, although that’s possible.  You work with whatever your switch ASIC is capable of.

For example: Software vendors already work with the well-known capabilities of Intel x86 commodity silicon.  Similarly, software vendors could also work with the well-known capabilities of commodity switching ASICs (Intel and Broadcom).  Things like DCB and MPLS.

The end result perhaps being a more capable and contiguous Fabric.  A better blending of the physical and virtual. Something that delivers better capabilities around service assurance, traffic engineering, and better visibility into the ever changing correlation between the virtual and physical topology.

Further reading: Fabric: A Retrospective on Evolving SDN

Cheers,
Brad

Data center network fabric auto deployed in 30 minutes with Dell Fabric Manager

This is a basic video demonstration I put together showing how Dell Fabric Manager 1.0 can be used to auto deploy a Leaf/Spine data center networking fabric based on standard Layer 3 routing protocols.

In the video you will see the fabric auto designed, auto configured, auto documented, and auto validated, from scratch, starting with design templates and sizing input.  All of this is demonstrated in about 30 minutes without any time-consuming and error-prone human-keyboard CLI configuration.  The demo also shows how DFM handles rogue configuration changes, and how DFM automates the process of adding new switches to the fabric.

Additional info:

  • The video shows the deployment of a Layer 3 Leaf/Spine fabric.
  • Future versions of DFM will also support auto deployment of Layer 2 VLT based fabrics.
  • This video shows interfacing with DFM via a web browser GUI.
  • Future versions of DFM will also provide northbound API interfaces.
  • DFM currently supports the Dell Z9000 and Dell S4810 data center switches.

Cheers,
Brad

Video: Basic introduction to Network Virtualization, Nicira, and VMware

This video is a snippet from a presentation I made which includes a quick and very basic introduction to Network Virtualization; virtual Layer 2; why VMware acquired Nicira; and how this changes the way we can design and deploy data center networks.

Enjoy!

Cheers,
Brad

Video: Basic introduction to the Leaf/Spine data center networking fabric design

This video is a snippet from a presentation I gave to a Dell audience covering a basic introduction to the Leaf/Spine Layer 3 data center networking fabric design with a Dell Networking point of view.

Enjoy!

Cheers,
Brad

Mind blowing L2-L4 Network Virtualization by Midokura MidoNet

Today there seems to be no shortage of SDN start-ups, chasing the OpenFlow hype in one way or another aiming to re-invent the physical network — SDpN (software defined physical network).  And then there’s a rare breed out there.  Those solving cloud networking problems entirely with software at the virtual network layer (hypervisor vswitch) —  SDvN (software defined virtual network).  Nicira was one of those rare breeds (look what happened) — and now it’s apparent to me that Midokura with their MidoNet solution is another one of those rare breed SDvN start-ups like Nicira, but with what appears to be a differentiated and perhaps even more capable solution.

I first learned of MidoNet from a mind-blowing post on Ivan Pepelnjak’s blog.  I had to read that post a few times before I really “got it” — the first clue you’re looking at a rare breed.  And like my first light bulb moment with Nicira, I couldn’t stop thinking about MidoNet for weeks afterwards.

Here’s my understanding of the MidoNet solution based on my own conversations with Dan Mihai Dumitriu (CTO) & Ben Cherian (CSO) of Midokura, with some answers to questions I had after reading Ivan’s post.

MidoNet solution diagram provided by Midokura

L2-L4 Network Virtualization

We already know that a cluster of vswitches can behave like a single multi-tenant virtual L2 network switch.  This is a well understood and accepted technology which has been further improved upon with L2 Network Virtualization Overlays (NVO) pioneered by Microsoft VL2 and Nicira NVP.  Now imagine that very same cluster of vswitches (NVO) behaving like a single, multi-tenant, virtual L3 router.  The technology that revolutionized L2 networking in the cloud has moved up the OSI model to include L3 & L4.  What was once bolted-on and kludged together is now literally built-in as a cohesive multi-layer solution.

Even more interesting, with MidoNet you can have x86 machines collectively behave as one virtual “Provider Router” interfacing with other routers outside of the cloud with regular routing protocols.  These machines provide the “Edge” gateway functionality (which I wrote about here) for the North/South traffic to/from our cloud.  On these Edge machines one NIC faces the outside with standard packet formats and protocols.  The other NIC faces the inside with <pick your favorite tunneling protocol here>.  No real need for hardware VXLAN or anything like that from a switch vendor.

Distributed L4 Service Insertion

We typically think of the vswitch as being a basic and dumb L2 networking device — forcing our hand into designing complex, fragile, and scale challenged L2 physical networks.  However that’s not the case with SDvN solutions such as MidoNet (and Nicira).  The networking kernel bits providing basic machine network I/O (and the basic L2 between two VMs) are the very same kernel bits that can also function as our gateway router, load balancer, firewall, NAT, etc.

The next trick is to get the L4 services configuration and session state distributed and synchronized across all machines.  If you can do that, the L4 network services required (e.g. firewall and load balancing) can be executed on the very machine hosting the VM requiring those services.  And Midokura has done just that.  With MidoNet, the days of complicating your network design for the sake of steering traffic to a firewall or load balancer are gone. And all the usual spaghetti mess flows and service insertion bandwidth choke points are gone too.  You can begin to see why I’m an unapologetic Network Virtualization fan boy.

Decentralized Architecture

Each host machine is running the Open vSwitch kernel module.   This is the packet processing engine (the fast path data plane) fully capable of L2-L4 forwarding.  You can think of this as the (software) ASIC of your host machine vswitch.  By itself the software ASIC is useless.  As with a physical switch, the ASIC needs to be programmed with a configuration and flow table by a control plane CPU.  And that control plane CPU can be off-box and centralized somewhere (Nicira), or it can be distributed and exist individually and locally on each host (MidoNet).
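The software ASIC analogy can be sketched in a few lines of Python. Everything here is illustrative: the class names, the match key of (ingress port, destination MAC), and the action strings are my own assumptions, not the Open vSwitch or MidoNet interfaces. The sketch only shows the division of labor: the fast path does table lookups and nothing else, and it is useless until a control plane programs it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical match key; a real datapath matches on many more header fields.
FlowKey = Tuple[str, str]  # (ingress port, destination MAC)


@dataclass
class SoftwareAsic:
    """Fast path: exact-match lookups only, with no knowledge of its own."""
    flows: Dict[FlowKey, str] = field(default_factory=dict)

    def forward(self, in_port: str, dst_mac: str) -> Optional[str]:
        # Returns the programmed action, or None on a table miss.
        return self.flows.get((in_port, dst_mac))


class ControlPlane:
    """Slow path: decides what the fast path should do, then programs it."""

    def __init__(self, asic: SoftwareAsic, mac_to_port: Dict[str, str]):
        self.asic = asic
        self.mac_to_port = mac_to_port

    def program(self, in_port: str, dst_mac: str) -> str:
        out_port = self.mac_to_port.get(dst_mac)
        action = f"output:{out_port}" if out_port else "drop"
        self.asic.flows[(in_port, dst_mac)] = action
        return action


asic = SoftwareAsic()
miss = asic.forward("vport1", "aa:aa:aa:aa:aa:02")  # unprogrammed: table miss

control = ControlPlane(asic, {"aa:aa:aa:aa:aa:02": "vport2"})
control.program("vport1", "aa:aa:aa:aa:aa:02")
hit = asic.forward("vport1", "aa:aa:aa:aa:aa:02")   # now served by the fast path
```

Whether `ControlPlane` runs off-box or on the same host does not change this code at all. That placement is the architectural choice the two products make differently.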

Each host machine runs the MidoNet agent (MN) which operates on a local copy of the virtual network configuration, and the agent acts upon that configuration when a packet is received on ingress from a vPort (the logical network interface that connects to a virtual machine, or a router for example).  Think of the local MidoNet agent as the control plane CPU for the local vswitch on that host.  Any special network control packets received on a vPort, such as ARP requests, will be handled and serviced locally by the MidoNet agent.  No more ARP broadcasts bothering the network.
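As a rough illustration of local ARP handling, here is a Python sketch. The `LocalAgent` class and its fields are hypothetical and far simpler than a real agent. It assumes the agent's local copy of the virtual network configuration already contains an IP to MAC mapping for every vPort in the tenant's network.

```python
from typing import Dict, Optional


class LocalAgent:
    """Hypothetical per-host agent holding a local copy of the virtual network config."""

    def __init__(self, ip_to_mac: Dict[str, str]):
        self.ip_to_mac = dict(ip_to_mac)
        self.broadcasts_sent = 0  # stays at zero: nothing is ever flooded

    def handle_arp_request(self, target_ip: str) -> Optional[Dict[str, str]]:
        """Answer an ARP request from local state instead of flooding it."""
        mac = self.ip_to_mac.get(target_ip)
        if mac is None:
            return None  # unknown address: drop the request rather than broadcast
        return {"op": "reply", "ip": target_ip, "mac": mac}


agent = LocalAgent({"10.0.0.2": "aa:aa:aa:aa:aa:02"})
reply = agent.handle_arp_request("10.0.0.2")
unknown = agent.handle_arp_request("10.0.0.99")
```

The request never leaves the host, which is why the physical network sees no ARP broadcast traffic from the virtual network.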

Similarly, L2-L4 forwarding lookups and packet processing (NAT, encap, etc.) are handled locally by the MidoNet agent based upon its local copy of the virtual network configuration.  And as this happens the MidoNet agent builds the flow state on its local data plane (the Open vSwitch kernel module).  The host vswitch data plane does not rely on some nebulous central controller for programming flows; the local agent handles that.  This is one significant point of differentiation between Midokura MidoNet and Nicira (VMware).

MidoNet’s decentralized flow state architecture demonstrates that this solution was built for scalability and robustness.  It takes some serious talent in distributed systems to pull that off — which Midokura has.  Their technology team is a who’s-who roster of ex-Amazon and ex-Google.

Fully Distributed

MidoNet centralizes the virtual network configuration state and stateful L4 session state on a “Network State Database” — and even then, this database is a distributed system itself.  The Network State Database uses two different open source distributed systems platforms.  Any stateful L4 session state (NAT, LB, security groups) is kept in an Apache Cassandra database, while everything else is stored by an Apache Zookeeper cluster (ARP cache, port learning tables, network configuration, topology data, etc.). The virtual network configuration is logically centralized but still physically distributed.   In a normal production deployment, the Network State Database will operate on its own machines, apart from the machines hosting VMs or interfacing with routers.

Each MidoNet agent has a Cassandra and Zookeeper client that reads the virtual network configuration and L4 session state from the Network State Database, with the result being that all host machines have the same virtual network configuration and session state.  But this is a cloud, and things change.  Virtual machines move between hosts, new tenants are created, routers fail, etc.  Such events are configuration changes.  And to facilitate the synchronization of these configuration changes, MidoNet agents and the Network State Database communicate bidirectionally for distributed coordination.

That’s the real “trick” — providing both a distributed and consistent virtual network configuration state across all machines.
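A toy Python model of that synchronization might look like the following. The `NetworkStateDB` class is an in-memory stand-in that I made up for illustration. It is not the Cassandra or Zookeeper client API, and it ignores everything that makes the real problem hard (failures, ordering, partitions). It only shows the shape of the idea: agents register interest, and every change is pushed to every local copy.

```python
from typing import Callable, Dict, List


class NetworkStateDB:
    """In-memory stand-in for the Network State Database (not a real client)."""

    def __init__(self) -> None:
        self._state: Dict[str, str] = {}
        self._watchers: List[Callable[[str, str], None]] = []

    def watch(self, callback: Callable[[str, str], None]) -> None:
        self._watchers.append(callback)
        # A new watcher first receives the current state.
        for key, value in self._state.items():
            callback(key, value)

    def put(self, key: str, value: str) -> None:
        self._state[key] = value
        for callback in self._watchers:
            callback(key, value)


class Agent:
    """Hypothetical per-host agent that mirrors the database into a local copy."""

    def __init__(self, name: str, db: NetworkStateDB) -> None:
        self.name = name
        self.local: Dict[str, str] = {}
        db.watch(self._on_change)

    def _on_change(self, key: str, value: str) -> None:
        self.local[key] = value


db = NetworkStateDB()
db.put("vm-blue-1/host", "host-a")
agents = [Agent(name, db) for name in ("host-a", "host-b", "edge-1")]

db.put("vm-blue-1/host", "host-b")  # the VM moves; every agent learns about it
```

After the move, the Edge machine's local copy already points at the new host, so it needs no extra lookup to deliver the next packet for that VM.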

Note: There’s no OpenFlow here at all — another point of difference from Nicira (albeit insignificant).  That’s just the result of one being a centralized architecture (Nicira) and the other being distributed (MidoNet).

Symmetrical

As we’ve already discussed, the configuration state of the virtual network is the same on all machines — the entire L2-L4 configuration.  Even a machine acting as an Edge “Provider Router” has the same network configuration and state information as the machines hosting VMs.  And vice versa — a host machine has the L3 routing configuration and device state of an Edge machine.  The configuration state is totally symmetrical.

There’s a significant and positive consequence to this — the virtual network traffic is also symmetrical and always takes the most direct path with no spaghetti mess of flows on the network.  This is because each machine can act upon the full multi-tenant L2-L4 configuration/state on ingress — execute any required L2-L4 processing — and send the packet directly to the destination machine.

For example, if you have (3) “Provider Router” Edge machines, as shown in the diagram above, a packet could arrive from the “Internet” on any one of those machines, destined for a VM which could be on any one of the (3) host machines.  Because the Edge machines have the same configuration/state information as the host machines, the Edge machine knows exactly which machine is hosting the destination VM and can send the packet directly to it.  And vice versa.  Maybe one of the (3) Edge machines loses its link to the “Internet” router.  That’s a networking state change realized on all machines, including the host machines.  And thus a host machine delivering a packet from a VM to the “Internet” would avoid sending traffic toward the affected Edge machine.

Note: I put “Internet” in quotes because what is shown as the “Internet” in the diagram above could just as well be the North/South data center routers.
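The Edge failure example can be sketched as a small Python function. The function name, the machine names, and the choice of a CRC32 hash to spread flows across Edge machines are all my own assumptions for illustration. How MidoNet actually selects an Edge machine is not something I can state here.

```python
import zlib
from typing import Dict


def pick_edge(edge_uplink_up: Dict[str, bool], flow_id: str) -> str:
    """Choose an Edge machine for an outbound flow, skipping any with a failed uplink."""
    healthy = sorted(name for name, up in edge_uplink_up.items() if up)
    if not healthy:
        raise RuntimeError("no Edge machine has a working uplink")
    # A deterministic hash keeps all packets of one flow on the same Edge machine.
    return healthy[zlib.crc32(flow_id.encode()) % len(healthy)]


# Local copy of Edge state on a host machine; edge-2 has lost its uplink.
edges = {"edge-1": True, "edge-2": False, "edge-3": True}
choices = {pick_edge(edges, f"10.0.0.5:{port}->8.8.8.8:53") for port in range(1000, 1100)}
```

Because the uplink state is part of every machine's local copy, the host makes this decision itself at ingress and never sends the packet toward the affected Edge machine in the first place.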

Another example is the scenario where the Blue tenant is allowed to communicate with the Red tenant.  Normally that traffic would need to go through multiple L3 hops through various routers and firewalls. Not here though — because each machine has the full L2-L4 configuration for all tenants, and can act upon it with its L2-L4 kernel module data plane.  And as a result the traffic will be sent directly from the machine with the Blue tenant to the machine with the Red tenant — with all necessary L3-L4 packet transformations applied (NAT).

Tunneling

Packet delivery from one MidoNet machine to another is encapsulated in a tunneling protocol: any tunneling protocol supported by the Open vSwitch kernel module.  At this time MidoNet is using GRE.  It could be something else (CAPWAP, NVGRE, STT) if there’s a particularly good reason (better per flow load balancing on the physical IP fabric), but the tunneling protocol implemented is not the most interesting part of any Network Virtualization solution, despite playing an important role, much like a cable.  It’s how the system implements its configuration, control, and management planes that will differentiate it from others.
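For readers who have not looked inside a GRE packet, here is a minimal Python sketch of the encapsulation step. It builds only the GRE header with the optional Key field and leaves out the outer IP header. Using the key to carry a virtual network identifier is my assumption for illustration, and the function names are hypothetical.

```python
import struct

GRE_FLAG_KEY = 0x2000   # K bit: a 4-byte Key field follows the base header
ETHERTYPE_TEB = 0x6558  # Transparent Ethernet Bridging: payload is an Ethernet frame


def gre_encap(inner_frame: bytes, key: int) -> bytes:
    """Prepend a GRE header (flags, protocol type, key) to an inner Ethernet frame."""
    header = struct.pack("!HHI", GRE_FLAG_KEY, ETHERTYPE_TEB, key)
    return header + inner_frame


def gre_decap(packet: bytes):
    """Return (key, inner_frame), checking that the Key field is present."""
    flags, proto, key = struct.unpack("!HHI", packet[:8])
    if not flags & GRE_FLAG_KEY:
        raise ValueError("expected a GRE header with the Key field present")
    if proto != ETHERTYPE_TEB:
        raise ValueError("expected an Ethernet payload")
    return key, packet[8:]


frame = bytes.fromhex("aaaaaaaaaa02aaaaaaaaaa010800") + b"payload"
tunneled = gre_encap(frame, key=42)
key, recovered = gre_decap(tunneled)
```

The header is eight bytes of framing and nothing more, which is the sense in which the tunnel behaves like a cable. All of the interesting decisions were made before `gre_encap` was called.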

But of course the tunneling allows us to build our physical data center network fabric any way we like.  I personally like the Layer 3 Leaf/Spine architecture of fixed switches. ;)

Hypervisors & Cloud Platforms

MidoNet supports the KVM hypervisor and Midokura is currently targeting the OpenStack and CloudStack cloud platforms.  They have developed API extensions and plugins that snap MidoNet into the existing OpenStack architecture, both for the Essex release and the new Folsom release.

VMware is a non-starter (forget about it — don’t even ask).  VMware is a closed networking platform and even more competitive to other network vendors now than ever before with the acquisition of Nicira.

One interesting hypothetical for me is Microsoft Hyper-V.  Microsoft’s networking environment is not closed (yet) to the same degree as VMware.  The Open vSwitch is written in C code, and could be ported into Windows Server 2012.  Something tells me a team in Redmond has already tried that.  So in theory MidoNet could (hypothetically) be a virtual networking solution for customers choosing to build their cloud with Microsoft.  Or perhaps I should say… giving customers a better reason to choose Microsoft instead of VMware.

MidoNet features provided by Midokura

Tradeoffs

Midokura has done the heavy lifting of building a distributed system of MidoNet agents, which provides a solid foundation to deliver Network Virtualization beyond simple Layer 2 and into the realm of distributed multi-tenant L3 and L4 service insertion.  However, like anything else, there are tradeoffs to the design choices you make.  Compared to a more centralized model, the MidoNet agent on each host is a thicker footprint than a lighter-weight OpenFlow agent (Nicira).

Some of the natural questions we should be asking are things such as:

What kind of resource drain is the MidoNet agent? CPU, Memory, Disk, Network I/O?

How do the different L2-L4 features affect those host resources if and when configured?

How does the size of the deployment relate to host resource consumption, if at all?

What is the latency to replicate a configuration/state change to all hosts?

Does the scale of the system impact the change latency?

What kind of change rate can the system handle and what would be a “normal” change rate?

Can the machines send data back to the Cassandra cluster for analytics?

Can we have a configuration feedback loop based on such analysis?

Can I “move” a virtual network environment from one data center to another?

Network Virtualization is #Awesome

It’s amazing to me that with just some x86 machines and a standard Layer 3 IP fabric you can build an impressive IaaS cloud with fully virtualized and distributed scale-out L2-L4 networking.  I’m a big fan (boy), and extremely intrigued by these rare breed solutions such as Midokura MidoNet and Nicira.

Cheers,
Brad