Three reasons why Networking is a pain in the IaaS, and how to fix it

In this post I share the slides, audio recording, and short outline of a presentation I gave at the Melbourne VMUG conference (Feb 2014) called “Three reasons why Networking is a pain in the IaaS, and how to fix it”.

As network technologists we know that when the compute architecture changes, the network architecture changes with it. Consider the precedent. The transition from mainframe to rack servers brought about Ethernet and top-of-rack switches. Blade servers introduced the blade switch and a cable-less network. And of course the virtual server necessitated the software virtual switch and a hardware-less network. At each iteration, we observe the architecture change occurring at the edge, directly adjacent to compute.

We can look at this superficially and say, “yes, the network architecture changed”. However, if you think about it, the catalyzing change in each shift was the operational model, with the intent to increase agility and reduce costs. The architecture change was consequential. Without compute, there is no reason for a network. Networking, both as a profession and technology, exists as a necessary service layer for computing. Without a network, computing is practically useless. As such, the capabilities of the network will either enable or impede computing. Viewed in that light, when an organization decides to change the operational model of computing (virtualization, IaaS), the operational model of the network must evolve with it. If not, the “Network” becomes the impediment to the organization, not an enabler. (Hint: you don’t want to be on the receiving end of that.)

Static compute > Static network
Virtual compute > Virtual network
Infrastructure as a Service > Networking as a Service

Audio Recording (MP3) 44 min

Click here to download the MP3

Three reasons: Outline

1) Impedance Mismatch

Deploying legacy non-virtual networking with virtual computing creates an operational impedance mismatch. Virtual computing provides instant provisioning, mobility, and template-based deployments. Despite these advances, the virtual compute is still coupled to network services that are slow to provision, anchored to specific physical equipment, and manually deployed at the risk of configuration drift and human error. As a result, the full potential of virtualization and the IaaS cannot be realized. Simply creating virtual machine equivalents of firewalls and load balancers doesn’t change the operational model of network services; it only changes the form factor.

The solution is to bring the same operational model of virtual computing to the network – network virtualization. Networking services should be instantly provisioned from a capacity pool, decoupled from specific hardware, made equally mobile, and deployed by machines using templates.

2) Lost in Translation (Scripting)

Attempting network “automation” or “orchestration” by scripting against individual device interfaces is untenable. A third-party scripting tool has the difficult job of both providing an upstream interface that accepts the desired network state and displaying the real-time network state. This requires translation and coordination across many different autonomous devices and interfaces (languages).

The solution is to deploy a virtual networking platform (like a virtual chassis switch) where many different devices connect to the platform like virtual line cards using the platform API. The virtual networking platform can then expose a single API endpoint to an upstream automation tool (e.g. OpenStack or VMware vCloud Automation Center). All of the complexities around deploying desired network state and gathering the real-time state are removed from the automation tool and assumed by the virtual networking platform. The individual device interfaces (languages) still remain for operational tasks (such as code upgrades), but are out of the way in terms of service provisioning.
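To make the "virtual chassis and line card" idea concrete, here is a minimal Python sketch. Everything in it is hypothetical and for illustration only: the class names (DeviceDriver, FirewallDriver, LoadBalancerDriver, NetworkPlatform), the provision/get_state methods, and the device command strings are assumptions of mine, not the NSX API or any vendor's syntax. The point is only the shape: the automation tool talks to one entry point, and the platform does the per-device translation.

```python
from dataclasses import dataclass, field
from typing import Protocol


class DeviceDriver(Protocol):
    """Hypothetical 'line card': turns platform intent into one device's language."""
    name: str

    def apply(self, service: dict) -> str: ...


@dataclass
class FirewallDriver:
    name: str = "firewall"

    def apply(self, service: dict) -> str:
        # Made-up firewall syntax, purely illustrative.
        return f"permit tcp any host {service['vip']} eq {service['port']}"


@dataclass
class LoadBalancerDriver:
    name: str = "load_balancer"

    def apply(self, service: dict) -> str:
        # Made-up load balancer syntax, purely illustrative.
        members = ",".join(service["members"])
        return f"create pool {service['name']} members {members} port {service['port']}"


@dataclass
class NetworkPlatform:
    """Hypothetical platform: one upstream entry point, many drivers behind it."""
    drivers: list = field(default_factory=list)
    state: dict = field(default_factory=dict)

    def register(self, driver: DeviceDriver) -> None:
        self.drivers.append(driver)

    def provision(self, service: dict) -> None:
        # The platform, not the automation tool, fans out to every device.
        self.state[service["name"]] = {d.name: d.apply(service) for d in self.drivers}

    def get_state(self, name: str) -> dict:
        # Real-time state is read back from the same single endpoint.
        return self.state[name]


platform = NetworkPlatform()
platform.register(FirewallDriver())
platform.register(LoadBalancerDriver())
platform.provision({
    "name": "web",
    "vip": "10.0.0.10",
    "port": 443,
    "members": ["10.0.1.11", "10.0.1.12"],
})
print(platform.get_state("web"))
```

Notice that the upstream caller describes the service once and never sees a device-specific command. Adding a new kind of device means registering one more driver with the platform, not rewriting the automation tool.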

Examples: VMware NSX + F5 (tech preview video), and VMware NSX + Palo Alto Networks (PDF)

3) Choke points

In many cases firewalls are required to handle east-west traffic between compute instances, or between different trust zones. If the firewall is a “box”, be it a physical piece of iron or even a virtual machine, it’s a single “device” somewhere in the network through which traffic must be forced so that it can be inspected against a policy. This is a choke point catching packets. Performance of east-west traffic suffers, and the choke point (several layers removed from the source of traffic) has no meaningful visibility into where the traffic came from, who sent it, or where it’s going. The choke point is merely inspecting IP packet headers against an access list. This means IP addresses of the workloads are critical to the applied security policy. This is not what we want in a highly agile Infrastructure as a Service. Security policy should be attached to the applications and workloads, not the IP addresses. And there should be no choke points that impede performance.

The solution is to centrally define the security policy and physically distribute it across the virtual switching layer in the hypervisor kernel. Every virtual port attached to a virtual machine is not just the access port, it’s the stateful firewall too. The security policy is applied to the virtual machine, not the IP address, and enforced at the very first hop – no more choke points. And your policy can trigger on a large set of semantics such as user identity, operating system, security posture, or any arbitrary and hierarchical grouping of virtual machines (applications).
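Here is a small Python sketch of "define centrally, enforce at every virtual port". It is a toy model under my own assumptions: the VM, Rule, and VirtualPort classes and the distribute function are hypothetical names, not the NSX Distributed Firewall API, and connection state tracking is left out entirely. What it does show is a rule that matches on group membership rather than on IP address, evaluated at the sender's own port.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VM:
    name: str
    ip: str
    groups: frozenset  # e.g. application tier, OS, security posture


@dataclass(frozen=True)
class Rule:
    src_group: str
    dst_group: str
    port: int
    action: str  # "allow" or "deny"


@dataclass
class VirtualPort:
    """Hypothetical first-hop enforcement point attached to one VM."""
    vm: VM
    rules: tuple = ()

    def verdict(self, dst: VM, port: int) -> str:
        # Match on group membership; IP addresses are never consulted.
        for rule in self.rules:
            if (rule.src_group in self.vm.groups
                    and rule.dst_group in dst.groups
                    and rule.port == port):
                return rule.action
        return "deny"  # default deny when no rule matches


def distribute(policy, ports):
    """Push the one centrally defined policy to every virtual port."""
    for vport in ports:
        vport.rules = tuple(policy)


policy = [Rule("web", "db", 3306, "allow")]
web = VM("web-01", "10.0.1.11", frozenset({"web"}))
db = VM("db-01", "10.0.2.21", frozenset({"db"}))

web_port = VirtualPort(web)
distribute(policy, [web_port])

before = web_port.verdict(db, 3306)
moved_db = replace(db, ip="192.168.50.21")  # same workload, new address
after = web_port.verdict(moved_db, 3306)
print(before, after)
```

Re-addressing the database VM leaves the verdict unchanged because the rule follows the workload's group, not its address, and the check happens at the source port rather than at a device several hops away.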

Example: VMware NSX Distributed Firewall

The rest of the presentation covers some example multi-tenant topologies you can deploy in your IaaS with NSX, and how to introduce NSX into your existing environment and migrate gradually. Listen to the full audio, and stay tuned for more posts on these topics.

Cheers, Brad