This post is the first in a series that examines what I think are some of the powerful security capabilities of the VMware NSX platform and their implications for the data center network architecture. In this post we’ll look at the concepts of Zero Trust (as opposed to Trust Zones), and virtualization-centric grouping (as opposed to network-centric grouping).
Note: Zero Trust as a guiding principle for enterprise-wide security is inspired by Forrester’s “Zero Trust Network Architecture”.
What are we trying to accomplish?
We want to secure all traffic in the data center without compromising performance (user experience) or introducing unmanageable complexity. The most notable concern is the proliferation of East-West traffic: we want to secure traffic between any two VMs, or between any VM and a physical host, with the best possible security controls and visibility (per flow, per packet, stateful inspection with policy actions, and detailed logging), in a way that’s both economical to obtain and practical to deploy.
Trust Zones of Insecurity
Until now, it hasn’t been possible (much less economically feasible or even practical) to directly connect every virtual machine to its own port on a firewall. Because of this, the firewall has always been a “thing” (a physical piece of iron, or virtual machine) that we need to bolt on top of the network. First, you need a network to connect, aggregate, and group machines. After that you can connect the firewall to a port on that network-centric grouping (a virtual switch Port-Group and/or VLAN). Meanwhile, the network construct establishing the group provides unfettered connectivity within the group. In other words, the firewall has no visibility or security control over the East-West traffic between machines in a given group. The result is a “Trust Zone”. We “trust” (read: hope), but can’t verify, that one machine in the zone will not laterally infect or attack the other zone members.
Network-centric grouping in a virtual environment
Groups form the basis of a security policy. Machines that are similar from a policy standpoint are placed into a group, at which point a policy governs how traffic is handled into, out of, and within that group. How these groups are defined and where they exist can make a big difference in a virtualized data center. For example, when groups are defined by a networking construct, and then pushed into a virtual environment (vSphere), the security policy attached to a virtual machine is determined by its connection to a specific network-centric grouping object, with the finest granularity being a Port Group. Taking a network-centric approach in a virtual environment presents a number of challenges.
First, this approach can quickly create a large quantity of networking objects to deal with: a morass of Port Groups cluttering the virtual network inventory. For example, let’s say you have 100 applications, each with three distinct tiers of policy groups (Web, App, DB); this would result in 300 Port Groups to choose from in your distributed virtual switch.
Second, the virtual administrator needs to correctly choose, and manually attach, the specific Port Group for each virtual machine network interface when it’s deployed. With an inventory of hundreds or thousands of virtual machines and Port Groups to choose from, human error in applying the wrong security policy is something to contend with. The clutter of Port Groups remains, but the manual steps can be mitigated with good integration with upstream automation software, namely vCloud Automation Center (vCAC).
Third, a Port Group is an object that’s specific to one distributed virtual switch (DVS). If the security policy for a virtual machine depends on its connection to a specific Port Group, the mobility domain for that virtual machine is limited to one DVS. Migrating outside of the DVS would involve a cold stop/start operation, and manually attaching the virtual machine to a different and specific Port Group in a new DVS.
Fourth, there are no security controls for East-West traffic within the Port Group that establishes a group. It’s just another “Trust Zone”. Only traffic between groups can be secured, which might lead to an effort to obtain more granularity by creating more and more Port Groups.
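To make that blind spot concrete, here is a minimal Python sketch of the network-centric model. The VM names, Port Group names, and helper functions are hypothetical, invented purely for illustration. The only rule it encodes is the one described above: a firewall attached at the group boundary inspects a flow only when the source and destination sit in different Port Groups.

```python
# Hypothetical inventory: VM name -> the Port Group it is attached to.
PORT_GROUP_OF = {
    "web-01": "PG-Web",
    "web-02": "PG-Web",
    "db-01": "PG-DB",
}


def firewall_sees(src_vm: str, dst_vm: str) -> bool:
    """True if a firewall bolted onto the Port Group boundary inspects this flow."""
    return PORT_GROUP_OF[src_vm] != PORT_GROUP_OF[dst_vm]


def uninspected_pairs() -> list:
    """Every ordered VM pair whose traffic never leaves one Port Group (a Trust Zone)."""
    vms = sorted(PORT_GROUP_OF)
    return [(a, b) for a in vms for b in vms if a != b and not firewall_sees(a, b)]
```

In this toy inventory, traffic between web-01 and web-02 never reaches the firewall, which is exactly the lateral path an attacker would use.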
Zero Trust transparent security
In the Zero Trust model, we take the usual Firewall-bolted-on-top approach and turn it upside down. Every virtual machine is first connected to a transparent in-kernel stateful firewall filtering engine (with logging) before it’s even connected to the network. This means that any traffic to or from a virtual machine can be secured, regardless of the network construct it’s attached to. Because the firewall is below the network, directly adjacent to the things we want to protect, there is never an unfettered “Trust Zone”. Security is omnipresent: per flow, per packet, stateful inspection with policy actions and detailed logging, per virtual machine, per virtual NIC. The network constructs still exist, of course, but only to provide connectivity (not security). The Zero Trust model is also referred to as Micro Segmentation.
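The idea of a stateful filter sitting at each virtual NIC can be sketched in a few lines of Python. This is a toy user-space model, not how the NSX kernel module is implemented; the `Packet`, `Rule`, and `VnicFilter` names, the rule format, and the default-deny behavior are all assumptions made for illustration. It shows the three properties named above: rule evaluation per packet, connection state that admits return traffic, and a log entry for every decision.

```python
from dataclasses import dataclass, field
from typing import NamedTuple


class Packet(NamedTuple):
    src: str
    dst: str
    proto: str
    sport: int
    dport: int


class Rule(NamedTuple):
    src: str      # source IP, or "any"
    dst: str      # destination IP, or "any"
    proto: str
    dport: int
    action: str   # "allow" or "drop"


@dataclass
class VnicFilter:
    """Hypothetical filter attached to one virtual NIC."""
    rules: list
    flows: set = field(default_factory=set)   # established connection state
    log: list = field(default_factory=list)   # one entry per packet decision

    def process(self, pkt: Packet) -> str:
        key = (pkt.src, pkt.dst, pkt.proto, pkt.sport, pkt.dport)
        if key in self.flows:
            action = "allow"                  # part of an established flow
        else:
            action = "drop"                   # assumed default: deny
            for r in self.rules:
                if (r.src in ("any", pkt.src) and r.dst in ("any", pkt.dst)
                        and r.proto == pkt.proto and r.dport == pkt.dport):
                    action = r.action         # first matching rule wins
                    break
            if action == "allow":
                # Remember the flow and its reverse so replies are admitted.
                self.flows.add(key)
                self.flows.add((pkt.dst, pkt.src, pkt.proto, pkt.dport, pkt.sport))
        self.log.append((pkt, action))
        return action
```

With a single rule allowing 10.0.0.1 to reach 10.0.0.2 on TCP 8443, the reply from 10.0.0.2 is admitted by state, while an unsolicited connection from 10.0.0.2 back to 10.0.0.1 on port 22 is dropped, even if both machines share the same Port Group.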
Virtualization-centric grouping in a virtual environment
A security policy works with the basic concept of a group, comprised of similar objects, to which you then apply a policy based on group names. In the network-centric model these groups were represented by Port Groups in a distributed virtual switch. In contrast, another approach is to employ a virtualization-centric grouping model, as implemented by VMware NSX, where the groups that form the basis of your security policy are decoupled from the network, and are simply an abstract object called a “Security Group” existing in the virtualization layer. There are a number of advantages to this approach in a virtual environment (e.g. vSphere).
First, the virtual network inventory remains simple and uncluttered. For every Security Group created there is no requisite and corresponding Port Group to create. The virtual network inventory remains constant as the environment grows. For example, this time your 100 applications, each with distinct tiers of policy groups (Web, App, DB), can be deployed with only one Port Group and VLAN providing the network connectivity.
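The difference in inventory growth between the two models is simple arithmetic, sketched here in Python. The function names are hypothetical; the numbers come from the 100-application example used in this post.

```python
def network_centric_port_groups(apps: int, tiers_per_app: int) -> int:
    """One Port Group per policy group, because the Port Group is the policy group."""
    return apps * tiers_per_app


def virtualization_centric_port_groups(apps: int, tiers_per_app: int,
                                       connectivity_port_groups: int = 1) -> int:
    """Port Groups provide connectivity only, so the count ignores apps and tiers."""
    return connectivity_port_groups
```

For 100 applications with three tiers each, the first function returns 300 and the second returns 1. Growing to 1,000 applications moves the first to 3,000 while the second stays at 1.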
Second, the virtual environment can dynamically attach virtual machines to the appropriate Security Group based on virtualization-relevant context, tags, and business logic. As a simple example, in the diagram above, any VMs with the name “PROD-web” are placed in the “Web” Security Group automatically. Another scenario might be: if VMs are deployed by members of the “Engineering” Active Directory group, tag them as “Engineering”, and based on that tag dynamically add them to the “Dev/Test” Security Group, and isolate them from “Prod”. It doesn’t matter which Port Group the VMs are attached to. An incorrect Port Group assignment might only break network connectivity, not security policy.
Third, mobility is not artificially limited to a network-centric object such as a single distributed virtual switch. Security Groups are not coupled to a distributed virtual switch (DVS), or any network construct for that matter. It doesn’t matter which Port Group connects to your virtual machine, and as a consequence it also doesn’t matter which DVS your virtual machines are connected to. This means you can live migrate virtual machines from one DVS to another (and someday soon, between vCenter instances), all while maintaining consistent security policy.
And finally, as previously discussed, there are no insecure Trust Zones with virtualization-centric grouping. Even traffic within a Security Group can be subject to policy controls and stateful inspection with detailed logging. The highest degree of granularity is provided from the outset (per virtual machine, per virtual NIC).
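Here is a minimal Python sketch of that dynamic membership logic. It is not the NSX API; the `VM` class, the `deploy` helper, and the criteria table are hypothetical, and I’m assuming that “VMs with the name PROD-web” means a name prefix match. The point it illustrates is that membership is computed from VM context (name, tags), and the Port Group plays no part in the decision.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    port_group: str                   # connectivity only; never consulted below
    deployed_by_ad_group: str = ""
    tags: set = field(default_factory=set)


def deploy(vm: VM) -> VM:
    """Apply the tagging business logic at deployment time."""
    if vm.deployed_by_ad_group == "Engineering":
        vm.tags.add("Engineering")
    return vm


# Security Group name -> membership criterion (assumed: name prefix or tag).
GROUP_CRITERIA = {
    "Web": lambda vm: vm.name.startswith("PROD-web"),
    "Dev/Test": lambda vm: "Engineering" in vm.tags,
}


def security_groups_for(vm: VM) -> set:
    """Return every Security Group whose criterion the VM currently satisfies."""
    return {group for group, matches in GROUP_CRITERIA.items() if matches(vm)}
```

Two VMs named PROD-web-01 and PROD-web-02 land in the “Web” group whether they are attached to the same Port Group or different ones, and a VM deployed by an Engineering user lands in “Dev/Test” without anyone choosing a Port Group for policy reasons.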
Placing a transparent firewall underneath the network, as opposed to bolting it on top, has implications for the data center network architecture. The result, I contend, will be virtual and physical topology simplification.
When the firewall is bolted on top, the network substrate needs to be designed in such a way that correctly implements a security policy — selectively steering traffic from a virtual machine to some physical or virtual firewall several hops away. The more granularity you attempt, the more complex the design becomes with a quagmire of network-centric traffic steering and isolation tools like Port Groups, VLANs, ACLs, and VRFs. Meanwhile, more and more East-West traffic needs to be detoured several hops to a firewall, impacting performance (user experience). And in the end, you’re still left with unsecured Trust Zones, as you can never realistically obtain per-VM granularity.
With virtualization-centric VMware NSX, on the other hand, policy is applied underneath the network, in the virtualization layer. Throw away that East-West traffic detouring bag of tricks. Security is applied, transparently, before the packets even arrive at the first virtual network port. Latency-sensitive East-West traffic is free to travel directly to its destination, taking the lowest latency path, having already been secured at the outset.
The network architecture is simply designed for connectivity, whether that is a handful of VLAN-backed Port Groups in an L2 fabric that you’re already using today, or a migration toward full network virtualization with VXLAN-backed Logical Switches, Logical Routers, and simple L3 fabrics. You can start with the former and gradually move to the latter.
Some points of differentiation
When evaluating options and comparing the security capabilities of VMware NSX for vSphere to other solutions, here are some points of differentiation to keep in mind.
Headless operation: The VMware NSX for vSphere distributed firewall does not rely on some other virtual machine for the data plane to function. Rules are centrally programmed by the NSX Manager, and each host is able to inspect and enforce security policy for every flow and packet on its own, without the Manager (including headless vMotion).
Mobility: Your virtual machines are not constrained to a single distributed virtual switch. Security policy is consistent irrespective of the DVS or Port Group providing the connectivity, and virtual machine live migration is not artificially constrained to a single DVS.
Zero Trust: Even traffic within the most minimal grouping construct is secured. East-West traffic within a Security Group can be subject to policy, stateful inspection, and logging. There are no insecure Trust Zones.
Automation: The virtual environment can automatically attach virtual machines to the appropriate Security Group and subsequent policy based on virtualization-relevant context. The virtual administrator doesn’t need to correctly choose and manually assign virtual machines to a specific Port Group. And when a host is added to a cluster, all of the required software is automatically installed.
Dynamic security: Just as the virtual environment can automatically assign a virtual machine to a Security Group based on context, it can also change the Security Group (and policy) dynamically, based on changing context, or context provided by a third party, such as a malware or vulnerability assessment solution (Rapid7, McAfee, Symantec, Trend Micro).
Distributed platform for NGFW: One of the policy actions you can apply to a Security Group is selectively redirecting traffic to a local user space service virtual machine on each host. For example, third-party firewall providers can leverage this platform to add NGFW inspection to the environment in a distributed manner. Palo Alto Networks has already leveraged this capability with their VM-Series NGFW that integrates with VMware NSX for vSphere.
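The dynamic security point above can be sketched in Python as membership that is re-evaluated whenever a VM’s context changes. This is an illustration only, not the NSX or any vendor’s API: the tag name `malware-found`, the “Quarantine” and “Default” groups, and the helper functions are all hypothetical.

```python
from dataclasses import dataclass, field

QUARANTINE_TAG = "malware-found"   # hypothetical tag a third-party scanner might set


@dataclass
class VM:
    name: str
    tags: set = field(default_factory=set)


def security_group(vm: VM) -> str:
    """Evaluate membership from current context; the quarantine tag takes precedence."""
    if QUARANTINE_TAG in vm.tags:
        return "Quarantine"
    return "Web" if vm.name.startswith("PROD-web") else "Default"


def report_finding(vm: VM) -> None:
    """Simulate a third-party assessment tool tagging an infected VM."""
    vm.tags.add(QUARANTINE_TAG)


def clear_finding(vm: VM) -> None:
    """Simulate the tag being removed after remediation."""
    vm.tags.discard(QUARANTINE_TAG)
```

A VM named PROD-web-01 starts in “Web”, moves to “Quarantine” as soon as the finding is reported, and returns to “Web” once the tag is cleared. Nothing about its network attachment changes at any point.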
Quick Video Demonstration
Finally, here’s a quick video demonstrating the scenario depicted in the diagrams above. I will show how a Security Group is created, how virtual machines are automatically assigned to a group, how East-West traffic within this group can be filtered by the NSX stateful firewall, and how the logs can be viewed and analyzed.