Going Over the Edge with your VMware NSX and Cisco Nexus

Hey! Cisco Nexus peeps! What could possibly be more fun than connecting your awesome new NSX gear to your Cisco Nexus gear? For the life of me I really don’t know.  All right then. Let’s do it!

Let’s kick things off with this email question I received from a reader.

“Hi Brad,

In our environment we have two prevailing server standards, rackmounts and UCS. I read your excellent NSX on UCS and 7K design guide and the section on not running routing protocols over the VPC links makes sense. My related question concerns how we can achieve a routing adjacency from the NSX Distributed Router to the N7K  with a rack mount with 2x10gbe interfaces connecting to 2x7Ks via VPC? (we don’t use the NSX Edge Router).”

This reader has politely pointed out that my VMware NSX on Cisco UCS and Nexus 7000 design guide could have provided a bit more detail on NSX Edge design. I totally agree. There’s no time like the present, so let’s dive into that now and stir up some content that might end up in the next version of the guide.

All right.  We won’t worry too much about the form factor of the servers right now.  Whether it’s a blade or a rack mount doesn’t matter; let’s just generalize that we have servers.  And to make things extra difficult, these servers will only have 2 x 10GE interfaces — no more, no less.  Those interfaces will ultimately connect either to a vPC-enabled VLAN or to a normal, non-vPC VLAN.  Working with this baseline of 2 x 10GE also helps to keep everything easily applicable to either blades or rack mounts.

I’m going to present the logical topology of three different designs.  How these translate into a physical topology is something I’ll leave for the time being to your own expertise and imagination.

Any design discussion has a number of variables and permutations, and that’s especially true here, particularly in the bottom half of each design depicting the On Demand virtual networks. “What about inserting service X for this or that tenant?” and so on. If I attempted to cover all of those nuances in full, this post would get way off topic.  Let’s keep it simple for now and focus on the edge topology.  At a later time we’ll come back to the various flavors of On Demand virtual networks you can lay down underneath the Pre-Created edge topology of your choice.

Lost Your Edge

We’ll start with the scenario posed in the opening question: “we don’t use the NSX Edge Router”.  Thus, the only NSX router is the distributed logical router (running in-kernel on your ESX compute hosts), which is directly adjacent to your virtual machines (naturally) and also directly adjacent to your Nexus 7000s on a vPC-enabled VLAN.  The latter constitutes the “Uplink” of the distributed router and allows for the possibility of running a dynamic routing protocol with an upstream router.

The motivation for the Lost Your Edge design might be simplicity, wherein you don’t want — or feel that you don’t need — an additional layer of NSX Edge virtual machines to worry about and manage.

Notice that we’ve laid this topology down on all vPC-enabled VLANs.  Remember, the NSX distributed router is running in-kernel on your ESX compute hosts, and I presume that you do want your ESX compute hosts attached via vPC.  As a result, our NSX distributed router is also vPC attached, and that in turn prevents us from running a dynamic routing protocol between the NSX distributed router and the Nexus 7000s.  I have explained the reason for this here.

We can most definitely do the Lost Your Edge design with static routing.  Your NSX distributed router would have a simple default route pointing to the Nexus 7000s’ HSRP address on the “Edge VLAN”.  Meanwhile, the Nexus 7000s will have a static aggregate route (e.g. a summary covering all of your NSX subnets) pointing to the NSX distributed router’s forwarding address.  Later on, the individual subnets you create (On Demand) behind the NSX distributed router will of course fall within that aggregate.  The only thing left to do is redistribute this static route into your enterprise backbone with BGP or OSPF.
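
If you like to keep your switch configs in code, here’s a minimal sketch of the Nexus 7000 side of this design, pushed with Python over NX-API.  Big caveats: it assumes your NX-OS release supports NX-API (otherwise the same commands apply straight at the CLI), the prefix, next hop, AS number, and object names are made-up placeholders rather than recommendations, and you should verify the payload format against the NX-API sandbox on your own switch.

```python
import json
import requests

# Placeholder values -- substitute your own switch, prefix, next hop, and credentials.
N7K_URL = "https://n7k-1.example.com/ins"
COMMANDS = [
    # Aggregate covering the NSX subnets, pointing at the DLR forwarding address.
    "ip prefix-list NSX-SUMMARY seq 5 permit 172.16.0.0/16",
    "ip route 172.16.0.0/16 10.1.1.2",
    # Redistribute that static into the backbone (NX-OS wants a route-map here).
    "route-map NSX-STATIC permit 10",
    "match ip address prefix-list NSX-SUMMARY",
    "router bgp 65001",
    "address-family ipv4 unicast",
    "redistribute static route-map NSX-STATIC",
]

def nxapi_config(url, commands, user="admin", password="password"):
    """Send a batch of CLI commands to the switch via the NX-API JSON-RPC interface."""
    payload = [
        {"jsonrpc": "2.0", "method": "cli",
         "params": {"cmd": cmd, "version": 1}, "id": i + 1}
        for i, cmd in enumerate(commands)
    ]
    resp = requests.post(url, data=json.dumps(payload),
                         headers={"Content-Type": "application/json-rpc"},
                         auth=(user, password), verify=False)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    nxapi_config(N7K_URL, COMMANDS)
```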

One thing to be aware of in the Lost Your Edge design is the need for a Designated Instance (DI) on the NSX distributed router for the Uplink logical interface on the “Edge VLAN” facing the Nexus 7000s.

When the NSX distributed router has an interface on a VLAN, one of the ESX hosts will be designated as responsible for ARP handling and forwarding for the distributed router’s forwarding MAC address on that VLAN. As a consequence, that one host will receive all traffic coming from other devices on that VLAN (like the Nexus 7Ks). Once received, the designated host will locally route traffic to the proper VXLAN (or VLAN) containing the destination, and send it as a logical Layer 2 flow to the host where the VM resides (which might be another host, or the same host).

The DI host is elected by the NSX Controller cluster. This is not something you can easily influence or predict; any host could be elected DI.  And when the DI host fails, a new one needs to be elected. Detecting the failure and electing a new DI can take as long as 45-60 seconds. This is something you might want to test for yourself in a lab.
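
If you do take that to the lab, you don’t need anything fancy to measure it.  Run a generic ping sampler like the one below from a VM behind the distributed router toward an address beyond the Nexus 7000s, then power off the host currently acting as DI and see how long the hole lasts.  The target address is a placeholder and the ping flags are Linux-style.

```python
import subprocess
import time

TARGET = "192.0.2.1"   # placeholder: something upstream of the Nexus 7000s
INTERVAL = 0.5         # seconds between probes

def probe(host):
    """Return True if a single ping succeeds within 1 second (Linux-style flags)."""
    result = subprocess.run(["ping", "-c", "1", "-W", "1", host],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0

outage_start = None
while True:
    if not probe(TARGET):
        if outage_start is None:
            outage_start = time.time()
    elif outage_start is not None:
        print(f"outage lasted {time.time() - outage_start:.1f} seconds")
        outage_start = None
    time.sleep(INTERVAL)
```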

The other important thing to point out about Lost Your Edge is that you’re missing an opportunity to apply services like NAT, VPN, or perimeter Firewall inspection as traffic enters or exits the NSX domain.

In the designs that follow you’ll see how we can obtain faster failure recovery, services, better traffic distribution, and even dynamic routing.

On the Edge

Let’s assume for the moment that you’re fine with static routing (or maybe you’re stuck with all vPC VLANs in your physical design).  Maybe it’s the failure recovery and ingress choke point of the Designated Instance that you’re not cool with (heck, I don’t blame you).  No problem.  In this On the Edge design we’ll introduce the NSX Edge routing VMs and see what happens.

Nothing has changed with the Nexus 7000s and the physical VLAN setup.  We still have all vPC-enabled VLANs, and we still have the previously discussed static aggregate route.  The difference lies in the NSX topology.  First, our initial hop into NSX is now an NSX Edge Router VM, which we’ve protected with a state-synced standby VM.  Second, we’ve introduced a VXLAN Transit Logical Switch that sits between our NSX Edge and NSX distributed router.

All of our hosts are still attached via vPC with 2 x 10GE NICs.  Some of these hosts should be designated as Edge hosts and placed in an Edge Cluster for the purpose of running your NSX Edge VMs.  The (must-read) VMware NSX Design Guide 2.1 covers that approach quite thoroughly as a design best practice.  That said, in a lab you can certainly mingle your NSX Edge VMs with compute hosts just for the fun of it.

For our distributed router, the concept of a Designated Instance does not apply on any VXLAN segment (such as our Transit Logical Switch) where traffic is flowing from an NSX Edge VM to the distributed router, and vice versa.  When traffic arrives at the NSX Edge VM from the Nexus 7000, the Edge host machine also happens to be running the NSX distributed router in its kernel.  Therefore, the next hop (.2) is always local to every Edge host, along with the Logical Switches attached to that distributed router.  In a nutshell, the Edge host is able to route traffic from the Nexus 7000 directly to the ESX compute host where the destination VM resides.  How cool is that?

You can see the On the Edge design — when compared to Lost Your Edge — has the same (if not better) traffic flow properties, faster failover (6 seconds), and the opportunity to add services like NAT, VPN, and perimeter Firewall.  Not bad for a day’s work.

On the Upgraded Edge

Now let’s assume that you do have some flexibility in your physical design to vPC attach some hosts and not others.  With that luxury we’ll take the Edge hosts running the NSX Edge VMs and have those non-vPC attached.  Meanwhile, we’ll leave the compute hosts with their optimal vPC attachment.  By doing this, we’ll be able to upgrade the On the Edge design with dynamic routing.  Just as a reminder, this is an exercise specific to the Cisco Nexus 7000.  Other platforms may be able to handle dynamic routing over vPC or MLAG connections just fine.

From the diagram above you will notice that we’ve made the “Edge VLAN” a non-vPC VLAN and our Edge hosts will attach to it.  You might also observe that we’ve added a second VTEP VLAN that is non-vPC, and we will attach our Edge host VXLAN vmkernel interfaces to it.  Our Edge hosts are completely non-vPC attached, while our compute hosts remain attached to all vPC-enabled VLANs.

With our NSX Edge hosts free from vPC attachment, we are able to run dynamic routing protocols such as BGP with the Nexus 7000s without issue.  Every new subnet created on the NSX distributed router will be advertised to the NSX Edge, which in turn will advertise it via BGP to the upstream Nexus 7000s (or whatever sits there).  Pretty cool, right?
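
As a quick sanity check of the adjacency, you can ask the Nexus 7000 which routes it’s learning from the NSX Edge.  Here’s a small sketch using the NX-API “cli_ascii” method, which simply hands back the raw CLI text.  The hostname and credentials are placeholders and, as before, this assumes an NX-API-capable NX-OS release; verify the response format against your switch.

```python
import json
import requests

N7K_URL = "https://n7k-1.example.com/ins"   # placeholder hostname

def nxapi_show(command, user="admin", password="password"):
    """Run a show command via NX-API (cli_ascii) and return the raw text output."""
    payload = {"jsonrpc": "2.0", "method": "cli_ascii",
               "params": {"cmd": command, "version": 1}, "id": 1}
    resp = requests.post(N7K_URL, data=json.dumps(payload),
                         headers={"Content-Type": "application/json-rpc"},
                         auth=(user, password), verify=False)
    resp.raise_for_status()
    return resp.json()["result"]["msg"]

if __name__ == "__main__":
    # Every new subnet created behind the NSX distributed router should show up
    # here once the NSX Edge advertises it via BGP.
    print(nxapi_show("show ip route bgp"))
```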

The traffic flow here is very similar to the previous design, only now our VXLAN traffic between Edge and Compute hosts will take a Layer 3 hop through the Nexus 7000 (before, it was Layer 2).  No biggie.  Depending on your physical design and host placement, this might mean an extra hop through the Nexus 7000, or not: N7K-N2K (no difference) vs. N7K-N5K-N2K (maybe) or N7K-UCS (maybe).  Keep in mind, Edge-to-Compute host traffic is North/South in nature and generally bottlenecked by some other smaller link further upstream.  Fair enough?

Over the Edge

Up to this point we’ve been placing one NSX Edge VM on that “Edge VLAN” to send/receive all traffic to/from our NSX distributed router.  Well and good.  A single NSX Edge VM can easily route 10Gbps of traffic.  But you want more?  No problem.  We’ll just 8-way ECMP that mofo and call it a day.  Check it out.

What we’ve done here is deploy up to eight NSX Edge VMs on that “Edge VLAN”, placed them on separate hosts, and enabled ECMP.  We also went to our NSX distributed router and enabled ECMP there as well.  Our Nexus 7000s see dynamic routing updates coming from eight equal-cost next hops and perform per-flow hashing, placing each unique flow on one of our eight NSX Edge VMs (each capable of 10Gbps).  The reverse applies as well.  If you had up to eight Nexus 7000s on the Edge VLAN (seriously?), each NSX Edge VM would install eight equal-cost next hops for each upstream route.

The same magic applies to our NSX distributed router.  Each compute host sending traffic northbound will perform eight-way per-flow hashing (in-kernel), picking an NSX Edge for each unique flow.  If for whatever reason an NSX Edge drops off the network, only about 13% of the traffic (one-eighth) will be affected (in theory), and only for the time it takes the routing protocol timers to detect and remove the failed next hop (3 seconds or so).
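
To make the per-flow hashing idea concrete, here’s a toy illustration (not NSX or Nexus code, just the concept): hash the flow’s 5-tuple, take it modulo the number of next hops, and every packet of that flow lands on the same Edge.  Lose one Edge out of eight and, with a reasonably even spread of flows, roughly one-eighth of them are affected until the dead next hop is removed.

```python
import hashlib
from collections import Counter

EDGES = [f"edge-{i}" for i in range(1, 9)]   # eight ECMP next hops (NSX Edge VMs)

def pick_edge(src_ip, dst_ip, src_port, dst_port, proto, next_hops):
    """Toy per-flow hash: every packet of a given flow maps to the same next hop."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = int(hashlib.md5(key).hexdigest(), 16)
    return next_hops[digest % len(next_hops)]

# Simulate 10,000 flows and see how evenly they spread across the eight Edges.
spread = Counter(
    pick_edge("10.0.0.1", "192.0.2.1", 49152 + i, 443, "tcp", EDGES)
    for i in range(10_000)
)
print(spread)

# If one Edge fails, only the flows hashed to it are affected: 1/8 = 12.5%.
print(f"expected impact of a single Edge failure: {1 / len(EDGES):.1%}")
```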

When you’re letting it rip with ECMP there’s no guarantee that both directions of a flow will traverse the same NSX Edge.  Because of that we need to turn off stateful services like NAT, VPN, and perimeter Firewall.  That’s the only bummer.  Not much we can do about that right now with ECMP.  But if you need lots of bandwidth (more than one Edge) with stateful services, you can always horizontally scale Edge and distributed router in pairs.  For example, Edge1+DR1, Edge2+DR2, and so on.

Design Poster

CLICK HERE to download your own copy of this 48 x 36 design poster containing all of the cool diagrams from this post.  Print that bad boy out and hang it up next to your NSX + UCS + Nexus 7000 poster.


CLICK HERE to download your own copy of these diagrams in PDF slides.



A tale of two perspectives: IT Operations with NSX

This year I had the honor and privilege to co-present a session at VMworld 2014 with my esteemed colleague Scott Lowe.  As many of you know, Scott is a celebrity at VMworld.  He’s one of the most famous virtualization bloggers and the author of many best-selling books on VMware vSphere.

Together, we presented what I think turned out to be a really fun session!  Scott and I pretended to be colleagues at a company that decided to deploy VMware NSX for their software-defined data center.  I played the role of the “Network Guy”, and of course Scott played the role of the “Server Guy”.  So then, how do we work together in this environment?

  • How do we gain operational visibility into our respective disciplines using existing tools?
  • How do we preserve existing roles and responsibilities?
  • What opportunities exist to converge operational data for cross-functional troubleshooting?
  • How does the Network team gain hop-by-hop visibility across virtual and physical switches?
  • How can the Network and Server teams work together to troubleshoot issues?

These are just some of the questions we attempt to role-play and answer in this 35-minute session.



On choosing VMware NSX or Cisco ACI

Are you stuck in the middle of a battle to choose VMware NSX or Cisco ACI?  In this post I’ll attempt to bring some clarity and strategic guidance in first choosing the right path, then propose how the two technologies can co-exist.  I’ll start with the message below from a reader asking for my opinion on the matter:

“Hi Brad,

I’m involved in a new Data Center networking project where Cisco is proposing the Cisco ACI solution.  I am starting to dig-in to the technology, but my immediate “gut reaction” is to use Cisco for a standard Clos-type Leaf and Spine switch network and use NSX for providing Layer 3 to Layer 7 services.

I am interested in hearing your opinion about Cisco ACI versus VMware NSX, since you have worked for both companies.  If you have time, it would be great to share your views on this subject.

As you can imagine, this is a highly political discussion and our network team are Cisco-centric and resisting my ideas.  We are a VMware/Cisco shop and I want the best fit for our SDDC strategy.”

For the sake of discussion, let’s assume that your IT organization wants to optimize for better efficiency across all areas, and embark on a journey to “the promised land”.  More specifically, you want to obtain template driven self-service automation for application delivery, as well as configuration automation for the physical switches and servers.  Let’s also assume that you would like to preserve the familiar model of buying your hardware from Cisco, and your software from VMware.  E.g. “We are a VMware/Cisco shop”.

Before I begin, it should be obvious that I’ll approach this with a bias for VMware NSX; the result of a thoughtful decision I made two years ago to join the VMware NSX team instead of the other hardware switch vendor opportunities available at the time.  The choice was easy for the simple reason that VMware is the most capable pure software company in the networking business.  It was apparent to me then (and still is now), that in the new world of hybrid cloud and self-service IT, the winners will be the ones who can produce the best software.

Choosing a path forward rooted in software

Any way you slice it, your virtual machines will be connected to a software virtual switch.  This is the domain of a fluid virtual environment that will exist whether you decide to use VMware NSX or go all-in with Cisco ACI.  Either direction will require that you do something special with the software virtual switch before you can proceed down the chosen path to the promised land.  This isn’t opinion or theory; it’s a universally accepted fact.  If the solution isn’t able to gain programmatic control of the fluid network within the software-centric virtual environment, it’s a total non-starter – like buying a fancy television without a remote control.  It’s not optional or even a matter that’s up for discussion.  Everybody agrees this is a necessary function.  Well then, what does that tell us?

To explore that thought a bit further, let’s consider the hardware-centric point of view.  Any way you slice it, your hypervisors and non-virtual machines will be connected to a physical hardware switch.  This is the domain of a static environment that will exist whether you decide to use VMware NSX or Cisco ACI.  One of the two directions requires that you also do something special with hardware switches before you can even proceed with the (above) unanimous requirement for special software virtual switches (e.g. Cisco’s software virtual switch for ACI doesn’t even function without special hardware switches).  In the NSX direction, however, nothing special needs to be done with the hardware: you were already well down the path of VMware NSX the moment you did something special with the software virtual switches (above).

I could proceed to argue that nothing special will ever need to be done with the hardware, since the moment you gained programmatic control over the fluid software environment you had done everything necessary, and then pose the question: “Why do you need programmatic control over this static non-virtual environment anyway?”  The point here is not to have that debate, but to note that the debate is there to be had.  This is still a matter of opinion and theory.  Suppose you bought an adjustable TV stand to go with that fancy new television; does it need a remote control too?

For the sake of argument, let’s presume you accept the theory that there needs to be some programmatic control over the static environment. Hey, it sounds nice, so why not?  Maybe you do want a remote control to adjust your TV stand, “just in case”.  For the Cisco ACI path to make sense, the next argument you need to make is that the fancy television should only function when it’s placed on an adjustable TV stand; and only if the TV stand can be adjusted by the same remote control that operates the television.  And finally, you’ll need to convince people that your fancy television and adjustable stand must be designed by the same company — one that specializes in building television stands. Otherwise, they’d better wait and stick with the same old worn-out TV.

In contrast, for the VMware NSX path to make sense, you’ll need to make the argument that a fancy television should be able to work on any stand you can rest it on.  If you can place it on an adjustable stand, well that would be nice.  And if the adjustable stand came with a remote control, Wow, even better.  You’ll also need to convince people that it makes more sense to buy televisions from an electronics company; and television stands should be bought from a television stand company.

Analogies aside, what this tells us is that software is the more important choice, and the hardware is secondary.  There are two primary reasons for this.  First, realizing the benefits of the fluid data center (fast provisioning and low OpEx) requires tight integration with the overall orchestration framework.  This is a function of software.  Second, the first hop any packet sees is a software virtual switch, and this is where security policy and other important functionality will reside.  Hardware is still important, but overall it accounts for fewer ports and holds less of the necessary intelligence.

“Networking is a software industry.  To succeed in a software market, you need to be a software company.” – Guido Appenzeller

“Who do you think is going to make better software, a software company or a hardware company?” – Steve Mullaney

In other words, Cisco makes great hardware switches.  And of course you still need a well-engineered physical network to construct the static environment (the television stand). There are other good choices available, but if you prefer Cisco Nexus 9000 physical switches (either in NX-OS or ACI mode) that’s perfectly fine.  However, that decision does not imply that Cisco is also the best fit for the fluid virtual environment, because that is a world of pure software.

The best example of this is security.  Consider the distributed firewall in VMware NSX, which provides true Zero Trust micro-segmentation with per-virtual-machine stateful security, full auditing via syslog, and partner integration such as Palo Alto Networks, all with no choke points (because it’s built into the vSphere kernel).  This capability provided by VMware NSX simply does not exist in Cisco ACI.  One problem is that switching hardware is not yet capable of providing granular, per-virtual-machine stateful security.  It can, however, easily be accomplished in software, as it’s done today in the NSX-enabled VMware distributed virtual switch.  Similarly, there’s no technical reason why this same level of security couldn’t be available in Cisco’s ACI-enabled Nexus 1000V software virtual switch (AVS), but it isn’t there.  The point here is that critical network services like security work best in software and virtual switches.  And it’s clearly evident that a pure software company, with its focus on software, can execute better and faster in delivering these features than a hardware company can.

On the VMware NSX path, where does Cisco ACI fit?

Let’s assume you’ve decided to follow VMware’s lead in software to the promised land, and begin utilizing NSX and the vRealize Suite for your SDDC self-service automation and policy based application delivery.  In that scenario, VMware NSX and Cisco ACI are not at all mutually exclusive, because they’re each fulfilling different roles.  One is a network and security virtualization platform for your SDDC (NSX), the other is a well-engineered fabric (ACI).  They go together, like a television and its adjustable stand (or wall-mount, whichever you prefer).

Your well-engineered fabric can certainly have its own automation interfaces for the purpose of constructing the static environment in a way that’s, well, automated.  The presence of NSX doesn’t prevent that.  If you want to deploy Cisco Nexus 9K physical switches – great.  The fabric can be deployed in NX-OS mode with a familiar Cisco CLI, and automation through either the NX-API or the Python API.  Or the fabric can be deployed in ACI mode (with no CLI), with automation available through the ACI-API.  Either way, automation is obtainable.  Your Cisco fabric APIs manage the static environment (connections for hypervisors and non-virtual hosts), while the NSX API manages the fluid virtual environment (network services and security for virtual machines).

For example, when it’s time to establish network connectivity for a new rack of hypervisor hosts, you’ll use the Cisco APIs for that.  In the case of ACI-API, one example would be an application network profile, where the “application” in question is the vSphere hosts running NSX.  This ACI profile would contain End Point Groups that establish connectivity policy for the various hypervisor vmkernel interfaces supporting vMotion, Management, vSAN, and NSX.  The workflow to provision a new rack of hypervisors would include an API call to Cisco APIC requesting that it assign this profile to the appropriate physical switch ports.
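
For a feel of what that API call could look like, here’s a rough Python sketch against the APIC REST API: authenticate, then post an application profile containing one End Point Group per vmkernel function.  The APIC address, tenant, and object names are made up for illustration, and the payload is deliberately minimal (no bridge domain bindings or contracts), so treat it as a starting point rather than a working profile.

```python
import requests

APIC = "https://apic.example.com"   # placeholder APIC address

session = requests.Session()
session.verify = False

# Authenticate; the session keeps the returned token cookie for subsequent calls.
session.post(f"{APIC}/api/aaaLogin.json",
             json={"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}})

# Application profile for the "application" that is our NSX-enabled vSphere hosts,
# with one End Point Group per vmkernel function.
profile = {
    "fvAp": {
        "attributes": {"name": "nsx-vsphere-hosts"},
        "children": [
            {"fvAEPg": {"attributes": {"name": epg}}}
            for epg in ("vmk-mgmt", "vmk-vmotion", "vmk-vsan", "vmk-vtep")
        ],
    }
}

# Post the profile under a (hypothetical) pre-existing tenant.
resp = session.post(f"{APIC}/api/mo/uni/tn-Infrastructure.json", json=profile)
resp.raise_for_status()
print(resp.json())
```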

Now it’s time to provision applications in minutes from a self-service portal, complete with network and security services.  That’s when your vRealize Suite (or maybe VMware Integrated OpenStack) will call upon the VMware NSX API.  You simply point your vRealize orchestration software at your vCenter and NSX Manager as its API endpoints, and from there you proceed to create full application blueprints complete with templates for compute, storage, security, and full L2-L7 network services.  You can do all of this today with NSX, vRealize, and your Cisco Nexus 9K fabric.
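
And on the NSX side, the calls vRealize makes under the hood are plain REST as well.  Just as a taste, here’s a sketch of creating a logical switch (a “virtual wire”) directly against the NSX Manager API.  The manager address, transport zone scope ID, and names are placeholders, the endpoint path is per the NSX-v API guide as I recall it (verify against your version), and in practice you’d let vRealize or OpenStack drive this rather than scripting it by hand.

```python
import requests

NSX_MGR = "https://nsxmgr.example.com"   # placeholder NSX Manager
SCOPE_ID = "vdnscope-1"                  # placeholder transport zone scope ID

body = """
<virtualWireCreateSpec>
  <name>app-tier-01</name>
  <tenantId>blueprint-demo</tenantId>
</virtualWireCreateSpec>
"""

resp = requests.post(
    f"{NSX_MGR}/api/2.0/vdn/scopes/{SCOPE_ID}/virtualwires",
    data=body,
    headers={"Content-Type": "application/xml"},
    auth=("admin", "password"),
    verify=False,
)
resp.raise_for_status()
print("created logical switch:", resp.text)   # NSX returns the new virtualwire ID
```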

Later, if Cisco provides integration for ACI with the vRealize Suite, you might decide to create some application blueprints using the NSX networking services model, others using the ACI model — just for the fun of it — and then compare the two side by side. “Which model provides better security, better performance, etc?”  But we’ll have to wait for that, which brings me back to my original point on the winners producing the best software in a timely manner.

In the meantime, I hope to see you in your awesome new SDDC sometime soon donning your new VMware NSX certifications!