Mind-blowing L2-L4 Network Virtualization by Midokura MidoNet

Filed in MidoNet, Open vSwitch, OpenFlow, OpenStack, SDN, VMware | October 6, 2012

Today there seems to be no shortage of SDN start-ups chasing the OpenFlow hype in one way or another, aiming to re-invent the physical network — SDpN (software defined physical network).  And then there’s a rare breed out there.  Those solving cloud networking problems entirely with software at the virtual network layer (hypervisor vswitch) — SDvN (software defined virtual network).  Nicira was one of those rare breeds (look what happened) — and now it’s apparent to me that Midokura with their MidoNet solution is another one of those rare breed SDvN start-ups like Nicira, but with what appears to be a differentiated and perhaps even more capable solution.

I first learned of MidoNet from a mind-blowing post on Ivan Pepelnjak’s blog.  I had to read that post a few times before I really “got it” — the first clue you’re looking at a rare breed.  And like my first light bulb moment with Nicira, I couldn’t stop thinking about MidoNet for weeks afterwards.

Here’s my understanding of the MidoNet solution based on my own conversations with Dan Mihai Dumitriu (CTO) & Ben Cherian (CSO) of Midokura, with some answers to questions I had after reading Ivan’s post.

MidoNet solution diagram provided by Midokura

 

L2-L4 Network Virtualization

We already know that a cluster of vswitches can behave like a single multi-tenant virtual L2 network switch.  This is a well understood and accepted technology which has been further improved upon with L2 Network Virtualization Overlays (NVO) pioneered by Microsoft VL2 and Nicira NVP.  Now imagine that very same cluster of vswitches (NVO) behaving like a single, multi-tenant, virtual L3 router.  The technology that revolutionized L2 networking in the cloud has moved up the OSI model to include L3 & L4.  What was once bolted-on and kludged together is now literally built-in as a cohesive multi-layer solution.

Even more interesting, with MidoNet you can have x86 machines collectively behave as one virtual “Provider Router” that speaks regular routing protocols with the routers outside of the cloud.  These machines provide the “Edge” gateway functionality (which I wrote about here) for the North/South traffic to/from our cloud.  On these Edge machines one NIC faces the outside with standard packet formats and protocols.  The other NIC faces the inside with <pick your favorite tunneling protocol here>.  No real need for hardware VXLAN or anything like that from a switch vendor.

Distributed L4 Service Insertion

We typically think of the vswitch as being a basic and dumb L2 networking device — forcing our hand into designing complex, fragile, and scale-challenged L2 physical networks.  However, that’s not the case with SDvN solutions such as MidoNet (and Nicira).  The networking kernel bits providing basic machine network I/O (and the basic L2 between two VMs) are the very same kernel bits that can also function as our gateway router, load balancer, firewall, NAT, etc.

The next trick is to get the L4 services configuration and session state distributed and synchronized across all machines.  If you can do that, the L4 network services required (e.g. firewall and load balancing) can be executed on the very machine hosting the VM requiring those services.  And Midokura has done just that.  With MidoNet, the days of complicating your network design for the sake of steering traffic to a firewall or load balancer are gone.  And all the usual spaghetti mess of flows and service insertion bandwidth choke points are gone too.  You can begin to see why I’m an unapologetic Network Virtualization fanboy.
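To make that idea concrete, here is a minimal sketch (my own illustration, not MidoNet code; the rule format and names are hypothetical) of what it means for a tenant's firewall policy to be evaluated on the host that owns the VM, using the same policy data every host has a copy of:

```python
# Hypothetical ingress policy for the "blue" tenant, as every host would see it
# after reading the shared virtual network configuration (default deny at the end).
FIREWALL_RULES = [
    {"proto": "tcp", "dst_port": 80,   "action": "accept"},
    {"proto": "tcp", "dst_port": 443,  "action": "accept"},
    {"proto": "any", "dst_port": None, "action": "drop"},
]

def evaluate(packet):
    """Return the first matching rule's action, evaluated on the hypervisor hosting the VM."""
    for rule in FIREWALL_RULES:
        proto_ok = rule["proto"] in ("any", packet["proto"])
        port_ok = rule["dst_port"] in (None, packet["dst_port"])
        if proto_ok and port_ok:
            return rule["action"]
    return "drop"

print(evaluate({"proto": "tcp", "dst_port": 80}))   # accept
print(evaluate({"proto": "udp", "dst_port": 53}))   # drop
```

No traffic ever needs to detour through a firewall appliance to get that answer; the decision is made where the packet enters the virtual network.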

Decentralized Architecture

Each host machine is running the Open vSwitch kernel module.   This is the packet processing engine (the fast path data plane) fully capable of L2-L4 forwarding.  You can think of this as the (software) ASIC of your host machine vswitch.  By itself the software ASIC is useless.  As with a physical switch, the ASIC needs to be programmed with a configuration and flow table by a control plane CPU.  And that control plane CPU can be off-box and centralized somewhere (Nicira), or it can be distributed and exist individually and locally on each host (MidoNet).

Each host machine runs the MidoNet agent (MN) which operates on a local copy of the virtual network configuration, and the agent acts upon that configuration when a packet is received on ingress from a vPort (the logical network interface that connects to a virtual machine, or a router for example).  Think of the local MidoNet agent as the control plane CPU for the local vswitch on that host.  Any special network control packets received on a vPort, such as ARP requests, will be handled and serviced locally by the MidoNet agent.  No more ARP broadcasts bothering the network.
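As a rough illustration of that local ARP handling, here is a sketch (using Scapy; my own example, not how the MidoNet agent is implemented, and the gateway addresses are invented) of an agent answering an ARP request for a tenant's virtual router port without any broadcast ever leaving the host:

```python
from scapy.all import ARP, Ether

GATEWAY_IP = "10.1.0.1"                  # hypothetical tenant virtual router port
GATEWAY_MAC = "aa:bb:cc:00:00:01"

def local_arp_reply(request):
    """Answer a who-has for the tenant gateway locally, instead of flooding the fabric."""
    if request[ARP].op != 1 or request[ARP].pdst != GATEWAY_IP:
        return None                      # not an ARP request we service here
    return (Ether(src=GATEWAY_MAC, dst=request[Ether].src) /
            ARP(op=2, hwsrc=GATEWAY_MAC, psrc=GATEWAY_IP,
                hwdst=request[ARP].hwsrc, pdst=request[ARP].psrc))

# A VM asks "who has 10.1.0.1?"; the local agent crafts the reply itself.
req = (Ether(src="aa:bb:cc:00:00:05", dst="ff:ff:ff:ff:ff:ff") /
       ARP(op=1, hwsrc="aa:bb:cc:00:00:05", psrc="10.1.0.5", pdst=GATEWAY_IP))
print(local_arp_reply(req).summary())
```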

Similarly, L2-L4 forwarding lookups and packet processing (NAT, encap, etc.) are handled locally by the MidoNet agent based upon its local copy of the virtual network configuration.  And as this happens the MidoNet agent builds the flow state on its local data plane (the Open vSwitch kernel module).  The host vswitch data plane does not rely on some nebulous central controller for programming flows — the local agent handles that.  This is one significant point of differentiation between Midokura MidoNet and Nicira (VMware).
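A simplified way to picture that slow-path/fast-path split (again purely an illustrative sketch of the concept, with invented names, not the agent's actual code):

```python
# The "software ASIC": an exact-match flow table consulted for every packet.
flow_table = {}

def simulate_through_topology(flow_key):
    # Placeholder for the agent's L2-L4 lookup (routing, ACLs, NAT) against its
    # local copy of the virtual network configuration.  Here we just pretend the
    # result is "tunnel this flow to the host at 192.0.2.42".
    return ["set_tunnel(192.0.2.42)", "output(tunnel_port)"]

def handle_packet(flow_key):
    if flow_key in flow_table:                       # fast path: kernel module hit
        return flow_table[flow_key]
    actions = simulate_through_topology(flow_key)    # slow path: ask the local agent
    flow_table[flow_key] = actions                   # cache the decision as a flow
    return actions

# The first packet of a flow takes the slow path; every subsequent packet hits the cache.
print(handle_packet(("10.1.0.5", "10.2.0.9", "tcp", 80)))
print(handle_packet(("10.1.0.5", "10.2.0.9", "tcp", 80)))
```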

MidoNet’s decentralized flow state architecture demonstrates that this solution was built for scalability and robustness.  It takes some serious talent in distributed systems to pull that off — which Midokura has.  Their technology team is a who’s-who roster of ex-Amazon and ex-Google.

Fully Distributed

MidoNet centralizes the virtual network configuration state and stateful L4 session state on a “Network State Database” — and even then, this database is a distributed system itself.  The Network State Database uses two different open source distributed systems platforms.  Any stateful L4 session state (NAT, LB, security groups) is kept in an Apache Cassandra database, while everything else is stored by an Apache Zookeeper cluster (ARP cache, port learning tables, network configuration, topology data, etc.). The virtual network configuration is logically centralized but still physically distributed.   In a normal production deployment, the Network State Database will operate on its own machines, apart from the machines hosting VMs or interfacing with routers.
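To illustrate that split, here is a hedged sketch of how the two kinds of state might be written using the standard Python clients for those platforms (kazoo for ZooKeeper and the DataStax driver for Cassandra). The paths, keyspace, and table names are my own inventions for illustration, not MidoNet's actual schema:

```python
from kazoo.client import KazooClient      # ZooKeeper client
from cassandra.cluster import Cluster     # Cassandra client (DataStax driver)

# Durable virtual network configuration/topology goes to ZooKeeper.
zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()
zk.create("/midonet/routers/provider-router/ports/uplink0",
          b'{"peer": "upstream-router", "cidr": "198.51.100.2/30"}',
          makepath=True)

# Ephemeral, high-churn L4 session state (e.g. a NAT mapping) goes to Cassandra,
# with a TTL so dead sessions age out on their own.
session = Cluster(["cass1", "cass2", "cass3"]).connect("midonet_state")
session.execute(
    "INSERT INTO nat_sessions (tenant, proto, orig_src, orig_dst, mapped_src) "
    "VALUES (%s, %s, %s, %s, %s) USING TTL 300",
    ("blue", "tcp", "10.1.0.5:33210", "198.51.100.7:80", "203.0.113.9:41002"))
```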

Each MidoNet agent has a Cassandra and Zookeeper client that reads the virtual network configuration and L4 session state from the Network State Database, with the result being that all host machines have the same virtual network configuration and session state.  But this is a cloud, and things change.  Virtual machines move between hosts, new tenants are created, routers fail, etc.  Such events are configuration changes.  And to facilitate the synchronization of these configuration changes, MidoNet agents and the Network State Database communicate bidirectionally for distributed coordination.
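The change propagation part can be pictured with a ZooKeeper watch: each agent subscribes to the pieces of the topology it currently cares about and gets a callback whenever one of them changes (again a sketch with hypothetical paths and a stubbed-out handler, using the kazoo client, not MidoNet's actual mechanism):

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()

def refresh_local_router_config(raw):
    """Hypothetical agent hook: re-parse the device and invalidate any cached
    flows that were computed from the old configuration."""
    print("provider router config changed, %d bytes" % len(raw))

@zk.DataWatch("/midonet/routers/provider-router")
def on_router_change(data, stat):
    # Fired once on the initial read and again on every subsequent update,
    # so every agent converges on the same view of the virtual device.
    if data is not None:
        refresh_local_router_config(data)
```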

That’s the real “trick” — providing both a distributed and consistent virtual network configuration state across all machines.

Note: There’s no OpenFlow here at all — another point of difference from Nicira (albeit insignificant).  That’s just the result of one being a centralized architecture (Nicira) and the other being distributed (MidoNet).

Symmetrical

As we’ve already discussed, the configuration state of the virtual network is the same on all machines — the entire L2-L4 configuration.  Even a machine acting as an Edge “Provider Router” has the same network configuration and state information as the machines hosting VMs.  And vice versa — a host machine has the L3 routing configuration and device state of an Edge machine.  The configuration state is totally symmetrical.

There’s a significant and positive consequence to this — the virtual network traffic is also symmetrical and always takes the most direct path with no spaghetti mess of flows on the network.  This is because each machine can act upon the full multi-tenant L2-L4 configuration/state on ingress — execute any required L2-L4 processing — and send the packet directly to the destination machine.

For example, if you have (3) “Provider Router” Edge machines, as shown in the diagram above, a packet could arrive from the “Internet” on any one of those machines, destined for a VM which could be on any one of the (3) host machines.  Because the Edge machines have the same configuration/state information as the host machines, the Edge machine knows exactly which machine is hosting the destination VM and can send the packet directly to it.  And vice versa.  Maybe one of the (3) Edge machines loses its link to the “Internet” router.  That’s a networking state change realized on all machines, including the host machines.  And thus a host machine delivering a packet from a VM to the “Internet” would avoid sending traffic toward the affected Edge machine.
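In other words, every machine (Edge or host) can resolve “which physical machine owns this virtual port?” from the same shared state and tunnel the packet straight there. A toy sketch of that lookup (the names and data structures are mine, for illustration only):

```python
# Shared state every machine sees: virtual port -> physical host (tunnel endpoint).
PORT_LOCATIONS = {
    "vm-blue-web-0": "192.0.2.11",   # host machine 1
    "vm-blue-web-1": "192.0.2.12",   # host machine 2
    "vm-red-db-0":   "192.0.2.13",   # host machine 3
}

def deliver(ingress_machine, dst_vport):
    """Any Edge or host machine sends the packet directly to the host owning the VM."""
    tunnel_peer = PORT_LOCATIONS[dst_vport]
    return "%s tunnels directly to %s for %s" % (ingress_machine, tunnel_peer, dst_vport)

# A packet arriving from the "Internet" on any Edge machine goes straight to host 3:
print(deliver("edge-1", "vm-red-db-0"))
```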

Note: I put “Internet” in quotes because what is shown as the “Internet” in the diagram above could just as well be the North/South data center routers.

Another example is the scenario where the Blue tenant is allowed to communicate with the Red tenant.  Normally that traffic would need to go through multiple L3 hops through various routers and firewalls. Not here though — because each machine has the full L2-L4 configuration for all tenants, and can act upon it with its L2-L4 kernel module data plane.  And as a result the traffic will be sent directly from the machine with the Blue tenant to the machine with the Red tenant — with all necessary L3-L4 packet transformations applied (NAT).

Tunneling

Packet delivery from one MidoNet machine to another is encapsulated into a tunneling protocol — any tunneling protocol supported by the Open vSwitch kernel module.  At this time MidoNet is using GRE.  It could be something else (CAPWAP, NVGRE, STT) if there’s a particularly good reason (better per flow load balancing on the physical IP fabric), but the tunneling protocol implemented is not the most interesting part of any Network Virtualization solution, despite playing an important role — like a cable.  It’s how the system implements its configuration, control, and management planes that will differentiate it from others.
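As an illustration of what ends up on the wire, here is a hedged Scapy sketch of one VM-to-VM frame carried in a GRE tunnel between two hypervisors (the addresses are invented, and MidoNet's actual tunnel keying details are not shown):

```python
from scapy.all import Ether, IP, TCP, GRE

# Inner frame: what the Blue tenant's VM actually put on its virtual port.
inner = (Ether(src="aa:bb:cc:00:00:05", dst="aa:bb:cc:00:00:09") /
         IP(src="10.1.0.5", dst="10.1.0.9") / TCP(dport=80))

# Outer packet: hypervisor-to-hypervisor delivery over the plain L3 IP fabric.
# proto=0x6558 marks Transparent Ethernet Bridging, i.e. an Ethernet frame in GRE.
outer = (IP(src="192.0.2.11", dst="192.0.2.42") /
         GRE(proto=0x6558) / inner)

outer.show()   # the physical network only ever sees the outer IP header
```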

But of course the tunneling allows us to build our physical data center network fabric any way we like.  I personally like the Layer 3 Leaf/Spine architecture of fixed switches. ;)

Hypervisors & Cloud Platforms

MidoNet supports the KVM hypervisor and Midokura is currently targeting the OpenStack and CloudStack cloud platforms.  They have developed API extensions and plugins that snap MidoNet into the existing OpenStack architecture, both for the Essex release and the new Folsom release.

VMware is a non-starter (forget about it — don’t even ask).  VMware is a closed networking platform and even more competitive to other network vendors now than ever before with the acquisition of Nicira.

One interesting hypothetical for me is Microsoft Hyper-V.  Microsoft’s networking environment is not closed (yet) to the same degree as VMware.  The Open vSwitch is written in C code, and could be ported into Windows Server 2012.  Something tells me a team in Redmond has already tried that.  So in theory MidoNet could (hypothetically) be a virtual networking solution for customers choosing to build their cloud with Microsoft.  Or perhaps I should say… giving customers a better reason to choose Microsoft instead of VMware.

MidoNet features provided by Midokura

Tradeoffs

Midokura has done the heavy lifting — building a distributed system of MidoNet agents — which provides a solid foundation to deliver Network Virtualization beyond simple Layer 2, and into the realm of distributed multi-tenant L3 and L4 service insertion.  However, like anything else there are tradeoffs to the design choices you make.  Compared to a more centralized model, the MidoNet agent on each host is a thicker footprint than a lighter-weight OpenFlow agent (Nicira).

Some of the natural questions we should be asking are things such as:

What kind of resource drain is the MidoNet agent? CPU, Memory, Disk, Network I/O?

How do the different L2-L4 features affect those host resources if and when configured?

How does the size of the deployment relate to host resource consumption, if at all?

What is the latency to replicate a configuration/state change to all hosts?

Does the scale of the system impact the change latency?

What kind of change rate can the system handle and what would be a “normal” change rate?

Can the machines send data back to the Cassandra cluster for analytics?

Can we have a configuration feedback loop based on such analysis?

Can I “move” a virtual network environment from one data center to another?

Network Virtualization is #Awesome

It’s amazing to me that with just some x86 machines and a standard Layer 3 IP fabric you can build an impressive IaaS cloud with fully virtualized and distributed scale-out L2-L4 networking.  I’m a big fan (boy), and extremely intrigued by these rare breed solutions such as Midokura MidoNet and Nicira.

Cheers,
Brad

About the Author

Brad Hedlund is an Engineering Architect with the CTO office of VMware’s Networking and Security Business Unit (NSBU), focused on network & security virtualization (NSX) and the software-defined data center. Brad’s background in data center networking begins in the mid-1990s with a variety of experience in roles such as IT customer, systems integrator, architecture and technical strategy roles at Cisco and Dell, and speaker at industry conferences. CCIE Emeritus #5530.

Comments (12)


  1. vmcreator says:

    VMware is a non-starter (forget about it — don’t even ask). VMware is a closed networking platform and even more competitive to other network vendors now than ever before with the acquisition of Nicira.

    One interesting hypothetical for me is Microsoft Hyper-V. Microsoft’s networking environment is not closed (yet) to the same degree as VMware. The Open vSwitch is written in C code, and could be ported into Windows Server 2012. Something tells me a team in Redmond has already tried that. So in theory MidoNet could (hypothetically) be a virtual networking solution for customers choosing to build their cloud with Microsoft. Or perhaps I should say… giving customers a better reason to choose Microsoft instead of VMware.

    Very worrying, Brad. Even though VMware may have a closed networking platform and primarily a closed virtualised datacentre, I can assure everybody out there that they do not have a dominant, closed application cloud.

    To encourage the masses to consider a move to Hyper-V would be far worse, as the Redmond giant would have a closed “everything” cloud: networking, applications, etc. I far prefer to have VMware niggling Microsoft with sound, robust technologies and competition than to have Microsoft have it all.

    Running Microsoft applications on the best software-defined datacentre [datacentre OS] = VMware is a better prospect and a safer bet than letting MS lock you into everything they have!

    • Brad Hedlund says:

      The only ones who should worry about Microsoft getting better at virtual networking are VMware — or others who perceive Microsoft as a potential threat in the data center. Again, this is a hypothetical discussion anyway.

    • Brad Hedlund says:

      Rather than just posting links, it would help the readers if you explained why you think the link is relevant to the content of this post.

      Thanks,
      Brad

      • vmcreator says:

        Many, many apologies Brad, I definitely fell out of bed on the wrong side that day!
        Your blog and material are excellent and my intention was not to rub you up the wrong way.
        It was just that it seemed a bit of an attacking comment: “VMware is a non-starter (forget about it — don’t even ask).”
        To have healthy competition in the industry is better for all companies. If people out there think it is more important to get as much as possible while spending as little as possible, then we might as well let there be only one IT company = Microsoft!!

        I guess some people would swap VMware for Microsoft because it is cheaper, but not swap Cisco for Netgear because it is cheaper... very strange?

        Keep up the good work, your articles are excellent, and it is nice to see Hadoop explored :-)

  2. Somnath Mani says:

    Awesome concept! A couple of quick thoughts…
    At the provider edge router, do you see rampant IP fragmentation because one NIC talks native IP while the other side is encapsulated?
    I anticipate some debugging nightmares at scale given the highly distributed nature of the system.
    Nevertheless, great stuff.

  3. Rohit Baweja says:

    It’s one of the biggest revolutions in the virtualization industry. Earlier there was Nicira & now Midokura.
    It’s simply awesome & I’m thankful to Brad for sharing the great knowledge…

    • Yacht says:

      You said that one of MidoNet’s key features is “virtual port and device monitoring”. What does “virtual port and device monitoring” exactly mean? Does it mean “monitoring the traffic from one port to another port”? Or just “monitoring a port’s whole traffic”?

  4. Lennie says:

    This makes so much sense, it almost makes my head explode. :-)

    I just wish I had seen this article sooner; I’ve been looking for information/solutions/ideas like this for a while now.

    The tradeoffs are easy to solve (with BGP at the edge already built in, I’m sure you can scale it very, very large).

    With things like fairly cheap silicon photonics and packet processing in userspace like Intel DPDK around the corner, this will be the right solution for the job.
