On “Why TRILL won’t work for the data center”

Filed in Fabrics, FUD, QFabric, TRILL by Brad Hedlund on June 15, 2011

Today I came across “Why TRILL won’t work for data center network architecture” by Anjan Venkatramani of Juniper. Anjan’s article makes a few myopic and flawed arguments in slamming TRILL, setting up a sale for QFabric. The stated problems with TRILL include FCoE, L3 multi-pathing, VLAN scale, and large failure domains. The one and only Ivan Pepelnjak has already tackled the flawed FCoE argument (be sure to read that), so I’ll opine here on the L3, VLAN scale, and failure domain arguments.

Anjan writes this about L3 gateways in a TRILL network:

While TRILL solves multi-pathing for Layer 2, it breaks multi-pathing for Layer 3. There is only one active default router with Virtual Router Redundancy Protocol (VRRP), which means that there is no multi-pathing capability at Layer 3.

This is a bit shortsighted and assumes that we are simply stuck with existing L3 gateway protocols of today like VRRP, and therefore you just have to use those in a TRILL network. Why? As the L2 technology evolves, it makes perfect sense to look at how L3 protocols should evolve with it. For example, it’s entirely possible that a simple Anycast method could be used for the L3 gateway in TRILL. In short, each L3 TRILL switch would have the same IP address and same MAC address for the L3 default gateway. The server resolves the L3 gateway to this MAC address, which is available on ALL links, because each TRILL spine switch is originating it as if it were its own. The L2 edge switch now makes a simple ECMP hash calculation to decide which L3 switch and link receives the flow. Simple, right? The same Anycast concept can also be used for services, such as load balancers and firewalls.
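
To make that concrete, here is a minimal sketch (my own illustration, not any shipping implementation) of the edge-switch side of the idea: hash the flow’s 5-tuple and pick one of several equal-cost uplinks, each of which leads to a spine originating the same anycast gateway IP and MAC. The uplink names and the choice of hash function are purely hypothetical.

```python
# A minimal, hypothetical sketch of ECMP flow placement toward anycast L3
# gateways. Every TRILL spine advertises the same gateway IP/MAC, so the
# edge switch only needs to hash each flow onto one of its uplinks; any
# uplink it picks reaches a valid default gateway.

import hashlib

# Hypothetical uplinks, one per spine switch originating the shared
# (anycast) default gateway address.
UPLINKS = ["uplink-to-spine-1", "uplink-to-spine-2",
           "uplink-to-spine-3", "uplink-to-spine-4"]

def pick_uplink(src_ip, dst_ip, proto, src_port, dst_port, uplinks=UPLINKS):
    """Deterministically map one flow (5-tuple) to one uplink."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()  # any decent hash will do
    return uplinks[int.from_bytes(digest[:4], "big") % len(uplinks)]

if __name__ == "__main__":
    # Two flows from the same server can land on different spines, yet
    # both reach "the" default gateway, because every spine answers for
    # the same anycast IP/MAC.
    print(pick_uplink("10.1.1.10", "172.16.5.20", "tcp", 49152, 80))
    print(pick_uplink("10.1.1.10", "172.16.9.40", "tcp", 49153, 443))
```

Real switches do this hashing in hardware, of course; the point is simply that once every spine answers for the same gateway address, no VRRP-style single active gateway is needed.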

Anjan’s L3 gateway argument is a setup for arguing that fewer VLANs should be used with TRILL, thus requiring more hosts on each VLAN (to reduce the L3 switching bottlenecks), thereby adding to scalability problems. Therefore, any subsequent argument related to having too many hosts on one VLAN can be dismissed as FUD based on a shortsighted premise. There’s no reason to change the number of VLANs you deploy with TRILL, or the number of hosts per VLAN.

Anjan continues about TRILL failure domains:

Security and failure isolation are a real concern in the TRILL architecture. Both issues stem from being artificially forced into large broadcast domains. Flapping interfaces, misbehaving or malicious applications and configuration errors can cause widespread damage and in a worst case scenario result in a data center meltdown.

Again, the “large broadcast domains” can be dismissed as myopic FUD. There would be no reason to have larger than normal broadcast domains in a TRILL deployment.

Now, let’s talk about “configuration error” and the resulting “widespread damage”. Coming from Juniper, let’s acknowledge the somewhat obvious ulterior motive of selling their (still in slideware) QFabric architecture. Given that, the assumption Juniper would like you to make is that QFabric would be much less vulnerable to widespread damage from configuration error than a TRILL network. But how can that be possible? On what basis?

The QFabric architecture resembles that of one big proprietary 128-slot switch. One configuration change affects the entire architecture, for better or worse. How is Juniper proposing their architecture is any less vulnerable to disastrous configuration mistakes? If anything, a single network-wide configuration input, such as you get with QFabric, only increases this risk. No? Why not? Furthermore, why would “security and failure isolation” be less of a concern with Juniper QFabric, compared to any other standards-based architecture such as TRILL?

Would any of the Juniper folks like to state their case, and enlighten me? :-)

Cheers,
Brad


Disclaimer: The author is an employee of Cisco Systems, Inc. However, the views and opinions expressed by the author do not necessarily represent those of Cisco Systems, Inc. The author is not an official media spokesperson for Cisco Systems, Inc.

About the Author

Brad Hedlund (CCIE Emeritus #5530) is an Engineering Architect in the CTO office of VMware’s Networking and Security Business Unit (NSBU). Brad’s background in data center networking begins in the mid-1990s with a variety of experience in roles such as IT customer, value added reseller, and vendor, including Cisco and Dell. Brad also writes at the VMware corporate networking virtualization blog at blogs.vmware.com/networkvirtualization

Comments (27)


Sites That Link to this Post

  1. Your Momma Is So Proprietary « The Data Center Overlords | October 20, 2011
  1. Troy Levin says:

    Brad….. as usual, great post!! With the TRILL standard officially moved from draft to proposed standard in the IETF, vendors and customers can confidently begin deploying TRILL-compliant software implementations. E.g. Cisco FabricPath, which includes enhancements for active/active HSRP that can leverage ECMP now, and anycast, as you mention, in the future.

  2. Ravi patil says:

    Well said, definitely TRILL is the future for the data center.

  3. Jonathan says:

    I think we need to wait and see before jumping to any conclusions about Q-Fabric vs. TRILL; once we get our hands on some Juniper kit we can decide which is best…. What gets me thinking, though, is if Cisco have nothing to fear from Juniper’s Q-Fabric, why did Cisco see the need to employ David Yen? http://www.datacentremanagement.org/2011/05/cisco-nabs-juniper-qfabric-architect-david-yen/

    • Brad Hedlund says:

      Jonathan,
      Cisco hires from competitors all the time, and vice versa. That said, one could easily spin your question around and ask: “Why is the top QFabric guy leaving Juniper?”

      Cheers,
      Brad

      • BT says:

        That said, one could easily spin your question around and ask: “Why is the top QFabric guy leaving Juniper?”
        >> This top QFabric guy leaving Juniper has nothing to do with technology. He just didn’t get the promotion he wanted. Now at Cisco, he is playing a bigger role.

  4. EtherealMind says:

    Some of this argument is specious.

    Cisco is asking customers to deploy FabricPath – which is intended to be TRILL with a set of Cisco proprietary extensions that make a Cisco network an equivalent closed system to Juniper QFabric. And we are to believe that the software for TRILL is mission-critical and proven from day one, which seems unlikely given Cisco’s software development history (though there are signs of improvement lately).

    Second, the fact that QFabric offers a linear pricing model, compared to Cisco’s capital-intensive purchasing model of buying heavy switches in the core, is also attractive. A “128-slot switch” can be a better price model than the “128 individual switches” that Cisco is so keen to sell. You should note that QFabric has a target network size of 500 x 10GbE ports – not a market that Cisco is readily able to attack with existing products.

    And finally, there is the confusion of the Nexus 2000 line, which uses proprietary technologies to make a “dumb hub” switch known as a Fabric Extender so as to lower the price. Is this a short-term play as Cisco attempts to saturate the market?

    Cisco has great technology in the NX7K, but Juniper QFabric is a different approach that also has merit. I agree that the whitepaper is clumsy marketing, but Juniper’s approach has merit in my view.

    • Brad Hedlund says:

      Hi Greg,

      While, yes, FabricPath is Cisco proprietary, I will disagree with your statement that FabricPath is “an equivalent closed system to QFabric”.
      For a few simple reasons:

      1) Cisco FabricPath is based on TRILL and hardware compatible with TRILL. It will only take a software upgrade to run standard TRILL.

      2) With Cisco being the main contributor to TRILL, it’s inevitable that the current non-standard things about FabricPath will find their way into future versions of the TRILL standard.

      3) The customer can choose the ports they want to run FabricPath, while other ports can connect to standard Ethernet hosts or other non-Cisco switches. No such flexibility exists with the QFabric Interconnect.

      To be fair, 500 x 10GE L2/L3 line rate ports in a single switch is not something Juniper can do with existing products either.

      Thanks for the comment!

      Cheers,
      Brad

      • Derick Winkworth says:

        There are several examples of Cisco-driven standards that never included Cisco proprietary features. Time-Based Anti-Replay is a great example of that. I’m not sure that #2 is a convincing argument.

        Can you clarify #3? I believe the uplink ports on the qfabric edge devices will be multipurpose… The core devices themselves of course will not be. Is that what you are referring to?

        I’m not sure that’s a benefit. FabricPath from a forwarding plane perspective looks just like MPLS. FabricPath nodes then are effectively P nodes and IMHO… P nodes should never also be PE nodes… I imagine the same will hold true for FabricPath designs.

        • Brad Hedlund says:

          Hi Derick,
          On point #3, I’m referring to the QFabric core device (Interconnect) being a closed QFabric only device. That matters because it determines how you integrate 3rd party or legacy equipment during a migration. With Cisco FabricPath, you can connect any standard Ethernet access switch to the FabricPath core layer and run those ports as classical Ethernet.

          With QFabric, on the other hand, you’ll have no other choice but to connect your classical Ethernet switches to the edge device, the QF-Node. So now you have an access switch connected to another access switch. That amounts to adding layers to the network, not removing them, which, by the way, runs counter to the QFabric 3-2-1 marketing hype.

      • EtherealMind says:

        It’s my understanding that post-TRILL standard, Cisco will develop a number of proprietary extensions. These may offer customers valuable features (in the same way that EIGRP offered useful features in 2001) but ultimately result in a closed data center network – just like Juniper QFabric. Thus the outcome remains the same.

        I will continue to perceive FabricPath, per se, as a closed system in the same way that Juniper QFabric / Brocade VCS is closed – vendor-differentiated features. Any finger-pointing about non-standards compliance should be done with care, since each company’s own products are all culpable.

        greg

        • Brad Hedlund says:

          Greg,
          The big difference here is that customers will have a choice between proprietary Cisco FabricPath and standard TRILL with a software change, all on the same hardware investment.
          Such choice and flexibility does not exist with closed solutions (QFabric) which are not rooted in any standards.

          Cheers,
          Brad

          • EtherealMind says:

            I agree with that. However, I would also point out that QFabric is a different approach. Instead of many individual control planes acting as a fabric as we have today, Juniper QFabric looks to make a single control/management plane out of many switches. Therefore the interoperability isn’t a requirement since they all act as a single “borg” style fabric.

            That said, the QFNode switches can act as conventional switches with TRILL if you want to design them that way.

          • Brad Hedlund says:

            Of course interoperability isn’t a requirement between the switches in a closed architecture. It comes down to how you integrate with existing equipment without adding layers to the network.

  5. Juan Lage says:

    Greg,

    I’d like some insight about something you wrote: “QFabric offers a linear pricing model compared to Cisco’s capital intensive purchasing”. I really can’t see much difference in capital cost, for an equivalent deployment, between a leaf/spine model deployed with a closed proprietary approach (QFabric) and other vendors’ (Cisco included) deployments based on TRILL. Maybe you are assuming the QFabric Interconnect will be much less expensive, which is possible, but I have not seen any pricing guidance yet …

    In any case, and to add to Brad’s point, Cisco’s FP approach is also more “open” in that it does not lock you in at all layers. You can build a 2-tier network with Nexus switches as leaf and spine. Once this is done, you can connect other vendor switches to a Nexus 7K spine if you so wish, or you can use Nexus leaf switches with another vendor as spine. This would rely on future TRILL interop, yes, but even today you can do that with LACP. You can also rely on using GE, 10GE, and in the future 40GE and 100GE, for such connections. With QFabric, what can you connect? Nothing but JNPR devices and, for what we know, always using only 40Gbps interfaces.

    FabricPath is proprietary, but not closed. Qfabric is both proprietary and closed. :-)

    Cheers,

    Juan

    P.S. … on the pricing, don’t forget to also add the OOB management network (redundant), which is a must-have with QFabric … :-)

    • EtherealMind says:

      QFabric consists of three elements, QFNode, QFManager and QFInterconnect.

      The QFInterconnect is a multistage Clos silicon fabric. Conceptually similar to the silicon fabric cards of the Nexus 7000 – except that the entire chassis is devoted to silicon fabric cards (rear) and 40GbE line cards (front) that act as “backplane connections”.

      I’ve previously made some attempt to describe this technology here (hope it’s ok to link, Brad) : http://etherealmind.com/controller-based-networks-for-data-centres/

      and made some more detailed observations here:

      http://etherealmind.com/juniper-qfabric-my-speculation-too/

      The QFInterconnect has no management or control plane. As such it’s much cheaper to buy. Compare with the NX7K, which requires the purchase of the supervisors, line cards, and silicon fabrics in the first phase – in effect purchasing a large chunk of your forwarding capacity in a single, capital-intensive process. The Juniper QFNodes contain the control plane, and the addition of each edge switch adds more performance to the overall system. While the QFInterconnect will require more line cards and fabric cards, these are simpler devices that cost much less than equivalent Nexus devices.

      However, the potential negative is that the economic sweet spot is at 500 10GbE ports – that’s where the QFabric really stacks up. That’s a lot of ports, and not many enterprises need that right now, therefore it’s solving a different problem to the Nexus family.

      You might be interested in a podcast I recorded with Juniper that was published today:

      http://packetpushers.net/show-51-juniper-qfabric/

      The OOB network uses standard EX switches; they aren’t expensive compared to, say, a Nexus 2K – more like a C3750.

      Hope this helps.

      greg

      • Juan Lage says:

        Hi Greg, I’ve listened to your podcast about Qfabric indeed, very good :-)

        I understand well the concepts and components that make up QFabric. And I fully understand that the QFInterconnect isn’t a switch. I’m not entirely sure it has absolutely no management plane, because the hardware needs to be programmed somehow, and I presume it will have some sort of management modules that connect to the OOB network. But I take it that such management modules could perhaps be built with less CPU and memory resources … at any rate, you assume they will be much less expensive than a comparable N7K with future 40GE linecards. Possible. We will need to wait and see.

        But physically the QFInterconnect is a modular box that sits in the place in your DC where you have laid your fiber to. A very strategic place from a cabling perspective. And all you can connect to this device is QFNodes, and only using 40Gbps interfaces (not 40GE), afaik. This is capital-expensive for customers who do not require so much bandwidth (just on the optics and the fiber). If you have blade chassis with embedded switches you can’t connect them directly; you need to add a ToR layer.

        Also, if you want to connect the QFabric to the rest of the network you need to connect via a QFX3500, a device built for server access with only 9MB of shared memory (afaik again) – clearly not built for high speed core connections where more buffering is usually nice to have.

        On the pricing, I can’t really say whether EX switches are less or more expensive than “comparable” Nexus or Catalyst solutions. All I am saying is that if you build a TRILL- or FabricPath-based network, you don’t need to build the OOB network. :-)

        So all in all, without knowing the pricing, I am not sure we can assume that it will be less capital intensive.

        thanks for the good info :-)

        cheers,

        Juan

  6. Omar Sultan says:

    So, if we are going to get into pricing, it’s also helpful to note that FabricPath has much greater granularity. If you are an N7K customer, you can create a “flat network” for the price of a couple of F1 modules. This gives you an easy way to try flattening your network, and also gives you the ability to flatten the portions of the network where it actually makes sense while not having to mess with the portions of the network that are happily running as-is. Needless to say, this is also a lot simpler and less risky than introducing an entirely new switch architecture into your DC.

    On the pricing front, in case folks don’t realize, the Nexus 55xx is also FP-capable (support via an NX-OS update is forthcoming), so you have a couple of choices for your leaf nodes.

    Regards,

    Omar

    Omar Sultan (@omarsultan)
    Cisco

    • Stefan says:

      @omarsultan: and which version of NX-OS would you recommend running with the F1 cards? We are presently ready for testing, but our bug scrubbing of 5.1 (we only know of 5.1(3) as production-supported) has revealed no fewer than 326 issues, of which things like CSCtg43396, 13963, 83899, 59485, 78583 (!!!) make us very, very worried about production deployment.

  7. Jaime C. says:

    Good points in everyone’s favor, but don’t miss the point… what the DC needs to evolve towards Cloud is lower latency, fewer hops and simpler management, of course being as green as possible. At the end, customers need to weigh trade-offs and make a choice. For the near future, I guess Jnpr is leading the way wisely. I bet Cisco will re-architect to a true fabric in the next few months.

  8. joe smith says:

    Brad, this should be another thread…

    I am very curious to get your take on security in the cloud. To be more specific, how does one go about providing IDS services in a virtual environment?

    A requirement I am starting to see more often is to run an IDS service such that VM-to-VM traffic can be monitored. The traffic flow can be between two VMs on the same blade, two VMs on two separate blades in the same chassis, or two VMs on two separate chassis…

    In that case, I see 3 traffic flows off the bat…

    same blade: vm-to-vm traffic is switched by a hypervisor switch (1000v or vmware vDS).
    different blades in same chassis: vm-to-vm traffic will leave blade and be switched by chassis hardware switch (chassis I/O blade).
    different chassis: vm-to-vm traffic will have to go to ToR (maybe even end-of-row).

    NOTE: if VMs are on different VLANs, traffic will always go to end-of-row/agg switches (the L3/L2 boundary).

    So given all those possible flows, what is the best way to go about deploying an IDS service? Placement? Virtual or physical? etc….

    Looking forward to getting your insight!

  9. Craig Weinhold says:

    Cisco’s GLBP is an FHRP algorithm that supports multipath L3 routing very well. And with OTV, Cisco has legitimized the concept of filtering FHRP messages to create anycast-like VRRP/HSRP.

    But the L2/L3 scalability discussion misses the point. Assuming the network is competently designed and operated, the main technical reason for VLAN segmentation is to control traffic flow — to force different VLANs through different policies — firewalls, IPS, load-balancers, packet capture, etc. And policy enforcement is going to be the chokepoint no matter what the underlying switching technology is.

    VLANs still have some intrinsic value of their own, but they aren’t the mosh pits they once were. Traffic is far more orderly and controlled, thanks to Private VLANs, ARP security, MAC security, IGMP snooping, the dominance of IP, etc. Unicast flooding is even on the table, having been removed from Cisco’s OTV and UCS without much impact.

  10. Null says:

    Recently I had an opportunity to evaluate presentations from different vendors, e.g. Cisco, Brocade, HP and Juniper, with regards to the data center. In terms of presentation, Brocade & Juniper were quite impressive. I liked what Brocade has to offer: each top-of-rack switch has its own brain, unlike Cisco’s one master switch with a brain and the rest just dumb, so if the master fails then the other switches have no brains (e.g. Nexus). Only concern: new technology, and they don’t support Cisco proprietary EIGRP.
    The Junos feature of automatic configuration rollback after 5-10 min if not saved was interesting &, again, they don’t support Cisco proprietary traditional EIGRP.

    • Brad Hedlund says:

      Thanks for sharing your thoughts here. Yeah, if you require Cisco EIGRP, then considering other vendors is just torturing yourself with things you cannot have.
      Get that EIGRP out of there, ASAP. :-)

  11. Guillaume BARROT says:

    Hi,

    Great post, and that L2 anycast is a great idea. Not only for the gateway by the way.
    Is there a draft of this in the TRILL IETF working group, or is this just sci-fi for the moment?
