Comparing efficiencies of Fixed vs Chassis switches

Filed in Fabrics, merchant silicon, OpenFlow, TRILL, VXLAN on May 10, 2012

When building a fabric for a cluster of servers and storage in the data center, how should you architect the network?  There are several ways to approach this.  The architecture you choose probably depends on your preconceived notions of what a network should look like, and on the things you care about most.  For example, does power and space efficiency matter at all?  It usually does, given that these are often finite and costly resources.  The more power and space given to the network, the less power and space left for storage and compute — the stuff that actually provides a return.

With that in mind, which architecture and switch platforms might work best when space and power are taken into consideration? Let's take a quick look at Fixed switches vs. Chassis switches.  My observation: fixed switches continually outpace chassis switches in both power and space efficiency.

Power efficiency

The line graph above shows the maximum rated power per line rate L2/L3 port.  For each year we take the most dense platform and divide its data sheet maximum power by the number of ports.  For the Fixed switches, I could have used data from the lower-power single-chip platforms, but to be extra fair we are looking at the higher-power multi-chip platforms (e.g. Dell Force10 Z9000).  I did not include 2008 because the chassis data for that year was so high that it skewed visibility for the remaining years.  Chassis switches got significantly better in 2010 thanks to some new players in that market (namely Arista).

Note: Projections for 2014 are based on the trend from previous years.
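
For reference, the number behind each point is just the data sheet maximum power divided by the line rate port count. A quick sketch in Python, using the 2012 entries from the data list at the bottom of this post:

    # Max rated watts per line rate L2/L3 port = data sheet max power / port count
    def watts_per_port(max_watts, line_rate_ports):
        return max_watts / line_rate_ports

    print(watts_per_port(4595, 336))  # Nexus 7009 w/ 7 x F2 (chassis, 2012)
    print(watts_per_port(789, 128))   # Dell Force10 Z9000 (fixed, multi chip, 2012)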

Space efficiency

The line graph above shows the line rate L2/L3 port density of the most dense platform each year, Chassis vs Fixed.  Pretty straightforward: we take the number of ports and divide it by the required rack units (RU).  While each platform type is getting better, the port density of fixed switches continually outpaces that of chassis switches, with no end in sight.
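
Here it is as a quick Python sketch, again using the data list at the bottom of this post:

    # Line rate ports per rack unit (RU) = port count / rack units consumed
    def ports_per_ru(ports, rack_units):
        return ports / rack_units

    print(ports_per_ru(384, 11))  # Arista 7508, the most dense chassis in 2010/2012
    print(ports_per_ru(64, 1))    # Broadcom Trident based fixed switch, 2012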

Conclusion

Fixed platforms are more power and space efficient than chassis platforms, by a significant margin, year after year.

Some might say: “Yes, Brad, this is obvious. But comparing chassis vs fixed is not a fair comparison, and it’s silly.  You can’t build a scalable fabric with fixed switches.”

My response to that: Think again.  Perhaps it’s time to question the preconceived notions of what a network architecture should look like, and the form factors we instinctively turn to at each layer in the network.  Ask yourself a very basic question:  “Why do I want so many ports shoved into one box?”  Are you building a scalable network? Or are you designing around arcane Layer 2 network protocols?

What would an efficient and scalable network look like if we could eschew arcane Layer 2 protocols (STP) and instead build the network with new alternatives such as TRILL, OpenFlow, or Layer 3 fabric underlays with network virtualization overlays (VXLAN, NVGRE, STT)?

What would that network look like? ;-)

Follow-up posts:

Cheers,
Brad

Data:
Chassis density (line rate ports per RU)
2008 – 3 (Nexus 7010 w/ 64 ports @ 21RU, M1-32 linecards)
2010 – 34 (Arista 7508 w/ 384 ports @ 11RU)
2012 – no change, Arista 7508 still most dense
2014 – anticipated 96 ports per slot w/ current chassis

Fixed density (line rate ports per RU)
2008 – 24 (Arista 7124, Force10 S2410) @ 1RU
2010 – 48 (Arista 7148) @ 1RU
2012 – 64 (Broadcom Trident based platforms) @ 1RU
2014 – anticipated 128 ports @ 1RU

Chassis power
2008 – Nexus 7010 w/ 8 x M1-32 = 8400W max (64 ports line rate), 131W / line rate port
2010 – Arista 7508 = 6600W max / 384 ports = 17W
2012 – Nexus 7009 w/ 7 x F2 = 4595W max / 336 = 13.6W
2014 – Anticipated 25% decrease = 10.2W (based on a 25% decrease over the prior 2 years)

Fixed power
2008 – Arista 7124SX – 210W / 24 ports = 8.75 W / line rate port (single chip)
2010 – Arista 7148SX – 760W / 48 ports = 15.8 W / line rate port (multi chip)
2012 – Broadcom Trident+ based platforms – 789W (Dell Force10 Z9000) / 128 line rate ports (multi chip) = 6.1W
2014 – Anticipated 60% decrease = 2.4W (based on a 60% decrease from prior 2 years)

About the Author

Brad Hedlund (CCIE Emeritus #5530) is an Engineering Architect in the CTO office of VMware’s Networking and Security Business Unit (NSBU). Brad’s background in data center networking begins in the mid-1990s with a variety of experience in roles such as IT customer, value added reseller, and vendor, including Cisco and Dell. Brad also writes at the VMware corporate networking virtualization blog at blogs.vmware.com/networkvirtualization

Comments (20)


  1. Mark Berly (@markberly) says:

    As always an excellent post. I 100% agree with your assertion above that a fixed switch has power efficiency advantages, if you are building a network that fits into a single fixed device. If you are not, then you have to include the ISLs (Inter Switch Links) and associated aggregation switches in your calculation. The addition of these links/switches can quickly push power and space well beyond a chassis based solution. Both chassis and fixed switches have their place in the networking world; when designing we need to factor in many things aside from power consumption, including desired over-subscription, amount of buffering, etc…

    If you want to include fixed switches in your 2011/2012 list it should include the Arista 7050S-64 which draws 2W per 10G port, still the lowest power draw per 10G port around.

    BTW – the numbers for the 7508 are wrong; that is the theoretical max draw of the system, not the draw it takes today, which is 5700W. Even that number is high, as it assumes worst-case optics; typical is 3800W. There is capacity for additional power to the chassis for future line cards, which may push it up to 6600W, providing a future-proof chassis as next-gen ASICs push the density even higher…

    • Brad Hedlund says:

      Hi Mark,

      …you have to include the ISLs (Inter Switch Links) and associated aggregation switches in your calculation. With the addition of these links/switches it can quickly push power and space well beyond a chassis based solution.

      I’m glad you brought that up. You need to look at this from a viewpoint of building the entire fabric. E.g. for a *fabric* of size X (number of server ports), and oversubscription Y, what would it take to build that with fixed switches vs chassis switches, or some combination of the two? It doesn’t do a bit of good to point out how many fixed switches it would take to build a replacement for *one* chassis — because nobody ever deploys just one chassis for their entire fabric.
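
      To make that concrete, here’s a rough leaf/spine sizing sketch in Python. The inputs are hypothetical (64-port fixed switches, 3:1 oversubscription) and not tied to any particular product:

          import math

          # Rough two-tier leaf/spine sizing: how many fixed switches does it take
          # to deliver X server ports at a given oversubscription ratio?
          def size_fabric(server_ports, leaf_ports=64, oversub=3.0):
              # Split each leaf's ports between server-facing ports and spine-facing uplinks
              uplinks_per_leaf = math.ceil(leaf_ports / (oversub + 1))
              server_per_leaf = leaf_ports - uplinks_per_leaf
              leaves = math.ceil(server_ports / server_per_leaf)
              spines = uplinks_per_leaf      # every leaf has one uplink to every spine
              spine_ports_needed = leaves    # ports required on each spine switch
              return leaves, spines, spine_ports_needed

          print(size_fabric(3072))  # 3072 server ports -> (64 leaves, 16 spines, 64 ports per spine)

      Run the same exercise for a chassis based design of the same size and oversubscription, then compare total watts and RU for the whole fabric; that is the apples-to-apples comparison.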

      The numbers for the 7508 are wrong that is the theoretical max draw of a system, not the draw it takes today which is 5700W

      I used the number found in the Arista documentation, which took some digging to find by the way. Other chassis vendors offer a power calculator which makes things easier — why is there no power calculator for the Arista chassis switches?

      Cheers,
      Brad

      • Mark Berly (@markberly) says:

        I think the design needs to be predicated on the requirements; those would dictate which platform you would use and its capabilities. With the current state of today’s technologies, unless you can use an L3 ECMP design you are limited in scale when trying to use open standards to scale out a network with fixed form factor switches. As technologies become available (e.g. VXLAN, OpenFlow, NVGRE, etc…) which allow us to build L3 based networks while overcoming the application dependencies requiring L2 connectivity, building large scale, low oversubscription Clos based designs using fixed form factor switches becomes feasible.

        • Brad Hedlund says:

          Don’t forget about TRILL — which will provide good old physical Layer 2 between racks in a scale-out fabric of fixed switches.

          • Derek Dolan says:

            Can we please forget about TRILL? :) It’s just spanning tree for bigger, faster networks. Not remotely a fundamental change that fixes a real issue… just a band-aid, IMO.

            I much prefer the idea of killing those sort of protocols altogether in favor of something more intelligent in the SDN arena (per your post on dodging open protocols).

          • Mark Berly (@markberly) says:

            When TRILL is supported in a non-proprietary way by the majority of vendors then I will consider it ;-)

  2. Craig Johnson says:

    One thing to consider in the fixed vs. modular debate is the control plane – I don’t know of any fixed switches that can give me a redundant CPU like a dual-supervisor modular chassis. This leads into in-service upgrades – now I need other switches to take on control plane functions if I’m doing live upgrades (my bias is speaking here – this is what is done in Nexus 5000 vPC ISSU).

    I do think the industry is trending away from large chassis – power and space are only getting to be more of a premium, and the density of fixed switches is becoming much greater in a much smaller footprint.

    • Ryan Malayter says:

      ISSU is a software feature and has little to do with form factor. See Arista for what can be accomplished on a single CPU box with a reasonable multi-process software architecture. ISSU has been done for decades on single UNIX servers. It can be tricky to copy state from old process to new, but many OS and server software packages have the feature.

      That said, even if you have a “legacy” monolithic software stack on your multiple fixed configuration switches, you should be dual-homing all hosts and downstream switches to two physically separate switches in your layer 2 domains. Given that the L2 domains and restart times are much smaller, a controlled fail-over during an upgrade is not a big deal.

      But looking forward, we’re talking about a “fabric” of fixed-configuration switches here. That presumes a controller or routing protocol, multi-pathing, and Clos, torus, or hypercube topology. Convergence will be measured in milliseconds in such a setup, and I imagine vendors will probably offer the “increase link cost temporarily” option used for bringing traditional L3 routers out of service gracefully during maintenance.

      • One could argue that ISSU reduces the impact of downtime of one large box. One could also argue that in supporting these kind of redundancy mechanisms you slow feature adoption and development speed, and that you require more complex and expensive hardware with each extra bit of internal redundancy you throw into the mix.

        What if, for your aggregation layer (where port density matters big time), you moved from two behemoth, highly redundant (99.9% uptime) switches with 480 ports each, to a “spine” of ten simple switches of 48 ports each, with limited redundancy (98% uptime)? The likelihood of losing half of your bandwidth is far greater with the behemoths than with the simple spines. You just need the right protocols to design that setup.
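
        A quick back-of-envelope check in Python, assuming independent failures and taking those uptime figures at face value:

            from math import comb

            # Probability that at least half the aggregation bandwidth is gone,
            # i.e. at least half of the switches are down at the same time.
            def p_lose_half_bandwidth(n_switches, availability):
                p_down = 1 - availability
                k_min = (n_switches + 1) // 2
                return sum(comb(n_switches, k) * p_down**k * (1 - p_down)**(n_switches - k)
                           for k in range(k_min, n_switches + 1))

            print(p_lose_half_bandwidth(2, 0.999))  # two big chassis: ~0.002, roughly 1 in 500
            print(p_lose_half_bandwidth(10, 0.98))  # ten small spines: ~7e-7, roughly 1 in a million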

        The bigger challenges, IMHO, in moving to that scenario at massive scale are twofold:

        - Management overhead: managing 5x as many switches is not so funny :)
        - Expandability: fixed switches force you to modify the topology to grow, while modular switches let you insert more cards, leaving the network diagram untouched. I see modern protocols making this a non-issue, but for the moment, building an old Ethernet three-box distribution layer is a pain in the butt.

        Just my 2c here. Keep it up, Brad, you are well missed around here at Cisco. ;)

    • Mark Berly (@markberly) says:

      While we are discussing differences between fixed and modular, we need to look at the fabric architecture and how it handles traffic flows. In general, multi-chip Clos based designs can run into trouble when multiple flows hash to the same internal link; while this can be mitigated, it is an issue. A VoQ fabric design, which most modular switches have today, helps ensure fairness and mitigates head-of-line blocking (HOLB) issues. As stated earlier, there are many important factors that need to be considered when designing a network, power and space being two, but there are many others….

      • Brad Hedlund says:

        Mark,
        Modern chips in fixed switches today mitigate HOLB. These chips also provide multiple hashing options. Where you have a fixed switch using a multi-chip architecture, the switch designer just needs to implement the appropriate hashing technique for inter-chip flows.

        • Mark Berly (@markberly) says:

          Unless you have a predictive hash, there will be situations where multiple flows hash to a link and exceed its capacity. Once you get into this condition you can change the hash and redistribute the flows, but by then you are already dropping packets, and there is no guarantee that the traffic patterns will not change and expose the issue again. Without the ability to predict the flows ahead of time you will see drops due to this, no different than we see on port-channels.

          • Brad Hedlund says:

            Mark,
            If you have a chip that has N front panel ports, and N fabric ports — you can pin Front1 to Fabric1. In the fabric chips you can apply similar logic. No hashing collisions, and no predictions required.
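
            A toy sketch of the difference, assuming a hypothetical chip with N front panel ports and N fabric ports (nothing vendor specific):

                N = 8  # hypothetical: N front panel ports, N fabric ports per chip

                def pinned_fabric_port(front_panel_port):
                    # Static 1:1 pinning: Front1 -> Fabric1, Front2 -> Fabric2, ...
                    # No two front panel ports ever share a fabric link, so there are
                    # no hash collisions and nothing to predict.
                    return front_panel_port

                def hashed_fabric_port(flow):
                    # Hash based spreading: two heavy flows can land on the same link.
                    return hash(flow) % N

                print([pinned_fabric_port(p) for p in range(N)])
                print([hashed_fabric_port(("10.0.0.%d" % i, "10.0.1.1", 80)) for i in range(4)])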

          • This is actually in response to Brad’s “port-to-port” crossconnecting. Nice TDM-style idea, but think about what’s going to happen on a “fabric-only” switch: you may need to send data from multiple source ports to the same destination (e.g. N podsets speaking to 1), and you again run into the statmux and collision issues. Link load-balancing is serious business… :)

          • Brad Hedlund says:

            Each chip on the “fabric-only” (Fixed) switch manages its own Incast events with integrated buffers and on-chip scheduling. Is that any better or worse than a centralized scheduler? Tough to say. Incast is a difficult traffic condition to manage on multi-chip switches, be it a Chassis or Fixed switch.

  3. Merrill Hammond says:

    Brad, I’m curious to know how we’re going to reasonably get beyond 48 ports per 1U. Will we have 40gig ports with breakouts? A smaller interface than SFP? Or some other method I’m not aware of. Love your articles, they’re always very insightful and challenging to the current mindset.

    • Mark Berly (@markberly) says:

      There are already platforms today that offer > 48 ports per 1RU; this can be done with SFP+ or QSFP (there are future technologies, but these are the ones that exist today).

  4. DM says:

    Brad, do include the Brocade VDX switches in your comparisons too. Brocade leads in the number of Ethernet fabric deployments, and the VDX products have been shipping for almost 20 months now with close to 500 customers in production. These switches are built to be extremely power efficient.
