Outage Story with VTP

Filed in Switching on December 2, 2007

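One of my accounts had an unfortunate network outage that lasted about an hour. The outage was caused by human error with VTP, but not in the classic way we have all heard about, where a newly connected switch with a higher configuration revision number overwrites the VLAN database of the whole domain.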

Here is what happened…

1) A CatOS access switch fails and is scheduled to be replaced by the network team.

2) The network team grabs a replacement switch off the shelf and configures it with the IP address, default gateway, SNMP strings and VTP domain name of the failed switch. In addition, the switch is configured as a VTP Server <-- mistake. At this point the switch has a very low revision number.

3) The failed switch is removed and the replacement switch is put in its place. Once the new switch connects to the network it downloads the VTP configuration and syncs up its configuration revision. At this point everything is fine.

4) To restore the exact configuration of the failed switch, a CiscoWorks configuration restore job is launched. The CiscoWorks server does a stare and compare against the last archived config and starts configuring the switch.

5) While configuring the switch, the CiscoWorks server deletes all VLANs except the ones needed by the switch (as called for in the config file). Because the switch is still a VTP Server, each deletion is advertised to the rest of the VTP domain, and the VLANs are deleted across the campus. The switch lost connectivity on the MGMT VLAN before CiscoWorks could set the VTP mode back to Client or make any further changes.

The customer had to manually recreate each VLAN at the intended VTP servers to restore the network.
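To make the sequence above easier to follow, here is a small Python sketch of the failure mode. It is a deliberately simplified model, not real VTP: it ignores trunks, passwords, MD5 digests and the different advertisement types, and it only assumes that server and client switches accept any advertisement in their domain with a higher configuration revision. The `Switch` class, the helper functions, the domain name `CAMPUS`, the VLAN numbers and the revision values are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Switch:
    """Toy model of a switch's VTP state (hypothetical, heavily simplified)."""
    name: str
    mode: str  # "server", "client", or "transparent"
    domain: str
    revision: int = 0
    vlans: set[int] = field(default_factory=set)

    def receive(self, domain: str, revision: int, vlans: set[int]) -> bool:
        """Apply an advertisement; return True if the VLAN database changed."""
        if self.mode == "transparent" or domain != self.domain:
            return False  # transparent switches keep their own database
        if revision <= self.revision:
            return False  # only a higher revision overwrites the database
        self.revision = revision
        self.vlans = set(vlans)
        return True


def advertise(sender: Switch, network: list[Switch]) -> None:
    """Send the sender's database to every other switch in the model."""
    for switch in network:
        if switch is not sender:
            switch.receive(sender.domain, sender.revision, sender.vlans)


def delete_vlan(switch: Switch, vlan: int, network: list[Switch]) -> None:
    """Delete a VLAN locally; a server also bumps its revision and advertises."""
    if switch.mode == "client":
        raise ValueError("a VTP client cannot modify the VLAN database")
    switch.vlans.discard(vlan)
    if switch.mode == "server":
        switch.revision += 1
        advertise(switch, network)


def run_scenario() -> dict[str, Switch]:
    all_vlans = {1, 10, 20, 30, 99}
    core = Switch("core", "server", "CAMPUS", 42, set(all_vlans))
    access = Switch("access", "client", "CAMPUS", 42, set(all_vlans))
    isolated = Switch("isolated", "transparent", "CAMPUS", 0, set(all_vlans))
    # Step 2: replacement switch is staged as a server with a low revision.
    replacement = Switch("replacement", "server", "CAMPUS", 1, {1})
    network = [core, access, isolated, replacement]

    # Step 3: the replacement syncs to the campus database. Everything is fine.
    advertise(core, network)

    # Step 5: the restore job prunes VLANs the access switch does not need.
    needed = {1, 10, 99}
    for vlan in sorted(replacement.vlans - needed):
        delete_vlan(replacement, vlan, network)

    return {switch.name: switch for switch in network}


if __name__ == "__main__":
    for name, switch in run_scenario().items():
        print(name, switch.mode, switch.revision, sorted(switch.vlans))
```

In this model the replacement switch does no harm when it first connects, because its revision is lower than the campus revision. The damage comes later: every deletion made on it while it is still a server raises its revision above everyone else's, so the pruned database wins. The transparent switch in the sketch is the only one that keeps its VLANs.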

This is an unfortunate reminder that VTP really is a risky thing that should be turned off everywhere. Whatever administrative ease it provides does not offset the risks.

About the Author

Brad Hedlund is an Engineering Architect with the CTO office of VMware’s Networking and Security Business Unit (NSBU), focused on network & security virtualization (NSX) and the software-defined data center. Brad’s background in data center networking begins in the mid-1990s with a variety of experience in roles such as IT customer, systems integrator, architecture and technical strategy roles at Cisco and Dell, and speaker at industry conferences. CCIE Emeritus #5530.

Comments (4)

  1. Stanley Chan says:

    Yup, I totally agree. VTP should be avoided at all costs. I would rather configure the VLANs manually than let VTP destroy the network and then have to reconfigure the VLANs afterward. My customer has experienced a similar situation. Their CIO sits not far from the switch closet and saw them changing the switch out just before the whole network dropped.

    Talk about maximum exposure to your upper management.

  2. Darby Weaver says:

    Funny, I would think to question the operational functionality of the CiscoWorks server. It seems like it is the piece of the puzzle that broke the camel’s back.

    Never a fan of automatic network configuration in most cases. Not to say I do not use templates, because I do. But I’ve seen my share of quirks with CiscoWorks and the WLSE for that matter.

    Is VTP the crux of the problem or automatically letting CiscoWorks configure the switch in the first place?

    Now the idea that misconfiguration can occur with VTP is not lost upon me at all.

    But human oversight is as much to blame here as mostly anything else. Why was the switch left a VTP Server in the first place?

    Sounds like Lucy has some explaining to do.

  3. VaibhaV Singasane says:

    Hi

    I totally agree in this case. We always face so many problems because of human error, so what are the solutions to reduce it?

    I also want to know about Roaming VLANs, i.e. the same VLAN for the whole campus. Is that possible?

  4. Joe Harris says:

    I won’t go into the details pertaining to the complete coding of CW2K, but know that the CiscoWorks server did exactly as it was programmed to do and is not the culprit here…Surely Morpheus (aka the “Lord of Networks”) would know this and I didn’t remember it but thanks for helping me achieve my CCIE status, it was uncanny!!!

    http://en.wikipedia.org/wiki/User_talk:Darby_Weaver

    BTW, VaibhaV….if you span that VLAN across your network (i.e. a Roaming VLAN) you had better understand and DOCUMENT your layer 2 network to the ninth degree, because an issue in or with that VLAN will affect every switch in the network, since every switch has that VLAN residing on it.
