Summary

On January 15, 1990, AT&T's long-distance network suffered a nationwide failure caused by a subtle but critical bug that a routine software upgrade had introduced into the recovery software of its 114 4ESS electronic switching systems.
As a result, many switches mishandled certain near-simultaneous messages, triggering cascading failures. For roughly nine hours, millions of calls were blocked or dropped, and the network was only partially operational.

Systemic Features

  • A single software bug (a misplaced break statement in nested conditional logic) in the recovery code affected all 114 switches, showing how a tiny coding error replicated across a highly centralized system can trigger widespread collapse.
  • The network was tightly coupled and highly centralized, with most long-distance traffic routed through the same switching infrastructure, so a failure at one node could ripple across the system.
  • Despite prior testing, the bug passed quality control, illustrating the limits of traditional software testing for complex, real-time systems under load.
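
The misplaced break described above can be illustrated with a small C sketch. This is not the 4ESS source code: the message kinds, the state fields, and the function names (handle_buggy, handle_fixed) are hypothetical and chosen only to show the general pitfall. In C, a break inside an if that is nested in a switch exits the enclosing switch, so any statements after the if/else are silently skipped.

```c
#include <stdbool.h>

/* Hypothetical message kinds and switch state, for illustration only. */
enum msg_kind { MSG_STATUS, MSG_OTHER };

struct switch_state {
    bool buffer_full;       /* hypothetical condition tested in the if */
    int  parked;            /* messages set aside because the buffer was full */
    int  processed;         /* messages handled immediately */
    int  bookkeeping_done;  /* follow-up work that should run for every status message */
};

/* Buggy shape: the break sits inside the if, but it terminates the
 * enclosing switch, so the bookkeeping below the if/else never runs
 * when buffer_full is true. */
void handle_buggy(struct switch_state *s, enum msg_kind kind)
{
    switch (kind) {
    case MSG_STATUS:
        if (s->buffer_full) {
            s->parked++;
            break;              /* exits the switch, not just the if */
        } else {
            s->processed++;
        }
        s->bookkeeping_done++;  /* skipped on the buffer_full path */
        break;
    default:
        break;
    }
}

/* Corrected shape: without the inner break, both branches fall
 * through to the shared bookkeeping before the case ends. */
void handle_fixed(struct switch_state *s, enum msg_kind kind)
{
    switch (kind) {
    case MSG_STATUS:
        if (s->buffer_full) {
            s->parked++;
        } else {
            s->processed++;
        }
        s->bookkeeping_done++;  /* runs on both paths */
        break;
    default:
        break;
    }
}
```

Calling handle_buggy on a state whose buffer_full field is true leaves bookkeeping_done at zero, while handle_fixed increments it. Both versions behave identically when buffer_full is false, which suggests why a defect of this shape can survive testing that rarely exercises the unusual path.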

Impacts

  • Huge disruption of communications nationwide: tens of millions of calls went uncompleted during the outage window.
  • Significant economic losses for AT&T, estimated in the tens of millions of dollars, with likely downstream effects for businesses that relied on long-distance communication.
  • Exposure of a major systemic risk in digital infrastructure: centralization, software complexity, and a lack of redundancy can combine to create brittle dependencies.
  • Broader impact on public perception of telecom reliability; the outage served as a cautionary example in the emerging era of large-scale, software-driven infrastructure.

Further Reading / Sources

  • The Crash of the AT&T Network in 1990 (overview of incident)
  • Technical post-mortem report from California Polytechnic University analyzing root cause (“one-line bug in recovery code”)
  • Contemporary press coverage (e.g., Los Angeles Times discussion of the fault in switching software)