How Software Fails: A Field Guide to Understanding Complex System Disasters
Why Systems Break in Ways Their Creators Never Imagined
Software failures aren't accidents; they're inevitabilities. In a universe governed by probability rather than certainty, even cosmic rays from distant stars can flip bits in computer memory, causing election machines to miscount votes or video game characters to jump impossibly high. But cosmic interference is just the beginning of how our most critical systems fail in spectacular and unpredictable ways.
Through gripping real-world case studies, this field guide reveals the hidden laws governing complex system disasters. Discover how Knight Capital lost $460 million in 45 minutes due to a single misplaced software flag. Learn why the Therac-25 radiation therapy machine killed patients despite passing every safety test. Understand how a 40-kilobyte configuration file crashed 8.5 million computers worldwide, grounding flights and shuttering hospitals across the globe.
What You'll Learn
Guided by complexity theorist Richard Cook's groundbreaking principles, you'll discover:
— Why testing can never guarantee perfection, and what to do instead
— How “reasonable” decisions combine to create unreasonable disasters
— Why complex systems always run in degraded mode, and why that's actually normal
— How scale transforms rare impossibilities into daily certainties
— Why the search for “root causes” consistently leads us astray
From Understanding to Action
But this isn't just about understanding failure; it's about building resilience. Explore practical strategies from organizations that have learned to thrive in chaos:
— NASA's Mars rovers that adapt and learn from component failures, operating years beyond their planned lifetimes
— The internet's routing protocols that automatically heal themselves when damaged
— Netflix's chaos engineering that deliberately breaks its own servers to build antifragile systems
— The ethical frameworks for deciding what level of failure is acceptable when lives are at stake
Who This Book Is For
Whether you're a software engineer debugging production issues, a manager trying to prevent the next catastrophic outage, or simply curious about why technology fails in impossible ways, this book will forever change how you think about the complex systems that run our world.