Rethinking the Unthinkable

Managing the Risk of Catastrophic Failure in the Twenty-First Century

“This was not our drilling rig, it was not our equipment, it was not our people, our systems or our processes.”
– BP CEO Tony Hayward, 13 days after the explosion aboard Deepwater Horizon

Despite Mr. Hayward’s assertion, it was ultimately BP’s failure to manage the myriad risks of deepwater drilling that caused a tragic loss of life, widespread environmental damage, and a bill of upwards of fifty billion dollars. The failure of Deepwater Horizon, and BP’s inability to contain the subsequent oil leak, was not simply a failure; it was a system meltdown.

It’s not hard to find other examples that reveal unexpected fragility in our systems. In the winter of 2009, weather conditions caused the breakdown of Eurostar trains, stranding 2,000 passengers inside the Channel Tunnel and contributing to a transportation standstill across western Europe. In October 2012, emergency generators at New York University’s hospital failed during Hurricane Sandy, forcing the evacuation of critically ill patients. Earlier that same year, incorrectly deployed software at the market maker Knight Capital flooded the stock market with millions of unintended orders and caused Knight to lose nearly half a billion dollars in just 45 minutes. Indeed, the Global Financial Crisis, including the bankruptcy of Lehman Brothers, the near-collapse of AIG, and related liquidity shocks, represents a series of interconnected system failures.

Though these failures look different on the surface, many of the underlying causes are surprisingly similar. Modern systems are steadily becoming both more complex and more interconnected in ways that are not well understood.

In simpler systems, most risks stem from predictable disruptions, and small mistakes tend to have minor and well-understood consequences. In contrast, in many modern systems, small errors can combine in novel ways to yield large failures that are hard to understand even as they unfold. To understand this distinction, compare the physics of throwing a ball with the dynamics of an avalanche. A ball follows a predictable path, and the harder you throw it, the farther it will go. In contrast, an avalanche can be triggered by a small event that unleashes a wildly more powerful response.

Systemic challenges are proliferating and reshaping the modern risk landscape. In a recent survey of C-suite executives, nearly 60% reported that the volume and complexity of the risks they face have increased substantially over the past five years. Objective measures, too, suggest that the physical and financial context in which organizations operate has become radically riskier. According to the IMF, the recent worldwide cost of natural disasters has far outpaced the growth of global GDP. Similarly, since 1973, banking and currency crises around the world have been occurring twice as frequently as they did during the Bretton Woods period, leading some economists to conclude that “there is something different and disturbing about our age.”

As systemic challenges proliferate, the penalties are increasing for failing to address technological complexities, organizational weaknesses, and cognitive challenges that organizations might once have been able to absorb safely. As a result, the measures that once served organizations well in managing risk (instituting rules and controls, scenario planning, and bringing in additional expertise) are no longer sufficient.

Through our research, we have identified a set of complementary interventions that are taking on new importance in enabling organizations to detect early warning signals, reduce the number of errors that can trigger cascading failures, and develop more effective crisis response capabilities. Tracking near misses, for example, is a powerful way to learn from early signals of potential catastrophe, and there are notable success cases, particularly in aviation and healthcare. Likewise, organizations become more resilient when leaders appoint designated skeptics: devil’s advocates who stress-test estimates, explore extreme scenarios, and challenge optimistic assumptions.

It is also fundamentally important to avoid pushing through during a crisis. Sticking to an existing plan even in the face of new, contradictory information has played a key role in a variety of failures, including the Deepwater Horizon oil spill, NASDAQ’s handling of the Facebook IPO, and numerous aviation accidents. While there will always be pressures to continue in the face of uncertainty, executives can foster norms that help organization members overcome the psychological challenge of conceding (temporary) defeat by halting an ongoing process or giving up on a planned course of action. At a trading firm we worked with, for example, one junior trader reported that he had never received as much praise from senior managers as when he stopped an apparently profitable trade after realizing that he did not fully understand it. Such feedback helps to create norms that, one day, may prevent catastrophe.

The solutions we present don’t require large financial investments or expensive technologies. But that does not mean that they are trivial to implement. Organizational cultures often celebrate self-confidence, decisiveness, persistence, accord within a group, and good news. In contrast, reducing the potential for catastrophic failures requires an emphasis on the importance of doubt, hesitation, dissent, and the sharing of bad news. A cultural shift in this direction can be an extremely difficult leadership challenge, especially in high-performing organizations unaccustomed to failure.

Despite these challenges, there are important cases of success. Since the late 1970s, for example, commercial aviation has undertaken radical changes to create an effective risk management culture: reducing hierarchy to encourage dissent, creating designated skeptics, tracking near misses, and fostering norms of stopping. As a result, the industry has achieved massive improvements in safety even as aircraft and operations have become significantly more complex. As the risk landscape continues to shift, the ability to implement such interventions will become one of the defining traits of successful organizations.

Christopher Clearfield is a Principal with System Logic, a boutique consultancy focused on risk and decision making.

András Tilcsik is an Assistant Professor of Strategic Management at the Rotman School of Management at the University of Toronto and a Fellow at the Michael Lee-Chin Institute for Corporate Citizenship.

This excerpt is from their full book proposal, which won the Financial Times and McKinsey’s Bracken Bower prize, awarded to the best proposal for a business book about the challenges and opportunities of growth.

Author: Chris Clearfield and András Tilcsik
