Superforecasting: The Art and Science of Prediction

Book notes on Dan Gardner and Philip E. Tetlock's guide on predicting what happens next.

We live in a complex world, and in order to make sense of it we use shortcuts and create story arcs to answer ‘why’ something happened.

Shortcuts can be helpful, but they can also often lead us astray when trying to understand more complex systems.

In 'Superforecasting: The Art and Science of Prediction', authors Tetlock and Gardner offer a thoroughly readable and eye-opening introduction to how people can become superforecasters, and how we can and should calibrate our understanding of the complexity happening around us.

I thoroughly enjoyed this book; it offers important insights into how we frame the narratives around us.

A butterfly flaps its wings...

The Arab Spring is an example of how narratives are written.

It is one thing to look backward and sketch a narrative arc, connecting Mohamed Bouazizi to all the events that flowed out of his lonely protest. But looking deeper, there are a whole series of seemingly random events that came together to trigger the entire movement.

In 1972 the American meteorologist Edward Lorenz wrote a paper with an arresting title: ‘Predictability: Does the Flap of a Butterfly’s Wings in Brazil Set Off a Tornado in Texas?’

A decade earlier, Lorenz had discovered by accident that tiny data entry variations in computer simulations of weather patterns—like replacing 0.506127 with 0.506—could produce dramatically different long-term forecasts.

It was an insight that would inspire chaos theory: in nonlinear systems like the atmosphere, even small changes in initial conditions can mushroom to enormous proportions.

He meant that if that particular butterfly hadn’t flapped its wings at that moment, the unfathomably complex network of atmospheric actions and reactions would have behaved differently, and the tornado might never have formed—just as the Arab Spring might never have happened, at least not when and as it did, if the police had just let Mohamed Bouazizi sell his fruits and vegetables that morning in 2010.

How predictable something is depends on what we are trying to predict, how far into the future, and under what circumstances.

Predictability

Weather forecasts are typically quite reliable, under most conditions, looking a few days ahead, but they become increasingly less accurate three, four, and five days out. Much beyond a week, we might as well consult that dart-throwing chimpanzee.

So we can’t say that weather is predictable or not, only that weather is predictable to some extent under some circumstances.

More often forecasts are made and then … nothing. Accuracy is seldom determined after the fact and is almost never done with sufficient regularity and rigor that conclusions can be drawn.

The reason?

Mostly it’s a demand-side problem: The consumers of forecasting—governments, businesses, and the public—don’t demand evidence of accuracy. So there is no measurement. Which means no revision. And without revision, there can be no improvement. This is a constant problem.

The Good Judgment Project

Good Judgment’s global network of Superforecasters has its roots in research funded by the US intelligence community. Reports that Superforecasters were 30% more accurate than intelligence analysts with access to classified information rocked the conventional wisdom.

Each team, up against the likes of the CIA, would effectively be its own research project, free to improvise whatever methods it thought would work, but required to submit forecasts at 9 a.m. eastern standard time every day from September 2011 to June 2015. By requiring teams to forecast the same questions at the same time, the tournament created a level playing field—and a rich trove of data about what works, how well, and when.

In year 1, GJP beat the official control group by 60%. In year 2, it beat the control group by 78%. GJP also beat its university-affiliated competitors, including the University of Michigan and MIT, by hefty margins, from 30% to 70%, and even outperformed professional intelligence analysts with access to classified data. After two years, GJP was doing so much better than its academic competitors that IARPA dropped the other teams.

What makes superforecasters super at forecasting?

One, foresight is real.

Foresight isn’t a mysterious gift bestowed at birth. It is the product of particular ways of thinking, of gathering information, of updating beliefs. These habits of thought can be learned and cultivated by any intelligent, thoughtful, determined person.

Superforecasting demands thinking that is open-minded, careful, curious, and—above all—self-critical.

Machines may get better at “mimicking human meaning,” and thereby better at predicting human behavior, but “there’s a difference between mimicking and reflecting meaning and originating meaning,” David Ferrucci said.

That’s a space human judgment will always occupy.

The problem comes with our taste for certainty. On television, we want pundits who claim to know with certainty when the next financial crash will occur and who the next president will be.

The problem is that no one can predict any of that with certainty. But we have such a strong desire for quick answers that the pundits with the loudest voices get on television, while those with moderate views are hushed.

In fact, data revealed an inverse correlation between fame and accuracy: the more famous an expert was, the less accurate they were.

Tip of your nose perspective

Try answering this:

“A bat and ball together cost $1.10. The bat costs a dollar more than the ball. How much does the ball cost?”

You instantly had an answer: “Ten cents.” You didn’t think carefully to get that. You didn’t calculate anything. It just appeared.

For that, you can thank System 1. Quick and easy, no effort required.

(The answer is that the ball costs $0.05. If the ball cost $0.10, the bat would have to cost $1.10, and the pair would total $1.20.)
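To see why, write it out as algebra. Let $b$ be the ball's price in dollars, so the bat costs $b + 1.00$:

$$
b + (b + 1.00) = 1.10 \quad\Rightarrow\quad 2b = 0.10 \quad\Rightarrow\quad b = 0.05
$$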

The bat-and-ball question is one item in an ingenious psychological measure, the Cognitive Reflection Test, which has shown that most people—including very smart people—aren’t very reflective.

The tip of your nose perspective (aka System 1) follows a primitive psycho-logic: if it feels true, it is.

Scientists are trained to be cautious. Scientists must be able to answer the question “What would convince me I am wrong?” If they can’t, it’s a sign they have grown too attached to their beliefs.

The key is doubt.

Probability

Forecasting is all about estimating the likelihood of something happening.

Forecasters express that likelihood as a numeric probability, which is a very effective way of calibrating answers to complex questions:

A problem—then and now—is that expressing a probability estimate with a number may imply to the reader that it is an objective fact, not the subjective judgment it is. That is a danger. But the answer is not to do away with numbers. It’s to inform readers that numbers, just like words, only express estimates—opinions—and nothing more.

This is known as the wrong-side-of-maybe fallacy. Here’s an example:

If a meteorologist says there is a 70% chance of rain and it doesn’t rain, is she wrong? Not necessarily. Implicitly, her forecast also says there is a 30% chance it will not rain.

If the forecast said there was a 70% chance of rain and it rains, people think the forecast was right; if it doesn’t rain, they think it was wrong. This simple mistake is extremely common. Even sophisticated thinkers fall for it.
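The fair test of such a forecast is over many repetitions, not a single day. Here is a minimal simulation (my own sketch, not from the book) of a perfectly calibrated meteorologist; the roughly 30% of days that land on the “wrong side of maybe” are not mistakes:

```python
import random

random.seed(42)

# A perfectly calibrated forecaster: she says "70% chance of rain" every
# day, and it really does rain with probability 0.7 on each day.
DAYS = 10_000
FORECAST = 0.70

rainy_days = sum(random.random() < FORECAST for _ in range(DAYS))

print(f"Forecast: {FORECAST:.0%} chance of rain, every day")
print(f"It actually rained on {rainy_days / DAYS:.1%} of days")
# On roughly 30% of days it stays dry, so a single-day judge would call
# her "wrong" -- even though her forecasts are as good as they can be.
```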

Aggregating information

In 1906 the legendary British scientist Sir Francis Galton went to a country fair and watched as hundreds of people individually guessed what a live ox would weigh.

Their average guess—their collective judgment—was 1,197 pounds, one pound short of the correct answer, 1,198 pounds.

It was the earliest demonstration of a phenomenon popularized by—and now named for—James Surowiecki’s bestseller The Wisdom of Crowds.

Aggregating the judgment of many consistently beats the accuracy of the average member of the group, and is often as startlingly accurate as Galton’s weight-guessers.

The key is recognizing that useful information is often dispersed widely, with one person possessing a scrap, another holding a more important piece.

How well aggregation works depends on what you are aggregating. Aggregating the judgments of many people who know nothing produces a lot of nothing.

Aggregating the judgments of people who know a little is better, and if there are enough of them, it can produce impressive results.
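A quick simulation (my own sketch, which assumes independent and roughly unbiased individual errors; real crowds can share biases) shows why aggregation works: individual errors largely cancel in the average.

```python
import random
import statistics

random.seed(1906)

TRUE_WEIGHT = 1198  # pounds -- the ox at Galton's fair

# 800 fairgoers, each guessing with independent, roughly unbiased noise.
guesses = [TRUE_WEIGHT + random.gauss(0, 75) for _ in range(800)]

crowd_estimate = statistics.mean(guesses)
avg_individual_error = statistics.mean(abs(g - TRUE_WEIGHT) for g in guesses)

print(f"Crowd estimate:           {crowd_estimate:.0f} lbs")
print(f"Crowd's error:            {abs(crowd_estimate - TRUE_WEIGHT):.1f} lbs")
print(f"Average individual error: {avg_individual_error:.1f} lbs")
# The averaged guess lands within a few pounds of the truth, while the
# typical individual is off by ~60 lbs.
```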

The role of skill and luck

It’s easy to misinterpret randomness. We don’t have an intuitive feel for it. Randomness is invisible from the tip-of-your-nose perspective. We can only see it if we step outside ourselves.

We often single out an extraordinarily successful person, show that it was extremely unlikely that the person could do what he or she did, and conclude that luck could not be the explanation.

This often happens in news coverage of Wall Street. Someone beats the market six or seven years in a row, journalists profile the great investor, calculate how unlikely it is to get such results by luck alone, and triumphantly announce that it’s proof of skill.

The mistake? They ignore how many other people were trying to do what the great man did. If it’s many thousands, the odds of someone getting that lucky shoot up.
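A toy simulation (my own sketch, assuming pure 50/50 luck each year) shows how many “market-beating geniuses” a large enough pool produces for free:

```python
import random

random.seed(7)

# 10,000 "fund managers", each with a coin-flip chance of beating the
# market in any given year -- no skill whatsoever.
MANAGERS = 10_000
YEARS = 7

streaks = sum(
    all(random.random() < 0.5 for _ in range(YEARS))
    for _ in range(MANAGERS)
)

print(f"Managers with a {YEARS}-year winning streak by luck alone: {streaks}")
# Expected: 10,000 * 0.5**7, i.e. about 78 -- plenty of candidates for a
# glowing magazine profile.
```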

Most things involve luck and skill in varying proportions. Complexity makes it hard to figure out what to chalk up to skill and what to luck.

How to deconstruct an unknowable question

Italian American physicist Enrico Fermi—a central figure in the invention of the atomic bomb—concocted this little brainteaser:

‘How many piano tuners are there in Chicago?’

Here, we can break the question down by asking, “What information would allow me to answer the question?”

So what would we need to know to calculate the number of piano tuners in Chicago?

Well, the number of piano tuners depends on how much piano-tuning work there is and how much work it takes to employ one piano tuner. So I could nail this question if I knew four facts:

1. The number of pianos in Chicago
2. How often pianos are tuned each year
3. How long it takes to tune a piano
4. How many hours a year the average piano tuner works

With the first three facts, I can figure out the total amount of piano-tuning work in Chicago. Then I can divide it by the last and, just like that, I’ll have a pretty good sense of how many piano tuners there are in Chicago.

We don’t have that information, but by breaking down the question we can better separate the knowable from the unknowable. Guessing—pulling a number out of the black box—isn’t eliminated, but the guessing process is brought out into the light of day where we can inspect it. And the net result tends to be a more accurate estimate than whatever number happened to pop out of the black box when we first read the question.
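In code, the decomposition looks like this. Every input below is a placeholder guess of mine, not a researched figure; the point is that four estimable numbers replace one unknowable one:

```python
# Fermi-style back-of-the-envelope estimate of piano tuners in Chicago.
# All inputs are illustrative guesses, not researched figures.
pianos_in_chicago = 50_000           # guess
tunings_per_piano_per_year = 1       # guess: about once a year
hours_per_tuning = 2                 # guess, including travel time
hours_per_tuner_per_year = 1_600     # guess: ~40 hrs/week, 40 weeks/year

total_tuning_hours = (pianos_in_chicago
                      * tunings_per_piano_per_year
                      * hours_per_tuning)
tuners = total_tuning_hours / hours_per_tuner_per_year

print(f"Estimated piano tuners in Chicago: ~{tuners:.0f}")  # ~62
```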

Outside versus inside view

If you were asked whether a Mr Johnson living in Australia has a pet, how would you go about finding out?

Most people might start asking questions about his family’s details. But superforecasters wouldn’t bother with any of that, at least not at first. The first thing they would do is find out what percentage of Australian households own a pet.

Google tells me 61% of households in Australia have pets. That would be my baseline: the outside view. I can then start digging into the specifics, such as his work, family, and location, to adjust that baseline.

Likewise when estimating the likelihood of an armed clash between two countries: rather than starting with the current political situation, I would first look at how many clashes they have had in recent history. That’s the outside view. If it is one every two years, I have a baseline. Then I step into the inside view.

Coming up with an outside view, an inside view, and a synthesis of the two isn’t the end. It’s a good beginning. Superforecasters constantly look for other views they can synthesize into their own.
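One crude way to picture the synthesis (my own sketch; the book prescribes no formula, and the nudge values below are hypothetical) is to anchor on the base rate and shift it as inside-view evidence comes in:

```python
def forecast(base_rate: float, nudges: list[float]) -> float:
    """Anchor on the outside-view base rate, then shift it with
    inside-view evidence, clamping to [0, 1]. A deliberately crude
    illustration, not a method from the book."""
    p = base_rate
    for nudge in nudges:
        p = min(1.0, max(0.0, p + nudge))
    return p

# Outside view: ~61% of Australian households have pets.
# Inside view: hypothetical facts about Mr Johnson.
p = forecast(0.61, [
    +0.10,  # hypothetical: lives in a suburban house with a yard
    -0.15,  # hypothetical: travels for work most of the year
])
print(f"Adjusted estimate that Mr Johnson has a pet: {p:.0%}")  # 56%
```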

Adding perspective

Researchers have found that merely asking people to assume their initial judgment is wrong, to seriously consider why that might be, and then make another judgment, produces a second estimate which, when combined with the first, improves accuracy almost as much as getting a second estimate from another person.

There is an even simpler way of getting another perspective on a question: tweak its wording.

Imagine a question like “Will the South African government grant the Dalai Lama a visa within six months?”

The naive forecaster will go looking for evidence that suggests the Dalai Lama will get his visa while neglecting to look for evidence that suggests he won’t. The more sophisticated forecaster knows about confirmation bias and will seek out evidence that cuts both ways.

To check that tendency, turn the question on its head and ask, “Will the South African government deny the Dalai Lama a visa within six months?”

For superforecasters, beliefs are hypotheses to be tested, not treasures to be guarded.

Exploration should make you change your mind a lot, and the best superforecasters constantly tweak their predictions.

As psychologist Carol Dweck’s research has shown, the growth mindset is far from universal. Many people have what she calls a “fixed mindset”—the belief that we are who we are, and that abilities can only be revealed, not created and developed.

The ten commandments of aspiring superforecasters

For more, visit https://goodjudgment.com/

1. Triage

Focus on questions where your hard work is likely to pay off. Don’t waste time either on easy “clocklike” questions (where simple rules of thumb can get you close to the right answer) or on impenetrable “cloud-like” questions (where even fancy statistical models can’t beat the dart-throwing chimp). Concentrate on questions in the Goldilocks zone of difficulty, where effort pays off the most.

2. Break seemingly intractable problems into tractable sub-problems.

Channel the playful but disciplined spirit of Enrico Fermi who—when he wasn’t designing the world’s first atomic reactor—loved ballparking answers to head-scratchers such as “How many extraterrestrial civilizations exist in the universe?” Break apart the problem into its knowable and unknowable parts. Flush ignorance into the open. Expose and examine your assumptions. Dare to be wrong by making your best guesses. Better to discover errors quickly than to hide them behind vague verbiage.

3. Strike the right balance between inside and outside views.

Superforecasters know that there is nothing new under the sun. Nothing is 100% “unique.” Language purists be damned: uniqueness is a matter of degree. So Superforecasters conduct creative searches for comparison classes even for seemingly unique events, such as the outcome of a hunt for a high-profile terrorist (Joseph Kony) or the standoff between a new socialist government in Athens and Greece’s creditors. Superforecasters are in the habit of posing the outside-view question: How often do things of this sort happen in situations of this sort?

4. Strike the right balance between under- and overreacting to evidence.

Belief updating is to good forecasting as brushing and flossing are to good dental hygiene. It can be boring, occasionally uncomfortable, but it pays off in the long term. That said, don’t suppose that belief updating is always easy because it sometimes is. Skillful updating requires teasing subtle signals from noisy news flows—all the while resisting the lure of wishful thinking.

5. Look for the clashing causal forces at work in each problem.

For every good policy argument, there is typically a counterargument that is at least worth acknowledging. For instance, if you are a devout dove who believes that threatening military action never brings peace, be open to the possibility that you might be wrong about Iran. And the same advice applies if you are a devout hawk who believes that soft “appeasement” policies never pay off. Each side should list, in advance, the signs that would nudge them toward the other.

6. Strive to distinguish as many degrees of doubt as the problem permits but no more.

As in poker, you have an advantage if you are better than your competitors at separating 60/40 bets from 40/60—or 55/45 from 45/55. Translating vague-verbiage hunches into numeric probabilities feels unnatural at first, but it can be done. It just requires patience and practice. The Superforecasters have shown what is possible.

7. Strike the right balance between under- and overconfidence, between prudence and decisiveness.

Superforecasters understand the risks both of rushing to judgment and of dawdling too long near “maybe.” They routinely manage the trade-off between the need to take decisive stands (who wants to listen to a waffler?) and the need to qualify their stands (who wants to listen to a blowhard?). They realize that long-term accuracy requires getting good scores on both calibration and resolution—which requires moving beyond blame-game ping-pong. It is not enough just to avoid the most recent mistake. They have to find creative ways to tamp down both types of forecasting errors—misses and false alarms—to the degree a fickle world permits such uncontroversial improvements in accuracy.

8. Look for the errors behind your mistakes but beware of rearview-mirror hindsight biases.

Don’t try to justify or excuse your failures. Own them! Conduct unflinching postmortems: Where exactly did I go wrong? And remember that although the more common error is to learn too little from failure and to overlook flaws in your basic assumptions, it is also possible to learn too much (you may have been basically on the right track but made a minor technical mistake that had big ramifications). Don’t forget to do postmortems on your successes, too. Not all successes imply that your reasoning was right. You may have just lucked out by making offsetting errors.

9. Bring out the best in others and let others bring out the best in you.

Master the fine art of team management, especially perspective taking (understanding the arguments of the other side so well that you can reproduce them to the other’s satisfaction), precision questioning (helping others to clarify their arguments so they are not misunderstood), and constructive confrontation (learning to disagree without being disagreeable). Wise leaders know how fine the line can be between a helpful suggestion and micromanagerial meddling or between a rigid group and a decisive one or between a scatterbrained group and an open-minded one.

10. Master the error-balancing bicycle.

Implementing each commandment requires balancing opposing errors. Just as you can’t learn to ride a bicycle by reading a physics textbook, you can’t become a superforecaster by reading training manuals. Learning requires doing, with good feedback that leaves no ambiguity about whether you are succeeding—“I’m rolling along smoothly!”—or whether you are failing—“crash!” Also remember that practice is not just going through the motions of making forecasts, or casually reading the news and tossing out probabilities. Like all other known forms of expertise, superforecasting is the product of deep, deliberative practice.



