Something funny happened during the 2008 presidential election.

Telephone polls taken right before the election showed voters leaning toward John McCain. Election results, of course, sent Barack Obama to the White House.

What happened? No poll is perfect, but telephone polls in 2008 had a unique bias: many only called landlines, skipping cell phones.

That threw the poll off. As Nate Silver wrote after the election, "The roughly one-third of Americans who rely exclusively on cell phones tend to be younger, more urban, worse off financially and more likely to be black or Hispanic than the broader group of voters, all characteristics that correlate with Democratic voting."

Ideally, election polls would contact every voter. But that's too hard and expensive. So we use a small sample and assume it represents the average.

Sadly, it often doesn't. A sample that is biased, or too small, is one of the biggest terrors in statistics. It creates averages that really aren't, and causes us to see trends that don't exist -- all under the guise of, "Hey, this is data!"
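To see how a coverage gap like the landline problem skews a poll, here is a minimal sketch. Every number in it is an illustrative assumption, not actual 2008 polling data: the one-third cell-only share echoes Silver's figure, while the two support rates are invented for the example.

```python
# Hypothetical electorate: all numbers below are illustrative assumptions,
# not actual 2008 polling data.
CELL_ONLY_SHARE = 1 / 3      # share of voters reachable only by cell phone
SUPPORT_LANDLINE = 0.48      # assumed support for a candidate among landline voters
SUPPORT_CELL_ONLY = 0.60     # assumed support among cell-only voters


def true_support(cell_share, landline_support, cell_support):
    """Support across the whole electorate, weighting each group by its size."""
    return (1 - cell_share) * landline_support + cell_share * cell_support


def landline_poll_support(landline_support):
    """A landline-only poll sees only the landline group, however many calls it makes."""
    return landline_support


actual = true_support(CELL_ONLY_SHARE, SUPPORT_LANDLINE, SUPPORT_CELL_ONLY)
polled = landline_poll_support(SUPPORT_LANDLINE)
print(f"whole electorate: {actual:.1%}, landline-only poll: {polled:.1%}")
```

Under these made-up inputs the poll shows the candidate losing while the full electorate has him winning. Calling more landlines would not close the gap, because the missing group never enters the sample.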

This is important to keep in mind when thinking about markets or the economy. Because here's the truth: We don't have a very good sample to draw from.

There have been 33 U.S. recessions in the last 156 years, a period that begins before the Civil War. That isn't many. And the data we have on most of those recessions is dubious. For example, estimates of how much the economy contracted during the recession of 1920 range from 2.4% to 6.9%, which is the difference between a moderate recession and a near-depression. In the last 50 years, when data is more reliable, there have been just seven U.S. recessions.

So how are we supposed to take seriously any statistic about the average recession? How long the average recession lasts? How frequently they occur? How high unemployment goes? We're talking about something that has occurred just seven times in the last half-century. Just like the election polls, these averages can't be taken too seriously. This is not a good sample -- in this case, it's just too small.
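A rough simulation shows how shaky an average of seven observations can be. This is only a sketch: the "recession lengths" are random draws from an assumed exponential distribution with a mean of 12 months, chosen purely for illustration, and the function name is hypothetical.

```python
import random
import statistics


def spread_of_sample_means(sample_size, trials=1000, seed=0):
    """Standard deviation of the sample mean across many simulated samples."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # Hypothetical "recession lengths" in months: exponential with mean 12.
        sample = [rng.expovariate(1 / 12) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


small = spread_of_sample_means(7)
large = spread_of_sample_means(700)
print(f"n=7: +/-{small:.1f} months, n=700: +/-{large:.1f} months")
```

With seven observations, the "average" swings by several months from one simulated history to the next. With 700 it barely moves. The spread shrinks roughly with the square root of the sample size, and that is exactly the help we don't have with recessions.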

There have been two world wars in the last 100 years. But no one would ever say, "the average world war lasts five years," or "the average world war kills 60 million people." No one would say we're overdue for a world war based on past trends. It's obvious that two wars aren't enough to create an average that tells us what the next war might look like, or when it will occur.

The same goes for market volatility. There have been nine occurrences of the S&P 500 falling 30% or more since 1928, and three occurrences of 50% declines. Three! That is not nearly enough to define anything about the average market crash. But there's a tendency -- one I've fallen for -- to look at past market crashes as a guide to future expectations.

There is no easy answer to this. "More data" doesn't always help, because markets change so much over time that apples aren't being compared to apples.

The S&P 500 did not include financial stocks until 1976; today, financials make up 16% of the index. Technology stocks were virtually nonexistent 50 years ago. Today, they're almost one-fifth of the index. Accounting rules have changed over time. So have disclosures, auditing, and market liquidity. 401(k)s and IRAs -- which hold trillions of dollars -- didn't exist until 40 years ago. Comparing today's market to the past isn't apples to apples.

Changing the sample you use can give you completely different results, which is dangerous because it lets you prove almost anything you want. 

Take Yale economist Robert Shiller's valuation metric, the cyclically adjusted P/E ratio. Since 1871, CAPE has averaged 16.6, which makes today's market, at 26 times earnings, look overvalued. But since 1957, when the S&P 500 was born (Shiller used a hypothetical version before then), the average is 20. Since 1990, when globalization took off and technology stocks became a bigger part of the index, the average is 25.3. So maybe stocks aren't so overvalued? 
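The same effect is easy to reproduce with any series that drifts over time. The sketch below uses a made-up valuation ratio that rises steadily from 1871 to 2015. It is not Shiller's CAPE data, and `average_since` is a hypothetical helper, but it shows how the choice of starting year alone changes the verdict.

```python
def average_since(values_by_year, start_year):
    """Mean of all yearly values from start_year onward."""
    window = [v for year, v in values_by_year.items() if year >= start_year]
    return sum(window) / len(window)


# Made-up valuation series that drifts upward over time; NOT Shiller's CAPE data.
synthetic_ratio = {year: 10 + 0.1 * (year - 1871) for year in range(1871, 2016)}
latest = synthetic_ratio[2015]

for start in (1871, 1957, 1990):
    avg = average_since(synthetic_ratio, start)
    print(f"since {start}: average {avg:.1f}, latest is {latest / avg:.2f}x the average")
```

The later the starting year, the higher the average and the less stretched the latest reading looks. Nothing about the market changed between those three lines of output. Only the sample did.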

A question I like to ask economists is, "What's one thing you'd like to know about the economy that can't be known?"

If someone asked me the same question, my answer would be this: What would we know about the economy and stock market if we had 10,000 years of perfect, unimpeachable, apples-to-apples data?

My guess is that almost everything we think we know about average recessions, average market crashes, and average bull markets would be upended.

Maybe we'd learn that the last 100 years of high stock market returns were a fluke. Or abnormally low, pushed down by two world wars. Or that Great Depressions are actually common. Or extremely rare.

I really don't know. Nobody does. But I'm confident we'd be shocked.

