Our Workshop strategies have superb historical results. The possibility that these results occurred solely by chance appears to be very small. However, we test many different ideas in the Workshop, increasing our chances of making an occasional "lucky guess." We should look at "out-of-sample" data to see whether our strategies are the result of skill or luck.
But you might be jumping headfirst into a dry swimming pool. It's entirely possible that our portfolio won't do that well in the future. It could lose to the S&P 500 Index! It could even lose money in absolute terms. The market goes up, and it goes down. Bet that got your attention!
How could that happen? Our screens are based on fundamental stock valuation criteria such as low price-to-book value, P/E ratios, and earnings growth. These criteria have long been associated with stock price appreciation. Note we said "associated." We don't really know whether "associated" means that criteria like low price-to-book value or high earnings growth drive stock prices, or whether they're all just passengers in the same car.
We have backtested our strategies thoroughly, and our historical data is very accurate. Most of our Workshop screens use Relative Strength ("RS") as a selection criterion, and several academic papers have shown RS to be a good indicator of future outperformance. If that isn't enough, we're using established economic concepts such as Modern Portfolio Theory and the Sharpe Ratio in building our portfolio. What could possibly go wrong? Oh, lots of things.
The stock market is an evolving organism, and strategies that may have worked in the past won't necessarily work in the future. However, after reading Edwin Lefevre's Reminiscences of a Stock Operator, I'm convinced that "plus ça change, plus c'est la même chose" -- the more things change, the more they stay the same. Even if the criteria remain good, the strategies could become "too popular." If everyone and his brother starts trying to buy and sell the same stocks on Friday afternoon, it could become impossible to get prices that match our models. I'm not terribly worried about these possibilities. They come with the territory when investing.
The real problem is "data mining." Statisticians like to say that if you torture the data long enough, it'll tell you what you want to know. That doesn't mean that it's telling you the truth.
Think about flipping a coin. If you flip a coin 10 times, and it comes up heads 6 times instead of 5, do you think the coin is unbalanced, or is that result just luck? The chance of a coin coming up heads 6 or more times out of 10 is nearly 40%. Odds are, it's just luck.
Now, say you flip a coin 10 times, and it comes up heads all 10 times. The chances of that happening are only about one in a thousand (one in 1,024, to be precise). Looks like our coin might be unbalanced.
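For readers who'd like to check this arithmetic, here's a minimal Python sketch (standard library only; the helper name prob_at_least is just for illustration) that computes both probabilities from the binomial distribution:

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of k or more heads in n flips of a coin that lands heads with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(prob_at_least(6, 10))   # ~0.377 -- nearly 40%, so 6 heads is unremarkable
print(prob_at_least(10, 10))  # ~0.000977 -- one chance in 1,024
```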
This is what we're trying to do with our strategies. We're looking for quantitative methods for picking stocks that come up "heads" (better than the index) more often than "tails." The less likely it is that those strategies are just random luck, the more likely it is that we have a real underlying phenomenon driving the strategy performance. In effect, we are looking for an unbalanced coin, one that will keep on coming up heads in the future.
A statistical test of our proposed Workshop Portfolio compared to the S&P 500 would tell us that the outperformance our portfolio has shown in the past would occur by chance only one time in seven thousand! Wow. That's like flipping heads 13 times in a row (roughly). Guess we don't have anything to worry about, right?
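If you want to check that "13 times in a row (roughly)" comparison, a one-in-7,000 event corresponds to about log2(7000) consecutive heads:

```python
from math import log2

print(log2(7000))  # ~12.77, so roughly 13 straight heads (2**13 = 8,192)
```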
Guess again. We still have one major problem. As we wrote last year: "We go through a lot of strategies in the Workshop. We might try 19 sets of criteria that 'should' be associated with outperformance but aren't, before stumbling upon one that is."
A run-of-the-mill statistical test might say that the 20th strategy was "statistically significant" at a 95% confidence level. A 95% confidence level is generally considered "statistically significant," but that simply tells us there is a 5% chance of finding an association like that by luck. Turn that around, and it means that about one test in twenty will turn up such an association -- merely by luck. How many strategies did we try?
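To see how quickly those 5% chances pile up, here's a back-of-the-envelope sketch. It assumes the twenty tests are independent, which is a simplification:

```python
# Chance of at least one "lucky" significant result across 20 independent
# strategy tests, each with a 5% false-positive rate:
alpha, n_tests = 0.05, 20
print(1 - (1 - alpha) ** n_tests)  # ~0.64 -- better-than-even odds of a fluke
```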
The name for this issue is the "multiple hypothesis" problem. Many different strategies are proposed on the Workshop and more advanced Mechanical Investing discussion boards. These are tested on historical data. The ideas that don't work are analyzed for insights into better strategies, and good strategies are refined. Our Workshop strategies are the results of lots of work, and lots of "hypotheses." How do we get around the multiple hypothesis problem?
Let's go back to our example of coin flipping. Imagine that you have a stack of a thousand coins. You flip each one ten times, and one of those thousand coins comes up heads every time. Now, do you think the coin is unbalanced?
Answer: You don't know. A coin that shows some mix of heads and tails is presumably fine, but probability theory tells you that with a thousand coins in the test, a run of ten straight heads is likely to turn up somewhere even if every coin is fair. While there is a possibility that your coin is unbalanced, one coin coming up heads ten times in a row is not proof of that when you've got 999 other coins in the test.
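Here's the same point in numbers, a sketch assuming every one of the thousand coins is fair:

```python
# Chance that at least one of 1,000 fair coins comes up heads ten times in a row:
p_run = 1 / 1024            # probability for any single fair coin
n_coins = 1000
print(1 - (1 - p_run) ** n_coins)  # ~0.62 -- more likely than not
```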
The simplest solution is to retest using "out-of-sample" data. Our original test used "in-sample" data; it can't really prove anything, it can only suggest possibilities. To perform an out-of-sample test, we flip the questionable coin another ten times and check the results. Those additional ten flips would be our "out-of-sample" data.
If the coin comes up heads only five times, then we can be confident that our earlier results were just luck. However, if our coin comes up heads eight, nine, or ten times, we suspect we might be on to something. Note that we can never know for sure if our coin is unbalanced, but the more out-of-sample tests we do, the more confident we become of the results.
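To put a number on "we might be on to something": if the coin were actually fair, it would rarely pass that out-of-sample bar. A short sketch, reusing the same binomial helper as above:

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of k or more heads in n flips of a fair coin when p = 0.5."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# How often would a genuinely fair coin show 8 or more heads in the retest?
print(prob_at_least(8, 10))  # ~0.055 -- only about one retest in eighteen
```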
So, how have our screens performed "out-of-sample"? We'll start to look into this next week. I hope to see you then -- same Fool time, same Fool channel.
(My thanks to VMSoui and Repoonsatad, among others, for their comments on "in-sample" and "out-of-sample" data, and for the coin-flipping example.)