Wednesday, July 15, 2015

Self-Fulfilling Beliefs and Data Mining

Taken to extremes, these cognitive illusions may give rise to closed systems of thought that are immune, at least for a while, to revision and refutation. (Austrian writer and satirist Karl Kraus once remarked, “Psychoanalysis is that mental illness for which it regards itself as therapy.”) This is especially true for the market, since investors’ beliefs about stocks or a method of picking them can become a self-fulfilling prophecy. The market sometimes acts like a strange beast with a will, if not a mind, of its own. Studying it is not like studying science and mathematics, whose postulates and laws are (in quite different senses) independent of us. If enough people suddenly wake up believing in a stock, it will, for that reason alone, go up in price and justify their beliefs.
A contrived but interesting illustration of a self-fulfilling belief involves a tiny investment club with only two investors and ten possible stocks to choose from each week. Let’s assume that each week chance smiles at random on one of the ten stocks the investment club is considering and it rises precipitously, while the week’s other nine stocks oscillate within a fairly narrow band.
George, who believes (correctly in this case) that the movements of stock prices are largely random, selects one of the ten stocks by rolling a die (say an icosahedron—a twenty-sided solid—with two sides for each number). Martha, let’s assume, fervently believes in some wacky theory, Q analysis. Her choices are therefore dictated by a weekly Q analysis newsletter that selects one stock of the ten as most likely to break out. Although George and Martha are equally likely to pick the lucky stock each week, the newsletter-selected stock will result in big investor gains more frequently than will any other stock.
The reason is simple but easy to miss. Two conditions must be met for a stock to result in big gains for an investor: It must be smiled upon by chance that week and it must be chosen by one of the two investors. Since Martha always picks the newsletter-selected stock, the second condition in her case is always met, so whenever chance happens to favor it, it results in big gains for her. This is not the case with the other stocks. Nine-tenths of the time, chance will smile on one of the stocks that is not newsletter-selected, but chances are George will not have picked that particular one, and so it will seldom result in big gains for him. One must be careful in interpreting this, however. George and Martha have equal chances of pulling down big gains (10 percent), and each stock of the ten has an equal chance of being smiled upon by chance (10 percent), but the newsletter-selected stock will deliver big gains to an investor much more often than any other particular stock will.
Put more numerically, the claim is that 10 percent of the time the newsletter-selected stock will achieve big gains for Martha, whereas each of the ten stocks has only a 1 percent chance of both achieving big gains and being chosen by George. Note again that two things must occur for the newsletter-selected stock to achieve big gains: Martha must choose it, which happens with probability 1, and it must be the stock that chance selects, which happens with probability 1/10. Since one multiplies probabilities to determine the likelihood that several independent events all occur, the probability of both these events occurring is 1 × 1/10, or 10 percent. Likewise, two things must occur for any particular stock to achieve big gains via George: George must choose it, which occurs with probability 1/10, and it must be the stock that chance selects, which happens with probability 1/10. The product of these two probabilities is 1/100, or 1 percent.
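The arithmetic in the George and Martha example can be checked with exact fractions. This is only a sketch of the numbers in the text; the variable names are illustrative and not part of the original.

```python
from fractions import Fraction

N_STOCKS = 10  # ten candidate stocks each week, as in the text

# Chance smiles on each stock with equal probability.
p_lucky = Fraction(1, N_STOCKS)

# Martha always buys the newsletter pick; George picks uniformly at random.
p_martha_picks_newsletter = Fraction(1)
p_george_picks_given_stock = Fraction(1, N_STOCKS)

# The picks are independent of which stock gets lucky, so multiply.
p_newsletter_gain_via_martha = p_martha_picks_newsletter * p_lucky
p_given_stock_gain_via_george = p_george_picks_given_stock * p_lucky

# George's overall chance of a big gain sums over all ten stocks.
p_george_wins = N_STOCKS * p_given_stock_gain_via_george

print(p_newsletter_gain_via_martha)   # newsletter stock pays off for Martha
print(p_given_stock_gain_via_george)  # one particular stock pays off for George
print(p_george_wins)                  # George wins with some stock
```

The last line makes the point of the example explicit: George's total chance of winning equals Martha's, but it is spread thinly across ten stocks instead of being concentrated on one.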
Nothing in this thought experiment depends on there being only two investors. If there were one hundred investors, fifty of whom slavishly followed the advice of the newsletter and fifty of whom chose stocks at random, then the newsletter-selected stock would achieve big gains for its investors eleven times as frequently as any other particular stock did for its investors. When the newsletter-selected stock is chosen by chance and happens to achieve big gains, there are on average fifty-five winners: the fifty believers in the newsletter and about five who picked the same stock at random. When any of the other nine stocks happens to achieve big gains, there are, on average, only five winners.
In this way a trading strategy, if looked at in a small population of investors and stocks, can give the strong illusion that it is effective when only chance is at work.
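The hundred-investor count above can be reproduced in a few lines. This is a sketch under the text's assumptions (ten stocks, random pickers spread evenly on average); the function name `expected_winners` is hypothetical.

```python
from fractions import Fraction

def expected_winners(followers, random_pickers, n_stocks=10):
    """Expected number of winners when a given stock turns out lucky.

    Returns a pair: (winners if the newsletter stock is the lucky one,
                     winners if some other particular stock is the lucky one).
    """
    # On average, random pickers spread evenly over the stocks.
    random_per_stock = Fraction(random_pickers, n_stocks)
    newsletter_case = followers + random_per_stock
    other_case = random_per_stock
    return newsletter_case, other_case

newsletter_case, other_case = expected_winners(50, 50)
ratio = newsletter_case / other_case

print(newsletter_case, other_case, ratio)
```

With one follower and one random picker (Martha and George), the same function gives a ratio of eleven as well, since the follower-to-random-picker ratio is unchanged.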
“Data mining,” the scouring of databases of investments, stock prices, and economic data for evidence of the effectiveness of this or that strategy, is another example of how an inquiry of limited scope can generate deceptive results. The problem is that if you look hard enough, you will always find some seemingly effective rule that resulted in large gains over a certain time span or within a certain sector. (In fact, inspired by the British economist Frank Ramsey, mathematicians over the last half century have proved a variety of theorems on the inevitability of some kind of order in large sets.) The promulgators of such rules are not unlike the believers in bible codes. There, too, people searched for coded messages that seemed to be meaningful, not realizing that it’s nearly impossible for there not to be some such “messages.” (This is trivially so if you search in a book that has a chapter 11, conveniently foretelling many companies’ bankruptcies.)
People commonly pore over price and trade data attempting to discover investment schemes that have worked in the past. In a reductio ad absurdum of such unfocused fishing for associations, David Leinweber in the mid-90s exhaustively searched the economic data on a United Nations CD-ROM and found that the best predictor of the value of the S&P 500 stock index was (a drum roll here) butter production in Bangladesh. Needless to say, butter production in Bangladesh has probably not remained the best predictor of the S&P 500. Whatever rules and regularities are discovered within a sample must be applied to new data if they’re to be accorded even limited credibility. You can always arbitrarily define a class of stocks that in retrospect does extraordinarily well, but will it continue to do so?
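The fishing expedition described above is easy to imitate with pure noise. The sketch below is not Leinweber's procedure; it assumes a made-up setup in which 1,000 random "indicators" are compared against a random "index" over 20 periods, and the best in-sample match is then checked on 20 fresh periods. All sizes and the seed are arbitrary choices.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
N_CANDIDATES, N_IN, N_OUT = 1000, 20, 20  # illustrative sizes

# A random "index" and many unrelated random "indicators".
target = [rng.gauss(0, 1) for _ in range(N_IN + N_OUT)]
candidates = [[rng.gauss(0, 1) for _ in range(N_IN + N_OUT)]
              for _ in range(N_CANDIDATES)]

# Mine the first N_IN periods for the strongest-looking relationship.
in_sample = [abs(pearson(c[:N_IN], target[:N_IN])) for c in candidates]
best = max(range(N_CANDIDATES), key=in_sample.__getitem__)
best_in = in_sample[best]

# Apply the "discovered" indicator to the held-out periods.
best_out = abs(pearson(candidates[best][N_IN:], target[N_IN:]))

print(f"best in-sample |r|:        {best_in:.2f}")
print(f"same series out of sample: {best_out:.2f}")
```

Because every series is noise, the strong in-sample correlation is purely a product of the search, and the out-of-sample figure will typically be much weaker.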
I’m reminded of a well-known paradox devised (for a different purpose) by the philosopher Nelson Goodman. He selected an arbitrary future date, say January 1, 2020, and defined an object to be “grue” if it is green and the time is before January 1, 2020, or if it is blue and the time is after January 1, 2020. Something is “bleen,” on the other hand, if it is blue and the time is before that date or if it is green and the time is after that date. Now consider the color of emeralds. All emeralds examined up to now (2002) have been green. We therefore feel confident that all emeralds are green. But all emeralds so far examined are also grue. It seems that we should be just as confident that all emeralds are grue (and hence blue beginning in 2020). Are we?
A natural objection is that these color words grue and bleen are very odd, being defined in terms of the year 2020. But were there aliens who speak the grue-bleen language, they could make the same charge against us. “Green,” they might argue, is an arbitrary color word, being defined as grue before 2020 and bleen afterward. “Blue” is just as odd, being bleen before 2020 and grue from then on. Philosophers have not convincingly shown what exactly is wrong with the terms grue and bleen, but they demonstrate that even the abrupt failure of a regularity to hold can be accommodated by the introduction of new weasel words and ad hoc qualifications.
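The symmetry between the two vocabularies can be made concrete. The sketch below assumes the cutoff applies on or after January 1, 2020 (the text says only "after"), and the function names are illustrative.

```python
from datetime import date

CUTOFF = date(2020, 1, 1)  # the arbitrary date from the text

def is_grue(color, when):
    """Green before the cutoff, blue from the cutoff on."""
    return (color == "green" and when < CUTOFF) or \
           (color == "blue" and when >= CUTOFF)

def is_bleen(color, when):
    """Blue before the cutoff, green from the cutoff on."""
    return (color == "blue" and when < CUTOFF) or \
           (color == "green" and when >= CUTOFF)

def is_green_in_alien_terms(color, when):
    """The aliens' definition: grue before the cutoff, bleen afterward."""
    return is_grue(color, when) if when < CUTOFF else is_bleen(color, when)

# An emerald examined in 2002 is both green and grue.
observed = date(2002, 6, 1)
print(is_grue("green", observed))

# The alien definition of "green" agrees with ours on either side of the cutoff.
for when in (date(2002, 6, 1), date(2021, 6, 1)):
    print(when, is_green_in_alien_terms("green", when))
```

Each vocabulary can define the other's words using a date, which is why neither side can accuse the other of oddness on purely formal grounds.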
In their headlong efforts to discover associations, data miners are sometimes fooled by “survivorship bias.” In market usage this is the tendency for mutual funds that go out of business to be dropped from the average of all mutual funds. The average return of the surviving funds is higher than it would be if all funds were included. Some badly performing funds become defunct, while others are merged with better-performing cousins. In either case, this practice skews past returns upward and induces greater investor optimism about future returns. (Survivorship bias also applies to stocks, which come and go over time, only the surviving ones making the statistics on performance. WCOM, for example, was unceremoniously replaced on the S&P 500 after its steep decline in early 2002.)
The situation is rather like that of schools that allow students to drop courses they’re failing. The grade point averages of schools with such a policy are, on average, higher than those of schools that do not allow such withdrawals. But these inflated GPAs are no longer a reliable guide to students’ performance.
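The upward skew from dropping defunct funds can be seen in a toy simulation. Every number below (fund count, mean return, volatility, closure threshold, seed) is an invented assumption chosen for illustration, not data from any real fund universe.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

N_FUNDS = 1000        # hypothetical number of funds
MEAN_RETURN = 0.05    # hypothetical average annual return
VOLATILITY = 0.15     # hypothetical spread of returns across funds
CLOSE_BELOW = -0.10   # hypothetical: funds returning less than this fold

# One year of returns for every fund that started the year.
returns = [rng.gauss(MEAN_RETURN, VOLATILITY) for _ in range(N_FUNDS)]

# Only funds above the closure threshold remain in the published average.
survivors = [r for r in returns if r >= CLOSE_BELOW]

all_avg = sum(returns) / len(returns)
survivor_avg = sum(survivors) / len(survivors)

print(f"funds surviving: {len(survivors)} of {N_FUNDS}")
print(f"average, all funds:      {all_avg:.2%}")
print(f"average, survivors only: {survivor_avg:.2%}")
```

Whenever the dropped funds are the worst performers, the survivors' average must exceed the average over all funds, which is the same mechanism as the inflated GPAs at schools that permit withdrawals.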
Finally, taking the meaning of the term literally, survivorship bias makes us all a bit more optimistic about facing crises. We tend to see only those people who survived similar crises. Those who haven’t are gone and therefore much less visible.
