Is there some tractable investment performance metric that corrects weaknesses commonly encountered in financial markets research? In the July 2014 version of their paper entitled “The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting and Non-Normality”, David Bailey and Marcos Lopez de Prado introduce the Deflated Sharpe Ratio (DSR) as a tool for evaluating investment performance that accounts for both non-normality and data snooping bias. They preface DSR development by noting that:
- Many investors use performance statistics, such as Sharpe ratio, that assume test sample returns have a normal distribution.
- Because liquid markets exhibit high levels of randomness, testing a sufficient number of strategies on the same data essentially guarantees discovery of an apparently profitable, but really just lucky, strategy.
- The in-sample/out-of-sample hold-out approach does not eliminate data snooping bias when multiple strategies are tested against the same hold-out data.
- Researchers generally publish “successes” as isolated analyses, ignoring all the failures encountered along the road to statistical significance.
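The hold-out concern can be illustrated with a small simulation. The sketch below generates many pure-noise strategies whose true Sharpe ratio is zero, scores all of them on the same hold-out sample, and keeps the best. The function name, the 250-observation sample and the seed are arbitrary illustrative assumptions, and Sharpe ratios are per period (not annualized).

```python
import math
import random
import statistics

def best_holdout_sharpe(n_strategies: int, n_obs: int = 250, seed: int = 0) -> float:
    """Highest per-period Sharpe ratio among n_strategies pure-noise
    strategies, all evaluated on the same hold-out sample size."""
    rng = random.Random(seed)
    best = -math.inf
    for _ in range(n_strategies):
        # Zero-mean returns: every strategy's true Sharpe ratio is zero.
        rets = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        mu = statistics.fmean(rets)
        sd = math.sqrt(sum((r - mu) ** 2 for r in rets) / (n_obs - 1))
        best = max(best, mu / sd)
    return best

for n in (1, 10, 100, 1000):
    print(f"{n:>5} strategies -> best hold-out Sharpe {best_holdout_sharpe(n):.3f}")
```

Because the same seed is reused, each larger search contains every strategy of the smaller ones, so the "winning" hold-out Sharpe ratio can only rise as more noise strategies are tried.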
The authors then transform Sharpe ratio into DSR by incorporating sample return distribution skewness and kurtosis and by correcting for the bias associated with the number of strategies tested in arriving at the “winning” strategy. Based on mathematical derivations and an example, they conclude that:
- Expanding the number of strategy alternatives tested on a sample increases the expected maximum (luckiest) Sharpe ratio, and therefore the likelihood that the winning strategy’s Sharpe ratio is inflated by luck. Non-normality of the sample return distribution can exacerbate such Sharpe ratio inflation.
- DSR eliminates Sharpe ratio inflation by accounting for:
  - Non-normality of sample returns (return distribution skewness and kurtosis).
  - Length of the sample.
  - Number of alternative strategies tested on the sample.
  - Variance of the Sharpe ratios of alternative strategies.
- To estimate the optimal number of alternative strategies to test on a given set of data:
  - Determine the number of alternatives that are theoretically justifiable.
  - Pick 1/e (about 37%) of these alternatives at random and test them, without selecting any of them.
  - Then randomly pick and test additional alternatives one by one until one beats all prior tests. That strategy is the winner, and the number of trials up to that point is the optimal number.
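The stopping procedure above is the classic "secretary problem" rule. A minimal sketch, assuming backtest scores (for example, Sharpe ratios) are precomputed in a list purely for illustration; in practice a score is only revealed when that alternative is actually tested. The function name and inputs are hypothetical.

```python
import math
import random

def optimal_stopping_search(scores, rng=None):
    """Secretary-style rule: test a random 1/e of the alternatives without
    selecting any, then keep testing until one beats every earlier result.

    scores: hypothetical backtest score per theoretically justifiable alternative.
    Returns (index_of_winner, number_of_trials).
    """
    if not scores:
        raise ValueError("need at least one alternative")
    rng = rng or random.Random()
    order = list(range(len(scores)))
    rng.shuffle(order)  # alternatives are tested in random order
    n_explore = max(1, round(len(scores) / math.e))
    best_seen = max(scores[i] for i in order[:n_explore])  # exploration phase
    for trials, idx in enumerate(order[n_explore:], start=n_explore + 1):
        if scores[idx] > best_seen:
            return idx, trials  # first alternative beating all prior tests
    # Nothing beat the exploration phase, so every alternative was tested.
    return order[-1], len(scores)

scores = [0.21, 0.05, 0.33, 0.12, 0.40, 0.18, 0.27, 0.09, 0.36, 0.15]  # hypothetical
winner, n_trials = optimal_stopping_search(scores, random.Random(7))
print(winner, n_trials)
```

For the secretary problem, this 1/e exploration fraction selects the single best alternative with probability approaching 1/e as the number of alternatives grows.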
In summary, the Deflated Sharpe Ratio, implemented with theoretical rigor and empirical discipline, offers a way to account for return distribution non-normality and data snooping bias.
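A minimal Python sketch of the adjustment, as we understand it from the paper, proceeds in two steps. First, estimate a benchmark SR0, the expected maximum Sharpe ratio across N independent trials whose true Sharpe ratios are zero, using an extreme-value approximation that depends on N and the variance of the trials' Sharpe ratios. Second, compute DSR as the probability that the selected strategy's true Sharpe ratio exceeds SR0, given sample length T, skewness and kurtosis. Sharpe ratios are per period (not annualized), kurtosis is raw (3 for a normal distribution), and the function names and example inputs are illustrative assumptions, not code from the paper.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
_Z = NormalDist()  # standard normal distribution

def expected_max_sharpe(n_trials: int, sharpe_variance: float) -> float:
    """Benchmark SR0: approximate expected maximum of n_trials estimated
    Sharpe ratios whose true values are all zero."""
    if n_trials < 2:
        return 0.0  # a single trial involves no selection to correct for
    return math.sqrt(sharpe_variance) * (
        (1 - EULER_GAMMA) * _Z.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * _Z.inv_cdf(1 - 1 / (n_trials * math.e))
    )

def deflated_sharpe_ratio(sr_hat: float, sr0: float, n_obs: int,
                          skew: float, kurt: float) -> float:
    """Probability that the true Sharpe ratio exceeds sr0, adjusting the
    observed per-period Sharpe ratio for sample length, skewness and
    raw (non-excess) kurtosis of returns."""
    radicand = 1 - skew * sr_hat + (kurt - 1) / 4 * sr_hat ** 2
    if radicand <= 0 or n_obs < 2:
        raise ValueError("inputs give an undefined standard error")
    z = (sr_hat - sr0) * math.sqrt(n_obs - 1) / math.sqrt(radicand)
    return _Z.cdf(z)

# Hypothetical example: best of 100 trials on 1,000 daily observations.
sr0 = expected_max_sharpe(n_trials=100, sharpe_variance=0.002)
dsr = deflated_sharpe_ratio(sr_hat=0.12, sr0=sr0, n_obs=1000, skew=-0.5, kurt=5.0)
print(f"SR0 = {sr0:.4f}, DSR = {dsr:.3f}")
```

In this hypothetical case DSR lands only modestly above 0.5, whereas setting sr0 to 0 (ignoring the 100 trials) would give a value very close to 1. Negative skewness and excess kurtosis enlarge the denominator and push DSR further down.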
For many investors, the approach outlined in this paper may epitomize the frustrations of financial markets research rather than offer a practical way to address them.
Cautions regarding conclusions include:
- The term “theoretically justifiable” (with respect to limiting the strategy alternatives considered) is not well-defined. One researcher’s justification is another’s crime.
- Researchers may not have the incentive and/or discipline to document all strategy alternatives considered.
- Dependence of strategy development on prior research for which the degree of snooping is unknown introduces unquantifiable secondary snooping bias.