Big Ideas

These blog entries offer some big ideas of lasting value relevant for investing and trading.

Snooping for Fun and No Profit

How much distortion can data snooping inject into expected investment strategy performance? In their October 2014 paper entitled “Statistical Overfitting and Backtest Performance”, David Bailey, Stephanie Ger, Marcos Lopez de Prado, Alexander Sim and Kesheng Wu note that powerful computers let researchers test an extremely large number of model variations on a given set of data, thereby inducing extreme overfitting. In finance, this snooping often takes the form of refining a trading strategy to optimize its performance within a set of historical market data. The authors introduce a way to explore snooping effects via an online simulator that finds the optimal (maximum Sharpe ratio) variant of a simple trading strategy by testing all possible integer values for strategy parameters as applied to a set of randomly generated daily “returns.” The simple trading strategy each month trades a single asset by (1) choosing a day of the month to enter either a long or a short position and (2) exiting after a specified number of days or a stop-loss condition. The randomly generated “returns” come from a source Gaussian (normal) distribution with zero mean. The simulator allows a user to specify a maximum holding period, a maximum percentage stop loss, sample length (number of days), sample volatility (number of standard deviations) and sample starting point (random number generator seed). After identifying optimal parameter values on “backtest” data, the simulator runs the optimal strategy variant on a second set of randomly generated returns to show the effect of backtest overfitting. Using this simulator, they conclude that: Keep Reading
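
The flavor of the demonstration is easy to reproduce. The following is a minimal sketch, not the authors' simulator; the parameter ranges, volatility and sample length are illustrative assumptions:

    import itertools
    import numpy as np

    rng = np.random.default_rng(seed=1)
    DAYS_PER_MONTH, MONTHS = 21, 60

    def random_returns():
        # Daily "returns" drawn from a zero-mean Gaussian: pure noise.
        return rng.normal(0.0, 0.01, size=DAYS_PER_MONTH * MONTHS)

    def strategy_returns(daily, entry_day, side, hold, stop_pct):
        # Each month: enter long (+1) or short (-1) on entry_day, exit after
        # `hold` days or once the cumulative position loss exceeds stop_pct.
        out = []
        for m in range(MONTHS):
            month = daily[m * DAYS_PER_MONTH:(m + 1) * DAYS_PER_MONTH]
            cum = 0.0
            for d in range(entry_day, min(entry_day + hold, DAYS_PER_MONTH)):
                r = side * month[d]
                out.append(r)
                cum += r
                if cum < -stop_pct / 100.0:  # stop-loss exit
                    break
        return np.array(out)

    def sharpe(r):
        return np.sqrt(252) * r.mean() / r.std() if r.std() > 0 else -np.inf

    backtest = random_returns()
    grid = itertools.product(range(DAYS_PER_MONTH - 1), (1, -1),
                             range(1, 11), range(1, 6))
    best = max(grid, key=lambda p: sharpe(strategy_returns(backtest, *p)))

    print("in-sample Sharpe :", sharpe(strategy_returns(backtest, *best)))
    print("out-of-sample    :", sharpe(strategy_returns(random_returns(), *best)))

The optimized in-sample Sharpe ratio is typically impressive; on fresh noise it collapses toward zero, which is the paper's point.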

Survey of Recent Research on Constructing and Monitoring Portfolios

What’s the latest research on portfolio construction and risk management? In the introduction to the July 2014 version of his (book-length) paper entitled “Many Risks, One (Optimal) Portfolio”, Cristian Homescu states: “The main focus of this paper is to analyze how to obtain a portfolio which provides above average returns while remaining robust to most risk exposures. We place emphasis on risk management for both stages of asset allocation: a) portfolio construction and b) monitoring, given our belief that obtaining above average portfolio performance strongly depends on having an effective risk management process.” Based on a comprehensive review of recent research on portfolio construction and risk management, he reports on:

Keep Reading

When Bollinger Bands Snapped

Do financial markets adapt to widespread use of an indicator, such as Bollinger Bands, thereby extinguishing its informativeness? In the August 2014 version of their paper entitled “Popularity versus Profitability: Evidence from Bollinger Bands”, Jiali Fang, Ben Jacobsen and Yafeng Qin investigate the effectiveness of Bollinger Bands as a stock market trading signal before and after their introduction in 1983. They focus on a middle band formed by the average of the past 20 trading days of prices, with upper and lower bands two standard deviations of these prices above and below it. They consider two trading strategies based on Bollinger Bands (sketched in code after this list):

  1. Basic volatility breakout, which generates buy (sell) signals when price closes outside the upper (lower) band.
  2. Squeeze refinement of volatility breakout, which generates buy (sell) signals when band width drops to a six-month minimum and price closes outside the upper (lower) band.
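
For readers who want to experiment, here is a minimal sketch of the two rules, assuming pandas and a daily closing price series; the six-month squeeze window of roughly 126 trading days is our assumption:

    import pandas as pd

    def bollinger_signals(close, window=20, k=2, squeeze_window=126):
        # close: pandas Series of daily closing prices.
        mid = close.rolling(window).mean()        # 20-day middle band
        sd = close.rolling(window).std()
        upper, lower = mid + k * sd, mid - k * sd
        width = upper - lower

        # 1. Basic volatility breakout: close outside the upper/lower band.
        buy, sell = close > upper, close < lower

        # 2. Squeeze refinement: breakout counts only when band width is at
        #    a six-month (~126 trading day) minimum.
        squeezed = width <= width.rolling(squeeze_window).min()
        return pd.DataFrame({"buy": buy, "sell": sell,
                             "squeeze_buy": buy & squeezed,
                             "squeeze_sell": sell & squeezed})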

They assess the popularity (and presumed level of use) of Bollinger Bands over time based on a search of articles from U.S. media in the Factiva database. They evaluate the predictive power of Bollinger Bands across their full sample and three subsamples: before 1983, 1983 through 2001, and after 2001. Using daily levels of 14 major international stock market indexes (both the Dow Jones Industrial Average and the S&P 500 Index for the U.S.) from initial availabilities (ranging from 1885 to 1971) through March 2014, they find that: Keep Reading

Evaluating Systematic Trading Programs

How should investors assess systematic trading programs? In his August 2014 paper entitled “Evaluation of Systematic Trading Programs”, Mikhail Munenzon offers a non-technical overview of issues involved in evaluating systematic trading programs. He defines such programs as automated processes that generate signals, manage positions and execute orders for exchange-listed instruments or spot currency rates with little or no human intervention. He states that the topics he covers are not exhaustive but should be sufficient for an investor to initiate successful relationships with systematic trading managers. Based on his years of experience as a systematic trader and as a large institutional investor who has evaluated many diverse systematic trading managers on a global scale, he concludes that: Keep Reading

Snooping Bias Accounting Tools

How can researchers account for the snooping bias that arises from testing multiple strategy alternatives on the same set of data? In the July 2014 version of their paper entitled “Evaluating Trading Strategies”, Campbell Harvey and Yan Liu describe tools that adjust strategy evaluation for multiple testing. They note that conventional thresholds for statistical significance assume an independent (single) test. Applying these same thresholds to multiple testing scenarios induces many false discoveries of “good” trading strategies. Evaluating multiple tests requires more stringent significance thresholds. In effect, such adjustments mean demanding higher Sharpe ratios or, alternatively, applying “haircuts” to computed strategy Sharpe ratios according to the number of strategies tried. They consider two approaches: one that aggressively excludes false discoveries, and another that scales avoidance of false discoveries with the number of strategy alternatives tested. Using mathematical derivations and examples, they conclude that:
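
As a rough illustration of the idea (not the authors' exact tools), a Bonferroni-style haircut, the aggressive end of the spectrum they describe, can be sketched in a few lines; the zero benchmark and the example values are our assumptions:

    import numpy as np
    from scipy import stats

    def haircut_sharpe(annual_sr, years, n_tests):
        t = annual_sr * np.sqrt(years)         # t-statistic of the Sharpe ratio
        p = 2 * (1 - stats.norm.cdf(t))        # single-test p-value
        p_adj = min(p * n_tests, 1.0)          # Bonferroni adjustment
        t_adj = stats.norm.ppf(1 - p_adj / 2)  # back out the adjusted t-statistic
        return max(t_adj, 0.0) / np.sqrt(years)

    print(haircut_sharpe(annual_sr=1.0, years=10, n_tests=1))    # ~1.0
    print(haircut_sharpe(annual_sr=1.0, years=10, n_tests=200))  # much lower

With 200 strategies tried, a Sharpe ratio of 1.0 measured over ten years survives at only about a third of its nominal value.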

Keep Reading

Sensitivity of Risk Adjustment to Measurement Interval

Are widely used volatility-adjusted investment performance metrics, such as Sharpe ratio, robust to different measurement intervals? In the July 2014 version of their paper entitled “The Divergence of High- and Low-Frequency Estimation: Implications for Performance Measurement”, William Kinlaw, Mark Kritzman and David Turkington examine the sensitivity of such metrics to the length of the return interval used to measure them. They consider hedge fund performance, conventionally estimated as Sharpe ratio calculated from monthly returns and annualized by multiplying by the square root of 12. They also consider mutual fund performance, usually evaluated as average excess return relative to an appropriate benchmark divided by the volatility of that excess return (information ratio). Finally, they consider Sharpe ratios of risk parity strategies, which periodically rebalance portfolio asset weights according to the inverse of their return standard deviations. Using monthly and longer-interval return data over available sample periods for each case, they find that: Keep Reading
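
The core issue is easy to demonstrate: when returns are autocorrelated, annualizing a monthly Sharpe ratio by the square root of 12 diverges from the Sharpe ratio measured directly from longer intervals. The following sketch uses illustrative AR(1) monthly returns, not the paper's data:

    import numpy as np

    rng = np.random.default_rng(0)
    phi, n = 0.3, 12 * 50                 # monthly autocorrelation, 50 years
    eps = rng.normal(0.005, 0.02, n)
    monthly = np.empty(n)
    monthly[0] = eps[0]
    for t in range(1, n):
        monthly[t] = phi * monthly[t - 1] + eps[t]

    def ann_sharpe(r, periods_per_year):
        return np.sqrt(periods_per_year) * r.mean() / r.std()

    annual = monthly.reshape(-1, 12).sum(axis=1)  # simple 12-month sums
    print("from monthly returns:", ann_sharpe(monthly, 12))
    print("from annual returns :", ann_sharpe(annual, 1))

With positively autocorrelated returns, the square-root-of-12 convention overstates the Sharpe ratio measured at the annual interval.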

Sharper Sharpe Ratio?

Is there some tractable investment performance metric that corrects weaknesses commonly encountered in financial markets research? In the July 2014 version of their paper entitled “The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting and Non-Normality”, David Bailey and Marcos Lopez de Prado introduce the Deflated Sharpe Ratio (DSR) as a tool for evaluating investment performance that accounts for both non-normality and data snooping bias. They preface DSR development by noting that:

  • Many investors use performance statistics, such as Sharpe ratio, that assume test sample returns have a normal distribution.
  • Fueled by high levels of randomness in liquid markets, testing of a sufficient number of strategies on the same data essentially guarantees discovery of an apparently profitable, but really just lucky, strategy.
  • The in-sample/out-of-sample hold-out approach does not eliminate data snooping bias when multiple strategies are tested against the same hold-out data.
  • Researchers generally publish “successes” as isolated analyses, ignoring all the failures encountered along the road to statistical significance.

The authors then transform Sharpe ratio into DSR by incorporating sample return distribution skewness and kurtosis and by correcting for the bias associated with the number of strategies tested in arriving at the “winning” strategy. Based on mathematical derivations and an example, they conclude that:
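
The published formulas translate directly into code. The following sketch follows the Probabilistic Sharpe Ratio and expected-maximum Sharpe ratio expressions in the paper; all input values in the example are illustrative:

    import numpy as np
    from scipy import stats

    def psr(sr, sr_star, n_obs, skew, kurt):
        # Probabilistic Sharpe Ratio: probability that the true SR exceeds
        # sr_star given non-normal returns. sr is in per-period units and
        # kurt is raw kurtosis (3 for a normal distribution).
        denom = np.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
        return stats.norm.cdf((sr - sr_star) * np.sqrt(n_obs - 1) / denom)

    def expected_max_sr(var_trials, n_trials):
        # Expected maximum SR among n_trials unskilled (zero-SR) strategies.
        gamma = 0.5772156649  # Euler-Mascheroni constant
        z = stats.norm.ppf
        return np.sqrt(var_trials) * ((1 - gamma) * z(1 - 1 / n_trials)
                                      + gamma * z(1 - 1 / (n_trials * np.e)))

    def dsr(sr, n_obs, skew, kurt, var_trials, n_trials):
        return psr(sr, expected_max_sr(var_trials, n_trials), n_obs, skew, kurt)

    # Example: monthly SR of 0.25 over 5 years, the best of 100 trials.
    print(dsr(sr=0.25, n_obs=60, skew=-0.5, kurt=5,
              var_trials=0.01, n_trials=100))

Loosely, DSR is the probability that the observed Sharpe ratio reflects skill rather than being merely the luckiest of the trials; in this example it is close to 0.5, i.e., no better than a coin flip.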

Keep Reading

Generating Parameter Sensitivity Distributions to Mitigate Snooping Bias

Is there a practical way to mitigate data snooping bias while exploring optimal parameter values? In his February 2014 paper entitled “Know Your System! – Turning Data Mining from Bias to Benefit through System Parameter Permutation” (the National Association of Active Investment Managers’ 2014 Wagner Award winner), Dave Walton outlines System Parameter Permutation (SPP) as an alternative to traditional in-sample/out-of-sample cross-validation and other more complex approaches to compensating for data snooping bias. SPP generates distributions of performance metrics for rules-based investment strategies by systematically collecting outcomes across plausible ranges of rule parameters. These distributions capture typical, best-case and worst-case outcomes. He explains how to apply SPP to estimate both long-run and short-run strategy performance and to test statistical significance. He provides an example comparing conventional in-sample/out-of-sample testing and SPP as applied to an asset rotation strategy based on relative momentum. Using logical arguments and examples, he concludes that: Keep Reading
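
In outline, SPP replaces “report the best parameter set” with “report the distribution across all plausible parameter sets.” A minimal sketch, with a hypothetical backtest function standing in for any rules-based strategy:

    import itertools
    import numpy as np

    def spp(backtest, param_ranges):
        # backtest: function mapping one parameter combination to a
        # performance metric (e.g., CAGR). Combinations follow the
        # insertion order of param_ranges.
        results = np.array([backtest(*combo)
                            for combo in itertools.product(*param_ranges.values())])
        return {
            "median": np.median(results),             # typical expected outcome
            "best": results.max(),                    # the snooped, optimistic figure
            "worst_5pct": np.percentile(results, 5),  # near-worst case
        }

    # Example with an illustrative two-parameter momentum strategy:
    # spp(my_momentum_backtest, {"lookback": range(2, 13), "holding": range(1, 4)})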

The Significance of Statistical Significance?

How should investors interpret findings of statistical significance in academic studies of financial markets? In the March 2014 draft of their paper entitled “Significance Testing in Empirical Finance: A Critical Review and Assessment”, Jae Kim and Philip Ji review significance testing in recent research on financial markets. They focus on the interplay of two error probabilities: (1) the probability of a Type I error (rejecting a true null hypothesis), with the significance threshold usually set at 1%, 5% (most used) or 10%; and (2) the probability of a Type II error (accepting a false null hypothesis). They consider the losses associated with each error type at a given significance threshold, and they assess the Bayesian method of inference as an alternative to the more widely used frequentist method associated with conventional significance testing. Based on review of past criticisms of conventional significance testing and 161 studies applying linear regression recently published in four top-tier finance journals, they conclude that: Keep Reading
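
For concreteness, the two error types are easy to simulate for a t-test of mean monthly return at the conventional 5% threshold; the return distribution parameters below are illustrative:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_obs, trials, alpha = 120, 10_000, 0.05

    def reject_rate(true_mean):
        # Fraction of simulated samples in which the zero-mean null is rejected.
        rejections = 0
        for _ in range(trials):
            r = rng.normal(true_mean, 0.04, n_obs)
            if stats.ttest_1samp(r, 0.0).pvalue < alpha:
                rejections += 1
        return rejections / trials

    print("Type I error rate (true mean = 0) :", reject_rate(0.0))    # ~alpha
    print("power for a small edge (0.2%/mo.) :", reject_rate(0.002))
    # The Type II error rate for that edge is 1 minus the reported power.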

Estimating Snooping Bias for a Multi-parameter Strategy

A subscriber flagged an apparently very attractive exchange-traded fund (ETF) momentum-volatility-correlation strategy that, as presented, generates an optimal compound annual growth rate of 45.7% with modest maximum drawdown. The strategy chooses from among the following seven ETFs:

ProShares Ultra S&P500 (SSO)
SPDR EURO STOXX 50 (FEZ)
iShares MSCI Emerging Markets (EEM)
iShares Latin America 40 (ILF)
iShares MSCI Pacific ex-Japan (EPP)
Vanguard Extended Duration Treasuries Index ETF (EDV)
iShares 1-3 Year Treasury Bond (SHY)

The steps in the strategy are, at the end of each month (see the code sketch after these steps):

  1. For the first six of the above ETFs, compute log returns over the last three months and standard deviation (volatility) of log returns over the past six months.
  2. Standardize the monthly sets of past log returns and volatilities based on their respective means and standard deviations.
  3. Rank the six ETFs according to the sum of 0.75 times standardized past log return plus 0.25 times standardized past volatility.
  4. Buy the top-ranked ETF for the next month.
  5. However, if at the end of any month, the correlation of SSO and EDV monthly log returns over the past four months is greater than 0.75, instead buy SHY for the next month.
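
A minimal sketch of steps 1-5, assuming pandas, a DataFrame of monthly adjusted closes with the tickers above as column names, and at least seven months of history; the function and variable names are ours:

    import numpy as np
    import pandas as pd

    RISK = ["SSO", "FEZ", "EEM", "ILF", "EPP", "EDV"]

    def pick_etf(prices, as_of):
        # prices: monthly adjusted closes, columns = tickers; use rows up to as_of.
        logp = np.log(prices[RISK].loc[:as_of])
        rets = logp.diff()

        mom = logp.iloc[-1] - logp.iloc[-4]  # step 1: 3-month log return
        vol = rets.iloc[-6:].std()           # step 1: 6-month volatility

        z = lambda s: (s - s.mean()) / s.std()  # step 2: standardize across ETFs
        score = 0.75 * z(mom) + 0.25 * z(vol)   # step 3: weighted sum as stated

        # Step 5: SSO/EDV correlation filter over the past four months.
        corr = rets["SSO"].iloc[-4:].corr(rets["EDV"].iloc[-4:])
        return "SHY" if corr > 0.75 else score.idxmax()  # step 4: top-ranked ETF

Note that the rule as stated rewards higher past volatility in the ranking; the sketch implements it literally.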

The developer describes this strategy as an adaptation of someone else’s strategy and notes that he has tested a number of systems. How material might the implied secondary and primary data snooping bias be in the performance of this system? To investigate, we examine the fragility of performance statistics to variation of each strategy parameter separately. As presented, the author substitutes other ETFs for the two with the shortest histories to extend the test period backward in time. We use only price histories as available for the specified ETFs, limited by EDV with inception January 2008. Using monthly adjusted closing prices for the above seven ETFs and for SPDR S&P 500 (SPY) during January 2008 through February 2014 (74 months), we find that: Keep Reading
