How should investors assess the credibility of investment strategy backtests? In his October 2013 paper entitled “Telling the Good from the Bad and the Ugly: How to Evaluate Backtested Investment Strategies”, Patrick Beaudan recommends ways to judge investment strategies and backtests based on his years of experience in evaluating, managing and investing in algorithmic strategies. His perspective is that of investors choosing among strategies proposed by investment managers. Using examples to illustrate his points, he concludes that:
- To assess operational feasibility of a strategy, request a record of the offeror’s actual trades and ongoing performance to verify that they match the offered backtest.
- To assess the risk of a strategy, examine the depth and duration of drawdowns. Avoid managers who seem unlikely to be able to withstand the pressure of a major drawdown or who have never actually experienced one.
- Before digging into a backtest, assess the strategy’s value proposition by examining:
  - What investor behaviors or market/economic conditions drive performance.
  - What market regimes are favorable and unfavorable (and especially whether backtest outperformance concentrates in a few trades).
  - Whether future conditions are likely to be favorable.
- Examine all strategy rules/parameters to ensure they make sense.
- In general, favor simple strategies. The more rules/parameters a strategy embeds, the greater the likelihood that its performance derives from data snooping (fitting conditions unique to the dataset). Be especially wary of strategies with more than 12 total rules/parameters across all the following categories:
  - Defining the universe of securities in the strategy.
  - Determining how capital is allocated across securities.
  - Conditions triggering trades/portfolio rebalancing, including assumptions about trading frictions and minimum portfolio size.
  - Controlling risk, including stop-losses and use of leverage.
- The absence of sudden, large drawdowns in a backtest employing leverage is a red flag that results are probably not repeatable.
- In general, the backtest for a strategy should use the longest set of relevant data available to: (1) quantify maximum drawdown; (2) identify conditions for gains and losses; and (3) identify a trigger for “retirement” of the strategy because it is no longer working.
- However, advances in analysis/trading technology in combination with a 1982-2012 secular decline in interest rates make it essentially impossible to construct a good long-run data set for testing low-frequency trading strategies. Focus on isolating and analyzing market regimes over the past fifteen to twenty years to develop intuitions about when a strategy is likely to work, when it is not likely to work, the impact of rising interest rates, and the pattern of major drawdowns.
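
As a rough illustration of examining the depth and duration of drawdowns, the following minimal Python sketch computes both from a series of portfolio values. It is not taken from the paper: the function name `drawdown_stats`, the measure of duration as periods spent below a prior peak, and the sample equity curve are all assumptions made for illustration.

```python
from typing import List, Tuple


def drawdown_stats(equity: List[float]) -> Tuple[float, int]:
    """Return (deepest drawdown as a fraction of the prior peak,
    longest run of consecutive periods spent below a prior peak)."""
    peak = equity[0]
    max_depth = 0.0
    longest = 0
    current = 0
    for value in equity:
        if value >= peak:
            # New high (or recovery to the old high): the drawdown ends.
            peak = value
            current = 0
        else:
            current += 1
            longest = max(longest, current)
            max_depth = max(max_depth, (peak - value) / peak)
    return max_depth, longest


# Hypothetical equity curve (illustrative numbers, not real data).
curve = [100.0, 110.0, 99.0, 88.0, 95.0, 112.0, 105.0]
depth, duration = drawdown_stats(curve)
print(f"max drawdown: {depth:.1%}, longest drawdown: {duration} periods")
```

Running the same calculation on both the backtest and the offeror's actual trading record is one way to see whether live drawdowns resemble backtested ones.
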
In summary, when judging investment strategies, investors should have a clear understanding of potential and inevitable disconnects between the results of hypothetical backtests and likely future performance.
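
One of the checks listed above, whether backtest outperformance concentrates in a few trades, can be approximated by measuring the share of total profit contributed by the largest winners. The sketch below is a hypothetical illustration, not the paper's method: the function name `top_trade_share` and the sample trade results are assumptions.

```python
from typing import List


def top_trade_share(trade_pnls: List[float], top_n: int) -> float:
    """Fraction of total net profit contributed by the top_n best trades."""
    total = sum(trade_pnls)
    if total <= 0:
        raise ValueError("total profit must be positive for a meaningful share")
    best = sorted(trade_pnls, reverse=True)[:top_n]
    return sum(best) / total


# Hypothetical per-trade profits and losses (illustrative numbers only).
pnls = [50.0, -5.0, 3.0, -4.0, 40.0, 2.0, -6.0, 10.0, 5.0, 5.0]
share = top_trade_share(pnls, top_n=2)
print(f"top 2 of {len(pnls)} trades account for {share:.0%} of net profit")
```

A share close to (or above) 100% from a handful of trades suggests the backtest result depends on a few events that may not recur.
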
Cautions regarding conclusions include:
- The paper may under-emphasize the importance of checking proposed strategies for the reasonableness of assumptions about trading frictions.
- Another potential source of disconnect between backtests and future performance is return distribution wildness, which undermines sample representativeness.
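
To make the first caution concrete, a simple sensitivity check shows how quickly trading frictions can consume a backtested return. This is a first-order sketch under stated assumptions: it ignores compounding and market impact, and the function names and all input numbers are hypothetical.

```python
def net_annual_return(gross_annual_return: float,
                      round_trips_per_year: float,
                      cost_per_round_trip: float) -> float:
    """First-order estimate: gross return minus total annual trading cost.

    Assumes each round trip costs a fixed fraction of portfolio value
    and ignores compounding (an approximation, not an exact result).
    """
    return gross_annual_return - round_trips_per_year * cost_per_round_trip


def breakeven_cost(gross_annual_return: float,
                   round_trips_per_year: float) -> float:
    """Cost per round trip at which the net return falls to zero."""
    return gross_annual_return / round_trips_per_year


# Hypothetical strategy: 12% gross annual return, 50 round trips per year.
gross, trips = 0.12, 50
for cost in (0.0, 0.001, 0.002, 0.003):
    net = net_annual_return(gross, trips, cost)
    print(f"cost {cost:.2%} per round trip -> net return {net:.2%}")
print(f"breakeven cost: {breakeven_cost(gross, trips):.2%} per round trip")
```

If the friction assumed in a backtest sits close to the breakeven level, modest errors in that assumption can erase the reported edge.
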
For related discussions, see Avoiding Investment Strategy Flame-outs.