In-sample vs. Out-of-sample Performance of 888 Trading Strategies
June 2, 2016 - Big Ideas
Are any trading strategy backtest performance statistics predictive of out-of-sample results? In their March 2016 paper entitled “All that Glitters Is Not Gold: Comparing Backtest and Out-of-Sample Performance on a Large Cohort of Trading Algorithms”, Thomas Wiecki, Andrew Campbell, Justin Lent and Jessica Stauth compare backtest and out-of-sample performance statistics for 888 algorithmic trading strategies. They first screen a larger set of strategies to remove duplicates, outliers and algorithms unlikely to represent real strategies. They next test the selected strategies in-sample (IS) with data that was available to the developers (from 2010 through deployment dates between January and June 2015). They then test the strategies out-of-sample (OOS) from June 2015 through February 2016. All tests employ minute-by-minute prices for trade entry/exit and include robustly estimated trading frictions. Performance metrics derive from end-of-day positions/prices. Most tests are linear regressions relating individual IS and OOS performance metrics (such as Sharpe ratio). They also examine the ability of several multivariate machine learning techniques to predict OOS performance, ultimately via an equal-weighted portfolio of the 10 strategies predicted to have the highest OOS Sharpe ratios. Using position and price data for the 888 strategies during the specified IS and OOS periods, plus the total number of backtest days actually employed by each strategy developer, they find that:
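To make the methodology concrete, here is a minimal sketch (not the authors' code) of the two core steps described above: regressing an OOS performance metric on its IS counterpart, and forming an equal-weighted portfolio of the strategies with the highest predicted OOS Sharpe ratios. Function names, the annualization factor, and the shape of the returns matrix are illustrative assumptions.

```python
import numpy as np

def annualized_sharpe(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio from daily returns (risk-free rate taken as zero)."""
    r = np.asarray(daily_returns, dtype=float)
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)

def is_oos_regression(is_metric, oos_metric):
    """OLS regression of an OOS metric on the matching IS metric.

    Returns (slope, intercept); a slope near zero would indicate the
    backtest statistic has little predictive value out of sample.
    """
    slope, intercept = np.polyfit(is_metric, oos_metric, deg=1)
    return slope, intercept

def top_n_equal_weight(oos_daily_returns, predicted_oos_sharpe, n=10):
    """Equal-weight daily returns of the n strategies with highest predicted OOS Sharpe.

    oos_daily_returns: array of shape (num_days, num_strategies).
    predicted_oos_sharpe: one prediction per strategy from an IS-fitted model.
    """
    picks = np.argsort(predicted_oos_sharpe)[::-1][:n]
    return oos_daily_returns[:, picks].mean(axis=1)
```

For example, with `predicted_oos_sharpe = [0.1, 2.0, 0.5]` and `n=2`, the portfolio averages the daily returns of strategies 1 and 2 only. In the paper's setting, the predictions come from the multivariate machine learning models fit on IS statistics; here they are simply an input array.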