Are technical rules applied to pairs trading attractive after correcting for data snooping bias? In their March 2018 paper entitled “Pairs Trading, Technical Analysis and Data Snooping: Mean Reversion vs Momentum”, Ioannis Psaradellis, Jason Laws, Athanasios Pantelous and Georgios Sermpinis test a variety of technical trading rules for long-short trading of 15 commodity futures, equity index and currency pairs (all versus the U.S. dollar) that are frequently used on trading websites or offered by financial market firms. Specifically, they test 18,412 trend-following/momentum and contrarian/mean-reversion rules often applied by traders to past daily pair return spreads. They consider average excess return (relative to the short-term interest rate) and Sharpe ratio as key metrics for rule selection and performance measurement. They use False Discovery Rate (FDR) to control for data snooping bias, such that 90% of the equally weighted best rules in FDR-corrected portfolios significantly outperform the benchmark. Most tests are in-sample. To test robustness of findings, they: (1) account for one-way trading frictions ranging from 0.02% to 0.05% across assets; (2) consider five subperiods to test consistency over time; and (3) perform out-of-sample tests using the first part of each subperiod to select the best rules and roughly the last year to measure performance of these rules out-of-sample. Using daily prices of specified assets and daily short-term interest rates for selected currencies from January 1990 (late March 2006 for ethanol) through mid-December 2016, they find that:
- Over the full sample period for FDR-corrected portfolios of rules formed by maximizing average excess return, only five of 15 pairs are predictable in-sample via technical analysis with moderate to high statistical significance. Of these, three are commodity futures pairs. The Brent-WTI crude oil futures pair is the most predictable.
- However, for FDR-corrected portfolios of rules formed by maximizing Sharpe ratio, 14 of 15 pairs are predictable in-sample via technical analysis with moderate to high statistical significance. Commodities are the most uniformly predictable, with the Brent-WTI crude oil futures pair again most predictable.
- For most in-sample cases, breakeven trading frictions far exceed modeled frictions, especially for FDR-corrected portfolios formed based on Sharpe ratio (which generally trade much less frequently than those formed using average excess return).
- Among rules selected using Sharpe ratio, the best-performing in-sample rules are:
  - For commodity futures pairs: relative strength index (two pairs) and Bollinger Bands (two pairs).
  - For equity index pairs: Commodity Channel Index (four pairs), channel breakout (one pair) and Bollinger Bands (one pair).
  - For currency pairs: relative strength index (two pairs), moving average (one pair), channel breakout (one pair) and support and resistance (one pair).
- In-sample subperiod analysis indicates that profitability of technical pairs trading declines over time overall, though not for every pair.
- Out-of-sample tests of FDR-corrected portfolios based on lagged in-sample Sharpe ratio and of single best-performing lagged in-sample rules generate some attractive but also many unattractive pairs trading outcomes. For example, FDR-corrected portfolios for 2016 have positive Sharpe ratios for only four of 15 pairs.
- Multi-pair portfolios that equally weight commodity futures, equity index or currency pairs, or combine all three asset classes have spotty out-of-sample results. For example, all such portfolios have negative Sharpe ratios for 2016.
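To make the metrics in the findings above concrete, here is a minimal Python sketch of average excess return, Sharpe ratio and breakeven one-way trading friction for a single rule on a single pair. It is not the paper's code. The function names, the 252-day annualization factor, the convention of charging friction on each unit change in position and the toy inputs are all assumptions made for illustration.

```python
import math
from statistics import mean, stdev

TRADING_DAYS = 252  # assumed annualization factor, not taken from the paper


def net_excess_returns(spread_returns, positions, risk_free, cost):
    """Daily excess returns of one rule after one-way frictions.

    positions[t] is the position (-1, 0 or +1) held over day t, assumed
    to be set from information available before day t. Each unit change
    in position is charged `cost` (a fraction, e.g. 0.0005 for 0.05%).
    """
    prev = 0
    out = []
    for r, p, rf in zip(spread_returns, positions, risk_free):
        out.append(p * r - rf - cost * abs(p - prev))
        prev = p
    return out


def annualized_sharpe(excess):
    """Annualized Sharpe ratio of a series of daily excess returns."""
    s = stdev(excess)
    return math.sqrt(TRADING_DAYS) * mean(excess) / s if s > 0 else float("nan")


def breakeven_cost(spread_returns, positions, risk_free):
    """One-way friction at which the average net excess return is zero."""
    gross = net_excess_returns(spread_returns, positions, risk_free, cost=0.0)
    prev = 0
    turnover = 0
    for p in positions:
        turnover += abs(p - prev)
        prev = p
    return sum(gross) / turnover if turnover > 0 else float("nan")


# Toy inputs, invented purely for illustration.
spread = [0.010, -0.020, 0.015, 0.005]
pos = [1, -1, 1, 1]
rf = [0.0, 0.0, 0.0, 0.0]

be = breakeven_cost(spread, pos, rf)
net = net_excess_returns(spread, pos, rf, cost=0.0005)
print(be, mean(net), annualized_sharpe(net))
```

Because average net return is linear in the friction level under this convention, the breakeven friction is total gross excess return divided by total turnover. A rule that trades less often therefore tolerates a higher friction for the same gross return, which is consistent with the observation above about Sharpe-selected portfolios.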
In summary, evidence indicates that out-of-sample performance of technical pairs trading rules applied to popular commodity futures, equity index and currency pairs, after correction for data snooping bias, is not generally attractive.
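The out-of-sample design described above (select the best rule on the first part of a subperiod, then measure it over roughly the last year) can be sketched as follows. This is a simplified illustration, not the authors' procedure: it selects a single best rule by in-sample Sharpe ratio, and the function names, the `holdout` parameter and the toy data are hypothetical.

```python
import math
from statistics import mean, stdev


def sharpe(xs, periods=252):
    """Annualized Sharpe ratio; periods=252 is an assumed convention."""
    s = stdev(xs)
    return math.sqrt(periods) * mean(xs) / s if s > 0 else float("-inf")


def out_of_sample_sharpe(returns_by_rule, holdout=252):
    """Pick the rule with the best in-sample Sharpe, score it on the holdout.

    returns_by_rule maps a rule name to its list of daily excess returns.
    All lists are assumed to have the same length, with at least two
    observations on each side of the split.
    """
    in_sample = {k: v[:-holdout] for k, v in returns_by_rule.items()}
    best = max(in_sample, key=lambda k: sharpe(in_sample[k]))
    return best, sharpe(returns_by_rule[best][-holdout:])


# Toy data, invented for illustration: rule "A" looks best in-sample
# but loses money over the two-day holdout.
toy = {
    "A": [0.01, 0.02, 0.01, 0.02, -0.01, -0.02],
    "B": [-0.01, -0.02, -0.01, -0.02, 0.01, 0.03],
}
best_rule, oos = out_of_sample_sharpe(toy, holdout=2)
print(best_rule, oos)
```

The toy case is constructed so that the in-sample winner disappoints out-of-sample, the pattern the summary above describes for many pairs.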
Cautions regarding findings include:
- Overall, the tone of the paper seems much more positive than the quantitative findings warrant.
- Assumed levels of trading frictions are those “normal” for institutional investors. Trading signals may tend to occur during abnormal bid-ask spread conditions. Also, non-institutional investors may bear higher costs.
- Iterative testing of the full range of technical rules across pairs to identify the best snooping-corrected rules for out-of-sample use may be costly. Such testing is beyond the reach of most investors, who would bear fees for delegating pairs portfolio maintenance to a fund manager.
- Snooping bias correction addresses selection of rules for each pair, but appears not to address consideration of: (1) many pairs with potentially interrelated return spreads; and, (2) alternative portfolio formation methods considered in the paper. Nor does it account for inherited snooping bias derived from using “pairs being frequently advertised by trading websites or launched by financial market companies.”
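For readers unfamiliar with FDR control, the sketch below shows the general idea of filtering rules for one pair at a 10% false discovery rate and then equally weighting the survivors. It uses the Benjamini-Hochberg step-up procedure as a stand-in, since this summary does not specify the paper's FDR estimator. The p-values and returns are invented, and the function names are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.10):
    """Indices of rules kept when controlling the FDR at level `fdr`.

    p_values[i] is assumed to be the p-value of a test that rule i
    outperforms the benchmark (for example, from a bootstrap).
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= fdr * rank / m:
            cutoff = rank  # keep the largest rank that passes
    return sorted(order[:cutoff])


def equal_weight(returns_by_rule, kept):
    """Daily returns of an equally weighted portfolio of the kept rules."""
    if not kept:
        return []
    days = len(returns_by_rule[kept[0]])
    return [sum(returns_by_rule[i][t] for i in kept) / len(kept)
            for t in range(days)]


# Toy inputs, invented for illustration: five rules, two days of returns.
p_vals = [0.001, 0.04, 0.03, 0.5, 0.9]
rets = [[0.01, 0.00], [0.03, 0.02], [0.02, 0.01], [0.05, -0.04], [-0.01, 0.00]]

kept = benjamini_hochberg(p_vals, fdr=0.10)
portfolio = equal_weight(rets, kept)
print(kept, portfolio)
```

Note that this filter is applied to the rules for one pair at a time. It does nothing about the multiplicity across pairs or across portfolio formation methods raised in the caution above.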