Is the conventional linear factor model, composed of a few presumably independent predictors, the best (or even a good) way to model differences in returns across assets? In the December 2019 update of their paper entitled “The Cross-Section of Returns: A Non-Parametric Approach”, Enoch Cheng and Clemens Struck compare the predictive power of conventional linear models and less presumptive tree-based methods. The latter accommodate multivariate interactions and non-linearities across all predictors. They consider two linear and two tree-based methods with parameter settings commonly used in other studies:
1a. Logit – a logistic regression model (linear in the predictors) including all factors.
1b. LASSO – a linear regression model with a shrinkage penalty that sets coefficients (betas) to zero for predictors that add no information, thereby discarding them and acting as a variable selection tool.
2a. Bagged regression trees – bootstrapping to create different samples from the original data, growing an individual tree on each and combining predictions of individual trees by a simple majority vote.
2b. Boosted regression trees – a modification of bagging in which resampling and tree growing take place sequentially, with each new bootstrap sample adjusted to improve the prediction accuracy of the forest built so far.
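The contrast between a linear fit and the two tree ensembles can be illustrated with a deliberately tiny sketch. It assumes a single predictor and depth-1 trees ("stumps") rather than the full trees the paper grows, averages bagged predictions (the regression analogue of a vote), and uses gradient boosting on residuals, one common form of boosting that may differ from the paper's resampling-based variant. An ordinary least squares line stands in for the linear models; it is neither Logit nor LASSO. All function names are hypothetical and the data are made up.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree (a 'stump') to one predictor by
    choosing the split threshold that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not left or not right:
            continue  # a split must leave observations on both sides
        lm, rm = mean(left), mean(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, threshold, lm, rm)
    if best is None:  # no valid split (all xs equal): predict the mean
        m = mean(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def bagged_stumps(xs, ys, n_trees=50, seed=0):
    """Bagging: fit one stump per bootstrap sample, then average predictions."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(tree(x) for tree in trees)


def boosted_stumps(xs, ys, n_trees=50, learning_rate=0.1):
    """Gradient boosting (squared loss): each new stump is fit to the
    residuals that the ensemble built so far still gets wrong."""
    base = mean(ys)
    resid = [y - base for y in ys]
    trees = []
    for _ in range(n_trees):
        tree = fit_stump(xs, resid)
        trees.append(tree)
        resid = [r - learning_rate * tree(x) for x, r in zip(xs, resid)]
    return lambda x: base + learning_rate * sum(tree(x) for tree in trees)


def ols_line(xs, ys):
    """Linear benchmark: ordinary least squares with one predictor."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)


def sse(model, xs, ys):
    """Sum of squared prediction errors."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))


# Toy, made-up data with a non-linear (step-shaped) relationship.
xs = [i / 10 for i in range(-20, 21)]
ys = [1.0 if x > 0 else 0.0 for x in xs]

linear = ols_line(xs, ys)
bagged = bagged_stumps(xs, ys)
boosted = boosted_stumps(xs, ys)
for name, model in [("linear", linear), ("bagged", bagged), ("boosted", boosted)]:
    print(f"{name:8s} in-sample SSE = {sse(model, xs, ys):.4f}")
```

On this step-shaped relationship the tree ensembles fit far more closely than the straight line, which illustrates the non-linearity point. Real tree models split on many predictors and grow deeper, which is what lets them also capture interactions.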
Specifically, they measure relationships between 59 predictor variables and next-month (4-week) returns for a universe of 28 liquid commodity futures series. This asset universe has low trading costs and avoids survivorship bias. They use nearest, second- and third-month contracts, the first for trading and the latter two only for constructing signals. They generally roll contracts 10 days before the last trade date. The 59 predictors include time series (intrinsic or absolute) momentum variants, moving average variants, volatility variants, value metrics, miscellaneous variables, dummies for calendar months and dummies for each of the 28 commodity contract series. They consider long-short portfolios that are long the top half, top five or top three assets and short the bottom half, bottom five or bottom three assets, ranked by expected return. Their break point between in-sample and out-of-sample testing is the end of 2013. Using monthly data for the 28 commodity contract series and the 59 predictors during January 1987 through October 2019, they find that:
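The long-short portfolio construction can be sketched as follows, assuming equal weights within each leg, equal capital on the long and short sides, and gross returns with no costs. The summary does not spell out the weighting scheme, so these are assumptions, and `long_short_return` is a hypothetical helper. With 28 assets, `k = 14` corresponds to top half-bottom half, while `k = 5` and `k = 3` give the other two variants.

```python
def long_short_return(expected, realized, k):
    """Gross one-period return of an equal-weight long-short portfolio:
    long the k assets with the highest expected return, short the k lowest.

    expected, realized: dicts mapping asset name -> return for the period.
    """
    if not 1 <= k <= len(expected) // 2:
        raise ValueError("k must be between 1 and half the number of assets")
    ranked = sorted(expected, key=expected.get, reverse=True)
    longs, shorts = ranked[:k], ranked[-k:]
    long_leg = sum(realized[a] for a in longs) / k
    short_leg = sum(realized[a] for a in shorts) / k
    return long_leg - short_leg, longs, shorts


# Hypothetical six-asset example (names and numbers are invented).
expected = {"A": 0.02, "B": 0.01, "C": -0.01, "D": 0.03, "E": -0.02, "F": 0.00}
realized = {"A": 0.05, "B": -0.01, "C": 0.02, "D": 0.01, "E": -0.03, "F": 0.00}

gross, longs, shorts = long_short_return(expected, realized, k=2)
print(longs, shorts, round(gross, 4))
```

In a backtest, `expected` would come from a model fitted only on data before the in-sample/out-of-sample break point, and the function would be called once per rebalancing period.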
- Compared to linear factor models, tree-based methods boost in-sample and out-of-sample power to predict next-month gross return variation. For example, moving from Logit to bagged trees for top three-bottom three portfolios raises the share of return variation predicted from 0.58% to 56.2% in-sample and from 0.33% to 3.74% out-of-sample.
- Findings are qualitatively robust to: changing portfolio rebalancing frequency and forecast horizon from four weeks to two weeks; changing the number of long and short positions in the portfolio; modifying the bootstrapping approach; changing the futures contract roll rule; adjusting the in-sample/out-of-sample break point; altering the portfolio rebalancing rule; and normalizing predictors on a zero-to-one scale.
- However, even with tree-based methods, over 96% of out-of-sample return variation remains unpredictable, and refinements that raise in-sample predictability from 56.2% toward 100% are unlikely to increase out-of-sample predictability much further.
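The summary reports in-sample and out-of-sample shares of "return variation" predicted without defining the statistic. Assuming it is a conventional R² (one minus residual over total sum of squares), the arithmetic behind "over 96% remains unpredictable" can be sketched as below. The `benchmark` argument is an assumption added for illustration: out-of-sample R² is often measured against the training-period mean, in which case it can be negative.

```python
def r_squared(actual, predicted, benchmark=None):
    """Share of return variation explained by predictions:
    1 - SS_residual / SS_total.

    benchmark: the constant forecast SS_total is measured against.
    Defaults to the mean of `actual` (the usual in-sample R^2); for an
    out-of-sample test one might instead pass the training-period mean.
    """
    if benchmark is None:
        benchmark = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - benchmark) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def unexplained_share(r2):
    """Fraction of variation left unpredictable."""
    return 1.0 - r2


# The out-of-sample figure quoted for bagged trees (3.74%) leaves
# 1 - 0.0374 of return variation unexplained.
print(f"{unexplained_share(0.0374):.2%}")
```

Perfect forecasts give an R² of 1, and always forecasting the benchmark mean gives 0, so the 56.2% in-sample and 3.74% out-of-sample figures sit between those extremes.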
In summary, evidence indicates that tree-based methods accommodating multivariate interactions and non-linearities in relationships offer considerable improvement over linear factor models for pricing different assets.
Cautions regarding findings include:
- Findings are based on gross, not net, returns. While futures trading may be low-cost, the costs of rebalancing futures portfolios every four weeks could differ materially between linear and tree-based approaches. The paper does not address differences in portfolio turnover.
- Testing multiple models on the same data introduces data snooping bias, such that the best result overstates predictability.
- Execution of tree-based methods would have been computationally problematic for much of the sample period. Had such methods been readily available, they might have affected historical asset pricing.
- All methods described are beyond the reach of most investors, who would bear fees for delegating to a fund manager.
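The turnover gap flagged in the first caution can be made concrete with a sketch like the one below: it measures how much of the book is traded at a rebalance and deducts a proportional cost from the gross return. The flat `cost_rate`, the equal-weight scheme and all function names are assumptions for illustration; the paper reports no turnover or cost figures.

```python
def long_short_weights(longs, shorts):
    """Equal-weight long-short weights: +1/k per long, -1/k per short
    (one unit of capital on each side). Assumes no name is in both legs."""
    w = {a: 1.0 / len(longs) for a in longs}
    w.update({a: -1.0 / len(shorts) for a in shorts})
    return w


def traded_fraction(old, new):
    """Sum of absolute weight changes at a rebalance (names absent
    from a portfolio have weight zero)."""
    assets = set(old) | set(new)
    return sum(abs(new.get(a, 0.0) - old.get(a, 0.0)) for a in assets)


def net_return(gross, old, new, cost_rate):
    """Gross return minus a proportional cost on everything traded.
    cost_rate is a hypothetical cost per unit of notional traded."""
    return gross - cost_rate * traded_fraction(old, new)


# Hypothetical rebalance: B leaves the long leg, C joins it;
# F leaves the short leg, D joins it.
old = long_short_weights(["A", "B"], ["E", "F"])
new = long_short_weights(["A", "C"], ["D", "E"])
print(traded_fraction(old, new))
print(net_return(0.035, old, new, cost_rate=0.001))
```

Tracking `traded_fraction` period by period for the linear and tree-based portfolios would show whether one approach's gross advantage survives trading costs.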