Open Source Stock Predictor Data and Code

Steve LeCompte | July 6, 2020 | Posted in: Big Ideas, Equity Premium

Are published studies that predict higher returns for some U.S. stocks and lower for others based on firm accounting, stock trading and other data reproducible? In their May 2020 paper entitled “Open Source Cross-Sectional Asset Pricing”, Andrew Chen and Tom Zimmermann make available data and code that reproduce many published cross-sectional stock return predictors, allowing other researchers to modify and extend past studies. They commit to annual updates of their study. Defining statistical significance as achieving at least 95% confidence in predictive power, they include:

180 clear predictors that exhibit statistical significance in original studies and are easily reproducible.
30 likely predictors that exhibit statistical significance in original studies but are not precisely reproducible.
315 additional predictors covered in past studies that were not clearly tested or failed, or are variations of these predictors. They further extend this group by separately testing 1-month, 3-month, 6-month and 12-month portfolio reformation frequencies (1,260 total tests).
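
As a rough illustration of the significance criterion above, the sketch below tests whether the mean monthly return of a long-short portfolio differs from zero. It assumes a simple t-statistic on mean returns and a two-sided normal-approximation cutoff of 1.96 for 95% confidence. The function names and the choice of test are illustrative assumptions, not necessarily the procedure the paper uses.

```python
import math
import statistics


def t_statistic(monthly_returns):
    """t-statistic for the null hypothesis that the mean monthly return is zero."""
    n = len(monthly_returns)
    mean = statistics.fmean(monthly_returns)
    stdev = statistics.stdev(monthly_returns)  # sample standard deviation
    return mean / (stdev / math.sqrt(n))


def is_significant(monthly_returns, threshold=1.96):
    """True if |t| meets the cutoff (1.96 is an assumed 95% two-sided normal cutoff)."""
    return abs(t_statistic(monthly_returns)) >= threshold
```

With real data, `monthly_returns` would be the series of long-short portfolio returns over the original study's sample period.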

They compute all predictors on a monthly basis and create for each a long-short portfolio based on the specifications and the sample period of its original study. They check predictive power of each using data available at the end of each month to evaluate long-short portfolio returns the next month. They assume a 6-month lag for availability of annual accounting data and a 1-quarter lag for quarterly accounting data. They make no attempt to account for portfolio reformation frictions or to winnow predictors based on similarity. Using data and sample periods for U.S. firms/stocks as specified in original published studies as described above, they find that:

For the 180 clear predictors from prior studies, 98% (176 of 180) of reproductions confirm significance based on gross returns.
For the 30 likely predictors from prior studies, 90% (27 of 30) of reproductions exhibit significance based on gross returns.
Among these 210 predictors, gross performance declines systematically as the rebalancing frequency decreases from monthly to annually, with median gross monthly return declining from 0.65% to 0.40%. As an offset, portfolio reformation frictions should also decrease with lower turnover.
Limiting the universe to the most liquid stocks also lowers gross performance of predictors, with median monthly decreases up to 0.35% depending on the liquidity metric applied.

In summary, published findings of significant differences in gross returns among U.S. stocks sorted on firm-level return predictors are largely reproducible.
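
The monthly test procedure described earlier (form a long-short portfolio from data available at month end, then measure its return the next month) can be sketched as follows. This is a minimal sketch under stated assumptions: equal weighting and a top/bottom 20% split are hypothetical choices, since the paper follows each original study's own specifications. Months are represented as integer indices, and all function names are illustrative.

```python
def is_available(period_end, formation_month, lag_months):
    """True if accounting data for period_end may be used at formation_month.

    Months are integer indices (for example, year * 12 + month).
    The summary above states lags of 6 months for annual data and
    one quarter (3 months) for quarterly data.
    """
    return period_end + lag_months <= formation_month


def long_short_return(signals, next_month_returns, fraction=0.2):
    """Gross next-month return of an equal-weighted long-short portfolio.

    signals: dict of ticker -> predictor value known at month end.
    next_month_returns: dict of ticker -> return over the following month.
    fraction: share of stocks in each leg (assumed; studies differ).
    """
    tickers = sorted(
        (t for t in signals if t in next_month_returns),
        key=lambda t: signals[t],
    )
    k = max(1, int(len(tickers) * fraction))
    short_leg = tickers[:k]   # lowest predictor values
    long_leg = tickers[-k:]   # highest predictor values
    long_ret = sum(next_month_returns[t] for t in long_leg) / k
    short_ret = sum(next_month_returns[t] for t in short_leg) / k
    return long_ret - short_ret  # gross: ignores trading frictions
```

Repeating `long_short_return` for every month of the original sample period yields the return series on which predictive power is evaluated.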

Cautions regarding findings include:

All analyses are gross, not net, and portfolio reformation frictions, shorting costs and shorting constraints vary widely. Using estimates of net predictor performance instead may dramatically alter findings. Study reproductions generally involve sample periods with trading frictions much higher than those that currently exist. For perspective, see “Trading Frictions Over the Long Run”.
Publication of a stock return anomaly may diminish its subsequent performance due to trader exploitation. See “Emptying the Equity Factor Zoo?”.
Data snooping bias for well over 1,000 predictor variations is substantial, such that stringent control for this bias may eliminate significance for many predictors. For perspectives on this caution, see results of this search.
Moreover, there may be additional inherited snooping bias due to experimentation with variables and parameters by authors of original studies.
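
To give a sense of how stringent a snooping control can be, the sketch below computes a Bonferroni-adjusted significance cutoff. Bonferroni is only one simple, conservative adjustment, chosen here for illustration. It is not necessarily the method used in the paper or in the items referenced above, and the function name is hypothetical.

```python
from statistics import NormalDist


def bonferroni_z_threshold(n_tests, alpha=0.05):
    """Two-sided z cutoff after dividing alpha equally across n_tests."""
    per_test_alpha = alpha / n_tests
    return NormalDist().inv_cdf(1 - per_test_alpha / 2)


single = bonferroni_z_threshold(1)     # the familiar cutoff near 1.96
many = bonferroni_z_threshold(1260)    # 1,260 tests, as counted above
```

Under this adjustment the cutoff for 1,260 tests rises above 4, so a predictor that clears 1.96 in isolation could easily fail once the number of variations tried is taken into account.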
