How can the financial markets research community shed biases that exaggerate the predictability and associated expected performance of investment strategies? In his January 2017 paper entitled “The Scientific Outlook in Financial Economics”, Campbell Harvey assesses the conventional approach to empirical research in financial economics, sharing insights from other fields. He focuses on the meaning and limitations of the p-value and on various approaches to p-hacking (manipulating models/data to increase statistical significance, as in data snooping). He then outlines and advocates a Bayesian alternative approach to research. Based on research metadata and examples, he concludes that:
- The combination of unreported tests, lack of adjustment for multiple testing, and direct/indirect p-hacking undermines the future usefulness of many published research findings.
- Journals across scientific fields are unlikely, and increasingly so, to publish negative findings, amplifying the effect of study-by-study p-hacking/data snooping.
- Such negative feedback from publishers discourages researchers from submitting articles with weak results and, further, encourages them to engage in p-hacking by (for example):
  - Running thousands of trials and reporting only the most significant one.
  - Trying a number of different statistical methods and reporting only the strongest result.
  - Choosing a dataset that favors the desired result.
  - Excluding/manipulating parts of a dataset unfavorable to statistical significance (such as by standardizing, winsorizing or excluding outliers).
- Concerns about misunderstanding and misuse of p-values recently prompted the American Statistical Association to issue a statement intended to improve the conduct and interpretation of quantitative science.
- Increasing the threshold for statistical significance (such as requiring t-statistic > 3 rather than > 2) may be necessary but is not sufficient. If the effect under study is rare, even a t > 3 threshold admits a large number of false positives (see the first sketch after this list).
- Research requires injecting prior beliefs into test design, as in Bayesian (versus frequentist) inference, because what really counts is a test's ability to change belief.
- Bayesian inference involves a Bayes factor, the ratio of the likelihood of the observed outcome under the null hypothesis to its likelihood under the competing hypothesis under study. Since posterior odds of the null equal the Bayes factor times prior odds, this factor quantifies the shift in belief produced by new test results.
- The minimum Bayes factor, the lower bound of Bayes factors over all plausible alternative hypotheses, is independent of the design of the competing hypothesis. It quantifies only the maximum possible shift in belief from new test results, but it avoids disagreement about competing hypothesis design and the complexities of full Bayesian statistics (see the second sketch after this list).
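To make the rare-effect arithmetic concrete, here is a minimal Python sketch of the standard false discovery calculation. The 1% prior probability of a true effect and the 50% test power are illustrative assumptions (not figures from the paper), and a normal approximation stands in for the t-distribution.

```python
import math

def false_discovery_rate(t_threshold, prior_true, power):
    """Share of 'significant' results that are actually false positives."""
    # Two-sided tail probability under the null
    # (normal approximation to the t-statistic).
    alpha = math.erfc(t_threshold / math.sqrt(2))
    false_positives = alpha * (1 - prior_true)  # nulls that clear the bar
    true_positives = power * prior_true         # real effects that clear it
    return false_positives / (false_positives + true_positives)

# Suppose only 1% of tested strategies embody a real effect,
# and tests detect a real effect half the time (50% power).
for t in (2.0, 3.0):
    fdr = false_discovery_rate(t, prior_true=0.01, power=0.5)
    print(f"t > {t:.0f}: about {fdr:.0%} of 'discoveries' are false")
```

With these assumptions, raising the bar from t > 2 to t > 3 cuts the share of false discoveries from about 90% to about 35%: better, but far from trustworthy, which is the sense in which a higher threshold is necessary but not sufficient.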
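The second sketch shows how a minimum Bayes factor converts a t-statistic into a floor on the posterior probability of the null. It assumes the familiar exp(-t²/2) lower bound; the 50% and 90% priors are illustrative assumptions, not values from the paper.

```python
import math

def min_bayes_factor(t_stat):
    """Lower bound on the Bayes factor P(data|H0) / P(data|H1)
    over a broad class of alternatives (the exp(-t^2/2) bound)."""
    return math.exp(-t_stat ** 2 / 2)

def posterior_prob_null(t_stat, prior_prob_null):
    """Update belief in the null: posterior odds = Bayes factor x prior odds."""
    prior_odds = prior_prob_null / (1 - prior_prob_null)
    posterior_odds = min_bayes_factor(t_stat) * prior_odds
    # Because the Bayes factor is a lower bound, this is a floor
    # on the posterior probability that the null is true.
    return posterior_odds / (1 + posterior_odds)

# A conventionally 'significant' t-statistic of 2.0 (p ~ 0.05):
for prior in (0.5, 0.9):
    post = posterior_prob_null(2.0, prior)
    print(f"prior P(H0) = {prior:.0%} -> posterior P(H0) >= {post:.0%}")
```

Even under the alternative most favorable to the effect, a t-statistic of 2 leaves a skeptic who starts at 90% confidence in the null still believing the null is more likely than not (posterior of at least about 55%), illustrating why a marginally significant p-value may barely change belief.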
In summary, much published research is materially biased/unreliable, because research and publication practices serve non-scientific agendas and ignore impact on belief.
Cautions regarding conclusions include:
- Conventional publishing and statistical practices are firmly rooted in habit, simplicity, self-interest and reluctance to state prior beliefs explicitly, and are therefore likely to persist.
- As noted in the paper, Bayesian inference using the minimum Bayes factor retains some weaknesses of conventional (frequentist) statistical practices and may be too hard on the null hypothesis.