Might purveyors of trading strategies be presenting performance results biased by stopping tracked positions when they are falsely successful? In other words, might they be choosing lucky closing conditions for reported positions? In the December 2018 revision of their paper entitled “p-Hacking and False Discovery in A/B Testing”, Ron Berman, Leonid Pekelis, Aisling Scott and Christophe Van den Bulte investigate whether online A/B experimenters bias results by stopping monitored commercial (marketing) experiments based on the latest p-value. They hypothesize that such a practice may arise from: (1) poor training in statistics; (2) self-deception motivated by desire for success; or (3) deliberate deception for selling purposes. They employ regression discontinuity analysis to estimate whether reaching a particular p-value causes experimenters to end their tests. Using data from 2,101 online A/B experiments with daily tracking of results during 2014, they find that:
- About 73% of experimenters stop their experiments just as a positive effect reaches 90% confidence.
- This optional stopping increases the likelihood of a false positive finding from 33% to 40% (see the simulation sketch after this list).
- Improper stopping behavior is more likely when experimenters observe small effects.
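To illustrate the mechanism behind these findings, here is a minimal Monte Carlo sketch in Python. It is not the paper's setup or data; the conversion rate, sample sizes, 30-day horizon and one-sided 90% threshold are illustrative assumptions. It compares an analyst who tests once at a fixed horizon with one who checks daily and stops at the first crossing of 90% confidence, when in fact the two arms have identical conversion rates.

```python
# Minimal sketch (illustrative parameters, not the paper's data): two A/B arms with
# identical true conversion rates, so any "success" is a false positive.
# A fixed-horizon analyst tests once after 30 days; a peeking analyst stops and
# declares success the first day a one-sided z-test reaches 90% confidence.
import numpy as np

rng = np.random.default_rng(0)
n_experiments = 2000          # simulated A/B tests, none with a true effect
days = 30                     # maximum experiment length
visitors_per_day = 500        # per arm, per day
true_rate = 0.05              # identical conversion rate for both arms
Z_90 = 1.2816                 # one-sided 90% confidence critical value (standard normal)

def z_stat(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-statistic for the lift of arm B over arm A."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (conv_b / n_b - conv_a / n_a) / se if se > 0 else 0.0

false_pos_fixed = 0
false_pos_peeking = 0
for _ in range(n_experiments):
    conv_a = conv_b = n_a = n_b = 0
    stopped_early = False
    for _day in range(days):
        conv_a += rng.binomial(visitors_per_day, true_rate)
        conv_b += rng.binomial(visitors_per_day, true_rate)
        n_a += visitors_per_day
        n_b += visitors_per_day
        # Peeking analyst: stop at the first crossing of the 90% threshold.
        if not stopped_early and z_stat(conv_a, n_a, conv_b, n_b) > Z_90:
            stopped_early = True
    false_pos_peeking += stopped_early
    # Fixed-horizon analyst: a single test at the end of the full horizon.
    false_pos_fixed += z_stat(conv_a, n_a, conv_b, n_b) > Z_90

print(f"Fixed-horizon false positive rate:     {false_pos_fixed / n_experiments:.1%}")
print(f"Optional-stopping false positive rate: {false_pos_peeking / n_experiments:.1%}")
```

In this setup the fixed-horizon analyst's false positive rate stays near the nominal 10%, while daily peeking roughly triples it. The paper's 33% and 40% figures come from its own data and decision model, so the rates produced by this sketch differ; the point is qualitative.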
In summary, evidence indicates that experimenters closely tracking outputs may bias results by optionally ending tests after lucky streaks.
This finding has some parallels with purveyors of trading strategies who report performance only for closed positions. They may be closing positions after lucky streaks, with open positions representing a hidden bow wave of lesser luck. See “Chapter 6: Modeling at the Portfolio Level”.
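By way of illustration only (not based on any actual track record), here is a minimal sketch of that closed-position bias, assuming positions that follow zero-drift random walks, so no position has any true edge. The purveyor closes and reports a position at its first +5% gain and otherwise leaves it open; the position count, horizon, volatility and profit target are arbitrary illustrative parameters.

```python
# Minimal sketch (illustrative parameters, no true edge): each position follows a
# zero-drift random walk. Closing and reporting at the first +5% gain makes the
# reported (closed) trades look profitable, while still-open positions carry the
# offsetting unrealized losses.
import numpy as np

rng = np.random.default_rng(1)
n_positions = 1000
days = 250
daily_vol = 0.01              # 1% daily return volatility, zero expected return
profit_target = 0.05          # close and report at a +5% cumulative gain

closed_returns = []           # reported: locked in after a lucky streak
open_returns = []             # hidden: still open, marked to market at day 250
for _ in range(n_positions):
    cum = np.cumprod(1 + rng.normal(0.0, daily_vol, days)) - 1  # cumulative return path
    hits = np.flatnonzero(cum >= profit_target)
    if hits.size > 0:
        closed_returns.append(cum[hits[0]])
    else:
        open_returns.append(cum[-1])

print(f"Closed (reported) positions:   {len(closed_returns)}, "
      f"mean return {np.mean(closed_returns):+.1%}")
print(f"Open (unreported) positions:   {len(open_returns)}, "
      f"mean return {np.mean(open_returns):+.1%}")
print(f"All positions together:        mean return "
      f"{np.mean(closed_returns + open_returns):+.1%}")
```

Reported closed trades average a gain of roughly the profit target, open positions average a loss, and the combined book averages about zero, consistent with the absence of any edge.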
This paper addresses marketing experiments, not investing ones; parallels with investing are inferred.