Is there a way that asset managers can share knowledge/data across proprietary boundaries with many researchers to advance development of investment strategies? In their September 2019 paper entitled “Crowdsourced Investment Research through Tournaments”, Marcos Lopez de Prado and Frank Fabozzi describe highly structured tournaments as a crowdsourcing paradigm for investment research. In each such tournament, the organizer poses one investment challenge as a forecasting problem and provides abstracted and obfuscated data. Contestants pay an entry fee, develop models and provide forecasts, retaining model ownership by running calculations on their own hardware/software. Based on this hypothetical tournament setup and their experience, they conclude that:
- The asset management industry currently employs a “silo” approach (multiple in-house researchers working independently) to develop investment strategies, with four attendant flaws:
  1. Data scientists without asset management industry expertise do not participate.
  2. Budgetary constraints and confidentiality restrictions prevent asset managers from assembling large (diversified) research teams.
  3. Due to flaws 1 and 2, asset managers cannot efficiently monetize data and computer hardware/software.
  4. The process incentivizes backtest overfitting.
- A (costly) alternative obviating these flaws is the “assembly line,” with members of a large team of researchers specializing in data curation, feature analysis, strategy, backtesting, deployment and portfolio oversight.
- An informal (undirected) alternative is crowdsourcing via a publicly shared backtesting platform, including data and computing resources. Outputs from the crowd are trades or portfolios based on a variety of strategies, of which respective developers retain ownership. This approach obviates flaws 2 and (perhaps too much) 3, but exacerbates flaw 4 via group snooping and survivorship bias.
- The tournament alternative addresses:
  - Flaws 1 and 2 by explicitly and carefully posing the investment problem as a forecasting problem that all data scientists can tackle (by making available an abstracted data set and specifying training, validation and test subsets).
  - Flaw 3 by obfuscating the abstracted data such that contestants can use the data only to win the tournament, thereby preserving the value of the original data as intellectual property (see the illustrative sketch after this list).
  - Flaw 4 by: (1) designing the contest to pay winners only for true out-of-sample performance; and (2) requiring that contestants stake their own money in the tournament.
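The paper does not prescribe a specific abstraction or obfuscation procedure. As a rough illustration only, the Python sketch below standardizes a hypothetical proprietary feature matrix, applies a secret random orthogonal rotation so that individual signals cannot be identified, and then specifies training, validation and test subsets in time order; all data, names and parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical raw data: rows are observations in time order,
# columns are proprietary signals; y is the target to forecast.
n_obs, n_features = 1_000, 20
X_raw = rng.normal(size=(n_obs, n_features))
y = rng.normal(size=n_obs)

# --- Abstraction/obfuscation (illustrative only) ---
# 1. Standardize each column so raw scales are not revealed.
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# 2. Apply a secret random orthogonal rotation so individual columns
#    no longer correspond to identifiable signals, while preserving
#    the information content available for modeling.
Q, _ = np.linalg.qr(rng.normal(size=(n_features, n_features)))
X_obf = X_std @ Q

# 3. Release only anonymous feature names.
feature_names = [f"feature_{i}" for i in range(n_features)]

# --- Specify training, validation and test subsets in time order ---
train_end = int(0.6 * n_obs)
valid_end = int(0.8 * n_obs)
X_train, y_train = X_obf[:train_end], y[:train_end]
X_valid, y_valid = X_obf[train_end:valid_end], y[train_end:valid_end]
X_test = X_obf[valid_end:]  # targets for the test span are withheld

# Contestants submit forecasts for X_test; the organizer scores them
# against the withheld targets and, ultimately, against live outcomes.
```

Under this kind of setup, contestants receive only the rotated features and anonymous names, so their forecasts remain useful to the organizer while the underlying signals stay proprietary.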
In summary, tournaments as described offer an alternative to silo, assembly line and backtesting platform approaches, incentivizing many data scientists without an investment background to contribute forecasts to an asset manager.
Cautions regarding conclusions include:
- Designing a tournament requires considerable, careful work.
- It is not obvious how tournaments would account for differing costs of implementation among contestant forecasting solutions.
- It is not obvious what levels of contestant entry fees and winner awards would support robust participation.