Many institutional investors are attempting to exploit alternative data (less structured and more obscure than traditional data) to boost portfolio performance, supporting a complex system of data collectors, aggregators and organizers. How do they approach this potential edge? In their October 2020 paper entitled “Alternative Data in Investment Management: Usage, Challenges and Valuation”, Gene Ekster and Petter Kolm describe the alternative data ecosystem. They identify and discuss obstacles and emerging best practices in applying alternative data for investing purposes. They illustrate potential effectiveness of alternative data methods via a healthcare industry example. Based on review of current alternative data examples/obstacles/practices and samples of daily medical purchasing activity by 778 U.S. healthcare facilities (2.6% of all such facilities) during 2015 through 2017, they conclude that:
- Examples of alternative data are consumer transactions, satellite imagery, vehicle movements, bills of lading, cargo locations, cell phone locations and social media extracts. Data may be aggregated/structured by an intermediary or raw/unstructured (requiring considerably more analysis by the investor, but offering exclusivity).
- Due to complexity of alternative data analysis, there are no public domain software solutions for mapping alternative data to stock tickers. Ticker mapping based on machine learning exhibits promise for rapid and scalable tagging.
- The instability of alternative data works against approximating missing observations via averaging or extrapolation. Alternative data models should instead accommodate missing data.
- It is difficult to assess and improve representativeness of alternative data samples due to lack of structure.
- One way to evaluate alternative data is to check for significant correlations with firm operating metrics and/or stock returns, after controlling for other known predictive factors. Investors mostly use firm operating metrics because they are less noisy than stock returns.
- Due to short histories of alternative data, investors often use less quantitative methods to assess value of alternative data, such as the (manual and therefore time-consuming) Golden Triangle event study methodology:
  - Identify time series changes (data events) in alternative data.
  - Find corroborating real-world events from public sources, including firm news releases, that qualitatively support meaningfulness of each alternative data event.
  - Find corroborating stock returns that quantitatively support meaningfulness of each alternative data event.
- In the example, modeling medical purchasing data at a device-facility-month level provides insights into sales performance of new products, reducing the mean absolute error of associated revenue predictions from 88% to 2.6%.
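As a rough illustration of the correlation check in the list above, the following Python sketch computes a partial correlation between an alternative data signal and a firm operating metric after removing the linear effect of one known predictive factor. It is not the authors' method. The function names, the single-control setup and all numbers are assumptions made up for illustration.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def residuals(y, x):
    """Residuals from a one-variable OLS regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def partial_correlation(signal, metric, control):
    """Correlation of signal and metric after removing the linear effect
    of one control factor from each (hypothetical helper)."""
    return pearson(residuals(signal, control), residuals(metric, control))


# Made-up quarterly observations, for illustration only.
control = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # known predictive factor
signal = [0.5, 1.8, 2.1, 4.4, 4.6, 6.3]    # alternative data signal
metric = [1.2, 2.5, 2.9, 4.9, 5.1, 6.8]    # firm operating metric

print(round(partial_correlation(signal, metric, control), 3))
```

A value near zero would suggest the signal adds little beyond the known factor. With histories as short as those typical of alternative data, any such estimate is noisy and would need a significance test before being relied on.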
In summary, benefits of exploiting alternative data for investing purposes are difficult to assess.
Cautions regarding conclusions include:
- The authors do not quantify costs or investment benefits of exploiting alternative data.
- As noted in the paper, representativeness/reliability of alternative data is suspect. Short sample periods exacerbate this concern.
- Use of alternative data is beyond the reach of most investors, who would bear fees for delegating to an investment/fund manager.