Objective research to aid investing decisions

Investing Expertise

Can analysts, experts and gurus really give you an investing/trading edge? Should you track the advice of as many as possible? Are there ways to tell good ones from bad ones? Recent research indicates that the average “expert” has little to offer individual investors/traders. Finding exceptional advisers is no easier than identifying outperforming stocks. Indiscriminately seeking the output of as many experts as possible is a waste of time. Learning what makes a good expert accurate is worthwhile.

Lookahead Bias in Large Language Model Training Data

Can large language models (LLMs) inject lookahead bias into backtests when LLM training samples are generated without sufficient rigor? In their preliminary and incomplete March 2024 paper entitled “Lookahead Bias in Pretrained Language Models”, Suproteem Sarkar and Keyon Vafa examine the potential for lookahead bias in backtests using the Llama-2 LLM to identify future firm risks based on content of earnings calls. They consider cases for which: (1) the backtest falls within the LLM training sample, but the researcher tells the LLM to consider only information before the test period; and, (2) the researcher specifies a training sample that ends before the backtest but generates it long after the end of the training sample. Using Llama-2 to interpret transcripts of selected firm earnings calls from 2018, they find that:

Informativeness of Seeking Alpha Articles for Stock Returns

Are sentiments conveyed in Seeking Alpha articles useful for stock picking? In their January 2023 paper entitled “Seeking Alpha: More Sophisticated Than Meets the Eye”, Duo Selina Pei, Abhinav Anand and Xing Huan apply two-pass natural language processing to test the informativeness of articles from Seeking Alpha incremental to publicly available earnings data. Specifically, they each month:

  • Associate articles with one or more specific stocks.
  • Extract positive and negative sentiment at both phrase and aggregate levels for each article/stock.
  • Calculate a standardized net sentiment for each article/stock based on the difference between positive and negative mentions, emphasizing event sentiment over general sentiment.
  • Rank articles/stocks based on standardized net sentiment over the last month. Reform equal-weighted portfolios of articles/stocks by ranked tenths (deciles). Calculate both immediate [-1,+1] and 90-day future [+2,+90] average gross raw returns and average gross abnormal returns adjusted for size, book-to-market and momentum.
  • Sort stocks into 20 groups based on monthly standardized net sentiments up to two days before portfolio selection, excluding stocks with few articles or neutral sentiment. Reform an equal-weighted hedge portfolio that is long stocks with the highest sentiments and short stocks with the lowest (on average, 105 long and 86 short positions).
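
The ranking steps above can be illustrated with a small sketch. It is only a toy under stated assumptions, not the authors' procedure: the scaling in `net_sentiment`, the helper names, the tickers and the mention counts are all hypothetical, and two groups stand in for the deciles and 20 groups used in the paper.

```python
from statistics import mean

def net_sentiment(pos, neg):
    """Net sentiment scaled to [-1, 1]; this scaling is an assumption."""
    total = pos + neg
    return (pos - neg) / total if total else 0.0

def sort_into_groups(scores, n_groups):
    """Map each ticker to a group index, 0 = lowest scores, by rank."""
    ranked = sorted(scores, key=scores.get)
    return {t: i * n_groups // len(ranked) for i, t in enumerate(ranked)}

def hedge_return(scores, next_returns, n_groups):
    """Equal-weighted return: long the top group, short the bottom group."""
    groups = sort_into_groups(scores, n_groups)
    long_leg = [next_returns[t] for t, g in groups.items() if g == n_groups - 1]
    short_leg = [next_returns[t] for t, g in groups.items() if g == 0]
    return mean(long_leg) - mean(short_leg)

# Hypothetical (positive, negative) mention counts and next-period returns.
mentions = {"A": (9, 1), "B": (1, 9), "C": (5, 5), "D": (7, 3)}
scores = {t: net_sentiment(p, n) for t, (p, n) in mentions.items()}
next_returns = {"A": 0.03, "B": -0.01, "C": 0.00, "D": 0.01}
groups = sort_into_groups(scores, n_groups=2)
spread = hedge_return(scores, next_returns, n_groups=2)
```

With real data, the same functions would take one month of scores and the returns over the chosen event window.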

Using 350,095 articles published on Seeking Alpha since its inception in 2004 through the beginning of October 2018, daily returns of matched stocks and their options and associated earnings surprise data as available, they find that:

Day Trading Stocks with ChatGPT

Can artificial intelligence platforms such as ChatGPT be good stock day traders? In his March 2024 paper entitled “Can ChatGPT Generate Stock Tickers to Buy and Sell for Day Trading?”, Sangheum Cho tests whether ChatGPT 3.5 turbo supports profitable day trading. He instructs ChatGPT to pretend to be a professional day trader who picks, from among U.S. listed stocks, 100 to buy and 100 to sell for short-term returns based on daily Bloomberg and Wall Street Journal news blurbs on Twitter. Each day, prior to the market open, he:

  • Uses the Refinitiv Eikon News Monitor to collect the selected tweets from the past 24 hours. He removes hyperlinks and duplicate tweets.
  • Segments the tweets into batches to accommodate ChatGPT processing limitations.
  • For each batch, asks ChatGPT to generate 100 BUY and 100 SELL signals, with 30 iterations for each batch to amplify signals by suppressing spurious selections. He then constructs equal-weighted long and short portfolios of stocks with signals.
  • For each stock with signals:
    • Sums BUY and SELL signals across batches/iterations to calculate SUM_BUY and SUM_SELL signals. He constructs signal count-weighted long and short portfolios from these summed signals.
    • Subtracts SUM_SELL from SUM_BUY to calculate NET_BUY and NET_SELL signals. He constructs signal magnitude-weighted long and short portfolios from these netted signals.
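
The signal bookkeeping above might look like the following sketch. It rests on assumptions: the function names and sample tickers are hypothetical, and treating a positive difference as NET_BUY and the magnitude of a negative difference as NET_SELL is one reading of the description, not a confirmed detail of the paper.

```python
from collections import Counter

def aggregate_signals(responses):
    """Count BUY and SELL mentions per ticker across batches/iterations.

    responses: iterable of (buy_tickers, sell_tickers) pairs, one per
    model response (hypothetical input format).
    """
    sum_buy, sum_sell = Counter(), Counter()
    for buys, sells in responses:
        sum_buy.update(buys)
        sum_sell.update(sells)
    return sum_buy, sum_sell

def net_signals(sum_buy, sum_sell):
    """Split SUM_BUY - SUM_SELL into positive NET_BUY and NET_SELL sizes."""
    net_buy, net_sell = {}, {}
    for ticker in set(sum_buy) | set(sum_sell):
        diff = sum_buy[ticker] - sum_sell[ticker]  # Counter returns 0 if absent
        if diff > 0:
            net_buy[ticker] = diff
        elif diff < 0:
            net_sell[ticker] = -diff
    return net_buy, net_sell

def signal_weights(signals):
    """Portfolio weights proportional to signal counts or magnitudes."""
    total = sum(signals.values())
    return {ticker: count / total for ticker, count in signals.items()}

# Three hypothetical responses for one batch.
responses = [
    (["AAPL", "MSFT"], ["TSLA"]),
    (["AAPL"], ["MSFT", "TSLA"]),
    (["AAPL", "TSLA"], ["MSFT"]),
]
sum_buy, sum_sell = aggregate_signals(responses)
net_buy, net_sell = net_signals(sum_buy, sum_sell)
count_weighted_long = signal_weights(sum_buy)
magnitude_weighted_short = signal_weights(net_sell)
```

Stocks mentioned equally often on both sides drop out of the netted portfolios in this reading, while they remain in the count-weighted ones.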

For each portfolio, he excludes stocks with zero daily volume, missing daily prices or incomplete trading histories for the previous five trading days. He measures returns from the market open to the market close. Using 222,659 tweets (only 16,359 of which are firm-specific) and daily opening and closing prices for U.S. listed common stocks during December 2022 through December 2023 (271 trading days), he finds that:

A Professor’s Stock Picks

Does finance professor David Kass, who presents annual lists of stock picks on Seeking Alpha, make good selections? To investigate, we consider his picks of:

We compare the average return for stock picks each year with that for SPDR S&P 500 ETF Trust (SPY) for the same year as a benchmark. Using dividend-adjusted returns from Yahoo!Finance for SPY and most stock picks and returns from Barchart.com and Investing.com for three picks during their selection years, we find that:

Compendium of Live ETF Factor/Niche Premium Capture Tests

Some exchange-traded funds (ETF) focus on capturing potentially attractive factor premiums or thematic niches. Their histories offer a way to test these concepts live. We have conducted many such tests, listed here to offer a global view.

  1. “U.S. Equity Premium?” – evidence from simple tests on about 21 years of data suggests that stock market leadership shifts between the U.S. and other developed markets over time, but the U.S. may be better overall.
  2. “Tech Equity Premium?” – evidence from simple tests on 24 years of data suggests long boom, short bust for a tech/innovation-concentrated portfolio. It does not support belief in risk-adjusted outperformance.
  3. “Measuring the Size Effect with Capitalization-based ETFs” – evidence from simple tests of capitalization-based ETFs with nearly 22 years of data offers little support for belief in a long-term, reliably exploitable size effect among U.S. stocks.
  4. “Do Equal Weight ETFs Beat Cap Weight Counterparts?” – evidence from simple tests on some equal-weight U.S. equity ETFs offers little support for belief that equal weighting substantially and reliably beats capitalization weighting on a net basis.
  5. “Measuring the Value Premium with Value and Growth ETFs” – evidence from simple tests with 21.6 years of available data does not support belief that investors reliably capture a value premium via popular value-growth ETFs.
  6. “Are Equity Momentum ETFs Working?” – available evidence on attractiveness of momentum-oriented U.S. stock and sector ETFs is less than compelling.
  7. “Are Stock Quality ETFs Working?” – available evidence offers little support for belief that quality ETFs reliably beat respective benchmarks.
  8. “Are Low Volatility Stock ETFs Working?” – available evidence on attractiveness of low volatility stock ETFs is mixed, with recent data undermining belief in reliability of low volatility outperformance.
  9. “Are Equity Multifactor ETFs Working?” – available evidence offers very little support for belief that equity multifactor ETFs beat their benchmarks, or that they offer material diversification with comparable performance.
  10. “Are Hedge Fund ETFs Working?” – evidence on attractiveness of hedge fund-oriented ETFs is mostly negative.
  11. “Are Managed Futures ETFs Working?” – available evidence on attractiveness of managed futures ETFs in aggregate (but with recent short-sample exceptions) suggests that any benefits from diversification of equities and fixed income are unlikely to compensate for poor absolute returns.
  12. “Best Safe Haven ETF?” – evidence from simple tests over available and common sample periods suggests that silver, gold, longer-term U.S. Treasuries and investment grade corporate bonds are safe havens, while crude oil is clearly not.
  13. “Do High-dividend Stock ETFs Beat the Market?” – evidence from data for high-dividend U.S. stock ETFs does not support belief that high-dividend stocks reliably outperform the broad U.S. stock market.
  14. “Are ESG ETFs Attractive?” – available evidence suggests that ESG ETFs do not perform much differently from selected benchmarks.
  15. “How Are Renewable Energy ETFs Doing?” – available evidence on attractiveness of renewable energy ETFs is adverse overall, but with bursts of market outperformance perhaps due to novelty.
  16. “How Are Robotics-AI ETFs Doing?” – available evidence is that robotics-AI ETFs are less attractive than the broader technology exposure offered by QQQ.
  17. “How Are AI-powered ETFs Doing?” – available evidence does not support belief that ETFs using AI to select and weight assets are particularly attractive.
  18. “Are iShares Core Allocation ETFs Attractive?” – available evidence regarding attractiveness of iShares Core Asset Allocation ETFs is mixed to negative.
  19. “Are Target Retirement Date Funds Attractive?” – evidence offers little support for belief that target retirement date mutual funds are preferable to simple stocks-bonds diversification.
  20. “How Are TIPS ETFs Doing?” – available evidence on attractiveness of TIPS ETFs is mostly favorable after the recent inflation burst, with shorter duration funds offering more reliable inflation protection.
  21. “Are Equity Index Covered Call ETFs Working?” – available evidence on attractiveness of equity index covered call ETFs as either substitutes for or diversifiers of underlying stock indexes is generally adverse.
  22. “Are Equity Put-Write ETFs Working?” – available evidence on attractiveness of equity put-write ETFs is adverse.
  23. “Are IPO ETFs Working?” – available evidence on attractiveness of IPO ETFs is mixed, requiring very high risk tolerance of interested investors.
  24. “Are Preferred Stock ETFs Working?” – available evidence on attractiveness of preferred stock ETFs relative to a 60-40 stocks-bonds portfolio is largely negative.
  25. “Do Convertible Bond ETFs Attractively Meld Stocks and Bonds?” – available evidence suggests that convertible bond ETFs sometimes outperform and sometimes underperform a conventional 60-40 stocks-bonds portfolio.
  26. “Do ETFs Following Gurus/Insiders Work?” – available evidence on attractiveness of guru/insider-following stock ETFs is mostly adverse.
  27. “Congressional Trade Tracking ETFs” – limited available evidence suggests that investors should choose a fund mimicking holdings of Democrat rather than Republican members of Congress.
  28. “The Long and Short of Jim” – available evidence does not support belief that funds based on Jim Cramer’s stock/market recommendations reliably produce attractive short-term returns.
  29. “Live Test of the Stock Market Overnight Move Effect” – early evidence does not support belief in exploitability of the overnight move effect.

The upshot of the above items is that academic factor research and thematic speculations rarely translate to outperformance when implemented with ETFs.

A global caution is that the period since 2009 has been strong for broad equity indexes, driven by a few large-capitalization firms. This trend may not persist.

ChatGPT Prediction of News-related Stock Market Returns

Is ChatGPT useful for predicting stock market returns based on financial news headlines? In the December 2023 version of their paper entitled “ChatGPT, Stock Market Predictability and Links to the Macroeconomy”, Jian Chen, Guohao Tang, Guofu Zhou and Wu Zhu investigate whether ChatGPT 3.5 can predict U.S. stock market (S&P 500 Index) returns based on Wall Street Journal front-page news headlines/alerts. The instruction they give ChatGPT 3.5 is:

“Forget all previous instructions. You are now a financial expert giving investment advice. I’ll give you a news headline, and you need to answer whether this headline suggests the U.S. stock prices are GOING UP or GOING DOWN. Please choose only one option from GOING UP, GOING DOWN, UNKNOWN, and do not provide any additional responses.”

They first compute monthly ratios of good news to total news (NRG) and bad news to total news (NRB) and then relate these ratios to S&P 500 Index excess returns over the next 1, 3, 6, 9 or 12 months. They compare the ability of ChatGPT to predict returns to that of traditional human interpretation and to those of BERT and RoBERTa as ChatGPT alternatives. Using daily Wall Street Journal front-page news headlines/alerts and monthly S&P 500 Index excess returns during January 1996 through December 2022, they find that:

Equity Factor Timing from Deep Neural Networks

Can enhanced machine learning models accurately time popular equity factors? In their January 2024 paper entitled “Multi-Factor Timing with Deep Learning”, Paul Cotturo, Fred Liu and Robert Proner explore equity factor timing via a multi-task neural network model (MT) to capture the commonalities across factors and a dynamic multi-task neural network model (DMT) to extract financial and macroeconomic states. They attempt to time six well-known factors: excess market return, size, value, profitability, investment and momentum. They employ 272 model inputs (123 macroeconomic and 149 financial) to predict each month:

  1. The sign of next-month return for each factor.
  2. The return for an equal-weighted portfolio that holds the factors (the risk-free asset) for factors with positive (negative) predicted returns.
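
The second prediction target can be read as the return of a simple switching rule, sketched below. The function name, factor labels and numbers are hypothetical, and treating a zero prediction like a negative one is an assumption.

```python
def timed_portfolio_return(predicted_sign, factor_returns, risk_free):
    """Equal-weighted return across factor slots.

    Each slot earns the factor return when the predicted sign is positive
    and the risk-free return otherwise (zero treated as negative here).
    """
    slots = [
        factor_returns[name] if predicted_sign[name] > 0 else risk_free
        for name in factor_returns
    ]
    return sum(slots) / len(slots)

# Hypothetical one-month example with three of the six factors.
predicted_sign = {"market": 1, "size": -1, "value": 1}
factor_returns = {"market": 0.02, "size": -0.01, "value": 0.01}
timed = timed_portfolio_return(predicted_sign, factor_returns, risk_free=0.003)
untimed = sum(factor_returns.values()) / len(factor_returns)
```

Here `untimed` plays the role of an always-invested equal-weighted portfolio of the same factors, for comparison with `timed`.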

They compare performances of MT and DMT with those of seven simpler off-the-shelf machine learning models: logistic regression (LR), penalized logistic regression (EN), random forest (RF), extremely randomized trees (XRF), gradient boosted trees (GBT), support vector machine (SVM) and feed-forward neural network (NN). For all models, they use the first 20 years of their sample period for training, the next five years for validation and the remaining years for out-of-sample testing. Their benchmark is an equal-weighted portfolio of all six factors. Using monthly data for the 272 model inputs and monthly returns for the six factors during January 1965 through December 2021, with out-of-sample testing starting January 1990, they find that:

ChatGPT Interpretation of Firm Earnings Calls

Can ChatGPT find red flags in firm earnings calls? In their January 2024 paper entitled “Unusual Financial Communication – Evidence from ChatGPT, Earnings Calls, and the Stock Market”, Lars Beckmann, Heiner Beckmeyer, Ilias Filippou, Stefan Menze and Guofu Zhou test the ability of ChatGPT-4 Turbo to identify and analyze unusual content and tone aspects of S&P 500 earnings calls. Unusualness has 25 dimensions derived from executive behaviors, analyst questions, specific content or technical issues. They examine correlations of unusualness with firm characteristics, industry and macroeconomic indicators across business cycles. They validate unusualness by looking at associated stock returns and trading volumes from one day before through one day after earnings calls. Using transcripts of S&P 500 earnings calls from Refinitiv, firm characteristics/stock trading data and macroeconomic data during January 2015 through December 2022, they find that:

Profitable Machine Learning Stock Picking Strategies?

Can machine learning models pick stocks that unequivocally generate alpha out-of-sample? In their November 2023 paper entitled “The Expected Returns on Machine-Learning Strategies”, Vitor Azevedo, Christopher Hoegner and Mihail Velikov assess expected net returns and alphas of machine learning-based anomaly trading strategies. They use nine machine learning models to predict next-month stock returns based on inputs for up to 320 published anomalies, added to the mix according to respective publication dates:

They train the models using an expanding window, with the last seven years reserved for six years of validation and one year of out-of-sample testing. During the test year, they each month reform a portfolio that is long (short) the value-weighted tenth, or decile, of stocks with the highest (lowest) predicted next-month returns. They then calculate actual next-month gross returns and 6-factor (market, size, value, profitability, investment and momentum) alphas during the test year. To calculate net returns and alphas, they estimate trading costs by multiplying trading frictions estimated from historical bid-ask spreads by monthly portfolio turnovers. Using returns and firm characteristics for a broad sample of U.S. common stocks having data covering at least 20% of the 320 anomalies during March 1957 through December 2021, with out-of-sample tests starting January 2005, they find that:

Understandable AI Stock Pricing?

Can explainable artificial intelligence (AI) bridge the gap between complex machine learning predictions and economically meaningful interpretations? In their December 2023 paper entitled “Empirical Asset Pricing Using Explainable Artificial Intelligence”, Umit Demirbaga and Yue Xu apply explainable artificial intelligence to extract the drivers of stock return predictions made by four machine learning models: XGBoost, decision tree, K-nearest neighbors and neural networks. They use 209 firm/stock-level characteristics and stock returns, all measured monthly, as machine learning inputs. They use 70% of their data for model training, 15% for validation and 15% for out-of-sample testing. They consider two explainable AI methods:

  1. Local Interpretable Model-agnostic Explanations (LIME) – explains model predictions by approximating the complex model locally with a simpler, more interpretable model.
  2. SHapley Additive exPlanations (SHAP) – uses game theory to determine which stock-level characteristics are most important for predicting returns.
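
To make the game-theory idea behind SHAP concrete, the sketch below computes exact Shapley values for a toy three-characteristic return model by averaging each characteristic's marginal contribution over all orderings. It is an illustration only, not the paper's implementation: the toy model, the characteristic names and the all-zero baseline for "absent" characteristics are assumptions. Brute-force enumeration is feasible only for a handful of inputs, so practical tools rely on approximations or model-specific shortcuts.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley attributions via enumeration of all feature orderings.

    A feature counts as absent when it takes its baseline value
    (one common convention, assumed here).
    """
    features = list(instance)
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for f in order:
            current[f] = instance[f]          # switch feature f on
            updated = predict(current)
            totals[f] += updated - previous   # marginal contribution of f
            previous = updated
    return {f: total / len(orderings) for f, total in totals.items()}

def toy_model(x):
    """Hypothetical return predictor with one interaction term."""
    return 0.5 * x["momentum"] - 0.2 * x["size"] + 0.1 * x["momentum"] * x["value"]

instance = {"momentum": 1.0, "size": 2.0, "value": 1.0}
baseline = {"momentum": 0.0, "size": 0.0, "value": 0.0}
phi = shapley_values(toy_model, instance, baseline)
gap = toy_model(instance) - toy_model(baseline)
```

The attributions in `phi` sum to `gap`, the difference between the prediction for the instance and for the baseline, and the interaction term is split evenly between momentum and value.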

They present a variety of visualizations to help investors understand explainable AI outputs. Using monthly data as described for all listed U.S. stocks during March 1957 through December 2022, they find that:
