S&P 500 Daily Index Time Series Forecasting Given Global News Headlines Using LSTM, BERT, and GloVe Embeddings
DOI:
https://doi.org/10.58445/rars.54
Keywords:
Economics, Time Series, Forecasting, NLP, Stock Market
Abstract
The ability to project capital markets performance well enough to meet rapidly increasing demands for accuracy has material implications for investors and capital-raising environments. Advances in natural language processing (NLP) and deep learning have opened a previously unutilized approach to forecasting stock market performance, progress largely enabled by tools such as Google's Bidirectional Encoder Representations from Transformers (BERT), Global Vectors for Word Representation (GloVe), and Word2vec. NLP models have proven useful in projecting stock prices, inflation and other economic factors, and fundraising potential, making them vital tools for economists and others who track these markets closely. The aim of this analysis is to produce an accurate NLP model for stock market price prediction using pretrained BERT, LSTM, and CNN-based models trained on global news headlines labeled with the corresponding daily percentage returns of the S&P 500 index. Artificial intelligence (AI) neural networks, used in conjunction with opinion mining techniques, offer a way to forecast stock market performance from global news headlines, and accuracies above random guessing (50% for a binary model and approximately 17% for a six-class, or "hex factor", model) can have a substantial impact on economic forecasting. This project evaluated three models and achieved a maximum accuracy of 54% for binary classification with BERT and 45% for multiclass classification with GloVe.
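The labeling and baseline arithmetic described in the abstract can be illustrated with a minimal sketch. It is not the author's pipeline: the function names, the treatment of flat days as "down", and the sample closing prices are all assumptions made for illustration, and the chance baseline assumes balanced classes.

```python
def daily_pct_returns(closes):
    """Percentage change from each closing value to the next."""
    return [100.0 * (b - a) / a for a, b in zip(closes, closes[1:])]


def binary_labels(returns):
    """Label 1 for an up day, 0 otherwise (flat days count as 0; an assumption)."""
    return [1 if r > 0 else 0 for r in returns]


def chance_accuracy(num_classes):
    """Expected accuracy of uniform random guessing over balanced classes."""
    return 1.0 / num_classes


# Hypothetical closing values, not real S&P 500 data.
closes = [100.0, 101.0, 99.99, 99.99]
labels = binary_labels(daily_pct_returns(closes))

binary_baseline = chance_accuracy(2)  # 0.5, the 50% figure in the abstract
six_class_baseline = chance_accuracy(6)  # about 0.167, the approx. 17% figure
```

Under these assumptions, the reported 54% binary accuracy sits 4 percentage points above the chance baseline, while the reported 45% multiclass accuracy sits roughly 28 points above the six-class baseline.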
Versions
- 2022-12-22 (3)
- 2022-12-22 (2)
- 2022-11-11 (1)
License
Copyright (c) 2022 Arav Santhanam
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.