Preprint / Version 1

Sentiment Analysis for Youth Mental Welfare

A Comparative Study of Machine Learning Models

Authors

  • Vincent Qin, Lynbrook High School

DOI:

https://doi.org/10.58445/rars.1795

Keywords:

machine learning, artificial intelligence, sentiment analysis

Abstract

Mental health concerns among youth are increasingly prevalent: 20% of United States adolescents experience mental health problems[1]. One potential indicator of such concerns is text that expresses overwhelming sadness or hopelessness. I present a comparison of methods for determining the emotional polarity of text. The models are trained on the Stanford SST2[2] and IMDb[3] datasets, both of which are built from movie reviews, a genre that tends to express emotion plainly. The text is encoded with a Bag-of-Words (BoW) strategy restricted to the 10,000 most common words. I tested five models: a decision tree, a random forest, the AdaBoost classifier created with scikit-learn[4], a feedforward neural network with two hidden layers created with PyTorch[5], and a fine-tuned version of DistilBERT[6]. The results are cross-validated by dividing the data into ten shards: each model is trained on nine shards and tested on the remaining one, and the procedure is repeated until every shard has served once as the test set. Finally, the models’ accuracies are compared. The DistilBERT model had the highest overall accuracy (94.89%), which makes it the most suitable for large-scale classification tasks. However, its training time (613 min) and inference time (38 ms) are very high, which makes it inefficient for smaller tasks. For those, I recommend the slightly weaker neural network and AdaBoost models instead. Although they have lower accuracies (88.79% and 80.21%, respectively), their short training times (~1-2 hours) and inference times (<10 ms) suit smaller tasks whose results can be manually verified.
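The encoding and cross-validation steps described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code used in the study: the whitespace tokenizer, the interleaved shard assignment, and the names `build_vocabulary`, `encode`, `ten_shard_splits`, `cross_validate`, and `fit_and_score` are all hypothetical, and the vocabulary is built once over all texts for simplicity.

```python
from collections import Counter
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def build_vocabulary(texts: Sequence[str], max_words: int = 10_000) -> Dict[str, int]:
    """Map each of the `max_words` most common tokens to a column index."""
    # Assumption: lowercase whitespace tokenization; the study may differ.
    counts = Counter(token for text in texts for token in text.lower().split())
    return {word: idx for idx, (word, _) in enumerate(counts.most_common(max_words))}


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Return a Bag-of-Words count vector; out-of-vocabulary words are dropped."""
    vector = [0] * len(vocab)
    for token in text.lower().split():
        if token in vocab:
            vector[vocab[token]] += 1
    return vector


def ten_shard_splits(
    n_samples: int, n_shards: int = 10
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices), holding out each shard exactly once."""
    # Assumption: samples are dealt to shards round-robin by index.
    shards = [list(range(start, n_samples, n_shards)) for start in range(n_shards)]
    for held_out in range(n_shards):
        test = shards[held_out]
        train = [i for s, shard in enumerate(shards) if s != held_out for i in shard]
        yield train, test


def cross_validate(
    features: Sequence[List[int]],
    labels: Sequence[int],
    fit_and_score: Callable[..., float],
    n_shards: int = 10,
) -> float:
    """Average the accuracy returned by `fit_and_score` over all held-out shards."""
    scores = []
    for train, test in ten_shard_splits(len(labels), n_shards):
        scores.append(
            fit_and_score(
                [features[i] for i in train], [labels[i] for i in train],
                [features[i] for i in test], [labels[i] for i in test],
            )
        )
    return sum(scores) / len(scores)
```

Here `fit_and_score` stands in for any of the five models: it would train on the nine training shards and return the accuracy on the held-out shard, so the same loop can compare every model on identical splits.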

References

[1] National Library of Medicine, Child and Adolescent Mental Health. Agency for Healthcare Research and Quality (US), 2022. Available: https://www.ncbi.nlm.nih.gov/books/NBK587174/

[2] R. Socher et al., “Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank,” ACL Anthology, pp. 1631–1642, Oct. 2013. Available: https://www.aclweb.org/anthology/D13-1170 (accessed Jun. 18, 2024).

[3] A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts, “Learning Word Vectors for Sentiment Analysis,” ACLWeb, Jun. 01, 2011. Available: http://www.aclweb.org/anthology/P11-1015 (accessed Jun. 17, 2024).

[4] F. Pedregosa et al., “Scikit-learn: Machine Learning in Python,” Journal of Machine Learning Research, vol. 12, no. 85, pp. 2825–2830, 2011. Available: https://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html

[5] J. Ansel et al., “PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation,” Apr. 2024, doi: https://doi.org/10.1145/3620665.3640366.

[6] V. Sanh, L. Debut, J. Chaumond, and T. Wolf, “DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter,” arXiv.org, 2019. Available: https://arxiv.org/abs/1910.01108

[7] American Academy of Pediatrics, “AAP-AACAP-CHA Declaration of a National Emergency in Child and Adolescent Mental Health,” www.aap.org, Oct. 19, 2021. Available: https://www.aap.org/en/advocacy/child-and-adolescent-healthy-mental-development/aap-aacap-cha-declaration-of-a-national-emergency-in-child-and-adolescent-mental-health/ (accessed Jul. 18, 2024).

[8] National Institute of Mental Health, “Depression,” National Institute of Mental Health, Mar. 2023. Available: https://www.nimh.nih.gov/health/topics/depression (accessed Jul. 23, 2024).

[9] W. Uther et al., “TF–IDF,” Encyclopedia of Machine Learning, pp. 986–987, 2011, doi: https://doi.org/10.1007/978-0-387-30164-8_832.

[10] P. Whatman, “Social Sentiment Online: Yes, Social Media is Becoming More Negative,” Mention, Mar. 27, 2018. Available: https://mention.com/en/blog/social-media-mentions-analysis (accessed Aug. 19, 2024).

[11] S. Bird, E. Klein, and E. Loper, Natural Language Processing with Python. Beijing: O’Reilly, 2009.

[12] D. Berrar and W. Dubitzky, “Information Gain,” Springer eBooks, pp. 1022–1023, Jan. 2013, doi: https://doi.org/10.1007/978-1-4419-9863-7_719.

[13] Y. Dodge, “Gini Index,” The Concise Encyclopedia of Statistics, pp. 231–233, 2021, doi: https://doi.org/10.1007/978-0-387-32833-1_169.

[14] T. R. Shultz et al., “Confusion Matrix,” Encyclopedia of Machine Learning, pp. 209–209, 2011, doi: https://doi.org/10.1007/978-0-387-30164-8_157.

[15] P. Refaeilzadeh, L. Tang, and H. Liu, “Cross-Validation,” Encyclopedia of Database Systems, pp. 532–538, 2009, doi: https://doi.org/10.1007/978-0-387-39940-9_565.

[16] “Mental Health Awareness,” Ca.gov, 2023. Available: https://www.dhcs.ca.gov/services/MH/Pages/MHAM_Matters.aspx

Posted

2024-10-18