Preprint / Version 1

Assessing and Evaluating the Performance of Sequence to Sequence Models in Natural Language Generation

Authors

  • Emilio Medina, high school student

DOI:

https://doi.org/10.58445/rars.1418

Keywords:

Computer Science, Machine Learning, Artificial Intelligence, Sequence-to-Sequence, Encoder-Decoder, RNN, LSTM, GRU, Transformer Model, Generative Models

Abstract

Sequence-to-sequence models are a type of machine learning encoder-decoder architecture designed for tasks involving sequential data. This type of data is vast and of great significance, yet little research compares the performance of different sequence-to-sequence models. This paper gives a quantitative and qualitative analysis and comparison of an RNN, a GRU, an LSTM, and a Transformer model, using the most widely used sequence-to-sequence metrics: ROUGE, BLEU, and BERTScore. The analysis was carried out on the task of generating text in Homer's writing style from a small corpus. It was observed that, under these conditions, the automated overlap scores (ROUGE and BLEU) are of little use, since they reward mimicking a reference sentence rather than matching the writing style. Additionally, the lack of data hurt the performance of the more complex models, supporting the claim that less complex models are more effective when little data is available. These findings are relevant because they offer a comparison between models for text-generation tasks and suggest the need for more, and more varied, sequence-to-sequence evaluation metrics.
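
For context, the three metrics named above can be computed with off-the-shelf implementations. The sketch below is a minimal, illustrative example that assumes the Hugging Face evaluate library (the paper does not specify its tooling), and the Homeric sentences in it are placeholders rather than actual model output:

    # Illustrative metric check; assumes the Hugging Face evaluate library
    # (pip install evaluate rouge_score bert_score).
    # The example sentences are placeholders, not the paper's model outputs.
    import evaluate

    predictions = ["Sing, O goddess, the anger of Achilles son of Peleus."]
    references = ["Sing, O goddess, the rage of Achilles, Peleus' son."]

    rouge = evaluate.load("rouge")          # n-gram and longest-common-subsequence overlap
    bleu = evaluate.load("bleu")            # n-gram precision with a brevity penalty
    bertscore = evaluate.load("bertscore")  # contextual-embedding similarity

    print(rouge.compute(predictions=predictions, references=references))
    print(bleu.compute(predictions=predictions, references=references))
    print(bertscore.compute(predictions=predictions, references=references, lang="en"))

Because ROUGE and BLEU only count surface n-gram overlap with a reference, a model that copies reference wording can outscore one that better captures the overall style, which is the limitation the abstract points out; BERTScore, by comparing contextual embeddings, is less sensitive to exact wording.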



Posted

2024-08-04