Identifying the Most Salient Audio and Language Features for Pediatric Specific Language Impairment Classification
DOI:
https://doi.org/10.58445/rars.1844
Keywords:
Specific Language Impairment, Machine Learning, Speech Analysis
Abstract
Specific language impairment (SLI) is a pediatric language disorder that delays the development of typical speech functions in the absence of other developmental delays or neurological disorders. SLI prevents children from clearly communicating their thoughts or desires to others and can persist throughout their lives if left undiagnosed. Because machine learning can provide scalable diagnostic services in the comfort of one’s home, it offers the potential for an accessible SLI screening method that lets a parent or guardian identify potential markers and consult a speech and language therapist about clinical actions. To address this opportunity, I developed a machine-learning solution that classifies SLI from audio and language features derived from the TalkBank collection of the CHILDES dataset. I applied feature selection to identify the most salient features using three strategies: top-ranked gradient-boosting features, logistic regression coefficients, and mutual information scores. The gradient-boosting classifier outperformed the other two methods, achieving 85% average accuracy, 85% average precision, and 83% average recall. The top features across the three feature selection strategies were the z-score of mean utterance length, age, perplexity of 1-gram SLI, word type-to-token ratio, number of nouns followed immediately by a verb, Flesch-Kincaid score, repetitions, possessives, and the z-score of word errors. Of note, the Flesch-Kincaid score and the perplexity of n-gram sequences, while not new, are relatively understudied features in SLI analysis and would benefit from additional research. Interestingly, prior ML studies have found that these features also appear in the context of other conditions, such as mild cognitive impairment and dementia.
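As a rough illustration of three of the language features named in the abstract, the sketch below computes a word type-to-token ratio, a unigram (1-gram) perplexity, and a z-score in plain Python. The abstract does not give the exact feature definitions, so everything here is an assumption: the function names are hypothetical, the perplexity uses add-one smoothing over a reference word list (for "perplexity of 1-gram SLI" this would presumably be a corpus of SLI transcripts), and the z-score uses the population standard deviation.

```python
import math
from collections import Counter


def type_token_ratio(tokens):
    """Number of distinct word types divided by the number of word tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def unigram_perplexity(tokens, reference_tokens):
    """Perplexity of `tokens` under an add-one-smoothed unigram model.

    The model is estimated from `reference_tokens`; the vocabulary is the
    union of both token lists so that unseen words get nonzero probability.
    (Assumed smoothing scheme, not necessarily the one used in the paper.)
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    counts = Counter(reference_tokens)
    vocab = set(reference_tokens) | set(tokens)
    total = len(reference_tokens) + len(vocab)  # add-one denominator
    log_prob = sum(math.log((counts[t] + 1) / total) for t in tokens)
    return math.exp(-log_prob / len(tokens))


def z_score(value, population):
    """Standardize `value` against a list of reference values."""
    mean = sum(population) / len(population)
    variance = sum((x - mean) ** 2 for x in population) / len(population)
    return (value - mean) / math.sqrt(variance)


# Toy usage on a made-up utterance and a made-up reference word list.
utterance = "the dog the cat".split()
reference = "the dog runs and the cat sleeps".split()
print(type_token_ratio(utterance))
print(unigram_perplexity(utterance, reference))
print(z_score(4, [3, 4, 5, 6]))
```

In a screening pipeline, values like these would be computed per transcript and passed to the classifier alongside the other features; lower perplexity under an SLI-trained model would indicate that a transcript's word choices resemble that corpus.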
References
Sporna, A. B. Static versus interactive online resources about dementia: A comparison of readability scores.
Cohen, T., & Pakhomov, S. (2020). A tale of two perplexities: sensitivity of neural language models to lexical retrieval deficits in dementia of the Alzheimer's type. arXiv preprint arXiv:2005.03593.
Bishop, D. V. (2006). What causes specific language impairment in children?. Current directions in psychological science, 15(5), 217-221.
Duinmeijer, I. (2013). Persistent problems in SLI: which grammatical problems remain when children grow older. Linguistics in Amsterdam, 6, 28-48.
Ebbels, S. H., Van Der Lely, H. K., & Dockrell, J. E. (2007). Intervention for verb argument structure in children with persistent SLI: A randomized control trial.
Shahmahmood, T. M., Jalaie, S., Soleymani, Z., Haresabadi, F., & Nemati, P. (2016). A systematic review on diagnostic procedures for specific language impairment: The sensitivity and specificity issues. Journal of Research in Medical Sciences, 21(1), 67.
Ebbels, S. (2014). Introducing the SLI debate. International Journal of Language & Communication Disorders, 49(4), 377.
MacWhinney, B. (2019). Understanding spoken language through TalkBank. Behavior research methods, 51, 1919-1927.
MacWhinney, B. (2000). The CHILDES project: The database (Vol. 2). Psychology Press.
Conti-Ramsden, G., Botting, N., Simkin, Z., & Knox, E. (2001). Follow-up of children attending infant language units: Outcomes at 11 years of age. International journal of language & communication disorders, 36(2), 207-219.
Colozzo, P., Gillam, R. B., Wood, M., Schnell, R. D., & Johnston, J. R. (2011). Content and form in the narratives of children with specific language impairment.
Schneider, P., Hayward, D., & Dubé, R. V. (2006). Storytelling from pictures using the Edmonton narrative norms instrument. Journal of speech language pathology and audiology, 30(4), 224.
Rezzonico, S., Chen, X., Cleave, P. L., Greenberg, J., Hipfner‐Boucher, K., Johnson, C. J., ... & Girolametto, L. (2015). Oral narratives in monolingual and bilingual preschoolers with SLI. International Journal of Language & Communication Disorders, 50(6), 830-841.
Flesch, R. (2007). Flesch-Kincaid readability test. Retrieved October, 26(3), 2007.
Bentéjac, C., Csörgő, A., & Martínez-Muñoz, G. (2021). A comparative analysis of gradient boosting algorithms. Artificial Intelligence Review, 54, 1937-1967.
Natekin, A., & Knoll, A. (2013). Gradient boosting machines, a tutorial. Frontiers in neurorobotics, 7, 21.
Flach, P. (2019, July). Performance evaluation in machine learning: the good, the bad, the ugly, and the way forward. In Proceedings of the AAAI conference on artificial intelligence (Vol. 33, No. 01, pp. 9808-9814).
Dhal, P., & Azad, C. (2022). A comprehensive survey on feature selection in the various fields of machine learning. Applied Intelligence, 52(4), 4543-4581.
Nusinovici, S., Tham, Y. C., Yan, M. Y. C., Ting, D. S. W., Li, J., Sabanayagam, C., ... & Cheng, C. Y. (2020). Logistic regression was as good as machine learning for predicting major chronic diseases. Journal of clinical epidemiology, 122, 56-69.
Belghazi, M. I., Baratin, A., Rajeshwar, S., Ozair, S., Bengio, Y., Courville, A., & Hjelm, D. (2018, July). Mutual information neural estimation. In International conference on machine learning (pp. 531-540). PMLR.
Botting, N., & Conti‐Ramsden, G. (2001). Non‐word repetition and language development in children with specific language impairment (SLI). International Journal of Language & Communication Disorders, 36(4), 421-432.
Conti-Ramsden, G., & Jones, M. (1997). Verb use in specific language impairment. Journal of Speech, Language, and Hearing Research, 40(6), 1298-1313.
Lahey, M., & Edwards, J. (1999). Naming errors of children with specific language impairment. Journal of Speech, Language, and Hearing Research, 42(1), 195-205.
Bishop, D. V. (1994). Grammatical errors in specific language impairment: Competence or performance limitations?. Applied Psycholinguistics, 15(4), 507-550.
Popel, M., & Mareček, D. (2010). Perplexity of n-gram and dependency language models. In Text, Speech and Dialogue: 13th International Conference, TSD 2010, Brno, Czech Republic, September 6-10, 2010. Proceedings 13 (pp. 173-180). Springer Berlin Heidelberg.
Oetting, J. B., & Rice, M. L. (1993). Plural acquisition in children with specific language impairment. Journal of Speech, Language, and Hearing Research, 36(6), 1236-1248.
O’Keefe, David. (2017, April). Diagnose Specific Language Impairment in Children, Version 6. Retrieved June 23, 2023 from https://www.kaggle.com/datasets/dgokeeffe/specific-language-impairment.
License
Copyright (c) 2024 Ronak Chadha
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.