Preprint / Version 1

Comparing Prominent Generative Language Models for Classifying Political Alignment Of Limited Context Bigrams


Sankalp Singh, Polygence



Machine Learning, Linguistics, Few-Shot Learning


Generative Language Models (GLMs) have transformed artificial intelligence by enabling human-like text generation across diverse applications. This study examines the ability of GLMs to classify politically charged bigrams from congressional speeches when given minimal context. Three major GLMs are studied: Google's Bard, OpenAI's GPT-3.5 Turbo, and OpenAI's GPT-4. A Python script was written for each GLM to prompt the model en masse. Using prompts encompassing target bigrams, Congress details, and polarity values, the study assesses the models' proficiency in aligning bigrams with left-leaning or right-leaning ideologies. The dataset originates from Stanford University and comprises parsed political bigrams from congressional speeches, with a corresponding political polarity value for each bigram. Although the models' outputs deviate, as expected, from the exact Stanford benchmark polarity values, the GLMs show varying degrees of accuracy in political classification, with GPT-4 exhibiting the highest proficiency. The findings underline GLMs' capacity to consider context and infer political associations based on their training data. They also emphasize the complexities of language, ideology, and context. This research contributes to understanding GLMs' strengths, limitations, and implications in political discourse analysis.
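The evaluation loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's actual scripts: the prompt wording, the names `BigramRecord`, `build_prompt`, `benchmark_label`, `parse_label`, and `accuracy`, and the sign convention (positive polarity meaning right-leaning) are all hypothetical. The sample rows are made-up placeholders rather than values from the Stanford dataset, and the model is a stand-in function rather than a real API call. The sketch also uses the polarity value only for scoring; how the study's prompts incorporated polarity values is not specified here.

```python
from typing import Callable, Iterable, NamedTuple, Optional


class BigramRecord(NamedTuple):
    bigram: str      # two-word phrase taken from congressional speech
    congress: int    # number of the Congress the phrase is drawn from
    polarity: float  # benchmark polarity value (sign convention assumed below)


def build_prompt(record: BigramRecord) -> str:
    """Build one limited-context prompt (hypothetical wording)."""
    return (
        f"The phrase '{record.bigram}' appeared in speeches from "
        f"U.S. Congress number {record.congress}. Is it more associated "
        "with left-leaning or right-leaning speakers? "
        "Answer with one word: left or right."
    )


def benchmark_label(polarity: float) -> str:
    # Assumption: positive polarity = right-leaning, otherwise left-leaning.
    return "right" if polarity > 0 else "left"


def parse_label(reply: str) -> Optional[str]:
    """Map a free-text reply to 'left' or 'right'; None if ambiguous."""
    text = reply.lower()
    has_left, has_right = "left" in text, "right" in text
    if has_left == has_right:
        return None  # both or neither mentioned: treat as no answer
    return "left" if has_left else "right"


def accuracy(records: Iterable[BigramRecord],
             ask_model: Callable[[str], str]) -> float:
    """Fraction of records whose predicted side matches the benchmark side."""
    rows = list(records)
    if not rows:
        return 0.0
    correct = sum(
        parse_label(ask_model(build_prompt(r))) == benchmark_label(r.polarity)
        for r in rows
    )
    return correct / len(rows)


def fake_model(prompt: str) -> str:
    # Stand-in for a real GLM call; answers "right" for one phrase only.
    return "right" if "death tax" in prompt else "left"


# Placeholder rows for illustration only (not taken from the dataset).
sample = [
    BigramRecord("death tax", 109, 0.8),
    BigramRecord("minimum wage", 109, -0.6),
    BigramRecord("tax relief", 109, 0.5),
]
score = accuracy(sample, fake_model)
```

Swapping `fake_model` for a function that sends the prompt to a given GLM and returns its text reply would let the same scoring code be reused across models, which is one plausible way to compare them on equal footing.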


Gentzkow, Matthew, Jesse M. Shapiro, and Matt Taddy. Congressional Record for the 43rd-114th Congresses: Parsed Speeches and Phrase Counts. Palo Alto, CA: Stanford Libraries [distributor], 2018-01-16.

Zhang, Y., Wang, X., Li, Y., & Liu, Z. (2023). How do generative language models learn from the internet? A survey of methods and challenges. arXiv preprint arXiv:2303.18223.

Gentzkow, M., Shapiro, J. M., & Taddy, M. (2019). Measuring group differences in high-dimensional choices: Method and application to congressional speech. Econometrica, 87(4), 1307-1340.

Bolukbasi, T., Chang, K. W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in neural information processing systems (pp. 4349-4357).

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.