Preprint / Version 1

A Comparison of How Four Large Language Models Resolve Trolley-Problem Moral Dilemmas

Authors

  • Ayush Agarwal (Student)

DOI:

https://doi.org/10.58445/rars.3925

Keywords:

Large language models (LLMs), Moral reasoning, AI ethics, Trolley problem

Abstract

As large language models (LLMs) are increasingly consulted on questions that carry moral weight, whether different models resolve those questions differently has become an empirical matter. Using a fully crossed factorial design (8 scenario variants × 5 prompt framings × 4 models [GPT-5, Claude Sonnet 4.6, Gemini 3.5 Flash, Grok 4.3 Fast] × 3 replicates = 480 trials), this study measured how each model resolves trolley-style dilemmas and how its choice shifts with the morally relevant feature of the scenario and with the framing of the question. Each trial was coded as utilitarian (choosing the option that maximizes the number of lives saved) or not. The four models differed markedly and fell in a clear order: Claude chose the life-maximizing option 40.8% of the time versus 75.8% for Grok, a 35-percentage-point spread whose Wilson confidence intervals do not overlap, with Gemini (56.7%) and GPT-5 (66.7%) between them. Scenario features produced the largest and most interpretable pattern: the consent scenario (8%) and the footbridge personal-force case (35%) yielded the lowest rates of utilitarian choice, mirroring established human moral psychology, while the clean baseline elicited unanimous life-maximizing choices from every model. Notably, no model refused or returned an unclear answer in any of the 480 trials. Requesting step-by-step reasoning was associated with higher rates of utilitarian choice, while rephrasing the dilemma from harm to rescue was associated with lower rates. A mixed-effects logistic regression that accounts for dependence among replicates suggests that the model and scenario effects persist once the repeated-measures structure is modeled (Supplement S1). Together, the results suggest that LLM moral verdicts are model-distinctive and framing-sensitive, consistent with the view that they reflect learned, deployment-shaped dispositions rather than stable ethical commitments.
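The trial count and the non-overlap claim in the abstract can be checked with a short sketch. It rests on several assumptions: the per-model success counts (49, 68, 80, and 91 out of 120) are inferred here from the reported percentages and are not taken from the paper's data, the 95% level (z = 1.96) is assumed because the abstract does not state one, and the function name `wilson_interval` is illustrative only.

```python
import math

# Design sizes as stated in the abstract.
SCENARIOS, FRAMINGS, MODELS, REPLICATES = 8, 5, 4, 3
total_trials = SCENARIOS * FRAMINGS * MODELS * REPLICATES
trials_per_model = SCENARIOS * FRAMINGS * REPLICATES


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an assumed 95% confidence level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# ASSUMPTION: counts back-calculated from the reported percentages
# (e.g. 49 / 120 rounds to 40.8%); the paper's raw counts may differ.
inferred_counts = {"Claude": 49, "Gemini": 68, "GPT-5": 80, "Grok": 91}

intervals = {
    name: wilson_interval(k, trials_per_model)
    for name, k in inferred_counts.items()
}

for name, (low, high) in intervals.items():
    rate = inferred_counts[name] / trials_per_model
    print(f"{name}: {rate:.1%} utilitarian, Wilson interval [{low:.3f}, {high:.3f}]")

# The two extreme models are separated if Claude's upper bound
# lies below Grok's lower bound.
claude_upper = intervals["Claude"][1]
grok_lower = intervals["Grok"][0]
print("Claude and Grok intervals overlap:", claude_upper >= grok_lower)
```

Under these assumed counts, the upper bound for Claude falls below the lower bound for Grok, which agrees with the abstract's statement about the two extreme models. The intervals for adjacent models (for example Gemini and GPT-5) are not claimed to be separated, and the sketch does not replace the mixed-effects analysis reported in Supplement S1.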

Posted

2026-06-28