Pre-service Teachers Comparative Evaluation of Large Language Models

A survey among Learning Sciences

Authors

  • Emiliana Murgia Università di Genova
  • Filippo Bruni Università del Molise

Keywords:

LLMs Comparative Evaluation; Preservice teacher training; AI Teachers’ Perception; AI Literacy; ChatGPT and Gemini.

Abstract

The rise of generative Artificial Intelligence (AI) requires educators to move beyond digital competencies towards AI literacy. This study examines preservice primary teachers' knowledge, perceptions, and preferences regarding generative AI tools, specifically comparing ChatGPT and Gemini for instructional design tasks. Understanding user perceptions can help tailor effective teacher training pathways. A sample of 172 preservice teachers completed three comparative tasks using both ChatGPT and Gemini: generating lesson designs from given prompts, creating prompts following guidelines, and generating mathematical problems. Participants rated the models on six performance dimensions and provided open-ended justifications. Data were analysed using qualitative content analysis of 721 responses and segmentation analysis across experience levels, gender, and task types. ChatGPT was preferred by the majority of participants (67% overall), and this preference held across all tasks and experience levels. Qualitative analysis revealed three primary evaluation criteria: completeness, coherence, and organisation. Experienced users showed a stronger preference for ChatGPT (72%) than novices (61%). Task complexity moderated preferences, with ChatGPT showing its strongest advantages in creative and analytical tasks. In conclusion, user preferences are driven by functional attributes, particularly response organisation and coherence, rather than by advanced features. Effective teacher training requires differentiated pathways based on prior experience, explicit instruction in prompt engineering, and continuous assessment of evolving AI tools.

Published

2025-11-21

How to cite

Murgia, E., & Bruni, F. (2025). Pre-service Teachers Comparative Evaluation of Large Language Models: A survey among Learning Sciences. Journal of Inclusive Methodology and Technology in Learning and Teaching, 5(4). Retrieved from https://www.inclusiveteaching.it/index.php/inclusiveteaching/article/view/423