The three research questions guiding this study structure the findings section. We first present a descriptive overview of the data, followed by a thematic overview, and then characterise the state of the research evidenced in the studies. Subsequently, we report on the theories of learning used to frame, examine or understand the implications of AI chatbots. Finally, we report on the discourses of AI manifested in the corpus.
RQ1 What is the current state of empirical research on AI chatbots in HE?
A tabular overview of the studies’ characteristics, with information about study design, stakeholders, location, cohort size, research questions and main findings, is provided in Appendix 1 and online via Notion.
We identify a range of study designs targeting different stakeholders. In particular, three groups of studies emerge: (1) AI chatbot performance studies, (2) studies examining teachers’ reflections on the impact of AI chatbots on student learning, and (3) studies examining students’ perceptions and use of AI.
AI chatbot performance
A number of studies examine how well AI chatbots can provide acceptable answers to questions from different disciplines. In these studies, it is common to conduct an experiment in which a chatbot’s performance is tested and validated by experts. For example, Khosravi et al. (2023) examined ChatGPT’s performance on questions in genetics and found that 70% of the ChatGPT-generated responses provided a correct answer. Analysing the strengths and weaknesses of the answers provided by the chatbot, Khosravi et al. (2023) report that it performed better and more accurately on descriptive and memorisation tasks than on those requiring critical analysis and problem-solving. Additionally, Pursnani et al. (2023) examined GPT-4-powered ChatGPT’s ability to answer questions from the US Fundamentals of Engineering exam and concluded that there have been significant improvements in its mathematical capabilities and its problem-solving of complex engineering cases. Hallal et al. (2023), testing the claim that AI chatbots are valuable tools in students’ learning, examined the performance of ChatGPT (in both its GPT-3.5 and GPT-4 versions) and Bard on text-based structural notations in organic chemistry, and suggest that the integration of AI chatbots in education needs careful consideration and monitoring. Finally, Farazouli et al. (2023) examined ChatGPT’s ability to answer home examination essay questions in law, philosophy, sociology and education, as well as teachers’ assessment of ChatGPT-written texts in comparison with student-written texts, to see whether teachers could discern who wrote them. They report that teachers had difficulty distinguishing student-written texts from ChatGPT-written ones. Farazouli et al. (2023) also observed a tendency for teachers to downgrade student-written texts when a chatbot might have been involved. In these studies, AI chatbots’ performance was examined in relation to exam questions and assignments, and the findings highlight how the high quality of AI chatbot output made it difficult for teachers to distinguish AI chatbot-written from student-written texts.
Teachers’ perceptions of AI chatbots
A number of studies target teachers’ concerns about how AI chatbots may impact students’ learning. Dakakni and Safa (2023), for example, identify that 67% of instructors expressed distrust toward AI chatbots because they felt the technology prodded students to plagiarize, which they found distasteful. In addition, when asked how they felt about being trained in the use of AI technology, approximately 83% were in favour of training, if only for “policing” purposes and keeping a system of “checks and balances” on students’ tendencies to plagiarize. Barrett and Pack (2023) identify that teachers and students had a shared understanding of what constitutes appropriate AI use. Kohnke et al. (2023) suggest that familiarity and confidence with AI-driven teaching tools play an important role in adoption; they also argue that language instructors face challenges and concerns that point to a need for tailored support and professional development.
Students’ motivation for use
We also identified studies focusing on students’ widespread use of AI chatbots. In one study, 85.2% of the student respondents reported that they resorted to using AI technologies, primarily Bard and Quillbot (38%), with ChatGPT in second place (25.6%) (Dakakni & Safa, 2023). Explaining why students use chatbots, Lai et al. (2023) identify intrinsic motivation as the strongest motivator for ChatGPT use and note that their findings are consistent with the prior literature on technology acceptance, in that perceived usefulness was found to be a strong predictor of behavioural intention. In a single-respondent autoethnographic study, Schwenke et al. (2023) report a perceived value in using an AI chatbot to structure one’s thoughts during the writing of a degree thesis, but also that the process required continuous validation. Al-Zahrani (2023) found that students had a positive outlook on GAI and reported being aware of the ethical concerns associated with its use; however, the study does not detail these concerns. Focusing on the impact of GAI on students’ learning, Yilmaz and Karaoglan Yilmaz (2023) report that the experimental group students’ computational thinking skills, programming self-efficacy, and motivation for the lesson were significantly higher than those of the control group students.
State of the evidence
Looking at the methodological choices reflected in the selected studies (see Appendix 1 or Notion for an overview of the results), we observe that our corpus includes one small-scale but in-depth autoethnographic study with a single participant (n = 1) (Schwenke et al., 2023), a survey study (n = 505) addressing students’ perceptions of whether they were ready to engage with generative AI in their future roles (Al-Zahrani, 2023), and another large-scale study (n = 1117) that examined the uptake of technology (Habibi et al., 2023). Survey studies (n = 11) use self-reported evidence to identify the impact of (Rodway & Schepman, 2023), readiness for (Al-Zahrani, 2023) or willingness to engage with (Lai et al., 2023) AI in educational practice. Three interview studies (e.g. Jafari & Keykha, 2023) were identified in the corpus of articles included in the review, as well as four mixed-methods studies that included either in-depth or semi-structured interviews as part of their data collection (e.g. Dakakni & Safa, 2023).
Several studies present evidence whereby subject experts validate the output of the chatbot, as illustrated in Farazouli et al. (2023), Hallal et al. (2023), Khosravi et al. (2023) and Li et al. (2023). Expert validation, in which AI-generated responses are assessed by experts for validity, is a form of evidence about how well chatbots perform in higher education settings and how far they may challenge core practices in education, as evidenced in Farazouli et al. (2023), but it does not target students’ learning. We identify one study that utilizes a randomized controlled design (Yilmaz & Karaoglan Yilmaz, 2023), which suggests that students in the chatbot intervention performed better on post-test creativity and computer programming self-efficacy measures; however, the study had a very small sample of only 45 students.
We identify a number of studies focusing on second language learning (n = 5), whose findings suggest that chatbots help students structure their thoughts in the second language (e.g. Yan, 2023; Zou & Huang, 2023). In this category, Escalante et al. (2023) examined the differences between chatbot-written and tutor-written feedback and found that students did not value tutor-written feedback more than chatbot-written feedback. At the same time, we find examples of chatbots being unreliable; for example, Zou and Huang (2023) underscore: “Nonetheless, its generative nature also gave rise to concerns for learning loss, authorial voice, unintelligent texts, academic integrity as well as social and safety risks”.
RQ2 What theories of learning are used in studies of chatbots and their impact on student learning?
Of the 23 studies examined, only three make explicit use of learning theories, referring to experiential learning (Li et al., 2023; Yan, 2023), reflective learning (Li et al., 2023; Yan, 2023), active learning (Lai et al., 2023) and self-regulated learning (SRL) (Lai et al., 2023). More specifically, Li et al. (2023) use experiential learning and reflective learning to investigate the effectiveness of ChatGPT in generating reflective writing and the potential challenges it poses to academic integrity, implementing a framework for assessing the quality of reflective writing based on experiential learning and reflective activity. Yan (2023) uses reflective learning and experiential learning to investigate students’ behaviour and reflections during their exposure to ChatGPT in writing classrooms; the design of the study is inspired by experiential learning, with undergraduate students using ChatGPT as part of their practicum. Lai et al. (2023) use active learning theory and SRL to motivate their empirical approach, identifying active learning and SRL as key components for exploring the impact of chatbots on students’ motivation in their learning. Furthermore, Lai et al. (2023) explore the impact of AI chatbots on students’ active learning using the Technology Acceptance Model (TAM) to examine undergraduate students’ motivation and intention to use ChatGPT. On this note, we acknowledge that although only Li et al. (2023), Yan (2023) and Lai et al. (2023) make explicit reference to learning theories, a number of studies (Bernabei et al., 2023; Chan & Hu, 2023; Habibi et al., 2023; Maheshwari, 2023) draw on behaviour research theories such as the Technology Acceptance Model (TAM), the Theory of Planned Behaviour (TPB) and the Unified Theory of Acceptance and Use of Technology (UTAUT), or on other conceptual frameworks such as Biggs’s 3Ps (Presage-Process-Product) model, to examine the acceptance and use of technology. In these studies, the focus lies on exploring factors that influence individuals’ behavioural intentions when adopting a new technology, such as performance expectancy, effort expectancy and hedonic motivation. Across the rest of the examined work (n = 20), we conclude that theories of students’ learning and teachers’ practices are absent.
RQ3 What discourses about AI are found in the literature?
Bearman et al.’s (2022) discourses of imperative response and altering authority were found in the studies in this review. In particular, the discourse of imperative response predominantly frames the selected studies. We observe a tendency to present emerging and potentially disruptive changes either as a positive transformation offering new potential for teaching and learning or as an existential threat to university practices. We identified the discourse of utopia-just-around-the-corner in several studies, for example Dakakni and Safa (2023) and Rodway and Schepman (2023), which highlight the significant advantages of AI in education in supporting tailored learning experiences, boosting and improving student learning, and facilitating the identification of students’ strengths and weaknesses so that lessons can be adapted to individual learning needs. Additionally, focusing on how AI chatbots and their usability were portrayed, we identified this discourse in studies such as Chan and Hu (2023), Hallal et al. (2023), Jafari and Keykha (2023) and Lai et al. (2023), which argue for an ‘undeniable’ benefit that HE institutions need to harness. More specifically, these studies highlight the positive impact AI chatbots may have on students’ learning, by providing tailored feedback on assignments, supporting learners in pinpointing areas for improvement, and avoiding the potential embarrassment of direct and judgmental instructor criticism, as well as on teaching activities, by creating interactive activities and suggesting relevant resources, enabling students to learn at their own pace. Another claim, suggesting the inevitable impact AI chatbots will have, is presented in Hallal et al. (2023): “The invention of AI-Chatbot is undeniably one of the most remarkable achievements by humanity, harnessing an unparalleled level of power and potential. In the near future, AI-chatbots are expected to become valuable tools in education, aiding students in their learning journeys” (p. 1).
The dystopia-is-now discourse was also represented in several studies (e.g. Al-Zahrani, 2023), where AI in education is portrayed as having a disruptive character, potentially displacing teachers’ roles and functions in education and anticipated to have a negative impact on knowledge work productivity. Focusing specifically on AI chatbots, studies such as Escalante et al. (2023), Farazouli et al. (2023) and Li et al. (2023) raise concerns about AI chatbots’ disruptive impact on education, particularly the potential threat they pose to assessment practices and the negative consequences for students’ reflective writing and critical thinking.
In Farazouli et al. (2023), for example, chatbots are framed as a potential existential threat to higher education teachers’ assessment practices, which we identify as the dystopia-is-now discourse. In Al-Zahrani (2023) we identify the utopia-just-around-the-corner thematisation expressed through the framing of the study, where the author notes that “Dwivedi et al (2023) argue that GPT’s, in particular, will disrupt education and believe their biggest impact will be on knowledge work productivity”. Similarly, we identify examples of the utopia-just-around-the-corner discourse in the discussion of the same paper, where the author notes that “in summary, the findings indicate the readiness of the higher education community in Saudi Arabia to integrate AI technologies in research and development” (Al-Zahrani, 2023, p. 11), articulating that the education community of an entire country is ready to take on the challenge presented by AI in education. This bold claim is based on self-reported data from a survey of students’ perceptions of AI.
We also found the discourse of altering authority reflected in the corpus. Several studies align with this discourse about the integration of AI in education, such as Dakakni and Safa (2023) and Jafari and Keykha (2023), where the authors refer to AI learning systems as beneficial for students’ convenience and personalised learning without the intervention of teachers. Additionally, Rodway and Schepman (2023) and Mohamed et al. (2023) discuss AI in education as enabling teachers to offer individualised and ‘customized’ experiences to students. In several studies, this discourse was also present in the framing of AI chatbots’ integration in HE.
In Farazouli et al. (2023), for example, the findings suggest that teachers lose a sense of control, which we understand as an expression of the discourse of altering authority. Here, agency is not necessarily only altered; there are concerns that it may be lost. Finally, Khosravi et al. (2023) portray AI chatbots as a game changer in clinical decision-making and genetics education and suggest that, if their accuracy is improved, they can assist teachers in teaching and evaluating students.
We conclude that the studies we examined correspond in different ways to Bearman et al.’s (2022) discourse positions, framing AI chatbots both as an imperative response and as altering authority in the context of HE, and expressing these framings through both dystopia-is-now and utopia-just-around-the-corner approaches.