The three research questions guiding this study structure the findings section. We first present a descriptive overview of the data, followed by a thematic overview, and then characterise the state of the research evidenced in the studies. Subsequently, we report on the theories of learning used to frame, examine or understand the implications of AI chatbots. Finally, we report on the discourses of AI manifested in the corpus.
RQ1 What is the current state of empirical research on AI chatbots in HE?
A tabular overview of the studies’ characteristics, with information about study design, stakeholders, location, cohort size, research questions and main findings, is provided in Appendix 1 and online via Notion.
We identify a range of study designs targeting different stakeholders. In particular, three groups of studies emerge: (1) AI chatbot performance studies, (2) studies examining teachers’ reflections on the impact of AI chatbots on student learning, and (3) studies examining students’ perceptions and use of AI.
AI chatbot performance
A number of studies examine how well AI chatbots can provide acceptable answers to questions from different disciplines. In these studies, it is common to conduct an experiment in which a chatbot’s performance is tested and validated by experts. For example, Khosravi et al. (2023) examined ChatGPT’s performance on questions in genetics and found that 70% of the ChatGPT-generated responses provided a correct answer. Analysing the strengths and weaknesses of the answers provided by the chatbot, Khosravi et al. (2023) report that it performed better and more accurately on descriptive and memorisation tasks than on those requiring critical analysis and problem-solving. Additionally, Pursnani et al. (2023) examined GPT-4-powered ChatGPT’s ability to answer questions from the US Fundamentals of Engineering exam and concluded that there have been significant improvements in its mathematical capabilities and its problem-solving of complex engineering cases. Hallal et al. (2023), testing the claim that AI chatbots are valuable tools in students’ learning, examined the performance of ChatGPT (in both its GPT-3.5 and GPT-4 versions) and Bard on text-based structural notations in organic chemistry, and suggest that the integration of AI chatbots in education needs careful consideration and monitoring. Finally, Farazouli et al. (2023) examined ChatGPT’s ability to answer home examination essay questions in law, philosophy, sociology and education, as well as teachers’ assessment of ChatGPT-written texts in comparison with student-written texts, to see whether teachers could discern who wrote them. They report that teachers had difficulty distinguishing student-written texts from ChatGPT-written ones. Farazouli et al. (2023) also observed a tendency for teachers to downgrade student-written texts when a chatbot might have been involved. In these studies, AI chatbots’ performance was examined in relation to exam questions and assignments, and the findings highlight how the high quality of AI chatbot output made it difficult for teachers to distinguish AI chatbot-written from student-written texts.
Teachers’ perceptions of AI chatbots
A number of studies target teachers’ concerns about how AI chatbots may impact students’ learning. Dakakni and Safa (2023), for example, identify that 67% of instructors expressed distrust toward AI chatbots because they felt the technology prodded students to plagiarize, which they found distasteful. In addition, when asked how they felt about being trained in the use of AI technology, approximately 83% were in favour of training, if only for “policing” purposes and keeping a system of “checks and balances” on students’ tendencies to plagiarize. Barrett and Pack (2023) identify that teachers and students had a shared understanding of what constitutes appropriate AI use. Kohnke et al. (2023) suggest that familiarity and confidence with AI-driven teaching tools play an important role in adoption; they also argue that language instructors face challenges and concerns that point to a need for tailored support and professional development.
Students’ motivation for use
We also identified studies focusing on students’ widespread use of AI chatbots. In one study, 85.2% of the student respondents reported that they resorted to using AI technologies, primarily Bard and Quillbot (38%), with ChatGPT in second place (25.6%) (Dakakni & Safa, 2023). Explaining why students use chatbots, Lai et al. (2023) identify intrinsic motivation as the strongest motivator for ChatGPT use and note that their findings are consistent with the prior literature on technology acceptance, in that perceived usefulness was found to be a strong predictor of behavioural intention. In a single-respondent autoethnographic study, Schwenke et al. (2023) report a perceived value in using an AI chatbot to structure one’s thoughts during the writing of a degree thesis, but also that the process required continuous validation. Al-Zahrani (2023) found that students had a positive outlook on GAI and reported being aware of the ethical concerns associated with its use; however, the study does not detail these concerns. Focusing on the impact of GAI on students’ learning, Yilmaz and Karaoglan Yilmaz (2023) report that the experimental group students’ computational thinking skills, programming self-efficacy, and motivation for the lesson were significantly higher than those of the control group students.
State of the evidence
Looking at the methodological choices reflected in the selected studies (see Appendix 1 or Notion for an overview of the results), we observe that our corpus includes one small-scale but in-depth autoethnographic study with a single participant (n = 1) (Schwenke et al., 2023), a survey study (n = 505) addressing students’ perceptions of whether they were ready to engage with generative AI in their future roles (Al-Zahrani, 2023), and another large-scale study (n = 1117) that examined the uptake of technology (Habibi et al., 2023). Survey studies (n = 11) use self-reported evidence to identify the impact of (Rodway & Schepman, 2023), readiness for (Al-Zahrani, 2023) or willingness to engage with (Lai et al., 2023) AI in educational practice. Three interview studies (e.g. Jafari & Keykha, 2023) were identified in the corpus of articles included in the review, as well as four mixed-methods studies that included either in-depth or semi-structured interviews as part of their data collection (e.g. Dakakni & Safa, 2023).
Several studies present evidence whereby subject experts validate the output of the chatbot, as illustrated in Farazouli et al. (2023), Hallal et al. (2023), Khosravi et al. (2023) and Li et al. (2023). Expert validation, in which AI-generated responses are assessed by experts for validity, is a form of evidence about how well chatbots perform in higher education settings and how far they may challenge core practices in education, as evidenced in Farazouli et al. (2023), but it does not target students’ learning. We identify one study that utilizes a randomized controlled design (Yilmaz & Karaoglan Yilmaz, 2023), which suggests that students in the chatbot intervention performed better on post-test creativity and computer programming self-efficacy measures; however, the study had a very small sample of only 45 students.
We identify a number of studies focusing on second language learning (n = 5), whose findings suggest that chatbots help students structure their thoughts in the second language (e.g. Yan, 2023; Zou & Huang, 2023). In this category, Escalante et al. (2023) examined the differences between chatbot-written and tutor-written feedback and found that students did not value tutor-written feedback more than chatbot-written feedback. At the same time, we find examples of chatbots being unreliable; for example, Zou and Huang (2023) underscore: “Nonetheless, its generative nature also gave rise to concerns for learning loss, authorial voice, unintelligent texts, academic integrity as well as social and safety risks”.
RQ2 What theories of learning are used in studies of chatbots and their impact on student learning?
Of the 23 studies examined, only three make explicit use of learning theories, referring to experiential learning (Li et al., 2023; Yan, 2023), reflective learning (Li et al., 2023; Yan, 2023), active learning (Lai et al., 2023) and self-regulated learning (SRL) (Lai et al., 2023). More specifically, Li et al. (2023) use experiential learning and reflective learning to investigate the effectiveness of ChatGPT in generating reflective writing and the potential challenges it poses to academic integrity, implementing a framework for assessing the quality of reflective writing based on experiential learning and reflective activity. Yan (2023) uses reflective learning and experiential learning to investigate students’ behaviour and reflections during their exposure to ChatGPT in writing classrooms; the design of the study is inspired by experiential learning, with undergraduate students using ChatGPT as part of their practicum. Lai et al. (2023) use active learning theory and SRL to motivate their empirical approach, identifying active learning and SRL as key components for exploring the impact of chatbots on students’ motivation in their learning. Furthermore, Lai et al. (2023) explore the impact of AI chatbots on students’ active learning using the Technology Acceptance Model (TAM) to examine undergraduate students’ motivation and intention to use ChatGPT. On this note, we acknowledge that although only Li et al. (2023), Yan (2023) and Lai et al. (2023) make explicit reference to learning theories, a number of studies (Bernabei et al., 2023; Chan & Hu, 2023; Habibi et al., 2023; Maheshwari, 2023) draw on behaviour research theories such as the Technology Acceptance Model (TAM), the Theory of Planned Behaviour (TPB) and the Unified Theory of Acceptance and Use of Technology (UTAUT), or on other conceptual frameworks such as Biggs’s 3Ps (Presage-Process-Product) model, to examine the acceptance and use of technology. In these studies, the focus lies on exploring factors that influence individuals’ behavioural intentions when adopting a new technology, such as performance expectancy, effort expectancy and hedonic motivation. Across the rest of the examined work (n = 20), we conclude that theories of students’ learning and teachers’ practices are absent.
RQ3 What discourses about AI are found in the literature?
Bearman et al.’s (2022) discourses of imperative response and altering authority were found in the studies in this review. In particular, the discourse of imperative response predominantly frames the selected studies. We observe a tendency to present emerging and potentially disruptive changes either as a positive transformation offering new potential for teaching and learning or as an existential threat to university practices. We identified the discourse of utopia-just-around-the-corner in several studies, for example Dakakni and Safa (2023) and Rodway and Schepman (2023), which highlight the significant advantages of AI in education in supporting tailored learning experiences, boosting and improving student learning, and facilitating the identification of students’ strengths and weaknesses so that lessons can be adapted to individual learning needs. Additionally, focusing on how AI chatbots and their usability were portrayed, we identified this discourse in studies such as Chan and Hu (2023), Hallal et al. (2023), Jafari and Keykha (2023) and Lai et al. (2023), which argue for an ‘undeniable’ benefit that HE institutions need to harness. More specifically, these studies highlight the positive impact AI chatbots may have on students’ learning, by providing tailored feedback on assignments, supporting learners in pinpointing areas for improvement, and avoiding the potential embarrassment of direct and judgmental instructor criticism, as well as on teaching activities, by creating interactive activities and suggesting relevant resources, enabling students to learn at their own pace. Another claim, suggesting the inevitable impact AI chatbots will have, is presented in Hallal et al. (2023): “The invention of AI-Chatbot is undeniably one of the most remarkable achievements by humanity, harnessing an unparalleled level of power and potential. In the near future, AI-chatbots are expected to become valuable tools in education, aiding students in their learning journeys” (p. 1).
The dystopia-is-now discourse was also represented in several studies (e.g. Al-Zahrani, 2023), where AI in education is portrayed as having a disruptive character, potentially displacing teachers’ roles and functions in education and anticipated to have a negative impact on knowledge work productivity. Focusing specifically on AI chatbots, studies such as Escalante et al. (2023), Farazouli et al. (2023) and Li et al. (2023) raise concerns about AI chatbots’ disruptive impact on education, particularly the potential threat they pose to assessment practices and the negative consequences for students’ reflective writing and critical thinking.
In Farazouli et al. (2023), for example, chatbots are framed as a potential existential threat to higher education teachers’ assessment practices, which we identify as the dystopia-is-now discourse. In Al-Zahrani (2023) we identify the utopia-just-around-the-corner thematisation expressed through the framing of the study, where the author notes that “Dwivedi et al (2023) argue that GPT’s, in particular, will disrupt education and believe their biggest impact will be on knowledge work productivity”. Similarly, we identify examples of the utopia-just-around-the-corner discourse in the discussion of the same paper, where the author notes that “in summary, the findings indicate the readiness of the higher education community in Saudi Arabia to integrate AI technologies in research and development” (Al-Zahrani, 2023, p. 11), articulating that the education community of an entire country is ready to take on the challenge presented by AI in education. This bold claim is based on self-reported data from a survey of students’ perceptions of AI.
We also found the discourse of altering authority reflected in the corpus. Several studies align with this discourse about the integration of AI in education, such as Dakakni and Safa (2023) and Jafari and Keykha (2023), where the authors refer to AI learning systems as beneficial for students’ convenience and personalised learning without the intervention of teachers. Additionally, Rodway and Schepman (2023) and Mohamed et al. (2023) discuss AI in education as enabling teachers to offer individualised and ‘customized’ experiences to students. In several studies, this discourse was also present in the framing of AI chatbots’ integration in HE.
In Farazouli et al. (2023), for example, the findings suggest that teachers lose a sense of control, which we understand as an expression of the discourse of altering authority. Here, agency is not necessarily only altered; there are concerns that it may be lost. Finally, Khosravi et al. (2023) portray AI chatbots as a game changer in clinical decision-making and genetics education and suggest that, if their accuracy is improved, they can assist teachers in teaching and evaluating students.
We conclude that the studies we examined correspond in different ways to Bearman et al.’s (2022) discourse positions, framing AI chatbots both as an imperative response and as altering authority in the context of HE, and expressing these framings through both dystopia-is-now and utopia-just-around-the-corner approaches.