Drawing on innovation diffusion theory, the study investigated the potential enablers of students' "intention to use ChatGPT". An extensive literature review identified five potential antecedents: relative advantage, compatibility, ease of use, trialability, and observability. All constructs, namely the five independent variables and the one dependent variable ("intention to use ChatGPT"), were operationalized as first-order factors. The measures for these constructs were developed from the extant literature and subjected to expert review. A panel of ten experts was given a working definition of each construct and asked to rate the suitability of each item for measuring it on a three-point scale (1 = not suitable, 2 = somewhat suitable, 3 = highly suitable). Only items rated "highly suitable" by the expert panel were retained for pre-testing. Then, using convenience sampling and certain filtering criteria, the study contacted 198 students at a university in India to pre-test and validate the measurement items. After several follow-ups, 172 responses were received and subjected to principal component analysis in SPSS 28 to check whether the items measured the intended constructs. Bartlett's test of sphericity was significant and the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy was 0.717, suggesting sufficient correlations among the items for factor analysis.
Furthermore, the rotated factor loadings showed that one relative advantage item (ra6, "I am usually the first to try out innovations like ChatGPT") did not load appropriately; it was removed from further consideration to obtain a clear factor pattern. The rotated factor loadings are shown below (Table 1). Items for relative advantage are labeled ra1-ra5, compatibility comp1-comp5, ease of use eou1-eou4, trialability trial1-trial4, observability obs1-obs3, and intention to use int1-int3.
Table 1
| Component | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| ra1 | 0.096 | 0.776 | 0.109 | 0.213 | 0.085 | 0.144 |
| ra2 | -0.021 | 0.794 | -0.031 | 0.148 | 0.038 | 0.141 |
| ra3 | -0.027 | 0.803 | 0.102 | 0.052 | -0.043 | 0.039 |
| ra4 | -0.006 | 0.895 | 0.061 | 0.030 | 0.116 | 0.049 |
| ra5 | 0.015 | 0.930 | 0.135 | 0.080 | 0.007 | 0.064 |
| comp1 | 0.848 | -0.004 | 0.122 | 0.161 | -0.075 | 0.098 |
| comp2 | 0.861 | -0.036 | -0.055 | 0.004 | 0.061 | -0.058 |
| comp3 | 0.804 | 0.016 | 0.055 | 0.151 | 0.050 | 0.046 |
| comp4 | 0.970 | 0.007 | 0.045 | 0.128 | -0.057 | 0.051 |
| comp5 | 0.918 | 0.067 | 0.060 | 0.114 | -0.022 | 0.063 |
| eou1 | 0.179 | 0.043 | 0.115 | 0.833 | -0.011 | 0.092 |
| eou2 | 0.060 | 0.074 | 0.051 | 0.856 | 0.054 | 0.091 |
| eou3 | 0.166 | 0.210 | -0.055 | 0.779 | -0.024 | -0.167 |
| eou4 | 0.132 | 0.179 | 0.045 | 0.958 | 0.031 | -0.015 |
| trial1 | 0.039 | 0.004 | 0.815 | -0.025 | 0.076 | -0.028 |
| trial2 | 0.042 | 0.095 | 0.891 | 0.096 | 0.007 | -0.025 |
| trial3 | 0.066 | 0.169 | 0.877 | 0.071 | 0.060 | 0.065 |
| trial4 | 0.050 | 0.082 | 0.964 | 0.022 | 0.005 | -0.011 |
| obs1 | 0.005 | 0.095 | 0.047 | 0.023 | 0.960 | -0.062 |
| obs2 | -0.047 | 0.024 | 0.048 | 0.023 | 0.953 | -0.080 |
| obs3 | 0.017 | 0.049 | 0.053 | 0.005 | 0.971 | -0.032 |
| int1 | 0.120 | 0.203 | -0.014 | 0.063 | -0.025 | 0.808 |
| int2 | 0.025 | 0.047 | -0.038 | -0.023 | -0.037 | 0.870 |
| int3 | 0.016 | 0.114 | 0.044 | -0.012 | -0.100 | 0.836 |

Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 5 iterations.
KMO = 0.717
Bartlett's Test of Sphericity: Chi-sq. = 3849.24, df = 276, p = 0.000
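The degrees of freedom reported for Bartlett's test in Table 1 follow directly from the number of items. The sketch below, in plain Python, assumes the conventional formula chi-square = -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom, where R is the p x p item correlation matrix and n the number of responses. The function names are illustrative, and this is not the SPSS routine the authors ran. The identity matrix in the usage lines is a placeholder input, not the study's data.

```python
import math


def log_determinant(matrix):
    """Log of |det| via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    log_det = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        log_det += math.log(abs(a[col][col]))
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return log_det


def bartlett_sphericity(corr, n_obs):
    """Return (chi-square statistic, degrees of freedom) for a correlation matrix."""
    p = len(corr)
    chi_sq = -(n_obs - 1 - (2 * p + 5) / 6) * log_determinant(corr)
    df = p * (p - 1) // 2
    return chi_sq, df


# Placeholder input: 24 uncorrelated items and 172 pre-test responses.
identity = [[1.0 if i == j else 0.0 for j in range(24)] for i in range(24)]
chi_sq, df = bartlett_sphericity(identity, n_obs=172)
print(df)  # 276, matching the df reported for the 24 retained items
```

A p-value would additionally require the chi-square survival function, which is omitted here to keep the sketch dependency-free.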
With satisfactory measures in place, the study next prepared a questionnaire in Google Forms for further data collection. The form stated the purpose of the study, defined and explained the necessary items, and assured respondents of anonymity. The author team then circulated the questionnaire to students, primarily those studying in universities. A filter question asked whether respondents were "very familiar", "somewhat familiar", or "not familiar at all" with ChatGPT. After several follow-ups and reminders, 336 complete responses were received. However, since the study primarily sought to understand university students' attitudes towards ChatGPT for education and learning, the 20 school students who responded to the survey were excluded.
Further, 32 respondents indicated they were not familiar with ChatGPT. Therefore, the study excluded these respondents from further consideration and had 288 complete responses for final analysis. Next, the study analyzed the responses for their reliability and validity using confirmatory factor analysis.
The final sample profile has variability in terms of demographics. There were 140 female and 148 male responses. Furthermore, 171 respondents indicated they are 'very familiar,' and 117 respondents indicated they are 'somewhat familiar' with ChatGPT.
Measurement Model Evaluation
Confirmatory Factor Analysis (CFA) was conducted using AMOS 28 to evaluate the reliability and validity of the measures corresponding to the latent variables (Table 2). Reliability was assessed through Cronbach's alpha and composite reliability, while validity was assessed by examining standardized loadings, Average Variance Extracted (AVE), and discriminant validity (Fornell and Larcker, 1981). All items had standardized loadings greater than 0.70 and all constructs had AVE above 0.50, indicating adequate convergent validity (Hu and Bentler, 1999).
Table 2
Measurement Items: Reliability and Validity
| Measurement Items | Items | Std. loadings* | Reliability / validity | Mean | SD | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|
| Relative Advantage | | | | | | | |
| Using ChatGPT will improve my writing/coding skills | ra1 | 0.785 | Alpha = 0.897 | 3.63 | 1.135 | -0.911 | 0.042 |
| ChatGPT will lead to a higher level of engagement in my subjects | ra2 | 0.78 | C.R. = 0.899 | 4.19 | 1.118 | -1.424 | 1.161 |
| ChatGPT will make writing/coding assignments and reports a better experience | ra3 | 0.75 | AVE = 0.642 | 4.41 | 1.055 | -1.965 | 3.089 |
| ChatGPT helped me to learn more about technology while also learning about subjects | ra4 | 0.852 | | 4.01 | 1.235 | -1.197 | 0.451 |
| Using ChatGPT made learning a better experience | ra5 | 0.835 | | 3.99 | 1.183 | -1.04 | 0.129 |
| Compatibility | | | | | | | |
| ChatGPT fits very well into my learning style | comp1 | 0.855 | Alpha = 0.891 | 3.92 | 1.1 | -1.218 | 0.84 |
| Using ChatGPT will improve the quality of the work I do. | comp2 | 0.731 | C.R. = 0.900 | 4.01 | 1.079 | -1.235 | 1.063 |
| I have no issues in being monitored in the use of ChatGPT | comp3 | 0.72 | AVE = 0.643 | 4 | 1.082 | -1.296 | 1.205 |
| I had more fun because of ChatGPT | comp4 | 0.806 | | 3.75 | 1.117 | -0.99 | 0.324 |
| Using ChatGPT has risk-taking elements, which I enjoy | comp5 | 0.885 | | 3.78 | 1.056 | -0.924 | 0.337 |
| Ease of Use | | | | | | | |
| Instructions about using ChatGPT were easy to understand | eou1 | 0.764 | Alpha = 0.860 | 3.61 | 1.193 | -0.708 | -0.357 |
| I had no difficulty understanding how to get around in ChatGPT. | eou2 | 0.758 | C.R. = 0.876 | 3.89 | 1.059 | -0.99 | 0.408 |
| My role in using ChatGPT is clear and understandable. | eou3 | 0.766 | AVE = 0.639 | 3.76 | 1.217 | -0.88 | -0.158 |
| Using ChatGPT will require a lot of training | eou4 | 0.902 | | 3.6 | 1.071 | -0.718 | -0.039 |
| Trialability | | | | | | | |
| Being able to try out ChatGPT was important in my decision to use it | trial1 | 0.706 | Alpha = 0.880 | 4.16 | 1.112 | -1.53 | 1.624 |
| It is easy to stop using ChatGPT | trial2 | 0.812 | C.R. = 0.885 | 3.94 | 1.258 | -1.19 | 0.361 |
| I am more likely to use ChatGPT because it is freely available | trial3 | 0.859 | AVE = 0.658 | 3.97 | 1.245 | -1.163 | 0.244 |
| I won't lose much by trying ChatGPT, even if I don't like it. | trial4 | 0.859 | | 3.86 | 1.205 | -0.983 | 0.101 |
| Observability | | | | | | | |
| I have seen others using ChatGPT | obs1 | 0.78 | Alpha = 0.811 | 3.49 | 1.238 | -0.353 | -1.006 |
| I will join ChatGPT after seeing my friends using it | obs2 | 0.714 | C.R. = 0.799 | 3.15 | 1.233 | -0.108 | -1.059 |
| Other students seemed interested in ChatGPT when they saw me using it. | obs3 | 0.77 | AVE = 0.571 | 3.32 | 1.331 | -0.397 | -1.019 |
| Intention to Use | | | | | | | |
| I would like to use ChatGPT for my learning | int1 | 0.715 | Alpha = 0.750 | 3.8 | 1.165 | -0.859 | -0.202 |
| I would definitely like to use ChatGPT for my education | int2 | 0.759 | C.R. = 0.770 | 3.9 | 1.174 | -0.969 | -0.019 |
| I am sure to use ChatGPT to aid my learning process | int3 | 0.703 | AVE = 0.527 | 3.78 | 1.231 | -0.839 | -0.351 |

Chi-sq./df = 1.567, GFI = 0.917, CFI = 0.974, TLI = 0.963, NFI = 0.931, RMSEA = 0.044
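The AVE and composite reliability (C.R.) values in Table 2 can be recomputed from the standardized loadings. The sketch below assumes the conventional formulas, which the text does not spell out: AVE is the mean of the squared loadings, and C.R. is the squared sum of loadings divided by that quantity plus the summed error variances (1 minus each squared loading). Function names are illustrative. The loadings are those reported for relative advantage.

```python
def average_variance_extracted(loadings):
    """Mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """(sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1 - l ** 2 for l in loadings)
    return squared_sum / (squared_sum + error_variance)


# Standardized loadings for ra1-ra5 as reported in Table 2.
ra_loadings = [0.785, 0.78, 0.75, 0.852, 0.835]

ave = average_variance_extracted(ra_loadings)
cr = composite_reliability(ra_loadings)
print(round(ave, 3), round(cr, 3))  # 0.642 0.899
```

For relative advantage, both values agree with the reported AVE = 0.642 and C.R. = 0.899. Because the published loadings are rounded, other constructs may differ from the table in the third decimal place.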
Furthermore, the square root of each construct's AVE was greater than its correlations with the other constructs, suggesting satisfactory discriminant validity (Hu and Bentler, 1999) (Table 3).
Table 3
| Construct | RA | COMP | EOU | TRIAL | OBS | INT |
|---|---|---|---|---|---|---|
| Relative Advantage | 0.801 | | | | | |
| Compatibility | 0.095 | 0.802 | | | | |
| Ease of Use | 0.248** | 0.188** | 0.799 | | | |
| Trialability | 0.229** | 0.066 | 0.101 | 0.811 | | |
| Observability | -0.167** | -0.168** | -0.714** | -0.059 | 0.755 | |
| Intention to use | 0.269** | 0.193** | 0.242** | 0.242** | -0.072 | 0.726 |

**. Correlation is significant at the 0.01 level (2-tailed).
Diagonal = square root of AVE; below the diagonal = correlations.
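The discriminant validity check described above can be expressed as a simple comparison. The following sketch assumes the usual reading of the Fornell-Larcker criterion: for every pair of constructs, the absolute correlation must be smaller than the square root of the AVE of both constructs. The AVE values come from Table 2 and the correlations from Table 3; the function name and data layout are illustrative.

```python
import math

# AVE per construct (Table 2).
ave = {"RA": 0.642, "COMP": 0.643, "EOU": 0.639,
       "TRIAL": 0.658, "OBS": 0.571, "INT": 0.527}

# Inter-construct correlations (Table 3, below the diagonal).
correlations = {
    ("RA", "COMP"): 0.095, ("RA", "EOU"): 0.248, ("COMP", "EOU"): 0.188,
    ("RA", "TRIAL"): 0.229, ("COMP", "TRIAL"): 0.066, ("EOU", "TRIAL"): 0.101,
    ("RA", "OBS"): -0.167, ("COMP", "OBS"): -0.168, ("EOU", "OBS"): -0.714,
    ("TRIAL", "OBS"): -0.059, ("RA", "INT"): 0.269, ("COMP", "INT"): 0.193,
    ("EOU", "INT"): 0.242, ("TRIAL", "INT"): 0.242, ("OBS", "INT"): -0.072,
}


def fornell_larcker_violations(ave, correlations):
    """List the construct pairs whose |correlation| reaches either sqrt(AVE)."""
    violations = []
    for (a, b), r in correlations.items():
        threshold = min(math.sqrt(ave[a]), math.sqrt(ave[b]))
        if abs(r) >= threshold:
            violations.append((a, b, r))
    return violations


print(fornell_larcker_violations(ave, correlations))  # []
```

The empty list indicates that no pair violates the criterion. The closest case is ease of use and observability, where the absolute correlation of 0.714 remains below both square roots of AVE.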
With the measures of the construct deemed reliable and valid, the study next evaluated the validity of the proposed structural paths.
Structural Model Evaluation
The structural model was then evaluated in AMOS 28. As shown in the structural model (Fig. 2), all the posited enablers of students' intention to use ChatGPT were supported by the responses. Fig. 2 summarizes the path coefficients and their corresponding significance levels.
Relative advantage refers to the extent to which ChatGPT is perceived as beneficial for education compared to other available technologies. For instance, ChatGPT can provide quick information on any topic for students in remote locations, eliminating the need for physical tuition. Additionally, ChatGPT can help students customize their learning experience with time-saving flexibility (Halaweh, 2023). Our findings support the idea that students perceive ChatGPT as an innovative tool that enables them to pursue their educational objectives independently. As a result, the added benefits offered by ChatGPT as an educational tool encourage students to use it.
Compatibility, another crucial aspect of innovation, concerns the innovation's ability to meet users' actual needs. ChatGPT has been making headlines for its transformative impact on various sectors, including education. Regarding compatibility, ChatGPT aligns well with existing online educational tools that students can access from the comfort of their homes. However, some studies have highlighted concerns regarding students' potential misuse of ChatGPT (Cotton et al., 2023; King and ChatGPT, 2023). Developing appropriate mechanisms to prevent the misuse of advanced AI technologies is essential. This suggests that although ChatGPT is highly compatible with early adopters' needs, it should be used under monitored circumstances (Cotton et al., 2023). This could explain the positive and significant, yet low, path coefficient value of compatibility influencing students' intention to use ChatGPT.
The innovation diffusion theory highlights the ease of use or complexity as a potential feature influencing technology adoption. Consequently, our study considered ease of use as a factor potentially affecting students' intention to use ChatGPT. The findings revealed a positive and significant path coefficient, indicating that user-friendly features help students familiarize themselves with ChatGPT. Studies have shown that students find ChatGPT easy to use when seeking answers to questions (Arif et al., 2023; van Dis et al., 2023).
The fourth potential enabler, trialability, refers to the extent to which users can experiment with technology upon its introduction. ChatGPT provides ample opportunities for students to explore its capabilities, ensuring trialability significantly influences their intention to use the tool (Thorp, 2023). Lastly, the benefits of using ChatGPT should be observable and tangible to motivate users to adopt it. As supported by existing research, ChatGPT has demonstrated visible results across various sectors, including education (Rudolph et al., 2023; Kasneci et al., 2023). Our study indicates that as the visibility of the benefits of using ChatGPT in education increases, so does students' intention to use it.
Estimating the hypothesized model separately by gender also reveals notable differences, as seen in Fig. 3 and Fig. 4. For male students, compatibility (p = 0.042), ease of use (p = 0.009), and observability (p = 0.051, marginally above the conventional 0.05 threshold) matter when considering ChatGPT adoption, whereas for female students ease of use (p = 0.045), compatibility (p = 0.043), relative advantage (p < 0.001), and trialability (p = 0.010) matter. Furthermore, male students place more weight than female students on user-friendly interfaces and related attributes and display greater interest in compatibility issues. Further research is needed to explore these gender-based differences in attribute preferences for ChatGPT adoption.
Sentiment Analysis Framework
Sentiment analysis comprises text analysis tools designed to identify opinions that emphasize positive or negative polarity regarding a specific process. In this case, we are primarily concerned with understanding students' views about the tool ChatGPT.
The Sentiment Analysis Framework is shown in Fig. 5. Before any Natural Language Processing task can be performed on raw text, the text must be converted into a machine-readable format, typically real-valued vectors that capture each sentence's meaning and context. This work uses sentence transformers to convert the raw text into such vectors (sentence embeddings). Because the dataset is small, a transfer learning approach is adopted: a pre-trained sentence transformer ("bert-base-nli-mean-tokens") generates a 768-dimensional vector for each piece of raw text. The resulting sentence vectors (the unlabelled data points) are then grouped into three clusters (positive, negative, and neutral) using the K-Means clustering algorithm. The premise is that sentence embeddings with similar sentiments will lie close together in the semantic space.
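The embed-then-cluster pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. `embed_sentences` shows how the named pre-trained model could be loaded with the third-party sentence-transformers package, but it is not executed here. The clustering step is a plain-Python version of Lloyd's K-Means algorithm, run on toy 2-D vectors instead of 768-dimensional embeddings. The farthest-first initialization is an assumption made only to keep the example deterministic; a library implementation of K-Means would normally be used.

```python
def embed_sentences(sentences, model_name="bert-base-nli-mean-tokens"):
    """Encode sentences with a pre-trained sentence transformer.
    Requires the sentence-transformers package; not run in this sketch."""
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer(model_name)
    return [list(map(float, vec)) for vec in model.encode(sentences)]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_vector(points):
    return [sum(coords) / len(points) for coords in zip(*points)]


def farthest_first_init(points, k):
    """Deterministic start: pick each new centroid as the point farthest
    from the centroids chosen so far (an assumption of this sketch)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points,
                  key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def k_means(points, k, n_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update
    until the cluster labels stop changing."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_vector(members)
    return labels, centroids


# Toy stand-ins for sentence embeddings: three well-separated groups.
toy_vectors = [
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2],
    [10.0, 0.0], [9.8, 0.1], [10.1, 0.2],
]
labels, centroids = k_means(toy_vectors, k=3)
print(labels)
```

K-Means itself only produces unlabeled groups. Naming a cluster positive, negative, or neutral requires inspecting the sentences that fall into it.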
The scores presented in Tables 4, 5, and 6 list the ten sentences most similar to a given sentence in each generated cluster. The similar sentences are retrieved using the cosine similarity metric, which measures the angle between two vectors in the semantic space regardless of their magnitude. Because the raw sentences are represented as vectors in the semantic space, ranking by cosine similarity yields the sentences most similar to the given sentence. The metric ranges from −1 to 1, where a score of 1 indicates vectors pointing in the same direction (similar sentences), 0 indicates orthogonal vectors, and −1 indicates vectors pointing in opposite directions (most dissimilar sentences). The cosine similarity of two vectors a and b is calculated as follows:
$$\text{cosine\_similarity}(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$$
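The formula above, and the top-10 retrieval used for Tables 4 to 6, can be sketched in plain Python. The helper names and the toy 2-D vectors are illustrative assumptions; in the study the vectors would be the 768-dimensional sentence embeddings.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k_similar(query, candidates, k=10):
    """Return (score, index) pairs for the k candidates most similar
    to the query vector, highest score first."""
    scored = [(cosine_similarity(query, vec), i)
              for i, vec in enumerate(candidates)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy vectors: same direction, 45 degrees apart, orthogonal, opposite.
candidates = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [-1.0, 0.0]]
for score, index in top_k_similar([1.0, 0.0], candidates, k=4):
    print(index, round(score, 4))
```

A sentence compared with itself always scores 1, which is why each of Tables 4 to 6 begins with the query sentence at 1.0000.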
Table 4
Sample of top 10 similar sentences in the semantic space for the comment "nil" (Cluster 1, Neutral sentiment)
| Scores (Cosine Similarity) | Similar Sentences |
|---|---|
| 1.0000 | nil |
| 0.8750 | nope |
| 0.8668 | nopes |
| 0.8520 | no |
| 0.8287 | np |
| 0.7865 | no comments |
| 0.7857 | didn't try |
| 0.7816 | cannot believe it |
| 0.7598 | i don't know about it |
| 0.7429 | I used it for assignments |
Table 5
Sample of top 10 similar sentences in the semantic space for the comment "risky app" (Cluster 2, Negative sentiment)
| Scores (Cosine Similarity) | Similar Sentences |
|---|---|
| 1.0000 | risky app |
| 0.9271 | risky application |
| 0.8486 | dangerous tools |
| 0.7690 | scared what is next with ai |
| 0.7571 | there are high chances of getting non-accurate answers. |
| 0.7538 | scary but useful tool is chatgpt |
| 0.7261 | chatgpt can be scary and terrific at the same time. |
| 0.7256 | yes will replace humans. be ready |
| 0.7109 | little bit afraid |
| 0.7098 | it becomes a troublemaker sometimes, by showing irrelevant information. |
Table 6
Sample of top 10 similar sentences in the semantic space for the comment "great invention" (Cluster 3, Positive sentiment)
| Scores (Cosine Similarity) | Similar Sentences |
|---|---|
| 1.0000 | great invention |
| 0.9757 | excellent invention |
| 0.9659 | it is a great innovation |
| 0.8982 | very useful |
| 0.8981 | very much useful |
| 0.8892 | very helpful |
| 0.8849 | very good |
| 0.8844 | a very good platform |
| 0.8830 | amazing software |
| 0.8789 | very creative tool |
For further analysis, the mean and standard deviation are calculated to evaluate the quality of the generated clusters. The mean, also known as the centroid, is the center of a cluster and is calculated by averaging all the data points in the cluster. The centroid is the point that minimizes the sum of the squared distances between itself and all points in the cluster (Wu et al., 2019). The standard deviation, a measure of the spread of the data within a cluster, is the square root of the variance, where the variance is the average of the squared distances of each point from the mean. In K-means clustering, the objective is to minimize the sum of the squared distances between each data point and its assigned cluster's centroid (Wu et al., 2019). Together, the mean and standard deviation indicate how tightly the data points are grouped around their respective centroids. A cluster is considered well-defined when its points lie close to its centroid, that is, when its standard deviation is small.
As shown in Table 7, the small standard deviations indicate that the data points within each cluster lie close to one another and are tightly spread around the mean.
Table 7
Mean and Standard Deviation of each Cluster generated by K-Means Clustering
| Clusters | Mean | Standard Deviation |
|---|---|---|
| Cluster 1 (Neutral) | 0.90 | 0.08 |
| Cluster 2 (Negative) | 0.70 | 0.09 |
| Cluster 3 (Positive) | 0.80 | 0.09 |
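The centroid and within-cluster standard deviation defined before Table 7 can be sketched as below. This follows the prose definition only: the centroid is the coordinate-wise average, and the standard deviation is the square root of the mean squared distance to it. The text does not specify exactly which quantity Table 7 summarizes, so the sketch is not claimed to reproduce those figures. The function names and toy points are illustrative.

```python
import math


def centroid(points):
    """Coordinate-wise average of all points in the cluster."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def cluster_std(points):
    """Square root of the average squared Euclidean distance
    from each point to the cluster centroid."""
    center = centroid(points)
    variance = sum(
        sum((x - c) ** 2 for x, c in zip(p, center)) for p in points
    ) / len(points)
    return math.sqrt(variance)


# Toy cluster: two points one unit either side of (1, 0).
toy_cluster = [[0.0, 0.0], [2.0, 0.0]]
print(centroid(toy_cluster), cluster_std(toy_cluster))  # [1.0, 0.0] 1.0
```

A smaller value from `cluster_std` means the points sit closer to their centroid, which is the sense in which a small standard deviation signals a compact cluster.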
From Table 8, it can be observed that detecting female sentiment is more challenging because it tends to be less explicit. The accuracy is biased towards male-user opinions, and such gender biases can impact the conclusions drawn from the sentiment analysis, as each individual's mental and linguistic comprehension can lead to variations in the interpretation of the connotation of terms.
Table 8
Gender bias ratio over the opinions in each cluster
| Clusters | Male | Female | Total | % Male | % Female |
|---|---|---|---|---|---|
| Cluster 1 (Neutral) | 145 | 17 | 162 | 89% | 11% |
| Cluster 2 (Negative) | 85 | 40 | 126 | 67% | 33% |
| Cluster 3 (Positive) | 87 | 24 | 111 | 78% | 22% |
To mitigate personal bias in sentiment analysis, it is crucial to use a diverse and representative set of training data, evaluate the system's performance across demographic groups, and employ techniques to detect and correct algorithmic bias (Mayur et al., 2022). Therefore, to achieve unbiased accuracy with mixed-gender datasets, opinions should be collected with an equal gender ratio.