Prolific countries based on posts
The countries leading in posts about ChatGPT include those with a strong research focus on artificial intelligence, such as the United States, Japan, and the United Kingdom (Fig. 1). Notably, China appears among the top countries for tweets related to ChatGPT research despite Twitter being banned there; discussion of ChatGPT research may also take place on platforms popular in China, such as Weibo and WeChat. Across platforms, Twitter dominates the social media pattern of ChatGPT mentions, accounting for 94% of all mentions.
Highly mentioned articles – geographical analysis based on AAS
Table 1 presents the top ten articles by AAS, together with the top five countries of mentions for each article and the percentage of mentions from each country. The United States and Japan are the two countries with the highest shares of mentions. The United Kingdom and Spain also account for a considerable number of mentions, with consistently high percentages across the top ten articles. Developing economies such as India and Brazil have relatively lower shares in the top ten list, but their presence in some combinations indicates their contribution to the ChatGPT research field and their potential for further growth and collaboration.
Top source titles based on mentions
Table 2 displays the source titles that received the most mentions in tweets, news, blogs, and Facebook posts related to ChatGPT, along with the number of research outputs associated with each source. Nature had the highest total mentions, accounting for 43% of all mentions, followed by medRxiv at 34% and arXiv at 23%. Science received 4% of the total mentions, while PLOS Digital Health accounted for 2%. Radiology, JMIR Medical Education, Cureus, Accountability in Research, Patterns, and the Journal of Educational Evaluation for Health Professions each received 1% or less of the total mentions.
The table suggests that Nature, medRxiv, and arXiv were the most prominent sources of research related to ChatGPT, accounting for a significant share of total mentions across all social media platforms. Approximately 98% of the total mentions came from Twitter, with the remaining 2% split between news, blog, and Facebook mentions, indicating that Twitter is the primary social media platform for sharing and discussing content related to ChatGPT.
The source title with the highest Altmetric Attention Score is Nature (AAS = 8192), known for publishing research in artificial intelligence and natural language processing, followed by the preprint servers medRxiv (AAS = 4206) and arXiv (AAS = 3299), and PLOS Digital Health (AAS = 1233).
Most mapped Fields of Research (FoR) by total Mentions
Table 3 displays the percentage of total mentions for the top five fields of research related to ChatGPT. Information and computing sciences had the highest percentage of total mentions with 73%, followed by biomedical and clinical sciences with 38%, and language, communication, and culture with 18%. Philosophy and religious studies had 11% of total mentions, while education had the lowest percentage with 5%.
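The shares in Table 3 sum to more than 100%, presumably because an article can be mapped to several Fields of Research, so its mentions are counted once under each field. A minimal sketch of this kind of multi-label aggregation, using hypothetical article data:

```python
from collections import Counter

# Hypothetical sample: each article carries a mention count and the
# (possibly multiple) Fields of Research it is mapped to.
articles = [
    {"mentions": 50, "fields": ["Information and computing sciences"]},
    {"mentions": 30, "fields": ["Biomedical and clinical sciences",
                                "Information and computing sciences"]},
    {"mentions": 20, "fields": ["Education"]},
]

total_mentions = sum(a["mentions"] for a in articles)

# Count each article's mentions once per field it is mapped to.
field_mentions = Counter()
for a in articles:
    for field in a["fields"]:
        field_mentions[field] += a["mentions"]

# Per-field share of total mentions, in percent.
shares = {f: 100 * m / total_mentions for f, m in field_mentions.items()}
```

Here the second article contributes its 30 mentions to both of its fields, which is why the per-field percentages can exceed 100% in aggregate.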
Fields of Research linkages
Next, we analyzed the linkages between the Fields of Research. The network formed by linking fields of research based on citations is shown in Figure 2. This resulted in five clusters, as described in Table 4.
Cluster 1 (red) predominantly focuses on biomedical, clinical, and health sciences, with a cumulative AAS of 6347 from 87 articles; its AAS/TP is 73. Cluster 2 (green), mapped to information and computing sciences, has the highest cumulative AAS of 15278 from 317 articles, as well as the highest TC (317) and AAS/TP (81.3). Cluster 3 (blue) articles are mapped to language, communication, and culture, with a total AAS of 6546 from 84 articles and an AAS/TP of 77.9. Cluster 4 (yellow) articles are mostly mapped to philosophy and religious studies, with an AAS of 558 from 33 articles and a modest AAS/TP of 16.9. Finally, Cluster 5 (violet) is mapped to education, with an AAS of 1203 from 35 articles and an AAS/TP of 34.4. Early trends thus show that ChatGPT research articles are most often mapped to information and computing sciences and engineering (TC=317, AAS=15278) and biomedical, clinical, and health sciences (TC=240, AAS=6347).
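For reference, the AAS/TP metric used here is simply a cluster's cumulative AAS divided by its number of articles. A quick Python check using four of the cluster totals reported in this section:

```python
# Cluster name -> (cumulative AAS, number of articles), as reported above.
clusters = {
    "Cluster 1 (biomedical and clinical sciences)": (6347, 87),
    "Cluster 3 (language, communication, and culture)": (6546, 84),
    "Cluster 4 (philosophy and religious studies)": (558, 33),
    "Cluster 5 (education)": (1203, 35),
}

# AAS/TP: mean Altmetric Attention Score per paper, rounded to one decimal.
aas_per_paper = {name: round(aas / n, 1) for name, (aas, n) in clusters.items()}
```

This reproduces the reported AAS/TP values of 73, 77.9, 16.9, and 34.4, respectively.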
Cluster analysis with major themes
Cluster 1: Biomedical and clinical sciences
The top 3 articles based on AAS in cluster 1 are further analyzed, as seen in Table 5.
The article with the highest Altmetric Attention Score, by Kung, Tiffany H. et al. (2022), evaluated the performance of ChatGPT on the three-step United States Medical Licensing Examination (USMLE). Without any specialized training or reinforcement, ChatGPT performed at or near the passing threshold. The concordance and insight displayed in its answers were the key highlights, implying the potential of large language models to assist medical education and clinical decision-making.
Biswas (2023) attempted to demonstrate by example that medical writing will become heavily dependent on AI and chatbots. His experiment was to pose an essay question about radiology training to ChatGPT, and the bot provided a confident, human-like answer reflecting how well it had been trained on radiology material. The work concludes with cautions related to ethics, legal (including medico-legal) issues, innovation, accuracy, bias, and transparency in using ChatGPT for medical writing.
The study by Liebrenz, Michael et al. (2023) noted that ChatGPT relieves non-native English speakers by helping them overcome the language barrier in publishing. At the same time, ChatGPT's ability to produce misleading and inaccurate content may place medical research at risk of spreading misinformation. Another major challenge arises from OpenAI's prospects of monetizing the product after an initial period of free access, as this may widen existing international inequalities in publishing. The decision of the Elsevier group of publications not to allow ChatGPT as an author and to demand proper acknowledgment of its use is judiciously based on apprehensions about the 'originality' and 'accuracy' of AI-generated text and on grounds of 'accountability' for the content produced by ChatGPT.
Cluster 2: Information and computing sciences
The analysis of the top 3 articles based on AAS in cluster 2 is presented in Table 6.
The article by Stokel-Walker (2023), with the second-highest AAS, reported that some scientific manuscript submissions listed ChatGPT as an author in the byline, causing concern among journal editors and publishers and forcing them to devise suitable policies to restrict the use of ChatGPT in scientific authorship.
Nature's editorial "Tools such as ChatGPT threaten transparent science; here are our ground rules for their use" (Nature Editorial, 2023) is the work with the fourth-highest AAS. It draws on articles published in Nature about the AI-LLM bot's efficiency in producing genuine-seeming research manuscripts and notes that editors are already receiving submissions crediting authorship to ChatGPT. In this work, two principles were introduced by Nature and Springer Nature journals (with some other journals on their way to adopting them): (i) no LLM tool will be accepted as a credited author on a research paper, as AI tools cannot be held accountable, while accountability is a primary characteristic of authorship; and (ii) researchers should properly acknowledge the use of LLM tools in the methods, acknowledgments, introduction, or other suitable sections.
The paper by Gao et al. (2022) found that scientists often failed to differentiate abstracts written by ChatGPT from original abstracts. In a test by researchers at Northwestern University, Chicago, led by Professor Gao, an AI output detector successfully distinguished original abstracts from ChatGPT-generated ones, while plagiarism detectors failed drastically. Medical researchers correctly identified only 68% of abstracts written by ChatGPT and 86% of original abstracts. This outlined the ability of AI-LLM bots to generate convincing medical research articles.
Cluster 3: Language, communication, and culture
The top 3 articles according to AAS in cluster 3 have been analyzed in more detail, as presented in Table 7.
The editor's note from Science (Thorp, H. Holden, 2023) announced updates to the journal's license and editorial policies so that neither text nor figures, images, or graphics generated by ChatGPT can be used in submissions. It also specified that violating these policies will invite action for scientific misconduct, equivalent to altering images or plagiarizing existing works. However, the intentional production of legitimate datasets by AI is excluded from such actions.
The news piece in Nature titled "AI bot ChatGPT writes smart essays - should professors worry?" by Stokel-Walker (2022) is mainly based on the opinions of Lilian Edwards (Newcastle University, UK), Dan Gillmor (Arizona State University, United States), Thomas Lancaster (Imperial College, UK), Arvind Narayanan (Princeton University, United States), and Sandra Wachter (Oxford Internet Institute, UK). Edwards observed that ChatGPT is so good that there is no point in using essays for assessment. When Gillmor tested ChatGPT with a homework question he often assigned his students, it produced a response that would have earned a good grade. Lancaster found no 'game changer potential' in ChatGPT because, in his view, it is trained to generate new patterns of words based on the patterns of words it has seen before. Narayanan opined that the 'essays for assignment' problem could be tackled by reworking assessment priorities to encourage critical thinking and reasoning. Wachter found ChatGPT exciting and worrying at the same time, as students might outsource their writing and thinking. But the challenges are not insurmountable.
Finally, the news piece by Else (2023) published in Nature mainly covers the experiment conducted by Professor Gao and his team and reports the opinions of Sandra Wachter (University of Oxford), Arvind Narayanan (Princeton University), and Irene Solaiman (Hugging Face, an AI company). Wachter cautioned of dire consequences for researchers, who might be misled by flawed research, and for society, given the large role scientific research plays in it. Narayanan opined that serious researchers are unlikely to use ChatGPT; the focus should instead be on the incentives that create 'publication pressure' and force desperate measures such as using ChatGPT, and practices like hiring and promotion based on merely counting publications should be checked. Solaiman insisted that fields like medical science, where misinformation can be fatal, should adopt a more rigorous approach to ensure information accuracy and people's safety.
Cluster 4: Philosophy and religious studies
The top 3 articles according to AAS in cluster 4 have been analyzed in more detail, as presented in Table 8. The article "Rapamycin in the context of Pascal's Wager: generative pre-trained transformer perspective" (Transformer & Zhavoronkov, 2022) has ChatGPT as the first author. It explored the benefits of taking rapamycin in the context of the philosophical argument of Pascal's Wager. ChatGPT successfully identified the drug's pros, such as anti-aging effects and life-extension capabilities in animals. Its drawbacks, especially long-term risks such as a potential increase in cholesterol and the chance of developing diabetes, were also correctly retrieved. Notably, ChatGPT also gave the sensible recommendation to consult healthcare professionals.
Dowling & Lucey (2023) assessed ChatGPT across all four stages of the research process, from idea creation to testing. They found that with the addition of private data (rather than public data) and the researcher's expertise, ChatGPT's results are likely to become even more impressive. This work favored the use of ChatGPT as a research assistant. Regarding the ethical concern of authorship, it draws an analogy to the Bananarama Conjecture, in which the extent of ChatGPT's use in research and the level of human supervision matter most in deciding authorship. This approach can be more suitable than plain acceptance of ChatGPT's authorship or a blanket ban on it.
The preprint by Krugel et al. (2023) remarked that although ChatGPT's assistance is beneficial for many purposes, it proves highly inconsistent as a moral advisor, especially where different codes of morality exist in society. Despite this inconsistency, its influence on users' judgment is strong, and the work views ChatGPT as a threat that can corrupt users' judgment. It highlights the need for responsible use of ChatGPT and similar AI and recommends training to improve digital literacy to ensure this.
Cluster 5: Education, curriculum, pedagogy
The top 3 articles according to AAS in cluster 5 are shown in Table 9. The work of Rudolph et al. (2023) laid out the implications for higher education and discussed the future of learning, teaching, and assessment. The authors observe that ChatGPT can be beneficial in providing conceptual explanations and applications, but deem AI less competent for content that requires higher-order (critical, analytical) thinking. They also provide separate recommendations to students, faculty, and higher education institutions in the context of ChatGPT and other AI tools.
The position paper by Kasneci et al. (2023) presented LLMs' potential benefits and challenges from students' and teachers' perspectives. It then discussed how these models could be used to create educational content, improve student engagement and interaction and personalize learning experiences. It also highlighted that clear strategies within educational systems with a strong emphasis on critical thinking and fact-checking strategies are required to reap the full benefit of these models. It also provided recommendations on addressing these challenges to ensure responsible usage of these models for education.
The preprint by Frieder et al. (2023) investigated the mathematical capabilities of ChatGPT by testing it on publicly available datasets and newly developed benchmark datasets. On a benchmark covering graduate-level mathematics, ChatGPT's mathematical abilities were significantly below those of an average mathematics graduate student: it understood the questions but failed to provide correct solutions, owing to a lack of mathematical comprehension.
Who is tweeting?
Altmetric analyzed the top categories of Twitter users who tweeted about ChatGPT articles by examining their profile descriptions and the types of journals they linked to. The users were divided into three groups: scientists, who were familiar with the literature; practitioners, i.e., clinicians or researchers working in clinical sciences; and science communicators, who frequently linked to scientific articles from various journals and publishers. The resulting table provides insight into the demographics of Twitter users engaging with ChatGPT content.
Table 10 shows that scientists had the highest percentage of tweets for the top eight ChatGPT articles, ranging from 21% to 26%. Science communicators had the second-highest share, ranging from 3% to 4%, while practitioners had the lowest, ranging from less than 1% to 9%. This indicates that scientists are the most engaged group when tweeting about ChatGPT articles, followed by science communicators, with practitioners the least engaged. Overall, the table suggests that ChatGPT articles are primarily discussed by individuals with a scientific background or an interest in science communication.