Study Design
This research is a quantitative, cross-sectional study that aims to identify the extent to which health science articles have incorporated the use of LLMs. The study covers 2022 and 2024, enabling comparison of results at two different intervals and, crucially, between the pre- and post-ChatGPT eras. For this study, we define the pre-ChatGPT era as the period before November 30, 2022, when ChatGPT was publicly released; our pre-ChatGPT data cover January 1 to May 31, 2022. The post-ChatGPT era is defined as the period after ChatGPT's public release, with data covering January 1 to May 31, 2024. Because this window begins more than a year after the release, it allows at least a full year of potential ChatGPT influence on academic writing and research practices. To achieve these goals, we defined clear eligibility criteria for publications, both to avoid over- or under-inclusion and to ensure a consistent evaluation of LLM use in health science publications. These criteria were applied uniformly to all twelve health science disciplines considered in this research.
Eligibility Criteria
Inclusion Criteria
(i) Publication Type:
- No restrictions were placed on publication type; all document types indexed in the Web of Science Core Collection were considered.
- This broader inclusion facilitates a comprehensive investigation into the use of LLMs across different aspects of scientific writing.
(ii) Time Frame:
- Articles published in medical journals and other sources between January 1 and May 31, 2022, and over the same period in 2024, were retrieved.
- Using the same five-month window in both years avoids seasonal effects that could otherwise bias the comparison between years.
(iii) Language:
- Only English-language publications were included, to ensure consistency in keyword analysis, because the LLM-associated keywords were developed primarily in English-language contexts. English remains the predominant language of international scientific communication, so this restriction still allows a representative sample of global scientific discourse.
- This approach also ensures direct comparability across countries, institutions, and disciplines without the confounding factor of language differences.
- Many widely used LLMs, such as ChatGPT, are trained primarily on English data, making their impact more noticeable and easier to detect in English-language publications. While this may limit insights into non-English scientific communities, it allows a more controlled analysis of LLMs' effects on scientific writing.
(iv) Indexing:
- Publications indexed in the Web of Science Core Collection databases (SCI-EXPANDED, SSCI, A&HCI, CPCI-S, CPCI-SSH, ESCI) were included in the study.
(v) Discipline:
(vi) Full-text Availability:
Exclusion Criteria
(i) Publication Type:
(ii) Language:
- Non-English publications were excluded.
- While this may limit the study's global representation, it ensures consistency in keyword analysis.
(iii) Duplicate Publications:
(iv) Incomplete Metadata:
- Publications with incomplete metadata (missing author affiliations, countries, journal titles, or publisher names) were excluded to ensure comprehensive analysis across all study dimensions.
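The language, time-frame, and metadata criteria above can be expressed as a single record filter. The following is a minimal sketch, assuming each exported publication has been parsed into a hypothetical `Record` whose field names are illustrative rather than actual Web of Science export tags; the literal language value `"English"` is also an assumption. Only the criteria stated explicitly above are applied.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical record structure; field names are illustrative assumptions.
@dataclass(frozen=True)
class Record:
    language: str
    pub_date: date
    affiliations: Optional[str]
    country: Optional[str]
    journal: Optional[str]
    publisher: Optional[str]

# Study windows: January 1 to May 31 in 2022 (pre-ChatGPT) and 2024 (post-ChatGPT).
WINDOWS = {
    "pre-ChatGPT": (date(2022, 1, 1), date(2022, 5, 31)),
    "post-ChatGPT": (date(2024, 1, 1), date(2024, 5, 31)),
}

def era(rec: Record) -> Optional[str]:
    """Return the study window containing the publication date, or None."""
    for label, (start, end) in WINDOWS.items():
        if start <= rec.pub_date <= end:
            return label
    return None

def has_complete_metadata(rec: Record) -> bool:
    """Affiliations, country, journal, and publisher must all be present and non-blank."""
    fields = (rec.affiliations, rec.country, rec.journal, rec.publisher)
    return all(f is not None and f.strip() != "" for f in fields)

def is_eligible(rec: Record) -> bool:
    """English language, inside a study window, and complete metadata."""
    return rec.language == "English" and era(rec) is not None and has_complete_metadata(rec)

ok = Record("English", date(2024, 3, 15), "Univ. A", "Country A", "Journal A", "Publisher A")
late = Record("English", date(2024, 6, 1), "Univ. A", "Country A", "Journal A", "Publisher A")
print(era(ok), is_eligible(ok), is_eligible(late))  # post-ChatGPT True False
```

Keeping `era` separate from `is_eligible` lets the same window definition label each retained record as pre- or post-ChatGPT for the later comparisons.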
Data Source
The study uses the Web of Science Core Collection as its primary data source because of its extensive indexing and rigorous selection standards. Six databases within the Core Collection are used: SCI-EXPANDED, SSCI, A&HCI, CPCI-S, CPCI-SSH, and ESCI. Together, they provide comprehensive coverage of the peer-reviewed literature across health science disciplines.
Disciplines and Time Frame
The health science disciplines and associated keywords used in the study are summarized in Table 1. The analysis covers publications from the first five months (January 1 to May 31) of 2022 and 2024, allowing a comparison of trends before and after the public launch of ChatGPT in November 2022 while controlling for potential seasonal variation in publication patterns.
Search Strategy
A search query was developed for each discipline by combining the Web of Science Categories and Research Areas fields. For example, the query for Medicine (General & Internal) was structured as Web of Science Categories=(Medicine, General & Internal) AND Research Areas=(General & Internal Medicine).
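Assembling each query programmatically helps keep operators and spacing consistent across the twelve disciplines, since a dropped space between a term and OR is easy to introduce by hand. The sketch below is hypothetical: the helper names are illustrative, `WC` (Web of Science Categories) and `SU` (Research Areas) are assumed to be the corresponding advanced-search field tags, and how names containing commas or "&" must be quoted should be checked against the Web of Science search rules.

```python
# Hypothetical helpers for assembling Web of Science query strings.
# WC = Web of Science Categories, SU = Research Areas (advanced-search field tags);
# quoting rules for names containing commas or "&" are not verified here.

def category_query(category: str, research_area: str) -> str:
    """Combine one Web of Science Category and one Research Area with AND."""
    return f"WC=({category}) AND SU=({research_area})"

def or_group(terms: list[str]) -> str:
    """Join keywords with a single, consistently spaced uppercase OR."""
    return " OR ".join(term.strip() for term in terms)

MEDICINE_TERMS = ["Medicine", "Medical", "Physician*", "Surgery", "Surgical", "Surgeon*"]
DENTISTRY_TERMS = ["Dentistry", "Dental", "Dentist*"]

print(category_query("Medicine, General & Internal", "General & Internal Medicine"))
print(or_group(MEDICINE_TERMS))
# A "Combined" keyword string is the OR-union of the per-discipline lists.
print(or_group(MEDICINE_TERMS + DENTISTRY_TERMS))
```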
Table 1
Health Science Disciplines and Associated Keywords
| Discipline | Search Keywords |
|---|---|
| 1. Medicine | Medicine OR Medical OR Physician* OR Surgery OR Surgical OR Surgeon* |
| 2. Dentistry | Dentistry OR Dental OR Dentist* |
| 3. Pharmacy | Pharmacy OR Pharmaceutical* OR Pharmacist* |
| 4. Nursing | Nursing OR Nurse* |
| 5. Physical therapy | "Physical therapy" OR Physiotherapy OR "Occupational therapy" OR "Physical therapist" OR Physiotherapist OR "Occupational therapist" |
| 6. Public health | Public health OR Epidem* |
| 7. Psychology | Psycholog* OR Psychotherapy OR Psychotherapist |
| 8. Nutrition | Nutrition* OR Diet* |
| 9. Bioinformatics | Bioinformatics OR "computational biology" OR omics OR genomics OR proteomics OR transcriptomics OR metabolomics |
| 10. Health informatics | "Health informatics" OR "medical informatics" OR "clinical informatics" OR "biomedical informatics" OR "e-health" OR "mHealth" OR "digital health" OR "health information" |
| 11. Complementary and Alternative Medicine | "Integrative medicine" OR "complementary medicine" OR "traditional medicine" OR "Holistic Medicine" OR "Herbal Medicine" OR "Natural Remedy" OR naturopathy OR ayurveda OR siddha OR unani OR homeopathy OR "Chinese medicine" |
| 12. Veterinary Medicine | Veterinar* OR "Animal health" |
| Combined | Medicine OR Medical OR Physician* OR Surgery OR Surgical OR Surgeon* OR Dentistry OR Dental OR Dentist* OR Pharmacy OR Pharmaceutical* OR Pharmacist* OR Nursing OR Nurse* OR "Physical therapy" OR Physiotherapy OR "Occupational therapy" OR "Physical therapist" OR Physiotherapist OR "Occupational therapist" OR Public health OR Epidem* OR Psycholog* OR Psychotherapy OR Psychotherapist OR Nutrition* OR Diet* OR Bioinformatics OR "computational biology" OR omics OR genomics OR proteomics OR transcriptomics OR metabolomics OR "health informatics" OR "medical informatics" OR "clinical informatics" OR "biomedical informatics" OR "e-health" OR "mHealth" OR "digital health" OR "health information" OR "integrative medicine" OR "complementary medicine" OR "traditional medicine" OR "Holistic Medicine" OR "Herbal Medicine" OR "Natural Remedy" OR naturopathy OR ayurveda OR siddha OR unani OR homeopathy OR "Chinese medicine" OR Veterinar* OR "Animal health" |

| Adjectives & Adverbs | Search Keywords |
|---|---|
| LLM-associated keywords | Intricate OR meticulous OR meticulously OR commendable OR notable OR pivotal OR invaluable OR noteworthy OR methodically OR strategically |

| Publication Date From (YYYY-MM-DD) | Publication Date To (YYYY-MM-DD) |
|---|---|
| 2022-01-01 | 2022-05-31 |
| 2024-01-01 | 2024-05-31 |
LLM Usage Assessment
The study uses a keyword-based approach to assess potential LLM usage, following the method outlined by Gray (2024). Ten adjectives and adverbs were selected as indicators of potential LLM usage: intricate, meticulous, meticulously, commendable, notable, pivotal, invaluable, noteworthy, methodically, and strategically [13]. These terms were chosen for their linguistic sophistication and their observed higher frequency in known LLM-generated texts compared with typical human-written academic content; LLM outputs often use them to provide nuanced descriptions, emphasize importance, or add an academic register. Although these terms are not used exclusively by LLMs, their increased frequency and co-occurrence can serve as potential markers of LLM involvement in text generation. A full-text search was conducted for each discipline and year using these keywords. To account for general trends in language use, a set of neutral control keywords (consider, conclusion, furthermore, relative, technical) was also searched; these controls help distinguish general changes in academic writing style from increases potentially attributable to LLM use. The presence of these terms alone does not prove LLM usage; rather, it indicates a higher likelihood of LLM assistance when the terms occur at atypical frequencies or in specific combinations within health science publications.
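To make the counting rule concrete, the sketch below checks a text for the ten LLM-associated keywords and the five control keywords. It is an illustration under stated assumptions, not the Web of Science full-text search itself: whole-word, case-insensitive matching is assumed here, and how the database tokenizes or stems words is not modeled.

```python
import re

LLM_KEYWORDS = ["intricate", "meticulous", "meticulously", "commendable", "notable",
                "pivotal", "invaluable", "noteworthy", "methodically", "strategically"]
CONTROL_KEYWORDS = ["consider", "conclusion", "furthermore", "relative", "technical"]

def keyword_hits(text: str, keywords: list[str]) -> set[str]:
    """Return the keywords that occur in text as whole words (case-insensitive).

    Whole-word matching is an assumption: "meticulous" does not match
    "meticulously", and "consider" does not match "considered".
    """
    return {
        kw for kw in keywords
        if re.search(rf"\b{re.escape(kw)}\b", text, flags=re.IGNORECASE)
    }

def has_llm_marker(text: str) -> bool:
    """A publication counts as flagged if any LLM-associated keyword appears."""
    return bool(keyword_hits(text, LLM_KEYWORDS))

sample = "This pivotal study makes a notable contribution. Furthermore, results were consistent."
print(sorted(keyword_hits(sample, LLM_KEYWORDS)))      # ['notable', 'pivotal']
print(sorted(keyword_hits(sample, CONTROL_KEYWORDS)))  # ['furthermore']
```

Running the same function over both keyword lists keeps the LLM and control counts directly comparable.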
Data Collection and Analysis
For each discipline, year, and keyword (including the control keywords), we collected the total number of publications, the number of publications containing each keyword, and the number of publications containing any of the LLM-associated keywords. Additional metadata, including author affiliations, countries, journal titles, and publisher names, was extracted for publications containing LLM-associated keywords. The prevalence of potential LLM usage was calculated as the percentage of publications containing LLM-associated keywords relative to the total number of publications. Changes in prevalence between 2022 and 2024 were calculated for each discipline, and relative changes were computed to allow direct comparison across disciplines.
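The prevalence and change measures reduce to simple ratios. In this sketch, prevalence is 100 × flagged / total, the absolute change is the difference in percentage points, and the relative change is assumed to be that difference expressed as a percentage of the 2022 prevalence; the text does not spell out the relative-change formula, so this definition is an assumption. The counts used are illustrative placeholders, not study results.

```python
from typing import Optional

def prevalence(flagged: int, total: int) -> float:
    """Percentage of publications containing at least one LLM-associated keyword."""
    if total <= 0:
        raise ValueError("total must be a positive count")
    return 100.0 * flagged / total

def change_between_years(p_2022: float, p_2024: float) -> tuple[float, Optional[float]]:
    """Absolute change (percentage points) and relative change (% of the 2022 baseline).

    The relative change is undefined (None) when the 2022 prevalence is zero.
    """
    absolute = p_2024 - p_2022
    relative = 100.0 * absolute / p_2022 if p_2022 != 0 else None
    return absolute, relative

# Illustrative counts only, not results from this study.
p22 = prevalence(120, 1000)   # 12.0
p24 = prevalence(180, 1000)   # 18.0
print(change_between_years(p22, p24))  # (6.0, 50.0)
```

Reporting both numbers matters: a discipline with a small baseline can show a large relative change from a modest absolute one.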
Statistical Analysis
The study employs both descriptive and inferential statistical methods. Descriptive statistics summarize the prevalence of LLM-associated keywords across disciplines, years, countries, institutions, journals, and publishers. Chi-square tests of independence compare the proportions of publications with LLM-associated keywords across years and disciplines.
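For the year comparison within one discipline, the chi-square test of independence works on a 2 × 2 table of flagged and unflagged publications in 2022 and 2024. The sketch below computes the Pearson statistic directly and obtains the p-value from the closed form for one degree of freedom, P(X ≥ x) = erfc(√(x/2)). It assumes no continuity correction, which the text does not specify, and uses illustrative counts rather than study data.

```python
import math

def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test of independence for a 2x2 table.

    Rows: 2022, 2024. Columns: with / without LLM-associated keywords.
        [[a, b],
         [c, d]]
    No continuity correction is applied; the test has 1 degree of freedom.
    """
    observed = ((a, b), (c, d))
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column total must be positive")
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Illustrative counts only: 120 of 1000 papers flagged in 2022, 180 of 1000 in 2024.
stat, p = chi_square_2x2(120, 880, 180, 820)
print(round(stat, 2), p < 0.05)  # 14.12 True
```

For comparisons across all twelve disciplines (a 12 × 2 table), a library routine such as `scipy.stats.chi2_contingency` is more convenient. It applies Yates' continuity correction to 2 × 2 tables by default, so its statistic will differ slightly from this uncorrected version unless `correction=False` is passed.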
Comparative Analysis
The study includes comparative analyses across multiple dimensions. Disciplines are ranked by their prevalence of LLM-associated keywords. Institutional analysis identifies and ranks institutions by their contribution to publications with LLM-associated keywords. Country-level analysis examines the percentage contribution of each country to the total number of publications with LLM-associated keywords, and countries are grouped into the following percentile tiers:
- Top Tier: ≥ 90th percentile
- High Tier: 75th to < 90th percentile
- Upper-Middle Tier: 50th to < 75th percentile
- Lower-Middle Tier: 25th to < 50th percentile
- Bottom Tier: < 25th percentile
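The tier assignment can be sketched as follows. The text does not state how percentiles were computed, so this sketch assumes linear interpolation between ranks (the same convention as NumPy's default for `numpy.percentile`) over the countries' percentage contributions, and places each country by comparing its share with the 25th, 50th, 75th, and 90th percentile cut points. The country labels and shares are hypothetical placeholders.

```python
import math

def percentile(sorted_values: list[float], q: float) -> float:
    """q-th percentile by linear interpolation between closest ranks."""
    k = (len(sorted_values) - 1) * q / 100
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)

def assign_tiers(shares: dict[str, float]) -> dict[str, str]:
    """Map each country's percentage contribution to a percentile tier."""
    if not shares:
        return {}
    values = sorted(shares.values())
    p25, p50, p75, p90 = (percentile(values, q) for q in (25, 50, 75, 90))
    tiers = {}
    for country, share in shares.items():
        if share >= p90:
            tiers[country] = "Top"
        elif share >= p75:
            tiers[country] = "High"
        elif share >= p50:
            tiers[country] = "Upper-Middle"
        elif share >= p25:
            tiers[country] = "Lower-Middle"
        else:
            tiers[country] = "Bottom"
    return tiers

# Hypothetical shares (%) for ten placeholder countries A..J, not study data.
shares = {name: float(i) for i, name in enumerate("ABCDEFGHIJ", start=1)}
print(assign_tiers(shares))
```

With only a handful of countries, the cut points fall between observed values, so the choice of interpolation convention can move a country across a tier boundary.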
Journal analysis calculates the percentage of each journal's publications that contain LLM-associated keywords and ranks journals accordingly. Publisher analysis determines each publisher's percentage contribution to the total number of publications with LLM-associated keywords and ranks publishers by that contribution. A p-value < 0.05 was considered statistically significant.
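The journal and publisher measures use different denominators: a journal's rate is relative to that journal's own publications, while a publisher's share is relative to all flagged publications. The sketch below makes the distinction explicit. The `(journal, publisher, flagged)` tuple layout is a hypothetical representation, and the sample rows are placeholders.

```python
from collections import Counter

# Hypothetical row layout: (journal, publisher, flagged), where flagged means the
# publication contains at least one LLM-associated keyword.
Row = tuple[str, str, bool]

def journal_rates(rows: list[Row]) -> list[tuple[str, float]]:
    """Percentage of each journal's own publications that are flagged, highest first."""
    totals, flagged = Counter(), Counter()
    for journal, _publisher, is_flagged in rows:
        totals[journal] += 1
        flagged[journal] += int(is_flagged)
    rates = {j: 100.0 * flagged[j] / totals[j] for j in totals}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)

def publisher_shares(rows: list[Row]) -> list[tuple[str, float]]:
    """Each publisher's percentage of all flagged publications, highest first."""
    counts = Counter(publisher for _journal, publisher, is_flagged in rows if is_flagged)
    total = sum(counts.values())
    return [(publisher, 100.0 * n / total) for publisher, n in counts.most_common()]

rows = [("J1", "P1", True), ("J1", "P1", False), ("J2", "P2", True),
        ("J2", "P2", True), ("J3", "P1", False)]
print(journal_rates(rows))     # [('J2', 100.0), ('J1', 50.0), ('J3', 0.0)]
print(publisher_shares(rows))
```

Because publisher shares sum to 100% across publishers, a large publisher can rank first on share even when its per-journal rates are modest.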
To improve the detection of potential LLM usage, we complemented the keyword-based method with a frequency analysis of LLM-related words. We expanded the list of LLM-associated terms to 50 adjectives and adverbs commonly found in LLM-generated text (e.g., "comprehensive", "nuanced", "significantly", and "effectively"), based on a literature review and an analysis of known LLM-generated academic texts. We counted the occurrences of these words in each paper in our dataset and calculated word density (occurrences per 1,000 words) to normalize for differences in paper length. Through a pilot study of 500 papers (250 known to use LLMs and 250 known not to), we established frequency thresholds indicative of potential LLM usage: papers exceeding the 75th percentile of word density in the LLM-used group were flagged as high-probability LLM-influenced. We then analyzed changes in word-frequency patterns between years (2022 and 2024) and across categories (countries, institutions, journals), using chi-square tests and t-tests to assess the significance of observed differences. This method captures subtler influences of LLM tools on academic writing style and provides a measure complementary to the keyword-based approach.
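The density measure and the pilot-derived threshold can be sketched as below. This is a minimal illustration under several assumptions: only the four example terms named above are used, since the full 50-term list is not given; words are tokenized with a simple letters-only regular expression that keeps hyphenated and apostrophized words whole; and the 75th percentile is computed by linear interpolation.

```python
import math
import re

# Illustrative subset: the four example terms named in the text.
DENSITY_TERMS = {"comprehensive", "nuanced", "significantly", "effectively"}
# Assumed tokenizer: runs of letters, allowing internal hyphens or apostrophes.
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")

def word_density(text: str, terms: set[str] = DENSITY_TERMS) -> float:
    """Occurrences of target terms per 1,000 words of text."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in terms)
    return 1000.0 * hits / len(words)

def pilot_threshold(llm_group_densities: list[float], q: float = 75) -> float:
    """q-th percentile (linear interpolation) of densities in the LLM-used pilot group."""
    values = sorted(llm_group_densities)
    k = (len(values) - 1) * q / 100
    lo, hi = math.floor(k), math.ceil(k)
    return values[lo] + (values[hi] - values[lo]) * (k - lo)

def is_high_probability(text: str, threshold: float) -> bool:
    """Flag a paper whose density exceeds the pilot-derived threshold."""
    return word_density(text) > threshold

text = "The results were significantly better and effectively comprehensive."
print(word_density(text))  # 375.0 (3 target terms among 8 words)
```

Normalizing per 1,000 words means a long paper is not flagged merely for containing more text, which is the length adjustment the paragraph describes.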
Ethical Considerations and Limitations
The study uses publicly available bibliometric data and does not involve human subjects. No individual authors or specific papers are identified in the analysis, which protects author privacy. The methodology, including the search strategies and analytical methods, is fully disclosed to ensure reproducibility.