3.1 Literature Search Results
A total of 11,347 papers were retrieved from WoS. After data inspection, deduplication, and cleaning, 10,754 papers were included in the study. These records were exported in the "Plain Text - Full Record and References" format. An initial screening of CNKI yielded 2,152 papers, of which 969 were retained after further evaluation and included in the study (Figure 1).
In the CNKI database (Figure 2A), the annual number of publications was relatively low and grew slowly. It declined in 2018, recovered to its previous level, and then increased rapidly from 2022 onward. The number of publications in 2023 was 5.97 times that in 2013. In the WoS database (Figure 2B), the number of publications in the rare disease field grew steadily. Growth accelerated markedly from 2018, although there was a marginal decline in 2023.
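Expressed as a compound annual growth rate (CAGR), a 5.97-fold increase over the ten yearly intervals from 2013 to 2023 corresponds to roughly 19.6% growth per year. The short sketch below shows the arithmetic; the CAGR is an illustrative derived figure rather than a statistic reported in this study, and the helper name `cagr` is hypothetical.

```python
def cagr(ratio: float, intervals: int) -> float:
    """Compound annual growth rate implied by an overall growth ratio.

    ratio: end-period count divided by start-period count (here, 2023 / 2013).
    intervals: number of yearly steps between the two periods (2013 -> 2023 is 10).
    """
    return ratio ** (1.0 / intervals) - 1.0


# 5.97 is the 2023/2013 ratio reported above for CNKI.
print(f"{cagr(5.97, 10):.1%}")  # → 19.6%
```

Because the CNKI series was nearly flat for most of the decade and rose sharply after 2022, a single average rate like this smooths over a very uneven trajectory.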
A comparison of domestic and international publication counts (Figure 2C) shows that research on rare diseases is continuously expanding both in China and internationally. Compared with domestic journals, international journals devote more attention to this field and cover it in greater depth.
An exponential (nonlinear) model was used to fit the growth trends. The fitted equations were Y = -134.6*exp(0.0567*X) (R² = 0.8894) for CNKI and Y = -72.97*exp(0.0376*X) (R² = 0.8644) for WoS.
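The text does not state how the exponential curves were fitted. As one common approach, shown here only as an assumption, Y = a*exp(b*X) can be fitted by ordinary least squares on ln Y, with R² then computed on the original scale. Log-linear fitting requires every Y to be positive and always yields a positive coefficient a; nonlinear least squares on the original scale is the usual alternative. The function name `fit_exponential` and the synthetic series below are hypothetical and are not the study's data.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by ordinary least squares on ln(y).

    Requires every y > 0. Returns (a, b, r2), where R^2 is computed
    on the original (untransformed) y scale.
    """
    if any(y <= 0 for y in ys):
        raise ValueError("log-linear fitting requires positive y values")
    n = len(xs)
    lys = [math.log(y) for y in ys]
    mx = sum(xs) / n
    my = sum(lys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (ly - my) for x, ly in zip(xs, lys))
    b = sxy / sxx                      # slope of ln(y) on x
    a = math.exp(my - b * mx)          # back-transformed intercept
    preds = [a * math.exp(b * x) for x in xs]
    mean_y = sum(ys) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


# Synthetic, noise-free series (NOT the study's data); x = years since the first year.
xs = list(range(11))
ys = [50 * math.exp(0.1 * x) for x in xs]
a, b, r2 = fit_exponential(xs, ys)
print(round(a, 6), round(b, 6), round(r2, 6))  # → 50.0 0.1 1.0
```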
3.2 Spatial Distribution of the Literature (Core Countries/Institutions)
3.2.1 Countries and Regions
In the WoS database, the United States published the most papers (2,423), followed by China (1,969), Italy, and Germany, the latter two each with more than 1,000 papers. However, there was a substantial gap between the top two countries and the rest (Table 1). These figures indicate the core position of the United States in rare disease research and the strong research capacity of China. Six of the top 10 countries were European. The country collaboration network (Figure 3) shows extensive and frequent collaboration among European and American countries, whereas China currently has relatively limited research collaboration with other countries. In terms of centrality, the Netherlands had the highest betweenness centrality (0.09), indicating that it serves as an important bridge connecting other countries in the collaboration network. The Netherlands has not only published a large volume of research but also produced high-quality results in rare disease research, and it therefore plays a crucial role in the field.
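In bibliometric network tools such as CiteSpace, the centrality reported for a node is its betweenness centrality: the share of shortest paths between all other pairs of nodes that pass through it, so a high value marks a bridge between groups. Below is a minimal pure-Python sketch of Brandes' algorithm for an unweighted, undirected collaboration network, normalized to the range 0 to 1. It ignores edge weights and any other adjustments a specific tool may apply, and the node labels are placeholders rather than countries from Table 1.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj maps each node to a set of neighbours. Scores are normalised so that
    a node lying on every shortest path between all other pairs scores 1.0.
    """
    nodes = list(adj)
    cb = {v: 0.0 for v in nodes}
    for s in nodes:
        stack = []
        preds = {v: [] for v in nodes}
        sigma = {v: 0 for v in nodes}   # number of shortest s-v paths
        dist = {v: -1 for v in nodes}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                     # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in nodes}
        while stack:                     # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    n = len(nodes)
    # Each undirected pair is counted twice, so divide by (n-1)(n-2), not (n-1)(n-2)/2.
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: c * scale for v, c in cb.items()}


# Placeholder network: a tight group {A, B, C} linked to E only through C and D.
network = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E"}, "E": {"D"},
}
scores = betweenness_centrality(network)
print({v: round(c, 3) for v, c in sorted(scores.items())})
# → {'A': 0.0, 'B': 0.0, 'C': 0.667, 'D': 0.5, 'E': 0.0}
```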
3.2.2 Research Institutions
Among publishing institutions (Table 2), Chinese research institutions produced abundant research output in both the domestic and international databases. Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, a leading medical and research center in China, is the only institution that ranks among the top 10 in both databases. It houses the National Key Laboratory for Rare and Difficult Diseases and focuses on diseases such as Gitelman syndrome, transthyretin amyloid cardiomyopathy, and hereditary retinal degeneration. Zhejiang University has the greatest number of publications in the WoS database and focuses primarily on rare diseases such as spinal muscular atrophy and Wilson's disease. Taken together with the country-level analysis in the previous section, these results show that European and American countries hold critical positions in rare disease research, while the research capacity of China is also noteworthy.
3.3 Keywords
3.3.1 Co-occurrence of Keywords
Keywords reflect the core content of the literature. For the WoS database, we constructed a co-occurrence network of keywords with a frequency greater than 100 (Figure 4B). The three most frequent keywords were "rare disease" (1,662), "diagnosis" (980), and "mutation" (833) (Table 3), and "management" (794) also occurred frequently. In the CNKI database (Figure 4A, Table 3), "rare disease" was likewise the most frequent keyword (452), followed by "orphan drugs" (100) and "diagnosis" (38). "Rare disease," "diagnosis," "therapy," and "children" are among the 10 most frequent keywords in both databases, indicating that diagnosis, therapy, and rare diseases in children are shared concerns and research foci at both the domestic and global levels. However, several high-frequency keywords in the domestic database related to orphan drugs and rare drugs, which are also treatment-related. In contrast, the international database focused more on gene-related topics, such as "mutation" and "expression," and on rare disease management. This reveals different research perspectives between domestic and international studies: Chinese research institutions may place greater emphasis on drug development and application, whereas international research tends to explore the mechanisms underlying rare diseases and the social management of special populations.
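Keyword co-occurrence counting can be sketched as follows: each paper contributes one count to every pair of retained keywords it contains, and only keywords appearing in more than a threshold number of papers (100 for WoS in this study) become network nodes. The normalization used here (lower-casing and de-duplicating keywords within a paper) and the function name `cooccurrence_network` are assumptions; bibliometric tools may also merge synonyms or plural forms, which this sketch does not do.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(records, min_freq):
    """Build a keyword co-occurrence network from per-paper keyword lists.

    Keywords are lower-cased and de-duplicated within each paper. Only keywords
    found in more than `min_freq` papers are kept as nodes. Returns
    (node_frequencies, edge_weights), where edge keys are sorted keyword pairs.
    """
    papers = [sorted({k.strip().lower() for k in kws}) for kws in records]
    freq = Counter(k for kws in papers for k in kws)
    kept = {k for k, c in freq.items() if c > min_freq}
    edges = Counter()
    for kws in papers:
        nodes = [k for k in kws if k in kept]
        for pair in combinations(nodes, 2):  # kws is sorted, so pairs are canonical
            edges[pair] += 1
    return {k: freq[k] for k in kept}, dict(edges)


# Toy records (hypothetical, not drawn from WoS or CNKI).
records = [
    ["Rare disease", "diagnosis", "mutation"],
    ["rare disease", "Diagnosis", "children"],
    ["rare disease", "orphan drugs"],
]
nodes, edges = cooccurrence_network(records, min_freq=1)
print(sorted(nodes.items()))  # → [('diagnosis', 2), ('rare disease', 3)]
print(edges)                  # → {('diagnosis', 'rare disease'): 2}
```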
3.3.2 Keyword Clustering
Keyword clustering reveals the distinct research foci within a field. The smaller the cluster number, the more keywords the cluster contains. In the WoS database (Table 4), orphan drugs and whole-exome sequencing were the most prominent research directions. Case reports were the primary form of research output, reflecting that, as medical standards improve and diagnostic methods advance, more rare disease cases are being reported. In the CNKI database (Table 4), orphan drugs remained a research focus. The comparison between the two databases confirms the difference in research foci between domestic and international studies of rare diseases: international research has focused on findings at the molecular level, whereas Chinese journals continue to focus on treatment and medication, with less emphasis on disease mechanisms. This suggests that rare disease research in China is less advanced than that of the international community.
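The numbering convention described above, in which #0 denotes the largest cluster, can be reproduced by ranking clusters by size. The sketch below covers only this relabeling step, not the clustering algorithm itself; the raw cluster assignments are hypothetical, and breaking ties by the original identifier is an assumption made for determinism.

```python
from collections import Counter


def renumber_by_size(labels):
    """Relabel clusters so that #0 is the largest cluster, #1 the next, and so on.

    labels: mapping from keyword to an arbitrary integer cluster id.
    Ties in size are broken by the original id, so the result is deterministic.
    """
    sizes = Counter(labels.values())
    order = sorted(sizes, key=lambda cid: (-sizes[cid], cid))
    new_id = {old: rank for rank, old in enumerate(order)}
    return {kw: new_id[cid] for kw, cid in labels.items()}


# Hypothetical raw assignments produced by some earlier clustering step.
raw = {
    "orphan drugs": 7, "rare drugs": 7, "clinical trials": 7,
    "whole-exome sequencing": 3, "mutation": 3,
    "case report": 5,
}
for kw, cid in sorted(renumber_by_size(raw).items(), key=lambda kv: (kv[1], kv[0])):
    print(f"#{cid} {kw}")
```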
3.3.3 Keyword Timeline Graph
The timeline graph shows how the keywords within each cluster developed over time. In the WoS database (Figure 5B), the 10 most frequent keywords have appeared since 2013 and have maintained a high occurrence rate over the past 10 years, indicating that the diagnosis, therapy, management, and genetic research of rare diseases constitute the foundation of this field. Building on this foundation, genomics (2018), medical genetics (2019), and public health (2020) have advanced considerably, enabling more in-depth research on rare diseases. In the CNKI database (Figure 5A), high-frequency keywords such as "rare disease" and the medication-related "orphan drugs" and "rare drugs" have appeared since 2013 and laid the foundation for subsequent research directions. Keywords such as "diagnosis," "therapy," and "clinical trials" first appeared only in 2015-2016, lagging behind international trends. Keywords related to rare disease management and social support from the social sciences also appeared relatively late, suggesting that rare disease populations in China still receive insufficient social attention and policy support. The keyword burst graph (Figure 6) shows that the Ice Bucket Challenge created a wave of enthusiasm in China, but the burst faded within a year. This indicates that while marketing-style dissemination can raise social awareness of rare diseases, it has no substantial impact on social security for rare disease populations. Current research concentrates on medical security, and government action could substantially improve the lives of rare disease populations in China.
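Keyword burst graphs of this kind are typically produced with Kleinberg's burst-detection algorithm, which flags periods in which a term's share of papers rises well above its baseline. The study does not report its burst parameters, so the following is a simplified two-state, batched sketch with assumed values s = 2 (the burst rate is twice the baseline rate) and γ = 1 (the penalty for entering a burst); tools such as CiteSpace may use different states and defaults. The function name `kleinberg_bursts` and the yearly counts are hypothetical.

```python
import math


def kleinberg_bursts(r, d, s=2.0, gamma=1.0):
    """Two-state batched Kleinberg burst detection.

    r[t]: papers containing the keyword in period t.
    d[t]: all papers in period t.
    Returns one state per period (0 = baseline, 1 = burst) found by Viterbi decoding.
    """
    n = len(r)
    if sum(r) == 0:
        return [0] * n                 # a keyword that never appears cannot burst
    p0 = sum(r) / sum(d)               # baseline rate over the whole series
    p1 = min(s * p0, 0.9999)           # elevated burst rate, kept below 1
    rates = (p0, p1)
    up_cost = gamma * math.log(n)      # penalty for moving from baseline into a burst

    def emit(t, i):
        # Negative log of the binomial likelihood; the C(d, r) term is omitted
        # because it is identical for both states.
        p = rates[i]
        return -(r[t] * math.log(p) + (d[t] - r[t]) * math.log(1 - p))

    cost = [0.0, math.inf]             # the series starts in the baseline state
    back = []                          # back[t][j]: best previous state for state j at t
    for t in range(n):
        new_cost, choice = [], []
        for j in (0, 1):
            options = [cost[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            best = 0 if options[0] <= options[1] else 1
            new_cost.append(options[best] + emit(t, j))
            choice.append(best)
        cost = new_cost
        back.append(choice)

    # Trace the cheapest state sequence backwards from the final period.
    state = 0 if cost[0] <= cost[1] else 1
    states = []
    for t in range(n - 1, -1, -1):
        states.append(state)
        state = back[t][state]
    return states[::-1]


# Hypothetical yearly counts: a keyword that spikes in periods 3 and 4.
mentions = [1, 1, 1, 10, 12, 1, 1]
totals = [100] * 7
print(kleinberg_bursts(mentions, totals))  # → [0, 0, 0, 1, 1, 0, 0]
```

A short-lived spike such as the Ice Bucket Challenge would appear here as a brief run of 1s followed by a return to the baseline state; raising γ makes the detector more reluctant to declare short bursts.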
3.4 Authors
In the WoS database, Domenica Taruscio, Kym M. Boycott, and Gareth Baynam were the three most prolific authors (Table 5). These three authors form the center of the collaboration network, and their collaboration has become prominent over the past 5 years (Figure 7B). Before this period, the author collaboration network was sparse, and most authors worked independently. This indicates that contemporary research places greater emphasis on multi-team collaboration and multicenter studies. Domestica Taruscio, Kym M. Boycott, and others have contributed to rare disease research over a long period, consistently producing research output, and are important scholars in this field.
The author collaboration network in the CNKI database is relatively dense but shows distinct phases (Figure 7A). In 2013-2014, Pei-Wen Wang and Jin-Ping Xie were the central authors in the network. The collaboration network centered on Shu-Yang Zhang, Bo Zhang, and Meng-Chun Gong persisted for longer, and its publication volume shows that this team has made outstanding contributions to the field of rare diseases (Table 5). However, the team's output has decreased significantly in the past 3 years, and no new collaboration networks have yet clearly emerged.
3.5 Co-cited References
Highly co-cited references can indicate hot topics in a research field, and analyzing them can reveal how research topics change over a given period. The three most co-cited references were "Estimating cumulative point prevalence of rare diseases: analysis of the Orphanet database"[14], "Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology"[15], and "The mutational constraint spectrum quantified from variation in 141,456 humans"[16] (Table 6). The first-ranked reference was cited 156 times. That study used the Orphanet database to estimate the cumulative point prevalence of rare diseases; of the 6,172 unique rare diseases analyzed, 71.9% were genetic and 69.9% had exclusively pediatric onset. The second- and third-ranked references both addressed genomics.
Table 7 presents a further analysis of hot topics and progress in the field of rare diseases over the past 10 years, based on the 15 references with the strongest citation bursts. Six of these articles were related to genomics[15-20], and three reported database-related content, covering Orphanet[14] and the HPO database[21, 22]. Two introduced disease-related platforms: The Matchmaker Exchange[23], a platform for rare disease gene discovery, and PhenoTips[24], software for analyzing patient phenotypes for clinical and research purposes. The remaining articles covered writing standards for reviews[25], definitions of disease terms[2], social science research[21], and reviews[26] on rare diseases. Evidently, genomic research is a hot topic in the field of rare diseases, and the use of big data and artificial intelligence to support diagnosis and treatment is a growing trend. As society progresses, minority groups are receiving more attention, leading to an increase in social science research on rare disease populations.
Centrality indicates the importance and influence of a publication within a field. In this study, the four co-cited references with the highest centrality were selected (Table 8). Only one of them had a centrality ≥ 0.1, a threshold commonly used to identify pivotal nodes in co-citation networks. This article introduced the use of exome sequencing to identify disease-causing genes[27], reaffirming that genetics is both a research hotspot and a development trend in this field.