Mapping Global Knowledge Domain, Research in Information Retrieval in Medical Sciences: A Scientometric and Evaluative Study

doi:10.21203/rs.3.rs-184663/v1


This work is licensed under a CC BY 4.0 License

Version 1


Objective: The main goal of this paper is to visualize and draw the intellectual and cognitive structures of information retrieval (IR) in the medical sciences through the use of Science Mapping.

Methods: This scientometric study was undertaken using Mapping Knowledge Domain (MKD) methods, drawing scientific maps to analyze scientific products and show their trends. We retrieved all documents indexed in the Web of Science (WOS) database on the topic of storage and retrieval of information in medical sciences. Three software tools were used to analyze the results: SciMAT-v1.1.04, VOSviewer-v1.6.14, and CitNetExplorer_v1.0.0.

Results: Our results show that most scientific productions in this field fall into two categories: (1) effective methods of organizing information, and (2) the application and operation of IR systems in intelligent question answering, together with the analysis of the information behaviors of physicians and health professionals. The similarity index increased over time from 0.43 to 0.71. The Similarity measures, Expert systems, Concepts, Experience, Answers, and Multi-model IR clusters are mature and fully centralized clusters located in the first quarter of the strategic chart.

Conclusions: The effectiveness of scientific documents based on answering clinical questions and focusing on health professionals' information behaviors has increased compared to search methods and tools.

Information Retrieval and Management

Medical and Health Science

Information Retrieval

Citation network

Scientometric

CitNetExplorer

VOSviewer

In recent years, there has been a drastic change in the way information is disseminated in the scientific world. The huge amount of scientific evidence makes it more difficult for researchers to access the information they need. Information retrieval (IR) technologies have therefore been developed to answer the information needs of researchers and scholars and to help them retrieve the most appropriate and diverse scientific resources for their questions (Alves et al. 2018; Di Girolamo 2019; Xu et al. 2019).

The tools created in the science of data retrieval are used to reach as much of the produced content as possible, both directly and indirectly. In past decades, language modeling, filtering, recommendation systems, and interactive question answering became the main areas of research, and this research appears to focus increasingly on users, including behavior modeling and user interface improvement. Information technology, information systems, and data retrieval have nevertheless changed in ways that could not have been imagined. These changes occurred so rapidly that it is difficult to predict what will happen in the next 20 years, which in turn makes it difficult to understand the quality of scientific development in the field of IR, especially in medical sciences (Harman 2019; Li et al. 2013). Discovering subject trends in medical information retrieval (MIR), a process for obtaining useful information from the explosive amount of data available in the field of health that helps improve healthcare services and provide better therapeutic responses for patients, has always been of interest to researchers in the field (Irmawati et al. 2019).

In this regard, the ability to map concepts, ideas, and issues in various scientific fields has become very important, as it helps policymakers achieve their goals. As a result of the information explosion in medical sciences, several approaches to knowledge mapping have been developed in this field. The scientometric method is the quantitative study of documents in a scientific field, carried out through methods such as visualization, citation analysis, vocabulary analysis, etc. (Ding et al. 2001; Janssens et al. 2005).

In scientometric research, one of the indicators of the maturity of a research field is the growth in the number and quality of its research publications. By examining scientific productions in a particular field, it is possible to understand the nature of research in that field. This way of evaluating the status of a field yields a large amount of information and can reflect the field's current content and orientations (Franker 2020; Mohammadi et al. 2019; Rorissa and Yuan 2012). The previous paradigm and intellectual base of a discipline, as reflected in its existing scientific productions, can inform us about future research fronts and the development of distinct research areas or topics. Overall, these connections "represent the cognitive structure of the research area" (Lu et al. 2020; Rorissa and Yuan 2012).

Tracking and mapping developments in the subject areas of science, by identifying the most important documents and analyzing them, is of increasing importance in scientometric studies. In this regard, MKD as a research method provides ways to extract knowledge from the mass of data obtained. MKD, also referred to as knowledge graph or knowledge visualization research, is an interdisciplinary approach that draws on applied mathematics, information science, and computer science, and is a new field of scientometric research. MKD produces graphic representations of the development processes and structural relationships in a field of scientific knowledge. It is an effective tool for tracking the frontiers of science and technology and for presenting scientific research in a way that supports scientific and technological decisions. By mapping what is known in the sciences, the most notable elements can be visualized, which helps researchers extract knowledge and interact with it (Li et al. 2019; Zeng et al. 2017; Zhu et al. 2015; Zou and Vu 2019).

In this paper, our main goal is to visualize and draw the intellectual and cognitive structures of the IR subset in the medical sciences through Science Mapping. Science mapping, or bibliometric mapping, focuses on delimiting areas of research in a scientific field to determine its cognitive structure and evolution (Cobo et al. 2011). By presenting various maps of data on scientific production in the field of IR in medical sciences, and by discovering patterns in this large body of data, we attempt to gain insight into the current research structure of IR knowledge and to provide perspectives for future research in this field. This research was therefore carried out with MKD methods, drawing scientific maps to analyze scientific products and trace trends.

In this section, we describe two related categories of work: information retrieval in medicine and MKD in medical science.

Information Retrieval in Medical and Health domain

Research on information retrieval in the field of medical sciences is very diverse and focuses on the application of data analysis methods, especially artificial intelligence tools, in data retrieval. There is also research on optimal retrieval methods and on the trial and error of semi-automated strategy formulation methods in various fields. The advent of Web 2.0 and social media has also been studied in medical information retrieval. For example, Di Girolamo points to the effective emergence of social networks and platforms such as Facebook, Google Scholar, Instagram, and YouTube in medical information retrieval (Di Girolamo 2019). Balaneshinkordan and Kotov examined a Bayesian approach to combining different types of biomedical knowledge bases in information retrieval systems to support clinical decision making in precision medicine (Balaneshinkordan and Kotov 2019). Núñez Reiz et al. examined big data analysis and machine learning in the ICU (Núñez Reiz et al. 2019). Milliken et al. designed an accurate medical article retrieval system (ARtPM) that ranks related articles to summarize specific medical cases (genetic types, diseases, demographics, and other medical conditions) (Milliken et al. 2019). Salvador-Oliván et al. examined errors in the search strategies of systematic reviews and their effects on data retrieval (Salvador-Oliván et al. 2019). Ahmed et al. used information retrieval and data mining methods to retrieve medical images, one of the most important medical resources (Ahmed et al. 2016). Hess describes techniques for accessing information through various sources on the Internet (Hess 2004).

Mapping knowledge domain

In the field of information retrieval, few studies have been conducted on scientific mapping and the use of MKD. One of the most notable is the work of Zhao and Rui, which uses the CiteSpace II information visualization software to map knowledge based on cross-language information retrieval (CLIR) data and to analyze CLIR research hotspots (Zhao and Rui 2011).

Zowj, Ghane, and Ehsanifar identified trends in data retrieval research using the author citation network (Zowj et al. 2019). Ding et al. mapped the intellectual structure of the field of information retrieval (IR) during the period 1987-1997. Keyword analysis was used to reveal patterns and trends in the field of IR by measuring the co-occurrence strength of terms in the respective publications or other texts produced in the field (Ding et al. 2001).

As mentioned, until this research was done, no scientometric or MKD study had been conducted to analyze research on information retrieval in medical sciences. In the most relevant work, Kim et al. used MKD methods to examine historical footprints, emerging technologies, and challenges in using UMLS resources and tools, and to present potential future directions (Kim et al. 2020). MKD has, however, been used in many studies in specific fields of medical sciences. For example, Ebener et al. explored the possibility of integrating knowledge mapping into a conceptual framework that can be used to understand the many complex processes involved in a health system and to identify and address potential gaps in knowledge translation processes (Ebener et al. 2006). Zhang et al. examined the main research topics in military health and medicine, with the aim of providing a reference for development in this area (Zhang et al. 2017). Chen et al. used CiteSpace as a research tool to explore key factors such as the focus of research, key researchers, its evolution, and important results over the past ten years (Chen et al. 2020). Cargnel et al. improved the laboratory diagnostic research capacity for emerging diseases using knowledge mapping (Cargnel et al. 2020). Raju Vaishya has examined publication trends in 3D printing in the field of orthopedics using MKD. Zhao et al. conducted research to reveal the general state of research on the Ebola virus by mapping knowledge of the worldwide Ebola virus literature (Zhao et al. 2015).

In this study, we retrieved all documents indexed in the Web of Science database on the topic of storage and retrieval of information in medical sciences. To achieve maximum comprehensiveness, the SCI-EXPANDED and SSCI collections were selected. The search took place on March 15, 2020. To illustrate the thematic trend, all documents matching the search in the Web of Science database that were published between 1968 and 2020 were examined.

Search Strategy

Searching for resources in the Topic field was done with the following strategy: ((Retrieval AND (information* OR storage* OR data* OR system* OR article* OR research* OR image*)) AND (health* OR Medic*)).
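The Boolean logic of this strategy can be illustrated with a small sketch. The function names below are hypothetical, and the code only approximates how a truncated query behaves on a single text string; it does not reproduce how Web of Science actually matches Topic fields.

```python
import re

# Truncated terms taken from the search strategy; "*" stands for any word ending.
RETRIEVAL_TERMS = ["retrieval"]
OBJECT_TERMS = ["information*", "storage*", "data*", "system*",
                "article*", "research*", "image*"]
DOMAIN_TERMS = ["health*", "medic*"]


def term_to_regex(term):
    """Turn a term such as 'medic*' into a case-insensitive whole-word regex."""
    stem = re.escape(term.rstrip("*"))
    suffix = r"\w*" if term.endswith("*") else ""
    return re.compile(r"\b" + stem + suffix + r"\b", re.IGNORECASE)


def any_match(terms, text):
    """True if at least one term of the OR-group occurs in the text."""
    return any(term_to_regex(t).search(text) for t in terms)


def matches_topic_query(text):
    """(Retrieval AND (object terms)) AND (domain terms)."""
    return (any_match(RETRIEVAL_TERMS, text)
            and any_match(OBJECT_TERMS, text)
            and any_match(DOMAIN_TERMS, text))
```

For instance, a title such as "Image retrieval in medical archives" satisfies all three groups, whereas a title without a health or medical term does not.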

Inclusion and Exclusion Criteria

This study examined all articles on information retrieval in the field of medical sciences. The inclusion criterion was therefore: any research article on information retrieval, data retrieval, data storage and retrieval systems, retrieval systems, article retrieval, research retrieval, or image retrieval in the field of medicine and health. Exclusion criteria were: (1) studies that are not research articles, and (2) studies whose bibliographic information was not sufficient to obtain standard outputs.

Of the documents retrieved, 8404 articles met the inclusion criteria. Of these, 4578 were excluded as unrelated: an article was removed as irrelevant if a review of its abstract or full text showed that its subject was not directly related to information retrieval in medical sciences. Finally, 3826 articles were analyzed (Figure 1).

Data analysis

Three software tools were used to analyze the results: SciMAT-v1.1.04, VOSviewer-v1.6.14, and CitNetExplorer_v1.0.0. SciMAT (Science Mapping Analysis Software Tool) is an open-source science mapping tool developed at the University of Granada (Cobo et al.). It supports scientometric analysis based on bibliographic networks such as co-word, co-citation, author co-citation, journal co-citation, coauthor, bibliographic coupling, journal bibliographic coupling, and author bibliographic coupling (Cobo et al. 2012). CitNetExplorer was used in this study to cluster documents based on citation relationships and to analyze the results at the level of individual publications. It was developed at Leiden University as a tool for visualizing and analyzing citation networks of scientific publications. VOSviewer, a software tool for creating and visualizing bibliographic networks, was used for thematic cluster analysis. While CitNetExplorer is used to analyze clusters at the level of individual documents, VOSviewer is used to analyze clustering at the aggregate level of the whole set of articles (Cobo et al. 2012; Rezaei and Mohammadi 2018; Van Eck and Waltman 2007; Van Eck and Waltman 2010; Van Eck and Waltman 2011).

To extract bibliographic and citation information in a format readable by the software, the data were exported as full records (covering authors and author affiliations, source journal titles, titles, keywords, and abstracts) with cited references, in plain-text format. Some considerations on how each tool was configured for the analysis are provided in Table 1.

Table 1. Settings applied in each software tool to perform the analyses

Tool	Index	Value
CitNetExplorer	The base of citation analysis	citation sources, predecessors [1], and successors [2]
	Minimum number of citation links to a predecessor or successor	2
	Maximum distance at which predecessors or successors may be located from a marked publication	1
VOSviewer	Keywords	author's keywords and keyword plus
VOSviewer	Minimum number of occurrence	20
SciMAT	Repetitions for assigning to include each keyword	3
	Edge value minimum in network reduction	2
	Similarity scale in network normalization	association strength
	Clustering algorithm	simple centers algorithm
	Document mapper section	core mapper
	Similarity criteria	equivalence index

[1] publications cited by marked publications

[2] publications citing marked publications
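Table 1 names two similarity measures, association strength and the equivalence index. As a rough illustration, the sketch below uses the definitions commonly given in the co-word literature; whether SciMAT applies exactly these forms, or an additional scaling factor, is an assumption here. In both functions, c_ij is the number of documents in which keywords i and j co-occur, and c_i and c_j are the numbers of documents containing each keyword.

```python
def equivalence_index(c_ij, c_i, c_j):
    """Equivalence index: c_ij^2 / (c_i * c_j), a value between 0 and 1."""
    return (c_ij ** 2) / (c_i * c_j)


def association_strength(c_ij, c_i, c_j):
    """Association strength: c_ij / (c_i * c_j), without any scaling constant."""
    return c_ij / (c_i * c_j)


# Illustrative counts only (not taken from the study's data).
example_equivalence = equivalence_index(10, 20, 50)
example_association = association_strength(10, 20, 50)
```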

Citation analysis of the results was performed using CitNetExplorer. The findings of this section are based on clustering publications by their citation relationships and analyzing the resulting clustering solutions at the level of individual publications.

In total, the 3826 publications are connected by 3661 citation links over the study period. Table 2 provides an overview of citation links in three block periods. The highest numbers of citation links and publications are observed in the third period (2010-2020).

Table 2. Evolution of citation links

Block Period	Publications	Citation Links	Cl/P*
1960-2000	462	113	0.24
2000-2010	1388	932	0.67
2010-2020	2211	1157	0.52

* Citation Links/Publications
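The Cl/P column can be recomputed directly from the other two columns. The short sketch below does so with the values copied from Table 2; the variable and function names are illustrative only.

```python
# (publications, citation links) per block period, copied from Table 2.
table2 = {
    "1960-2000": (462, 113),
    "2000-2010": (1388, 932),
    "2010-2020": (2211, 1157),
}


def links_per_publication(publications, links):
    """Citation links divided by publications, rounded to two decimals."""
    return round(links / publications, 2)


ratios = {period: links_per_publication(*values)
          for period, values in table2.items()}

# Period with the highest ratio of citation links to publications.
densest_period = max(ratios, key=ratios.get)
```

Although the 2010-2020 block has the most publications and links in absolute terms, the ratio is highest for 2000-2010.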

The chronological citation network is shown in Figure 2. CitNetExplorer, which was used to visualize the citation network of documents, labels each document by default with the first author's last name. In this figure, the circles represent documents and the curved lines represent the citation relationships between them (Van Eck and Waltman 2011).

Figure 2 shows that the main and most cited articles fall under two main themes. In the first theme (left), the main content of the resources was effective methods of organizing information. Most of the articles in this category deal with mining methods, ontologies and their application, and the indexing of resources. Over time, the topics of the articles move from mining and its methods for retrieving information to ontologies and their application to the meaning of information. In this regard, the article by Wilbur and Yang is considered a foundational article. They provided a new information-theoretical interpretation of term strength, reviewed some of its uses in document processing for IR, and described new results obtained in document categorization (Wilbur and Yang 1996).

In the second theme, the main content of the documents was the application and performance of IR systems in question answering and the analysis of the information behaviors of health professionals. Over time, the thematic content of the documents has shifted from search, text browsing, and search tools to topics such as answering physicians' clinical questions and information-seeking behaviors. The article by Hersh and Hickam is considered a foundational article in this category; it discusses the use of IR systems by physicians to answer clinical questions, as well as physician information behavior. Its purpose is to provide a conceptual framework and to apply the results of previous studies to this framework (Hersh and Hickam 1998).

Thematic clustering of documents based on the CitNetExplorer analysis is shown in Figure 3. The analysis categorized the documents into 4 thematic clusters, each containing documents that are strongly related to each other. A total of 136 documents were placed in these clusters, distributed as follows: 48 (35%) in group 1 (blue), 36 (26%) in group 2 (green), 35 (25%) in group 3 (red), and 10 (7%) in group 4 (orange).

The theme of the documents in the blue cluster was “the analysis of physicians' information behavior, IR systems, EBM, and CDSSs”, and that of the green cluster was “EHR and Medical Documents”. The theme of the red cluster was “text mining and indexing”, and the theme of the orange cluster was “question answering systems”.

Also, as can be seen in Figure 3, most of the core publications were published between 2000 and 2010, which indicates the significant impact of the scientific activity of this decade on the scientific production of the next. In other words, most of the core scientific products that laid the foundation for other IR research in medical sciences date from 2000 to 2010. As shown in Table 2, the ratio of citation links to publications was also highest in this period (0.67).

Topic networks were built from co-occurrence networks and term maps using VOSviewer. This representation shows the most important terms in the publications belonging to a cluster and the co-occurrence relationships among these terms. In this section, the co-occurrence of words is analyzed to examine thematic trends in the field of IR in medical sciences.

One difficulty at this stage was that, when drawing lexical maps, concepts appeared in different written forms, in singular and plural, and as synonyms. Therefore, to unify the concepts and prevent the same concept from being dispersed across variants, the researchers first designed a specialized thesaurus for IR in medical science to be used in the VOSviewer analysis. Support for such a thesaurus is one of the advantages of VOSviewer. Figure 4 shows the terminology designed for use in the VOSviewer analysis.

The results of this section showed that the documents examined had a total of 10783 keywords. In addition to the author's keywords, the Web of Science database provides "keyword plus" terms to give a more accurate overview of the content of articles. Based on the researchers' experience, both options were therefore selected as the source of keywords for deeper analysis. To draw meaningful knowledge maps, the minimum number of occurrences was set to 20, and under this condition 116 keywords were selected as frequent keywords. Then, to increase accuracy, irrelevant keywords such as "medicine" were removed from the selection, leaving 80 keywords. In all maps, the weight of each word is based on its frequency of occurrence.
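The keyword preparation described above (merging variants through a thesaurus, applying a minimum-occurrence threshold, and removing irrelevant terms) can be sketched as follows. This is a minimal illustration rather than the procedure VOSviewer runs internally; the thesaurus entries and function names are hypothetical.

```python
from collections import Counter

# Hypothetical thesaurus entries: variant label -> preferred label.
THESAURUS = {
    "ontology": "ontologies",
    "electronic health records": "electronic health record",
    "ehr": "electronic health record",
}


def normalize(keyword, thesaurus=THESAURUS):
    """Lower-case a keyword and replace it by its preferred label, if any."""
    key = keyword.strip().lower()
    return thesaurus.get(key, key)


def frequent_keywords(documents, min_occurrences=20, stoplist=()):
    """Count the documents in which each normalized keyword appears and keep
    keywords that reach the threshold and are not in the stoplist."""
    counts = Counter()
    for keywords in documents:
        # A set ensures each keyword is counted once per document.
        counts.update({normalize(k) for k in keywords})
    return {k: n for k, n in counts.items()
            if n >= min_occurrences and k not in stoplist}
```

With `min_occurrences=20` and a stoplist containing terms such as "medicine", this mirrors the two filtering steps reported in the text.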

The placement of keywords in clusters and the distance between nodes are based on the co-occurrence of keywords. The size of each circle indicates the frequency of that word in the cluster (Mohammadi et al. 2019; Rezaei and Mohammadi 2018).

After drawing the clusters and examining the keywords, it was found that the analyzed documents fell under the following themes: IR technologies and techniques (first cluster), information behaviors and CDSS systems (second cluster), indexing and knowledge representation tools (third cluster), searching for resources and topics related to databases (fourth cluster), and searching for information on the web (fifth cluster). The first and second clusters had the highest number of keywords, with 30 items each, followed by the third cluster with 10 items, the fourth with 7 items, and the fifth with 4 items.

In terms of the three indicators (links, total strength link, and keyword occurrence), the most important keywords in each cluster are as follows: in the first cluster, “Information storage and retrieval”, “IR system”, “Natural language processing”, and “Ontologies”; in the second cluster, “Knowledge”, “Models”, and “Electronic health record”; in the third cluster, “Query expansion”, “MeSH”, “UMLS”, and “Terminology”; and in the fourth cluster, “Bibliographic databases”, “Bibliometric”, “Databases”, and “Literature searching” (Figure 5).

Table 3 provides detailed information on the keywords in each cluster: the number of links of each keyword with other keywords (Link), the Total Strength Link, and the Keyword Occurrence. A link is a co-occurrence relationship between two keywords, and there can be a link between any pair of items. The Link value of a keyword is therefore the number of other keywords with which it co-occurs. Each link has a strength, expressed as a positive number: the strength of a link is the number of documents in which the two terms occur together, so the higher the value, the stronger the link. The Total Strength Link of a keyword is the sum of the strengths of all its links. Occurrence is the number of documents in which a keyword appears.
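These three indicators can be derived from a list of per-document keyword sets. The sketch below follows the definitions given in the paragraph above; it is not VOSviewer's own code, and the function name is illustrative.

```python
from collections import Counter, defaultdict
from itertools import combinations


def keyword_indicators(documents):
    """Return {keyword: (links, total_link_strength, occurrences)}."""
    occurrences = Counter()
    pair_strength = Counter()  # number of documents containing both keywords
    for keywords in documents:
        unique = sorted(set(keywords))  # sorted so each pair has one fixed order
        occurrences.update(unique)
        pair_strength.update(combinations(unique, 2))

    links = defaultdict(int)
    total_strength = defaultdict(int)
    for (a, b), strength in pair_strength.items():
        for keyword in (a, b):
            links[keyword] += 1               # one more distinct partner
            total_strength[keyword] += strength
    return {k: (links[k], total_strength[k], occurrences[k])
            for k in occurrences}
```

For three toy documents with keywords [MeSH, UMLS], [MeSH, UMLS, Indexing], and [MeSH], the keyword MeSH has 2 links, a total link strength of 3, and 3 occurrences.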

Based on these results, the first cluster, "IR technologies and techniques," had the highest Link (1176), Total Strength Link (7265), and Keyword Occurrence (3228) values. On the Link index, the keywords of the fifth cluster, "web IR", had the lowest number of links, with 216. The keywords of the third cluster, however, had the lowest Total Strength Link (1129) and Keyword Occurrence (420).

Table 3. Thematic clusters in IR in medical science and detailed information of keywords based on the three (3) attributes.

Cluster number (color)	Keyword	Link	TSL*	KOc**	Cluster number (color)	Keyword	Link	TSL*	KOc**
1 (red)	Algorithms	47	164	83	2 (green)	Access to information	51	130	41
	Annotation	35	98	24		Behavior	40	143	57
	Architecture	33	61	21		Clinical question	44	130	35
	Big data	21	45	29		Communication	28	49	27
	Bioinformatics	30	98	37		Decision making	36	97	37
	Biomedical literature	28	71	22		Decision support systems	35	95	35
	Classification	56	241	100		Design	48	148	51
	Content-based image retrieval	16	32	23		Education	36	90	47
	Data mining	39	134	60		Electronic health record	50	219	97
	Image retrieval	38	118	45		Framework	53	147	74
	Gene ontology	32	74	22		Impact	46	148	56
	Information extraction	39	174	63		informatics	65	381	143
	Information retrieval system	76	723	279		Information management	29	81	29
	Information storage and retrieval	79	2845	1562		Information seeking behavior	24	53	20
	Integration	30	63	20		Information systems	34	97	37
	Machine learning	41	151	63		Knowledge	71	390	110
	Natural language processing	59	401	146		Management	44	128	44
	Networks	54	171	59		Medical records	22	71	25
	Ontologies	50	354	147		Memory	8	14	22
	Patterns	28	61	23		Models	61	261	109
	Recognition	20	36	21		Needs	37	125	37
	Resources	49	103	31		Patient care information	29	103	24
	Search engines	47	116	40		Quality	48	230	78
	Semantic web	49	212	85		Question	42	147	45
	Similarity	27	66	29		Relevance	36	97	35
	Text mining	50	254	93		seeking	41	151	44
	Text retrieval	58	258	58		Support	36	86	26
	Tools	45	141	43		Technology	36	86	32
	Total	1176	7265	3228		Total	1130	3897	1417
3 (blue)	Indexing and abstracting	40	133	40	4 (yellow)	Bibliographic databases	38	202	65
	Controlled vocabularies	35	81	26		Bibliometric	25	49	20
	Evaluation	31	92	33		Databases	68	529	178
	Language	42	115	41		Literature searching	14	62	23
	MeSH	52	160	58		Medline	62	513	185
	Performance	37	86	40		Search	65	419	142
	Query expansion	46	172	71		Strategies	35	125	41
	Terminology	46	127	42		Total	307	1899	654
	UMLS	36	111	49	5 (purple)	Consumer health information	21	71	21
	Vocabulary	26	52	20		Information	72	598	192
	Total	391	1129	420		Internet	62	567	206
						Web	61	406	141
						Total	216	1642	560

* Total Strength Link

** Keyword Occurrence

We used SciMAT to draw a thematic strategic diagram of the field of IR in medical sciences. After the data were entered into the software, 10530 keywords were retrieved. The difference from the VOSviewer count arises because SciMAT considers only the author's keywords and not the keyword plus terms. We then cleaned the keywords by removing unrelated ones and merging synonyms. After this step, 263 cleaned keywords remained for analysis.

Figure 6 shows stability measures over three consecutive periods. The circles represent the periods, and the number inside each circle is the number of keywords in that period. The horizontal arrow shows the number of keywords shared by two consecutive periods, with the similarity index between them in parentheses. The upper-incoming arrow indicates the number of new keywords in a period, and the upper-outgoing arrow indicates the number of keywords present in a period but not in the next one (Cobo et al. 2011).

The results of this section showed that the number of keywords increased significantly over time: in the 2000-2010 period it was 2.38 times that of the period before 2000. Similarly, the number of shared keywords rose from 94, between the period before 2000 and 2000-2010, to 224, between 2000-2010 and 2010-2020. The similarity index grew over time from 0.43 to 0.71, which means that researchers in medical IR have gradually converged on a common terminology. The findings also show that most new keywords (91 keywords) entered the literature and terminology of medical IR in the 2000-2010 period, indicating the growth of new concepts and dramatic changes in the thematic boundaries of the field in this decade. Between 2010 and 2020, by contrast, the number of new keywords dropped by more than a third compared to 2000-2010 (to 57 keywords), indicating a relative slowdown in the growth of the subject domain (Figure 6).
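The quantities read from the overlap map (shared, new, and dropped keywords, plus a similarity index) can be computed from the keyword sets of two consecutive periods. In the sketch below the similarity index is taken to be the Jaccard coefficient, which is an assumption made for illustration; the formula SciMAT applies may differ, and the function name is hypothetical.

```python
def period_overlap(previous, current):
    """Compare the keyword sets of two consecutive periods."""
    previous, current = set(previous), set(current)
    shared = previous & current
    union = previous | current
    return {
        "shared": len(shared),              # horizontal arrow
        "new": len(current - previous),     # upper-incoming arrow
        "dropped": len(previous - current), # upper-outgoing arrow
        # Assumed similarity: shared keywords over all distinct keywords.
        "similarity": len(shared) / len(union) if union else 0.0,
    }
```

A rising similarity value between successive pairs of periods corresponds to the consolidation of terminology described above.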

Figure 7 shows a strategic chart of scientific topics. In this diagram, the centrality index is on the x-axis and the density index is on the y-axis. The strategic chart is used to determine and analyze the position of clusters and thematic concepts within a field, to describe the internal relationships among thematic clusters, and to illustrate their maturity and coherence. Centrality measures the strength of the relationship between one thematic area and the others; it indicates the importance of a theme, and the larger the index, the more important the cluster is among the existing themes. The density index indicates the strength of the links that connect the words within a cluster (Abdollahzadeh 2019; Cobo et al. 2012).

Using the two indicators, centrality and density, the strategic chart is divided into four quarters. The topics in the upper right quarter (first quarter) are fully developed and are very important for the development of the main research structure in medical science. They are known as motor themes because of their high centrality and density. Placement in this quarter means that the topics have the strongest internal coherence and connection and are conceptually very close and related. Topics in the upper left quarter (second quarter) are still coherent but peripheral, each consisting of smaller specialized areas of science. Topics in the lower left quarter (third quarter) have low density and centrality and mainly reflect emerging or declining scientific themes. Topics in the lower right quarter (fourth quarter) are important in a research field but have not yet matured and have the potential to become major topics in the field (Abdollahzadeh 2019; Cobo et al. 2011; Ke et al. 2013; Melcer et al. 2015) (Figure 7).
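The assignment of a cluster to one of the four quarters can be expressed as a small function. This is a sketch under stated assumptions: the cut-off values that separate "high" from "low" (for example the median centrality and density of all clusters) are hypothetical parameters and are not specified in the text.

```python
def strategic_quarter(centrality, density, centrality_cut, density_cut):
    """Return the quarter (1-4) of a cluster in the strategic chart.

    1: upper right (high centrality, high density)
    2: upper left  (low centrality, high density)
    3: lower left  (low centrality, low density)
    4: lower right (high centrality, low density)
    """
    high_centrality = centrality >= centrality_cut
    high_density = density >= density_cut
    if high_centrality and high_density:
        return 1
    if high_density:
        return 2
    if not high_centrality:
        return 3
    return 4
```

Under this scheme, a cluster with both values above the cut-offs lands in the first quarter, while a central but loosely connected cluster lands in the fourth.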

To explain the situation more precisely, strategic diagrams are presented based on both the number of scientific productions and the citations received by the scientific products of the field under study.

Based on the average number of citations to scientific products, the largest clusters are “Similarity measures” (40.41 citations), “Mechanism” (39.37 citations), and “Barriers” (34.82 citations). In the Similarity measures cluster, “Similarity measures” and “distance nodes”, with 11 documents, were the largest nodes, followed by “Sets” and “Topic Models” with 6 documents. In the Mechanism cluster, the largest nodes were “Mechanism” with 15 documents and “Single-molecule magnet” with 3 documents. In the Barriers cluster, the largest nodes were “Complexes” with 14 documents and “Barriers” with 4 documents.

Based on the number of documents, “Medical Informatics” (1281 documents), “Experience” (51 documents), and “Expert Systems” (45 documents) were the largest clusters. In the Medical Informatics cluster, “Medline Search” (224 documents), “Medical Informatics” (211 documents), “Database Management Systems” (189 documents), and “Ontology” (152 documents) were the largest nodes. In the Experience cluster, “Methodology and Experience” (15 documents) and “University Library” (12 documents) were the largest nodes. In the Expert Systems cluster, “Expert Systems” (13 documents), “Conceptual graph” (12 documents), “Interface” (11 documents), and “Case-based reasoning” (11 documents) were the largest nodes (Figure 8).

Analysis of these findings shows that in the field of IR in medical sciences, the Similarity measures, Expert systems, Concepts, Experience, Answers, and Multi-model IR clusters are in the first quarter of the strategic chart. The second quarter contains the Smartphone, Hybrid, Decision tree, RFID, and Feasibility study clusters. The third quarter includes the Relational database, Mechanism, Clinical information systems, Medical terminologies, and Barriers clusters. The fourth quarter contains the Health information exchanges, Metadata, and Medical informatics clusters.

Examining the thematic areas of information retrieval in medical sciences and drawing their maps is one of the most essential methods for predicting future research directions from past trends. This study was carried out to evaluate the evolution of research and to map the global knowledge domain in the literature of this field.

Analysis based on the effectiveness of research in the field of IR in medical sciences (that is, on the analysis of highly cited documents) shows that most scientific productions in this field fall into two categories: 1. effective methods of organizing information, and 2. applications and operation of IR systems in intelligent question answering and in analyzing the information behavior of physicians and health professionals. An important point here is that the effectiveness of scientific productions on structuring and organizing knowledge, and on using ontologies and other semantic tools to systematize knowledge, has increased compared with methods such as data mining. In other words, over time, research on and attention to pre-designed semantic tools have grown relative to methods of automatic data extraction and retrieval.

Also, the effectiveness of scientific documents based on answering clinical questions and focusing on health professionals' information behaviors has increased compared to search methods and tools. This suggests that researchers have increasingly focused on human factors in IR.

Zowj et al., in a study that identified trends in information retrieval research using an author co-citation network, found 10 clusters: Library and Information Science, Computer Science, Electrical Engineering, Information Retrieval, Information-seeking Behavior, Psychology, Multimedia Information Retrieval, Software Engineering, Ophthalmology, and Surgery. In our research, the documents fell into 4 thematic clusters: "Analysis of Physicians' Information Behavior, IR Systems, EBM and CDSS", "EHR and Medical Documents", "Text Mining and Indexing", and "Question Answer Systems". This difference arises because the current study focused on information retrieval articles in the medical sciences and excluded non-information retrieval articles. Therefore, only articles written directly on information retrieval in medical sciences were included in the cluster mapping. Regarding users' information behavior, however, the results of that study are in line with ours: in both studies (information retrieval in general and information retrieval in medical sciences), attention to human dimensions and user behavior was one of the most important research focuses (Zowj et al. 2019).

On the other hand, the keyword-based analysis of published documents on IR in medical sciences shows that the "IR technologies and techniques" thematic cluster was the strongest and most cohesive cluster on all 3 indicators, including total link strength and keyword occurrences. The "Information Behaviors and CDSS Systems" cluster ranks next on all of these indicators, with little difference. This shows that, in terms of research frequency, retrieval technologies and techniques are still at the top, but the frequency and strength of human-related topics are close to those of the first cluster. In other words, in terms of the number of items, the focus on human aspects of IR in medical sciences, such as information behavior and the application of technology in clinical processes and related clinical areas, has increased. This confirms the analysis of scientific products based on their effectiveness (that is, on the citation status of published documents).
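To make the keyword indicators concrete, the sketch below computes keyword occurrences and total link strength from per-document keyword lists. It assumes full counting, where each document in which two keywords co-occur adds 1 to the strength of their link, and it takes a keyword's total link strength as the sum of the strengths of all its links. The function name `keyword_indicators` and the sample documents are hypothetical and are not drawn from the study's dataset.

```python
from collections import Counter
from itertools import combinations


def keyword_indicators(documents):
    """Return (occurrences, total_link_strength) for author keywords.

    `documents` is a list of keyword lists, one list per document.
    Full counting is assumed: each co-occurrence in a document adds 1.
    """
    occurrences = Counter()
    link_strength = Counter()
    for keywords in documents:
        unique = sorted(set(keywords))  # count a keyword once per document
        occurrences.update(unique)
        for a, b in combinations(unique, 2):
            link_strength[(a, b)] += 1

    # A keyword's total link strength sums the strengths of all its links.
    total_link_strength = Counter()
    for (a, b), strength in link_strength.items():
        total_link_strength[a] += strength
        total_link_strength[b] += strength
    return occurrences, total_link_strength


# Hypothetical documents, for illustration only.
docs = [
    ["ontology", "query expansion", "medline"],
    ["ontology", "medline"],
    ["information behavior", "medline"],
]
occ, tls = keyword_indicators(docs)
print(occ.most_common())
print(tls.most_common())
```

Ranking clusters by these indicators would additionally require summing the values over the keywords assigned to each cluster.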

Ding et al., in their mapping of information retrieval research using keyword analysis, identified 5 main clusters and stated that information retrieval research is moving towards concepts such as the World Wide Web, information retrieval behaviors, artificial intelligence, online databases, electronic publishing, neural networks, knowledge illustration, data mining, and search engines, and that topics such as users' information needs have been considered in parallel with technical issues of information retrieval. This is consistent with our research and indicates the continued focus of researchers in this field on the human aspects of information retrieval. Also, intelligent methods of knowledge organization have received more attention than classical methods (Ding et al. 2001), which is likewise consistent with the current results.

From another perspective, Zhao and Rui identified the main foci of cross-language information retrieval (CLIR) research: CLIR techniques, machine translation, query translation, query expansion, and parallel corpora. Similarly, in our study, query expansion was in the third cluster, which shows the importance of query expansion in various areas of information retrieval (Zhao and Rui 2011).

The results also showed that the similarity index increased over time from 0.43 to 0.71. This means that researchers in medical IR have, over time, converged on a shared vocabulary. On the other hand, the findings show that the largest number of new keywords (91 keywords) entered the literature and terminology of medical IR between 2000 and 2010, indicating the growth of new concepts and dramatic changes in the thematic boundaries of this area. From 2010 to 2020, however, only 57 new keywords emerged, roughly 63% of the 2000-2010 count, indicating a relative slowdown in the growth rate of the subject area.
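The following sketch illustrates one way to measure keyword overlap between two consecutive periods and to list the keywords that are new, retained, or dropped. The Jaccard index is used here as an assumption, because the text does not state which similarity measure produced the reported values. The function names and sample keywords are hypothetical.

```python
def jaccard_similarity(period_a, period_b):
    """Share of keywords common to both periods among all keywords used."""
    a, b = set(period_a), set(period_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def keyword_flow(previous, current):
    """Split keywords into new, retained, and dropped between two periods."""
    previous, current = set(previous), set(current)
    return {
        "new": current - previous,
        "retained": current & previous,
        "dropped": previous - current,
    }


# Hypothetical keyword sets for two consecutive periods.
period_1 = {"medline", "ontology", "expert systems"}
period_2 = {"medline", "ontology", "question answering", "text mining"}

print(jaccard_similarity(period_1, period_2))
print(keyword_flow(period_1, period_2))
```

A rising value across successive period pairs would correspond to the kind of vocabulary convergence described above, while the size of the "new" set tracks how many keywords enter the field in each period.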

Analysis of the findings shows that the Similarity measures, Expert systems, Concepts, Experience, Answers, and Multi-model IR clusters are mature and highly central clusters in the first quarter of the strategic chart. In other words, these thematic clusters have the highest internal coherence and the strongest connections, and their concepts are closely interconnected. They are well developed and are very important for the development of the main research structure in the field. The second quarter, which represents cohesive but decentralized clusters, each consisting of smaller specialized areas of science, contains the Smartphone, Hybrid, Decision tree, RFID, and Feasibility study clusters. In the third quarter, the Relational database, Mechanism, Clinical information systems, Medical terminologies, and Barriers clusters have low density and centrality and mostly represent emerging or declining themes. In the fourth quarter, the Health information exchanges, Metadata, and Medical informatics clusters are not yet mature but have the potential to become major research topics in health IR in the future.

Abdollahzadeh, who drew a thematic map of library and information sciences using the co-word method, found that the metadata cluster was one of the central but undeveloped clusters, which is consistent with the results of our research (Abdollahzadeh 2019).

Paying attention to the evolution of scientific fields is one of the most important prerequisites for research policy-making and for predicting the scientific needs of researchers. This study aimed to address this goal and to draw future perspectives for the highly variable and developing field of IR in medical sciences. This matters because IR and its related subjects in medical science need to evaluate IR techniques as a powerful tool for developing research capabilities. Therefore, the models, maps, and visualizations in this research, which result from a systematic analysis of scientific products in the most prestigious scientific journals in the world, can be effective in understanding research gaps and future needs in IR. Other findings include a marked convergence of the vocabulary (and, in effect, the research areas) used by researchers and a relative slowdown in the growth rate of the subject domain in the last decade (2010-2020) compared with 2000-2010. Therefore, it seems necessary to pay attention to expanding the fields of IR and applying its concepts in medical information sciences.

In particular, the findings indicate relative growth in the focus of IR research on the practical and human aspects of IR and on information retrieval behaviors. This reflects the particular way IR technologies are applied in medical sciences, where human factors receive attention alongside technological factors. Therefore, designers of IR systems and techniques in medical information sciences are advised to pay closer attention to human factors when developing new technologies and tools.

Acknowledgment:

Researchers thank Dr. Alireza Norouzi for providing valuable guidance in conducting research.

Funding: Not applicable

Conflicts of interest/Competing interests: Not applicable

Availability of data and material: The data that support the findings of this study are available from the corresponding author upon reasonable request

Code availability: SciMAT-v1.1.04, VOSviewer-v1.6.14, CitNetExplorer_v1.0.0

Abdollahzadeh P (2019) Mapping Research Topics of Library and Information Sciences based on Co-word Analysis. Tabriz University of Medical Science

Ahmed Z, Zeeshan S, Dandekar T (2016) Mining biomedical images towards valuable information retrieval in biomedical and life sciences Database (Oxford) 2016 doi:10.1093/database/baw118

Alves T, Rodrigues R, Costa H, Rocha M (2018) Development of an information retrieval tool for biomedical patents Computer Methods and Programs in Biomedicine 159:125-134

Balaneshinkordan S, Kotov A (2019) Bayesian approach to incorporating different types of biomedical knowledge bases into information retrieval systems for clinical decision support in precision medicine J Biomed Inform 98:103238 doi:10.1016/j.jbi.2019.103238

Cargnel M, Bianchini J, Welby S, Koenen F, Van der Stede Y, De Clercq K, Saegerman C (2020) Improving laboratory diagnostic capacities of emerging diseases using knowledge mapping Transbound Emerg Dis doi:10.1111/tbed.13768

Chen H, Fang T, Liu F, Pang L, Wen Y, Chen S, Gu X (2020) Career Adaptability Research: A Literature Review with Scientific Knowledge Mapping in Web of Science Int J Environ Res Public Health 17 doi:10.3390/ijerph17165986

Cobo M, López-Herrera A, Herrera-Viedma E, Herrera F SciMAT Version 1.0 User guide:1-17

Cobo MJ, López-Herrera AG, Herrera-Viedma E, Herrera F (2011) An approach for detecting, quantifying, and visualizing the evolution of a research field: A practical application to the fuzzy sets theory field Journal of Informetrics 5:146-166

Cobo MJ, López-Herrera AG, Herrera-Viedma E, Herrera F (2012) SciMAT: A new science mapping analysis software tool Journal of the American Society for Information Science and Technology 63:1609-1630

Di Girolamo N (2019) Advances in Retrieval and Dissemination of Medical Information Vet Clin North Am Exot Anim Pract 22:539-548 doi:10.1016/j.cvex.2019.06.005

Ding Y, Chowdhury GG, Foo S (2001) Bibliometric cartography of information retrieval research by using co-word analysis Information Processing & Management 37:817-842

Ebener S, Khan A, Shademani R, Compernolle L, Beltran M, Lansang M, Lippman M (2006) Knowledge mapping as a technique to support knowledge translation Bull World Health Organ 84:636-642 doi:10.2471/blt.06.029736

Franker MA (2020) Visualisations in science communication: Friend or foe? Medical Writing 29:11-10

Harman D (2019) Information retrieval: the early years Foundations and Trends® in Information Retrieval 13:425-577

Hersh WR, Hickam DH (1998) How well do physicians use electronic information retrieval systems?: A framework for investigation and systematic review JAMA 280:1347-1352

Hess DR (2004) Information retrieval in respiratory care: tips to locate what you need to know Respir Care 49:389-399; discussion 399-400

Irmawati S, Cakrawijaya H, Lydia EL, Shankar K, Nguyen PT (2019) Medical Information Retrieval for Healthcare: The Challenges International Journal of Engineering and Advanced Technology 8:811-814

Janssens F, Glenisson P, Glänzel W, De Moor B Co-clustering approaches to integrate lexical and bibliographical information. In: Proceedings of the 10th international conference of the International Society for Scientometrics and Informetrics (ISSI), 2005. pp 284-289

Ke W, Yunjiang X, Xiao L, Weichan L Analysis on current research of supernetwork through knowledge mapping method. In: International Conference on Knowledge Science, Engineering and Management, 2013. Springer, pp 538-550

Kim MC, Nam S, Wang F, Zhu Y (2020) Mapping scientific landscapes in UMLS research: a scientometric review Journal of the American Medical Informatics Association 27:1612-1624 doi:10.1093/jamia/ocaa107

Li S, Jin Q, Jiang X, Park JJJH (2013) Frontier and Future Development of Information Technology in Medicine and Education: ITME 2013 vol 269. Springer Science & Business Media

Li X, Du J, Long H (2019) Dynamic analysis of international green behavior from the perspective of the mapping knowledge domain Environmental Science and Pollution Research 26:6087-6098

Lu C et al. (2020) Knowledge Mapping of Angelica sinensis (Oliv.) Diels (Danggui) Research: A Scientometric Study Frontiers in Pharmacology 11:294

Melcer E, Nguyen T-HD, Chen Z, Canossa A, El-Nasr MS, Isbister K (2015) Games research today: Analyzing the academic landscape 2000-2014 network 17:20

Milliken LK, Motomarry SK, Kulkarni A (2019) ARtPM: Article Retrieval for Precision Medicine J Biomed Inform 95:103224 doi:10.1016/j.jbi.2019.103224

Mohammadi M, Sheikhshoaei F, Banisafar M, Mozafari O (2019) Scientometric Analysis of Scientific Publications on Persian Medicine Indexed in the Web of Science Database Webology 16:151-165

Núñez Reiz A, Armengol de la Hoz MA, Sánchez García M (2019) Big Data Analysis and Machine Learning in Intensive Care Units Med Intensiva 43:416-426 doi:10.1016/j.medin.2018.10.007

Rezaei L, Mohammadi M (2018) Scientometric analysis of Iranian scientific productions in the field of Ophthalmology Journal of Clinical and Basic Research 2:23-32

Rorissa A, Yuan X (2012) Visualizing and mapping the intellectual structure of information retrieval Information Processing & Management 48:120-135

Salvador-Oliván JA, Marco-Cuenca G, Arquero-Avilés R (2019) Errors in search strategies used in systematic reviews and their effects on information retrieval J Med Libr Assoc 107:210-221 doi:10.5195/jmla.2019.567

Van Eck NJ, Waltman L (2007) VOS: A new method for visualizing similarities between objects. In: Advances in data analysis. Springer, pp 299-306

Van Eck NJ, Waltman L (2010) Software survey: VOSviewer, a computer program for bibliometric mapping Scientometrics 84:523-538

Van Eck NJ, Waltman L (2011) Text mining and visualization using VOSviewer arXiv preprint arXiv:1109.2058

Wilbur WJ, Yang YM (1996) An analysis of statistical term strength and its use in the indexing and retrieval of molecular biology texts Computers in Biology and Medicine 26:209-222 doi:10.1016/0010-4825(95)00055-0

Xu B et al. (2019) A supervised term ranking model for diversity enhanced biomedical information retrieval BMC Bioinformatics 20:1-11

Zeng L, Li Z, Wu T, Yang L Mapping knowledge domain research in big data: From 2006 to 2016. In: International Conference on Data Mining and Big Data, 2017. Springer, pp 234-246

Zhang XM, Zhang X, Luo X, Guo HT, Zhang LQ, Guo JW (2017) Knowledge mapping visualization analysis of the military health and medicine papers published in the web of science over the past 10 years Mil Med Res 4:23 doi:10.1186/s40779-017-0131-8

Zhao R, Rui C Visual analysis on the research of cross-language information retrieval. In: Proceedings of the International Conference on Uncertainty Reasoning and Knowledge Engineering, URKE 2011, 2011. pp 32-35. doi:10.1109/URKE.2011.6007900

Zhao XY, Sheng L, Diao TX, Zhang Y, Wang L, Yanjun Z (2015) Knowledge mapping analysis of Ebola research Bratisl Lek Listy 116:729-734 doi:10.4149/bll_2015_143

Zhu L, Liu X, He S, Shi J, Pang M (2015) Keywords co-occurrence mapping knowledge domain research base on the theory of Big Data in oil and gas industry Scientometrics 105:249-260

Zou X, Vu HL (2019) Mapping the knowledge domain of road safety studies: A scientometric analysis Accident Analysis & Prevention 132:105243

Zowj HA, Ghane MR, Ehsanifar F (2019) Identifying information retrieval research trends using author co-citation network International Journal of Information Science and Management 17:99-117
