Study Design
To determine how quickly research data and results were shared during the COVID-19 epidemic, we measured the time lag between the epidemic start time and the rise in the number of coronavirus-related documents indexed in the PubMed database. PubMed provides easy access to the Create Date (CRDT) of each article, which is the date the article was added to the database (15). This date is important because an article becomes searchable by the research community from that date onward. We determined the number of coronavirus-related PubMed publications indexed in each week by counting articles that contain the term “coronavirus” in their title/abstract and whose Create Date (CRDT) falls within that week. The 48 weeks before the epidemic start time (from 6th January to 8th December) were set as the control period (1,16), and the weekly number of documents in this period was taken as the baseline level. Afterward, the weekly number of documents after the epidemic start time was analyzed to find the first week in which the number of documents increased significantly compared with the baseline level. Then, the time from the epidemic start time, and from the declaration time, to this week was calculated.
We repeated the same analysis for the SARS epidemic and the 2014 Ebola epidemic in West Africa, using months instead of weeks as the time unit. Accordingly, the 48 months before each epidemic start time were considered the control period (1st November 1998 to 1st November 2002 for SARS (17), and 1st December 2009 to 1st December 2013 for Ebola (18)). The time unit was changed because the weekly number of publications for these two epidemics was low (zero in some weeks). For the Ebola epidemic, the term “Ebola” was searched in the title/abstract instead of “coronavirus.”
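The weekly counting step can be sketched in Python, the language used for the analysis. This is a minimal sketch, not the authors' script: it assumes PubMed's `[tiab]` (Title/Abstract) and `[crdt]` (Create Date) field tags with the `"YYYY/MM/DD"[crdt] : "YYYY/MM/DD"[crdt]` range syntax, assumes the weeks are consecutive 7-day windows starting on 6 January 2019, and uses hypothetical helper names. It only builds the queries and an NCBI E-utilities ESearch URL (`rettype=count` returns just the number of hits); no request is sent.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint; rettype=count returns only the hit count.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def weekly_windows(start: date, n_weeks: int):
    """Return (first_day, last_day) pairs for consecutive 7-day windows."""
    windows = []
    for i in range(n_weeks):
        first = start + timedelta(weeks=i)
        windows.append((first, first + timedelta(days=6)))
    return windows


def crdt_query(term: str, first: date, last: date) -> str:
    """Hypothetical helper: term in Title/Abstract, Create Date in [first, last]."""
    return (f'{term}[tiab] AND ("{first:%Y/%m/%d}"[crdt] : '
            f'"{last:%Y/%m/%d}"[crdt])')


def esearch_count_url(query: str) -> str:
    """Build (but do not send) an ESearch request for the number of matches."""
    params = {"db": "pubmed", "term": query, "rettype": "count"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Control period: 48 weeks starting 6 January 2019 (year assumed),
# ending the day before an assumed epidemic start of 8 December 2019.
control_weeks = weekly_windows(date(2019, 1, 6), 48)
queries = [crdt_query("coronavirus", a, b) for a, b in control_weeks]
print(queries[-1])
```

Under these assumptions the last control window runs from 1 to 7 December 2019, so the first post-start window begins on 8 December. The same approach works for the SARS and Ebola analyses with monthly windows and, for Ebola, the term "Ebola".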
To classify the topics investigated in the articles before and during the COVID-19, SARS, and Ebola epidemics, we used the Medical Subject Headings (MeSH) database available through PubMed. MeSH terms and subheadings are controlled vocabularies for indexing and searching the biomedical literature, and they serve as indicators of an article's topic (19). Eleven MeSH subheadings were chosen to classify the articles' topics into eleven categories: diagnosis, drug therapy, epidemiology, etiology, genetics, immunology, microbiology, prevention & control, statistics & numerical data, transmission, and organization & administration. The exact definition of each category is given in the subheadings section of the MeSH database (20). For COVID-19, the year 2019 was considered the pre-epidemic period, and 2020 up to 10th April was considered the epidemic period. For the SARS and Ebola epidemics, the years 2002 and 2013, respectively, were considered the pre-epidemic periods, and the years 2003 and 2014 the epidemic periods. These periods were chosen on the basis of preliminary searches, which showed a marked rise in the number of relevant articles in 2003, 2014, and 2020.
To count the articles on each topic, we found the number of PubMed publications that were indexed within the relevant Create Date range, contained the term “coronavirus” (“Ebola” for the Ebola epidemic) in their title/abstract, and were assigned the relevant MeSH subheading. Samples of the search results were screened manually to assess the validity of the search method. Once the method was validated, all results were included in the analysis, and no exclusions were made. The same procedure was used to find the records pertaining to the SARS and Ebola epidemics. Notably, each article could be assigned to several of these subheadings or to none of them; therefore, the sum of the subheading frequencies did not equal the total number of articles indexed.
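The per-topic search can be sketched as one PubMed query per subheading, plus an unfiltered "total" query that serves as the denominator for relative frequencies. This is a minimal sketch under stated assumptions: it uses PubMed's `[sh]` (Subheading) field tag, spells the subheadings with "and" rather than "&" (how PubMed's query translation usually renders them; check the translated query in PubMed), assumes 1 January 2020 as the start of the COVID-19 epidemic period, and `topic_queries` is a hypothetical helper.

```python
from datetime import date

# The eleven subheadings from the text, written as PubMed [sh] search terms
# (the "and" spelling is an assumption of this sketch).
SUBHEADINGS = [
    "diagnosis", "drug therapy", "epidemiology", "etiology", "genetics",
    "immunology", "microbiology", "prevention and control",
    "statistics and numerical data", "transmission",
    "organization and administration",
]


def topic_queries(term: str, first: date, last: date) -> dict:
    """Hypothetical helper: one PubMed query per subheading plus the total.

    The "total" query has no subheading filter; it is the denominator for
    relative frequencies, since an article may carry several subheadings
    or none, so subheading counts need not sum to the total.
    """
    window = f'("{first:%Y/%m/%d}"[crdt] : "{last:%Y/%m/%d}"[crdt])'
    base = f"{term}[tiab] AND {window}"
    queries = {"total": base}
    for sh in SUBHEADINGS:
        queries[sh] = f'{base} AND "{sh}"[sh]'
    return queries


# COVID-19 epidemic period from the text: 2020 up to 10 April
# (1 January 2020 as the start date is an assumption).
during = topic_queries("coronavirus", date(2020, 1, 1), date(2020, 4, 10))
print(during["transmission"])
```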
Finally, to identify the topics that were investigated more during each epidemic, two measures were compared between the periods before and during each epidemic: first, the rank of each topic by frequency, and second, the ratio of each topic's frequency to the total number of search results (relative frequency).
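The two measures can be computed from the per-topic counts and the total number of search results for a period. This is a minimal sketch with hypothetical counts; the text does not say how tied counts were ranked, so the "1, 2, 2, 4" competition ranking used here is an assumption.

```python
def rank_and_relative_frequency(topic_counts: dict, total_articles: int) -> dict:
    """Map each topic to (rank by frequency, count / total_articles).

    Ties share the best rank (competition ranking: 1, 2, 2, 4); the text
    does not state how ties were handled, so this is an assumption.
    """
    ordered = sorted(topic_counts.values(), reverse=True)
    return {
        topic: (ordered.index(count) + 1, count / total_articles)
        for topic, count in topic_counts.items()
    }


# Hypothetical counts, for illustration only.
before = rank_and_relative_frequency(
    {"epidemiology": 50, "diagnosis": 30, "genetics": 30, "transmission": 10},
    total_articles=200,
)
print(before["transmission"])  # (4, 0.05)
```

Dividing by the total number of search results, rather than by the sum of the topic counts, keeps each relative frequency meaningful even though an article can carry several subheadings or none.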
Statistical Analysis
To analyze the rise in the number of publications after each epidemic emerged, we first removed outlier records from the control periods, defining an outlier as any value lying more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. Accordingly, two records were omitted from each of the SARS and Ebola control periods (December 1998 and January 2002 for SARS; October 2011 and December 2012 for Ebola). Then, the Shapiro-Wilk test was performed to assess whether each control period followed a normal distribution (21). The records after each epidemic start time were analyzed using a one-sample z-test to determine the first record that showed a significant increase compared with the control period. All data are summarized as mean (SD), and differences with p < 0.01 were considered significant. Data analysis was conducted in Python (version 3.6) using the SciPy library (version 1.4.1). Data visualization was performed using Tableau Desktop (version 2020).
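The outlier rule, normality check, and detection of the first significant record can be sketched with NumPy and SciPy. This is a minimal sketch, not the authors' code, with several assumptions: quartiles use `np.percentile`'s default linear interpolation, each post-start count is tested as a single observation against the control mean and sample SD, the test is one-sided (upper tail, since only increases are of interest), and the counts are hypothetical.

```python
import numpy as np
from scipy import stats


def remove_iqr_outliers(values, k=1.5):
    """Keep values within [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences).

    The quartile method (np.percentile's default linear interpolation)
    is an assumption; the text does not state which one was used.
    """
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return x[(x >= q1 - k * iqr) & (x <= q3 + k * iqr)]


def first_significant_record(control, after, alpha=0.01):
    """Find the first post-start count that exceeds the cleaned control period.

    Assumes a one-sided (upper-tail) z-test of a single count against the
    control mean and sample SD; returns None if no record reaches alpha.
    """
    baseline = remove_iqr_outliers(control)
    shapiro_p = stats.shapiro(baseline)[1]  # large p: no evidence against normality
    mu, sd = baseline.mean(), baseline.std(ddof=1)
    for i, count in enumerate(after):
        z = (count - mu) / sd
        p = stats.norm.sf(z)  # P(Z >= z): tests for an increase only
        if p < alpha:
            return {"index": i, "z": z, "p": p, "shapiro_p": shapiro_p,
                    "baseline_mean": mu, "baseline_sd": sd}
    return None


# Hypothetical weekly counts; 60 lies above the upper fence and is removed.
control = [10, 12, 11, 9, 10, 11, 12, 10, 9, 11, 10, 60]
after = [11, 12, 15, 30]
result = first_significant_record(control, after)
print(result["index"])  # 2: the third post-start week is the first significant one
```

With these numbers the cleaned baseline has a mean of about 10.45 and an SD of about 1.04, so a count of 12 is not yet significant (p ≈ 0.07) while 15 is (p < 0.001).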
To compare the proportion of search results assigned to each topic (relative frequency of topics) before and after each epidemic start time, a z-test for two population proportions was performed. This analysis identified the topics whose relative frequency increased significantly after the epidemic start time. Increases with p < 0.01 were considered significant. The data were visualized in figures created with Microsoft Word (version 2016).
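The two-proportion comparison for a single topic can be sketched as follows. This is a minimal sketch assuming the common pooled-variance form of the test and a one-sided alternative (relative frequency during the epidemic greater than before); the text does not specify either choice, and the counts are hypothetical. statsmodels' `proportions_ztest` offers the same test if that library is preferred.

```python
import math

from scipy.stats import norm


def two_proportion_z_test(x_before, n_before, x_during, n_during):
    """Pooled two-proportion z-test for an increase in relative frequency.

    x_*: articles assigned to the topic; n_*: total search results in the
    period. Returns (z, one-sided p-value for p_during > p_before).
    """
    p_before = x_before / n_before
    p_during = x_during / n_during
    pooled = (x_before + x_during) / (n_before + n_during)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_during))
    z = (p_during - p_before) / se
    return z, norm.sf(z)


# Hypothetical counts: 50 of 1,000 articles before vs 120 of 1,000 during.
z, p = two_proportion_z_test(50, 1000, 120, 1000)
print(round(z, 2), p < 0.01)
```

Pooling the two samples estimates the common proportion under the null hypothesis of no change, which is why the standard error uses `pooled` rather than the two separate proportions.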