We used a four-step RAND-modified Delphi method to develop a set of QIs to measure the appropriateness of antimicrobial use in adult and hospitalized pediatric patients, outpatients, or patients receiving surgical prophylaxis [10,11]. Figure 1 presents an overview of the RAND-modified Delphi procedure, which included a comprehensive literature review to develop a list of candidate key QIs, two rounds of an online survey, and a face-to-face meeting with the panelists. The consensus procedure combined the individual opinions of multidisciplinary expert panels. All the panel members consented to participate in the study and were aware that their answers would be used for research.
Systematic search for generating key QIs
We performed a systematic search using a protocol designed by two independent medical librarians (D.W.S. and M.L.). We searched the PubMed, EMBASE, and Cochrane Library databases for papers published up to 20 November 2019. The search strategy, shown in Supplemental Figure 1, was directed towards identifying evidence-based QIs for antibiotic use (e.g., literature reviews or evidence-based guidelines). Papers written in English that discussed the use of systemically administered antibiotics in inpatients, outpatients, or surgical prophylaxis were included, except for case reports. Because this study aimed to develop key QIs for assessing the appropriateness of antibiotic use in treating all bacterial infections and in surgical prophylaxis, we excluded antiviral, antifungal, antiparasitic, and antituberculosis drugs. Using EndNote software (version X7.1, 2020 Clarivate), two researchers (B.K. and S.Y.P.) independently examined all titles and abstracts to select papers that described QIs. Any disagreement on the inclusion or exclusion of studies was resolved through discussion with a third author (S.M.). If no abstract was available or the abstract lacked the information needed for the eligibility assessment, the paper was selected for full-text screening. The inclusion and exclusion criteria were applied at full-text screening by two researchers (B.K. and S.Y.P.).
Selection of potential key QIs
Data on potential QIs were extracted by four researchers (B.K., M.J.L., S.Y.P., and S.M.). QIs were excluded if they were not concerned with antibiotic use for a specific group of patients, were non-normative, or were developed for institutions rather than for patients. The extracted QIs were then clustered into non-overlapping logical themes based on the definition of responsible use. When a QI could be allocated to more than one theme, the predominant theme was chosen by consensus between two authors (B.K. and S.Y.P.). Duplicates were removed, and the QIs were rephrased as recommendations. The clustering, aggregating, and rephrasing steps were undertaken by consensus among four authors (B.K., M.J.L., S.Y.P., and S.M.).
First online survey
We emailed invitations to specialists from different fields to participate in a 25-member expert panel. The panel of doctors comprised experts working at university-affiliated hospitals in the ROK. During panel selection, we aimed to select experts who were representative of and responsible for antibiotic prescriptions. The panel comprised infectious disease specialists (n=13), laboratory medicine doctors (n=3), pediatric infectious diseases specialists (n=2), urologists (n=2), an otorhinolaryngologist (n=1), a gastroenterologist (n=1), a pulmonologist (n=1), a general surgeon (n=1), and a researcher of the National Evidence-Based Healthcare Collaborating Agency (n=1) (Supplemental Table 2). To rate the degree to which each potential QI described appropriate antibiotic use, a Likert scale ranging from 1 (‘definitely inappropriate care’) to 7 (‘definitely appropriate care’) was used. The panelists could rephrase the potential indicators and could also add new items. Agreement was defined as ≥70% of the scores falling in the top quartile (scores 6 and 7); <70% of the scores in this range was defined as disagreement. QIs with a median score of 6 or 7 were accepted if there was agreement. If there was disagreement and the median score was ≤5, the QI was rejected. QIs with a median score of 6 or 7 but with disagreement were discussed during the expert panel meeting. In addition, we graded each QI on a Likert scale ranging from 1 to 7 for its relevance to inpatient care, outpatient care, or surgical prophylaxis. If the score was 6 or 7, we considered it to be an appropriate QI.
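The first-round decision rule described above can be illustrated with a minimal Python sketch. The function name `classify_qi` and the output labels are hypothetical and chosen only for illustration; they are not part of the study materials. The sketch assumes that a median below 6 (including a non-integer median such as 5.5, which can arise with an even number of responses) leads to rejection, a case the text does not address explicitly.

```python
from statistics import median


def classify_qi(scores, threshold_pct=70):
    """Classify one candidate QI from first-round Likert scores (1 to 7).

    Returns "accepted", "discuss", or "rejected" (hypothetical labels).
    """
    if not scores:
        raise ValueError("scores must not be empty")

    # Agreement: at least threshold_pct percent of scores are 6 or 7.
    # Integer arithmetic avoids floating-point edge cases at exactly 70%.
    top_count = sum(1 for s in scores if s >= 6)
    agreement = top_count * 100 >= threshold_pct * len(scores)

    if median(scores) >= 6:
        # Median 6 or 7: accept with agreement, otherwise discuss at the meeting.
        return "accepted" if agreement else "discuss"

    # Assumption: any median below 6 is treated as rejection.
    # (Agreement cannot occur here, because >=70% of scores at 6 or 7
    # forces the median to be at least 6.)
    return "rejected"


if __name__ == "__main__":
    # Illustrative scores for a 25-member panel, not study data.
    print(classify_qi([7] * 20 + [5] * 5))
    print(classify_qi([6] * 15 + [4] * 10))
    print(classify_qi([5] * 15 + [7] * 10))
```

In this sketch, a QI with a median of 6 or 7 is never rejected in the first round; it is either accepted outright or carried forward to the panel meeting.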
Expert panel meeting
All panel members were invited to a face-to-face panel meeting. Before the meeting, all participants received a personal feedback report with the results of the first online questionnaire. The agenda of the panel meeting was to present the results of the first round of the survey and to discuss the QIs with a median score of 6 or 7 that had inadequate consensus. These QIs were accepted if at least 70% of the experts concurred. In addition, newly added potential QIs were discussed, and the accepted QIs that had received comments from the experts were rephrased based on consensus.
Second online survey
A second questionnaire that included all the selected and rephrased QIs was sent, together with a personal feedback report (providing the results of the previous two steps of the consensus procedure), to all participating panelists. The panelists were asked to select one of the following three answers: ‘agree,’ ‘disagree,’ and ‘cannot assess.’ The rephrased indicators were accepted if at least 70% of the experts agreed with the new formulation. Furthermore, we asked the panelists to rate the importance of potential QIs that could be used as key indicators for antibiotic use in the ROK, on a Likert scale from 1 (‘less important’) to 5 (‘highest importance’). If the QIs could be evaluated in point surveillance, we considered them applicable. This was finally confirmed by four researchers (B.K., M.J.L., S.Y.P., and S.M.). We excluded QIs involving durations, for which appropriateness was difficult to evaluate by point surveillance.
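As an illustration of the second-round acceptance rule for rephrased indicators, the following Python sketch counts the three possible answers. The function name `rephrasing_accepted` is hypothetical. The sketch assumes that ‘cannot assess’ answers remain in the denominator, since the text specifies the threshold as a proportion of the experts; the handling of such answers is not stated explicitly.

```python
ALLOWED_ANSWERS = {"agree", "disagree", "cannot assess"}


def rephrasing_accepted(responses, threshold_pct=70):
    """Return True if at least threshold_pct percent of experts answered 'agree'.

    Assumption: every response, including 'cannot assess', counts
    towards the denominator.
    """
    if not responses:
        raise ValueError("responses must not be empty")
    unknown = set(responses) - ALLOWED_ANSWERS
    if unknown:
        raise ValueError(f"unexpected answers: {sorted(unknown)}")

    agree_count = sum(1 for r in responses if r == "agree")
    # Integer comparison avoids rounding issues at exactly 70%.
    return agree_count * 100 >= threshold_pct * len(responses)


if __name__ == "__main__":
    # Illustrative answers for a 25-member panel, not study data.
    answers = ["agree"] * 18 + ["disagree"] * 4 + ["cannot assess"] * 3
    print(rephrasing_accepted(answers))
```

Under a different assumption (excluding ‘cannot assess’ from the denominator), borderline indicators could be classified differently, so the choice would need to be fixed before analysis.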