Systematic review
The database search identified 492 publications, of which 357 remained after removal of duplicates and 181 after screening of titles and abstracts. Twenty-one full-text articles fulfilled the inclusion criteria. The CONSORT diagram is shown in Fig. 1.
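The stage-by-stage exclusions implied by these counts can be tallied with a short Python sketch. The record counts are those reported above. The exclusion counts are simple differences between consecutive stages, so they do not recover the reasons for exclusion. The `STAGES` list and the `exclusions` helper are illustrative names only.

```python
# Records remaining at each stage of study selection, as reported in the text.
STAGES = [
    ("identified by database search", 492),
    ("remaining after removal of duplicates", 357),
    ("remaining after title/abstract screening", 181),
    ("included after full-text review", 21),
]


def exclusions(stages):
    """Pair each stage with the number of records dropped since the previous stage."""
    return [
        (label, n, prev_n - n)
        for (_, prev_n), (label, n) in zip(stages, stages[1:])
    ]


for label, remaining, dropped in exclusions(STAGES):
    print(f"{label}: {remaining} ({dropped} excluded)")
```

Running it reports 135 duplicates removed, 176 records excluded at title/abstract screening and 160 excluded at full-text review.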
The detailed characteristics of these 21 publications are summarized in Additional file 2. All selected studies were full economic evaluations, examining costs together with either effectiveness (cost-effectiveness analysis, CEA) or utilities (cost-utility analysis, CUA). Eighteen of the 21 publications took the parameter estimates used in their analyses from randomized controlled trials (RCTs), mostly the LUX-Lung 3, 6 and 7, OPTIMAL, EURTAC, SATURN, GFPC and BR.21 trials; the remaining three were based on hospital medical records. Nine studies explicitly described the patient population, with sample sizes ranging from 41 to 731 patients.
Most articles (n = 16) compared a targeted therapy with chemotherapy: afatinib versus chemotherapy (n = 2), erlotinib versus chemotherapy, placebo or best supportive care (BSC) (n = 12), and gefitinib versus chemotherapy or routine care (n = 2). Eight articles compared the cost-effectiveness of the three first-line targeted strategies in NSCLC patients harboring EGFR mutations: four compared afatinib with gefitinib (in three countries), two compared afatinib with erlotinib (in two countries), and two addressed the effectiveness and cost-effectiveness of erlotinib versus gefitinib (in one country).
CHEERS checklist
The reporting-quality assessment of each study is summarized in Table S2.1. Figure 2 shows how often each CHEERS criterion was fulfilled, with the items sorted by completeness. CHEERS scores ranged from 12 to 21 among the selected studies. Following Hong et al [33], publications scoring 20–24 were categorized as being of good reporting quality, those scoring 14–19.5 as moderate, and those scoring <14 as low. Only four studies (19%) were of good quality based on the CHEERS checklist, 16 were of moderate quality, and two were of low quality. Reporting quality showed no relation to year of publication.
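The banding rule above can be written as a minimal Python sketch. It assumes the 24-item checklist is scored in steps of 0.5, so no total falls between 19.5 and 20; a total of 19.75 would be treated as moderate here. The function name `cheers_band` is hypothetical.

```python
def cheers_band(score: float) -> str:
    """Reporting-quality band for a CHEERS total (0-24), using the cut-offs
    attributed to Hong et al: good 20-24, moderate 14-19.5, low < 14.
    Assumes half-point scoring, so nothing lies strictly between 19.5 and 20."""
    if not 0 <= score <= 24:
        raise ValueError("CHEERS score must lie between 0 and 24")
    if score >= 20:
        return "good"
    if score >= 14:
        return "moderate"
    return "low"


# The lowest and highest scores observed among the 21 studies.
print(cheers_band(12), cheers_band(21))
```

Applied to the bottom and top of the observed range, it returns "low" for 12 and "good" for 21.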
Treatment comparators were described in the title of all but five papers [15,16,19–21]. In addition, the setting, perspective, time horizon and discount rate were not always reported, and not all articles provided the results of uncertainty analyses, the choice of health outcomes, the findings, or conflicts of interest. Because most studies used RCT data, heterogeneity was generally not characterized; only Wang et al [18] explicitly reported variation between subgroups of patients with different baseline genotypes. Although some studies described the base-case population, most reported neither its characteristics nor the reasons for choosing it. For the measurement and valuation of preference-based outcomes, only two papers drew on a systematic review [1,14]; the others cited the source of the utilities used without justifying the selection.
Quality evaluation (QHES)
The results of the quality assessment using the QHES instrument are presented in Table S2.2, which shows how often each criterion was met by the 21 studies. Following Spiegel et al [34], studies were grouped into quartiles: (1) extremely poor quality (0–24); (2) poor quality (25–49); (3) fair quality (50–74); and (4) high quality (75–100). Fewer than half of the publications (38%) were classified as high quality, and three studies (14.3%) were of poor quality [1,11,17]. Five studies did not present the economic model they used anywhere in the manuscript [1,7,11,17,20].
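The QHES quartile grouping and the reported percentages can be checked with a small Python sketch. It assumes integer QHES totals, so no score falls in the gaps between bands (for example between 24 and 25). The names `qhes_band` and `percent_of_studies` are hypothetical.

```python
def qhes_band(score: float) -> str:
    """Quality band for a QHES total (0-100), using the quartile cut-offs
    cited from Spiegel et al. Assumes integer totals."""
    if not 0 <= score <= 100:
        raise ValueError("QHES score must lie between 0 and 100")
    if score >= 75:
        return "high"
    if score >= 50:
        return "fair"
    if score >= 25:
        return "poor"
    return "extremely poor"


def percent_of_studies(count: int, total: int = 21) -> float:
    """Share of the included studies, in percent, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(qhes_band(74), qhes_band(75))
print(percent_of_studies(3), percent_of_studies(8))
```

`percent_of_studies(3)` gives 14.3, matching the three poor-quality studies. The 38% reported for high-quality studies corresponds to 8 of 21 (38.1%), since 7 and 9 studies would give 33.3% and 42.9%.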
Of the remaining 16 CEAs/CUAs, ten developed a Markov model to compare the cost-effectiveness of first-line targeted therapies and chemotherapy; two of these [14,15] combined a decision tree with a Markov model. Three studies [2,9,16] used only a decision tree to estimate costs and utilities, and three studies [5,12,13] did not present their model transparently. Among the studies using a Markov model, three [6,8,19] did not illustrate the model structure, and three [4,10,21] presented it as a Markov model tree. Only two studies [4,18] explicitly reported the formulas for the transition probabilities from each state to the next; most others reported only the rates or probabilities calculated from clinical trials. Six articles gave no justification for their choice of model [2,9,12,13,15,19].

We also evaluated the sources of the utility values used by the selected studies: 13 of the 21 papers took them from previously published literature, five derived utility data from the EQ-5D or other quality-of-life surveys [2,3,5,7,16], and three used survival data from a single medical institution [9] or from clinical trials [12,13]. Effectiveness was measured in quality-adjusted life years (QALYs) in 15 studies and in life years (LYs) in four studies [12,13,20,21]; one article [1] used median survival time (MST) to evaluate the therapeutic effect of the regimens. A pairwise statistical comparison of the CHEERS and QHES results showed no significant differences (Fig. 3).
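To illustrate what reporting transition-probability formulas and QALY calculations involves, here is a minimal Python sketch of a generic Markov cohort model. It is not the model of any reviewed study. Assumptions: a hypothetical three-state structure (progression-free, progressed, dead); constant annual rates converted to per-cycle probabilities with the standard relation p = 1 - exp(-r t); state rewards counted at the start of each cycle without half-cycle correction; and an incremental cost-effectiveness ratio, ICER = (C_new - C_ref) / (E_new - E_ref), as the summary comparison. `Strategy`, `rate_to_prob`, `run_cohort`, `icer` and all input values are illustrative.

```python
import math
from dataclasses import dataclass


def rate_to_prob(rate: float, cycle_years: float) -> float:
    """Per-cycle transition probability from a constant annual rate: p = 1 - exp(-r * t)."""
    return 1.0 - math.exp(-rate * cycle_years)


@dataclass
class Strategy:
    # Hypothetical per-cycle inputs for one treatment arm.
    p_progress: float  # P(progression-free -> progressed) per cycle
    p_die_pf: float    # P(progression-free -> dead) per cycle
    p_die_pd: float    # P(progressed -> dead) per cycle
    utility_pf: float  # utility weight while progression-free
    utility_pd: float  # utility weight after progression
    cost_pf: float     # cost per cycle while progression-free
    cost_pd: float     # cost per cycle after progression


def run_cohort(s: Strategy, n_cycles: int, cycle_years: float, disc_rate: float):
    """Three-state Markov cohort trace; returns discounted (cost, QALYs, life-years)."""
    if s.p_progress + s.p_die_pf > 1.0:
        raise ValueError("exit probabilities from progression-free exceed 1")
    pf, pd = 1.0, 0.0  # cohort starts progression-free; dead share = 1 - pf - pd
    cost = qaly = ly = 0.0
    for cycle in range(n_cycles):
        # Discount factor at the start of this cycle (no half-cycle correction).
        df = 1.0 / (1.0 + disc_rate) ** (cycle * cycle_years)
        cost += df * (pf * s.cost_pf + pd * s.cost_pd)
        qaly += df * cycle_years * (pf * s.utility_pf + pd * s.utility_pd)
        ly += df * cycle_years * (pf + pd)
        # Advance the cohort one cycle; "dead" is absorbing.
        new_pf = pf * (1.0 - s.p_progress - s.p_die_pf)
        new_pd = pd * (1.0 - s.p_die_pd) + pf * s.p_progress
        pf, pd = new_pf, new_pd
    return cost, qaly, ly


def icer(cost_new: float, eff_new: float, cost_ref: float, eff_ref: float) -> float:
    """Incremental cost per unit of effect gained (e.g. per QALY)."""
    return (cost_new - cost_ref) / (eff_new - eff_ref)


if __name__ == "__main__":
    monthly = 1 / 12
    # Hypothetical arms: annual rates converted to monthly probabilities.
    targeted = Strategy(rate_to_prob(0.9, monthly), 0.002, rate_to_prob(0.8, monthly),
                        0.75, 0.60, 3000.0, 800.0)
    chemo = Strategy(rate_to_prob(1.8, monthly), 0.003, rate_to_prob(0.9, monthly),
                     0.70, 0.60, 1500.0, 800.0)
    c1, q1, _ = run_cohort(targeted, 120, monthly, 0.03)
    c0, q0, _ = run_cohort(chemo, 120, monthly, 0.03)
    print(f"ICER: {icer(c1, q1, c0, q0):,.0f} per QALY gained")
```

Writing the rate conversion and the reward accumulation out explicitly, as here, is the kind of detail that lets a reader reproduce a model's transition probabilities and QALY totals. Reporting LYs instead of QALYs amounts to using the `ly` total in place of `qaly`.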