Subgroup analyses have the potential to generate hypotheses for further investigation, discover new treatments, and identify baseline factors that may influence treatment efficacy or toxicity. However, when misused, subgroup analyses may also lead to spurious findings and misleading interpretations [31–33]. The most frequent methodological limitations of subgroup analyses in RCTs have been reported extensively: multiple hypothesis testing, inadequate statistical power, inadequate a priori specification, and lack of biological rationale [4, 5, 33–35].
This review shows that the subgroup analyses carried out in RCTs of pulmonary hypertension-specific therapy are generally of low quality, despite being published mainly in high-impact-factor journals. Of particular note is the lack of clarity regarding allocation concealment. For most clinical trials, the study protocol is not available, which makes it difficult to verify aspects such as the pre-specification of the subgroup analyses. Furthermore, of the 11 RCTs with subgroup effect claims, only one has a publicly available protocol. For studies whose protocol was available, the subgroup analyses reported in the manuscript were poorly described and differed substantially from those planned in the protocol.
We also identified other notable methodological errors in the conduct of subgroup analyses: a high number of reported subgroup analyses, a high proportion of post hoc analyses, and the lack of interaction tests to confirm the existence of subgroup effects.
When multiple subgroup analyses are carried out, the results should be interpreted with caution, since the probability of obtaining a false positive increases substantially [5]. This risk is greater still if the subgroup hypotheses were not pre-specified [5, 13, 33]. Assuming independent tests at a significance level of 0.05, the probability of at least one false-positive result across 5 subgroup analyses is 1 − 0.95^5 ≈ 23%, or roughly 25%, and it increases further with each additional analysis. We identified a median of 6 subgroup analyses reported among the RCTs evaluated in this review.
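The arithmetic behind this estimate can be sketched in a few lines of Python. The sketch assumes that the subgroup tests are independent, that each is performed at α = 0.05, and that no true subgroup effect exists; real subgroup tests are usually correlated, so the figures are approximations. The Bonferroni threshold is included only as one common illustration of a multiplicity correction, not as a method used in the reviewed trials, and the function names are hypothetical.

```python
# Minimal sketch: family-wise false-positive risk for k subgroup tests.
# Assumptions (illustrative only): tests are independent, each uses
# alpha = 0.05, and every null hypothesis is true.


def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    """Probability of at least one false-positive result among k tests."""
    return 1 - (1 - alpha) ** k


def bonferroni_threshold(k: int, alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the family-wise rate <= alpha."""
    return alpha / k


if __name__ == "__main__":
    for k in (1, 5, 6, 10, 20):
        print(
            f"{k:>2} tests: FWER = {familywise_error_rate(k):.1%}, "
            f"Bonferroni per-test alpha = {bonferroni_threshold(k):.4f}"
        )
```

Under these assumptions, 5 tests give a family-wise risk of about 22.6%, and the median of 6 analyses observed in this review gives about 26.5%.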
Pre-specification of subgroup analyses is a criterion frequently used to assess methodological quality. For a subgroup analysis to be prespecified, it must be planned and documented before any examination of the data; this rests on the premise that a prespecified analysis usually follows a biological rationale. However, pre-specification alone does not guarantee sound subgroup analyses, as prespecified analyses may be based on implausible or poorly formulated hypotheses [36]. Among pulmonary hypertension-specific therapy RCTs, 46.7% (n = 14) reported prespecified subgroup analyses.
In addition to pre-specifying the subgroup analysis, the anticipated direction of the subgroup effect should also be specified. Claims for which the direction of the effect was not specified, or was specified incorrectly, are less credible.
A common mistake among authors is to claim a subgroup difference when a statistically significant effect is found in one subgroup but not in another. One of the essential criteria for appropriately establishing a claim of subgroup effect is an interaction test [37]. The p-value of an interaction test indicates how likely an apparent subgroup difference at least as large as the one observed would be if it arose by chance alone, with no true subgroup effect. In this review, we observed that only 37.7% of the RCTs performed an interaction test to confirm the existence of a subgroup claim. Of the 9 claims of subgroup difference identified in this study, 44.4% (n = 4) were based on a significant interaction test. Comparisons with reviews in other clinical areas yield mixed results. Wallach et al. found that, among a sample of articles making at least one subgroup claim in the abstract, 40% of the claims were based on the result of an interaction test [38]. In contrast, Khan et al., evaluating the quality of subgroup analyses in heart failure RCTs, reported that 70% of claims were based on significant interaction tests [39].
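To illustrate why a significant result in one subgroup and a non-significant result in another does not establish a subgroup effect, the Python sketch below applies a standard z-test for the difference between two independent estimates on the log scale. It assumes each subgroup result is reported as a ratio (for example, a hazard ratio) with a 95% confidence interval that is symmetric on the log scale, and that the two subgroups are independent. All numbers and function names are hypothetical and do not come from any reviewed trial.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def log_estimate_and_se(ratio: float, lower: float, upper: float) -> tuple[float, float]:
    """Log ratio and its standard error, recovered from a 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return math.log(ratio), se


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def within_subgroup_p(sub: tuple[float, float, float]) -> float:
    """p-value for the treatment effect inside one subgroup (H0: ratio = 1)."""
    est, se = log_estimate_and_se(*sub)
    return two_sided_p(est / se)


def interaction_test(sub1: tuple[float, float, float],
                     sub2: tuple[float, float, float]) -> tuple[float, float]:
    """Ratio of ratios and p-value for the difference between two subgroups."""
    est1, se1 = log_estimate_and_se(*sub1)
    est2, se2 = log_estimate_and_se(*sub2)
    diff = est1 - est2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)  # independent subgroups assumed
    return math.exp(diff), two_sided_p(diff / se_diff)


# Hypothetical subgroup results: (ratio, lower 95% CI, upper 95% CI).
subgroup_a = (0.70, 0.55, 0.89)
subgroup_b = (0.85, 0.65, 1.11)

p_a = within_subgroup_p(subgroup_a)
p_b = within_subgroup_p(subgroup_b)
ratio_of_ratios, p_interaction = interaction_test(subgroup_a, subgroup_b)

print(f"Subgroup A p = {p_a:.4f}, subgroup B p = {p_b:.3f}")
print(f"Ratio of ratios = {ratio_of_ratios:.2f}, interaction p = {p_interaction:.2f}")
```

In this hypothetical example, the effect is significant in subgroup A (p ≈ 0.004) but not in subgroup B (p ≈ 0.23), yet the interaction test (p ≈ 0.29) gives no evidence that the two subgroups actually differ.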
Most of the studies included in this review were industry-funded (90%), which could have influenced our results. The source of funding of clinical trials may affect the quality of subgroup analysis reporting: industry-funded RCTs are more likely to report subgroup analyses [40–42], even when an overall treatment effect on the primary outcome could not be demonstrated [40]. Industry funding has also been associated with suboptimal reporting of subgroup effects; subgroup hypotheses were often not prespecified, and interaction tests were rarely used [40, 42]. This is consistent with our findings in this predominantly industry-funded sample of RCTs: among the articles that claimed a subgroup effect, only 4 (36.4%) RCTs reached their primary endpoint.
Previous studies have found that the methodological quality described in the methods sections of published articles is lower than that described in the corresponding study protocols [43, 44], meaning that high-quality studies may be poorly reported. Protocols provide complete insight into the analysis methods used in an RCT. Publishing the trial protocol together with the RCT report, and in clinical trial registries, is recommended, as this gives readers a transparent and complete description of the prespecified methods. However, several studies have found that RCT protocols are often not freely available [41, 45]. This is consistent with our findings: only 7 of 30 RCTs provided the study protocol, and protocol publication increased only modestly over the study period.
The fact that protocols are not systematically accessible is concerning: even when protocols are voluntarily published, discrepancies between protocols and journal publications in the reporting of study outcomes are relatively frequent [46–54]. Similarly, marked inconsistencies between protocols and publications have been described for several methodological characteristics of subgroup analyses: omitted prespecified analyses [54], interaction tests, pre-specification of subgroup analyses, and minor differences in the anticipated direction of the effect [41]. Given these prevalent discrepancies, the credibility of subgroup methods may be questionable when the study protocol is not accessible.
Our findings agree with previous reports: few studies (23.3%, n = 7) published the protocol, either with the journal publication or in clinical trial registries. Of all studies, 46.7% (n = 14) reported a prespecified subgroup analysis, but only half of these published the study protocol. Furthermore, 30% (n = 9) of studies did not clearly report whether the subgroup analysis was prespecified or post hoc; in none of these cases was the protocol freely available.
Although the methodological limitations of subgroup analyses in RCTs are increasingly recognized, a review of 437 randomly selected RCTs published in high-impact journals found that the appropriateness of subgroup analysis reporting decreased from 2007 to 2014 [42].
In contrast, we observed improvements in most methodological characteristics of pulmonary hypertension-specific therapy RCTs from 2002 to 2019, including a priori specification, use of forest plots, and use of interaction tests. However, the use of subgroup variables as stratification factors during randomization declined. This decline supports the hypothesis that most subgroup analyses, even when prespecified, are exploratory. When a particular characteristic is known to influence the trial outcome, it should be used as a stratification factor at randomization.
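Stratifying randomization by a prognostic characteristic keeps treatment arms balanced within each level of that characteristic. One common way to do this is stratified permuted-block randomization, sketched below in Python under stated assumptions: two arms with 1:1 allocation, a block size of 4, and strata given as hypothetical labels (for example, functional class and background therapy). This is an illustrative sketch, not the randomization scheme of any reviewed trial.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, Optional, Sequence


def make_stratified_randomizer(
    arms: Sequence[str] = ("treatment", "control"),
    block_size: int = 4,
    seed: Optional[int] = None,
) -> Callable[[Hashable], str]:
    """Return a function that assigns each new patient in a stratum to an arm.

    Each stratum draws from its own shuffled block containing every arm
    equally often, so allocation is balanced after every complete block.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    pending = defaultdict(list)  # stratum -> assignments left in its current block

    def assign(stratum: Hashable) -> str:
        if not pending[stratum]:
            block = list(arms) * (block_size // len(arms))
            rng.shuffle(block)
            pending[stratum] = block
        return pending[stratum].pop()

    return assign


if __name__ == "__main__":
    assign = make_stratified_randomizer(seed=2019)
    # Hypothetical strata: (functional class, background therapy).
    stratum_1 = ("FC II", "treatment-naive")
    stratum_2 = ("FC III", "background therapy")
    arms_1 = [assign(stratum_1) for _ in range(8)]
    arms_2 = [assign(stratum_2) for _ in range(4)]
    print(stratum_1, arms_1)
    print(stratum_2, arms_2)
```

Because each stratum consumes its own blocks, the arms are exactly balanced within a stratum after every complete block, which is what makes a later subgroup comparison on that characteristic less vulnerable to chance imbalance.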
Claims of subgroup effect are common in RCT reports. Several systematic reviews and analyses have shown that authors claim a difference in treatment effect between patient subgroups in 40–60% of RCTs reporting subgroup analyses [13, 36, 55]. A few systematic reviews have described a relatively low frequency of subgroup claims [14, 39]. Our results are in line with the latter, as pulmonary hypertension-specific therapy RCTs reported claims of subgroup effect in 26.7% (n = 9) of RCTs reporting subgroup analyses. Fewer subgroup claims may indicate that authors are cautious in their reporting, as such claims may lead to changes in clinical practice.
4.1 Strengths.
To our knowledge, this is the first systematic review of the credibility of subgroup analyses and subgroup effect claims reported in pulmonary hypertension-specific therapy RCTs. A rigorous systematic methodology was employed, and standardized criteria were used to assess the credibility of subgroup claims.
4.2 Limitations.
This study has some limitations. First, although we used a scale to determine the credibility of the claims, the Sun criteria were not designed to provide a score; therefore, the subsequent interpretation of the results is not free of subjectivity.
Second, assessing the strength of a claim inevitably involves some subjectivity in interpreting what the authors state. However, the paired assessment and the high agreement between the two researchers suggest that this limitation had little impact.
Third, we were unable to find the study protocols for most of the studies. In many cases, we could not determine whether the published results corresponded to the initially defined objectives, which limits our ability to judge the credibility of subgroup claims. For this reason, authors should provide detailed information about the conduct and results of subgroup analyses.
4.3 Proposals for improving the reporting of subgroup analyses.
Although the methodological limitations of subgroup analyses are consistently reported in the literature, the same mistakes continue to be made in the conduct and reporting of subgroup analyses in recent RCTs. To improve the current state of subgroup analyses, we propose the following measures:
First, subgroup analyses should be prespecified and documented in trial registries. Second, scientific journals should require authors to make the study protocol accessible to reviewers and readers as a condition for publishing the results of RCTs. Third, the use of guidelines or tools for the correct reporting of subgroup analyses should be enforced. Fourth, researchers should be cautious when claiming subgroup differences, even when a robust methodology for subgroup analyses has been followed.