This study applied two causal inference techniques, inverse propensity weighting (IPW) and doubly robust estimation (DR), combined with dedicated handling of missing data, to estimate the effect of TXA on patients with TBI from observational data. Based on the present large observational database, IPW seems to overestimate a harmful effect of TXA on mortality when compared with the CRASH-3 reference. By contrast, the DR estimate suggests that TXA administration after TBI has no effect on mortality.
How do these results relate to the available evidence? First, the present study cannot be directly compared with the available prospective evidence. Any comparison between the presented results and prospective evidence, in particular CRASH-3, serves only to appraise the performance of the deployed statistical techniques. Second, prospective studies diverge on crucial points such as power, outcome criteria (28-day versus 30-day or all-cause mortality), pre-hospital versus in-hospital TXA administration, exclusion of extracranial hemorrhage, and administration protocols.
The ATE, expressed as the percentage risk of head-injury-related death, ranged from a 7% decrease to a 5% increase with the doubly robust method, depending on the method used to estimate missing values (MICE or MIA), and from a 3% to a 15% increase with IPW. In comparison, in CRASH-3 the risk of head-injury-related death in the overall cohort ranged from a 14% decrease to a 2% increase, with a mean decrease of 1.3% [2]. Rowell et al. [3] reported a range from an 8% decrease to a 2% increase in 28-day mortality, with a mean decrease of 3%. The meta-analysis [5] reported a 3% decrease to a 1% increase in overall mortality.
The main challenge in any causal inference approach from observational data is the control of confounders. Bossers et al., in a registry-based observational study, showed an increase in 28-day mortality, concurring with the ATE estimated by IPW. Bossers et al. established the association through unadjusted logistic regression and then adjusted for confounders, whereas the present study selected confounders based on expert knowledge and directed acyclic graphs, as recommended [9]. The results discordant with CRASH-3 obtained by Bossers et al. and by IPW may nonetheless stem from the same difficulty in achieving sufficient control of confounding.
When comparing observational data from two very disparate groups, standard propensity score methods tend to under-correct for the observed differences, either because of model misspecification (in the case of logistic regression) or insufficient sample size (in the case of random forest regression). In consequence, the estimation of the treatment effect becomes erroneous. In the present case, IPW seems not to have sufficiently corrected for the treatment bias, probably because it struggled to achieve sufficient control of confounders. IPW and DR both require sufficient knowledge of all confounding factors. DR, however, provides better control of potential bias and smaller variability than IPW, as it integrates both a prediction of mortality and a prediction of treatment allocation. This dual modelling optimally exploits the available data and protects against misspecification of either model, making DR more robust than IPW. Furthermore, the flexibility of random forests in the doubly robust method yields a more powerful model, capturing complex relationships and well suited to a large cohort. The first take-away from the present paper is therefore that, when employing causal inference, DR is preferable to IPW. This study also introduces an important innovation: it is the first to combine DR with two advanced methods for handling missing data, both of which generate concordant results.
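The contrast between IPW and the dual modelling of DR can be illustrated with a minimal sketch in Python. This is not the study's pipeline: the data are simulated with a confounder and a null treatment effect, the variable names are illustrative, and random forests stand in for the nuisance models as described above. The DR estimator shown is the standard augmented IPW (AIPW) form, which combines the outcome models with an inverse-weighted residual correction:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 4))                              # simulated confounders
p_treat = 1 / (1 + np.exp(-X[:, 0]))                     # treatment depends on X[:, 0]
T = rng.binomial(1, p_treat)
p_death = 1 / (1 + np.exp(-(0.5 * X[:, 0] - X[:, 1])))   # outcome depends on X only: null effect
Y = rng.binomial(1, p_death)

# Naive risk difference is confounded: treated patients have higher X[:, 0], hence higher risk
ate_naive = Y[T == 1].mean() - Y[T == 0].mean()

# Propensity model e(X) = P(T=1 | X), clipped away from 0 and 1 for stable weights
ps_model = RandomForestClassifier(n_estimators=200, min_samples_leaf=50, random_state=0)
e = ps_model.fit(X, T).predict_proba(X)[:, 1].clip(0.01, 0.99)

# Outcome models m1(X), m0(X) = P(Y=1 | T=t, X), from one model with T as a feature
out_model = RandomForestClassifier(n_estimators=200, min_samples_leaf=50, random_state=0)
out_model.fit(np.column_stack([X, T]), Y)
m1 = out_model.predict_proba(np.column_stack([X, np.ones(n)]))[:, 1]
m0 = out_model.predict_proba(np.column_stack([X, np.zeros(n)]))[:, 1]

# IPW estimate of the ATE (risk difference): relies entirely on the propensity model
ate_ipw = np.mean(T * Y / e) - np.mean((1 - T) * Y / (1 - e))

# Doubly robust (AIPW) estimate: consistent if either the outcome or propensity model is correct
ate_dr = np.mean(m1 - m0 + T * (Y - m1) / e - (1 - T) * (Y - m0) / (1 - e))
print(f"naive: {ate_naive:+.3f}  IPW: {ate_ipw:+.3f}  DR: {ate_dr:+.3f}")
```

Because the simulated effect is null, both adjusted estimates should land near zero while the naive difference carries the confounding bias; the DR estimate remains protected even when one of the two nuisance models is miscalibrated.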
Trials in critical care in the last fifteen years have often produced negative results and were regularly underpowered to detect frequently unattainable outcome targets [7]. Mortality, although the most robust outcome criterion, often fails to capture the heterogeneous effects of complex interventions in complex diseases [8]. Furthermore, RCTs consume precious human, financial, organizational and time resources. Benchmark trials such as CRASH-3 result from exemplary international research efforts that are neither applicable nor reproducible for many research questions, not only because of resource constraints, but also because the necessary recruitment remains unattainable in an appropriate timeframe. Recruitment is not facilitated by the increasingly small marginal benefits conferred by ever more complex interventions. Despite a strong rationale, CRASH-3 required more than 12000 patients.
Augmented causal inference does not aspire to become a substitute for randomized controlled trials, but it can upgrade conventional observational research, in particular in the era of big data and physiological research, and provide a better rationale for RCTs [9]. The approach could become a reference to prepare RCTs and to explore the association of different interventions or bundles in different subgroups. This customized preparation would funnel research resources toward the most promising RCTs. For this reason, the results of this study using augmented causal inference appear promising and should be further explored. A combination of prospective, randomized data and parallel augmented causal inference on an observational dataset could be feasible.
The study has specific limitations. The inclusion period spans from 2010 to 2019; over this long period, management and epidemiology are likely to have evolved. The study group considered all TXA to have been administered within three hours of injury. Furthermore, TXA was administered for suspected hemorrhage and not for TBI, making the effect on isolated TBI difficult to assess; the association of hemorrhage and TBI might have affected the outcome prediction, although this was accounted for in the model. Among the patients included with TBI, 842 presented with severe acute hemorrhage (they received at least 4 red blood cell packs within the first six hours). The choice of confounders and treatment allocation variables was based on expert advice and could be a source of bias. Even experts may fail to perceive alternative explanatory patterns, risk appreciating only the patterns they know, and are prone to inherent cognitive bias. Despite the use of a Delphi process and directed acyclic graphs to map possible confounders and account for them in the final model, some variables might still escape sufficient control. Collected data can be imprecise (for example, blood pressure measurements) and are only a fragmented surrogate for a complex physiological process (a few blood pressure measurements at various time points versus continuous data). Missing data constitute an inherent limitation of any work based on off-the-shelf observational data; missingness is impossible to prevent, in particular in registry data, and all the more so in a clinical context of emergency. Fully aware of this intrinsic limitation of registry data, the study group set out to purposefully integrate and advance the management of missing data, testing two different imputation methods. With all these imperfections, control of confounding in causal inference from observational data remains a formidable challenge.
Future studies need to address this challenge, including the quality and mapping capacity of observational data. Finally, the chosen threshold of a 20% standardized mean difference as acceptable might seem high, in particular since mortality differences between the groups are below 20%.
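The standardized mean difference used as the balance threshold above is simply the difference in group means divided by the pooled standard deviation, with 20% corresponding to SMD = 0.2. A minimal sketch on synthetic data (the covariate name and values are illustrative, not from the study):

```python
import numpy as np

def smd(x_treated, x_control):
    """Standardized mean difference: difference in means over the pooled SD."""
    pooled_sd = np.sqrt((np.var(x_treated, ddof=1) + np.var(x_control, ddof=1)) / 2)
    return (np.mean(x_treated) - np.mean(x_control)) / pooled_sd

rng = np.random.default_rng(2)
age_txa = rng.normal(45, 18, 800)    # hypothetical covariate in the treated group
age_ctrl = rng.normal(40, 18, 2000)  # same covariate in controls

d = smd(age_txa, age_ctrl)
print(f"SMD = {d:.2f}")  # values above 0.2 (20%) suggest residual imbalance on this covariate
```

In this example the 5-unit mean difference against a pooled SD of about 18 yields an SMD above the 0.2 threshold, i.e. a covariate that would be flagged as insufficiently balanced under the study's criterion.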