We systematically reviewed the literature to assess the current state of practice in using MI for estimation of causal effects from incompletely observed observational data. We focussed on four key areas: missing data summaries, missing data assumptions, primary and sensitivity analyses, and MI implementation. Overall, we found that most studies do not report missing data summaries, missingness assumptions, or missing-data-related decisions and analyses with sufficient clarity.
An unanticipated, although perhaps unsurprising, finding of this review was that the analytical sample was often arrived at by excluding individuals with missing data in specific variables, for example, by using eligibility criteria that require key variables to be completely observed. This makes the full extent of missing data difficult to quantify because the inception sample cannot be readily identified. Therefore, for the purposes of reporting the amount of missing data in this review, we considered the amount of missing data within the analysis sample only. However, identifying the exact amount of missing data within the well-defined analysis sample was also often difficult because summaries were frequently reported per variable, without describing missing data patterns.
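To illustrate the distinction between these two kinds of summary, the sketch below tabulates both for a small synthetic dataset. This is a minimal example, not code from any reviewed study; the variable names, data, and missingness proportions are all hypothetical, and pandas is used only as one possible tool.

```python
import numpy as np
import pandas as pd

# Hypothetical analysis sample with missingness in a confounder,
# the exposure, and the outcome (names and data are illustrative).
rng = np.random.default_rng(2024)
n = 500
df = pd.DataFrame({
    "confounder": rng.normal(size=n),
    "exposure": rng.binomial(1, 0.4, size=n).astype(float),
    "outcome": rng.normal(size=n),
})
# Introduce missing values at arbitrary positions, for illustration only.
for col, p in [("confounder", 0.15), ("exposure", 0.05), ("outcome", 0.10)]:
    df.loc[rng.random(n) < p, col] = np.nan

# Per-variable summary: the proportion missing in each variable
# (the summary most reviewed studies reported).
print(df.isna().mean().round(3))

# Missing data pattern summary: each row is a distinct pattern of
# observed (False) / missing (True) values across all variables, with
# the number of participants exhibiting it (rarely reported).
patterns = df.isna().value_counts().rename("n_participants")
print(patterns)
```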
Details of the assumptions made about the missing data mechanism were often lacking and, when provided, not justified appropriately. A statement of assumptions about the missingness mechanism was provided for just one-third (33%) of studies. This is, however, an improvement over what was found in the reviews conducted by Mackinnon (2010), where 8/50 (16%) observational studies provided a statement that data were MAR,(9) and Rezvan et al. (2015), where 7/30 (23%) observational studies stated or described the assumed missing data mechanism.(8) When a statement about the missing data mechanism was provided, most studies reported assuming that data were MAR, but justifications for missingness assumptions were provided in just 11 studies. The most common justification for the MAR assumption was that participant characteristics differed between those with and without complete data, as determined by an investigation of summary statistics or by conducting formal hypothesis tests. However, it is impossible to distinguish between MAR and MNAR using data-based assessments, so such justifications are incomplete. As described in the Introduction, the MCAR/MAR/MNAR assumptions are difficult to interpret and assess in the context of multivariable missingness, so it is not surprising that we found lacking or incomplete justifications for these assumptions. Of note, no study provided a comprehensive description of missing data assumptions, for example, using an m-DAG. Furthermore, the omission of a statement of missing data assumptions from most studies suggests that the critical link between missing data assumptions and estimation methods is not generally appreciated. When missing data assumptions were used to guide the choice of MI as the primary analysis, the most common justification for using MI was that data were assumed to be MAR (without justification of the MAR assumption itself).
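For concreteness, the sketch below shows the kind of data-based check described above: comparing an observed baseline characteristic between participants with and without complete data. The dataset, variable names, and test are hypothetical; the point of the sketch is that even a clear (or null) result of this comparison cannot, on its own, establish MAR.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical dataset with a fully observed covariate and an
# incompletely observed outcome (names and data are illustrative).
rng = np.random.default_rng(7)
n = 500
df = pd.DataFrame({
    "age": rng.normal(50, 10, size=n),
    "outcome": rng.normal(size=n),
})
df.loc[rng.random(n) < 0.2, "outcome"] = np.nan

# Indicator of membership in the complete-case analysis sample.
complete = df.notna().all(axis=1)

# Summary statistics by completeness (the common data-based "check").
print(df.groupby(complete)["age"].agg(["mean", "std", "count"]))

# Formal hypothesis test comparing the groups on an observed covariate.
t, p = stats.ttest_ind(df.loc[complete, "age"],
                       df.loc[~complete, "age"],
                       equal_var=False)
print(f"Welch t = {t:.2f}, p = {p:.3f}")
# Caveat: this comparison uses observed values only. MAR vs MNAR
# depends on the unobserved values, so neither a significant nor a
# null result here can verify the MAR assumption.
```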
Most studies in this review used standard MI for the primary analysis. Approximately half of the studies conducted a secondary analysis that treated the missing data differently from the primary analysis, but the reason for doing so was almost always omitted or unclear. When studies did carry out two analyses that handled the missing data differently, it was common to conduct both a CCA and MI. Without justification, it is not clear why such an analysis is warranted. It may be to examine the sensitivity of ACE estimates to the causal assumptions made about the missing data mechanism for the primary analysis. We speculate that another motivation may be the misconception that a CCA is the “normal” approach to dealing with missing data, while standard MI provides a more sophisticated analysis that allows one to assess whether the missing data were really an issue. However, if neither standard MI nor CCA can provide unbiased estimation under plausible missingness assumptions, then it would be incorrect to conclude that the missing data “had little impact” on the results. In other words, when there is no unbiased estimate to compare against, the impact of the missing data remains unknown. Of the 61 studies that conducted both a CCA and an MI analysis, only 3 (5%) observed a substantial difference between the MI and CCA estimates. Just one study conducted an analysis that incorporated assumptions about a difference between the missing and observed data distributions. Despite this being an area of recent methodological development, our finding that such analyses are rarely performed is similar to findings from previous reviews, see e.g. (8, 152).
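One established way to incorporate an assumed difference between the missing and observed data distributions is a delta-adjusted MI sensitivity analysis: impute under MAR, shift the imputed values by a user-specified offset delta, and re-estimate the ACE over a range of deltas. The sketch below outlines this idea for a continuous outcome using statsmodels' chained-equations imputation and manual Rubin's-rules pooling; it is a minimal sketch under assumed variable names, deltas, and an assumed linear outcome model, not the method of any reviewed study.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.imputation.mice import MICEData

# Hypothetical data: exposure X, confounder C, outcome Y with missing Y.
rng = np.random.default_rng(42)
n = 1000
C = rng.normal(size=n)
X = rng.binomial(1, 1 / (1 + np.exp(-C)))
Y = 1.0 * X + 0.5 * C + rng.normal(size=n)
df = pd.DataFrame({"C": C, "X": X, "Y": Y})
miss = rng.random(n) < 0.3            # illustrative missingness in Y
df.loc[miss, "Y"] = np.nan

def pooled_ace(delta, m=20):
    """Impute Y under MAR, shift imputed Y by delta, pool via Rubin's rules."""
    imp = MICEData(df)
    ests, variances = [], []
    for _ in range(m):
        imp.update_all(5)             # several cycles between saved draws
        d = imp.data.copy()
        d.loc[miss, "Y"] += delta     # MNAR shift applied to imputed values only
        fit = sm.OLS(d["Y"], sm.add_constant(d[["X", "C"]])).fit()
        ests.append(fit.params["X"])
        variances.append(fit.bse["X"] ** 2)
    qbar = np.mean(ests)                       # pooled point estimate
    w, b = np.mean(variances), np.var(ests, ddof=1)
    se = np.sqrt(w + (1 + 1 / m) * b)          # Rubin's total variance
    return qbar, se

for delta in [0.0, -0.5, -1.0]:                # range of hypothesised shifts
    est, se = pooled_ace(delta)
    print(f"delta = {delta:+.1f}: ACE estimate = {est:.2f} (SE {se:.2f})")
```

Tracing how the ACE estimate moves as delta departs from zero indicates how strongly the conclusions depend on the MAR assumption.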
MI is increasingly recognised as a method for estimation that needs to be carefully tailored to the target analysis.(7) However, the findings from the current review suggest that there is room for improvement in the reporting of MI implementation. For example, certain aspects of the form of the imputation model were reported in just over half of the studies, despite being needed to judge the appropriateness of the MI procedure and to ensure the analysis can be reproduced.
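As a sketch of what reproducible reporting might look like in code, the example below makes the commonly omitted implementation details explicit: the software, the imputation method, the variables included in the imputation model, and the numbers of burn-in cycles and imputations. The formula, settings, and data are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.imputation.mice import MICE, MICEData

# Hypothetical incomplete analysis dataset (names illustrative).
rng = np.random.default_rng(1)
n = 800
df = pd.DataFrame({
    "C": rng.normal(size=n),
    "X": rng.binomial(1, 0.5, size=n).astype(float),
    "Y": rng.normal(size=n),
})
df.loc[rng.random(n) < 0.2, "C"] = np.nan   # missingness in the confounder

# Details a reader needs in order to reproduce the analysis:
# - imputation method: chained equations with predictive mean matching
#   (the statsmodels MICEData default)
# - variables in the imputation model: all analysis variables (C, X, Y)
# - 10 burn-in cycles and 20 imputations
imp = MICEData(df)
analysis = MICE("Y ~ X + C", sm.OLS, imp)
results = analysis.fit(n_burnin=10, n_imputations=20)
print(results.summary())   # Rubin's-rules pooled estimates
```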
A strength of this review is that it documents current practice in the use of MI for estimating ACEs from incomplete observational data. Our review followed a clear, pre-specified protocol,(4) and, by including articles from top general epidemiology journals, it reflects current best practice. Furthermore, the analysis conducted for the current study is fully reproducible, as all data and code are available on GitHub: github.com/rheanna-mainzer/MI-scoping-review. This review has several limitations. Authors may have chosen not to provide details on all aspects of handling missing data that we examined, for example, due to strict journal word limits; however, all accompanying supplementary material was also reviewed and used for data extraction. Most of the data extraction was performed by a single reviewer (RM), with double data extraction performed for 10% of studies, so there may be some extraction errors. It may also have been useful to extract additional items, or to extract items in more detail, to better capture the variety of analyses undertaken; however, additional notes on each paper were recorded and are available as part of the complete dataset on GitHub. Lastly, by limiting our search to five top general epidemiology journals, our results may not reflect papers published in other journals, although it seems unlikely that less highly regarded journals would exhibit higher standards in this area of practice.