Background
A complete case logistic regression gives a biased estimate of the exposure odds ratio only if the exposure and outcome interact multiplicatively in their effect on the probability of missingness. By contrast, linear regression with a continuous outcome is biased in more scenarios, including when only the outcome causes missingness. It is not clear whether a complete case logistic regression gives a biased estimate of the odds ratio when missingness depends on a continuous outcome that is dichotomised for the analysis, a common situation in epidemiology.
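The known result for a binary outcome can be written out for the simplest case, as a sketch assuming a single binary exposure X, a binary outcome D and a complete case indicator R (this notation is ours, not the paper's). By Bayes' theorem, the odds of D given X among complete cases equal the odds in the full data multiplied by a ratio of selection probabilities, so the complete case odds ratio is the true odds ratio times a selection factor. That factor equals 1 whenever the selection probability factorises into an exposure term and an outcome term. The last line shows why dichotomisation raises a question: with D defined by a cut-off c on a continuous Y, the selection probability given D is an average over Y that need not factorise even when the selection probability given Y does.

```latex
\[
\mathrm{OR}_{\mathrm{cc}}
  = \mathrm{OR} \times \frac{\pi_{11}\,\pi_{00}}{\pi_{10}\,\pi_{01}},
\qquad
\pi_{xd} = P(R = 1 \mid X = x, D = d).
\]
\[
\pi_{xd} = f(x)\,g(d)
\;\Longrightarrow\;
\frac{\pi_{11}\,\pi_{00}}{\pi_{10}\,\pi_{01}} = 1 .
\]
\[
D = \mathbb{1}(Y > c):
\qquad
\pi_{xd} = E\bigl[\, P(R = 1 \mid X = x, Y) \;\big|\; X = x,\ D = d \,\bigr].
\]
```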
Methods
We investigated this using a simulation study and data from the Avon Longitudinal Study of Parents and Children (ALSPAC), a UK birth cohort. We also examined whether any bias could be reduced by including a proxy for the binary outcome as an auxiliary variable in multiple imputation (MI).
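A simulation of this kind can be sketched in a few lines. The example below is illustrative only and does not reproduce the design used in the study: the function names, parameter values, and the selection model (an exposure term `p_x` multiplied by a logistic function of the continuous outcome, with `a_xy` adding an exposure-outcome interaction) are all assumptions made for the sketch. With a single binary exposure, the odds ratio from the 2x2 table equals the one a logistic regression would estimate, so no model fitting is needed.

```python
import math
import random


def odds_ratio(pairs):
    """Exposure odds ratio from an iterable of (x, d) pairs, both coded 0/1."""
    n = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for x, d in pairs:
        n[(x, d)] += 1
    return (n[(1, 1)] * n[(0, 0)]) / (n[(1, 0)] * n[(0, 1)])


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n=100_000, beta=0.5, cutoff=0.5, p_x=(0.9, 0.6),
             a_y=-1.0, a_xy=0.0, seed=1):
    """Return (full-data OR, complete case OR, proportion of complete cases).

    Hypothetical data-generating model (an assumption, not the paper's):
      x ~ Bernoulli(0.5), y = beta * x + N(0, 1), d = 1 if y > cutoff.
    Hypothetical selection model:
      P(complete | x, y) = p_x[x] * expit(a_y * y + a_xy * x * y),
    which is a product of an exposure term and an outcome term when a_xy = 0.
    """
    rng = random.Random(seed)
    full, complete = [], []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        y = beta * x + rng.gauss(0.0, 1.0)
        d = 1 if y > cutoff else 0  # dichotomise the continuous outcome
        full.append((x, d))
        p_complete = p_x[x] * expit(a_y * y + a_xy * x * y)
        if rng.random() < p_complete:
            complete.append((x, d))
    return odds_ratio(full), odds_ratio(complete), len(complete) / n


if __name__ == "__main__":
    # Compare selection without (a_xy = 0) and with (a_xy = 1) an interaction.
    for a_xy in (0.0, 1.0):
        or_full, or_cc, frac = simulate(a_xy=a_xy)
        print(f"a_xy={a_xy}: full OR={or_full:.2f}, "
              f"complete case OR={or_cc:.2f}, complete={frac:.0%}")
```

Comparing the full-data and complete case odds ratios across values of `a_xy` and across selection strengths gives a rough sense of when bias appears; a single run per setting, as here, ignores Monte Carlo error, which a proper simulation study would quantify over many replicates.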
Results
There was negligible bias in the exposure odds ratio when the probability of being a complete case was independently associated with the exposure and the (continuous) outcome, but important bias when the exposure and outcome interacted in their effect on this probability, particularly at high levels of missing data. Including the proxy as an auxiliary variable led to significant bias reductions when the proxy had high sensitivity and specificity in relation to the study outcome.
Conclusions
The robustness of logistic regression to missing data is maintained even when the outcome is a dichotomised version of a continuous outcome. Bias due to an interaction between the exposure and outcome in their effect on selection could be reduced by including proxies for the missing outcome as auxiliary variables in multiple imputation (MI). If such proxies are available, we would recommend using MI over a complete case analysis because, in practice, it would be difficult to rule out such an interaction.