2.2 Outcome
The two outcome variables considered in this study were the survival outcome, i.e., the time from attendance date to death, measured in months, and the observed CD4 count, measured in cells/mm³ of blood. The covariates considered were WHO clinical stage of disease, functional status (ambulatory, bedridden or working), weight, smoking status, alcohol consumption status, drug use status and marital status. Further details about the data can be found in Temesgen et al. [7].
2.3 Statistical analysis
Since HIV survival is known to depend on disease progression, as measured by markers such as CD4 count [3, 7, 11], the statistical method used to relate the two outcome variables must be chosen carefully. A key characteristic of HIV disease progression is its dynamic nature: the rate of progression not only differs from patient to patient, but also changes over time within the same patient. Thus, the full potential of the CD4 count biomarker in describing disease progression and its association with survival can only be exploited when repeated measurements of CD4 count are included in the analysis. The structure of the dependence between the two outcomes is not fully known, and may vary between populations and even over time within a single patient. To address research questions involving the characterisation of association structures between repeated measures and event times, a class of statistical models known as joint models for longitudinal and time-to-event data has been developed [2, 12–14]. Briefly, a mixed effects model is proposed for the longitudinal biomarker observations, of the general form $y_i(t) = m_i(t) + \varepsilon_i(t)$, where the mean $m_i(t)$ is the (linear) predictor comprising both fixed and random effects, and $\varepsilon_i(t)$ is a normally distributed error term. Simultaneously, a standard survival model is posited for the time-to-event data, regressed both on covariates and on the mean $m_i(t)$ of the longitudinal process. Different forms of the regression on $m_i(t)$ are possible, for example regressing only on the current value of the mean, regressing on both the current value and its rate of change, or regressing instead on the random effects included in $m_i(t)$. Given the random effects in $m_i(t)$, the longitudinal and survival processes are assumed independent of each other, as are the repeated longitudinal responses of each individual.
The random effects therefore account both for the association between the longitudinal and the survival outcomes and the correlation between the repeated measurements in the longitudinal process.
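The conditional independence assumptions described above can be written compactly as follows. This is a sketch in standard joint-modelling notation rather than the notation of Appendix I: the event indicator $\delta_i$, the parameter vector $\theta$, the observation times $t_{il}$ and the number of measurements $n_i$ are symbols assumed here for illustration.

```latex
% Given the random effects b_i, the survival and longitudinal outcomes are independent
p(T_i, \delta_i, y_i \mid b_i; \theta)
  = p(T_i, \delta_i \mid b_i; \theta)\, p(y_i \mid b_i; \theta),
% and so are the repeated longitudinal measurements of subject i
\qquad
p(y_i \mid b_i; \theta)
  = \prod_{l=1}^{n_i} p\{ y_i(t_{il}) \mid b_i; \theta \}.
```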
Extensions of joint models such as dynamic predictions and accuracy measures have also been implemented [10, 15, 16]. Dynamic prediction is a method for updating predictions ahead in time of both the longitudinal and survival processes, whenever a new measurement of the longitudinal biomarker is taken. Here three joint models of the evolution of the CD4 count process and HIV survival are fitted to the data in a Bayesian framework, each with a different association structure. Dynamic predictions are derived from each of the three models and are combined using Bayesian model averaging [10, 17]. The Bayesian approach to joint modelling [18, 19] was implemented using the JMbayes package in R version 1.2.5033. Full details on the formulation of these joint models and dynamic prediction can be found in Appendix I, but are briefly summarised below.
To capture the non-linear subject-specific evolution of CD4 counts (Figs. 1 and 2), a flexible specification of the subject-specific square-root CD4 trajectories was adopted, using natural cubic splines of time. Specifically, the linear mixed model took the form:
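$$y_i(t) = m_i(t) + \varepsilon_i(t) = \beta_0 + \beta_1 \mathrm{FNS}_i + \beta_2 \mathrm{Alcohol}_i + \beta_3 \mathrm{MS}_i + \beta_4 \mathrm{CS}_i + \beta_5 \mathrm{wt}_i + \beta_6 B_1(t) + \beta_7 B_2(t) + \beta_8 B_3(t) + b_{i0} + b_{i1} B_1(t) + b_{i2} B_2(t) + b_{i3} B_3(t) + \varepsilon_i(t)$$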
where $B_n(t)$, $n = 1, 2, 3$, denotes the B-spline basis functions of a natural cubic spline. FNS, Alcohol, MS, CS and wt are the variables functional status, alcohol intake, marital status, clinical stage and weight, respectively. The $\beta$'s are fixed effects, whereas the $b$'s are random effects. For the survival process we consider three relative risk models, each positing a different association structure between the two processes, namely:
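$$M_1:\; h_1(t) = h_0(t) \exp\{\gamma_1 \mathrm{FNS}_i + \gamma_2 \mathrm{Alcohol}_i + \gamma_3 \mathrm{MS}_i + \gamma_4 \mathrm{CS}_i + \gamma_5 \mathrm{wt}_i + \alpha_1 m_i(t)\},$$
$$M_2:\; h_2(t) = h_0(t) \exp\{\gamma_1 \mathrm{FNS}_i + \gamma_2 \mathrm{Alcohol}_i + \gamma_3 \mathrm{MS}_i + \gamma_4 \mathrm{CS}_i + \gamma_5 \mathrm{wt}_i + \alpha_1 m_i(t) + \alpha_2 m_i'(t)\},$$
$$M_3:\; h_3(t) = h_0(t) \exp\{\gamma_1 \mathrm{FNS}_i + \gamma_2 \mathrm{Alcohol}_i + \gamma_3 \mathrm{MS}_i + \gamma_4 \mathrm{CS}_i + \gamma_5 \mathrm{wt}_i + \alpha_1 b_{i0} + \alpha_2 b_{i1} + \alpha_3 b_{i2} + \alpha_4 b_{i3}\},$$
where the baseline hazard $h_0(t)$ is approximated with splines (see Eq. (3) of Appendix I), $m_i(t)$ is the current true value of the CD4 count trajectory, $m_i'(t)$ is the slope of the trajectory at time $t$ (the rate of change in CD4 count), the $\gamma$'s are regression parameters of the survival model, and the $\alpha$'s are parameters describing the strength of the association between the CD4 count and survival processes.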
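To make the three association structures concrete, the following minimal Python sketch evaluates the subject-specific trajectory, its slope and the log relative risk under each structure. It is an illustration under stated assumptions, not the JMbayes implementation used in the study: the natural cubic spline is built with a truncated-power basis (which spans the same functions as a B-spline basis but has different columns), the slope is approximated by a finite difference, and all function names, knot positions and numerical values are hypothetical.

```python
import numpy as np


def natural_spline_basis(t, knots):
    """Natural cubic spline basis without an intercept column.

    Truncated-power construction: K knots give K - 1 columns (t itself plus
    K - 2 curved terms), each linear beyond the boundary knots.
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    knots = np.asarray(knots, dtype=float)
    last = knots[-1]

    def d(k):
        cubic = np.maximum(t - knots[k], 0.0) ** 3 - np.maximum(t - last, 0.0) ** 3
        return cubic / (last - knots[k])

    n_knots = len(knots)
    columns = [t] + [d(k) - d(n_knots - 2) for k in range(n_knots - 2)]
    return np.column_stack(columns)


def trajectory(t, fixed_part, beta_spline, b, knots):
    """m_i(t): baseline fixed part plus spline terms with fixed and random coefficients."""
    beta_spline = np.asarray(beta_spline, dtype=float)
    b = np.asarray(b, dtype=float)
    basis = natural_spline_basis(t, knots)
    return fixed_part + b[0] + basis @ (beta_spline + b[1:])


def slope(t, fixed_part, beta_spline, b, knots, eps=1e-4):
    """m_i'(t), approximated by a central finite difference."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    upper = trajectory(t + eps, fixed_part, beta_spline, b, knots)
    lower = trajectory(t - eps, fixed_part, beta_spline, b, knots)
    return (upper - lower) / (2.0 * eps)


def log_relative_risk(model, t, w_gamma, alpha, fixed_part, beta_spline, b, knots):
    """Log of h(t) / h0(t) under the three association structures."""
    m = trajectory(t, fixed_part, beta_spline, b, knots)
    if model == "M1":  # current value of the trajectory
        return w_gamma + alpha[0] * m
    if model == "M2":  # current value and slope
        return w_gamma + alpha[0] * m + alpha[1] * slope(t, fixed_part, beta_spline, b, knots)
    if model == "M3":  # shared random effects: constant in time
        return w_gamma + np.dot(alpha, b) + np.zeros_like(m)
    raise ValueError(f"unknown model: {model}")


# Illustrative values only (not estimates from the study).
knots = [0.0, 6.0, 12.0, 24.0]            # months; 4 knots give B1, B2, B3
beta_spline = np.array([0.30, -0.02, 0.01])
b = np.array([1.0, 0.05, -0.01, 0.02])    # b_i0, ..., b_i3
times = np.array([3.0, 9.0, 18.0])
for model, alpha in [("M1", [-0.2]), ("M2", [-0.2, -0.5]), ("M3", [-0.1, -0.3, 0.2, 0.1])]:
    print(model, log_relative_risk(model, times, 0.4, np.array(alpha), 15.0, beta_spline, b, knots))
```

Here `fixed_part` stands for the intercept plus the baseline covariate terms of the longitudinal model, and `w_gamma` for the covariate part of the survival model. Under M3 the log relative risk does not change with time, whereas under M1 and M2 it follows the subject's trajectory.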
2.4 Individualized dynamic predictions
Based on each joint model, predictions of survival probabilities and future CD4 counts are required for a new individual $j$ who has a set of longitudinal square-root CD4 counts $Y_j(t) = \{y_j(t_{jl});\ 0 \le t_{jl} \le t,\ l = 1, \ldots, n_j\}$ and a vector of baseline covariates $w_j$. For any time $u > t$, interest lies in predicting both the conditional probability $\pi_j(u \mid t)$ that subject $j$ survives at least up to $u$ and his/her predicted CD4 count at $u$. At each later time of interest $t'$ (e.g. a clinic visit time), with $t < t' < u$, these predictions are dynamically updated as extra information is recorded for the patient. That is, the prediction $\omega_j(u \mid t)$ of the square-root CD4 count $y_j(u)$, which is based on the information available up to time $t$, can be updated at time $t'$ to produce a new prediction $\omega_j(u \mid t')$ that uses the additional longitudinal information up to the later time point $t'$. Under the Bayesian joint modelling framework, both predictions $\pi_j(u \mid t)$ and $\omega_j(u \mid t)$ are based on the posterior predictive distribution, as given in Appendix I.
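As a rough illustration of how such a conditional survival probability can be approximated, the sketch below averages the ratio S(u)/S(t) over a set of posterior draws. It is a simplified Monte Carlo sketch, not the procedure of Appendix I: it assumes that the subject's hazard function has already been evaluated for each posterior draw of the parameters and random effects (how those draws are obtained is not shown), and the function names, the constant baseline hazard and all numerical values are hypothetical.

```python
import numpy as np


def cumulative_hazard(hazard, lower, upper, n_grid=201):
    """Integral of hazard(s) over [lower, upper] by the trapezoidal rule."""
    grid = np.linspace(lower, upper, n_grid)
    values = np.asarray(hazard(grid), dtype=float)
    return float(np.sum((values[1:] + values[:-1]) * np.diff(grid)) / 2.0)


def conditional_survival(hazard_draws, t, u):
    """Monte Carlo estimate of pi_j(u | t).

    Each element of hazard_draws is the subject's hazard function under one
    posterior draw. For a single draw,
    S(u) / S(t) = exp(-integral of the hazard from t to u).
    """
    ratios = [np.exp(-cumulative_hazard(h, t, u)) for h in hazard_draws]
    return float(np.mean(ratios))


def make_hazard(intercept, slope_coef, alpha=-0.3):
    # Hypothetical draw: constant baseline hazard of 0.02 per month and a
    # linear square-root CD4 trajectory with a current-value association.
    return lambda s: 0.02 * np.exp(alpha * (intercept + slope_coef * s))


rng = np.random.default_rng(1)

# Draws based on the CD4 history up to t = 12 months (illustrative values).
draws_at_t = [make_hazard(a, 0.05) for a in rng.normal(15.0, 2.0, size=500)]
print(conditional_survival(draws_at_t, t=12.0, u=24.0))

# After a further measurement at t' = 18 months the draws are refreshed,
# here mimicked by a shifted and narrower set of intercepts.
draws_at_t_prime = [make_hazard(a, 0.05) for a in rng.normal(16.0, 1.0, size=500)]
print(conditional_survival(draws_at_t_prime, t=18.0, u=24.0))
```

The second call mirrors the dynamic update: the prediction horizon u stays fixed while the conditioning time moves forward and the draws reflect the extra longitudinal information.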
Standard model selection techniques for choosing between the three association structures may prefer different models depending on which model selection criterion is used [20, 21], particularly in contexts where different association structures may produce better predictions for different individuals at different time points. Bayesian model averaging (BMA) [10, 16] explicitly takes model uncertainty into account by applying Bayesian inference to model selection. Each model is given a prior weight, here assuming that each association structure is equally likely, and the resulting posterior model weights are used to average over the estimates. Here, following [10], instead of averaging estimates over the association structures, the predictions $\pi_j(u \mid t)$ and $\omega_j(u \mid t)$ are averaged over the different association structures. This BMA approach can produce less risky predictions via a straightforward model choice criterion [22].
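The averaging step can be sketched in a few lines of Python. By Bayes' rule the posterior weight of model $M_k$ is proportional to its marginal likelihood times its prior weight, and the averaged prediction is the weighted sum of the model-specific predictions. The sketch assumes that a log marginal likelihood is available for each model; the function names and all numerical values are hypothetical, and in the approach of [10] the weights may additionally depend on the subject and on the prediction time, which is not shown here.

```python
import numpy as np


def posterior_model_weights(log_marginal_lik, prior=None):
    """Posterior model probabilities from log marginal likelihoods.

    With prior=None every model receives the same prior weight.
    """
    log_ml = np.asarray(log_marginal_lik, dtype=float)
    if prior is None:
        prior = np.full(log_ml.shape, 1.0 / log_ml.size)
    log_post = log_ml + np.log(np.asarray(prior, dtype=float))
    log_post = log_post - log_post.max()   # guard against underflow
    weights = np.exp(log_post)
    return weights / weights.sum()


def bma_prediction(predictions, weights):
    """Weighted average of the model-specific predictions."""
    return float(np.dot(weights, predictions))


# Illustrative values only: log marginal likelihoods and survival
# predictions pi_j(u | t) from M1, M2 and M3 for one subject.
log_ml = [-1520.4, -1519.1, -1523.8]
pi_by_model = [0.82, 0.78, 0.88]

weights = posterior_model_weights(log_ml)
print(weights, bma_prediction(pi_by_model, weights))
```

The same weights can be applied to the CD4 predictions $\omega_j(u \mid t)$ in place of the survival probabilities.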