Design and setting of the study
The Surveillance, Epidemiology and End Results (SEER) Program Database was accessed and information on all osteosarcoma cases with complete data on status and survival time from 1970s to 2000s was captured. Cases who died or censored at time 0 were excluded from the analysis because of different processing strategies in general statistical softwares and potential risk of incorrect explanation. .
Covariates and outcome measures
Patients’ demographic covariates included age, sex, the year of diagnosis and race. According to age, the patients could be categorized into >60years and ≤60 years while in terms of race, they were into white/black and others. Data on tumor included tumor size, the extension of disease, and American Joint Committee on Cancer for staging (AJCC), all of which contained a type of ‘UNKNOW’. Tumor size was categorized into >100mm, <100nm and ‘UNKNOW’. Moreover, data on whether surgery, chemotherapy and/or radiation were performed were recorded, all of which were classified into “yes” and ‘no”. Patients’ outcome variables of interest referred to overall survival, including dead for all causes or alive and survival time in months.
Flowchart for statistical methods
A flowchart was designed to integrate conventional survival analyses and RMST-based analyses on right-censored data (figure 1) and it was divided into two steps: graphic description and group comparison, univariable and multivariable regressions.
Graphic description should be the first step to explore survival curves for both single group or multiple groups. Group comparison is suitable for a categorical covariate. Comparing distributions of two or more survival rates is performed by Log-rank test or Wilcoxon test. PH assumption is required for log-rank test, but not for Wilcoxon test[10]. Meanwhile, difference in areas under survival curve between 2 or more groups can be estimated with pre-specific truncation time,τ(RMST).
In univariable and multivariable regressions, we consider PH assumption as the first key step to select suitable regression. Compared with Cox regression, parametric survival regression required to test and pre-specify probability distribution of survival times, such as Weibull and Exponential distribution, meanwhile, time-dependent covariates (covariates changing over follow-up) and stratified analysis controlling for nuisance covariates, cannot be incorporated easily. If PH assumption was satisfied, Cox regression would be a preferred and robust method because of its attractive features. If not, two modified Cox regression (Cox regression with time-dependent covariates and stratified Cox regression) and parametric survival regression can be performed instead of standard Cox regression. In addition, RMST-based regression can be an alternative and promising method to deal with non-proportional hazard data, pseudo-value regression (PV) and inverse probability of censoring weighting regression (IPCW). Both of them assumes that censoring distribution is non-informative, where distribution of censoring time does not provide information for distribution of survival time.
In IPCW regression, censoring weights required to be estimated initially, which suggests censoring distribution must be correctly estimated [7]. IPCW method used Kaplan-Meier method to estimate the censoring distribution (modeling censoring mechanism) and conditional probability of having remained uncensored until time t is calculated as weight for each participant. Then, inverse of the estimated weights are included into iterative estimation of coefficients of IPCW regression. Use of Kaplan-Meier method assumes that all subjects share the same censoring distribution, namely homogeneity of censoring mechanism assumption. When a categorical covariate exists and censoring distribution in each group are not the same, it is appropriate to use group-specific weights where censoring mechanism keep homogenous in each group and weights were estimated using Kaplan-Meier technique for each group separately. Otherwise, biased weights will be used in coefficient estimation, resulting in incorrect conclusion. There are two methods in IPCW regression, IPCW regression with individual weights and IPCW regression with group-specified weights. If only one categorical variable is of interest and homogeneity of censoring mechanism is questionable, it would be more appropriate to adopt IPCW regression with group-specified weights.
Pseudo-values technique has been developed for direct regression modeling survival function, RSMT and competing risks based on right censored data[11]. In PV regression on RMST, jackknife leave-one-out estimation is employed to generate pseudo-values. Firstly, the area under the Kaplan-Meier curve up to time τ is estimated as θ ̂ on the complete data with n sample size, then remove ith subject and repeat the above estimation to get leave-one-out estimator as (θ^(-i) ) ̂. Finally, the ith pseudo-values of (θ_i ) ̂ can be calculated based on the difference between the complete sample and the leave-one-out estimator: (θ_i ) ̂=nθ ̂-(n-1)(θ^(-i) ) ̂. After calculating the pseudo-value for each subject, generalized estimating equation is used to model the effects of covariates on the RMST [8]. Since no weights are estimated from censoring distribution, PV method is not restricted by the homogeneity of censoring mechanism assumption.
PH assumption and homogeneity of censoring mechanism need to check before selecting suitable methods. There are three different methods to evaluate PH assumption: 1) graphical assessment (KM curves and ln (-ln(S(t))) vs ln(t) Curves); 2) add a time interaction, x*log(t), into COX regression and test its significance; 3) check global good of fitness by plotting and testing association between ranked survival time and Schoenfeld residuals[12, 13]. violations of the PH assumption suggests existence of interactions between one or more covariates and time. Homogeneity of censoring mechanism was evaluated applying Kaplan-Meier curves and log-rank test. Note that the event of interest was coded as censoring while others as event to draw Kaplan-Meier plot and conduct log-rank test. The curves showing substantial separation or small P value from log-rank test supported the violation of the homogeneity of censoring mechanism, which suggested censoring patterns of groups were different.
Statistical analysis
The characteristics of the participants were summarized for descriptive statistics. Categorical variables were described using frequencies and percentages, while continuous ones as means with SD or as medians with interquartile ranges when data were skewed. Statistical inference, including conventional survival analyses and RMST-based analyses, complied with the flowchart based on statistical knowledge (figure 1).
Graphics assessment was selected because large sample size may make hypothesis tests in aforementioned method 2 and method 3 be sensitive to little deviation from PH assumption. This assumption would be met when curves in Kaplan-Meier plot did not cross and curves in the log (-log (survival)) versus log of survival time graph paralleled with each other approximately.
For comparing results among these methods, all methods were conducted ignoring the two assumptions in group comparison and regressions. Theτwas specified utilizing the smallest value of the largest follow-up time across groups minus 5 months. Multiple comparison was employed in group comparison if there were more than 2 groups. Both univariable and multivariable regressions were conducted for Cox regressions and RMST-based regressions.τspecified in univariable regression was same as that obtained in group comparison. In multivariable analyses,τwas fixed at 480 months (24years).
To explore the relationship between HR and difference in RMST as well as the influence of PH assumption on such relationship, we performed data visualization and model fitting. Since HR was expressed as ratio while coefficients based on RMST as difference, a log transformed non-linear regression was considered when fitting the relationship between HR and difference in RMST (difference in RMST ~ loge(HR)).
All analyses were carried out using the SAS software for windows, version 9.4 TS1M6 (SAS Institute Inc, Cary, NC) with a 2-sided significance threshold of P < .05. Graphic description and group comparison were carried out by LIFETEST procedure, and regressions were realized by PHREG procedure, and RMSTREG procedure. Data visualization were conducted in Microsoft Excel 2016.