LNM status is crucial for planning the treatment of early-stage NSCLC. To date, lobectomy plus systematic lymph node dissection (LND) remains the standard management for achieving a low recurrence rate and prolonged survival [3, 25]. However, compared with selective LND or lymph node sampling, systematic LND may be more likely to cause postoperative complications [26, 27]. Alternatively, sublobar resection, including segmentectomy and wedge resection, has been recommended for early-stage NSCLC patients; it has shown survival outcomes similar to those of lobectomy [28, 29] and can preserve more lung function. However, sublevel surgery, such as selective LND and sublobar resection, is more likely to leave residual tumor and thus lead to a poor prognosis if LNM is present. Occult LNM (OLNM) complicates the situation further. The incidence of OLNM has been estimated at 10.8–17.2% among patients with stage I lung cancer [30–32]. Patients with LNM might therefore mistakenly undergo sublevel surgery, leading to a poor prognosis, and salvage management might be necessary for them. More effort should therefore be devoted to accurately predicting LNM status during or after the operation.
Previous studies have identified several possible predictive factors for LNM in NSCLC. Yu et al. reported several independent risk factors, including tumor size, pleural invasion, and carcinoembryonic antigen (CEA) [33]. Pani et al. found that histologic subtype could be related to lymph node status [34]. Another similar study suggested different lymph node dissection strategies for different combinations of clinicopathological features, CEA concentration, and albumin level [35]. These studies used univariate and multivariate analyses to reveal clinicopathological predictors of different LNM patterns. In contrast, our study adopts ML algorithms to predict LNM by incorporating a large set of clinicopathological features. Among the predictive models, RFC, GBDT, XGB, and ANN all achieved AUCs higher than 0.9, similar to that of the LR model. In the decision curve, however, LR performed better than the other models at thresholds < 0.28, whereas RFC performed best at most thresholds ≥ 0.28 and maintained a stably high net benefit. Notably, all models performed clearly better than the treat-all and treat-none lines, indicating that our models have clinical value and that patients could benefit more if management were guided by the predictions of these models.
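As an illustration of what the AUC comparison above measures, the following minimal sketch computes the AUC as the probability that a randomly chosen node-positive patient receives a higher predicted risk than a randomly chosen node-negative patient. It is not the code used in this study; the function name `roc_auc` and the toy labels and scores are assumptions made purely for illustration.

```python
def roc_auc(y_true, y_score):
    """AUC via pairwise comparison (equivalent to the Mann-Whitney U statistic).

    y_true  : list of 0/1 labels (1 = LNM present); hypothetical toy input
    y_score : list of predicted risks from any model
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0      # positive case ranked above negative case
            elif p == q:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data, not taken from the study cohort.
toy_labels = [1, 1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.2, 0.1]
print(round(roc_auc(toy_labels, toy_scores), 3))
```

Because the AUC depends only on how cases are ranked, two models with similar AUCs can still differ in net benefit at a given decision threshold.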
Furthermore, based on the four models that performed well in both the ROC and decision curve analyses, we identified the top ten variables: solid component, CEA, pleural invasion, tumor imaging density, LVI, micropapillary component, histological subtype, acinar component, lepidic component, and gender. In addition to CEA and imaging density, which have been reported by previous studies [4, 5], many histological features were also strongly related to the occurrence of LNM. Besides pleural invasion and LVI, histological growth patterns such as the presence of solid, micropapillary, and acinar components indicated a high risk of LNM, whereas the presence of a lepidic component could indicate LNM-free disease. However, these variables are not conventionally included in intraoperative pathology reports. Our study emphasizes the importance of these histological features in predicting lymph node status. Intraoperative pathology may therefore be extended to include more detailed information about adenocarcinomas to further evaluate LNM risk, especially for patients in whom the choice between lobectomy and sublobar resection is difficult. Importantly, evaluating the risk of LNM after surgery might also be necessary for early-stage adenocarcinoma patients. For those who received sublobar resection or sublevel LND, salvage management and close follow-up could be required if our ML model indicates a high risk of LNM.
In recent years, predicting metastasis with machine learning algorithms, as a promising alternative to other invasive or noninvasive diagnostic methods, has been shown to be feasible in lung adenocarcinoma and colorectal cancer [11, 12]. These studies based their predictions on CT images and histologic evidence and obtained satisfactory results. However, considering that the sample sizes of the two studies were not large, the validity of machine learning prediction needs to be confirmed in a larger NSCLC patient population. Another methodological issue is that the false-positive and false-negative rates need to be low enough to achieve good clinical utility. A high AUC in ROC analysis represents high predictive accuracy but does not necessarily prove good clinical utility, because false-positive or false-negative results could reduce net benefit [36]. To find a model with both high predictive accuracy and high net benefit, we adopted decision curve analysis (DCA), which has been widely shown to be efficient and interpretable in the evaluation of clinical utility [37]. The decision curve clearly showed that RFC had the highest net benefit across the longest stable range of clinically reasonable preferences.
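To make the notion of net benefit concrete, the sketch below applies the standard DCA definition: at threshold probability p_t, net benefit = TP/n − (FP/n) × p_t/(1 − p_t), with the treat-none strategy fixed at zero. It is a minimal illustration under assumed inputs, not the analysis code of this study; the function names and the toy labels and risks are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data, not taken from the study cohort.
toy_labels = [1, 1, 0, 0, 0]
toy_risks = [0.9, 0.6, 0.4, 0.2, 0.1]

for pt in (0.1, 0.3, 0.5):
    model_nb = net_benefit(toy_labels, toy_risks, pt)
    all_nb = net_benefit_treat_all(toy_labels, pt)
    # Treat-none always has a net benefit of 0.
    print(f"threshold={pt:.2f} model={model_nb:.3f} treat_all={all_nb:.3f} treat_none=0.000")
```

Plotting these values over a range of thresholds yields the decision curve; a model is useful at a given threshold only where its curve lies above both the treat-all and treat-none lines.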
To further enhance the clinical usefulness of our study, we developed a dynamic application of the RFC model that takes 5 clinicopathological variables as input, so that clinicians and patients worldwide can benefit from our study and easily evaluate LNM risk. The RFC application identified node-positive patients with a sensitivity of 87.5% and a specificity of 82.2% (Fig. 4).
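For readers less familiar with these two metrics, the sketch below shows how sensitivity and specificity are derived from binary predictions. The function name and the toy labels are assumptions for illustration only; they do not reproduce the data or the figures reported above.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels and predictions.

    y_true : list of 0/1 labels (1 = node-positive)
    y_pred : list of 0/1 predicted classes
    """
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # share of node-positive patients detected
    specificity = tn / (tn + fp)  # share of node-negative patients cleared
    return sensitivity, specificity


# Toy data, not taken from the study cohort.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_preds = [1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(toy_labels, toy_preds)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```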
This study has limitations. Its retrospective design inevitably introduces data acquisition bias. Additionally, the enrolled patients came from a single center and share the same ethnicity. Future studies are expected to validate the predictive performance of the RFC model, and to evaluate additional clinicopathological variables, in a multicenter population.