As described above, a large number of image features can be computed in a high-throughput manner; however, not all of them are useful for a given task. The variance threshold method first retained 461 features. From these, the select-K-best method selected 299 features, and the LASSO algorithm finally selected 63 optimal features (Fig. 2). Of the 63 radiomics features, 39 were texture features, 2 were shape features, and 22 were first-order statistical features. All patients were included in the model. The ROC curves for differentiating NTM pulmonary disease from PTB in the training and validation cohorts are shown in Fig. 3-4.
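The three-stage selection described above (variance threshold, then select-K-best, then LASSO) can be sketched in plain Python as follows. This is a minimal illustration under assumptions the text does not state: the univariate score for the K-best step is taken to be the two-class ANOVA F statistic, LASSO is fitted as a least-squares regression on 0/1 labels by coordinate descent on standardized features, and the threshold, `k`, and `alpha` values are hypothetical. In practice such a pipeline would usually be built with a library such as scikit-learn (`VarianceThreshold`, `SelectKBest`, `LassoCV`), with `alpha` tuned by cross-validation.

```python
import statistics


def variance_threshold(X, threshold=0.0):
    """Indices of columns whose population variance exceeds `threshold`."""
    return [j for j in range(len(X[0]))
            if statistics.pvariance([row[j] for row in X]) > threshold]


def f_score(col, y):
    """Two-class ANOVA F statistic of one feature (labels coded 0/1)."""
    groups = [[v for v, t in zip(col, y) if t == c] for c in (0, 1)]
    grand = statistics.fmean(col)
    means = [statistics.fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        return float("inf")
    # df_between = 1 for two classes; df_within = n - 2
    return ss_between / (ss_within / (len(col) - 2))


def select_k_best(X, y, candidates, k):
    """Keep the k candidate columns with the largest F statistic."""
    scores = {j: f_score([row[j] for row in X], y) for j in candidates}
    return sorted(candidates, key=lambda j: scores[j], reverse=True)[:k]


def standardize(X, cols):
    """Z-score the chosen columns (population SD, so mean(z**2) == 1)."""
    params = []
    for j in cols:
        col = [row[j] for row in X]
        params.append((statistics.fmean(col), statistics.pstdev(col)))
    return [[(row[j] - mu) / sd for j, (mu, sd) in zip(cols, params)] for row in X]


def soft_threshold(rho, alpha):
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_cd(Z, y, alpha, n_iter=200):
    """Coordinate descent for (1/2n)||y - mean(y) - Z b||^2 + alpha*||b||_1."""
    n, p = len(Z), len(Z[0])
    y_mean = statistics.fmean(y)
    r = [t - y_mean for t in y]  # residual while all coefficients are zero
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # correlation of feature j with the partial residual (its own term added back)
            rho = sum(Z[i][j] * (r[i] + Z[i][j] * beta[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, alpha)
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= Z[i][j] * delta
                beta[j] = new_b
    return beta


def select_features(X, y, k, alpha, var_threshold=0.0):
    stage1 = variance_threshold(X, var_threshold)   # drop (near-)constant features
    stage2 = select_k_best(X, y, stage1, k)         # univariate filter
    beta = lasso_cd(standardize(X, stage2), y, alpha)
    return sorted(j for j, b in zip(stage2, beta) if b != 0.0)  # nonzero LASSO weights


# Toy data: column 0 is constant, column 2 separates the classes.
X = [[1.0, 5.0, 0.1, 3.0],
     [1.0, 4.8, 0.2, 3.1],
     [1.0, 5.1, 0.9, 2.9],
     [1.0, 4.9, 1.1, 3.0]]
y = [0, 0, 1, 1]
print(select_features(X, y, k=2, alpha=0.3))
```

Each stage only narrows the candidate set passed to the next one, which mirrors the 461 → 299 → 63 reduction reported above; the LASSO stage keeps only features whose coefficients remain nonzero under the L1 penalty.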
Tab. 2
ROC results of the KNN, SVM, and LR classifiers in the training cohort
| Classifier | Category | AUC | 95% CI | Sensitivity | Specificity |
| --- | --- | --- | --- | --- | --- |
| KNN | NTM | 0.98 | 0.95-1.00 | 0.90 | 0.94 |
| KNN | TB | 0.98 | 0.95-1.00 | 0.94 | 0.90 |
| SVM | NTM | 0.99 | 0.96-1.00 | 0.94 | 0.93 |
| SVM | TB | 0.99 | 0.96-1.00 | 0.93 | 0.94 |
| LR | NTM | 0.98 | 0.97-1.00 | 0.96 | 0.97 |
| LR | TB | 0.98 | 0.97-1.00 | 0.97 | 0.96 |
Tab. 3
ROC results of the three classifiers in the validation cohort
| Classifier | Category | AUC | 95% CI | Sensitivity | Specificity |
| --- | --- | --- | --- | --- | --- |
| KNN | NTM | 0.97 | 0.91-1.00 | 0.87 | 0.97 |
| KNN | TB | 0.97 | 0.91-1.00 | 0.97 | 0.87 |
| SVM | NTM | 0.96 | 0.88-1.00 | 0.80 | 0.97 |
| SVM | TB | 0.96 | 0.88-1.00 | 0.97 | 0.80 |
| LR | NTM | 0.95 | 0.86-1.00 | 0.88 | 0.87 |
| LR | TB | 0.95 | 0.86-1.00 | 0.87 | 0.88 |
The AUC, 95% CI, sensitivity, and specificity of the three classifiers in the training and validation cohorts are shown in Tab. 2 and 3. With NTM as the positive class, the AUC (95% confidence interval [CI]), sensitivity, and specificity of the KNN, SVM, and LR classifiers in the training cohort were 0.98 (0.95 to 1.00), 0.90, and 0.94; 0.99 (0.96 to 1.00), 0.94, and 0.93; and 0.98 (0.97 to 1.00), 0.96, and 0.97, respectively. In the validation cohort, the corresponding values were 0.97 (0.91 to 1.00), 0.87, and 0.97; 0.96 (0.88 to 1.00), 0.80, and 0.97; and 0.95 (0.86 to 1.00), 0.88, and 0.87, respectively. In both the training and the validation cohort, the AUC of every classifier exceeded 0.90.
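The metrics in Tab. 2 and 3 can be computed as sketched below. This is an illustration on toy data, not the authors' code: the AUC is obtained from the Mann-Whitney rank formulation, and the 95% CI from a percentile bootstrap, which is an assumption because the text does not state how the CIs were derived. The example also shows why each NTM row mirrors the TB row in the tables: in a two-class problem, treating TB instead of NTM as the positive class leaves the AUC unchanged and swaps sensitivity with specificity.

```python
import random


def auc_mann_whitney(scores, labels):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(pred, labels, positive):
    """Sensitivity and specificity with `positive` treated as the positive class."""
    tp = sum(t == positive and p == positive for p, t in zip(pred, labels))
    fn = sum(t == positive and p != positive for p, t in zip(pred, labels))
    tn = sum(t != positive and p != positive for p, t in zip(pred, labels))
    fp = sum(t != positive and p == positive for p, t in zip(pred, labels))
    return tp / (tp + fn), tn / (tn + fp)


def bootstrap_auc_ci(scores, labels, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the AUC (resampling cases with replacement)."""
    rng = random.Random(seed)
    n = len(scores)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        t = [labels[i] for i in idx]
        if 0 < sum(t) < n:  # a resample must contain both classes
            aucs.append(auc_mann_whitney([scores[i] for i in idx], t))
    aucs.sort()
    lo = aucs[round((1 - level) / 2 * n_boot)]
    hi = aucs[round((1 + level) / 2 * n_boot) - 1]
    return lo, hi


# Toy data: 1 = NTM, 0 = TB; scores are predicted NTM probabilities.
scores = [0.9, 0.8, 0.7, 0.6, 0.45, 0.5, 0.2, 0.1]
labels = [1, 1, 1, 1, 1, 0, 0, 0]
pred = [1 if s >= 0.5 else 0 for s in scores]

print(auc_mann_whitney(scores, labels))              # 14/15
print(sensitivity_specificity(pred, labels, 1))      # NTM as positive: (0.8, 2/3)
print(sensitivity_specificity(pred, labels, 0))      # TB as positive: (2/3, 0.8)
print(bootstrap_auc_ci(scores, labels))
```

Scoring TB with `1 - score` and flipping the labels gives the same AUC, which is why the two rows of each classifier share one AUC and one CI.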
Tab. 4
Precision, recall, F1-score, and support of the three classifiers in the training and validation cohorts
| Category | Metric | Training KNN | Training SVM | Training LR | Validation KNN | Validation SVM | Validation LR |
| --- | --- | --- | --- | --- | --- | --- | --- |
| TB | Precision | 0.90 | 0.94 | 0.96 | 0.96 | 0.96 | 0.87 |
| TB | Recall | 0.94 | 0.93 | 0.97 | 0.87 | 0.80 | 0.87 |
| TB | F1-score | 0.92 | 0.94 | 0.96 | 0.91 | 0.91 | 0.87 |
| TB | Support | 119 | 119 | 119 | 30 | 30 | 30 |
| NTM | Precision | 0.94 | 0.94 | 0.97 | 0.89 | 0.84 | 0.88 |
| NTM | Recall | 0.90 | 0.94 | 0.96 | 0.97 | 0.97 | 0.88 |
| NTM | F1-score | 0.92 | 0.94 | 0.96 | 0.93 | 0.90 | 0.88 |
| NTM | Support | 124 | 124 | 124 | 32 | 32 | 32 |
In the training cohort, the three models achieved a precision of at least 0.90, a recall of at least 0.90, and an F1-score of at least 0.92, with a support of 119 (TB) and 124 (NTM) (Tab. 4). In the validation cohort, the precision was at least 0.84, the recall at least 0.80, and the F1-score at least 0.87, with a support of 30 (TB) and 32 (NTM) (Tab. 4). Taken together, precision, recall, and F1-score indicated that KNN outperformed the other two classifiers in recognizing NTM pulmonary disease in the validation cohort. Detailed evaluation metrics are shown in Tab. 4, and the ROC curves of the three classifiers are shown in Fig. 3-4.
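Precision, recall, F1-score, and support in Tab. 4 follow directly from a confusion matrix. The paper does not report its confusion matrices; the counts below are hypothetical, chosen only because they reproduce the rounded KNN validation values in Tab. 4 (TB: 0.96, 0.87, 0.91, 30; NTM: 0.89, 0.97, 0.93, 32).

```python
def classification_report(cm, classes):
    """Per-class (precision, recall, F1, support) from cm[true][predicted] counts."""
    report = {}
    for c in classes:
        tp = cm[c][c]
        fn = sum(cm[c][o] for o in classes if o != c)  # true c, predicted otherwise
        fp = sum(cm[o][c] for o in classes if o != c)  # predicted c, truly otherwise
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        report[c] = (round(precision, 2), round(recall, 2), round(f1, 2), tp + fn)
    return report


# Hypothetical confusion matrix (rows = true class, columns = predicted class).
cm = {"TB": {"TB": 26, "NTM": 4},
      "NTM": {"TB": 1, "NTM": 31}}
for cls, values in classification_report(cm, ["TB", "NTM"]).items():
    print(cls, values)
```

Support is simply the number of true cases of each class, and in a two-class problem the recall of one class equals the sensitivity for that class and the specificity for the other.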
NTM refers to mycobacteria other than the Mycobacterium tuberculosis complex and Mycobacterium leprae; they are widely present in water and soil. To date, 191 species have been identified, but only a few cause disease [18,24-26]. NTM are opportunistic pathogens. Humans are usually infected from the environment, and water and soil are important sources of infection [7,25,26]. The incidence of NTM disease is increasing in some countries and regions [27-30]. The clinical symptoms and pathology of NTM infection are difficult to distinguish from those of PTB, and NTM frequently show intrinsic drug resistance. The diagnosis of NTM pulmonary disease and PTB depends on etiological testing, but the procedures are complex, which affects subsequent clinical treatment to some extent [18]. When NTM infection is misdiagnosed as tuberculosis, treatment with an anti-tuberculosis regimen delays appropriate therapy, prolongs the course of disease, worsens the prognosis, and may lead to treatment failure [31]. Simpler and more effective diagnostic methods are therefore urgently needed.
Conventional CT is one of the main examination methods for NTM pulmonary disease and PTB, but the CT manifestations of NTM pulmonary disease are complex, mostly consisting of consolidation, bronchiectasis, bronchogenic dissemination, and other signs that are difficult to distinguish from those of PTB [32,33]. Koh et al. [34] found that bronchiolitis is commonly seen on CT images of NTM lung disease, often involving more than five lobes and accompanied by consolidation and bronchiectasis. When CT shows bronchiectasis, mostly involving the right middle lobe and the lingular segment of the left upper lobe, combined with cavities and nodules, NTM pulmonary disease should be considered [35]. Some studies [36] found that consolidation was significantly less frequent in NTM pulmonary disease than in pulmonary tuberculosis (P < 0.05), whereas others [37] found that consolidation did not help differentiate the two diseases. The use of conventional CT for differentiation therefore remains controversial.
The characteristic lesion of tuberculosis is necrotizing granulomatous inflammation. Microscopically, tuberculous necrosis takes various forms, including basophilic necrosis, suppurative necrosis, and pink (eosinophilic) necrosis, which are rich in nuclear debris. Coccidioidomycosis and cryptococcosis can also cause similar necrotizing granulomas [38-41], which are also seen in NTM pulmonary disease. According to the literature [42], NTM pulmonary disease is more prone to suppurative necrosis than PTB, whereas pink necrosis and basophilic necrosis are more common in PTB than in NTM pulmonary disease. NTM pulmonary disease shows more giant or bizarre multinucleated giant cells than PTB [43]. Caseous necrosis in NTM lesions is less extensive than in PTB, and aggregates of epithelioid cells tend to form proliferative granulomas [15]. NTM pulmonary disease is prone to atypical lesions that show only histiocyte aggregation without granulomas, which is common in immunodeficient patients [44,45]. These pathological findings indicate that consolidation is less common in NTM pulmonary disease than in PTB, with fewer granulomas and more suppurative necrosis. Conventional CT images can hardly show such subtle pathological differences, which limits their value in the differential diagnosis from pulmonary tuberculosis.
In recent years, radiomics has developed rapidly. It converts medical images into high-dimensional data by extracting quantitative features in a high-throughput manner, and then analyzes these data to support decision-making. Radiomics has shown good sensitivity and predictive value in the screening of small pulmonary nodules and in the diagnosis, treatment, and prognosis of lung cancer [46-48], and it also shows good potential for some common inflammatory lesions [49,50]. Previous studies [16,24,51] have used deep learning to distinguish NTM pulmonary disease from PTB, and have also used radiomic features of cavities to distinguish the two diseases. Consolidation is another common CT feature of both NTM pulmonary disease and PTB. High-throughput radiomic features extracted from CT data can reflect the underlying differences in the pathological characteristics of the two diseases as fully as possible, consistent with their pathological manifestations, and can compensate for the limitations of visual CT reading and the loss of key information.
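To make the idea of quantitative, high-throughput features concrete, the sketch below computes a few first-order statistics and one gray-level co-occurrence matrix (GLCM) texture feature, contrast, from an ROI. It is a simplified illustration: it assumes a 2D ROI already discretized into integer gray levels, a single horizontal offset, and a symmetric GLCM. The text does not state which software or settings were used, and dedicated toolkits such as PyRadiomics define these features with additional options (for example bin width and voxel spacing).

```python
import math
from collections import Counter


def first_order_features(values, n_bins=4):
    """A few first-order statistics of the ROI intensity values."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    energy = sum(v * v for v in values)
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for a flat ROI
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "variance": variance, "energy": energy, "entropy": entropy}


def glcm_contrast(image, levels, dx=1, dy=0):
    """Contrast of a symmetric, normalized GLCM for one pixel offset (dx, dy)."""
    glcm = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                glcm[i][j] += 1
                glcm[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, glcm))
    return sum(glcm[i][j] / total * (i - j) ** 2
               for i in range(levels) for j in range(levels))


print(first_order_features([1, 1, 2, 2], n_bins=2))
print(glcm_contrast([[0, 0], [0, 0]], levels=2))   # uniform texture
print(glcm_contrast([[0, 1], [1, 0]], levels=2))   # checkerboard texture
```

A homogeneous region gives zero contrast while an alternating pattern gives a high value, which is the kind of sub-visual difference in internal structure that texture features are intended to capture.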
In this study, consolidations of NTM pulmonary disease and PTB were delineated as regions of interest (ROIs). A total of 63 radiomics features were selected from the ROIs, of which 39 were texture, 2 were shape, and 22 were first-order statistical features. Three supervised learning classifiers (KNN, SVM, and LR) were used to analyze the extracted consolidation features. In the training cohort, the AUC values of the three models were all at least 0.98, with 95% CIs ranging from 0.95 to 1.00, and the sensitivity and specificity were both at least 0.90. In the validation cohort, the AUC values of the three models were all at least 0.95, with 95% CIs ranging from 0.86 to 1.00, and the sensitivity and specificity were both at least 0.80. Thus, all AUC values were high, and sensitivity and specificity were at least 0.80 in both cohorts. The three classifiers were further compared using four evaluation metrics (precision, recall, F1-score, and support). In the training cohort, the precision of the three models was at least 0.90, the recall at least 0.90, and the F1-score at least 0.92, with a support of 119 (TB) and 124 (NTM). In the validation cohort, the precision was at least 0.84, the recall at least 0.80, and the F1-score at least 0.87, with a support of 30 (TB) and 32 (NTM). Furthermore, for NTM in the validation cohort, the KNN classifier had the highest precision (0.89) and F1-score (0.93), and its recall (0.97) equaled that of SVM. These results indicate that radiomics features derived from consolidations can help differentiate NTM pulmonary disease from PTB.
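The KNN classifier assigns each case the majority label of its k nearest training cases in feature space. The sketch below illustrates this with z-score scaling fitted on the training data only; `k = 3`, Euclidean distance, and the toy two-feature data are hypothetical choices, since the text does not report the KNN settings.

```python
import math
import statistics
from collections import Counter


def fit_scaler(X):
    """Per-feature mean and population SD from the training data (SD 0 -> 1)."""
    return [(statistics.fmean(col), statistics.pstdev(col) or 1.0) for col in zip(*X)]


def transform(X, params):
    return [[(v - mu) / sd for v, (mu, sd) in zip(row, params)] for row in X]


def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    preds = []
    for x in X_test:
        neighbours = sorted(zip((math.dist(x, xt) for xt in X_train), y_train))[:k]
        votes = Counter(label for _, label in neighbours)
        preds.append(votes.most_common(1)[0][0])
    return preds


# Hypothetical toy data with two radiomic features per consolidation ROI.
X_train = [[1.0, 10.0], [1.2, 11.0], [0.9, 9.5],
           [3.0, 30.0], [3.2, 29.0], [2.8, 31.0]]
y_train = ["TB", "TB", "TB", "NTM", "NTM", "NTM"]
X_test = [[1.1, 10.5], [3.1, 30.5]]

params = fit_scaler(X_train)  # fit scaling on the training cohort only
print(knn_predict(transform(X_train, params), y_train, transform(X_test, params)))
```

Scaling matters for KNN because features on larger numeric scales would otherwise dominate the distance; fitting the scaler on the training cohort alone avoids leaking information from the validation cohort.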
Although some studies have reported that the CT characteristics of consolidation in NTM pulmonary disease, as assessed by conventional visual reading, differ from those of PTB [17,38], such naked-eye assessments can be subjective. Our results show that the radiomics features of consolidation help distinguish NTM pulmonary disease from PTB. Radiomics analysis of consolidation offers objectivity, quantification, stability, and independence from reader experience, and therefore has important clinical application value. Distinguishing NTM pulmonary disease from PTB using radiomics features is thus very promising, providing a noninvasive and simple method for identifying these two diseases. Especially in resource-limited health systems in developing countries, early diagnosis of NTM pulmonary disease can improve patients' quality of life and facilitate treatment [52,53].
This study has several limitations. First, to ensure image homogeneity, 5-mm-thick slices were used; compared with thin-section images, some information may be lost. Second, the sample size was small; larger multicenter studies will be required to validate and extend our results. Third, ROI segmentation was performed manually and may have been affected by subjective bias. Finally, we studied only consolidation and ignored other CT features, which may have led to incomplete information. These limitations will be addressed in future research.
In conclusion, this study shows that CT-based radiomics features are effective in distinguishing consolidation in NTM pulmonary disease from consolidation in PTB. The KNN classifier outperformed the other two classifiers in recognizing NTM pulmonary disease.