The diverse expression patterns of serum miRNAs between lung cancer and non-cancer samples
In this study, we initially performed Uniform Manifold Approximation and Projection (UMAP) analysis on miRNA expression profiles of lung cancer and non-cancer samples in four datasets. Figure 1 illustrates that lung cancer and non-cancer samples are clearly segregated in distinct regions within the LC1591 and LC972 datasets. In LC4046 and LC3924, although some non-cancer samples were mixed with lung cancer samples, it was evident that the lung cancer and the majority of non-cancer samples were distributed in different regions. These findings indicate that there are significant differences in serum miRNA expression profiles between lung cancer and non-cancer samples. Consequently, it is feasible to develop signatures for the discrimination of lung cancer from non-cancer samples based on serum miRNAs.
The signature to discriminate lung cancer from non-cancer
As illustrated in Fig. 2, a total of 90 lung cancer and 141 non-cancer samples were obtained for the training dataset by pooling LC1591 and LC972. A total of 848,487 and 1,325,596 miRNA pairs with identical REO patterns were identified in more than 95% of lung cancer and non-cancer samples, respectively. Of these, 466 miRNA pairs exhibited reversed REOs between lung cancer and non-cancer samples. From these 466 miRNA pairs, the set of two miRNA pairs that achieved the highest F-score was selected (see Methods for details). These two miRNA pairs (Table 2), along with their REO patterns, were denoted LC-MPS2 and used as the signature for the diagnosis of lung cancer.
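The screening step described above can be sketched as follows. This is a minimal illustration on toy data, not the authors' implementation: the function names `stable_reos` and `reversed_pairs`, the dictionary-based sample format and the expression values are all assumptions made for the example; only the 95% threshold comes from the text.

```python
from itertools import combinations

def stable_reos(samples, mirnas, min_fraction=0.95):
    """Return {(a, b): direction} for miRNA pairs with a stable order.

    direction is True when a > b in more than min_fraction of samples,
    and False when a < b in more than min_fraction of samples.
    """
    stable = {}
    n = len(samples)
    for a, b in combinations(mirnas, 2):
        greater = sum(1 for s in samples if s[a] > s[b])
        less = sum(1 for s in samples if s[a] < s[b])
        if greater / n > min_fraction:
            stable[(a, b)] = True
        elif less / n > min_fraction:
            stable[(a, b)] = False
    return stable

def reversed_pairs(cancer, control, mirnas, min_fraction=0.95):
    """Pairs that are stable in both groups but point in opposite directions.

    Each returned pair is oriented so that the first miRNA is the one
    expressed higher in the cancer group.
    """
    in_cancer = stable_reos(cancer, mirnas, min_fraction)
    in_control = stable_reos(control, mirnas, min_fraction)
    result = []
    for (a, b), direction in in_cancer.items():
        if (a, b) in in_control and in_control[(a, b)] != direction:
            result.append((a, b) if direction else (b, a))
    return result

# Toy expression values (hypothetical, for illustration only).
cancer = [{"m1": 5.0, "m2": 2.0, "m3": 1.0}, {"m1": 6.0, "m2": 3.0, "m3": 1.5}]
control = [{"m1": 1.0, "m2": 4.0, "m3": 0.5}, {"m1": 2.0, "m2": 5.0, "m3": 1.0}]
pairs = reversed_pairs(cancer, control, ["m1", "m2", "m3"])
print(pairs)
```

In this toy example only the pair (m1, m2) flips between the two groups; the other pairs keep the same order in both groups and are discarded.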
Table 2
The two miRNA pairs in LC-MPS2.

| Signatures | miRNA pairs | miRNA 1 | miRNA 2 |
|---|---|---|---|
| LC-MPS2 | miRNA pair 1 | hsa-miR-1290 | hsa-miR-6800-5p |
| | miRNA pair 2 | hsa-miR-1343-3p | hsa-miR-3184-5p |
| LC-MPS6 | miRNA pair 1 | hsa-miR-1343-3p | hsa-miR-6799-5p |
| | miRNA pair 2 | hsa-miR-1343-3p | hsa-miR-6756-5p |
| | miRNA pair 3 | hsa-miR-1343-3p | hsa-miR-6879-5p |
| | miRNA pair 4 | hsa-miR-6746-5p | hsa-miR-6741-5p |
| | miRNA pair 5 | hsa-miR-6746-5p | hsa-miR-197-5p |
| | miRNA pair 6 | hsa-miR-5006-5p | hsa-miR-6831-5p |

Note: a sample was considered to be lung cancer if at least half of the miRNA pairs in the signature had the specific relative expression order (miRNA 1 > miRNA 2). Otherwise, the sample was considered to be non-lung cancer.
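The decision rule in the table note can be written as a short function. The sketch below uses the two LC-MPS2 pairs from Table 2; the function name `classify` and the expression values in `sample` are hypothetical and serve only to illustrate the rule.

```python
def classify(sample, pairs):
    """Apply the majority-voting rule from the Table 2 note.

    A pair votes for lung cancer when miRNA 1 > miRNA 2. The sample is
    called lung cancer if at least half of the pairs vote that way.
    """
    votes = sum(1 for mirna1, mirna2 in pairs if sample[mirna1] > sample[mirna2])
    return "lung cancer" if 2 * votes >= len(pairs) else "non-lung cancer"

# The two miRNA pairs of LC-MPS2 (Table 2), as (miRNA 1, miRNA 2).
LC_MPS2 = [
    ("hsa-miR-1290", "hsa-miR-6800-5p"),
    ("hsa-miR-1343-3p", "hsa-miR-3184-5p"),
]

# Hypothetical expression values for one serum sample.
sample = {
    "hsa-miR-1290": 7.2,
    "hsa-miR-6800-5p": 6.1,
    "hsa-miR-1343-3p": 4.0,
    "hsa-miR-3184-5p": 5.5,
}

print(classify(sample, LC_MPS2))
```

Because the rule depends only on the within-sample ordering of each pair, it can be applied to a single sample without normalising against other samples.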
The sensitivity, specificity, accuracy and area under the receiver operating characteristic curve (AUC) of LC-MPS2 were all 100% in the training dataset (Fig. 3A). In the independent validation dataset LC4046, which included 115 lung cancer samples and 2759 non-cancer control samples, the sensitivity, specificity, accuracy and AUC were 99.7%, 100.0%, 99.8% and 99.9%, respectively (Fig. 3B). Similarly, in the independent validation dataset LC3924, which included 1566 lung cancer samples (excluding 180 postoperative samples) and 2178 non-cancer control samples, the sensitivity, specificity, accuracy and AUC were 99.7%, 99.3%, 99.5% and 99.7%, respectively (Fig. 3C). These results demonstrate that LC-MPS2 effectively differentiates lung cancer patients from non-cancer individuals.
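For reference, the four reported metrics can be computed as in the sketch below. The labels, predictions and scores are toy values, and the function names are illustrative. The text does not state which continuous score was used for the AUC; using the fraction of pairs voting for lung cancer as the score is an assumption made here.

```python
def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy from binary labels (1 = lung cancer)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }

def auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos_scores) * len(neg_scores))

# Toy labels and predictions (hypothetical).
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1]
metrics = diagnostic_metrics(y_true, y_pred)

# Toy scores: fraction of pairs voting "lung cancer" (an assumed scoring choice).
area = auc([1.0, 0.5, 1.0], [0.0, 0.5, 0.0, 0.0])
print(metrics, area)
```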
Validation of LC-MPS2 in different types of lung cancer samples
Furthermore, the performance of LC-MPS2 across pathological stages, T stages, N stages, M stages and histological subtypes of lung cancer was analysed using 1356 lung cancer samples with pathological stage and histological subtype information from LC3924. The results demonstrated that LC-MPS2 identified more than 99% of patients with stage IA and IB lung cancer. For patients with stage IIA and IIB disease, the accuracy was 97.9% and 98.4%, respectively. Furthermore, 100% accuracy was achieved for patients with stage IIIA, IIIB and IV disease (Fig. 4A). Across T stages, LC-MPS2 identified T1a, T1b, T2a, T2b, T3 and T4 patients with accuracies of 99.8%, 99.3%, 99.3%, 100%, 98.9% and 100%, respectively (Fig. 4B). The accuracy of identifying patients with stage N0, N1 and N2 disease was 99.6%, 98.2% and 100%, respectively (Fig. 4C). Similarly, the accuracy of identifying patients with stage M0 and M1 disease was 99.3% and 100%, respectively (Fig. 4D). These findings demonstrate that LC-MPS2 has a robust predictive capacity for lung cancer patients at various pathological stages.
For lung cancer patients with different histological subtypes, LC-MPS2 identified lung adenocarcinoma (ADC), small cell lung cancer (SCLC), squamous cell carcinoma (SQCC) and other lung cancers (mixed subtypes other than the first three) with accuracies of 99.2%, 100%, 99.0% and 100%, respectively (Fig. 4E). Notably, when LC-MPS2 was applied to 180 samples from patients who had undergone surgical resection, only one sample (0.6%) was identified as lung cancer (Fig. 4F). This provides further evidence that LC-MPS2 can accurately distinguish lung cancer samples from non-cancer samples.
The performance of LC-MPS2 in other types of cancer samples
The performance of LC-MPS2 was further analysed using different types of cancer samples from the pooled data of LC1591, LC972 and LC4046. The results showed that LC-MPS2 identified 99.5% of lung cancers, 99.1% of bladder cancers, 96.7% of biliary tract cancers, 85.4% of colorectal cancers, 91.6% of esophageal squamous cell cancers, 100% of gastric cancers, 83.3% of gliomas, 88.3% of liver cancers, 85.5% of ovarian cancers, 85.9% of pancreatic cancers, 77.3% of prostate cancers and 71.2% of sarcomas as lung cancers (Fig. 4G). Breast cancer was the only type for which all samples were identified as non-lung cancer. Similar results were seen in each of the three datasets (Supplementary Fig. S1). This suggests that LC-MPS2 has certain pan-cancer properties.
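The per-type percentages above are the fraction of samples of each cancer type that the signature calls lung cancer. A minimal sketch of that tabulation is given below; the function name `positive_rate_by_type` and the records are hypothetical and do not reproduce the study data.

```python
from collections import defaultdict

def positive_rate_by_type(records):
    """Fraction of samples called 'lung cancer' within each cancer type.

    records is an iterable of (cancer_type, call) tuples, where call is
    the output of the signature for that sample.
    """
    counts = defaultdict(lambda: [0, 0])  # cancer_type -> [positive calls, total]
    for cancer_type, call in records:
        counts[cancer_type][1] += 1
        if call == "lung cancer":
            counts[cancer_type][0] += 1
    return {t: pos / total for t, (pos, total) in counts.items()}

# Hypothetical signature calls for a handful of samples.
records = [
    ("lung", "lung cancer"),
    ("lung", "lung cancer"),
    ("breast", "non-lung cancer"),
    ("gastric", "lung cancer"),
    ("gastric", "non-lung cancer"),
]
rates = positive_rate_by_type(records)
print(rates)
```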
The signature to discriminate lung cancer from other types of cancer
We next sought to construct a lung cancer-specific signature that can distinguish lung cancer from other types of cancer. As shown in Fig. 2, 90 lung cancer samples and 2332 samples of other cancer types (12 different types; details in Supplementary Table S1) from the training dataset of LC1591 and LC972 were used. A total of 1,448,193 and 1,279,775 miRNA pairs with identical REO patterns were identified in more than 75% of lung cancer and other cancer samples, respectively. Of these, 13 miRNA pairs showed reversed REOs between lung cancer and other cancer samples. From the 13 miRNA pairs, we obtained a set of six miRNA pairs that achieved the highest F-score under the majority voting rule (see Methods for details). The six miRNA pairs (Table 2), together with their REO patterns, were designated LC-MPS6 and selected as a new signature for lung cancer diagnosis.
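One way to picture the selection step is an exhaustive search over candidate subsets, scoring each by its F-score under majority voting. The sketch below is an assumption for illustration only: the actual search procedure is described in the Methods and may differ, the F-score is taken here to be the F1 score, and all names and toy values are hypothetical.

```python
from itertools import combinations

def majority_vote(sample, pairs):
    """True (lung cancer) if at least half of the pairs have miRNA 1 > miRNA 2."""
    votes = sum(1 for a, b in pairs if sample[a] > sample[b])
    return 2 * votes >= len(pairs)

def f_score(samples, labels, pairs):
    """F1 score of the majority-vote classifier (assumed definition of F-score)."""
    tp = fp = fn = 0
    for sample, is_lung in zip(samples, labels):
        predicted = majority_vote(sample, pairs)
        if predicted and is_lung:
            tp += 1
        elif predicted and not is_lung:
            fp += 1
        elif not predicted and is_lung:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_subset(samples, labels, candidates, size):
    """Brute-force search for the subset of the given size with the highest F-score."""
    return max(combinations(candidates, size),
               key=lambda subset: f_score(samples, labels, subset))

# Toy data: two lung cancer samples followed by two other-cancer samples.
samples = [
    {"a": 2, "b": 1, "c": 2, "d": 1, "e": 1, "f": 2},
    {"a": 2, "b": 1, "c": 1, "d": 2, "e": 2, "f": 1},
    {"a": 1, "b": 2, "c": 1, "d": 2, "e": 2, "f": 1},
    {"a": 1, "b": 2, "c": 2, "d": 1, "e": 1, "f": 2},
]
labels = [True, True, False, False]
candidates = [("a", "b"), ("c", "d"), ("e", "f")]

best = best_subset(samples, labels, candidates, size=1)
print(best, f_score(samples, labels, best))
```

With 13 candidate pairs and a subset size of six there are 1716 combinations, so a brute-force search of this kind would be computationally cheap.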
In the training set (pooled data from LC1591 and LC972), the sensitivity, specificity, accuracy and AUC of LC-MPS6 were 90.9%, 91.1%, 90.9% and 95%, respectively (Fig. 5A). In the independent validation dataset LC4046, which included 115 lung cancer samples and 1172 other cancer samples, the sensitivity, specificity, accuracy and AUC were 80.2%, 93.9%, 81.4% and 89.0%, respectively (Fig. 5B). These results demonstrate that LC-MPS6 is an effective signature for discriminating lung cancer from other cancer samples.
The performance of LC-MPS6 was further analysed in each of the 12 types of tumor samples. The results demonstrated that LC-MPS6 could accurately identify 91.1% of lung cancers, 98.6% of bladder cancers, 100% of breast cancers, 71.1% of biliary tract cancers, 68.9% of colorectal cancers, 98.9% of esophageal squamous cell cancers, 13.3% of gastric cancers, 96.7% of gliomas, 70.0% of liver cancers, 100% of ovarian cancers, 97.8% of pancreatic cancers, 96.4% of prostate cancers and 97.8% of sarcomas as non-lung cancer samples in the training data (Fig. 6A). Even for the non-cancer samples included in the training data, LC-MPS6 was able to discriminate them from lung cancer with 100% accuracy. In LC4046, LC-MPS6 was able to accurately identify 99.9% of non-cancer samples, 93.9% of lung cancers, 100% of breast cancers, 15.7% of colorectal cancers, 97.7% of esophageal squamous cell cancers, 8.7% of gastric cancers, 72.8% of liver cancers, 99.8% of ovarian cancers, 100% of pancreatic cancers and 95.7% of sarcomas as non-lung cancer samples (Fig. 6B). Similar results were observed in the pooled datasets of the training data and LC4046 (Supplementary Fig. S2).