Data source and processing
Initially, we obtained a total of 14143 lncRNA expression and 19659 gene expression profiles from 1053 breast cancer samples. In addition, the corresponding clinical data of 986 patients were downloaded from TCGA. The immune-related gene list downloaded from the ImmPort database contained 1534 immune related genes (Table S1). We then obtained 937 immune-related lncRNAs through co-expression analysis of the immune gene list (|correlation coefficient|>0.4, p<0.001)(Table S2). The top 10 positively and negatively correlated lncRNAs are shown sorted by correlation coefficient in Table 1.
immuneGene
|
lncRNA
|
correlation coefficient
|
pvalue
|
regulation
|
CD19
|
AC243960.1
|
0.931523591
|
0
|
positive
|
CD79B
|
AC243960.1
|
0.921700762
|
0
|
positive
|
TNFRSF13C
|
LINC00926
|
0.91156984
|
0
|
positive
|
CD3D
|
AC004585.1
|
0.906199227
|
0
|
positive
|
CD19
|
LINC00926
|
0.902280178
|
0
|
positive
|
LCK
|
AC004585.1
|
0.901867543
|
0
|
positive
|
PTPRC
|
AL365361.1
|
0.892830584
|
0
|
positive
|
ZAP70
|
AC243960.1
|
0.892093262
|
0
|
positive
|
CD3E
|
AC004585.1
|
0.889534261
|
0
|
positive
|
CD48
|
LINC01857
|
0.888856061
|
0
|
positive
|
UBXN1
|
OIP5-AS1
|
-0.4915959
|
3.67E-65
|
negative
|
NFKBIB
|
OIP5-AS1
|
-0.476266029
|
1.00E-60
|
negative
|
NFATC3
|
SPINT1-AS1
|
-0.468085147
|
1.89E-58
|
negative
|
NCK2
|
AC008771.1
|
-0.467026674
|
3.69E-58
|
negative
|
IGF2R
|
AC073896.4
|
-0.46300298
|
4.58E-57
|
negative
|
UBR1
|
AP001505.1
|
-0.449374123
|
1.81E-53
|
negative
|
CBL
|
SPINT1-AS1
|
-0.448571697
|
2.91E-53
|
negative
|
PSMC3
|
AL122035.1
|
-0.448372597
|
3.28E-53
|
negative
|
IFNAR2
|
AC008771.1
|
-0.446906674
|
7.78E-53
|
negative
|
HSPA8
|
AC108673.3
|
-0.443217552
|
6.75E-52
|
negative
|
1 represents sorted by correlation coefficient
|
Table 1
Top 10 positive/negative immune-related lncRNAs1
Identification of immune-related lncRNAs and construction of the prognostic model
The data of breast cancer patients was allocated randomly to the training and validation cohort, 494 patient samples in the training cohort, 492 patient samples in the validation cohort. We carried out univariate Cox regression analysis on the expression profiles of the lncRNAs in the training set and obtained 15 candidate immune-related lncRNAs, significantly associated with survival, p<0.01(Figure 1a,Table2). We performed Lasso regression on these lncRNAs. In order to avoid over fitting of the predicted signal, the prediction accuracy was estimated by tenfold cross-validation (Figure 1b,c). A total of 8 immune-related lncRNAs were obtained, including OTUD6B-AS1, AL122010.1, AC136475.2, AL161646.1, AC245297.3, LINC00578, LINC01871, AP000442.2 (Figure 1d). The risk score for each sample was calculated based on the expression levels of these 8 lncRNAs. Risk score = (0.53* OTUD6B) + (- 0.91* AL122010.1) + (- 0.52* AC136475.2)+(0.20* AL161646.1)+(- 0.34* AC245297.3) + (0.24* LINC00578) + (- 0.42* LINC01871) + (- 1.10* AP000442.2) (Table3).
ID
|
HR
|
HR.95L
|
HR.95H
|
pvalue
|
OTUD6B-AS1
|
2.153122116
|
1.373184702
|
3.376046091
|
0.000832119
|
SNHG10
|
0.464651236
|
0.261933882
|
0.824256754
|
0.008771445
|
AC010226.1
|
0.243245961
|
0.087245203
|
0.678187397
|
0.00688694
|
AL122010.1
|
0.330788811
|
0.174928898
|
0.625518362
|
0.000665713
|
AC136475.2
|
0.473639558
|
0.291879867
|
0.768584807
|
0.002481537
|
AL161646.1
|
1.409383671
|
1.132997857
|
1.753191605
|
0.00206214
|
TOLLIP-AS1
|
0.475597922
|
0.283403961
|
0.798130634
|
0.004898673
|
ST7-AS1
|
0.276461931
|
0.123420958
|
0.619272451
|
0.001780562
|
FLJ42351
|
0.293301101
|
0.118560825
|
0.725581453
|
0.007952261
|
AC245297.3
|
0.46968286
|
0.300849229
|
0.733264265
|
0.000884015
|
Z68871.1
|
2.553691945
|
1.308915791
|
4.982247593
|
0.005970224
|
LINC00578
|
1.358183537
|
1.086209076
|
1.698257324
|
0.007246749
|
LINC01871
|
0.604154147
|
0.418198337
|
0.87279695
|
0.007256751
|
AP000442.2
|
0.168810321
|
0.050411344
|
0.565287931
|
0.003913707
|
AC147651.3
|
0.356585393
|
0.172081652
|
0.738911677
|
0.005538818
|
Table 2
Univariate Cox analysis for overall survival of 15 immune-related LncRNAs in training set.
ID
|
correlation coefficient
|
HR
|
HR.95L
|
HR.95H
|
p value
|
OTUD6B-AS1
|
0.529692531
|
1.69841002
|
1.065676476
|
2.706822063
|
0.025916888
|
AL122010.1
|
-0.905807602
|
0.404215308
|
0.209034782
|
0.781640328
|
0.007098656
|
AC136475.2
|
-0.517003209
|
0.596304874
|
0.368979652
|
0.963683231
|
0.034771353
|
AL161646.1
|
0.195137911
|
1.215478602
|
0.953505844
|
1.549427559
|
0.115127418
|
AC245297.3
|
-0.342477733
|
0.710008929
|
0.449757063
|
1.120855504
|
0.141510647
|
LINC00578
|
0.238157741
|
1.268909336
|
1.000738329
|
1.608942972
|
0.049292045
|
LINC01871
|
-0.419944116
|
0.657083539
|
0.447993222
|
0.963761851
|
0.031647378
|
AP000442.2
|
-1.093407536
|
0.335072774
|
0.100718288
|
1.114730656
|
0.074608365
|
Table 3
Construction of 8 immune-related lncRNAs prognostic signature.
The immune-related lncRNA model is a robust prognostic tool for breast cancer
Breast cancer patients were divided into low- and high-risk groups according to the median risk score in the training set. Figure 2a presents the result of the Kaplan-Meier test. The p value of the log-rank test was 1.215e−06, indicating that patients in the low-risk group had a 10 year longer median OS compared with the high-risk group. To assess the accuracy of the prognostic model, further examinations in both the testing set and the whole set were performed with the same algorithm cutoff. Both sets yielded similar results. The low-risk group exhibited remarkably better overall survival (OS) than the high-risk group, which indicated that the prognostic signature was effective (p = 0.0069 in the validation set; p = 1.233e−07 in the total set) (Figure 2b and 2c).
The ROC curve analysis of the model in the training set demonstrated its promising predictive value for breast cancer survival (1-year AUC = 0.812, 5-year AUC = 0.772, 8-year AUC = 0.81, 10-year AUC = 0.857, Figure 2d). We then validated the model in the testing set, and the 1-year AUC was 0.615, the 5-year AUC was 0.599, the 8-year AUC was 0.68, and the 10-year AUC was 0.655 (Figure 2e). As for the total cohort, the 1-year AUC was 0.725, the 5-year AUC was 0.678, the 8-year AUC was 0.742,and the 10-year AUC was 0.741 (Figure 2f).
Principal component analysis (PCA) of the training, testing, and total breast cancer cohort demonstrated a different distribution pattern between the high- and low-risk groups, based on the expression of the 8 immune-related lncRNAs. This was indicative of the difference between the immune phenotypes of the groups (Figure S1).
Assessment of the correlation between candidate lncRNAs and clinicopathological characteristics
We generated risk curves and scatter plots to display the risk score and survival status of each breast cancer patient, not only in the training set, but also in the testing and in the total set. The risk coefficient and mortality in the low-risk group were lower than those in the high-risk group (Figure 3a-f).
Tumors with high prognostic scores expressed high-risk immune-related lncRNAs, whereas tumors with low prognostic scores expressed protective immune-related lncRNAs.The heatmap revealed that OTUD6B−AS1, AL161646.1, and LINC00578werehighly expressed in the low-risk group, while AC136475.2, AL122010.1, LINC01871, and AP000442.2 were highly expressed in the high-risk group (Figure 3g-i).
Further, in the overall sample, we analyzed the relationship between the expression of the 8 candidate lncRNAs and different clinicopathological factors (such as T, N, molecular typing, etc.). Our results confirmed that differential expression of AC136475.2 (p<0.01), AL122010.1 (p<0.001), AL161646.1 (p<0.05), LINC01871(p<0.01) could be observed among different T grades (Figure 4a). The differences in expression for AC136475.2 (p<0.05), AL161646.1 (p<0.01), and OTUD6B−AS1 (p<0.05) were statistically significant between different N groups(Figure 4b). Except for AP000442.2 and OTUD6B−AS1, lncRNAs, including AC136475.2 (p<0.001), AC245297.3 (p<0.001), AL122010.1 (p<0.001), AL161646.1 (p<0.001), LINC00578 (p<0.001), and LINC01871 (p<0.001),were differentially expressed between different breast cancer molecular types(Figure 4c).
Evaluation of the immune-related lncRNA signature as an independent prognostic factor in patients with breast cancer
We carried out univariate and multivariate Cox regression analyses to verify that the model could serve as an independent prognostic factor for breast cancer, also accounting for certain clinicopathological variables (such as age, ER status, PR status, AJCC T stage, etc.) (Figure 5). The univariate Cox analysis revealed that the high-risk group was significantly correlated with shorter survival in the training set (HR = 1.483; 95% CI 1.273–1.729,p<0.001), validation set (HR = 1.147; 95% CI 1.012−1.301, p = 0.032), and whole set (HR = 1.220; 95% CI 1.128−1.318, p<0.001). Multivariate Cox regression analyses of the above mentioned factors indicated that the immune-related lncRNA model was a reliable and independent prognostic factor for OS in the training set (HR = 1.432; 95% CI 1.204−1.702, p<0.001), validation set (HR = 1.162; 95% CI 1.004−1.345, p = 0.044), and whole set (HR = 1.240; 95% CI 1.128−1.362, p<0.001). In the whole set, multivariate analysis revealed that age (HR = 1.040; 95% CI 1.013−1.067, p = 0.003) and PR status (HR = 0.401; 95% CI 0.173−0.931, p = 0.034) were independent prognostic factors for OS.
Gene set enrichment analysis for functional annotation of the immune-related risk signature
GSEA of the risk signature was performed using the GSEA software. The results revealed that immune-related responses were further enriched in low-risk groups compared to high-risk groups. We demonstrated 5 immune-related gene ontology terms in the GSEA results with FDR<0.25 (Figure 6), including the positive regulation of immune effector processes, positive regulation of the adaptive immune response, positive regulation of lymphocyte activation, regulation of T cell activation, and the T cell receptor signaling pathway.