Gene expression profiles of 796 nasal and 1,673 bronchial brushing samples were generated using bulk RNA-seq from 4 cohorts, DECAMP13 (Detection of Early Lung Cancer Among Military Personnel), AEGIS14 (Airway Epithelial Gene expression in the Diagnosis of Lung Cancer), BCLHS15 (British Columbia Lung Health Study and Pan-Canadian Lung Health Study), and PCA16 (Pre-Cancer Atlas) (Table 1). Study participants were current or former smokers undergoing bronchoscopy as part of either a diagnostic workup or a screening process for lung cancer. Clinical data collected on samples included smoking status, sex, age, forced expiratory volume during the first second (FEV1) percent Predicted, and, in the AEGIS study, lung cancer status (Supplemental Tables 1). Notably, we observed different patterns of association with smoking between ACE2 and TMPRSS2 gene expression between the nasal and bronchial epithelium. ACE2 was down-regulated in the nasal epithelium of current smokers in 1 out of 2 cohorts (p < 0.01) but strongly up-regulated in the bronchus in all four cohorts (p < 0.01; Figs. 1a and 1b; Supplemental Fig. 1a and 1b). TMPRSS2 expression was not associated with smoking in the nose but its expression was up-regulated in current smokers in the bronchus in 3 out of 4 cohorts (p < 0.001). CTSL expression was down-regulated in current smokers in the nasal epithelium in 1 out of 2 cohorts (p < 0.01) and the bronchial epithelium in 3 out of 4 cohorts (p < 0.05). These genes were not strongly associated with other clinical variables (sex, age, FEV1% Predicted) across cohorts in either the nasal or bronchial samples in contrast to some previous studies5,8,11,12. Adjusting for lung cancer diagnosis in the AEGIS cohort did not change these observed correlations, although positive lung cancer status negatively correlated with ACE2 expression in the nasal epithelium (p<0.05, Supplemental Fig. 1b). Upon examining paired nasal and bronchial samples from the same subjects, a lack of correlation in the expression of each VE gene was observed between the two tissues (p > 0.05; Fig. 1c; Supplemental Fig. 1c, Supplemental Table 3), suggesting that the biological processes that influence the expression of these genes may differ across airway sites.
Table 1
Summary of Patient Demographics across Sample Cohorts
|
DECAMP Nose
|
AEGIS Nose
|
DECAMP Bronchus
|
AEGIS Bronchus
|
BCLHS Bronchus
|
PCA Bronchus
|
Total number of samples/cells
|
288
|
505
|
360
|
938
|
238
|
137
|
Smoking Status (%)
|
|
|
|
|
|
|
Current
|
125 (43.40)
|
186 (36.83)
|
153 (42.5)
|
442 (47.12)
|
99 (41.6)
|
77 (56.2)
|
Former
|
156 (54.17)
|
319 (63.17)
|
202 (56.11)
|
496 (52.88)
|
139 (58.4)
|
60 (43.8)
|
Never
|
0
|
0
|
0
|
0
|
0
|
0
|
Missing
|
7 (2.43)
|
0
|
5 (1.39)
|
0
|
0
|
0
|
Sex, n (%)
|
|
|
|
|
|
|
Male
|
219 (76.04)
|
317 (62.78)
|
282 (78.33)
|
584 (62.26)
|
135 (56.72)
|
70 (51.09)
|
Female
|
69 (23.96)
|
188 (37.23)
|
78 (21.67)
|
354 (27.74)
|
103 (43.28)
|
67 (48.91)
|
Missing
|
0
|
0
|
0
|
0
|
0
|
0
|
Age
|
|
|
|
|
|
|
Mean (SD)
|
67 (8)
|
60 (11)
|
65 (7)
|
63 (11)
|
65 (6)
|
58 (7)
|
Missing
|
0
|
0
|
0
|
0
|
0
|
0
|
FEV1 (% predicted)
|
|
|
|
|
|
|
Mean (SD)
|
75.32 (19.15)
|
73.00 (21.28)
|
72.71 (20.26)
|
68.73 (22.18)
|
80.86 (20.68)
|
74.32 (20.72)
|
Missing
|
28
|
355
|
0
|
633
|
0
|
4
|
Having Lung Cancer (%)
|
|
|
|
|
|
|
Yes
|
NA
|
309 (61.19)
|
NA
|
710 (75.69)
|
NA
|
NA
|
No
|
NA
|
196 (38.81)
|
NA
|
228 (24.31)
|
NA
|
NA
|
Missing
|
NA
|
0
|
NA
|
0
|
NA
|
NA
|
Table 1. Summary of Patient Demographics. Definition of abbreviations: DECAMP = Detection of Early Lung Cancer Among Military Personnel, AEGIS = Airway Epithelial Gene expression in the Diagnosis of Lung Cancer, BCLHS = British Columbia Lung Health Study and Pan-Canadian Lung Health Study, PCA = Pre-Cancer Atlas. Values are presented as mean (SD). |
To further explore the biological pathways associated with VE genes in each site of the airway, we developed co-expression networks using Weighted Gene Correlation Network Analysis (WGCNA)17. We identified 58 and 48 gene modules within the nasal and bronchial cohorts, respectively (Fig. 2a and 2b, Supplemental Fig. 2; Supplemental Table 2). The consensus module eigengenes containing the VE genes significantly correlated with their respective VE gene expression within each cohort (Supplemental Fig. 3a, Pearson correlation, p < 2.2E-16, average R = 0.584), suggesting that the eigengene can be used as a surrogate for each VE gene. The VE gene module eigengenes showed similar associations with the clinical covariates (Supplemental Fig. 3b) to those presented in Figure 1 to those presented in Figure 1. The nasal VE gene module eigengenes were not highly correlated, however, the ACE2- and TMPRSS2-associated module eigengenes in the bronchus were highly correlated across all four datasets (Supplemental Figs. 2a and 2b), suggesting that similar biological processes control these two gene modules in the bronchial compartment.
Comparing biological pathways enriched among VE gene modules between the nose and bronchus revealed (Fig. 2c; Supplemental Table 3) that genes of the nasal ACE2 module were enriched in the inflammatory response, interferon-alpha signaling, and interferon-gamma signaling pathways while the bronchial ACE2 gene module was enriched in genes associated with protein secretion and androgen response. Genes of the TMPRSS2 module in the nose were enriched in estrogen response and KRAS signaling pathways while the module in the bronchial epithelium was enriched in MTORC1 and TNFα-NFκB signaling pathways. The lack of shared biological pathways mirrors the small number of genes shared by the nasal and bronchial modules for ACE2 (n = 5) and TMPRSS2 (n = 12). In contrast, the nasal and bronchial CTSL modules shared 177 genes (Fisher’s exact test, p = 2.23E-10) and were both enriched in genes related to inflammatory response and TNF-alpha signaling pathways. Furthermore, in the bronchus, ACE2 and TMPRSS2-associated modules share enrichment in genes involved in the p53 pathway, adipogenesis, androgen response, MTORC1 signaling, and estrogen response (Supplemental Fig. 3c). We also did not observe significant correlations between the eigengenes of each VE gene module in the paired nasal and bronchial samples (Fig. 2d and Supplemental Fig. 3d). The VE gene module and single-gene analyses are consistent and suggest that there are important differences between VE gene expression and biology between the upper and lower airways.
To determine if different cell populations contribute to the difference in VE gene expression between airway compartments, we leveraged single-cell RNA sequencing (scRNA-seq) data to characterize the expression patterns of individual VE genes and VE gene modules across major epithelial and immune cell types. We profiled 34,833 cells from 9 nasal brushings (4 collected from two volunteers and 5 from 5 patient undergoing lung cancer screening) and 2,075 cells from 17 bronchial brushings from patients undergoing bronchoscopy for suspicion of lung cancer (Supplemental Tables 6, 7). A total of 17 and 15 transcriptionally distinct cell clusters were identified in the nose and bronchus, respectively, many of which expressed similar marker genes: KRT5, KRT15 (basal cells); FOXJ1, C20orf85, CDC20B (ciliated cells); SCGB1A1, SERPINB3 (club cells); MUC5AC, TFF1, TFF3 (goblet cells); FOXI1, CFTR (ionocytes); CD3D, CD8A (T cells). We also identified two bronchial goblet-like secretory populations: one characterized by high expression of CEACAM5 but not MUC5AC, previously described as peri-goblet cells18, and the other by high expression of HLA-DQA1 and MHC class II genes (Figs. 3a-b, Supplemental Fig. 4). In the nose, we discovered two club-cell-like secretory cell clusters (STATH+ and C15orf48+), as well as a novel cell cluster of “keratinizing epithelial cells” expressing genes involved in cornification, epidermis development, and keratinocyte differentiation, such as SPRR3 and SPRR2A. Immune populations represent <1% and 26% of total cell populations in the nasal and bronchial epithelium, respectively. In both nasal and bronchial datasets, the cell subpopulations were consistently observed across multiple patients (Supplemental Fig. 5).
We found low expression of the VE genes in the single cell data, consistent with other reports (Supplemental Fig. 6)19,20, so we calculated VE gene module metagene scores in each cell to understand how biological processes associated with the VE genes in the bulk RNA-seq data are distributed across nasal and bronchial cell populations (Figs. 3c-e). In the nose, the ACE2 module was moderately expressed across many cell types but was most highly expressed in C15orf48+ secretory cells and club cells. The TMPRSS2 module was predominantly expressed in the keratinizing epithelial cells, followed by C15orf48+ secretory cells. The expression of gene modules of ACE2 and TMPRSS2 is different from the gene expression pattern reported by Sungnak et al in that TMPRSS2 gene expression was limited to ACE2+ cells21. In the bronchus, both the ACE2 and TMPRSS2 modules were expressed at the highest levels in the goblet cell population, in agreement with previous findings22. The highest median metagene score for the CTSL module was in the neutrophils and macrophages in both the nose and bronchus; however, expression was also high in T cells in the nose and dendritic cells in the bronchus. We found very few cells co-expressing ACE2, CTSL, and TMPRSS2 at high levels – 114 out of 34833 cells in the nose and no cells in the bronchus. On the other hand, we found that ACE2 and TMPRSS2 modules were more likely to be highly co-expressed in nasal keratinizing epithelial cells (odds ratio = 7.56), nasal C15orf48+ secretory cells (N = 518, odds ratio = 7.36), and bronchial goblet cells (odds ratio = 16.80) (N = 806, Supplemental Table 6, FDR q < 0.001). Thus, ACE2 and TMPRSS2 were found to be expressed in different nasal and bronchial cell populations, which may be influenced by smoking in ways that lead to their divergent correlation with smoking in the two airway compartments.
To investigate how smoking modulates different cell populations, we computationally deconvolved cell population proportions in the bulk RNA-seq data using gene markers identified from the single-cell data (Supplemental Table 7). Higher proportions of goblet cells were observed in current smokers in both the nose and bronchus in all 6 cohorts, consistent with the previous studies18. Increased proportions of ionocytes in current smokers were also observed in the nose (1 out of 2 cohorts) and bronchus (all 4 cohorts). The proportion of ciliated cells was significantly lower in smokers in all 6 cohorts (Figs. 4a-b, Supplemental Figs. 7a-d, p < 0.05). While goblet cell proportions were significantly higher in smokers in both the nose and bronchus, the ACE2 module was highly expressed only in bronchial, not nasal goblet cells. These results may explain the lack of association of ACE2 expression with smoking in the bulk RNA-seq nasal data. Additionally, in the AEGIS nasal data (Supplemental Fig. 7 ) the fraction of keratinizing epithelial cells is increased and STATH+ secretory cells is decreased in current smokers, suggesting that shifts in these populations may be partially responsible for the differential correlation with smoking between ACE2 and TMPRSS2 in the nose. For CTSL, both nasal datasets showed a significant decrease in the proportion of the neutrophil/macrophage population with respect to smoking. While this may partially explain the down-regulation of CTSL in smokers in the nasal bulk RNA-seq data, the overall proportions of these populations estimated in the bronchial data by the deconvolution were close to zero for most samples, which could have prevented us from confirming the decrease of this population with smoking in the bronchus.