Characterization of the cellular heterogeneity across four LUAD subtypes
A total of 52 freshly resected lung specimens of four subtypes (3 AAH, 5 AIS, 9 MIA, and 17 IA) were collected from 25 patients (Fig. 1A and Table 1), along with 18 adjacent non-malignant (also as normal) lung tissues from a distal region within the same lobe, which served as controls. Eight of the 25 patients presented multiple nodules (Fig. S1). For each specimen, we rapidly digested the freshly collected tissues to generate a single-cell suspension, isolated cells without enrichments for specific cell types, and generated scRNA-seq data with the 10x Chromium platform. We characterized the transcriptome of 140,556 cells from LUAD samples (P1-P22) at single-cell resolution by the V2 kits and validated our results with a separate dataset of 127,923 single cells from multiple nodules resected from an additional 3 patients (P23-P25) using the V3 kits (Fig. 1A and Table 2).
To distinguish cell populations based on distinct transcriptional profiles, we performed dimensionality reduction and unsupervised cell clustering using the Seurat package (version 3.0.3.9028). We identified 16 distinct clusters based on key marker genes that were further assigned into 16 major cell types (Fig. 1B-1E), including epithelial cells (ciliated, club, basal, AT1, AT2 and AT2-like cells) and stromal cells (endothelial cells, fibroblast cells, lymphocytes and myeloid cells). We profiled the transcriptomic characteristics of LUAD through comparative analysis between non-malignant and malignant cells from surgically resected specimens of four stages (Fig. 1C-1G). Most non-malignant cells were immune cells, and each cell cluster comprises cells derived from multiple different patients. The proportions of T lymphocytes and myeloid cells were highly reproducible across patients. Moreover, we confirmed the enrichment of B and T lymphocyte and the decline in natural killer (NK) and myeloid cells during tumor progression (Fig. 1G and Fig. S2A-S2B), indicating the activation of adaptive immune responses. In tumor tissues, we discovered that AT2 feature cell types (AT2-like cells) were associated with malignant composition, and were present at the onset of LUAD development (Fig.1B and 1D). The AT2-like cells were highly patient-specific and exhibited increasing cellular phenotypic heterogeneity with tumor progression (Fig. 1D and 1F). Notably, a very small percentage of cells expressing cell proliferation markers was observed in our data, so we did not correct for the cell cycle effect (Fig. S3A and S3B). These results illustrated a wide range of intratumoral heterogeneity in LUAD transcriptome, which could be shaped by surrounding microenvironment during progression.
Next, we reanalyzed the published data of 2 LUAD patients (out of the total 8 patients) from the study by Lambrechts et al 11 by integrating our datasets, based on the same normalization and filtering parameters (Fig. S4A-S4E), and identified the same 16 cell clusters. Both datasets revealed highly consistent assignment of identified cell types. However, our dataset consisted of multiple instances of rare cell populations/clusters (such as ciliated cells, lymphocytes, and AT2-like cell subgroups) that were not present in the Lambrechts et al dataset. For example, the AT2-like cluster was enriched in premalignant AAH and AIS tumors (Fig. S4A and S4B). A validation cohort of an additional 127,923 single cells from 3 LUAD patients using the V3 kit (Table 1) revealed that epithelial and stromal cells could be assigned to subclusters, representing 90% of cell types in the original set of 22 patients (Fig. S4C and S4D). With this more comprehensive dataset, we further characterized the stromal cell and epithelial cell populations in a greater detail to better assess cell heterogeneity during LUAD progression.
Characterization of stromal cells in coordinating tumor microenvironment through tumor progression
The stromal cells are associated with tumors, could provide deeper insights into lung cancer biology 11. To investigate stromal cell dynamics in the tumor microenvironment (TME), we analyzed the single-cell transcriptomes of endothelial cells (ECs), fibroblasts, lymphocytes, and myeloid cells from normal tissues and tumor tissues representing the four stages, as shown in Fig.1B. We detected 1, 925 ECs and five clusters based on marker genes; (Fig. 2A-2D and Fig. S5A-5B) these clusters included tip-like cells, tumor ECs, stalk-like cells, endothelial progenitor cells (EPCs) and lymphatic ECs. Most of the EC clusters observed belonged to normal tissues, and were assigned to known vascular cell types. For example, lymphatic ECs were enriched in normal tissue. Tip-like cells, stalk-like cells and EPCs were expressed in early-stage (AAH and AIS stage) tumors and normal tissues (Fig. S5A). Tumor ECs were identified in early-stage tumors, and demonstrated strong expression of PLVAP, GSN, and TSC22D1, which regulate the development and determine the cell fate of ECs. To learn more about the biology underlying these cell states, we used GSEA to compare the expression patterns in tumor ECs versus normal ECs (Fig. 2D). The results revealed Myc targets as the top enriched signature in tumor ECs. Indeed, earlier studies indicated that c-Myc is essential for tumor angiogenesis, glycolysis and oxidative phosphorylation, all of which promote vessel sprouting 13. The most significantly downregulated pathway was involved in inflammatory responses, such as the INF-α and INF-γ responses. The endothelium represents the primary interface between circulating immune cells and the tumor, and plays important roles in relaying signals and presenting epitopes from the tissues it vascularizes to the immune system 14.
Fibroblasts have long been suggested to represent a heterogeneous population, but the extent of heterogeneity has remained unclear in LUAD progression 15. Our subclustering of fibroblasts revealed six distinct clusters, including fibroblast-like cells, normal fibroblasts, smooth muscle cells, lipofibroblasts and myofibroblasts (Fig. 2E-2H and Fig. S5A-5B). The fibroblast-like cell (CFD and APOD) and myofibroblast (ACTA2 and RGS5) expression patterns were reproducibly detected in AAH and AIS tumors, and could represent common features of the LUAD TME (Fig. S5A). Many studies have identified fibroblast-like cells and α-SMA-positive (ACTA2, smooth muscle α actin) myofibroblasts as cancer-associated fibroblasts (CAFs), promoting extensive tissue angiogenesis 16, and tumor progression17. Smooth muscle cells, comprising of the main fibroblast type in the vasculature and being linked to wound healing and angiogenesis 18, were observed in IA stage tumors and a few normal tissues. Further GSEA analysis of fibroblasts from normal and tumor tissues was performed (Fig. 2H). Cancer-derived fibroblasts are associated with epithelial-mesenchymal transition (EMT) and the immune response, such as showing strong IFN-γ and IFN-α responses. Therefore, cellular dynamics in ECs and fibroblasts supported a consistent phenotypic shift of stromal cells towards tissue remodeling and angiogenesis in LUAD.
Lymphocytes often play important roles in inflammation, cancer immune evasion, and responses to immunotherapy treatment 19. In our dataset, subclustering of 61,196 detected lymphocytes revealed 10 clusters, annotated mainly as T cells, B cells and NK cells, with some other immune cell types (Fig. 2I-2K and Fig. S5A-5C). Consistent with previous findings, T cell-mediated cytotoxicity is critical for tumor cell clearance 20. We observed that CD8+ T and regulatory T (Treg) cells were enriched in the tumor, but CD4+ T cells and natural killer cells were depleted during tumor progression (Fig. S5A-5C). Furthermore, Treg cells persist in the IA stage, exerted a suppressive mechanism of antitumor immunity during tumor progression. CD8+ T cells and Treg cells were enriched in the tumor population, which was also reported in Lambrechts et al. 11. B cells and plasma cells are rare in most samples. Using definitive tumor lymphocytes and innate lymphocytes, we constructed a transcriptional trajectory for the exhaustion of lymphocytes and found the key gene expression programs during tumor progression (Fig. 2K). Indeed, transcriptional states in the trajectory suggested that lymphocytes were not associated with changes in the tumor biology. Most of cell types were observed to be distinctly positioned at the branches. However, CD8+ T and cancer cells were located in separate trajectory branches from the same ancestor, indicating their interactive differentiation states. Altogether, the changes in the cellular composition and gene expression phenotype of lymphocyte cells confirmed the direction of tumor immunity towards immune suppression in LUAD progression.
Myeloid cells play a critical role in maintaining tissue homeostasis, and regulate inflammation in the lung 21. We dissected the gene signatures of the 8 myeloid clusters revealed in this study (Fig. 2L) including granulocytes, macrophages, NK cells and dendritic cells. Neutrophils were not recovered in our experimental procedures. Two macrophage types are known to populate the lung: the alveolar macrophage (AM) type, which highly expresses the MARCO, FABP4, and MCEMP1 genes; and the tumor macrophage (TM) type, which comprises the remaining tumor-enriched clusters. The AM type macrophages were mainly detected in normal and early-stage LUAD (Fig. 2L-2N and Fig. S5A-5C). By contrast, the TM transcriptional phenotypes were mostly present in late clinical stage IA tumors, and TMs shared high expression of SPP1, APOE, CCL2 genes, involved in apolipoprotein metabolism 22. Trajectory analysis on TMs demonstrated a dynamic functional spectrum from AMs with regard to the transcriptional trajectory (Fig. 2N). Dendritic cells mostly comprised myeloid cells, three DC subsets, DCs (CCL17), activated DCs (BIRC3, CCL22) and CD141+ DCs (CPVL, KLRB1). CD141+DCs mediated by NK cells, plays a key role in an inflammatory environment (Fig. 2M). On the other hand, cancer cells in myeloid cell clusters were mostly observed in late clinical stage IA tumors, which indicated that tumors communicating with immune cells after stage IA are capable of immune escape.
TME is heterogeneous and includes reprogrammed or activated immune cells, fibroblasts, and ECs. To characterize stromal cell heterogeneity in LUAD progression, we performed simultaneous immunofluorescence staining for ECs (CD31), fibroblasts (fibronectin) and immune cells (CD45) in normal tissues and tumor tissues from different stages (Fig. S5D and Table 3). An increase in the CD45+ population was observed as the tumor progressed, suggesting that immune cells contribute to LUAD growth. On the other hand, we also examined the gene expression of subclusters in LUAD and patient survival (Fig. S6). Correlations among seven cell subtypes, such as CD4 T cells and T follicular helper cells, may reflect differences underlying the histopathology. These findings suggest that stromal cells and immune cells in the lung TME harbor both tumor-promoting and tumor-suppressing activities, which may predict clinical outcome.
Characterization of epithelial cell lineages across different stages
Lung epithelial cells have been studied extensively due to their medically important role in lung cancer and various pulmonary diseases such as asthma and fibrosis 21. Here, we explored the intrinsic transcriptome of epithelial cells through comparative analysis between normal epithelium and tumor cells from different stages. We identified 15,984 epithelial cells and re-clustered them into 10 subclusters. (Fig. 3A-3C). Based on the expression of known markers, we found that the atlas mainly comprised of epithelial cells such as AT1 cells (AGER), AT2 cells (SFTPC), club cells (SCGB1A1), basal cells (Krt5), ciliated cells (FOXJ1) and AT2-like cells. As expected, normal epithelial cells mainly have five distinct subpopulations, including AT1, AT2 cells, club cells, basal cells and ciliated cells, expressing well-defined epithelial markers. Interestingly, the SFTPC gene (an AT2 cell marker) was highly expressed in non-malignant tissues and tumors at an early stage, but not in late-stage tumor tissues (Fig. 3B).
In tumor tissues, epithelial cells may contain residual non-malignant cells along with malignant tumor cells. To separate true tumor cells from a potential non-malignant population, we used inferCNV on scRNA-seq to identifying the malignant cells (Fig. 3D-3E). We used the cell types from adjacent lung tissues as a healthy reference to estimate the copy number variation (CNV) at different stages. Chromosomal amplification (red) and deletion (blue) were mapped to each chromosomal position (columns) across the single cells in Fig. 3E. We identified large-scale chromosomal CNVs in IA stage AT2-like cells, but did not observe any CNVs in other cell types.
To confirm the expression of identified epithelial cell marker genes, we performed immunofluorescent staining of different cell types to investigate the abundance and spatial localization of AT1 cells (AGER), ciliated cells (FOXJ1), club cells (SCGB1A1), basal cells (Krt5) and AT2 cells (SFTPC) (Fig. S7A and Table 3). Non-malignant human lung alveolar and bronchi tissues were double-stained with AGER and SFTPC antibodies. AT1 and AT2 cells were found to be preferentially localized in peripheral alveoli, while club cells, ciliated cells and basal cells were mainly distributed on the bronchial surface; these observations were consistent with a previous study 11. In LUAD tissues, expression of the SFTPC gene was observed in MIA tumors, but less than 45% of the AT2 cells expressed SFTPC compared to that of AAH (Fig. 3F). AT1, basal cells, ciliated cells and club cells were not detected in MIA. In IA tumors, SFTPC gene expression was almost undetectable, and the alveolar structure was not recognizable. Therefore, SFTPC gene expression was significantly decreased during histopathological progression. These changes were associated with the previously described loss of expression of the lung lineage-defining transcription factor Nkx2-1 as well as loss of the AT2 markers SFTPC 23. The results demonstrated that AT2 cells are the origin of adenocarcinomas and are defined as AT2-like cancer cells.
To further elucidate the differentiation process in LUAD, we constructed a lentiviral vector expressing EGFRVIII and Cre driven by the carbonic anhydrase II (CAII) promoter, which is expressed mainly in AT2 alveolar cells and was previously demonstrated to efficiently drive LUAD formation in mouse lungs 24. After transfection of lentiviral vectors into mice, non-malignant and tumor lung tissues were resected after 3 and 7 months, and immunofluorescence stained (Fig. S7B). We found that the SFTPC was evenly distributed in the alveoli of non-malignant lungs. SFTPC gene expression was decreased in tumor tissue at the 3rd month and unevenly distributed and expressed at very low levels in the 7th month (Fig. S7C). Therefore, the results further supported that AT2 cells give rise to LUAD in spontaneous EGFRVIII mice and in human.
Transcriptional trajectory analysis of AT2 cells
To identify the key molecular events governing the cell-fate transition during progression from normal AT2 cells to cancer cells, we selected cell clusters that closely resemble those of AT2 and AT2-like cancer cells, and then tracked the gene expression changes along the trajectory from non-malignant tissues to AAH, AIS, MIA and eventually IA. We performed a pseudotime analysis with Monocle2 and observed non-random expression patterns (Fig. 4A-4C). The transcriptional states in the trajectory revealed normal differentiation paths as well as progression-associated changes in tumors. Non-malignant cells and cells at early clinical stages (AAH to AIS) gathered on one end, while cells from late-stage tumor tissues (MIA or IA) tended to be on the other end (Fig. 4A).
We identified 283 differentially expressed genes that exhibited dynamic expression over the pseudotime axis (q value < 0.05) and classified them into 4 groups (group 1 to group 4). Then, we ordered these genes along the temporal pseudotime and reconstructed a diffusion map (Fig. 4B). The expression profile of group 1 showed that the self-renewing AT2 genes were relatively quiescent (with high level of WIF1 inhibiting WNT) 25, 26 and had a high percentage of stem-like cell transcription and differentiation genes (LAMP3 and MUC1) expressed 9, 27. By contrast, group 2 genes showed an expression pattern similar to the start of the dedifferentiation process, where lineage-specific genes such as negative regulation of the cell death response to stimulus (AQP5) 28, RNA biogenesis processing (RPL family) 29, 30, and mitochondrial factors (MT-ND4 and MT-ND2) showed increased expression 31. The expression pattern of group 3 genes was related to inflammatory response cytokine cell activation (FOSB, NFKB1), with visible EMT-related gene (vimentin) 32. Lastly, the genes in group 4 were involved in extracellular matrix organization (TIMP1) 33, cell-cell signaling and regulation of cell migration (S100A4, VEGFR) 34. Moreover, we also identified genes (MDK, SOX4 and LYZ) previously described during cancer evolution1, 35. Although each of these expression patterns emerged at a different specific time, they all persisted in tumors once they arose, such that more advanced tumors contained a greater assortment of cells with a higher diversity of states.
We then filtered out the genes expressed in fewer than 10% of total cells and performed gene regulatory network analysis on normal and AT2-like cells in order to examine the genetic interactions between the remaining 3,613 genes in the AT2 and AT2-like cell clusters (Fig. 4C). We found that genes upregulated in normal tissues are essential for lung function and homeostasis. In contrast, the genes upregulated in cancer are involved in metabolism, ribosomal activity, or MHC class II molecule expression, which suggests that these activities are essential during tumor progression. Notably, significantly downregulated genes in tumor ECs were related to immune activation, supporting a previous finding that tumor ECs suppress immune responses 11.
Loss of AT2 features and gain of stemness associate with cell transition
Multiple mouse studies have suggested stemness of AT2 cells, and the maintenance of this self-renewal activity is of great importance for cancer progression 36, 37. We focused on clusters closing to normal AT2 cells, distinct subcluster that emerged first in AAH lost some AT2 cell transcriptional identity, but retained features of the lung epithelial lineage. Then we compared with the expression of normal AT2 cells, observed AT2-like cell subset with a signature of stemness genes, which were present in tumors throughout LUAD progression (Fig. 4B-4C). The result suggested that LUAD evolution is characterized by a loss of the AT2 feature of the lung lineage and the emergence of an alternative dedifferentiated stem-like state. These results were also consistent with previous report that suggested dedifferentiation of committed epithelial cells into stem cells in multiple diseases, especially cancer38.
Epithelial cells transition to a mesenchymal state in tumor progression has been proposed 32. Transcriptionally, EMT is defined by downregulation of the epithelial marker E-cadherin and upregulation of the mesenchymal transcription factors vimentin. To investigate the prevalence of EMT in LUAD progression, we examined the expression level of vimentin, E-cadherin and FOXM1 by RNA fluorescence in situ hybridization (FISH) (Fig. 4E). Expression of the E-cadherin decreased in conjunction with increased vimentin. We also found higher protein expression of FOXM1, which is a pro-stemness transcription factor associated with tumor proliferation in kidney and ovarian cancers 39, 40. Our results suggest that FOXM1 could be a driver of dedifferentiation and proliferation in LUAD. These findings were further supported by tissue immunofluorescence and bioinformatics analysis of known epithelial and fibroblast markers. We next used indirect immunofluorescence staining to validate our scRNA-seq findings at the protein level. Ki67 is a protein that expresses in all phases of the cell cycle except for quiescent cells in G0. Interestingly, the staining intensity in our study showed high Ki67 expression (gray) in the early stages (AAH and AIS) and gradual decreases in expression as the tumor progressed (Fig. S8). This finding is of clinical importance, as the differences in Ki67 expression during LUAD progression indicate the cutoff values used for treatment decisions, which is consistent with our bioinformatic results and previous report 41. EMT (vimentin, red) and angiogenesis (VEGF, green) related genes also gradually increased with tumor progression, as indicated by immunostaining.
There is a plethora of role of Wnt in stem cell self-renewal or lineage specific differentiation in diverse tissues in vivo 25, 42. Recently, Wnt signaling was found to be amplified by engaging the leucine-rich repeat-containing G-protein-coupled receptor (Lgr) Lgr5, which is a marker for stem cells in multiple epithelial tissues and can drive lung adenoma progression in mouse models 25. Consistent with the known role of the WNT signaling pathway, Lgr5 was reported to activate two WNT mediators: GPX2 and OLFM4 (Fig. 4B), which were shown to be increased during tumor evolution 43. RNA FISH of four stage LUAD revealed an increase in the fraction of Lgr5 compared to normal tissues (Fig. S9), suggesting that the increased tumorigenic potential correlates with an increased stem-like signature. Differential expression of another stem-like gene, AQP5 gene is a compartment of WNT-driven invasive gastric cancer 28, observed in our results (Fig. 4B). Moreover, the expression of the stem-like genes IFI27 and S100A4 increased as the tumors progressed. Therefore, we speculate that AT2 cells undergo dedifferentiation to generate a stem-like state to initiate and maintain tumor progression.
Cell-cell crosstalk visualizing potential specific interactions in LUAD
Recent studies have demonstrated involvement of the specific hematopoietic stromal cell lineage and tumor epithelial cells in a cell-type-specific crosstalk-regulated network 44. We used CellPhoneDB to identify the expression of potential crosstalk signaling molecules based on ligand-receptor interactions 45. In Fig. 5A, we showed that both normal and tumor epithelial cell clusters expressed genes found within myeloid cells, epithelial cells (especially AT2/AT2-like), and fibroblasts, which suggested a possible interaction between epithelial and stromal cells. Fig. 5B displays the detailed receptor-ligand interaction pairs (basically gene pairs) for the cell types of indicated cell types. We focused on the gene pairs in the cell types that exhibited a strong interaction, and extracted the rows with all gene pairs (p value <0.001). As a result, the number of interacting cells was dramatically elevated in AT2-like cells, communicating with immune cells. AT2 cells expressed higher levels of LGALS9, the receptors of COLEC12 and MRC2 in DCs, granulocytes and macrophages. LGALS9 on AT2 cells has previously been found to promote immune suppression via T-cell and macrophage inhibition 46. By contrast, AT2-like cells express high levels of ANXA1, MDK, FN1 and CCR1. The expression pattern of the FN1–A4B1 (A4B7) ligand-receptor complex indicates the existence of functional interactions between AT2-like cells and immune cells. Here, we pinpointed that AT2 cell interacted specifically with the myeloid cell subset by LGALS9 receptor, but AT2-like cells tightly interacted with myeloid cells via ANXA1, FN1and MDK (Fig. 4C). The results suggested that these two types of ligand-receptor interactions could be another new immune checkpoint, as potential novel immunotherapy target for LUAD. On the other hand, endothelial cells and fibroblasts prominently expressed ligands such as FN1, CCL2 and CCL12, also could match immune related receptors.
Independently, we performed flow cytometry to measure the distribution of the two populations and the preferential expression of immune cells in AT2-like cells (Fig. 4D). There was elevated expression of CD45 in AT2-like cells (Epcam+CD45+ cells) as LUAD progressed. These results were consistent with our scRNA-seq data showing higher levels of expression of PRF1, GNLY, GZMA and GZMB genes in this subset, thus confirming that immune cells communicating tumor cells towards immune suppression in LUAD.
Quantitative detection of key biomarkers in plasma of different stage LUAD
Biomarkers that can effectively diagnose lung cancer in the early stages or confirm the presence of metastasis can guide clinical intervention and treatment. Multiple research groups have recently turned to blood-based biomarkers to detect lung cancer 47, 48. At present, accepted clinical blood tests for LUAD mainly based on carcinoembryonic antigen (CEA), but recent studies have shown that sensitivity for early-stage lung cancer is limited 49.In our dataset, genes related to energy metabolism and ribosome synthesis were upregulated in the early stage of LUAD, which may favor transcription and metabolism required for tumor survival and growth. Therefore, we performed quantitative detect of protein, miRNA and metabolite biomarkers associated with metabolism and ribosome synthesis genes for LUAD diagnosis. Among them, miRNA-10a has been speculated to control synthesis via stimulation of ribosomal protein mRNA translation and ribosome biogenesis 50. On the other hand, β-hydroxybutyric acid are likely to be produced by nearby or adjacent fibroblasts to provide energy to tumor cells, and has been detected in lung cancer patients 48, 51. Therefore, we carried out RT-PCR or ELSA to identify plasma miRNA-10a and β-hydroxybutyric acid levels to distinguish LUAD patients at different stages of disease and healthy controls (Fig. 6A and Table 4). While plasma CEA level showed no difference between specimens from healthy controls and patients with early-stage cancer (AIS and MIA), the levels of miRNA10 and β-hydroxybutyric acid were significantly higher in the plasma of the early-stage LUAD patients (P<0.01) (Fig. 6A). Additionally, miRNA10a levels were found to be significantly upregulated in all four clinical stages of NSCLC 52 (P=0.002). Based on this analysis, miRNA-10a and β-hydroxybutyric acid appeared to be appropriate plasma biomarkers for distinguishing patients with early-stage disease and healthy controls.
The changes in marker gene expression were also examined along pseudotime. Notably, TIMP1 and MDK (Fig. 3B) were highly expressed at the late clinical stage. Elevated levels of MDK, a product of lysine decarboxylation, were also identified as one of the most important features in discriminating stage III LUAD (Fig. 6A). Moreover, TIMP1 also stand out as a critical feature in stage III LUAD. Previously, MDK and TIMP1 have been reported to regulate metabolism in metastases by activating the PI3K/Akt pathway 33, 53 , which were also validated in our study by immunostaining (Fig. 6B). Taken together, these results highlighted the potential of new marker genes such as miRNA10 and β-hydroxybutyric acid to diagnose early-stage LUAD, and suggested MDK and TIMP1 as potential biomarkers to facilitate our understanding of LUAD pathogenesis.