1. Single-cell sequencing of tumors and paired normal lung tissues. Single-cell sequencing was conducted using 11 specimens obtained from a total of 6 subjects. The inclusion criteria and exclusion criteria for the recruited patients are described in the Materials and Methods, and the clinical characteristics, chest CT and low-magnification slide images of individual patients are shown in Table 1 and Fig. 1A, respectively. The QC parameters for single-cell sequencing are shown in Table E1. After the filtering process, the number of cells obtained from normal lung tissue was 53,705 (55.9%) and that from tumor tissue was 42,371 (44.1%). A total of 96,076 cells were initially divided into 7 major cell groups through dimensional reduction and classification, and each cluster was annotated by comparing markers representing each cluster obtained from Seurat's FindConservedMarkers function and known canonical markers of lung cells [7–9] (Fig. 1B-C, E1). In the obtained epithelial cell cluster, barcodes of lung cancer cells were secured using the InferCNV package, and then the remaining cells were annotated as epithelial cells. For refinement of lung cancer cells, clusters with epithelial features were obtained from each tumor tissue, and then the cells that had distinct genetic aberrations were defined as lung cancer cells by InferCNV using cells obtained from normal lung tissue and nonmalignant cells obtained from tumor tissue of the same patient as references (Fig. E2, E3) [10, 11]. T cells occupied the largest proportion among the obtained cell clusters, followed by epithelial cells, including cancer cells and myeloid cells (Fig. 1D-G).
2. Lung cancer cells are located close to terminally differentiated lung epithelial cell clusters. Lung epithelial cells were further characterized after subclustering after removal of the cancer cells from all epithelial cells of merged tumor tissues and normal lung tissues (Fig. 2A-C). Among the epithelial cells, the prominent cell type was ciliated bronchial epithelial cells, characterized by overexpression of CAPS, C9orf24, and C20orf85 (Fig. 2D i-ii), followed by secretory club cells, characterized by expression of SCGB1A1, SCGB3A1, and BPIFB1. The respiratory bronchioles and alveoli are mainly composed of terminally differentiated cell groups: type I alveolar cells and type II alveolar cells. Type I alveolar cells showed overexpression of unique genes such as AGER, CAV1, and RTKN2, whereas type II alveolar cells showed cell fractions that were similar to those in type I alveolar cells and characterized by overexpression of surfactant family proteins such as SFTPC, SFTPA1, and SFTPA2. Interestingly, a few cells with neuroendocrine features overexpressing unique genes, such as GRP, CALCA, and CPE, were observed and were presumed to be pulmonary neuroendocrine cells. When it was further evaluated by immunohistochemical (IHC) staining, a few flask-shaped cells strongly stained with GRP and CALCA adjacent to the basement membrane of the bronchial epithelium could be observed (Fig. 2D iii-vi).
In our UMAP model, lung cancer cells clustered in the center of ciliated cells, club cells, alveolar type I cells and alveolar type II cells (Fig. 2A-B). The top 15 most differentially expressed genes (DEGs) between lung cancer cells and normal lung epithelial cells are shown in Fig. 2E, and the entire gene set is shown in Table E2. These DEGs were enriched for secretory vesicle and surfactant pathways, suggesting that the origin of early lung carcinogenesis is closely related to type II alveolar cells (Fig. 2F). Lung cancer cells share many markers with type II alveolar cells and fewer markers with type I and ciliated bronchial cells. Among the top DEGs, IHC staining was performed by selecting those that did not overlap with the common marker of type II alveolar cells (Fig. 2G). SPINK1 prevents trypsin-catalyzed premature activation of zymogens, and LPCAT1 converts lysophosphatidylcholine to phosphatidylcholine in the presence of acyl-CoA. Both were specifically stained in the cancer cells at the border between normal and cancer tissues and inside the tumor. Lung cancer cells also overexpress the CEACAM6 and CEACAM5 surface glycoproteins, which play a role in intercellular adhesion in a calcium- and fibronectin-independent manner [12]. CEACAM6 staining was not present on the components of normal lung tissues, but it was strongly positive in lung cancer cells regardless of histologic subtype, suggesting its possible involvement in early tumorigenesis. When DEGs obtained by comparing lung cancer cells with normal lung epithelial cells were analyzed using KEGG (https://www.genome.jp/kegg/pathway.html), significant enrichment of ERBB signaling pathways and apoptotic pathways was observed (Fig. E4).
When the lung cancer cells were projected onto the trajectories of the normal lung epithelial cell group, they traced either on type I alveolar cells or type II alveolar cells. Club cells were observed in all processes according to pseudotime of differentiation between bronchial cells and alveolar cells, whereas neuroendocrine cells were observed at the branching time point from ciliated cells to alveolar cells (Fig. 2H-J). When representative genes of major lung epithelial cell clusters were aligned according to pseudotime, concurrency was observed between the genes related to surfactant homeostasis, such as SFTPC, and cancer-specific genes, such as CEACAM6 (Fig. 2K), suggesting that the surfactant-related pathway is strongly involved in early lung carcinogenesis.
3. Depletion of CD8 + TCs and γδ TCs and enrichment of Tregs and BCs in the early tumor microenvironment. TCs are a group of cells with a very heterogeneous distribution between tumor and normal lung tissues. All TC subtypes, except Tregs, were less frequently detected in tumor tissues than in normal lung tissues (Fig. 3A-D). Immune fatigue markers showed different expression patterns according to TC type and associated cluster; CTLA4 expression was commonly observed in CD4 + TCs and Tregs, whereas LAG3 expression was commonly observed in CD8 + TCs, and HAVCR2, CD244, and CD160 expression was commonly observed in γδ TCs and natural killer (NK) cells (Fig. 3E). Compared to normal lung tissues, tumor tissues had an overwhelming number of Tregs; these Tregs had distinct expression of CTLA4 and TIGIT, showing an exhausted phenotype (Fig. 3E). In the trajectory analysis with CD4 + TCs, we identified 3 pseudotime-dependent subclusters (Fig. 3F). When the CD4 + TC clusters were aligned according to the pseudotime designating the naïve CD4 + TCs as the root state, Tregs were located in the last stage of differentiation and possessed immune fatigue phenotypes (Fig. 3G). In tumor tissue, Tregs differentiated from naïve CD4 + TCs and fully developed at the end of the trajectory as an independent cluster, whereas those in normal tissue showed no significant flow or scant numbers (Fig. 3H). The trajectory analysis of cytotoxic TC clusters, which included cytotoxic CD8 + TCs, γδ TCs, and NKT cells, set CD8 + TCs as the root state and showed gradual and even distribution of the components throughout the pseudotime (Fig. 3I). The tumor tissue trajectory analysis of these cytotoxic cells showed small groups of terminally differentiated γδ TCs from cytotoxic CD8 + TCs and NKT cells.
BCs were more strongly enriched in tumor tissues than in normal lung tissue. In this study, the 940 BCs analyzed comprised 2 subtypes of BCs (follicular and mucosa-associated lymphoid tissue (MALT) BCs). Of these BCs, 77.8 % were found in tumor tissues, making them the most prominent cells enriched in tumor tissues among the immune cells found in the lung tissues (Fig. 3J). Although follicular BCs showed a relatively high proportion compared to the MALT BCs, their degree of elevation in the proportion of follicular BCs in tumors compared to normal lung tissues was similar to that of MALT BCs (Fig. 3K). The BC chemoattractant CXCL13, secreted from cancer cells, follicular dendritic cells (DCs), and T follicular helper cells, may be related to BC influx in tumor tissues [13, 14]. When the expression of CXCL13 was investigated in cancer stromal cells, the majority of its expression was detected in CD4 + TC cells located in tumors, especially in Treg and CD4 memory cells (Fig. 3L, Fig. E5A).
4. Myeloid cells show an immunosuppressive immature phenotype in the tumor microenvironment. In this analysis, 20,345 myeloid cells were recovered and grouped into eight clusters (Fig. 4A-D). Anti-inflammatory alveolar macrophages were significantly decreased in the tumor tissue compared with the normal lung tissue. In UMAP, tissue-infiltrating macrophages (TIMs) were located adjacent to anti-inflammatory alveolar macrophages and on the opposite side of pro-inflammatory monocyte-derived macrophages. When TIMs inside tumors, tumor-associated macrophages (TAMs), in other words, were compared to macrophages in normal lung tissues, they were enriched in genes related to leukocyte chemotaxis (Fig. E5B).
Myeloid-derived suppressor cells (MDSCs) are of myeloid origin, harbor an immunosuppressive function, and reside in a cancer-related context [15]. Along with CD11b+, we adopted CD84 + as an additional marker, which has been shown to improve the detection of MDSCs through single-cell RNA sequencing analysis of human tissue [16]. CD84 + CD11b + myeloid cells were a rare heterogeneous population scattered in macrophage and DC clusters (Fig. E5C). Macrophages overexpressing the CD84 and CD11b genes were more scattered in tumor tissues than in normal tissues. (Fig. 4E). When the enriched genes were subjected to GO analysis with ToppGene (https://toppgene.cchmc.org/) and Enrichr (https://maayanlab.cloud/Enrichr/), the genes enriched in the CD84 + CD11b + myeloid cells from the tumor tissues were related to the extracellular matrix, pulmonary fibrosis and IL-10 signaling (Fig. 4F-G, Table E3), indicating that CD84 + CD11b + myeloid cells inside the tumor produce tumor-promoting cytokines (IL-10) and make the tumor microenvironment more permissive for tumor progression through alteration of extracellular matrix composition. On the other hand, the CD84 + CD11b + myeloid cells in normal lung tissue showed enrichment of genes related to host protection from infection and inflammation (Fig. 4H, Table E4). DCs clustered into four groups, and among them, IDO1, which inhibits T-cell proliferation by degrading tryptophan, was more enriched in activated DCs in tumor tissues than in normal lung tissues (Fig. 4I). Mast cells were separated from myeloid cells and formed a unique and very homogenous population, so there were no significant DEGs between individual subclusters (Fig. 4J).
5. CAFs are associated with the disruption of normal vascular structures. Fibroblasts (FBs) were the second most enriched cell type in tumor tissues than in normal lung tissue in this GGN-type early lung cancer. FBs were classified into four subtypes: matrix FBs, COL1A1 + FBs, myofibroblasts (myo FBs) and CAFs (Fig. 5A-C). Among them, the unique fibroblast group called CAFs showed higher expression of HIGD1B, COX4I2, and RGS5 than other FB clusters. The CAF clusters showed significant overexpression of hypoxia-related genes compared to other FB clusters (Fig. 5D). When the genes enriched in these clusters were subjected to GO analysis with Enrichr (https://maayanlab.cloud/Enrichr/), gene sets related to negative regulation of protein kinase activity and negative regulation of endothelial cell proliferation were enriched, and those associated with extracellular matrix organization were enriched (Fig. 5E, Table E5).
Not irrelevant to this finding, the endothelial cell (EC) cluster was adjacent to the FB cluster, and fewer cells were found in EC clusters than in FB clusters in tumor tissue (Fig. 5F-H). Interestingly, in tumor tissues, an increased number of undifferentiated EC clusters located in the middle of stalk-like and tip-like ECs were observed in UMAP, with a strikingly decreased number of tip-like ECs. These cells were characterized by RGCC, IL7R, and FCN3 expression (Fig. 5I), and when compared with the other EC clusters, this cluster showed enrichment of gene sets related to cellular locomotion and motility as well as gene sets related to vascular development and tube morphogenesis (Table E6). When trajectory analysis was conducted to further confirm their characteristics, it was shown that their location between the stalk-like EC and the tip-like EC clusters was closer to the stalk-like EC cluster (Fig. 5J). Taken together, these findings suggest that there is an undifferentiated vascular cell cluster between stalk-like ECs and tip-like ECs and that CAFs inhibit EC differentiation into tip-like ECs.