Lung cancer remains the leading cause of cancer death, with over 1.6 million deaths annually, and its incidence is still increasing worldwide (Herbst et al. 2008; Herbst et al. 2018). More than 85% of lung cancer cases are diagnosed as non-small-cell lung cancer (NSCLC), with lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) being the two main histological subtypes. LUAD alone accounts for approximately 40% of NSCLC cases, resulting in over 500,000 deaths per year globally (Herbst et al. 2008; Herbst et al. 2018). Cigarette smoking remains the most important risk factor for lung cancer and is responsible for about 85–90% of all cases (Freedman et al. 2008; Herbst et al. 2018). A cumulative tobacco exposure of 10–20 pack-years is reported to be associated with a clinically relevant increase in morbidity (Neumann et al. 2013). In the context of lung cancer screening, a smoking history of ≥ 20 pack-years is one of the major criteria recommended by the National Comprehensive Cancer Network and the American Association for Thoracic Surgery (Boiselle 2013). Other emerging risk factors include second-hand smoke and air pollution (Oberg et al. 2011), such as PM2.5, which has been implicated in lung cancer in many developing countries (Guo et al. 2019; Khilnani, Tiwari 2018). Although NSCLC is strongly associated with smoking, LUAD is the subtype most commonly found in never-smokers (Herbst et al. 2018; Sun et al. 2007). Compelling evidence indicates that never- or light-smoker patients with LUAD have significantly better survival than smokers, suggesting that different levels of smoke exposure may act through distinct molecular mechanisms that underlie these clinical differences (Bryant, Cerfolio 2007; Casal-Mourino et al. 2019; Lofling et al. 2019).
Recent efforts to characterize molecular alterations in LUAD using high-throughput genome sequencing have led to comprehensive profiling of oncogenic driver mutations (Cancer Genome Atlas Research 2014; Weir et al. 2007). Besides EGFR mutations and ALK fusions, for which targeted therapies have become standard treatment in LUAD, several other driver genes, such as KRAS, TP53, ERBB2 and BRAF, are also altered in LUAD (Imielinski et al. 2012; Wu et al. 2015). As in-depth multi-omics studies progress, striking differences in molecular characteristics have been discovered between LUAD arising in never-smokers and in smokers. For example, LUAD patients with different levels of tobacco consumption show different mutation frequencies of the EGFR, TP53 and KRAS genes, with EGFR mutations occurring more frequently in never-smokers (Le Calvez et al. 2005; Sun et al. 2007). In addition, gene expression analyses identified distinct patterns of dysregulated genes in smokers with LUAD, and the associated altered pathways are particularly involved in the cellular immune response and cell cycle regulation (Landi et al. 2008; Liu et al. 2018). Moreover, epigenetic studies have demonstrated clear differences between the methylation profiles of LUAD in never-smokers and smokers (Alexandrov et al. 2016; Divine et al. 2005; Toyooka et al. 2006). To date, however, studies of other epigenetic features, such as open chromatin patterns associated with smoking-related LUAD progression, are still lacking. Unlike whole-genome or whole-exome sequencing, which identifies genetic risk variants, the study of open chromatin regions can offer insight into epigenetic and regulatory modifications and may thus reveal novel genes or pathways involved in the disease.
Recently, the assay for transposase-accessible chromatin using sequencing (ATAC-seq) has emerged as a powerful tool for profiling chromatin accessibility in different human diseases and has had a profound impact on our understanding of how gene expression is coordinated (Buenrostro et al. 2013; Liu et al. 2019). To date, only a few studies have explored open chromatin states in NSCLC with ATAC-seq. An elegant study by Corces et al. profiled chromatin accessibility in 410 tumor samples from The Cancer Genome Atlas (TCGA), including 38 cases of NSCLC (Corces et al. 2018). More recently, an integrative analysis that linked open chromatin variation to genomic alterations among NSCLC patients provided a comprehensive open chromatin landscape of NSCLC (Wang et al. 2019). However, little emphasis has yet been placed on linking clinical variables, such as cigarette smoking history, to open chromatin patterns in LUAD. In this study, we first generated a network based on correlations between peaks in TCGA ATAC-seq data. Using the peaks retained after filtering by the correlation network, we then studied differences between never or light smokers (< 20 pack-years) and heavy smokers (≥ 20 pack-years) among LUAD patients and further identified a set of genes and related pathways that are associated with patients’ progression-free survival (PFS) and overall survival (OS).