The overview of our method for investigating hookah smoking and its constituents are described in Fig. 1, the key steps summarized in Fig. 1A. To explore the effect of hookah smoking and its constituents, we designed cell-based experiments for hookah exposure in two heating ways and tobacco or herbal shisha, and all of the 30 samples were performed RNA sequencing to generate gene expression profiles and have been deposited to Gene Expression Omnibus10 (GEO accession GSE162386) (Fig. 1B). Cigarette smoking datasets were collected from public omics databases (Fig. 1C). Then, we identified DEGs from our hookah-smoking gene expression profiles and public cigarette smoking datasets, and evaluated pathway enrichment based on the hypergeometric distribution (Figs. 1D and 1E). Next, the p-values were transformed into z-scores to quantify KEGG pathways (Fig. 1F). Finally, the meta-z-scores of subsystem- and system-levels were calculated from pathway-level z-scores to quantify subsystems and systems (Fig. 1G).
Hookah exposure experimental design
To study the harmful effects of hookah smoking on cells, we designed cell-based experiments for hookah smoking exposure with various constituents and treatment times (Fig. 1B; Table 1). The bronchial epithelial cells (BEAS-2B) were cultured and exposed with and without tobacco in two heating ways of hookah smoking, including internal control (IC), charcoal with no shisha (C), charcoal with non-tobacco (herbal) shisha (CNT), charcoal with tobacco shisha (CT), and electronic heater with tobacco shisha (ET) for 30 min while on a rocker. The exposure concentration was measured as the highest that caused no toxicity at 24 hours, as assayed by LDH release. For each exposure condition, three wells were treated with Trizol at 2 and 24 hours after exposure. RNA extraction was performed by the RNeasy Plus Mini Kit, Qiagen, according to the manufacturer's instructions.
Table 1
No. of samples of our in-house datasets.
In-house datasets (GSE162386) |
Groups | After exposure 2hr | After exposure 24hr |
Control (IC) | 3 | 3 |
Charcoal (C) | 3 | 3 |
Charcoal + non-tobacco (CNT) | 3 | 3 |
Charcoal + tobacco (CT) | 3 | 3 |
Electronic + tobacco (ET) | 3 | 3 |
Next generation sequencing
All of the 30 samples were performed RNA-seq to analyze whole-genome transcriptome using Illumina HiSeq 2500 platform with 50 bp single-end reads (Illumina, USA). For each sample, the reads were aligned with the human reference genome (GRCh38) using the HISAT2 package11 and the mapped reads were assembled into transcripts using StringTie12. The average mapping rate of reads was 90%. Next, the expression profiles of all transcripts were estimated and calculated based on fragments per kilobase of transcript per million fragments mapped (FPKM) by StringTie and Ballgown package13 in R. Expression profiles have been deposited to GEO (https://www.ncbi.nlm.nih.gov/geo/; accession no. GSE162386).
Cigarette smoking-related public dataset
For comparing the mechanisms between cigarette and hookah smoking, we collected three cigarette-related microarray datasets, GSE63127, E-MTAB-4742, E-MTAB-4740, from GEO and ArrayExpress databases10,14 (Fig. 1C; Table 2), all of which used a Affymetrix Human Genome U133 Plus 2.0 platform. GSE63127 includes 143 non-smoking (NS) and 87 smoking (S) small airway epithelium samples (Cig1); E-MTAB-4742 contained 47 non-smoking and 48 smoking oral tissues (Cig2); E-MTAB-4740 is composed of 52 non-smoking and 54 smoking nasal tissues (Cig3).
Table 2
No. of samples of public datasets used in this study.
Public datasets |
Dataset | Non-smoking | Smoking |
GSE63127 (Cig1) | 143 | 87 |
E-MTAB-4740 (Cig2) | 47 | 48 |
E-MTAB-4740 (Cig3) | 87 | 54 |
Hierarchical-systems biology model
For omics data analysis, we first identified the differentially expressed genes (DEGs) in each compared group with |log2FC| > 0.58 and p-value < 0.05 by using the limma package15 (see Supplementary Table S1 online). Based on our identified DEGs, the hypergeometric distribution was used for pathway enrichment analysis and pathways with p-value < 0.05 were considered significant16,17 (Fig. 1D). Here, the p-value is calculated as (1):
where N is the number of genes recorded in KEGG, M is the number of DEGs, n is the number of genes in the specific KEGG pathway, and i is the number of genes that are DEGs in the specific KEGG pathway. The p-value of each pathway was transformed into the z-score using the standard normal distribution (Fig. 2). We then computed the subsystem- and system-level meta-z-scores for 45 KEGG subsystems and 6 systems (Figs. 1E and 1F), such as metabolism, genetic information processing, and human diseases, based on our previous study9. The meta z-score is defined as (2):
where zi is the z-score of the pathway i, and n is the total number of pathways in a subsystem or system. The meta-z-score reflects the significance (enrichment) of a subsystem or system for a specific condition or disorder, such as hookah smoking.