Participant characteristics
In this case-control study, adult patients (age >19 years) newly diagnosed with oral squamous cell carcinoma were enrolled from the National Cancer Center, Korea, and Seoul National University Dental Hospital, covering various oral-cavity regions. Healthy controls were recruited from the cancer-screening cohort of the National Cancer Center, Korea. The study included 1,022 participants. The discovery and validation datasets included 637 participants (104 with oral cancer and 533 controls) and 385 participants (53 with oral cancer and 332 controls), respectively. Ethical approval was obtained from the National Cancer Center, Korea (IRB approval numbers NCC2019-0050, NCC2019-0116, and CRI15017), and written informed consent was obtained from all participants. All participants were interviewed using a structured questionnaire to assess their sociodemographic characteristics and then underwent physical examinations.
Saliva and blood sample collection
Baseline saliva samples from participants were collected after a 1 h fasting period and stored in 1.5 mL tubes at -80 °C. Blood samples were drawn from their antecubital veins into BD Vacutainer K2 EDTA tubes after a 12 h fast and centrifuged at 3,000 rpm for 20 min at 4 °C. The resulting plasma, buffy coat, and red blood cell samples were stored at -80 °C.
Oral microbiome characterization based on 16S rRNA gene amplification and sequencing
Microbial DNA was extracted from saliva samples using a Fast DNA Spin Kit (MP Biomedicals, CA, USA). DNA quality and quantity were checked using a Qubit dsDNA BR Kit and a fluorometer (Life Technologies, CA, USA). Primers 341F and 805R (Bionics Cosmogenetech, Seoul, Korea; Supplementary Table 9) were used to amplify the V4 region of the 16S rRNA gene. Polymerase chain reaction products were purified from 2% agarose gels and secondarily amplified with Illumina NexTera barcodes using primers from Bionics Cosmogenetech (Supplementary Table 9). Amplicons were pooled at ChunLab (South Korea), and DNA was isolated and sequenced using the Illumina iSeq100 platform (Illumina Inc., CA, USA) at the National Cancer Center, Korea. Bacteria were classified based on taxonomic data provided by EzBioCloud 18. Poor-quality sequence reads of <80 base pairs (bp) or >2,000 bp were excluded. Taxonomic analysis was performed using the USEARCH tool. The UPARSE algorithm was used to classify the reads into operational taxonomic units (OTUs) with 97% similarity. Single-end reads were clustered into OTUs using UCLUST and the cut-off numbers.
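The read-length exclusion rule stated above can be illustrated with a minimal Python sketch. It is not the EzBioCloud or USEARCH implementation; the function name, the (identifier, sequence) input format, and the example reads are hypothetical, and only the 80 bp and 2,000 bp thresholds come from the text.

```python
from typing import Iterable, List, Tuple

MIN_LEN = 80     # reads shorter than 80 bp are excluded (threshold from the text)
MAX_LEN = 2000   # reads longer than 2,000 bp are excluded (threshold from the text)


def filter_reads_by_length(
    reads: Iterable[Tuple[str, str]],
    min_len: int = MIN_LEN,
    max_len: int = MAX_LEN,
) -> List[Tuple[str, str]]:
    """Keep (read_id, sequence) pairs whose length lies in [min_len, max_len]."""
    return [(rid, seq) for rid, seq in reads if min_len <= len(seq) <= max_len]


# Hypothetical reads chosen to sit on either side of each threshold.
example_reads = [
    ("r1", "A" * 79),    # too short, excluded
    ("r2", "A" * 80),    # kept (boundary)
    ("r3", "A" * 2000),  # kept (boundary)
    ("r4", "A" * 2001),  # too long, excluded
]
kept_ids = [rid for rid, _ in filter_reads_by_length(example_reads)]
print(kept_ids)
```

Reads of exactly 80 bp or 2,000 bp are retained here because the text excludes only reads strictly below or above those lengths.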
Functional homology inferences: predicting orthologs
The functional profile of the oral microbiome was constructed using the PICRUSt algorithm with the microbiome taxonomic profiling (MTP) pipeline of EzBioCloud 18. Sequencing reads were obtained using the EzBioCloud 16S MTP pipeline and matched to reference database entries. Functional profiles were annotated using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database by multiplying the gene counts per OTU by the OTU abundance in each sample. The accuracy of each functional profile was analyzed using the nearest-sequenced taxon index.
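The multiplication step described above (gene counts per OTU times OTU abundance per sample) can be sketched in Python as follows. This is a simplified illustration, not the PICRUSt code itself: the function name, the dictionary layout, and the OTU and KEGG ortholog identifiers are hypothetical, and the 16S copy-number normalization that PICRUSt applies before this step is omitted.

```python
from collections import defaultdict
from typing import Dict


def predict_function_abundance(
    otu_abundance: Dict[str, float],
    gene_counts_per_otu: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Sum, over OTUs, (gene copies per OTU) x (OTU abundance) for one sample."""
    profile: Dict[str, float] = defaultdict(float)
    for otu, abundance in otu_abundance.items():
        # OTUs without a predicted gene content contribute nothing.
        for ortholog, copies in gene_counts_per_otu.get(otu, {}).items():
            profile[ortholog] += copies * abundance
    return dict(profile)


# Hypothetical predicted gene content and OTU counts for a single sample.
gene_counts = {
    "OTU_1": {"K00001": 2, "K00002": 1},
    "OTU_2": {"K00001": 1},
}
sample = {"OTU_1": 10, "OTU_2": 5}

result = predict_function_abundance(sample, gene_counts)
# K00001: 2*10 + 1*5 = 25; K00002: 1*10 = 10
print(result)
```

Across many samples the same calculation is a matrix product between the sample-by-OTU abundance table and the OTU-by-ortholog gene-count table.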
Cell lines
Oral cancer cell lines (CAL-27 and SCC-1) and a normal cell line (HGF-1) (HyClone Laboratories, UT, USA) were maintained in Dulbecco’s modified Eagle’s medium. The YD-10B oral cancer cell line (HyClone Laboratories) was maintained in RPMI 1640. Both media were supplemented with 10% fetal bovine serum, 100 U/mL penicillin, and 100 μg/mL streptomycin, and the cells were incubated in 5% carbon dioxide at 37 °C.
Small-interfering RNA (siRNA) experiments
A negative-control siRNA and an siRNA targeting CPT1A mRNA were purchased from Genolution, Inc. (Seoul, South Korea). The siRNAs had the following sequences: siControl: 5′-CUCGUGCCGUUCCAUCAGGUAGUU-3′; siCPT1A: 5′-GACGUUAGAUGAAACUGAAUU-3′.
3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) assay
We performed MTT assays to determine the viabilities of HGF-1, YD-10B, CAL-27, and SCC-1 cells. The cells were plated in 96-well plates, grown for 24 h, and treated with dimethyl sulfoxide (vehicle control) or reverse-transfected with siCPT1A for 48 h. MTT solution (5 mg/mL) was then added to the cells, and the cells were incubated for 6 h at 37 °C. Formazan pellets were dissolved in 2-propanol, and absorbances were measured at 540 and 650 nm using a VersaMax Microplate Reader (Molecular Devices, CA, USA).
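The text does not state how the two absorbance readings were converted into viability. A common convention, assumed here for illustration only, is to subtract the 650 nm reference reading from the 540 nm reading and express the treated mean as a percentage of the vehicle-control mean. The function names and well values below are hypothetical.

```python
from statistics import mean
from typing import List, Tuple

Well = Tuple[float, float]  # (absorbance at 540 nm, absorbance at 650 nm)


def corrected_absorbance(a540: float, a650: float) -> float:
    """Assumed correction: subtract the 650 nm reference from the 540 nm signal."""
    return a540 - a650


def percent_viability(treated: List[Well], control: List[Well]) -> float:
    """Treated mean corrected absorbance as a percentage of the control mean."""
    control_mean = mean(corrected_absorbance(a540, a650) for a540, a650 in control)
    treated_mean = mean(corrected_absorbance(a540, a650) for a540, a650 in treated)
    if control_mean <= 0:
        raise ValueError("control signal must be positive after correction")
    return 100.0 * treated_mean / control_mean


# Hypothetical replicate wells for one cell line.
control_wells = [(1.10, 0.10), (1.10, 0.10)]  # vehicle control
treated_wells = [(0.60, 0.10), (0.60, 0.10)]  # siCPT1A

viability = percent_viability(treated_wells, control_wells)
print(round(viability, 1))
```

In this made-up example the treated wells carry about half the corrected signal of the control wells, so the computed viability is about 50%.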
Western blotting
HGF-1, YD-10B, CAL-27, and SCC-1 cells were reverse-transfected with siCPT1A for 48 h. Subsequently, the cells were harvested in ice-cold RIPA lysis buffer (R0278; Sigma Aldrich, South Korea) containing protease and phosphatase inhibitors. Soluble lysate was isolated from each sample via centrifugation and quantified using the BCA Protein Assay Kit (Pierce, Thermo Fisher Scientific, MA, USA). Proteins were resolved using sodium dodecyl-sulfate polyacrylamide gel electrophoresis and transferred to polyvinylidene difluoride membranes. The membranes were blocked with 5% skim milk and probed with a primary antibody against CPT1A (ab128568, Abcam, Cambridge, England) and a secondary horseradish peroxidase-conjugated anti-mouse antibody (A90–116P; Bethyl Laboratories, TX, USA).
Immunohistochemistry (IHC) analysis of CPT1A and CD4+
To assess CPT1A expression, 2 mm core biopsies from control and tumor paraffin blocks were sliced into 4 μm sections and dried at 56 °C for 1 h. IHC was performed using the Discovery XT platform (Ventana Micro Systems Inc., CA, USA), a Chromomap DAB Detection Kit (Roche Diagnostics, Basel, Switzerland), and a CPT1A antibody (diluted 1:200). The stained sections were imaged using a Vectra Polaris imaging system and quantified using inForm software. CPT1A expression was calculated as the average H-score, which ranges from 0 to 300.
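The H-score is conventionally computed as 1 × (% weakly stained cells) + 2 × (% moderately stained cells) + 3 × (% strongly stained cells), which gives the 0 to 300 range. The Python sketch below assumes this standard definition; the text does not give the intensity bins used by inForm, and the function names and core values are hypothetical.

```python
from typing import List, Tuple


def h_score(pct_weak: float, pct_moderate: float, pct_strong: float) -> float:
    """Standard H-score: 1*weak% + 2*moderate% + 3*strong% (range 0 to 300)."""
    percentages = (pct_weak, pct_moderate, pct_strong)
    if min(percentages) < 0 or sum(percentages) > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    return 1 * pct_weak + 2 * pct_moderate + 3 * pct_strong


def average_h_score(cores: List[Tuple[float, float, float]]) -> float:
    """Mean H-score over the cores of one specimen."""
    return sum(h_score(*core) for core in cores) / len(cores)


# Hypothetical cores: (% weak, % moderate, % strong) positive cells.
cores = [(20, 30, 10), (0, 0, 100)]
scores = [h_score(*core) for core in cores]  # 110 and 300
avg = average_h_score(cores)                 # (110 + 300) / 2 = 205.0
print(scores, avg)
```

Unstained cells carry a weight of zero, so a core with no positive cells scores 0 and a core with 100% strongly stained cells scores 300.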
CD4+ tissue samples were prepared as 4 μm-thick sections using a microtome, deparaffinized, and rehydrated. Antigens were retrieved using Tris-EDTA and sodium citrate buffer (pH 6.0). After blocking peroxidases with 3% hydrogen peroxide, the samples were stained and scanned by PrismCDX Co., Ltd. (Gyeonggi-do, Korea), as per clinical protocols (SI1).
Measuring oral microbial signals, including cytokine levels (OXSR1, CPT1A, SCFAs, IL-6, and TNF-α)
Plasma oxidative stress was evaluated by performing enzyme-linked immunosorbent assays (ELISAs) using the OXSR1 ELISA Kit (abx382011; Abbexa Ltd., Cambridge, UK). Briefly, saliva samples were dispensed into 96-well plates and incubated at 37 °C. Detection reagents A and B were added to the plates, and the plates were further incubated for 1 h at 37 °C. TMB substrate (90 µL) was added to each plate, followed by 50 µL of stop solution. Optical densities were measured at 450 nm using a microplate reader (SPECTROstar, BMG LabTech, Ortenberg, Germany). Total human SCFAs in each saliva sample were measured using an SCFA ELISA Kit (MBS7269061; MyBioSource, CA, USA), which has a sensitivity of 0.92 pg/mL.
Plasma CPT1A levels were measured using a CPT1A ELISA Kit (MBS724213; MyBioSource), which has a minimum detectable concentration of 0.1 ng/mL. Plasma IL-6 (catalog number BMS213-2; Thermo Fisher Scientific) and TNF-α (catalog number BMS223-4) levels were quantified using ELISA kits. The minimum detectable concentrations of the kits were 0.92 pg/mL for IL-6 and 2.3 pg/mL for TNF-α.
Statistical analysis
Python software (version 3.7.15) and the H2O Python module (version 3.38.0.2) (https://github.com/h2oai/h2o-3) were used for cancer diagnosis and biomarker identification. The gradient-based one-side sampling method was utilized within the light gradient-boosting machine (LightGBM) model, and optimization was performed based on various metrics (accuracy, area under the receiver operating characteristic [ROC] curve, F1, precision, and recall). R software (version 4.1.1) was used for analysis and visualization, and t-tests and chi-square tests were performed to compare the observed traits. Alpha diversity was assessed using the number of observed OTUs and the Chao index. Beta diversity was determined using principal coordinate analysis, employing both weighted and unweighted UniFrac analyses. Statistical significance was assessed based on quartiles, Wilcoxon’s rank-sum test, fold-changes, and logistic regression. Linear-discriminant analysis effect size analysis was conducted to determine genus-level microbial differences.
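The two alpha-diversity measures named above can be illustrated with a short Python sketch. It assumes the bias-corrected Chao1 estimator, S_obs + F1(F1 − 1) / (2(F2 + 1)), where F1 and F2 are the numbers of OTUs observed exactly once and twice. The text does not specify which Chao variant or software routine was used, and the function names and counts are hypothetical.

```python
from typing import Sequence


def observed_otus(counts: Sequence[int]) -> int:
    """Number of OTUs with at least one read in the sample."""
    return sum(1 for c in counts if c > 0)


def chao1(counts: Sequence[int]) -> float:
    """Bias-corrected Chao1 richness: S_obs + F1*(F1 - 1) / (2*(F2 + 1))."""
    s_obs = observed_otus(counts)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical OTU read counts for one saliva sample.
counts = [10, 1, 1, 1, 2, 0, 5]
richness = observed_otus(counts)  # 6 OTUs with non-zero counts
estimate = chao1(counts)          # 6 + 3*2 / (2*2) = 7.5
print(richness, estimate)
```

The bias-corrected form is used here because it remains defined when a sample contains no doubletons, whereas the classic form, S_obs + F1² / (2·F2), does not.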