Patients and samples
A total of 317 consecutive patients treated in the Department of Breast Surgery at Fudan University Shanghai Cancer Center (FUSCC) between January 1, 2008 and December 31, 2015 were enrolled in this prospective observational study, which was initiated in 2017. The inclusion criteria were as follows: (1) female patients with unilateral breast tumors, histologically confirmed invasive ductal carcinoma (IDC) and the triple-negative subtype (ER-, PR- and HER2-); (2) tumor size < 5 cm and no lymph-node or distant metastasis confirmed by pathology; (3) no treatment of any type prior to surgery; (4) follow-up time > 1 year; (5) sufficient frozen tissue for RNA purification.
ER, PR and HER2 status, measured by immunohistochemistry (IHC) and fluorescence in situ hybridization (FISH), was confirmed by two experienced pathologists (Ruo-Hong Shui and Wen-Tao Yang), and the cut-off points were set according to the guidelines of the American Society of Clinical Oncology and the College of American Pathologists [15]. The pathologic stage was assigned based on the 8th edition of the AJCC anatomic staging system [16]. All breast cancer specimens were confirmed to contain more than 80% tumor cells. A total of 330 frozen samples (including 13 paired normal tissues) from 317 TNBC patients, preserved by the Department of Pathology at FUSCC (Shanghai, P.R. China), were examined.
To identify candidate mRNAs from high-throughput transcriptome data, we enrolled another 386 consecutive patients who underwent surgical treatment in the Department of Breast Surgery at FUSCC between 1 January 2007 and 31 December 2014 and for whom Affymetrix GeneChip Human Transcriptome Array 2.0 (HTA 2.0) data (n=141) or RNA-seq data (n=245) were available from our previous studies [17,18]. Using the same criteria as described above (except for sufficient frozen tissues), 189 patients (50 of whom had paired adjacent normal tissues) were enrolled as the developing set in this study. The follow-up was completed on 30 June 2017, and the median follow-up length was 47.6 months (interquartile range, 38.5-59.3 months). Seventy-nine patients overlapped between the developing set and the 317 patients described above.
To train and validate a multi-gene signature, we randomly assigned the 317 patients to a training set (n=159) and a validation set (n=158). The follow-up was completed on 31 December 2017, and the median follow-up length was 39.3 months (interquartile range, 25.6-55.2 months) in the training set and 38.5 months (interquartile range, 26.8-52.7 months) in the validation set. RFS events were defined as the first recurrence of invasive disease at a local, regional, or distant site; contralateral breast cancer; or death from any cause. Patients without events were censored at the last follow-up.
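The random assignment described above can be illustrated with a short sketch. The source does not state how the randomization was carried out, so the shuffling procedure, the seed, and the patient identifiers below are assumptions made purely for illustration.

```python
import random


def split_patients(patient_ids, n_training, seed=0):
    """Shuffle patient IDs reproducibly and split them into two sets.

    The seed and the shuffle-then-slice approach are illustrative
    assumptions, not the procedure reported in the study.
    """
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_training], shuffled[n_training:]


# Hypothetical identifiers for the 317 enrolled patients.
patient_ids = [f"P{i:03d}" for i in range(1, 318)]
training, validation = split_patients(patient_ids, n_training=159)
print(len(training), len(validation))  # 159 158
```

Fixing the seed makes the split reproducible, which matters when the same partition must be reused for training and validation.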
The independent ethics committee/institutional review board of FUSCC (Shanghai Cancer Center Ethics Committee) approved our study and written informed consent from patients in our study was obtained before enrollment.
RNA purification, reverse transcription and RT-qPCR
Total RNA was isolated from frozen tissues and cells using the RNeasy Plus Mini Kit (QIAGEN) and reverse transcribed to cDNA using the GoScript Reverse Transcription Kit (Promega) according to the manufacturer’s protocol. Real-time quantitative polymerase chain reaction (RT-qPCR) was performed using SYBR Premix Ex Taq (TAKARA) on a QuantStudio 7 Flex System (Applied Biosystems). The primers used for RT-qPCR are listed in Table S1. The relative expression value of each mRNA was calculated as $CT_{mRNA} - (CT_{U6} + CT_{GAPDH})/2$ (CT, cycle threshold).
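As a worked illustration of the relative expression formula, the following minimal sketch subtracts the mean CT of the two reference genes (U6 and GAPDH) from the CT of the target mRNA. The function name and the CT values are hypothetical.

```python
def relative_expression(ct_mrna, ct_u6, ct_gapdh):
    """Return the target CT minus the mean CT of the two reference genes."""
    reference_mean = (ct_u6 + ct_gapdh) / 2
    return ct_mrna - reference_mean


# Hypothetical CT values, for illustration only.
delta_ct = relative_expression(ct_mrna=27.0, ct_u6=18.0, ct_gapdh=20.0)
print(delta_ct)  # 8.0
```

Because CT decreases as template abundance increases, a lower value on this scale corresponds to higher expression of the target mRNA.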
Identification of candidate mRNAs
The detailed filtration process is illustrated in Figure 1. Using transcriptome data from the developing set (N=189, including 50 patients with paired adjacent normal tissues), we selected mRNAs that were both differentially expressed between tumor and paired normal tissues and related to RFS. Differentially expressed mRNAs were screened using the Bioconductor package ‘limma’ (linear models for microarray data); significantly differentially expressed genes were defined by a false discovery rate < 0.05 and a fold change > 1.5 for up-regulation or < 0.33 for down-regulation. The univariate Cox proportional hazards regression model was used to select mRNAs that were significantly correlated with RFS. Next, LASSO Cox regression analysis with 200 bootstrap replicates was performed to select the most useful prognostic mRNAs as candidate mRNAs [19].
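The differential expression thresholds can be sketched as a simple filter. This is not the limma workflow itself: it assumes that per-gene fold changes and raw p-values are already available and that the false discovery rate is controlled with the Benjamini-Hochberg procedure. Gene names and values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_de_genes(genes, fold_changes, p_values,
                    fdr_cutoff=0.05, up=1.5, down=0.33):
    """Label genes passing the FDR and fold-change thresholds as up or down."""
    fdr = benjamini_hochberg(p_values)
    selected = {}
    for gene, fc, q in zip(genes, fold_changes, fdr):
        if q >= fdr_cutoff:
            continue
        if fc > up:
            selected[gene] = "up"
        elif fc < down:
            selected[gene] = "down"
    return selected


# Hypothetical genes, tumor/normal fold changes, and raw p-values.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
fold_changes = [2.0, 0.2, 1.2, 3.0]
p_values = [0.001, 0.004, 0.0005, 0.6]
print(select_de_genes(genes, fold_changes, p_values))
# {'GENE_A': 'up', 'GENE_B': 'down'}
```

In this toy input, GENE_C is significant but fails the fold-change threshold, and GENE_D has a large fold change but fails the FDR threshold, so neither is retained.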
Development and validation of the prognostic signature
The expression of the candidate mRNAs was measured using RT-qPCR in the training set (N=159) and the internal validation set (N=158). To simplify the prognostic signature, we used the R package 'glmulti' to screen automatically for the best combination of mRNAs based on the Akaike information criterion (AIC) and to calculate the coefficient of each mRNA in the training set [20]. These coefficients were used to construct a risk score formula. The accuracy of the signature was assessed by time-dependent receiver operating characteristic (ROC) analysis, and the best cut-off score was selected according to the Youden index [21]. To validate the signature, the risk score of each patient in the validation set was calculated using the formula generated in the training set.
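The risk score and the Youden-index cut-off can be sketched as follows. The sketch assumes that the risk score is a linear combination of expression values weighted by the fitted coefficients, and it simplifies the time-dependent ROC analysis to a binary event status at a fixed time point, which ignores censoring. Gene names, coefficients, scores, and outcomes are hypothetical.

```python
def risk_score(expression, coefficients):
    """Linear risk score: sum of coefficient * expression over signature genes."""
    return sum(coefficients[g] * expression[g] for g in coefficients)


def youden_cutoff(scores, events):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A patient is called high risk when score >= cutoff. Events are binary
    (1 = event by the chosen time point), a simplification of the
    time-dependent ROC analysis that does not account for censoring.
    """
    n_pos = sum(events)
    n_neg = len(events) - n_pos
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, e in zip(scores, events) if e and s >= cutoff)
        tn = sum(1 for s, e in zip(scores, events) if not e and s < cutoff)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical coefficients and one patient's expression values.
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
print(risk_score({"GENE_A": 2.0, "GENE_B": 4.0}, coefficients))  # 0.0

# Hypothetical training-set risk scores and event indicators.
scores = [0.1, 0.4, 0.35, 0.8]
events = [0, 0, 1, 1]
print(youden_cutoff(scores, events))  # (0.35, 0.5)
```

The cut-off chosen in the training set would then be applied unchanged to the validation-set scores, mirroring the validation step described above.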
Gene Set Enrichment Analysis (GSEA)
GSEA (version 2.2.0) was performed using the hallmark gene sets and a preranked list of differentially expressed genes [22]. The differential expression analysis between early-stage and late-stage tumors was conducted using the DESeq2 package.
Cell cultures and siRNA transfection
The human breast cancer cell lines MDA-MB-231 and BT-549 and the human embryonic kidney cell line HEK293T were obtained from the American Type Culture Collection. These cell lines were authenticated by STR profiling and monitored for mycoplasma contamination. Small interfering RNAs (siRNAs) were transfected using the Lipofectamine RNAiMAX Transfection Reagent (Thermo Fisher Scientific) according to the manufacturer’s instructions. The target sequences are listed in Table S2.
Proliferation assay
For the proliferation assay, cells were cultured in 6-well plates for 24 hours and transfected with siRNAs. After 24 hours, 1000 cells per well were seeded in 96-well plates. Cell confluence was monitored using an IncuCyte Live-Cell Analysis System (IncuCyte ZOOM System, ESSEN Bioscience), with images taken every 12 hours, and was measured with the IncuCyte ZOOM software. The assays were performed in triplicate.
Migration assay
For the migration assay, 5 × 10^4 cells in serum-free medium were seeded in the top of the transwell chamber (pore size, 8 μm; BD Biosciences), and 600 μl of medium containing 10% FBS was added to the bottom chamber. After incubation for 8-10 hours, cells that had migrated to the opposite side of the membrane were fixed in 4% paraformaldehyde for 30 minutes and stained with crystal violet for an additional 30 minutes. ImageJ was used to quantify the number of migrated cells per field.
Western Blot
Cells were lysed in lysis buffer (50 mM Tris [pH 8.1], 1 mM EDTA, 1% SDS, 1 mM fresh dithiothreitol, sodium fluoride, and leupeptin). The cell lysates were boiled in SDS-PAGE loading buffer for 15 minutes. The proteins were separated by SDS-PAGE and transferred to polyvinylidene difluoride membranes (Millipore). The membranes were probed with the primary antibody anti-TMEM101 (Abscitech, AB-35251-2, 1:2000) and an HRP-conjugated goat anti-rabbit secondary antibody (Jackson Immuno-Research; 1:5000). Signals were detected with an enhanced chemiluminescence substrate (Pierce Biotechnology), and images were collected with a Molecular Imager ChemiDoc XRS+ (Bio-Rad) using Image Lab Software (Bio-Rad).
Statistical Analysis
The t-test was used to compare continuous variables between two groups, and the χ² test was used for categorical variables. We used the Kaplan-Meier method for univariate survival analysis and the log-rank test to compare survival curves and estimate hazard ratios. For multivariate survival analysis, the Cox proportional hazards regression model was used to test whether the signature was an independent prognostic factor associated with RFS. All statistical analyses were performed with R software version 3.5.3 or GraphPad Prism 8. A P value < 0.05 was considered statistically significant.
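For readers unfamiliar with the Kaplan-Meier method, a minimal sketch of the product-limit estimator is given below. It is not the R or GraphPad Prism implementation used in the study, and the follow-up times and event indicators are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if an RFS event occurred at times[i] and 0 if the
    patient was censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Patients with events and censored patients both leave the risk set.
        at_risk -= n_removed
    return curve


# Hypothetical follow-up times (months) and event indicators.
times = [6, 12, 12, 20, 30]
events = [1, 1, 0, 0, 1]
curve = kaplan_meier(times, events)
print([(t, round(s, 3)) for t, s in curve])
# [(6, 0.8), (12, 0.6), (30, 0.0)]
```

Patients censored at a given time are still counted in the risk set at that time, which is the usual convention when censoring and events coincide.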