Patients
We studied 16 patients with early breast cancer treated with surgery and diagnosed for positive SLNs. All patients had confirmed diagnosis based on histopathology of tumor biopsy. All tumors were invasive ductal carcinomas with or without in situ component. In 2 cases, tumors were mixed and show presence of invasive lobular carcinoma component. Intraoperative SLN were evaluated by the OSNA assay (25). None of the patients had prior treatment with surgery, chemotherapy or radiation. All patients were hormone receptor (HR) positive, HER2 negative. We collected clinical parameters: age, menopausal status, personal and familiar disease precedents and clinical follow-up; pathological parameters: tumor stage was be determined according to the AJCC/UICC system (26), histological grade was determined using the Elston-Ellis grading system (27), histology (ductal, lobular, special types), associated ductal or lobular carcinoma in situ, presence of vascular and lymphatic invasion, tumor infiltrating lymphocytes, type of invasion (expansive/infiltrating), tumor multifocality, tumor necrosis; proliferation of non-tumoral tissue (ductal hyperplasia, atypical ductal/lobular hyperplasia).
Blood Processing and Isolation of Plasma. Human plasma samples were collected prospectively from early breast cancer patients who have not received any previous treatment. Peripheral blood was withdrawn before surgery. Approximately, 10-15ml of peripheral blood was collected for plasma processing in EDTA tubes. Plasma tubes were processed within 2 hours of collection and spun at 1200xg for 10 minutes. Plasma was aliquot in 1.5 ml tubes and stored at -80C until further processing. All plasma samples used in this study were inspected for absence of hemolysis as previously described (28). Briefly, the hemolysis score (HS) was determined by ultraviolet-visible (UV-Vis) absorbance measurements using a NanoDrop® 2000 Spectrophotometer (Thermo Scientific, Barrington, IL, USA). Measurements were performed by applying 2 µl of plasma on the micro-volume pedestal after centrifugation at 1000 × g for 5 min at 4°C and using saline (PBS) as a blank. In addition, monitoring of hemolysis was conducted by qPCR for all samples by comparing the level of a highly expressed miRNA in red blood cells (has-miR-451a) with a miRNA unaffected by hemolysis (has-miR-23a-3p) as previously described (29).
RNA isolation NGS Library preparation and Next Generation Sequencing
RNA was isolated from 300μl of plasma samples with the miRNeasy serum/plasma advanced kit (Qiagen, Cat No/ID: 217204) according to the manufacturer's instructions. A range of spike-ins was added to the plasma samples prior to RNA isolation. A quality check was performed by qPCR previous to sequencing the samples. Sixteen samples were selected to perform NGS, including 12 positive SLNs (n = 6 macrometastasis and n= 6 micrometastasis) and 4 negative SLNs. A total of 5μl total RNA was used to construct the NGS libraries using the QIAseqmiRNA Library Kit (Qiagen, Cat. No: 331505). Briefly, after ligation of 3' and 5' adapters and Unique Molecular Identifier (UMIs) to miRNAs, complementary DNA library ready for sequencing was constructed by reverse transcription followed by 22 cycles of PCR amplification and cDNA cleaned up using QMN beads. Next, we perform a library preparation quality check using either Bioanalyzer 2100 (Agilent) or TapeStation 4200 (Agilent). Based on quality of the inserts and the concentration measurements, libraries were pooled in equimolar ratios and quantified using the qPCR ExiSEQ LNA™ Quant kit (Exiqon). The library pools were then sequenced with a NextSeq500 platform (Illumina) using sequence runs of 75nt single-end reads with an average number of 10 million reads/sample. Raw data was demultiplexed and FASTQ files for each sample were generated using the bcl2fastq 2.18.0.12 software (Illumina). FASTQ data were checked using the FastQC tool (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
Genome annotation and quantification of miRNAs.
To perform genome annotation we used the Exiqon/Xplore RNA pipeline. Following sequencing, Cutadapt (1.9.1) (30) was used to trimmed adaptor sequences. A quality check (QC) was performed to ensure Q-scores >30 (>99.9% correct) of our data (31). Reads with correct length were analyzed for the presence of UMIs using Cutadapt (1.9.1) and then collapsed by UMIs into FASTQ files. This approach eliminates library amplification bias and allowed for true identification of the miRNAs. Bowtie2 software (2.2.6) was used for mapping the reads. The mapping criterion for aligning reads to spike-ins, abundant sequences and miRBase_20 was for reads to have perfect match to the reference sequences. To map the genome, one mismatch was allowed in the first 32 bases of the read. Small insertions and deletions (INDELs) were not allowed. The resulting sequences were annotated using the human assembly GRCh37 and the miRBase_20 database. IsomiR analysis was performed individually for each sample based on the occurrence of count variants for each detected miRNA. Read variants were merged onto a single count file with a consistent nomenclature across samples. Only isomiRs present at a level of 5% of total reads for a specific miRNA were retained. Tag per million (TPM) was used as a normalization procedure to correct for differences in sequencing depth and to quantified each RNA species.
Differential expression analysis
Differential expression (DE) analyses were performed using the trimmed mean of M-values normalization method (TMM) (32), based on the log-fold and absolute gene-wise changes in expression between samples. The TMM normalization compensates for sample specific effects caused by the variation in library size/sequencing depth between samples and also compensates for under- or over-sampling effects by trimming and scaling factors that minimize log fold changes between samples across the majority of the miRNAs. Differential expression analysis was performed using the EdgeR statistical software package (Bioconductor, http://www.bioconductor.org/). The isomiR analysis was done using Exiqon in-house scripts (exq_ngs_mircount). Predicted miRNAs analysis was performed based on the read count distribution using the exiqon_ngs_mirpred in house script and the secondary structure prediction according to the miRPara classification score (33). Volcano plots were constructed using R programming (34) by plotting the p value (-log10) on the y-axis and the expression fold change between the two experimental groups on the x-axis
Principal Component Analysis (PCA) and Heat map and unsupervised clustering
Principal component analysis (PCA) was performed using R programming and TMM-normalized quantifications from defined collections of samples as input. The same input was used to generate a heatmap and unsupervised hierarchical clustering by samples and gene expression profile with R scripts (34). We selected the top 50 miRNAs with the largest coefficient of variation (% CV) across all samples to obtain a cluster of samples. The data was normalized to TMM and converted to log2 scale.
Gene Ontology (GO) Enrichment Analysis
Gene Ontology (GO) analyses (35, 36) were done with R TopGO package with experimentally verified targets of significantly differentially expressed miRNAs as input. Two different statistical tests were used and compared. First, a standard Fisher's test was used to investigate enrichment of terms between groups. Second, we used the Emil method (37) to incorporate the topology of the GO network, to compensate for local dependencies between GO that could mask significant GO terms. We compared the predictions from these two methods to highlight truly relevant GO terms.
Quantitative real-time RT–PCR analysis
Quantitative real-time RT–PCR analysis was done with an ABI Prism 7500 Sequence Detection System using the miRCURY LNA™ Universal RT cDNA Synthesis Kit (Exiqon). The cDNA was diluted 50X and assayed in 10 µl PCR reactions according to the protocol for the miRCURY LNA™ Universal RT microRNA PCR System (Exiqon A/S); each microRNA was assayed twice by qPCR on the Serum/plasma Focus microRNA PCR panel. A no-template control (NTC) of water was purified with the samples and profiled like the samples. Analysis of the data was performed using the relative miRNA expression according to the comparative Ct (ΔΔCt) method using negative metastatic samples as reference. We used the geNorm (38) or the Normfinder algorithm (39) to select the best combination of two reference genes based on our qPCR data. Data from multiples plates were normalized using UniSp3 spike-in as interplate calibrators.
Statistics
Differentially expressed miRNAs from RNA-sequencing data were detected by an exact test based on conditional maximum likelihood (CML) included in the R Bioconductor package edgeR (40). P values from RNA-sequencing and qPCR were corrected (q-values) for multiple testing using the Benjamini-Hochberg procedure (41). A false discovery rate (FDR) q < 0.05 was considered significant. In all group comparisons missing expression values were treated as zero. Differences in total numbers of miRNAs between groups were analyzed by two-sided parametric t tests. For analysis of clinicopathological parameters, quantitative variables between groups were compared using the Student’s T-test and qualitative variables were compared using the X2 or Fisher exact tests. A two-sided p-value ≤ 0.05 was considered significant.