We used R software (version 3.6.3) (Team, 2018), GraphPad Prism (version 8) and Bioconductor (Gentleman et al., 2004) for all statistical analyses in our study. The workflow was showed in Fig.1.
Data acquisition and preprocessing
The raw data of endometrial expression profile and corresponding clinical data of RIF group and control group (GSE71331, GSE71332) were obtained from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo). Paired serum and endometrial miRNA expression profile data were extracted from GSE108966.
To ensure consistency in the results of the analysis, we started with raw data of the dataset. Due to the different character of the platforms, the raw data of GSE71331 and GSE71332 were processed by “Limma” R package(20) (Agilent-052909 CBC lncRNA mRNA V3, Agilent-046064 Unrestricted Human miRNA V19.0), and the raw data of GSE108966 was processed by “Deseq2” R package(21) (Illumina HiSeq 2500).
Selection of differentially expressed genes (DEGs)
The scanning of differentially expressed genes (DE miRNA, DE lncRNA, DE miRNA) was performed by using the “limma” R package (GSE71331 and GSE71332) (20) and Deseq2 (21) (GSE108966) with the following criteria: p value < 0.05 and |log 2-fold change| > 1.
Weighted correlation network analysis (WGCNA) of miRNA of endometria samples
With the “WGCNA” R package (22), WGCNA was performed on differentially expressed miRNAs which selected based on GSE71332 dataset. The workflow of WGCNA was as follows. First, outliers were reduced by filtering the RNA-seq data. The coexpression matrix was constructed based on the absolute points of the relationships between gene expression levels. Then, we constructed a Pearson correlation matrix of paired genes, and a weighted adjacency matrix was developed based on the power function (where is the Pearson correlation between gene b and gene d, is the adjacency between gene b and gene d, and β emphasizes the strength of the relationship between the genes). Then, we selected an optimal β value to develop a scale-free coexpression network and a similarity matrix. Next, a topological overlap matrix (TOM) was constructed from the adjacency matrix. We performed average linkage hierarchical clustering with a minimum gene dendrogram size of 40 by using TOM-based dissimilarity measurements. By analyzing the modules, we calculated the dissimilarity and constructed module dendrograms for these modules.
We then calculated gene significance (GS) to estimate the significance of each module and measure the relationships between genes and sample traits. In the principal component analysis, module eigengenes (MEs) were considered as the main components, and we summarized gene expression levels as a single feature within a given module. Then, the p-value of the linear regression between gene expression levels and clinical feature were transformed, and the result was defined as the GS value (lgP = GS). Next, to estimate the relationship between the module and sample traits, we calculated the average value of GS, which was then defined as the module significance (MS). We used the relevant p-value to determine statistical significance. Next, clinical data including first trimester losses and live births were combined with gene modules for further analysis.
For the identification of key genes, the GS and module membership (MM, the correlation between the genes in the module and their expression profiles) of every key gene were calculated with the following thresholds: cor. gene GS > 0.5 and cor. gene MM > 0.8.
Prediction of target lncRNAs/mRNAs of RIF related DE miRNAs
The intersection of the differentially expressed genes and the genes of the key modules related to RIF in WGCNA was taken as RIF related DE miRNAs. Then miRDB(http://mirdb.org/miRDB/)(23), miRTarBase (http://mirtarbase.mbc.nctu.edu.
tw/) (24), and TargetScan (http://targetscan.org/)(25) were used to predict miRNA-targeting mRNAs. NPInter (http://bigdata.ibp.ac.cn/npinter4/)(26) and DIANA-LncBase(27) were used to predict miRNA-targeting lncRNAs. The intersection of differentially mRNAs/lncRNAs in GSE71331 and the miRNA-targeting mRNAs/lncRNAs were taken as targeting-DE mRNAs/lncRNAs.
Construction of lncRNA-miRNA-mRNA regulatory network
LncRNA-miRNA interactomes were then built based on targeting-DE lncRNAs and RIF related DE miRNAs. Similarly, mRNA-miRNA interactomes were built. Subsequently, lncRNA-miRNA-mRNA regulatory networks were constructed by using cytoscape, version 3.8 (https://cytoscape.org/). Key modules were selected by MCODE(28).
Selection and validation of co-expression miRNAs between serum and endometrial samples
The intersection of endometrium DE miRNAs and serum DE miRNAs was taken as intersection miRNAs. To ensure that their expressions were relevant, Pearson correlation analysis is performed by using GraphPad Prism (version 8). Genes with the Pearson correlation coefficient | r | ≥ 0.5 were considered to be co-expressed miRNAs between serum and endometrium. And the intersection miRNAs were then validated in GSE71332.
Prediction of target lncRNAs/mRNAs of co-expression miRNAs
miRDB(http://mirdb.org/miRDB/)(23), miRTarBase (http://mirtarbase.mbc.
nctu.edu.tw/) (24), and TargetScan (http://targetscan.org/)(25) were used to predict miRNA-targeting mRNAs of co-expression miRNAs.
Functional enrichment analysis of targeted DE mRNAs
To understand the biological function of targeted DE mRNAs of GSE71332 and targeted mRNAs of GSE108966, Metascape (29) (http://metascape.org), which includes abundant functional annotations, such as KEGG pathway, Reactome pathway, canonical pathway, GO biological process, and CORUM (the comprehensive resource of mammalian protein complexes) annotations, was then used to perform functional enrichment analysis with a p-value of < 0.01 as the cutoff value (29). Selected terms with a p-value of < 0.01 and a number` of genes greater than or equal to 3 were considered significant terms.
Transcriptional Regulatory Relationship analysis of targeted DE mRNAs
TRRUST (transcriptional regulatory relationships unravelled by sentence-based text-mining, http://www.grnpedia.org/trrust) is a TF-target regulatory interactions database based on the manual curation of Medline abstracts (30). TRRUST was subsequently used to screen transcription factors related to targeted DE mRNAs and targeted mRNAs and study their transcription regulation relationships.
Causal relationship analysis
DisNor (31) (https://disnor.uniroma2.it/) is a web-based tool that can generate and explore protein interaction networks based on disease genes using Mentha protein interaction data and causal interaction information annotated by SIGNOR. DisNor was used to explore the causal relationships among targeted DE mRNAs.
Construction and validation of nomogram based on circulating miRNAs
Logistic regression analysis was then performed with three selected factors by using “survival” R package(32) to select the best fit model. Then a nomogram was built to predict the risk of RIF patients by using “rms” R package. At the same time, the consistency index (C-index) was calculated to evaluate the model's ability to distinguish. The consistency of the predicted probability and the actual probability of the model was evaluated by the Calibration curves. The predictive performance of the model was evaluated by drawing the receiver operating characteristic (ROC) curve and calculating the area under the curve (AUC) values.