Combination of multiple omics and machine learning identifies diagnostic genes for ARDS and COVID-19

doi:10.21203/rs.3.rs-3892523/v1

This work is licensed under a CC BY 4.0 License

BACKGROUND

Acute respiratory distress syndrome (ARDS) is a common acute clinical syndrome of the respiratory system with high mortality and a poor prognosis. COVID-19 is a serious respiratory infectious disease caused by a coronavirus that has spread as a global pandemic. Some studies have suggested an association between COVID-19 and ARDS, but few have investigated the mechanisms linking the two.

METHODS

Microarray data for ARDS (GSE32707 and GSE66890) and COVID-19 (GSE213313) were downloaded from the GEO database, and the differentially expressed genes common to both were identified and subjected to enrichment analysis. WGCNA was used to identify co-expression modules and genes associated with ARDS and COVID-19. RF and LASSO were used to identify candidate genes. An XGBoost machine learning model was built to evaluate the diagnostic performance of the hub genes in ARDS and COVID-19. The degree of immune cell infiltration in ARDS and COVID-19 samples was assessed using the CIBERSORT algorithm, and the relationship between hub genes and infiltrating immune cells was investigated. Peripheral blood mononuclear cell (PBMC) single-cell RNA sequencing (scRNA-seq) data from patients with sepsis-combined ARDS and patients with sepsis alone were processed with the standard Seurat workflow (dimensionality reduction and clustering) to visualize changes in pathway activity per cell.

RESULTS

Limma differential expression analysis identified 314 up-regulated genes and 241 down-regulated genes in ARDS and COVID-19. WGCNA identified the purple-red co-expression module as the core module of ARDS and COVID-19. Five candidate genes, namely HIST1H2BK, TCF4, OLFM4, KIF14 and HK1, were screened using two machine learning algorithms, RF and LASSO. Diagnostic models constructed with XGBoost indicated that the hub genes had high diagnostic efficacy in ARDS and COVID-19. Single-cell sequencing revealed alterations in five immune subpopulations (monocytes, B cells, T cells, NK cells and platelets), with high expression levels and cell proportions for TCF4 and HK1, which are involved in oxidative reactions.

Subject areas: Health sciences/Biomarkers; Health sciences/Medical research

Keywords: machine learning, multiple omics, ARDS, COVID-19, diagnostic genes

Introduction

Acute respiratory distress syndrome (ARDS) is a syndrome clinically characterized by progressive respiratory distress and refractory hypoxemia, with diffuse alveolar infiltrates seen on chest radiography (1). ARDS has a rapid onset and progression, and is one of the leading causes of death in critically ill patients (2). Patients with ARDS account for 10.4% of ICU admissions, with mortality rates as high as 35%-45% (3). The pathogenesis of ARDS is complex and the syndrome is difficult to treat, so more in-depth mechanistic studies are needed to identify new biomarkers for early diagnosis, treatment and prognosis.

COVID-19 is an acute respiratory infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It spread worldwide through successive outbreaks, with serious impacts on global economic and social development (4). The main manifestations of novel coronavirus pneumonia are fever, dry cough, and malaise; a few patients also have upper respiratory and gastrointestinal symptoms such as nasal congestion, runny nose, and diarrhea. The disease can also affect multiple organ systems, including the cardiovascular system, gastrointestinal tract, liver, and kidneys. Studies have shown that the clinical burden of COVID-19 may extend well beyond the acute infection phase: long-term sequelae significantly affect an individual's quality of life, making the disease a leading global health challenge (5).

Studies have shown that patients with severe and critical COVID-19 are prone to rapid progression to ARDS, that the incidence and severity of ARDS due to COVID-19 are positively correlated, and that these patients have a worse prognosis than patients with ARDS alone (6). Late-stage ARDS due to COVID-19 is difficult to control, with a mortality rate ranging from 26% to 61.5%. COVID-19-induced ARDS presents with lung inflammation, thick airway mucus secretion, elevated levels of proinflammatory cytokines, lung injury and microthrombosis, so early detection, diagnosis and treatment are critical to controlling the disease and improving prognosis (7). It is therefore important to explore the mechanism by which COVID-19 progresses to ARDS and to identify related diagnostic, therapeutic and prognostic biomarkers.

In recent years, advances in computing and sequencing technologies have made it possible to analyze disease-related genes at the molecular level. Bioinformatics, a discipline that combines molecular biology and information technology, has been widely used in recent studies (8). In this study, we used several bioinformatics methods to screen and analyze the genes common to ARDS and COVID-19, with the aim of providing new biomarkers and a theoretical basis for the diagnosis and treatment of both diseases.

Materials and Methods

Differential Gene Screening and Enrichment Analysis

We collected microarray data from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) (9): the ARDS datasets GSE32707 and GSE66890 and the COVID-19 dataset GSE213313. The data were filtered, background corrected, log2 transformed and normalized in R. The "sva" package was used to merge GSE32707 and GSE66890 and remove the batch effect between them, and principal component analysis (PCA) was performed on the samples in the three datasets to observe how the samples clustered. The "limma" package was used to identify the differentially expressed genes (DEGs) of ARDS and COVID-19, with |log2(FC)| > 1 and p < 0.05 as the screening criteria. Next, we used Venn diagrams to identify the genes that were commonly up-regulated or down-regulated in ARDS and COVID-19 and performed Gene Ontology (GO) enrichment analysis on them.
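The screening rule and the Venn-style overlap described above can be illustrated with a minimal Python sketch. This is not the authors' R/limma code: the function name `split_degs`, the gene names, and all values are hypothetical placeholders, and only the thresholds (|log2(FC)| > 1, p < 0.05) come from the text.

```python
from typing import Dict, Set, Tuple

# Maps a gene symbol to (log2 fold change, p-value). All values are invented.
GeneStats = Dict[str, Tuple[float, float]]


def split_degs(stats: GeneStats, lfc_cut: float = 1.0,
               p_cut: float = 0.05) -> Tuple[Set[str], Set[str]]:
    """Return (up, down) gene sets passing |log2FC| > lfc_cut and p < p_cut."""
    up: Set[str] = set()
    down: Set[str] = set()
    for gene, (log2fc, p_value) in stats.items():
        if p_value >= p_cut or abs(log2fc) <= lfc_cut:
            continue  # fails at least one screening criterion
        (up if log2fc > 0 else down).add(gene)
    return up, down


# Toy tables standing in for per-disease differential expression output.
ards = {"GENE_A": (1.8, 0.001), "GENE_B": (-1.4, 0.01),
        "GENE_C": (0.6, 0.001), "GENE_D": (2.1, 0.20)}
covid = {"GENE_A": (2.3, 0.004), "GENE_B": (-1.1, 0.03),
         "GENE_C": (1.5, 0.002), "GENE_D": (1.9, 0.01)}

ards_up, ards_down = split_degs(ards)
covid_up, covid_down = split_degs(covid)

# The Venn overlap is a set intersection taken separately per direction.
common_up = ards_up & covid_up
common_down = ards_down & covid_down
print(sorted(common_up), sorted(common_down))
```

Intersecting the up-regulated and down-regulated sets separately keeps genes that change in opposite directions in the two diseases out of the common list.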

Weighted Gene Co-expression Network Analysis

We used the "WGCNA" package to construct co-expression gene modules (10). Genes with expression > 0.5 were retained, the optimal soft threshold was selected, and the modules most highly correlated with ARDS and COVID-19 were identified; the genes in these co-expression modules were carried forward for further analysis.
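To make the role of the soft threshold concrete, the sketch below computes an unsigned WGCNA-style adjacency, a_ij = |cor(x_i, x_j)|^β, in plain Python. It is an illustration only: the expression values are invented, the helper names `pearson` and `soft_adjacency` are hypothetical, and the power used here is arbitrary rather than the value chosen in the study.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def soft_adjacency(expr: List[List[float]], beta: int) -> List[List[float]]:
    """Unsigned adjacency a_ij = |cor(gene_i, gene_j)| ** beta."""
    n = len(expr)
    return [[abs(pearson(expr[i], expr[j])) ** beta for j in range(n)]
            for i in range(n)]


# Rows are genes, columns are samples (toy values).
expr = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with gene 0
    [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with gene 0
    [1.0, 3.0, 2.0, 4.0],   # correlation 0.8 with gene 0
]
adj = soft_adjacency(expr, beta=2)
print(round(adj[0][3], 2))
```

Raising the absolute correlation to a power suppresses weak correlations while leaving strong ones nearly intact, which is what pushes the resulting network toward a scale-free topology as β increases.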

Machine Learning Candidate Gene Screening

The random forest (RF) algorithm and least absolute shrinkage and selection operator (LASSO) logistic regression were used to screen key genes from the intersection of the DEGs and the WGCNA module genes (11, 12). RF is an ensemble prediction method that handles a large number of input variables and evaluates the importance of each variable. LASSO is a penalized regression method that is well suited to high-dimensional data. We first used the RF algorithm to screen diagnostic genes with importance scores greater than 0.5. The LASSO algorithm was then applied to the resulting genes to reduce them further and obtain the final diagnostic genes, and a ROC curve was plotted for each. RF analysis and LASSO regression were performed using the R packages "randomForest" and "glmnet" (13).
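For reference, LASSO logistic regression of the kind fitted by glmnet (binomial family with the mixing parameter α = 1) minimizes a penalized negative log-likelihood. The notation below is introduced here for illustration and is not taken from the manuscript: y_i ∈ {0, 1} is the disease label of sample i, x_i its vector of gene expression values, and λ the penalty strength, which is typically chosen by cross-validation.

```latex
(\hat{\beta}_0, \hat{\beta}) =
  \arg\min_{\beta_0,\, \beta}\;
  -\frac{1}{n} \sum_{i=1}^{n}
    \Big[ y_i \,(\beta_0 + x_i^{\top}\beta)
          - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
  \;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

The L1 penalty drives some coefficients exactly to zero, so the genes with non-zero coefficients at the selected λ form the reduced candidate set.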

Hub gene diagnostic model construction

Extreme Gradient Boosting (XGBoost) is a commonly used supervised ensemble learning algorithm that scales well and offers convenient tools for model visualization and optimization. The expression values of the hub genes were used as features for training the XGBoost models [16]. We first selected the ARDS (GSE32707) and COVID-19 (GSE213313) datasets as the training set, and used the ARDS (GSE66890) dataset for validation. The diagnostic performance of the model was evaluated by plotting the receiver operating characteristic (ROC) and precision-recall (PR) curves and calculating the area under each curve (AUC).
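As a reminder of what the ROC AUC measures, the following sketch computes it directly from predicted scores as the probability that a randomly chosen case is scored higher than a randomly chosen control. It is a generic illustration with made-up labels and scores, independent of XGBoost and of the study data; the function name `roc_auc` is hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a positive > score of a negative); ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = disease, 0 = control; scores are invented model outputs.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases and controls, which is the scale on which the values reported in the Results should be read.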

Analysis of immune infiltration

CIBERSORT is a deconvolution algorithm that has been widely used to estimate the proportions of different immune cell types in the microenvironment from gene expression data (14). The algorithm, run here in R, uses reference transcriptional signatures of 22 immune cell types, including B cells, plasma cells, T cells, natural killer cells, monocytes, macrophages, dendritic cells, mast cells, eosinophils, and neutrophils. We compared immune cell infiltration in peripheral blood mononuclear cell (PBMC) samples from the disease group with that in normal samples.

Single-cell sequencing quality control and downscaling

We downloaded the single-cell RNA sequencing dataset (GSE151263) from the GEO database and analyzed it with the "Seurat" and "SingleR" packages. Cells with ≤ 10% mitochondrial genes and ≤ 3% erythroid genes were retained, and cells with a gene count (nFeature_RNA) ≤ 200 or ≥ 5000 were removed. Next, we selected 3000 highly variable genes and performed dimensionality reduction and clustering. Using the elbow plot, we identified the inflection point at which the curve flattens, selected the top 10 principal components (PCs) for subsequent analysis, and visualized the reduced data with UMAP and tSNE. We then annotated the cells with immune cell type labels using the "SingleR" package (15). Finally, we visualized the expression of hub genes in different immune cells using violin plots.
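The quality-control rule described above can be written as a simple per-cell filter. The sketch below is a Python illustration of that rule, not the Seurat code used in the study; the `CellQC` record, the `passes_qc` helper, and the example cells are hypothetical, while the default thresholds are those stated in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CellQC:
    barcode: str
    n_features: int        # number of detected genes (nFeature_RNA in Seurat)
    pct_mito: float        # percentage of counts from mitochondrial genes
    pct_erythroid: float   # percentage of counts from erythroid genes


def passes_qc(cell: CellQC, min_features: int = 200, max_features: int = 5000,
              max_mito: float = 10.0, max_erythroid: float = 3.0) -> bool:
    """Keep cells with 200 < genes < 5000, mito <= 10% and erythroid <= 3%."""
    return (min_features < cell.n_features < max_features
            and cell.pct_mito <= max_mito
            and cell.pct_erythroid <= max_erythroid)


# Invented cells covering each failure mode.
cells: List[CellQC] = [
    CellQC("c1", 1500, 4.2, 0.5),    # passes every criterion
    CellQC("c2", 150, 3.0, 0.1),     # too few detected genes
    CellQC("c3", 6200, 2.0, 0.0),    # too many detected genes
    CellQC("c4", 2400, 14.8, 0.2),   # mitochondrial fraction too high
    CellQC("c5", 900, 5.0, 3.6),     # erythroid fraction too high
    CellQC("c6", 2000, 10.0, 3.0),   # exactly on the retained boundaries
]
kept = [cell.barcode for cell in cells if passes_qc(cell)]
print(kept)
```

Exposing the thresholds as parameters makes it easy to rerun the filter with different cut-offs and check how sensitive the retained cell count is to them.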

Statistical Analysis

All statistical tests were performed using R software version 4.1.2. Differences between the two groups were analyzed using the Wilcoxon or Student's t-test. Correlations between variables were determined using the Pearson or Spearman correlation test. Statistical significance was set at a two-tailed p < 0.05.

Results

Identification of differentially expressed genes between ARDS and COVID-19

Principal component analysis (PCA) was used to visualize the distribution of the samples before and after correction for batch effects (Fig. 1A, 1B). After correcting and normalizing the three datasets (GSE32707, GSE66890, and GSE213313), we identified 1114 DEGs in ARDS (575 up-regulated and 539 down-regulated) and 3587 DEGs in COVID-19 (1738 up-regulated and 1849 down-regulated). Venn diagrams of the DEGs shared by ARDS and COVID-19 showed 180 overlapping up-regulated DEGs and 61 overlapping down-regulated DEGs (Fig. 1C, 1D).

Enrichment analysis

To explore the biological functions and pathways of the common DEGs, we performed GO enrichment analysis (Fig. 1E, 1F). The up-regulated DEGs were mainly enriched in mitotic cell cycle, cytoplasmic vesicle lumen, and protein kinase regulator activity, while the down-regulated DEGs were mainly enriched in cellular defense response, endocytic vesicles, and T cell receptor binding.

Weighted gene co-expression network analysis of ARDS and COVID-19

We constructed co-expression networks to explore the correlation between clinical traits and genes. Clustering analysis was performed using the "flashClust" function. When the threshold was set to 75, 6 outlier samples were detected and deleted, and 145 samples were retained (Fig. 2A, 2B).

The "pickSoftThreshold" function of "WGCNA" was used to evaluate candidate power parameters from 1 to 30 and to select a soft threshold that ensures a scale-free network. The text reports both a power of β = 5 as the most appropriate soft threshold and an optimal soft power value of 10; a total of 11 modules were identified.

The "cutreeDynamic" and module eigengene functions were used to construct the cluster dendrograms (Fig. 2C, 2D), and a total of 11 modules consisting of genes with similar co-expression patterns were obtained. Heat maps of module-trait relationships were then plotted from Spearman correlation coefficients to relate each module to the clinical features of the diseases (Fig. 2E). The purple-red module was positively correlated with both ARDS and COVID-19 (ARDS: r = 0.16, p = 0.05; COVID-19: r = 0.17, p = 0.04) and contained genes positively associated with ARDS and genes positively associated with COVID-19 (Fig. 2F, 2G). GO analysis of these module genes showed that, for biological process (BP), the co-expressed genes were mainly enriched in blood coagulation, wound healing, and regulation of body fluid levels; for cellular component (CC), in the actin cytoskeleton, platelet α-granules, and cytoplasmic vesicle lumen; and for molecular function (MF), in actin binding, adhesin binding, and glycosaminoglycan binding. KEGG analysis showed that the module genes were mainly associated with focal adhesion, the PI3K-Akt pathway, and regulation of the actin cytoskeleton (Fig. 3A).

Machine Learning Identification of Intersecting Genes in ARDS and COVID-19

We used the RF algorithm to screen the intersecting genes of ARDS and COVID-19 and ranked them by importance; the importance of the top 30 genes was visualized (Fig. 3B, 3C). We then applied LASSO to reduce the gene set further and obtained the final 5 genes: HIST1H2BK, TCF4, OLFM4, KIF14, and HK1 (Fig. 3D, 3E). We constructed candidate gene diagnostic models from the GSE66890 training set using XGBoost and validated them in the GSE32707 dataset. In the ARDS dataset GSE66890, the AUC of the ROC curve was 0.952 and the AUC of the PR curve was 0.961, while in the validation set GSE32707, the AUC of the ROC curve was 0.725 and the AUC of the PR curve was 0.543. The ROC AUCs of the ARDS models were all greater than 0.7, indicating that the model has good diagnostic value (Fig. 3F, 3G). To test whether the model could also identify COVID-19 patients, we applied the same model to the COVID-19 dataset GSE213313; the AUC of the ROC curve was 1 and the AUC of the PR curve was 1, indicating that the model also had high diagnostic performance in COVID-19 patients (Fig. 3H).

We further evaluated the diagnostic value of the five hub genes. The ROC AUCs of HIST1H2BK (AUC = 0.802), OLFM4 (AUC = 0.716), KIF14 (AUC = 0.812), and HK1 (AUC = 0.740) were all greater than 0.7, indicating high diagnostic value, whereas TCF4 (AUC = 0.692) had slightly lower diagnostic value (Fig. 4A-E).

Immune infiltration analysis

The pathological mechanisms of both ARDS and COVID-19 involve inflammatory responses caused by overstimulation of the immune system, so we performed immune infiltration analyses on the datasets of both diseases. We analyzed the correlation between disease status and infiltrating immune cells for 22 immune cell types using the CIBERSORT method (Fig. 4F-G). Violin plots showed that naïve B cells and regulatory T cells were increased in the combined ARDS samples compared with control samples. In COVID-19 samples, activated mast cells, macrophages, resting NK cells and plasma cells showed an upward trend compared with normal samples, while memory B cells, CD8 T cells, resting memory CD4 T cells, activated NK cells and activated dendritic cells showed a downward trend.

Single-cell sequencing in patients with combined ARDS and sepsis

We downloaded the single-cell RNA sequencing dataset (GSE168522) and selected one healthy and one AD patient in the dataset as subjects for analysis. First, we performed data quality control. We retained cells with less than 10% mitochondrial genes and less than 3% erythroid genes. Cells with a gene count (nFeature_RNA) greater than 2000 or less than 200 were filtered out (Fig. 5A-C). The samples were merged and batch effects were corrected with the "Harmony" package (Fig. 5D), after which the samples overlapped closely. The number of PCs was then chosen at the point where the curve flattens, the top 10 dimensions were used for subsequent analysis, and the reduced data were visualized with UMAP and tSNE.

We further clustered the cells using the FindClusters function, which showed that the percentages of monocytes, B cells, T cells, NK cells and platelets increased in the ARDS group. The cell ratio analysis also showed that T cells, monocytes, and B cells were more abundant in patients with sepsis combined with ARDS, and the number of NK cells also tended to be higher in these patients (Fig. 6A-B).

We examined the five genes previously screened by machine learning and diagnostic prediction modeling in the single-cell RNA sequencing data of the combined ARDS group (sepsis with ARDS) and the uncomplicated ARDS group (sepsis without ARDS). TCF4, HK1, and HIST1H2BK were detected in all five cell clusters, and KIF14 was detected mainly in monocytes (Fig. 6C). TCF4-expressing cells were mainly concentrated in B cells and monocytes, HK1-expressing cells in monocytes and T cells, and HIST1H2BK-expressing cells in monocytes. TCF4 and HK1 were highly expressed in both groups, with higher expression in the combined ARDS group than in the uncomplicated ARDS group. The expression of HIST1H2BK and KIF14 was lower in both groups, and KIF14 was almost not expressed (Fig. 6D). Owing to the number of samples and the sequencing method, OLFM4 was not detected in the single-cell sequencing data.

Next, we analyzed the cell ratio and expression of the four genes. TCF4 had elevated expression in B cells, monocytes and platelets in the combined ARDS group, and in B cells in the uncomplicated ARDS group, although the cell percentage in the latter was smaller than in the combined ARDS group. HK1 had elevated expression in monocytes, NK cells and platelets in the combined ARDS group, and the cell percentage was larger than in the uncomplicated ARDS group (Fig. 7A). HIST1H2BK had elevated platelet expression in the uncomplicated ARDS group and a greater cell percentage than in the uncomplicated ARDS group (Fig. 7B). Previous studies have shown that oxidative stress plays an important role in the pathogenesis of sepsis and ARDS, and that oxidative stress and the inflammatory response interact to promote disease progression. ssGSEA analysis of metabolic pathways showed that oxidative scores differed between the two groups, with the combined ARDS group scoring higher than the uncomplicated ARDS group (Fig. 7C), suggesting that patients with combined ARDS had intense oxidative stress and more severe disease.

TCF4 and HK1 were analyzed further because of their higher expression and cell percentage in the five cell clusters. The expression of TCF4 and HK1 was increased in monocytes and B cells, and the overlap in co-expression of the two genes was higher in both monocytes and B cells in the combined ARDS group than in the uncomplicated ARDS group (Fig. 7D). Correlation analysis showed a high correlation between TCF4 and oxidation scores in B cells in the combined ARDS group and in monocytes in the uncomplicated ARDS group (Fig. 7E).

Discussion

The global pandemic of SARS-CoV-2 has not only increased the focus on COVID-19 but has also stimulated in-depth exploration of the complex relationship between COVID-19 and acute respiratory distress syndrome (ARDS) (16, 17). Our research focuses on uncovering the biological mechanisms that may be shared by these two diseases, particularly similarities in gene expression and immune responses. By combining machine learning algorithms and single-cell sequencing analyses, we not only identified a potential link between ARDS and COVID-19, but also highlighted several key genes, such as TCF4 and HK1, that may play a critical role in the pathogenesis of both diseases.

In our analysis, we identified differentially expressed genes (DEGs) that are jointly up-regulated or down-regulated in ARDS and COVID-19, suggesting that they play important roles in the pathophysiology of these diseases. Our enrichment analysis revealed the major biological functions and pathways in which these genes are involved, providing insight into both diseases. For example, KEGG pathway analysis revealed that these common DEGs are closely associated with key biological processes such as inflammatory response and apoptosis, which play a central role in the severe pathophysiology of ARDS and COVID-19.

In addition, our study employed machine learning techniques, namely random forest (RF) and least absolute shrinkage and selection operator (LASSO), to refine the identification of potential hub genes. Using the RF approach, we initially screened a set of candidate diagnostic genes. To refine these results and identify key genes common to COVID-19 and ARDS, we then applied LASSO for further feature reduction, which led to the identification of five key genes: HIST1H2BK, TCF4, OLFM4, KIF14, and HK1. These genes were highly significant in our model and have been reported in the literature to play key roles in processes such as immune response, inflammation, and cell death.

In particular, TCF4 displays key hub gene properties associated with immune regulation in the context of ARDS and COVID-19, and shows altered expression patterns under various inflammatory conditions. This is highly consistent with recent findings, one of which showed that the combination of hCMSCs and liraglutide significantly improved the therapeutic efficacy of ALI via the cAMP/PKAc/β-catenin signaling pathway, in which TCF4 plays a central role (18). In addition, gene expression analysis of SARS-CoV-2 infection revealed that TCF4 may be associated with cardiovascular complications of COVID-19, particularly with regard to vascular function (19). More specifically, a study of pediatric patients with COVID-19 identified autoantibodies against TCF4, highlighting the potential importance of TCF4 in the regulation of immune responses (20). Taken together, these findings highlight the central role of TCF4 in the pathophysiology of ARDS and COVID-19 and the importance of further exploring its function and therapeutic potential.

The HK1 gene is of particular importance in the context of ARDS and COVID-19. Notably, the study identified unique SARS-CoV-2 phylogenetic clusters, mostly associated with HK1, coinciding with the large-scale COVID-19 outbreak in Hong Kong in July 2020 (21). In addition, a study using 18F-FDG PET technology revealed a critical role for HK1 in neutrophil activation and lung inflammation, suggesting that increased HK1 activity is associated with increased neutrophil glucose uptake and migration capacity, which is a key factor in the development of ALI/ARDS (22).

These findings not only emphasize the importance of HK1 in the inflammatory and immune response, but also highlight its potential value in molecular imaging and early disease diagnosis.

OLFM4 may play an important role in the pathophysiology of ARDS and COVID-19, particularly in the inflammatory response and immune regulation. This is supported by the existing literature, in which one study constructed a predictive model based on transcriptional biomarkers and clinical parameters and emphasized the importance of OLFM4 in predicting sepsis-induced ARDS (23). In addition, a global gene expression study and docking analysis confirmed the overexpression of OLFM4 in COVID-19 infection (24). Although KIF14 has been shown to be important in the pathophysiology of COVID-19, especially in the co-pathogenesis with digestive cancers, its role in ARDS is unclear (25). This discrepancy not only highlights the complexity of gene expression in different disease conditions, but also provides a new direction for future studies to explore in depth the potential function and role of KIF14 in ARDS.

To validate our findings and assess the potential of these genes in disease diagnosis, we further performed XGBoost-based diagnostic model construction and validation. Notably, the key genes we identified, such as HIST1H2BK, OLFM4, KIF14, and HK1, showed good performance in the model, suggesting their potential value in the diagnosis of ARDS and COVID-19. The expression levels of these genes were strongly correlated with the severity of the disease and the clinical outcome of the patients, further emphasizing their importance in clinical applications.

In addition, we investigated the role of immune cells in ARDS and COVID-19, in particular their infiltration patterns and functional status during the pathological process. Using CIBERSORT algorithm analysis, we observed changes in the distribution of different immune cell subpopulations in diseased tissues that were consistent with the expression patterns of the key genes we identified. This not only reveals the complexity of the immune response, but also highlights the need for further studies to understand the immunopathology of ARDS and COVID-19.

Although our study provides valuable insights, there are several limitations. First, due to the lack of a comprehensive single-cell data set, our analysis relied heavily on available public data and previously published studies. In addition, our study did not directly examine how specific genes affect intercellular communication or the tissue microenvironment, which may be an important direction for future research. Finally, while our model shows diagnostic potential, validation of these findings in a broader patient population is needed.

In conclusion, our study reveals a potential molecular link between ARDS and COVID-19, particularly in terms of gene expression and immune response. The key genes and pathways we identified not only provide insight into these diseases, but may also guide future therapeutic strategies and drug development. However, more research is needed to understand in detail how these genes and pathways specifically affect the disease process and their potential for clinical application.

Conflict of Interest

Chuanxi Tian, Yikun Guo, Huifang Guan, Kaile Ma, Rui Hao, Wei Zhu, Jinyue Zhao, and Min Li declare that they have no competing interests.

Author Contributions

Conceptualization: C.X.T., M.L., and W.Z.; Formal analysis and Data Curation: Y.K.G., C.X.T., and J.Y.Z.; Writing - Original Draft: C.X.T., Y.K.G., H.F.G.; Writing - Review & Editing: C.X.T., Y.K.G., H.F.G., M.L., K.L.M., and R.H. All authors read and approved the final manuscript.

Funding

This work was supported by the "China-Austria Joint Laboratory Construction and Joint Research on Traditional Chinese Medicine for Prevention and Treatment of Major Infectious Diseases along the Belt and Road" project (Project No. 2020YFE0205100), with the support of the National Key Program for Strategic International Science and Technology Innovation and Cooperation.

Acknowledgments

None.

Data availability statement

The datasets generated and/or analysed during the current study are available in the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/), for the ARDS dataset (GSE32707 and GSE66890) and the COVID-19 dataset (GSE213313).

References

1. Thompson BT, Chambers RC, Liu KD. Acute Respiratory Distress Syndrome. N Engl J Med (2017) 377:562–572. doi: 10.1056/NEJMra1608077
2. Min Tang, Na Li. Pathophysiological mechanism of acute respiratory distress syndrome and research progress on diagnostic biomarkers of ARDS. China Journal of Modern Medicine (2022) 32:1–6.
3. Bellani G, Laffey JG, Pham T, Fan E, Brochard L, Esteban A, Gattinoni L, van Haren F, Larsson A, McAuley DF, et al. Epidemiology, Patterns of Care, and Mortality for Patients With Acute Respiratory Distress Syndrome in Intensive Care Units in 50 Countries. JAMA (2016) 315:788–800. doi: 10.1001/jama.2016.0291
4. Majumder J, Minko T. Recent Developments on Therapeutic and Diagnostic Approaches for COVID-19. AAPS J (2021) 23:14. doi: 10.1208/s12248-020-00532-2
5. Lippi G, Sanchis-Gomar F, Henry BM. COVID-19 and its long-term sequelae: what do we know in 2023? Pol Arch Intern Med (2023) 133:16402. doi: 10.20452/pamw.16402
6. Zheng J, Miao J, Guo R, Guo J, Fan Z, Kong X, Gao R, Yang L. Mechanism of COVID-19 Causing ARDS: Exploring the Possibility of Preventing and Treating SARS-CoV-2. Front Cell Infect Microbiol (2022) 12:931061. doi: 10.3389/fcimb.2022.931061
7. Quesada-Gomez JM, Entrenas-Castillo M, Bouillon R. Vitamin D receptor stimulation to reduce acute respiratory distress syndrome (ARDS) in patients with coronavirus SARS-CoV-2 infections: Revised Ms SBMB 2020_166. J Steroid Biochem Mol Biol (2020) 202:105719. doi: 10.1016/j.jsbmb.2020.105719
8. Shen Y, Liu J, Zhang L, Dong S, Zhang J, Liu Y, Zhou H, Dong W. Identification of Potential Biomarkers and Survival Analysis for Head and Neck Squamous Cell Carcinoma Using Bioinformatics Strategy: A Study Based on TCGA and GEO Datasets. Biomed Res Int (2019) 2019:7376034. doi: 10.1155/2019/7376034
9. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M, et al. NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res (2013) 41:D991–D995. doi: 10.1093/nar/gks1193
10. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics (2008) 9:559. https://pubmed.ncbi.nlm.nih.gov/19114008/ [Accessed November 14, 2023]
11. Breiman L. Random Forests. Machine Learning (2001) 45:5–32. doi: 10.1023/A:1010933404324
12. Tibshirani R. Regression Shrinkage and Selection Via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological) (1996) 58:267–288. doi: 10.1111/j.2517-6161.1996.tb02080.x
13. Engebretsen S, Bohlin J. Statistical predictions with glmnet. Clin Epigenetics (2019) 11:123. doi: 10.1186/s13148-019-0730-1
14. Chen B, Khodadoust MS, Liu CL, Newman AM, Alizadeh AA. Profiling Tumor Infiltrating Immune Cells with CIBERSORT. Methods Mol Biol (2018) 1711:243–259. doi: 10.1007/978-1-4939-7493-1_12
15. Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM, Hao Y, Stoeckius M, Smibert P, Satija R. Comprehensive Integration of Single-Cell Data. Cell (2019) 177:1888–1902.e21. doi: 10.1016/j.cell.2019.05.031
16. Meyer NJ, Gattinoni L, Calfee CS. Acute respiratory distress syndrome. Lancet (2021) 398:622–637. doi: 10.1016/S0140-6736(21)00439-6
17. Attaway AH, Scheraga RG, Bhimraj A, Biehl M, Hatipoğlu U. Severe covid-19 pneumonia: pathogenesis and clinical management. BMJ (2021) 372:n436. doi: 10.1136/bmj.n436
18. Feng Y, Wang L, Ma X, Yang X, Don O, Chen X, Qu J, Song Y. Effect of hCMSCs and liraglutide combination in ALI through cAMP/PKAc/β-catenin signaling pathway. Stem Cell Res Ther (2020) 11:2. doi: 10.1186/s13287-019-1492-6
19. Jha PK, Vijay A, Halu A, Uchida S, Aikawa M. Gene Expression Profiling Reveals the Shared and Distinct Transcriptional Signatures in Human Lung Epithelial Cells Infected With SARS-CoV-2, MERS-CoV, or SARS-CoV: Potential Implications in Cardiovascular Complications of COVID-19. Front Cardiovasc Med (2020) 7:623012. doi: 10.3389/fcvm.2020.623012
20. Bartley CM, Johns C, Ngo TT, Dandekar R, Loudermilk RL, Alvarenga BD, Hawes IA, Zamecnik CR, Zorn KC, Alexander JR, et al. Anti-SARS-CoV-2 and Autoantibody Profiles in the Cerebrospinal Fluid of 3 Teenaged Patients With COVID-19 and Subacute Neuropsychiatric Symptoms. JAMA Neurol (2021) 78:1503–1509. doi: 10.1001/jamaneurol.2021.3821
21. To KK-W, Chan W-M, Ip JD, Chu AW-H, Tam AR, Liu R, Wu AK-L, Lung K-C, Tsang OT-Y, Lau DP-L, et al. Unique Clusters of Severe Acute Respiratory Syndrome Coronavirus 2 Causing a Large Coronavirus Disease 2019 Outbreak in Hong Kong. Clin Infect Dis (2021) 73:137–142. doi: 10.1093/cid/ciaa1119
22. Rodrigues RS, Bozza FA, Hanrahan CJ, Wang L-M, Wu Q, Hoffman JM, Zimmerman GA, Morton KA. 18F-fluoro-2-deoxyglucose PET informs neutrophil accumulation and activation in lipopolysaccharide-induced acute lung injury. Nucl Med Biol (2017) 48:52–62. doi: 10.1016/j.nucmedbio.2017.01.005
23. Yao R-Q, Shen Z, Ma Q-M, Ling P, Wei C-R, Zheng L-Y, Duan Y, Li W, Zhu F, Sun Y, et al. Combination of transcriptional biomarkers and clinical parameters for early prediction of sepsis indued acute respiratory distress syndrome. Front Immunol (2022) 13:1084568. doi: 10.3389/fimmu.2022.1084568
24. A J, N A, K R. Global Gene Expression and Docking Profiling of COVID-19 Infection. Front Genet (2022) 13:870836. doi: 10.3389/fgene.2022.870836
25. Xiong Z, Yang Y, Li W, Lin Y, Huang W, Zhang S. Exploring Key Biomarkers and Common Pathogenesis of Seven Digestive System Cancers and Their Correlation with COVID-19. Curr Issues Mol Biol (2023) 45:5515–5533. doi: 10.3390/cimb45070349

Reviewers agreed at journal: 01 Mar, 2024
Reviewers invited by journal: 01 Mar, 2024
Editor assigned by journal: 01 Mar, 2024
Editor invited by journal: 04 Feb, 2024
Submission checks completed at journal: 04 Feb, 2024
First submitted to journal: 23 Jan, 2024

Status:

Version 1
