2.1 Quantitative proteomic analysis of COVID-19 plasma samples
We performed label-free quantification (LFQ) of 74 depleted plasma samples, of which 20 were negative, 18 were non-severe and 36 were severe (Figure 1A). Figure 1 (B-F) depicts the schematic workflow of label-free quantification under discovery proteomics, an overview of the statistical data analysis, a summary of synthetic peptide peaks after Multiple Reaction Monitoring (MRM) under validation proteomics, and outlines of the biological network analysis and the docking study, respectively. The correlation matrix of all 74 samples is shown in Supplementary figure 1, and the mass-spectrometry settings for the label-free quantification are shown in Supplementary figure 2. The LFQ analysis of the 74 samples identified a total of 1206 proteins. A list of 278 proteins from 71 samples, with missing values imputed, was taken forward for partial least squares-discriminant analysis (PLSDA) for an overall assessment of the difference between the COVID-19 positive and COVID-19 negative cohorts. The two cohorts segregated into two separate clusters (Figure 1G). Statistical analysis between the COVID-19 positive and COVID-19 negative cohorts yielded 27 significantly differentially expressed proteins, represented as a volcano plot in Figure 1H and a heatmap in Figure 1J (Supplementary Table S2). Proteins such as von Willebrand factor (VWF), Haptoglobin-related protein (HPR), Glutathione peroxidase 3 (GPX3), Alpha-2-macroglobulin (A2M), Carbonic anhydrase 2 (CA2), Protein S100-A8 (S100A8), Carboxypeptidase B2 (CPB2), Heparin cofactor 2 (SERPIND1), Fibrinogen gamma chain (FGG), Profilin-1 (PFN1) and Serum amyloid A-4 protein (SAA4) were significantly upregulated in the COVID-19 positive patients.
Proteins such as Lymphatic vessel endothelial hyaluronic acid receptor 1 (LYVE1), Intercellular adhesion molecule 1 (ICAM1), Macrophage migration inhibitory factor (MIF), Histidine-rich glycoprotein (HRG), IgGFc-binding protein (FCGBP), Immunoglobulin heavy variable 3-15 (IGHV3-15) and Insulin-like growth factor-binding protein 3 (IGFBP3) were significantly downregulated in the COVID-19 positive patients. Violin plots of a few dysregulated proteins (SERPIND1, VWF, and MIF) are shown in Figure 1I. We also found a protein, Cadherin EGF LAG seven-pass G-type receptor 2 (CELSR2), that was absent in all the COVID-19 negative samples but present in COVID-19 positive samples. Notably, CELSR2 was detected in only 5 of 18 non-severe samples but in 26 of 33 severe samples.
Proteomic analysis of COVID-19 Non-Severe and COVID-19 Severe patients
We also investigated the proteomic alterations between the Non-severe and Severe cohorts, which yielded a list of 38 significantly differentially expressed proteins (DEPs) (Supplementary Table S3). Figure 2A shows a heatmap of the top 25 differentially expressed proteins between the Severe and Non-severe cohorts. A list of 287 proteins, with missing values imputed, was taken forward for partial least squares-discriminant analysis (PLSDA) for an overall assessment of the difference between the Severe and Non-severe cohorts. The two cohorts segregated into two separate clusters, except for samples P93, P30, and P106, which clustered closer to the opposite cohort (Figure 2B). Figure 2C depicts the significant DEPs as a volcano plot. Proteins such as Kallistatin (SERPINA4), Serum amyloid P-component (APCS), Protein S100-A8 (S100A8), Fibrinogen gamma chain (FGG), Corticosteroid-binding globulin (SERPINA6), and Alpha-1-antichymotrypsin (SERPINA3) were upregulated in the severe cohort, whereas proteins such as Complement factor D (CFD), Monocyte differentiation antigen (CD14), Complement component C8 alpha chain (C8A), Apolipoprotein(a) (LPA) and Apolipoprotein M (APOM) were downregulated in severe compared to non-severe patients. Supplementary figure S3 shows the top 25 differentially expressed proteins between the Severe and Negative cohorts as a heatmap, together with the PLSDA clustering of Severe versus Negative patients.
2.2 MRM analysis of proteins overexpressed in COVID-19 severe
The MRM study aimed to validate the differentially regulated proteins found between COVID-19 severe and non-severe samples in the LFQ data. The response of BSA, used as a QC standard to monitor the day-wise instrument response, is shown in Supplementary figure S4. To establish that all injections gave a comparable response, we spiked an equal amount of a heavy-labeled synthetic peptide (FEDGVLDPDYPR) into all samples. The uniform peak areas for this peptide (Supplementary figure S5) confirm this. Even duplicates run on separate days showed comparable peak areas with very low CV. Based on the response of the differentially regulated peptides, the list was further refined to keep only peptides showing significant dysregulation (adjusted p-values below 0.05) between severe and non-severe samples. For this, the peaks were annotated and the transitions were refined according to the library match to give dotp values for all peptides. A dotp value is a measure of the match between the experimental peak and the library fragmentation pattern. The refined list had 183 transitions belonging to 28 peptides of 9 host proteins and 1 synthetic peptide. The peptide sequences and transitions of the proteins that showed differential regulation between COVID-19 non-severe and severe patient samples are listed in Supplementary Table S4. Using the MSstats external tool in Skyline, we determined that the proteins AGT, APOB, SERPINA3, FGG, and SERPING1 each had three or more peptides showing a peak-area fold change greater than 3 and an adjusted p-value below 0.05 at a confidence level of 95-99% (Figure 3). This validates that, for the given set of samples, these proteins show statistically significant overexpression in COVID-19 severe patients relative to COVID-19 non-severe patients (refer to the data availability section for the Skyline files).
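As an illustration of what a dotp-style score captures, the following minimal sketch computes a normalized dot product (cosine similarity) between measured transition peak areas and library fragment intensities. It is not taken from this study's pipeline: the function name and all intensity values are hypothetical, and the exact scoring formula used by Skyline may differ from this plain cosine form.

```python
import math

def dot_product_score(observed, library):
    """Cosine similarity between observed and library fragment intensities.

    Illustrative assumption only; Skyline's own dotp definition may differ.
    """
    if len(observed) != len(library):
        raise ValueError("intensity vectors must have the same length")
    dot = sum(o * l for o, l in zip(observed, library))
    norm_obs = math.sqrt(sum(o * o for o in observed))
    norm_lib = math.sqrt(sum(l * l for l in library))
    if norm_obs == 0 or norm_lib == 0:
        return 0.0  # no signal on one side, so no meaningful match
    return dot / (norm_obs * norm_lib)

# Hypothetical intensities for four transitions of one peptide.
library = [100.0, 60.0, 30.0, 10.0]
good_peak = [2000.0, 1200.0, 600.0, 200.0]   # same relative pattern, scaled up
poor_peak = [200.0, 1200.0, 100.0, 2000.0]   # different fragment ranking

print(dot_product_score(good_peak, library))  # close to 1
print(dot_product_score(poor_peak, library))  # clearly lower
```

A score near 1 means the relative fragment intensities of the measured peak follow the library pattern regardless of absolute signal, which is why such a score is useful for refining transitions.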
2.3 Biological pathway and network analysis of differentially expressed proteins in the Severe vs Non-severe comparison
We also identified the enriched biological processes for the 38 dysregulated proteins in COVID-19 severe compared to COVID-19 non-severe patients. The proteins enriched in these biological processes are shown as a protein-protein interaction network, and a few proteins are shown as violin plots (Figure 4A). Figure 4B shows a network of enriched terms colored by cluster, where nodes that share the same cluster are typically close to each other. Biological processes such as regulation of peptidase activity, regulated exocytosis, extracellular structure organization, blood coagulation, fibrin clot formation, complement activation (classical pathway), leukocyte activation involved in immune response, and response to glucocorticoid were enriched in COVID-19 severe patients. The list of proteins involved in these pathways is given in Supplementary Table S5.
2.4 Metabolomics profiling of COVID-19 patient cohort
The workflow of metabolome profiling from plasma samples is shown in Figure 5A. The quality-control check of the internal standard across all sample runs is shown in Figure 5B. The Principal Component Analysis (PCA) plot showing the clustering of QC pools from all batches, used as a quality check of the sample runs, is shown in Figure 5C. In the comparison of COVID-19 Negative and COVID-19 Positive samples, 32 metabolites were common to both cohorts yet significantly differentially expressed (differentially expressed metabolites, DEMs), with an FDR-adjusted p-value below 0.05 and a fold change above 1.5 (Supplementary Table S6). Of the 32 DEMs, only 11 were not contaminants from the blank solvent. Of these 11, only one was level-2 annotated (Linoleate) and two were level-3 annotated (Kauralexin A1 and D-(+)-Maltose) (Supplementary table S7). The remaining 8 metabolites were level-4 annotated. Furthermore, 3 and 2 metabolites were significantly unique to the COVID-19 Positive and COVID-19 Negative cohorts, respectively, all of which were level-4 annotated. The PCA plot showing the segregation of the sample sets was generated from the 11 significant, non-contaminant classifiers (Supplementary figure S6F).
Comparative analysis of Non-severe and Severe COVID-19 Positive patients yielded 24 features with an FDR-adjusted p-value below 0.05 and a fold change above 1.5; these were considered statistically significant Differentially Expressed Metabolites (DEMs) (Supplementary Table S7), of which 13 remained after blank subtraction. These 13 metabolites were used for the PCA plot (Figure 5D) and the heat map (Figure 5E) to show the segregation of the Non-severe and Severe sample sets. The blank-subtracted significant DEMs were used to calculate the Variable Importance in Projection (VIP) scores and to draw the volcano plot, with their expression trends represented as box plots (Fig. 5F and 5G). The box plots represent all the level 2 metabolites, i.e. 4 of the 13 significant DEMs (Supplementary Table S6); the remaining DEMs, annotated only at level 3 or level 4, are listed in Supplementary Table S8, and their trends are shown in Supplementary figure S6G.
A total of 18 significantly altered metabolites were found in the comparison of the non-severe (NSC) vs severe (SC) cohorts, comprising 13 DEMs, 1 metabolite specific to the NSC cohort, and 4 metabolites specific to the SC cohort. Five of the eighteen significant metabolites were MSI level 2, viz. Propionylcarnitine, N-Methylethanolamine phosphate, Indole-3-acetic acid, Creatine, and Bilirubin. Kauralexin A1 was MSI level 3, and the remaining 12 significant metabolites were MSI level 4 (Supplementary table S9).
Propionylcarnitine was enriched in the oxidation of branched-chain fatty acids pathway; indole-3-acetic acid was enriched in and mapped to the tryptophan metabolism pathway; creatine was enriched in and mapped to the glycine, serine, threonine, arginine, and proline metabolism pathways; and bilirubin was enriched in and mapped to the porphyrin metabolism pathway (Supplementary Figure S7). Kauralexin A1 was not found in the enrichment or pathway analysis.
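The DEM criteria used in this section (FDR-adjusted p-value below 0.05 and fold change above 1.5) can be sketched as follows. This is a minimal illustration, not the software used in the study: the Benjamini-Hochberg procedure is assumed as the FDR method, the fold-change cut-off is assumed to apply symmetrically to increases and decreases, and the feature names and values are hypothetical.

```python
import math

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def select_dems(features, alpha=0.05, fc_cutoff=1.5):
    """features: list of (name, raw_p, fold_change), fold_change being a ratio > 0."""
    adj = benjamini_hochberg([p for _, p, _ in features])
    selected = []
    for (name, _, fc), q in zip(features, adj):
        # Assumption: a 1.5-fold decrease counts the same as a 1.5-fold increase.
        if q < alpha and abs(math.log2(fc)) > math.log2(fc_cutoff):
            selected.append(name)
    return selected

# Hypothetical features: (name, raw p-value, fold change between cohorts)
features = [
    ("feature_A", 0.001, 2.4),
    ("feature_B", 0.004, 0.5),
    ("feature_C", 0.010, 1.2),   # significant but change too small
    ("feature_D", 0.300, 3.0),   # large change but not significant
    ("feature_E", 0.800, 1.0),
]
print(select_dems(features))
```

Blank subtraction and annotation-level assignment are separate steps and are not modeled here.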
2.5 In-silico screening of drugs against differentially expressed proteins
We performed in silico molecular docking of significantly altered proteins against a library of 58 drugs (Supplementary table 10A-E). Of the 58 drugs, 30 are FDA approved, 9 are clinically approved and 19 are pre-clinical approved. For each protein, we identified from the literature a positive control drug that is a known inhibitor of the protein. The positive control drug provides a possible cut-off for the docking score. After docking, we applied two criteria to select drugs for each protein. First, the drug's binding affinity should be equal to or better than that of the control inhibitor, i.e. its binding energy should be equally or more negative. Second, the drug's binding pocket should be similar to that of the control drug. For the COVID-19 non-severe vs severe comparison, we docked 5 significant proteins: Heparin cofactor 2, Thyroxine-binding globulin, Angiotensinogen, Carbonic Anhydrase-1, and Carbonic Anhydrase-2.
Heparin cofactor 2 (SERPIND1) is a 499 amino acid protein that binds the drug Sulodexide with a binding affinity of -7.1 kcal/mol; Sulodexide was therefore taken as the control drug (Supplementary figure S8A). When docked against the customized drug library, four FDA approved drugs bound to a binding pocket similar to that of the control drug with a better binding affinity than Sulodexide, namely Selinexor (-8.7 kcal/mol), Ponatinib (-8.4 kcal/mol), EGCG (-7.7 kcal/mol) and Nafamostat (-8.1 kcal/mol). Thyroxine-binding globulin (SERPINA7) is a protein of 415 amino acids. It showed a binding affinity of -7.4 kcal/mol with Tamoxifen, a well-known inhibitor of the protein, which we therefore used as the control inhibitor (Supplementary figure S8B). From the customized drug library, Selinexor and Ponatinib bound SERPINA7 with a binding affinity of -9.3 kcal/mol (Figure 6B). The 2D interaction diagram of Selinexor docked with SERPINA7 shows that amino acids Y20 and R381 form potential hydrogen bonds at the binding site (Figure 6A). Angiotensinogen is 485 amino acids long and binds Irbesartan with a binding affinity of -8.4 kcal/mol. As Irbesartan is a known inhibitor of the protein, we used it as the control drug (Supplementary figure S8C). Angiotensinogen binds ML-240, a pre-clinical approved drug, with a binding affinity of -8.9 kcal/mol. This is the only drug identified in our study to target Angiotensinogen. We also performed docking of Carbonic Anhydrase-1 (261 amino acids) and Carbonic Anhydrase-2 (260 amino acids). The small molecule Topiramate binds Carbonic Anhydrase-1 with a binding affinity of -9.2 kcal/mol (Supplementary figure S8D) and Acetazolamide binds Carbonic Anhydrase-2 with a binding affinity of -6.3 kcal/mol (Supplementary figure S8E); these were used as the control drugs for the respective proteins.
In our study, EGCG was the only FDA-approved drug identified that can be used to target Carbonic Anhydrase-1 (binding affinity -9.5 kcal/mol), and Nafamostat was the only one identified to target Carbonic Anhydrase-2 (binding affinity -8.2 kcal/mol). Four proteins from the COVID-19 positive vs negative comparison were used for molecular docking: Protein S100-A9, Carboxypeptidase B2, Glutathione S-transferase omega-1 (GSTO-1), and 6-Phosphogluconate dehydrogenase (6-PGDH).
We found that Rapamycin can be used to target all 4 proteins, as it binds each of them with a better binding affinity than the respective control drug and in the same binding pocket as the control drug. The first protein, Protein S100-A9, is a small protein of 114 amino acids. We used Tasquinimod as the control inhibitor, as it binds the protein with a binding affinity of -7.5 kcal/mol (Supplementary figure S8F). Using the above-mentioned selection criteria, we found that Protein S100-A9 can be targeted by two FDA approved drugs: Selinexor, which also binds the protein with a binding affinity of -7.5 kcal/mol, and Rapamycin, an mTOR inhibitor, which binds the protein with a binding affinity of -8.2 kcal/mol. Carboxypeptidase B2 is a 423 amino acid protein. We used Anabaenopeptin F as the control drug; it binds the protein with a binding affinity of -8.3 kcal/mol (Supplementary figure S8G). From our docking studies, we identified 3 FDA-approved drugs that can be used to target Carboxypeptidase B2: Rapamycin (binding affinity -8.7 kcal/mol), Dabrafenib (binding affinity -8.8 kcal/mol), and Daunorubicin (binding affinity -8.6 kcal/mol). Glutathione S-transferase omega-1 (GSTO-1) is a protein of 241 amino acids, for which CMFDA is a known inhibitor from the literature. We used CMFDA as the control drug; it binds GSTO-1 with a binding affinity of -8.3 kcal/mol (Supplementary figure S8H).
Four FDA approved drugs from our customized drug library can be used to target GSTO-1: Rapamycin (binding affinity -8.8 kcal/mol), Selinexor (binding affinity -8.6 kcal/mol), Ponatinib (binding affinity -9.1 kcal/mol), and Silmitasertib (binding affinity -8.3 kcal/mol). 6-Phosphogluconate dehydrogenase (6-PGDH) is a protein of 483 amino acids. We used Physcion as the control drug; it binds 6-PGDH with a binding affinity of -7.0 kcal/mol (Supplementary figure S8I). Six FDA approved drugs from our customized drug library can be used to target 6-PGDH: Rapamycin (binding affinity -8.8 kcal/mol), Selinexor (binding affinity -8.5 kcal/mol), Ponatinib (binding affinity -10.3 kcal/mol), Silmitasertib (binding affinity -7.7 kcal/mol), Daunorubicin (binding affinity -8.4 kcal/mol) and Dabrafenib (binding affinity -8.6 kcal/mol). From our molecular docking analysis, we found that Rapamycin, a drug already approved for preventing organ transplant rejection, binds all four significantly upregulated proteins from the COVID-19 positive vs negative comparison. We also found that Selinexor, an exportin antagonist approved for multiple myeloma, and Ponatinib, a tyrosine kinase inhibitor approved for Chronic Myeloid Leukemia (CML), can be used to target proteins from both the COVID-19 positive vs negative comparison and the non-severe vs severe comparison, as both were shown in our docking analysis to target proteins from both comparisons. Another drug, Pevonedistat, which is clinically approved, can also be explored in the future for targeting COVID-19, as it was likewise shown to target proteins from both comparisons.
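The two selection criteria applied throughout this section (binding affinity equal to or better than the control, and a binding pocket similar to the control) can be sketched as a simple filter. The sketch below uses the Heparin cofactor 2 values reported above with Sulodexide as control. The `same_pocket_as_control` flag is a hypothetical stand-in, since how pocket similarity was assessed is not specified here, and the two `hypothetical_drug_*` entries are invented only to show a rejection under each criterion.

```python
from dataclasses import dataclass

@dataclass
class DockingResult:
    drug: str
    affinity_kcal_mol: float      # more negative = stronger predicted binding
    same_pocket_as_control: bool  # hypothetical flag for pocket similarity

def select_candidates(results, control_affinity):
    """Keep drugs that bind at least as strongly as the control, in a similar pocket."""
    return [
        r.drug
        for r in results
        if r.affinity_kcal_mol <= control_affinity and r.same_pocket_as_control
    ]

# Control: Sulodexide docked to Heparin cofactor 2 (-7.1 kcal/mol).
control_affinity = -7.1
results = [
    DockingResult("Selinexor", -8.7, True),
    DockingResult("Ponatinib", -8.4, True),
    DockingResult("EGCG", -7.7, True),
    DockingResult("Nafamostat", -8.1, True),
    DockingResult("hypothetical_drug_X", -6.5, True),   # weaker than control
    DockingResult("hypothetical_drug_Y", -9.0, False),  # different pocket
]
print(select_candidates(results, control_affinity))
```

Because binding energies are negative, "equal to or better than the control" translates to a less-than-or-equal comparison, which is why a drug that merely matches the control affinity is retained.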