Given prior knowledge that viral infections alter the structure of biomolecules, in this research we examined the FTIR spectra of COVID-19 patients and healthy individuals, seeking to discriminate between these two populations through spectral analysis and an MLRM. Although ATR-FTIR is not used as a diagnostic technique, several authors have reported the use of FTIR for virus detection. For example, Erukhimovitch et al. (2005) stated that FTIR microscopy can be applied as a sensitive and effective assay for detecting cells infected with various members of the herpes family of viruses and with retroviruses [23]. Lee-Montiel et al. (2011) evaluated the utility of FTIR spectroscopy for the rapid detection of infective poliovirus particles in cell cultures [24], and Santos et al. (2020) reported changes in several spectral features in patients infected with hepatitis [17]. Therefore, in the search for new techniques to detect SARS-CoV-2, FTIR spectroscopy was considered in this research.
One of the main reasons for seeking new diagnostic strategies is that, although RT-PCR tests have shown high sensitivity for detecting the virus (97.2% [90.3%-99.7%]), false-negative results are expected within seven days of infection. Additionally, the RT-PCR process is time-consuming, and shortages of test kits are common worldwide. Serological testing, on the other hand, relies on IgM and IgG produced in response to viral infection, which are usually detected only 1–3 weeks after symptom onset [25]. In contrast, the technique employed in this research allowed us to discriminate patients from the first day of clinical symptoms up to the third week of clinical evolution.
Regarding the characteristics of the COVID-19 population, although Peckham et al. demonstrated that there is no difference in the proportion of males and females infected with SARS-CoV-2 [26], in this research the COVID-19 population comprised 160 men (62.7%) and 95 women (37.3%), probably because the samples were obtained from hospitalized patients. The same authors reported that males face higher odds of intensive therapy unit (ITU) admission and death than females.
Regarding age, Hu et al. reported that, although all ages of the population appear to be susceptible to SARS-CoV-2 infection, the median age of infection is around 50 years [27]. This agrees with our findings, since the average age in this research was 54.3 ± 14.7 years.
Concerning vital signs in the COVID-19 group, as previously mentioned, the only altered vital sign was SaO2, with a mean of 90%. Nevertheless, it must be remembered that these patients were hospitalized, and one of the main criteria for hospitalization, besides evidence of pulmonary involvement on CT, is a low PO2, which entails a low SaO2. Hu et al. reported that the most common symptoms in COVID-19 patients are fever, dry cough, and fatigue in patients under 50 years, with dyspnea added in patients over 60 years [27]. Likewise, in this research, the main reported symptoms were cough, dyspnea, headache, and fever.
Regarding comorbidities, as previously mentioned, obesity, diabetes, and hypertension were the most frequently reported conditions in this study. These results agree with Ortiz-Brizuela et al., Berumen et al., and Petrova et al., who stated that these pathologies are the main risk factors for infection and hospitalization due to COVID-19 [28–30].
Regarding blood group, although Zhao et al. reported that blood group O is associated with a lower risk of infection than non-O blood groups [31], in this research the main blood type was O, probably because this blood type is the most common in Mexico [32], the country where this research took place.
Regarding laboratory blood tests, Velavan and Meyer stated that CRP, D-dimers, ferritin, cardiac troponin, and IL-6 could be used in risk stratification to predict severe and fatal COVID-19 in hospitalized patients [33]. In our study, the values of neutrophils, glucose, CRP, LDH, fibrinogen, D-dimer, and ferritin were increased; that is, the patients in our study presented three of the laboratory risk markers mentioned by Velavan and Meyer (CRP, D-dimer, and ferritin), probably because these patients were hospitalized and required specialized medical attention. As expected, we detected neutrophilia. The primary function of neutrophils is the clearance of pathogens and debris through phagocytosis, and the release of neutrophil extracellular traps is needed to inactivate viral infection and restrict virus replication; neutrophils are the first cells recruited in COVID-19 [34]. In addition, hypoxia and hypocapnia are seen in severe COVID-19 cases; Wang et al. reported a median PaO2 of 68 mmHg and a median PaCO2 of 34 mmHg in 138 COVID-19 patients [35], results similar to those obtained in this research (PaO2 66 mmHg and PaCO2 31.1 mmHg).
Regarding the spectra, those obtained were similar to those reported by Caetano et al., showing features characteristic of biological samples [16]. However, it is important to note that the population evaluated by Caetano et al. was instructed to abstain from food and caffeine products for at least 2 h before saliva collection and to rinse their mouths with distilled water. In contrast, in this study a fasting period of at least 8 h was required, and patients who had brushed or rinsed the oral cavity with mouthwash before sampling were excluded.
As previously mentioned, in the FTIR spectra analysis the COVID-19 group exhibited a slight displacement and a decrease in absorbance in the amide I and amide II regions, which may be attributed to a decrease in protein production. This corresponds to the findings of Bojkova et al., who observed a decrease in the expression of proteins, especially those related to cholesterol metabolism, in CaCo-2 cells infected with SARS-CoV-2 [36]. In the same way, Bouhaddou et al. reported a decrease in the abundance of host proteins and a predominance of viral proteins, which is consistent with the mechanisms reported for other viruses that inhibit host protein translation [37]. This is similar to what was found in Vero cells infected with herpes viruses, in which protein synthesis and cellular metabolism decrease in the initial stages of infection, with consumption of cellular metabolites such as nucleotides, amino acids, and cellular enzymes [36–38]. Notably, Barauna et al. reported a decrease in the peak related to amide I in saliva combined with inactivated SARS-CoV-2 compared with saliva without infection [39].
It is also important to mention that the peak at 1240 cm⁻¹, which is related to phosphorylated molecules, is increased in COVID-19 patients relative to healthy individuals. In this regard, Bouhaddou et al. reported an increase in phosphorylated proteins together with a decrease in protein abundance, as well as hyperphosphorylation of the CK2 and p38 MAPK pathways related to cytokine production [37], which is also consistent with the findings of Diamond et al. [40]. Moreover, Erukhimovitch et al. reported an increase in the peak at 1240 cm⁻¹ in cells infected with the herpes virus [38].
Regarding the immune response, the combination of IgG and IgM has been reported to achieve an overall sensitivity of 87.8% and a specificity of 98.9% for detecting SARS-CoV-2. Nevertheless, the complexity of the humoral response in COVID-19 is not fully elucidated, and evidence on the relevance of the SARS-CoV-2 antibody response for the long-term clinical outcome of viral clearance is still lacking. Some authors have reported that the time to IgM positivity ranges from 5 to 10 days after disease onset, whereas IgG positivity occurs between 13 and 21 days. Others have stated that the earliest detection of IgM was at five days after symptom onset and the earliest detection of IgG at seven days after symptom onset [41–43].
Similarly, IgA has been reported to play an essential role in mucosal immunity, being the most important immunoglobulin for fighting infectious pathogens in the respiratory system [44]. Furthermore, salivary testing has been described as the most convenient way to measure IgA, which is why it has been used to characterize mucosal immune responses to many viral infections, such as SARS, MERS, influenza, HIV, and RSV. Serum IgA has been detected in COVID-19 patients and appears to be detectable earlier than IgM or IgG antibodies, possibly as early as two days after the onset of symptoms, suggesting that IgA may be the first antibody to appear in response to SARS-CoV-2 infection [45]. In this research, changes in absorbance were observed in the regions related to IgG (1560–1464 cm⁻¹), IgM (1420–1289 cm⁻¹ and 1160–1028 cm⁻¹), and IgA (1285–1237 cm⁻¹), with higher absorbance in the spectra of the COVID-19 group, which is consistent with the reports mentioned above.
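One way to quantify the absorbance in these immunoglobulin-related regions is to integrate each spectrum over the stated wavenumber ranges. The following is a minimal sketch under stated assumptions, not the procedure used in this research: the band limits are those quoted above, while the function names, the trapezoidal integration, and the absence of baseline correction are illustrative choices.

```python
import numpy as np

# Band limits in cm^-1, taken from the ranges quoted in the text.
BANDS = {
    "IgG": [(1464.0, 1560.0)],
    "IgM": [(1289.0, 1420.0), (1028.0, 1160.0)],
    "IgA": [(1237.0, 1285.0)],
}


def band_area(wavenumbers, absorbance, low, high):
    """Trapezoidal area under the spectrum between low and high (cm^-1)."""
    x = np.asarray(wavenumbers, dtype=float)
    y = np.asarray(absorbance, dtype=float)
    # FTIR spectra are often stored in descending wavenumber order; sort first.
    order = np.argsort(x)
    x, y = x[order], y[order]
    mask = (x >= low) & (x <= high)
    xs, ys = x[mask], y[mask]
    return float(np.sum((ys[1:] + ys[:-1]) * np.diff(xs) / 2.0))


def immunoglobulin_areas(wavenumbers, absorbance):
    """Sum the band areas assigned to each immunoglobulin (hypothetical helper)."""
    return {
        name: sum(band_area(wavenumbers, absorbance, lo, hi) for lo, hi in ranges)
        for name, ranges in BANDS.items()
    }
```

Comparing these areas between the two groups, for example through their group means, would give a numerical counterpart to the higher absorbance noted for the COVID-19 spectra.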
The second derivative spectra of SARS-CoV-2 patients at 1695, 1682, 1660, 1652, 1646, 1639, 1631, and 1625 cm⁻¹ showed a decrease in absorbance and a displacement, suggesting changes in protein structure. In this regard, Diamond et al. reported a decrease in the expression of ACE2 and IL-6 mRNA in saliva samples, which would correspond to the decrease in the secondary structures reported by Meirson et al., who described through a bioinformatic analysis that the main secondary structure in the binding between SARS-CoV-2 and ACE2 is the κ-helix (polyproline II), followed by the α-helix and β-strand, with changes in the disulfide bonds [40, 46]. Moreover, Giubertoni et al. assigned the peak at 1619 ± 2 cm⁻¹ to a helical conformation similar to that of a polyproline II helix, and the peak at 1659 ± 2 cm⁻¹ to the α-helix; both are also diminished in our study [47].
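To illustrate how a second derivative spectrum exposes both the position and the intensity of a band, the sketch below differentiates two synthetic amide I bands numerically. It is only an illustration: the Gaussian band shapes, their centers (1652 and 1648 cm⁻¹), their heights, and the helper names are hypothetical, and real spectra would normally be smoothed first (for example with a Savitzky-Golay filter), because differentiation amplifies noise.

```python
import numpy as np


def second_derivative(wavenumbers, absorbance):
    """Numerical second derivative of absorbance with respect to wavenumber."""
    x = np.asarray(wavenumbers, dtype=float)
    y = np.asarray(absorbance, dtype=float)
    first = np.gradient(y, x)
    return np.gradient(first, x)


def band_position(wavenumbers, absorbance, low, high):
    """Return (wavenumber, depth) of the deepest second-derivative minimum
    inside [low, high]; absorption bands appear as negative minima."""
    x = np.asarray(wavenumbers, dtype=float)
    d2 = second_derivative(x, absorbance)
    mask = (x >= low) & (x <= high)
    idx = int(np.argmin(np.where(mask, d2, np.inf)))
    return float(x[idx]), float(d2[idx])


def gaussian_band(x, center, height, width):
    """Synthetic band used only for illustration."""
    return height * np.exp(-0.5 * ((x - center) / width) ** 2)


x = np.arange(1600.0, 1701.0, 1.0)
healthy = gaussian_band(x, center=1652.0, height=1.0, width=8.0)   # hypothetical
infected = gaussian_band(x, center=1648.0, height=0.8, width=8.0)  # hypothetical

pos_h, depth_h = band_position(x, healthy, 1600.0, 1700.0)
pos_i, depth_i = band_position(x, infected, 1600.0, 1700.0)
```

In this toy case the minimum moves with the band center and becomes shallower as the band weakens, which is the kind of displacement and absorbance decrease described for the amide I components.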
As expected, the immunoglobulin content showed that the COVID-19 group had higher IgA, IgM, and IgG content than the healthy group. Moreover, within the COVID-19 group, IgA was the least expressed immunoglobulin, followed by IgM, with IgG being the most expressed. This may be attributed to the fact that most of the samples were collected around day 9.24 after PCR diagnosis; as mentioned above, IgM is detected from five days after symptom onset, and the earliest detection of IgG is at seven days after symptom onset. Nevertheless, it must be considered that some samples were obtained on the first day the patients showed symptoms, which is why IgA was also detected in this population.
When comparing DNA and nucleic acid content, the COVID-19 group showed a higher content of these molecules. Zaling et al. stated that in necrotic cell death the DNA is completely unwound, so that 100% of the DNA is visible to IR at this stage, and they observed an increase of ∼65% in DNA absorbance in necrosis compared with the control. They also reported a decrease in the random coil structure of the total protein, similar to that seen in the COVID-19 group of this research [48].
As previously mentioned, characterizing two or more populations from the analysis of the FTIR spectra of their individuals is not an easy task; the more complex the sample, the more difficult it is to find patterns characteristic of the population. This is because the bonds of the different components can overlap with the characteristic bonds of each sample. Moreover, each type of sample (fluids, tissues, or cells, among others) has its own particularities.
Different methodologies have been proposed to identify populations from the analysis of FTIR spectra, which facilitates the adoption of a classification method by allowing experimentation to focus only on the most promising ones. In this sense, in another work we first experimented with linear classification models to discriminate COVID-19 patients, although these models were affected by the overlap of the spectra caused by the variance of the absorbances/transmittances of the populations; this problem can be overcome by having a large population, thanks to the central limit theorem. In this work, we discriminated between our groups using an MLRM, which was validated by LOOCV in accordance with our previous research.
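The validation scheme can be sketched as follows. This is an illustrative example under stated assumptions rather than the model used in this research: it assumes that MLRM denotes a multivariate linear regression model (the expansion is not given in this section), encodes healthy as 0 and COVID-19 as 1, classifies with a 0.5 threshold on the regression output, and uses synthetic features in place of real absorbances. All function names are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit of y on X with an intercept term."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Regression output for each row of X."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def loocv_accuracy(X, y, threshold=0.5):
    """Leave-one-out cross-validation: each sample is predicted by a model
    fitted on all the other samples, then classified by thresholding."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    hits = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        coef = fit_linear(X[train], y[train])
        score = predict_linear(coef, X[i:i + 1])[0]
        hits += int((score >= threshold) == (y[i] == 1.0))
    return hits / len(y)


def make_synthetic(n_per_class=20, n_features=5, seed=0):
    """Synthetic stand-in for absorbances at a few wavenumbers (assumption)."""
    rng = np.random.default_rng(seed)
    healthy = rng.normal(0.0, 0.3, size=(n_per_class, n_features))
    covid = rng.normal(0.0, 0.3, size=(n_per_class, n_features))
    covid[:, 0] += 2.0  # one feature separates the two groups
    X = np.vstack([healthy, covid])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return X, y


X, y = make_synthetic()
accuracy = loocv_accuracy(X, y)
```

Because every sample is held out exactly once, LOOCV uses all available data for training while still testing each prediction on an unseen sample, which is convenient when the number of patients is limited.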
The absorbance variations, and principally the peak displacements associated with viral infection shown in Figs. 3A and 3B, contributed to the excellent performance of the MLRM. As noted in (1), the slope plays an essential role in MLRM models, because a displacement in any peak means that one population has reached its maximum absorbance while the other is still increasing, so their slopes have opposite signs. Our results in Fig. 7 suggest that the best region for identifying possible virus carriers is the amide I band of proteins (1700–1600 cm⁻¹), since it gives the most compact outputs among predictions for the same population and the greatest separation from the other population.
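The role of the slope can be seen in a small numerical example. The sketch below is only an illustration with synthetic Gaussian bands; the band centers (1650 and 1656 cm⁻¹), the width, and the helper names are hypothetical, and it does not reproduce expression (1). It evaluates the local slope of two displaced bands at a wavenumber lying between their maxima.

```python
import numpy as np


def gaussian_band(x, center, width=8.0):
    """Synthetic absorption band used only for illustration."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)


def slope_at(wavenumbers, absorbance, target):
    """Local slope d(absorbance)/d(wavenumber) at the grid point nearest target."""
    x = np.asarray(wavenumbers, dtype=float)
    slopes = np.gradient(np.asarray(absorbance, dtype=float), x)
    return float(slopes[int(np.argmin(np.abs(x - target)))])


x = np.arange(1600.0, 1701.0, 1.0)
population_a = gaussian_band(x, center=1650.0)  # hypothetical band position
population_b = gaussian_band(x, center=1656.0)  # same band, displaced

# Between the two maxima, one band is already falling while the other still rises.
slope_a = slope_at(x, population_a, 1653.0)
slope_b = slope_at(x, population_b, 1653.0)
```

At 1653 cm⁻¹ the first band has passed its maximum and the second has not, so the two slopes have opposite signs even though the absorbance values themselves are similar.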
Even though the spectral analysis allowed us to detect the molecular components that characterize a SARS-CoV-2-positive patient, and the data analysis through the MLRM let us discriminate these patients from healthy persons, more assays are needed. One of them should consider the time elapsed from symptom onset to diagnosis and categorize the population accordingly. Another should corroborate the diagnosis through serological tests (IgA, IgM, and IgG), correlating these results with the FTIR spectra.