The average FTIR spectra of non-infected, L. infantum infected, and T. evansi infected canine blood serum are shown in Fig. 1 and are remarkably similar across all investigated groups. The vibrational bands can be assigned to proteins, carbohydrates, lipids, and fatty acids. The vibrational assignments and the corresponding molecular groups in canine blood serum are listed in Table 1.
The broad band centered around 3276 cm⁻¹ is assigned to overlapping vibrational modes of protein and water molecules. Water content commonly varies in biological samples, and significant variation between samples can lead to incorrect discrimination between the groups. The drying step and identical preparation conditions for all samples are therefore important so that water content does not interfere with the data discrimination.
The absorption in the 1700–1450 cm⁻¹ spectral range is attributed to proteins (Table 1). The main antibodies produced in response to Leishmania antigens belong to this group of molecules. Nevertheless, very similar FTIR spectra were obtained for all groups, including the non-infected one, probably due to the overlap of, for example, the amide I and II absorption bands of proteins. This spectral range is very information-rich and useful for sample discrimination, as previously shown by other researchers (26, 35, 36). In addition, other molecules assayed by FTIR (e.g. carbohydrates, lipids, and fatty acids) can be equally important for sample classification; the immune response of the organism during infection may affect these molecules and consequently cause subtle spectral alterations.
Table 1
Vibrational assignments and related molecular groups for canine blood serum FTIR spectra (37–46). Symbols: ν = symmetric and/or asymmetric stretching; σ = scissoring bending; δ = bending vibrational modes.
| Band (cm⁻¹) | Vibrational assignments | Organic group | Biomolecule |
| --- | --- | --- | --- |
| 929 | ν(C-O) ν(C-O-C) | Saccharides; carbohydrates | |
| 1033 | | Saccharides; DNA; carbohydrates | Glucose; α2-macroglobulin |
| 1073 | νs(PO2−) | Saccharides; nucleic acids; RNA/DNA; carbohydrates | Lactate; α2-macroglobulin |
| 1164 | | Saccharides; carbohydrates | |
| 1236 | νas(PO2−) | Saccharides; nucleic acids; phosphate diester; RNA/DNA; carbohydrates | Serine; tyrosine; threonine |
| 1314, 1343 | ν(CH2) | Collagen or asymmetric phosphate (Amide III) | Transferrin; α1-acid glycoprotein |
| 1397 | νs(COO−) δs(CH3) νs(CH3) | Amino acids; proteins | Fibrinogen; IgG1; IgM; IgA; haptoglobin |
| 1451 | ν(C-O-O) νas(CH3) σ(CH2) | Amino acids; lipids | Apolipoprotein-A1 |
| 1515 | δ(N-H) | α-helix of proteins (Amide II) | Albumin; IgG4 |
| 1536 | νs(C-N) σ(N-H) | α-helix of proteins (Amide II) | Albumin; IgG4 |
| 1633 | ν(C=O) δ(N-H) νs(C-N) | β-sheet and α-helix of proteins (Amide I) | IgG2; IgG3 |
| 1738 | ν(C=O) Amide I ν(C-N) δ(H-N) Amide II | Lipids; phospholipids; cholesterol; esters; fatty acids | |
| 2854 | νs(CH2) | Lipids | Apolipoprotein-B |
| 2872 | ν(CH2) | Fatty acids; esters; glycerol Phospholipids; triglycerides | |
| 2928 | νas(CH2) | Fatty acids; esters; glycerol Phospholipids; triglycerides | |
| 2960 | νas(CH3) | Cholesterol esters; lipids; fatty acids | |
| 3067 | ν(CH2) | Lipids; unsaturated lipids; protein (Amide II) | |
| 3276, 3412 | ν(N-H) ν(O-H) | Water; proteins (Amide) | |
Figure 2 shows the score plots and loadings for the 1800–800 cm⁻¹ range, which excludes the overlapping O-H/N-H bands and the lipid/amide II contributions from sample classification. Data selection is a rational way to improve clustering in PCA (30) when highly correlated data are used. Detailed results for other spectral ranges can be found in the Supplementary Material (Figures S1-S4).
The score plot in Fig. 2(a) shows a broad data distribution with no cluster separation for non-infected, L. infantum infected, and T. evansi infected canine blood serum. The loadings in Fig. 2(b) indicate that the first three PCs account for over 91% of the data variance, mainly in the spectral range assigned to proteins around 1600 cm⁻¹, which includes vibrational modes of immunoglobulins. However, PCA did not separate the groups effectively, because the infection did not induce spectral differences between the groups that were evident in the observed vibrational bands.
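The range selection and PCA step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' processing pipeline: the spectra are synthetic random numbers, PCA is computed by an SVD of the mean-centred data with NumPy, and the function names `select_range` and `pca` are hypothetical.

```python
import numpy as np

def select_range(wavenumbers, spectra, low=800.0, high=1800.0):
    """Keep only the columns whose wavenumber lies in [low, high] cm^-1."""
    mask = (wavenumbers >= low) & (wavenumbers <= high)
    return wavenumbers[mask], spectra[:, mask]

def pca(spectra, n_components=3):
    """PCA via SVD of the mean-centred data matrix (rows = samples)."""
    centred = spectra - spectra.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = vt[:n_components]                      # one row per PC
    explained = (s ** 2) / np.sum(s ** 2)             # variance fractions
    return scores, loadings, explained[:n_components]

# Synthetic stand-in for measured spectra (assumption: NOT data from the study).
rng = np.random.default_rng(0)
wavenumbers = np.linspace(4000.0, 400.0, 901)         # 4 cm^-1 spacing
spectra = rng.normal(size=(48, wavenumbers.size))

wn_sel, x_sel = select_range(wavenumbers, spectra)
scores, loadings, explained = pca(x_sel, n_components=3)
print(scores.shape, loadings.shape)
print(f"variance captured by the first 3 PCs: {explained.sum():.3f}")
```

With real spectra, the rows of `loadings` plotted against `wn_sel` would show which wavenumbers dominate each PC, and the columns of `scores` are the coordinates used in a score plot.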
PCA was then applied to pairwise (one-versus-one) group comparisons in search of a better strategy for sample identification. Figure 2(c) shows the score plot for non-infected versus L. infantum infected canine blood serum, with no apparent clustering. The first three PCs represent 92.34% of the data variance. The loadings in Fig. 2(d) are similar to those in Fig. 2(b), with prominent peaks around 1600 cm⁻¹. Analogous behavior was observed in the score plots and loadings for non-infected versus T. evansi (Figs. 2(e) and 2(f)) and L. infantum versus T. evansi infected canine blood (Figs. 2(g) and 2(h)). Therefore, pairwise group comparison did not enable cluster formation and separation, probably due to the high data correlation.
Machine learning algorithms that rely on the PCA data were then employed to maximize sample classification accuracy, testing different spectral ranges and group comparisons. The results are described in the Supplementary Material (Figures S5 and S6).
Figure 3 summarizes the overall accuracy obtained by the ML algorithms under leave-one-out cross-validation (LOO-CV) for the different group combinations in the 1800–800 cm⁻¹ range. The classification of non-infected, L. infantum, and T. evansi infected samples used only the first 5 PCs, responsible for 96.88% of the data variance, and achieved an overall accuracy of 85.42% with fine KNN. A similar result was obtained for the non-infected versus L. infantum infected groups, with an overall accuracy of 85% using the first 4 PCs, responsible for 95.67% of the data variance, in the quadratic SVM. However, the best accuracy for the non-infected versus L. infantum classification was 87.5%, achieved in the 1700–1450 cm⁻¹ range using the first 10 PCs with quadratic discriminant analysis (Figure S6).
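The evaluation scheme described above (PCA scores fed to a classifier and scored by LOO-CV) can be sketched as follows. This is a hedged illustration only: the data are synthetic, a plain 1-nearest-neighbour rule written in NumPy stands in for the KNN, SVM, and discriminant analysis classifiers named in the text, and refitting the PCA inside every fold is an assumption made here to avoid information leakage, not something the text specifies. The names `pca_fit` and `loo_cv_accuracy` are hypothetical.

```python
import numpy as np

def pca_fit(x, n_components):
    """Return the training mean and the first principal axes (rows)."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]

def loo_cv_accuracy(x, y, n_components=5):
    """Leave-one-out accuracy of a 1-nearest-neighbour rule on PCA scores."""
    n = len(y)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i                       # hold out sample i
        mean, axes = pca_fit(x[train], n_components)    # fit on training fold only
        train_scores = (x[train] - mean) @ axes.T
        test_score = (x[i] - mean) @ axes.T
        distances = np.linalg.norm(train_scores - test_score, axis=1)
        predicted = y[train][np.argmin(distances)]      # label of closest neighbour
        correct += int(predicted == y[i])
    return correct / n

# Synthetic two-class data (assumption: NOT spectra from the study).
rng = np.random.default_rng(1)
class_a = rng.normal(size=(20, 50))
class_b = rng.normal(size=(20, 50))
class_b[:, :5] += 6.0                                   # make the classes separable
x = np.vstack([class_a, class_b])
y = np.array([0] * 20 + [1] * 20)

accuracy = loo_cv_accuracy(x, y, n_components=5)
print(f"LOO-CV overall accuracy: {100 * accuracy:.2f}%")
```

Varying `n_components` in such a loop is one way to explore how many PCs a given classifier needs, which is the kind of comparison reported in Fig. 3.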
In turn, the classification of L. infantum and T. evansi infected samples yielded an overall accuracy of 100% using linear, quadratic, or cubic SVM with 10 PCs (Fig. 3). Identical accuracy was achieved in the classification of non-infected and T. evansi infected sera by applying linear SVM with 6 PCs. The weighted KNN algorithm reached 100% accuracy only for the non-infected versus T. evansi comparison, whereas linear SVM did so for both pairwise comparisons involving T. evansi.
Figure 4 shows the confusion matrices from the leave-one-out cross-validation (LOO-CV) tests for the group classification strategies with the highest overall accuracy in the 1800–800 and 1700–1450 cm⁻¹ ranges. The highest overall accuracy (85.42%) in the classification of non-infected, L. infantum, and T. evansi infected samples was achieved with fine KNN and 5 PCs in the 1800–800 cm⁻¹ range, correctly classifying all samples infected by T. evansi (100%), 17 samples infected by L. infantum (85%), and 16 non-infected samples (80%) (Fig. 4(a)). Applying quadratic DA with 10 PCs to the 1700–1450 cm⁻¹ data set gave an overall accuracy of 87.5% for the non-infected versus L. infantum classification. The number of correctly classified L. infantum samples was unchanged (17), and only 2 non-infected samples (10%) were misclassified (Fig. 4(b)). An overall accuracy of 100% was reached in the classification of L. infantum versus T. evansi infected sera with linear or cubic SVM and 10 PCs (Fig. 4(c)), and of non-infected versus T. evansi with linear SVM or weighted KNN and 6 PCs in the 1800–800 cm⁻¹ range (Fig. 4(d)).
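The overall accuracies quoted above follow directly from the per-class counts, as the short check below illustrates. The group sizes of 20 non-infected and 20 L. infantum samples follow from the stated counts and percentages; the T. evansi group size of 8 is not stated in this section and is an assumption, chosen because it is the count that reproduces the reported 85.42%. The helper name `overall_accuracy` is hypothetical.

```python
def overall_accuracy(correct, totals):
    """Fraction of all samples, across classes, that were classified correctly."""
    return sum(correct.values()) / sum(totals.values())

# Three-group test, Fig. 4(a). N = non-infected, LI = L. infantum, TE = T. evansi.
# Assumption: 8 T. evansi samples (inferred, not stated in the text).
three_group = overall_accuracy(
    correct={"N": 16, "LI": 17, "TE": 8},
    totals={"N": 20, "LI": 20, "TE": 8},
)

# Two-group test, Fig. 4(b): 18 of 20 non-infected and 17 of 20 L. infantum correct.
two_group = overall_accuracy(
    correct={"N": 18, "LI": 17},
    totals={"N": 20, "LI": 20},
)

print(f"three-group: {100 * three_group:.2f}%")  # 41/48 -> 85.42%
print(f"two-group:   {100 * two_group:.2f}%")    # 35/40 -> 87.50%
```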
ML algorithms combined with PCA successfully distinguished T. evansi infection in the group comparisons and provided good classification accuracy for L. infantum infection when compared with standard methods (4, 9, 11). Additionally, methods based on these strategies could offer advantages such as easy sample handling, low cost, and fast diagnosis. Samples require no preprocessing or reagents, and the spectra are collected and analyzed within a few minutes. For comparison, the serological tests currently available for CVL diagnosis require prior processing of the biological samples and reagents such as antibodies, enzymes, and buffers, and ELISA or IFA tests take around 60 to 180 min to deliver a diagnosis.
The false-positive classifications in the non-infected versus L. infantum versus T. evansi comparison (Fig. 4(a)) and in the non-infected versus L. infantum comparison (Fig. 4(b)) are not a significant issue, since the present approach may be used as a screening test: animals that test positive would be retested by highly accurate methods before the specialist makes any decision. On the other hand, the number of false negatives was the same in the three-group and two-group tests, which indicates that, for example, new sample preparation or measurement methods are needed to improve the overall accuracy.
Additionally, plasma protein kinetics is used as a biomarker for the diagnosis and clinical follow-up of many infectious diseases (47, 48). Acute-phase proteins (APPs) such as haptoglobin (Hp), serum amyloid A (SAA), and C-reactive protein (CRP) have been investigated in dogs infected by L. infantum (49). Compared with the increase in specific antibodies (IgG), plasma proteins may change earlier in response to infection, thus favoring an earlier disease diagnosis (49). In the present study, of the 20 animals in the control (non-infected) group, 4 (in the LI x N x TE comparison) and 2 (LI x N) were classified as LI (infected by L. infantum). These could be cases of recent infection, with an increase in APPs but no detectable IgG production; such animals could be infected while showing no clinical symptoms and testing negative in the serological tests. It is important to highlight that the animals used in this study did not undergo any selection, so the samples are highly heterogeneous, coming from animals of different sexes, ages, stages of infection, nutritional conditions, and so on. Thus, the approach reported in this work shows a remarkable ability to classify animals infected by either L. infantum or T. evansi, given that the high heterogeneity of the data closely reflects clinical reality.