We have described a diagnostic platform that employs Raman spectroscopy with a unique patient-interface for finger insertion and that, using specific hardware parameters together with machine learning, can detect both COVID-19 and glucose transcutaneously and non-invasively. When the device targets the blood transcutaneously, the derived Raman spectra together with machine learning produce a significantly different result between COVID-19(+) and COVID-19(-) patients. We included in our study both inpatients and outpatients, symptomatic and asymptomatic. To demonstrate the robustness of our platform, we extended its application to glucose detection. It is precisely the integration of our AI/ML model with Raman spectroscopy, using the finger insertion patient-interface, which allows the specific target’s signal to be detected with as few as 205 glucose readings.
SARS-CoV-2:
In conceiving our idea and design for the diagnostic device, we anticipated various challenges, all of which were sufficiently met. Although COVID-19 is caused by a “respiratory” virus, we believed at the outset that COVID-19 could be detected by investigating the blood. The rich vasculature of the aero-digestive tract, upper and lower airways, and oropharynx would absorb almost anything inhaled, as occurs with many inhaled substances. Even if the virus itself were not in the blood, or were present at a concentration too low to detect reliably with our device, the increasing reports (5,12,13) of systemic effects of COVID-19 were strong evidence that there were hematological abnormalities to target (13).
Our study enrolled 476 patients (after quality control, we kept N=455 samples), none of whom experienced any adverse effects from the device. We anticipate that we can improve our results with further modifications of the device’s parameters, as well as more data input.
With the advent of Spatially Offset Raman Spectroscopy (SORS) (9,18) and other deep tissue techniques, more attention has been aimed at biomedical applications (8,9). However, those applications of Raman include a specimen on a slide, other platform, or container, or a direct application of a probe to a specimen (19,20). Our objective was to safely combine the best Raman parameters of laser power, wavelength, distance, detector, and spectrograph processing with the most powerful machine learning algorithm in order to non-invasively “see” beneath the skin into the human vasculature and target the many molecules that represent the intravascular unique signature combinations, both quantitatively and qualitatively, in an infected COVID-19 patient.
Others have demonstrated the feasibility of combining Raman spectroscopy with machine learning. A multiple-instance learning (MIL) approach was applied to the detection of COVID-19 infection in saliva, achieving an area under the curve (AUC) of 0.8, a sensitivity of 79% (males) and 84% (females), and a specificity of 75% (males) and 64% (females) (7). A feasibility study that analyzed actual blood tests, focusing on the variables “age”, “wbc”, “crp”, “ast”, and “lymphocytes” from 102 COVID-19 negative and 177 COVID-19 positive patients, applied different machine learning algorithms to achieve an accuracy of 82%-86%. Using a Random Forest classifier, their best results were accuracy 82%, sensitivity 92%, PPV 83%, AUC 84%, and specificity 65% (5).
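For readers comparing the figures quoted above, the following is a minimal sketch of how sensitivity, specificity, PPV, and accuracy are derived from a 2x2 confusion matrix. The function name `diagnostic_metrics` and the counts in the example are hypothetical and are not taken from our study or from the cited studies.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts.

    tp/fn: reference-positive cases called positive/negative by the index test.
    tn/fp: reference-negative cases called negative/positive by the index test.
    """
    def ratio(numerator, denominator):
        # Return NaN rather than raising when a group is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "ppv": ratio(tp, tp + fp),           # positive predictive value
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Hypothetical counts, for illustration only (not study data).
example = diagnostic_metrics(tp=80, fp=20, tn=60, fn=20)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

Because PPV depends on the proportion of positives in the sample, PPV values from studies with different case mixes are not directly comparable, whereas sensitivity and specificity are less affected by prevalence.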
The underlying biochemistry of the expected acute phase reactants, as well as viral composition, has been studied using Raman Spectroscopy on a specimen and has thus served as the scientific basis of our study. Please refer to Appendix B for further details.
In our study, we examined non-invasively with a unique patient-interface the entire mixture of blood components, which when viewed as a whole indicate the presence or absence of COVID-19. The text files from the Raman spectra, when processed with machine learning, detected differences between COVID-19 positives and negatives. The feature importance analysis of the gradient boosted tree classifier reported 1800, 2200, and 2400 cm-1 as important peaks for distinguishing COVID-19. Glycerol, a metabolic product of glycerides and lipids, is known to produce a peak in this area (1000-1500 cm-1) when examined on a slide. It is possible that a change in lipid metabolism occurs as a result of COVID-19, similar to the alteration in the clotting cascade. Our differences in Raman spectra between positives and negatives would be consistent with the findings of Roberts et al (13) in their study of the metabolomics of COVID-19. Similarly, activity at a Raman shift of 1700 cm-1 is strongly associated with molecules containing a carbonyl group C=O, such as proteins. Carbonyl groups are found in HIV as well as COVID-19 (21). Higher shifts in the 2000 cm-1 range could be associated with cyano groups C≡N, also in molecules such as RNA. While more data is needed to definitively ascribe these important peaks to causal COVID-19 factors, the observations above suggest that the ML approach may be detecting viral breakdown products (spike protein and RNA), as well as some other acute phase proteins (21). It is precisely the combinations of molecules we sought to detect and analyze.
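As an illustration of how important bands can be read off a feature importance vector, the sketch below ranks spectral bins by importance and reports the corresponding Raman shifts. It assumes one importance score per wavenumber bin, as a tree-based classifier typically provides. The function name `top_bands` and all numeric values are hypothetical; the values are made up for the example and are not our measured importances.

```python
def top_bands(wavenumbers, importances, k=3):
    """Return the k Raman shifts (cm-1) with the highest importance scores.

    wavenumbers: Raman shift of each spectral bin, in cm-1.
    importances: one non-negative score per bin, in the same order.
    """
    if len(wavenumbers) != len(importances):
        raise ValueError("wavenumbers and importances must have equal length")
    # Rank bins from most to least important, then keep the first k.
    ranked = sorted(zip(importances, wavenumbers), reverse=True)
    return sorted(shift for _, shift in ranked[:k])


# Made-up values for illustration only (not study data).
shifts = [1000, 1400, 1800, 2200, 2400, 2800]
scores = [0.05, 0.10, 0.30, 0.25, 0.20, 0.10]
print(top_bands(shifts, scores, k=3))
```

A ranking of this kind shows only which bins the classifier relied on. Assigning a band to a specific molecule still requires independent spectroscopic evidence.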
In designing the prospective, observational study, we sought to anticipate and mitigate any potential bias. We compared our diagnostic device with the gold standard RT-PCR, performed in a hospital-approved lab. Because our data comprised 73 inpatients and 382 outpatients, further classified as symptomatic or asymptomatic, we believe that the potential for bias is minimized. Further, we have included both inpatients who are negative and asymptomatic positive patients in our analysis. Most importantly, because the underlying basic science of Raman spectroscopy has been proven to generate a unique spectral signature for each different organism, such as influenza, Ebola, or SARS-CoV-2, in a specific specimen, we believe the same likely holds true for different organisms examined transcutaneously (6,10).
Glucose and beyond:
We also demonstrated that, with our model and platform, blood glucose can be detected in this non-invasive manner. Although we plan to extend our device’s applications in larger studies, the benefits of using one patient’s data for a feasibility study include improved calibration, less distortion and cleaner data. With more data, the power of the AI/ML will increase further, not only because of the quantitative increase, but also due to the increasing richness of the data. We know that with hypoglycemia, other blood-borne molecules are present, such as glucagon and epinephrine, so that the analysis of the entire mixture of blood analytes will yield valuable input data.
A limitation of our glucose study concerns the detection of hypoglycemia, which is likely attributable to the inclusion of only 31 values for a blood glucose under 70 mg/dl, as well as to the smaller absolute amount of glucose available as a target. However, because we do detect a distinct Raman shift representing glucose (800 cm-1 and 1100 cm-1) with a greater quantity of glucose in the blood, we expect that with more hypoglycemic data, the AI/ML will improve its detection of hypoglycemia.
We are currently working on other biomedical applications for our technology, such as infections, pathology, and cancer diagnosis. With further research, we aim to use devices such as ours to solve existing dilemmas involving intraoperative margin status in cancer surgery, infections (bacterial, viral, or fungal), and pathology. The secondary outcomes analysis from our data is ongoing and will be included as part of future research. As we measured viral load in accordance with the PCR cycle number, we anticipate correlating our device’s test results with the cycle number and other parameters. Interestingly, among the false negatives for our COVID-19 Raman tests were positive PCR tests whose cycle number was 40. The device can detect COVID-19 among symptomatic patients even more powerfully than the statistics indicate because the study group included both symptomatic and asymptomatic individuals in both the PCR negative and positive groups. Quantifying someone’s immunity, by looking not only at antibody level but also at other immune markers, is another goal. Determining the presence of viral variants and predicting the nature of the variants will require further analysis.
Although a library of Raman spectra already exists for laboratory-examined infectious agents and many other molecules associated with disease, and is likely not to differ significantly from our transcutaneously-derived library, we hope to add to our own transcutaneously-derived library of infectious agents and molecules. A similar strategy can be used with our device for other pathogens or variants. Using probabilistic models and computational biology, we hope to depict the physical structure of future infectious agents. We could possibly then, using machine learning, determine the Raman signature of future pathogens. In this way, we will be able to recognize a new infection before it is actually sequenced in the lab (22). Building on our glucose study, we plan to expand the application of our diagnostic platform to other molecular markers and chemistries in the blood.