Investigated cohort. Our experimental studies were run in a hospital ward dedicated to the treatment of COVID-19 patients during the third wave of the pandemic in Poland, between March and July 2021. The SARS-CoV-2 Alpha and Delta variants dominated in Poland within this period. The breath samples were collected in the clinic rooms from a group of hospitalised patients and from a control group. We investigated 56 samples (33 from patients with a severe course of COVID-19, 17 breath samples collected within the healthy control group, and 6 samples of ambient air). A detailed description of the investigated group is presented in Table S1 in the supplementary information. The number of samples reflected the pandemic situation at that time in the north of Poland and the number of patients admitted to the hospital ward. The patients differed in age, gender, and accompanying diseases. The anonymised patient data list their accompanying diseases, medications taken, and blood concentration of C-reactive protein (CRP), which is associated with inflammation in the human body.
Breath sample collection. All samples were collected in the morning, before the subjects had eaten, drunk anything other than fresh water, or brushed their teeth, to reduce the interfering impact of food or drink odours. The samples were collected from patients and healthy volunteers breathing the same atmosphere of the hospital ward rooms. The end-tidal part of the final wave of exhaled breath was collected with a BioVOC™ breath sampler (about 130 mL), made of Teflon and fitted with a protective one-way flow valve (see Fig. 1). The BioVOC™ breath sampler was cleaned after each use by dismantling it and immersing it in a solution of 20 mL of disinfectant (Milton, France) dissolved in 1 L of distilled water. After 15 min in the disinfectant solution, the breath sampler was left to dry naturally. The collected breath sample was introduced into the gas chamber of the electronic nose (Fig. 1a) by attaching the BioVOC™ to the inlet of the low-pressure gas chamber and automatically opening an electrical valve.
The applied procedure of exhaled breath sampling prevented accidental virus transmission between the examined patients, which was essential given the high contagiousness of the SARS-CoV-2 virus. The procedure can be simplified to accelerate sampling in the near future. One technical solution is for the patient to blow directly into the inlet of the gas chamber with the sensors, with a microphone used to detect the beginning of the end-tidal part of the final wave, which is then introduced into the gas chamber26,27.
Electronic nose setup. Our setup applied a prolonged sensor cleaning process in which the sensors were flushed with ambient air. All stages of the electronic nose operation are presented in Fig. 2. We start by cleaning the gas sensors with a continuous flow of ambient air through the gas chamber (between 2 and 3.2 L/min). Next, the electric pump creates a low-pressure region (about 350 hPa) in the gas chamber within 30 s, which draws the breath sample from the BioVOC™ into the gas chamber. The electric valve then closes the gas chamber, and we observe the gas sensors’ responses to the introduced breath sample.
We applied a set of four parameters representing relative changes of the sensor DC resistance RS, observed in the selected time intervals, as presented in Fig. 2:
F1 = ΔRS1/R0 (maximum relative change of DC resistance within the analysing phase);
F2 = ΔRS2/R0 (relative change of DC resistance within the final part of the analysing phase, i.e. the last 400 samples);
F3 = ΔRS3/R0 (tangent slope of the relative change of DC resistance at the beginning of the cleaning phase that follows the analysing phase, evaluated over its first 30 samples);
F4 = ΔRS4/R0 (tangent slope of the relative change of DC resistance at the beginning of the analysing phase, evaluated over its first 30 samples).
The proposed parameters can be evaluated automatically from the recorded time series of the sensors’ DC resistances. Moreover, the proposed parametrisation reduced the number of analysed data points by yielding averaged parameters, which are more robust against the noise and interference naturally present in the recorded time series.
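The parametrisation above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code, which is given in the supplementary information. Several details are assumptions made only for illustration: R0 is passed in as a baseline resistance (its exact definition is given in Fig. 2, not in the text), F1 is taken as the relative change of largest magnitude, F2 as the mean relative change over the last 400 samples, and the tangent slopes as least-squares line fits per sample index. The function names `ls_slope` and `extract_features` are hypothetical.

```python
from statistics import mean


def ls_slope(values):
    """Least-squares slope of `values` against the sample index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to estimate a slope")
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def extract_features(r_analysing, r_cleaning, r0, n_tail=400, n_slope=30):
    """Return (F1, F2, F3, F4) for one sensor.

    r_analysing: resistances recorded in the analysing phase
    r_cleaning:  resistances recorded in the cleaning phase that follows it
    r0:          baseline resistance (assumed to be supplied by the caller)
    """
    # Relative changes of resistance with respect to the baseline.
    rel_a = [(r - r0) / r0 for r in r_analysing]
    rel_c = [(r - r0) / r0 for r in r_cleaning]

    f1 = max(rel_a, key=abs)          # largest-magnitude relative change (assumption)
    f2 = mean(rel_a[-n_tail:])        # mean over the last n_tail samples (assumption)
    f3 = ls_slope(rel_c[:n_slope])    # initial slope of the cleaning phase
    f4 = ls_slope(rel_a[:n_slope])    # initial slope of the analysing phase
    return f1, f2, f3, f4
```

Fitting a line over 30 samples rather than differencing two points is one way to obtain the averaging effect mentioned in the text; other slope estimators would serve equally well.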
We applied MATLAB scripts, prepared for release R2021b, and used slope evaluation functions (see the supplementary information). The selected parameters reduced the detrimental effect of the gas sensors’ slow drift in time, induced by ageing or by gradual fluctuations in the composition of the ambient atmosphere28. This disturbing effect has been considered in numerous studies and is one of the most significant factors limiting the measurement of low VOC concentrations and, in practice, the efficiency of medical applications29. We underline that, in the case of COVID-19 patients, experimental studies confirm that environmental pollutants present in the ambient air (due to, e.g., air conditioning systems) have a harmful impact on the effectiveness of medical treatment, and their presence can therefore also provide useful information30,31.
We applied a set of commercial gas sensors designed to monitor selected VOCs (supplementary information, Table S2). The sensors were developed within the last few years by the leading companies in the gas sensing industry. Integrated circuit technology was applied for their construction to optimise their energy consumption, which is necessary for operating at elevated temperatures, and to ensure repeatability of their parameters. The sensors are dedicated to portable applications, which is an emerging area of gas sensing applications.
The same gas sensors were applied in electronic noses designed by other research groups and aimed at similar medical or environmental applications23,26,32. The selected sensors target various VOCs present in indoor environments or are specifically dedicated to VOCs in exhaled breath. The electronic nose monitored the environmental conditions (temperature, humidity, and pressure) during collection of the breath samples to check their repeatability during the studies. The recorded time series for exemplary COVID-19 and healthy patients are presented in the supplementary information, in Figure S1 and Figure S2, respectively. We observed remarkable differences in the shapes of the recorded DC resistances between the COVID-19-diagnosed patients (Figure S1a) and the healthy patients from the control group (Figure S2a). The differences were visible for all applied gas sensors except the BME680, which responded too slowly to be used in practice for the analysed breath samples; its data were therefore excluded from further consideration. We observed some differences in environmental conditions between the patients, especially for humidity (Figure S1b, Figure S2b) in the initial part of the analysing phase. These differences did not exceed 15% and were much smaller than those identified for the applied gas sensors.
The electronic nose recorded voltages across the gas sensors, each connected in series with a resistor to form an independent voltage divider supplied by a precise 2.5 V voltage reference (REF192; Analog Devices). Voltages were sampled by 16-bit low-power analogue-to-digital converters (ADS1115; Texas Instruments). The developed electronic nose comprised more gas sensors than were used by the COVID-19 detection algorithm. We limited the analysis to the built-in gas sensors that secured a stable response within the assumed time of the analysing phase. We underline that some commercial gas sensors can potentially be used for COVID-19 detection but require preconcentrated breath samples to accelerate their response, or their composition should be modified (e.g., by introducing noble metals or by UV irradiation to enhance the gas sensitivity of the applied metal oxide sensing layers)33,34. Further improvement can be reached by applying nanoparticle technology of enhanced sensitivity or two-dimensional materials, which exhibit a high ratio of active area to volume and can reach the detection level of a single molecule35. A great improvement in the gas selectivity of resistive gas sensors was reported for organically functionalised gold nanoparticle prototype structures, which are not available commercially36,37.
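The conversion from a sampled voltage to a sensor resistance follows from the voltage divider relation and can be sketched as below. This is an illustration under stated assumptions, not the authors' firmware: the load resistor value, the ADC full-scale range of 4.096 V, and the function names `code_to_voltage` and `sensor_resistance` are hypothetical, and the voltage is assumed to be measured across the sensor, so that V_S = V_ref · R_S / (R_S + R_L).

```python
V_REF = 2.5             # reference voltage stated in the text (V)
ADC_FULL_SCALE = 4.096  # assumed programmable-gain range of the ADC (V)


def code_to_voltage(code, full_scale=ADC_FULL_SCALE):
    """Convert a signed 16-bit ADC code to volts (one LSB = full_scale / 2**15)."""
    return code * full_scale / 32768


def sensor_resistance(v_sensor, r_load, v_ref=V_REF):
    """Solve the divider equation for R_S, given the voltage across the sensor.

    r_load is the series resistor value in ohms (hypothetical, not from the text).
    """
    if not 0 <= v_sensor < v_ref:
        raise ValueError("sensor voltage must lie in [0, v_ref)")
    return r_load * v_sensor / (v_ref - v_sensor)
```

For example, a sensor voltage equal to half of the reference gives a sensor resistance equal to the load resistance, which is a quick sanity check for any chosen wiring.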
The applied setup secured stable conditions for measuring the gas sensors’ response in the analysing phase (Fig. 2). The preceding and following cleaning phases displayed visible fluctuations in the sensors’ DC resistances, induced by non-stabilised laboratory airflow through the gas chamber. We noticed sharp changes in the gas sensors’ responses before the electrical valve closed at the beginning of the analysing phase. This is an effect of introducing the breath sample into the gas chamber, which can be controlled more accurately with a few technical modifications. Firstly, we can control the pump to produce a more stable and repeatable low-pressure zone. Secondly, a more precise fit between the inlet of the gas chamber and the BioVOC™ sampler would secure a repeatable transfer of the breath sample. We can also introduce the breath sample without the BioVOC™, using an electronically controlled pump to identify the end-tidal phase of the breath and introduce it into the gas chamber. Further improvement can be reached by preconcentrating the analysed VOCs, as presented elsewhere, although this requires a bulkier and more energy-consuming setup38. Some of these detrimental effects can thus be corrected with simple changes that give more accurate control of the breath sample flow.
Data analysis. The collected time series were automatically parameterised (Fig. 2) to reduce the amount of analysed data. We estimated the selected slopes at defined time intervals using MATLAB scripts. The detailed data for the investigated cohort are included in the supplementary information (Table S1). Each patient was anonymised and described by a unique number and health status (COVID-19-infected or healthy). Each breath sample was described by four parameters derived from five gas sensors (Figure S1a) and by three recorded environmental quantities (Figure S1b). The MATLAB scripts used for sensor response parametrisation are available in the supplementary information (Section 3. Matlab scripts).
Next, a few selected algorithms were applied to determine the efficiency of COVID-19 detection. There is abundant literature presenting various methods of data analysis in olfactory applications39–42. We applied the algorithms available in the Orange software43. This is a hierarchically organised software package that implements data-mining algorithms through front-end visual programming. Data processing was run by building a graph structure that determines all steps of the data processing (Fig. 3).
We applied four algorithms to perform the detection task: (i) a multi-layer perceptron algorithm (Neural Network widget)44, (ii) random forest45, (iii) the k-nearest neighbours algorithm (kNN)46, and (iv) the support vector machine (SVM) algorithm47, which uses the LIBSVM library48.
Multi-layer perceptron is a supervised learning algorithm that learns a function, non-linear in general, mapping the input data to the given output. The default parameters were applied for this algorithm (neurons in the hidden layers: 100; activation function for the hidden layer: rectified linear unit, ReLU; maximum number of iterations: 200). The random forest algorithm is an ensemble learning method that is also used for classification. It builds a set of decision trees from the analysed data. The class selected by most of the trees is the decision result. The algorithm requires selecting the number of trees (the default of 10 trees was applied). The kNN algorithm predicts the detection result by considering only the nearest training instances (e.g., by using a weighted average of the k nearest neighbours), which reduces the amount of considered data. We applied the default number of 5 nearest neighbours. The SVM algorithm classifies the data by applying non-linear functions to determine a hyperplane with the maximal margin between the classified groups (e.g., COVID-19-infected and healthy patients). We applied the radial basis function (RBF) kernel and the default parameters, which determine the time of the necessary computations. These briefly described algorithms are commonly used for detection in electronic nose data analysis, including medical applications39,49.
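Of the four algorithms, kNN is the simplest to illustrate. The following minimal Python sketch shows the idea with k = 5, as in the text. It is not the Orange implementation: the Euclidean distance and the unweighted majority vote are assumptions made for illustration, and the function name `knn_predict` is hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k=5):
    """Classify `query` by a majority vote among its k nearest training points.

    train_x: sequence of feature vectors (e.g., the parameters F1..F4 per sensor)
    train_y: class labels aligned with train_x
    """
    # Sort training instances by Euclidean distance to the query, keep the k closest.
    neighbours = sorted(zip(train_x, train_y), key=lambda p: dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    # With more than two classes a tie is possible; the first label counted wins here.
    return votes.most_common(1)[0][0]
```

Because the distance mixes all features, in practice the features would usually be scaled to comparable ranges before the vote.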
The selected detection algorithms were tested to evaluate their classification accuracy by the cross-validation technique50. We applied the Test and Score widget in the Orange software with five-fold cross-validation, which randomly splits the data into 5 equal-sized subsamples and, in turn, uses one subsample for testing the model and the remaining four for training. All algorithms were tested, and the results are presented as receiver operating characteristic (ROC) curves (ROC Analysis widget, Fig. 3) and confusion matrices (Confusion Matrix widget, Fig. 3). The ROC curve plots the true positive rate (TP: correctly identified COVID-19-infected patients) versus the false positive rate (FP: healthy patients and ambient air incorrectly identified as COVID-19 infected). It thus shows the dependence between sensitivity (TP rate) and the FP rate (1 – specificity). The performance of the applied algorithms can also be evaluated from the data delivered by the Test and Score widget. Selected data are included in the supplementary information (Section 4. Detection efficiency).
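The five-fold split and the two rates plotted on the ROC curve can be sketched in plain Python as follows. This is an illustration only and does not reproduce the Orange widgets: the split is a simple random one (whether the original split was stratified by class is not stated in the text), and the names `k_fold_indices`, `tp_fp_rates`, and the label "covid" are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Return k (train, test) index pairs from a random split into k near-equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    splits = []
    for fold in folds:
        held_out = set(fold)
        train = [j for j in idx if j not in held_out]
        splits.append((train, fold))
    return splits


def tp_fp_rates(y_true, y_pred, positive="covid"):
    """Return (sensitivity, 1 - specificity) for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp / (tp + fn), fp / (fp + tn)
```

With 56 samples the folds cannot be exactly equal: one fold holds 12 samples and the other four hold 11. Each point of a ROC curve corresponds to one pair of rates obtained at a given decision threshold.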
Ethical approval and informed consent. All subjects gave their informed consent for inclusion before they participated in the study. The study protocol and all experimental procedures were approved by the Independent Bioethical Commission for Science Research at the Medical University of Gdańsk, Poland (ethical approval code: NKBBN/501/2020). All methods were performed in accordance with the relevant guidelines and regulations.