Eighty-two volunteers were included in the study: 52 presented SSc and 30 were healthy, composing the control group. The patients with SSc were divided into two groups: (1) the normal spirometry group (n=22), comprising patients diagnosed with SSc who showed normal spirometry, and (2) the altered spirometry group (n=30), comprising patients diagnosed with SSc who presented altered spirometry associated with a restrictive ventilatory disorder [2].
The exams were conducted at the Pulmonary Function Testing Laboratory of the Pedro Ernesto University Hospital and the Biomedical Instrumentation Laboratory of the State University of Rio de Janeiro. The Hospital Ethics Committee approved the study, and all subjects gave written informed consent. This study is in agreement with the Declaration of Helsinki. The inclusion criterion was a confirmed diagnosis of SSc according to the consensus of the American College of Rheumatology [9]; volunteers of both genders were included. The exclusion criteria were a history of disease exacerbation in the previous 90 days, smoking, presence of tuberculosis or pneumonia, chronic lung diseases, respiratory infections in the previous 30 days, chest trauma or surgery, inability to perform the tests, and chemotherapy and/or radiotherapy for cancer.
The control group was composed of healthy volunteers of both genders without a history of cardiovascular or lung disease or smoking. These individuals did not present respiratory infections and showed normal spirometry [10].
The main elements of this study are the respiratory oscillometry measurements, the impedance estimation, and the development and performance evaluation of the clinical decision support system. The complete process is shown in Figure 9. Each operation is described in the following sections.
Respiratory oscillometry measurements and parameters
These analyses used as input excitation small-amplitude pressure oscillations (≤2 cmH2O), produced by a loudspeaker and applied during tidal breathing at the entrance of the individual's airway through the oral cavity. The result of each exam was generated as the mean of three tests, each 16 seconds long. These tests were considered adequate if they were free of pauses and presented a stable rate and tidal volume. A pseudo-random noise signal between 4 and 32 Hz was used, and the exams were repeated until all analyzed frequencies presented a minimal coherence function of 0.9. To avoid outlying values, we required a coefficient of variability ≤10% at the lowest frequency (4 Hz) across the three tests. The experiments were conducted using an impedance analyzer described previously [11].
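To illustrate the coherence criterion, the sketch below simulates a 16-second multisine excitation in the 4-32 Hz band and checks the pressure-flow coherence at the analyzed frequencies. The sampling rate, system gain, and noise level are arbitrary assumptions for illustration, not properties of the actual analyzer.

```python
import numpy as np
from scipy import signal

fs = 256.0                      # sampling rate (Hz), assumed for illustration
t = np.arange(0, 16, 1 / fs)    # one 16-second test, as in the protocol
freqs = np.arange(4, 34, 2)     # excitation components in the 4-32 Hz band

# Pseudo-random multisine excitation: sinusoids with random phases
rng = np.random.default_rng(0)
pressure = sum(np.sin(2 * np.pi * f * t + rng.uniform(0, 2 * np.pi))
               for f in freqs)

# Simulated flow response: a linear system output plus measurement noise
flow = 0.5 * pressure + 0.05 * rng.standard_normal(t.size)

# Coherence between the input pressure and the output flow
f_axis, coh = signal.coherence(pressure, flow, fs=fs, nperseg=1024)

# Acceptance criterion: coherence >= 0.9 at every analyzed frequency
idx = np.array([np.argmin(np.abs(f_axis - f)) for f in freqs])
accepted = bool(np.all(coh[idx] >= 0.9))
```

In practice a test failing this check would be discarded and the exam repeated, as described above.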
A linear regression of the respiratory resistance values in the 4-16 Hz range was used to interpret the obtained results. This yielded the resistance extrapolated to 0 Hz (R0), the mean resistance in this frequency range (Rm), and the slope of the relationship between the resistive values and frequency (S). The low-frequency range is described by R0. This parameter integrates the Newtonian effects, related to the airway, lung, and chest wall resistances, as well as the effect of gas redistribution [12]. The mid-frequency range is described by Rm, which reflects the resistance of the central airways [13]. S is associated with ventilation non-homogeneities [14].
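The derivation of R0, Rm, and S from the measured resistance curve can be sketched as a simple first-order fit; the frequencies and resistance values below are illustrative, not patient data.

```python
import numpy as np

# Hypothetical resistance values (cmH2O/L/s) measured in the 4-16 Hz range
freq = np.array([4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0])
rrs = np.array([3.9, 3.7, 3.6, 3.4, 3.3, 3.1, 3.0])

# Linear fit Rrs(f) = S * f + R0 over the 4-16 Hz range:
# the intercept estimates R0, the slope is S
S, R0 = np.polyfit(freq, rrs, 1)

# Mean resistance over the same range
Rm = rrs.mean()
```

A negative S, as in this toy example, is the pattern usually associated with ventilation non-homogeneities.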
The reactive results were interpreted using four indexes: the mean reactance (Xm), the resonance frequency (fr), the impedance modulus (Zrs), and the dynamic compliance (Cdyn). Xm was calculated over the 4 to 32 Hz frequency range and describes ventilation inhomogeneity. fr occurs when the elastic and inertive properties cancel out and Xrs becomes zero [15]. Cdyn was calculated from the reactance at 4 Hz (Cdyn = 1/(2πfX4)) and reflects the respiratory compliance, comprising the pulmonary, chest wall, and airway compliances. This parameter is also associated with ventilation homogeneity [13]. Zrs includes the effects of the resistive and elastic loads at 4 Hz, representing the total mechanical load of the respiratory system [16].
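The four reactive indexes can be computed from a measured reactance curve as sketched below. The reactance and resistance values are illustrative placeholders, and the magnitude of the 4 Hz reactance is used in the Cdyn formula since Xrs is typically negative at low frequencies.

```python
import numpy as np

# Illustrative reactance values (cmH2O/L/s) over the 4-32 Hz range
freq = np.array([4.0, 8.0, 12.0, 16.0, 20.0, 24.0, 28.0, 32.0])
xrs = np.array([-2.0, -1.2, -0.6, -0.1, 0.3, 0.7, 1.0, 1.3])
rrs4 = 3.9                       # resistance at 4 Hz (illustrative)

Xm = xrs.mean()                  # mean reactance, 4-32 Hz

# Resonance frequency: where Xrs crosses zero (linear interpolation)
fr = float(np.interp(0.0, xrs, freq))

# Dynamic compliance from the reactance at 4 Hz: Cdyn = 1/(2*pi*f*|X4|)
X4 = xrs[0]
Cdyn = 1.0 / (2 * np.pi * 4.0 * abs(X4))

# Impedance modulus at 4 Hz: total mechanical load (resistive + elastic)
Z4 = np.hypot(rrs4, X4)
```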
Data sets
In the present work, the experiments were executed on a dataset of 246 measurements acquired from the volunteers. Healthy volunteers contributed 90 measurements of the oscillometric parameters, patients with sclerosis and normal spirometry contributed 66, and patients with sclerosis and altered spirometry supplied 90 measurements.
Machine Learning Algorithms
Machine learning algorithms can discover crucial relationships among the features in a set of data [5, 17]. Inference with these models can be carried out with minimal user intervention through several techniques, such as linear models, graphical models, ensemble strategies, hybrid approaches, and artificial neural networks, among others. In our previous research [6, 7, 18] we experimented with a wide diversity of models and concluded that ensemble strategies had outstanding performance. In this study, we investigate the Extreme Gradient Boosting (XGB) algorithm, a type of ensemble derived from gradient boosting. The final inference model is an assemblage of weak inference models, routinely decision trees. It builds the model in a stepwise mode, where each step is designed to model the error of the previous ones. XGB is an implementation of Gradient Boosting that focuses on regularization to control overfitting, which gives it better performance. In addition, we also explore Multiple Instance Learning (MIL) for the early examination of respiratory changes in patients with Systemic Sclerosis. Therefore, in this study, the following ML algorithms were appraised:
- K-Nearest Neighbour (KNN) [19];
- Adaboost with decision trees [20];
- Random Forest (RF) [21];
- Extreme Gradient Boosting (XGB) [22];
- Multiple Instance Learning (MIL) [23].
The first three algorithms have already been briefly described in our previous studies [6, 7, 18]; therefore, we provide a condensed description only of the two algorithms that have not been used in our studies before. A complete description of them can be found in the references.
Extreme Gradient Boosting is a more efficient, regularized version of Gradient Boosting. In Gradient Boosting, one fits an additive model (ensemble) in a forward manner. In each stage, a weak learner is introduced to cope with the previous weak learners' shortcomings. These shortcomings can be described by the residuals (errors) left by the previous weak learners. Hence, the weak learner to be added must fit these residuals for the ensemble to produce better results. The relation of this algorithm with gradient descent (GD) stems from the fact that the residuals can be seen as negative gradients, which GD employs to locate the minimum value of the loss function. Common choices for the loss function are the root mean squared error (regression) and the log-loss (classification).
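The stagewise residual-fitting idea can be sketched with shallow regression trees and a squared loss, for which the negative gradient is exactly the residual. The toy data, tree depth, learning rate, and number of stages below are arbitrary illustrative choices, not the configuration used in the experiments.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeRegressor

# Toy data standing in for the oscillometric measurements
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Stagewise boosting with squared loss: each tree fits the residuals
# (negative gradients) left by the current ensemble.
learning_rate = 0.1
pred = np.full(y.shape, y.mean(), dtype=float)   # initial constant model
trees = []
for _ in range(50):
    residuals = y - pred                         # negative gradient of squared loss
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residuals)
    pred += learning_rate * tree.predict(X)
    trees.append(tree)

accuracy = np.mean((pred > 0.5) == y)
```

XGB follows this scheme but adds, among other refinements, explicit regularization terms on the trees to control overfitting.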
The multiple instance learning (MIL) paradigm was introduced by [23] with a focus on an application in biochemistry. MIL is considered an extension of supervised learning in which the labels are assigned to sets of instances, known as bags, and not to each instance individually. MIL's central idea is related to the notion of bags: a bag is labelled negative (Bi-) if all the instances contained in it are negative, and labelled positive (Bi+) if at least one of its instances is positive. In this way, a bag can be defined as a collection of instances or regions. The Diverse Density (DD) algorithm was originally introduced by [24], where it is described as an assessment of the intersection of the positive bags minus the union of the negative bags. The algorithm's central idea is to find a concept point in the feature space that is close to at least one instance of each positive bag and far from the instances of the negative bags.
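The bag-labelling rule at the heart of MIL can be stated in a few lines; the bags below are hypothetical (e.g., one could think of the repeated tests of a single volunteer as instances of one bag).

```python
# In MIL, labels attach to bags of instances: a bag is positive if at
# least one of its instances is positive, and negative only if all are.
def bag_label(instance_labels):
    """Return 1 if any instance in the bag is positive, else 0."""
    return int(any(instance_labels))

# Hypothetical bags of binary instance labels
bags = {
    "B1": [0, 0, 1],   # one positive instance -> positive bag
    "B2": [0, 0, 0],   # all instances negative -> negative bag
}
labels = {name: bag_label(inst) for name, inst in bags.items()}
```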
Experimental design
This study executed a total of six experiments. The purpose of the first experiment was to investigate the ability of a single oscillometric parameter alone to correctly identify the airway obstruction level in patients with systemic sclerosis. We considered two different situations: Control group versus Patients with sclerosis and normal spirometry (CGvsPSNS), and Control group versus Patients with sclerosis and altered spirometry (CGvsPSAS). The remaining experiments also evaluate these two situations.
The second experiment exploited ML algorithms and compared them with the results obtained by a single oscillometric parameter to reveal whether the ML algorithms could achieve superior performance. The area under the ROC curve (AUC) was chosen as the performance measurement, since it is regularly employed in medicine [25-28] and provides a better way to compare classifiers than accuracy [29]. We did not implement feature selection; thus, all of the oscillometric indexes were used. The classifiers described in section 2.3 were realized with Scikit-learn [30], a machine learning library written in Python, while Multiple Instance Learning was implemented with the library described in [31]. Since the dataset contains only 246 oscillometric measurements, the k-fold validation procedure [32] is indicated, allowing the evaluation of the generalization ability over the whole dataset. Hyperparameter tuning is a crucial step in model selection. Scikit-learn provides several strategies for hyperparameter fine-tuning, such as grid search, which experiments with all possible combinations of the hyperparameters. Table J0 presents the classifiers and their respective hyperparameters chosen for tuning.
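A minimal sketch of this evaluation setup, combining grid search, stratified k-fold validation, and AUC scoring in Scikit-learn, is shown below. The toy data, classifier, and hyperparameter grid are illustrative assumptions and do not reproduce the grids of the actual experiments.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Toy data standing in for the 246 oscillometric measurements
X, y = make_classification(n_samples=246, n_features=8, random_state=0)

# Grid search over a small, illustrative hyperparameter grid, scored by
# AUC under stratified k-fold cross-validation
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 4]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)
best_auc = search.best_score_
```

The stratified splits keep the class proportions of the three groups approximately constant across folds, which matters with a dataset of this size.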
The third experiment evaluates the capability of MIL as a feature selector, with the purposes of reducing complexity and gaining knowledge about the importance of the different oscillometric parameters [33]. Its role is to select five oscillometric parameters in a step prior to classifier training. The fourth experiment employs recursive feature elimination (RFE), also to select five oscillometric parameters before classifier training. RFE is a wrapper strategy that can use several machine learning algorithms to assess performance. In this paper, the chosen ML algorithm was the linear support vector machine classifier with L1 regularization. The fifth experiment uses MIL to select three oscillometric parameters, and the sixth employs RFE to choose three oscillometric parameters.
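The RFE step with an L1-regularized linear SVM can be sketched as follows; the toy feature matrix stands in for the oscillometric parameters, and the dimensions are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

# Toy data with nine candidate features standing in for the oscillometric
# parameters (illustrative, not the study data)
X, y = make_classification(n_samples=246, n_features=9, n_informative=4,
                           random_state=0)

# Recursive feature elimination wrapped around a linear SVM with L1
# regularization, keeping the five best-ranked features
estimator = LinearSVC(penalty="l1", dual=False, max_iter=5000)
selector = RFE(estimator, n_features_to_select=5).fit(X, y)
X_reduced = selector.transform(X)
```

Changing `n_features_to_select` to 3 yields the configuration of the sixth experiment.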
Hypothesis testing is a requisite for contrasting ML algorithms. A wide variety of parametric tests is available, commonly based on the t-test [17, 34, 35]. Among the most used nonparametric tests are McNemar's and Wilcoxon's [34, 36, 37]. In this work, the hypothesis test was carried out on the AUCs by applying the methodology specified by DeLong et al. [38].
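The DeLong test itself is not available in Scikit-learn; as a rough illustration of comparing two correlated AUCs on the same subjects, the sketch below uses a paired bootstrap instead, which is a simpler alternative, not the DeLong procedure. The simulated scores and sample size are arbitrary assumptions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Illustrative scores from two classifiers evaluated on the same labels
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
scores_a = y + rng.normal(0, 0.8, size=200)   # stronger classifier
scores_b = y + rng.normal(0, 1.5, size=200)   # weaker classifier

# Paired bootstrap: resample cases and recompute the AUC difference
diffs = []
for _ in range(1000):
    idx = rng.integers(0, 200, size=200)
    if len(np.unique(y[idx])) < 2:            # need both classes in a resample
        continue
    diffs.append(roc_auc_score(y[idx], scores_a[idx])
                 - roc_auc_score(y[idx], scores_b[idx]))

diffs = np.array(diffs)
p_two_sided = 2 * min(np.mean(diffs <= 0), np.mean(diffs >= 0))
```

The DeLong methodology reaches the same goal analytically, estimating the variance of the paired AUC difference without resampling.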