The current standard for detecting asynchronous breathing consists of visually inspecting 30-minute-long recordings of airway flow and pressure signals [10, 11]. Quantification is based on the asynchrony index (AI), defined as the ratio of identified asynchronies to the total respiratory rate [1]. This post hoc, subjective, and labor-intensive method is low in sensitivity and positive predictive value [4, 12].
Traditional computer methods for identifying asynchrony have focused on comparing mathematical descriptions of normal breath morphology with those of asynchronous breaths [13–16]. Alternatively, machine learning algorithms, which also rely on signal morphology comparison, have been trained to identify specific types of asynchronies [17–21]. More recently, the timing of respiratory patterns during pressure support ventilation has been used to train neural networks; however, the method's requirement for esophageal measurements constrains its practical application [22].
Our approach to identifying asynchronous breathing differs fundamentally from these prior efforts. Instead of analyzing individual breaths for morphological markers of asynchrony, we aimed to replicate the decision-making process of an experienced clinician when assessing a time window of airway signal data.
Epoch classification findings.
To achieve this goal, three expert clinicians visually assessed over 50,000 epochs of airway flow and pressure data, classifying each according to a predetermined set of options that included two non-asynchronous patterns (fully synchronous and variable breath-to-breath timing) and five asynchronous patterns. Each epoch was also evaluated based on the clinicians' subjective assessment of airway signal disruption and for the possible presence of dynamic hyperinflation, determined by the difference between end-expiratory flow and the zero-flow baseline.
Nearly one-fifth of all epochs were classified as fully synchronous, with a significant proportion linked to higher rates of intravenous propofol infusion, suggesting that fully synchronous epochs could be an indication of over-sedation. In contrast, variable-timing breathing was the most common classification, occurring in 36.5% of epochs, and is likely indicative of normal respiratory variability.
Epochs were classified as asynchronous when they exhibited four or more asynchronies. This threshold was chosen because four asynchronies within an epoch at a respiratory rate of 16 bpm correspond to an AI greater than 10%, the accepted threshold for asynchronous breathing [1]. Consistent with the findings of Blanch et al. [23], we observed that all patients experienced asynchronous breathing at some point during the monitoring period.
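The threshold logic above can be expressed as a small calculation. This is an illustrative sketch only: the helper name is hypothetical, and the total number of breaths per epoch depends on the epoch duration, which is not restated here.

```python
def asynchrony_index(n_asynchronies: int, total_breaths: int) -> float:
    """Asynchrony Index (AI): asynchronous events as a percentage of all breaths."""
    if total_breaths <= 0:
        raise ValueError("total_breaths must be positive")
    return 100.0 * n_asynchronies / total_breaths

# Illustration: 4 asynchronies among 16 breaths (e.g., one minute at 16 bpm)
ai = asynchrony_index(4, 16)
print(ai)         # 25.0
print(ai > 10.0)  # True: exceeds the 10% threshold for asynchronous breathing
```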
Asynchronous breathing accounted for 42% of all epochs, with cycling asynchrony being the most frequent classification. Nearly half of the asynchronous epochs were graded as severe, suggesting inadequate sedation or insufficient ventilatory support during these periods. Although less common, the vast majority of double-triggering and complex epochs were also graded as severe, indicating the need to reassess the adequacy of ventilation in patients experiencing these asynchronies.
We observed dynamic hyperinflation in association with other breathing patterns in 25% of all epochs. This finding aligns with the high prevalence of patients with pulmonary disease in our cohort (52%), who are prone to developing this condition [24, 25].
Model performance.
We used relatively few variables to train the different types of machine learning models. Except for age, these training variables were derived directly from ventilator-acquired data. Spectral analysis played a central role: applying the Fast Fourier Transform (FFT) to the airway signals summarized their behavior over each epoch's duration. The frequency spectra are relatively insensitive to artifacts that can distort these signals, such as pronounced cardiogenic oscillations or water in the ventilator tubing. Furthermore, the frequency spectrum is largely independent of ventilatory mode, a crucial advantage given that most patients were ventilated with two or more modes during the monitoring period.
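A minimal sketch of this kind of spectral feature extraction is shown below. The function and parameter names are illustrative, not the study's code; the synthetic trace assumes a breathing rate of 16 bpm sampled at 50 Hz.

```python
import numpy as np

def flow_spectrum(signal: np.ndarray, fs: float):
    """Magnitude spectrum of an airway signal over one epoch.

    Returns (frequencies in Hz, magnitudes) for non-negative frequencies only.
    """
    sig = signal - signal.mean()                      # remove DC offset
    mag = np.abs(np.fft.rfft(sig))
    freqs = np.fft.rfftfreq(sig.size, d=1.0 / fs)
    return freqs, mag

# Synthetic flow trace: 16 breaths/min = 16/60 Hz, sampled at 50 Hz for 60 s
fs = 50.0
t = np.arange(0, 60, 1 / fs)
flow = np.sin(2 * np.pi * (16 / 60) * t)
freqs, mag = flow_spectrum(flow, fs)
print(round(float(freqs[np.argmax(mag)]), 3))         # 0.267 (dominant frequency, Hz)
```

The dominant spectral peak recovers the respiratory rate regardless of the exact breath shape, which is why spectral summaries are robust to mode changes and transient artifacts.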
The Random Forest algorithm was selected due to its superior performance with our dataset relative to other machine learning models, as detailed in the Supplementary Material. Nevertheless, it is important to note that alternative machine learning algorithms may yield comparable or superior results, particularly when applied to larger datasets.
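A hedged sketch of training such a classifier with scikit-learn follows; the features, labels, and data here are entirely synthetic stand-ins (e.g., spectral summaries plus age) and do not reproduce the study's pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
# Hypothetical per-epoch features (columns are placeholders only)
X = rng.normal(size=(n, 3))
# Synthetic "asynchronous" label derived from the features for illustration
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))   # held-out accuracy on the synthetic data
```

Swapping in another estimator (e.g., gradient boosting) requires changing only one line, which is one reason ensemble comparisons of this kind are straightforward to run on larger datasets.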
The excellent performance of Model 1 in detecting asynchronous breathing, and of Model 4 in identifying dynamic hyperinflation, is likely due to their training with dichotomous data. In contrast, the performance of Models 2 and 3, which are multiclass classifiers, is inherently lower because of the added complexity of handling multiple classes.
Model 2 effectively differentiated fully synchronous breathing from variable-timing synchrony with high sensitivity and specificity, an important feature given that highly synchronous patients are known to experience worse outcomes [26, 27]. The model also demonstrated good predictive value and excellent specificity in identifying asynchronous epochs. However, sensitivity was fair to poor for detecting double triggering and complex asynchronies, perhaps due to their relatively low prevalence in the database.
Model 3 is arguably the weakest of the four models, as it depends on epoch classifications that are purely subjective and heavily reliant on the classifier’s clinical experience. Even so, we reasoned that imposing arbitrary guidelines for assessing signal disruption would have introduced bias, undermining the study's aim of replicating clinician judgment. While Model 3 accurately identified mild and severe breathing patterns, it was less effective in predicting moderate severity, possibly reflecting uncertainty in the scorers’ assessments of this severity level.
Algorithm reliability vs. accuracy.
Algorithm performance is typically evaluated by its accuracy, defined as the proportion of correct predictions made on an unseen test set. While models trained with reliable data tend to be both accurate and reliable, those trained with unreliable data may achieve high accuracy yet remain unreliable. Although artificial intelligence can reduce human bias by identifying relational patterns within large datasets, it cannot guarantee a model’s reliability.
We addressed the issue of algorithm reliability by using identical datasets to compare the classifications made by Models 2 and 3 with those made by clinicians of varying experience in asynchrony classification. As hypothesized, the expert group exhibited moderate to substantial agreement with the model predictions, whereas the non-expert group showed only fair agreement. The significantly higher kappa values for the expert group compared with the non-expert group further reinforce the clinical reliability of our algorithms.
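The kappa statistic underlying these comparisons can be computed directly. Below is a minimal sketch of Cohen's kappa with hypothetical epoch labels; the label names are illustrative only.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement between two raters, corrected for chance."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n   # observed agreement
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_e = sum(ca[lab] * cb[lab] for lab in set(ca) | set(cb)) / n**2  # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Toy example: hypothetical expert vs. model classifications of six epochs
expert = ["sync", "async", "async", "variable", "async", "sync"]
model  = ["sync", "async", "variable", "variable", "async", "sync"]
print(round(cohens_kappa(expert, model), 2))   # 0.75, i.e., substantial agreement
```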
Model limitations and further improvements.
The dataset, originating from a single institution and primarily employing PRVC and PC ventilation, may not generalize well to other settings due to its specific clinical practices. The limited ethnic diversity, with only 18% Caucasian participants, could also introduce bias.
During the study design, we recognized that visual inspection for asynchrony identification might introduce subjectivity and variability into the classification process. This issue, however, is inherent to machine learning model development, emphasizing the need for a large, well-classified dataset.
Assessors often encountered difficulties when evaluating epochs that displayed two types of asynchronies, necessitating classification based on the predominant asynchrony type. Further, some asynchrony types, such as flow asynchronies and reverse triggering, were excluded from the classification scheme given their low prevalence in the database.
Model accuracy and reliability could be enhanced by expanding the database to incorporate diverse clinical experiences and data from other centers. Additionally, the models would benefit from a broader range of breathing patterns, such as reverse triggering and flow asynchronies, and from accounting for the co-occurrence of different asynchronies within a single epoch.
In summary, this study demonstrates the feasibility of developing machine learning algorithms to emulate experienced clinicians in evaluating breathing patterns during mechanical ventilation. The application of larger databases, along with advancements in artificial intelligence, may lead to powerful algorithms capable of establishing associations between airway signals and successful ventilatory support.