Sleep apnea is a prevalent sleep disorder characterized by short episodes of complete or partial blockage of the upper respiratory airway [1]. It is often accompanied by loud snoring, gasping during sleep, morning headaches, and excessive daytime sleepiness, although individuals with mild forms may remain asymptomatic. Studies have established a correlation between sleep apnea and increased risks of cardiovascular diseases, metabolic disorders, and cognitive impairment [2]. Despite its global prevalence, many sleep apnea patients remain undiagnosed, partly due to the absence of symptoms in mild cases and partly due to the prohibitive cost of clinical diagnostic tests. The standard diagnostic method for sleep apnea is nocturnal polysomnography (PSG) conducted in a sleep laboratory. This test monitors various physiological signals during sleep, including brain activity, breathing patterns, heart rate, chest and abdominal movements, limb movements, and blood oxygen levels. Registered sleep technicians then manually score these signals to generate a comprehensive sleep report, which enables sleep doctors to identify abnormalities in the patient's physiological state during sleep. However, the PSG test is expensive and time-consuming, which greatly restricts its accessibility for sleep apnea diagnosis.
Current research is actively exploring alternative methods for screening sleep apnea that are more cost-effective and user-friendly. Previous studies in this domain can be broadly categorized into two groups based on whether they use validated psychometric questionnaires or physiological sensing techniques. The STOP-Bang questionnaire, a validated psychometric tool for obstructive sleep apnea screening, comprises eight items that assess perceivable symptoms of sleep apnea alongside demographic characteristics. It has demonstrated high sensitivity and negative predictive value (NPV) across various apnea-hypopnea index (AHI) cutoffs and thus offers a reliable screening approach [3]. However, because the questionnaire relies largely on perceivable symptoms, it is of limited use for asymptomatic patients. To address this limitation, research has explored apnea screening methods grounded in physiological sensing. Numerous studies have leveraged a subset of the signals recorded in PSG, such as electroencephalogram (EEG), electrocardiogram (ECG), airflow, and blood oxygen saturation (SpO2), for automated sleep apnea screening. Reported performance varies with the signal modality and the computational model used [4–13]. Yet most sensing modalities, including EEG, ECG, and airflow, are not readily available for home use, which hinders their adoption for at-home apnea screening. Moreover, combining multiple sensing modalities makes at-home screening protocols even less feasible.
This study aims to develop and evaluate a computational approach for screening sleep apnea using only overnight SpO2 signals, which can be conveniently acquired with home-use sensors. The prevalence of portable and wearable oximeters has surged, particularly during the pandemic. Many consumer-grade smartwatches and activity trackers now feature built-in photoplethysmography (PPG) sensors capable of continuously measuring SpO2 levels throughout the day [14, 15]. Owing to their improved accuracy and ease of use, these devices increasingly contribute to promoting sleep health among the general population [14, 16]. Such advancements present an opportunity to develop sleep apnea screening methods that are accessible, cost-effective, and easy to use. While previous studies have explored sleep apnea screening models based on SpO2 signals [4, 17, 18], most were evaluated on small datasets comprising only dozens to hundreds of subjects [4]. Consequently, the reported model performance may be inflated, and the generalizability of the models is uncertain. Furthermore, the granularity of SpO2 data can vary significantly across devices. Clinical pulse oximeters typically provide high-frequency data sampled at 1 Hz, whereas consumer wearables may offer sparser data. For instance, Fitbit devices allow users to retrieve SpO2 data only at 1-minute intervals. The impact of data granularity on the screening accuracy of previous models, however, remains unexplored.
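To make the notion of data granularity concrete, the following minimal sketch reduces a 1 Hz SpO2 series to 1-minute values by averaging non-overlapping 60-sample windows. The function name `downsample_spo2`, the choice of mean aggregation, and the synthetic readings are illustrative assumptions; they are not taken from this study or from any device's documented behavior.

```python
from statistics import mean


def downsample_spo2(samples, window=60):
    """Average consecutive non-overlapping windows of SpO2 readings.

    samples: SpO2 percentages, assumed here to be sampled at 1 Hz.
    window:  samples per output value (60 gives one value per minute).
    A trailing partial window is dropped. Mean aggregation is an
    assumption made for illustration only.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    n_full = len(samples) // window
    return [mean(samples[i * window:(i + 1) * window]) for i in range(n_full)]


# Synthetic example: one steady minute, then a minute with a 10-second dip.
one_hz = [97.0] * 60 + [97.0] * 20 + [88.0] * 10 + [97.0] * 30
per_minute = downsample_spo2(one_hz)
print(per_minute)  # [97.0, 95.5]
```

In this synthetic example, a 10-second drop to 88% appears as only 95.5% in the 1-minute series, which illustrates why coarser granularity could plausibly affect screening accuracy.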
In this study, we employ a probabilistic ensemble machine learning approach that combines multiple base machine learning classifiers to predict the probability of sleep apnea through soft voting. Each base classifier predicts the probability of sleep apnea, and the final prediction is determined by averaging these probabilities. An individual is classified as apnea-positive if the averaged probability exceeds a predefined decision boundary. We develop screening models for three AHI cutoff points (\(\ge\)5, \(\ge\)15, and \(\ge\)30) using one of the largest clinical sleep datasets to reduce the risk of overfitting. We evaluate and validate the models using multiple measures and with statistical rigor. Our evaluation seeks to answer two key research questions: (1) what decision boundaries optimize the performance of the probabilistic ensemble models at each AHI cutoff? and (2) how does the granularity of SpO2 data impact model performance?
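The decision rule described above can be sketched as follows. This is a minimal illustration, not the study's implementation: the function names, the example probabilities, and the boundary values are hypothetical, and the base classifiers are represented only by the probabilities they output.

```python
def ensemble_probability(base_probs):
    """Average the apnea probabilities predicted by the base classifiers."""
    if not base_probs:
        raise ValueError("at least one base probability is required")
    return sum(base_probs) / len(base_probs)


def screen(base_probs, decision_boundary=0.5):
    """Return (averaged probability, apnea-positive flag).

    The individual is flagged positive only if the averaged probability
    exceeds the decision boundary. The default of 0.5 is a placeholder,
    not a value reported by the study.
    """
    p = ensemble_probability(base_probs)
    return p, p > decision_boundary


# Hypothetical outputs of three base classifiers for one individual.
probs = [0.30, 0.45, 0.60]
p_default, positive_default = screen(probs)                      # boundary 0.5
p_lower, positive_lower = screen(probs, decision_boundary=0.4)   # boundary 0.4
print(round(p_default, 2), positive_default, positive_lower)
```

With these hypothetical inputs the averaged probability is about 0.45, so the same individual is screened negative at a boundary of 0.5 and positive at 0.4, which shows how the choice of decision boundary changes the outcome.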
To our knowledge, this study is the first comprehensive investigation into the influence of decision boundaries and data granularity on machine-learning-based sleep apnea screening. We discuss how the decision boundary incorporates the pretest sleep apnea prevalence into model tuning, thereby improving the models’ clinical relevance as well as their transferability across diverse populations. Our findings offer novel insights for developing large sleep apnea screening models with enhanced generalizability.