Data Collection and Ethics
The study collected data from 34 TBI patients admitted to the Gold Coast University Hospital (GCUH) in Australia between June 2020 and June 2022. All patient-related information was de-identified and made accessible through the research platform of Datarwe [15], a clinical data-as-a-service provider that operates as a public-private collaboration. We extracted electrocardiography (ECG) signals sampled at 240 Hz and vital signs, including heart rate (HR), diastolic blood pressure (DBP), systolic blood pressure (SBP), mean arterial blood pressure (ABP mean), peripheral oxygen saturation (SpO2%), and respiratory rate. The following variables (or features) were also collected: oxygen saturation (so2), partial pressure of oxygen (po2), partial pressure of carbon dioxide (pco2), haemoglobin, carboxyhemoglobin, methemoglobin, chloride, calcium, anion gap, temperature, potassium, sodium, lactate, and glucose. Additionally, the patients’ age and gender were collected from the registry tables, and their Glasgow Coma Scale (GCS) scores (eye, verbal, and motor responses) were extracted from the monitoring data at 12 hours post admission, together with the final recorded GCS value. The GCS score was used as a measure of sedation and to assess injury severity.
We excluded the first hour of data after each patient’s admission because of high noise and missing values, then extracted the subsequent twelve hours of data. This window was chosen because the early hours after brain injury can significantly affect patient outcomes and recovery [16]. During this period, the body undergoes various physiological changes, including alterations in brain structure and connectivity [17], heart rate variability [3], and electrolyte balance [18]. Five patients were excluded due to insufficient data: their recording periods were shorter than 12 hours and they lacked ECG, vital sign, and clinical data. The set of features used in this study is listed in Table 1.
| Category | Feature | Description |
|---|---|---|
| HRV | SDNN | Standard deviation of normal-to-normal RR intervals over the entire measurement |
| HRV | SDANN | Standard deviation of the average normal-to-normal RR interval in all 5 min segments of the entire recording |
| HRV | MeanRR | Mean value of the RRIs over the entire measurement |
| HRV | RMSSD | Square root of the mean squared differences between successive RRIs |
| HRV | pNN50 | Number of successive RRI differences >50 ms divided by the total number of RRIs |
| HRV | p_VLF (%) | Power in the very-low-frequency range (≤0.04 Hz) |
| HRV | p_LF (%) | Power in the low-frequency range (0.04-0.15 Hz) |
| HRV | p_HF (%) | Power in the high-frequency range (0.15-0.4 Hz) |
| HRV | HF/LF | Ratio of HF power to LF power |
| HRV | Recurrence rate (REC) | Percentage of recurrent points in the embedded phase space |
| HRV | Determinism (DET) | Percentage of recurrence points that form diagonal lines in the recurrence plot |
| HRV | Laminarity (LAM) | Percentage of recurrence points that form vertical lines in the recurrence plot |
| HRV | Standard deviation1 (SD1) | Short-term variability (Poincaré plot) |
| HRV | Standard deviation2 (SD2) | Long-term variability (Poincaré plot) |
| HRV | Detrended fluctuations alpha 1 | Short-term scaling exponent measuring the correlation within the signal |
| HRV | Detrended fluctuations alpha 2 | Long-term scaling exponent measuring the correlation within the signal |
| Vital signs | HR (mean) | Mean of heart rate |
| Vital signs | HR_std | Standard deviation of heart rate |
| Vital signs | HR (slope) | Slope of heart rate |
| Vital signs | Diastolic BP (mean) | Mean of diastolic blood pressure |
| Vital signs | Diastolic BP_std | Standard deviation of diastolic blood pressure |
| Vital signs | Diastolic BP (slope) | Slope of diastolic blood pressure |
| Vital signs | Systolic BP (mean) | Mean of systolic blood pressure |
| Vital signs | Systolic BP_std | Standard deviation of systolic blood pressure |
| Vital signs | Systolic BP (slope) | Slope of systolic blood pressure |
| Vital signs | ABP Mean (mean) | Mean of mean arterial blood pressure |
| Vital signs | ABP Mean_std | Standard deviation of mean arterial blood pressure |
| Vital signs | ABP Mean (slope) | Slope of mean arterial blood pressure |
| Vital signs | SpO2 (mean) | Mean of peripheral oxygen saturation |
| Vital signs | SpO2_std | Standard deviation of peripheral oxygen saturation |
| Vital signs | SpO2 (slope) | Slope of peripheral oxygen saturation |
| Laboratory | so2 | Oxygen saturation level |
| Laboratory | po2 | Partial pressure of oxygen |
| Laboratory | pco2 | Partial pressure of carbon dioxide |
| Laboratory | haemoglobin | Haemoglobin |
| Laboratory | carboxyhemoglobin | Carboxyhemoglobin |
| Laboratory | methemoglobin | Methemoglobin |
| Laboratory | chloride | Chloride |
| Laboratory | calcium | Calcium |
| Laboratory | anion_gap | Anion gap |
| Laboratory | temperature | Temperature |
| Laboratory | potassium | Potassium |
| Laboratory | sodium | Sodium |
| Laboratory | lactate | Lactate |
| Laboratory | glucose | Glucose |
| GCS | GCS (eye)_12hr | Glasgow Coma Scale eye response at the 12-hour time window |
| GCS | GCS (motor)_12hr | Glasgow Coma Scale motor response at the 12-hour time window |
| GCS | GCS (verbal)_12hr | Glasgow Coma Scale verbal response at the 12-hour time window |
| GCS | GCS_final | Final recorded total Glasgow Coma Scale value |
| Demographics | Age | Age |
| Demographics | Gender | Gender |

Table 1: Description of variables/features
This study was conducted in compliance with all stipulations of the protocol. The Human Research Ethics Committee (HREC) of Queensland Health, Australia, assessed it as exempt from HREC review because it was a low-risk project that used only de-identified data (Reference number: EX/2022/QGC/86736).
Study Design
Statistical and mathematical methods were used to extract features from the patients’ vital sign and ECG data. We then employed a correlation matrix and a correlation-based network to reveal hidden relationships, interactions, and dependencies among the 51 features. We used Pearson’s correlation coefficient (r) to measure the linear relationship between each pair of features and set a threshold of |r| ≥ 0.5 to determine which correlations were strong enough to be included in the network. The graph used by our graph neural network (GNN) comprised two key components: nodes representing individual biomarker values, and edges based on the correlations derived from our analysis. To assess the overall connectivity within the network graph, we used a graph convolutional network (GCN) that performed node embedding in an unsupervised manner. Our approach, illustrated in Fig. 1, allowed us to investigate complex patterns and correlations among biomarkers across different biomarker categories, leading to a deeper understanding of the highly correlated biomarkers in the feature set.
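The thresholding step described above can be sketched as follows. This is a minimal illustration rather than the study's code: the function name `build_correlation_network`, the synthetic data, and the choice to store r as an edge weight are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
import networkx as nx


def build_correlation_network(features: pd.DataFrame, threshold: float = 0.5) -> nx.Graph:
    """Link every pair of feature columns whose Pearson |r| reaches the threshold."""
    corr = features.corr(method="pearson")
    cols = list(corr.columns)
    graph = nx.Graph()
    graph.add_nodes_from(cols)  # keep uncorrelated features as isolated nodes
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = corr.loc[a, b]
            if np.isfinite(r) and abs(r) >= threshold:
                graph.add_edge(a, b, weight=float(r))  # signed r kept as edge attribute
    return graph


# Synthetic stand-in data (hypothetical): rows are patients, columns are features.
rng = np.random.default_rng(0)
base = rng.normal(size=29)
demo = pd.DataFrame({
    "SDNN": base,
    "RMSSD": base + 0.1 * rng.normal(size=29),     # strongly correlated with SDNN
    "lactate": -base + 0.1 * rng.normal(size=29),  # strongly anti-correlated with SDNN
    "age": rng.normal(size=29),                    # unrelated noise
})
network = build_correlation_network(demo, threshold=0.5)
```

Using the absolute value means strong negative correlations also produce edges; the sign remains available in the `weight` attribute if it is needed later.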
Heart rate variability and vital sign data analysis
HRV measurements were performed in the time and frequency domains following the methods described in [19] and [20]. The R-to-R intervals (RRIs) were obtained by detecting R peaks using the Hilbert transform method [21]. Time-domain analysis extracts features directly from the raw RRIs. The Welch method [22] was used to estimate the power spectral density (PSD) of the RRI series across three frequency bands: very low frequency (VLF: 0.005-0.04 Hz), low frequency (LF: 0.04-0.15 Hz), and high frequency (HF: 0.15-0.4 Hz). These HRV parameters were calculated over consecutive hourly windows, and each HRV feature was then aggregated as the mean across the 12 hourly segments. Nonlinear HRV parameters were derived following [23] and [24] using Poincaré plots [25], recurrence quantification analysis (RQA) [26], and detrended fluctuation analysis (DFA) [27]. Poincaré analysis and RQA are commonly used to study changes in HRV resulting from endurance training and to assess parameters including the standard deviation (SD) and the recurrence of patterns [28] [29] [30]. DFA is another widely used method that quantifies the scaling properties of ECG data [31] [32] [33].
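For readers who want a concrete picture of these features, the sketch below computes several of them for one window of RRIs. It is illustrative only and makes assumptions that the text does not state: RRIs are given in milliseconds, the series is resampled to an even 4 Hz grid by linear interpolation before Welch estimation, relative power is expressed as a percentage of the summed VLF, LF, and HF power, and SD1 and SD2 use the standard relations to the variance of successive differences and to SDNN. The function names are hypothetical.

```python
import numpy as np
from scipy.signal import welch

# Band edges in Hz, as given in the text.
BANDS = {"p_VLF": (0.005, 0.04), "p_LF": (0.04, 0.15), "p_HF": (0.15, 0.4)}


def time_domain_hrv(rri_ms):
    """Time-domain and Poincare features from RR intervals in milliseconds."""
    rri = np.asarray(rri_ms, dtype=float)
    diffs = np.diff(rri)
    sdnn = np.std(rri, ddof=1)
    sd1 = np.sqrt(0.5 * np.var(diffs, ddof=1))
    sd2 = np.sqrt(max(2.0 * sdnn ** 2 - sd1 ** 2, 0.0))
    return {
        "MeanRR": float(rri.mean()),
        "SDNN": float(sdnn),
        "RMSSD": float(np.sqrt(np.mean(diffs ** 2))),
        # Divides by the number of RRIs, as in Table 1; some toolkits
        # divide by the number of differences instead.
        "pNN50": float(100.0 * np.sum(np.abs(diffs) > 50.0) / len(rri)),
        "SD1": float(sd1),
        "SD2": float(sd2),
    }


def relative_band_power(rri_ms, fs=4.0):
    """Welch band power (%) after resampling the RRI series evenly at fs Hz."""
    rri = np.asarray(rri_ms, dtype=float)
    beat_times = np.cumsum(rri) / 1000.0  # seconds
    grid = np.arange(beat_times[0], beat_times[-1], 1.0 / fs)
    even = np.interp(grid, beat_times, rri)
    freqs, psd = welch(even, fs=fs, nperseg=min(1024, len(even)))
    df = freqs[1] - freqs[0]
    power = {name: psd[(freqs >= lo) & (freqs < hi)].sum() * df
             for name, (lo, hi) in BANDS.items()}
    total = sum(power.values())
    return {name: 100.0 * p / total for name, p in power.items()}


# Synthetic RRIs (hypothetical): 800 ms baseline with a 0.25 Hz oscillation.
t, demo_rri = 0.0, []
for _ in range(600):
    rr = 800.0 + 50.0 * np.sin(2 * np.pi * 0.25 * t)
    demo_rri.append(rr)
    t += rr / 1000.0
demo_bands = relative_band_power(demo_rri)
demo_time = time_domain_hrv([800, 810, 790, 860, 800])
```

In a pipeline like the one described, these functions would be applied to each hourly window and the results averaged across the 12 windows.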
Cardiovascular and physiological parameters, including heart rate, diastolic and systolic blood pressure, mean arterial blood pressure, peripheral oxygen saturation, and respiratory rate, were recorded every minute. However, because respiratory rate is clinically managed through ventilator settings during the critical initial hours for TBI patients, it was deemed a less reliable indicator of physiological response and was excluded from our analysis. The remaining vital sign data were transformed into hourly features, from which the mean and standard deviation were calculated. The slope of the best-fit linear regression line for each vital sign was also calculated over the 12-hour period to capture trends in the measures. Missing data were addressed using k-nearest neighbours (k-NN) imputation, in which each imputed value is typically the mean of the values of the k nearest neighbours, as suggested by Zhang et al. [34].
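A minimal sketch of this aggregation is given below. The text does not specify exactly how the hourly features were combined, so the example assumes that minute-level values are first averaged per hour and that the mean, standard deviation, and regression slope are then taken over the hourly values. The helper names, the value k = 2, and the use of scikit-learn's `KNNImputer` are likewise assumptions for illustration.

```python
import numpy as np
from sklearn.impute import KNNImputer


def vital_sign_features(minute_values, name):
    """Mean, standard deviation, and slope (per hour) of hourly averages."""
    values = np.asarray(minute_values, dtype=float)
    n_hours = len(values) // 60
    # One row per hour; NaN minutes are ignored within each hour.
    hourly = np.nanmean(values[: n_hours * 60].reshape(n_hours, 60), axis=1)
    ok = np.isfinite(hourly)
    hours = np.arange(n_hours, dtype=float)
    slope = np.polyfit(hours[ok], hourly[ok], deg=1)[0]  # least-squares line
    return {
        f"{name} (mean)": float(np.nanmean(hourly)),
        f"{name}_std": float(np.nanstd(hourly, ddof=1)),
        f"{name} (slope)": float(slope),
    }


def impute_knn(feature_matrix, k=2):
    """Fill NaNs with the mean of the k nearest rows (uniform weights)."""
    return KNNImputer(n_neighbors=k).fit_transform(feature_matrix)


# Synthetic heart rate (hypothetical): rises by 1 bpm per hour for 12 hours.
hr = np.repeat(80.0 + np.arange(12), 60)
hr[5] = np.nan  # a single missing minute
hr_features = vital_sign_features(hr, "HR")

# Toy patient-by-feature matrix with one missing entry.
matrix = np.array([[1.0, 2.0], [1.1, np.nan], [0.9, 1.8], [5.0, 9.0]])
filled = impute_knn(matrix, k=2)
```

In the toy matrix, the two rows closest to the incomplete row supply the values 2.0 and 1.8, so the missing entry is filled with their mean.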
Correlation network analysis and graph convolutional network
Correlation network analysis (CNA) reveals interactions between variables, whereas GNNs capture complex dependencies in graph-structured data [35] and propagate information throughout the entire graph [13]. In our study, we constructed a correlation network with nodes representing biomarkers and edges representing correlations between pairs of nodes. Within the network, edge length represented the level of correlation between the paired biomarkers, with shorter edges indicating higher correlation coefficients; node degree represented the number of connections (edges) a specific node has with other biomarkers; and the betweenness centrality score measured the number of shortest paths between any two biomarkers that pass through the node. Nodes with greater centrality scores were regarded as drivers of the network because of their high interconnectivity. For this study, degree centrality was normalized by the maximum possible degree, n-1, and betweenness centrality was normalized by the number of all possible pairs of other nodes, (n-1)(n-2)/2 [36], where n is the number of nodes in the network. Additionally, the relationships between biomarker groups were assessed using network density values between the subgroups: HRV, vital sign, laboratory, GCS, and demographic data.
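The centrality measures above map directly onto NetworkX functions, as the following sketch shows. Two points are assumptions rather than details from the text: shortest paths are counted on the unweighted graph, and the density between two subgroups is taken as the number of edges linking them divided by the number of possible cross-group pairs. The helper names and the toy graph are hypothetical.

```python
import networkx as nx


def centrality_scores(graph):
    """Normalized degree and betweenness centrality for every node."""
    # degree_centrality divides each degree by n - 1.
    degree = nx.degree_centrality(graph)
    # For undirected graphs, normalized=True divides by (n - 1)(n - 2) / 2.
    betweenness = nx.betweenness_centrality(graph, normalized=True)
    return {node: {"degree": degree[node], "betweenness": betweenness[node]}
            for node in graph.nodes}


def between_group_density(graph, group_a, group_b):
    """Share of possible cross-group pairs that are connected (disjoint groups)."""
    a, b = set(group_a), set(group_b)
    linked = sum(1 for u, v in graph.edges
                 if (u in a and v in b) or (u in b and v in a))
    return linked / (len(a) * len(b))


# Toy network (hypothetical): one vital sign linked to three other biomarkers.
toy = nx.Graph()
toy.add_edges_from([
    ("HR (mean)", "SDNN"),
    ("HR (mean)", "lactate"),
    ("HR (mean)", "GCS_final"),
])
scores = centrality_scores(toy)
density_vital_other = between_group_density(toy, ["HR (mean)"], ["SDNN", "lactate"])
density_hrv_other = between_group_density(toy, ["SDNN"], ["lactate", "GCS_final"])
```

In this star-shaped example the hub lies on every shortest path between the other nodes, so it receives the maximum normalized score on both measures.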
GCNs consist of multiple layers that aggregate information from neighbouring nodes and update each node’s representation based on its features and the graph structure. The GCN was applied in an unsupervised manner to identify the most significant biomarkers in patients with TBI. The relevance of each feature was determined by computing the L2 norm [37] of its embedding vector, with higher values considered to indicate greater significance in the correlation network graph. To gain a holistic understanding, these scores were normalized across all features. However, the scores only indicate a high level of connectivity and do not imply direct significance beyond the network relationships.
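To make the scoring idea concrete, the sketch below runs a two-layer graph convolution forward pass in NumPy and ranks nodes by the L2 norm of their embeddings. It is not the study's model. The weights here are random and untrained, because the text does not state the unsupervised training objective; the symmetric normalization with self-loops is the commonly used GCN propagation rule; identity node features, layer sizes, and sum-to-one normalization of the scores are assumptions chosen for the example.

```python
import numpy as np


def normalized_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops added."""
    a_hat = np.asarray(adj, dtype=float) + np.eye(len(adj))
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_embedding_scores(adj, node_features, hidden=8, out=4, seed=0):
    """Two-layer GCN forward pass, then L2 norm of each node embedding."""
    rng = np.random.default_rng(seed)
    a_norm = normalized_adjacency(adj)
    x = np.asarray(node_features, dtype=float)
    w1 = rng.normal(size=(x.shape[1], hidden))  # untrained, illustrative weights
    w2 = rng.normal(size=(hidden, out))
    h = np.maximum(a_norm @ x @ w1, 0.0)        # layer 1 with ReLU
    z = a_norm @ h @ w2                         # layer 2, linear output
    norms = np.linalg.norm(z, axis=1)
    return norms / norms.sum()                  # scores sum to 1 across nodes


# Toy adjacency (hypothetical): node 0 is connected to nodes 1, 2, and 3.
adjacency = np.array([
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
])
scores = gcn_embedding_scores(adjacency, np.eye(4))
```

With trained weights the ranking would reflect whatever the training objective rewards; with random weights, as here, the output only demonstrates the mechanics of aggregation and scoring.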
The entire process of feature calculation, correlation analysis, and graph analysis was conducted in Python on the AWS cloud platform. NetworkX [36] and Pyvis [38] were used to visualize the correlation network, and PyTorch Geometric [39] was used for GCN training and analysis.