2.1. Participants
Subjects were recruited through internet advertisements, and only healthy subjects were accepted into the study. The criteria for health included lack of neuropathic symptoms, lack of mental illness, and a normal mental and physical developmental history. After initial telephone-based screening for developmental and medical history, all subjects were further screened for depression and anxiety using self-administered questionnaires, while attention, memory, frontal function, and executive function were assessed using computerized neurocognitive tests. All procedures were approved by the Research Ethics Committee of the Seoul National University (IRB number: 1711/003–004), and informed consent was obtained from each participant or guardian prior to enrollment. If the assessment of either depression or anxiety reached a clinical level, the subject was excluded. If the score on any cognitive domain was below −2z, or if the scores on any three domains were below −1.5z, the subject was also excluded. Final enrollment included 618 healthy subjects aged 4 to 19 years. All methods were performed in accordance with the relevant guidelines and regulations.
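The cognitive exclusion rule can be written as a short predicate. The following is a minimal sketch only: the function name, the dictionary layout, and the domain labels are hypothetical, and the rule is read as "at least three domains below −1.5z".

```python
def is_excluded(z_scores, hard_cutoff=-2.0, soft_cutoff=-1.5, soft_count=3):
    """Return True if a subject fails the cognitive screening rule.

    z_scores: dict mapping a cognitive domain name to its z-score
    (hypothetical layout, not taken from the study's software).
    """
    values = list(z_scores.values())
    # Rule 1: any single domain below the hard cutoff (-2z).
    any_hard = any(z < hard_cutoff for z in values)
    # Rule 2: at least three domains below the soft cutoff (-1.5z).
    n_soft = sum(1 for z in values if z < soft_cutoff)
    return any_hard or n_soft >= soft_count


subject = {"attention": -1.6, "memory": -1.7, "frontal": -1.55, "executive": 0.2}
print(is_excluded(subject))
```

Here the example subject is excluded by the second rule, because three domains fall below −1.5z although none falls below −2z.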
In the metadata, the age groups were specified as two subgroups: a) children aged 4 to 6 years, and b) adolescents aged 6 to 19 years. Table 1 shows the number of subjects grouped by sex and age.
Table 1
The number of subjects in each sex and age group
| Age (years) | Male | Female | Total |
|---|---|---|---|
| a) 4 to 6 | 57 | 39 | 96 |
| b) 6 to 19 | 245 | 277 | 522 |
| Total | 302 | 316 | 618 |
2.2. EEG recording and data processing
EEG was recorded using a Mitsar-EEG 202 digital electroencephalograph (Mitsar Ltd., St. Petersburg, Russia) with an electro-cap (Electro-cap International, Inc., Eaton, Ohio, USA). Electrodes were placed on the surface of the scalp according to the international 10–20 system, at the following locations: Fp1, Fp2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, and O2. The reference electrodes were placed on the mastoid process and the ground electrode was placed at Fpz. Electrode resistance was maintained below 5 kΩ. The EEG sampling rate was 250 Hz and the amplifier band ranged from 0.53 to 50 Hz. The participants were instructed to sit upright while the resting-state EEG was measured under the eyes-open and eyes-closed conditions, for 3 minutes each. The raw EEG data were band-pass filtered using a low cut-off of 1 Hz and a high cut-off of 45 Hz. Re-referencing was performed using the common average reference (CAR) method. Artifacts were removed by bad epoch rejection and independent component analysis (ICA) using an automated, cloud-based QEEG analysis platform (iSyncBrain®, iMediSync Inc., Republic of Korea; https://isyncbrain.com/). Spectral power and power ratios were also calculated.
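As an illustration of the re-referencing step, common average referencing subtracts, at every time point, the mean across all channels from each channel. The sketch below is a generic pure-Python version under that definition; it is not the iSyncBrain® implementation, and the function name and data layout are hypothetical.

```python
def common_average_reference(samples):
    """Apply common average referencing (CAR).

    samples: list of time points, each a list of channel voltages
    (hypothetical layout: samples[t][channel]).
    """
    rereferenced = []
    for sample in samples:
        # Mean across channels at this time point.
        mean = sum(sample) / len(sample)
        rereferenced.append([value - mean for value in sample])
    return rereferenced


# Two time points, three channels (synthetic values, not study data).
print(common_average_reference([[1.0, 2.0, 3.0], [4.0, 4.0, 10.0]]))
```

After CAR, the channel values at each time point sum to zero, which is a convenient sanity check.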
2.3. EEG features for each region
Because the frontal brain area tends to undergo the greatest dynamic change during development [16, 17, 18], we divided the brain into four regions for our analysis: the left anterior, right anterior, left posterior, and right posterior areas.
Figure 1 shows the four areas designated for analysis. The features included in the analysis consisted of sensor-level features, source-level region of interest (ROI) features, and source-level imaginary coherence. Absolute power of the EEG was obtained from a fast Fourier transform (FFT) for each of eight frequency bands: delta (1–4 Hz), theta (4–8 Hz), alpha1 (8–10 Hz), alpha2 (10–12 Hz), beta1 (12–15 Hz), beta2 (15–20 Hz), beta3 (20–30 Hz), and gamma (30–45 Hz). Source-level power was estimated by standardized low-resolution brain electromagnetic tomography (sLORETA) for 68 ROIs from the Desikan-Killiany atlas [19]. The imaginary part of coherence (iCoh), an indicator of brain connectivity [20], was calculated among the 68 ROIs in each of the eight frequency bands. All of the EEG features were analyzed using the automated cloud-based QEEG analysis platform (iSyncBrain®).
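To make the band definitions concrete, the following sketch sums spectral power into the eight bands for a single channel. It rests on several assumptions that are not taken from the study: band edges are treated as half-open intervals [low, high), a naive discrete Fourier transform stands in for the FFT, the normalization is arbitrary, and the input is a synthetic sine wave.

```python
import cmath
import math

# Band edges in Hz as listed in the text; half-open [low, high) is an assumption.
BANDS = {
    "delta": (1, 4), "theta": (4, 8), "alpha1": (8, 10), "alpha2": (10, 12),
    "beta1": (12, 15), "beta2": (15, 20), "beta3": (20, 30), "gamma": (30, 45),
}


def band_powers(signal, fs):
    """Sum squared DFT magnitudes into each band (naive O(n^2) DFT)."""
    n = len(signal)
    powers = {name: 0.0 for name in BANDS}
    for k in range(1, n // 2):
        freq = k * fs / n
        if freq >= 45:
            break  # nothing above the gamma band is needed
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        power = abs(coeff) ** 2 / n ** 2
        for name, (low, high) in BANDS.items():
            if low <= freq < high:
                powers[name] += power
    return powers


# One second of a synthetic 9 Hz sine sampled at 250 Hz.
fs = 250
signal = [math.sin(2 * math.pi * 9 * t / fs) for t in range(fs)]
powers = band_powers(signal, fs)
print(max(powers, key=powers.get))
```

With these assumptions the 9 Hz test signal falls entirely into alpha1. A frequency that lies exactly on a shared edge, such as 10 Hz, would be assigned to the upper band (alpha2) under the half-open convention.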
Sensor-level features included Fz, Cz, and Pz, which correspond to the center axis, in all four regions. T3 and T4 are not part of the frontal lobe, and they were excluded from the sensor-level features because their assignment to the posterior region is too ambiguous. Features of the electrooculogram (EOG)-sensitive delta waves were only used to optimize noise reduction. The left anterior region included Fp1, F3, F7, Fz, C3, and Cz; the right anterior included Fp2, F4, F8, Fz, C4, and Cz; the left posterior included C3, Cz, P3, Pz, T5, and O1; and the right posterior included C4, Cz, P4, Pz, T6, and O2.
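The regional grouping can be expressed as a simple lookup table, which also makes it easy to verify which electrodes are shared and which are left out. The dictionary keys and variable names below are hypothetical; the electrode lists are those given in the text.

```python
# Electrodes per region, as listed in the text (key names are hypothetical).
REGIONS = {
    "left_anterior": ["Fp1", "F3", "F7", "Fz", "C3", "Cz"],
    "right_anterior": ["Fp2", "F4", "F8", "Fz", "C4", "Cz"],
    "left_posterior": ["C3", "Cz", "P3", "Pz", "T5", "O1"],
    "right_posterior": ["C4", "Cz", "P4", "Pz", "T6", "O2"],
}

# The 19 recorded 10-20 channels.
ALL_CHANNELS = ["Fp1", "Fp2", "F7", "F3", "Fz", "F4", "F8", "T3", "C3", "Cz",
                "C4", "T4", "T5", "P3", "Pz", "P4", "T6", "O1", "O2"]

used = set().union(*REGIONS.values())
unused = [ch for ch in ALL_CHANNELS if ch not in used]
shared_by_all = set.intersection(*(set(chs) for chs in REGIONS.values()))

print(unused)         # channels not assigned to any region
print(shared_by_all)  # channels present in every region
```

With the lists as given, only T3 and T4 are unassigned, and Cz is the only electrode that appears in all four regions; Fz is shared by the two anterior regions and Pz by the two posterior regions.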
2.4. Machine learning algorithm
We applied the random forest technique for feature selection, using feature importance to rank all features by their predictive contribution. Random forest is an ensemble method built on decision trees, which can capture nonlinear relationships. Because these tree-based ensemble models have their own feature importance attributes, variables can be ordered according to their importance. In the present study, importance values were extracted using the scikit-learn implementation. Only features with an importance value of zero were excluded, and the final model was established with enough features to secure predictive power while still maintaining the model's performance. Given that there were up to 10,000 features, feature selection was carried out multiple times, taking the variable importance of the machine learning model into account, to maximize the predictive power of the brain age prediction model.
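One selection pass of this kind can be sketched as follows. This is an illustrative outline, not the study's code: the feature names and importance values are made up, and in scikit-learn the values would typically be read from the `feature_importances_` attribute of a fitted forest (for example a `RandomForestRegressor`, which is an assumption here since the text does not name the estimator class).

```python
def drop_zero_importance(importances):
    """Keep features with non-zero importance, ranked from most to least important.

    importances: dict mapping feature name to importance value
    (hypothetical layout; values would come from a fitted tree ensemble).
    """
    kept = {name: value for name, value in importances.items() if value > 0.0}
    return sorted(kept, key=kept.get, reverse=True)


# Made-up importance values for illustration only.
example = {
    "Fz_theta_abs_power": 0.42,
    "Cz_alpha1_abs_power": 0.0,
    "roi12_roi40_iCoh_beta1": 0.13,
    "Pz_delta_abs_power": 0.0,
}
print(drop_zero_importance(example))
```

Repeating the pass (refit on the kept features, then drop any that now have zero importance) until no feature is removed would be one way to realize the multiple selection rounds described above.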
The EEG dataset was split into training and test datasets at a 4:1 ratio, and the model was validated through random five-fold cross-validation (cv). The male group comprised 241 training datasets and 61 test datasets out of 302. The female group comprised 252 training datasets and 64 test datasets out of 316. The predictive target variable was defined as the biological age.
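The reported set sizes are consistent with rounding the training share down to a whole subject, as the arithmetic below shows. Both the rounding rule and the even fold sizes are assumptions inferred from the counts; the text does not state how the split or the folds were constructed.

```python
def split_sizes(n_subjects, train_parts=4, test_parts=1):
    """Return (n_train, n_test) for a train:test ratio, rounding the training size down."""
    n_train = n_subjects * train_parts // (train_parts + test_parts)
    return n_train, n_subjects - n_train


def fold_sizes(n_train, k=5):
    """Sizes of k folds that differ by at most one subject (illustrative only)."""
    base, extra = divmod(n_train, k)
    return [base + 1 if i < extra else base for i in range(k)]


print(split_sizes(302))  # male group
print(split_sizes(316))  # female group
print(fold_sizes(241))
```

For the male group this gives 241 training and 61 test subjects, and for the female group 252 and 64, matching the counts above.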
In the process of developing the female left posterior brain age prediction model, the cv score of the third fold was lower than that of the other folds. To overcome the uneven scores across the five folds, cross-validation was performed 100 times for the same subjects for only the third fold, and the model was re-trained for all four regions.
The exclusion criteria for female subjects in this study were:
- An absolute value of the difference between actual and predicted values of > 2.
- An absolute value of the difference > 2 for the same subjects only.
Thus, the brain age prediction model was finally developed for each of the four regions using a total of 233 of the 252 training subjects in the female group, after excluding the 19 subjects who met the above two exclusion criteria. Through this process, the prediction results of all five folds were evenly derived, thereby establishing a more stable prediction model.
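One possible reading of the two exclusion criteria is that a subject is removed when the absolute difference between actual and predicted age exceeds 2 in every repeated cross-validation run. The sketch below implements that reading only; the interpretation, the unit of the threshold (assumed to be years), the function name, and the example values are all assumptions rather than details confirmed by the text.

```python
def flag_outliers(actual, repeated_predictions, threshold=2.0):
    """Return indices of subjects whose absolute error exceeds the threshold in every run.

    actual: list of actual ages.
    repeated_predictions: list of runs, each a list of predicted ages
    aligned with `actual` (hypothetical layout).
    """
    flagged = []
    for i, age in enumerate(actual):
        errors = [abs(age - run[i]) for run in repeated_predictions]
        if all(error > threshold for error in errors):
            flagged.append(i)
    return flagged


# Synthetic ages and predictions, not study data.
actual = [10.0, 12.0, 15.0]
runs = [[10.5, 15.0, 15.2], [11.0, 14.5, 18.0]]
print(flag_outliers(actual, runs))
```

In this example only the second subject is flagged, because its error exceeds 2 in both runs, while the third subject exceeds the threshold in one run only.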