Participants
In this study, 199 participants were enrolled from two distinct datasets: the University of Tehran (UT) dataset and the University of British Columbia (UBC) dataset. The UT dataset comprised 103 participants (25 subjects with PD and 78 healthy controls (HCs)) recruited from the Department of Neurology at Rasoul Akram Hospital, Tehran, Iran. The UBC dataset included 96 participants (77 subjects with PD and 19 HCs, most of whom were the PD subjects' spouses) recruited from the Pacific Parkinson Research Center at the University of British Columbia, Vancouver, British Columbia, Canada. Inclusion criteria for subjects with PD were i) a PD diagnosis confirmed by movement disorder specialists based on the United Kingdom Parkinson’s Disease Society Brain Bank criteria, and ii) age over 30 years. Exclusion criteria for both PD subjects and HCs were i) any history of facial trauma or surgery, ii) ongoing orofacial symptoms or sequelae of a previously diagnosed neurological or neuromuscular disease other than PD, iii) a diagnosis of atypical Parkinsonism, or iv) inability to use the designed applications for facial expression video acquisition. The Institutional Review Boards of each center approved the studies, and informed consent was obtained from all participants.
Video Acquisition
We utilized video recordings of the participants exhibiting a neutral facial expression and four key emotions: happiness, sadness, anger, and disgust. In the UT dataset, participants interacted with a custom-designed smartphone application installed on their mobile phones. They were shown videos of facial expressions corresponding to the four emotions and were instructed to replicate these expressions while being recorded with their phones' selfie cameras. In the UBC dataset, a web-based application was used to demonstrate expressions of the four emotions, and participants were asked to mimic each one, eliciting genuine emotional expressions. Their expressions were recorded using webcams on the participants' personal devices or on a computer (iMac 24", Apple Inc., CA, USA) located in a private area of the waiting room of the Movement Disorder clinic. The videos were reviewed by a movement disorder specialist and scored according to the Movement Disorder Society-sponsored UPDRS (MDS-UPDRS) Part III26. The overall flow of the subsequent analyses is depicted in Figure 3a.
Ethical Approvals
The research project involving the UT Dataset was reviewed and approved by the University of Tehran's Ethics Committee (Approval ID: IR.UT.PSYED.REC.140.048), and the UBC Dataset was reviewed and approved by the Chair of the University of British Columbia Clinical Research Ethics Board (Approval ID: H18-03548, approved on 10/12/2023). Informed consent was obtained from all participants prior to conducting interviews and video recordings. Participants were informed of their right to withdraw from the study at any time. All collected data are securely stored, remain confidential, and no personal identifiers, such as participants' names, are mentioned in any research reports.
Assessment of Facial Expression Asymmetry
Preprocessing
For each collected video, we extracted every frame and cropped the face within the frame. To mitigate the variability introduced by differing head orientations, cropping and aligning the face is a crucial step before feature extraction. Head orientation can be modeled by three rotational angles: roll, yaw, and pitch (Figure 3b). Because yaw rotation can affect the measurement of facial asymmetry, all the videos were aligned with respect to the yaw angle so that the face was oriented vertically. Pitch rotation does not significantly affect the detection of facial asymmetry, as it affects both sides of the face to the same extent. Similarly, roll rotation does not inherently affect the assessment of asymmetry; nevertheless, we zeroed the roll angle to simplify the subsequent processing. To rotate the face, we used the angle between the horizon and the line connecting the centers of the two eyes. The scale factor was computed from the desired inter-ocular distance, set at approximately 30% of the image width. A rotation matrix was then calculated and applied to the image using the OpenCV library34. This resulted in a face that is both centered and oriented upright, irrespective of the original pose (Figure 3c).
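As a minimal illustration of this alignment step, the sketch below rotates and scales a face so that the inter-ocular line is horizontal and the eye distance is roughly 30% of the output width, using OpenCV's rotation matrix. The eye centers are assumed to be available from a separate detection step, and the output size and vertical placement of the eyes are illustrative choices, not the authors' exact settings.

```python
import cv2
import numpy as np

def align_face(image, left_eye, right_eye, out_size=256, eye_dist_ratio=0.30):
    """Zero the roll angle and rescale a cropped face.

    left_eye, right_eye: (x, y) eye centers in image coordinates.
    eye_dist_ratio: desired inter-ocular distance as a fraction of the
        output width (~30%, as described above).
    """
    left_eye = np.asarray(left_eye, dtype=float)
    right_eye = np.asarray(right_eye, dtype=float)

    # Roll angle: angle between the inter-ocular line and the horizon.
    dx, dy = right_eye - left_eye
    angle = np.degrees(np.arctan2(dy, dx))

    # Scale factor: map the current eye distance to the desired one.
    scale = (eye_dist_ratio * out_size) / np.hypot(dx, dy)

    # Rotate and scale about the midpoint between the eyes, then shift
    # that midpoint to a fixed position in the output image (illustrative).
    cx, cy = (left_eye + right_eye) / 2.0
    M = cv2.getRotationMatrix2D((float(cx), float(cy)), angle, scale)
    M[0, 2] += out_size * 0.5 - cx
    M[1, 2] += out_size * 0.35 - cy

    return cv2.warpAffine(image, M, (out_size, out_size), flags=cv2.INTER_CUBIC)
```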
Facial Landmark Detection
Facial landmarks were identified on the aligned faces using the Shape Preserving Facial Landmarks with Graph Attention Networks (SPIGA) algorithm35. It identifies 98 facial landmarks, offering a higher level of precision than models that extract fewer landmarks36. This increased granularity is especially beneficial for accurately capturing the nuances of older faces, where wrinkles and other age-related features can significantly affect facial landmark detection.
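For illustration, the sketch below shows how 98-point landmarks might be obtained per aligned frame with the publicly released SPIGA package; the ModelConfig/SPIGAFramework calls and the WFLW-trained weights are assumptions based on that package's documented interface, not necessarily the authors' exact configuration, and the face bounding box is expected from the preceding cropping step.

```python
from spiga.inference.config import ModelConfig
from spiga.inference.framework import SPIGAFramework  # assumed public interface

# Assumption: WFLW-trained weights, which output 98 landmarks per face.
processor = SPIGAFramework(ModelConfig("wflw"))

def detect_landmarks(frame_bgr, face_bbox):
    """Return the 98 (x, y) landmarks for the face inside face_bbox.

    face_bbox: [x, y, w, h] from the preceding cropping step.
    """
    features = processor.inference(frame_bgr, [face_bbox])
    return features["landmarks"][0]
```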
Facial Asymmetry Indices
After face cropping and alignment, for each frame we defined six regions of interest (ROIs) on the face for further analyses: the entire face, eyes, eyebrows, mouth, nose, and face outline. For each ROI of each frame, we mirrored the landmarks from the left side of the face to the right side across the line connecting the eyes' midpoint and the nose midpoint (Figure 3c). The distances between these mirrored landmarks and the actual landmarks on the other side were then calculated in two ways: as Euclidean distances and as vertical distances. These two metrics served as the facial asymmetry indices for further analyses. Note that the laterality of mirroring does not influence the outcome, as the distance remains unchanged regardless of which side (right or left) is reflected. The Euclidean facial asymmetry index was calculated as the sum of the Euclidean distances between the mirrored and actual landmarks divided by the square root of the sum of the squares of the width and height of the ROI (Eq. (1)).
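The sketch below illustrates this computation for one ROI: the left-side landmarks are reflected across the midline defined by the eyes' midpoint and the nose midpoint, and the Euclidean index of Eq. (1) is formed by summing the distances to the corresponding right-side landmarks and normalizing by the ROI diagonal. The left/right landmark pairing is assumed to be given, and all function and variable names are illustrative; the vertical index would be computed analogously using only the y-distances.

```python
import numpy as np

def reflect_across_line(points, a, b):
    """Reflect 2-D points across the line through points a and b (the midline)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    d = (b - a) / np.linalg.norm(b - a)      # unit direction of the midline
    v = np.asarray(points, float) - a        # vectors from a to each landmark
    proj = np.outer(v @ d, d)                # components parallel to the midline
    return a + 2.0 * proj - v                # flip the perpendicular component

def euclidean_asymmetry_index(left_pts, right_pts, eyes_mid, nose_mid, roi_w, roi_h):
    """Eq. (1): summed distances between mirrored left landmarks and their
    right-side counterparts, normalized by the ROI diagonal sqrt(W^2 + H^2)."""
    mirrored = reflect_across_line(left_pts, eyes_mid, nose_mid)
    dists = np.linalg.norm(mirrored - np.asarray(right_pts, float), axis=1)
    return dists.sum() / np.sqrt(roi_w ** 2 + roi_h ** 2)
```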
Feature Extraction
We extracted several features from the calculated Euclidean and vertical asymmetry indices of each video for further analysis. These features can be categorized into static and dynamic features. The static features included the mean and range (maximum minus minimum) of the asymmetry indices.
The dynamic features included basic statistical time-series measures and the CATCH22 time-series features37, computed from the indices over the entire video sequence. The basic statistical measures were the mean, variance, standard deviation, amplitude, root mean square, skewness, and kurtosis. The CATCH22 features are a set of 22 carefully selected statistical and mathematical properties that summarize the essential characteristics of time-series data, enabling quick and efficient analysis of patterns and anomalies without extensive preprocessing or domain-specific knowledge37. They include measures of central tendency, variability, entropy, and autocorrelation, which help characterize the dynamics of time-series signals.
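A minimal sketch of this feature extraction for a single asymmetry-index time series is shown below, assuming the pycatch22 package for the CATCH22 set; the definition of "amplitude" used here (half the peak-to-peak range) is an assumption, as the text does not define it.

```python
import numpy as np
import pycatch22  # assumed Python implementation of the CATCH22 feature set
from scipy import stats

def extract_features(index_series):
    """index_series: one asymmetry index sampled at every frame of a video."""
    x = np.asarray(index_series, dtype=float)

    feats = {
        # Static features
        "mean": x.mean(),
        "range": x.max() - x.min(),
        # Basic statistical time-series measures
        "variance": x.var(),
        "std": x.std(),
        "amplitude": (x.max() - x.min()) / 2.0,   # assumption: half the peak-to-peak range
        "rms": np.sqrt(np.mean(x ** 2)),
        "skewness": stats.skew(x),
        "kurtosis": stats.kurtosis(x),
    }

    # CATCH22: 22 canonical time-series characteristics
    c22 = pycatch22.catch22_all(x.tolist())
    feats.update(dict(zip(c22["names"], c22["values"])))
    return feats
```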
Statistical Analyses
For the comparison of baseline characteristics between PD and HC, skewness and kurtosis values were assessed to determine the normality of the distribution. For normally distributed data, chi-square tests and t-tests were performed. For non-normally distributed data, the Mann-Whitney U test and the Kruskal-Wallis test were used.
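Purely as an illustration of this testing scheme, the sketch below shows how such comparisons might be run with SciPy, assuming continuous characteristics for the t-test/Mann-Whitney U comparisons and a contingency table of counts for the chi-square test; the exact skewness and kurtosis cut-offs used to judge normality are not specified in the text.

```python
from scipy import stats

def compare_continuous(pd_values, hc_values, normal):
    """Compare one continuous characteristic between the PD and HC groups."""
    if normal:
        return stats.ttest_ind(pd_values, hc_values)   # parametric
    return stats.mannwhitneyu(pd_values, hc_values)    # non-parametric

# Categorical characteristics (e.g., sex) can be compared with a chi-square
# test on a table of group-by-category counts:
#   chi2, p, dof, expected = stats.chi2_contingency(table)
# and multi-group, non-normal comparisons with stats.kruskal(*groups).
```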
To select the most effective features, Recursive Feature Elimination with Cross-Validation (RFECV) was used. This method provides a systematic way to identify the most significant features by iteratively removing the least important ones and evaluating model performance through cross-validation. The procedure was combined with a grid search to fine-tune the hyperparameters of a range of classifiers, including Support Vector Machines (SVM), Random Forest, Logistic Regression, and the XGBoost classifier38. To ensure the models’ robustness and their capacity to generalize across different data segments, 5-fold cross-validation was adopted.
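A minimal sketch of this selection-and-tuning loop is given below, assuming scikit-learn's RFECV and GridSearchCV with an SVM as one of the candidate classifiers; the scoring metric, parameter grid, and use of a linear-kernel SVM inside the elimination step are illustrative choices rather than the authors' reported settings.

```python
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # 5-fold CV

# Recursive feature elimination with cross-validation, wrapped in a grid
# search over classifier hyperparameters. The SVM stands in for one of the
# candidate models; Random Forest, Logistic Regression, and XGBoost would be
# tuned analogously with their own parameter grids.
pipeline = Pipeline([
    ("rfecv", RFECV(estimator=SVC(kernel="linear"), step=1, cv=cv, scoring="roc_auc")),
    ("clf", SVC()),
])

param_grid = {
    "clf__kernel": ["linear", "rbf"],
    "clf__C": [0.1, 1, 10],
}

search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="roc_auc", n_jobs=-1)
# search.fit(X, y)  # X: extracted features, y: PD / HC labels
```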