Subjects
Statistical analysis was performed on data from the Smartphone Health Assessment for Relapse Prevention (SHARP) study, for which seventy-six participants with schizophrenia were recruited. The SHARP study was conducted at three sites: Beth Israel Deaconess Medical Center (BIDMC) in Boston, MA, USA; the Sangath Bhopal Hub with the All India Institute of Medical Sciences (AIIMS) in Bhopal, India; and the National Institute of Mental Health and Neurosciences (NIMHANS) in Bangalore, India. Participants in Bangalore and Bhopal were recruited from outpatient psychiatric and psychological services at their respective institutions. All participants were required to be in active treatment, to have a diagnosis of a psychotic spectrum disorder (confirmed by a clinician using DSM-5 criteria), and to have access to a smartphone with cellular service or Wi-Fi. To ensure technology compatibility, participants went through an initial one-week trial period of passive data collection and were offered an alternative device (Samsung Galaxy M31) if they experienced multiple days of no passive data during the trial period. At the BIDMC site, participants were deemed ineligible if their device did not pass the trial period and they had no alternative device.
Protocol
Each participant was enrolled in the study for a target period of twelve months, comprising 13 visits. Each month, participants engaged in an hour-long visit (in person at Bhopal and Bangalore, virtual at BIDMC) with a clinical research assistant. During each visit, participants completed the Positive and Negative Syndrome Scale for Schizophrenia (PANSS) [17]. Participants were then asked to complete the following surveys within the next 24 hours via REDCap: the Patient Health Questionnaire-9 (PHQ-9) [18], Generalized Anxiety Disorder-7 (GAD-7) [19], Social Functioning Scale (SFS) [20], Pittsburgh Sleep Quality Index (PSQI) [21], and other scales that are not analyzed in this study. At the intake, 6-month, and 12-month visits, participants were also asked to complete the Brief Assessment of Cognition in Schizophrenia (BACS) [22].
Passive and active data were collected using mindLAMP, an open-source smartphone application developed by the Division of Digital Psychiatry at Beth Israel Deaconess Medical Center [23]. For the active data, participants were prompted twice a day to complete two of the following six surveys: PHQ-9, GAD-7, ‘Sleep’, ‘Sociability’, ‘Psychosis’, and a medication adherence assessment.
The passive data analyzed in this study were collected on a daily basis. While numerous passive data features were obtained, this analysis restricts the set of passive data features in order to avoid multiple comparisons of derived features and to focus on interpretable digital data streams. The features retained are the amount of time participants spent at their home (home time), the amount of time they spent using their phone screens (screen duration), and how much they moved around to different locations throughout their day (entropy). Home time and screen duration were analyzed in hours, while entropy was analyzed on a scale of 0 to 1. Home time and entropy were both derived from raw GPS data, and screen duration was derived from the raw device state sensor data.
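To illustrate how a location entropy on a 0 to 1 scale can be obtained, the following is a minimal sketch that computes a normalized Shannon entropy from the time spent at each visited location. It is not the mindLAMP implementation: the function name normalized_location_entropy, the use of per-location dwell times as input, and the normalization by the logarithm of the number of visited locations are all assumptions made for illustration.

```python
import math


def normalized_location_entropy(hours_per_location):
    """Shannon entropy of time spent across locations, scaled to [0, 1].

    hours_per_location: hypothetical dwell times (hours) at each distinct
    location visited in a day. The scaling by log(number of locations) is
    an assumption, chosen so that the result lies between 0 and 1.
    """
    durations = [h for h in hours_per_location if h > 0]
    if len(durations) < 2:
        # A single location (or no data) carries no spread of movement.
        return 0.0
    total = sum(durations)
    probs = [h / total for h in durations]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(durations))


stay_home = normalized_location_entropy([24.0])
even_split = normalized_location_entropy([12.0, 12.0])
mostly_home = normalized_location_entropy([20.0, 2.0, 2.0])
```

Under this definition, a day spent entirely at one place yields 0, time split evenly across places yields 1, and a day spent mostly at home with short trips elsewhere falls in between.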
At BIDMC, participants were compensated $50 for visits 1, 7, and 13, and $20 for each of the remaining visits, for a potential maximum of $350. At Bhopal and Bangalore, participants were paid between 500 and 2000 rupees for each visit, with the amount depending on reimbursement for travel expenses. Across all sites, no compensation was provided for app engagement or for the volume of passive data.
Brief Assessment of Cognition in Schizophrenia (BACS)
The Brief Assessment of Cognition in Schizophrenia (BACS) is a set of tests that evaluate cognitive areas often affected in schizophrenia: verbal memory, working memory, motor speed, attention, executive functions, and verbal fluency. These areas are key because they tend to be significantly impaired in schizophrenia and are linked to the disease’s prognosis. The BACS is designed to be portable and user-friendly for a wide range of health professionals, including nurses, psychiatrists, neurologists, social workers, and other mental health providers. It can be completed in approximately 30 minutes, requires minimal time for scoring, and does not demand extensive training to administer. Further details about each cognitive area and its method of assessment are given below.
Verbal Memory
Verbal memory is assessed through a list learning task, in which participants are asked to remember as many of a list of 15 words as possible, assessed five times for a potential total score of 75 (score being the number of words recalled per trial).
Working Memory
Working memory is assessed through the digit sequencing task, in which participants are given increasingly long clusters of numbers and asked to tell the assessment administrator the numbers in order from lowest to highest. The assessment is scored based on the number of correct responses (0–28) and the longest sequence recalled without errors (0–8).
Motor Speed
Motor speed is assessed through the token motor task. Participants are asked to place as many of 100 plastic tokens into a container as possible within a time limit of sixty seconds, placing exactly two tokens at a time. Participants are scored on the number of tokens correctly placed in the container (maximum 100). Due to limitations on in-person visits during the study administration (which occurred in 2021), participants at the BIDMC site were not assessed for motor functioning.
Verbal Fluency
Verbal fluency was measured by two assessments: category instances and the controlled oral word association test. In the former, participants are asked to name as many words in a single category (such as ‘tools’) as possible in sixty seconds. In the latter, participants are given two rounds of sixty seconds to name as many words as possible beginning with a given letter (a different letter per round), such as ‘F’. For both assessments, the final score is the number of words generated.
Attention and Speed of Information Processing
Attention and speed of information processing was assessed through the symbol coding task, in which participants were shown symbol-number match pairs (e.g., 9: ∞) and asked to write the numbers corresponding to a list of symbols. Scores were taken out of the total number of numerals that could be matched (0–110).
Executive Functions
Executive functions were measured with the Tower of London task. Over twenty trials, participants were shown two mismatching pictures, each with a configuration of three balls of different colors arranged on three pegs, and asked to state how many times the balls must be moved in one image for its color arrangement to match the other. Scoring was based on the number of correct responses (0–22). The assessment ceased if participants responded incorrectly five consecutive times; participants who responded correctly to all twenty trials were given two more trials.
Social Functioning Scale (SFS)
The SFS is among the most widely used methods for measuring social functioning in people with schizophrenia. It has been shown to have strong reliability and validity. The SFS is made up of seven subscales, the titles and descriptions of which are given below. The SFS Composite score is the mean of the subscale scores.
The independence - competence subscale measures how much assistance people need to perform day-to-day tasks and responsibilities. Tasks include using public transport, looking after personal hygiene, and doing laundry.
The employment subscale measures factors about a person’s employment and daily routines.
The interpersonal functioning subscale measures the size of a person’s social circle and asks how comfortable people are in their interactions with friends and relatives, as well as in groups of people.
The independence - performance subscale measures how often people complete day-to-day tasks, such as washing dishes, budgeting, and preparing meals.
The prosocial activities subscale measures how often individuals take part in social activities. Examples of activities include playing sports, visiting relatives, or attending parties.
The recreation subscale measures how often people take part in recreational activities that can be performed either alone or with other people. Activities include sewing, cooking, or watching television.
The social engagement (also known as withdrawal) subscale measures the person’s social tendencies, including how much time they spend alone or outside of the home, and their likelihood of engaging in conversation.
Positive and Negative Syndrome Scale in Schizophrenia (PANSS)
The PANSS is a well-established, widely used, and well-validated assessment of symptom severity in schizophrenia, designed to account for the heterogeneity of symptom presentation in schizophrenia spectrum disorders. It is composed of 30 items, divided into 3 domains: positive symptoms, negative symptoms, and general psychopathology. The PANSS is scored by summing the items within each scale, such that the positive and negative scales each have a range of 7–49, while the general psychopathology scale has a range of 16–112. The positive symptoms subscale measures the presence of symptoms superimposed on one’s mental status, such as hallucinations, disordered thinking, and paranoia. The negative symptoms subscale measures deficits in existing psychological processes, rather than novel ones as in positive symptoms. As such, the subscale assesses features such as anhedonia and social reclusion.
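The subscale ranges above follow from summing items rated from 1 to 7: 7 positive items, 7 negative items, and 16 general psychopathology items. A minimal sketch of this scoring is shown below; the function name score_panss and the assumed input order (P1–P7, N1–N7, G1–G16) are illustrative and not taken from the study’s code.

```python
def score_panss(items):
    """Sum PANSS item ratings into subscale and total scores.

    items: 30 ratings, each from 1 to 7, assumed to be ordered as
    P1-P7 (positive), N1-N7 (negative), G1-G16 (general psychopathology).
    """
    if len(items) != 30:
        raise ValueError("PANSS requires exactly 30 item ratings")
    if any(not 1 <= rating <= 7 for rating in items):
        raise ValueError("each PANSS item is rated from 1 to 7")
    positive = sum(items[0:7])
    negative = sum(items[7:14])
    general = sum(items[14:30])
    return {
        "positive": positive,
        "negative": negative,
        "general": general,
        "total": positive + negative + general,
    }


lowest = score_panss([1] * 30)
highest = score_panss([7] * 30)
```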
Data Processing
Each of the 76 participants had a different number of passive and active data samples available across the study period (ranging from 2–13 samples per participant, with a mean of 6.46 samples per participant). Across all 76 participants and 27 features, the SFS, PANSS, and clinical assessments (PSQI, GAD-7, PHQ-9) were fully available, whereas the EMA assessments (Social, Psychosis, Mood, and Anxiety), passive sensor features (Hometime, Entropy, and Screen Duration), and the BACS cognitive assessments had varying levels of missingness. Furthermore, BACS scores were available at most three times per participant (intake, 6-month, and 12-month visits), which meant that the data sampling frequency also differed across features. Hence, in order to retain as many features as possible while preserving the original sample size, we decided to probe the central tendency of individuals by taking the mean across the available data per individual for each variable. After averaging, the EMA assessments had 21 participants missing for Mood, 25 for Psychosis, 22 for Social, and 20 for Anxiety. The passive sensor-derived features of Hometime, Entropy, and Screen Duration had 14, 17, and 17 participants missing, respectively. BACS sub-scores for Tower, VM, Digit, and Fluency each had one participant missing, while 8 participants were missing BACS: Symbols and 27 participants were missing BACS: Motor. While there was no site-specific pattern of missingness for any other feature, due to COVID regulations at the time of data collection, in-person motor assessments were unavailable for the 25 participants from Boston, accounting for the relatively high level of missingness specifically for BACS: Motor. In terms of percentage of data availability per feature, 15 of the 28 features had complete data, 4 features had 98.7%, and the remaining 9 features had varying amounts of availability between 64.5–89.5% (Fig. 1A).
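The per-participant averaging and the per-feature availability percentages can be sketched as follows. This is a minimal illustration on made-up values, assuming a long-format pandas table with one row per visit; the column names participant_id, phq9, and home_time are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical visit-level data: one row per participant visit.
visits = pd.DataFrame({
    "participant_id": ["p1", "p1", "p1", "p2", "p2", "p3"],
    "phq9": [10, 12, 8, 4, 6, 15],
    "home_time": [18.0, np.nan, 20.0, np.nan, np.nan, 12.5],
})

# Mean over whatever samples exist per participant; NaNs are skipped, and a
# participant with no samples at all for a feature stays missing (NaN).
per_participant = visits.groupby("participant_id").mean()

# Percentage of participants with a usable value for each feature.
availability = per_participant.notna().mean() * 100
```

In this toy table, participant p2 has no home time samples and therefore remains missing after averaging, which is the kind of gap that is later handled by imputation.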
Multiple Imputation by Chained Equation (MICE) Imputation
A complete dataset is a prerequisite for dimension reduction methods such as PCA. The data were assumed to be missing at random (MAR), and hence we imputed the dataset by applying Multiple Imputation by Chained Equation (MICE). MICE has been noted to be a useful tool in psychiatric research for garnering insights from datasets that inevitably contain missingness. The mice module from statsmodels.imputation was used to perform MICE imputation on the dataset for each of the 27 variables. After imputation, in order to check whether the imputed dataset appropriately resembled the original dataset, we performed the Kolmogorov-Smirnov (KS) test to compare the distributions of the original and imputed variables (Fig. 1B). Across all 27 variables, the imputed distributions exhibited no significant difference from the original variable’s distribution. Based on this validation, all subsequent analyses were performed using the imputed dataset.
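The imputation and distribution check can be sketched as below. This is a minimal example on synthetic data, assuming the MICEData class from statsmodels.imputation.mice with its default predictive mean matching and scipy.stats.ks_2samp for the KS test; the feature names, the number of update cycles, and the use of a single imputed dataset are assumptions, not the study’s settings.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp
from statsmodels.imputation.mice import MICEData

rng = np.random.default_rng(0)
n = 76

# Synthetic, correlated features standing in for the real variables.
base = rng.normal(size=n)
df = pd.DataFrame({
    "feat_a": base + rng.normal(scale=0.5, size=n),
    "feat_b": 2 * base + rng.normal(scale=0.5, size=n),
    "feat_c": rng.normal(size=n),
})

# Remove 15 values of feat_b at random (a simple special case of MAR).
missing_rows = rng.choice(n, size=15, replace=False)
df.loc[missing_rows, "feat_b"] = np.nan

# Chained-equation imputation: each variable with missing values is
# modelled from the others and updated over several cycles.
imputer = MICEData(df.copy())
imputer.update_all(n_iter=10)
imputed = imputer.data

# Compare observed values with the completed column for every feature.
ks_pvalues = {
    col: ks_2samp(df[col].dropna(), imputed[col]).pvalue
    for col in df.columns
}
```

A large KS p-value here indicates only that the completed column is not detectably different in distribution from the observed values; it does not by itself confirm that the MAR assumption holds.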
Accounting for Skewness
To ensure the validity of subsequent analyses, we checked whether each feature exhibited an adequate level of normality by quantifying its skewness. We found that 16 of the 27 features had skewness outside the range of (-0.5, 0.5), which is typically regarded as the range of minimal skew. Features with right-tailed (positive) skew were log-transformed, whereas those within the range of (-0.5, 0.5) were kept untransformed. Although two features (BACS: Tower and SFS: Competence) had notably high left-tailed (negative) skew, we kept these variables untransformed, as incorporating different types of transformation can obscure the original dataset’s patterns of behavior. After transformation, skewness was much more contained within the (-0.5, 0.5) range, with only 3 of the 27 features having skewness beyond the (-1.0, 1.0) range, which is considered a high level of skewness (Fig. 2A). We further visualized and tested for normality via the QQ plot in Fig. 2B. We also note that while log-transformed features are now interpreted in terms of percentage change rather than absolute change, we can still compare the relative strength of features and their associations without compromising the underlying structure of the data.
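The skewness rule described above can be sketched as follows, using scipy.stats.skew on synthetic data. The function name transform_right_skew is hypothetical, and the sketch assumes strictly positive values so that a plain natural logarithm is defined; how zero values were handled in the study is not stated.

```python
import numpy as np
import pandas as pd
from scipy.stats import skew

rng = np.random.default_rng(1)

# Synthetic features: one strongly right-skewed, one roughly symmetric.
features = pd.DataFrame({
    "right_skewed": rng.lognormal(mean=0.0, sigma=1.0, size=500),
    "symmetric": rng.normal(loc=10.0, scale=2.0, size=500),
})


def transform_right_skew(df, threshold=0.5):
    """Log-transform columns whose skewness exceeds +threshold.

    Columns within (-threshold, threshold) and left-skewed columns are
    returned unchanged. Assumes strictly positive values in any column
    that gets transformed.
    """
    out = df.copy()
    for col in df.columns:
        if skew(df[col]) > threshold:
            out[col] = np.log(df[col])
    return out


transformed = transform_right_skew(features)
skew_before = features.apply(skew)
skew_after = transformed.apply(skew)
```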
Max-Min Normalization and Outlier Rejection
After accounting for skewness via feature transformation, we performed max-min normalization so that all features were scaled to lie within the range of 0 to 1. Then, applying Tukey’s method, we identified data points lying more than 3*IQR beyond the quartiles and rejected them as outliers. This resulted in the rejection of 2 participants, leaving a final set of 74 participants for subsequent analyses. Finally, using this filtered and normalized dataset, average BACS and SFS scores (‘BACS Composite’, ‘SFS Composite’) were calculated per participant, yielding a total of 29 features for further analyses.
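A minimal sketch of the normalization and outlier rejection steps is given below. It assumes that Tukey’s fences are computed per feature as Q1 - 3*IQR and Q3 + 3*IQR, and that a participant is rejected when any one of their features falls outside the fences; both choices, as well as the function names, are assumptions for illustration.

```python
import numpy as np


def min_max_normalize(X):
    """Scale each column to [0, 1]; constant columns are mapped to 0."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span


def tukey_outlier_rows(X, k=3.0):
    """Flag rows with any value beyond Q1 - k*IQR or Q3 + k*IQR."""
    X = np.asarray(X, dtype=float)
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    outside = (X < q1 - k * iqr) | (X > q3 + k * iqr)
    return outside.any(axis=1)


# Toy data: rows are participants, columns are features.
X = np.array([
    [1.0, 10.0],
    [2.0, 11.0],
    [3.0, 12.0],
    [4.0, 13.0],
    [5.0, 14.0],
    [100.0, 12.0],  # extreme value in the first feature
])

scaled = min_max_normalize(X)
outlier_mask = tukey_outlier_rows(scaled)
kept = scaled[~outlier_mask]
```

Because min-max scaling is a linear rescaling of each feature, the same rows are flagged whether the fences are computed before or after normalization.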
Principal Component Analysis and K-means Clustering
We investigated whether the 74 participants could be naturally grouped into different subtypes based on their features. We performed PCA using the PCA class from sklearn.decomposition and plotted the explained variance contributed by each principal component (Fig. 3A). The Elbow method identified an inflection point in the cumulative explained variance after the first nine principal components, which accounted for 80.4% of the overall variance in the dataset. Hence, we used the first 9 principal components for dimensionality reduction and proceeded to investigate naturally occurring clusters using the k-means clustering algorithm. We used three metrics to identify the best-fit number of clusters for our dataset: the Silhouette Score, the Elbow method, and the Davies-Bouldin Index. The Elbow method identified the largest slope change around k = 3, indicating that k = 3 was optimal. The Davies-Bouldin Index was smallest at k = 3, further supporting the use of k = 3. Finally, the Silhouette Score was also highest at k = 3, which guided our decision to model our clusters with k = 3 (Fig. 3B-C).
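The dimensionality reduction and cluster-number selection can be sketched with scikit-learn as follows. This is a minimal example on synthetic data with three built-in groups; the 80% cumulative variance cutoff used to pick the number of components, the range of k values, and the random seeds are assumptions standing in for the visual Elbow inspection described above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import davies_bouldin_score, silhouette_score

rng = np.random.default_rng(42)

# Synthetic data: three well-separated groups in 12 dimensions.
centers = np.zeros((3, 12))
centers[1, :6] = 6.0
centers[2, 6:] = 6.0
X = np.vstack([c + rng.normal(scale=0.5, size=(25, 12)) for c in centers])

# Keep the smallest number of components reaching 80% cumulative variance.
cumvar = np.cumsum(PCA().fit(X).explained_variance_ratio_)
n_components = int(np.searchsorted(cumvar, 0.80) + 1)
scores = PCA(n_components=n_components).fit_transform(X)

# Evaluate candidate cluster counts with the three metrics.
results = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(scores)
    results[k] = {
        "inertia": model.inertia_,  # used for the elbow plot
        "silhouette": silhouette_score(scores, model.labels_),
        "davies_bouldin": davies_bouldin_score(scores, model.labels_),
    }

best_k_silhouette = max(results, key=lambda k: results[k]["silhouette"])
best_k_davies_bouldin = min(results, key=lambda k: results[k]["davies_bouldin"])
```

The Silhouette Score is maximized and the Davies-Bouldin Index is minimized at the preferred k, while the inertia values are typically inspected graphically for the point at which the curve flattens.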
Cluster-wise Mean and Correlation across Features
For k = 3, the cluster-wise mean and SEM were quantified for each of the 29 variables. 25 participants were classified under cluster A, 17 under cluster B, and 32 under cluster C. We ran a one-way ANOVA to test for significant differences across the three clusters (Fig. 4A). The features can also be categorized into the macro-dimensions of EMA, Clinical Symptom Assessments, Passive Sensor Data, BACS Cognitive Assessment, and SFS Social Functioning Assessments. In addition to the mean and SEM, we computed the probability density function via kernel density estimation (KDE) for each of the cluster-wise features (Fig. 4B). This figure further showcases the distinct distributional patterns of features across the three clusters.
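For a single feature, the cluster-wise summary, ANOVA, and density estimate can be sketched with SciPy as below. The values are synthetic; only the cluster sizes (25, 17, and 32) mirror the text, and the default bandwidth of gaussian_kde is an assumption.

```python
import numpy as np
from scipy.stats import f_oneway, gaussian_kde, sem

rng = np.random.default_rng(7)

# Synthetic normalized values of one feature in each cluster.
cluster_values = {
    "A": rng.normal(loc=0.3, scale=0.1, size=25),
    "B": rng.normal(loc=0.5, scale=0.1, size=17),
    "C": rng.normal(loc=0.7, scale=0.1, size=32),
}

# Mean and standard error of the mean per cluster.
summary = {
    name: (values.mean(), sem(values))
    for name, values in cluster_values.items()
}

# One-way ANOVA across the three clusters.
f_stat, p_value = f_oneway(*cluster_values.values())

# Kernel density estimate of each cluster's distribution on a common grid.
grid = np.linspace(0.0, 1.0, 101)
densities = {
    name: gaussian_kde(values)(grid)
    for name, values in cluster_values.items()
}
```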
Partial Correlation Calculation
Partial correlation measures the degree of association between two variables, with the effect of a set of controlling variables removed. When determining the numerical relationship between two variables of interest, their correlation coefficient can give misleading results if there are confounding variables that are numerically related to both variables of interest. This can be avoided by controlling for the confounding variables, which is done by computing the partial correlation coefficient. In order to focus on the features that were least redundant, for the partial correlation analyses we omitted all PANSS-related features other than PANSS Total, all BACS subdomain scores other than the BACS Composite, and all SFS subdomain scores other than the SFS Composite. This yielded a total of 13 features instead of 29, which we used to focus on independent sources of input for the correlational explorations. Based on the partial correlations at a significance level of p = 0.05, we found 5 significant partial correlations for cluster A, 6 for cluster B, and 5 for cluster C (Fig. 5A-C).
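One common way to compute partial correlations is through the inverse of the covariance matrix, sketched below with NumPy and SciPy. The sketch assumes that each pair of features is controlled for all remaining features, which the text does not specify; the function names and the synthetic confounder example are illustrative only.

```python
import numpy as np
from scipy import stats


def partial_correlation_matrix(X):
    """Partial correlation of every column pair given all other columns."""
    X = np.asarray(X, dtype=float)
    precision = np.linalg.pinv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(d, d)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr


def partial_correlation_pvalues(pcorr, n_samples):
    """Two-tailed t-test p-values; diagonal entries are set to NaN."""
    n_controls = pcorr.shape[0] - 2
    dof = n_samples - 2 - n_controls
    r = np.clip(pcorr, -0.999999, 0.999999)
    t_stat = r * np.sqrt(dof / (1.0 - r ** 2))
    pvals = 2.0 * stats.t.sf(np.abs(t_stat), dof)
    np.fill_diagonal(pvals, np.nan)
    return pvals


# Synthetic example: x and y are related only through the confounder z.
rng = np.random.default_rng(3)
n = 2000
z = rng.normal(size=n)
x = z + 0.5 * rng.normal(size=n)
y = z + 0.5 * rng.normal(size=n)
data = np.column_stack([x, y, z])

raw_xy = np.corrcoef(x, y)[0, 1]
pcorr = partial_correlation_matrix(data)
pvals = partial_correlation_pvalues(pcorr, n)
```

In this example the raw correlation between x and y is high, while their partial correlation given z is close to zero, illustrating how a confounder can produce a misleading correlation coefficient.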
Difference of Partial Correlations via the Fisher r-to-Z transformation
Using the Fisher r-to-z transformation, we calculated a value of z that can be used to assess the significance of the difference between two partial correlation coefficients (ra and rb) found in the two clusters being compared. If ra is greater than rb, the resulting value of z will have a positive sign; if ra is smaller than rb, the sign of z will be negative. Based on a two-tailed Z-test, we marked the significant Z-score differences (at p = 0.05) on each of the cluster-to-cluster difference heatmaps (Fig. 6).
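The comparison of two correlation coefficients can be sketched as follows. Each coefficient is transformed with the inverse hyperbolic tangent, and the difference is divided by its standard error. The function name and the n_controls argument, which reduces the degrees of freedom by the number of controlled variables for partial correlations, are assumptions; the text does not state which sample-size correction was used.

```python
import math


def fisher_z_difference(r_a, n_a, r_b, n_b, n_controls=0):
    """Z-score and two-tailed p-value for the difference r_a - r_b.

    n_a, n_b: sample sizes of the two clusters.
    n_controls: number of variables partialled out (0 for plain correlations).
    """
    z_a = math.atanh(r_a)
    z_b = math.atanh(r_b)
    se = math.sqrt(
        1.0 / (n_a - n_controls - 3) + 1.0 / (n_b - n_controls - 3)
    )
    z = (z_a - z_b) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-tailed normal p-value
    return z, p


z_ab, p_ab = fisher_z_difference(0.6, 25, 0.1, 32)
z_ba, p_ba = fisher_z_difference(0.1, 32, 0.6, 25)
z_same, p_same = fisher_z_difference(0.4, 25, 0.4, 32)
```

Swapping the two clusters flips the sign of z while leaving the p-value unchanged, and equal coefficients give z = 0.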