Participants
A total of 494 adults were assessed for study eligibility. Of these, 247 consented, completed baseline and were randomised. A total of 29 participants actively withdrew after randomisation (see Figure 2 for study flow).
Table 2 outlines the baseline characteristics of the final sample (N=218). The majority were female (74.8%), Australian born (80.3%), married or partnered (52.8%), and in paid employment (74.8%), with English as the main language spoken at home (90.8%). The mean age was 39.33 years (SD: 12.37, range: 18 to 69 years). The majority (85.3%) reported that they had never required literacy support. Nearly all participants reported at baseline that they had received a formal diagnosis of a mental illness (92.7%, n=202/218), and over half were taking prescribed medication for a mental illness (59.6%, n=130/218).
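The sample figures reported above can be cross-checked with simple arithmetic. The sketch below is illustrative only: the helper name `percent` is an assumption and is not taken from the study's analysis code.

```python
def percent(n: int, total: int) -> float:
    """Return n as a percentage of total, rounded to one decimal place."""
    return round(100 * n / total, 1)


# Counts reported in the text.
randomised = 247
withdrawn = 29
final_n = randomised - withdrawn  # participants retained in the final sample

diagnosed_pct = percent(202, final_n)  # formal diagnosis of a mental illness
medicated_pct = percent(130, final_n)  # taking prescribed medication

print(final_n, diagnosed_pct, medicated_pct)
```

The same helper reproduces the other percentages in Table 2 from their counts, for example `percent(163, 218)` for the proportion of female participants.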
Table 2. Baseline characteristics of sample (N=218)
| | Total sample | |
| | M | SD |
| Age (years) | 39.33 | 12.37 |
| Depressive symptoms (PHQ-9) | 13.50 | 4.13 |
| Anxiety symptoms (GAD-7) | 10.83 | 4.51 |
| Age of onset (years) | 19.91 | 10.83 |
| Age of diagnosis (years) | 26.94 | 10.93 |
| | N | % |
| Female | 163 | 74.8 |
| University education | 105 | 48.2 |
| Married/Partnered | 115 | 52.8 |
| In paid employment | 163 | 74.8 |
| Born in Australia | 175 | 80.3 |
| First Nations | 5 | 2.3 |
| LGBTQI+ | 45 | 20.6 |
| English only | 198 | 90.8 |
| Literacy support (Never) | 186 | 85.3 |
| Formally diagnosed mental illness | 202 | 92.7 |
| Taking prescribed medication | 130 | 59.6 |
| Likely case of Major Depressive Disorder | 193 | 88.5 |
| Likely case of Generalised Anxiety Disorder | 134 | 61.5 |
Note: Likely case of Major Depressive Disorder (MDD) determined by a PHQ-9 score ≥ 10; Likely case of Generalised Anxiety Disorder (GAD) determined by a GAD-7 score ≥ 10. Age of onset and age of diagnosis are reported as M and SD.
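The cut-off rule in the table note can be expressed as a short sketch. The threshold of 10 comes from the note; the function names and the example scores are hypothetical and are not study data.

```python
PHQ9_CUTOFF = 10  # PHQ-9 score >= 10 indicates a likely case of MDD (table note)
GAD7_CUTOFF = 10  # GAD-7 score >= 10 indicates a likely case of GAD (table note)


def is_likely_case(score: int, cutoff: int) -> bool:
    """Return True when a symptom score meets or exceeds the cut-off."""
    return score >= cutoff


def likely_case_rate(scores: list[int], cutoff: int) -> float:
    """Percentage of scores at or above the cut-off, to one decimal place."""
    flagged = sum(1 for s in scores if is_likely_case(s, cutoff))
    return round(100 * flagged / len(scores), 1)


# Hypothetical PHQ-9 scores, for illustration only.
example_phq9 = [4, 9, 10, 13, 21]
example_rate = likely_case_rate(example_phq9, PHQ9_CUTOFF)
print(example_rate)
```

Because the comparison is inclusive, a score of exactly 10 counts as a likely case.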
Completion of text tasks
On average, participants attempted 16.41 out of the 19 tasks (SD: 4.87, range: 0 to 19). The majority completed the text tasks on a mobile device (88.5%). Participants in sequence 4 (M: 14.84, SD: 5.99) attempted significantly fewer tasks than those in sequence 2 (M: 17.43, SD: 3.86; t=2.843, P<.05). No other differences in completion rates were found. Table 3 presents the word counts, character counts, and proportion of dictionary words within tasks.
Table 3: Word counts, character counts, and proportion of dictionary words within tasks per task and per participant
| Task | Task - word count | Task - character count | Task - proportion of dictionary words (%) | Participant corpus - word count | Participant corpus - character count | Participant corpus - proportion of dictionary words (%) |
| | M (SD) | M (SD) | M (SD) | M (SD) | M (SD) | M (SD) |
| A. SMS retrieval | 64.88 (76.88) | 336.17 (400.13) | 93.01 (5.10) | 280.88 (258.14) | 1455.37 (1351.33) | 93.07 (2.96) |
| B. Social media posts | 62.18 (63.30) | 328.10 (331.63) | 94.09 (6.04) | 219.81 (180.47) | 1159.87 (946.47) | 94.13 (3.66) |
| C. Emotion diaries | 94.31 (75.62) | 488.09 (394.78) | 95.43 (3.59) | 259.70 (189.24) | 1344.03 (988.37) | 95.45 (2.37) |
| D. Expressive writing | 223.01 (198.22) | 1164.18 (1034.49) | 95.53 (2.65) | 527.50 (479.72) | 2753.89 (2506.60) | 95.54 (2.04) |
| V. Personal description | - | - | - | 197.34 (144.06) | 1041.64 (761.26) | 92.90 (3.42) |
| W. Friend description | - | - | - | 167.02 (142.39) | 870.65 (742.69) | 95.59 (2.50) |
| X. Reflection of a complex image | - | - | - | 79.44 (53.9) | 430.74 (291.26) | 92.26 (4.79) |
| Y. Holidaying letter | - | - | - | 117.21 (72.00) | 603.29 (373.16) | 92.64 (3.96) |
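The three descriptive measures in Table 3 can be sketched for a single text as follows. This is a minimal illustration under stated assumptions: the regular-expression tokeniser, the choice to count every character including spaces, and the toy dictionary are all hypothetical, and the study's actual text-analysis software may define these measures differently.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens (assumed tokenisation rule)."""
    return re.findall(r"[a-z']+", text.lower())


def describe_text(text: str, dictionary: set[str]) -> dict[str, float]:
    """Return word count, character count, and % of words found in the dictionary."""
    words = tokenize(text)
    n_words = len(words)
    n_in_dictionary = sum(1 for w in words if w in dictionary)
    dict_pct = 100 * n_in_dictionary / n_words if n_words else 0.0
    return {
        "word_count": n_words,
        "char_count": len(text),  # assumption: spaces and punctuation are counted
        "dict_pct": dict_pct,
    }


# Toy dictionary and message, for illustration only.
toy_dictionary = {"i", "feel", "okay", "today"}
summary = describe_text("I feel okay today lol", toy_dictionary)
print(summary)
```

A participant-level corpus value would be obtained by concatenating all of a participant's completions for a task before calling `describe_text`.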
Within-task associations between mental health symptoms and linguistic features
Significant correlations between mental health symptoms and the 113 linguistic features were found in three of the eight tasks. In the Task B corpus (i.e., combined social media posts), higher depressive symptoms were associated with greater expressions of “power” words (r = .27, q = .010), greater expressions of “sadness” words (r = .26, q = .010) and reduced first-person plural pronouns (r = -.26, q = .010). See Supplementary Material for correlation plots (Section 4, Figure S.4.3). As shown in Table 4, these features were not found to be consistently associated with depressive symptoms when the Task B completions were examined separately.
Table 4: Associations between the significant task-level linguistic features (power words, sadness words, first-person plural pronouns) and depressive symptoms within each Task B completion.
| Task - Completion | Power words (r) | Sadness words (r) | First-person plural pronouns (r) | q-range |
| Task B-1 | .09 | .18** | -.17* | .546 to .929 |
| Task B-2 | .22** | .08 | -.17* | .272 to 1.0 |
| Task B-3 | .05 | .21** | -.14 | .229 to .986 |
| Task B-4 | .17* | .05 | -.09 | .551 to 1.0 |
Table note: *Unadjusted P< .05, **Unadjusted P<.02.
In Task V (personal description), higher levels of depressive symptoms were associated with greater use of “cognition” words (r = .25, q = .032), greater use of “tentative” words (r = .25, q = .032), and higher levels of “negative tone” (r = .23, q = .045). When outliers were removed, “tentative” words (r = .19, q = .175) and “negative tone” (r = .17, q = .195) were non-significant. In Task W (description of friends), higher levels of depressive symptoms were associated with greater expressions of “want” words (r = .27, q = .028). See Supplementary Material for correlation plots (Section 4, Figure S.4.4 and Figure S.4.5). No other significant correlations were found.
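The r and q values reported in this section pair a correlation coefficient with a P value adjusted for testing many linguistic features at once. The sketch below shows one common way to obtain such values, using Pearson's r and the Benjamini-Hochberg false discovery rate adjustment. Both choices are assumptions made for illustration: this section does not state which correlation coefficient or adjustment method produced the reported q values, and the function names and example P values are hypothetical.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bh_q_values(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonic q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical unadjusted P values for four features.
example_q = bh_q_values([0.01, 0.04, 0.03, 0.005])
print([round(v, 3) for v in example_q])
```

With many features tested per task, a feature can reach unadjusted P<.05 and still have a q value well above .05, which is the pattern seen for the individual Task B completions in Table 4.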
As shown in Table 5, the within-task machine learning models predicted up to 8% (Task B corpus) of the variance in depressive symptoms and up to 4% (Task B corpus) of the variance in anxiety symptoms. The best-performing models varied depending on the task. The models that included the linguistic features from the Task B corpus together with clinical and demographic characteristics accounted for the greatest level of variance in symptoms. The observed and predicted values from the outer cross-validation folds were significantly correlated for all depression models, whereas for the anxiety models this was the case only for Tasks A, B, V, and X. Across the models, depression scores were predicted with a mean absolute error (MAE) of 4.13 to 4.77 points on the PHQ-9 scale, and anxiety scores with an MAE of 4.14 to 4.51 points on the GAD-7 scale. The importance and directionality of linguistic features in the models varied greatly between tasks, with clinical and demographic features showing high importance in some models (see Supplementary Material Section 4, Figures S.4.6 to S.4.16).
Table 5. The best performing within-task machine learning models using linguistic features to predict depressive and anxiety symptoms.
Depressive symptoms (PHQ-9)
| Task | Model | RFE | Demographic and clinical features | RMSE | r² | MAE | r | p | Hyperparameters for final fitted model |
| Task A | SVM-NL | No | Yes | 5.77 | .05 | 4.69 | .22 | .005 | C = 1, σ = .001 |
| Task B | Elastic Net | No | Yes | 5.27 | .08 | 4.41 | .28 | <.001 | α = .6, λ = 1 |
| Task C | Random Forest | No | No | 5.74 | .04 | 4.68 | .20 | .006 | mtry = 50 |
| Task D | SVM-NL | No | Yes | 5.17 | .04 | 4.24 | .19 | .018 | C = 2, σ = .008 |
| Task V | Random Forest | No | No | 5.15 | .03 | 4.13 | .18 | .012 | mtry = 16 |
| Task W | Elastic Net^ | No | Yes | 5.82 | .05 | 4.77 | .22 | .005 | α = 0, λ = 16 |
| Task X | Random Forest | Yes | Yes | 5.66 | .05 | 4.50 | .22 | .005 | mtry = 50 |
| Task Y | Random Forest | No | Yes | 5.62 | .07 | 4.61 | .26 | <.001 | mtry = 8 |
Anxiety symptoms (GAD-7)
| Task | Model | RFE | Demographic and clinical features | RMSE | r² | MAE | r | p | Hyperparameters for final fitted model |
| Task A | Random Forest | No | Yes | 5.26 | .03 | 4.51 | .16 | .042 | mtry = 1 |
| Task B | SVM-L | No | Yes | 5.03 | .04 | 4.22 | .21 | .007 | C = .001 |
| Task C | Random Forest | Yes | Yes | 5.02 | .02 | 4.14 | .15 | .058 | mtry = 1 |
| Task D | Elastic Net | Yes | No | 5.03 | .01 | 4.24 | .04 | .585 | α = 0, λ = 2 |
| Task V | Elastic Net | No | No | 4.90 | .03 | 4.14 | .16 | .027 | α = .5, λ = 1 |
| Task W | SVM-L | No | Yes | 5.29 | .01 | 4.46 | .09 | .257 | C = .001 |
| Task X | SVM-L | Yes | Yes | 5.02 | .03 | 4.25 | .18 | .022 | C = .016 |
| Task Y | SVM-L | No | Yes | 5.13 | .01 | 4.28 | .12 | .147 | C = .0005 |
Table note: Task descriptions: A=SMS retrieval, B=Social media post, C=Emotion diary, D=Expressive writing, V=Personal description, W=Friends description, X=Reflection of a complex image, Y=Holidaying letter. ^α = 0 represents ridge regression in the elastic net framework.
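The performance metrics in Table 5 (RMSE, MAE, r, and r²) can be computed from paired observed and predicted scores as sketched below. This is an illustration, not the study's code: the function names and example values are hypothetical, and treating r² as the square of the observed-predicted correlation is an assumption, although it is consistent with the tabled values (for example, r = .28 alongside r² = .08).

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def regression_metrics(observed: list[float], predicted: list[float]) -> dict[str, float]:
    """RMSE, MAE, r, and squared r for held-out predictions."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    r = pearson_r(observed, predicted)
    return {"rmse": rmse, "mae": mae, "r": r, "r2": r ** 2}


# Hypothetical observed and predicted symptom scores pooled across outer folds.
observed = [10, 12, 14, 16]
predicted = [11, 11, 15, 15]
metrics = regression_metrics(observed, predicted)
print({k: round(v, 3) for k, v in metrics.items()})
```

RMSE penalises large individual errors more heavily than MAE, which is why the RMSE values in Table 5 are consistently higher than the corresponding MAE values.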
Associations between mental health symptoms and linguistic features when all tasks were combined
As shown in Table 6, only two of the linguistic features found to be significantly associated with depressive symptoms within the tasks were found to be significant when texts from all the tasks were combined into one corpus. In this analysis, higher levels of depressive symptoms were associated with greater use of “cognition” words (r = .24, q = .024) and “cognitive process” words (r = .22, q = .027), lower levels of “positive tone” (r = -.32, q < .001) and “positive emotion” (r = -.21, q = .037), reduced first-person plural pronouns (r = -.25, q = .016), and reduced “affiliation” words (r = -.23, q = .024). Higher levels of anxiety symptoms were associated only with greater use of “cognition” words (r = .26, q = .022). See Supplementary Material (Figure S.4.17) for the correlation plots.
Table 6. Linguistic features significantly associated with depressive and anxiety symptoms within-tasks and when tasks were combined.
Linguistic features associated with depressive symptoms (PHQ-9) within-tasks
| Task | A | B | C | D | V | W | X | Y |
| Cognition | - | - | - | - | * | - | - | - |
| First-person plural pronouns | - | * | - | - | - | - | - | - |
| Power | - | * | - | - | - | - | - | - |
| Sadness | - | * | - | - | - | - | - | - |
| Want | - | - | - | - | - | * | - | - |
Linguistic features associated with depressive symptoms (PHQ-9) when combined
| Task | A | B | C | D | V | W | X | Y |
| Affiliation | - | - | - | - | - | - | - | - |
| Cognition | - | - | - | - | * | - | - | - |
| Cognitive processes | - | - | - | - | - | - | - | - |
| First-person plural pronouns | - | * | - | - | - | - | - | - |
| Positive emotion | - | - | - | - | - | - | - | - |
| Positive tone | - | - | - | - | - | - | - | - |
Linguistic features associated with anxiety symptoms (GAD-7) within-tasks
| Task | A | B | C | D | V | W | X | Y |
| None | - | - | - | - | - | - | - | - |
Linguistic features associated with anxiety symptoms (GAD-7) when combined
| Task | A | B | C | D | V | W | X | Y |
| Cognition | - | - | - | - | - | - | - | - |
Table note: Linguistic features listed in left column are all those found to be associated with symptoms of depression (PHQ-9) and anxiety (GAD-7) at either the task level or when combined. * indicates the features with a significant q value for that task when examined with outliers removed. - indicates no significant q value for that task with outliers removed.
As shown in Table 7, random forest models using only linguistic features were the best performing models, predicting 8% of the variance in depressive symptoms and 3% of the variance in anxiety symptoms. The observed and predicted values from the outer cross-validation folds were significantly correlated for both the depression model (r = .28, p < .001) and the anxiety model (r = .17, p < .05). As displayed in Figure 6, “positive tone” was the most important linguistic feature for predicting depressive symptoms with negative directionality (i.e., higher values lead to lower symptoms). First-person plural pronouns, “time”, “motion”, “substance” and “wellness” words also emerged as somewhat important to the model. The most important linguistic features with positive directionality (i.e., higher values lead to higher symptoms) were conjunctions, “function”, “differentiation” and “death” words. The most important linguistic features for predicting anxiety symptoms with negative directionality were past-focussed words, positive tone, “drive” words, third-person singular pronouns, and “article” words. The most important features with positive directionality were “anxiety” words, conjunctions, linguistic words, “all-or-nothing” words, and “discrepancy” words. See Supplementary Material (Figure S.4.18) for the variable stability plots.
Table 7. Best performing across-task machine learning models using linguistic features to predict mental health symptoms.
| | | | | Performance metrics | | | Correlation between observed and predicted test values | | |
| Mental health symptoms | Best performing model | Recursive feature elimination | Demographic and clinical features | RMSE | r² | MAE | r | p | Hyperparameters for final fitted model |
| Depression | Random Forest | No | No | 4.87 | .08 | 3.89 | .28 | <.001 | mtry = 70 |
| Anxiety | Random Forest | No | No | 4.50 | .03 | 3.70 | .17 | .015 | mtry = 30 |
Impact of emotional text tasks (Task C and Task D) on participants’ mood and acceptability ratings
There were almost no immediate changes in participants’ mood, as measured by the MDMQ, after completing the emotion diary tasks (Task C, P=.057 to .987), except for higher levels of relaxation after the second completion (MD: 0.26, df=895.81, P=.030) and higher levels of agitation after the third completion (MD: 0.34, df=896.42, P=.008). For the expressive writing task (Task D), participants reported significantly greater fatigue (P=.001 to .045), lower levels of contentment (P<.001 to .007), higher levels of agitation (all P<.001), lower levels of energy (P=.002 to .040), feeling more unwell (P<.001 to .023) and lower levels of relaxation (all P<.001) after all completions (see Supplementary Material Section 4 for more detail). There were no main effects of sequence or interaction effects for sequence by time on any of these findings.
As shown in Table 8, the acceptability ratings of the tasks were mixed. Across the whole sample, the expressive writing tasks (Task D) were significantly more difficult to complete (M: 3.0, SD: 1.9) than all other tasks (all P < .001). Participants were also significantly less willing to complete Task D (M: 4.7, SD: 1.9) than all other tasks (P values ranged from <.001 to .012). However, participants’ levels of interest and willingness were influenced by their allocation sequence. Participants in sequence 1 reported significantly greater interest in completing tasks than those in sequences 3 (P=.026) and 4 (P=.006). Participants in sequence 1 were also significantly more willing to repeat the tasks than those in sequences 3 (P=.036) and 4 (P=.036).
Table 8. Omnibus test results from mixed models assessing acceptability across tasks and sequences.
| Outcome | Variable | df (numerator) | df (denominator) | F | P |
| Ease of completion | Task | 7 | 1306.66 | 37.51 | <.001 |
| | Sequence | 3 | 202.64 | 0.53 | .661 |
| | Task * Sequence | 21 | 1306.58 | 0.80 | .723 |
| Level of interest | Task | 7 | 1308.55 | 10.59 | <.001 |
| | Sequence | 3 | 206.74 | 4.75 | .003 |
| | Task * Sequence | 21 | 1308.49 | 0.89 | .610 |
| Willingness to repeat | Task | 7 | 1303.95 | 12.16 | <.001 |
| | Sequence | 3 | 202.94 | 3.33 | .021 |
| | Task * Sequence | 21 | 1303.89 | 3.22 | <.001 |