Participants
Our sample was drawn from an ongoing open longitudinal cohort, the Geneva Autism Cohort (Franchini et al. 2018). The cohort follows preschoolers with either ASD or typical development (TD) and collects longitudinal child behavioral measures (see the Measures subsection for a detailed description). Since 2012, preschoolers have been recruited through announcements in the Geneva community (e.g., parent associations and clinical centers). Diagnosis was established by a licensed child psychiatrist (MS) using the DSM-5 diagnostic criteria and was further confirmed by the diagnostic cutoffs of the ADOS. Written informed consent forms were signed and provided by the participants’ caregivers. The Ethics Committee of the University of Geneva approved the research protocol. The open-cohort longitudinal design involves assessments conducted every 6 months over two years (hence totalling 5 timepoints per participant when the follow-up is completed). Our final sample comprised 371 participants (1164 timepoints) aged 1.5 to 5.7 years. The TD group comprised 85 participants (221 timepoints, age range 1.5–5.6 years, 44.7% of female biological sex) and the ASD group comprised 286 participants (943 timepoints, age range 1.5–5.7 years, 17.5% of female biological sex). Sample characteristics are detailed in Table 1. Moreover, Supplementary Figure S1 illustrates the recruitment process with each participant’s age at each visit. For any longitudinal timepoint to be included, the participant had to be younger than 68 months. This age corresponds to the upper limit of the Mullen Scales of Early Learning, which was used to compute the DQs (see the Measures subsection). Participants who were not exposed to the French language (48 participants, 8 TD and 40 ASD) were excluded (i.e., they are not part of the n = 371 final sample) to obtain a sample that is homogeneous in terms of language exposure.
Table 1
Sample characteristics with statistical comparison between TD and ASD samples.
MEASURE [mean (SD)] | TD | ASD | P-Val |
---|---|---|---|
Number of participants | 85 | 286 | |
Number of timepoints | 221 | 943 | |
Timepoints per participant | 2.6 (1.2) | 3.3 (1.5) | < .001 |
Mean age [y] | 3.4 (0.9) | 3.6 (0.8) | .088 |
Age range [y] | 1.5–5.6 | 1.5–5.7 | |
Female biological sex | 38 (44.7%) | 50 (17.5%) | < .001 (χ2) |
Plurilingual environment | 28 (32.9%) | 145 (50.7%) | .004 (χ2) |
College degree completed | 73 (91.7%) [n = 84] | 151 (56.1%) [n = 269] | < .001 (χ2) |
Yearly household income > 110k | 49 (62.0%) [n = 79] | 91 (36.8%) [n = 247] | < .001 (χ2) |
Children at home | 1.8 (0.6) [n = 83] | 2.0 (0.9) [n = 268] | .184 |
Parental age at birth | 33.3 (4.6) [n = 84] | 34.5 (5.6) [n = 269] | .072 |
Mean ADOS CSS total | 1.1 (0.2) | 7.4 (1.6) | < .001 (MW) |
Mean ADOS CSS SA | 1.1 (0.3) | 6.4 (1.6) | < .001 (MW) |
Mean ADOS CSS RRB | 2.3 (1.8) | 9.0 (1.3) | < .001 (MW) |
Mean Composite DQ | 112.8 (10.8) | 70.2 (23.5) | < .001 |
Mean Expressive Language DQ | 105.0 (15.3) | 58.0 (25.4) | < .001 |
Mean Receptive Language DQ | 117.5 (12.4) | 62.9 (30.1) | < .001 |
Mean Visual Reception DQ | 122.3 (16.3) | 82.7 (25.6) | < .001 |
Mean Fine Motor DQ | 106.1 (13.0) | 77.1 (19.3) | < .001 |
Mean PSI Stress Total | 59.1 (18.0) [n = 79] | 90.5 (22.8) [n = 241] | < .001 |
For categorical variables, a chi-square test was applied (χ2). For continuous variables, we used either two-tailed independent t-tests or the Mann-Whitney test (MW) when normality could not be assumed (e.g., with ADOS CSS). P values < .05 are highlighted in bold. For variables that change over time (e.g., DQ), we computed the mean, SD, and t statistics using each participant’s value averaged over timepoints.
Early intervention program
Among the 286 autistic participants, 98 (34.3%) underwent an individualized two-year early intensive intervention program following the Early Start Denver Model (ESDM) (Rogers and Dawson 2010). As a Naturalistic Developmental Behavioral Intervention (NDBI) (Schreibman et al. 2015), the ESDM program integrates principles from developmental science and behavioral learning, such as emphasizing the importance of developmental prerequisites and promoting child engagement in social interaction using motivating activities. Studies have recognized the ESDM as an effective intervention that significantly increases cognitive, verbal, and adaptive skills (Dawson et al. 2010; Fuller et al. 2020; Fuller and Kaiser 2020). Participants enrolled in the ESDM intervention program received between 15 and 20 hours a week of individual sessions with a graduate-level therapist trained in the ESDM approach. Children were evaluated every 3 months throughout the 2-year intervention period, and their parents also received coaching sessions at the beginning of their child’s enrollment in the program. For more details regarding the ESDM program in Geneva, see Godel et al. (2022).
Measures
We collected three types of measures: first, outcome measures used to define language phenotypes (verbal performance); second, fine-grained descriptive measures of language acquisition (Questionnaire sur le Développement du Langage de Production en Français, DLPF); and third, measures of potential moderators of language outcome (demographic measures, non-verbal cognition, autistic symptoms). In addition, we collected a measure of parental stress (Parenting Stress Index, Short Form, PSI-SF).
Language outcome
To measure verbal performance, we used the Mullen Scales of Early Learning (MSEL) (Mullen 1995), a standardized tool assessing children aged 0 to 68 months. The MSEL evaluates Receptive Language (RL) and Expressive Language (EL), for which we computed developmental quotient (DQ) scores by dividing the developmental age (i.e., the Age Equivalent score of a scale) by the child’s chronological age and multiplying by 100 (Lord et al. 2006; Shen et al. 2013). Unlike standard scores, the DQ has the advantage of mitigating floor effects in very low-performing participants. The MSEL was not administered at a small proportion of the timepoints in our total sample (102 timepoints, 8.8%) because it was added later to the research protocol. For missing MSEL data, we used a substitute measure of verbal cognition: the Psychoeducational Profile – third edition (PEP-3) (Schopler et al. 2005). The PEP-3 is another standardized developmental evaluation designed for children aged 2 to 7 years. As we did for the MSEL, we computed DQs for the EL and RL domains.
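The DQ computation described above can be illustrated with a short Python sketch. The function name and the example values are hypothetical and only serve to show the formula (age equivalent divided by chronological age, multiplied by 100); both ages are assumed to be expressed in months.

```python
def developmental_quotient(age_equivalent_months, chronological_age_months):
    """DQ = (age-equivalent score / chronological age) * 100."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * age_equivalent_months / chronological_age_months


# Hypothetical child: expressive-language age equivalent of 18 months
# measured at a chronological age of 36 months.
example_dq = developmental_quotient(18, 36)
print(example_dq)  # 50.0
```

A DQ of 100 indicates that developmental age matches chronological age; values below 100 indicate a delay relative to age.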
Parental stress outcome
To explore the impact of language impairment on parental quality of life, we used the Parenting Stress Index - Short Form (PSI-SF, Abidin 1995). Parental stress has been shown to be an important mediator of quality of life (Wang et al. 2022). We used the total stress score (the sum of three subscales) as our measure of parental quality of life.
Linguistic Deep Phenotyping
The Développement du langage de production en français (DLPF) is a standardized parent-reported questionnaire that assesses the development of expressive language in children aged 18 to 42 months who are exposed to the French language (Bassano et al. 2005). The structure of the questionnaire is inspired by the MacArthur Communicative Development Inventories (CDI, Fenson et al. 2012). There are four versions of the DLPF depending on the child’s age. The DLPF is divided into three linguistic sections exploring lexical, grammatical, and pragmatic development, respectively. In the lexical section, parents are presented with lists of words and asked to check off all the words their child can produce. The grammatical section then investigates grammatical forms (e.g., the use of articles, noun plurals or verbal tenses), as well as the structure and complexity of word combinations. Parents completing the grammatical section are asked either to evaluate the frequency (i.e., Never, Sometimes, or Often) with which their child produces a form, or to indicate whether their child uses a specific formulation. Finally, the pragmatic section explores participation in conversation, language use in various contexts, and the organization of sentences for more complex communication. We rated all responses following the authors’ scoring guide (Bassano, Labrell, and Bonnet 2020), leading to three separate scores. The Total vocabulary estimates the child’s vocabulary size in number of words. In the grammatical and pragmatic sections, Never responses were scored as 0, whereas Sometimes and Often were scored as 1, yielding a Total grammatical score and a Total pragmatic score. In TD, the study of Bassano et al. (2005) showed a plateau effect when approaching 42 months, thus limiting the clinical significance of the DLPF in TD after this age.
Nonetheless, we administered the DLPF to children up to 68 months since we expected a later plateau in children with ASD given their frequent language delay. To our knowledge, the DLPF has never been explored in ASD. However, the authors have suggested it as a possible application (Bassano et al. 2020).
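The scoring rule described above (Never = 0, Sometimes or Often = 1, and a word count for vocabulary) can be sketched as follows. This is an illustrative sketch only, not the authors' scoring guide: the function names are hypothetical, and the coding of yes/no items ("Yes" = 1, "No" = 0) is an assumption that the text does not specify.

```python
# Response coding for the grammatical and pragmatic sections.
# "Never"/"Sometimes"/"Often" follow the text; "Yes"/"No" is an assumption.
ITEM_SCORES = {"Never": 0, "Sometimes": 1, "Often": 1, "No": 0, "Yes": 1}


def section_score(responses):
    """Sum the binary item scores of one section (grammar or pragmatics)."""
    return sum(ITEM_SCORES[response] for response in responses)


def total_vocabulary(checked_words):
    """Vocabulary size = number of distinct words checked off by the parent."""
    return len(set(checked_words))


# Hypothetical questionnaire excerpt.
grammar = section_score(["Never", "Sometimes", "Often", "Often"])
vocabulary = total_vocabulary(["chat", "chien", "chat"])
print(grammar, vocabulary)  # 3 2
```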
Demographic Moderators
We collected demographic measures that have been identified as moderators of language acquisition in TD and/or in ASD. The aim was to explore whether those variables could affect the language phenotypes in ASD. First, we measured the participant's socioeconomic status using the highest level of education achieved by the parents and the yearly household income (Hollingshead 1975). Parental education level was categorized as either (1) elementary school or high school completed or (2) college degree completed. The household income was divided into two subgroups based on the median value. In TD, parental socioeconomic status has been reported to affect the child's language acquisition, e.g., through differences in parent-child interaction and/or availability of learning resources (Pace et al. 2017). In addition, we inquired about the number of children living in the same household as well as the language(s) spoken to the child. Siblings can enrich the language environment of the child (Bridges and Hoff 2014), potentially boosting their development. Plurilingual environments can affect the rate of lexical acquisition in TD (Bialystok and Craik 2010), even though a bilingual environment has not been associated with ASD verbal outcome (Drysdale, van der Meer, and Kagohara 2015; Hambly and Fombonne 2012). In our study, a monolingual environment was defined by an exclusive exposure to French, and a plurilingual environment by an exposure to French and at least one other language. Finally, we collected parental age at the child’s birth, as it has been associated with an increased likelihood of developing ASD, probably through the accumulation of de novo mutations in germ cells (Lee and McGrath 2015). Nonetheless, to our knowledge, no study has explored whether this higher rate of de novo mutations is linked with the language domain in ASD.
Non-verbal cognition
To measure non-verbal cognition before the age of 3, we used the Visual Reception (VR) and Fine Motor (FM) domains of the MSEL and we computed developmental quotient (DQ) scores. When the MSEL was not available, we used the corresponding domains in the PEP-3, namely FM and Verbal-preverbal cognition (CVP). Early motor skills have been associated with later RL skills in children with ASD (Hannant 2018). In addition, visuospatial cognition has been associated with later RL and EL skills in preschoolers with ASD (Hellendoorn et al. 2015). Some studies also showed that FM skills were associated with language outcomes in siblings at high familial-likelihood of autism (Hwang and Lee 2022; LeBarton and Iverson 2013).
Autistic symptoms
The Autism Diagnostic Observation Schedule, second edition (ADOS-2, Lord et al. 2012) comprises several semi-structured activities that quantify symptoms in two domains: social affect (SA) and restricted and repetitive behaviors or interests (RRB). The ADOS-2 comprises five modules that depend on the child's age and language level. To compare scores across modules, we used Calibrated Severity Scores (Gotham, Pickles, and Lord 2009; Hus, Gotham, and Lord 2014) to obtain SA and RRB severity scores. The ADOS-2 was administered by trained examiners and video recorded for coding. Early ASD symptom severity has been associated with language outcome (Loucas et al. 2008; Thurm et al. 2015).
Statistical Analyses
Longitudinal Analyses
We used a mixed modeling method to investigate language trajectories (using DQs and DLPF measures) over time in TD and ASD. Mixed modeling has been successfully used to measure longitudinal changes in cognition in developmental disorders including ASD (Latrèche et al. 2021; Maeder et al. 2016). Age and diagnosis (TD versus ASD) were modeled as fixed effects, whereas language measures were modeled as random effects. Random-slope model analysis was carried out using the myMixedModelsTrajectories toolbox (publicly available at https://github.com/danizoeller/myMixedModelsTrajectories) implemented in MATLAB R2019b (MathWorks). We compared language trajectories between the TD and ASD groups by fitting random-slope quadratic models, each corresponding to a specific relationship between age and one language measure (which we denoted MSEL EL DQ, MSEL RL DQ, DLPF Total Vocabulary, DLPF Grammar Score, DLPF Pragmatic Score). We also carried out mixed-model analyses with non-verbal domains (MSEL VR DQ and MSEL FM DQ) (see Supplementary Figure S2). In another supplementary analysis, we excluded all females from the TD and ASD samples to ensure that their inclusion did not affect our results regarding language (MSEL EL DQ, MSEL RL DQ) (see Supplementary Figure S3).
Furthermore, we explored the longitudinal trajectories of two-word combination acquisition (measured by a dedicated item within the grammar section of the DLPF). For this binary categorical variable, we used a chi-square test within a sliding age window of 6 months (corresponding to the average time lapse between timepoints), moved in increments of one month. Within each age window, we compared the proportion of participants who had acquired two-word combinations in the TD versus the ASD group. False discovery rate (FDR) correction was applied over all the tested age bins.
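The sliding-window comparison can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the analysis code used in the study: the chi-square test is the uncorrected Pearson test on a 2 × 2 table, the FDR procedure is assumed to be Benjamini-Hochberg (the text does not name the procedure), windows with an empty margin are skipped, and all function names and the record layout are hypothetical.

```python
import math


def chi2_2x2_pvalue(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        return None  # the test is undefined when a margin is empty
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def sliding_window_tests(records, start, stop, width=6, step=1):
    """records: iterable of (age_months, group, acquired), group in {"TD", "ASD"}.

    Returns a list of (window_start, p_value) for windows [lo, lo + width).
    """
    results = []
    for lo in range(start, stop - width + 1, step):
        counts = {"TD": [0, 0], "ASD": [0, 0]}  # [acquired, not acquired]
        for age, group, acquired in records:
            if lo <= age < lo + width:
                counts[group][0 if acquired else 1] += 1
        p = chi2_2x2_pvalue(*counts["TD"], *counts["ASD"])
        if p is not None:
            results.append((lo, p))
    return results


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest to the smallest p-value
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In use, the p-values returned by `sliding_window_tests` would be passed together to `benjamini_hochberg`, so that the correction spans all tested age bins.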
Outcome timepoint
Since participants had several longitudinal verbal assessments, we had to decide which timepoint best reflected verbal outcome and should therefore be entered in the cluster analysis. We used two criteria to determine the outcome timepoint. First, we wanted to determine the age by which verbal DQs were stable, as labile outcome measures could compromise cluster quality. Second, the selected outcome timepoints had to be at approximately the same age across participants to minimize age heterogeneity. We proceeded as follows. We first computed the age effect on verbal DQ using a sliding window with a width of 12 months and an increment of one month (Supplementary Figure S4). Within each 12-month window (e.g., for the period from 12 to 23 months of age), a linear mixed-effect model was applied to test the effect of age on language (receptive language and expressive language), using all timepoints collected in the ASD group. The linear model predicting verbal DQ from age included a between-subject fixed effect and a random intercept (MATLAB® R2018b function fitlme). False discovery rate (FDR) correction was applied over all the age bins tested.
We found that from 3.75 years of age onward, age no longer had a significant effect on receptive language (i.e., p > .05 in all sliding windows from 3.75 years and older). Before this age, age had a significant positive effect on receptive language DQ in all windows. For expressive language, stability of DQ (i.e., a non-significant age effect) was reached slightly earlier, at 3.50 years of age. This means that after those ages (3.75 and 3.50 years, respectively), receptive and expressive language DQs were globally stable in the autistic group. After removing the timepoints collected before 3.75 years, the average age of autistic participants was 4.66 years. Consequently, we defined the outcome timepoint (i.e., the timepoint from which verbal DQs were used for clustering) as the participant’s timepoint nearest to 4.66 years of age.
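The selection rule for the outcome timepoint can be sketched as follows. This is an illustrative sketch: the function name and data layout are hypothetical, and the text does not state whether candidate timepoints were restricted to those collected after 3.75 years, so no such filter is applied here.

```python
TARGET_AGE_YEARS = 4.66  # mean age of ASD timepoints collected from 3.75 years onward


def select_outcome_timepoint(timepoints, target=TARGET_AGE_YEARS):
    """Return the timepoint whose age (in years) is closest to the target age.

    timepoints: list of dicts, each with at least an "age" key.
    """
    return min(timepoints, key=lambda tp: abs(tp["age"] - target))


# Hypothetical participant with five visits.
visits = [{"age": a} for a in (2.1, 2.6, 3.2, 4.4, 5.0)]
print(select_outcome_timepoint(visits))  # {'age': 4.4}
```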
Cluster Analysis
We used a data-driven cluster analysis approach to stratify our ASD sample into distinct language phenotypes based on participants’ verbal outcome. We applied the TwoStep clustering algorithm available in IBM® SPSS® Statistics 26.0 for macOS (Armonk, NY: IBM Corp.) (Chiu et al. 2001). One advantage of this approach compared to others (e.g., K-means clustering) is that the number of clusters does not have to be specified a priori. We tested all possible clustering solutions ranging from 1 to 15 clusters and used Akaike’s information criterion to select the optimal one. We used the two aggregate measures of verbal outcome as input (EL and RL DQs at the outcome timepoint).
Once the ASD language outcome phenotypes were identified, we explored four aspects: their differences at outcome; their fine-grained early trajectories of vocabulary, grammar, and pragmatics; the early moderators of verbal outcome between phenotypes; and the early moderators of verbal outcome within phenotypes. To compare phenotypes at outcome, we carried out an ANOVA (or a chi-square test for categorical variables) on the parental and child measures collected at the outcome timepoint (approximately 4.6 years old), with pairwise post-hoc comparisons using Bonferroni correction. Then, we described the early linguistic trajectories of the phenotypes using the DLPF metrics. We applied mixed-effect models using language phenotype as a fixed effect. For any statistically significant effect, we used pairwise post-hoc comparisons with Bonferroni correction. Furthermore, we examined whether early factors could have led some participants to express one phenotype instead of the others. We used ANOVA or chi-square tests across the phenotypes on demographic variables (i.e., present since birth) and early child behavioral measures (i.e., collected before the age of 3). Finally, to determine whether some factors might have specifically moderated the language outcome within clusters, we used verbal DQ at the outcome timepoint (the average of expressive and receptive language DQs) as the predicted value. Then, within each phenotype, we applied one linear regression model per early moderator to test whether this measure predicted the verbal outcome. Predictor variables included demographic, early behavioral, and early intervention measures. P-values were corrected for multiple testing (one linear regression per phenotype; number of tests = number of phenotypes). ANOVA, chi-square tests, and linear regressions were performed in IBM® SPSS® Statistics 26.0 for macOS (Armonk, NY: IBM Corp.). The statistical significance threshold was set at alpha = .05.
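The within-phenotype step can be sketched in Python as follows. This is a minimal illustration, not the SPSS procedure used in the study: the function names and the data layout are hypothetical, only the regression coefficients are computed (no inferential statistics), and the Bonferroni adjustment is an assumption, since the text specifies the number of tests but does not name the correction applied to the regression p-values.

```python
def verbal_outcome(el_dq, rl_dq):
    """Verbal DQ at outcome = average of expressive and receptive language DQs."""
    return (el_dq + rl_dq) / 2.0


def simple_ols(x, y):
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_per_phenotype(rows, moderator):
    """Fit one regression per phenotype: verbal outcome ~ early moderator.

    rows: dicts with keys "phenotype", "el_dq", "rl_dq", and the moderator name.
    """
    by_phenotype = {}
    for row in rows:
        by_phenotype.setdefault(row["phenotype"], []).append(row)
    return {
        phenotype: simple_ols(
            [r[moderator] for r in group],
            [verbal_outcome(r["el_dq"], r["rl_dq"]) for r in group],
        )
        for phenotype, group in by_phenotype.items()
    }


def bonferroni(pvalues):
    """Assumed correction: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]
```

With one regression per phenotype for a given moderator, the list passed to `bonferroni` would contain as many p-values as there are phenotypes.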