Unlike the previous investigation (Vyshedskiy et al., 2024), this study focused on participants who demonstrated a general proclivity for art, as determined by their caregiver answering “very true” to the question “[My child] does drawing, coloring, art” (N = 14,279; 36.0% of the original cohort). Other participants may not have been exposed to “drawing, coloring, and art,” may have had a physical disability, or may not have been interested in this activity, and were therefore excluded from the analysis.
Caregivers evaluated 15 language comprehension abilities (Table 1). To explore their co-occurrence, we used two common clustering methods: Unsupervised Hierarchical Cluster Analysis (UHCA) and Principal Component Analysis (PCA). Clustering techniques automatically organize abilities based on their co-occurrence. If two linguistic abilities are mediated by the same underlying mechanism, then a breakdown of this mechanism should result in the absence of both abilities, causing them to be grouped into the same cluster. Importantly, the clustering analysis was not guided by any design or hypothesis; the process is entirely driven by the data.
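The idea of grouping abilities by co-occurrence can be illustrated with a minimal sketch on synthetic data. It is not the study's pipeline: the simulated scores, the Ward linkage, the Euclidean metric, and all variable names are assumptions made for illustration, since this section does not specify those settings.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Hypothetical synthetic data: 300 participants, 6 items scored
# 0 / 1 / 2 (not true / somewhat true / very true).
# Items 0-2 depend on one latent factor, items 3-5 on another,
# mimicking two separate underlying "mechanisms".
n = 300
latent_a = rng.normal(size=n)
latent_b = rng.normal(size=n)

def make_item(latent):
    noisy = latent + 0.5 * rng.normal(size=n)
    return np.digitize(noisy, [-0.5, 0.5])  # values in {0, 1, 2}

scores = np.column_stack(
    [make_item(latent_a) for _ in range(3)]
    + [make_item(latent_b) for _ in range(3)]
)  # shape (participants, items)

# Hierarchical clustering of items: each item is a vector of participant scores.
# Ward linkage with Euclidean distance is an assumption, not the paper's setting.
items = scores.T.astype(float)
tree = linkage(items, method="ward", metric="euclidean")
item_clusters = fcluster(tree, t=2, criterion="maxclust")

# PCA of the items via SVD of the centered item-by-participant matrix.
centered = items - items.mean(axis=0)
u, s, _ = np.linalg.svd(centered, full_matrices=False)
item_coords = u[:, :2] * s[:2]  # item positions on the first two components

print(item_clusters)
print(item_coords.round(2))
```

In this toy setup, items that share a latent factor tend to be present or absent together, so they end up in the same cluster, which is the logic the paragraph above describes.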
Figure 1A depicts the dendrogram generated by UHCA. The height of the branches indicates the distance between clusters; a greater height corresponds to greater dissimilarity. As in the previous study, three clusters have inter-cluster distances that are significantly larger than the distances between subgroups. The right-most cluster includes knowing the name, responding to ‘No’ or ‘Stop’, responding to praise, and following some commands (items 1 to 4 in Table 1). This cluster is identical to the command-language-comprehension-cluster identified in the previous study of 31,845 autistic individuals, with the addition of the ‘responds to praise’ item that was not previously analyzed (Vyshedskiy et al., 2024). The cluster in the middle includes understanding color and size modifiers, several modifiers in a sentence, size superlatives, and numbers (items 5 to 8 in Table 1). This cluster is identical to the previously identified modifier-language-comprehension-cluster. The left-most cluster includes understanding of spatial prepositions, verb tenses, flexible syntax, possessive pronouns, explanations about people and situations, simple stories, and elaborate fairy tales (items 9 to 15 in Table 1). This cluster is identical to the previously identified syntactic-language-comprehension-cluster.
The PCA (Fig. 1B) also demonstrates a clear separation between the same three clusters. The command items (knowing the name, responding to ‘No’ or ‘Stop’, responding to praise, and following some commands) are clustered in the top left corner. The modifier items (understanding color and size modifiers, several modifiers in a sentence, size superlatives, and numbers) are clustered in the lower middle. The syntactic items (understanding of spatial prepositions, verb tenses, flexible syntax, possessive pronouns, explanations about people and situations, simple stories, and elaborate fairy tales) are clustered in the top right corner.
The three-cluster solution was stable across multiple seeds and consistent across different age groups (4 to 6 years of age, Figure S1; 6 to 21 years of age, Figure S2) and across different time points (first evaluation, Figure S3; last evaluation, Fig. 1).
Next, we performed UHCA and PCA on the 15 language comprehension abilities together with the “draws representationally” item, defined as a “very true” response to the query “[My child] draws a VARIETY of RECOGNIZABLE images (objects, people, animals, etc.)” (Fig. 2). Both UHCA and PCA grouped the “draws representationally” item with the syntactic items, indicating that they commonly co-occur.
Furthermore, we performed UHCA and PCA on the 15 language comprehension abilities together with the “draws to description” item, defined as a “very true” response to the query “[My child] can draw a NOVEL image following YOUR description (e.g. a three-headed horse)” (Fig. 3). Both UHCA and PCA grouped the “draws to description” item with the syntactic items, indicating that they commonly co-occur.
As a control, we performed UHCA and PCA on the 15 language comprehension abilities together with the hyperactivity item, which is not expected to be related to any particular language ability and should therefore cluster into its own group. As expected, both UHCA and PCA placed the hyperactivity item in its own group at a significant distance from the three language clusters, supporting the effectiveness of both clustering techniques (Fig. 4).
Having established that representational drawing ability co-occurred with the cluster of syntactic language comprehension abilities, we aimed to assess the proportion of individuals in each language comprehension phenotype who exhibited representational drawing ability. To assign language comprehension phenotypes, we used unsupervised hierarchical clustering analysis of all 14,279 participants (Fig. 5, the dendrogram on top). The two-dimensional heatmap clearly shows the same three language comprehension phenotypes that were identified in the previous study of participants with language deficits (Vyshedskiy et al., 2024). Columns represent the 14,279 participants and rows represent the 15 linguistic abilities. Blue indicates the presence of a linguistic ability (parent’s response = very true); white indicates an intermittent presence of a linguistic ability (parent’s response = somewhat true); and red indicates the lack of a linguistic ability (parent’s response = not true). The three clusters of participants match the three clusters of language comprehension abilities. The cluster of participants termed “Syntactic Language Phenotype” is predominantly blue, indicating good skills across all three groups of language comprehension items (syntactic, modifier, and command; 26.0% of participants, Table 2). The cluster marked “Command Language Phenotype” is predominantly blue only among the command items, indicating good skills in those items alone (18.9%). The third cluster, marked “Modifier Language Phenotype,” is predominantly blue only across the modifier and command items (55.2%). The three phenotypes were stable across different seeds, age groups (4 to 6 and 6 to 21 years of age, Figures S4 and S5, respectively), and time points (first and last evaluations; Figure S6 and Fig. 5, respectively).
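The assignment of participants to phenotypes can be sketched in the same spirit, this time clustering participants rather than items. Everything below is hypothetical: the three idealized response profiles, the group sizes, the 10% noise rate, and the Ward linkage are illustrative assumptions and are not taken from the study's data or settings.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)

# Hypothetical response profiles over 15 items (4 command, 4 modifier,
# 7 syntactic); 2 = "very true", 1 = "somewhat true", 0 = "not true".
profiles = {
    "command": [2] * 4 + [0] * 11,
    "modifier": [2] * 8 + [0] * 7,
    "syntactic": [2] * 15,
}
made_up_sizes = {"command": 40, "modifier": 110, "syntactic": 50}

rows = []
for name, profile in profiles.items():
    block = np.tile(profile, (made_up_sizes[name], 1))
    # Replace roughly 10% of answers with "somewhat true" as noise.
    noise = rng.random(block.shape) < 0.1
    block[noise] = 1
    rows.append(block)
responses = np.vstack(rows).astype(float)  # shape (participants, items)

# Cluster participants (rows) and cut the tree into three groups.
# Ward linkage is an assumption; the section does not name the linkage used.
tree = linkage(responses, method="ward")
labels = fcluster(tree, t=3, criterion="maxclust")

# Share of participants in each cluster, as percentages.
counts = np.bincount(labels)[1:]  # fcluster labels start at 1
shares = np.round(100.0 * counts / counts.sum(), 1)
print(sorted(counts.tolist()), shares)
```

The cluster shares computed at the end play the same role as the "Percent of Total" row in Table 2, although the numbers here come from made-up data.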
Representational drawing ability, as determined by a caregiver answering “very true” to the question “[My child] draws a VARIETY of RECOGNIZABLE images (objects, people, animals, etc.),” is indicated by short black vertical marks above the heatmap (marked Representational_drawing). Representational drawing was manifested by 54.1% of individuals with the syntactic phenotype, 27.7% of those with the modifier phenotype, and 10.1% of those with the command phenotype (Table 3; all pairwise differences between the phenotypes were statistically significant, t-test: p < 0.0001). Age did not significantly affect this distribution. Among participants 4 to 6 years of age, representational drawing was manifested by 48.5% of individuals with the syntactic phenotype, 24.6% of those with the modifier phenotype, and 10.1% of those with the command phenotype (Table S1; all pairwise differences between the phenotypes were statistically significant, p < 0.0001); among participants 6 to 21 years of age, representational drawing was manifested by 63.7% of individuals with the syntactic phenotype, 35.1% of those with the modifier phenotype, and 16.9% of those with the command phenotype (Table S2; all pairwise differences between the phenotypes were statistically significant, p < 0.0001).
Another question, concerning the drawing-to-description ability, was posed as follows: “[My child] can draw a NOVEL image following YOUR description (e.g. a three-headed horse).” The drawing-to-description ability was manifested by 34.6% of individuals with the syntactic phenotype, 7.9% of those with the modifier phenotype, and 1.9% of those with the command phenotype (Table 4; all pairwise differences between the phenotypes were statistically significant, p < 0.0001). Age did not significantly affect this distribution. Among participants 4 to 6 years of age, the drawing-to-description ability was manifested by 28.4% of individuals with the syntactic phenotype, 6.6% of those with the modifier phenotype, and only 1.9% of those with the command phenotype (Table S3; all pairwise differences between the phenotypes were statistically significant, p < 0.0001); among participants 6 to 21 years of age, the drawing-to-description ability was manifested by 45.2% of individuals with the syntactic phenotype, 11.7% of those with the modifier phenotype, and 3.3% of those with the command phenotype (Table S4; all pairwise differences between the phenotypes were statistically significant, p < 0.0001).
Table 2
Participant cluster statistics
| | Syntactic Language Phenotype | Modifier Language Phenotype | Command Language Phenotype | Total |
| Number of participants | 3711 | 7875 | 2693 | 14279 |
| Percent of Total (%) | 26.0 | 55.2 | 18.9 | 100.0 |
| Age in years, Mean (SD) | 6.4 (2.4) | 6.4 (2.5) | 6.3 (2.5) | 6.4 (2.4) |
| Percent Male (%) | 70.5 | 72.6 | 73.1 | 72.2 |
Table 3
The representational drawing ability, determined by a caregiver answering “very true” to the question “[My child] draws a VARIETY of RECOGNIZABLE images (objects, people, animals, etc.).”
| | Syntactic Language Phenotype | Modifier Language Phenotype | Command Language Phenotype | Total |
| Number of participants per cluster | 3711 | 7875 | 2693 | 14279 |
| Number of participants with the representational drawing ability | 2007 | 2185 | 272 | 4464 |
| Percent (%) | 54.1 | 27.7 | 10.1 | 31.3 |
Table 4
The drawing-to-description ability, determined by a caregiver answering “very true” to the question “[My child] can draw a NOVEL image following YOUR description (e.g. a three-headed horse).”
| | Syntactic Language Phenotype | Modifier Language Phenotype | Command Language Phenotype | Total |
| Number of participants per cluster | 3711 | 7875 | 2693 | 14279 |
| Number of participants with the drawing-to-description ability | 1283 | 622 | 50 | 1955 |
| Percent (%) | 34.6 | 7.9 | 1.9 | 13.7 |
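The percentages in Tables 3 and 4 can be re-derived from the reported counts, as in the short sketch below. The counts are copied from the tables. The significance check is a pooled two-proportion z-test, which is a substitute chosen for illustration and not necessarily the test the authors ran (the text reports a t-test); the function names are hypothetical.

```python
import math

# (participants with the ability, participants in the cluster), from Tables 3 and 4.
representational = {
    "syntactic": (2007, 3711),
    "modifier": (2185, 7875),
    "command": (272, 2693),
}
to_description = {
    "syntactic": (1283, 3711),
    "modifier": (622, 7875),
    "command": (50, 2693),
}

def percent(with_ability, total):
    """Share of a cluster showing the ability, in percent, to one decimal."""
    return round(100.0 * with_ability / total, 1)

def two_proportion_p(a, b):
    """Two-sided p-value of a pooled two-proportion z-test (illustrative substitute)."""
    (x1, n1), (x2, n2) = a, b
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))

for label, table in (("Table 3", representational), ("Table 4", to_description)):
    shares = {name: percent(*counts) for name, counts in table.items()}
    p_values = {
        "syntactic vs modifier": two_proportion_p(table["syntactic"], table["modifier"]),
        "modifier vs command": two_proportion_p(table["modifier"], table["command"]),
        "syntactic vs command": two_proportion_p(table["syntactic"], table["command"]),
    }
    print(label, shares, all(p < 0.0001 for p in p_values.values()))
```

Under this substitute test, every pairwise comparison also falls below the p < 0.0001 threshold quoted in the text.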