A summary of the demographics for the subjects used in this study is given in Table 1. The data come from four different databases: CT data from 122 SC patients (Apert, Crouzon, and Muenke; mean age 5.0 ± 5.1 years; 58% male (n = 70)), CT data from 142 healthy infants (mean age 1.9 ± 1.2 years; 56% male), stereophotogrammetric data from 196 healthy subjects from the LSFM dataset 15, and stereophotogrammetric data from 139 healthy subjects from the LYHM database 11. As the volunteers in both stereophotogrammetric datasets were typically above the age of four while many cases in the syndromic dataset were quite young, a dataset of CT scans from healthy children was also used. For full details, see Methods.
Table 1
Overview of the face and cranium datasets for the included syndromic craniosynostosis (SC) and normal samples. All syndromic and infant samples were acquired via CT scan. The LSFM and LYHM databases were obtained using 3dMD™ photometric stereo capture device set-ups 10,11.
Dataset or syndrome | Number of subjects | Average age, years | Age range at time of scan | Sex (M:F) |
Face | | | | |
LSFM | 196 | 10.5 ± 4.0 | 4 years – 17 years | 98:98 (50%:50%) |
Paediatric | 142 | 1.9 ± 1.2 | 1 day – 47 months | 79:63 (56%:44%) |
Apert | 47 | 6.1 ± 6.2 | 48 days – 20 years | 28:19 (60%:40%) |
Crouzon | 61 | 5.3 ± 4.4 | 25 days – 17 years | 35:25 (58%:42%) |
Muenke | 14 | 1.6 ± 2.1 | 1 day – 8 years | 7:7 (50%:50%) |
Total | 460 | | 1 day – 20 years | 247:213 (54%:46%) |
Cranium | | | | |
LYHM | 139 | 10.9 ± 3.8 | 4 years – 18 years | 76:63 (55%:45%) |
Paediatric | 111 | 1.8 ± 1.1 | 1 day – 47 months | 59:52 (53%:47%) |
Apert | 39 | 6.5 ± 6.3 | 48 days – 20 years | 22:17 (56%:44%) |
Crouzon | 53 | 5.4 ± 4.4 | 5 months – 17 years | 30:23 (57%:43%) |
Muenke | 11 | 1.7 ± 2.3 | 1 day – 8 years | 6:5 (55%:45%) |
Total | 353 | | 1 day – 20 years | 193:160 (55%:45%) |
Intrinsic Model Evaluation
Using the available databases, three distinct classes of model were constructed to assess the role of facial and cranial shape in the diagnosis of SC: a face-only model, a head-only model, and a combined head-and-face model (see Methods).
All models showed low errors when assessed for reconstruction accuracy and specificity (see Methods). Reconstruction errors for the face, head, and combined models were 1.4 ± 1.2 mm, 3.8 ± 3.1 mm, and 2.9 ± 2.5 mm, respectively. Reconstruction error was higher for models that included the head shape, likely due to the greater degree of variation between subjects in this region. Model specificity was evaluated by randomly synthesising 1000 samples and comparing them to their nearest real neighbour 10. Values of 2.7 mm, 4.3 mm, and 3.9 mm for the face-only, head-only, and combined models, respectively, indicate that the samples generated are realistic.
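To make the specificity measure concrete, the following is a minimal sketch of one way such a score could be computed: each synthetic shape is compared with every real shape, and the distance to the nearest one is averaged over all synthetic shapes. The function names, the mean per-vertex Euclidean distance, and the jittered toy data (standing in for samples drawn from the model) are illustrative assumptions, not the implementation used in this study.

```python
import math
import random


def mean_vertex_distance(shape_a, shape_b):
    """Mean Euclidean distance between corresponding vertices of two shapes."""
    return sum(math.dist(p, q) for p, q in zip(shape_a, shape_b)) / len(shape_a)


def specificity(synthetic_shapes, real_shapes):
    """Average distance from each synthetic shape to its nearest real shape."""
    nearest = [
        min(mean_vertex_distance(s, r) for r in real_shapes)
        for s in synthetic_shapes
    ]
    return sum(nearest) / len(nearest)


# Toy data (assumption): two "real" shapes with three vertices each, in mm.
real = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 1.0, 5.0)],
]

# Stand-in for sampling from a shape model: jitter a randomly chosen real shape.
rng = random.Random(0)
synthetic = [
    [tuple(c + rng.gauss(0.0, 0.1) for c in v) for v in rng.choice(real)]
    for _ in range(1000)
]

score = specificity(synthetic, real)
print(f"specificity: {score:.3f} mm")
```

A lower score means the synthetic samples lie closer to real data. In this sketch the score is driven entirely by the jitter applied to the toy shapes, so its value carries no meaning beyond illustrating the calculation.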
Manifold Visualisation
To assess the diagnostic capacity of the model, t-distributed stochastic neighbour embedding (t-SNE) was applied to the high-dimensional latent vectors of the patients and volunteers (Fig. 1). When samples are labelled by syndromic class, clear clusters emerge for the face-only healthy and syndromic groups (Apert, Crouzon, and Muenke). The two clusters that are observed for the healthy individuals are due to age, with samples from the paediatric and LSFM datasets clustering separately. Within the syndromic cluster, we observe further sub-clusters for the included syndromes. The clusters formed for the head-only embeddings are not as distinct as those observed for the face-only cases; however, groups for Apert, Crouzon, Muenke, and healthy individuals do emerge. When considering the model constructed using the combined head-and-face template, we again see clear clusters forming for the different subgroups in the dataset.
In all cases, even though the syndromic samples tend to group more tightly together, it is noted that the syndromes themselves seem relatively disentangled. The proximity of the Crouzon cluster to the normal cases in each of the embeddings indicates that this phenotype has a milder manifestation than either Apert or Muenke syndrome. For both the head-only and combined head-and-face embeddings, we note that a number of Muenke and Crouzon samples cluster closer to the group of healthy cases.
Syndrome Classification
Classification was performed with all syndromic and non-syndromic scans. A split of 80%-20% for training and testing data was assessed over 1,000 iterations. The mean sensitivity, specificity, and accuracy over all iterations for each of the assessed regions in the binary classification experiment, and the confusion matrices for both binary and multi-class classification, are shown in Table 2 and Fig. 2, respectively.
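The repeated 80%-20% evaluation can be sketched as follows. This is a hedged illustration only: the one-dimensional toy features, the nearest-centroid classifier, and all function names are assumptions chosen to keep the example self-contained, and they do not represent the classifier or features used in this study.

```python
import random


def split_indices(n, test_frac, rng):
    """Shuffle the indices 0..n-1 and return (train, test) index lists."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = max(1, round(n * test_frac))
    return idx[n_test:], idx[:n_test]


def nearest_centroid_predict(train_x, train_y, test_x):
    """Toy 1-D nearest-centroid classifier (a stand-in, not the study's model)."""
    centroids = {}
    for label in set(train_y):
        values = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = sum(values) / len(values)
    return [min(centroids, key=lambda c: abs(x - centroids[c])) for x in test_x]


def repeated_holdout(xs, ys, n_iter=1000, test_frac=0.2, seed=0):
    """Mean test accuracy over n_iter random train/test splits."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_iter):
        train, test = split_indices(len(xs), test_frac, rng)
        preds = nearest_centroid_predict(
            [xs[i] for i in train], [ys[i] for i in train], [xs[i] for i in test]
        )
        correct = sum(p == ys[i] for p, i in zip(preds, test))
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


# Toy data (assumption): one well-separated feature per subject.
data_rng = random.Random(1)
xs = [data_rng.gauss(0.0, 1.0) for _ in range(50)]
xs += [data_rng.gauss(10.0, 1.0) for _ in range(50)]
ys = ["healthy"] * 50 + ["syndromic"] * 50

mean_acc = repeated_holdout(xs, ys)
print(f"mean accuracy over 1000 splits: {mean_acc:.3f}")
```

Averaging over many random splits reduces the dependence of the reported figures on any single partition of the data, which matters when some classes contain only a handful of subjects.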
Binary classification to identify whether an individual belonged to the syndromic or the non-syndromic group showed accuracies of greater than 99% in all cases. The high sensitivity of the models indicates that very few syndromic cases were misidentified as normal volunteers (one, sixteen, and two cases per thousand for the face-only, head-only, and combined models, respectively). The inverse is also true: as indicated by the high model specificity values, healthy volunteers were seldom, if ever, misidentified as having a craniofacial syndrome.
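For reference, the relationship between these metrics and the per-thousand figures quoted above can be written out as a short sketch. The standard definitions of sensitivity, specificity, and accuracy are used; the function names are illustrative, and the only data are the sensitivities reported in Table 2.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy (as percentages) from confusion counts.

    tp: syndromic cases classified as syndromic
    fn: syndromic cases classified as healthy
    tn: healthy cases classified as healthy
    fp: healthy cases classified as syndromic
    """
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def missed_per_thousand(sensitivity_pct):
    """Expected syndromic cases labelled healthy per 1000 syndromic cases."""
    return (100.0 - sensitivity_pct) * 10.0


# Sensitivities reported in Table 2.
reported = [("Face Only", 99.95), ("Head Only", 98.36), ("Head and Face", 99.82)]
for model, sens in reported:
    print(f"{model}: {missed_per_thousand(sens):.1f} missed per 1000 syndromic cases")
```

The three values come to approximately 0.5, 16.4, and 1.8 missed cases per thousand, consistent with the one, sixteen, and two cases per thousand quoted in the text.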
The multi-class classification model endeavoured to predict whether a patient belonged to the non-syndromic, Apert, Crouzon, or Muenke category. Accuracies of 98.3%, 97.9%, and 98.2% were observed for the face-only, head-only, and combined regions, respectively. For the face-only model, Muenke patients were the most likely to be misdiagnosed. As there are the fewest instances of this syndrome in the database, such results are to be expected, and increasing the quantity of Muenke samples would likely lead to increased accuracy for these patients. When the head shape was considered for classification, Crouzon and Apert patients were most likely to be misidentified as each other. As with the binary classification, the poorest performance was seen for the head-only model. These results indicate that the facial region contains valuable shape information for the correct diagnosis of SC.
Our method has proven at least as sensitive as expert diagnosis in a multidisciplinary clinic, and in one case more sensitive. A 2-year-old sibling of a patient with Crouzon syndrome, judged clinically to be unaffected, proved to have Crouzon syndrome on genetic testing. Three images subsequently analysed by our technique demonstrated the ability of the model to diagnose clinically undetectable variations from the norm (Fig. 3).
Table 2
Classification results for the binary classification experiments.
Model | Sensitivity (%) | Specificity (%) | Accuracy (%) |
Face Only | 99.95 | 100.00 | 99.98 |
Head Only | 98.36 | 99.41 | 99.09 |
Head and Face | 99.82 | 100.00 | 99.95 |