Characteristics of the eligible studies
We searched the relevant databases and screened the abstracts and full texts of the retrieved articles. Seventeen studies were included in the analysis [8, 16, 17, 21–34].
The retrieved studies were published between 2015 and 2020. Nine of the seventeen studies were from China, four from the United States, and one each from the United Kingdom, Australia, Japan, and Singapore. Six studies investigated models of the nervous system, and five investigated heart models (Table 1). The methodological quality of most studies was rated high or moderate; details of the quality assessment appear in Supplementary Table S1. In all 17 studies, participants were allocated to groups by randomization, but only a few studies described the method of random number generation in detail, and none described any blinding. In one study, a student dropped out of the test [22]. Because the number of students included in these studies was relatively small, some bias may be present [16, 22, 25].
Meta-analyses
1. Post-training tests
1.1 Nervous system model
Six studies compared 3D printed models with conventional nervous system models [17, 21, 26, 28, 31, 34]. There were 198 participants in the experimental (3D printing) group and 195 in the control group. The pooled results showed a significant difference between the two groups (standardized mean difference [SMD]: 1.27, 95% confidence interval [CI]: 0.82–1.72, P < 0.05; Fig. 1), indicating that the 3D printing group performed better than the conventional group.
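As a reading aid, the reported SMD and confidence interval can be converted back into an approximate standard error, z statistic, and two-sided P-value. The sketch below assumes a normal approximation and a confidence interval that is symmetric around the estimate; the function name `z_and_p_from_ci` is hypothetical and this is not the software used for the analysis.

```python
from statistics import NormalDist

def z_and_p_from_ci(estimate, lower, upper, level=0.95):
    """Back-calculate SE, z and a two-sided p-value from an estimate and
    its confidence interval (assumes a symmetric, normal-based interval)."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    se = (upper - lower) / (2 * z_crit)             # half-width divided by z_crit
    z = estimate / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p

# Values reported for the nervous system comparison: SMD 1.27, 95% CI 0.82 to 1.72
se, z, p = z_and_p_from_ci(1.27, 0.82, 1.72)
print(f"SE = {se:.3f}, z = {z:.2f}, p < 0.05: {p < 0.05}")
```

Under these assumptions the interval excludes zero, which agrees with the reported P < 0.05.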
1.2 Heart model
Five studies compared 3D printed heart models with conventional heart models [22–25, 29]. These studies included a total of 100 participants in the 3D printing group and 102 participants in the conventional group. Tests were administered after instruction with the 3D printed models or with conventional methods. The test scores were continuous variables. Because the scoring standards differed across studies, we used the SMD to pool the means. The results showed no significant difference between the two groups (SMD: 0.37, 95% CI: −0.25 to 0.98, P > 0.05; Fig. S1). Therefore, the 3D printing group did not perform significantly better than the conventional group.
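To illustrate why an SMD allows scores on different scales to be combined, the following sketch computes a standardized mean difference with the Hedges small-sample correction. The choice of Hedges' g is an assumption (the text states only that an SMD was used), and the means, standard deviations, and sample sizes are hypothetical.

```python
import math

def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference (Hedges' g) and its standard error."""
    df = n_t + n_c - 2
    # Pooled standard deviation of the two groups
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / sd_pooled               # Cohen's d
    j = 1 - 3 / (4 * df - 1)                        # small-sample correction factor
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return j * d, j * math.sqrt(var_d)

# Hypothetical studies: one scored out of 100, one scored out of 10
g_100, se_100 = hedges_g(80.0, 10.0, 20, 75.0, 10.0, 20)
g_10, se_10 = hedges_g(8.0, 1.0, 20, 7.5, 1.0, 20)
print(f"g (100-point test) = {g_100:.3f}, g (10-point test) = {g_10:.3f}")
```

Both hypothetical studies yield the same g even though their raw score scales differ by a factor of ten, which is the property that makes pooling possible.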
1.3 Abdominal anatomy
Three studies on abdominal anatomy were included in this analysis [16, 26, 30]. The pooled results showed a significant difference between the two groups (SMD: 2.01, 95% CI: 0.55–3.46, P < 0.05; Fig. S2), indicating that the 3D printing group scored better than the control group.
1.4 3D vs cadaver
Four studies compared 3D printed models with cadaver specimens [17, 22, 27, 34]. There were 153 participants in the 3D printing group and 149 in the cadaver specimen group. The results showed a significant difference between the two groups (SMD: 0.69, 95% CI: 0.27–0.99, P < 0.05; Fig. 2); that is, the 3D printing group performed better than the cadaver specimen group.
1.5 3D vs 2D
Ten studies compared 3D printed models with 2D pictures [16, 21, 23, 24, 26, 28, 29, 30, 31, 33]. There were 379 participants in the 3D printed model group and 378 in the 2D picture group. The results showed a significant difference between the two groups (SMD: 1.05, 95% CI: 0.64–1.64, P < 0.05; Fig. S3); that is, the 3D printing group performed better than the 2D group.
2. Answering time
Three studies compared answering time between the 3D printing groups and the conventional groups [21, 25, 26]. The random effects model showed a statistically significant difference (SMD: −0.61, 95% CI: −0.98 to −0.24, P < 0.05; Supplementary Fig. S4), suggesting that answering time was shorter in the 3D printing groups than in the conventional groups.
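For readers unfamiliar with random effects pooling, the sketch below shows one common estimator, the DerSimonian-Laird method. It is an assumption that this particular estimator applies here (the text says only that a random effects model was used), and the per-study effects and standard errors are hypothetical.

```python
import math
from statistics import NormalDist

def pool_random_effects(effects, ses):
    """DerSimonian-Laird random effects pooling.
    Returns (pooled estimate, 95% CI lower, 95% CI upper, tau squared)."""
    w = [1 / se**2 for se in ses]                   # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                   # between-study variance
    w_re = [1 / (se**2 + tau2) for se in ses]       # random effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    z_crit = NormalDist().inv_cdf(0.975)
    return pooled, pooled - z_crit * se_pooled, pooled + z_crit * se_pooled, tau2

# Hypothetical per-study SMDs for answering time (negative = faster with 3D models)
pooled, lower, upper, tau2 = pool_random_effects([-0.9, -0.5, -0.4], [0.30, 0.25, 0.35])
print(f"pooled SMD = {pooled:.2f} (95% CI {lower:.2f} to {upper:.2f}), tau^2 = {tau2:.3f}")
```

When the between-study variance is estimated as zero, the random effects weights reduce to the fixed-effect weights and the two models give the same pooled estimate.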
3. Usefulness
Three studies compared the perceived usefulness of 3D printed models and conventional models [16, 21, 32]. The random effects model showed a statistically significant difference (RR = 2.29, 95% CI: 1.22–4.27, P < 0.05; Fig. 3), suggesting that instruction with 3D printed models was rated as more useful than instruction with conventional models.
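The RR for a single study can be sketched as follows, assuming that usefulness was recorded as a dichotomous outcome (rated useful or not) and that the interval is built on the log scale, which is the usual approach. The counts are hypothetical and are not taken from any included study.

```python
import math
from statistics import NormalDist

def risk_ratio(events_t, n_t, events_c, n_c, level=0.95):
    """Risk ratio with a log-scale confidence interval for one 2x2 table."""
    rr = (events_t / n_t) / (events_c / n_c)
    # Standard error of log(RR)
    se_log = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(rr) - z_crit * se_log)
    upper = math.exp(math.log(rr) + z_crit * se_log)
    return rr, lower, upper

# Hypothetical counts: 24 of 30 students rated the 3D model useful, 12 of 30 the conventional model
rr, rr_lower, rr_upper = risk_ratio(24, 30, 12, 30)
print(f"RR = {rr:.2f} (95% CI {rr_lower:.2f} to {rr_upper:.2f})")
```

An interval that lies entirely above 1, as for the pooled RR reported above, indicates that the outcome was more frequent in the 3D printing group.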
4. Satisfaction
Six studies described the level of satisfaction in the 3D printing and conventional groups [16, 23, 25, 26, 33, 34]. Five studies indicated that students in the 3D printing group were more satisfied than students in the conventional group. Only one study reported no statistically significant difference in satisfaction between the two groups (Supplementary Table S2).
5. Accuracy
Two studies investigated answering accuracy in the 3D printing and conventional groups [32, 35]. Both studies were descriptive and did not provide data that could be pooled. They found that answering accuracy was better in the 3D printing group than in the conventional group (Supplementary Table S3).
6. Sensitivity analysis
For the nervous system studies, each time one study was removed and the remaining data were pooled, the P-value remained below 0.05 (Fig. 4), suggesting that the result was stable and reliable. Similarly, for the comparison of 3D printed models with cadavers, we omitted one study at a time and recalculated the pooled estimate (Fig. 5). Each time a study was omitted, the P-value of the pooled estimate remained < 0.05, suggesting that this result was also stable and reliable.
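The leave-one-out procedure described above can be sketched as a loop that drops each study in turn and re-pools the rest. For brevity this sketch uses fixed-effect inverse-variance pooling as a stand-in (an assumption; any pooling function with the same signature could be passed in), and the study data are hypothetical.

```python
import math
from statistics import NormalDist

def pool_inverse_variance(effects, ses):
    """Fixed-effect inverse-variance pooling; returns (estimate, two-sided p)."""
    weights = [1 / se**2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    p = 2 * (1 - NormalDist().cdf(abs(pooled / se_pooled)))
    return pooled, p

def leave_one_out(effects, ses, pool=pool_inverse_variance):
    """Re-pool the data with each study omitted in turn."""
    results = []
    for i in range(len(effects)):
        rest_effects = effects[:i] + effects[i + 1:]
        rest_ses = ses[:i] + ses[i + 1:]
        results.append((i, *pool(rest_effects, rest_ses)))
    return results

# Hypothetical SMDs and standard errors for six studies
effects = [1.6, 0.9, 1.4, 1.1, 1.5, 1.0]
ses = [0.30, 0.25, 0.35, 0.28, 0.32, 0.27]
results = leave_one_out(effects, ses)
for omitted, estimate, p in results:
    print(f"omit study {omitted + 1}: pooled SMD = {estimate:.2f}, p < 0.05: {p < 0.05}")
```

If every re-pooled P-value stays below 0.05, no single study drives the conclusion, which is the sense in which the result is described as stable.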
7. Test for publication bias
For the post-training test comparison of 3D printed and conventional models (nervous system; funnel plot in Fig. 6), both Egger's and Begg's tests gave P > 0.05, indicating a symmetrical distribution and no evidence of publication bias. However, for the pooled comparison of 3D vs 2D, P < 0.05, suggesting that publication bias may be present.
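Egger's test can be sketched as an ordinary least squares regression of each study's standardized effect (effect divided by its standard error) on its precision (one divided by the standard error); an intercept far from zero suggests funnel plot asymmetry. The sketch below returns the intercept and its t statistic only; the P-value would come from a t distribution with k − 2 degrees of freedom, which is not computed here. The function name and the data are hypothetical.

```python
import math

def egger_intercept(effects, ses):
    """Egger's regression: returns (intercept, SE of intercept, t statistic, df).
    Requires at least three studies whose standard errors are not all equal."""
    n = len(effects)
    if n < 3:
        raise ValueError("Egger's regression needs at least three studies")
    x = [1 / se for se in ses]                      # precision
    y = [e / se for e, se in zip(effects, ses)]     # standardized effect
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    resid_var = sse / (n - 2)
    se_intercept = math.sqrt(resid_var * (1 / n + x_bar**2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else math.nan
    return intercept, se_intercept, t_stat, n - 2

# Hypothetical data in which smaller studies (larger SE) report larger effects
intercept, se_int, t_stat, df = egger_intercept(
    [0.30, 0.45, 0.70, 0.80, 1.10], [0.1, 0.2, 0.3, 0.4, 0.5]
)
print(f"Egger intercept = {intercept:.2f}, t = {t_stat:.2f} on {df} df")
```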