3.1. Human Subjects Study I – Engineering Graduate Students (Quantitative)
The pre-task survey shows that the two sets of participants (MRCS and non-MRCS conditions) reported similar perceived skills when asked to evaluate their skillset before the human study task execution: the mean ratings are approximately 3.80 out of 10 with similar standard deviations, and the difference between the groups is not statistically significant (p = 0.951), as shown in Table 6. Both sets of study participants therefore start from a comparable self-rated baseline.
Table 6
Descriptive statistics: pre-task-execution
Type | N | Mean | StDev |
MRCS | 20 | 3.80 | 2.55 |
Non-MRCS | 20 | 3.75 | 2.59 |
In addition, the study participants rated their confidence in surgical skill planning and task execution. About 90% identified as incompetent with the task at hand, or with any surgical task, as shown in Fig. 10. Only 10% reported being somewhat competent in surgical skill planning and task execution, and no study participant selected Competent or Very Competent. These responses help confirm that the participants have no medical training.
The average incision depth was calculated for each of the two study populations. Although the target depth was 2.5 mm for both groups, neither group achieved it on average. The MRCS condition participants nevertheless produced an average incision depth of 3.304 mm, closer to the target than the 4.349 mm average of the non-MRCS condition participants, which supports the MRCS system's effectiveness in improving participant performance (Table 7). The difference between the two averages is statistically significant (p < 0.05). In addition, the standard deviation for the MRCS participants (0.474) is lower than that for the non-MRCS participants (0.528), indicating that the MRCS participants' incision depths are less spread around their mean, that is, more consistent.
Table 7
Performance metrics for human subject study participants
| Non-MRCS | MRCS |
Sample Size | 20 | 20 |
Target Depth (mm) | 2.5 | 2.5 |
Average Incision Depth (mm) (Effectiveness) | 4.349 | 3.304 |
St Dev (Incision Depth) | 0.528 | 0.474 |
SE Mean | 0.084 | 0.075 |
Minimum Zero Difference (Proficiency Gain) | N/A | N/A¹ |
¹ Minimum Zero Point is not reached.
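The gap between each group's average depth and the 2.5 mm target can be made explicit with a little arithmetic. The following is a minimal sketch using only the values in Table 7; the function names are illustrative, and the two sample sizes passed to `standard_error` (20 participants, 40 attempts) are assumptions used to probe how the reported SE Mean may have been computed.

```python
import math

TARGET_MM = 2.5  # target incision depth from Table 7

# Group summaries copied from Table 7: (mean depth in mm, standard deviation)
groups = {
    "Non-MRCS": (4.349, 0.528),
    "MRCS": (3.304, 0.474),
}

def overshoot(mean_depth_mm, target_mm=TARGET_MM):
    """Return (overshoot in mm, overshoot as a percentage of the target)."""
    diff = mean_depth_mm - target_mm
    return diff, 100.0 * diff / target_mm

def standard_error(stdev, n):
    """Standard error of a mean computed from n values."""
    return stdev / math.sqrt(n)

for name, (mean_depth, stdev) in groups.items():
    diff, pct = overshoot(mean_depth)
    print(f"{name}: overshoot {diff:.3f} mm ({pct:.1f}% of target), "
          f"SE with n=20: {standard_error(stdev, 20):.3f}, "
          f"SE with n=40: {standard_error(stdev, 40):.3f}")
```

With these numbers the MRCS group overshoots the target by about 0.80 mm and the non-MRCS group by about 1.85 mm. The SE Mean values in Table 7 agree, to within rounding, with the standard deviation divided by the square root of 40 rather than 20, which would be consistent with statistics computed over the 40 per-iteration group averages; the text does not state this, so it should be read as an inference only.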
Proficiency gain, defined as the point at which a near-zero difference between the target and actual values is achieved and maintained, was not observed in either group before the end of the 40-attempt task execution. This is likely because the participants lack prior surgical experience and the task was limited to 40 attempts.
The data are then analyzed as the average difference at each iteration across all study participants within each group. The target value of 2.5 mm is plotted as a negative value because the study participants cut a vertical depth into simulated tissue. Fig. 11 shows that the MRCS condition participants started much closer to the target, at a difference of approximately 1.7 mm from the 2.5 mm target value, compared with a difference of approximately 3.2 mm for the non-MRCS condition participants. The slopes in Fig. 11 show that the vertical incision depth for both groups trends toward the desired zero difference from the target. In addition, the slope for the non-MRCS group (0.0168) is steeper than that for the MRCS group (0.0117), indicating that the non-MRCS condition participants improved faster per iteration despite starting much farther from the target.
The intercept of the trendline for the MRCS condition group is also smaller in magnitude, at -3.5432 mm compared with -4.6926 mm for the non-MRCS condition group (Table 8), showing that the MRCS participants start at an incision depth much closer to the target value of 2.5 mm.
Table 8
Trendline for MRCS and non-MRCS condition participants
| Non-MRCS | MRCS |
Slope | 0.0168 | 0.0117 |
Intercept | -4.6926 | -3.5432 |
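To see what the two trendlines in Table 8 imply, the sketch below evaluates them over the task and extrapolates them to the target. It assumes the lines are fitted to signed depth (in mm, with the target at -2.5 mm) against iteration number 1 to 40; the linear extrapolation beyond iteration 40 is purely hypothetical, and the function names are illustrative.

```python
TARGET_MM = -2.5    # target depth, plotted as a negative value
N_ITERATIONS = 40   # attempts in the task

# (slope, intercept) copied from Table 8
trendlines = {
    "Non-MRCS": (0.0168, -4.6926),
    "MRCS": (0.0117, -3.5432),
}

def predict(slope, intercept, iteration):
    """Trendline value (signed depth, mm) at a given iteration."""
    return slope * iteration + intercept

def mean_over_task(slope, intercept, n=N_ITERATIONS):
    """Average trendline value over iterations 1..n."""
    return sum(predict(slope, intercept, i) for i in range(1, n + 1)) / n

def iterations_to_target(slope, intercept, target=TARGET_MM):
    """Iteration at which the line would cross the target if it stayed linear."""
    return (target - intercept) / slope

for name, (slope, intercept) in trendlines.items():
    print(f"{name}: at iteration {N_ITERATIONS} = "
          f"{predict(slope, intercept, N_ITERATIONS):.3f} mm, "
          f"mean over task = {mean_over_task(slope, intercept):.3f} mm, "
          f"crosses target near iteration "
          f"{iterations_to_target(slope, intercept):.0f}")
```

Under these assumptions the trendline averages over the 40 iterations come out near -3.30 mm (MRCS) and -4.35 mm (non-MRCS), in line with the average depths in Table 7, and neither line reaches -2.5 mm within 40 iterations. If both trends stayed linear, the MRCS line would cross the target earlier than the non-MRCS line, so the steeper non-MRCS slope would not by itself make up for the worse starting point.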
The study participants who used the MRCS were far more likely to stay close to their group average, as evidenced by the clustering of their data points around the group mean as the iteration count increased. Non-MRCS condition participants performed much worse over time, with a group mean of 4.46 mm, higher than the MRCS condition group mean of 3.38 mm. The non-MRCS condition participants also failed to cluster within two standard errors of their group mean. The clustering of incision depths within two standard errors of the group mean for the MRCS condition participants indicates that the MRCS platform kept these participants' task execution much more consistent than that of the non-MRCS condition participants, whose performance did not settle within two standard errors as they continued to perform the task.
To better understand the study participants' self-perception of performance, their survey responses were analyzed to determine their confidence and task execution satisfaction; participants were asked to self-rate both (Table 9). Despite reporting low confidence in their skillset before the study task (Fig. 10), both groups perceived their performance positively after the study, as shown in Table 9. The non-MRCS participants reported much higher task execution satisfaction (70%) despite performing worse: they had much larger average incision depths and failed to reach task proficiency. The MRCS participants reported lower task execution satisfaction (50%), likely because the MRCS guided aids gave them real-time feedback on their task execution performance. In addition, nearly 90% of the MRCS condition participants rated their test setup as an effective aid in achieving their tasks, compared with 70% of the non-MRCS condition participants.
Table 9
Survey of study participants' self-evaluation
Description | Non-MRCS | MRCS |
User Surgical Task Execution Satisfaction (Post-Test Satisfaction) | 70% | 50% |
User Assessment of Effectiveness of Test Setup (medium of helpfulness) | 70% | 90% |
The study participants also improved their self-confidence in task planning and execution through the training. The MRCS condition participants showed a marked gain in task execution confidence: the share rating themselves somewhat competent in task planning and execution rose from 5% before the human study to 65% after learning in that environment, as shown in Fig. 12.
The non-MRCS study participants did not see a change greater than 50% in any category; only 10% moved into the somewhat competent category for task execution confidence, as shown in Fig. 13.
To better understand the training effectiveness, the participants rated, on a scale of 1–10, their study participation and how useful the study had been; the results are displayed in Table 10. In this survey, the MRCS study participants felt that the study had been more valuable in helping them make an effective hernia surgical plan than the non-MRCS participants did. They reported correctly identifying and planning their route for that procedure and recognizing the different anatomy around the target site. They also reported a more positive impact on their understanding of task planning and execution than the non-MRCS study participants.
A 2-sample t-test compares the mean self-perceived performance ratings pre- and post-task execution within each group, and between the two groups post-task execution, as shown in Table 10. With a p-value threshold of 0.05, neither group showed a statistically significant change in self-perceived performance from pre- to post-task execution (p = 0.097 for the MRCS condition and p = 0.234 for the non-MRCS condition), nor did the two groups differ significantly from each other at post-task execution (p = 0.470).
Since both conditions (with and without the MRCS) are new environments for these participants, they may have difficulty perceiving their own improvement, even though the MRCS participants acknowledged in the post-task completion surveys that the MRCS helped them, and reported larger confidence gains than the non-MRCS participants, as shown in Fig. 12 and Fig. 13.
Table 10 also shows no significant difference between the MRCS and non-MRCS groups when the participants self-evaluate their performance in the human study, which could suggest that neither group perceived itself as having done better. Yet when actual performance, measured as vertical incision depth, is compared, the MRCS group outperformed the non-MRCS group, as shown in Table 7.
Table 10
Study participants' self-evaluation ratings of pre- and post-task execution (TE) performance
Sample | N | Mean | StDev | P-value |
Pre_TE_MRCS | 20 | 3.80 | 2.55 | 0.097 |
Post_TE_MRCS | 20 | 5.20 | 2.65 | |
Pre_TE_NoMRCS | 20 | 3.75 | 2.59 | 0.234 |
Post_TE_NoMRCS | 20 | 4.65 | 2.08 | |
Post_TE_MRCS | 20 | 5.20 | 2.65 | 0.470 |
Post_TE_NoMRCS | 20 | 4.65 | 2.08 | |
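The p-values in Table 10 (and the pre-task comparison in Table 6) can be approximately recomputed from the reported means, standard deviations, and sample sizes alone. The sketch below assumes an unequal-variance (Welch) two-sample t-test, which the text does not specify, and obtains the two-sided p-value by numerically integrating the Student's t density so that only the standard library is needed. The function names are illustrative.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * integral of the density over [0, |t|] (Simpson's rule)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * total * h / 3.0

# (mean, standard deviation, N) copied from Tables 6 and 10
comparisons = {
    "Pre-TE, MRCS vs non-MRCS": ((3.80, 2.55, 20), (3.75, 2.59, 20)),
    "MRCS, pre vs post": ((3.80, 2.55, 20), (5.20, 2.65, 20)),
    "Non-MRCS, pre vs post": ((3.75, 2.59, 20), (4.65, 2.08, 20)),
    "Post-TE, MRCS vs non-MRCS": ((5.20, 2.65, 20), (4.65, 2.08, 20)),
}

for label, (a, b) in comparisons.items():
    t, df = welch_t(*a, *b)
    print(f"{label}: t = {t:.3f}, df = {df:.1f}, p = {two_sided_p(t, df):.3f}")
```

Because the inputs are rounded summary statistics, the recomputed p-values can differ from the reported ones in the last digit; under the Welch assumption they come out close to 0.951, 0.097, 0.234, and 0.470.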
The study participants were then surveyed, post-task execution, on perceived workload. Using the raw unweighted scores, the MRCS study participants rated their workload higher than the non-MRCS study participants in five of the six sub-scales and in overall workload (Fig. 14). This is likely due to the headset's weight and the time the MRCS study participants needed to walk through the tutorial and then execute the task. Despite this additional workload, they still outperformed their non-MRCS counterparts. The MRCS study participants scored lower only on temporal workload, as they did not feel the task required as much time to master and execute as the non-MRCS study participants did.
There is no statistically significant difference between the two groups of study participants in perceived mental, physical, and frustration workload at a threshold p-value of 0.05, as shown in Table 11. However, the p-values for Temporal, Performance, and Effort fall below the 0.05 threshold, at 0.027, 0.046, and 0.021, respectively. The higher Performance and Effort workload for the MRCS group may be attributed to the time required to master and perfect their technique, as the stream of images requires these study participants to engage more with the system. The temporal workload, however, is higher for the non-MRCS group, as they need more guidance on what to expect from the task: they have no image in front of them during the task to guide them toward the intended outcome.
3.2. Human Subjects Study II – Medical Residents (Qualitative)
Seven medical residents were asked to perform the same study as the MRCS condition, with the same constraints and time. Their responses regarding use of the MRCS were recorded and analyzed to capture their feedback accurately. The first level of feedback concerns the medical residents' interaction with the 3D bioprinted specimen. The specimen's responsiveness to incision was well received, as was the ability of the post-treated bioprint to mimic diseased tissue, which made it hard for the medical residents to know when to stop at the precise depth at the start of the test execution. This is reflected in their vertical incision depth performance: for four of the seven residents, the average depth was 4.5 mm, beyond the 2.5 mm target. According to their feedback, three of the seven residents initially cut through the full height of the 3D bioprinted specimen, as they underestimated the simulated diseased tissue's tactile response before understanding its texture and adjusting.
The medical residents also responded favorably to the image manipulation tutorials and the anatomy identification portion of the task execution exercise. They highlighted the training as instrumental in their thought process for planning and approaching the surgical site for task execution. The medical residents also expressed concern about the motion allowed within the virtual image, as their surgical technique required specific access to the surgical site. Moving forward, any user-activated virtual image interaction would need additional degrees of freedom beyond the translation, resizing, and rotation that the current virtual image supports. The medical residents also expressed concern about the contrast of the virtual image with the environment, which made it harder to see at certain angles. This could be due to the lighting in the room, but it is a potential area for future improvement.
Regarding use of the MRCS, the medical residents responded positively when asked to gauge their confidence in using it, as shown in Fig. 15. Of the seven medical residents, only one indicated they were still very incompetent at surgically planning and executing the specific study procedure, compared with three before the start of the human study. The most significant jump in confidence in surgical planning and execution is that at least five medical residents rated themselves at least somewhat competent after interacting with the MRCS. The only resident who had rated themselves very competent before the study downgraded to somewhat competent afterward. This downgrade may reflect the resident realizing that they were not as informed about surgical planning and execution as they initially thought, having overstated their prior knowledge. Interaction with the MRCS training portion before task execution could have led to this realization of a limited skill set.
A comparison of the post-TE confidence ratings of the non-medical graduate engineering students of Study I and the medical residents in this study shows no statistically significant difference between the two groups (p = 0.571, Table 12). Both groups reported comparable confidence in having executed the task using the MRCS platform.
Table 12
Medical resident and graduate engineering student self-evaluation ratings of post-task execution (TE) performance.
Study participant | N | Mean | Standard Deviation | P-Value |
Post-TE graduate engineering student | 20 | 5.20 | 2.65 | 0.571 |
Post-TE medical resident | 7 | 4.43 | 3.10 | |
The workload ratings by the medical residents indicate that the effort and frustration required to perform the surgical task are substantial, as shown in Fig. 16. This is similar to the engineering graduate students, as both sets required time to acclimate to the MRCS platform's user interface when targeting the 2.5 mm vertical incision.
Table 13
Two-sample t-test comparison of NASA TLX workload between medical residents and graduate engineering students using the MRCS.
Variable | Type | N | Mean | StDev | P-Value |
Mental | Graduate engineering student | 20 | 31.3 | 21.1 | 0.868 |
| Medical resident | 7 | 30.0 | 15.0 | |
Physical | Graduate engineering student | 20 | 24.5 | 19.6 | 0.175 |
| Medical resident | 7 | 14.3 | 14.8 | |
Temporal | Graduate engineering student | 20 | 13.3 | 16.8 | 0.870 |
| Medical resident | 7 | 12.1 | 14.4 | |
Performance | Graduate engineering student | 20 | 43.8 | 26.2 | 0.815 |
| Medical resident | 7 | 41.4 | 20.6 | |
Effort | Graduate engineering student | 20 | 41.3 | 19.7 | 0.521 |
| Medical resident | 7 | 47.9 | 23.4 | |
Frustration | Graduate engineering student | 20 | 25.5 | 24.6 | 0.019 |
| Medical resident | 7 | 51.4 | 20.8 | |
The temporal workload is highest for the non-MRCS engineering graduate students. This is probably because they had to rely on their intuition to target a vertical incision depth and had no marker to tell them exactly when to stop. Given the task's iterative nature, this could have led them to feel rushed and hurried while performing the job. The MRCS participants (graduate students and medical residents) benefitted from the MRCS letting them know exactly when to stop. Still, the medical residents reported significantly higher frustration with the MRCS than the graduate students (p = 0.019, Table 13). This may be because the environment differs from what they expect or usually experience. There is no statistically significant difference between these two groups in the other five workload sub-scales when engaged with the MRCS.
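As a rough illustration of how the six sub-scales in Table 13 combine into an overall workload figure, the sketch below computes a raw (unweighted) TLX score as the arithmetic mean of the six sub-scale means for each MRCS user group. It assumes every participant rated all six sub-scales, so that averaging the group means equals the group mean of per-participant scores; the names are illustrative and the overall values are not reported in the text.

```python
# Sub-scale means copied from Table 13 (both groups used the MRCS)
subscale_means = {
    "Graduate engineering student": {
        "Mental": 31.3, "Physical": 24.5, "Temporal": 13.3,
        "Performance": 43.8, "Effort": 41.3, "Frustration": 25.5,
    },
    "Medical resident": {
        "Mental": 30.0, "Physical": 14.3, "Temporal": 12.1,
        "Performance": 41.4, "Effort": 47.9, "Frustration": 51.4,
    },
}

def raw_tlx(scores):
    """Raw (unweighted) TLX: arithmetic mean of the six sub-scale ratings."""
    return sum(scores.values()) / len(scores)

def largest_gap(group_a, group_b):
    """Sub-scale with the largest absolute difference between two groups."""
    return max(group_a, key=lambda k: abs(group_a[k] - group_b[k]))

students = subscale_means["Graduate engineering student"]
residents = subscale_means["Medical resident"]

for name, scores in subscale_means.items():
    print(f"{name}: raw TLX = {raw_tlx(scores):.2f}")
print("Largest sub-scale gap:", largest_gap(students, residents))
```

On these inputs the two overall scores are within about three points of each other, and the largest sub-scale gap is Frustration, the only sub-scale with a significant p-value in Table 13.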
3.2.1. Proctor Evaluation
The medical residents demonstrated familiarity with the patient positioning aspect of the exercise. They knew how and when to identify areas where learners would need more guidance and instruction. Although these scenarios call for additional fidelity, the current setup is adequate as a learning and training tool for personnel with limited exposure to medical training to become proficient.
The students were able to assess the surgical site and area for proper access and, despite the confines of the virtual imagery, to place the virtual specimen in the correct position for entry. Assessing the surgical site for entry means the student physically engages with the site to get a visual and tactile feel for the tissue response. This assessment happened throughout the surgical site setup and task execution. However, the students did not adjust their technique to ensure an accurate incision depth; the video review shows that they had not achieved dexterous tissue manipulation. This can be addressed with additional engagement with the MRCS, as iterative exposure is expected to shorten the timeline for skill acquisition.