The study collected pre- and post-study questionnaires, eye tracking recordings, and experiment video footage from the forty participants. Based on these data, we analysed the influence of the respective user interfaces on the participants’ cognitive load, as well as the usability and learnability of the user interfaces. Specifically, cognitive load was measured through blink duration, fixation duration, gaze areas, and the NASA-TLX scale.
4.1 Cognitive load
4.1.1 Blink duration
Eye blinks reflect the participants’ visual attention activities (Zagermann et al, 2016), and long blink duration has been used as an indicator of high cognitive load (Behroozi et al, 2018). We used the Pupil Core eye tracking system to collect eye movement data from each participant and processed the data to remove outliers, which yielded the average blink duration for each of the two groups. We calculated both the mean and the median to make the results more comprehensive and reliable. The results showed descriptive differences in mean and median blink duration between the two groups (group A: M = 0.211ms, SD = 0.034, low = 0.166, median = 0.205, high = 0.287; group B: M = 0.201ms, SD = 0.019, low = 0.172, median = 0.195, high = 0.239) (Fig.12).
Because the data were not normally distributed, we used a non-parametric test to compare blink duration between the groups and found no significant difference (Mann-Whitney test, U = 116, z = -0.720, p = 0.471 > 0.05). This finding indicates that the MCUIs had little effect on blink duration, suggesting little change in cognitive load.
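As an illustration of the test used here, the following minimal sketch computes the Mann-Whitney U statistic, its z score, and a two-sided p value with the normal approximation. It is not the analysis code of the study: the function names and the sample values are hypothetical, and the sketch assumes no tie or continuity correction, so statistical packages may return slightly different values. Under this approximation, the reported z = -0.720 corresponds to p ≈ 0.47.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def two_sided_p(z):
    """Two-sided p value of a standard normal z score."""
    return math.erfc(abs(z) / math.sqrt(2))


def mann_whitney(a, b):
    """Return (U, z, p) using the normal approximation (assumption: no corrections)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)  # report the smaller of the two U values
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return u, z, two_sided_p(z)


# Hypothetical blink durations, for illustration only (not study data).
group_a = [0.21, 0.19, 0.25, 0.23, 0.20]
group_b = [0.20, 0.18, 0.22, 0.19, 0.21]
print(mann_whitney(group_a, group_b))
```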
4.1.2 Fixation duration
Fixation is a voluntary eye movement in which the gaze is held for anywhere from approximately 200-300ms to several seconds (Zagermann et al, 2016). High cognitive load often leads to long eye fixation duration (Rudmann et al, 2003, Zagermann et al, 2016). A long eye fixation usually means that a large amount of time is spent interpreting the visual components, during which few other meaningful visual components are perceived.
We calculated the mean fixation duration of the two participant groups by processing the eye movement data to remove outliers and derive mean values. There were descriptive differences between the two groups’ mean and median fixation duration (group A: M = 159.477ms, SD = 20.110, low = 134.346, median = 154.128, high = 205.243; group B: M = 156.988ms, SD = 20.174, low = 120.090, median = 154.887, high = 195.010) (Fig.13). Given the non-normal distribution of the data, we compared the two groups with a non-parametric test and found no significant difference (Mann-Whitney test, U = 133, z = -0.108, p = 0.914 > 0.05). This finding was consistent with the previous one: the MCUIs had little effect on the participants’ fixation duration, suggesting little change in cognitive load.
4.1.3 Gaze areas
We analysed gaze areas and visualised the related data as heatmaps to reflect the participants’ regions of interest. Heatmaps are effective in revealing the areas that received the greatest number of eye gazes and have been used successfully in many previous studies (Jyotsna and Amudha 2018, Le Meur et al. 2017, Chandrika et al. 2020).
To probe how the participants looked at the MCUIs during the repetitive parcel sorting tasks, we analysed the participants’ eye gazes. Since the participants were constantly moving during the parcel sorting tasks, we manually extracted 200 eye tracking video clips in which the participants viewed the cardboards with MCUIs. These clips, which lasted a couple of seconds on average, each comprised a number of frames captured from a static point of view. Since the participants in group A picked one parcel at a time from the desk, the heatmaps revealed how the participants looked at the target parcel and the others (Fig.14 a). In contrast, the heatmap of group B shows where the participants directed their main visual attention when they were given multiple visual clues during parcel selection (Fig.14 b).
The results showed that both groups exhibited similar observable fixation areas and that the main gaze areas were concentrated on the target cardboard, regardless of the display of MCUIs. We calculated the dispersion of the fixation data and conducted a variance analysis, which showed that the two sets of data were significantly different (p = 0.044 < 0.05). The differences in gaze areas are also descriptively noticeable. For example, the group B (MCUIs) participants showed more divergent eye gaze areas than the group A participants, indicating a difference between the two groups in how they viewed the MCUIs. Furthermore, the eye gaze trajectories of the group B participants were more convergent, regardless of the cardboard distributions. The same patterns of eye gaze fixations were also found when the participants positioned the cardboards in the target shelf slots (Fig.14 c and d), which indicated a consistent influence of the MCUIs on eye gazes.
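The dispersion measure is not specified in the text. As one possible reading, the sketch below takes dispersion to be the root-mean-square distance of the gaze points from their centroid. The function name, the normalised coordinates, and the sample points are all hypothetical and serve only to show how a tighter gaze cluster yields a smaller value.

```python
import math


def gaze_dispersion(points):
    """Root-mean-square distance of (x, y) gaze points from their centroid."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    squared = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)
    return math.sqrt(squared / n)


# Hypothetical normalised gaze coordinates (not study data).
tight = [(0.50, 0.50), (0.52, 0.49), (0.49, 0.51), (0.51, 0.50)]
spread = [(0.30, 0.40), (0.70, 0.60), (0.45, 0.75), (0.60, 0.25)]

print(gaze_dispersion(tight), gaze_dispersion(spread))
```

One dispersion value per clip or per participant could then be compared between the groups with a variance analysis, as described above.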
4.1.4 NASA-TLX questionnaire
After completing the tasks, the participants were asked to fill in the NASA Task Load Index (NASA-TLX) questionnaire (Grier 2015) (see Appendix A). The questionnaire consists of six dimensions for a comprehensive assessment of cognitive load during repetitive tasks. These dimensions have been frequently used in studies assessing psychological stress (Virtanen et al. 2022) and cognitive load (Braarud 2021), and their correlations with cognitive load levels have been rigorously validated (Akyeampong et al. 2014). Table 1 summarises the mean levels of cognitive load of the participants in the two groups.
Table 1 Result of NASA-TLX questionnaire

| Subscale | Weight | Rating (Group A) | Rating (Group B) | Adjusted score (Group A) | Adjusted score (Group B) |
|---|---|---|---|---|---|
| Mental Demand | 3 | 26.25 | 25.38 | 78.75 | 76.14 |
| Temporal Demand | 1 | 43.50 | 40.19 | 43.50 | 40.19 |
| Physical Demand | 2 | 56.25 | 46.07 | 112.50 | 92.14 |
| Performance | 3 | 68.00 | 69.80 | 204.00 | 209.40 |
| Effort | 3 | 51.25 | 44.42 | 153.75 | 133.26 |
| Frustration Level | 3 | 28.75 | 26.35 | 86.25 | 79.05 |
| Weighted sum | | | | 678.75 | 630.18 |
The analysis showed a significant difference in perceived cognitive load between the two groups: the cognitive load of group B was 7% lower than that of group A (group A: M = 45.25; group B: M = 42.01; Mann-Whitney test, p = 0.002), indicating the effectiveness of the MCUIs in reducing cognitive load.
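The group means reported above follow from Table 1: each rating is multiplied by its subscale weight, and the weighted sum is divided by the total weight of 15. The short sketch below reproduces this arithmetic. The weights and ratings are copied from Table 1, while the variable and function names are illustrative only.

```python
# Subscale weights and mean ratings copied from Table 1.
WEIGHTS = {
    "Mental Demand": 3,
    "Temporal Demand": 1,
    "Physical Demand": 2,
    "Performance": 3,
    "Effort": 3,
    "Frustration Level": 3,
}
RATINGS_A = {
    "Mental Demand": 26.25,
    "Temporal Demand": 43.50,
    "Physical Demand": 56.25,
    "Performance": 68.00,
    "Effort": 51.25,
    "Frustration Level": 28.75,
}
RATINGS_B = {
    "Mental Demand": 25.38,
    "Temporal Demand": 40.19,
    "Physical Demand": 46.07,
    "Performance": 69.80,
    "Effort": 44.42,
    "Frustration Level": 26.35,
}


def weighted_tlx(ratings, weights):
    """Weighted NASA-TLX score: sum of weight * rating over the total weight."""
    weighted_sum = sum(weights[name] * ratings[name] for name in weights)
    return weighted_sum / sum(weights.values())


score_a = weighted_tlx(RATINGS_A, WEIGHTS)
score_b = weighted_tlx(RATINGS_B, WEIGHTS)
reduction = (score_a - score_b) / score_a  # relative reduction for group B

print(round(score_a, 2), round(score_b, 2), round(100 * reduction, 1))
```

The relative reduction of about 7% matches the figure stated in the text.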
4.2 Perceived usability
We adopted the SUS questionnaire to examine the perceived usability of the MCUIs in terms of effectiveness, ease of use, and overall satisfaction. The usability examination also checked whether the participants’ task performance was adversely influenced by any usability factors.
The results exhibited a significant difference in overall perceived usability scores between the two groups (group A: M = 78.750, SD = 8.586; group B: M = 86.974, SD = 6.378; Mann-Whitney test, p = 0.004), indicating that group B (MCUIs) outperformed group A in terms of overall perceived usability. This unexpected result suggests that the display of multiple visual clues did not impair system usability; instead, it effectively enhanced overall perceived usability by supplying assistive visual clues.
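For reference, the conventional scoring of a ten-item SUS questionnaire can be sketched as follows. This is the standard scheme rather than a confirmed description of the scoring used in this study, and it does not cover how the effectiveness, ease of use, and satisfaction aspects were derived. The function name and the sample responses are hypothetical.

```python
def sus_score(responses):
    """Standard SUS score (0-100) from ten responses on a 1-5 scale."""
    if len(responses) != 10:
        raise ValueError("SUS expects exactly ten responses")
    total = 0
    for item, response in enumerate(responses, start=1):
        if not 1 <= response <= 5:
            raise ValueError("each response must be between 1 and 5")
        # Odd items are positively worded, even items negatively worded.
        total += (response - 1) if item % 2 == 1 else (5 - response)
    return total * 2.5


# Hypothetical responses of one participant (not study data).
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))
```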
Furthermore, we analysed each aspect of the perceived usability of the MCUIs. The results showed a significant difference in effectiveness between the two groups (group A: M = 77.315, SD = 13.650, median = 75.000; group B: M = 90.351, SD = 6.952, median = 91.667; Mann-Whitney test, p = 0.002), which was consistent with the overall perceived usability result. In contrast, there was no significant difference in ease of use between the two groups (group A: M = 82.639, SD = 13.985, median = 87.500; group B: M = 88.487, SD = 8.656, median = 87.500; Mann-Whitney test, p = 0.269). Nor was there a significant difference in overall satisfaction (group A: M = 75.000, SD = 14.852, median = 75.000; group B: M = 81.579, SD = 12.291, median = 83.333; Mann-Whitney test, p = 0.154).
Taken together, the above results suggest that the group B (MCUIs) participants felt confident about using the MCUIs to improve the effectiveness of parcel sorting operations, e.g., by reducing operation mistakes. In contrast, the group B participants were less confident that the MCUIs significantly enhanced ease of use and overall satisfaction. Participants in both groups commented in the post-study interviews that they encountered few difficulties with or without the MCUIs. In this regard, we conservatively assume that the equivalent ease of use and overall satisfaction are attributable to the simple design of the MCUIs, both visually and interactively.
Table 2 Result of perceived usability

| Group | Effectiveness | Ease of use | Overall satisfaction |
|---|---|---|---|
| User Interface of group A (n=20) | 77.315 | 82.639 | 75.000 |
| User Interface of group B (n=20) | 90.351 | 88.487 | 81.579 |
| The Mann-Whitney U statistic | 76.500 | 135.500 | 125.000 |
| The Mann-Whitney z statistic | -3.037 | -1.105 | -1.425 |
| p | 0.002* | 0.269 | 0.154 |

*p < 0.05, **p < 0.01
4.3 Summary of results
To present a complete picture of the study findings, we summarise all the results in Table 3. Based on the results, H1 is supported, as both the eye tracking analysis and the questionnaire results consistently showed lower physical fatigue in the MCUIs group. H2 is partially supported, as the NASA-TLX questionnaire results showed significantly lower levels of cognitive load in the MCUIs group, whereas the eye tracking analysis showed inconsistent results: the MCUIs group had significantly more convergent gaze areas, but their blink and fixation durations showed no difference from the other group. H3 is also partially supported, as the MCUIs group reported significantly higher scores for overall perceived usability and effectiveness, whereas their feedback showed no difference in terms of ease of use and overall satisfaction.
Table 3 Summary of study results

| Measures | Metrics | Results |
|---|---|---|
| Cognitive load | Blink duration | no significant difference, p = 0.587 |
| | Fixation duration | no significant difference, p = 0.649 |
| | Gaze area | significant difference, p = 0.044 |
| | NASA-TLX | significant difference, p = 0.002 |
| Perceived usability | Overall usability | significant difference, p = 0.004 |
| | Effectiveness | significant difference, p = 0.002 |
| | Ease of use | no significant difference, p = 0.269 |
| | Satisfaction | no significant difference, p = 0.154 |