For 2018 and 2019, rankings ranged from 1 to 20 (21 candidates, including one internal candidate, were interviewed in each of the two years), and for 2020, rankings ranged from 1 to 18 (19 candidates were interviewed, including one internal candidate). There were 8, 9 and 11 interviewers in 2018, 2019 and 2020, respectively, 7 of whom interviewed in all 3 years.
Effect of interviewers being within the same virtual room versus different virtual rooms on ranking variability
There were 11 interviewers in 2020, who were separated into 5 virtual interview rooms (4 rooms with 2 interviewers and 1 room with 3), yielding 7 within-the-same-room (WSR) pairs (the 3-interviewer room contributes 3 pairs) and 48 not-within-the-same-room (NWSR) pairs. The rank range spanned from 0 (no difference in ranking) to 17 (the widest possible range), and the most common ranking difference was 2 amongst both WSR (n = 28, 22.2%) and NWSR (n = 114, 13.2%) pairs. Overall, the mean difference in rank range between NWSR and WSR pairs was 1.33 (95% confidence interval [CI] of the difference 0.61 to 2.04, p = 0.0003), with the NWSR pairs showing greater variability (a greater difference between the ranks of the 2 interviewers) than the WSR pairs. This implies that WSR interviewers, on average, ranked candidates 1.33 places closer to each other than NWSR interviewers did. Furthermore, when comparing the rank range of each of the 7 WSR pairs to that of the 48 NWSR pairs, WSR pairs ranked candidates closer to each other than NWSR pairs, although this difference was significant or marginally significant in only 3 of the 7 comparisons (Table 1).
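As a rough sketch of how such a rank-range comparison can be computed, the Python snippet below builds, for every interviewer pair, the absolute difference between the two interviewers' ranks of each candidate, labels each pair as WSR or NWSR, and fits a simple linear model of rank range on room status. The data frame layout, the column names, and the use of ordinary least squares are illustrative assumptions; the analysis reported above used a generalized linear model on the actual ranking data, which this sketch does not reproduce.

```python
# Illustrative sketch only: the input layout and the OLS model are assumptions,
# not the study's exact GLM analysis.
from itertools import combinations

import pandas as pd
import statsmodels.formula.api as smf


def pairwise_rank_ranges(ranks: pd.DataFrame, room_of: dict) -> pd.DataFrame:
    """Absolute rank difference per candidate for every pair of interviewers.

    `ranks` is assumed to have columns "interviewer", "candidate", "rank";
    `room_of` maps each interviewer to a virtual room id (hypothetical names).
    """
    wide = ranks.pivot(index="candidate", columns="interviewer", values="rank")
    rows = []
    for a, b in combinations(wide.columns, 2):
        rows.append(pd.DataFrame({
            "candidate": wide.index,
            "pair": f"{a}-{b}",
            "rank_range": (wide[a] - wide[b]).abs().values,
            "wsr": room_of[a] == room_of[b],  # True if within the same room
        }))
    return pd.concat(rows, ignore_index=True)


# Usage (with data shaped as assumed above):
# pairs = pairwise_rank_ranges(ranks, room_of)
# fit = smf.ols("rank_range ~ wsr", data=pairs).fit()
# print(fit.params, fit.conf_int(), fit.pvalues)  # WSR-vs-NWSR difference
```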
Table 1
Comparison of ranking variabilities of interviewers who shared a virtual interview room with those who did not share a virtual interview room
Groups Compared | Mean diff. (beta) | 95% CI diff., low | 95% CI diff., high | p-value | Significance
NWSR to WSR ALL | 1.33 | 0.61 | 2.04 | 0.0003 | ***
NWSR to WSR 1 | -3.03 | -4.81 | -1.24 | 0.0009 | ***
NWSR to WSR 2 | -1.03 | -2.81 | 0.76 | 0.2592 |
NWSR to WSR 3 | -1.47 | -3.26 | 0.31 | 0.1062 | †
NWSR to WSR 4 | -0.92 | -2.70 | 0.87 | 0.3143 |
NWSR to WSR 5 | -1.14 | -2.93 | 0.65 | 0.2113 |
NWSR to WSR 6 | -0.36 | -2.15 | 1.43 | 0.6917 |
NWSR to WSR 7 | -1.36 | -3.15 | 0.43 | 0.1352 | †
NWSR = not within the same room; WSR = within the same room.
*p < 0.05, **p < 0.01, ***p < 0.001, †p < 0.15 (marginally significant).
Mean diff. (beta) = mean difference obtained by subtracting the NWSR rank range from the WSR rank range; a negative value denotes greater variability in the NWSR pairs than in the WSR pairs.
When evaluating the variability of candidate categories, WSR pairs agreed on 53.6%, 42.9% and 74.3% of “accept,” “alternate” and “pass” categorizations, respectively (weighted Kappa 0.41, 95% confidence interval [CI] 0.27 to 0.56, p < 0.05) (Table 2), whereas NWSR pairs had significantly less agreement, at 30.7%, 18.2% and 64.2% for the same 3 categories (weighted Kappa 0.16, 95% CI 0.10 to 0.21, p < 0.05). The weighted Kappa of the NWSR pairs falls below the lower limit of the 95% CI of the WSR Kappa, suggesting a significantly greater degree of agreement in the categorization of candidates amongst WSR interviewers than amongst NWSR interviewers (Fig. 1).
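For readers who want to reproduce this kind of agreement statistic, a weighted Kappa can be computed with scikit-learn as sketched below. The ordered category labels and the choice of linear weights are assumptions for illustration; the exact weighting scheme behind the values reported above is not restated here.

```python
# Sketch: weighted Kappa for "accept"/"alternate"/"pass" category agreement.
# Category labels and linear weights are illustrative assumptions.
from sklearn.metrics import cohen_kappa_score

CATEGORIES = ["accept", "alternate", "pass"]  # ordinal, best to worst


def weighted_kappa(rater1, rater2):
    """Linear-weighted Kappa between two interviewers' category calls."""
    return cohen_kappa_score(rater1, rater2, labels=CATEGORIES, weights="linear")


# Example with made-up calls for one interviewer pair:
# r1 = ["accept", "alternate", "pass", "pass", "accept"]
# r2 = ["accept", "pass", "pass", "alternate", "accept"]
# print(weighted_kappa(r1, r2))
```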
Table 2
Agreement of candidate categories between pairs of interviewers who were within the same room and those who were not within the same room
Ranking Categories: Interviewer #1 (below) | Interviewer #2: Accept | Interviewer #2: Alternate | Interviewer #2: Pass | Weighted Kappa | 95% CI Kappa, low | 95% CI Kappa, high | Kappa p < 0.05 | Is Kappa in CI of other Kappa? (NWSR)
Research Question 3
Interviews Within the Same Room
Accept | 15 (53.6%) | 5 (17.9%) | 8 (11.4%) | 0.414 | 0.271 | 0.557 | yes | no
Alternate | 6 (21.4%) | 12 (42.9%) | 10 (14.3%)
Pass | 7 (25%) | 11 (39.3%) | 52 (74.3%)
Interviews Not Within the Same Room
Accept | 59 (30.7%) | 62 (32.3%) | 71 (14.8%) | 0.159 | 0.104 | 0.214 | yes
Alternate | 56 (29.2%) | 35 (18.2%) | 101 (21%)
Pass | 77 (40.1%) | 95 (49.5%) | 308 (64.2%)
CI = confidence interval; NWSR = not within the same room; WSR = within the same room.
A weighted Kappa outside the CI of the other Kappa indicates a significant difference in candidate category agreement between WSR and NWSR.
Effect of in-person versus virtual interviews on ranking variability
A 3-way comparison of ranking variability between the in-person interviews (2018 and 2019) and the virtual interviews (2020), amongst the interviewers who were present in all three years, showed no significant differences between in-person and virtual interviews (Table 3). When comparing agreement on candidates' placement in the “accept,” “alternate” and “pass” categories, the weighted Kappa statistic was 0.086 for 2018, 0.158 for 2019 and 0.101 for 2020 (p < 0.05 for all years). Since the 2020 weighted Kappa falls within the 95% CIs of the 2018 and 2019 weighted Kappas, the 2020 virtual interviews did not result in a greater degree of disagreement in the categorization of candidates compared with the in-person interviews of 2018 and/or 2019 (Table 4).
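The “is the Kappa inside the other group's CI” check used in Tables 2 and 4 can be approximated with a bootstrap, as sketched below: resample rated candidates, recompute the weighted Kappa, take the percentile interval, and test whether the other group's point estimate falls inside it. The bootstrap approach and the variable names are assumptions for illustration, not the interval method used to produce the CIs reported above.

```python
# Sketch: percentile-bootstrap CI for a weighted Kappa, plus the descriptive
# "does the other group's Kappa fall inside this CI" check. Inputs are assumed.
import numpy as np
from sklearn.metrics import cohen_kappa_score


def kappa_ci(r1, r2, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI (default 95%) for the linear-weighted Kappa."""
    rng = np.random.default_rng(seed)
    r1, r2 = np.asarray(r1), np.asarray(r2)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(r1), len(r1))  # resample rated candidates
        stats.append(cohen_kappa_score(r1[idx], r2[idx], weights="linear"))
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])


# Usage with hypothetical category calls:
# lo, hi = kappa_ci(calls_2018_rater1, calls_2018_rater2)
# kappa_2020 = cohen_kappa_score(calls_2020_rater1, calls_2020_rater2,
#                                weights="linear")
# print("2020 Kappa inside 2018 CI:", lo <= kappa_2020 <= hi)
```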
Table 3
Comparison of ranking variabilities of in-person interviews (2018 and 2019) with virtual interviews (2020)
Groups Compared | Mean diff. (beta) | 95% CI diff., low | 95% CI diff., high | p-value | Significance
2018 to 2019 | -0.174 | -0.780 | 0.433 | 0.5743 |
2018 to 2020 | 0.386 | -0.237 | 1.009 | 0.2245 |
2019 to 2020 | 0.560 | -0.063 | 1.183 | 0.0783 | †
*p < 0.05, **p < 0.01, ***p < 0.001, †p < 0.15 (marginally significant).
Mean diff. (beta) = mean difference obtained by subtracting the rank range of one year from that of the other.
Table 4
Agreement of candidate categories between pairs of interviewers for in-person interviews (2018 and 2019) and virtual interviews (2020)
Ranking Categories: Interviewer #1 (below) | Interviewer #2: Accept | Interviewer #2: Alternate | Interviewer #2: Pass | Weighted Kappa | 95% CI Kappa, low | 95% CI Kappa, high | Kappa p < 0.05 | Is Kappa in CI of 2019 Kappa? | Is Kappa in CI of 2018 Kappa?
2018
Accept | 17 (22.4%) | 18 (23.7%) | 41 (18%) | 0.086 | 0.002 | 0.169 | yes
Alternate | 23 (30.3%) | 14 (18.4%) | 39 (17.1%)
Pass | 36 (47.4%) | 44 (57.9%) | 148 (64.9%)
2019
Accept | 24 (31.6%) | 17 (22.4%) | 35 (15.4%) | 0.158 | 0.072 | 0.244 | yes
Alternate | 20 (26.3%) | 15 (19.7%) | 41 (18%)
Pass | 32 (42.1%) | 44 (57.9%) | 152 (66.7%)
2020
Accept | 20 (26.3%) | 25 (32.9%) | 31 (16.3%) | 0.101 | 0.015 | 0.188 | yes | yes* | yes*
Alternate | 21 (27.6%) | 13 (17.1%) | 42 (22.1%)
Pass | 35 (46.1%) | 38 (50%) | 117 (61.6%)
CI = confidence interval. A weighted Kappa within the CI of the other Kappa indicates no significant difference in candidate category agreement between 2020 and the other years.
Candidates with the least and greatest ranking variabilities
For 2020, we assessed the variability of rank ranges for the overall most and the least attractive candidates. The overall least attractive candidate has the lowest variability (indicating a high degree of agreement amongst the interviewers), while the most attractive candidate has the second lowest variability. The greatest discordances were with the candidate ranked 8th (GLM estimated average difference = 5.93) and the candidate ranked 5th (GLM estimated average difference = 5.31). However, a candidate’s overall composite ranking (1 to 18) was not significantly associated with the variability in the individual pair’s rankings (p = 0.8597).