Nine longitudinal preceptors from the ICS course were recruited: five were internists, three were Family Medicine physicians, and one was a pediatrician. On average, they had been teaching medical students for 14 years (range: 5-38 years). Three had been clerkship directors and two had been residency program directors. Five were female and four were male. Each reviewed one video of a recent longitudinal mentee and one video of a student they had not worked with previously. Of the nine student videos selected, five of the students were male and four were female. The average checklist performance score (as assessed by SPs) was 63.5% (range: 46-80%). Seven of the assessors completed their observations and think-alouds in person with an author (EH) during a single session that typically lasted about an hour. Due to the COVID-19 pandemic, the final two participants completed the process via Google Meet (Google, Mountain View, CA).
When an assessor was familiar with the student, they provided on average 8.6 entrustment determinations and think-alouds per 20-minute video (range: 5-22) compared to an average of 9.2 determinations (range: 5-22) when they had not observed the student previously. As listed in Table 1, the most common initial entrustment rating (made within the first four minutes) when assessors were familiar with a student was “2b: With supervisor in room ready to step in as needed,” which was slightly lower than the most common initial entrustment rating made by assessors unfamiliar with a student: “3a: With supervisor immediately available, ALL findings double checked.” The most common final, overall entrustment rating remained the same as the initial rating for both groups, but the range of entrustment ratings narrowed when assessors were familiar with students (2b - 3b) compared to when they were unfamiliar with students (2a - 5). The most common confidence ratings increased to “high confidence” in both groups, although assessors who were familiar with a student’s prior performance had a wider range of confidence (2-4) than those who were unfamiliar with a student’s prior performance (3-4).
The average length of each transcript was approximately 1500 words, and each think-aloud was 150-300 words in length. Three rounds of iterative coding of four transcripts were required to develop an initial codebook. No new codes emerged after these initial four transcripts, suggesting that our sample of eighteen transcripts was sufficient to achieve thematic saturation (33). Fifteen subthemes were coded a total of 764 times and were further organized into four themes. Definitions for each theme/subtheme are available in Table 2.
Student Performance - Observable or inferred student activities - often described as skills. This theme included “student behaviors,” “inferred clinical reasoning,” and “patient rapport,” and represented two-thirds of transcribed content (66%). Student behavior, as a subtheme, indicated that assessors were commenting on something they observed the student doing. This was frequently described in neutral terms (e.g., “the student is asking about past medical history”) but was also occasionally described as correct (e.g., “they asked the key history questions”) or incorrect (e.g., “they failed to listen to the heart”). Clinical reasoning was typically inferred from what the student was doing:
[These questions] assure me that [the student is] thinking about potentially red flag issues that might have brought the patient in. (Assessor A, Unfamiliar student)
or when the assessor questioned the student’s clinical reasoning:
I'm not sure where [this is] all going. (Assessor B, Unfamiliar student)
Patient rapport related to how the assessor understood the patient to be relating to the student. This code had two variants: patient response,
The patient seems comfortable with him as well. (Assessor A, Unfamiliar student)
and student effort:
The student [is] using active listening and summarizing to build rapport. (Assessor A, Unfamiliar student)
Frame of Reference - How the assessor understands the task at hand, including personal context or differences in understanding related to the purposes of the assessment. The next most common theme included seven subthemes: “future training requirements,” “assessor preference/self,” “affective reactions” to the student’s performance, the “student’s phase of training,” “previous exposure to student performance,” “comparison with other students,” and “the curriculum.” Future training requirements revealed that assessors were considering what supervision would be needed in the future versus what supervision was required during the current encounter. Most of the time, the assessors commented on what supervision the student currently required:
I would have wanted to be in the room and at least initially to be able to jump in. (Assessor G, Familiar student)
Occasionally, however, they discussed levels of supervision in the future tense – indicating they were considering what level of supervision the student would require in the future:
She'd have to work with a supervisor before she could conduct this stuff in an independent fashion. (Assessor E, Unfamiliar student)
Assessor preference/self was used when the assessor referred to themselves as a frame of reference:
I would have put this patient in the chair and done the interview in the chair to make it a little more relaxing. (Assessor H, Unfamiliar student)
Assessors’ affective response appeared to manifest as disappointment, pain, feeling good/better, discomfort and surprise:
In the past I’ve seen her do this incredibly well so I’m a little surprised that she didn’t do as well on this [case]. (Assessor H, Familiar student)
Phase of training captured instances in which entrustment decisions were informed by where the student was in training rather than by their performance alone:
But there's part of me that also is inhibited by the fact that he's a second-year medical student. (Assessor B, Familiar student)
An exemplar of how familiarity with a student’s prior performance can influence entrustment was:
[My rating is] based partly on my past-experience with her and what I've observed this time. (Assessor H, Familiar student)
Conversely, a lack of familiarity also impacted entrustment:
Right now, I'm putting low confidence just because I don't know the student and I'm realizing that does make an impact. (Assessor D, Unfamiliar student)
Assessors occasionally compared the student in the second video to the student from the first video. Of note, the order of the videos was randomized (familiar/unfamiliar) and this subtheme did not appear more often for either type of relationship.
So, she's doing a better job than the last student characterizing the complaint - she seems to be more methodical. (Assessor G, Unfamiliar student)
The curriculum was invoked as a frame of reference for what the assessors expected. Occasionally, it was referenced as an “excuse” for the student not performing as expected. For example, one assessor referenced a perceived inadequacy in the curriculum despite assessing a student they had instructed longitudinally:
Based on what I know about the curriculum here [...] I wonder if she may not have the ability to form a broad differential for that [chief complaint]. (Assessor A, Familiar student)
Assessor Uncertainty - When the assessor questions their ability to observe the student adequately. The third theme included “assessor confidence” in their ability to assess, “compromised information,” and concern regarding an “insufficient number of assessments.” Assessor confidence was commonly expressed near the end of the student’s performance and reflected assessors’ uncertainty about their own ability to assess aspects of a student’s performance, without an explicit explanation:
I still don't know what his level of knowledge is… and I don't know about his clinical decision making…so I think he's able to conduct the interview, but not necessarily. (Assessor G, Familiar student)
In contrast, compromised information had to do with an inability to observe what the student was doing, typically due to the camera angle of the video:
I can’t tell, but it does not look to me like the bed is at a 45 degree angle… it looks much flatter because we’re looking down. (Assessor H, Unfamiliar student)
Insufficient number of assessments referred to an entrustment score being limited because of a single observation and the need for further exposure to the student:
Given the opportunity to observe [the student] several more times, I might be willing to fairly quickly move him up in the in the supervisory level scheme. But just with one […] single observation, [I] wouldn't go any higher just yet. (Assessor A, Unfamiliar student)
The Patient - Details specific to the patient, like acuity and risk associated with care. This least common theme related to “patient safety” and “patient characteristics.” These subthemes did not relate to the quality of the student’s performance. For example, if a student neglected to ask a critical question, failed to complete an important part of the physical exam, or neglected a diagnostic option that might threaten patient safety, these errors were considered student performance. Patient safety, as a subtheme, was noted regardless of student performance:
The patient wasn't in extremis, so I don't think that the supervisor needed to be in the room. (Assessor F, Familiar student)
Similarly, “patient characteristics” had little to do with student performance, but instead typically highlighted an assessor’s desire to see “such a complicated patient” before they “left the clinic.”
Comparing the subthemes identified when assessors observed students they had precepted longitudinally with those identified when they were unfamiliar with the student revealed several key differences (Table 3). When assessors were unfamiliar with a student, the “self” as a frame of reference subtheme was more prevalent (6/9 vs 3/9 videos), as was the subtheme of lacking confidence in their ability to assess (7/9 vs 4/9). When assessors were familiar with a student, the subtheme of referencing previous experiences with the student was more prevalent (6/9 vs 0/9), as was the affective response subtheme (7/9 vs 4/9). The subtheme of concern regarding an insufficient number of assessments was comparable for both groups: assessors desired additional opportunities to assess familiar students in three of nine videos compared with five of nine videos of unfamiliar students.