In this study, we assessed the validity of institutional assessments in predicting the future performance of medical students on the USMLE STEP 1 examination. Using linear regression and Bland-Altman analysis, we quantified the ability of our PreClerkship assessments to predict USMLE STEP 1 scores. Our results indicate that the majority of year-one and year-two institutional assessments do not accurately predict students’ performance on the USMLE STEP 1. Specifically, students’ scores on PreClerkship assessments overpredict their USMLE STEP 1 scores. This is disadvantageous for faculty and administrators, who are unable to accurately identify high-risk students and provide early, individualized interventions. Moreover, medical students engaging in self-reflection are unable to self-identify areas of need and opportunities for growth.
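For readers wishing to reproduce the Bland-Altman comparison of predicted versus actual scores, the core computation can be sketched as follows. The scores shown are illustrative values only, not study data, and the function name is our own:

```python
from statistics import mean, stdev

def bland_altman(predicted, actual):
    """Bland-Altman bias and 95% limits of agreement between
    predicted and actual STEP 1 scores."""
    # A positive difference indicates over-prediction by the assessment.
    diffs = [p - a for p, a in zip(predicted, actual)]
    bias = mean(diffs)            # mean difference (systematic bias)
    sd = stdev(diffs)             # sample standard deviation of differences
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)  # 95% limits of agreement
    return bias, loa

# Illustrative (fabricated) predicted vs. actual STEP 1 scores:
bias, loa = bland_altman([230, 215, 240, 205, 225],
                         [220, 200, 238, 195, 215])
```

A positive bias indicates systematic over-prediction, the pattern our analysis revealed for underperforming students; points outside the limits of agreement mark individuals for whom the prediction is least trustworthy.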
The first key finding of this study is the unreliability of the MCAT and GPA in predicting USMLE STEP 1 performance. Used alone, either has minimal predictive value for our students’ STEP 1 performance. Even combined, their correlation with STEP 1 is weak at best and nonexistent at worst. Similar observations were made in our earlier studies (Gullo et al. 2015). However, prospective application of prediction modeling in this study shows that traditionally used pre-matriculation data is all but useless in predicting candidates’ STEP 1 scores. The value of standardized pre-matriculation assessments has been questioned in recent years (Gauer et al. 2016), and our results highlight the need for a comprehensive, 360° evaluation of candidates for admission into the MD program. One possible reason for the lack of STEP 1 predictive value of the MCAT and undergraduate GPA in our study could be the evolution of these standardized tests themselves. In recent years, STEP 1 has moved away from simple recall of facts toward a more rounded assessment of problem-solving skills in medical knowledge, biostatistics, evidence-based medicine, and medical ethics. We should note that these data do not suggest that the MCAT or GPA has no role in predicting performance on standardized tests like STEP 1. These parameters appear to have predictive value for the cohort, just not for the individual: the average MCAT for students matriculating into JCESOM, Marshall University is around the 50th percentile, and so is the average of our USMLE STEP 1 scores.
The traditional 2 × 2 curricular model delivers information in a manner that restricts integration of basic sciences with clinical application throughout the first year. Our organ-system based MS1 curriculum focuses on the normal structure and function of the human body, while the MS2 curriculum covers principles of disease and therapeutics. This design narrows the scope of each course’s assessments, limiting them to only the covered disciplines. One argument in favor of this design is that students have the opportunity to learn each organ system twice. In practice, however, it creates informational compartments in which MS1 courses cannot adequately assess complex problem-solving skills, because students lack the context of the disease process. Effectively, the organ-system based MS1 curriculum is just a different organization of the classical three-discipline MS1 curriculum of biochemistry, physiology, and anatomy. This setup reduces the majority of first-year assessments to memorization and recall of facts, which rank lower on Bloom’s cognitive scale (Adams 2015). This style of assessment fails to engage the higher levels of Bloom’s taxonomy, application and analysis, which are the mainstays of USMLE STEP 1 questions (Weissbart et al. 2015).
Unsurprisingly, the second key finding of our study is that performance on most PreClerkship course assessments correlates poorly with students’ actual USMLE STEP 1 scores. Most notably, Bland-Altman analysis reveals significant over-prediction for students underperforming on STEP 1. Additionally, within the first year, significant instructor and discipline variability is readily apparent. Internal assessments with low variance lead to predictions that fail to accurately identify high-risk students; as a result, students do not receive early interventions to prevent failure on USMLE STEP 1. These shortcomings inflate the sense of preparedness of students, who often resist academic support because they are “doing well” in the curriculum.
Theoretically, predictions of students’ performance on USMLE STEP 1 should be more accurate in the second year of PreClerkship education, because assessments integrate basic science and clinical correlations. However, performance on internal assessments continues to overpredict performance on USMLE STEP 1. This, again, is disadvantageous to students’ ability to perform self-directed learning.
The best-correlated assessments are from the course DT3, which accounts for 66% of the variability in students’ scores. This course differs from the rest of our curriculum in its approach to pedagogy and assessment. It employs more active-learning formats, such as independent learning, team-based learning, and problem-solving exercises. The course is also more integrated than the other MS2 courses: faculty and instructors spend significant time revisiting foundational concepts of the MS1 curriculum, followed by the pathophysiology and pharmacology of the covered systems. DT3 assessments also rank higher on Bloom’s scale, with a greater number of critical-thinking and problem-solving questions. Notably, for the Class of 2020, DT3 was the only course in the PreClerkship MD curriculum to utilize a cumulative NBME assessment at the end of the course. This assessment uses multiple-choice questions that resemble the format of USMLE STEP 1 questions and provides performance feedback to students and faculty. Cumulative assessments may help students retain information longer, as opposed to the “binge and purge” of the biweekly assessments of other courses. Additionally, cumulative assessments may aid in improved synthesis and integration of information, as these exams are likely to focus on key concepts rather than minute details. While this correlation allows faculty to identify high-risk students more accurately, the timing of the intervention is not advantageous: students complete this course late in the second year, after USMLE STEP 1 preparation has commenced and the time to remediate areas of weakness is limited.
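The 66% figure above is the coefficient of determination (R²) of a simple linear fit. For clarity, it can be computed as follows; the function is a generic sketch, not our study code:

```python
def r_squared(x, y):
    """Coefficient of determination for a simple linear fit of y on x:
    the fraction of variance in y explained by x (e.g., x = DT3 scores,
    y = STEP 1 scores)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Ordinary least-squares slope and intercept
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

An R² of 0.66 thus means that variation in DT3 performance accounts for two-thirds of the variation in STEP 1 scores, leaving one-third unexplained.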
Recognizing the need for institutional action and for improved prediction of USMLE STEP 1 scores, starting with the Class of 2021 we included cumulative, customized NBME assessments in all organ-system based courses of the MS2 curriculum. These comprehensive exams correlate better with STEP 1 performance than traditional course exams (Brenner et al. 2017). As before, we believe that cumulative assessments aid students’ ability to synthesize and retain information for longer. Additionally, faculty are able to provide early interventions and focus resources on high-risk students, and medical students are able to use performance feedback to recognize areas of strength and weakness to guide their preparation for USMLE STEP 1. It is important to note, however, that not all NBMEs are similarly correlative. Our analysis indicates that this is due to differences in course design and assessments. Courses that do not dedicate enough time to the integration of first- and second-year disciplines are not able to choose higher-order questions from the NBME question bank. This situation is not dissimilar to that of institutional assessments: customized NBME assessments may not be superior in and of themselves; it is the type of pedagogy and level of integration in the courses that appears to drive reliable predictions of student performance.
The final key finding of this study is the validity and reliability of our STEP 1 prediction model generated using targeted assessments, including the course-end NBMEs. As opposed to the retrospective models published by us (Gullo et al. 2015) and others (Coumarbatch et al. 2010; Giordano et al. 2016), this model is effective prospectively. The model is deployed to identify at-risk students early and provide intervention as needed. Struggling students are identified early in the fall semester of MS2 of the MD curriculum and offered academic remediation. The reliability and reproducibility of our model are demonstrated by the STEP 1 scores of the Class of 2022, in which multiple students were offered early assistance based on our prediction modeling.
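The prospective workflow described above — fit a model on a prior cohort, then flag new students whose predicted score falls below a cutoff — can be sketched as follows. All data, the function names, and the cutoff value are illustrative assumptions, not our published model:

```python
def fit_linear(x, y):
    """OLS fit of STEP 1 score (y) on internal assessment average (x),
    fitted on a prior cohort. Returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def flag_at_risk(model, scores, cutoff=200.0):
    """Predict STEP 1 scores for a new cohort and flag students whose
    prediction falls below a hypothetical early-warning cutoff."""
    slope, intercept = model
    preds = [slope * s + intercept for s in scores]
    return preds, [p < cutoff for p in preds]

# Illustrative (fabricated) prior-cohort data: assessment averages vs. STEP 1
model = fit_linear([70, 80, 90, 85, 75], [200, 220, 245, 230, 210])
# Two hypothetical new students; the first would be flagged for support
preds, flags = flag_at_risk(model, [68, 88])
```

In practice, our model uses multiple targeted assessments, including the course-end NBMEs, rather than a single predictor; the sketch shows only the flag-and-intervene logic.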
These results go beyond performance on USMLE STEP 1. They show us the reproducibility of knowledge acquired and tested during a traditional 2 × 2 MD curriculum. The lack of clinical context in the MS1 curriculum, and the failure to reconnect to foundations in the MS2 curriculum, render our assessments detail-focused and less reliable. Students often memorize facts in two-week bursts and may fail to understand the interplay and crosstalk among organ systems. Critical thinking and problem-solving skills are key to the successful practice of medicine. These skills require the application of foundational concepts to clinical problem-solving, which in turn is fostered by integration of concepts across disciplines and systems.