This study examined the one-year test-retest reliability of baseline SCAT5 performances in National Rugby League Women’s Premiership players. Examining SCAT5 test-retest reliability is complex. The current study revealed low test-retest reliability of each SCAT5 component according to conventional standards of interpretation. In this study, the low correlations may, in part, be due to the skewed distribution of scores. The SCAT5 component scores have ceiling/floor effects,{Cameron, 2021 #2637;Hänninen, 2021 #1658;Norheim, 2018 #2053} which bias reliability statistics and reduce the magnitude of correlations. When interpreting post-injury test scores, concerns about low reliability are partially mitigated by the availability of published SCAT5 normative reference values for these professional women athletes.
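The attenuating effect of a ceiling on test-retest correlations can be illustrated with a small simulation. The sketch below is a hypothetical illustration, not an analysis of the study data: it assumes normally distributed true scores and measurement error, and the `simulate` function and all parameter values are illustrative assumptions. It compares the Pearson correlation between two testing occasions before and after capping scores at a maximum that about half of the simulated athletes reach.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=5000, true_sd=1.0, error_sd=1.0, ceiling=0.0, seed=1):
    """Return (r without a ceiling, r with a ceiling) for simulated test-retest data.

    Hypothetical model: each simulated athlete has a stable true score, and each
    testing occasion adds independent measurement error. min(score, ceiling)
    mimics a test whose maximum score many athletes reach (here about half,
    because the ceiling sits at the mean).
    """
    rng = random.Random(seed)
    test, retest = [], []
    for _ in range(n):
        true_score = rng.gauss(0.0, true_sd)
        test.append(true_score + rng.gauss(0.0, error_sd))
        retest.append(true_score + rng.gauss(0.0, error_sd))
    capped_test = [min(v, ceiling) for v in test]
    capped_retest = [min(v, ceiling) for v in retest]
    return pearson(test, retest), pearson(capped_test, capped_retest)


r_uncapped, r_capped = simulate()
print(f"uncapped r = {r_uncapped:.2f}, capped r = {r_capped:.2f}")
```

With equal true-score and error variance, the uncapped correlation should sit near its theoretical value of 0.5, and the capped correlation is lower even though the underlying stability of the athletes has not changed.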
Previous studies have reported test-retest reliability for concussion assessment tools, including previous iterations of the SCAT,{Broglio, 2018 #163;Caccese, 2023 #3332;Chan, 2013 #3326;Chin, 2016 #2060;Hänninen, 2017 #1665} computerized neurocognitive test batteries (e.g., ImPACT),{Brett, 2017 #184;Elbin, 2019 #169;Iverson, 2003 #3317;O’Brien, 2018 #167} the SAC,{McLeod, 2006 #3325} and the Balance Error Scoring System (BESS).{Broglio, 2009 #3330} A number of statistical methods have been employed to examine test-retest reliability, including Pearson’s r, Spearman’s rs, intraclass correlation coefficients (ICCs), kappa coefficients, and the generalizability coefficient (G). We chose Spearman’s rs and ICCs because of the non-normal distribution of these data and to facilitate comparisons with prior studies. Reliability data for SCAT5 metrics are limited; however, data from the SCAT2 and SCAT3 give clinicians an indication of the temporal stability and diagnostic utility of the measure. Moreover, there is a dearth of literature investigating SCAT metrics in women athletes specifically.
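For readers less familiar with these statistics, the sketch below computes Spearman’s rs (Pearson’s r applied to tie-averaged ranks) and an ICC from paired test and retest scores in plain Python. The text does not state which ICC model was used; this sketch assumes the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1) in the Shrout and Fleiss notation, and the function names are hypothetical. In practice, `scipy.stats.spearmanr` is a library alternative for rs.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation (raises ZeroDivisionError if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rs(x, y):
    """Spearman's rs: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def icc_2_1(test, retest):
    """ICC(2,1): two-way random effects, absolute agreement, single measure."""
    n, k = len(test), 2
    rows = list(zip(test, retest))
    grand = (sum(test) + sum(retest)) / (n * k)
    row_means = [(a + b) / k for a, b in rows]
    col_means = [sum(test) / n, sum(retest) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between athletes
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between occasions
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Because absolute agreement penalises systematic shifts between occasions, the toy pair [1, 2, 3] and [2, 3, 4] has a perfect rank correlation (rs = 1) but an ICC(2,1) of 2/3; the two coefficients answer different questions about stability.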
In a large cohort of NCAA athletes and military collegiate athletes, poor test-retest reliability was observed across all symptom scores (ICC = 0.029–0.331).{Caccese, 2023 #3332} The Concussion Assessment, Research and Education (CARE) Consortium reported low reliability coefficients (ICC = 0.34–0.51) across the SAC, BESS, SCAT symptom evaluation, and Brief Symptom Inventory-18 in a large cohort of NCAA athletes and military cadets.{Broglio, 2018 #163} Similarly, in a large cohort of military cadets (n = 4,875), test-retest reliability across one year was low for the SAC (ICC = 0.32–0.34), BESS (ICC = 0.28), symptom number (kappa = 0.21), and symptom severity (kappa = 0.16).{Houston, 2021 #3320} When considering subcomponents of the SCAT, reliability coefficients for the SAC in youth athletes have been reported as low (ICC = 0.46, r = 0.46).{McLeod, 2006 #3325} In uninjured children and youth athletes, symptom components of the SCAT2 were also reported as having low to moderate reliability, with ICCs of ≤ 0.50.{Chan, 2013 #3326} Test-retest reliability of the BESS has been reported as low (r = 0.40–0.53; ICC = 0.59){Broglio, 2018 #163;Chin, 2016 #2060} and can vary with contextual factors such as the testing conditions and environment.{Broglio, 2018 #163;Cameron, 2021 #2637;Houston, 2021 #3320}
There are several factors that might be important when examining the test-retest reliability of the SCAT5, including the time between test and retest, age, the amount and quality of sleep the night before testing, and the test environment.{Moser, 2011 #2962} Our study assessed one-year test-retest reliability, and reliability metrics may be higher when the retest is administered over a shorter interval (e.g., within two weeks). In professional male ice hockey players retested over a two-week interval, correlations on the SCAT were higher for symptom number (rs = 0.85) and symptom severity (rs = 0.84), but only low to moderate for the SAC total (rs = 0.58) and mBESS (rs = 0.40).{Hänninen, 2021 #1658} In high school and collegiate athletes, re-administration of the SCAT3 at a seven-day interval yielded ICCs of 0.42 to 0.64, compared with ICCs of 0.39 to 0.54 at a 196-day interval.{Chin, 2016 #2060}
Obtaining a preseason baseline on every player can sometimes be challenging due to logistics and staffing resources.{Echemendia, 2012 #146;Erdal, 2012 #112;Iverson, 2015 #3328;Schatz, 2013 #195;Schmidt, 2012 #178} In addition, when evaluating a potential concussion in players from diverse linguistic backgrounds for whom norms have not been established, using a player’s preseason baseline scores to evaluate change may be preferable to using normative reference data.{Echemendia, 2020 #3324} Having a reliable and accurate preseason baseline can be especially helpful when a player obtains high or lower-than-expected scores at baseline. For example, a player who has a high score on the SAC at baseline (e.g., 29 or 30) may show reliable change following a suspected concussion but still fall within the broadly normal range (e.g., scoring 26 or 27). Thus, the normative data might suggest that her score is ‘normal’, but the reliable change from her baseline might suggest that she is experiencing some negative cognitive effects from a concussion (especially when combined with other clinical information). In contrast, a player might make more errors than most people on the mBESS at baseline (e.g., 5 or more error points), which would be important to know when interpreting her mBESS score after a suspected concussion.{Elbin, 2013 #3327}
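The distinction between a ‘normal’ score and a reliable decline from baseline can be made concrete with a reliable change calculation. The sketch below uses one common formulation, in which the standard error of measurement (SEM) at each occasion is SD × sqrt(1 − r) and the standard error of the difference is sqrt(SEM1² + SEM2²); it is not necessarily the exact procedure behind this study’s tables. The function names are hypothetical, and the SD and r inputs in the usage lines are placeholders, not values from this study.

```python
from statistics import NormalDist


def reliable_change_bounds(sd_test, sd_retest, r, mean_change=0.0, confidence=0.70):
    """Range of test-retest change expected from measurement error alone.

    SEM at each occasion = SD * sqrt(1 - r); the two SEMs are combined into
    the standard error of the difference. `mean_change` (e.g., an average
    practice effect in uninjured athletes) centres the interval.
    """
    sem_test = sd_test * (1 - r) ** 0.5
    sem_retest = sd_retest * (1 - r) ** 0.5
    se_diff = (sem_test ** 2 + sem_retest ** 2) ** 0.5
    # Two-tailed z for the requested interval: about 1.04 for 70% (15% per tail).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean_change - z * se_diff, mean_change + z * se_diff


def classify_change(baseline, retest, bounds):
    """Label a retest score relative to the expected range of change."""
    lower, upper = bounds
    change = retest - baseline
    if change < lower:
        return "reliable decrease"
    if change > upper:
        return "reliable increase"
    return "within expected range"


# Hypothetical SAC inputs (SD = 1.5 at both occasions, r = 0.4); not study values.
sac_bounds = reliable_change_bounds(1.5, 1.5, 0.4)
print(classify_change(29, 26, sac_bounds))  # reliable decrease
```

For the SAC a decrease signals worsening, whereas for mBESS error points an increase does. Checking the same post-injury score against a normative cutoff is a separate step, which is why a score of 26 can look ‘normal’ against norms while still being a reliable decline from a baseline of 29.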
Clinical Implications
This study has implications for the interpretation of change on the SCAT5 in professional women’s rugby league. The reliable change methodology is meant to supplement and enhance, not replace, clinical judgement. This methodology indicates how much change in test scores is relatively common in uninjured athletes, so that normal variation in test scores can be taken into account when an athlete has an injury or suspected injury. If one applies the 70% confidence interval for interpreting change, 70% of uninjured athletes tested twice over consecutive seasons will show change within that range, and only 30% will exceed it in either direction (15% worsening or 15% improving). The tables in this article illustrate that women athletes are not expected to obtain exactly the same score on the component tests of the SCAT5 when tested twice, but they are also not expected to show large variations in test scores, especially on cognitive testing (SAC) and the mBESS. As seen in Tables 4 and 5, worsening by 2 or 3 points on the SAC is relatively uncommon, and in the context of a suspected concussion this amount of change is likely clinically important. Moreover, according to the published norms{Iverson, 2021 #1241} and Table 6, a score of 25 on the SAC is relatively low for women in the NRLW. Thus, a clinician might consider a score of 25 to be relatively low, and worsening by 2–3 points from baseline to be relatively uncommon. Similarly, for the mBESS, obtaining 5 or more error points is relatively uncommon for women in the NRLW{Iverson, 2021 #1241}, as is worsening by 2 or more error points from baseline. Such a change in the context of a concussion evaluation is likely clinically important, although factors separate from concussion might influence mBESS performance and need to be considered.
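The base-rate reasoning described here can also be computed directly from paired scores in uninjured athletes. The sketch below is an illustrative assumption about how such values could be derived, not necessarily the procedure used for Tables 4 and 5: it takes the central 70% of observed change scores (the 15th and 85th percentiles) and the proportion of athletes who worsened by at least a given number of points. The function names and toy data are hypothetical.

```python
from statistics import quantiles


def empirical_change_interval(test, retest, coverage=0.70):
    """Central `coverage` interval of observed change scores (retest - test)."""
    changes = [b - a for a, b in zip(test, retest)]
    tail_pct = round((1 - coverage) / 2 * 100)  # 15 for a 70% interval
    # quantiles() sorts internally; with n=100, cuts[p - 1] is the p-th percentile.
    cuts = quantiles(changes, n=100, method="inclusive")
    return cuts[tail_pct - 1], cuts[100 - tail_pct - 1]


def base_rate_worsening(test, retest, points, higher_is_better=True):
    """Proportion of athletes whose score worsened by at least `points`.

    Use higher_is_better=True for the SAC and False for mBESS error points.
    """
    if higher_is_better:
        worse = sum(1 for a, b in zip(test, retest) if a - b >= points)
    else:
        worse = sum(1 for a, b in zip(test, retest) if b - a >= points)
    return worse / len(test)


# Toy data: hypothetical SAC scores for four uninjured athletes across two seasons.
print(base_rate_worsening([28, 29, 30, 27], [26, 29, 29, 27], points=2))  # 0.25
```

Unlike the parametric approach, these percentiles make no normality assumption, which may suit skewed, ceiling-limited scores, but they require a reasonably large reference sample to be stable in the tails.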
Interpretation of Symptoms Can Be Challenging
Symptoms are more difficult to interpret. Symptoms can vary considerably over time, and they can be influenced by a number of factors separate from concussion, such as over-training, life stress, insufficient sleep, and mental health difficulties. According to normative data for the NRLW{Iverson, 2021 #1241} (and Table 6), endorsing 5 or more symptoms or having a symptom severity score of 8 or greater is relatively uncommon. However, the context in which symptoms are assessed is very important. If a player is being evaluated ultra-acutely following a blow to the head in sports, such as on the sideline or in the changeroom, she is likely focused on reporting symptoms that she thinks are or might be associated with an injury, rather than symptoms from any etiology. Therefore, reporting a small number of symptoms in this setting might be clinically important, more so than if those same symptoms were reported during the preseason.
Of course, it is important to use clinical judgement when interpreting post-injury SCAT5 scores and not to over-rely on data relating to change across two seasons, especially regarding symptoms. Endorsing only two symptoms during a sideline evaluation, such as headache and balance problems, might be clinically important and indicative of a concussion. Having two or more acute symptoms of concussion following a blow to the head in sports is sufficient to meet the criteria for a ‘suspected mild traumatic brain injury’, even if no clinical signs can be documented and the athlete performs in the normal range on cognitive and balance testing.{Silverberg, 2023 #3432}
Limitations
This study has several limitations. We had a relatively small sample size (n = 63) compared with other studies examining the test-retest reliability of the SCAT.{Bruce, 2022 #2039;Hänninen, 2018 #1659;Petit, 2020 #2043;Tucker, 2021 #1668} Nonetheless, our sample is reasonably representative of the entire cohort of NRLW players active across the two seasons, which supports the external validity of our findings. Additionally, although the SCAT5 was administered by team physicians who had undertaken standardised training, the same physician may not have conducted the assessment at both time points. However, in a prior study, having different administrators was not associated with clearly different SCAT3 reliability.{Hänninen, 2017 #2792} Furthermore, we did not include the optional 10-word list for immediate memory in this study because only 12 athletes were administered this version. Using the 10-word version in future studies may reduce the ceiling effects that are common with the 5-word list.{Echemendia, 2017 #1628} Another limitation is that player demographic and health history information was not collected and could not be analysed (e.g., history of mental health diagnoses, other health conditions, learning disabilities, prior concussions, race, ethnicity, education, primary language spoken, years played). Further studies with larger cohorts may be required to determine whether these variables are related to SCAT5 raw scores and whether they affect test-retest reliability.