The primary purpose of the Apgar score has been to assess the status of an infant in the first few minutes of life.1,7 The rationale for such a scoring system is that difficulties in the transition to extrauterine life are associated with worse outcomes, so identifying affected babies could lead to interventions that mitigate those outcomes. This is supported by the observation that the Apgar score did not become widely, and then universally, adopted until there was evidence that low scores occurred far more frequently in babies who either died or had neurological deficits in the first year of life.17,18
Over the decades, the score has consistently been used as a risk factor in clinical studies.7 It has been associated not only with an increased incidence of long-term neurological conditions, including cerebral palsy and seizures,19 but also with a wide variety of conditions such as attention deficit disorder/hyperactivity,20 permanent dentition,21 cancer,22 food allergy,23 autism spectrum disorder,24 polycystic kidney disease25 and amblyopia.26 The Apgar score is used just as often in research on morbidities that manifest in the post-natal period, including all the discharge diagnoses used as short-term outcomes in the current study.10,27–35 Short-term outcomes have also been used in all studies that have examined modifications of or replacements for the Apgar score, and any future such efforts are likely to do the same.36–38 It is noteworthy that the NRP does not use the one- or five-minute Apgar score. Rather, to identify which newborn infants might qualify for closer post-natal observation, it relies on a single criterion, the need for respiratory support. This criterion could miss babies at risk for several of the diagnoses in the current study.2
This study used a range of morbidities occurring during the initial hospital stay to determine, first, whether a low Apgar score is more frequent in babies who were given these diagnoses than in those without the conditions. It confirmed previous associations between the risk factors and the various short-term morbidities.
Our study also found that a low Apgar score at one or five minutes was significantly more frequent in all but two of the outcome/Apgar analyses and that the negative predictive value was generally high across all outcomes (Table 3). When other risk factors were taken into account (Table 4), at least one of the four low Apgar scores was significantly more frequent in all but one of the ten outcomes. For NEC, once the other risk factors were added, Apgar scores of ≤ 3 or ≤ 6 at one or five minutes were no longer significant.
The AUC of the ROC curve is often used to assess the clinical value of a predictive model,39 with values further above 0.5 indicating a better model. We used it to further analyze how much a low Apgar score contributes to identifying newborn infants who will go on to have one of the short-term outcomes included in this study. The analysis confirmed that low Apgar scores can make a major and significant contribution to predicting HIE, which is not surprising since low scores are often part of the diagnosis,4 and which supports the validity of this analytic method. Overall, the inclusion of a low Apgar score added little to the predictive model. The improvement was statistically significant only for the Apgar One_06 for RDS and MAS. Otherwise, it improved the AUC by less than 3.5%, and in many cases by less than 1%. In contrast, adding clinical factors to ROC models constructed from Apgar scores alone increased the AUC by 14–86%, indicating that the Apgar score contributes less than clinical factors to identifying newborns at risk for short-term morbidities.
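The with-and-without comparison described above can be illustrated with a minimal Python sketch. It is not the analysis code used in this study: the function names `auc` and `percent_auc_gain` and the small data set are hypothetical, chosen only for illustration. The AUC is computed from its rank interpretation, that is, the probability that a randomly chosen affected infant receives a higher predicted risk than a randomly chosen unaffected infant, with ties counted as one half. The sketch does not test whether the difference between two AUCs is statistically significant.

```python
def auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative one.

    labels: 1 = outcome present, 0 = outcome absent.
    scores: predicted risk from some model (higher = more at risk).
    A tie between a positive and a negative score counts as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def percent_auc_gain(auc_base, auc_full):
    """Relative change in AUC, in percent, when predictors are added."""
    return 100.0 * (auc_full - auc_base) / auc_base


# Hypothetical data: 1 = infant had the outcome, 0 = did not.
outcome = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical predicted risks from a model using clinical factors only...
risk_clinical = [0.80, 0.50, 0.30, 0.60, 0.30, 0.20, 0.10, 0.10]
# ...and from the same model after adding a low-Apgar indicator.
risk_with_apgar = [0.85, 0.50, 0.35, 0.60, 0.30, 0.20, 0.10, 0.10]

auc_clinical = auc(outcome, risk_clinical)
auc_with_apgar = auc(outcome, risk_with_apgar)
gain = percent_auc_gain(auc_clinical, auc_with_apgar)
print(f"AUC, clinical factors only: {auc_clinical:.3f}")
print(f"AUC, with low Apgar added:  {auc_with_apgar:.3f}")
print(f"Relative gain in AUC:       {gain:.1f}%")
```

In this made-up example the added predictor changes the ranking of only one affected infant, so the AUC moves only slightly. This is the kind of small relative gain that the percentages in the paragraph above describe.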
There are several significant limitations to this study. The study used retrospective data from a single center. The ten outcomes had a wide range of incidences, which can affect measures such as predictive values. Several outcomes, such as RDS, IVH, and NEC, are associated primarily with prematurity, although previous studies have included Apgar scores in risk assessments for these conditions.28,32,34 The accuracy of discharge diagnosis codes has been questioned.40 As one example, we found several instances where codes for both TTNB and RDS were assigned to the same subject. Our goal was to ensure that all potential cases were captured, so we used a wide range of codes. As a result, some short-term outcomes, such as MAS, had a high incidence. We do note that the codes are commonly used in retrospective neonatal studies, and they were used consistently within this single center. Other risk factors, such as maternal age, race, or maternal chorioamnionitis, were not included. We chose to use the two most common8 cut-off values at one and five minutes, rather than all the Apgar scores from 1 to 10, to account for some of the known variability in scoring and to capture a sufficient number of subjects per outcome to analyze. Other investigators have used the complete Apgar scale, usually in long-term outcome studies involving over one million.10 Finally, Dr. Apgar designed her system to assess the status of the infant immediately after birth. Starting in 1966,17 however, it has been used as a risk factor hundreds of times.
Strengths include a larger number of subjects than in most studies that have examined the Apgar score in relation to short-term outcomes. We examined the common ways of assessing a risk factor, namely sensitivity, specificity, positive and negative predictive value, and the odds ratio within a multivariable analysis, as well as the AUC of the ROC curves. Comparing the AUC with and without the low Apgar score directly addresses the question of its utility.
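The conventional measures listed above all derive from a 2×2 table of risk factor against outcome. The following Python sketch shows how they are calculated. The function name `two_by_two_metrics` and the counts are hypothetical and are not taken from this study's tables. The odds ratio shown is the unadjusted (crude) one, not the multivariable-adjusted odds ratio reported in the study.

```python
def two_by_two_metrics(tp, fp, fn, tn):
    """Standard measures for a binary risk factor against a binary outcome.

    tp: low Apgar, outcome present     fp: low Apgar, outcome absent
    fn: normal Apgar, outcome present  tn: normal Apgar, outcome absent
    Raises ZeroDivisionError if a required row, column, or cell is empty.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "odds_ratio": (tp * tn) / (fp * fn),  # unadjusted (crude) odds ratio
    }


# Hypothetical counts for a rare outcome: 100 affected among 10,000 infants.
metrics = two_by_two_metrics(tp=30, fp=470, fn=70, tn=9430)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a rare outcome such as the one assumed here, the negative predictive value is high largely because most infants do not develop the outcome. Predictive values are therefore best read alongside the incidence of each outcome.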
The Apgar score has been assigned to an estimated three billion or more newborn infants around the world over the last seventy years. During that time, concerns about it have been raised repeatedly. Yet it remains an important tool in the delivery room for assessing the immediate condition of the newborn. It appears to have good utility for assessing the risk of long-term outcomes when applied to large populations, but our findings suggest that it is not a significant contributor to identifying newborn infants who would benefit from a higher level of care because of the risk of short-term outcomes.