Study selection
A total of 2,315 studies were identified across databases. After duplicate removal, we screened titles and abstracts of 1,434 studies. A total of 173 full-text articles were assessed for eligibility. Of these, 88 met the inclusion criteria common to neonatal and paediatric populations, and 24 provided relevant data for this neonatal review (Figure 1). A list of excluded full-text manuscripts is provided in Additional file 2.
Study characteristics
Included studies were published between 2008 and 2018. Most were conducted in developed countries; however, one study from Kenya, one from Guatemala, one from Lebanon and two from Mexico were included (Table 1). Nine studies had a control group, six of which used random allocation. The remaining 15 studies had a single group pre-post intervention design. Two studies provided patient outcomes (Kirkpatrick level IV), 14 provided clinical performance outcomes (level III), and 15 provided learning outcomes (level II). The number of participating health care providers ranged from 16 to 305 (Table 1).
Risk of bias within studies
The risk-of-bias judgements of the 6 randomized studies are presented in Table 2. Five studies received an overall judgement of “some concern” (17–21). One study received an overall judgement of “high risk” because three domains, including randomization, raised some concern (22). All but one study had low risk of bias in the missing outcome data domain, reflecting limited dropout from the short-term educational interventions. Two studies had published their trial protocols (20,22), which was necessary to obtain “low risk” in the selected results domain.
The risk-of-bias scores for the 18 non-randomized studies are presented in Table 3. As in the randomized studies, dropout was low in all studies. Two of the three cohort studies received an overall risk-of-bias score of 4 (out of 6) (23,24). The third cohort study scored 2 due to suboptimal description of the intervention and control groups and to non-blinded outcome assessment (25).
Results and synthesis of patient outcome (Kirkpatrick level IV)
Two studies included patient outcomes (Table 1). Walker et al. conducted a cluster randomized study of 12 intervention hospitals in Mexico matched with 12 control hospitals on number of births, caesarean rate, mortality, complications, and number of operating rooms (22). Hospitals with higher than average maternal mortality were selected from a list of government-run facilities with 500-3,000 annual births. Intervention hospitals received 2+1 full days of training including interactive team and communication exercises, skills sessions, and in situ simulation of obstetric and neonatal emergencies. Control hospitals received no training during the study period. In total, 305 (9%) of 3,228 eligible health care professionals (nurses and doctors) received both training modules during 2010-2012. The statistical analysis adjusted for matching and for presence of a NICU, which was imbalanced at baseline (83% of control hospitals versus 42% of intervention hospitals). The incidence of hospital-based neonatal mortality tended to be lower at the intervention hospitals: IRR (4 months) 0.73 (95% CI 0.45-1.17), IRR (8 months) 0.59 (0.37-0.94), IRR (12 months) 0.83 (0.50-1.37). However, we were concerned about the validity because the study received an overall high risk-of-bias judgement (Table 2). The intended primary outcome of perinatal mortality was changed to in-hospital neonatal mortality due to poor reporting of stillbirths (not further specified). The intervention covered both obstetric and neonatal emergencies, so any change in mortality reflects the combined perinatal emergency care rather than neonatal resuscitation alone.
Charafeddine et al. conducted a single group pre-post intervention study at 22 hospitals in Lebanon (26). The hospitals were part of a larger National Collaborative Perinatal-Neonatal Network (NCPNN) covering 32 hospitals. The intervention was an 8-hour session comprising 40 minutes of teaching on the NRP algorithm, hands-on simulation on low-fidelity manikins, and finally “megacode” simulations covering all steps of neonatal resuscitation. Some 256 professionals (doctors, nurses, midwives) were trained during 2009-2011; the selection process and participation rate were not described. Patient outcomes, mortality at hospital discharge and neonatal morbidity, were retrieved from NCPNN surveillance data. With the first intervention year (2009) as reference, the mortality odds ratio (OR) decreased steadily from 1.53 (95% CI 1.18-1.98) in 2006 to 0.72 (0.54-0.96) in 2013. Furthermore, in the years 2011-2013 fewer infants required oxygen at birth, bag and mask ventilation, intubation, and chest compressions compared with 2009. The study obtained a low NOS score of 1 out of a maximum of 3 for a study with no control group, and outcome ascertainment was not blinded, indicating risk of bias. Neonatal mortality rates were not presented for the 10 non-participating network hospitals, which likely also contributed surveillance data. Thus, we are concerned that factors other than the simulation training may explain the observed change in mortality. We acknowledge that the authors themselves do not emphasize this finding, but rather the participants’ change in knowledge (included below).
In summary of patient outcomes, we identified one randomized study and one single group pre-post study that reported a measure of neonatal mortality (22,26). Both studies were from developing countries and indicated lower hospital-based neonatal mortality after simulation-based training. Both studies had a high risk of bias, and the randomized study by Walker et al. also included obstetric emergency training, hampering interpretation of the effects of neonatal team training.
Results and synthesis of clinical performance (Kirkpatrick level III)
We included 14 studies with clinical performance outcomes (Table 4). Eight studies had a control group, five of which used random allocation. Six studies had a single group pre-post design. One study, by LeFlore et al., simulated neonatal transport cases (24); the rest simulated neonatal (delivery room) resuscitation.
Two randomized studies evaluated the effects of simulation-based team training after approximately 3 months (18,19). Rubio-Gurung et al. conducted a large cluster randomized study in 12 hospitals in France (19). They compared 4-hour high-fidelity in situ multidisciplinary team training sessions for groups of 6 professionals with no simulation training. They trained 80% of the delivery room staff within 1 month, amounting to 202 professionals in the 6 intervention hospitals. Simulation-based evaluations were conducted for a random sample of professionals before (n = 116) and after (n = 114) the intervention. No differences in baseline evaluations were observed. Significant improvements were demonstrated 3 months later for technical skills, team performance and global performance (Table 4). The study received a low risk-of-bias judgement in 5 of 6 domains (Table 2). Overall, we consider the study by Rubio-Gurung et al. important and well conducted. Lee et al. conducted a randomized study of 27 emergency medicine residents (18). They were randomized to a 4-hour high-fidelity simulation-based session (45 minutes of didactics) on neonatal resuscitation, or to the standard emergency medicine curriculum, which included monthly paediatric (occasionally neonatal) simulations. Baseline data were similar in both groups. Simulation-based evaluation after 16 weeks demonstrated no change in the neonatal resuscitation score for the control group, but a 12-percentage-point improvement in the intervention group (Table 4). The intervention group also significantly reduced the time to warm, dry, stimulate, and place a hat on the infant compared with controls (p = 0.017). The study received an overall risk-of-bias judgement of some concern (Table 2). Together, the randomized studies by Rubio-Gurung et al. and Lee et al. support improved team performance and technical skills 3 months after simulation-based team training (18,19).
Two randomized studies extended re-testing to 6 months (20,21). Sawyer et al. conducted a study of 30 residents randomized to either standard oral debriefing or video-assisted debriefing at 3 high-fidelity simulation sessions approximately 2 months apart (21). Baseline data were similar in both groups. No significant differences in neonatal resuscitation performance score or time to perform critical actions were observed between the standard oral and video-assisted debriefing groups at the 6-month comparison (Table 4). The study received a low risk-of-bias judgement in 4 of 5 domains (Table 2). Thomas et al. randomized 98 interns to either standard NRP training (comparator) or NRP plus a 2-hour session on communication and teamwork with low-fidelity (intervention group 1) or high-fidelity (intervention group 2) simulation (20). At 6-month follow-up the intervention groups (analysed together to increase power) exhibited more teamwork behaviours per minute (11.8) than controls (10.0) (p = 0.03). However, no differences were observed for NRP performance score, duration of resuscitation, vigilance, or workload management. The study received a low risk-of-bias judgement in 3 of 5 domains (Table 2).
Bender et al. investigated whether an NRP booster at 9 months improved performance at a 15-month evaluation (17); 50 residents were randomized to either a half-day NRP booster with high-fidelity simulations (intervention) or to routine clinical duties (comparator). At the 15-month evaluation the intervention group scored higher on both the technical score and the team performance score (Table 4). The study received an overall risk-of-bias judgement of some concern (Table 2).
Two cohort studies explored minor interventions related to simulation-based team training. Rovamo et al. studied 99 doctors, nurses and midwives on a one-day high-fidelity neonatal resuscitation course (23). Both the intervention and control groups had the same simulation training, but the intervention group additionally received a 1-hour interactive lecture on crisis resource management (CRM) and anaesthesia non-technical skills principles. There was no difference in team performance score between the two groups after the lecture (Table 4). The study received a NOS score of 4 of 6, indicating low to moderate risk of bias (Table 3). LeFlore et al. studied a neonatal transport team over 2 years (24); the first year the team trained with high-fidelity simulation and self-paced modular learning (control), and the second year with high-fidelity simulation and expert-modelled learning (intervention). Some, but not all, team members participated in both years. There was no significant change in team performance score (Table 4). This study also received a NOS score of 4 of 6, indicating low to moderate risk of bias (Table 3).
Barry et al. studied a group of 28 first-year residents (intervention) and compared them with a group of 24 senior residents (control) (25). The intervention was a half-day equipment workshop and in situ simulation-based team training. The senior residents in the control group had completed the NRP course and performed routine clinical duties. Re-testing was done after 1 month and after 1-2 years. The intervention group’s global performance score increased from a lower level before training to the same level as the senior residents after training (Table 4). The study received a NOS score of 2 of 6, indicating moderate to high risk of bias (Table 3).
Dadiz et al. studied 228 perinatal health care professionals over a 3-year period (27); 90-minute multidisciplinary high-fidelity trainings were conducted in a simulated delivery room. Over the years, increasing communication checklist scores were observed (Table 4). The study received a NOS risk-of-bias score of 2 (Table 3). Five single group pre-post studies of simulation-based team training, by Walker et al., Sawyer et al., and Cordero et al., observed improved team performance scores 0-6 months later (Table 4) (28–32). NOS risk-of-bias scores ranged from 1 to 3 of 6 (Table 3).
In summary of clinical performance, randomized studies showed effects of team training in simulated re-evaluations after 3 and 6 months. Booster simulation sessions 9 months after NRP improved performance at the 15-month evaluation. One randomized study showed no differences in team performance between standard oral debriefing and video-assisted debriefing. One single-group study showed steadily improving communication performance during a 3-year intervention with yearly simulation training. Five smaller single group studies showed that simulation-based team training improved team performance scores 0-6 months later.
Results and synthesis of learning (Kirkpatrick level II)
Two small studies with random allocation to intervention and control groups presented self-reported outcomes on knowledge and confidence. Bender et al. observed no significant difference in knowledge 15 months after a simulation-based NRP booster at nine months (17). Lee et al. observed significant improvements in confidence in neonatal resuscitation after 16 weeks in both the intervention and control groups, but no statistically significant difference between groups (18). Both studies had limited power to detect a difference.
A total of 13 studies with a single group design presented self-reported learning outcomes; we briefly summarize the findings of the 7 studies with more than 50 participants (Table 1). They all evaluated outcomes immediately after the simulated neonatal resuscitation team training intervention or within 2-3 months. Self-assessed improvements were reported for neonatal resuscitation knowledge (26,28,33–35), self-efficacy (28,34), communication (33,36), and leadership, confidence and technical skills (36). Dadiz et al. specifically trained and studied delivery room communication, and, interestingly, the health care professionals reported significant improvements in team communication in real clinical situations over the 3-year study period (27).
In summary of learning outcomes, the single-group design studies all reported significant improvements in self-reported outcomes, but 2 small randomized studies found no difference in improvements between the intervention and control groups.
Risk of bias across studies
Within each group of studies (according to Table 1) the funnel plots were quite symmetric, and no major concern about publication bias was raised (Figure 2). We found no indication of selective reporting bias, as the methods sections and reported results were consistent in all studies. Pre-published study protocols were available for only 2 of 6 randomized studies, which was reflected in the risk-of-bias scores.