Conducting a systematic review or meta-analysis requires a significant amount of time. However, automation can be used to accelerate several steps in the process, particularly the screening phase (Adam et al., 2022; Cierco Jimenez et al., 2022; Cowie et al., 2022; Khalil et al., 2022; Nieto González, 2021; Pellegrini & Marsili, 2021; Qin et al., 2021; Robledo et al., 2021; Scott et al., 2021; Tsou et al., 2020; van de Schoot et al., 2021; Wagner et al., 2022; L. L. Wang & Lo, 2021). Artificial intelligence can assist reviewers with screening prioritization through active learning, a specific implementation of machine learning; for a detailed introduction, we refer to Settles (2009). Active learning is an iterative process in which the machine continually reassesses the unscreened records for relevance and the human screener labels the records deemed most likely to be relevant. As the machine receives more labeled data, it uses this new information to improve its predictions on the remaining unlabeled records, with the goal of identifying all relevant records as early as possible. Priority screening via active learning has been successfully implemented in various software tools such as Abstrackr (Wallace et al., 2012), ASReview (van de Schoot et al., 2021), Colandr (Cheng et al., 2018), EPPI-Reviewer (Thomas et al., 2020), FASTREAD (Yu et al., 2018), Rayyan (Ouzzani et al., 2016), RobotAnalyst (Przybyła et al., 2018), Research Screener (Chai et al., 2021), DistillerSR (Hamel et al., 2020), and RobotReviewer (Marshall et al., 2017). However, among these tools, only ASReview offers the flexibility to implement the model-switching approach proposed in this paper. For a curated comparison of these software tools, see van de Schoot (2023).
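To make this loop concrete, the minimal Python sketch below illustrates screening prioritization with certainty-based active learning. It is our illustration, not code from any of the tools listed above: the TF-IDF representation and logistic regression classifier are placeholder choices, and the `ask_screener` callback stands in for the human screener.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def screen_with_active_learning(abstracts, prior_labels, ask_screener):
    """abstracts: list of strings; prior_labels: {index: 0/1} for a small
    training set containing at least one relevant and one irrelevant record;
    ask_screener(index) returns the human's 0/1 label for that record."""
    X = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
    labels = dict(prior_labels)
    while len(labels) < len(abstracts):
        # Retrain on everything labeled so far.
        idx = list(labels)
        model = LogisticRegression().fit(X[idx], [labels[i] for i in idx])
        # Rank the unlabeled records and query the one most likely to be
        # relevant (certainty-based sampling, used for screening prioritization).
        unlabeled = [i for i in range(len(abstracts)) if i not in labels]
        scores = model.predict_proba(X[unlabeled])[:, 1]
        top = unlabeled[int(scores.argmax())]
        labels[top] = ask_screener(top)
        # In practice, the loop ends earlier, once a stopping rule fires
        # (see the heuristics discussed below).
    return labels
```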
Priority screening via active learning allows for a more efficient and effective screening process than manual screening methods. Because early screening effort is focused on the records most likely to be relevant, screening fatigue is less likely to cause relevant records to be missed, as can happen with traditional approaches such as screening by year, title, or author, or in random order. Moreover, with active learning, the relevant records are found early in the screening process, allowing the review team to start on subsequent steps of the review while the less relevant records are still being screened. Another advantage of active learning is that it allows for a more sensitive and reproducible search with less filtering. With manual screening, search strategies are often designed to yield a manageable number of records, which may require applying filters or limiting the number of search terms. However, these filters can reduce the sensitivity of the search and may introduce bias, while also limiting the reproducibility of the search over time. Active learning can sort through large amounts of data more efficiently than manual screening and thus requires less filtering, enabling a more sensitive and reproducible search. Overall, active learning is a promising method for systematic reviews and meta-analyses due to its more efficient, effective, and transparent screening process.
However, determining the optimal point at which to stop screening is a critical and challenging task when using active learning. The main goal of active learning is to screen fewer records than random screening, so it is important to find an efficient stopping point in the active learning process (Yu & Menzies, 2019). However, defining a stopping rule is difficult because the cost of labeling additional records must be balanced against the cost of errors made by the current model (Cormack & Grossman, 2016). Active learning models continually improve their predictions as they receive more labeled data, but the process of collecting labeled data can be time-consuming and resource-intensive. While finding all relevant records is nearly impossible, even for human screeners (Z. Wang et al., 2020), it is important to consider that, in the absence of labeled data, the number of remaining relevant records is unknown. Therefore, researchers may either stop too early and risk missing important records or continue for too long and incur unnecessary additional reading (Yu et al., 2018). At some point in the active learning process, most, if not all, relevant records have been presented to the screener, and only irrelevant records remain. Thus, finding an optimal stopping point is crucial to conserve resources and ensure the accuracy of the review.
Several statistical stopping rules have been proposed in the literature (Cormack & Grossman, 2016; Howard et al., 2020; Kastner et al., 2009; Ros et al., 2017; Stelfox et al., 2013; Wallace et al., 2010, 2012; Webster & Kemp, 2013; Yu & Menzies, 2019). However, these rules can be difficult for non-specialists to interpret and apply, and they have not been widely implemented in software tools.
Alternatively, heuristics have been proposed as a practical and effective way to define stopping rules for active learning-based screening in systematic reviews and meta-analyses. Several heuristics have been proposed, each focusing on a single aspect, including time-based, data-driven, and number-based strategies, such as those proposed by Bloodgood & Vijay-Shanker (2014), Olsson & Tomanek (2009), Ros et al. (2017), and Vlachos (2008). In the time-based approach, the screener stops after a pre-determined amount of time, which can be useful when screening time is limited or when the hourly costs of the screener are high. In the data-driven approach, the screener stops after labeling a pre-determined number of consecutive irrelevant records, for example, after labeling 50 records in a row as irrelevant. In the number-based approach, the screener stops after having evaluated a fixed number of records. This number can be based on an estimate of the total number of relevant records in the starting set (Cormack & Grossman, 2016). A variation of the number-based approach is to screen a predefined set of records randomly and use the observed fraction of relevant records to extrapolate an estimate of the number of relevant records in the complete set (van Haastrecht et al., 2021). Lastly, we propose the key paper heuristic to validate recall, which builds on descriptions in sources such as Tran et al. (2022) and Bramer et al. (2018). Key papers are typically used for validating the search strategy by ensuring that the search process adequately identifies relevant primary studies. When the key paper heuristic is used to validate the active learning phase, a set of important papers is determined beforehand, for example by expert consensus, and the screener stops once all of these papers have been found with active learning. In sum, these single-aspect heuristics offer practical and simple approaches to defining stopping rules for active learning-based screening and can help non-specialists more easily interpret the results. At the same time, relying on a single heuristic has limitations and may result in missing potentially relevant records. Therefore, we suggest implementing a combination of heuristics, as sketched below, to avoid ending screening prematurely and to increase the recall rate.
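To make these single-aspect rules concrete, the sketch below expresses each heuristic as a separate check. It is our illustrative rendering rather than code from any of the cited works, and all thresholds (eight hours, 50 consecutive irrelevant labels, 1000 records, ten key papers) are hypothetical examples, not recommendations.

```python
import time

def heuristic_status(start_time, label_history, n_key_papers_found,
                     max_hours=8.0,        # time-based budget (hypothetical)
                     n_irrelevant_run=50,  # data-driven run length (hypothetical)
                     max_records=1000,     # number-based cap (hypothetical)
                     n_key_papers=10):     # size of the key paper set (hypothetical)
    """Report which single-aspect stopping heuristics currently hold.

    label_history: list of 0/1 labels in screening order (1 = relevant).
    """
    tail = label_history[-n_irrelevant_run:]
    return {
        "time_based": (time.time() - start_time) / 3600 >= max_hours,
        "data_driven": len(tail) == n_irrelevant_run and not any(tail),
        "number_based": len(label_history) >= max_records,
        "key_papers": n_key_papers_found >= n_key_papers,
    }

def estimate_total_relevant(n_relevant_in_sample, sample_size, total_records):
    # The van Haastrecht et al. (2021) variation of the number-based approach:
    # extrapolate the expected number of relevant records from a random sample.
    return n_relevant_in_sample / sample_size * total_records
```

Stopping as soon as any single check fires risks ending prematurely; a conservative combination, in line with the suggestion above, would instead require several checks, for example the data-driven and key paper checks, to hold simultaneously.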
The goal of the current paper is to present a practical and conservative stopping heuristic that combines different heuristics to avoid stopping too early and missing relevant records during screening, and that can be applied in screening software such as ASReview. The proposed stopping heuristic is easy to implement and can be applied effectively in various scenarios. The SAFE procedure consists of four phases: Screen a random set for training data; Apply active learning; Find more relevant records with a different model; Evaluate quality. We first present the results of an expert meeting in which we piloted and discussed the stopping heuristic. Next, we provide a detailed explanation of the heuristic, including its implementation and effectiveness in different scenarios. The proposed stopping heuristic balances the costs of continued screening against the risk of missing relevant records, providing a practical solution for reviewers to make informed decisions on when to stop screening. We hope that this practical and effective stopping heuristic will be widely adopted and implemented in systematic reviews and meta-analyses that use active learning.
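As a preview of how the four phases fit together, the schematic sketch below chains them in code. Every function here is a toy stand-in of our own devising; the concrete models, heuristics, and evaluation criteria are specified later in the paper.

```python
import random

def human_label(record):
    # Stand-in for the human screener's relevance judgment.
    return random.randint(0, 1)

def prioritized_screening(records, labels, model):
    # Stand-in for an active learning run with the named model that continues
    # until a stopping heuristic fires; here it simply labels ten more records.
    unlabeled = [i for i in range(len(records)) if i not in labels]
    for i in unlabeled[:10]:
        labels[i] = human_label(records[i])
    return labels

def safe_procedure(records, key_papers):
    """records: list of records; key_papers: set of record indices agreed on
    beforehand, e.g. by expert consensus."""
    # S - Screen a random set to obtain training data.
    labels = {i: human_label(records[i])
              for i in random.sample(range(len(records)), k=5)}
    # A - Apply active learning with a first model.
    labels = prioritized_screening(records, labels, model="model_A")
    # F - Find more relevant records by switching to a different model.
    labels = prioritized_screening(records, labels, model="model_B")
    # E - Evaluate quality, e.g. verify that all key papers were found.
    if not key_papers <= {i for i, y in labels.items() if y == 1}:
        print("Key papers missed; continue screening before stopping.")
    return labels
```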