Figure 1 describes the flow of literature. 27,848 unique citations underwent title/abstract screening, and 1,799 underwent full-text review, after which we included 112 studies reported in 121 papers across KQs 1, 2, 4 and 5. Citations excluded at full-text review, with reasons, are included in Supplemental file ii.
Key Question 1 and 1a: Effectiveness and Comparative Effectiveness
No RCTs comparing screening versus no screening were eligible for this review. The review conducted for the previous task force guideline included one RCT from India (77), which compared cytology versus no screening, but this was no longer eligible in view of the new criterion of only including studies from countries with a Very High Development Index.
Ages to start and stop screening and use of 3- versus 5-year intervals
Study characteristics
Twenty-two observational studies, reported in 24 papers, informed the effectiveness of cytology screening and comparative effects of 3- versus 5-yearly cytology screening intervals (Supplemental file 1a has study characteristics, risk of bias assessments and all evidence sets). Some studies were used for both questions. There were no studies focusing on hrHPV screening. The associated publications included populations that overlapped substantially with the primary publication but either focused on a different outcome (78, 79) or contained data on age subgroups for comparing intervals (but with a smaller sample than the primary paper reporting across all ages) (80, 81). In one case, two studies by the same authors were considered different studies because, though there was some overlap in populations, the studies were used for different comparisons (screening interval (82) and age to stop (83)). We included seven studies in eight publications (78, 79, 84–89) to inform age to start screening, 10 studies in 11 publications that addressed age to stop screening (79, 82–86, 90–95), and 10 studies in 12 publications on screening at different intervals (78–80, 82, 84, 96–102). Studies were typically done in Europe and North America with a single study conducted in Japan (103). In nine studies (10 publications), screening was undertaken within an organized program (80–83, 86, 88, 89, 100, 103–105); seven studies relied on opportunistic screening (78, 79, 85, 92, 95, 97, 99); and the remainder had organized screening running parallel to opportunistic screening (84, 87, 90, 91, 93, 94). Sixteen studies (18 publications) used a case-control design with a median of 790 cases of ICC (range number of cases 39 to 5,047) (78–85, 87, 88, 90, 92, 95, 97, 99, 103, 105) and six were cohort studies (86, 89, 91, 94) with a median sample size of 545,934 (range of sample sizes 2,081 to 2,621,802). 
Very few studies reported on participant characteristics of interest, such as HPV vaccination status, race/ethnicity, or socioeconomic status. Of note, all studies were carried out before HPV vaccination was implemented or in periods when HPV vaccination rates were very low. In terms of risk of bias, most studies did not raise significant concerns for most variables; the exception was a lack of adjustment or control for potential confounders other than age in 13 (59%) studies, which was considered an important source of bias. In most cases, the risk of bias was not considered serious, so we did not rate down during the assessments of certainty, which already started at low because observational studies cannot control for unmeasured confounding.
Findings
Six studies reported the effect of cytology screening at age 20 to 34 years on the incidence of ICC (79, 84–88). None reported on individuals under 20 years of age. Based on pooling five case-control studies and considering findings from the cohort study, we rated the evidence to be of very low certainty for all age groups included to inform age to start cervical cancer screening (20–24, 25–29, and 30–34 years) (Supplemental file 1a). Several studies reported on ages that did not closely align with the 5-year age categories of interest, warranting rating down for indirectness. Further, there was high heterogeneity among the case-control studies and between estimates from the case-control and cohort studies; for example, pooled data from case-control studies favoured no screening in 20 to 24 and 25 to 29-year-olds, whereas the large cohort study (sample over 2 million but not reported by age) favoured screening in both of these age categories. Similarly, data from one high risk-of-bias cohort study (n = 353,045) (89) and one small (n = 1,483) case-control study (78) provided very low certainty evidence for all age groups about the effect of cervical cancer screening on subsequent all-cause and cervical cancer-specific mortality, respectively.
In separate pooled analyses of six case-control studies (79, 83–85, 90, 92) (N = 16,909) and two cohort studies (n = 569,132 (91); number not reported for the age group (86)) screening between the ages of 60–69 years was associated with lower incident ICC compared with not screening (pooled OR 0.46, 95% CI 0.34 to 0.62; pooled RR 0.54, 95% CI 0.46 to 0.62, respectively). Based on a cumulative rate for ICC in participants not screened (n = 360,093) in their early 60s in a national Swedish cohort (91) of about 20 ICC cases per 10,000 individuals over 10–15 years, the absolute reduction would be estimated at 9 fewer ICC per 10,000 over 10–15 years (i.e., similar to our threshold of 3 fewer per 10,000 after one of about three possible screening rounds). Two case-control studies (79, 90) that provided data using 5-year increments found associations with reduced ICC for both 60–64 and 65-69-year-olds. When exploring our other subpopulations, two studies examined the effect on ICC incidence of cytology screening versus no screening at age 60–65 years depending on their screening results in their 50s (83, 91). First, one case-control study (n = 12,708) from the UK (83) found reductions over 25 years of 43 per 10,000 from screening among those who had an abnormal screen during their 50s and 49 cases if they had not been screened during their 50s. Smaller reductions were observed among those who had “irregularly screened” (i.e., no abnormal tests aged 50-59y and a negative test between aged 50-54y but not 55-59y, or aged 55-59y but not 50-54y; 12.9 fewer, 95% CI 6.5 to 19.3) or “adequately” screened (i.e., only normal tests, with one in each 5-year period; 6.3 fewer, 95% CI 0.3 to 12.3). 
Second and similarly, a large Swedish cohort study (n = 569,132) (91) following individuals up to 24 years (mean 10.9) reported that screening versus no screening at age 61–65 benefitted those who had not been screened (33 fewer per 10,000) or had abnormal results (60–70 fewer per 10,000) in their 50s, but had less effect for those who had been inadequately screened (5.4 fewer per 10,000, 95% CI 14.2 fewer to 3.4 more; aHR 0.82, 95% CI 0.56 to 1.22) or adequately screened (2.8 fewer per 10,000, 95% CI 7.8 fewer to 2.2 more; aHR 0.90, 95% CI 0.69 to 1.17), using definitions similar to those in the UK study. For reduction in ICC among those aged 60–69 years, we rated the certainty of evidence as moderate (rating up for large effect) for a reduction among those with no, abnormal, or inadequate screening in their 50s, and low (with some imprecision especially in the larger study) for a reduction among those adequately screened during their 50s.
For mortality from cervical cancer among those aged 60–69 years, benefit was shown in analyses of three case-control studies (78, 93, 95) (pooled OR 0.50, 95% CI 0.37 to 0.67, N = 2,582) and one cohort study (94) (RR 0.23, 95% CI 0.10 to 0.51, n = 59,065). Based on data in Finland for cervical-cancer mortality rates over about 10 years among those not invited to screen at age 65 (n = 486,869; 0.38 per 10,000) (94), the absolute risk reduction may be between 0.19 and 0.29 fewer deaths per 10,000 screened. Considering the variation in effects for incidence of ICC from studies looking at effects among subpopulations based on screening results in their 50s, we rated the certainty of evidence as moderate (rated up for large effect) for a reduction in cervical-cancer mortality among those with no, abnormal, or inadequate screening in their 50s, and low (due to indirectness of the overall findings to this population) for a reduction among those adequately screened during their 50s.
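The absolute effects quoted for ages 60–69 follow from applying a relative effect to a baseline rate among the unscreened. The Python sketch below reproduces that arithmetic using only figures reported in the text. The function name `fewer_per_10k` is illustrative, and treating a pooled OR as an approximate RR is an assumption of this sketch that is reasonable only because the outcomes are rare.

```python
def fewer_per_10k(baseline_per_10k: float, relative_effect: float) -> float:
    """Absolute reduction per 10,000 given a baseline rate per 10,000 in the
    unscreened and a relative effect (an RR, or an OR used as an approximate
    RR for a rare outcome; the latter is an assumption of this sketch)."""
    return baseline_per_10k * (1.0 - relative_effect)


# ICC incidence, ages 60-69: baseline of about 20 per 10,000 over 10-15 years
# in the Swedish cohort (91), pooled RR 0.54 from the cohort studies.
icc_fewer = fewer_per_10k(20, 0.54)

# Cervical-cancer mortality: baseline 0.38 per 10,000 over about 10 years in
# Finland (94), pooled OR 0.50 (case-control) and RR 0.23 (cohort).
deaths_fewer_or = fewer_per_10k(0.38, 0.50)
deaths_fewer_rr = fewer_per_10k(0.38, 0.23)

print(f"ICC: {icc_fewer:.1f} fewer per 10,000")
print(f"Deaths: {deaths_fewer_or:.2f} to {deaths_fewer_rr:.2f} fewer per 10,000")
```

Rounded, these values match the roughly 9 fewer ICC cases per 10,000 and the 0.19 to 0.29 fewer deaths per 10,000 given in the text.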
For persons aged 70–79 years, the effect of screening for cervical cancer on incident ICC, though appearing to be of benefit, was less certain than for those 60–69 years in pooled analysis of three case-control studies (OR 0.44, 95% CI 0.33 to 0.57, N = 4,258) (84, 85, 92). The analysis relied mainly (97% weight) on one study (92). We did not find a good estimate for the rate of ICC in unscreened individuals in their 70s who had been screened during their 60s, as a basis for estimating absolute effects. It was judged that the varying effects for those in their 60s for ICC, based on screening results in their 50s, would translate to this age group. The certainty was rated as low for a reduction in ICC incidence for those aged 70–79 with no, abnormal, or inadequate screening in their 50s and very low for those adequately screened during their 50s. Only one small case-control study (95) reported the effect of screening in one’s 70s on cervical cancer mortality, leading to inconclusive findings related to imprecision, lack of consistency and (for those adequately screened in their 50s) indirectness.
Eight case-control (N = 20,862) and two cohort (N = approximately 174,000) studies contributed to data on the effect of cytology screening by intervals less than 3.5 years versus 3.5 to 5.5 years on incident ICC across age groups and provided inconsistent and thus very low certainty evidence (79, 80, 82, 84, 97, 99, 100, 103–105). Analyses stratified by 5-year age groups and by setting/type of screening program did not explain the heterogeneity (Supplemental file 1a). One case-control study (n = 11,447) (78) found a significant effect for reducing the risk of death from cervical cancer when screening at an interval of ≤ 3 years compared with 3–5 years, but findings were of very low certainty for lack of consistency especially when there was inconsistency demonstrated for the incidence outcome.
Scarce data were reported for other specific populations of interest and were limited to cytology screening. In one US case-control study (n = 11,404) of persons aged 65 years or older with a cervix, race (White vs. non-white) was not associated with the (beneficial) effect of screening (p = 0.243 for interaction) when adjusted for median income by zip code and potential impact of hysterectomy (92). In a large (n = 2,621,802) cohort study that adjusted results for age, there was no difference in the (beneficial) effect of screening 23 to 50-year-olds on incidence of ICC based on immigrant status (Swedish-born vs. birth outside Sweden) (86). No data were presented for trans or nonbinary individuals.
Comparative effectiveness between screening strategies
Study characteristics
We included 16 trials (14 RCTs (106–119) and 2 quasi-RCTs (i.e., using odd-even date of birth or personal identification number) (120, 121)), one observational study (122) and four associated papers (123–126) that addressed the comparative effectiveness of different screening strategies (Table 2; Supplemental file 1b). Most studies were conducted in the setting of organized screening programs in Europe and in addition, one trial each was done in Canada (127–129), Hong Kong (130), and Australia (131). Fifteen trials only reported on one screening round (number enrolled ranging from 667 (118) to 201,038 (120)), because any second rounds of screening used the same method in each group compared. One RCT (HPV Focal; n = 22,588) provided data for one round as well as the comparison between two rounds of screening with cytology with triage to hrHPV (over 4 years) and one round of screening with hrHPV with triage to cytology (112). As previously mentioned, the data from these studies were collected on those who undertook screening; some trials had very low (5–52%) (106, 109, 110, 113, 116, 117, 120, 121) rates of enrollment among those allocated. Most trials included participants across age groups, typically ranging from individuals in their 20s and 30s to those in their 60s. Two trials (109, 132) only included older participants (aged 50–60 and 56–60 years). Length of follow-up for incidence outcomes ranged from 18 months to 5 years. Outcomes in pre-specified populations and data for subgroup analyses were limited with five trials (113, 117–119, 133) enrolling persons with a cervix who were underscreened and seven trials (106, 115, 120, 121, 123, 127–129, 131, 134, 135) presenting data by age subgroups.
Four trials (117, 118, 127–129, 136) were not considered at high risk of bias for any included outcome. Six trials were at high risk of bias for inadequate sequence generation (106, 109, 110, 116, 120, 121). Blinding of participants (performance bias) and outcome assessors was unclear in most trials, though blinding of participants was not thought to be of major concern in these studies of comparative effectiveness. The domain of incomplete outcome data, from attrition after the screening test, was at high risk of bias across multiple outcomes in one trial (116) and for the incidence outcomes in two others (108, 126) (Supplemental file 1b). Two trials (107, 111) were at risk of missing data for the incidence outcomes because they were not actively ascertained for all participants (only using data linkage or safety reporting), thus some events particularly for CIN 2 and CIN 3 could have been missed.
Methods for detecting incident cases varied, and included cytology (conventional or liquid-based), hrHPV and liquid-based cytology co-testing, data linkage, safety monitoring (i.e., reported as an adverse event), and a combination of these approaches. Outcomes that included cases of ICC (incidence of CIN 2+, CIN 3 + and ICC) but did not use data linkage to find clinically detected cases were considered indirect. Further indirectness came from studies where people with CIN 2 or 3 detected during screening were not followed to find any cases that progressed. The large (n = 1,262,510) observational study from England was only included for the incidence of ICC outcome (via second round screening and cancer registries) not reported by a trial for one comparison; the groups were differentiated by changes in laboratories in some regions to implement hrHPV screening with triage to cytology but the authors noted differences between groups in socioeconomic status which was not accounted for in the analysis.
Table 2 describes the strategies in each study in detail. Based on descriptions of the interventions and clinical input from the working group, we classified the trials to examine 10 major comparisons (Box 1), with some evaluated by only one RCT. In a few cases (e.g., Comparisons 1 and 10, 2 and 8, 6a and 6b), the screening strategies were quite similar but differences in the populations (general population vs. under/never-screened) were thought to differentiate them enough to separate for analysis. Further, in two cases (Comparisons 2 and 3) there were differences between trials within the same major comparison with respect to whether there was additional follow-up (e.g., at 6–12 months) beyond the main triage testing at baseline during each round of screening. For this, we created subgroup comparisons for screening “without recall” and “with recall” in each round. Two trials provided data for more than one major comparison or subgroup. The Norwegian HPV Pilot trial (n = 157,447) (121), primarily comparing hrHPV with cytology triage versus cytology with hrHPV triage (with recall; Comparison 3b), also provided data for Comparison 2a of hrHPV with cytology triage versus cytology alone (without recall) because the hrHPV triage results in the cytology arm were not acted upon until the recall phase, to check for persistence, and positive results from cytology alone were referred to colposcopy at baseline (allowing for detection and false positive outcomes from this perspective). Likewise, the HPV Focal RCT (112) provided data for some outcomes in Comparisons 3a, b, (n = 22,588) and c (n = 16,374). In comparisons including more than one trial, there were sometimes differences between trials in the threshold used for referral to colposcopy after cytology and in the screening methods used at the recall stage (Box 1). There were no comparisons between cytology alone and either cytology with hrHPV triage or hrHPV with partial genotyping (with or without triage).
Findings
Supplementary file 1b contains the full evidence sets for each outcome-comparison. We did not rate down for indirectness in our certainty assessments when considering the trials focused on adherence to screening, versus being invited to screen, though in some cases the risk of bias was high when RCTs enrolled many fewer participants than randomized and did not demonstrate comparable baseline characteristics between arms.
For all comparisons, we are very uncertain about any impacts on all-cause and cervical-cancer mortality and for overdiagnosis. No trial reported on cervical-cancer mortality, and only one trial (COMPASS; n = 2,987) (107) in Comparison 4 reported on all-cause mortality at short follow-up (18 months) duration and with imprecision. For overdiagnosis, an associated paper (123) to the FINNISH RCT (111) in Comparison 2 used results from the trial for 5-year follow-up after one round from the screening (including prevalent and incident cancers) strategies together with historical population-based data for an estimated incidence of cancer without screening over 5 years (Finland in 1958–1962; incidence 17 per 100,000 person-years). In this study, overdiagnosis was defined as the risk of CIN 3 cases that would not have progressed to invasive disease by the next screen (5 years later) using the period prevalence of CIN 3 lesions diagnosed at the screen and during the following screening interval minus the rate of prevented cancers (squamous cell cancer) within the same screening round (the rate assuming no screening minus the rate of interval cancers found in the trial). Estimates of overdiagnosis of non-progressive CIN3 + were presented for hrHPV with cytology triage (cases overdiagnosed 39.6 per 1,000 person-years, 95% CI 31.3 to 48.9) and from cytology alone (20.3 per 1,000 person-years, 95% CI 13.6 to 27.9). The evidence was rated to have very low certainty, from data that was considered observational (this started at low certainty), at risk of bias, and indirect from the use of historical incidence data for the no screening comparator. Results for other outcomes are presented here by groupings of comparisons.
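The overdiagnosis definition used in the associated paper (123) can be written compactly as follows. This is only a restatement of the verbal definition above; the symbols are introduced here for illustration and are not taken from the source, and all terms are assumed to be expressed on the same rate scale.

```latex
\[
\mathrm{OD}_{\mathrm{CIN3}}
  \;=\; P_{\mathrm{CIN3}}
  \;-\; \bigl( I_{\text{no screening}} - I_{\text{interval}} \bigr)
\]
% P_CIN3          : period prevalence of CIN 3 lesions diagnosed at the screen
%                   and during the following screening interval
% I_no screening  : cancer rate assumed without screening (historical data)
% I_interval      : rate of interval cancers found in the trial
```

The term in parentheses is the rate of prevented cancers within the same screening round.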
Comparisons 1 through 4
Table 3 contains the summary of findings from these comparisons, which were considered most relevant for decision-making about which strategies to recommend. In each comparison, the certainty was assessed separately for each major age group reported across the studies (25–29, 30–59, and 60–69 years); if the study(ies) reporting on a comparison did not include any participants in the age group (e.g., 60–69 year-olds in Comparison 1) we report the certainty as very low.
Only three trials provided data comparing strategies using clinician-sampled hrHPV versus cytology alone (Comparisons 1 and 2), which was considered the major comparator of interest. The NTCC Phase II RCT in Italy (Comparison 1) compared hrHPV screening alone with cytology alone (≥ ASCUS to colposcopy in most centres) at nine screening centres (115). Low certainty evidence suggested little-to-no difference for 25–59 year-olds between strategies for incidence of CIN 2 and CIN 3+ (the latter using detection of CIN 2+ as a surrogate), and very low certainty was found for incidence of CIN 3 and ICC (using CIN 3+ detection). Though results were statistically significant for higher detection of CIN 2+ and CIN 3+ with hrHPV screening alone, the point estimate and its 95% CI for CIN 2+ (30.2 more) did not exceed our threshold of 100 more per 10,000 for indicating greater than little-to-no difference, and the 95% CIs (across all ages and within age groups) were too imprecise to determine whether CIN 3+ detection met its threshold of 10 more per 10,000. There was moderate certainty evidence for at least some harm (≥ 300 per 10,000) from referrals to colposcopy for 25–59 year-olds (possibly considerably more for those 25–29 years), and from biopsies and false positives for CIN 2+ and CIN 3+ for those aged 25–29 years.
The Finnish RCT (n = 132,194; ages 25 to 65 years) (111) and Norwegian HPV Screening Pilot trial (n = 157,447; ages 34 to 69 years) (121) contributed data for Comparison 2a, of hrHPV with triage to cytology versus cytology alone, without recalls. Data from the recall stage in both trials were not used for detection or false positive outcomes, whereas data on case detection during the recall (“intensive screening”) phase in the Finnish RCT contributed to incidence outcomes for this comparison. Using detection data for CIN 3+, findings suggested that there may be fewer incident cases of ICC from hrHPV with triage to cytology across age groups (low certainty). Across age groups, there was low certainty evidence of little-to-no difference for incidence of CIN 2 and CIN 3+ (using detection of CIN 2+) and moderate certainty of little-to-no difference in referrals to colposcopy and in false positives for CIN 2+, CIN 3+, and ICC. For Comparison 2b adding recall, where only the Finnish RCT contributed data, there was still low certainty for little-to-no difference for CIN 2 or CIN 3+ incidence across age groups, but evidence of reduced ICC incidence (via CIN 3+ detection) only for 25–29 year-olds because findings for the older ages were imprecise. It is unclear if adding a recall phase (including all HPV positives) in this main comparison increases the potential benefit from the hrHPV strategy, and this would need to be considered in light of moderate certainty evidence showing harm from false positives to recall, which may be at least twice our threshold (possibly 800–900 more per 10,000).
The Canadian HPV Focal RCT (112, 124) (25–65 years) contributed to Comparisons 3a, b, and c, whereas data from two other studies contributed to each of Comparisons 3a and b. Two trials from Sweden, using the exact same comparison but non-overlapping populations with differing ages (Swedish HPV Trial 30–64 years, n = 201,028 (120) and Stockholm-Gotlund 56–60 years, n = 14,763 (110)) provided data for detection, referrals to colposcopy and false positive outcomes for Comparison 3a, because they did not include recall in their screening strategy. Data by age group for detection of CIN 3 + and ICC for the larger Swedish trial were obtained from the authors. For Comparison 3b, the Norwegian HPV Screening Pilot trial (121) contributed data for detection, referrals to colposcopy and false positive outcomes and the English HPV screening observational study (122, 125) was used for incidence of ICC (not reported by the trials). For incidence outcomes other than ICC, the HPV Focal RCT was the only contributor of data. For Comparison 3b, data for incidence after recall during round one used cases detected during round two in the cytology arm as well as the 48-month exit co-testing for both arms.
For Comparison 3a, we assessed evidence as moderate certainty for an association between the hrHPV strategy and reduced ICC incidence, via CIN 3 + detection, for the age group of 30–59 year-olds; results were imprecise for 25–29 year-olds and of low certainty for little-to-no difference for 60–69 year-olds. For those aged 30–69 years there was low certainty for little-to-no difference between strategies for detection of ICC and incidence of CIN 3+, via CIN2 + detection (for 25–29 year-olds the data was either not reported or very low certainty). For those 30–69 years, there was moderate certainty evidence for little-to-no difference between strategies for the reported harm outcomes; for the 25–29 year age group we rated the certainty down further for indirectness because this age group contributed < 1% of the total sample for the outcome.
For Comparison 3b, findings appeared similar to Comparison 3a for the 30–59 year (moderate certainty of some reduction in ICC without increased harm) and 60–69 year (low certainty for little-to-no benefit) age groups. For the younger ages, adding the recall appears to give the hrHPV strategy an advantage for detecting CIN 2+, while noticeably increasing harm from more referrals to colposcopy and false positives. One contributing factor may have been that in the hrHPV strategy in the HPV Focal RCT (the only trial with 25–29 year-olds) those persistent for HPV at recall were sent for colposcopy even if they had normal (< ASCUS) cytology results, whereas in the cytology strategy arm only those with ≥ ASCUS were sent to recall. The findings from direct evidence on incidence of ICC and CIN 3+ during follow-up were of very low certainty; apart from lack of consistency from use of one study for each of these outcomes, for ICC the observational study was at high risk of bias, and for CIN 3+ there was very serious indirectness from the 48-month exit round in the HPV Focal RCT failing to include those with CIN 2 or 3 during baseline screening and from lack of data linkage to capture clinically detected cases (incidence outcomes were not a primary aim for these trials).
For Comparison 3c, using data across all ages from the HPV Focal RCT, findings were of little-to-no difference for incidence of CIN 2, CIN 2+ and CIN 3+ (via CIN 2+ detection) for those aged 30–69 and for incidence of CIN 2 for those 25–29 years. Little-to-no difference was found for referrals to colposcopy and false positives for CIN 2+ and CIN 3+, with moderate certainty for 30–69 year-olds and low certainty for 25–29 year-olds. Evidence for incidence of ICC via detection of CIN 3+ was rated as very low certainty due to lack of consistency, indirectness, and imprecision.
One small RCT (n = 2,987) (107) contributed to Comparison 4, where hrHPV testing used partial genotyping in both arms: hrHPV with cytology triage (referral to colposcopy for types 16/18 and for type 45 with ≥ HSIL) and cytology with hrHPV triage (referral to colposcopy for HPV types 16/18 with ≥ ASCUS). Up to two recalls 12 months apart were advised for some individuals (e.g., with ASCUS or LSIL and negative hrHPV results). There was low certainty evidence of little-to-no difference between strategies for incidence of CIN 3+ (from more detection of CIN 2+) for those aged 30–59 years. Findings for the other age groups and other benefit outcomes across all ages were of very low certainty, often from added imprecision (e.g., n = 629 in 25–29 year age group) or indirectness from needing to rely on data from those aged 30–64 for findings among 60–69 year-olds. For referrals to colposcopy and false positives for CIN 2+, CIN 3+ and ICC, there was little-to-no difference, with moderate certainty for the 30–59 year-olds and low certainty for the other age groups.
Comparisons 5, 6a, and 7
These comparisons had at least one outcome with low or higher certainty evidence and Supplementary file 1b has all data sets.
In Comparison 5, an RCT from Hong Kong (108) compared hrHPV with cytology triage of negative tests versus cytology with hrHPV triage, both arms having recall. The certainty for all outcomes was very low for age groups 25–29 and 60–69 years because the trial only enrolled those 30–60 years. Low certainty evidence found reduced incidence in the hrHPV strategy arm of ICC via more detection of CIN3+ (36.0 more per 10,000, 95% CI 14.3 to 71.0 more) and little-to-no difference between arms for incidence of CIN3 + via CIN 2 + detection (51.9 more per 10,000, 95% CI 23.4 to 93.7 more). There was moderate certainty evidence that the hrHPV strategy resulted in more referrals to colposcopies and false positives (about 600 per 10,000).
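The detection judgements for Comparison 5 can be read as comparing each 95% CI with the thresholds stated under Comparison 1 (100 more per 10,000 for CIN 2+ detection and 10 more per 10,000 for CIN 3+ detection). The Python sketch below illustrates one way to code that comparison. The decision rule (the whole CI must lie beyond, or within, the threshold) and the function name `judge` are assumptions made for illustration, not the review's formal procedure.

```python
def judge(ci_low: float, ci_high: float, threshold: float) -> str:
    """Classify an absolute difference per 10,000 (positive = more with the
    first strategy) against a symmetric threshold. Illustrative rule only."""
    if ci_low >= threshold:
        return "more"                      # whole CI at or beyond the threshold
    if ci_high <= -threshold:
        return "fewer"                     # whole CI at or beyond the negative threshold
    if ci_low > -threshold and ci_high < threshold:
        return "little-to-no difference"   # whole CI inside the threshold band
    return "imprecise"                     # CI crosses a threshold


# Comparison 5, 95% CIs per 10,000 as reported in the text.
cin3_plus = judge(14.3, 71.0, threshold=10)    # CIN 3+ detection
cin2_plus = judge(23.4, 93.7, threshold=100)   # CIN 2+ detection

print("CIN 3+ detection:", cin3_plus)
print("CIN 2+ detection:", cin2_plus)
```

Under this assumed rule, CIN 3+ detection is classified as more with the hrHPV strategy and CIN 2+ detection as little-to-no difference, in line with the conclusions reported for this comparison.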
In Comparison 6a, the IMPROVE RCT from The Netherlands (n = 13,799) examined self- versus clinician-sampling for hrHPV within arms using the same methods for triage to cytology (114). The certainty for all outcomes was very low for age groups 25–29 and 60–69 years because the trial only enrolled those aged 29–61 years. There was low certainty evidence of little-to-no difference between arms for CIN 2+ detection (1.5 fewer per 10,000 in self-sampling arm, 95% CI 36.5 fewer to 44.9 more) and very low certainty evidence for CIN 3+ and ICC detection, both of which had imprecision, though there was a higher number of CIN 3+ cases detected in the self-sampling arm. There was moderate certainty evidence of little-to-no difference between arms in referrals and number of colposcopies (57 fewer from self-sampling) and for false positives for CIN 2+, CIN 3+ and ICC (range 54 to 81 fewer). No incidence data were reported.
Two RCTs from Sweden (Uppsala I [50–60 year-olds] & II [30–60 year-olds]; N = 11,414) (106, 109) compared hrHPV self- and clinician-sampling where individuals with positive tests in both arms had repeat testing with the same method 3–6 months later and those persistent for hrHPV were referred to colposcopy (Comparison 7). There was low certainty evidence of little-to-no difference between arms for CIN 2 + detection (32.5 fewer per 10,000 in self-sampling arm, 95% CI 65.3 fewer to 12.7 more) and very low certainty evidence for CIN 3 + detection because of imprecision. There was low certainty evidence of little-to-no difference between arms in referrals and number of colposcopies (20 more from self-sampling) and for false positives for CIN 2 + and CIN 3+ (49 and 35 more, respectively). One of the RCTs (109) enrolling 65% of the total sample in this comparison included only participants aged 50 years or older, so this age group was overrepresented in the analysis and we rated down for indirectness to the age range of interest (30–59 years).
Comparisons 6b and 8–10
Five small RCTs (range n analyzed = 164 to 2,845) reported on the comparative effects involving hrHPV self-sampling in one or more arms among populations who were either non-responders or underscreened (Box 1) (113, 116–119). In all but one (118) reporting on Comparison 9, the number of enrolled participants was far below (5.4%–13%) the number randomized and all RCTs were rated at high risk of bias. Very low certainty evidence was found for all reported outcomes (colposcopy, detection of CIN 2+ and CIN 3+, and false positives) in these comparisons (Supplementary file 1b). Four of these RCTs were also included in KQ5 on interventions to increase screening uptake (113, 116–118).
Key Question 2: Comparative Accuracy
Ten studies in 11 papers were included for KQ2 (137–147). Several studies included in other similar reviews we screened for eligibility (41, 48) were excluded either because they reported on comparisons excluded in this review (either per protocol or post hoc as per Methods), were conducted in countries without a Very High Development Index, or did not apply the reference standard in at least a sample of the test negative population. Characteristics and risk of bias assessment of included studies, and all evidence sets are presented in Supplemental file 2.
The median patient age was 40.0 years (range of means 23.0 to 45.8) with sample sizes ranging from 247 to 256,648 participants. Studies were generally from countries with organized screening, including two studies in Greece (137, 140), three from the United States (138, 139, 141), and one from each of Canada (144), Germany (143), France (145), South Korea (146), and England (147). Only one study, conducted in Germany (142), was in an area with largely opportunistic screening. Only three studies reported that HPV vaccination had been implemented, with proportions of the study populations vaccinated ranging from 0.1–4.0% (137, 140, 141). All studies were rated as having unclear risk of bias due to lack of reporting on items for one or more domains. Thresholds for a positive HPV screening test were generally not reported, but where reported the threshold was either 1.0 relative light unit (5,000 or more HPV DNA copies) (138) or 1.0 pg/ml (144, 145, 147). Findings for cytology are for the ≥ ASCUS threshold unless otherwise stated. Lastly, findings apply to CIN 2 and CIN 3 unless otherwise stated.
False positives
The main purpose of this review was to examine false positives from screening strategies used in KQ1 RCTs, when false positives were not reported in the RCT (i.e., for ICC from hrHPV alone versus cytology alone) or had very low certainty evidence. Additionally, we focused on comparisons between strategies not examined in the RCTs (e.g., hrHPV with cytology triage versus hrHPV, cytology with hrHPV triage versus cytology alone) to give some indication of where replacements or alternatives could possibly be used. Table 4 summarizes the findings for the comparative false positives between screening strategies. A conclusion of little-to-no difference indicates that there were less than 3% (300 per 10,000) fewer or more false positives with the first versus the second strategy.
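The 3% threshold described above can be read as a simple decision rule. The following is a minimal Python sketch for illustration only: the function name `classify_false_positive_difference` and the example inputs are hypothetical and are not taken from the review, and the review's actual conclusions also reflect certainty ratings that this sketch ignores.

```python
THRESHOLD_PER_10000 = 300  # 3% of 10,000 screened, as stated in the text


def classify_false_positive_difference(fp_first: float, fp_second: float) -> str:
    """Label the first strategy's false positives relative to the second.

    Both inputs are false positives per 10,000 screened.
    Hypothetical helper; not part of the review's methods.
    """
    difference = fp_first - fp_second
    if abs(difference) < THRESHOLD_PER_10000:
        return "little-to-no difference"
    return "fewer" if difference < 0 else "more"


# Invented inputs: 1,000 vs. 900 false positives per 10,000 screened
label = classify_false_positive_difference(1000, 900)
```

Under this rule, a difference of exactly 300 per 10,000 falls outside the little-to-no difference band, because the text defines the band as less than 3%.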
Our evidence found that self- versus clinician sampling of hrHPV alone probably makes little-to-no difference in false positives.
Compared with hrHPV testing alone (via self- or clinician sampling), adding cytology triage, or replacing the hrHPV test with one allowing partial genotyping with or without cytology triage, reduces the number of false positives. Though we did not rate our certainty in the magnitude of difference beyond the threshold of 3%, there appears to be a large reduction in false positives from adding cytology triage or genotyping to hrHPV alone (range 500 to almost 3,000 fewer per 10,000). There is probably little-to-no difference in false positives for CIN 2+ or CIN 3+ between hrHPV with partial genotyping (types 16/18) alone and hrHPV with cytology triage. Adding cytology for the non-16/18 types after using partial genotyping probably increases false positives.
Replacing cytology alone with hrHPV testing with partial genotyping (types 16/18) alone gave results that varied with the cytology threshold. Compared with cytological detection of ≥ ASCUS, there may be fewer false positives (which remain fewer if cytology is added for non-16/18 types), whereas for cytological detection of ≥ LSIL there was little-to-no difference, and for atypical squamous cells – cannot exclude HSIL (≥ ASCH+) and ≥ HSIL the false positives probably increase when cytology is replaced by hrHPV. Adding hrHPV triage (with or without partial genotyping for types 16/18) to positive cytology may reduce false positives compared with cytology alone.
Findings within different age groups were in the same direction of effect for hrHPV alone versus cytology alone and for hrHPV with partial genotyping (types 16/18) alone versus hrHPV alone or cytology alone.
Sensitivity and specificity
All results and certainty assessments for sensitivity and specificity are included in Supplemental file 2. As expected, in a majority of cases there is a trade-off between sensitivity and specificity, such that attempts to increase specificity (to reduce false positives) often lead to lower sensitivity (i.e., some number of missed cases). Assuming prevalence rates for CIN 2+ of 1.4% and CIN 3+ of 0.6%, as aggregated across the included studies, the reduction in sensitivity and thus the number of missed cases varied across comparisons. For example, adding cytology triage (≥ ASCUS) to hrHPV alone may increase specificity to a large degree (with possibly 3,000 per 10,000 fewer false positives) but at the expense of missing some CIN 2+ (55 to 65 per 10,000; 2 studies, N = 38,113) and CIN 3+ (23 per 10,000; 1 study, n = 34,254) cases. Similar findings were seen when replacing hrHPV alone with hrHPV with partial genotyping (types 16/18) alone (range 713 to 2,943 fewer false positives but 58 to 73 fewer CIN 2+ and 12 to 33 fewer CIN 3+ cases detected per 10,000; 3 studies, N = 41,018), whereas replacing cytology (≥ ASCUS) alone with hrHPV with partial genotyping (types 16/18) alone appears to have less impact (range 70 to 1,830 fewer false positives and up to 21 fewer CIN 2+ and 9 fewer CIN 3+ cases detected per 10,000; 3 studies, N = 41,018).
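The per-10,000 figures above follow from standard diagnostic accuracy arithmetic: missed cases scale with prevalence, and false positives scale with the disease-free share of those screened. The Python sketch below illustrates that arithmetic under stated assumptions. The CIN 2+ prevalence of 1.4% is the value given in the text, but the sensitivity and specificity inputs, the function name `outcomes_per_10000`, and the strategy labels are invented for illustration and are not results from the included studies.

```python
def outcomes_per_10000(sensitivity: float, specificity: float, prevalence: float) -> dict:
    """Convert test accuracy into expected counts per 10,000 screened.

    Hypothetical helper for illustration; not the review's analysis code.
    """
    screened = 10_000
    diseased = prevalence * screened       # people with the target condition
    disease_free = screened - diseased     # people without it
    return {
        "missed_cases": round((1 - sensitivity) * diseased),
        "false_positives": round((1 - specificity) * disease_free),
    }


CIN2_PLUS_PREVALENCE = 0.014  # 1.4%, as assumed in the text

# Invented accuracy values for a more sensitive and a more specific strategy
more_sensitive = outcomes_per_10000(0.95, 0.60, CIN2_PLUS_PREVALENCE)
more_specific = outcomes_per_10000(0.55, 0.90, CIN2_PLUS_PREVALENCE)

fewer_false_positives = more_sensitive["false_positives"] - more_specific["false_positives"]
additional_missed = more_specific["missed_cases"] - more_sensitive["missed_cases"]
```

Because only about 140 of every 10,000 screened are assumed to have CIN 2+, even a large drop in sensitivity translates into tens of missed cases, while a gain in specificity applies to the roughly 9,860 without disease and so shifts false positives by thousands.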
With hrHPV alone, self-sampling probably has lower sensitivity than, and similar specificity to, clinician sampling for detecting CIN 2+ or CIN 2/3 (3 studies, N = 2,832); the number of missed cases may be small (13 to 27 missed cases per 10,000). Cytology (≥ ASCUS) with hrHPV triage with partial genotyping for types 16/18 versus cytology alone may increase specificity without impacting sensitivity (CIN 2+ and CIN 3+; 1 study, n = 2,905). In two comparisons there was little-to-no difference between strategies in sensitivity or specificity for CIN 3+ detection, such that replacing one with the other may have little impact:
- hrHPV with partial genotyping (types 16/18) alone versus hrHPV with cytology triage (≥ ASCUS) (1 study, n = 34,254; low [sensitivity] and moderate [specificity] certainty);
- cytology (≥ ASCUS) with hrHPV triage versus cytology (≥ ASCUS) alone (1 study, n = 2,905; low certainty).
There is probably (moderate certainty) an increase in both sensitivity and specificity for the first strategy in two comparisons:
- hrHPV with partial genotyping (types 16/18) with triage to cytology on non-16/18 types (≥ ASCUS) versus hrHPV with cytology triage (≥ ASCUS) (for CIN 3+; 1 study, n = 34,254);
- hrHPV with partial genotyping (types 16/18) with triage to cytology on non-16/18 types (≥ ASCUS) versus cytology (≥ ASCUS) alone (CIN 2+; 2 studies, N = 38,113 and CIN 3+; 1 study, n = 34,254).
Few studies reported on accuracy of ICC detection. One study (n = 2,905) found that hrHPV with partial genotyping (types 16/18) increased sensitivity and reduced specificity for ICC compared with cytology alone (various thresholds). Most findings for CIN 2+ and CIN 3+ were similar for 20- to 29-year-olds and ≥ 30-year-olds. Compared with cytology alone, hrHPV with partial genotyping (types 16/18) alone may decrease sensitivity for CIN 2+ in 20- to 29-year-olds but increase sensitivity in ≥ 30-year-olds, with both findings having low certainty.
Key Question 3: Pregnancy Harms of Conservative Management of CIN
Two Cochrane reviews (46, 47) synthesized evidence about adverse pregnancy outcomes following excisional or ablative management of CIN (all grades and both squamous and glandular intra-epithelial neoplasia). One reported on early pregnancy outcomes (47), while the other focused on late obstetrical outcomes (46), and the latter also expanded the exposure of interest to early (stage IA1) cervical cancer. Both reviews examined outcomes among individuals treated for lesions compared with an untreated reference population (i.e., untreated females from the general population, internal controls of pregnancies in the same individual before treatment, or individuals with disease that did not receive treatment). All included studies were observational in design, thereby limiting certainty in the evidence (certainty started at low). Despite this, the authors rated most studies as good quality and did not rate down the certainty further for additional study limitations, indirectness or imprecision. Supplemental file 3 presents the summary of findings tables.
Early pregnancy outcomes
There may be little-to-no difference in total miscarriage rates between individuals treated for CIN and those not treated (RR 1.04, 95% CI 0.90 to 1.21; ARD 1 more, 95% CI 3 fewer to 6 more per 1000; 10 studies, N = 39,504; low certainty evidence). The authors’ meta-analysis found that, across all studies, CIN treatment was associated with increased risk of second trimester (12 to 24 weeks’ gestation) miscarriage versus no treatment (RR 2.60, 95% CI 1.45 to 4.67; ARD 6 more, 95% CI 2 to 14 more per 1000; 8 studies, N = 2,182,268). Based on input from the working group, current clinical practice for the management of CIN 2 has become much more conservative (e.g., avoiding excisional procedures, more surveillance due to better knowledge about their frequent regression) in recent years for those ≤ 25 years old or prioritizing reproductive futures (148, 149). Because of this, we rated down the certainty to very low from indirectness of the types of management provided in the studies (most conducted pre-2010) compared with current practice. Other priority outcomes in early pregnancy (i.e., cerclage and cervical insufficiency) were not addressed in the studies.
Late obstetrical outcomes
Meta-analysis of 59 studies found that preterm birth (< 37 weeks) rates were higher among individuals who have been treated for CIN or early cervical cancer compared with those who were not treated (RR 1.75, 95% CI 1.57 to 1.96; ARD 41 more, 95% CI 31 to 52 more per 1000; N = 5,242,917). Risk for preterm birth progressively increased with increasing cone depth among persons treated for CIN or early cervical cancer by excisional procedures versus untreated controls (from RR of 1.54 for depth ≤ 10 to 12 mm to RR of 4.91 for depth ≥ 20 mm). Other treatment factors that increased the risk for prematurity included multiple treatments versus single, excision rather than ablation management, and more radical treatment techniques. Because of clinical input that the management strategies in these studies would usually be more aggressive than used in current practice, we rated the certainty as very low from indirectness. Risk for low birth weight (< 2500 g) was also shown to be higher among individuals treated for CIN or early cervical cancer versus those who were not treated (RR 1.81, 95% CI 1.58 to 2.07; ARD 29 more, 95% CI 21 to 39 more per 1000; 30 studies, N = 1,348,206), though this evidence was also rated down for indirectness to current practice. For both preterm birth and low birth weight, the review authors rated down the evidence for inconsistency; we did not rate down for this factor because a majority of study results were in the same direction of effect even if their magnitudes differed. Lastly, higher rates of cervical cerclage in later pregnancy were found for treated persons compared with untreated controls (RR 14.29, 95% CI 2.85 to 71.65; ARD 15.9 more, 95% CI 2.2 to 84.4 per 1,000; 8 studies, N = 141,300), but again the certainty was very low.
Key Question 4: Relative Importance of Potential Outcomes from Screening
Twenty-three observational studies were included for this key question. Nineteen studies measured health state utility values (150–168) and four measured preferences using other methods (169–172). Across all studies, participant median age was 39.9 years (range of means 18.9 to 53.4) and the median sample size was 342 (range of sample sizes 36 to 146,336,855). Eleven studies were from Europe (151, 155–157, 159–161, 167, 170–172), four from Australia (152, 164, 166, 169), and four from the United States (153, 154, 158, 165). One study was included from each of Canada (150), Japan (162), South Korea (163), and Thailand (168). Twelve studies measured utilities with the EQ-5D (and other instruments in some cases) whereas seven only measured utilities with another instrument. The main issues in risk of bias were low study recruitment rates and failure to perform appropriate analysis to adjust for confounding. Fifteen of the 23 studies recruited less than 50% of eligible subjects or lacked information in this domain. Additionally, 13 of the 19 studies estimating utility scores did not perform appropriate analysis to adjust for confounding. Characteristics of included studies, risk of bias assessments and summary of findings tables are presented in Supplemental file 4.
Disutilities
A weighted average of EQ-5D utilities of the general public was calculated among five of the included studies for use as an indirect comparison to calculate estimates of disutilities in studies that lacked a control group. The resulting utility of the general public was 0.86 (95% CI 0.82 to 0.90), which was similar to that of estimates obtained in a Canadian population (173). We rated this estimate as having high certainty.
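The indirect comparison described above amounts to pooling general-public utilities and subtracting a health-state utility from the pooled reference. The Python sketch below shows one way this could be done. It assumes sample-size weighting, which the text does not specify, and the study values, the health-state utility of 0.75, and the function names are hypothetical; only the reference value of 0.86 comes from the text.

```python
from typing import Sequence, Tuple


def weighted_mean_utility(studies: Sequence[Tuple[float, int]]) -> float:
    """Pool (utility, sample size) pairs using sample-size weights.

    The weighting scheme is an assumption made for illustration.
    """
    total_n = sum(n for _, n in studies)
    return sum(utility * n for utility, n in studies) / total_n


def disutility(reference_utility: float, health_state_utility: float) -> float:
    """Disutility of a health state relative to a reference population."""
    return round(reference_utility - health_state_utility, 2)


# Invented general-public utilities from two hypothetical studies
pooled = weighted_mean_utility([(0.80, 100), (0.90, 300)])

# Reference of 0.86 is from the text; 0.75 is a hypothetical health-state utility
example_disutility = disutility(0.86, 0.75)
```

In studies with their own control group, the same subtraction would use the control group's utility in place of the pooled reference.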
Using the EQ-5D instrument, the disutility from cervical cancer is probably 0.11 more than one year after initiating treatment (155, 160, 162, 165, 168). Some within-study evidence suggests that the disutility may be considerably higher immediately after diagnosis or during treatment and/or with more severe disease. EQ-5D disutility from CIN 3 is very uncertain, but for CIN 2/3 it may be about 0.05 at 18 to 20 months after treatment (low certainty), with insufficient data from the period after diagnosis to postulate whether disutility would significantly differ (160–162). Additionally, there was high certainty of little-to-no disutility from having a false positive result after cytology screening, mainly in comparison with those whose screening results were normal (150, 156, 157, 159, 167). When using these data together with findings from utility measurement using other tools, cervical cancer may be at least twice as important as CIN 2/3, and both cervical cancer and CIN 2/3 are probably much more important than false positives. The relative utilities across outcomes from data within studies (160, 161) that compared different health states using EQ-5D and TTO aligned with these findings, whereas the SG technique in one study (162) found quite similar utilities for ICC, CIN 3 and CIN 2.
Other data on preferences
For the studies reporting on non-utility data, we did not undertake a narrative synthesis as planned (52) because the four studies differed in their methodology and/or populations substantially (Supplemental file 4). All studies focused on preferences related to cytology screening. One quantitative study (n = 248) (170) among those aged 30–60 years eligible for the Dutch screening program directly assessed the importance of benefits versus harms from screening to inform screening decisions via an online survey using a 7-point Likert scale for ratings and a ranking system. We judged that the data provided portrayed a relatively high net benefit from screening over 30 years: ICC incidence (8 vs. 25 per 100,000) and mortality (2 vs. 8 per 100,000) versus false positives (1,000 among 100,000) and overdiagnosis (descriptive without numerical data). On the Likert scale the benefits were rated higher than harms (5.09 and 5.37 vs. 4.88 and 4.65, respectively), and when ranking the outcomes the benefits were ranked first and second. Some variability existed between participants (e.g., SDs 1.79 and 1.75 for ratings for ICC incidence and false positives), but results did not differ by age. The evidence was of low certainty that a large majority of individuals aged 30–60 years may weigh the benefits as more important than the harms of screening for cervical cancer using cytology, but think it is important to provide information on benefits and harms for decision making.
Three other studies inferred preferences between the benefits and harms based on data on intentions or attitudes about screening (169, 172) or a willingness-to-pay (WTP) experiment (171). One study (n = 161; 12% previously screened) (169) in Australia provided university students (range 17–24 years) with information on outcomes (stating no benefit in ICC incidence or mortality) using two decision aids that both detailed several harms but differed in whether they explained overdiagnosis (1,600 of 100,000 will have pre-cancers treated that may have resolved). Intentions (3.2 ± 1.3 and 3.0 ± 1.3 on a 5-point Likert scale) and attitudes (31.1 ± 8.8 and 32.8 ± 8.2 on a scale with range 6–42, with higher scores indicating a more positive attitude) did not differ between the two decision aids. This study found that some individuals < 25 years may have intentions to screen even when informed that screening does not reduce cancer diagnoses or deaths for their age group (low certainty). Two other studies examined intentions to screen (n = 283) and WTP (n = 1,524) amongst regular screeners within the UK using comparisons between factual information and controls. Authors of the WTP study cite literature verifying the validity of WTP for screening in the UK. Both studies provided data on a large reduction in ICC incidence (e.g., 1 in 100,000 screened vs. 10 in 100,000 unscreened) and focused on false positives (e.g., 10% each year, one indicating 50% over 7 rounds) without any information on overdiagnosis of precancers. Although intentions and WTP were lower in intervention versus control groups, they remained quite high (79% vs. 88% and mean WTP £128 vs. £175). Findings indicated that across all ages that may be eligible for screening, a large majority of individuals may weigh the benefits as greater than the harms from screening for cervical cancer.
Due to risk of bias and indirectness from relying on inferences about the relative importance of outcomes based on intentions/WTP for screening which may relate to other factors, we rated the certainty of evidence as low.
Key Question 5: Effectiveness and Comparative Effectiveness of Interventions to Increase Screening Rates
There were 44 RCTs in 46 publications included for this key question. One RCT (119) included in KQ1 from our search update (a search update was not conducted for KQ5) also relates to this question but was not included in the synthesis; findings were very similar to those reported here for the effects of opt-in and universal HPV sampling kits. Characteristics of included trials are presented in Supplemental file 5. Trials included a range of participant ages (range from 20 to 74 years) and were typically undertaken in organized screening settings, with outcome ascertainment based on register or medical record data of having performed cervical cancer screening through either cytology or HPV testing. Sample sizes ranged from 88 to 90,247 participants. Of 46 publications, most were from Europe (55, 113, 116–118, 174–197), seven were from the United States (198–204), four from Canada (205–208), three from Australia (209–211), two from Japan (212, 213), and a single study from Malaysia (214). One trial in HIV-positive individuals with a cervix (199) was excluded from the meta-analyses and is described qualitatively due to a difference in population, intervention, and usual care compared with other studies. Five trials were considered at high risk of bias (113, 174, 181, 205, 214). Two of these had issues with random sequence generation and baseline imbalances (113, 205), and single trials had lack of allocation concealment (181), incomplete outcome reporting (174), and lack of blinding in a subjective/self-reported outcome assessment (214). The remaining trials were rated as having unclear risk of bias due to lack of reporting across multiple domains. Thus, no trial was considered at low risk of bias.
Five main analyses with large sample sizes were undertaken, each grouping similar interventions together: written contact (RR 1.50, 95% CI 1.22 to 1.84; ARD 619 more per 10,000, 95% CI 273 to 1041; 16 trials, N = 138,880), personal contact (RR 1.50, 95% CI 1.07 to 2.11; ARD 797, 95% CI 1116 to 1770; 7 trials, N = 17,034), composite interventions (RR 1.73, 95% CI 1.33 to 2.27; ARD 1351, 95% CI 610 to 2350; 8 trials, N = 17,738), universal mail-out of HPV self-sampling kits (RR 2.56, 95% CI 2.10 to 3.12; ARD 1534, 95% CI 1082 to 2085; 22 trials, N = 211,031), and opt-in HPV self-samples (RR 1.56, 95% CI 1.19 to 2.03; ARD 727, 95% CI 247 to 1338; 11 trials, N = 71,433). All interventions improved cervical cancer screening rates among persons with a cervix who were never or under-screened. The largest effects appear to be from mailing HPV self-sampling kits to all eligible persons, with about 15% more people screened. There was high heterogeneity in magnitude (not direction) of effect within four of the five main analyses, but none of the pre-specified subgroup analyses reduced heterogeneity (Supplemental file 5). For the fifth analysis, which indicated inconsistency in direction of effect, subgroup analysis indicated that the effectiveness of a strategy of opt-in HPV self-samples may be most applicable when the screening test is requested, obtained and returned via one’s home versus requiring an in-person contact (10 trials, N = 61,908: RR 1.61, 95% CI 1.19 to 2.18 vs. 1 trial, n = 9,525: RR 1.00, 95% CI 0.90 to 1.12, respectively). Examination of funnel plots and Egger’s tests (all p values > .05) did not indicate issues with small study effects. Using the GRADE approach and considering only direction of effect, the certainty of the evidence was rated high for all comparisons aside from opt-in HPV self-sampling, which was rated moderate certainty due to the concerns with inconsistency.
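Absolute risk differences (ARDs) such as those reported above are commonly derived by applying a relative risk and its confidence limits to an assumed control-group risk. The Python sketch below illustrates that conversion. The control-group screening rate of 1,000 per 10,000 is a hypothetical value chosen for easy arithmetic, so the outputs are not expected to reproduce the review's ARDs, and the function names are illustrative only.

```python
from typing import Tuple


def ard_per_10000(rr: float, baseline_per_10000: float) -> int:
    """Absolute risk difference per 10,000 for a given relative risk.

    Positive values mean more people screened with the intervention.
    """
    return round((rr - 1) * baseline_per_10000)


def ard_with_ci(rr: float, ci_low: float, ci_high: float,
                baseline_per_10000: float) -> Tuple[int, int, int]:
    """Apply the same conversion to the point estimate and both CI limits."""
    return (
        ard_per_10000(rr, baseline_per_10000),
        ard_per_10000(ci_low, baseline_per_10000),
        ard_per_10000(ci_high, baseline_per_10000),
    )


# Hypothetical control-group screening rate (not reported in the text)
ASSUMED_BASELINE = 1_000

# RR and CI for written contact as reported in the text
written_contact = ard_with_ci(1.50, 1.22, 1.84, ASSUMED_BASELINE)
```

Because the conversion is linear in the relative risk, the ARD point estimate always lies between its two confidence limits, and an RR above 1 always yields more, not fewer, people screened.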
Pre-specified between-study analyses exploring the effect of population characteristics (i.e., SES, immigrant, and indigenous status, and rural/remote communities) were not done due to a paucity of data; most trials did not report these characteristics. Further, screening rates by pre-specified populations within trials were not commonly reported. Generally, the magnitude of effects differed to some degree between populations but the interventions remained effective across groups with one exception: one universal mail-out trial in Italy showed improved screening uptake only in urban centers (183) (Supplemental file 5).