We registered the protocol for this review in PROSPERO: CRD42020187876. All changes to the protocol are explicitly reported in the methods section.
This systematic review was performed according to the recommendations of the Cochrane Handbook for Systematic Reviews of Interventions [22] and follows the reporting recommendations of the updated Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [23].
Eligibility criteria
Participants
Eligible participants had to be diagnosed with atrial fibrillation (AF) and be aged 65 years or older. We operationalized the age criterion as follows:
- ≥80% of the randomized population aged ≥65 years.
- Subgroup analysis reports on participants aged ≥65 years.
Intervention
The intervention group must be treated with any type of non-vitamin K antagonist oral anticoagulant, also called direct oral anticoagulant (DOAC). These include:
- apixaban
- dabigatran
- edoxaban
- rivaroxaban
We only included trials with long-term DOAC treatment, defined as a treatment duration of at least 12 months. We added this criterion during study selection because, contrary to our expectation, DOAC treatment in some RCTs (mainly early-phase trials) was very short and therefore not comparable to routine care. Any dose or regimen was eligible. Trials on DOACs not approved in the European Union before 2020 (e.g., ximelagatran, darexaban, or letaxaban) were excluded.
As comparator, we accepted any active control (such as conventional anticoagulation treatment), no treatment, or placebo. Furthermore, additional antithrombotic treatment in combined regimens (e.g., antiplatelet therapy in addition to warfarin) had to be the same in all groups, so that the groups differed only in DOAC treatment.
Outcomes
We prioritized all-cause mortality, all-cause hospitalization, and major or clinically relevant bleeding (MCRB) as primary outcomes (critical outcomes in GRADE). Secondary outcomes were any adverse event, discontinuation due to adverse events, renal failure, delirium, and falls (important outcomes). In addition, we extracted data on bleeding according to organ system classification.
We did not consider stroke or systemic embolism because we expected that the effectiveness of DOACs for reducing stroke is stable across age groups [24, 25] and consequently the subgroup effect of age would not shift the benefit-risk ratio.
Types of studies
Only RCTs or subgroup analyses of RCTs on the relevant age group were eligible.
Publication status
We only included trials published in English or German or with data available in an English language trial registry.
Information sources
The identification of relevant literature comprised two stages.
First, we screened the titles/abstracts of the references of all systematic reviews included in an overview previously prepared by the research group of one member of our review team [24].
Second, we updated the electronic literature searches used in the aforementioned overview. For this purpose, MEDLINE, MEDLINE in Process, and Embase (all via Embase) were searched for studies published from 1 June 2014 onwards. We ran the last search on 9 November 2020.
In addition, we searched the reference lists of all included RCTs and systematic reviews on the same topic. Moreover, we searched ClinicalTrials.gov for ongoing and unpublished trials on 30 June 2020.
Search strategy
The search strategy was prepared by an experienced information specialist in collaboration with clinical experts. The full search is presented in Supplement I. The search was limited to English and German. In addition, we limited the search to articles and reviews (i.e., excluded conference abstracts) and excluded case reports, in vitro studies, and animal experiments. The search included a search filter for the elderly, a modified generic search filter for adverse events (in addition to specific terms such as bleeding or mortality), and a validated search filter for RCTs [26-28]. The search strategy was reviewed by a second person using the PRESS checklist and validated by checking whether clearly eligible RCTs already known to us would have been identified [29].
Selection process
Two reviewers independently screened the titles and abstracts of all records identified by the literature search. Next, full-text articles of potentially relevant reports were retrieved and assessed for compliance with the eligibility criteria by two reviewers independently. Disagreements between reviewers were resolved by discussion until consensus.
Multiple reports of the same RCT were merged, so that each trial is the unit of analysis. The study selection process was summarized in an updated PRISMA flow diagram [42].
Data collection process
Descriptive data were extracted by one reviewer and checked for accuracy by a second reviewer. Two reviewers independently identified relevant outcome data by marking the section in the relevant source. Subsequently, one reviewer extracted the data, and a second reviewer checked their correctness. All disagreements were resolved by discussion until consensus.
In case of missing data or inconsistent data on primary outcomes in different sources, we contacted the corresponding author by e-mail.
Data items
Supplement II lists all items for which we extracted data.
We extracted data on outcomes for the last available follow-up, i.e. the longest observation period.
In addition to the outcome data, we extracted data on within-study subgroup analyses. We only extracted these data if the relevant subgroup analysis was pre-specified and a test of interaction was used to quantify the statistical certainty of the subgroup effect [30].
Study risk of bias assessment
We assessed the risk of bias with the revised Cochrane risk-of-bias tool for RCTs (RoB 2 tool) [31]. The RoB 2 tool provides a framework for assessing the risk of bias for one particular outcome; we therefore assessed each outcome separately.
Effect measures
All considered outcomes were dichotomous. We extracted relative effect estimates from regression analyses (e.g., hazard ratios from a survival analysis) with 95% confidence intervals (CIs). If these were not available (e.g., data from trial registries), we extracted raw data on events and number of participants for each group and calculated relative risks.
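The calculation of a relative risk from raw counts can be sketched as follows. This is a minimal illustration in Python, not the procedure actually used in the review: the function name `relative_risk` is hypothetical, and the Wald-type interval on the log scale is a common textbook choice that we assume here, since the text does not state which interval method was applied.

```python
import math

def relative_risk(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Relative risk with a Wald-type 95% CI computed on the log scale.

    Assumption: no continuity correction is applied, so trials with zero
    events in either group are rejected rather than silently adjusted.
    """
    if min(events_trt, events_ctl) == 0:
        raise ValueError("zero events: log relative risk is undefined")
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    # Standard error of log(RR) for two independent binomial samples
    se_log_rr = math.sqrt(
        1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper, se_log_rr

# Hypothetical counts for illustration only (not data from any included trial)
rr, lower, upper, se = relative_risk(30, 200, 40, 200)
```

The standard error on the log scale is returned as well because inverse variance meta-analysis needs it as input.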
Synthesis methods
Statistical synthesis method
We pooled data only if RCTs were sufficiently clinically and methodologically homogeneous and the p-value of the statistical test for heterogeneity was >0.05. To describe statistical heterogeneity, we calculated prediction intervals and the I² statistic.
We pooled adverse event data separately for each comparator (vitamin K antagonists [VKAs], aspirin only, placebo) and dose because we assumed that they would carry different risks, in particular for bleeding. We pooled systemic adverse events across patients with AF only (AF-only patients) and AF patients who had a percutaneous coronary intervention (AF-PCI patients), provided the patients were otherwise clinically comparable (e.g., renal function, comorbidity).
Mortality and hospitalization are composite outcomes, that is, measures that combine benefits (e.g., stroke reduction) and harms (e.g., bleeding). Therefore, for mortality and hospitalization we combined different comparators because we were interested in the net benefit of DOACs compared to all possible treatments that are applied in routine care. Moreover, we pooled mortality and hospitalization separately for AF-only and AF-PCI patients because the benefits of DOACs (e.g., stroke prevention) likely differ between these groups.
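As an illustration of how statistical heterogeneity is typically quantified, the following Python sketch computes Cochran's Q and the I² statistic from log effect estimates and their standard errors. The function name `cochran_q_i2` and the input values are hypothetical. The p-value of the heterogeneity test (Q compared against a chi-square distribution with k − 1 degrees of freedom) and the prediction interval are deliberately left out to keep the sketch free of external dependencies.

```python
def cochran_q_i2(log_effects, std_errors):
    """Cochran's Q, its degrees of freedom, and I-squared (in percent).

    log_effects: per-trial effect estimates on the log scale
    std_errors:  their standard errors on the log scale
    """
    weights = [1 / se ** 2 for se in std_errors]  # inverse variance weights
    pooled = sum(w * y for w, y in zip(weights, log_effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_effects))
    df = len(log_effects) - 1
    # I-squared: share of total variability beyond chance, truncated at 0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2

# Hypothetical inputs for illustration only
q, df, i2 = cochran_q_i2([-0.5, 0.5], [0.1, 0.1])
```

In this sketch the decision rule described above would correspond to pooling only when the p-value derived from Q exceeds 0.05.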
We derived the standard errors of the log-transformed effect estimates, which are necessary for meta-analysis, from the 95% CIs. If more than one distinct subgroup for older adults was available (e.g., 65-74 years and ≥75 years), we pooled the results within one RCT using fixed-effect meta-analysis. To combine different RCTs, we performed inverse variance random-effects meta-analyses using the Hartung-Knapp method and the Paule–Mandel heterogeneity variance estimator [32, 33]. For outcomes for which only sparse data were available (event rate <5%, zero-event studies, fewer than four RCTs in the meta-analysis), we planned to use beta-binomial regression models for sensitivity analyses [34, 35].
We used the R package meta in R 9.4 for the meta-analyses [36]. In case of heterogeneity, we synthesized results across RCTs by presenting the range of the point estimates of the relative effect measure.
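The first two steps described above (recovering the standard error on the log scale from a 95% CI and combining age subgroups within one trial by fixed-effect inverse variance pooling) can be sketched in Python as follows. The function names and the subgroup values are hypothetical, and the sketch assumes a symmetric Wald-type interval on the log scale. It does not implement the random-effects model with the Hartung-Knapp method and the Paule–Mandel estimator.

```python
import math

Z_95 = 1.96  # normal quantile assumed for a two-sided 95% CI

def log_se_from_ci(lower, upper, z=Z_95):
    """Standard error of the log effect estimate, recovered from its CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def pool_fixed_effect(estimates, z=Z_95):
    """Fixed-effect inverse variance pooling on the log scale.

    estimates: list of (point, lower, upper) tuples on the ratio scale,
               e.g. hazard ratios for the subgroups of one trial
    Returns (pooled point, lower, upper, pooled standard error on log scale).
    """
    weights, log_points = [], []
    for point, lower, upper in estimates:
        se = log_se_from_ci(lower, upper, z)
        weights.append(1 / se ** 2)
        log_points.append(math.log(point))
    total = sum(weights)
    pooled_log = sum(w * y for w, y in zip(weights, log_points)) / total
    pooled_se = math.sqrt(1 / total)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - z * pooled_se),
        math.exp(pooled_log + z * pooled_se),
        pooled_se,
    )

# Hypothetical subgroup results (65-74 years and >=75 years), illustration only
pooled = pool_fixed_effect([(0.85, 0.70, 1.03), (0.78, 0.62, 0.98)])
```

The pooled within-trial estimate and its standard error would then enter the between-trial meta-analysis as a single data point per RCT.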
Subgroup analyses for exploring heterogeneity
We expected that our primary analyses would be mainly based on data from subgroup analyses, and we had therefore not planned to perform subgroup analyses of our own. However, in some meta-analyses there was statistically significant heterogeneity, and we therefore performed post-hoc subgroup analyses at study level according to agent.
Sensitivity analyses
We planned to perform a sensitivity analysis excluding RCTs at high risk of bias in the randomization domain.
Reporting bias assessment
We planned to assess publication bias by visual inspection of funnel plots for asymmetry, if at least 10 trials for each outcome were available.
We expected adverse events and mortality to be assessed in all RCTs. We considered RCTs/publications specifically on older adults in which mortality, overall adverse events, or discontinuation due to adverse events were not reported (and for which we received no information in response to author requests) to be susceptible to reporting bias. Bias in selection of the reported results within one trial is a domain of the RoB 2 tool (see above). In the RoB 2 assessment, we compared the list of outcomes reported in the protocols or methods section with the outcomes reported in the published paper.
Certainty of evidence assessment
We rated the certainty of the body of evidence using the GRADE approach (Grading of Recommendations, Assessment, Development and Evaluation). In the GRADE system, evidence from RCTs starts as “high-certainty”, and the following criteria are applied for downgrading the certainty of evidence by one or two levels [37]:
- Risk of bias
- Imprecision
- Inconsistency
- Indirectness
- Publication bias
The rating of these criteria leads to four levels of the certainty of evidence for each of the prioritized outcomes [38]:
- High-certainty evidence: the review authors have a lot of confidence that the true effect is similar to the estimated effect.
- Moderate-certainty evidence: the review authors believe that the true effect is probably close to the estimated effect.
- Low-certainty evidence: the review authors believe that the true effect might be markedly different from the estimated effect.
- Very low-certainty evidence: the review authors believe that the true effect is probably markedly different from the estimated effect.
One reviewer judged the certainty of the evidence and a second reviewer verified the assessment. Disagreements were resolved by discussion until consensus.
The certainty of evidence and results are presented in 'Summary of Findings' (SoF) tables [39]. The SoF tables were prepared using GRADEpro GDT [40]. For estimating the absolute effect, we used absolute risks for the control group based on publications thought to be representative of routine care in Western countries [15, 16, 18]. If we could not find a suitable publication for an outcome, we used the risk of the comparator group of the included RCTs.
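The conversion of a relative effect into an absolute effect can be illustrated with the following Python sketch. It assumes the usual SoF convention of multiplying an assumed control group risk by the relative effect and expressing the difference per 1000 patients. The function name and the numbers are hypothetical, and treating a hazard ratio like a relative risk in this formula is an approximation that we assume here for simplicity.

```python
def absolute_effect_per_1000(control_risk, rr, rr_lower, rr_upper):
    """Risk difference per 1000 patients for a point estimate and its CI.

    control_risk: assumed risk in the control group (proportion, 0 to 1)
    rr, rr_lower, rr_upper: relative effect and its 95% CI limits
    Negative values mean fewer events with the intervention.
    """
    def difference(ratio):
        intervention_risk = control_risk * ratio
        return (intervention_risk - control_risk) * 1000

    return difference(rr), difference(rr_lower), difference(rr_upper)

# Hypothetical example: 10% control risk, relative effect 0.80 (0.70 to 0.95)
effect, effect_lower, effect_upper = absolute_effect_per_1000(0.10, 0.80, 0.70, 0.95)
```

Because the absolute effect scales with the assumed control risk, the choice of the source for that risk directly determines the size of the reported absolute difference.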
To report the findings in consideration of the certainty of evidence, we used the standardized informative statements suggested by the GRADE working group [41].
The certainty of evidence is expressed with the following statements:
- High-certainty: reduces/increases outcome
- Moderate-certainty: “likely/probably” reduces/increases outcome
- Low-certainty: “may” reduce/increase outcome
- Very low-certainty: the evidence is uncertain