One of the basic analytic challenges within implementation science is to study and understand implementation within real-world, dynamic settings. Implementation of multifaceted interventions typically involves many diverse elements working together in interrelated ways, including intervention components, implementation strategies, and features of local context. Furthermore, boundaries between an intervention, its implementation, and its contextual features can prove difficult to discern in practice. [1, 2]
For researchers seeking to explain these complex relationships encountered in real-world settings, causal inference can play an important role. Since the mid-1980s, Configurational Comparative Methods (CCMs) have increasingly been recognized as effective methods for causal inference, especially in the social sciences. As Fig. 1 below shows, the cumulative number of CCM-related publications listed in the core collection of the Web of Science [3] has escalated dramatically in recent years, with more publications appearing during the three-year period from 2017 to 2019 than in the entire preceding 22-year period from 1995 to 2016.
CCMs have also started to make prominent appearances within the health services research and implementation science literatures. CCMs, for example, were used in a recent Cochrane Review to identify conditions directly linked with successful implementation of school-based interventions for asthma self-management [4]; featured as an innovative member of the mixed-methods repertoire in a major methodological review in public health [5]; and applied to determine different pathways for federally-qualified health centers to achieve patient-centered medical home status [6].
CCMs are designed to investigate different hypotheses and uncover different properties of causal structures than traditional regression analytical methods (RAMs).[7, 8] Qualitative Comparative Analysis (QCA) is the kind of CCM that has, to date, been applied most frequently in implementation science and health services research. The purpose of this article is to introduce a new CCM to the implementation research community: Coincidence Analysis. Coincidence Analysis (CNA) is a mathematical, cross-case approach that can be applied as a standalone method or in conjunction with other methods (including RAMs) to support causal inference, and is available via the R-package cna.[9, 10, 11]
CNA offers a new cross-case method for implementation and health services researchers exploring causality when evaluating or implementing multifaceted interventions in complex contexts. Investigators applying CNA can conduct analyses across entire datasets to identify specific combinations of components and conditions that consistently lead to outcomes, and the method can be applied to large-n as well as small-n studies. Peer-reviewed, implementation-related work involving CNA has started to emerge, including podium presentations at major implementation conferences[12, 13]; methods workshops dedicated specifically to CNA[14]; published protocols[15]; and full-length articles in established journals[16].
CNA is a new comparative approach that can be used by the implementation research community to support causal inference, answer research questions about conditions that are minimally necessary or sufficient, and identify multiple causal paths to an outcome. We present this article in three parts. In part 1, we establish the theoretical foundation for CCMs, define CNA as a method within the CCM family and describe what CNA (and CCMs) uniquely offer. In part 2, we illustrate CNA by applying the method to a publicly available dataset that was originally analyzed using RAMs. In part 3, we offer guidance for reporting CNA design and results, and we discuss the limitations and challenges of CNA. In additional files accompanying this article we provide detailed descriptions of the steps and coding used to conduct the analysis [see Additional file 1] and the analytic dataset used [see Additional file 2] along with the R script [see Additional file 3] to allow for independent replication and validation of results.
Part 1: Laying the Theoretical Foundation for CCMs
Defining causal inference in CCMs. CNA is one method within a class of CCMs used to model complex patterns of conditions hypothesized to contribute to an outcome within a set of data. CCMs search for causal relations as defined by a regularity theory of causality, according to which a cause is a “difference-maker” of its effect within a fixed set of background conditions. More specifically, X is a cause of Y if there exists a fixed configuration of background factors F such that, in F, a change in the value of X is systematically associated with a change in Y. If X does not make a difference to Y in any F, X is redundant to account for Y and, thus, not a cause of Y. The most influential theory defining causation along these lines is Mackie's INUS-theory,[17] with refinements by Graßhoff and May[18] and Baumgartner.[19] An INUS condition of an outcome Y is an Insufficient but Necessary part of a condition that is itself Unnecessary but Sufficient for Y. To use a common example for illustrating INUS conditions: not every fire is caused by a short circuit—fires can also be started by, for example, arson or lightning. However, a short circuit in combination with other conditions – e.g., presence of flammable material and absence of a suitably placed sprinkler – is sufficient for a fire. In this example, the short circuit is an INUS condition: it is a necessary part of a sufficient condition for a fire. This particular causal path to a fire includes the combination of three specific conditions: presence of a short circuit, presence of flammable material, and absence of a sprinkler. All three of these conditions are difference-makers, for if one of them is missing, the fire does not occur along this causal path.
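The fire example can be sketched computationally. In the hypothetical case table below (invented for illustration, not drawn from any study), S = short circuit, F = flammable material, and P = sprinkler present; a condition is sufficient for the outcome Y when every case satisfying it also shows Y:

```python
# Hypothetical cases illustrating the INUS structure of the fire example:
# S = short circuit, F = flammable material, P = sprinkler present, Y = fire
cases = [
    {"S": 1, "F": 1, "P": 0, "Y": 1},  # short circuit, fuel, no sprinkler -> fire
    {"S": 0, "F": 1, "P": 0, "Y": 0},  # same background, no short circuit -> no fire
    {"S": 1, "F": 0, "P": 0, "Y": 0},  # no flammable material -> no fire
    {"S": 1, "F": 1, "P": 1, "Y": 0},  # sprinkler suppresses the fire
]

def sufficient(cond, outcome, data):
    """cond is sufficient for the outcome if every case satisfying cond
    also instantiates the outcome."""
    matching = [c for c in data if cond(c)]
    return bool(matching) and all(c[outcome] == 1 for c in matching)

# A short circuit alone is not sufficient for fire ...
print(sufficient(lambda c: c["S"] == 1, "Y", cases))                       # False
# ... but the conjunction S * F * not-P is:
print(sufficient(lambda c: c["S"] and c["F"] and not c["P"], "Y", cases))  # True
# Dropping S from that conjunction destroys sufficiency, so S is a
# non-redundant (INUS) part of it:
print(sufficient(lambda c: c["F"] and not c["P"], "Y", cases))             # False
```

The last check is the difference-making test: within the fixed background F = 1, P = 0, changing S changes Y.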
Regularity theories can be described using Boolean properties of causation, which encompass three dimensions of complexity. The first is conjunctivity: to bring about an outcome, several conditions must be jointly present. For example, in a study of high performance work practices and front line health care worker outcomes, Chuang and colleagues[20] found that no single high performance work practice was alone sufficient to produce the outcome of high job satisfaction. Instead, a configuration consisting of creative input, supervisor support, and team-based work practices together accounted for 65 percent of highly satisfied front-line health care workers.[20] Chuang and colleagues identified a second configuration that also led to high job satisfaction: supervisor support, incentive pay, team-based work and flexible work.[20] Both configurations resulted in high job satisfaction independently of each other. These configurations illustrate disjunctivity, a second dimension of complexity in which an outcome can alternatively result from multiple causal paths. The third dimension of complexity is sequentiality: outcomes tend to produce further outcomes, propagating causal influence along causal chains. For instance, high job satisfaction of health care workers may, in turn, promote patient satisfaction.[21]
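Conjunctivity and disjunctivity combine naturally in Boolean notation. The sketch below uses shorthand factor names I introduce for the Chuang et al. example (CI = creative input, SS = supervisor support, TW = team-based work, IP = incentive pay, FW = flexible work); it illustrates the model form, a disjunction of two conjunctions, not the published estimates:

```python
# Two alternative conjunctions (causal paths), either of which suffices for
# high job satisfaction (JS): CI*SS*TW + SS*IP*TW*FW -> JS
def model_predicts_js(case):
    path1 = case["CI"] and case["SS"] and case["TW"]
    path2 = case["SS"] and case["IP"] and case["TW"] and case["FW"]
    return int(bool(path1 or path2))  # disjunction of the two paths

# A hypothetical worker covered by the first path only:
worker = {"CI": 1, "SS": 1, "TW": 1, "IP": 0, "FW": 0}
print(model_predicts_js(worker))  # 1
```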
Why use CCMs in implementation research? CCMs study different properties of causal structures than RAMs and thus are appropriate for exploring different types of hypotheses. RAMs examine statistical properties characterized by probabilistic or intervention theories of causation. In the probabilistic model, X is a cause of Y if and only if the probability of Y given X is greater than the probability of Y alone and there does not exist a further factor, Z, that explains (i.e., neutralizes) the probabilistic dependence between X and Y.[8] In the intervention model, X causes Y if intervention X changes the values of outcome Y controlling for other variables. The intervention theory of causation is counterfactual in that a case cannot simultaneously “receive” and “not receive” an intervention; instead, the intervention model maps possible values of Y onto possible values of X, focuses on how variables X and Y relate to one another, and generates average treatment effects over a population.[8]
Conversely, CCMs examine Boolean properties of the data as described by regularity theories of causation, according to which X is a cause of Y if and only if X is an INUS condition of Y (see INUS definition above).[8, 17] CCMs study implication hypotheses that link specific values of variables as “X = χi is (non-redundantly) sufficient/necessary for Y = γi.”[8, 11] In this way, CCMs, including CNA, model the effect of conditions (e.g., high degree of X) on outcomes. This is a fundamentally different vantage point than the one adopted by RAMs which examine covariation hypotheses that link variables. Further, CCMs are case-oriented methods, in which observations consist of bounded, complex entities (e.g., organizations) that are considered as a whole.[22] A case-based unit of analysis differs from the approach taken in RAMs, where cases are deconstructed into a series of variables, and estimates represent the net effect of a variable for the average case. Because CNA and other CCMs employ case-based analysis, they present opportunities for implementation and health services research questions in particular because these methods can be used to identify which interventions work in an array of contexts.
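The necessity side of an implication hypothesis can be checked in the same cross-case fashion: X = x is necessary for Y = y when every case showing the outcome also instantiates the condition. A minimal sketch with hypothetical crisp-set data:

```python
# Hypothetical crisp-set cases; A and B are conditions, Y the outcome
cases = [
    {"A": 1, "B": 1, "Y": 1},
    {"A": 1, "B": 0, "Y": 1},
    {"A": 0, "B": 1, "Y": 0},
    {"A": 0, "B": 0, "Y": 0},
]

def necessary(factor, value, outcome, data):
    """factor = value is necessary for the outcome if every case showing
    the outcome also takes that factor value."""
    with_outcome = [c for c in data if c[outcome] == 1]
    return bool(with_outcome) and all(c[factor] == value for c in with_outcome)

print(necessary("A", 1, "Y", cases))  # True: every Y-case has A = 1
print(necessary("B", 1, "Y", cases))  # False: one Y-case has B = 0
```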
Different types of CCMs
While CCMs have a common regularity theoretic foundation, various types of CCMs rely on different a priori conceptions of outcome and causal factors and build causal models in different ways. For example, Qualitative Comparative Analysis (QCA), in its standard implementation that uses the Quine-McCluskey (QMC) algorithm,[23, 24] requires identification of exactly one factor as an endogenous outcome. It begins by identifying maximal sufficient and necessary conditions of the outcome, which are subsequently minimized using standard inference rules from Boolean algebra to arrive at a redundancy-free solution composed of INUS conditions of the outcome.[7] However, the QMC algorithm was not designed for causal inference. For instance, the absence of cases instantiating a potential causal model, also known as limited diversity, forces QMC to draw on counterfactual reasoning that goes beyond available data and sometimes requires assumptions contradicting the very causal structures under investigation.[25] Moreover, QMC has built-in protocols for reducing ambiguity when multiple solutions fit the data equally well: potential solutions are often eliminated without justification, which is problematic for causal discovery.[25, 26]
Advantages of using CNA. Coincidence Analysis (CNA) is a new addition to the family of CCMs.[27, 28] It uses an algorithm specifically designed for causal inference, thus avoiding the problems mentioned above. In particular, it does not build causal models by means of a top-down approach that first searches for maximally sufficient and necessary conditions and then gradually minimizes them using the QMC algorithm. Rather, CNA employs a bottom-up approach that first tests single factors for sufficiency and necessity, and then tests factor combinations of two, three, etc.[10, 11] All sufficient and necessary conditions revealed by this approach are, by definition, minimal and redundancy-free.
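The bottom-up strategy can be sketched in highly simplified form (crisp-set data, present-value conditions only; the actual cna algorithm also tests factor absences and applies consistency and coverage thresholds):

```python
from itertools import combinations

def minimally_sufficient(data, factors, outcome, max_order=3):
    """Bottom-up sketch: test single factors for sufficiency, then pairs,
    triples, etc., skipping any conjunction that contains an already
    sufficient sub-conjunction (which would make it redundant)."""
    minimal = []
    for order in range(1, max_order + 1):
        for combo in combinations(factors, order):
            if any(set(m) <= set(combo) for m in minimal):
                continue  # a sufficient subset exists; combo is not minimal
            matching = [c for c in data if all(c[f] == 1 for f in combo)]
            if matching and all(c[outcome] == 1 for c in matching):
                minimal.append(combo)
    return minimal

# Hypothetical data generated from the model C + A*B -> Y
data = [
    {"A": 1, "B": 1, "C": 0, "Y": 1},
    {"A": 1, "B": 0, "C": 1, "Y": 1},
    {"A": 1, "B": 0, "C": 0, "Y": 0},
    {"A": 0, "B": 1, "C": 0, "Y": 0},
    {"A": 0, "B": 0, "C": 1, "Y": 1},
]
print(minimally_sufficient(data, ["A", "B", "C"], "Y"))  # [('C',), ('A', 'B')]
```

Because sufficiency is established at the lowest possible order first, the conjunctions that survive are minimal by construction; no subsequent minimization step is needed.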
Additionally, CNA is designed to treat any number of variables as endogenous and is therefore capable of analyzing causal chains and common-cause structures.[29] Analyzing causal chains may be advantageous if, for example, intervention factor A occurs as a result of other factors but is not the ultimate outcome of interest. Identifying the full causal model, including which factors produce A on the path to the ultimate outcome of interest, is valuable when seeking to understand causal complexity. CNA is the only member of the CCM family that builds and evaluates models representing causal chains.
Part 2: Demonstrating CNA using Publicly Available Data
Data source
In March 2016, Rehn and colleagues reported the impact of implementation strategies on human papillomavirus (HPV) "catch-up vaccination" uptake in Sweden among 5th and 6th grade girls.[30] The purpose of the original study was to estimate the impact of various information channels and delivery settings on county-level catch-up vaccine uptake to inform future vaccination campaigns in Sweden.
The authors obtained county-level data on catch-up vaccinations and the eligible population from administrative data. They collected implementation strategies from county health care offices via an open-ended questionnaire emailed in 2012 asking respondents to list and describe “information channels” used to reach eligible girls and the settings in which they offered the vaccine. A subsequent phone interview was conducted in 2014 to update the lists.
Rehn and colleagues used regression analysis to estimate county-level catch-up vaccine uptake as a function of information channels and delivery settings. The authors concluded that the availability of vaccines in schools explained differences in county-level vaccine uptake; no information channels were found to make a difference in uptake.
Rehn and colleagues defined the outcome and predictor variables as follows:
Outcome variable. County-level catch-up vaccine uptake was defined as the percent of eligible girls born between 1993 and 1998 who received at least one dose of vaccine by 2014.
Predictor variables. Ten variables represented information channels and four variables represented the delivery settings where the vaccinations were available (some schools, all schools, primary health care centers, and other health care centers). All 14 factors were dichotomized with values of 1 (present) or 0 (absent).
All county-level data on vaccine uptake, information channels and delivery settings used for the CNA illustration were reported in the article.
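For readers unfamiliar with CCM input formats, the analytic dataset takes the form of one row per county with dichotomized factors alongside the uptake outcome. A purely illustrative sketch of that layout (the county labels and values below are invented, not the published Swedish data; factor names are shorthand I introduce):

```python
# Illustrative crisp-set layout only; values are invented, not the study data.
# ALL_SCHOOLS = vaccine offered in all schools, SOME_SCHOOLS = in some schools,
# PHC = primary health care centers, LETTER = information-letter channel
counties = [
    {"county": "County 1", "ALL_SCHOOLS": 1, "SOME_SCHOOLS": 0,
     "PHC": 0, "LETTER": 1, "uptake_pct": 62.0},
    {"county": "County 2", "ALL_SCHOOLS": 0, "SOME_SCHOOLS": 1,
     "PHC": 1, "LETTER": 1, "uptake_pct": 48.0},
]

# Every implementation factor is coded 1 (present) or 0 (absent):
print(all(c["ALL_SCHOOLS"] in (0, 1) for c in counties))  # True
```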