The included trials varied in intervention design and participant characteristics, which improved external validity and the ability to perform subgroup analyses. The decision to collaborate in a PMA led to greater outcome harmonisation: 18% of outcome categories were included in all trials prior to PMA inclusion versus 91% after PMA inclusion, and the number of collected outcome categories increased by 54%. However, at the more detailed level of specific outcome measures, only 7% of measures were identical across all trials. While individual trials had limited power to detect the observed intervention effect for the primary outcome, combining their data substantially increased the statistical power.
Strengths and Limitations
To our knowledge, this study is the first to quantify the advantages of a PMA in increasing data harmonisation. We made use of a range of data sources, and the extensive records documenting the planning and conduct of the individual studies and the PMA. The availability of registry records that recorded which outcomes the trials had planned to collect prior to inclusion in the PMA enabled a comparison of outcomes prior to, and post, PMA-inclusion.
The main limitation of this study is that the registry records used to measure outcome categories before PMA inclusion were less detailed than the variable maps used to measure outcome categories after PMA inclusion. This difference in level of detail provides an alternative explanation for the greater extent of outcome harmonisation and the increase in collected outcome categories: trials may not have recorded all the outcomes they planned to collect in their registry records. However, there are three reasons why this is unlikely to be the sole explanation for the large observed increase in outcome harmonisation. Firstly, we used broad outcome categories to quantify outcomes to account for the lower level of detail in the registration records. Secondly, a major aim of prospective trial registries is to record all outcomes that trials plan to collect,(29) and most trial registries permit large numbers of outcomes to be recorded.(30) Thirdly, no new outcome categories needed to be created to code the outcomes the trials collected after inclusion in the PMA. That is, all additional outcomes the trials collected after inclusion in the PMA fitted into the pre-existing outcome categories derived from the registration records of at least one of the other trials. This suggests that the additional outcome categories were collected in response to outcome categories collected by other trials, in an effort to increase outcome harmonisation, and are not artefacts of different levels of detail.
Interpretation of findings
One of the main differences between a PMA and a multi-centre trial is that in a PMA the individual participating trials have greater autonomy.(9, 12) Trials in a PMA aim for a high level of data harmonisation across all trials, but without the complete outcome standardisation that would occur in a multi-centre trial. In the PMA used in this study, this greater independence resulted in substantial variability in intervention design, timelines and participant groups across trials. This has the advantage of heightened external validity: results are more generalisable because they are not restricted to one particular centre, intervention or population group.
While this variability in trial design is desirable to some extent, outcome harmonisation in a PMA ensures the ability to conduct meaningful combined analyses.(8) Our results show that outcome harmonisation improved after the decision to collaborate in a PMA was made, with the proportion of outcome categories collected by all trials increasing from 18% to 91%. This increase in harmonisation meant that each trial collected more data, and therefore required slightly more resources for data collection, than originally planned. Yet, the resulting increase in total combined data availability enabled us to answer many more research questions than would have been possible without the PMA data harmonisation process.
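The harmonisation figure discussed above is a simple proportion, and a minimal sketch of one way to compute it is given below. It assumes the denominator is the set of categories collected by at least one trial at a given time point. The trial names and categories are hypothetical placeholders, not the data from this study.

```python
def share_collected_by_all(collected_by_trial):
    """Proportion of outcome categories that every trial collects.

    `collected_by_trial` maps a trial name to the set of outcome
    categories it collects. The denominator is the union of categories
    across trials (an assumption made for this sketch).
    """
    category_sets = [set(cats) for cats in collected_by_trial.values()]
    if not category_sets:
        raise ValueError("at least one trial is required")
    all_categories = set().union(*category_sets)
    if not all_categories:
        raise ValueError("no outcome categories recorded")
    common = set.intersection(*category_sets)
    return len(common) / len(all_categories)


# Hypothetical example: categories planned before and after joining a PMA.
before = {
    "trial_a": {"sleep", "diet"},
    "trial_b": {"diet", "activity"},
    "trial_c": {"diet"},
}
after = {
    "trial_a": {"sleep", "diet", "activity"},
    "trial_b": {"sleep", "diet", "activity"},
    "trial_c": {"sleep", "diet", "activity"},
}

print(share_collected_by_all(before))
print(share_collected_by_all(after))
```

The same function can be applied at the level of specific outcome measures rather than broad categories, which is where the proportion shared by all trials is much lower.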
The increase in statistical power to detect treatment effects constitutes one of the main advantages and reasons for synthesising evidence. Increasing sample size strengthens the chance of detecting effects, and it enables us to determine the size of these effects with greater certainty.(31) Increased outcome harmonisation directly translates into more outcomes being available for combined analyses, and thus, greater power to detect potential treatment effect differences.
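The relationship between sample size and power can be illustrated with a standard normal approximation for a two-arm comparison of means. This is a sketch only: the effect size and sample sizes are hypothetical, the outcome is assumed to be a standardised mean difference, and between-trial heterogeneity is ignored, so it does not reproduce the power calculations of this study.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided, two-arm comparison of means.

    Uses the normal approximation and ignores the negligible chance of
    rejecting the null hypothesis in the wrong direction.
    `effect_size` is a standardised mean difference (assumed).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * sqrt(n_per_arm / 2) - z_crit)


# Hypothetical numbers: one trial alone versus four similar trials pooled.
single_trial = approx_power(effect_size=0.2, n_per_arm=100)
pooled = approx_power(effect_size=0.2, n_per_arm=400)

print(f"single trial: {single_trial:.2f}, pooled: {pooled:.2f}")
```

Under these assumptions, pooling lifts power from well below the conventional 80% threshold to roughly that level, which is the pattern described above.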
However, while we succeeded in improving outcome category harmonisation across trials, there were still residual differences in how these outcomes were operationalised, reflecting a lack of specificity in the data harmonisation process. When looking at the outcome measures assessed within outcome categories, only 7% were identical in all trials. Figure 2 shows the different levels of specificity that can be used to describe outcomes, and the importance of a high level of specificity for outcome harmonisation. For some outcomes, the way they were to be measured was not pre-specified in sufficient detail. This led to trials choosing different measures or tools for the same outcome category, and the data managers at the central data collection centre had to find ways of converting these measures to common outcome variables. For example, some trials assessed sleep duration by asking ‘What time does your child usually go to bed at night?’, ‘What time does your child usually wake up in the morning to start the day?’ and ‘How often and how long the child usually wakes up at night?’, whilst other trials simply asked ‘About how many hours and minutes does your child usually sleep in total during the night?’. Whilst both sets of questions can be used to derive the same outcome of ‘sleep duration’, the derivation required considerable computational effort. It is also possible that the measure that took ‘waking up at night’ time into account led to systematically lower estimates of total sleep duration, and thereby to unintended additional heterogeneity.
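The two question formats in the sleep example could be mapped onto a common ‘sleep duration’ variable roughly as sketched below. The function names, the 24-hour "HH:MM" input format and the rule of subtracting time awake at night are assumptions made for illustration; they are not the derivation rules used by the central data collection centre.

```python
from datetime import datetime, timedelta


def sleep_from_clock_times(bedtime, wake_time, night_wakings=0,
                           minutes_per_waking=0):
    """Derive nightly sleep duration (hours) from bed and wake times.

    Times are 24-hour "HH:MM" strings. Time spent awake at night is
    subtracted (an assumption of this sketch).
    """
    fmt = "%H:%M"
    start = datetime.strptime(bedtime, fmt)
    end = datetime.strptime(wake_time, fmt)
    if end <= start:
        # The wake time falls on the following calendar day.
        end += timedelta(days=1)
    hours_in_bed = (end - start).total_seconds() / 3600
    hours_awake = night_wakings * minutes_per_waking / 60
    return hours_in_bed - hours_awake


def sleep_from_total(hours, minutes):
    """Derive nightly sleep duration (hours) from a direct total."""
    return hours + minutes / 60


# Hypothetical responses from the two question formats.
derived = sleep_from_clock_times("19:30", "06:30", night_wakings=2,
                                 minutes_per_waking=15)
reported = sleep_from_total(10, 30)
print(derived, reported)
```

Even in this simple form, the first derivation depends on whether and how night wakings are subtracted, which is one way the two formats could yield systematically different estimates.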
Yet, in a PMA it is neither expected nor desirable to have 100% harmonisation across outcome measures. One reason for this is that within a PMA, trials can have additional focus areas that are covered only by an individual trial and do not have to be assessed by all trials. This is a major desirable feature and stands in contrast to multi-centre trials, in which all sites usually collect the same information. For instance, in the EPOCH PMA, the Poi.NZ trial had a particular focus on sleep, and collected over 50 additional variables related to sleep, including the resource-intensive use of an accelerometer. It was neither desirable nor feasible for all trials to collect these extra outcome measures; instead, all the other EPOCH trials added a few common core measures of the outcome category sleep to their data collection forms.
Recommendations for future PMAs
Whilst increased outcome harmonisation enables greater data synthesis and improved statistical power to detect intervention effects, this needs to be balanced against unnecessary data collection that places undue burden on participants and leads to research waste. For future PMAs, we recommend careful consideration and extensive dialogue, as early as possible in the planning phase of both the PMA and the participating individual trials, about the amount of core common data that is necessary and desirable to answer all relevant research questions. Where agreed core datasets(32) already exist within particular specialities, they should be the basis for these decisions.
To avoid differences in how common outcomes are measured and operationalised across trials, we recommend that future PMA collaborations be more specific a priori regarding how they plan to measure common outcomes at different levels, as displayed in Figure 2. For example, for the outcome category ‘breastfeeding’, the measurement tool may be a self-reported questionnaire asking “Has your child ever been breastfed?”. To ensure all trials collect this measure consistently, the outcome would need to be defined very explicitly. In this case, ‘ever been breastfed’ may be defined as the infant having received breast milk even just once, including putting the infant to the breast to feed or giving expressed breast milk. Consistent definitions and measurement methods for common outcomes greatly enhance the ability to synthesise data and reduce the amount of recoding and cleaning necessary to achieve this. Each trial is nonetheless able to collect additional trial-specific outcomes for its own purposes.
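One way to make such a priori specifications checkable is to record each core outcome as a structured entry and compare trial forms against it. The sketch below is illustrative only: the field names are hypothetical and may not correspond to the levels shown in Figure 2, and the alternative question wording for "trial_b" is invented for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class OutcomeSpec:
    """Hypothetical structured specification of one core outcome."""
    category: str    # broad outcome category
    tool: str        # measurement tool
    question: str    # exact wording asked of participants
    definition: str  # explicit operational definition


def trials_deviating(core, trial_specs):
    """Return the names of trials whose specification differs from the core."""
    return sorted(name for name, spec in trial_specs.items() if spec != core)


core = OutcomeSpec(
    category="breastfeeding",
    tool="self-reported questionnaire",
    question="Has your child ever been breastfed?",
    definition=("Infant received breast milk at least once, including "
                "being put to the breast or given expressed breast milk."),
)

# Hypothetical trial forms: trial_b uses an invented, different wording.
trial_specs = {
    "trial_a": core,
    "trial_b": replace(core, question="Was your child breastfed?"),
}

print(trials_deviating(core, trial_specs))
```

A check of this kind only flags differences in the agreed core fields; trials remain free to add trial-specific outcomes outside the core specification.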