Our study shows that, broadly, trial teams spend about three times as much of their data collection time on secondary outcomes as on primary outcomes: for every hour spent collecting primary outcome data, three hours are spent collecting secondary outcome data. We found no clear suggestion that using a core outcome set increases this ratio. Public health trials balance their data collection time between primary and secondary outcomes more evenly, although their total data collection time, and therefore cost, is noticeably greater than for other trials.
Allocating most data collection effort to secondary outcomes is not a problem if a) this work is justified given the known information needs of the intended users of the results and b) the activity is planned and budgeted for. Literature and experience suggest that these criteria are not always met. Heneghan and colleagues8 cite numerous examples of trial outcome data lacking relevance to patients and other decision-makers, including the use of surrogate and composite outcomes. The outcomes trialists choose are not always the ones patients consider most important15. Recent work by Trial Forge comparing the primary outcome chosen by 44 trial teams with the outcome patients and healthcare professionals ranked as most important found that they agreed just 30% of the time16. That many trial teams struggled to tell us how long it took them to collect their data strongly suggests that resource planning and budgeting do not explicitly account for data collection workload. Allocating most data collection effort to secondary outcomes is rarely a considered judgement; it just happens.
We might accept some of this if all trials finished on time and to budget, and published all their data, but we know this is not true. Recruitment and retention problems are widespread17,18, the cost of trials is escalating1 and substantial amounts of data never make it into the public domain6. A study evaluating trials funded by the UK’s National Institute for Health Research Health Technology Assessment Programme, a highly competitive funding stream, found that between 1997 and 2020, 128/388 (33%) of trial teams needed to extend their recruitment period17. Only 207/388 (53%) reached 100% of their original recruitment target, and median retention for the primary outcome was 88%. There were no data on secondary outcome retention, but it was probably lower. Even experienced teams need to keep a laser focus on the outcomes that matter most to decision-makers; the operational challenges of trials leave no room for ‘nice-to-haves’.
Trial burden was flagged in a 2019 multi-stakeholder James Lind Alliance Priority Setting Partnership on unanswered research questions in trial retention19. In that priority-setting process, how to reduce the burden on staff and participants was ranked the 3rd most important question, after how to make better use of existing data (2nd) and what motivates a participant to complete a trial (1st). The current study identified a total of 918 outcomes across just 161 trials, ranging from trials that collected a single outcome to one that collected 30 (Trial ID 152, a Phase III drug trial). The burden that data collection often represents for staff and participants is likely to be a central factor in these questions reaching the top three positions.
Staying with workload, it is worth remembering that ORINOCO focuses on data collection time, not total data collection effort. We do not know what spending three times as much time on secondary outcomes as on primary outcomes means for total data collection effort, but we can guarantee that total workload will be greater than the time we quantify in this article. A good example of the distinction between data collection time and total data collection effort is the use of routine electronic medical record systems. If data exist in routine electronic systems, data collection time is zero and the data are instead transferred from a data controller to the trial team for analysis. Achieving this transfer can be challenging. A recent UK study of two trials, Add-Aspirin and PATCH, found that it took 13 months for Add-Aspirin to receive data from the National Cancer Registration and Analysis Service and 15 months for PATCH to receive data from NHS Digital20. In another example, data controllers’ changing interpretation of information governance regulations meant that the team behind the EPOCH trial were unable to gain access to post-discharge hospital data in Wales and had to change their primary analysis as a result21. Even in the extreme case of ‘Time to collect’ being zero, it is still important to consider total data collection effort: it will not be zero.
Core outcome sets – outcomes that should always be collected for a given type of trial – could help because they are developed using formal methods of patient and other stakeholder involvement to choose outcomes that do matter9. Our sample of trials using a core outcome set is small and the findings are therefore far from definitive. They do, however, at least suggest that for rheumatoid arthritis trials, using a core outcome set does not generally increase data collection workload compared to other Phase III trials. Core outcome sets remain a minority choice, with only 2% of all trials using them22, although use in rheumatoid arthritis, the area we chose, is much higher: a recent systematic review found that 82% of rheumatoid arthritis trials used a core outcome set23. Increased data collection workload does not appear to be an argument against using core outcome sets.
The picture for Public health trials is different to that for the other trials. As Table 1 shows, Public health trials focus proportionately more time on primary outcomes than Phase III and Core outcome set trials but overall spend much more time on data collection. Public health trials generally have more participants and fewer secondary outcomes than the other trials in our sample, a combination that is likely to make data collection more expensive but more focused on primary outcomes.
Strengths and limitations
There are a number of limitations. First, our sample of 161 trials is large but not very large, because we were unable to get timing data for all trials. For Phase III trials this was perhaps not such a problem, but the numbers of Core outcome set and Public health trials were always modest, and then got smaller. Second, the Core outcome set trials all came from a single clinical area: rheumatoid arthritis. We are therefore more confident about the general picture for Phase III trials than we are for the other two trial types.
Third, it was difficult for many trial teams to say how long their data collection took. This is perhaps the greatest limitation because it means the timings we provide are sometimes uncertain. We think it is also something of a strength, because it underlines our point that trial teams do not routinely give much thought to how long it will take to collect their trial outcomes. Even when teams had given it thought and were able to tell us how long each outcome took to collect, we did not find a single study that reported this in the trial publication. Given the well-known practical challenges of trials, it seems odd that there is no explicit attempt to assess the workload generated by the outcomes selected. Fourth, our calculation of total time assumed 100% retention at every measurement point, which is of course unlikely and means our times may be overestimates. This retention assumption does, however, reflect the maximum data collection workload a trial team has committed to in its design, which we think is worth knowing.

Strengths are that we have not seen data similar to those we present before, and all the included trials are recent: none were published earlier than 2015. The problem we highlight – that the majority of data collection effort is dedicated to the less important outcomes – is not a fading piece of history but very much relevant to the here and now.
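The total-time calculation under the 100% retention assumption can be sketched as a simple product. The following is a minimal illustration only; the function name, participant numbers and per-measurement times are hypothetical and are not drawn from the ORINOCO dataset:

```python
# Hypothetical sketch of the total-time calculation described above.
# Assumes 100% retention: every participant contributes data at every
# measurement point. All names and numbers are illustrative.

def total_collection_minutes(participants, measurement_points, minutes_per_measurement):
    """Maximum data collection workload a trial design commits to for one outcome."""
    return participants * measurement_points * minutes_per_measurement

# Example: one outcome, 400 participants, 3 measurement points, 10 minutes each
full_retention = total_collection_minutes(400, 3, 10)   # 12,000 minutes

# With, say, 88% retention (the median primary outcome retention cited above),
# the workload actually incurred would be proportionately lower:
with_retention = full_retention * 0.88                   # 10,560 minutes
```

The gap between the two figures is why our reported times may be overestimates of workload incurred, while still representing the workload committed to at design.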
Implications for practice
This is simple: we think trial teams should routinely consider the work involved in collecting the data their selected outcomes need. They should then look at how this effort is distributed and decide whether, for some outcomes, the effort is worth it given the relative importance of the outcome. That judgement about importance needs the explicit view of the intended users of the trial results, often patients and health professionals. Using a core outcome set will give greater confidence that the selected outcomes are important. The trial budget should directly reflect the data collection workload. Our experience with ORINOCO has been that trial teams often have little or no idea how long their data collection took, which must increase the chance of a mismatch between workload and available resources. Finally, trial teams could start to report how long data collection took, which would help other trial teams plan their own data collection.
Implications for research
Our database (https://osf.io/FNB3E/) is the starting point for a ‘Time to collect’ tool that can help trial teams to estimate the data collection workload for their trial. Collecting more timing data would improve this tool, especially if done prospectively rather than retrospectively, as was the case for our timing data. It would also be useful to know whether having a better idea of data collection workload during trial design does influence design decisions and budgeting. For example, knowing that a minor secondary outcome will demand a substantial amount of time (and therefore money) to collect ought to put a question mark next to its inclusion. Whether that is what happens would be interesting to know.
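A ‘Time to collect’ style estimate at design stage might work along the following lines. This is a sketch under stated assumptions, not the actual tool: the outcome names, timings and the flagging threshold are all invented for illustration.

```python
# Illustrative sketch of a design-stage data collection workload estimate.
# Each outcome's total time is participants x measurement points x minutes
# per measurement (100% retention assumed, as in our calculations).
# Outcomes, figures and the 30% threshold are hypothetical.

outcomes = [
    # (name, participants, measurement_points, minutes_per_measurement)
    ("blood pressure", 300, 4, 5),
    ("quality of life", 300, 4, 20),
    ("symptom diary review", 300, 12, 15),
]

total = {}
for name, n, points, mins in outcomes:
    total[name] = n * points * mins

grand_total = sum(total.values())

# Flag any single outcome consuming a large share of total workload,
# prompting the design team to ask whether its importance justifies it.
for name, minutes in total.items():
    share = minutes / grand_total
    if share > 0.3:  # arbitrary threshold to prompt discussion
        print(f"{name}: {minutes} min ({share:.0%} of workload) - justified?")
```

The point of such a tool would not be precision but making the workload, and its distribution across outcomes, visible at the moment design and budgeting decisions are taken.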
Core outcome sets suggest what is important to measure but not how to measure it. We think it would be useful for trialists to be more explicit about how to measure core outcomes. The configuration of some core outcome sets may be more efficient than others, and it would be useful for trial teams (and funders) to know this when planning their trials. Finally, more work on how using routine data affects total data collection effort would be welcome. An additional challenge for trial teams will be that some ‘standard’ data collection items, such as health-related quality of life for cost-effectiveness evaluations, are unlikely to be available in routine data, which limits the scope for ‘Time to collect’ reductions. Our data (see Supplementary files) do contain trials that used routine data, but we have not looked at these systematically.