2.1 Data extraction and criteria. ClinicalTrials.gov is the largest clinical trial database worldwide and, as of December 2021, contained 400,873 research studies from all 50 US states and 220 countries. On December 1, 2021, we queried ClinicalTrials.gov for phase II and phase III gastric cancer trials. The search terms were “Stomach Cancer”, “Stomach Neoplasms”, “Gastric Cancer”, “Gastric Neoplasms”, “Gastric Carcinoma”, “Stomach Carcinoma”, and “Gastroesophageal Junction Cancer”. Trials evaluating multiple cancer types were also included.
In line with the trial registration requirements of the United States Food and Drug Administration (FDA) and the International Committee of Medical Journal Editors (ICMJE), we excluded trials that started before January 1, 2007. We also excluded trials that started after December 1, 2020, so that every included trial had at least 12 months to enrol participants before our analysis. Trials with a status of “not yet recruiting”, “suspended”, “withdrawn”, or “unknown” were dropped because their final status could not be determined. Trials that did not report an anticipated accrual or a start date were also dropped. After applying these criteria, 567 trials were included in our dataset. All trial information and characteristics were downloaded and extracted from the ClinicalTrials.gov XML files.
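The inclusion and exclusion rules above amount to a single filter over the downloaded records. The following is a minimal sketch, not the code used in this study: it assumes each trial has already been parsed from the XML into a Python dict with hypothetical keys (`conditions`, `phase`, `status`, `start_date`, `anticipated_enrollment`), assumes the registry's phase labels take the form “Phase 2/Phase 3”, and approximates the registry search with a case-insensitive substring match on the search terms. The three records at the end are toy examples, not study data.

```python
from datetime import date

# Search terms from the registry query, matched here as lowercase substrings (a simplification).
SEARCH_TERMS = [
    "stomach cancer", "stomach neoplasms", "gastric cancer", "gastric neoplasms",
    "gastric carcinoma", "stomach carcinoma", "gastroesophageal junction cancer",
]
# Assumed phase labels; phase I/II and II/III are kept so they can be recoded later.
ELIGIBLE_PHASES = {"phase 1/phase 2", "phase 2", "phase 2/phase 3", "phase 3"}
# Statuses whose final outcome cannot be determined.
EXCLUDED_STATUSES = {"not yet recruiting", "suspended", "withdrawn", "unknown", "unknown status"}
EARLIEST_START = date(2007, 1, 1)  # registration requirements in force
LATEST_START = date(2020, 12, 1)   # leaves at least 12 months of enrolment before Dec 1, 2021


def is_eligible(trial: dict) -> bool:
    """Apply the inclusion/exclusion criteria to one parsed trial record (hypothetical keys)."""
    conditions = " ".join(trial.get("conditions", [])).lower()
    if not any(term in conditions for term in SEARCH_TERMS):
        return False
    if trial.get("phase", "").lower() not in ELIGIBLE_PHASES:
        return False
    if trial.get("status", "").lower() in EXCLUDED_STATUSES:
        return False
    start = trial.get("start_date")
    if start is None or trial.get("anticipated_enrollment") is None:
        return False  # missing start date or anticipated accrual
    return EARLIEST_START <= start <= LATEST_START


trials = [
    {"conditions": ["Gastric Cancer"], "phase": "Phase 3", "status": "Completed",
     "start_date": date(2012, 5, 1), "anticipated_enrollment": 400},
    {"conditions": ["Gastric Cancer"], "phase": "Phase 2", "status": "Withdrawn",
     "start_date": date(2015, 3, 1), "anticipated_enrollment": 60},
    {"conditions": ["Stomach Neoplasms"], "phase": "Phase 2", "status": "Recruiting",
     "start_date": date(2021, 2, 1), "anticipated_enrollment": 80},
]
included = [t for t in trials if is_eligible(t)]
print(len(included))  # 1: the withdrawn trial and the 2021 start are excluded
```

Because the conditions are joined before matching, a trial listing several cancer types is kept as long as one of them matches a gastric cancer term.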
2.2 Data cleaning and trial characteristic classification. We grouped the trial statuses “recruiting”, “enrolling by invitation”, and “active, not recruiting” into a single “active” category. For terminated trials, we summarized the reason for termination given in each record and classified it into one of nine categories: safety reason, efficacy reason, ethical reason, trial no longer needed, business/sponsor reason, recruitment failure, logistic reason, principal investigator (PI) left, and no reason given. Trials terminated for safety, efficacy, or ethical reasons, or because the trial was no longer needed, were defined as terminated for good reasons. Trials terminated for business/sponsor reasons, recruitment failure, logistic reasons, departure of the PI, or with no reason given were defined as terminated for bad reasons. In the descriptive analysis of trial characteristics, trials terminated for good reasons were regarded as having reached a substantive outcome.
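The status harmonization and the good/bad grouping of termination reasons can be written as two small lookups. This is a sketch under assumptions: the short category labels (`"safety"`, `"pi left"`, and so on) are hypothetical, and it presumes each free-text termination reason has already been assigned to one of the nine categories by manual review, which the code does not attempt.

```python
# Status values grouped under "active"; comparison is case-insensitive.
ACTIVE_STATUSES = {"recruiting", "enrolling by invitation", "active, not recruiting"}
# The nine termination categories, using hypothetical short labels for the reviewed reasons.
GOOD_REASONS = {"safety", "efficacy", "ethical", "no longer needed"}
BAD_REASONS = {"business/sponsor", "recruitment failure", "logistic", "pi left", "no reason given"}


def harmonize_status(status: str) -> str:
    """Collapse the three ongoing statuses into 'active'; return other statuses lowercased."""
    s = status.strip().lower()
    return "active" if s in ACTIVE_STATUSES else s


def termination_group(reason: str) -> str:
    """Map a reviewed termination category to 'good' or 'bad'."""
    r = reason.strip().lower()
    if r in GOOD_REASONS:
        return "good"
    if r in BAD_REASONS:
        return "bad"
    raise ValueError(f"unrecognized termination category: {reason!r}")


print(harmonize_status("Enrolling by invitation"))  # active
print(termination_group("Recruitment failure"))     # bad
```

Raising on an unrecognized category, rather than silently defaulting to “bad”, makes any reason that slipped through the manual review visible.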
If a trial listed more than one recruiting centre in its site record, we classified it as a multicentre trial; otherwise, we classified it as a single-centre trial. A trial whose recruiting centres were located in more than one country was labelled a multicountry trial; otherwise, it was labelled a single-country trial. Trials listed as phase I/II were treated as phase II trials, and trials listed as phase II/III were treated as phase III trials. Because the intervention type of some trials was mislabelled in the dataset, our clinical experts reviewed each trial and categorized it according to its primary intervention type.
The anticipated accrual across trials roughly followed a normal distribution, but there were outliers: some phase II studies planned to enrol fewer than 10 participants, and some phase III studies planned to enrol more than 5,000. We therefore winsorized the anticipated accrual at the top 1% and bottom 1%. Trial duration was calculated from the actual start date to the actual study completion date or the termination date. For active trials, duration was calculated from the actual start date to the date the files were downloaded (December 1, 2021). Because exceptionally large phase III trials take considerably longer than other trials, we likewise winsorized the top 1% of trial durations.
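The derived variables above (site classification, phase recoding, trial duration, and winsorization) can be sketched as follows. This is an illustration, not the study's code: the input format for sites (one country entry per recruiting centre) and the registry phase labels are assumptions, and the percentiles come from Python's `statistics.quantiles` with the `"inclusive"` method, which may not match the percentile definition used in Stata.

```python
from datetime import date
from statistics import quantiles

DOWNLOAD_DATE = date(2021, 12, 1)  # data download date; end point for active trials

# Assumed registry phase labels recoded to the analysis phase.
PHASE_RECODE = {
    "phase 1/phase 2": "II", "phase 2": "II",
    "phase 2/phase 3": "III", "phase 3": "III",
}


def site_labels(site_countries):
    """Classify a trial from one country entry per recruiting centre (hypothetical input)."""
    centre = "multicentre" if len(site_countries) > 1 else "single-centre"
    country = "multicountry" if len(set(site_countries)) > 1 else "single-country"
    return centre, country


def trial_duration_days(start, end=None):
    """Days from actual start to completion/termination, or to the download date if active."""
    return ((end if end is not None else DOWNLOAD_DATE) - start).days


def winsorize(values, lower=True, upper=True):
    """Clamp values outside the 1st/99th percentiles to those percentiles."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    lo, hi = cuts[0], cuts[-1]
    clamped = []
    for v in values:
        if lower and v < lo:
            v = lo
        if upper and v > hi:
            v = hi
        clamped.append(v)
    return clamped


# Accrual is winsorized at both tails; duration only at the top.
accrual = winsorize(list(range(101)))
duration = winsorize(list(range(101)), lower=False)
print(accrual[0], accrual[-1])    # 1.0 99.0
print(duration[0], duration[-1])  # 0 99.0
```

Winsorizing clamps extreme values instead of dropping the trials, so the 567 trials all stay in the analysis while very small or very large designs no longer dominate the summaries.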
2.3 Statistical analysis. We used Kruskal-Wallis tests to compare median anticipated accrual and median trial duration across groups of trials. Categorical variables such as intervention type, phase, and sponsor type were compared using chi-square tests.
We used Stata/SE version 15 (StataCorp, College Station, Texas, USA) to assess risk factors for, and the likelihood of, trial failure. The duration of each trial was used as the survival time in our models. The trial characteristics examined were phase, start year, treatment, sponsor type, single-centre vs multicentre, single-country vs multicountry, duration, anticipated accrual, and status. Logistic regression models were used to estimate how trial characteristics were associated with trial failure. Cumulative Kaplan-Meier curves were used to estimate the risk of trial failure over time.
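A cumulative Kaplan-Meier curve plots 1 − S(t), the estimated probability that a trial has failed by time t, with trial duration as the time variable. The sketch below is a minimal plain-Python estimator, not the Stata analysis (where `stset` followed by `sts graph, failure` produces a comparable plot). Which statuses count as a failure is not spelled out in this section, so the failure indicator is an assumed input; all other trials are treated as censored at their duration. The durations are toy values.

```python
def km_cumulative_failure(durations, failed):
    """Kaplan-Meier cumulative failure 1 - S(t) at each distinct failure time.

    durations: time on study for each trial; failed: 1/True if the trial counts as a
    failure, 0/False if it is censored (for example, still active at download).
    """
    data = sorted(zip(durations, failed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = removed = 0
        # Trials sharing this time leave the risk set together after the estimate is updated.
        while i < len(data) and data[i][0] == t:
            events += int(data[i][1])
            removed += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, 1 - survival))
        at_risk -= removed
    return curve


curve = km_cumulative_failure([2, 3, 3, 5, 8], [1, 0, 1, 1, 0])
for t, f in curve:
    print(t, round(f, 3))  # 2 0.2 / 3 0.4 / 5 0.7
```

Trials censored at a given time are still counted in the risk set for failures at that same time, which is the usual Kaplan-Meier convention.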