Datasets
The MPN-RC trials accrued a total of 269 patients from 2011 to 2016 and have been previously reported.10 The financial toxicity study accrued a total of 116 patients from 2019 to 2021. The CAR-T Quality of Life study accrued a total of 67 patients from 2018 to 2020. Completion rates in these studies ranged from 70.8% to 100%.
A representative table of PRO Completion from the MPN-RC study is shown in the appendix (Supplemental Table S1). Such a table is important to account for each participant on the trial, whether they are expected or not expected to complete PROs and when PROs were completed. This table also displays intercurrent events such as deaths, progression and discontinuations due to adverse events or other reasons which excluded patients from consideration in point estimates per our selected estimand (Supplemental Table S2).
Longitudinal graphics of physical function
Stacked bar charts
The stacked bar chart is a concise but comprehensive graphical method that can be used to display scale data such as physical function domain results (Figure 1, Panel A) or response distribution to individual items (Figure 1, Panel B). These bar charts represent the distribution of change from baseline scores or responses among all patients who completed the survey at each time point. Bar charts representing all patients who were expected to respond were also produced; however, interpreting such bar charts was difficult because the height of the bars was impacted by the proportion of patients who did not complete surveys. To display continuous scales (Figure 1A), meaningful categorization (e.g. improved, maintained, declined) was needed. Since the original nature of the score was continuous (0 to 100 scale), categorizing patients as “improved,” “maintained” or “declined” may be arbitrary and can potentially misclassify patients into the wrong health state. When using stacked bar charts, categories should be clearly defined and justified.
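To make the categorization step concrete, the following is a minimal sketch of how change-from-baseline scores could be grouped into the three categories that feed one stacked bar. The 10-point threshold, the function names and the example scores are all hypothetical, since this section does not state the cut-off used, and the sketch assumes that higher scores indicate better function.

```python
from collections import Counter

# Hypothetical cut-off: the threshold used in the study is not given here.
THRESHOLD = 10.0

def categorize_change(change, threshold=THRESHOLD):
    """Map a change-from-baseline score (0 to 100 scale) to a category.

    Assumes higher scores mean better physical function.
    """
    if change >= threshold:
        return "improved"
    if change <= -threshold:
        return "declined"
    return "maintained"

def category_percentages(changes, threshold=THRESHOLD):
    """Percent of survey completers in each category at one time point."""
    counts = Counter(categorize_change(c, threshold) for c in changes)
    n = len(changes)
    order = ("improved", "maintained", "declined")
    return {name: 100.0 * counts.get(name, 0) / n for name in order}

# Made-up change scores for one time point (illustration only).
example_changes = [15.0, -20.0, 0.0, 5.0, -10.0, 30.0, -5.0, 0.0]
print(category_percentages(example_changes))
```

Changing the threshold moves patients between categories, which is why the cut-off should be defined and justified alongside the chart.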
Individual item responses (Figure 1B) are easier to display in a stacked bar chart because responses are already categorized. Here the physical function tasks queried in each item are ordered from the top of the figure to the bottom to represent least impairment (e.g., difficulty completing a strenuous activity) to most impairment in physical function (e.g., difficulty with eating, dressing and washing). Displaying the individual items can help communicate more tangible impairments to patients and clinicians and aid in the interpretation of bar charts based on scale scores only.
In bar charts, the “declined” category appears at the bottom so that changes over time in the proportion of patients in this category are easiest to inspect visually. An alternative visualization would be to place the “declined” category below the horizontal axis and the “improved/stable” categories above it (as in Supplemental Figure S1).
In these and all graphical representations, the number of patients completing surveys, the number of patients expected to complete surveys, and the completion rate (PRO Completed / PRO Expected) at each time point are displayed. Such accounting is important to communicate the population included in the visualization and delineate patterns of missing data.
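The accounting described above (PRO Completed / PRO Expected) can be sketched as follows. The patient records, field names and time points are hypothetical and serve only to illustrate the calculation; a patient who is no longer expected at a time point (for example, after an intercurrent event) is excluded from the denominator.

```python
# Hypothetical records: months at which each patient was expected to
# complete a PRO, and months at which a PRO was actually completed.
patients = [
    {"id": "P1", "expected": {0, 3, 6}, "completed": {0, 3, 6}},
    {"id": "P2", "expected": {0, 3}, "completed": {0}},  # not expected at month 6
    {"id": "P3", "expected": {0, 3, 6}, "completed": {0, 6}},
]

def completion_by_timepoint(patients, timepoints):
    """Return {timepoint: (n_completed, n_expected, rate_percent)}."""
    summary = {}
    for t in timepoints:
        n_expected = sum(1 for p in patients if t in p["expected"])
        # Count completed PROs only among patients expected at this time point.
        n_completed = sum(
            1 for p in patients if t in p["expected"] and t in p["completed"]
        )
        rate = 100.0 * n_completed / n_expected if n_expected else None
        summary[t] = (n_completed, n_expected, rate)
    return summary

summary = completion_by_timepoint(patients, [0, 3, 6])
for month, (done, expected, rate) in summary.items():
    print(f"Month {month}: {done}/{expected} completed ({rate:.1f}%)")
```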
In bar charts and other applicable representations, the color palette was selected to be compliant with the Americans with Disabilities Act (ADA) and to be colorblind accessible. Contrasting colors, as opposed to shades, were chosen to be easy to discriminate. Reddish/orange colors were typically selected to represent unfavorable categories (e.g., the “declined” group in scales) while colors in the blue/green range represent more favorable outcomes (e.g., the “not at all” response in individual items). Color palettes were consistent with those currently employed in FDA’s Project Patient Voice. Palettes were tested using the colorblind R package, which simulates how figures would appear to a person with a color-vision deficiency.21
Line plots
The line plot depicts a readily comprehensible trajectory of physical function and can identify temporal patterns at the group level, using either arithmetic (raw) means of scale scores in each group (Figure 2A) or changes from baseline, with baseline scores zeroed (Figure 2B). Differences can be identified with confidence intervals and reference lines, and arrows signify the direction of declines or improvements. Questionnaire completion information is included for each time point. While the simple depiction of trajectory is the strength of this graphic, it does not communicate the individual patient experience.
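As an illustration of the quantities plotted in a change-from-baseline line plot, the sketch below computes the group mean change and an approximate 95% confidence interval at one time point. The data and function names are hypothetical, and the normal approximation (z = 1.96) is an assumption made for brevity; it is not necessarily the interval method used for Figure 2, and small samples would usually call for a t-based interval.

```python
import math
import statistics

def change_from_baseline(scores_by_patient, timepoint, baseline=0):
    """Changes for patients with both a baseline and a follow-up score."""
    return [
        scores[timepoint] - scores[baseline]
        for scores in scores_by_patient.values()
        if baseline in scores and timepoint in scores
    ]

def mean_with_ci(values, z=1.96):
    """Mean with a normal-approximation confidence interval (assumption)."""
    n = len(values)
    mean = statistics.mean(values)
    if n < 2:
        return mean, None, None  # no interval from a single observation
    se = statistics.stdev(values) / math.sqrt(n)
    return mean, mean - z * se, mean + z * se

# Made-up scores on a 0 to 100 scale: {patient: {month: score}}.
scores = {
    "P1": {0: 80, 3: 70},
    "P2": {0: 60, 3: 60},
    "P3": {0: 90, 3: 70},
    "P4": {0: 50},  # no month-3 survey, so excluded at month 3
}

changes = change_from_baseline(scores, timepoint=3)
mean, lower, upper = mean_with_ci(changes)
print(f"n={len(changes)}, mean change={mean:.1f}, 95% CI=({lower:.1f}, {upper:.1f})")
```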
Stacked bar chart & line plot combination
Interpretation of trajectories together with responses to single items may help elucidate more concretely what may be driving patterns of physical function and other PROs reflecting tolerability. In Figure 3, a dip in FACT-G Total Score is more pronounced in patients receiving autologous transplant than CAR-T, and in both groups tracks with responses to the FACT GP5 question. A similar analysis can be performed with physical function scales, as shown in Supplemental Figure S2, in which a line plot of responses over time to the QLQ-C30 physical function scale is shown along with responses to the question “Have any trouble doing strenuous activities, like carrying a heavy shopping bag or suitcase?” These plots allow the viewer to integrate the pattern in the scaled scores with the pattern in individual item responses. This provides increased granularity and an understanding of the features that might be driving group-level changes.
Pie charts and waffle plots
Pie charts offer a familiar and easily interpretable representation of statistics to patients and clinicians. In Figure 4, physical function T score is depicted at a single pre-defined time point of 6 months to give a sense of how a patient who remained on the cancer therapy for 6 months might be functioning at that time point. The cubes in the waffle are scaled, and numbers representing the percent in each category of “improved,” “maintained” or “declined” further clarify that in this study, most patients had maintained their physical function at 6 months on treatment. In an alternate approach, waffle plots and pie charts can summarize the percent of patients in each physical function category over a defined time frame (Supplemental Figure S3). While this plot seemingly describes data in a succinct fashion, defining the categories of “improved” and “declined” over several intervening time points is not straightforward and may pose challenges to interpretation of the plot (e.g., “improvement” requires improvement at all observed time points, while “decline” only requires a single observation of worsening).
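The asymmetry in the longitudinal example above can be made explicit with a short sketch. It assumes each observed post-baseline time point has already been labeled “improved,” “maintained” or “declined”; the function name is hypothetical, and assigning every remaining pattern to “maintained” is an assumption rather than a rule stated in the text.

```python
def summarize_longitudinal(categories):
    """Collapse per-visit categories into one summary category.

    Rule taken from the example in the text: a single observation of
    worsening yields "declined", while "improved" requires improvement
    at every observed time point. Treating all other patterns as
    "maintained" is an assumption of this sketch.
    """
    if not categories:
        return None  # no post-baseline observations to summarize
    if any(c == "declined" for c in categories):
        return "declined"
    if all(c == "improved" for c in categories):
        return "improved"
    return "maintained"

print(summarize_longitudinal(["improved", "improved", "improved"]))
print(summarize_longitudinal(["improved", "maintained", "improved"]))
print(summarize_longitudinal(["improved", "declined", "improved"]))
```

Under this rule, a patient who improves at two visits and worsens at one is summarized as “declined,” which illustrates why such a summary can be hard to interpret.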
Waterfall plot
A waterfall plot depicts individual patient trajectories, as each line represents a single patient’s experience. It is easy to appreciate in the waterfall plot (Figure 5) that while most patients had stable physical function, a few experienced a large improvement and a fair number of patients in this study, particularly in the hydroxyurea arm, experienced a worsening of physical function after six months on therapy. This plot combines response and magnitude, providing the group-level proportions (e.g., percentage in each category of improvement, stable, and worsening) while also showing the granular experience of individual patients. Its advantage is the detail it can provide on each patient. However, comparison between arms may be visually difficult in larger studies with more subtle differences.
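The data preparation behind a waterfall plot can be sketched as below: each patient's change from baseline is sorted by magnitude and labeled with its category, so that the same ordered list gives both the individual bars and the group-level proportions. The patient identifiers, change scores and 10-point threshold are hypothetical.

```python
def waterfall_bars(changes_by_patient, threshold=10.0):
    """Return (patient_id, change, category), largest improvement first.

    The threshold is a hypothetical cut-off; higher change is assumed
    to mean better physical function.
    """
    def label(change):
        if change >= threshold:
            return "improved"
        if change <= -threshold:
            return "worsened"
        return "stable"

    ordered = sorted(
        changes_by_patient.items(), key=lambda item: item[1], reverse=True
    )
    return [(pid, change, label(change)) for pid, change in ordered]

# Made-up 6-month change scores for one study arm.
arm_changes = {"A01": -25.0, "A02": 0.0, "A03": 40.0, "A04": -5.0}
bars = waterfall_bars(arm_changes)
for pid, change, category in bars:
    print(f"{pid}: {change:+.1f} ({category})")
```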
Stakeholder feedback: Clinicians and clinical investigators
Structured feedback on graphics was collected from health outcomes researchers in the National Clinical Trials Network (NCTN) Alliance for Clinical Trials in Oncology (Alliance). Respondents from the Health Outcomes Committee (17 meeting participants) largely identified as clinicians (36%) but also included clinical or translational researchers (27%), statisticians (5%), Alliance or other administrators (14%), industry representatives (5%) and a patient advocate (5%). This group demonstrated a preference toward the pie chart (38%) as opposed to the waffle plot (33%), though a substantial number indicated they liked them equally (24%) and some indicated they disliked both (5%). When shown line plots, the group preferred group mean changes from baseline (44%) over group means over time (28%), with some preferring to see both options (28%) and no one indicating a dislike for both. Respondents demonstrated a slight preference toward the line plot (42%) over the stacked bar chart (37%), but many indicated liking both equally (16%) while a minority disliked both (5%). When asked which data were preferred for representation in the pie chart (a longitudinal data summary over 6 months versus a cross-sectional summary at 6 months), there was a substantial preference for the longitudinal approach (47%) over the cross-sectional measure at 6 months (18%), with a substantial number of respondents indicating they liked them both equally (35%) and no one disliking them both. When shown the waterfall plot displaying cross-sectional physical function scale data at 6 months compared to a pie chart displaying similar data, there was a preference toward the waterfall plot (60% waterfall plot, 25% pie chart, 10% liked equally, 5% disliked both).
Stakeholder feedback: Patient advocates
Feedback on these graphics was additionally collected from patient advocates in the NCTN Alliance Patient Advocate Committee. Respondents (14 meeting participants) largely identified as patient advocates (90%), with the remaining 10% in other categories. This group demonstrated a preference toward the pie chart (54%) as opposed to the waffle plot (15%), though a substantial proportion indicated they liked them equally (23%) and few disliked both (8%). When shown line plots, the group preferred seeing both the group mean changes from baseline and the group means over time (46%), while equal proportions preferred the group means over time (15%) and the group mean changes from baseline (15%). A substantial proportion indicated they disliked both (23%). Respondents demonstrated a preference toward the stacked bar chart (58%) over the line plot (17%), but many indicated liking both equally (17%) while a minority disliked both (8%). When asked which data were preferred for representation in the pie chart (a longitudinal data summary over 6 months versus a cross-sectional summary at 6 months), preferences were evenly split between the cross-sectional measure at 6 months (33%), the longitudinal approach (33%), and liking them both equally (33%), with no one disliking them both. When shown the waterfall plot displaying cross-sectional physical function scale data at 6 months compared to a pie chart displaying similar data, there was a preference toward the waterfall plot (57% waterfall plot, 36% pie chart, 7% liked equally, 0% disliked both).
Patient advocate and clinician/clinical investigator preferences are summarized in Supplemental Table S3. Additional qualitative feedback from the patient advocates was collected during a discussion following the Zoom poll; comments from the patient advocates on these graphics are described in Supplemental Table S4.