Data source
This retrospective cohort study used the nationwide Flatiron Health electronic health record (FEHR)-derived deidentified database, a real-world longitudinal database comprising deidentified patient-level structured and unstructured data curated via technology-enabled abstraction [9, 10]. The data originated from approximately 280 US cancer clinics (~800 sites of care). Most patients were treated in community oncology settings, and the relative proportions of community and academic patients may vary by study cohort.
Our cohort (N = 9386) included patients with early-stage BC diagnosed between January 2011 and May 2020. Institutional Review Board approval, including a waiver of informed consent, was obtained before the study was conducted. The data were deidentified and subject to obligations to prevent reidentification and protect patient confidentiality. The dataset included patient-related and tumor-related variables, including epidemiological, clinical, and pathological data as well as age at diagnosis, race, ethnicity, menopausal status, tumor size, grade, Ki-67 (%), number of lymph nodes involved, and chemotherapy and ET histories. Hormone receptor (HR) and human epidermal growth factor receptor 2 (HER2) expression was assessed by a local pathologist using standard immunohistochemistry. HR and HER2 expression was measured as the percentage of nuclear staining and membrane staining, respectively. HER2 amplification was assessed by local fluorescence in situ hybridization.
Recurrence-free survival (RFS) was defined as the time in months from the date of first treatment to the date of diagnosis of metastasis, first local recurrence, or death, whichever occurred first; patients without any of these events were censored at the last date known alive. Overall survival (OS) was defined as the time from the date of first treatment to the date of death; patients without a recorded death were censored at the last date known alive.
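The endpoint definitions above can be illustrated with a minimal Python sketch. The function names, argument names, and the days-to-months conversion factor are assumptions made for illustration; the source does not specify them, and the study itself was analyzed in SAS.

```python
from datetime import date
from typing import Optional, Tuple

# Assumed conversion from days to months; the source does not state one.
DAYS_PER_MONTH = 365.25 / 12


def months_between(start: date, end: date) -> float:
    """Elapsed time from start to end, in months."""
    return (end - start).days / DAYS_PER_MONTH


def rfs_months(first_treatment: date,
               metastasis: Optional[date],
               local_recurrence: Optional[date],
               death: Optional[date],
               last_known_alive: date) -> Tuple[float, bool]:
    """Return (time in months, event observed) for recurrence-free survival."""
    event_dates = [d for d in (metastasis, local_recurrence, death) if d is not None]
    if event_dates:
        # Earliest of metastasis, first local recurrence, or death.
        return months_between(first_treatment, min(event_dates)), True
    # No event recorded: censor at the last date known alive.
    return months_between(first_treatment, last_known_alive), False


def os_months(first_treatment: date,
              death: Optional[date],
              last_known_alive: date) -> Tuple[float, bool]:
    """Return (time in months, death observed) for overall survival."""
    if death is not None:
        return months_between(first_treatment, death), True
    # No death recorded: censor at the last date known alive.
    return months_between(first_treatment, last_known_alive), False
```

Each function returns a time paired with an event indicator, which is the form that Kaplan-Meier and Cox procedures generally expect.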
Real-world cohort key inclusion and exclusion criteria
Cases were selected from the FEHR BC cohort if patients had a diagnosis of histologically proven stage I-III BC and had received surgical treatment with curative intent. Exclusion criteria were carcinoma in situ, metastatic BC, and HER2+ BC. Patients who received chemotherapy before or after surgical treatment were classified as having received perioperative chemotherapy. Information on perioperative chemotherapy, ET, and radiation therapy was abstracted. Because of the diversity of chemotherapy regimens, regimens were classified as anthracycline- and taxane-containing, anthracycline-only, taxane-only, or other. Patients were followed from the date of resection (index date) until the date of biopsy-proven tumor recurrence, death, or last follow-up. Patients with > 90 days between diagnosis and first Flatiron Health–reported structured activity were excluded to avoid missing treatment data. Furthermore, any patient with < 90 days of follow-up from the index date was excluded.
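As a rough illustration, the eligibility rules and regimen categories described above could be expressed as follows. The `Patient` record and all of its field names are hypothetical and do not correspond to actual Flatiron Health variables; stage "0" is assumed to denote carcinoma in situ and stage "IV" metastatic disease.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # All field names are hypothetical, not Flatiron Health variable names.
    stage: str                             # "0" (in situ), "I", "II", "III", or "IV"
    histologically_proven: bool
    curative_surgery: bool
    her2_positive: bool
    days_diagnosis_to_first_activity: int  # diagnosis to first structured activity
    followup_days_from_index: int          # follow-up after the date of resection


def is_eligible(p: Patient) -> bool:
    """Apply the inclusion and exclusion criteria to one patient."""
    included = (p.stage in {"I", "II", "III"}   # excludes in situ and metastatic BC
                and p.histologically_proven
                and p.curative_surgery)
    excluded = (p.her2_positive
                or p.days_diagnosis_to_first_activity > 90
                or p.followup_days_from_index < 90)
    return included and not excluded


def classify_regimen(has_anthracycline: bool, has_taxane: bool) -> str:
    """Assign a chemotherapy regimen to one of the four categories."""
    if has_anthracycline and has_taxane:
        return "anthracycline- and taxane-containing"
    if has_anthracycline:
        return "anthracycline-only"
    if has_taxane:
        return "taxane-only"
    return "other"
```

Under this reading, a patient with exactly 90 days for either time window remains eligible, because the stated thresholds are strict (> 90 and < 90).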
Statistical methods
Among 9386 early-stage BC patients, 3490 had missing or incomplete ER, PR, or HER2 status and/or did not have at least 90 days of follow-up. A further 1299 patients had HER2+ BC and were also excluded from the final analysis, leaving 4697 evaluable patients (634 with TNBC and 4063 with ER+ BC).
Baseline characteristics were first summarized using means (standard deviations) and medians (ranges) for continuous variables and frequencies for categorical variables. The ER+-low and ER+-intermediate cohorts were compared using the Kruskal-Wallis test for continuous variables and the Fisher exact test for categorical variables.
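For readers unfamiliar with these summaries, the sketch below shows the descriptive statistics for a continuous variable and a two-sided Fisher exact test for a 2x2 table, using only the Python standard library. It is an illustration rather than the SAS procedures used in the study; the function names are hypothetical, the Kruskal-Wallis test is not shown, and the two-sided P value follows the common convention of summing the probabilities of all tables no more likely than the observed one.

```python
from math import comb
from statistics import mean, median, stdev


def summarize(values):
    """Mean, sample standard deviation, median, and range of a continuous variable."""
    return {
        "mean": mean(values),
        "sd": stdev(values),
        "median": median(values),
        "range": (min(values), max(values)),
    }


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c

    def weight(x: int) -> int:
        # Unnormalized hypergeometric weight of the table with top-left cell x.
        return comb(row1, x) * comb(row2, col1 - x)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    # Integer weights keep the "no more likely than observed" comparison exact.
    extreme = sum(weight(x) for x in range(lowest, highest + 1)
                  if weight(x) <= observed)
    return extreme / comb(row1 + row2, col1)
```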
The ER+ dataset was divided into a training dataset (two-thirds) and a test dataset (one-third). The range of ER percent staining values was divided into 10 levels in 10% increments. A cut point analysis was performed on the training dataset using the Contal and O’Quigley method [11] to identify the optimum cut point of ER+ staining with respect to RFS, which was modeled using Cox proportional hazards regression. Because the analysis did not identify a cut point that was adequately and significantly differentiated from other cut points, we used 20% as a clinically reasonable cut point (P < .0001). New exploratory ER+-low and ER+-high groups were defined by this cut point (1%-19.9% vs ≥ 20%).
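The data preparation steps in this paragraph can be sketched as below. This is an illustration under stated assumptions, not the study's code: the random seed, the rounding of the two-thirds split, the level boundaries (0 to < 10, 10 to < 20, and so on, with 100% placed in the top level), and all function names are assumed. The Contal and O'Quigley cut point search itself is not implemented here.

```python
import random


def split_train_test(records, seed=0):
    """Randomly split records into two-thirds training and one-third test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility (assumed)
    n_train = (2 * len(shuffled)) // 3      # rounding down is an assumption
    return shuffled[:n_train], shuffled[n_train:]


def er_level(er_percent: float) -> int:
    """Map ER percent staining (0-100) to one of 10 levels, numbered 0 to 9."""
    if not 0 <= er_percent <= 100:
        raise ValueError("ER percent staining must lie between 0 and 100")
    # 100% would fall in an eleventh bin, so it is folded into the top level.
    return min(int(er_percent // 10), 9)


def er_group(er_percent: float) -> str:
    """Assign the exploratory ER+ group using the 20% cut point."""
    if not 1 <= er_percent <= 100:
        raise ValueError("ER+ staining is taken here as 1% to 100%")
    return "ER+-low" if er_percent < 20 else "ER+-high"
```

Applied to the 4063 patients with ER+ BC, this split rule would give 2708 training and 1355 test records; the source does not report the actual subset sizes.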
We then generated Kaplan-Meier survival curves for RFS and OS for both the training and test sets. Furthermore, a multivariable Cox proportional hazards model, adjusted for age, radiotherapy, and ET, was fitted on the training set. This model was then applied to the test set, and the c-statistic was used to assess model discrimination. Hazard ratios and 95% confidence intervals are reported. All analyses were performed in SAS version 9.4 (SAS Institute, Cary, NC).
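To make the two main quantities concrete, the following standard-library Python sketch computes a Kaplan-Meier survival estimate and a concordance statistic. It is not the SAS implementation used in the study. The concordance function follows Harrell's definition, which is assumed here to be the c-statistic intended, and it takes risk scores as given (for example, the linear predictor of an already fitted Cox model); fitting the Cox model is outside the scope of this sketch. Function names are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects who share this time.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set.
        at_risk -= n_leaving
    return curve


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk score."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # higher risk failed first
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 indicates discrimination no better than chance, and 1.0 indicates that every comparable pair is ordered correctly.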