We used ICD-9-CM and ICD-10-CM codes (Table 1 in Supplement) to assemble a study cohort of adult patients with chronic liver disease between 2006 and 2022 using electronic health record (EHR) data at an academic health system (UCLA Health), which included two hospitals and over 200 medical clinics across Southern California (Figure 1). To be considered established ambulatory care patients in the health system, all patients were required to have at least two ambulatory care visits in primary care at least one year apart (n=26,439). We defined the follow-up period as the time from the first ICD code entry for HCC to the most recent encounter.
We first queried for patients with potential HCC, defined as having at least one entry of the ICD-9-CM (155.0) or ICD-10-CM (C22.0) code for HCC (n=1,007). To assess the performance of a single ICD code entry for HCC, we assembled a development sample, a random sample of 300 patients from the potential HCC sample, for chart review by three physicians (SJS, AB, SB) using a structured abstraction form. A transplant hepatologist (CRW) conducted separate chart reviews for any discrepancies. HCC diagnosis was established using a combination of imaging, histology, and clinical notes in the EHR. Abstraction of clinical data from chart reviews identified true HCC cases (true positives) and false HCC cases (false positives), which served as the gold standard comparison. From our chart reviews of the development sample, we identified the most frequent non-HCC malignancies, which were included in our algorithm.
Since the algorithm was based on the development sample using manual chart reviews as the gold standard, we also performed a sensitivity analysis using the institution's cancer registry as a reference. This sensitivity analysis was applied to 285 patients from the development sample diagnosed with HCC between 2006 and 2020. It excluded the remaining 15 patients because they were diagnosed with HCC outside of the registry date range (e.g., after 2020, when registry data were unavailable at the time of this study).
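The random sampling of chart-review patients described above can be illustrated with a minimal sketch. It assumes the 1,007 potential HCC patients are available as a list of identifiers, and it assumes the development sample and the later internal validation sample do not overlap; the text only says the validation sample was a "different" random sample. The function name, the placeholder identifiers, and the seed are hypothetical and are not taken from the study.

```python
import random

def draw_samples(patient_ids, n_dev=300, n_val=300, seed=0):
    """Draw two non-overlapping random samples from a pool of patient IDs.

    Assumption (not stated in the study): the two samples are disjoint,
    so shuffling once and slicing is sufficient.
    """
    rng = random.Random(seed)          # fixed seed only for reproducibility
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    development = shuffled[:n_dev]
    validation = shuffled[n_dev:n_dev + n_val]
    return development, validation

# Hypothetical placeholder IDs standing in for the 1,007 potential HCC patients.
potential_hcc = [f"patient_{i:04d}" for i in range(1007)]
development, validation = draw_samples(potential_hcc)
```

Shuffling once and slicing keeps the two samples disjoint by construction, which avoids evaluating the algorithm on patients that were also used to develop it.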
We calculated performance measurements for each algorithm iteration, including sensitivity, specificity, positive predictive value (PPV; precision), negative predictive value (NPV), F-score (harmonic mean of PPV and sensitivity), and accuracy (percentage of patients who were correctly classified) (Table 1). We selected the algorithm with the highest PPV, F-score, and accuracy to reduce the number of false positive and false negative cases. We internally validated the highest performing algorithm using a different random sample of 300 patients from the pool of potential patients with HCC (Figure 1).
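The performance measurements defined above can be computed from a 2x2 comparison against chart review. The sketch below uses the standard definitions. The function name is hypothetical, and the example counts (111 true positives, 9 false positives, 15 false negatives, 165 true negatives) are not reported in the study: they are back-calculated here from the percentages in Table 1 for at least 10 ICD code entries in Algorithm 1, and should be read as an assumption.

```python
def performance(tp, fp, fn, tn):
    """Return the six measures used in Table 1 from confusion-matrix counts."""
    def ratio(num, den):
        # Convention assumed here: return 0 when the denominator is 0,
        # e.g. NPV when no patient is classified as negative.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    ppv = ratio(tp, tp + fp)
    npv = ratio(tn, tn + fn)
    # F-score: harmonic mean of PPV and sensitivity.
    f_score = ratio(2 * ppv * sensitivity, ppv + sensitivity)
    accuracy = ratio(tp + tn, tp + fp + fn + tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f_score": f_score,
        "accuracy": accuracy,
    }

# Counts inferred from Table 1 (assumption, not reported by the authors).
m = performance(tp=111, fp=9, fn=15, tn=165)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these inferred counts, the rounded results match the reported row: sensitivity 88.1%, specificity 94.8%, PPV 92.5%, NPV 91.7%, F-score 0.90, and accuracy 92.0%.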
Table 1. Performance measurements of ICD code-based algorithm iterations
| | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) | F-score | Accuracy (%) |
|---|---|---|---|---|---|---|
| Algorithm 1: Increasing number of ICD code entries for HCC | | | | | | |
| ≥ 9 ICD code entries for HCC | 88.1 | 93.7 | 91.0 | 91.6 | 0.90 | 91.3 |
| ≥ 10 ICD code entries for HCC | 88.1 | 94.8 | 92.5 | 91.7 | 0.90 | 92.0 |
| ≥ 11 ICD code entries for HCC | 85.7 | 94.8 | 92.3 | 90.2 | 0.89 | 91.0 |
| Algorithm 2: Exclusion of each non-HCC malignancy with ≥ 1 ICD code entry for HCC | | | | | | |
| Secondary malignancy | 95.2 | 17.8 | 45.6 | 83.8 | 0.62 | 50.3 |
| Cholangiocarcinoma | 89.7 | 17.2 | 44.0 | 69.8 | 0.59 | 47.7 |
| Pancreatic cancer | 100 | 0 | 42.0 | 0 | 0.59 | 42.0 |
| Colorectal cancer | 100 | 0 | 42.0 | 0 | 0.59 | 42.0 |
| Neuroendocrine tumor | 100 | 0 | 42.0 | 0 | 0.59 | 42.0 |
| Algorithm 3: ≥ 10 ICD code entries for HCC and exclusion of each non-HCC malignancy | | | | | | |
| Secondary malignancy | 83.3 | 95.4 | 92.9 | 88.8 | 0.88 | 90.3 |
| Cholangiocarcinoma | 77.8 | 98.9 | 98.0 | 86.0 | 0.87 | 90.0 |
| Pancreatic cancer | 88.1 | 94.8 | 92.5 | 91.7 | 0.90 | 92.0 |
| Colorectal cancer | 88.1 | 94.8 | 92.5 | 91.7 | 0.90 | 92.0 |
| Neuroendocrine tumor | 88.1 | 94.8 | 92.5 | 91.7 | 0.90 | 92.0 |
| Algorithm 4: Increasing number of ICD code entries for HCC and sum of ICD code entries for HCC exceeds sum of ICD code entries for non-HCC malignancies | | | | | | |
| ≥ 9 ICD code entries for HCC | 88.1 | 97.1 | 95.7 | 91.8 | 0.92 | 93.3 |
| ≥ 10 ICD code entries for HCC | 88.1 | 98.3 | 97.4 | 91.9 | 0.92 | 94.0 |
| ≥ 11 ICD code entries for HCC | 85.7 | 98.3 | 97.3 | 90.5 | 0.91 | 93.0 |
PPV, positive predictive value; NPV, negative predictive value; HCC, hepatocellular carcinoma.
Performance measurements of each algorithm were obtained using the Development Sample, which was a random pool of 300 potential HCC patients with at least 1 ICD code entry for HCC.
Algorithm 1 required an increasing number of ICD code entries for HCC (ICD codes 155.0 and C22.0). Algorithm 2 reports performance measurements after exclusion of each respective non-HCC malignancy: secondary malignancy (ICD codes 197.7, C78.7), cholangiocarcinoma (ICD codes 155.1, C22.1), pancreatic cancer (ICD codes 157.9, C25.9), colorectal cancer (ICD codes 153.9, C18.9), or neuroendocrine tumor (ICD codes 209, 209.0, 209.00-209.03, 209.1, 209.10-209.17, 209.72, C7A.1, C7A.010-C7A.012, C7A.019-C7A.026, C7A.029, C7B.02). Algorithm 3 combined the best performing iteration of Algorithm 1 (at least 10 ICD code entries for HCC) with exclusion of each respective non-HCC malignancy evaluated in Algorithm 2 (secondary malignancy, cholangiocarcinoma, pancreatic cancer, colorectal cancer, neuroendocrine tumor). Algorithm 4 used the iterations from Algorithm 1 (an increasing number of ICD code entries for HCC) and additionally required the sum of ICD code entries for HCC to exceed the sum of ICD code entries for non-HCC malignancies (secondary malignancy, cholangiocarcinoma, pancreatic cancer, colorectal cancer, neuroendocrine tumor). Highlighted are the highest performing iterations in Algorithm 1 and Algorithm 4.
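The Algorithm 4 rule described above can be sketched as a simple count comparison. This is an illustration, not the authors' implementation. It assumes each patient's record is available as a flat list of ICD code strings with one element per code entry, and that codes are matched exactly as listed in the table note. The names `classify_hcc`, `HCC_CODES`, and `NON_HCC_CODES` are hypothetical.

```python
def _code_range(prefix, start, stop, width):
    """Expand an inclusive range, e.g. ("209.", 0, 3, 2) -> 209.00 ... 209.03."""
    return {f"{prefix}{i:0{width}d}" for i in range(start, stop + 1)}

HCC_CODES = {"155.0", "C22.0"}

NON_HCC_CODES = (
    {"197.7", "C78.7"}      # secondary malignancy
    | {"155.1", "C22.1"}    # cholangiocarcinoma
    | {"157.9", "C25.9"}    # pancreatic cancer
    | {"153.9", "C18.9"}    # colorectal cancer
    # neuroendocrine tumor: single codes, then the listed ranges
    | {"209", "209.0", "209.1", "209.72", "C7A.1", "C7A.029", "C7B.02"}
    | _code_range("209.", 0, 3, 2)      # 209.00-209.03
    | _code_range("209.", 10, 17, 2)    # 209.10-209.17
    | _code_range("C7A.", 10, 12, 3)    # C7A.010-C7A.012
    | _code_range("C7A.", 19, 26, 3)    # C7A.019-C7A.026
)

def classify_hcc(code_entries, min_hcc_entries=10):
    """Return True if a patient meets the Algorithm 4 rule.

    Assumption: `code_entries` holds one string per recorded ICD code entry.
    """
    n_hcc = sum(1 for code in code_entries if code in HCC_CODES)
    n_other = sum(1 for code in code_entries if code in NON_HCC_CODES)
    # Require enough HCC entries AND more HCC than non-HCC malignancy entries.
    return n_hcc >= min_hcc_entries and n_hcc > n_other

# Hypothetical patients for illustration only.
likely_hcc = ["C22.0"] * 12 + ["C78.7"] * 3
likely_other = ["C22.0"] * 10 + ["C22.1"] * 10
print(classify_hcc(likely_hcc), classify_hcc(likely_other))
```

Setting `min_hcc_entries` to 9 or 11 gives the other iterations of Algorithm 4. Dropping the `n_hcc > n_other` condition reduces the rule to Algorithm 1.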