According to the framework, the SubtlePET™ AI solution (Subtle Medical, Menlo Park, CA, USA) was selected for assessment in a single PET/CT center with three scanners, accredited by European Association of Nuclear Medicine (EANM) Research Ltd (EARL) [12]. A legal review was undertaken to ensure that the solution was certified under the correct Medical Device classification and complied with the European General Data Protection Regulation (GDPR). In parallel, the technical architecture (Fig. 2) for the integration of the solution was reviewed. Once the legal and technical requirements were validated, the software was installed, configured and verified, and the personnel were trained on its use and limitations.
SubtlePET™ software uses a convolutional neural network-based algorithm to reduce noise and improve image quality of fluorodeoxyglucose (FDG) and amyloid PET and PET/CT images [6, 13]. Although SubtlePET™ is certified and validated for clinical use with all major PET/CT vendors and many different models, the framework required a clinical assessment. This prospective analysis was designed to verify the performance of the algorithm using real-world data.
Patient enrollment
Patients referred for 18F-FDG PET/CT during diagnostic work-up for oncological disease were screened for prospective enrollment. Inclusion criteria were: (a) age > 18 years; (b) FDG-avid malignancy; (c) glycemia < 180 mg/dL; (d) adequate physical condition to allow them to remain still for approximately 40 minutes, for two consecutive PET scans. Claustrophobic patients were excluded. Patients meeting these criteria were approached to participate in the study.
In accordance with GDPR and with institutional procedures on the information provided to patients about the examination process, all patients signed an informed consent form prior to any study procedures.
Examination protocol
PET images were acquired with three different 3D PET/CT scanners (Discovery ST-4, PET scanner 1; Discovery ST-16, PET scanner 2; and Discovery IQ, PET scanner 3) from the same manufacturer (GE Healthcare, Milwaukee, WI, United States) without time of flight (TOF) technology.
18F-FDG was provided by Advanced Accelerator Applications pharmaceuticals (AAA by Novartis, Saint-Genis-Pouilly, France) in compliance with Good Manufacturing Practice (GMP) and in accordance with EANM procedure guidelines [5].
For the purposes of this study, FDG doses were reduced by one-third compared to the standard injected dose to a patient with the same body weight, according to institutional procedure guidelines. All doses were injected via peripheral venous catheter.
On the same day, patients underwent two sequential PET scans in continuous-bed-mode: a reduced dose acquisition scan (PET-processed) and a reference acquisition scan (PET-native). The PET-processed scan was acquired first, at 60 minutes post-injection, from skull base to mid-thigh. To simulate normal acquisition time and reduced injected dose, PET images for scanners 1 and 2 were acquired at 2.5 minutes per bed position, while for scanner 3 images were acquired at 1.5 minutes per bed position, in accordance with institutional procedure guidelines.
Following the PET-processed scan, the PET-native scan was acquired for the same region without moving the patient. The PET-native images were acquired with an increased time per bed position, to simulate normal acquisition time and standard injected dose. To define the PET emission acquisition time that simulated a full dose examination, a phantom study was performed on each PET/CT scanner using cancer imaging conditions, applying the following equation: acquisition time per bed (seconds) = standard acquisition time per bed × exp(900 × λ) × 1.25.
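As an illustration of the arithmetic in this equation, the following minimal Python sketch evaluates it for a given standard time per bed position. Several points are assumptions that the text does not state: λ is taken to be the physical decay constant of 18F (half-life of about 109.77 minutes), the value 900 is taken to be a delay in seconds, and the function name `native_time_per_bed` and the example inputs (150 s and 90 s, corresponding to 2.5 and 1.5 minutes per bed position) are purely illustrative.

```python
import math

# Assumption: lambda in the equation is the 18F decay constant.
F18_HALF_LIFE_S = 109.77 * 60.0  # approximate 18F half-life in seconds


def native_time_per_bed(standard_time_s, delay_s=900.0, factor=1.25,
                        half_life_s=F18_HALF_LIFE_S):
    """Evaluate standard_time * exp(delay * lambda) * factor, in seconds."""
    decay_constant = math.log(2) / half_life_s  # lambda, in 1/s
    return standard_time_s * math.exp(delay_s * decay_constant) * factor


if __name__ == "__main__":
    # Hypothetical inputs: 2.5 min and 1.5 min per bed position.
    for standard in (150.0, 90.0):
        print(standard, round(native_time_per_bed(standard), 1))
```

Under these assumptions the exponential term acts as a decay compensation of roughly 10% for the 900-second interval, and the factor 1.25 scales the result further; the times actually used in the study were set by the phantom measurements on each scanner.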
Patients underwent one low-dose CT prior to PET-processed acquisition, for attenuation correction and anatomical correlation of PET findings.
Emission data were corrected for randoms, dead time, scatter, and attenuation and were reconstructed iteratively using an ordered-subsets expectation maximization (OSEM) algorithm.
According to institutional processes, images were reviewed for artifacts by the technologist before the patient was discharged. Upon confirmation, PET-processed acquisitions were sent by the radiographer from the modality to the Subtle server (SubtleEdge) for processing. Incoming images were automatically anonymized and quality controlled (QC) according to the SubtlePET™ process. Images that passed QC were processed and were sent automatically to the Picture Archiving and Communication System (PACS) in an average time of ten minutes.
Image Quality assessment
The PET-native images were defined as the standard of reference and were reviewed by two independent physicians who had access to all the clinical, imaging and reconstruction data, to reach a consensus report that was delivered to the patient within 24 hours.
The PET-processed and PET-native datasets were anonymized, separated and randomized to allow independent assessment of each dataset over a four-week period by four blinded, board-certified nuclear medicine physicians, each with more than five years’ experience (EP and VA > 15 years; GP and AI > 5 years). Each reviewer assessed all datasets. Reviewers were blinded to image acquisition, reconstruction technique and clinical information. 18F-FDG PET/CT images were reported according to EANM procedure guidelines [5].
For image quality, the PET datasets were rated on a 5-point scale (1: very poor/non-diagnostic; 2: poor; 3: moderate; 4: good; and 5: excellent) with scores 4 and 5 considered adequate to provide diagnostic confidence.
Furthermore, each reviewer had to state whether they believed they were reviewing the PET-processed or the PET-native dataset, or whether this was indeterminate.
Lastly, the detectability of all lesions was evaluated in a per-lesion analysis. In patients with ten lesions or fewer, all lesions were assessed by the reviewers, while in patients with more than ten lesions, the ten with the highest maximum standardized uptake value (SUVmax) were included in the analysis. In both datasets, the SUVmax of the largest lesion and the SUVmean of the liver were measured. SUV was defined as activity concentration (Bq/mL) divided by injected activity (Bq) normalized to body weight. The highest voxel value (SUVmax) and the mean voxel value (SUVmean) were obtained in a volume of interest (VOI) covering the entire tumor as defined by each reviewer. Because all PET-native images were acquired after the PET-processed images, a correction factor for the SUV values was calculated according to Appendix 1 and Supplemental Table 1 [14]. Lesions not detected by a reviewer in a specific dataset were assigned a value of 0.
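The SUV definition and the lesion selection rule can be sketched in a few lines of Python. This is an illustrative sketch only, not the software used in the study: the function names, the dictionary layout, and the conversion of body weight to grams (the usual convention that makes SUV dimensionless, assuming a tissue density of 1 g/mL) are assumptions, and the timing correction factor of Appendix 1 is not reproduced here.

```python
def suv_bw(activity_conc_bq_per_ml, injected_activity_bq, body_weight_kg):
    """Body-weight SUV: concentration / (injected activity / body weight)."""
    body_weight_g = body_weight_kg * 1000.0  # assumed convention: grams
    return activity_conc_bq_per_ml / (injected_activity_bq / body_weight_g)


def select_lesions(suvmax_by_lesion, limit=10):
    """Keep all lesions if there are `limit` or fewer, else the `limit`
    lesions with the highest SUVmax."""
    ranked = sorted(suvmax_by_lesion.items(),
                    key=lambda item: item[1], reverse=True)
    return dict(ranked[:limit])


def reviewer_values(selected_lesions, measured_by_reviewer):
    """Lesions a reviewer did not detect are assigned a value of 0."""
    return {lesion: measured_by_reviewer.get(lesion, 0.0)
            for lesion in selected_lesions}


if __name__ == "__main__":
    # Hypothetical numbers: 5000 Bq/mL, 200 MBq injected, 70 kg patient.
    print(suv_bw(5000.0, 200e6, 70.0))
```

Keeping the selection step separate from the per-reviewer lookup means the same lesion list can be applied to both the PET-processed and the PET-native readings of a patient.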
Once the independent analysis of the native and processed datasets was complete, the results were logged and unblinded. The whole dataset was scrutinized to determine whether the processed scans were non-inferior to the native scans. The quantitative assessment covered lesion detectability and SUV levels; the qualitative assessment covered subjective image quality. Inter-observer variability in image quality assessment was evaluated. Differences in results between PET/CT scanner models were also assessed.
Statistical analysis
Descriptive statistics were presented as relative/absolute frequencies for categorical variables and as the median (range) for continuous variables. Inferential analyses for categorical and continuous variables were performed with Fisher’s exact test and the Mann–Whitney test, respectively. The degree of agreement among reviewers in evaluating image quality was assessed using intraclass correlation coefficients (ICC) and their 95% CI, using a 2-way mixed, single measure, consistency model. ICC was interpreted according to the Landis J. R. interpretation scale [15] (0.0: poor; 0.0–0.20: slight; 0.21–0.40: fair; 0.41–0.60: moderate; 0.61–0.80: substantial; 0.81–1.00: almost-perfect reproducibility). To analyze lesion detectability in the two PET datasets, the detection rate was calculated for each reviewer based on the total number of suspected lesions determined by the standard of reference. All p values were obtained by the two-sided exact method at the conventional 5% significance level. Data were analyzed with R 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria, http://www.R-project.org).
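For readers unfamiliar with the agreement statistic, the sketch below shows one common way to compute a two-way mixed, single-measure, consistency ICC, often written ICC(3,1), from a table of scores with one row per dataset and one column per reviewer. It is written in plain Python for illustration and is not the R code used in the study; the function names, the handling of values that fall between the quoted bands, and the example scores are assumptions, and confidence intervals are not computed.

```python
def icc_consistency_single(ratings):
    """ICC(3,1): two-way mixed, single measure, consistency.

    ratings: list of rows, one row per rated dataset,
             one column per reviewer.
    """
    n = len(ratings)       # number of rated datasets
    k = len(ratings[0])    # number of reviewers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA decomposition without replication.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def interpret(icc):
    """Map an ICC value onto the labels quoted in the text."""
    if icc <= 0.0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"),
             (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if icc <= upper:
            return label
    return "almost perfect"


def detection_rate(detected_lesions, reference_lesions):
    """Share of reference-standard lesions that a reviewer detected."""
    return detected_lesions / reference_lesions


if __name__ == "__main__":
    # Hypothetical 5-point quality scores: 5 datasets, 4 reviewers.
    scores = [[4, 4, 5, 4], [3, 3, 4, 3], [5, 4, 5, 5],
              [2, 3, 3, 2], [4, 5, 5, 4]]
    value = icc_consistency_single(scores)
    print(round(value, 3), interpret(value))
```

The consistency form ignores systematic offsets between reviewers (one reviewer scoring uniformly higher than another does not lower the coefficient), which is why the between-reviewer sum of squares is removed from the error term but does not appear in the denominator.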
Once the analysis of the outcomes from the clinical evaluation was completed, an assessment of the business benefits was performed. The potential net savings from the use of SubtlePET™ were calculated using data from the whole PET/CT network, not just the single center, assuming replicability of results. A percentage of these savings was then agreed as a fair price for the AI solution.