Histo-molecular profiling of IM
We hypothesised that histologically defined complete gastric IM would show high expression of genes whose protein products have previously been associated with complete IM or the small intestine, and that histologically defined incomplete IM would show high expression of genes whose products have previously been associated with incomplete IM or the colon. Our objectives were, first, to identify genes that could be used to molecularly subtype IM samples from Affymetrix microarray data and, second, to validate potential gene product biomarkers on histologically defined single complete and incomplete IM glands. We initially chose single gland rather than whole tissue validation because it corresponds to the highest possible resolution of histological subtyping.
Previous studies have shown exclusive expression of brush border markers such as CD10 (MME gene) and IAP (ALPI gene) (27–29) in the complete subtype of IM, as well as higher expression of CDX2 than in the incomplete subtype (30). By contrast, higher expression of CD24 has been described in Type II incomplete IM (31). To carry out an exploratory molecular subgrouping of macro-dissected IM-GC samples with the available Affymetrix microarray data, two additional genes, MUC12 and CDX1, were chosen because they have previously been shown to be enriched in the colon relative to the small intestine (32, 33).
Using the above six-gene signature, unsupervised hierarchical clustering of IM-GC samples produced two main clusters (Fig. 1A): cluster C1, containing samples with high expression of CDX2, MME and ALPI, and cluster C2, containing samples with relatively higher expression of MUC12, CD24 and CDX1. Samples S12 (from C1) and S8 (from C2) were classified as molecularly subtyped mixed IM because of their relatively high expression of all six target genes; the remaining samples were defined as either molecularly subtyped complete IM (C1 without sample S12) or incomplete IM (C2 without sample S8).
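The clustering step can be illustrated with a minimal sketch. The expression values and sample names below are hypothetical and are not data from this study, and the choice of Euclidean distance with average linkage is an assumption, since the distance metric and linkage method are not stated here.

```python
import math
from itertools import combinations

# Hypothetical log2 expression values (NOT data from the study).
# Columns follow GENES; sample names A-F are placeholders.
GENES = ["CDX2", "MME", "ALPI", "MUC12", "CD24", "CDX1"]
EXPR = {
    "A": [9.1, 8.7, 8.9, 4.2, 5.0, 4.8],
    "B": [8.8, 9.0, 8.5, 4.5, 5.3, 5.1],
    "C": [9.3, 8.4, 9.1, 4.0, 4.7, 4.6],
    "D": [5.2, 4.1, 4.4, 8.9, 9.2, 8.6],
    "E": [5.0, 4.5, 4.0, 9.1, 8.8, 8.9],
    "F": [5.5, 4.3, 4.6, 8.6, 9.0, 9.3],
}

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, data):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(data[i], data[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def cluster(data, k):
    """Agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[name] for name in sorted(data)]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], data),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]

two_clusters = sorted(cluster(EXPR, 2))
print(two_clusters)
```

With these illustrative values, the samples with high CDX2, MME and ALPI fall into one cluster and those with high MUC12, CD24 and CDX1 into the other, mirroring the C1/C2 split described above.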
Differential gene expression and pathway analysis of IM-GC samples
To gain insight into how complete and incomplete IM might differ overall at the gene expression level, and to use this information to identify an optimal subtype biomarker, differential gene expression analysis was carried out (Fig. 1B, Supplementary Table 4). A total of 18 and 12 genes were over-expressed (log2 fold change > 0.6 or < -0.6, FDR-adjusted p < 0.05) in complete and incomplete IM, respectively, and together comprised the differentially expressed gene (DEG) signature. Molecular IM subtyping was further confirmed by performing unsupervised hierarchical clustering of all IM-GC samples, including the two mixed IM samples, with the DEG signature (Supplementary Fig. 4). Overall, the complete IM gene list was enriched in small intestine specific genes (RBP2, MME, XPNPEP2) and in genes related to carbohydrate digestion (APOA4, SLC2A5, SLC2A2, MGAM and KHK), confirming the strong small intestinal-like characteristics of these samples. The chemokine CCL25 was also highly enriched. The incomplete IM gene list contained genes normally expressed in the colon (HOXA10 and HOXA13) and a chemokine, CXCL5. Two GC-associated genes (CLDN1 and CDH3) were also present in the incomplete IM list.
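The DEG thresholds can be written as a small decision rule. This is a sketch only: the sign convention (positive log2 fold change meaning higher expression in complete IM) is an assumption not stated in the text, and the function name and example values are hypothetical. A log2 fold change of 0.6 corresponds to roughly a 1.5-fold difference.

```python
def classify_gene(log2_fc, adj_p, fc_cutoff=0.6, alpha=0.05):
    """Assign a gene to a DEG list using the thresholds quoted in the text.

    Assumption: log2_fc > 0 means higher expression in complete IM.
    """
    if adj_p >= alpha:
        return "not significant"
    if log2_fc > fc_cutoff:
        return "over-expressed in complete IM"
    if log2_fc < -fc_cutoff:
        return "over-expressed in incomplete IM"
    return "not significant"

# Hypothetical (log2 fold change, FDR-adjusted p) pairs for illustration only.
examples = {
    "gene_1": (1.8, 0.001),
    "gene_2": (-0.9, 0.020),
    "gene_3": (0.4, 0.001),
    "gene_4": (2.5, 0.200),
}
for gene, (fc, p) in examples.items():
    print(gene, classify_gene(fc, p))

# Linear-scale fold change implied by the cutoff.
print(round(2 ** 0.6, 2))
```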
To further determine whether the complete IM samples were relatively enriched in small intestine associated pathways compared to the incomplete IM samples, ssGSEA using the KEGG pathway database was performed. Eighteen pathways were significantly enriched (adjusted p < 0.05, Wilcoxon rank sum test with Benjamini-Hochberg correction) in complete IM but none in incomplete IM (Fig. 1C, Supplementary Table 5). Enriched pathways were mainly associated with carbohydrate and lipid metabolism suggesting that complete IM was indeed enriched in small intestine associated processes.
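For reference, the Benjamini-Hochberg adjustment applied to the per-pathway p-values can be sketched as follows. The p-values in the example are hypothetical, and the function is a generic implementation of the step-up procedure rather than the code used for this analysis.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four pathways.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print([round(p, 3) for p in adj])
```

A pathway would then be reported as enriched when its adjusted p-value falls below 0.05.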
CD10 as a biomarker for single complete IM glands
Given its highly significant difference in gene expression levels between complete and incomplete IM samples, the gene product of the MME gene, CD10, was chosen as a candidate biomarker for complete IM. Initial validation of the anti-CD10 antibody was accomplished with IHC staining of complete and incomplete IM samples (Fig. 2A).
Next, IM samples representing both the IM-GC and IM + GC cohorts were stained for CD10 by immunofluorescence. Single gland analysis demonstrated that CD10 is highly sensitive (91.1%) and specific (97.8%) for detecting complete IM glands (Table 1), with high PPV (98.2%) and NPV (89.1%). Further stratification of the samples into cohorts representing varying risk of progression to GC (where IM-GC is the lowest risk and IM + GC the highest) showed that the sensitivity of CD10 for detecting complete IM glands increased from 87.5% in IM-GC patients to 94.9% in IM + GC patients. The reverse trend was observed for specificity, with 100.0% in IM-GC patients and 96.7% in IM + GC patients. PPV was above 96% in both cohorts and NPV increased from 80.0% in the IM-GC cohort to 95.1% in the IM + GC cohort.
Table 1
Sensitivity and specificity of CD10 and Das1 for individual complete and incomplete intestinal metaplastic glands
| Biomarker | Cohort | IM gland subtype | No. of glands | No. +ve glands | No. -ve glands | Sensitivity (95% CI)^a | Specificity (95% CI)^a | PPV/NPV | AUROC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CD10 | All cohorts | Complete | 123 | 112 | 11 | 91.1% (84.6%-95.5%) | 97.8% (92.4%-99.7%) | 98.2%/89.1% | 0.944 |
| | | Incomplete | 92 | 2 | 90 | | | | |
| | IM-GC | Complete | 64 | 56 | 8 | 87.5% (76.9%-94.5%) | 100.0% (89.1%-100.0%) | 100.0%/80.0% | 0.938 |
| | | Incomplete | 32 | 0 | 32 | | | | |
| | IM + GC | Complete | 59 | 56 | 3 | 94.9% (85.9%-98.9%) | 96.7% (88.5%-99.6%) | 96.6%/95.1% | 0.958 |
| | | Incomplete | 60 | 2 | 58 | | | | |
| Das1 | All cohorts | Complete | 127 | 11 | 116 | 29.2% (20.3%-39.3%) | 91.3% (85.0%-95.6%) | 71.8%/63.0% | 0.603 |
| | | Incomplete | 96 | 28 | 68 | | | | |
| | IM-GC | Complete | 60 | 1 | 59 | 28.6% (11.3%-52.2%) | 98.3% (91.1%-100.0%) | 85.7%/79.7% | 0.635 |
| | | Incomplete | 21 | 6 | 15 | | | | |
| | IM + GC | Complete | 67 | 10 | 57 | 29.3% (19.4%-40.1%) | 85.1% (74.3%-92.6%) | 68.8%/51.8% | 0.572 |
| | | Incomplete | 75 | 22 | 53 | | | | |

^a Confidence intervals for sensitivity and specificity are Clopper-Pearson confidence intervals. PPV: Positive Predictive Value; NPV: Negative Predictive Value; AUROC: Area Under the Receiver Operating Characteristic curve (ROCR package in R).
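The summary statistics in Table 1 follow directly from the gland counts. The sketch below recomputes them for the combined cohorts, treating complete IM as the target class for CD10 and incomplete IM as the target class for Das1. The function names are hypothetical, and the Clopper-Pearson interval is found by simple bisection on the binomial tail rather than with the R routines used for the table. For a single binary marker the AUROC reduces to the mean of sensitivity and specificity, which appears to agree with the AUROC column.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def bisect_root(f, iterations=100):
    """Root on [0, 1] of a function that decreases from positive to negative."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for the proportion x / n."""
    lower = 0.0 if x == 0 else bisect_root(
        lambda p: alpha / 2 - (1 - binom_cdf(x - 1, n, p)))
    upper = 1.0 if x == n else bisect_root(
        lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper

def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV, NPV and single-threshold AUROC from a 2x2 table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "sensitivity_ci": clopper_pearson(tp, tp + fn),
        "specificity": spec,
        "specificity_ci": clopper_pearson(tn, tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "auroc": (sens + spec) / 2,
    }

# Counts from Table 1, all cohorts.
# CD10: positive complete glands are true positives.
cd10 = diagnostic_metrics(tp=112, fn=11, fp=2, tn=90)
# Das1: positive incomplete glands are true positives.
das1 = diagnostic_metrics(tp=28, fn=68, fp=11, tn=116)

for name, metrics in (("CD10", cd10), ("Das1", das1)):
    print(name, metrics)
```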
Das1 as a biomarker for single incomplete IM glands
Given that complete IM has a lower propensity to progress to cancer than incomplete IM, it is important to identify markers that may help distinguish IM that will progress from IM that is unlikely to progress. Das1 was chosen for this study because of its association not only with incomplete IM but also with complete IM in a cancer setting (22). Initial validation of the Das1 antibody was accomplished with IHC staining of complete and incomplete IM samples (Fig. 2B).
Serial sections of the IM samples used for the CD10 experiment were stained with Das1 and scored using the same criteria. Das1 had a low sensitivity of 29.2% but a high specificity of 91.3% for detection of incomplete IM glands in both cohorts combined (Table 1). The PPV and NPV of Das1 for incomplete IM glands were 71.8% and 63.0%, respectively. After separation of the cohorts based on potential risk of progression, Das1 continued to demonstrate a low sensitivity in both cohorts (28.6% and 29.3% in IM-GC and IM + GC patients, respectively) but a high specificity (98.3% and 85.1% in IM-GC and IM + GC patients, respectively).
Logistic regression model using CD10 and Das1 staining
To determine whether combined CD10 and Das1 staining could improve identification of single complete IM glands, logistic regression modelling (glm in R) was performed comparing CD10 on its own with combined CD10 and Das1 staining. In the model with CD10 on its own, a highly significant positive association with complete IM glands was observed, as expected (Fig. 3A). In the combined model, both CD10 (positive) and Das1 (negative) had a significant association with complete IM glands. The Akaike Information Criterion decreased when Das1 staining status was added to the model, suggesting an overall improvement. This was further confirmed by an increase in AUROC in the combined model (Fig. 3B). The addition of Das1 offered a small but significant improvement for the detection of complete IM and, by inference, of IM with a lower propensity to progress to cancer.
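The model comparison logic can be illustrated with the CD10-only model. With a single binary predictor, the logistic regression maximum likelihood fit has a closed form, so no iterative solver is needed. The sketch below assumes that the Table 1 counts for all cohorts were the model input, which is not stated in the text, and it compares the CD10 model with an intercept-only model. The combined CD10 and Das1 model cannot be reproduced this way because it requires per-gland paired staining calls. All function names are hypothetical.

```python
import math

# Table 1, all cohorts: (complete, incomplete) gland counts by CD10 status.
# Assumption: these counts are the model input; the source does not say so.
CD10_POS = (112, 2)
CD10_NEG = (11, 90)

def group_loglik(successes, failures):
    """Binomial log-likelihood of a group evaluated at its observed proportion."""
    n = successes + failures
    return sum(c * math.log(c / n) for c in (successes, failures) if c)

def fit_single_binary_predictor(pos, neg):
    """Closed-form logistic fit for outcome ~ binary predictor (non-zero cells only)."""
    intercept = math.log(neg[0] / neg[1])          # log-odds of complete IM if CD10-negative
    slope = math.log(pos[0] / pos[1]) - intercept  # log odds ratio for CD10-positive glands
    loglik = group_loglik(*pos) + group_loglik(*neg)
    aic = 2 * 2 - 2 * loglik                       # two estimated parameters
    return intercept, slope, loglik, aic

def fit_intercept_only(pos, neg):
    """Null model: one common probability of complete IM for every gland."""
    loglik = group_loglik(pos[0] + neg[0], pos[1] + neg[1])
    return loglik, 2 * 1 - 2 * loglik

intercept, slope, loglik_cd10, aic_cd10 = fit_single_binary_predictor(CD10_POS, CD10_NEG)
loglik_null, aic_null = fit_intercept_only(CD10_POS, CD10_NEG)

print("CD10 log odds ratio:", round(slope, 3))
print("AIC intercept-only:", round(aic_null, 1), "AIC with CD10:", round(aic_cd10, 1))
```

The same criterion applies when a second predictor is added: a lower AIC for the larger model indicates that the gain in likelihood outweighs the penalty for the extra parameter.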
Das1 is associated with the incomplete subtype of IM
Das1 staining was often observed in the lower parts of IM glands (Fig. 4A, B). Given that the criteria for single gland analysis were restricted to the quantitation of staining in the top half of the glands, this may explain the low number of IM glands with positive staining. Thus, to determine whether Das1 staining across all parts of IM glands was associated with the incomplete IM subtype, regions rich in IM glands and adjacent ChG as control tissue were digitally quantified for positive staining in patients of both the IM-GC and the IM + GC cohorts (Supplementary Fig. 3).
Analysis of Das1 staining in the IM-GC cohort showed no staining in ChG, little staining in complete IM and significantly more staining in incomplete IM (p = 0.0003 and p = 0.016 compared to ChG and complete IM, respectively, Mann-Whitney test) (Fig. 4C). A single complete IM sample with considerable Das1 staining (4.2% positive staining of IM tissue) was found to be an outlier as determined using the ROUT method. Interestingly, this was the only complete IM sample whose subtype diagnosis differed between the original H&E section assessed by the in-house pathologist following endoscopy (incomplete IM) and a second H&E section cut from the same formalin block directly before the commencement of the current study (complete IM) (Supplementary Table 2, sample N4S2). Gastric IM often consists of interspersed glands of differing subtypes; thus, sections cut from an FFPE block at different depths may differ in IM subtype diagnosis. However, the high levels of positive Das1 staining observed only in this complete IM sample suggest that CEP is likely a marker of local instability normally associated with incomplete IM.
In IM + GC patients, complete IM showed significantly more Das1 staining than ChG (p = 0.0048, Mann-Whitney test) (Fig. 4C). Again, incomplete IM showed a higher percentage of Das1 staining than complete IM (p = 0.009, unpaired t test).
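The Mann-Whitney comparisons used for the staining data can be sketched as below. The staining percentages are hypothetical and are not measurements from this study, and the exact permutation p-value is an assumption, since the text does not state whether an exact or approximate p-value was used. The function names are likewise illustrative.

```python
from itertools import combinations

def mann_whitney_u(x, y):
    """U statistic for x: pairs with x_i > y_j count 1, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

def exact_two_sided_p(x, y):
    """Exact two-sided p-value by enumerating every relabelling of the pooled values."""
    pooled = list(x) + list(y)
    n1, n = len(x), len(pooled)
    centre = n1 * len(y) / 2  # expected U under the null hypothesis
    observed = abs(mann_whitney_u(x, y) - centre)
    hits = total = 0
    for idx in combinations(range(n), n1):
        chosen = set(idx)
        gx = [pooled[i] for i in chosen]
        gy = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        if abs(mann_whitney_u(gx, gy) - centre) >= observed - 1e-12:
            hits += 1
    return hits / total

# Hypothetical % positive Das1 staining per sample (illustration only).
complete_im = [0.0, 0.1, 0.3, 0.2]
incomplete_im = [1.5, 2.3, 0.9, 3.1, 1.2]

u_stat = mann_whitney_u(complete_im, incomplete_im)
p_value = exact_two_sided_p(complete_im, incomplete_im)
print(u_stat, round(p_value, 4))
```

Enumeration is only practical for small groups such as these; larger samples would normally use a normal approximation or a statistics library.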
Das1 staining is associated with complete IM in IM + GC samples
Given that adjacent non-malignant tissues from patients with cancer have been shown to be of a more molecularly advanced nature than the same histological tissue type in patients without cancer (34, 35), a comparison was performed of Das1 staining between IM-GC (“early IM lesions”) and IM + GC (“advanced IM lesions”) samples (Fig. 4C). This comparison would allow for the assessment of Das1 staining as a progression risk biomarker. Das1 positive staining in incomplete IM did not differ between IM-GC and IM + GC samples. However, complete IM tissues in IM + GC samples showed a significant increase in Das1 staining compared to those in IM-GC samples (p = 0.019, Mann-Whitney test). Overall, these findings suggested that Das1 staining is associated with a more “advanced type” of complete IM (IM + GC cohort) but does not change between “early” and “advanced” incomplete IM.