A Core of Common Genes Responsive to Cdx2 during Mouse Development
To determine whether common genes are regulated by the Cdx2 transcription factor across its consecutive functions in trophectoderm formation, posterior growth and intestinal fate determination in the mouse embryo, RNAseq data were compared between Cdx2-overexpressing vs wild type ES cells [12,13], E8 Cdx-null vs wild type embryos [14] and intestinal epithelial cells of E16 Cdx2-/- vs wild type embryos [11]. Using thresholds of |log2(Fold-Change)| > 2 and p-value < 0.05, a core of 221 common differentially expressed genes (CDEGs) was identified across these 3 conditions, corresponding to 162 orthologous genes in human (Fig. 1a, Additional file 1.1). Interestingly, the direction of the expression changes of the CDEGs, either up or down, was not always the same at the 3 developmental steps, suggesting a context-dependent response to Cdx2 (Fig. 1b, Additional file 1.2). Term enrichment analysis performed on these 162 human orthologues revealed a significant association with “extracellular exosome”, “extracellular matrix”, “multicellular organism development”, “sequence-specific DNA binding”, “gene regulation”, “metabolic process” and “Wnt signaling” (Fig. 1c, Additional file 1.3). Twenty-eight genes among the 162 orthologues encode nuclear proteins involved in chromatin conformation, DNA transcription and repair, of which 11 are homeobox genes (underlined): Arid3a, Bmyc, Cdx1, Cdx2, Commd3, Ets2, Gata4, Hmgn3, Hoxb1, Hoxb5, Hoxc5, Hoxc6, Hoxc8, Id2, Id3, Nkx1.2, Pbx1, Prickle1, Prr13, Pitx1, Rcor2, Smarca1, Sox2, Sp5, Tbx4, Tfeb, Tlx2, Znf503.
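The selection of the core can be illustrated with a minimal sketch: each dataset is filtered with the two thresholds given above, the three resulting gene sets are intersected, and the sign of the change is then compared across conditions. The function names, the tuple layout and the toy values (placeholder genes GeneA to GeneD) are assumptions made for illustration; they are not taken from the datasets analysed here.

```python
def significant_genes(rows, lfc_cutoff=2.0, p_cutoff=0.05):
    """Return the genes passing both thresholds.

    rows: iterable of (gene, log2_fold_change, p_value) tuples (assumed layout).
    """
    return {g for g, lfc, p in rows if abs(lfc) > lfc_cutoff and p < p_cutoff}


def common_core(*datasets):
    """Intersect the significant gene sets of all datasets."""
    return set.intersection(*(significant_genes(d) for d in datasets))


def directions(core, *datasets):
    """For each core gene, report 'up' or 'down' in each dataset, in order."""
    lookup = [{g: lfc for g, lfc, _ in d} for d in datasets]
    return {g: tuple("up" if t[g] > 0 else "down" for t in lookup)
            for g in sorted(core)}


# Toy values, purely hypothetical.
es_cells = [("GeneA", 3.1, 0.001), ("GeneB", -2.5, 0.01),
            ("GeneC", 1.2, 0.001), ("GeneD", 2.4, 0.2)]
e8_embryo = [("GeneA", -2.2, 0.02), ("GeneB", -3.0, 0.001),
             ("GeneC", 4.0, 0.001)]
e16_gut = [("GeneA", 2.8, 0.004), ("GeneB", -2.1, 0.03),
           ("GeneD", 5.0, 0.001)]

core = common_core(es_cells, e8_embryo, e16_gut)
core_directions = directions(core, es_cells, e8_embryo, e16_gut)
```

In this toy example GeneA passes the thresholds in all three datasets but changes direction in one of them, which is the kind of context-dependent behaviour described above.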
A Core of Common Chromatin Regions Bound to Cdx2 during Mouse Development
Next, we compared the location of the Cdx2 protein on chromatin by ChIPseq at the three developmental steps [11,13,14]. This analysis resulted in a core of 1047 common chromosomal regions sharing overlapping peaks in the three conditions (Fig. 1d). Among these peaks, 265 were associated with protein coding genes, 466 with gene promoters (defined as the 2-kb segment upstream of the transcription start site), 52 with non-protein coding genes and their 2-kb promoter, and 264 with intergenic regions (Additional file 1.4). Eight CDEGs (Arid3a, Epha4, Hoxc6, Man1c1, Mgat1, Mid1ip1, Sgsm1, Tfeb) exhibited conserved Cdx2 ChIPseq peak(s) in their 2-kb promoter or coding sequence. This relatively low number of CDEGs linked to adjacent Cdx2 ChIPseq peak(s) suggests that important regulatory elements targeted by Cdx2 might be located far away in intergenic regions. In line with this, 75/264 (28.41%) of the intergenic Cdx2 ChIPseq peaks fell into Super-Enhancer domains (Additional file 1.5). Alternatively, Cdx2 may have both inductive and permissive transcriptional effects, thus uncoupling direct transcription activation from DNA binding, as reported in the gut epithelium [27–29]. Finally, it has also been shown that Cdx2 controls the expression of several downstream targets by interacting with other DNA-binding proteins, without the presence of a typical Cdx-type binding site [30,31].
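The notion of a region "sharing overlapping peaks in the three conditions" can be sketched as follows. This is only an illustration under stated assumptions: peaks are represented as (chromosome, start, end) half-open intervals, a region of the first set is kept when it overlaps at least one peak in each of the other sets, and the coordinates are invented. The procedure actually used to call the common regions is not described in this section, and the quadratic scan below is chosen for readability rather than speed.

```python
def overlaps(a, b):
    """True when two (chrom, start, end) half-open intervals share >= 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def common_regions(first, *others):
    """Peaks of `first` that overlap at least one peak in every other set."""
    return [p for p in first
            if all(any(overlaps(p, q) for q in peak_set) for peak_set in others)]


# Hypothetical peak coordinates for the three conditions.
peaks_es = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
peaks_e8 = [("chr1", 150, 250), ("chr2", 400, 500)]
peaks_e16 = [("chr1", 180, 300), ("chr1", 550, 650)]

shared = common_regions(peaks_es, peaks_e8, peaks_e16)
```

Only the first chr1 peak is retained here, because it is the only one overlapped in both of the other conditions.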
Sequence analysis showed that 835 of the 1047 ChIPseq regions (77.75%) harbored at least one motif analogous to the functionally described consensus Cdx-binding site (Fig. 1e) [32], giving a total of 1801 Cdx-type sites (p-value = 10⁻¹⁵²) (Additional file 1.6). Similarly, 214 promoters of the 221 CDEGs (96.83%) identified above exhibited at least one consensus Cdx-type binding site, for a total of 1314 sites (Additional file 1.6). Interestingly, the +/-50-bp segments around the Cdx-type sites present in the ChIPseq regions and in the promoters of the CDEGs were enriched in DNA-binding motifs for, respectively, 149 transcription factors grouped into 25 families and 74 transcription factors grouped into 11 families (p-value < 0.05) (Fig. 1f, Additional files 1.7 and 1.8). The proximity of enriched binding sites for these transcription factors and for Cdx2 suggests possible direct or indirect interactions.
Fate of the Core of Common CDEGs in Human Pathologies
Having established the core of 221 CDEGs during mouse embryonic development, we examined human diseases exhibiting alterations in CDX2 transcript levels for the proportion of the 162 human orthologues deregulated in these pathologies (Fig. 1g, Additional file 2.2). Firstly, given that CDX2 expression is physiologically limited to the intestine at the adult stage and that it is decreased in a subset of colon adenocarcinomas, we compared the transcriptomes of the deciles of tumors exhibiting the lowest vs highest CDX2 levels (n=44 each) within the TCGA collection of 436 colon adenocarcinomas (COAD). This revealed 46 deregulated genes, a significant number (p = 0.044) (Fig. 1h, Additional file 2.3). Conversely, we considered pathological situations exhibiting an abnormal ectopic expression of CDX2 outside the gut, namely in the upper digestive tract, i.e. the esophagus [15] and stomach (TCGA), and in acute myeloid leukemia (TCGA). Retrieving the list of differentially expressed genes between healthy CDX2-free esophageal mucosa and CDX2-expressing non-dysplastic Barrett metaplasia (ESOBM-nd, n=14), low-grade dysplastic Barrett metaplasia (ESOBM-lgd, n=8) and adenocarcinoma (ESOAD, n=12) revealed respectively 123 (p = 0.16 × 10⁻⁷³), 118 (p = 0.21 × 10⁻⁶⁴) and 116 (p = 0.96 × 10⁻⁴⁴) orthologues of the CDEGs core (Fig. 1g,h, Additional files 2.4-6). In the stomach, the list of differentially expressed genes in the quartiles of adenocarcinomas presenting the highest vs lowest levels of CDX2 (n=35 each) within the series of 272 STOAD samples comprised 44 CDEGs of the core (p = 0.0028) (Fig. 1g,h, Additional file 2.7). Similarly, 35 CDEGs of the core (p = 0.14 × 10⁻⁴) were recovered among the genes differentially expressed between the quartiles with the highest vs lowest levels of CDX2 (n=38 each) in the series of 151 AML samples (Fig. 1g,h, Additional file 2.8).
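To make concrete what a "significant number" of recovered core genes means, the sketch below computes a one-sided hypergeometric probability of observing at least k core genes among the differentially expressed genes of a disease dataset. This section does not state which test produced the p-values reported above, so the choice of the hypergeometric test is an assumption, as are the function name and all numbers in the example apart from the 162 orthologues and the 46 recovered genes.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for a hypergeometric variable X.

    population: number of genes in the background (assumed value in the example)
    successes:  number of core genes within that background
    draws:      number of differentially expressed genes in the disease dataset
    k:          number of core genes found among those differentially expressed genes
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    return sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1)) / total


# Hypothetical background of 20000 genes and 4000 differentially expressed genes.
p_example = hypergeom_upper_tail(k=46, population=20000, successes=162, draws=4000)
```

The exact value depends strongly on the background size and on the number of differentially expressed genes in each comparison, which is why the same count of recovered genes can yield very different p-values from one pathology to another.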
Altogether, these data indicate that a significant proportion of the genes of the core defined on the basis of mouse developmental models are differentially expressed in human diseases along with CDX2 changes. This proportion was higher when lesions were compared with normal tissues (here, the various types of esophageal lesions) than when tumor samples were compared with one another within each pathology (here, colon cancers, stomach cancers and leukemia), because of the variable nature of the specimens in the latter cases.