Identification of C/EBP genes in the vertebrates
A total of 92 C/EBP genes were identified in 17 vertebrate genomes. The number of C/EBP genes differed slightly among species (Table 1). The C/EBP family comprised six members: C/EBPα, C/EBPβ, C/EBPδ, C/EBPε, C/EBPγ, and C/EBPζ. Notably, in all species except Xenopus tropicalis, the C/EBPα and C/EBPγ genes are located on the same chromosome within 100 kb of each other.
Various physicochemical properties of each C/EBP TF were calculated and are shown in Table 2 and Table S1. The C/EBPζ protein contains the most amino acids and has the highest molecular weight, whereas the C/EBPγ protein contains the fewest amino acids and has the lowest molecular weight. The aliphatic index, a measure of thermostability, ranged from 60.36 to 76.17 and differed significantly among C/EBP TFs (p<2e-16). The GRAVY values of the C/EBP proteins are all negative, ranging from -1.4060 to -0.4360, indicating that the proteins are hydrophilic. The pI values of the C/EBPβ, C/EBPε, and C/EBPγ proteins are higher than 7 in every species, and the pI values of C/EBPζ are lower than 7. The C/EBPα protein of zebrafish is acidic, whereas the C/EBPα proteins of the other species are alkaline. The C/EBPδ protein is acidic in one third of the species and alkaline in the remaining two thirds.
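As an illustration of how two of these properties are derived from a protein sequence, the following minimal sketch computes GRAVY (the mean Kyte-Doolittle hydropathy per residue) and the aliphatic index (Ikai's formula based on the mole percentages of Ala, Val, Ile, and Leu). The tool actually used for Table 2 is not stated here, and the peptide below is a hypothetical toy sequence, not a C/EBP protein.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle score per residue."""
    seq = seq.upper()
    return sum(KD[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X is the mole percentage of each residue."""
    seq = seq.upper()
    n = len(seq)

    def pct(aa):
        return 100.0 * seq.count(aa) / n

    return pct("A") + 2.9 * pct("V") + 3.9 * (pct("I") + pct("L"))


# Hypothetical toy peptide, used only to show the calls.
toy = "MKRELEQKAL"
print(round(gravy(toy), 3), round(aliphatic_index(toy), 1))
```

A negative GRAVY value, as reported for all C/EBP proteins, means that hydrophilic residues dominate the sequence on average.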
Phylogenetic relationship analysis
To analyze the phylogenetic relationships among the 92 C/EBP genes in 17 species, an unrooted maximum likelihood phylogenetic tree was constructed (Fig 1). All C/EBP TFs were classified into two groups. Group I contains 16 vertebrate C/EBPβ TFs and was named C/EBPβ. Group II contains the remaining C/EBP TFs, which can be divided into five clades. Following the nomenclature, we named the clades of group II C/EBPα, C/EBPδ, C/EBPε, C/EBPγ, and C/EBPζ. C/EBP genes were detected in most vertebrates, indicating that the C/EBP family members originated early in vertebrate evolution. In each clade, C/EBP genes from the same order tend to cluster together, indicating that they are more similar to each other than to C/EBP genes from other orders. Specifically, the orthologs of primates (human, macaque, and chimpanzee), artiodactyls (pig, cattle, and goat), carnivores (cat and dog), and rodents (rat and mouse) each cluster together.
To investigate the structural features of the C/EBP members, the gene structures and conserved motifs were examined in the context of the phylogenetic tree (Fig 2). The C/EBPζ genes contained 15 or 16 exons, more than any other C/EBP gene. Most of the C/EBPα, C/EBPβ, C/EBPδ, C/EBPε, and C/EBPγ genes contained one or two exons. A total of 10 conserved motifs were identified in the C/EBP proteins. All C/EBP TFs contain motif 2, motif 3, and motif 6. Motifs 8 and 10 are unique to the C/EBPζ proteins and may be associated with the clade-specific functions of these proteins.
Expression analysis of the C/EBP genes
The expression of the C/EBP genes was compared across 27 adult Duroc pig tissues. The six pig C/EBP genes were classified into two groups by cluster analysis of their expression levels in the various tissues (Fig 3). Group I contains only the C/EBPε gene, and the other genes are included in group II. The expression level of the C/EBPε gene was low in all tissues (FPKM<5), and expression was detected only in the intestine, salivary gland, thyroid, uterus, and lymph. Although C/EBPα, C/EBPβ, C/EBPδ, C/EBPγ, and C/EBPζ are expressed ubiquitously, their expression patterns differ among tissues; the tissue-specificity index (τ) values are 0.862, 0.700, 0.654, 0.499, and 0.433, respectively. C/EBPα was expressed at high levels in the thyroid, liver, lung, and adipose tissues. C/EBPβ was expressed at high levels in the thyroid, adrenal gland, lung, adipose, liver, and ovary tissues. C/EBPδ was expressed at high levels in the thyroid, gall bladder, ovary, and uterus. C/EBPγ and C/EBPζ are widely expressed in other tissues at similar levels.
Additionally, the results indicate that expression patterns are similar among functionally related tissues, such as the brain and spinal cord in the nervous system and the ovary and uterus in the female reproductive system.
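The tissue-specificity index can be illustrated with a short sketch. The exact formula and any expression transformation used for the values above are not given in this section, so the code assumes the widely used definition τ = Σ(1 - x_i/x_max)/(n - 1), where x_i is the expression in tissue i and n is the number of tissues. The FPKM values are hypothetical.

```python
def tau(expr):
    """Tissue-specificity index for one gene.

    0 means identical expression in every tissue; 1 means expression
    in a single tissue only. Assumes non-negative expression values.
    """
    n = len(expr)
    peak = max(expr)
    if n < 2 or peak <= 0:
        raise ValueError("need at least two tissues and nonzero expression")
    return sum(1 - x / peak for x in expr) / (n - 1)


# Hypothetical FPKM profiles across five tissues (not data from this study).
broad_gene = [12.0, 10.5, 11.2, 9.8, 12.4]
restricted_gene = [0.1, 0.0, 45.0, 0.3, 0.2]
print(round(tau(broad_gene), 3), round(tau(restricted_gene), 3))
```

Under this definition, the reported value of 0.862 for C/EBPα places it nearer the tissue-restricted end of the scale than C/EBPζ at 0.433.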
Construction of a transcriptional regulatory network of the C/EBP gene family
The C/EBP family is an important family of transcription factors that regulate the expression of target genes by binding to their promoter regions, thereby maintaining normal physiological processes in vivo. Using the position weight matrices (PWMs) of C/EBPα, C/EBPβ, C/EBPδ, C/EBPε, and C/EBPγ from the CIS-BP database, we predicted 4,662, 3,164, 8,383, 7,278, and 1,604 target genes for these TFs, respectively (Table 3). In total, 10,270 target genes are predicted to be regulated by the C/EBP genes through 25,091 regulatory relationships. Binding sites for other transcription factors and miRNAs are also present in the regulatory regions of the C/EBP genes.
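PWM-based target prediction can be sketched as a sliding-window scan of a promoter sequence. The scanning tool, background model, and score threshold used in this study are not stated in this section, so the example below is only a generic log-odds scan under assumed settings. The matrix is a hypothetical four-position motif, not a CIS-BP matrix, and the promoter string is invented.

```python
import math

# Hypothetical position probability matrix (one dict per motif position).
PPM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform base frequencies


def scan(seq, ppm, threshold):
    """Return (start, window, log-odds score) for windows scoring >= threshold.

    Expects an uppercase sequence containing only A, C, G, and T.
    """
    width = len(ppm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(
            math.log2(ppm[pos][base] / BACKGROUND)
            for pos, base in enumerate(window)
        )
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Invented promoter fragment; a gene would be called a predicted target
# if its promoter contains at least one hit.
for start, window, score in scan("CCATGCCC", PPM, threshold=4.0):
    print(start, window, round(score, 2))
```

The threshold controls the trade-off between sensitivity and false positives, which is one reason predicted target counts can differ widely between TFs.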
In this study, the C/EBP genes are predicted to be regulated by 423 TFs, forming 1,582 regulatory relationships; miR503 and miR7140 are predicted to regulate C/EBPβ and C/EBPγ, respectively. Additionally, C/EBPβ is predicted to be regulated by C/EBPα, C/EBPδ, and C/EBPε, and C/EBPγ is predicted to be regulated by C/EBPβ and C/EBPε. Interestingly, C/EBPβ self-regulation is also predicted. Thus, we constructed a pig C/EBP regulatory network (summarized in Fig 4) that includes the C/EBP genes, TFs, miRNAs, and target genes. Each gene was defined as a node, and the node degree distribution approximately follows a power law, indicating that the gene regulatory network is scale-free. The clustering coefficient, network centralization, and network heterogeneity were calculated to be 0.1890, 0.8160, and 25.1730, respectively.
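The network statistics above can be illustrated on a toy edge list. The software and exact definitions used in this study are not given in this section; the sketch below assumes an undirected view of the network, the average local clustering coefficient (nodes with fewer than two neighbors count as 0), and centralization defined as n/(n-2) × (max degree/(n-1) - density). The node names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def network_stats(edges):
    """Return (degree distribution, mean clustering, centralization).

    Edges are treated as undirected; self-loops are ignored.
    Requires more than two nodes.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    n = len(adj)
    degree = {node: len(nbrs) for node, nbrs in adj.items()}

    local = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            local.append(0.0)
            continue
        # Count edges among the neighbors of this node.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        local.append(2 * links / (k * (k - 1)))
    clustering = sum(local) / n

    density = sum(degree.values()) / (n * (n - 1))
    centralization = n / (n - 2) * (max(degree.values()) / (n - 1) - density)
    return Counter(degree.values()), clustering, centralization


# Hypothetical hub-and-spoke network: one regulator with four targets.
star = [("TF", "g1"), ("TF", "g2"), ("TF", "g3"), ("TF", "g4")]
print(network_stats(star))
```

A network dominated by a few high-degree regulators, such as this star, has a centralization close to 1, which is consistent in direction with the value of 0.8160 reported for a network organized around a handful of C/EBP hubs.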
FFLs related to the C/EBP genes
A total of 27 miRNA-FFL motifs were identified in the pig C/EBP regulatory network. Based on sequence analysis, we predicted that miR503 and miR7140 regulate C/EBPβ and C/EBPγ, respectively. miR503 and C/EBPβ coregulate 14 target genes, forming 14 miR503-C/EBPβ-target gene FFL motifs, and miR7140 and C/EBPγ coregulate 11 target genes, forming 11 miR7140-C/EBPγ-target gene FFL motifs. Additionally, the miR503-ELF3-C/EBPβ and miR7140-ARID5B-C/EBPγ motifs were identified. In the miR503-ELF3-C/EBPβ motif, miR503 and ELF3 coregulate the C/EBPβ gene, and miR503 targets both ELF3 and C/EBPβ. The miR7140-ARID5B-C/EBPγ motif comprises three regulatory relationships: miR7140→ARID5B, ARID5B→C/EBPγ, and miR7140→C/EBPγ. C/EBPβ-binding sites are present in the 5'-untranslated region (5'-UTR) of C/EBPγ, and C/EBPβ and C/EBPγ coregulate 10 target genes, forming 10 TF-FFL motifs. All FFL motifs were combined to construct the FFL sub-network (Fig 5).
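A feed-forward loop consists of three regulatory relationships, X→Y, Y→Z, and X→Z, so FFL motifs can be enumerated directly from a directed edge list. The sketch below is one straightforward way to do this and is not necessarily the procedure used in this study. The three miR7140/ARID5B/C/EBPγ edges are taken from the text above; GENE_X is a hypothetical target added for illustration, and CEBPG stands for C/EBPγ.

```python
def find_ffls(edges):
    """Return sorted (x, y, z) triples where x->y, y->z, and x->z all exist."""
    targets = {}
    for src, dst in edges:
        targets.setdefault(src, set()).add(dst)

    ffls = []
    for x, x_out in targets.items():
        for y in x_out:
            if y == x:  # skip self-regulation
                continue
            for z in targets.get(y, ()):
                if z != x and z != y and z in x_out:
                    ffls.append((x, y, z))
    return sorted(ffls)


edges = [
    ("miR7140", "ARID5B"),  # from the text
    ("ARID5B", "CEBPG"),    # from the text
    ("miR7140", "CEBPG"),   # from the text
    ("CEBPG", "GENE_X"),    # hypothetical target gene
    ("miR7140", "GENE_X"),  # hypothetical miRNA-target edge
]
for motif in find_ffls(edges):
    print("-".join(motif))
```

In this toy network, the miRNA-TF-TF motif and the miRNA-TF-target motif are both recovered, whereas ARID5B, C/EBPγ, and GENE_X do not form an FFL because no ARID5B→GENE_X edge exists.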
Based on the RNA-seq data, the expression patterns of the genes in this FFL sub-network were analyzed in 27 adult Duroc tissues. The results indicate that the target genes regulated by C/EBPβ and C/EBPγ have variable expression patterns across tissues. The ATP synthase F1 subunit alpha (ATP5F1A) gene is expressed ubiquitously, and the glutamate decarboxylase-like protein 1 (GADL1) and Slit-Robo GTPase activating protein 3 (SRGAP3) genes are expressed at high levels in the muscle and brain, respectively. We suggest that the C/EBPβ-C/EBPγ-GADL1 FFL motif may play an important role in the brain. Some FFL motifs may be tissue-specific. Based on the expression patterns of the target genes, we estimated that the number of FFL motifs may differ significantly among tissues (Table 4); however, miRNA expression patterns were not evaluated in the present study.
The dN and dS analysis of the C/EBP genes and target genes
The nonsynonymous (dN) and synonymous (dS) substitution rates between the human and pig sequences were downloaded from the Ensembl database. The dN/dS values of the C/EBP genes ranged from 0.02 to 0.21, indicating that the pig C/EBP genes have undergone purifying selection. The dN+dS value of the C/EBPδ gene was 1.52, which was higher than those of the other five C/EBP genes (0.31 to 0.67), indicating that the C/EBPδ gene has evolved rapidly and has an increased mutation rate.
The mean dN+dS values of the target genes of C/EBPα, C/EBPβ, C/EBPδ, C/EBPε, and C/EBPγ are 0.58, 0.48, 0.56, 0.53, and 0.51, respectively. The mean dN/dS values of the target genes are 0.166, 0.175, 0.17, 0.177, and 0.166, respectively. The dN/dS and dN+dS distributions of the target genes of each C/EBP gene were compared using the Kolmogorov-Smirnov (KS) test. The results indicate that the dN/dS distribution of the C/EBPα target genes is similar to those of the C/EBPβ, C/EBPε, and C/EBPγ target genes (p<0.05). The very low dN/dS values suggest strong negative selection on all C/EBP genes, which may remain due to genetic drift or persistence. The dN+dS distribution of the C/EBPα target genes is similar to that of the C/EBPδ target genes and is significantly higher than those of the other C/EBP target genes (p<0.05). The results indicate that the target genes of C/EBPα appear to be evolving rapidly.
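The two-sample KS test compares two samples through the largest vertical distance D between their empirical cumulative distribution functions. The sketch below computes only this statistic in plain Python; the software used in this study is not stated here, and the dN/dS values are hypothetical. A p-value can be obtained, for example, with `scipy.stats.ks_2samp`.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over all x."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The maximum distance is always attained at an observed value.
    for x in a + b:
        ecdf_a = bisect_right(a, x) / len(a)
        ecdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(ecdf_a - ecdf_b))
    return d


# Hypothetical dN/dS values for the target genes of two TFs.
targets_tf1 = [0.05, 0.11, 0.16, 0.20, 0.31]
targets_tf2 = [0.07, 0.12, 0.18, 0.22, 0.29]
print(ks_statistic(targets_tf1, targets_tf2))
```

A small D (with a correspondingly large p-value) indicates that the two distributions cannot be distinguished, whereas a large D indicates that they differ.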
Functional enrichment analysis of the C/EBP genes and target genes
We used the DAVID software to analyze the functions of the pig C/EBP genes. The results indicate that these genes are associated with many biological processes and pathways, including macrophage differentiation (GO:0030225), inner ear development (GO:0048839), positive regulation of osteoblast differentiation (GO:0045669), the transcriptional misregulation in cancer pathway (ssc05202), and the tuberculosis pathway (ssc05152) (Table 5).
The functional enrichment analysis of the target genes showed that the target genes of C/EBPα, C/EBPβ, C/EBPε, and C/EBPγ are associated with the nucleoplasm (GO:0005654) and extracellular exosome (GO:0070062). The target genes of C/EBPδ and C/EBPγ are involved in the transforming growth factor beta (TGFβ) receptor signaling pathway. The target genes of C/EBPδ are also involved in the platelet-derived growth factor receptor signaling pathway (GO:0048008) (Table 6).
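The principle behind term enrichment can be sketched with a hypergeometric upper-tail probability: given N background genes of which K carry an annotation, it gives the chance of seeing at least k annotated genes in a list of n. This is a simplified illustration and not DAVID's own scoring, which uses a modified Fisher exact test; all counts below are hypothetical.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K are annotated."""
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 40 of 500 target genes carry a term that
# annotates 600 of 20,000 background genes.
print(enrichment_p(k=40, n=500, K=600, N=20000))
```

In practice, the resulting p-values are corrected for the number of terms tested before a term is reported as enriched.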