3.1 Identification and chromosome map of KNOX proteins and genes from Brassica
A total of 15 KNOX proteins from B. rapa, 14 from B. oleracea, and 32 from B. napus were identified on the official website (BRAD (brassicadb.cn)), while an additional 89 members were screened from the genomes of several green lineage species (Phytozome (doe.gov), version 13.0) (Supplement File 1). KNOX proteins are found exclusively in higher plants, except for one member with the KNOX2 domain identified in Ostreococcus lucimarinus. They cannot be detected in lower plants such as Chlamydomonas reinhardtii, Volvox carteri and Phaeodactylum tricornutum, although these species have HOX homologous genes (Supplement File 1).
The KNOX genes of B. rapa were mapped onto six chromosomes. Chromosome A09 carried the most KNOX genes (five), followed by chromosomes A02/03 with three genes. Chromosomes A04, A07 and A10 did not contain any KNOX genes. Nine KNOX genes of B. oleracea were also mapped onto six chromosomes (C01-03, C05, C07 and C08), while BoSTM, BoKNAT4a, BoKNAT6a/6b and BoKNAT7 could not be mapped. In total, 29 KNOX genes of B. napus were mapped onto 14 chromosomes (six A chromosomes and eight C chromosomes). However, three C-genome-specific genes (BnKNAT1b-C, BnKNAT4b-C and BnKNAT7a-C) could not be mapped (Fig. 1).
3.2 Phylogenetic and classification analyses of KNOX protein family
To analyze the evolutionary patterns of KNOX proteins, we constructed a comprehensive phylogenetic tree of 150 KNOX proteins: 32 from B. napus, 16 from Populus trichocarpa/Zea mays, 15 from B. rapa, 14 from B. oleracea, 13 from Oryza sativa, 11 from Brachypodium distachyon, 9 from Arabidopsis thaliana, 8 from Fragaria vesca, 7 from Medicago truncatula, 4 from Physcomitrella patens/Selaginella moellendorffii, and 1 from O. lucimarinus (Fig. 2; Supplement File 1). PpKNAT2a/2b and OlKNAT7 were then excluded because they could not be assigned to Class I or Class II despite sharing the same domain organization as the other proteins (similar results were obtained with trees constructed by the Neighbor-Joining and Maximum Likelihood methods), and the remaining 147 KNOX proteins were categorized into three distinct classes. Notably, among the Brassica species, B. napus not only encompasses all KNOX proteins found in B. rapa and B. oleracea but also has three additional unique members (Fig. 2; Supplement File 1).
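The protein tallies in this paragraph can be checked with a few lines of arithmetic. The sketch below only re-adds the per-species counts quoted in the text; the variable names are illustrative and are not taken from the supplementary files.

```python
# Per-species KNOX protein counts as quoted in the paragraph above.
knox_counts = {
    "B. napus": 32,
    "P. trichocarpa": 16,
    "Z. mays": 16,
    "B. rapa": 15,
    "B. oleracea": 14,
    "O. sativa": 13,
    "B. distachyon": 11,
    "A. thaliana": 9,
    "F. vesca": 8,
    "M. truncatula": 7,
    "P. patens": 4,
    "S. moellendorffii": 4,
    "O. lucimarinus": 1,
}

# Proteins that could not be assigned to Class I or Class II.
unclassified = ["PpKNAT2a", "PpKNAT2b", "OlKNAT7"]

total = sum(knox_counts.values())
classified = total - len(unclassified)

# Members of B. napus beyond the combined repertoire of its two progenitors.
extra_in_napus = knox_counts["B. napus"] - (
    knox_counts["B. rapa"] + knox_counts["B. oleracea"]
)

print(total, classified, extra_in_napus)
```

The three values correspond to the 150 proteins in the tree, the 147 classified proteins, and the three additional B. napus members mentioned in the text.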
3.3 Phylogenetic and domain analyses of Class I
Arabidopsis Class I KNOX genes play a crucial role in shoot apical meristem (SAM) activity, carpel development, sporophyte development and abscission zone development (Zhao et al. 2020). Class I specifically encompasses KNOX proteins found in vascular plants, including 2 members in S. moellendorffii, 3 in M. truncatula, 4 in A. thaliana, 5 in B. rapa/B. oleracea/F. vesca, 7 in B. distachyon, 9 in P. trichocarpa/O. sativa, 10 in B. napus and 11 in Z. mays (Fig. 3; Supplement File 1). Monocots have more AtKNAT1 homologous proteins than dicots, while maintaining highly conserved domain organizations characterized by four major domains: KNOX1 (PF03790), KNOX2 (PF03791), ELK (PF03789), and Homeobox (PF00046). The phylogenetic relationships within the class are closely linked to plant species evolution (Fig. 3). Within Class I, KNOX genes can be categorized into three branches, labeled Groups I, II, and III. Group I represents the STM group, with a conserved domain organization. Group II, the KNAT1 group, is the largest. Most species, including maize and rice, possess more than two copies of KNAT1. Notably, the Up-frameshift suppressor 2 (Upf2) domain is identified within AtKNAT1 and FvKNAT1 (Wang et al., 2006). The Upf2 domain is conserved in eukaryotes and is crucial for mRNA decay (Yi et al. 2021). Epstein-Barr virus nuclear antigen 3 (EBNA-3, PF05009), an EBNA family member that responds to stimulated Epstein-Barr virus-specific T cells during adoptive immunotherapy, is found in BoKNAT1 (Wang et al., 2014). However, OsKNAT1f/1g and BdKNAT1d lack the ELK and Homeobox domains, respectively. Group III includes AtKNAT2/AtKNAT6 and their homologs. In addition to the major domains, FvKNAT6a contains an Integrator complex subunit 2 (INTS2, PF14750) domain, which is involved in snRNA transcription and processing.
PtKNAT6c contains an S-methyl_trans domain (Homocysteine S-methyltransferase, PF02574), and PtKNAT6f contains a TMEM156 domain of unknown function. The fern proteins have similar domain organizations (Fig. 3).
In Class I, both B. rapa and B. oleracea have a total of five members, including two homologs of AtKNAT6 and one homolog each of AtSTM, AtKNAT1, and AtKNAT2 (Fig. 3; Supplement File 2). The domain organizations remain highly conserved, except for the duplicated BrKNAT1 members BrKNAT6a/BoKNAT6a, which lack the ELK and Homeobox domains (Magnani and Hake, 2008) and are not detected among the Group III proteins that also play roles in KNOX transcriptional regulation and leaf proximal-distal patterning.
B. napus contains 11 proteins homologous to those of B. rapa and B. oleracea (with a BoKNAT1 duplication). The domain organizations of B. napus KNOX proteins exhibit a high degree of conservation with their respective donors. Notably, BnKNAT1-A has gained an EBV-NA3 domain compared to BrKNAT1, while BnKNAT6b-A lacks the ELK and Homeobox domains found in BrKNAT6b. Some sequence lengths also vary (e.g., BnSTM-C compared to BoSTM, and BnKNAT6a-A relative to BrKNAT6; Fig. 3). All Class I KNOX genes of the three Brassica species are syntenic to the corresponding genes in Arabidopsis, except BnKNAT1b-C and BnKNAT6a-C (Supplement File 2). The syntenic genes suggest the existence of orthologs between B. napus and its parental species, B. rapa and B. oleracea. In addition to inheriting most KNOX genes from its parents, B. napus carries new KNOX genes in its genome (BnKNAT1b-C, BnKNAT4b-C, BnKNAT6a-C, BnKNAT7a-A, BnKNAT7a-C, BnKNAT7c-C), while the BoKNAT6a ortholog has been lost.
3.4 Phylogenetic and domain analyses of Class II
The functions of Class II KNOX genes remain unclear, but they are potentially involved in the regulation of tissue differentiation, seed germination, root development and secondary wall formation (Li et al., 2011; Furumizu et al., 2015). Class II may contain older KNOX proteins and can be detected in all higher plants. A total of 2 members are present in P. patens/S. moellendorffii/F. vesca, 3 in M. truncatula, 4 in A. thaliana/B. distachyon/O. sativa, 6 in Z. mays/P. trichocarpa, 7 in B. oleracea, 8 in B. rapa and 17 in B. napus (three copies of BoKNAT7) (Fig. 4; Supplement File 1). As in Class I, the major domains found within Class II are the KNOX1, KNOX2, ELK and Homeobox domains. The phylogenetic tree reveals three distinct groups: Group I consists of KNAT3/KNAT4, Group II contains KNAT5 and Group III includes KNAT7. The KNOX proteins of fern and moss are at the root of Group I. Group I contains multiple members, with two or three homologous proteins from each organism, except for eight members from B. napus, four from P. trichocarpa and one from strawberry. The phylogenetic relationships among Group I proteins are related to plant species evolution (Fig. 4). The domain organizations within Group I are conserved except for BoKNAT4a and BnKNAT4a-C/4b-C. Group II comprises exclusively Cruciferae proteins. AtKNAT5 possesses an additional enterotoxin motif in the heat-labile enterotoxin alpha chain. BnKNAT5a-A/5a-C have an additional Virul-Fac motif. Group III encompasses AtKNAT7 and its homologs, with conserved domain organizations except for ZmKNAT7a, which lacks the ELK domain, and BnKNAT7a-C/7b-C, which possess only the KNOX1 domain.
All Class II KNOX genes are duplicated in Brassica, except for KNAT7 in B. oleracea, which is present as a single copy. The duplicated Brassica genes encode proteins with conserved domain organizations. Based on their conserved relationships with Arabidopsis homologs, BrKNAT3a/3b and BoKNAT3a/3b may be involved in seed germination and early seedling development, while BrKNAT7a/7b and BoKNAT7 may play roles in secondary wall formation.
B. napus has two more Class II members than B. rapa and B. oleracea combined, and the domain organizations are well conserved with those of their donors, except for BnKNAT4a-C lacking the ELK domain, BnKNAT7b-C/7c-C containing only the KNOX1 domain, and BnKNAT5a-A/5a-C carrying an additional Virul-Fac domain (PF10139, Fig. 4). Similar to Class I genes, all Class II genes of the three Brassica species show synteny with Arabidopsis homologs, except for BnKNAT4b-C/7a-A/7a-C/7c-C (Supplement File 2).
3.5 Phylogenetic and domain analyses of Class III
KNATM, a novel KNOX subfamily, is maintained by a homeodomain-independent mechanism (Magnani and Hake, 2008; Gao et al. 2015). A bioinformatic analysis shows that KNATM is found only in dicots and that it lacks the ELK and Homeobox domains (Fig. 5; Supplement File 1). Class III contains only one member each in F. vesca, M. truncatula, A. thaliana and P. trichocarpa. B. rapa and B. oleracea each contain two KNATMs, whereas B. napus has four.
The phylogenetic relationships of Class III proteins are related to plant evolution, but their domain organizations are not well conserved (Fig. 5). PtKNATM, FvKNATM and MtKNATM retain the KNOX1 and KNOX2 domain organization. AtKNATM possesses only the KNOX1 domain, whereas BoKNATM2 and BnKNATM2-C contain the KNOX2 domain. BrKNATM2 has KNOX2 and a Fer4_NifH domain (PF00142), which is found in various proteins that share a common ATP-binding domain. Conversely, BnKNATM2-A displays typical KNOX1, KNOX2 and P-loop NTPase domains. In addition to the KNOX1 and KNOX2 domains, BrKNATM1 and BoKNATM1 have a Chlamydia polymorphic membrane protein middle (ChlamPMP_M) domain (PF07548). However, their homologs BnKNATM1-A/1-C in B. napus lack these domains (Fig. 5). Similar to Class II genes, all genes in Class III from B. rapa and B. oleracea show synteny with Arabidopsis homologs (Supplement File 2).
3.6 Analysis of cis-acting elements of BnKNOX gene promoters and gene structure in B. napus
The cis-acting elements of promoters specifically bind transcription factors to form transcription initiation complexes, which initiate gene expression. We therefore analyzed the BnKNOX promoters and identified 717 cis-elements belonging to 26 different types (Fig. 6). These cis-elements could be divided into three groups: plant growth and development, phytohormone responses and abiotic stress responses. For instance, we detected 12 light-responsive cis-elements involved in growth and development, with a cumulative occurrence of 388: ACE, AE-box, ATCT-motif, Box 4, GA-motif, GATA-motif, G-Box, GT1-motif, I-box, L-box, MRE and TCT-motif (Table S4). Among the cis-acting elements involved in hormone responses, ABRE, GAREs (GARE-motif and P-box), O2-site, TCA-element, TGA-element and the MeJA-responsive elements (CGTCA-motif and TGACG-motif) were identified in the promoter regions, with 59, 18, 14, 28, 18 and 98 occurrences, respectively. Additionally, drought- and low-temperature-stress-related cis-acting elements were detected in the promoter regions of BnKNOX genes. These findings indicate that BnKNOXs may be pivotal in modulating plant growth and development and may help elucidate the precise functions of the proteins encoded by the BnKNOX gene family.
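The occurrence counts reported above can be summarized with a short tally. The sketch below uses only the numbers given in the text; how the remaining occurrences split between stress-related and other element types is not stated here, so the code reports them as a single remainder.

```python
# Light-responsive element types listed in the text (12 types, 388 occurrences in total).
light_elements = [
    "ACE", "AE-box", "ATCT-motif", "Box 4", "GA-motif", "GATA-motif",
    "G-Box", "GT1-motif", "I-box", "L-box", "MRE", "TCT-motif",
]
light_total = 388

# Hormone-related elements and their reported occurrence counts.
hormone_counts = {
    "ABRE": 59,
    "GAREs (GARE-motif, P-box)": 18,
    "O2-site": 14,
    "TCA-element": 28,
    "TGA-element": 18,
    "MeJA-responsive (CGTCA-motif, TGACG-motif)": 98,
}

total_elements = 717  # all cis-elements identified across the BnKNOX promoters

hormone_total = sum(hormone_counts.values())
# Occurrences not covered by the two itemized groups (stress-related and any others).
remaining = total_elements - light_total - hormone_total

print(f"light-responsive: {light_total} ({light_total / total_elements:.1%})")
print(f"hormone-related:  {hormone_total} ({hormone_total / total_elements:.1%})")
print(f"remaining:        {remaining} ({remaining / total_elements:.1%})")
```

Under these figures, the light-responsive elements alone account for more than half of all occurrences.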
To explore the structural diversity of BnKNOXs, a comparative analysis of their gene structures was conducted. Visual analysis revealed that while most family members have similar numbers of exons and introns, their lengths vary. Most members possess 3–7 exons, with the exception of BnKNAT7a-C, which contains 2 exons, and BnKNAT7c-C, which contains only 1 exon. The distribution pattern of exons and introns appears intricate, suggesting a potential correlation with phylogenetic subgrouping.
3.7 Gene collinearity and duplication of BnKNOXs in B. napus
Gene collinearity analysis facilitates the discovery of homologous sequences within a species, which can be used as evidence of whole-genome duplication events. We detected 26 duplication events, 15 of which occurred between subgenomes A and C (Fig. 7). Notably, the BnKNOXs on chromosome A05 were not collinear with BnKNOX genes on other chromosomes (Fig. 7, Supplement File 4). Gene families can expand through tandem duplication, segmental duplication or whole-genome duplication (Soylev et al. 2019). In the BnKNOX gene family, 24 pairs were amplified by segmental duplication, and only 1 pair (BnKNAT7b-A/BnKNAT7a-A) was amplified by tandem duplication (Supplement File 5). This result suggests that segmental duplication events contributed most to the expansion of the BnKNOXs in B. napus. Previous studies show that gene duplication can prevent the loss of function caused by genetic mutation (Abdullah et al. 2022; Yadav et al. 2023).
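Because the BnKNOX gene names end in a subgenome tag (-A or -C), duplicated pairs can be sorted into within-subgenome and between-subgenome events directly from their names. The following is a minimal sketch of that bookkeeping; the helper names are hypothetical, and apart from the tandem pair named in the text, the example pairs are illustrative placeholders rather than the pairs reported in the supplementary files.

```python
from collections import Counter


def subgenome(gene_name: str) -> str:
    """Return 'A' or 'C' from the trailing subgenome tag of a BnKNOX gene name."""
    tag = gene_name.strip().rsplit("-", 1)[-1]
    if tag not in {"A", "C"}:
        raise ValueError(f"no subgenome tag found in {gene_name!r}")
    return tag


def pair_type(gene1: str, gene2: str) -> str:
    """Label a duplicated pair as 'A-A', 'A-C' or 'C-C'."""
    first, second = sorted((subgenome(gene1), subgenome(gene2)))
    return f"{first}-{second}"


example_pairs = [
    ("BnKNAT7b-A", "BnKNAT7a-A"),  # the tandem pair named in the text
    ("BnSTM-A", "BnSTM-C"),        # illustrative pairing, not a reported result
    ("BnKNATM1-A", "BnKNATM1-C"),  # illustrative pairing, not a reported result
]

counts = Counter(pair_type(g1, g2) for g1, g2 in example_pairs)
print(dict(counts))
```

Applied to the full list of collinear pairs, the "A-C" count would correspond to the events between subgenomes A and C.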
To analyze the mechanism by which the BnKNOX gene family evolved, the ratio of nonsynonymous (Ka) to synonymous (Ks) substitution rates was calculated for the 26 collinear gene pairs. The Ka/Ks ratios of all duplicated BnKNOX gene pairs were < 1 (Supplement File 4), which indicates that this family was subject to purifying (negative) selection throughout evolution. Additionally, the duplication events were estimated to have occurred around 0.5–31 MYA (million years ago).
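The interpretation of Ka/Ks and the dating of duplication events can be sketched as follows. This assumes the commonly used relation T = Ks / (2r) and a synonymous substitution rate r of 1.5e-8 substitutions per site per year; the text does not state which formula or rate was used, so both are assumptions, and the input values are hypothetical.

```python
def selection_type(ka: float, ks: float, tol: float = 1e-9) -> str:
    """Classify selection pressure on a duplicated gene pair from Ka and Ks."""
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "purifying" if ratio < 1.0 else "positive"


def divergence_time_mya(ks: float, rate: float = 1.5e-8) -> float:
    """Estimate duplication time in million years, assuming T = Ks / (2 * rate).

    The default rate is an assumed value, not one taken from the text.
    """
    return ks / (2.0 * rate) / 1e6


# Hypothetical Ka and Ks values for a single gene pair.
ka, ks = 0.05, 0.30
print(selection_type(ka, ks), round(divergence_time_mya(ks), 2))
```

A pair with Ka/Ks below 1 is labeled as under purifying selection, which is the pattern reported for all duplicated BnKNOX pairs.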
3.8 Expression patterns of BnKNOXs in different tissues from BrassicaEDB
The expression patterns of a gene are closely related to its function. For a comprehensive understanding of BnKNOX functions, we analyzed the expression patterns of all members during the flowering and fruiting transition stage using available transcriptome data from different tissues, including young leaves, roots, seeds, flowers and siliques (Fig. 8A). The results showed that all BnKNOX genes were expressed in at least one tissue, indicating that they may play a vital role during this developmental stage. Remarkably, the expression patterns of BnKNOXs in different tissues differed considerably among the three subgroups, while members clustered in the same group showed relatively similar expression patterns (Fig. 8A). This suggests that BnKNOXs may affect diverse biological functions in different tissues. Transcripts of most genes from group II were relatively abundant in most tissues, reflecting their ubiquitous roles in plant development. The four BnKNOXs in group III (BnKNATM1-A, BnKNATM1-C, BnKNATM2-A, BnKNATM2-C) were expressed at very low levels in all tissues except the mature seed coat, where expression was higher.
Similarly, BnKNOXs from group I were expressed mainly in the stem, root, and inflorescence tip. For example, BnSTM-A, BnSTM-C and BnKNAT1a-C were highly expressed at various stages of stem development, suggesting their important roles in stem development.
3.9 Expression levels of BnKNOXs in reproductive organs by qRT-PCR
Numerous studies have demonstrated the significant roles of KNOX genes in floral organs (Box et al. 2012) and fruit development (Keren-Keiserman et al. 2022). To confirm the temporal and spatial expression patterns of BnKNOX genes, we conducted qRT-PCR experiments in various tissues, including buds (bolting bud, 0.8 cm bud, 1.2 cm bud, 1.6 cm bud), floral organs (sepal, petal, anther and stamen), seeds, siliques (1 cm silique, 3 cm silique and 5 cm silique) and young leaves (Fig. 8B). The expression level of BnSTM-A in young leaves was used as the reference, with its value set to 1. Overall, the qRT-PCR results are consistent with the trends observed in the transcriptome data. For example, Class I and Class II KNOX genes were expressed more broadly during bud development than Class III genes. Notably, BnKNAT3a-A and BnKNAT3a-C displayed high expression levels in bolting buds, suggesting their potential involvement in inflorescence formation. Furthermore, we found significantly elevated expression of five BnKNOXs (BnKNAT7a-A, BnKNAT7a-C, BnKNAT7b-A, BnKNAT7b-C, BnKNAT7c-C) specifically in the stigma, indicating their putative roles in stigma development. Interestingly, the Class III BnKNOXs (BnKNATM1-A, BnKNATM1-C, BnKNATM2-A, BnKNATM2-C) were specifically highly expressed in seeds, implying crucial functions during seed development. Additionally, BnKNAT3a-C and BnKNAT3b-A might play vital roles during silique development. Moreover, gene members belonging to the same group exhibited similar expression characteristics.
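Setting one sample to a value of 1 and expressing all others relative to it is how calibrator-based qRT-PCR quantification works. The sketch below assumes the widely used 2^-ΔΔCt calculation with an internal reference (housekeeping) gene; neither the quantification method nor the reference gene is stated in this section, and all Ct values shown are hypothetical.

```python
def relative_expression(
    ct_target: float,
    ct_reference: float,
    ct_target_calibrator: float,
    ct_reference_calibrator: float,
) -> float:
    """Relative expression by the 2^-ddCt method (an assumed method, see lead-in).

    ct_target / ct_reference: Ct of the gene of interest and of the internal
    reference gene in the sample being measured.
    ct_*_calibrator: the same two Ct values in the calibrator sample
    (here, BnSTM-A in young leaves, whose expression is set to 1).
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: the calibrator compared with itself gives exactly 1.
calibrator_level = relative_expression(26.0, 18.0, 26.0, 18.0)

# A hypothetical sample whose normalized Ct is two cycles lower than the calibrator.
sample_level = relative_expression(24.0, 18.0, 26.0, 18.0)

print(calibrator_level, sample_level)
```

Each cycle of difference in normalized Ct corresponds to a two-fold change, so a sample two cycles earlier than the calibrator is reported as four-fold higher.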