The phylogenetic relationships between MLO proteins from various plant species were investigated to gain insight into their evolutionary history and possible functional variation. Alongside the phylogenetic investigation based on protein sequences, a genome-wide analysis of the selected MLO genes was carried out, covering gene and CDS size as well as the number of introns and exons. Analysis of the amino acid sequences allowed us to predict a range of protein characteristics, such as the presence of functional domains (MLO, TM and CaMBD), amino acid motifs, general physicochemical properties (MW, pI and GRAVY values), and subcellular localization.
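As an illustration of one of the physicochemical properties mentioned above, the GRAVY value of a sequence can be sketched as the mean of per-residue hydropathy values. The snippet below is a minimal sketch that assumes the Kyte-Doolittle hydropathy scale; the function name `gravy` and the toy peptides in the usage note are hypothetical and do not reproduce the tool or the sequences used in this study.

```python
# Kyte-Doolittle hydropathy scale (assumed here as the basis for GRAVY).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Return the grand average of hydropathy (GRAVY) of a protein sequence."""
    residues = sequence.upper()
    if not residues:
        raise ValueError("empty sequence")
    try:
        # Sum the hydropathy value of every residue in the sequence.
        total = sum(KYTE_DOOLITTLE[aa] for aa in residues)
    except KeyError as err:
        raise ValueError(f"non-standard residue: {err.args[0]}") from None
    # GRAVY is the sum of hydropathy values divided by sequence length.
    return total / len(residues)
```

Under this definition, a hydrophobic toy peptide such as `"AL"` yields a positive value and a charged one such as `"RK"` a negative value, which is the sense in which positive GRAVY values indicate overall hydrophobicity.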
Analysis of the phylogenetic tree based on MLO protein sequences revealed that the monocot members of the family, all from Poaceae, were found exclusively in clade VI (Fig. 1). Monocot and dicot MLO proteins have evolved group-specific sequence conservation patterns (Appiano et al., 2015; Chen et al., 2021; Pépin et al., 2021; Traore et al., 2021). Indeed, MLO proteins related to PM susceptibility in monocots and dicots are grouped into separate clades (Fig. 1). Among the dicot MLO proteins, which were more numerous in this study, some special groupings were also noted, such as the Solanaceae MLO proteins, which formed a separate clade III. It is important to highlight that, in studies that include MLO protein families from different species, the family members of a given species do not all fall into a single exclusive clade in the phylogenetic tree. Instead, they are more commonly distributed among different clades and subclades (Qin et al., 2019). As an example, here we found that the MLO proteins from the Rosaceae family, well represented in this study, were distributed in different clades and subclades in the phylogenetic tree. The same was also noted for MLO proteins from the families Fabaceae, Cucurbitaceae, Cannabaceae, Moraceae, Euphorbiaceae, Salicaceae and Vitaceae (Fig. 1; Table S1).
A great benefit of the phylogenetic approach lies in the inference of homology and, therefore, the prediction of conserved function between MLO proteins from different species, as in the case of susceptibility to PM (Traore et al., 2021). In general, members of the same clade appear to be evolutionarily conserved (Konishi et al., 2010). Analysis of homology and conservation of MLO proteins may indicate a common ancestral origin, followed by species differentiation (Tian et al., 2022). According to the hypothesis of Kusch et al. (2016), MLO could be an ancestral protein that evolved from unicellular photosynthetic eukaryotes and became consolidated in land plants.
Segmental duplication could be the main form of amplification of the soybean MLO family (Shen et al., 2012), with the possibility of tandem duplication events, followed by migration to other parts of the genome in Oryza sativa (Liu and Zhu, 2008). From the perspective of homology and conserved function, some similarities between clades and subclades were notable. Interestingly, the composition of clade I (Fabaceae, Cucurbitaceae, Cannabaceae, Moraceae and Rosaceae) was repeated in clade IVb, with the addition in the latter of MLOs from two other families, Euphorbiaceae and Salicaceae. It is therefore tempting to hypothesize the origin of MLO proteins before the separation of these taxonomic groups. Indeed, not only angiosperms are known to contain MLO proteins, but also gymnosperms, lycophytes, bryophytes, and even algae, highlighting the long evolutionary history of this protein family (Pépin et al., 2021).
The main characteristics of MLO proteins involved in plant susceptibility to pathogens that cause PM are the presence of the MLO domain, seven TM domains, and the CaMBD domain, the latter being located downstream of the seventh TM in the C-terminal portion of the protein (Chen et al., 2014; Deshmukh et al., 2016; Pépin et al., 2021). CaMBD is associated with the response to Ca2+ signals, modulating defense against PM (Shen et al., 2012). The CELLO subcellular localization prediction tool indicated, with a high degree of probability, that all proteins evaluated in this study are localized to the plasma membrane (Fig. S1a). This is in line with the previous findings of Feechan et al. (2009) and Deshmukh et al. (2016), who verified plasma membrane localization for several MLO proteins using confocal microscopy. All proteins selected in this study contain a relatively large MLO domain, ranging from 414 to 497 amino acids in length, with a median of 480 amino acids (Fig. S1b). Regarding the number of passages through the plasma membrane, it is important to note that the different tools used for this prediction do not always agree completely. Regardless of the tool adopted, 24 proteins in this study were predicted to have seven TM domains, which is the key characteristic of MLO proteins (Devoto et al., 1999). For the other proteins studied here, seven passages through the membrane were also predicted by at least two different tools (Table S1). Additionally, all MLO proteins analyzed in our study were found to have the CaMBD domain (Fig. S2).
The passages through the membrane could be illustrated by analyzing the topology of the proteins with the Protter tool, which made it possible to visualize the arrangement of each protein relative to the plasma membrane (Figs. 2 and S3). Evolutionarily, the topology of MLO proteins is highly conserved, with seven TM domains and an intrinsically unstructured C-terminal tail (Kusch et al., 2016). Regarding this topology, the investigated MLO proteins have, on average, 55.1% of their amino acids exposed to the cytoplasmic environment, 26.5% in the membrane, and 18.4% in the extracellular space. These values are broadly similar to those reported by Devoto et al. (1999), who found a distribution of 60%, 25%, and 15% of residues in the cytoplasm, the membrane, and the extracellular space, respectively. According to Traore et al. (2021), extracellular loop 1 and intracellular loops 1 and 2 may be essential for susceptibility to PM. According to Reinstädler et al. (2010) and Deshmukh et al. (2016), the last two cytoplasmic loops could be critical for PM susceptibility in different species. Chen et al. (2014) draw attention to two conserved regions in the C-terminal portion of MLO that modulate PM infection. In any case, the percentage of amino acid identity between the MLO protein segments exposed on the cytoplasmic side may be related to common processes in the intracellular context.
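The residue distribution described above can be sketched as a simple count over a per-residue topology annotation. The snippet below is a minimal sketch, not the procedure used in this study: it assumes a hypothetical annotation string in which `i` marks cytoplasmic residues, `M` transmembrane residues, and `o` extracellular residues, and the function name `topology_fractions` is likewise hypothetical.

```python
from collections import Counter

# Hypothetical per-residue labels: i = cytoplasmic, M = membrane, o = extracellular.
LABELS = {"i": "cytoplasmic", "M": "membrane", "o": "extracellular"}


def topology_fractions(topology: str) -> dict[str, float]:
    """Return the percentage of residues in each compartment."""
    if not topology:
        raise ValueError("empty topology string")
    counts = Counter(topology)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected topology labels: {sorted(unknown)}")
    total = len(topology)
    # Counter returns 0 for labels that never occur, so every key is present.
    return {name: 100.0 * counts[label] / total for label, name in LABELS.items()}


# Toy 20-residue annotation, for illustration only.
toy = "i" * 11 + "M" * 5 + "o" * 4
fractions = topology_fractions(toy)
```

Averaging these per-protein percentages over all proteins in a data set would give summary figures of the kind reported above.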
Our study revealed a higher level of MLO protein sequence variability in the extracellular space than in the cytoplasmic and transmembrane spaces. It is assumed that the lower percentage of identity between amino acids in the extracellular space may be related to the specificity of the plant-pathogen interaction, since this region may be subject to greater selective pressure. Each plant species is attacked by a specific pathogen and, therefore, it is possible that each pathogenic species has its own mechanism adapted to overcome the general or specific defense layers of its host. Thus, diversification between MLO proteins from different species may be the result of natural selection, which could help explain the greater dynamics in the coevolution of different pathosystems. It was interesting to note that, despite the low percentage of identity between amino acids in the extracellular space, MLO proteins of some clades and subclades share relative similarity, in accordance with the phylogenetic tree. The same pattern was verified for amino acids in the intracellular spaces and transmembrane domains. These findings represent clues to the particular evolutionary histories of each pathosystem.
Following the same line of reasoning, the amino acids that cross the membrane are the most conserved, as was also observed previously by Deshmukh et al. (2016). The high percentage of identity between proteins in these segments can be explained by the immersion and anchoring function of MLO proteins in the plasma membrane - their site of action. Additional research is needed to identify the molecular mechanisms associated with the cellular processes that follow intimate contact between the pathogen and the plant.
Subsequently, we sought to characterize the amino acid motifs conserved in the different segments of MLO proteins (extracellular, transmembrane, and cytoplasmic), as well as in the entire protein. For the segments that cross the membrane, the predominance of hydrophobic amino acids was notable, while for the extracellular and intracellular spaces, polar amino acids with positive, negative, or no charge were predominant. For full-length proteins, the majority of detected motifs were found in the MLO domain. Since the MLO domain (Pfam03094) is relatively large, ranging from 414 to 497 amino acids among the different members of the MLO family, it is not unexpected that the amino acid motifs observed in our search reside in this domain. Deshmukh et al. (2016) previously identified the highly conserved LEETPTW motif, which we confirmed in this study (Fig. S4b). Interestingly, although all studied proteins have the MLO domain, not all of them share the 15 identified amino acid motifs. As an example, within the Fabaceae clade, two typical motif distribution models were identified, supporting the formation of subclades (Fig. S4a). On the other hand, MLO proteins from the same clade may differ in the composition of amino acid motifs. For example, SlyMLO01 is the only protein in the Solanaceae clade that does not contain motif 8, while NtaMLO01, in turn, lacks motif 15 (Fig. S4a). Several particularities observed here highlight the significance of these variations for the diversity of the MLO family. In general, proteins conserved across different species indicate functional redundancy. On the other hand, some differences in the composition, position and presence of specific motifs could be important for the adaptation of each protein to its biological role. Differences in conserved protein motifs and gene structures between different clades and subclades imply potential divergences in gene function (Tian et al., 2022).
Typically, MLO genes have a relatively high number of introns, generally 7 to 14 per gene (Chen et al., 2014; Deshmukh et al., 2016; Traore et al., 2021). Given that the loss of function of these genes can confer heritable, broad-spectrum, and recessive disease resistance (Traore et al., 2021), the characteristics brought together and highlighted in this study can contribute to new research strategies.
Regarding the composition of amino acids, we found that MLO proteins are rich in leucine, as was also noted previously by Deshmukh et al. (2016). In addition, we found that serine and valine were also overrepresented in the set of MLO proteins studied here (Fig. 6). At the plant family level, it is important to emphasize the more prominent presence of alanine in MLO proteins from the Poaceae family, while valine was more abundant in the Poaceae and Solanaceae families. Among the less abundant amino acids, cysteine, methionine, and tryptophan stand out. Leucine abundance is important for several reasons, such as its contribution to the stability of proteins in the membrane, the interaction of the protein with other molecules, regulation of plant metabolism, metabolic pathways, perception of the environment, signal transduction, and its marked presence in leucine-rich repeat containing immune receptor proteins (Ariza-Suarez et al., 2023; Wang et al., 2024). Serine, on the other hand, is important as a site for phosphorylation, a chemical change that can activate or inactivate specific functions in response to biotic and abiotic changes in the cellular environment (Rawat et al., 2023). The phosphorylation of serine residues in membrane proteins plays a fundamental role in regulating the activity of these proteins. For instance, in Arabidopsis, phosphorylation of Ser-511 of the MPK15 protein was found to be critical for its function in resistance to PM (Shi et al., 2022). Valine may also play a role in plant responses to environmental stresses such as cold, salinity, and drought (Shan et al., 2021), protect against oxidative stress and improve mitochondrial performance (Sharma et al., 2023), and contribute to plant defense against biotic stresses (Han et al., 2024).
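The amino acid composition discussed above can be sketched as a per-residue frequency count. The snippet below is a minimal sketch under stated assumptions: the function name `aa_composition` and the toy peptide are hypothetical, and only the 20 standard amino acids are counted, with any other symbols ignored.

```python
from collections import Counter

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence: str) -> dict[str, float]:
    """Return the percentage of each standard amino acid in a sequence."""
    # Keep only the 20 standard residues; other symbols (e.g. X, *) are ignored.
    residues = [aa for aa in sequence.upper() if aa in STANDARD_AA]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    total = len(residues)
    return {aa: 100.0 * counts[aa] / total for aa in STANDARD_AA}


# Toy peptide, for illustration only (not an MLO sequence).
composition = aa_composition("LLSVLLAC")
# Residues ranked from most to least abundant.
ranked = sorted(composition, key=composition.get, reverse=True)
```

Applied to each protein in a data set, the resulting percentages could then be averaged per plant family to compare, for example, the relative abundance of leucine, serine, and valine.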
Among the less frequent amino acids, cysteine stands out: it is conserved in MLO proteins, forms part of the seventh passage through the membrane, and contributes to the stability of proteins that form disulfide bonds. Elliott et al. (2005), Chen et al. (2021) and Tian et al. (2022) highlighted invariant cysteine and proline residues in the extracellular loop or the TM domain as fundamental to MLO function.
Clades III and VI contained exclusively MLOs of the Solanaceae and Poaceae families, respectively. Clades I, II, IV, and V, in turn, had more than one family in their composition. However, a significant number of MLOs from Fabaceae and Rosaceae can be noted in clades I and V, respectively. Brassicaceae MLOs were found only in clade II, and clade IV can be considered a multiple-family clade. Given these premises, the highest representativeness values of the HR, ASR and DR categories were observed for the Poaceae clade (VI). For the LR category, a certain balance was obtained between the clades, with a subtle superiority of the Fabaceae and Brassicaceae clades. Finally, PPRs from the Solanaceae clade showed relative enrichment of the BSR and Core categories. On the other hand, with regard to low representation rates, it is necessary to highlight again the Solanaceae clade for the HR category, the Brassicaceae clade for the ASR and BSR categories, and the multiple-family clade for the BSR category. According to Davis et al. (2017), MLO proteins would have redundant functions in PM infection, root thigmomorphogenesis and sexual reproduction. Functions originally distinct from the response to biotic stress may explain the distribution of CRE categories in the PPRs of MLOs. These results provide information about the representativeness of the CRE categories for each clade in this study, which can be related to different stimuli in addition to those specific to pathogens. Interestingly, different categories of CREs were overrepresented among MLOs (Fig. S5), which indicates complexity in the regulation of the expression of these genes. Further development of these findings may help elucidate the modulation of MLO expression from different perspectives.