Comparative chloroplast genomic analysis
To address whether cpDNA reflects SH resistance phenotypes, we assembled whole chloroplast genomes for 42 individuals and examined the topology of the resulting haplotype network. The newly assembled chloroplast genomes ranged from 154,815 to 155,188 bp in length and grouped into 16 haplotypes (Fig. 1A, Supplemental Table S1, Table 2).
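Grouping assembled genomes into haplotypes amounts to collapsing identical aligned sequences. The following is a minimal sketch under that assumption; the function `collapse_haplotypes` and the toy alignment are hypothetical and do not reproduce the study's pipeline, which may treat gaps, ambiguous bases or structural variants differently.

```python
from collections import defaultdict


def collapse_haplotypes(seqs):
    """Group individuals whose aligned sequences are identical.

    seqs: dict mapping individual ID -> aligned sequence (case-insensitive).
    Returns a dict mapping haplotype labels (H01, H02, ...) to sorted member
    lists, numbering larger haplotypes first (ties broken by first member ID).
    """
    groups = defaultdict(list)
    for individual, seq in seqs.items():
        groups[seq.upper()].append(individual)
    ordered = sorted(groups.values(), key=lambda m: (-len(m), min(m)))
    return {f"H{i:02d}": sorted(m) for i, m in enumerate(ordered, start=1)}


# Hypothetical toy alignment, not the study's chloroplast genomes
toy_alignment = {
    "ind1": "ACGTACGT",
    "ind2": "ACGTACGT",
    "ind3": "ACGTTCGT",
    "ind4": "acgtacgt",
}
print(collapse_haplotypes(toy_alignment))
# {'H01': ['ind1', 'ind2', 'ind4'], 'H02': ['ind3']}
```

Exact identity is the strictest grouping criterion; with whole chloroplast genomes, missing data would normally need to be masked before comparison.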
Haplotype H01 comprised 22 individuals, irrespective of the SH factors they carried (Supplemental Table S1). Indeed, all nine SH factors were represented in this haplotype, from three occurrences (SH8) up to 19 (SH5) (Fig. 2A). Additionally, seven of the nine SH factors were found in two or more haplotypes (Fig. 2B). These results indicate that the SH resistance factors are not maternally inherited through the chloroplast genome. Consistently, the haplotype network separated individuals by species rather than by resistance factor. Apart from haplotype H02, the ‘C. arabica and HDT derivative’ haplotypes clustered together (Fig. 1A). This low differentiation was not surprising, given that most of the C. arabica and HDT-derivative individuals are related. For example, haplotypes H03 and H05 each grouped a female parent together with its progeny. Furthermore, some of the closest haplotypes included individuals of the same geographic origin, such as H04 and H05 (both with Ethiopian backgrounds). Others, such as haplotypes H03 and H07, grouped landrace genotypes from the northeast African highlands (Rume Sudan Ethiopian landrace), the geographic origin of C. arabica [11, 14]. When data from 18 conspecific individuals retrieved from NCBI (13 of them C. arabica) were included, the C. arabica and HDT-derivative cluster was reinforced (Fig. 1B and Supplemental Table S1).
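The inference that SH factors are not tied to chloroplast lineages rests on a simple tabulation: a factor inherited through cpDNA would be expected to stay within one maternal lineage rather than recur across unrelated haplotypes. A minimal sketch of that tabulation follows; the function `sh_factor_summary` and the records are hypothetical and are not taken from Supplemental Table S1.

```python
from collections import Counter, defaultdict


def sh_factor_summary(records):
    """Tabulate SH factors per haplotype.

    records: iterable of (individual, haplotype, set_of_SH_factors).
    Returns (per_haplotype, shared): per_haplotype maps each haplotype to a
    Counter of SH factor occurrences; shared is the set of factors observed
    in two or more haplotypes.
    """
    per_haplotype = defaultdict(Counter)
    haplotypes_per_factor = defaultdict(set)
    for _individual, haplotype, factors in records:
        for factor in factors:
            per_haplotype[haplotype][factor] += 1
            haplotypes_per_factor[factor].add(haplotype)
    shared = {f for f, haps in haplotypes_per_factor.items() if len(haps) >= 2}
    return per_haplotype, shared


# Hypothetical records, not the study's data
toy_records = [
    ("a", "H01", {"SH5", "SH8"}),
    ("b", "H01", {"SH5"}),
    ("c", "H02", {"SH5", "SH9"}),
    ("d", "H03", {"SH9"}),
]
per_hap, shared = sh_factor_summary(toy_records)
print(per_hap["H01"]["SH5"], per_hap["H01"]["SH8"])  # 2 1
print(sorted(shared))                                # ['SH5', 'SH9']
```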
The C. arabica and HDT-derivative cluster was closer to the C. eugenioides haplotypes (genetic distance = 29, Fig. 1B) than to the C. canephora haplotypes (genetic distance = 800). This is congruent with the accepted hypothesis that C. eugenioides was the female parent of C. arabica [4, 6]. Indeed, based on the similarity of plastid DNA sequences, previous studies have suggested that C. eugenioides was the ovule donor in the hybridization event that gave rise to C. arabica [7, 33–35].
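Comparisons such as "29 versus 800" can be read as counts of differences between aligned haplotype sequences. The sketch below assumes exactly that (one difference per mismatched site, gaps and ambiguous bases skipped); the network software used in the study may weight indels or missing data differently, and the functions and sequences here are hypothetical.

```python
def pairwise_differences(a, b, skip="-N"):
    """Count aligned positions at which two sequences differ.

    Positions where either sequence has a character in `skip` (gaps or
    ambiguous bases by default) are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    skip = set(skip.upper())
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x not in skip and y not in skip and x != y
    )


def closest_reference(query, references):
    """Return (name, distance) of the reference closest to `query`."""
    distances = {name: pairwise_differences(query, seq) for name, seq in references.items()}
    best = min(distances, key=distances.get)
    return best, distances[best]


# Hypothetical toy sequences, not real haplotypes
refs = {"ref_B": "ACGTACGTTC", "ref_C": "TCGAACCTTC"}
print(closest_reference("ACGTACGTAC", refs))  # ('ref_B', 1)
```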
Our results further suggest that C. arabica was the female parent of the spontaneous hybrid HDT. To our knowledge, this is the first molecular study to address the maternal donor of HDT; previously, its descent from C. arabica x C. canephora had been inferred from field and morphological characteristics [11, 14].
Haplotype H02 comprised three closely related individuals: C. arabica 34/13 is the female parent of both C. arabica H147/1 (C. arabica 34/13 x C. arabica 110/5) and HDT-F2_H535/10 (C. arabica 34/13 x HDT 1343/269) (Fig. 1, Table 2, Supplemental Table S1). The C. arabica 34/13 genotype was obtained from C. arabica x C. liberica hybrid derivatives developed at the Central Coffee Research Institute (CCRI) in Balehonnur, India [11].
Our analyses further revealed that individuals from haplotypes H02 and H16 shared nearly identical cpDNA sequences, suggesting a close maternal relationship between them. Haplotype H16 corresponds to the Kawisari hybrid, one of the oldest Indian hybrids, created from C. arabica x C. liberica. In the network, H02 and H16 were closer to the C. canephora and C. liberica/excelsa haplotypes (genetic distance = 380 and 382, respectively) than to the C. arabica and HDT-derivative haplotypes (Fig. 1B). The large genetic distance between haplotype H02 and the other Arabica haplotypes (almost 1000) raises the possibility that the genotypes within H02 originated from a female parent other than C. arabica/C. eugenioides, potentially related to the Kawisari hybrid. Further research is needed to ascertain their maternal inheritance.
Haplotypes found in C. canephora, C. liberica/excelsa, C. racemosa and C. eugenioides showed considerable intraspecific genetic distances, particularly in comparison with C. arabica (Fig. 1). These data reinforce the known lower genetic diversity of C. arabica relative to its diploid relatives [36].
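One common way to quantify "lower genetic diversity" is nucleotide diversity (pi): the proportion of differing sites between two sequences, averaged over all pairs in a group. The text does not state which measure underlies [36], so this is a swapped-in, standard estimator shown as a sketch; the function and the toy groups are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Nucleotide diversity (pi): mean proportion of differing sites per pair.

    seqs: list of aligned, equal-length sequences (gaps and ambiguity codes
    are treated as ordinary characters in this simplified sketch).
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    total_differences = sum(
        sum(x != y for x, y in zip(a.upper(), b.upper())) for a, b in pairs
    )
    return total_differences / (len(pairs) * length)


# Hypothetical toy groups, not real coffee data
diverse_group = ["AAAA", "AAAT", "AATT"]
uniform_group = ["AAAA", "AAAA", "AAAT"]
print(round(nucleotide_diversity(diverse_group), 4))  # 0.3333
print(round(nucleotide_diversity(uniform_group), 4))  # 0.1667
```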
Comparative analysis of nuclear genes encoding plastid-targeted proteins
Having established that cpDNA did not explain individual resistance patterns, we focused on nuclear-encoded chloroplast proteins, which have been described as targets of retrograde signalling generated within the chloroplast [17]. The chloroplast proteome has been estimated at 2100 to 3600 proteins, approximately 3000 of which are nuclear-encoded [37]. To detect possible associations with SH factors, we focused on the 25 individuals with known SH factors (Supplemental Table S1) and on the following nuclear-encoded protein families involved in resistance and acting on chloroplasts [16, 20, 21]: ATP-dependent zinc metalloprotease (FtsH); Elongation factor Tu (EFTU); Ferredoxin-thioredoxin reductase (FTR); Thioredoxin reductase (TRR); D-glycerate 3-kinase (GLYK); NAD(P)H dehydrogenase-like (NDH); Thioredoxin and Thioredoxin-like (TRX); Translation initiation factor (IF); Oxygen-evolving enhancer protein (OEE); and Cytochrome b6-f complex iron-sulphur subunit (ISP). In total, 89 nuclear regions were analysed for DNA variants in the ORF and in the upstream and downstream flanking regions. We found 139 variants unevenly distributed among 11 nuclear regions, corresponding to polymorphisms in 8 proteins (Table 1). Several variants in the upstream and downstream flanking regions (regions 61 and 21 in Fig. 3A, respectively) may also play a role in controlling transcriptional and post-transcriptional events. A disproportionate number of variants (114 out of 139) occurred in the region encoding the membrane-anchored thioredoxin-like protein HCF164; these were shared exclusively among SH9 individuals, and most of them (112 variants) were located on chromosome 7c (Fig. 3A). A detailed analysis of this gene revealed clear differences between individuals with and without the SH9 factor (Figs. 3B and 3C).
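Finding variants "exclusively shared among SH9 individuals" can be framed as a set filter over carrier lists. This sketch assumes a strict definition (carried by every SH9 individual and by no other individual); the study's exact criterion is not stated here, and the functions, variant IDs and carrier sets are hypothetical.

```python
from collections import Counter


def group_exclusive_variants(carriers, group, everyone):
    """Variants carried by every member of `group` and by nobody else.

    carriers: dict variant_id -> set of individuals carrying the alternative allele
    group: set of individuals sharing a phenotype (e.g. SH9)
    everyone: set of all genotyped individuals
    """
    others = everyone - group
    return sorted(
        v for v, inds in carriers.items() if (group <= inds) and not (inds & others)
    )


def count_by_chromosome(variant_ids, chromosome_of):
    """Tally variants per chromosome label (e.g. '7c', '4e')."""
    return Counter(chromosome_of[v] for v in variant_ids)


# Hypothetical genotypes, not the study's variant calls
individuals = {"s1", "s2", "n1", "n2"}
sh9 = {"s1", "s2"}
toy_carriers = {
    "v1": {"s1", "s2"},
    "v2": {"s1", "s2", "n1"},  # also carried by a non-SH9 individual
    "v3": {"s1"},              # not carried by every SH9 individual
    "v4": {"s1", "s2"},
}
toy_chrom = {"v1": "7c", "v2": "7c", "v3": "4e", "v4": "4e"}
exclusive = group_exclusive_variants(toy_carriers, sh9, individuals)
print(exclusive)                                        # ['v1', 'v4']
print(count_by_chromosome(exclusive, toy_chrom)["7c"])  # 1
```

A looser definition (present in at least one SH9 individual and absent from all others) would only require replacing `group <= inds` with a non-empty intersection test.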
The clustering of individuals in the haplotype network estimated for this region suggested a relationship between the variants identified in the HCF164 nuclear region and the presence of the SH9 factor (Fig. 3B). Moreover, the observed variants altered the encoded peptide sequence: protein prediction for the 25 studied individuals identified three HCF164 protein isoforms. Two of these isoforms were found only in non-SH9 individuals, and both contain the thioredoxin domain and the redox-active disulphide center (CEVC catalytic motif). By contrast, the five SH9 individuals (HDT genotypes: 832/1; 4106; H420/10; HW26/13; H419/20) shared the third isoform, which lacks the thioredoxin domain and the peptide sequence of the redox-active disulphide center owing to a 19-residue deletion identified in this work (Fig. 3C).
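The isoform comparison combines two checks: whether the CEVC motif is present, and whether a shorter isoform is the longer one with a single contiguous block removed. A minimal sketch of both follows; the function names and toy peptides are hypothetical and are not the coffee HCF164 sequences.

```python
import re

CATALYTIC_MOTIF = re.compile("CEVC")  # motif named in the text


def has_catalytic_motif(protein):
    """True if the protein sequence contains the CEVC motif."""
    return CATALYTIC_MOTIF.search(protein.upper()) is not None


def single_deletion(reference, variant):
    """Describe `variant` as `reference` with one contiguous block removed.

    Returns (start, length) with a 0-based start in `reference`, or None if
    the variant cannot be explained by a single deletion. In repetitive
    sequence the position is ambiguous; this reports the start immediately
    after the longest shared prefix.
    """
    if len(variant) >= len(reference):
        return None
    prefix = 0
    while prefix < len(variant) and reference[prefix] == variant[prefix]:
        prefix += 1
    suffix = 0
    while suffix < len(variant) - prefix and reference[-1 - suffix] == variant[-1 - suffix]:
        suffix += 1
    if prefix + suffix == len(variant):
        return prefix, len(reference) - len(variant)
    return None


# Hypothetical toy isoforms
isoform_full = "MSTAGCEVCAPLKR"
isoform_short = "MSTAPLKR"
print(has_catalytic_motif(isoform_full), has_catalytic_motif(isoform_short))  # True False
print(single_deletion(isoform_full, isoform_short))  # (4, 6)
```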
Table 1: DNA variants of nuclear-encoded chloroplast proteins potentially associated with SH phenotypes. Shown are the chromosome number and parental origin (c = C. canephora; e = C. eugenioides), locus code, protein description, SH factor, and the number of variants found in the ORF and in the upstream and downstream flanking regions.
Three-dimensional structural models of the HCF164 proteins expressed in SH9 and non-SH9 individuals showed that the α-helix spanning cysteine 163 to aspartate 182 (C163-D182), which contains the typical CEVC catalytic motif of the protein, was completely absent in SH9 individuals (Supplemental Figure S1). This suggests that the lack of the redox-active disulphide center in the HCF164 protein of SH9 individuals may have important biochemical implications, as thioredoxins target several proteins and can modulate their activity. HCF164 is a membrane-anchored thioredoxin-like protein known to be indispensable for the assembly of the cytochrome b6-f complex (Cytb6-f) in thylakoid membranes; loss-of-function hcf164 mutants exhibit decreased photosynthetic electron transport rates [38].
Cytb6-f provides the essential electron-transfer connection between the light-powered chlorophyll-protein complexes, photosystems I and II (PSI and PSII). It is well placed to sense the redox state of the electron transfer chain and of the chloroplast stroma, interacting with various regulatory elements that transduce these signals to optimise photosynthesis under fluctuating environmental and metabolic conditions [39]. The Cytb6-f complex is a ~220 kDa functional dimer in which each monomer comprises four major subunits (cytochrome f, cytochrome b6, the Rieske iron-sulphur protein (ISP) and subunit IV) and four minor subunits [39 and references therein]. Results obtained by Motohashi and Hisabori [40] suggested that interactions between HCF164 and both the cytochrome f and ISP subunits are important prerequisites for the correct assembly of the Cytb6-f complex. They further demonstrated the physiological significance of HCF164 as a transducer of reducing equivalents within the thylakoid lumen. In addition to this complex, HCF164 may interact with, and probably reduce, other target proteins of the thylakoid membrane, such as the metalloproteases FtsH2 and FtsH8, several ATP synthase subunits and chlorophyll a-b binding proteins [38, 40].
HCF164 protein-protein interactions were explored with the STRING database using Arabidopsis protein annotations, as interaction networks are better characterised in Arabidopsis than in coffee; only interactions retrieved from experimental/biochemical data or from curated databases were considered. Because DNA variants in GLYK (6 variants located on chromosome 4e; Table 1) were also found exclusively in SH9 individuals, both proteins were included in the STRING analysis. Although no direct interaction between HCF164 and GLYK was evidenced, the enrichment p-value obtained (< 1.0e-16) supports that, as a group, the proteins are metabolically connected (Fig. 4). GLYK catalyses the conversion of glycerate to 3-phosphoglycerate, a reaction involved in photorespiration and redox metabolism. The glyceraldehyde-3-phosphate dehydrogenases ALDH7B4 and ALDH3H1 are described as stress-responsive dehydrogenases that catalyse the conversion of glyceraldehyde 3-phosphate to D-glycerate 1,3-bisphosphate. HCF164 shows several interactions with superoxide dismutases (CSD1, CSD2) and peroxiredoxins (2CPA, 2CPB, PRXIIA, PRXIID, PRXIIE) (Fig. 4).
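The evidence filter described above (experimental/biochemical data or curated databases only) could be reproduced programmatically. This is a sketch under stated assumptions: it assumes STRING's public REST endpoint `https://string-db.org/api/json/network` and its JSON fields `preferredName_A`, `preferredName_B`, `escore` (experiments) and `dscore` (databases), which should be checked against the current STRING API documentation. Taxon 3702 is Arabidopsis thaliana. The function names and toy records are hypothetical, and the enrichment p-value reported in the text is not computed here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

STRING_API = "https://string-db.org/api/json/network"  # assumed endpoint; verify in STRING docs
ARABIDOPSIS_TAXON = 3702  # NCBI taxonomy ID for Arabidopsis thaliana


def network_url(identifiers, species=ARABIDOPSIS_TAXON):
    """Build a STRING network request; identifiers are joined by carriage returns."""
    query = urlencode({"identifiers": "\r".join(identifiers), "species": species})
    return f"{STRING_API}?{query}"


def fetch_network(identifiers):
    """Download interaction records (requires network access)."""
    with urlopen(network_url(identifiers)) as response:
        return json.load(response)


def experimental_or_database_edges(records):
    """Keep edges with experimental (escore) or curated-database (dscore) support."""
    return [
        (r["preferredName_A"], r["preferredName_B"])
        for r in records
        if r.get("escore", 0) > 0 or r.get("dscore", 0) > 0
    ]


# Hypothetical records shaped like STRING JSON output (scores invented for illustration)
toy_records = [
    {"preferredName_A": "HCF164", "preferredName_B": "PARTNER_1", "escore": 0.6, "dscore": 0.0},
    {"preferredName_A": "HCF164", "preferredName_B": "PARTNER_2", "escore": 0.0, "dscore": 0.0, "tscore": 0.5},
]
print(experimental_or_database_edges(toy_records))  # [('HCF164', 'PARTNER_1')]
```

Filtering after download, rather than relying on a server-side option, keeps the evidence criterion explicit in the analysis code.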
The network thus evidenced a relationship between redox metabolism, photorespiration and glycolysis; consequently, any change in the balance of these proteins can affect chloroplast metabolism. Recently, the mechanisms of action of stripe rust (Puccinia striiformis f. sp. tritici) effectors in wheat have been identified. These rust effectors targeted the ISP subunit of the Cytb6-f complex: some interacted with ISP (a nuclear-encoded chloroplast protein) in the cytosol, blocking its translocation to the chloroplast [21]; others interacted with ISP within the chloroplast, preventing complex assembly [20]. Both types of effectors interfered with Cytb6-f complex function and with ROS production by chloroplasts [20, 21]. The authors further showed that completely blocking Cytb6-f complex assembly was not advantageous for the fungus, as it left insufficient nutrients for fungal development in the later stages of infection. Thus, although biotrophic rust fungi need to suppress the chloroplast-mediated defences of their host plants, they must retain the biosynthetic capacity of these organelles, which is vital for their survival.
The association of the HCF164 polymorphism on chromosome 7c (of C. canephora origin) with the SH9 resistance factor is consistent with existing evidence that the SH9 factor for resistance to H. vastatrix derives from major genes of C. canephora, which is considered a source of resistance [12 and references therein]. The lack of the thioredoxin domain and of the redox-active disulphide center in the HCF164 isoform expressed only by SH9 individuals may indicate a biochemical advantage of these individuals over others. This difference in HCF164 function may give SH9 individuals a greater ability to resist fungal infection or to better regulate other biological processes. On the other hand, the redox-related roles of HCF164 might be taken over by other thioredoxin-like proteins; such functional redundancy (or metabolic flexibility) has been proposed but has not yet been fully characterised. It will be necessary to determine whether SH9-HCF164 is recognized by H. vastatrix effectors and whether it could act as a decoy, preventing the effectors' function(s) while still allowing normal plant development. Our results further support a role for chloroplast-mediated defences against leaf rust, particularly those involving carbon metabolism and redox homeostasis [16]. This study presents a strategy for identifying proteins/genes associated with SH factors, as well as candidate H. vastatrix effector targets, thus opening new perspectives for plant breeding programs.