Combinatorial interactions among natural structural variants of Brassica SOC1 promoters and SVP depict conservation of binding affinity despite molecular diversity

doi:10.21203/rs.3.rs-2543926/v1


Research Article



This work is licensed under a CC BY 4.0 License

Version 1


Genes constituting the floral regulatory network can be targeted to generate climate-resilient, early-flowering crops. SOC1, a central integrator of flowering, is downregulated by SVP. In the highly duplicated, amphidiploid genome of Brassica juncea, flowering is plausibly mediated by combinatorial interactions among natural variants of multiple SOC1 promoters and SVP. Although fluctuating temperatures can influence the energetics of molecular interactions, a mechanistic view of how these impact phenotypes remains unexplored. Analysis of binding patterns of biomolecules thus underpins new paradigms for precision trait engineering. Herein, we characterize 9 natural variants (homeologs and isoforms) of B. juncea SVP differing in MIKC domains. Generation and characterization of refined models of 15 SVP proteins (natural and hypothetical) and 3 SOC1 promoter fragments revealed extensive structural diversity. Despite this, the binding affinities of 48 docked complexes were comparable except in cases where truncated proteins were involved. Investigation of 27 docked complexes for the distribution and type of molecular contacts (π-π stacking, hydrophobic interactions, van der Waals forces, hydrogen bonds), and for shared or unique interaction patterns, revealed substantial variation, suggesting the involvement of compensatory mutations in preserving binding. Yeast one-hybrid assays validated the binding potential predicted in docked complexes. Conserved amino-acid residues and nucleotides involved in non-covalent interactions were identified. Computational alanine substitution established the importance of amino-acid hotspots conferring stability to docked complexes. Our study is relevant from an application standpoint. Identification of conserved amino-acid hotspots is essential for rational protein design since targeted mutagenesis of these can modify the natural binding spectrum of regulatory proteins, and is a way forward for trait engineering.

Polyploidy

flowering

homeolog

docking

Y1H

crop engineering

Identification of conserved amino-acid hotspots is essential for rational protein design since targeted mutagenesis can modify binding spectrum of regulatory proteins, and is a way forward for crop engineering.

Brassica species, a major source of edible oil, vegetables and condiments, are globally recognised for their agronomic value (Friedt et al. 2018; Jat et al. 2019). Large scale disruption of soil ecology, erratic weather patterns, and terminal heat stress continue to cause crop losses (Elias et al. 2019; Srinivasarao et al. 2021). Generation of climate-resilient Brassica varieties demonstrating increased productivity is thus an essential imperative for sustainable agriculture. Field environments encounter sudden and drastic temperature shifts causing altered thermodynamic interactions amongst biomolecules which control most agronomic traits. Notwithstanding, the mechanistic aspects of such adaptive crop traits have not been unravelled from a structural biology viewpoint even though structural basis of transcriptional regulation has been investigated (Wolberger 2021; Strader et al. 2022; Zhang et al. 2023). Introgression of earliness in flowering, an agronomic trait of interest in Brassicas, promises mitigation of yield losses encountered due to early onset of summers and other environmental stresses (Blümel et al. 2015; Jat et al. 2019; Kaur et al. 2021).

The morphologically diverse Brassica species share a unique ancestry illustrated by polyploidy driven adaptive radiation (Soltis et al. 2015; Van De Peer et al. 2017). Whole genome sequence analysis of Brassicaceae species suggests extensive remodelling of genomes caused by cycles of genome duplication, gene fractionation and DNA rearrangements (Parkin et al. 2005; Schranz et al. 2006; Ziolkowski et al. 2006; Lysak and Koch 2011). Diploid species, B. rapa (AA), B. nigra (BB) and B. oleracea (CC), and, allo-tetraploid species, B. juncea (AABB), B. carinata (BBCC) and B. napus (AACC), bear partially redundant sub-genomes representing varying levels of gene fractionation viz. Least Fractionated (LF), Moderately Fractionated (MF1) and Most Fractionated (MF2) (Nagaharu 1935). During the course of evolution, gene homeologs accumulate natural mutations in both coding and cis-regulatory elements resulting in functional and morphological diversification (Sankoff et al. 2010; Cheng et al. 2018; Nieto Feliner et al. 2020). One outcome, unique to polyploids, is an increased molecular diversity of promoters and proteins which interact combinatorically to modify phenotypes. Dissection of molecular genetic bases of agronomic traits is thus challenging in Brassicas since multiple homeologs regulate phenotypes collectively (Parkin et al. 2005; Glover et al. 2016; Schiessl et al. 2017).

In Arabidopsis thaliana, a complex gene regulatory network, comprising circa 306 genes, governs flowering time (Bouché et al. 2016; Kinoshita and Richter 2021; Lau et al. 2021; Quiroz et al. 2021; Xu and Hong 2021). The floral integrator SOC1 (SUPPRESSOR OF OVEREXPRESSION OF CONSTANS), a MADS domain protein (Lee and Lee 2010), is a preferred candidate for introducing earliness in plant species (Lee et al. 2000; Samach et al. 2000; Yoo et al. 2005; Liu et al. 2008; Hong et al. 2013). Genetic and interaction studies among promoters and upstream regulators establish the centrality of SOC1 in floral regulation (Borner et al. 2000). Since SOC1 integrates signals from multiple input pathways, transcription factor binding sites (TFBS) for many upstream regulators are present on SOC1 promoter elements (Sri et al. 2020), and these have been validated in A. thaliana (Seo et al. 2009; Lee and Lee 2010; Immink et al. 2012; Hong et al. 2013), B. juncea (Jiang et al. 2018; Li et al. 2018; Yan et al. 2018; Zhou et al. 2018; Ma et al. 2019) and B. rapa ssp. chinensis (Huang et al. 2019). Among the numerous upstream regulators of SOC1, SVP (SHORT VEGETATIVE PHASE), also a MADS domain containing protein, integrates signals from the thermo-sensory pathway (Lee et al. 2007) and from the gibberellic acid and autonomous pathways (Li et al. 2008). Repression of SVP results in potent activation of SOC1 and acceleration of floral transition even in the absence of the promotive effects of AGL24 and FT (Li et al. 2008; Tao et al. 2012). SVP has been shown to bind directly to the SOC1 promoter in A. thaliana (Li et al. 2008; Immink et al. 2012; Tao et al. 2012; Pajoro et al. 2014; Preston et al. 2014). The B. rapa genome represents 2 copies of SVP in BRAD (Bra038511_AALF and Bra030228_AAMF1; Sri et al. 2020) whereas B. juncea contains only a single annotated copy (BjuB024255). In the case of SOC1, 3 annotated copies (Bra004928, Bra0039324 and Bra000393) are reported in B. rapa Chiifu (V 1.5, BRAD). In B. juncea, 4 annotated copies (BjuA017536, BjuA010734, BjuB020000, BjuB046838) and 3 unannotated copies are reported in BRAD (Sri et al. 2020). Understandably, retention of multiple homeologs of both SOC1 and SVP in polyploid Brassicas implies an interplay of combinatorial interactions at the DNA-protein level.

Analysis of SOC1 promoters in sub-genome specific homeologs of B. juncea reveals partitioning of ancestral TFBS resulting in differential expression levels of SOC1 gene copies (Sri et al. 2020). The 3 SOC1 promoter homeologs (AALF, AAMF1 and AAMF2) each harbour a single binding motif for SVP. The motif sequence differs among the homeologs, pointing to potentially variable binding affinity with SVP proteins. The SVP binding sites on the AALF, AAMF1 and AAMF2 promoters are 5’-CCAAAATAGC-3’, 5’-CCAAGAAAGC-3’ and 5’-CCAAAAATAGC-3’, respectively (Sri et al. 2020).
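The extent of divergence among the three SVP binding motifs can be illustrated with a simple edit-distance comparison. The following is a minimal sketch, not part of the reported analysis; the function names are hypothetical and only the motif sequences are taken from the text.

```python
# SVP binding motifs on the three B. juncea SOC1 promoter homeologs
# (sequences as quoted in the text, Sri et al. 2020).
MOTIFS = {
    "AALF": "CCAAAATAGC",
    "AAMF1": "CCAAGAAAGC",
    "AAMF2": "CCAAAAATAGC",
}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of substitutions,
    insertions and deletions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def pairwise_distances(motifs: dict) -> dict:
    """Edit distance for every unordered pair of motifs."""
    names = sorted(motifs)
    return {(x, y): edit_distance(motifs[x], motifs[y])
            for i, x in enumerate(names) for y in names[i + 1:]}


if __name__ == "__main__":
    for pair, dist in pairwise_distances(MOTIFS).items():
        print(pair, dist)
```

Under this measure the AALF and AAMF2 motifs differ by a single inserted nucleotide, whereas AAMF1 differs from each of the other two by two edits.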

The complexity of DNA-protein interactions among natural structural variants of homeologs is poorly understood in plants. Though application of experimental methods for unravelling such interactions is cumbersome in polyploid Brassicas, structural informatics promises relevant hypotheses for structure-based rational protein engineering of select candidates. In this study, we report natural structural variation among B. juncea SVP proteins and SOC1 promoters and show how these translate into a spectrum of unique and shared binding patterns. Yeast one-hybrid assays have been used to confirm the binding potential determined by in-silico approaches. Through our study, we aim to demonstrate the relevance of integrating structural informatics and protein engineering methods for precise modification of crop traits.

2.1 Retrieval and isolation of SVP sequences

The SVP sequence from A. thaliana was retrieved from The Arabidopsis Information Resource (TAIR) (https://www.arabidopsis.org/). SVP sequences were also retrieved from the Brassica Database (BRAD) (http://brassicadb.cn/#//) as described in Singh and Singh (2021) for 25 Brassicaceae species including Brassica juncea, Brassica napus, Brassica rapa, Brassica nigra and Brassica oleracea. The annotated sequences were directly retrieved from BRAD using the A. thaliana SVP sequence (AT2G22540) as query in the Syntenic Gene@Subgenomes feature. The unannotated, full-length genomic sequences were retrieved using the respective BLAST coordinates via JBrowse. For annotation of coding sequences and in-silico translation of protein sequences, the FGENESH online tool (http://www.softberry.com/berry.phtml?topic=fgenesh&group=programs&subgroup=gfind, Solovyev et al. 2006) was employed. For establishing orthology of these sequences, NCBI-BLASTp (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins) was used with A. thaliana as the reference genome.

To isolate SVP sequences from B. juncea cv. Varuna, homolog-specific primers were designed from the SVP CDS sequences retrieved from B. juncea var. tumida (BRAD). Oligonucleotide 5’-ATGGCAAGAGAGAAGATTCAG-3’ was used as the forward primer to amplify Bju_B01_SVP, Bju_B02_SVP and Bju_BjuB024255, while 5’-ATGGCGAGAGAGAAGATTCAG-3’ was used as the forward primer to amplify Bju_A04_SVP and Bju_A09_SVP. For the reverse primers, oligonucleotide 5’-CTAACCGCCATACGGTAGG-3’ was used for BjuB024255, 5’-TTACCCGAGCCTAAGGGAG-3’ for Bju_A09_SVP, 5’-TCAACAGGAGCACCGGTGG-3’ for Bju_B01_SVP, 5’-TTAATCATTTCTTCGAGAAAAAG-3’ for Bju_B02_SVP and 5’-TCAACAGGAGCGCCGGTTG-3’ for Bju_A04_SVP.
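Basic properties of the two forward primers can be checked with a few lines of code. The sketch below is illustrative only: it uses the Wallace rule, Tm = 2(A+T) + 4(G+C), a rough rule of thumb for short oligonucleotides that is assumed here and is not stated to be the method used for primer design in this study. The dictionary keys and function names are hypothetical.

```python
# Forward primers quoted in the text; the keys are illustrative labels.
FORWARD_PRIMERS = {
    "fwd_B01_B02_BjuB024255": "ATGGCAAGAGAGAAGATTCAG",
    "fwd_A04_A09": "ATGGCGAGAGAGAAGATTCAG",
}


def gc_count(seq: str) -> int:
    """Number of G and C bases in the primer."""
    seq = seq.upper()
    return seq.count("G") + seq.count("C")


def wallace_tm(seq: str) -> int:
    """Approximate melting temperature (deg C) by the Wallace rule.
    Assumption: used only as a rough estimate for short oligos."""
    gc = gc_count(seq)
    at = len(seq) - gc
    return 2 * at + 4 * gc


def differing_positions(a: str, b: str) -> list:
    """1-based positions at which two equal-length primers differ."""
    if len(a) != len(b):
        raise ValueError("primers must have equal length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


if __name__ == "__main__":
    for name, seq in FORWARD_PRIMERS.items():
        print(name, len(seq), gc_count(seq), wallace_tm(seq))
    print(differing_positions(*FORWARD_PRIMERS.values()))
```

The two forward primers are 21 nucleotides long and differ at a single position, which is what allows them to discriminate between the two groups of homologs.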

Total RNA was isolated from freshly harvested B. juncea cv. Varuna leaf tissue using TRIzol® (Invitrogen, Carlsbad, CA, USA), as per the manufacturer’s recommendations. After normalization, 1 µg of RNA was treated with DNase I (Thermo Fisher Scientific, Waltham, Massachusetts, USA) and reverse transcribed using the RevertAid™ H minus first strand cDNA synthesis kit (Thermo Fisher Scientific, Waltham, Massachusetts, USA). For RT-PCR, the first strand cDNA was used as a template for amplifying B. juncea SVP cDNA sequences. The reaction mix comprised 1X reaction buffer, 0.2 mM dNTPs, 0.5 µM of primers and 1 unit (U) of Phusion DNA polymerase (Thermo Fisher Scientific). The PCR conditions included an initial denaturation of 30 sec at 98°C, followed by 32 cycles of 98°C for 10 sec, annealing for 30 sec and 72°C for 30 sec, and a final extension at 72°C for 2 min. The purified PCR products were cloned into SmaI-digested pGADT7 vector using a reaction mix consisting of 2 µl of 10X Ligation Buffer and 1 µl of T4 DNA Ligase (Thermo Scientific), along with 2 µl of 50% PEG 4000 (Polyethylene Glycol). The ligation mixture was incubated at 16°C overnight and transformed into E. coli DH5α competent cells. A consensus sequence for each SVP homolog was obtained by sequencing at least 3 clones, and GenBank accession numbers were assigned, viz. OP172879, OP172880, OP172881, OP172882 and OP172883. The nomenclature followed for annotated SVP homologs from Brassicaceae species includes an abbreviated form of the species name followed by the accession number in BRAD. For unannotated SVP sequences, the nomenclature includes an abbreviated species name followed by chromosomal coordinates and ‘SVP’. For example, Bju_A04_SVP describes the SVP sequence from B. juncea which was identified via BLAST on chromosome ‘A04’. The nomenclature for isolated SVP homologs followed a similar logic. For example, BjuVAR1_SVP describes SVP homolog number 1 isolated from B. juncea cv. Varuna.
Pair-wise sequence identities were calculated for both nucleotide as well as in-silico translated protein sequences using BioEdit ver. 7.0.5.3 (Hall 1999).
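The pairwise identity calculation can be sketched as follows for two pre-aligned sequences. This is a minimal illustration and not the BioEdit implementation: the treatment of gaps (columns gapped in both sequences are skipped, single gaps count as mismatches) is an assumption, and the example sequences are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences of equal length.
    Assumption: columns gapped in both sequences are ignored and a gap
    opposite a residue counts as a mismatch."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned: dict) -> dict:
    """Percent identity for every ordered pair of named sequences."""
    return {(p, q): percent_identity(aligned[p], aligned[q])
            for p in aligned for q in aligned}


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    toy = {"seq1": "ATGGCAAG-A", "seq2": "ATGGCGAGAA"}
    print(identity_matrix(toy))
```

Because identity is computed over aligned columns, large size differences between homologs introduce many gapped columns and can depress the reported values considerably.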

2.2 Phylogenetic reconstruction and sequence analysis

Phylogenetic analysis of Brassicaceae protein sequences was undertaken using BEAST v1.6.2, a Bayesian inference software package (Drummond and Rambaut 2007). Clustal X ver. 2.0.12 (Larkin et al. 2007) was used for sequence alignment. The alignment files were generated in ‘.nex’ format and subsequently used as input into BEAUti v1.6.2 (Drummond and Rambaut 2007), employing the default parameters, to generate ‘.xml’ files. The ‘.xml’ files were used as input for BEAST ver. 1.6.2 to generate phylogenetic trees, using default parameters while selecting thread pool size as ‘Automatic’. The phylogenetic trees were subsequently analysed using Tree Annotator ver. 1.6.2 and visualised using FigTree ver. 1.3.1 (Rambaut 2006). Assignment of sub-genome and progenitor genome identities to unknown sequences was performed as described by Jain et al. (2018).

DNA polymorphism across genomic sequences from Brassicaceae was mapped using the DNA Polymorphism feature of DnaSP 6 (Rozas et al. 2017), keeping the window size and step size as 25 and 2 nucleotides, respectively. Gene structure analysis was undertaken using Splign (https://www.ncbi.nlm.nih.gov/sutils/splign/splign.cgi, Kapustin et al. 2008) to map exon-intron boundaries, identify isoforms and determine the pattern of exon conservation. The analysis of functional domains was carried out using default parameters in the Batch Conserved Domain Database (https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi, Marchler-Bauer and Bryant 2004).
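A sliding-window estimate of nucleotide diversity (Pi) with the window and step sizes stated above can be sketched as follows. This is an illustrative approximation rather than the DnaSP algorithm: Pi is computed here as the mean number of pairwise differences per compared site, and columns containing anything other than A, C, G or T are excluded, which is an assumption. The toy alignment is hypothetical.

```python
from itertools import combinations


def window_pi(seqs, start, size):
    """Mean pairwise differences per site within one alignment window.
    Assumption: columns with gaps or ambiguity codes are skipped."""
    cols = [c for c in range(start, start + size)
            if all(s[c] in "ACGT" for s in seqs)]
    pairs = list(combinations(seqs, 2))
    if not cols or not pairs:
        return 0.0
    diffs = sum(sum(a[c] != b[c] for c in cols) for a, b in pairs)
    return diffs / (len(pairs) * len(cols))


def sliding_pi(seqs, window=25, step=2):
    """List of (1-based window start, Pi) along an aligned sequence set."""
    seqs = [s.upper() for s in seqs]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [(start + 1, window_pi(seqs, start, window))
            for start in range(0, length - window + 1, step)]


if __name__ == "__main__":
    # Hypothetical 30-column alignment with one variable site.
    toy_alignment = ["A" * 30, "A" * 30, "A" * 10 + "C" + "A" * 19]
    for start, pi in sliding_pi(toy_alignment):
        print(start, round(pi, 4))
```

Plotting the resulting values against window position gives a profile in which conserved regions, such as exons, appear as troughs.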

2.3 In-silico three-dimensional structure modelling, cross-validation, and refinement of B. juncea SVP proteins

The models for SVP proteins from B. juncea var. tumida were generated using I-TASSER (https://zhanglab.ccmb.med.umich.edu/I-TASSER/), which integrates both threading and ab-initio approaches (Pollock and Treisman 1990; Zheng et al. 2019). Default I-TASSER parameters were used with the complete I-TASSER template library. Further, ab-initio SVP models were generated using AlphaFold (Jumper et al. 2021; Varadi et al. 2022). The structural quality of the predicted models was cross-validated using the Protein Structure Validation Software (PSVS, https://montelionelab.chem.rpi.edu/PSVS/, Bhattacharya et al. 2007). PSVS permits consolidated analysis of model quality, including Ramachandran plot statistics, through tools such as PROCHECK (Laskowski et al. 1993), PROSA (Sippl 1993), MolProbity (Lovell et al. 2003) and Verify3D (Eisenberg et al. 1997; Bhattacharya et al. 2007). The protein models were refined iteratively and visualised using WinCoot (Emsley et al. 2010). Energy minimisation was undertaken using YASARA (http://www.yasara.org/minimizationserver.htm, Krieger et al. 2002). The refined models of SVP proteins from B. juncea var. tumida were then used as templates, selected on the basis of percent sequence identity, to serially model the structures of the corresponding SVP proteins from B. juncea cv. Varuna employing SWISS-MODEL (https://swissmodel.expasy.org/, Waterhouse et al. 2018).

To delineate the influence of individual domains of SVP (MIKC) on DNA-protein interactions, six distinct hypothetical protein models were generated representing differential domain presence. The hypothetical proteins were also modelled using SWISS-MODEL (https://swissmodel.expasy.org/, Waterhouse et al. 2018) while utilising the original I-TASSER generated and refined Bju_A09_SVP protein model as the template.

For predicting ligand (nucleic acid) binding amino acid residues, COACH server (https://zhanggroup.org/COACH/, Yang et al. 2013) was employed apart from I-TASSER. Secondary structures were analysed using DSSP based Stride Web interface (http://webclu.bio.wzw.tum.de/cgi-bin/stride/stridecgi.py, Heinig and Frishman 2004) and ESPript 3.0 server (https://espript.ibcp.fr/ESPript/ESPript/, Robert and Gouet 2014). Graphical representations of the docked complexes, superpositions and individual structural models were generated employing PyMOL (The PyMOL Molecular Graphics System, Version 1.8 Schrödinger, LLC) and Chimera (Pettersen et al. 2004).

2.4 In-silico structure modelling of B. juncea SOC1 promoter homeologs

SVP specific TFBS (Transcription Factor Binding Sites) on the B. juncea SOC1 promoter homeologs have already been reported (Sri et al. 2020). Models of the nucleotide sequences corresponding to SVP specific TFBS on B. juncea SOC1 promoter homeologs (AALF, AAMF1 and AAMF2) were generated using the default parameters of 3D-DART (https://alcazar.science.uu.nl/dna/dna.php, van Dijk and Bonvin 2009). The w3DNA 2.0 server (http://web.x3dna.org/) was employed to analyse nucleic-acid-containing structures across a range of molecular parameters involving base pairs and other atomic pairs (Li et al. 2019). Since the nucleotide sequence flanking the TFBS also influences the specificity of binding of the transcription factor (Leonard et al. 1997; Morin et al. 2006; Nagaoka et al. 2001; Rajaram and Kerppola, 1997), a 10 bp flanking region on each side (5’ and 3’) was included while modelling the DNA structure. The models thus generated were compatible with the HADDOCK docking server (https://alcazar.science.uu.nl/services/HADDOCK2.2/haddockserver-easy.html, De Vries et al. 2010). Illustrations of the models of B. juncea SOC1 promoter homeologs were generated using Chimera (Pettersen et al. 2004). The nomenclature used henceforth for models of the B. juncea SOC1 promoter homeologs AALF, AAMF1 and AAMF2 is BjupSOC1_AALF, BjupSOC1_AAMF1 and BjupSOC1_AAMF2, respectively.
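Selecting the TFBS together with its 10 bp flanks can be illustrated with a short helper. This is a sketch only: the promoter string below is a hypothetical placeholder (the actual flanking sequences are not given in the text), and the function name is an assumption.

```python
def extract_with_flanks(promoter: str, motif: str, flank: int = 10) -> str:
    """Return the first occurrence of motif plus `flank` bp on each side.
    Raises ValueError if the motif is absent or a flank would be truncated."""
    promoter = promoter.upper()
    motif = motif.upper()
    pos = promoter.find(motif)
    if pos == -1:
        raise ValueError("motif not found in promoter sequence")
    end = pos + len(motif)
    if pos < flank or end + flank > len(promoter):
        raise ValueError("not enough flanking sequence around the motif")
    return promoter[pos - flank:end + flank]


if __name__ == "__main__":
    motif = "CCAAAATAGC"  # AALF SVP binding site quoted in the text
    # Hypothetical placeholder flanks, NOT the real promoter sequence.
    placeholder = "TTTT" + "GATTACAGGT" + motif + "TTGACCTGAA" + "GGGG"
    fragment = extract_with_flanks(placeholder, motif)
    print(fragment, len(fragment))
```

For a 10 bp motif this yields a 30 bp fragment, and for the 11 bp AAMF2 motif a 31 bp fragment, which is the sequence that would then be submitted for DNA model building.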

2.5 DNA - Protein interaction and prediction of binding affinities

The docking of B. juncea SOC1 promoter homeologs with B. juncea SVP protein homologs was accomplished using ‘The Easy interface’ of the HADDOCK webserver (https://alcazar.science.uu.nl/services/HADDOCK2.2/haddockserver-easy.html, De Vries et al. 2010) with default parameters. HADDOCK enables docking guided by specific amino acid residues and base pairs which are either predicted or known to have DNA or transcription factor binding properties, respectively. DNA binding sites predicted by I-TASSER were used as input in HADDOCK for specifying the active residues of the respective proteins. I-TASSER (Zheng et al. 2019) predicts additional features pertaining to protein structure, including secondary structure, probable function and ligand binding sites, employing COACH (Yang et al. 2013) and COFACTOR (Zhang et al. 2017) to predict potential ligand binding sites. In the case of the DNA models for promoter sequences, the core TFBS nucleotides were marked as active residues in the HADDOCK server. The lists of amino acid and nucleotide residues used as active residues for docking are given in Supplementary Table 1 and Supplementary Table 2, respectively.

HADDOCK output represents clusters of related docked models. For analysis of binding affinity and molecular interactions, most stable model from the cluster ranked as highest, as predicted by HADDOCK, was selected as the representative of each bimolecular SVP_protein: SOC1_promoter (SVP: pSOC1) interaction. To determine stability of the predicted models from the HADDOCK outputs and to compare relative binding strengths of the 45 different SVP: pSOC1 docked complexes, respective binding affinities were calculated. The docked complexes were used as inputs to PreDBA (http://predba.denglab.org/, Yang and Deng 2020). The output generated as ΔG values (Dissociation Gibbs Free Energy, Yang and Deng 2020) was indicative of corresponding binding affinities.
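The set of 45 pairwise SVP: pSOC1 combinations follows directly from pairing each of the 15 modelled SVP proteins with each of the 3 promoter models. The enumeration can be sketched as below; the protein and promoter names are those used in this manuscript, whereas the label format and function name are assumptions made for illustration.

```python
from itertools import product

# 9 natural SVP proteins (var. tumida and cv. Varuna) named in the text.
NATURAL_SVP = [
    "Bju_A04_SVP", "Bju_A09_SVP", "Bju_B01_SVP", "Bju_B02_SVP",
    "Bju_BjuB024255", "BjuVAR1_SVP", "BjuVAR2_SVP", "BjuVAR3_SVP",
    "BjuVAR4_SVP",
]
# 6 hypothetical domain-deletion proteins.
HYPOTHETICAL_SVP = [f"Hyp{i}_SVP" for i in range(1, 7)]
PROMOTERS = ["BjupSOC1_AALF", "BjupSOC1_AAMF1", "BjupSOC1_AAMF2"]


def docking_pairs(proteins, promoters):
    """All protein-promoter combinations, labelled 'protein:promoter'
    (the label format is an illustrative assumption)."""
    return [f"{prot}:{prom}" for prot, prom in product(proteins, promoters)]


if __name__ == "__main__":
    pairs = docking_pairs(NATURAL_SVP + HYPOTHETICAL_SVP, PROMOTERS)
    print(len(pairs))
    print(pairs[:3])
```

Each label corresponds to one docking run, from which the top-ranked cluster model is carried forward to binding affinity prediction.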

2.6 Molecular contact analysis and hotspot prediction

A comprehensive approach was undertaken to analyse the molecular contacts between modelled proteins and modelled DNA fragments utilising CCP4i (Potterton et al. 2003), DNAproDB (Sagendorf et al. 2017), PremPDI (https://lilab.jysw.suda.edu.cn/research/PremPDI/, Zhang et al. 2018) and DIMPLOT (Laskowski and Swindells 2011). The “Analyse Molecular Contacts” feature of CCP4i (Potterton et al. 2003) was employed with a threshold distance of 4.5 Å (Kuznetsov et al. 2006) to identify all possible contact points between SVP protein and B. juncea SOC1 promoter homeologs in each docked complex. The lists generated for molecular contacts were manually scrutinised to identify conserved interactions and the interacting partners, namely nucleotides within SOC1 promoter homeologs and amino acid residues within SVP proteins. All such conserved interactions were subsequently visualised using UCSF Chimera (Pettersen et al. 2004). The most critical set of amino acid residues establishing contacts with nucleotides in SVP: pSOC1 docked complexes was identified using PremPDI, which estimates the impact of mutations on DNA-protein interactions by calculating the change in binding affinity.
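The distance criterion underlying the contact analysis can be illustrated with a simple all-against-all search. The sketch below is not the CCP4i implementation; the atom labels and coordinates are hypothetical, and only the 4.5 Å threshold is taken from the text.

```python
import math

CUTOFF_ANGSTROM = 4.5  # threshold distance stated in the text


def find_contacts(protein_atoms, dna_atoms, cutoff=CUTOFF_ANGSTROM):
    """Return (protein atom, DNA atom, distance) for every pair of atoms
    whose Euclidean distance is within the cutoff.
    Each atom is a (label, (x, y, z)) tuple with coordinates in Angstrom."""
    contacts = []
    for p_label, p_xyz in protein_atoms:
        for d_label, d_xyz in dna_atoms:
            distance = math.dist(p_xyz, d_xyz)
            if distance <= cutoff:
                contacts.append((p_label, d_label, round(distance, 2)))
    return contacts


if __name__ == "__main__":
    # Hypothetical atoms, for illustration only.
    protein = [("ARG3:NH1", (0.0, 0.0, 0.0)), ("LYS5:NZ", (10.0, 0.0, 0.0))]
    dna = [("DA7:OP1", (3.0, 0.0, 0.0)), ("DT8:O4", (10.0, 4.0, 3.0))]
    for contact in find_contacts(protein, dna):
        print(contact)
```

Contact lists produced in this way for each docked complex can then be compared across complexes to separate conserved interactions from those unique to a given protein-promoter pair.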

2.7 In-vivo validation of DNA – Protein interaction

In-vivo validation of the bi-molecular interactions among B. juncea SVP proteins and SOC1 promoter homeologs was performed using yeast one-hybrid assays (Y1H, Matchmaker® Gold Yeast One-Hybrid Library Screening System), as per the manufacturer’s instructions. Fragments of B. juncea SOC1 promoter homeologs harbouring the TFBS corresponding to SVP (~250 bp) were amplified using homeolog-specific primers and cloned into KpnI (NEB) and XhoI (NEB) digested pAbAi vector to construct bait plasmids. Oligonucleotides 5’-GGTACCGTAACATATCATAACTCTAGCCTG-3’, 5’-GGTACCTTACAGTGGGGCATATAAGTAC-3’ and 5’-GGTACCCAGATTAATTACATGTAGGGCATG-3’ containing KpnI sites were used as forward primers, while 5’-CTCGAGAACCTCATCCTTTTACTTATTTTGG-3’, 5’-CTCGAGAAAGTCTTGAGAAAGACCAAG-3’ and 5’-CTCGAGATATGTGTCGAAAATATGATGGTCG-3’ containing XhoI sites were employed as reverse primers, to amplify SOC1 promoter fragments from BjupSOC1_AALF, BjupSOC1_AAMF1 and BjupSOC1_AAMF2, respectively. The nomenclature used henceforth for SOC1 promoter fragments harbouring the SVP TFBS is BjupSOC1_AALF_Frag, BjupSOC1_AAMF1_Frag and BjupSOC1_AAMF2_Frag. Yeast cells were transformed using BstBI (NEB) linearised bait plasmids specific to each SOC1 promoter fragment (250 bp), plated on SD/-Ura media (Takara Bio Inc.) and incubated at 30°C for 2–3 days. Positive yeast colonies were screened using PCR-based genotyping for the presence of the bait plasmid. Oligonucleotides 5’-TACCAATCTAAGTCTGTGCTC-3’ and 5’-GTGTATTTGTGTTTGCGTGTC-3’ were used as pAbAi vector specific forward and reverse primers, respectively, for confirmation of positive clones.

To determine the basal expression of AbA^r in the absence of prey, minimum inhibitory concentration of Aureobasidin A (Takara Bio Inc.) was determined for each bait yeast strain. This was achieved by spotting bait yeast cultures with OD₆₀₀ ranging from 1 to 0.00001, on SD/-Ura/+AbA plates containing different concentrations of AbA (100–1000 ng/ml) as per Rao and co-workers (Rao et al. 2022).
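The spotting scheme and the read-out of a minimum inhibitory concentration can be sketched as follows. Two points are assumptions made purely for illustration: the OD₆₀₀ range of 1 to 0.00001 is taken to be a tenfold serial dilution, and the growth observations in the example are hypothetical values, not experimental results.

```python
def tenfold_dilutions(steps: int = 6):
    """OD600 values from 1 down to 0.00001, assuming tenfold steps."""
    return [10.0 ** -k for k in range(steps)]


def minimum_inhibitory_concentration(growth_by_conc: dict):
    """Lowest AbA concentration (ng/ml) at which no growth was observed.
    growth_by_conc maps concentration -> True if the bait strain grew.
    Returns None if growth occurred at every tested concentration."""
    for conc in sorted(growth_by_conc):
        if not growth_by_conc[conc]:
            return conc
    return None


if __name__ == "__main__":
    print(tenfold_dilutions())
    # Hypothetical observations for one bait strain (not real data).
    observed = {100: True, 200: True, 500: False, 1000: False}
    print(minimum_inhibitory_concentration(observed))
```

The concentration identified in this way for each bait strain would then be used in the subsequent bait-prey interaction plates, so that growth reflects prey binding rather than basal reporter expression.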

B. juncea cv. Varuna SVP sequences were cloned into SmaI linearised pGADT7 to generate prey plasmids. Each bait yeast strain was co-transformed with individual prey plasmids for analysing pairwise SOC1 promoter and SVP protein interactions. Screening of yeast strains harbouring prey plasmids was performed on SD/-Leu media (Takara Bio Inc.). Positive yeast clones harbouring both bait and prey plasmids were confirmed by PCR using oligonucleotides 5’-AAGATACCCCACCAAACCC-3’ and 5’-GAAAGAAATTGAGATGGTGCAC-3’ as pGADT7 vector specific forward and reverse primers, respectively.

For analysis of the interaction between a given SOC1 promoter and SVP protein pair, cultures of individual yeast strains harbouring specific bait and prey plasmids, ranging in OD₆₀₀ from 1 to 0.00001, were spotted on SD/-Leu/+AbA plates containing specific concentrations of AbA. A range of yeast culture OD₆₀₀ values was spotted to quantitate the binding strength of SVP proteins with SOC1 promoter fragments. All experiments were performed in triplicate. Empty vectors (pAbAi and pGADT7) were employed as negative controls. An overview of the methodology followed for unravelling interactions among SOC1 promoters and SVP protein homeologs from B. juncea is provided in Fig. 1.

3.1 Genomes of Brassicaceae retain divergent SVP homeologous copies

To characterise copy-number and sequence-level variation among Brassicaceae SVP homologs, and to delineate divergence between B. juncea SVP homeologs, 50 sequences from 25 Brassicaceae species were analysed. These included 6 unannotated sequences corresponding to B. juncea var. tumida retrieved from BRAD, viz. Bju_A04_SVP (A04: 12160962..12163530), Bju_A09_1_SVP (A09: 49119658..49122837), Bju_A09_2_SVP (A09: 49449411..49452590), Bju_B01_SVP (B01: 19438290..19440691) and Bju_B02_SVP (B02: 6832951..6835995), and 5 cDNA sequences isolated from B. juncea cv. Varuna (BjuVAR1_SVP, BjuVAR2_SVP, BjuVAR3_SVP, BjuVAR4_SVP and BjuVAR5_SVP). A list of these homologs is provided (Supplementary Table 3). Details on genomic coordinates, progenitor and sub-genome of origin are also provided along with the proposed nomenclature. To understand the extent of variation among SVP homologs from Brassicaceae, CDS and protein sequences were surveyed for size and pairwise percentage identity. The size of the SVP sequences ranged from 1582 (Bra_BraA04g016520.3C) to 5870 bp (Bju_BjuB024255), while the proteins varied from 72 (Bju_B02_SVP) to 406 amino acids (Bju_BjuB024255) (Supplementary Table 3). Pairwise percentage identity at the nucleotide level among the 50 Brassicaceae SVP homologs ranged from 63 to 100%. Pairwise percentage identity at the protein level ranged from 65 to 100% (Supplementary Table 4a and 4b). Extremely low values for pairwise percentage identity (2.6% and 1.2% at the CDS and protein level, respectively) were attributable to size variation in SVP homologs. Among the SVP homologs from B. juncea, the pairwise percentage identities ranged from 15.9 to 100% and 15.5 to 100% at the CDS and protein level, respectively. The 2 tandemly organised SVP copies on chromosome A09, viz. Bju_A09_1_SVP and Bju_A09_2_SVP, were found to be identical at the nucleotide level. In general, size and sequence variation were observed in SVP proteins across Brassicaceae. In B. juncea, as many as 9 SVP homologs were identified which differed in both size and sequence, suggesting variation in predicted protein models.
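The genomic span of each unannotated locus follows from its coordinates. The following sketch parses the coordinate strings quoted above and computes the span, assuming 1-based inclusive coordinates (an assumption; the coordinate convention is not stated in the text). The function name is hypothetical.

```python
import re

# Coordinates of the unannotated B. juncea var. tumida loci, as quoted.
LOCI = {
    "Bju_A04_SVP": "A04: 12160962..12163530",
    "Bju_A09_1_SVP": "A09: 49119658..49122837",
    "Bju_A09_2_SVP": "A09: 49449411..49452590",
    "Bju_B01_SVP": "B01: 19438290..19440691",
    "Bju_B02_SVP": "B02: 6832951..6835995",
}

_COORD = re.compile(r"\s*(\w+):\s*(\d+)\s*\.\.\s*(\d+)\s*")


def span_length(coord: str):
    """Return (chromosome, span in bp) for a 'chr: start..end' string.
    Assumption: coordinates are 1-based and inclusive."""
    match = _COORD.fullmatch(coord)
    if match is None:
        raise ValueError(f"unrecognised coordinate string: {coord!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    return chrom, end - start + 1


if __name__ == "__main__":
    for name, coord in LOCI.items():
        print(name, *span_length(coord))
```

Under this assumption the two tandem copies on A09 span the same number of base pairs, which is consistent with their reported identity at the nucleotide level.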

3.2 Phylogenetic reconstruction establishes genome of origin identities of SVP variants

To determine sub-genome of origin and homeolog-based divergence, phylogenetic reconstruction was carried out using 50 SVP protein sequences (Fig. 2). Overall, SVP homologs formed clusters based on species of origin followed by sub-genome (LF, MF1 and MF2) with phylogeny recapitulating established species relationships.

Within Brassica specific clades, the homeologs were found to cluster as per their genome and sub-genome of origin, reflecting the triplicated nature of meso- and allopolyploid species. Within the homeolog specific clades, sequences diverged according to progenitor genomes (AA, BB and CC). Evidently, 2 prominent clades (I and II) can be discerned in the phylogram. Clade I comprises SVP sequences derived from non-Brassica species viz. Arabidopsis thaliana (Ath), Arabidopsis halleri (Aha), Arabidopsis lyrata (Aly), Descurainia sophia (Dso), Boechera retrofracta (BOERET), Boechera stricta (BOESTR), Capsella grandiflora (Cgr), Capsella rubella (Cru) and Camelina sativa (Csa). In contrast, Clade II primarily represents SVP homologs from Brassica species within sub-clade IIB. Assignment of sub-genome and homeolog affiliation of Bju_A04_SVP, Bju_A09_1_SVP, Bju_A09_2_SVP, Bju_B01_SVP, Bju_B02_SVP, BjuVAR1_SVP, BjuVAR2_SVP, BjuVAR3_SVP, BjuVAR4_SVP and BjuVAR5_SVP from B. juncea based on B. rapa, B. oleracea and B. napus is provided in Supplementary Table 3. Since the size of Bju_B02_SVP (marked as O) was exceptionally small (72 aa), its sub-genome identity could not be interpreted. Interestingly, in the majority of species with well-characterised sub-genome structures, only LF and MF1 specific homeologs were detected. The MF2 specific homeolog was identified exclusively in C. sativa. In summary, the phylogenetic reconstruction of Brassicaceae SVP sequences led to unambiguous assignment of sub-genome of origin of nine B. juncea SVP sequences.

Detailed sequence analysis of 50 SVP homologs from Brassicaceae with respect to natural variation in gene structure, gene polymorphism and presence or absence of domains showed considerable structural variation in SVP proteins. Analysis of nucleotide polymorphisms as Pi values (DnaSP) across 50 SVP homologs (Supplementary Fig. 1) revealed lower Pi values in regions corresponding to exons. Analysis of exon-intron splicing patterns, using the nine exons annotated in A. thaliana SVP as reference, revealed marked variation across the 50 SVP homologs (Fig. 3). The exon-intron structures of Brassicaceae SVP homologs mainly varied with respect to the differential presence of exons and introns (Fig. 3). For instance, the 1st intron is absent in Aar_AA_scaffold2954_20, Bna_ZS11A04G016220, Bra_BraA04g016520.3C and Tha_Thhalv10000342m, while the 7th and 8th exons are absent from Lal_LA_scaffold3554_2.mRNA1 and Aar_AA_scaffold2954_20, respectively. The 8th and 9th exons appear particularly variable across variants.

Analysis of a total of 9 SVP cDNA sequences isolated from B. juncea also led to the detection of splice variants. Instances of exon skipping, alternative donor splicing and partial intron retention were detected in SVP sequences isolated from B. juncea cv. Varuna (Fig. 4); exon skipping was observed in BjuVAR1_SVP and BjuVAR2_SVP, while alternative donor (AD) splicing along with exon skipping was observed in BjuVAR3_SVP and BjuVAR5_SVP. Interestingly, Bju_B02_SVP represented a much smaller variant (219 bp) comprising the 1st exon along with a partially retained intron. These data point to an interplay of complex interaction patterns among SOC1 promoters and SVP proteins in B. juncea, with splice-forms adding to the existing homolog complexity.
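The relationship between the 219 bp Bju_B02_SVP variant and its 72 amino acid protein can be verified with simple arithmetic, assuming that the 219 bp figure refers to an open reading frame that includes the stop codon (an assumption; this is not stated explicitly). The function name is hypothetical.

```python
def protein_length_from_cds(cds_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by a coding sequence of cds_bp bases.
    Assumption: the CDS is in frame and, if includes_stop is True,
    its final codon is a stop codon that contributes no residue."""
    if cds_bp <= 0 or cds_bp % 3 != 0:
        raise ValueError("CDS length must be a positive multiple of 3")
    codons = cds_bp // 3
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    print(protein_length_from_cds(219))
```

With 219 bp corresponding to 73 codons, one of which is the stop codon, the predicted length agrees with the 72 amino acids reported for Bju_B02_SVP.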

Analysis of conserved domains (Supplementary Table 5) for SVP protein homologs derived from Brassicaceae (Supplementary Figs. 2 and 3) revealed differential presence of MIKC domains. Pairwise percentage identity in the DNA binding MADS domain of SVP homologs ranged from 89.4 to 100% (Supplementary Table 6). Inclusion of SVP sequences encoding truncated or unusually large proteins widened the range to 73.6–100%. Based on the differences in MIKC domains, 3 size classes of SVP proteins were identified. These included full length forms with intact or partial MIKC domains, and other forms with skipped or duplicated I, K and C domains. The SVP proteins encoded by Aar_AA_scaffold2954_20, Bna_ZS11A04G016220, Bra_BraA04g016520.3C and Tha_Thhalv10000342m lack the MADS domain, correlating with a skipped first exon. Among the 6 SVP copies from B. juncea, Bju_B02_SVP (ChrB02: 6832951..6835995) lacks a K-domain. On the other hand, Bju_BjuB024255 contains 2 K-domains. In summary, B. juncea harbours highly diverse variants of SVP proteins with potentially altered binding potential with SOC1 promoter homeologs.

3.3 SVP proteins reveal substantial sequence-based structural variation

Models were generated for a total of 15 SVP proteins, including 9 natural and 6 hypothetical proteins. First, refined models were generated for 5 SVP homologs from B. juncea var. tumida (Bju_A04_SVP, Bju_A09_SVP, Bju_B01_SVP, Bju_B02_SVP and Bju_BjuB024255) using ab-initio and threading-based approaches. These were then employed as templates to generate models for the B. juncea cv. Varuna derived SVP proteins (BjuVAR1_SVP, BjuVAR2_SVP, BjuVAR3_SVP and BjuVAR4_SVP) and the hypothetical SVP proteins (Hyp1_SVP, Hyp2_SVP, Hyp3_SVP, Hyp4_SVP, Hyp5_SVP and Hyp6_SVP), based on percent sequence identity. The Hyp1_SVP sequence was generated from the natural protein Bju_B02_SVP, which harbours a MADS domain of 59 amino acids. Hyp1_SVP (60 amino acids) possesses a MADS domain identical to that of Bju_B02_SVP, except for the deleted RLGLGT (positions 61 to 66). Hyp2_SVP (93 amino acids) represents an intact M domain (72 amino acids) but is devoid of K- and C-domains. Similarly, Hyp3_SVP (77 amino acids) represents an intact M domain (72 amino acids) but lacks the I-, K- and C-domains. Hyp4_SVP (96 amino acids) represents an intact M domain (72 amino acids) and I-domain with only 3 bases from the K-domain. Hyp4_SVP is thus devoid of K- and C-domains, whereas Hyp5_SVP (171 amino acids) possesses intact M-, I- and K-domains but lacks a C-domain. Finally, Hyp6_SVP represents an intact M domain (72 amino acids) but is devoid of an I-domain. BjuVAR1_SVP, BjuVAR2_SVP and BjuVAR3_SVP exemplify splice variants from B. juncea cv. Varuna.

Protein models generated for 5 SVP sequences retrieved from BRAD are shown in Fig. 5. The sequence details are provided in Supplementary Table 3 (highlighted in yellow). The protein models were iteratively refined, energy minimised and cross-validated for stable secondary structure conformations and stereochemical parameters (Supplementary Table 7). The refined models of these SVP proteins depicted characteristic structural features including a loop region, followed by an α-helix (α1), 2 β-strands (β1 and β2) and α-helices (α2, α3 and α4), as shown in the magnified illustration of the model generated for Bju_A09_SVP (Fig. 5a). These prominent structural features are clearly visible in Bju_A04_SVP, Bju_A09_SVP, Bju_B01_SVP and Bju_BjuB024255 (Fig. 5b), with minor variations. Furthermore, the significantly distinct model of Bju_B02_SVP may be attributed to the absence of α3 and α4 helices, suggesting a plausible role in DNA binding. The detailed structural features of the MADS domain of the 5 proteins are depicted in Fig. 5c. As expected, the SVP proteins from B. juncea var. tumida (BRAD) represent structural features characteristic of the MEF-2A class of the MADS-box superfamily.

Using refined models of SVP proteins (Fig. 5) as templates, homology based modeling was undertaken to generate models for 4 B. juncea cv. Varuna specific SVP proteins isolated in the present study (BjuVAR1_SVP, BjuVAR2_SVP, BjuVAR3_SVP and BjuVAR4_SVP) (Supplementary Fig. 4). Identification of highest pair-wise identity values between B. juncea var. tumida in-silico translated SVP sequences and B. juncea cv. Varuna sequences formed the basis of template selection (Supplementary Table 4b). The refined model of Bju_B01_SVP was used as template for generating the models of BjuVAR3_SVP, while Bju_A09_SVP was employed as template for modelling BjuVAR1_SVP, BjuVAR2_SVP and BjuVAR4_SVP. The models generated for B. juncea cv. Varuna SVP proteins are represented in Supplementary Fig. 4a and the quality parameters of all generated protein models are provided in Supplementary Table 7. The prominent structural features represented in SVP proteins derived from B. juncea var. tumida (Fig. 5) were also apparent in BjuVAR3_SVP, BjuVAR4_SVP and BjuVAR5_SVP. Absence of α3 helix from SVP isoform BjuVAR1_SVP corresponds to a missing K-domain. The magnified view of secondary structural aspects of MADS domain of proteins is provided in Supplementary Fig. 4b.

To delineate the influence of individual domains (M-, I-, K-, and C-) on overall DNA–protein interactions, 6 hypothetical proteins (Hyp1_SVP, Hyp2_SVP, Hyp3_SVP, Hyp4_SVP, Hyp5_SVP, and Hyp6_SVP), representing differential presence of M-, I-, K-, and C-domains, were additionally modelled using SWISS-MODEL with Bju_A09_SVP as the template. The sequences of the hypothetical proteins along with their sizes and proposed nomenclature are provided in Supplementary Table 8, and the models generated along with the assessment scores are given in Supplementary Fig. 5a and Supplementary Table 7, respectively. The MADS domains of these modelled hypothetical SVP proteins are depicted in Supplementary Fig. 5b. The criteria employed to design the hypothetical proteins are shown in Supplementary Fig. 5c. The newly modelled proteins were markedly similar to their respective templates. Models of BjuVAR1_SVP, BjuVAR2_SVP and BjuVAR4_SVP and the 6 hypothetical SVP proteins closely resembled the Bju_A09_SVP template. Similarly, the model of BjuVAR3_SVP was similar to the template Bju_B01_SVP. Overall, a considerable degree of sequence-dependent natural structural variation was observed for B. juncea SVP proteins.

3.4. Nucleic acid models of homeologs of B. juncea SOC1 promoter fragments are structurally diverse

DNA models were generated using the 3D-DART server for B. juncea SOC1 promoter homeologs harbouring the SVP binding site. The fragment considered for BjupSOC1_AALF and BjupSOC1_AAMF1 was 30 bp long and harboured a 10 bp SVP binding site. For BjupSOC1_AAMF2, a 31 bp promoter fragment containing an 11 bp SVP binding site was considered for generating models. The models, visualized using Chimera, revealed major (Ma) and minor (Mi) grooves on the B-form double stranded DNA models (Fig. 6a). A 2D linear representation of BjupSOC1_AALF, BjupSOC1_AAMF1 and BjupSOC1_AAMF2 generated using DNAproDB is provided in Fig. 6b, to simultaneously depict the sequence and arrangement of nucleotides. The nucleotide positions are marked with arrows in the anti-parallel strands of DNA. For instance, the nucleotides are numbered as 1–30 and 31–60 on the plus and minus strand, respectively, of the BjupSOC1_AALF promoter fragment. Since the TFBS for SVP are present on the minus strands of promoter fragments, the nucleotides at position 11–20 and 11–21 correspond to SVP binding motifs in BjupSOC1_AALF/BjupSOC1_AAMF1 and BjupSOC1_AAMF2, respectively. The TFBS in each promoter homeolog is highlighted with blue background (Fig. 6b) and the corresponding regions are coloured in cyan in the 3D illustration (Fig. 6a). Analysis of 3D DNA models revealed considerable natural variation in several molecular parameters. These mainly included nucleotide position-wise information on H-bond length (Å) among atom pairs, local base-pair parameters viz. shift, slide, rise, tilt, roll and twist, and major and minor groove width, among others. Supplementary Tables 9–11 provide detailed information on the structural variability observed across DNA models of BjupSOC1_AALF, BjupSOC1_AAMF1 and BjupSOC1_AAMF2. The variation across base-pair and other atomic pair parameters was found interspersed across nucleotide positions 1 to 30 in the case of BjupSOC1_AALF/BjupSOC1_AAMF1 and 1 to 31 in the case of BjupSOC1_AAMF2. Detailed investigation revealed considerable variability at nucleotide positions 15, 16, 17 and 18.

3.5. Comparable binding affinity of docked complexes despite natural structural variation in SVP proteins and SOC1 promoter homeologs

To examine whether the structural variation exhibited in models of SVP proteins and SOC1 promoter fragments harbouring variable SVP binding sites influenced binding affinity between regulatory pairs, docking studies were undertaken. In total, 45 bimolecular interactions among 3 promoters (BjupSOC1_AALF, BjupSOC1_AAMF1 and BjupSOC1_AAMF2) and 15 SVP proteins were analysed. These included 9 natural proteins (Bju_A04_SVP, Bju_A09_SVP, Bju_B01_SVP, Bju_B02_SVP, Bju_BjuB024255, BjuVAR1_SVP, BjuVAR2_SVP, BjuVAR3_SVP and BjuVAR4_SVP) and 6 hypothetical sequences (Hyp1_SVP, Hyp2_SVP, Hyp3_SVP, Hyp4_SVP, Hyp5_SVP and Hyp6_SVP). DNA and protein binding residues from SVP proteins and SOC1 promoter homeologs, respectively, were specified as active residues while performing docking. Despite the identical sequence of the MADS domain, the DNA binding residues predicted by I-TASSER for Bju_A09_SVP were 134G, 138V, 139I, 141T, 142K, 143S and 145K. In contrast, 2A, 3R, 4E, 6I, 20T, 23K, 24R, 27G, 30K and 31K were predicted for Bju_A04_SVP, Bju_B01_SVP, Bju_B02_SVP and Bju_BjuB024255. To resolve this, COACH analysis was performed exclusively for Bju_A09_SVP, which led to the identification of 2A, 3R, 4E, 6I, 20T, 23K, 24R, 27G, 30K and 31K as DNA binding residues. Therefore, docking studies for Bju_A09_SVP were repeated using both sets of predicted DNA binding residues, increasing the total number of B. juncea SVP: pSOC1 biomolecular interactions to 48.
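The count of 48 follows from 15 proteins × 3 promoters = 45, plus 3 repeat dockings of Bju_A09_SVP with its second residue set. The bookkeeping can be sketched as below; the labels `set_1` and `set_2` are hypothetical tags introduced here only to distinguish the two residue sets and are not names used by the docking software.

```python
from itertools import product

promoters = ["BjupSOC1_AALF", "BjupSOC1_AAMF1", "BjupSOC1_AAMF2"]
natural = [
    "Bju_A04_SVP", "Bju_A09_SVP", "Bju_B01_SVP", "Bju_B02_SVP",
    "Bju_BjuB024255", "BjuVAR1_SVP", "BjuVAR2_SVP", "BjuVAR3_SVP",
    "BjuVAR4_SVP",
]
hypothetical = [f"Hyp{i}_SVP" for i in range(1, 7)]

# One docking run per (protein, promoter) pair: 15 x 3 = 45.
runs = [
    (protein, promoter, "set_1")
    for protein, promoter in product(natural + hypothetical, promoters)
]

# Bju_A09_SVP is docked a second time with its alternative residue set: +3.
runs += [("Bju_A09_SVP", promoter, "set_2") for promoter in promoters]

natural_only = [r for r in runs if r[0] in natural and r[2] == "set_1"]
print(len(natural_only), len(runs))
```

The 27 complexes analysed in detail later correspond to the 9 natural proteins × 3 promoters subset.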

The HADDOCK outputs are depicted as multiple clusters of similar models representing diverse structural conformations of the docked complexes. The most stable representative model for each B. juncea SVP: pSOC1 docked complex is provided (Supplementary Fig. 6–8). As expected, all SVP proteins were found to interact with SOC1 promoter homeologs, with either of the strands of the double stranded promoter region via DNA binding MADS domain. Moreover, the promoter region (shown as green strands) flanking the SVP TFBS (shown as blue coloured strands) is also observed to interact with SVP proteins, indicating their significance in DNA-protein interaction.

To analyse the binding affinity of the 48 SVP: pSOC1 docked complexes, Gibbs Free Energy of Dissociation (ΔG) values were calculated using PreDBA. The ΔG values (kcal/mol), representative of the binding affinities of the respective SVP: pSOC1 complexes, are provided as a heat map in Fig. 7. In the 27 docked complexes generated from natural promoters and proteins, significant conservation of pair-wise binding affinities was uncovered despite the structural diversity in individual promoters and SVP proteins. Specifically, the 3 promoter homeologs viz. BjupSOC1_AALF, BjupSOC1_AAMF1 and BjupSOC1_AAMF2 exhibited comparable binding affinity to all but one of the natural SVP proteins (Bju_A04_SVP, Bju_A09_SVP, Bju_B01_SVP, Bju_BjuB024255, BjuVAR1_SVP, BjuVAR2_SVP, BjuVAR3_SVP and BjuVAR4_SVP). Unexpectedly, the ΔG values of the 3 promoters with the Bju_BjuB024255 SVP protein, which harbours an additional K-domain, also indicate similar binding affinities. Bju_B02_SVP was the exception, showing a considerably less negative ΔG for BjupSOC1_AAMF1 (-8.93 kcal/mol) than for BjupSOC1_AALF (-11.99 kcal/mol) and BjupSOC1_AAMF2 (-11.82 kcal/mol).
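To put differences of a few kcal/mol in perspective, a predicted free energy can be converted to an equilibrium dissociation constant through ΔG = RT ln(Kd). The sketch below is only illustrative: it assumes the reported values are binding free energies (more negative meaning tighter binding), a 1 M standard state, and a temperature of 298.15 K, none of which is stated in this section. The function name `dissociation_constant` is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def dissociation_constant(delta_g_kcal: float, temperature_k: float = 298.15) -> float:
    """Kd in mol/L from a binding free energy, using dG = RT ln(Kd).

    Assumes delta_g_kcal is negative for favourable binding.
    """
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


# Predicted values quoted above for Bju_B02_SVP with each promoter homeolog.
bju_b02 = {
    "BjupSOC1_AALF": -11.99,
    "BjupSOC1_AAMF1": -8.93,
    "BjupSOC1_AAMF2": -11.82,
}

for promoter, dg in bju_b02.items():
    print(f"{promoter}: dG = {dg} kcal/mol, Kd ~ {dissociation_constant(dg):.2e} M")
```

Because Kd depends exponentially on ΔG, a gap of about 3 kcal/mol corresponds to a difference of roughly two orders of magnitude in Kd at this temperature; the same relation also makes explicit how a change in temperature shifts Kd for a fixed ΔG.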

Exceptionally high binding affinities were predicted for Bju_B02_SVP, a truncated protein comprising only the MADS domain, with BjupSOC1_AALF (-11.99 kcal/mol), BjupSOC1_AAMF1 (-8.93 kcal/mol) and BjupSOC1_AAMF2 (-11.82 kcal/mol), respectively. This pointed to a stabilising effect conferred by the I-, K- and C-domains. Analysis of binding strengths of hypothetical SVP proteins with differential presence of M-, I-, K- and C-domains revealed interesting patterns. Hyp1_SVP, Hyp2_SVP, Hyp3_SVP and Hyp4_SVP, which lack both K- and C-domains, showed significantly higher binding affinities to BjupSOC1_AALF (-11.99 kcal/mol), BjupSOC1_AAMF1 (-8.93 kcal/mol) and BjupSOC1_AAMF2 (-11.82 kcal/mol) relative to SVP proteins which retained either or both of these domains (Fig. 7). This observation is consistent with the energy patterns observed for the truncated natural protein Bju_B02_SVP, which lacks both K- and C-domains. These data highlight the impact of the joint absence of K- and C-domains on the binding potential of SVP. Interestingly, the I-domain was found to have an insignificant effect on the binding affinities of the SVP proteins to the SOC1 promoter homeologs. Though Bju_B02_SVP, Hyp1_SVP and Hyp3_SVP lacked an I-domain, their binding strengths were comparable to those of Hyp2_SVP and Hyp4_SVP, which possessed the I-domain. Broadly, the data suggest that despite natural structural variation in SVP proteins and SOC1 promoters, the binding potential has remained preserved.

3.6. Unique and shared binding patterns underpin molecular interactions among SOC1 promoters and SVP proteins

To fine-map bi-molecular interaction patterns, types of molecular contacts and contact residues were identified in 27 SVP: pSOC1 docked complexes between 9 SVP proteins and 3 SOC1 promoter homeologs from B. juncea. The molecular contacts were screened for variability with respect to non-covalent interactions such as hydrogen bonds (2.2–3.6 Å), π-π stacking (3.6–5 Å), Van der Waals (0.3–0.6 Å) and other hydrophobic interactions. To capture π-π stacking, the cut-off distance was extended to 5 Å. A list of crucial non-covalent interactions is provided in Supplementary Table 12. The amino acid residues involved in these bonds are marked as ‘interacting amino acid residues’. As the Bju_B02_SVP protein was predicted to have the strongest binding affinity with the 3 B. juncea SOC1 promoter homeologs, the molecular interactions in Bju_B02_SVP: BjupSOC1_AALF, Bju_B02_SVP: BjupSOC1_AAMF1 and Bju_B02_SVP: BjupSOC1_AAMF2 were compared mainly with those between a representative protein, Bju_A04_SVP, and the 3 B. juncea SOC1 promoter homeologs. The stabilising bonds formed between each SOC1 promoter homeolog and Bju_B02_SVP and Bju_A04_SVP are shown in Fig. 8a-c and 8d-e, respectively.

To delineate plausible conservation patterns of the residues establishing contacts with nucleotides from the 3 B. juncea SOC1 promoter homeologs, hydrogen bonds, hydrophobic interactions, π-π stacking and Van der Waals forces were compared (Supplementary Table 12). SVP protein specific amino acid residues mediating hydrogen bond interactions were identified for all B. juncea SVP proteins. For instance, the residues GLN7, LYS10, ARG24 and LYS31 from Bju_B02_SVP, and ARG3, GLU4, LYS5 and GLN7 from Bju_A04_SVP, were involved in the formation of hydrogen bonds with the 3 B. juncea SOC1 promoter homeologs.

Hydrophobic interactions were also detected in all docked complexes between SVP proteins and SOC1 promoter homeologs from B. juncea, except for Bju_BjuB024255: BjupSOC1_AAMF1 and Bju_BjuB024255: BjupSOC1_AAMF2. However, conservation of SVP specific amino acid residues forming hydrophobic interactions was observed only for Bju_B02_SVP, Bju_A04_SVP and BjuVAR2_SVP. Amino acid residues LYS5 and ARG3 from Bju_A04_SVP and BjuVAR2_SVP, respectively, were found to form hydrophobic interactions with all 3 B. juncea SOC1 promoter homeologs. Likewise, the Bju_B02_SVP specific residues GLN7 and ARG9 are involved in hydrophobic interactions with all 3 B. juncea SOC1 promoter homeologs. Furthermore, π-π stacking was also observed, albeit not for all 27 docked complexes. Conservation of SVP protein specific π-π stacking was found only for the Bju_A04_SVP, Bju_A09_SVP, Bju_BjuB024255 and BjuVAR1_SVP proteins. Interestingly, PHE was mainly responsible for π-π stacking, except for Bju_A09_SVP, where TYR was the key residue. PHE at the 21st and 48th positions in Bju_A04_SVP, at the 21st and 29th positions in Bju_BjuB024255 and at the 29th position in BjuVAR1_SVP was found to be involved in π-π stacking with all 3 B. juncea SOC1 promoter homeologs. In the case of Bju_A09_SVP, TYR at position 152 was identified as the conserved residue involved in π-π stacking. Overall, the interacting amino acid residues from AA sub-genome specific SVP homeologs were found to be involved in all 4 categories of non-covalent interactions; however, a few BB sub-genome specific SVP homeologs displayed a distinct pattern as these did not establish π-π stacking.
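Identifying residues that contact all 3 promoter homeologs amounts to intersecting the per-complex contact lists. A minimal sketch is given below, assuming the contacts for one protein have already been extracted per promoter. The function name `conserved_contacts` is hypothetical, and the entries `RES_X1` and `RES_X2` are made-up placeholders standing in for complex-specific contacts; only GLN7, LYS10, ARG24 and LYS31 are taken from the hydrogen-bond residues reported above for Bju_B02_SVP.

```python
def conserved_contacts(contacts_by_promoter: dict) -> set:
    """Return residues that appear in the contact list of every promoter."""
    residue_sets = [set(residues) for residues in contacts_by_promoter.values()]
    if not residue_sets:
        return set()
    return set.intersection(*residue_sets)


# Illustrative input only: RES_X1 and RES_X2 are placeholders, not real data.
bju_b02_hbond_contacts = {
    "BjupSOC1_AALF": {"GLN7", "LYS10", "ARG24", "LYS31", "RES_X1"},
    "BjupSOC1_AAMF1": {"GLN7", "LYS10", "ARG24", "LYS31", "RES_X2"},
    "BjupSOC1_AAMF2": {"GLN7", "LYS10", "ARG24", "LYS31"},
}

print(sorted(conserved_contacts(bju_b02_hbond_contacts)))
```

The same intersection can be run separately per interaction type (hydrogen bond, hydrophobic, π-π stacking) to reproduce the kind of per-category comparison described in this subsection.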

The set of interacting nucleotides in each docked complex was also compared to examine conservation in pattern. Notably, base pairs at positions 15, 16, 17 and 18 of the SVP specific TFBS within the double stranded B. juncea SOC1 promoter homeologs were found to be involved in interactions in all 48 docked complexes (Supplementary Fig. 9a and b). It was interesting to note that the nucleotides at positions 15, 17 and 18 were not conserved across the 3 promoter homeologs. Nevertheless, these were still predicted to interact with the corresponding proteins. A representation of the SVP specific binding site along with 10 bp flanking sequence on BjupSOC1_AALF, BjupSOC1_AAMF1 and BjupSOC1_AAMF2 highlights nucleotide variation at positions 15, 16, 17 and 18 of the unaligned SVP TFBSs (Supplementary Fig. 9c). The figure also depicts the frequency of occurrence of nucleotides at specific positions in the SVP TFBSs from the 3 B. juncea SOC1 promoter homeologs, as generated using the WebLogo server (https://weblogo.berkeley.edu/). Evidently, the nucleotides at positions 17 and 18 were found to tolerate naturally occurring transversions, since no steric hindrance-based impact was apparent in the interaction patterns. Since nucleotide positions 15, 16, 17 and 18 on promoters BjupSOC1_AALF, BjupSOC1_AAMF1 and BjupSOC1_AAMF2 refer to positions on double stranded DNA, the regions of interaction include the complementary nucleotides.
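For readers less familiar with the terminology, a substitution between two purines (A, G) or two pyrimidines (C, T) is a transition, whereas a purine-pyrimidine exchange is a transversion. The classification can be sketched as follows; the function name `classify_substitution` is hypothetical and the example calls are generic, not the actual substitutions at positions 17 and 18.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-nucleotide DNA substitution.

    Returns 'identical', 'transition' (purine<->purine or
    pyrimidine<->pyrimidine) or 'transversion' (purine<->pyrimidine).
    """
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("expected one of A, C, G, T")
    if ref == alt:
        return "identical"
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


for ref, alt in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
    print(ref, "->", alt, classify_substitution(ref, alt))
```

Transversions swap a two-ring base for a one-ring base (or vice versa), which is why they are the more likely of the two to alter local geometry at a contact site.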

The conserved pattern of interacting residues was investigated by superposition of docked complexes of a specific protein with the three individual SOC1 promoter homeologs. The superposed models of the docked complexes with the highest docking affinity, viz. Bju_B02_SVP: BjupSOC1_AALF, Bju_B02_SVP: BjupSOC1_AAMF1 and Bju_B02_SVP: BjupSOC1_AAMF2, are given in Fig. 9a. This superposition demonstrates the interactions (Table 3) made by the Bju_B02_SVP specific conserved residues GLN7, LYS10 and ARG24 with all 3 SOC1 promoter homeologs. Likewise, superposed models of Bju_A04_SVP: BjupSOC1_AALF, Bju_A04_SVP: BjupSOC1_AAMF1 and Bju_A04_SVP: BjupSOC1_AAMF2 are given in Fig. 9b, which demonstrates the interactions made by the Bju_A04_SVP specific conserved residues ARG3, GLU4 and LYS5 with all 3 SOC1 promoter homeologs. These data highlight the preservation of interacting residues despite the structural differences in B. juncea SOC1 promoter homeologs.

To analyse the overall binding pattern at the B. juncea SVP: pSOC1 interaction interface, 2D illustrations of a representative group comprising 27 docked complexes between 9 natural SVP proteins from B. juncea var. tumida and B. juncea cv. Varuna (Supplementary Table 3) and 3 B. juncea SOC1 promoter homeologs were generated using DNAproDB. The 2D illustrations facilitate clear depiction of the interface features. The DNAproDB representation of the Bju_A04_SVP: BjupSOC1_AALF complex is shown (Fig. 10), while the remaining are provided in Supplementary Fig. 10. The DNA-protein interface was manually selected to display DNA moieties viz. major and minor groove, nucleoside, pentose and phosphate moieties. The secondary structure elements (SSE) such as loops, strands and helices are marked. In the bimolecular interaction between Bju_A04_SVP and BjupSOC1_AALF (Fig. 10), the nucleotides corresponding to the SVP TFBS were involved in establishing contact with the SVP protein, as expected. However, the interacting nucleotides at positions 14, 15, 16, 17, 18 and 41, 42, 43, 48, 49, 50 are present on opposite DNA strands (Fig. 10a). Furthermore, the nucleotides at positions 51 and 52, flanking the TFBS, were also involved in interaction with Bju_A04_SVP. Additionally, most amino acid residues were observed to bind in the major groove of DNA and make contacts with sugar moieties and phosphate groups of the nucleotides (Fig. 10b). A detailed graph depicting explicit interactions of specific amino acid residues with different nucleotides along the length of the major groove of DNA is also provided in Fig. 10c. The secondary structure to which each interacting amino acid belongs is denoted by a symbol (circle for helix, triangle for strand, square for loop). The relative size of these symbols, on the other hand, denotes the number of interactions made by the residues. Similar trends were observed for the other 14 SVP: pSOC1 interactions (Supplementary Fig. 10).
The type and number of interactions in each of the 27 docked complexes are depicted as 2D representations in DIMPLOT (Fig. 11; Supplementary Fig. 11). Overall, the results from DNAproDB and DIMPLOT analysis corroborate the Chimera visualisation. Clearly, nucleotides from both DNA strands mediate the interactions with the SVP proteins. Involvement of nucleotide residues adjoining the TFBS in DNA-protein interaction is also confirmed.

To validate critical amino acid residues stabilising each of the 48 SVP: pSOC1 complexes in B. juncea, hotspots were identified using PremPDI server. These hotspots are italicised in Table 3. Expectedly, most of the predicted hotspots for each docked complex comprised of residues mapping to the MADS domain. Analysis of entire list of hotspots led to the detection of at least one amino acid hotspot as conserved for each SVP protein irrespective of the SOC1 promoter homeolog that these interacted with. BjuVAR3_SVP was found to be an exception. A diagrammatic representation of homeolog-wise conserved hotspots is also provided (Fig. 12).

3.7. Validation of binding affinity preservation by in-vivo yeast one-hybrid analyses

A total of 8 interactions between 2 B. juncea SOC1 promoter homeologs and 4 SVP proteins were analysed using yeast one-hybrid assays. The minimum inhibitory concentration of Aureobasidin A was found as 250 ng/ml for BjupSOC1_AALF_Frag, BjupSOC1_AAMF1_Frag and yeast strain harbouring empty pAbAi (Fig. 13a).

Yeast one-hybrid assays confirmed the binding of BjuVAR1_SVP, BjuVAR3_SVP, BjuVAR4_SVP and BjuVAR5_SVP proteins to BjupSOC1_AALF_Frag and BjupSOC1_AAMF1_Frag. Spotting the cultures of yeast harbouring specific bait and prey plasmids, with OD ranging from 1 to 0.00001, on SD/-Leu/+AbA plates facilitated quantification of the binding strength of SVP proteins with SOC1 promoter fragments. The binding strength of BjuVAR1_SVP (10^−5 dilution) and BjuVAR3_SVP (10^−5 dilution) proteins with BjupSOC1_AALF_Frag was found to be greater than that of BjuVAR4_SVP (10^−3 dilution) and BjuVAR5_SVP (10^−2 dilution) with BjupSOC1_AALF_Frag (Fig. 13b). However, the binding strengths of BjuVAR1_SVP, BjuVAR3_SVP, BjuVAR4_SVP and BjuVAR5_SVP proteins to BjupSOC1_AAMF1_Frag were found to be comparable (Fig. 13c). The negative controls for each interaction are depicted in Fig. 13d. Since the minimum inhibitory concentration for BjupSOC1_AAMF2_Frag could not be achieved, interaction analyses of SVP proteins with BjupSOC1_AAMF2 could not be performed. Overall, the yeast one-hybrid results validated the preservation of binding potential indicated by docking analyses.
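The spotting readout can be tabulated as a ten-fold dilution series and a per-protein dilution value. The sketch below is a simple illustration under one assumption that is not stated explicitly above: that the dilution quoted for each protein is the most dilute spot at which growth was still scored on BjupSOC1_AALF_Frag, so that a more negative exponent indicates a stronger interaction. The variable names are hypothetical.

```python
# Ten-fold dilution series from OD 1 down to OD 0.00001.
dilution_series = [10.0 ** -k for k in range(0, 6)]

# Exponent of the dilution quoted for each protein with BjupSOC1_AALF_Frag.
# Assumption: this is the most dilute spot that still showed growth.
growth_limit_exponent = {
    "BjuVAR1_SVP": -5,
    "BjuVAR3_SVP": -5,
    "BjuVAR4_SVP": -3,
    "BjuVAR5_SVP": -2,
}

# Rank from strongest (growth at the highest dilution) to weakest.
ranked = sorted(growth_limit_exponent.items(), key=lambda item: item[1])

for protein, exponent in ranked:
    print(f"{protein}: growth down to 10^{exponent}")
```

Under this reading, BjuVAR1_SVP and BjuVAR3_SVP tie at the top of the ranking, matching the ordering described in the text.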

The Brassica species complex represented in U’s triangle comprises diploids and amphidiploids (Nagaharu 1935). The unique ancestry marked by a Brassica lineage-specific whole genome triplication, in a two-step WGD event, followed by gene loss and natural allo-polyploidization, presents an interesting model for investigating natural variation in patterns of molecular interactions within regulatory networks (Van De Peer et al. 2017). Polyploid Brassicas possess a characteristic genome architecture wherein Least- (LF), Moderately- (MF1) and Most- (MF2) fractionated sub-genomes may reflect differential retention of gene copies, termed homeologs (Cheng et al. 2012, 2018). In amphidiploids such as B. juncea (AABB), an additional level of gene redundancy is contributed by natural hybridization between the triplicated progenitor genomes B. rapa (AA) and B. nigra (BB) (Cheng et al. 2014; Kang et al. 2021; Nagaharu 1935). Under relaxed selection pressure, paralogs and homeologs undergo diversification by acquiring natural mutations, resulting in morphological innovations. Relative to A. thaliana, in B. juncea, complex patterns of interactions manifest among gene copies encoding cognate partners to mediate biological functions. Consequently, mutations altering crucial contact residues that mediate molecular interactions, whether DNA: protein or protein: protein, can potentially result in modification of phenotypes. With this perspective, we used the polyploid B. juncea system to develop a mechanistic understanding of molecular interactions among promoters and upstream proteins in polyploid Brassicas, and to rationalise the integration of structural insights in precision crop engineering.

At the outset, SVP sequences were characterised to select representative homologs for structural studies. Particularly, we examined patterns of variation among sub-genome specific homeologs and isoforms. Phylograms can be used for delineating sub-genome of origin identities of homologs derived from uncharacterized genomes such as B. juncea cv. Varuna, and, unannotated sequences retrieved from fully sequenced genomes (Jain et al. 2018; Singh and Singh 2021). Application of Bayesian based phylogenetic reconstructions of 50 SVP proteins from 25 Brassicaceae species led to successful assignment of sub-genome identities of SVP sequences isolated from B. juncea cv. Varuna and B. juncea var. tumida (BRAD). The sub-genome identity of SOC1 copies of B. juncea cv. Varuna were already reported in our previous studies (Sri et al. 2015). Promoter analysis revealed differential repertoire of transcription factor binding sites including binding motifs for SVP among sub-genome specific homeologs (Sri et al. 2020).

Alternative splicing is known to result in expansion of proteome diversity (Wang et al. 2014; Xu et al. 2022). We identified several SVP isoforms representing instances of exon skipping, alternative donor and partial intron retention. These data clearly indicated that protein diversity in B. juncea was contributed not only by variation in sequences of SVP homologs but also splice forms originating from homologs. MADS domain mediates binding to DNA, whereas K- and C- domains are implicated in dimerization and mediating higher order assemblies (Kaufmann et al. 2005; Gramzow et al. 2010). The MADS domain of MEF-2A type is organized as three-layers; the DNA binding layer includes an N-terminal random coil (N-extension, 12 amino acids) and an α-helix. This is followed by 2 β-strands (βI, βII) joined by a β-hairpin (Santelli and Richmond 2000; Han et al. 2003). In the current study, we discovered a large array of SVP variants with differential presence of M-, I- K- and C- domains. In addition, natural isoforms with additional or missing K-domains were discovered. Overall, it may be inferred that B. juncea harbours a large array of natural variants of SVP proteins and SOC1 promoters suggesting complex interaction patterns among structural variants.

Next, we generated protein models of natural proteins from B. juncea cv. Varuna and B. juncea var. tumida along with 6 hypothetical proteins as controls. A combination of threading, ab-initio and homology-dependent modelling was employed to generate, refine and describe protein models, which displayed widespread structural variation. Although the MADS domain sequence was conserved in all the SVP proteins, the corresponding protein models varied markedly in structural characteristics. Notwithstanding, all SVP proteins were found to be in accordance with the established MEF-2A structure. The structural variation observed in SVP proteins could be attributed to naturally occurring mutations manifesting as amino acid substitutions, insertions and deletions. For instance, superposition of Bju_A09_SVP and Bju_A04_SVP (RMSD 20.263) displayed a bend in the α3 helix, possibly due to amino acid substitutions viz. Asn94/His94, Ser95/Ala95 and Arg96/Leu96, in Bju_A04_SVP and Bju_A09_SVP, respectively. Additionally, the amino acid residues ‘GMKLMDENKRLRQH’ at positions 157 to 170 in Bju_A04_SVP are missing in Bju_A09_SVP, which significantly impacts the overall secondary structure of the 2 proteins. Similarly, superposition of Bju_A09_SVP and Bju_B02_SVP (RMSD 1.899) revealed a bend in the α3 helix despite alignment of the α1 helix and the β1 and β2 strands. This could be attributed to variation in amino acids at positions 61 to 72, where ‘SMREVLERHNLQ’ in Bju_A09_SVP is replaced with ‘RLGLGTFSRRND’ in Bju_B02_SVP. Likewise, superposition of the Bju_A09_SVP and Bju_B01_SVP models (RMSD 13.871) revealed a conformation change in the α5 helix, associated with substitution of amino acids at positions 143 to 156, from ‘SEKIMNEISYLQRK’ in Bju_A09_SVP to ‘GMQLMDENKRLRQQ’ in Bju_B01_SVP. Thus, we could show that sequence variation explained variation in SVP protein structures.
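The RMSD values quoted for the superpositions summarise the average displacement between matched atoms of two models. As a minimal sketch, and assuming the two structures have already been superposed and their atoms paired one to one (the optimal-superposition step performed by structure viewers is not reproduced here), the quantity can be computed as below. The function name `rmsd` and the toy coordinates are illustrative only.

```python
import math


def rmsd(coords_a, coords_b) -> float:
    """Root-mean-square deviation between two equally sized lists of
    (x, y, z) coordinates, assumed to be already superposed and paired."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


# Toy example: every atom displaced by 3 units along z.
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
print(rmsd(model_a, model_b))
```

Because the deviations are squared before averaging, a single badly displaced region (for example an extra or missing helix) can dominate the value even when the remaining elements align closely.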

Next, we generated the models for hypothetical SVP proteins using refined models of SVP proteins from B. juncea var. tumida as templates. Homology-dependent modelling of structures of related proteins, once refined models are available as templates, has been reported previously. Duminil and co-workers (2021) predicted the model of A. thaliana iPGAMs (independent Phosphoglycerate mutases) using the resolved X-ray diffraction-based structure of phosphoglycerate mutase from Leishmania mexicana (3IGY) as template (Duminil et al. 2021), while Madhurantakam and Mazumdar (Tyagi et al. 2018) modelled A. thaliana FD protein employing the X-ray diffraction resolved structure of C/EBPbeta bZIP homodimer (V285A) as a template.

In A. thaliana, the binding sites for SVP were mapped using ChIP assays (Li et al. 2008; Immink et al. 2012; Tao et al. 2012). These sequences were used to identify SVP binding sites on B. juncea SOC1 promoter homeologs (Sri et al. 2020). Promoter fragments (30bp), harbouring SVP binding site, were used for generating DNA models to determine sequence dependent structural variation in conformation of B. juncea SOC1 promoter fragments. The 3D DNA models were found to be highly variable in key rigid-body parameters such as H-bond length (Å) among several atom pairs, local base-pair parameters viz. shift, slide, rise, tilt, roll and twist, major and minor groove width among others. Interestingly, considerable natural structural variation was also observed in nucleotide position 15, 16, 17, 18 on minus strand and 43, 44, 45, 46 and 45, 46, 47, 48 on plus strand of BjupSOC1_AALF/BjupSOC1_AAMF1 and BjupSOC1_AAMF2, respectively. These nucleotides are identified as involved in establishing molecular contacts with SVP proteins in docking studies. Additionally, nucleotides in the sequence flanking TFBS were also involved in interactions with SVP proteins.
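The paired positions quoted above are consistent with a duplex numbering in which one strand is numbered 1 to N and the complementary, anti-parallel strand continues from N+1 to 2N, so that position i pairs with 2N + 1 − i. This relation is inferred here from the reported numbers rather than stated by the authors, and the function name `paired_position` is hypothetical; the sketch makes no assumption about which strand is labelled plus or minus.

```python
def paired_position(position: int, fragment_length: int) -> int:
    """Index of the base paired with `position` in a duplex numbered
    1..N on one strand and N+1..2N on the anti-parallel strand."""
    if not 1 <= position <= 2 * fragment_length:
        raise ValueError("position outside the duplex numbering")
    return 2 * fragment_length + 1 - position


# 30 bp fragments (BjupSOC1_AALF / BjupSOC1_AAMF1) and the 31 bp
# fragment (BjupSOC1_AAMF2), for the variable positions 15 to 18.
for length in (30, 31):
    partners = sorted(paired_position(p, length) for p in range(15, 19))
    print(length, partners)
```

For N = 30 this gives partners 43 to 46, and for N = 31 it gives 45 to 48, matching the positions listed in the text.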

Structural variants of SOC1 promoters and SVP proteins interact with differential binding strengths. A total of 48 docked complexes were subsequently modelled to analyse binding patterns among SOC1 promoters and SVP proteins. These included 27 docked complexes generated using 9 natural SVP proteins and 3 SOC1 promoter homeologs from B. juncea. Inclusion of 6 hypothetical SVP proteins, together with the repeat docking of Bju_A09_SVP using its second set of predicted DNA binding residues, increased the total number of docked complexes to 48. Contrary to expectations, binding affinities were found to be largely conserved despite the striking structural variation observed in both proteins and DNA. Possibly, the interacting amino acid and nucleotide residues involved in stabilising molecular contacts are evolutionarily conserved. Alternatively, compensatory mutations may have preserved the stability of the structures. Further investigations revealed co-existence of both of these models, with compensatory mutations playing a greater role in preserving the overall binding affinity.

In-vivo yeast one-hybrid assays validated the binding patterns observed in most docked complexes, with a few exceptions. Although BjuVAR4_SVP and BjuVAR5_SVP showed stronger interaction with BjupSOC1_AAMF1_Frag relative to BjupSOC1_AALF_Frag in yeast one-hybrid assays, binding affinity was found to be similar in docking analysis. Since the bait fragments were large (250 bp), multiple TFBS in the adjoining regions may have interacted with upstream regulators under in-vivo conditions, diluting the individual effect of the SVP binding site. In contrast, the 30 bp promoter fragments used for docking analyses examined the binding potential of the SVP binding site only.

Interestingly, binding affinity was highest for all docked complexes involving truncated SVP. The partial MADS domain of truncated SVP may compete with full-length proteins for SVP binding sites and thereby modulate phenotypes in a dominant negative manner. Co-existence of gene copies and isoforms encoding full-length and truncated proteins has previously been reported in B. juncea for both SOC1 (Sri et al. 2020) and FLC (unpublished data). In A. thaliana, truncated MADS-domain proteins have been shown to act in a dominant negative manner. Mizukami and co-workers demonstrated that ectopic expression of a truncated AGAMOUS (AG) protein, devoid of the C-terminal domain, produced an ag mutant phenotype in A. thaliana (Mizukami et al. 1996). Posé and co-workers similarly reported a dominant negative isoform, FLMδ, which competes with FLMβ for interaction with SVP (Posé et al. 2013).

In the subsequent course of the study, we attempted to differentiate docked complexes based on a range of structural features. Investigations were undertaken to classify the types, distribution and number of molecular contacts, such as π-π stacking, hydrophobic interactions, Van der Waals forces and hydrogen bonds, between amino-acid residues and nucleotides. We also examined whether molecular contacts involved shared or unique interacting residues (nucleotides and amino acids) to determine homeolog-specific patterns, if any. Overall, we found that the molecular contacts established by amino-acid residues from AA sub-genome specific SVP homeologs included all 4 categories of non-covalent interactions listed above. However, a few BB sub-genome specific SVP homeologs were distinct in that they did not form π-π stacking. Next, we examined whether the various SOC1 promoters displayed commonalities in the nucleotides establishing molecular contacts with SVP proteins, and whether unique or shared interacting amino acids were involved in these interactions. Nucleotide positions 15, 16, 17 and 18, embedded within the SVP binding sites, were consistently found to be involved in interactions. Further, we asked whether substitutions at these positions altered binding strength. We found that naturally occurring transitions and transversions at these positions did not significantly affect the binding affinity of docked complexes. This finding is crucial as it provides an explanation for the degeneracy often observed in DNA binding sites of most MADS-box proteins. Comparative analysis of 27 docked complexes also showed that interacting partners were stabilised largely by a unique set of amino-acid residues in each docked complex. Alanine scanning nevertheless led to the identification of conserved amino-acid hotspots. Most hotspots mapped to the MADS domain, of which at least one amino-acid residue per protein was conserved for interaction with the 3 B. juncea SOC1 promoter homeologs. Interestingly, the conserved residues detected as hotspots were also involved in the formation of hydrogen bonds, hydrophobic interactions and π-π stacking. These data also suggest that amino-acid residues crucial for stabilising interactions are under purifying selection, and that mutations in such critical amino acids of regulatory proteins can potentially perturb GRNs, resulting in modified phenotypes (Rebeiz et al. 2015). Taken together, docking studies revealed substantial variation in bi-molecular interaction patterns, supporting the view that compensatory mutations among SVP proteins and SOC1 promoters preserve overall binding affinity, although evolutionarily conserved amino-acid residues are also involved. Identification of conserved amino-acid residues is significant from a crop improvement standpoint. Arguably, mutating conserved amino acids identified as hotspots within SVP may prevent binding to SOC1 promoters, leading to the increased levels of SOC1 expression necessary for early flowering. Since thermodynamic interaction patterns are most profoundly influenced by terminal heat stress, we propose routine integration of structural informatics and protein engineering for trait modification of crops.
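
For readers less familiar with the terms used for substitutions at positions 15 to 18: a transition exchanges one purine for another (A↔G) or one pyrimidine for another (C↔T), whereas a transversion exchanges a purine for a pyrimidine or vice versa. The sketch below classifies substitutions between two aligned sequences. The two 4-nt strings are made-up placeholders, not the actual bases found at these positions in the SOC1 promoter homeologs, and the function names are illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Label a single-base comparison as match, transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        return "match"
    # Both bases in the same chemical class -> transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def compare_sites(seq_a: str, seq_b: str, start: int = 1):
    """Compare two aligned, equal-length sequences position by position."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [(start + k, a, b, classify_substitution(a, b))
            for k, (a, b) in enumerate(zip(seq_a, seq_b))]


# Hypothetical bases at positions 15-18 of two promoter fragments.
hypothetical_a = "AATA"
hypothetical_b = "GACT"
site_report = compare_sites(hypothetical_a, hypothetical_b, start=15)
for row in site_report:
    print(row)
```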

Using homologs of Brassica SOC1 promoters and SVP protein regulators, we show that despite considerable natural structural variation in both DNA (SOC1 promoter fragments) and protein (SVP), the binding affinity of forty-eight docked complexes has remained evolutionarily conserved, suggesting that the structural features necessary for biological function are crucial. Analysis of 27 docked complexes, formed between 9 natural SVP proteins and 3 SOC1 promoter fragments, permitted mapping of the distribution and type of a range of molecular contacts (π-π stacking, hydrophobic interactions, Van der Waals forces, hydrogen bonds). Shared and unique interacting residues (nucleotides and amino acids) were also discerned. Our data revealed substantial variation in patterns of interaction, with each docked complex stabilised uniquely, implicating compensatory mutations in the preservation of binding potential. Using computational alanine substitution, amino acids conferring stability to docked complexes were identified. Yeast one-hybrid assays validated the binding potential predicted in docked complexes. To our knowledge, this is the first study to integrate structural insights to provide a mechanistic understanding of complex combinatorial interactions among multiple promoters and upstream proteins in polyploid Brassicas. Our study has significant implications for Brassica improvement, especially in the context of building climate-resilient crops. For instance, precise editing of SVP homologs at the crucial amino-acid positions identified as establishing contacts with promoters can abolish binding of SVP, thereby resulting in de-repression of natural SOC1 promoters. We recommend mainstreaming structural studies as a strategy for any mutation-based functional study and for crop improvement.
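
Computational alanine substitution, as summarised above, replaces one residue at a time with alanine and re-scores the complex. The sketch below covers only the first half of that procedure: it generates single-residue alanine variants and their conventional labels (for example R3A). It does not compute binding energies, which requires a dedicated scoring tool. The sequence and positions are hypothetical placeholders, not SVP residues reported in this study, and the function name is illustrative.

```python
def alanine_variants(sequence: str, positions: list[int]) -> dict[str, str]:
    """Return {label: mutant sequence} for alanine substitutions.

    Positions are 1-based. Residues that are already alanine are skipped,
    because substituting them would not change the sequence.
    """
    variants = {}
    for pos in positions:
        if not 1 <= pos <= len(sequence):
            raise ValueError(f"position {pos} is outside the sequence")
        wild_type = sequence[pos - 1]
        if wild_type == "A":
            continue
        label = f"{wild_type}{pos}A"
        variants[label] = sequence[:pos - 1] + "A" + sequence[pos:]
    return variants


# Hypothetical 10-residue peptide and hypothetical interface positions.
toy_sequence = "MKRLSEQWTA"
toy_positions = [3, 5, 10]

toy_variants = alanine_variants(toy_sequence, toy_positions)
for label, mutant in toy_variants.items():
    print(label, mutant)
```

Each generated variant would then be re-docked or re-scored, and residues whose substitution destabilises the complex most are treated as candidate hotspots.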

A. thaliana Arabidopsis thaliana

AA Brassica diploid base genome A

B. juncea Brassica juncea

BB Brassica diploid base genome B

BRAD Brassica database

CC Brassica diploid base genome C

HADDOCK High Ambiguity Driven protein-protein DOCKing

LF Least Fractionated

MF1 Moderately Fractionated

MF2 Most Fractionated

SVP SHORT VEGETATIVE PHASE protein

SOC1 SUPPRESSOR OF OVEREXPRESSION OF CONSTANS 1

Competing Interests

The authors declare no competing interests.

Funding

This work was supported by grants received from the Department of Biotechnology, Govt. of India (BT/PR24047/BPA/118/364/2017) to AS. BG received Junior and Senior Research Fellowships from this grant. Financial assistance in the form of Junior and Senior Research Fellowships to SK and RS from the Department of Biotechnology, Council of Scientific and Industrial Research, Govt. of India, is gratefully acknowledged, as is infrastructural support from TERI School of Advanced Studies.

Author Contributions

AS was responsible for overall conceptualisation, planning, fund acquisition, and project administration. SK established analytical flow for structural informatics and carried out all investigations (informatics and yeast one-hybrid assays), generated all data, and wrote the original manuscript. SK, RS and CM analysed protein models and docked complexes. CM supervised structural informatics. BG cloned baits for yeast one-hybrid assays. AS and SK wrote the final manuscript. All authors have read, edited and approved the manuscript.

References

Bhattacharya A, Tejero R, Montelione GT (2007) Evaluating protein structures determined by structural genomics consortia tools for structure quality evaluation. Proteins 66(4):778–795. https://doi.org/10.1002/prot.21165
Blümel M, Dally N, Jung C (2015) Flowering time regulation in crops-what did we learn from Arabidopsis? Curr Opin Biotechnol 32:121–129. https://doi.org/10.1016/j.copbio.2014.11.023
Borner R, Kampmann G, Chandler J et al (2000) A MADS domain gene involved in the transition to flowering in Arabidopsis. Plant J 24:591–599. https://doi.org/10.1046/j.1365-313X.2000.00906.x
Bouché F, Lobet G, Tocquin P, Périlleux C (2016) FLOR-ID: an interactive database of flowering-time gene networks in Arabidopsis thaliana. Nucleic Acids Res 44(D1):D1167–D1171. https://doi.org/10.1093/nar/gkv1054
Cheng F, Wu J, Cai X et al (2018) Gene retention, fractionation and subgenome differences in polyploid plants. Nat Plants 4:258–268. https://doi.org/10.1038/s41477-018-0136-7
Cheng F, Wu J, Fang L et al (2012) Biased gene fractionation and dominant gene expression among the subgenomes of Brassica rapa. PLoS ONE 7:e36442. https://doi.org/10.1371/journal.pone.0036442
Cheng F, Wu J, Wang X (2014) Genome triplication drove the diversification of Brassica Plants. Hortic Res 1:14024. https://doi.org/10.1038/hortres.2014.24
De Vries SJ, Van Dijk M, Bonvin AMJJ (2010) The HADDOCK web server for data-driven biomolecular docking. Nat Protoc 5:883–897. https://doi.org/10.1038/nprot.2010.32
Drummond AJ, Rambaut A (2007) BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol 7:214. https://doi.org/10.1186/1471-2148-7-214
Duminil P, Davanture M, Oury C et al (2021) Arabidopsis thaliana 2,3-bisphosphoglycerate-independent phosphoglycerate mutase 2 activity requires serine 82 phosphorylation. Plant J 107:1478–1489. https://doi.org/10.1111/tpj.15395
Eisenberg D, Lüthy R, Bowie JU (1997) VERIFY3D: assessment of protein models with three-dimensional profiles. Methods Enzymol 277:396–404. https://doi.org/10.1016/S0076-6879(97)77022-8
Elias EH, Flynn R, Idowu OJ et al (2019) Crop vulnerability to weather and climate risk: Analysis of interacting systems and adaptation efficacy for sustainable crop production. Sustain 11:6619. https://doi.org/10.3390/su11236619
Emsley P, Lohkamp B, Scott WG, Cowtan K (2010) Features and development of Coot. Acta Crystallogr Sect D Biol Crystallogr 66:486–501. https://doi.org/10.1107/S0907444910007493
Friedt W, Tu J, Fu T (2018) Academic and Economic Importance of Brassica napus Rapeseed. In: Liu S, Snowdon R, Chalhoub B (eds) The Brassica napus Genome. Compendium of Plant Genomes. Springer, Cham, pp 1–20
Glover NM, Redestig H, Dessimoz C (2016) Homoeologs: What Are They and How Do We Infer Them? Trends Plant Sci 21:609–621. https://doi.org/10.1016/j.tplants.2016.02.005
Gramzow L, Ritz MS, Theißen G (2010) On the origin of MADS-domain transcription factors. Trends Genet 26:149–153. https://doi.org/10.1016/j.tig.2010.01.004
Hall TA (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41:95–98
Han A, Pan F, Stroud JC et al (2003) Sequence-specific recruitment of transcriptional co-repressor Cabin1 by myocyte enhancer factor-2. Nature 422(6933):730–734. https://doi.org/10.1038/nature01555
Heinig M, Frishman D (2004) STRIDE: a Web server for secondary structure assignment from known atomic coordinates of proteins. Nucl Acids Res 32:W500–W502. https://doi.org/10.1093/nar/gkh429
Hong JK, Kim SY, Kim KS et al (2013) Overexpression of a Brassica rapa MADS-box gene, BrAGL20, induces early flowering time phenotypes in Brassica napus. Plant Biotechnol Rep 7:231–237. https://doi.org/10.1007/s11816-012-0254-z
Huang F, Liu T, Tang J et al (2019) BcMAF2 activates BcTEM1 and represses flowering in Pak-choi (Brassica rapa ssp. chinensis). Plant Mol Biol 100:19–32. https://doi.org/10.1007/s11103-019-00867-1
Immink RGH, Pose D, Ferrario S et al (2012) Characterization of SOC1’s central role in flowering by the identification of its upstream and downstream regulators. Plant Physiol 160:433–449. https://doi.org/10.1104/pp.112.202614
Jain A, Anand S, Singh NK, Das S (2018) Sequence and functional characterization of MIRNA164 promoters from Brassica shows copy number dependent regulatory diversification among homeologs. Funct Integr Genomics 18:369–383. https://doi.org/10.1007/s10142-018-0598-8
Jat RS, Singh VV, Sharma P, Rai PK (2019) Oilseed brassica in India: Demand, supply, policy perspective and future potential. Ocl 26:8. https://doi.org/10.1051/ocl/2019005
Jiang W, Wei D, Zhou W et al (2018) HDA9 interacts with the promoters of SOC1 and AGL24 involved in flowering time control in Brassica juncea. Biochem Biophys Res Commun 499:519–523. https://doi.org/10.1016/j.bbrc.2018.03.180
Jumper J, Evans R, Pritzel A et al (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596:583–589. https://doi.org/10.1038/s41586-021-03819-2
Kang L, Qian L, Zheng M et al (2021) Genomic insights into the origin, domestication and diversification of Brassica juncea. Nat Genet 53:1392–1402. https://doi.org/10.1038/s41588-021-00922-y
Kapustin Y, Souvorov A, Tatusova T, Lipman D (2008) Splign: Algorithms for computing spliced alignments with identification of paralogs. Biol Direct 3:1–13. https://doi.org/10.1186/1745-6150-3-20
Kaufmann K, Melzer R, Theißen G (2005) MIKC-type MADS-domain proteins: Structural modularity, protein interactions and network evolution in land plants. Gene 347:183–198. https://doi.org/10.1016/j.gene.2004.12.014
Kaur S, Atri C, Akhatar J et al (2021) Genetics of days to flowering, maturity and plant height in natural and derived forms of Brassica rapa L. Theor Appl Genet 134(2):473–487. https://doi.org/10.1007/s00122-020-03707-9
Kinoshita A, Richter R (2021) Genetic and molecular basis of floral induction in Arabidopsis thaliana. J Exp Bot 71:2490–2504. https://doi.org/10.1093/JXB/ERAA057
Krieger E, Koraimann G, Vriend G (2002) Increasing the precision of comparative models with YASARA NOVA - A self-parameterizing force field. Proteins: Struct Funct Genet 47:393–402. https://doi.org/10.1002/prot.10104
Kuznetsov IB, Gou Z, Li R, Hwang S (2006) Using evolutionary and structural information to predict DNA-binding sites on DNA-binding proteins. Proteins 27:19–27. https://doi.org/10.1002/prot.20977
Larkin MA, Blackshields G, Brown NP et al (2007) Clustal W and Clustal X version 2.0. Bioinformatics 23(21):2947–2948. https://doi.org/10.1093/bioinformatics/btm404
Laskowski RA, MacArthur MW, Moss DS, Thornton JM (1993) PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Crystallogr 26(2):283–291. https://doi.org/10.1107/S0021889892009944
Laskowski RA, Swindells MB (2011) LigPlot+: multiple ligand–protein interaction diagrams for drug discovery. J Chem Inf Model 51:2778–2786. https://doi.org/10.1021/ci200227u
Lau V, Woo R, Pereira B et al (2021) AGENT: the Arabidopsis Gene Regulatory Network Tool for Exploring and Analyzing GRNs. bioRxiv. https://doi.org/10.1101/2021.04.28.441830
Lee H, Suh SS, Park E et al (2000) The AGAMOUS-LIKE 20 MADS domain protein integrates floral inductive pathways in Arabidopsis. Genes Dev 14:2366–2376. https://doi.org/10.1101/gad.813600
Lee J, Lee I (2010) Regulation and function of SOC1, a flowering pathway integrator. J Exp Bot 61:2247–2254. https://doi.org/10.1093/jxb/erq098
Lee JH, Yoo SJ, Park SH et al (2007) Role of SVP in the control of flowering time by ambient temperature in Arabidopsis. Genes Dev 21:397–402. https://doi.org/10.1101/gad.1518407
Leonard DA, Rajaram N, Kerppola TK (1997) Structural basis of DNA bending and oriented heterodimer binding by the basic leucine zipper domains of Fos and Jun. Proc Natl Acad Sci U S A 94:4913–4918. https://doi.org/10.1073/pnas.94.10.4913
Li C, Ma GP, Xie T et al (2018) SOC1 and AGL24 interact with AGL18-1, not the other family members AGL18-2 and AGL18-3 in Brassica juncea. Acta Physiol Plant 40:1–11. https://doi.org/10.1007/s11738-017-2580-9
Li D, Liu C, Shen L et al (2008) A Repressor Complex Governs the Integration of Flowering Signals in Arabidopsis. Dev Cell 15:110–120. https://doi.org/10.1016/j.devcel.2008.05.002
Li S, Olson WK, Lu XJ (2019) Web 3DNA 2.0 for the analysis, visualization, and modeling of 3D nucleic acid structures. Nucleic Acids Res 47:W26–W34. https://doi.org/10.1093/nar/gkz394
Liu C, Chen H, Er HL et al (2008) Direct interaction of AGL24 and SOC1 integrates flowering signals in Arabidopsis. Development 135:1481–1491. https://doi.org/10.1242/dev.020255
Lovell SC, Davis IW, Arendall WB 3 et al (2003) Structure validation by Cα geometry: φ,ψ and Cβ deviation. Proteins 50(3):437–450. https://doi.org/10.1002/prot.10286
Lysak MA, Koch MA (2011) Phylogeny, Genome, and Karyotype Evolution of Crucifers (Brassicaceae). In: Schmidt R, Bancroft I (eds) Genetics and Genomics of the Brassicaceae. Springer, New York, pp 1–31
Ma GP, Zhao DQ, Wang TW et al (2019) BBX32 interacts with AGL24 involved in flowering time control in Chinese cabbage (Brassica rapa L. ssp. pekinensis). Not Bot Horti Agrobot Cluj-Napoca 47:34–45. https://doi.org/10.15835/nbha47111205
Marchler-Bauer A, Bryant SH (2004) CD-Search: Protein domain annotations on the fly. Nucleic Acids Res 32:327–331. https://doi.org/10.1093/nar/gkh454
Mizukami Y, Huang H, Tudor M et al (1996) Functional domains of the floral regulator AGAMOUS: Characterization of the DNA binding domain and analysis of dominant negative mutations. Plant Cell 8:831–845. https://doi.org/10.1105/tpc.8.5.831
Morin B, Nichols LA, Holland LJ (2006) Flanking sequence composition differentially affects the binding and functional characteristics of glucocorticoid receptor homo-and heterodimers. Biochemistry 45:7299–7306. https://doi.org/10.1021/bi060314k
Nagaharu U (1935) Genome analysis in Brassica with special reference to the experimental formation of Brassica napus and peculiar mode of fertilization. J Jap Bot 7:389–452
Nagaoka M, Shiraishi Y, Sugiura Y (2001) Selected base sequence outside the target binding site of zinc finger protein Sp1. Nucleic Acids Res 29:4920–4929. https://doi.org/10.1093/nar/29.24.4920
Nieto Feliner G, Casacuberta J, Wendel JF (2020) Genomics of evolutionary novelty in hybrids and polyploids. Front Genet 11:1–21. https://doi.org/10.3389/fgene.2020.00792
Pajoro A, Biewers S, Dougali E et al (2014) The (r)evolution of gene regulatory networks controlling Arabidopsis plant reproduction: A two-decade history. J Exp Bot 65:4731–4745. https://doi.org/10.1093/jxb/eru233
Parkin IAP, Gulden SM, Sharpe AG et al (2005) Segmental structure of the Brassica napus genome based on comparative analysis with Arabidopsis thaliana. Genetics 171(2):765–781. https://doi.org/10.1534/genetics.105.042093
Pettersen EF, Goddard TD, Huang CC et al (2004) UCSF Chimera — A Visualization System for Exploratory Research and Analysis. J Comput Chem 25(13):1605–1612. https://doi.org/10.1002/jcc.20084
Pollock R, Treisman R (1990) A sensitive method for the determination of protein-DNA binding specificities. Nucleic Acids Res 18:6197–6204. https://doi.org/10.1093/nar/18.21.6197
Posé D, Verhage L, Ott F et al (2013) Temperature-dependent regulation of flowering by antagonistic FLM variants. Nature 503:414–417. https://doi.org/10.1038/nature12633
Potterton E, Briggs P, Turkenburg M, Dodson E (2003) A graphical user interface to the CCP4 program suite. Acta Crystallogr D Biol Crystallogr 59:1131–1137. https://doi.org/10.1107/s0907444903008126
Preston JC, Jorgensen SA, Jha SG (2014) Functional characterization of duplicated SUPPRESSOR OF OVEREXPRESSION OF CONSTANS 1-like genes in Petunia. PLoS ONE 9:1–6. https://doi.org/10.1371/journal.pone.0096108
Quiroz S, Yustis JC, Chávez-Hernández EC et al (2021) Beyond the genetic pathways, flowering regulation complexity in Arabidopsis thaliana. Int J Mol Sci 22:5716. https://doi.org/10.3390/ijms22115716
Rajaram N, Kerppola TK (1997) DNA bending by Fos–Jun and the orientation of heterodimer binding depend on the sequence of the AP-1 site. EMBO J 16:2917–2925. https://doi.org/10.1093/emboj/16.10.2917
Rambaut A (2006) FigTree: tree figure drawing tool version 1.3.1. Institute of Evolutionary Biology, University of Edinburgh. http://tree.bio.ed.ac.uk/software/figtree/
Rao S, Gupta A, Bansal C et al (2022) A conserved HSF: miR169 : NF-YA loop involved in tomato and Arabidopsis heat stress tolerance. Plant J 112(1):7–26. https://doi.org/10.1111/tpj.15963
Rebeiz M, Patel NH, Hinman VF (2015) Unraveling the tangled skein: The evolution of transcriptional regulatory networks in development. Annu Rev Genomics Hum Genet 16:103–131. https://doi.org/10.1146/annurev-genom-091212-153423
Robert X, Gouet P (2014) Deciphering key features in protein structures with the new ENDscript server. Nucleic Acids Res 42:320–324. https://doi.org/10.1093/nar/gku316
Rozas J, Ferrer-Mata A, Sanchez-DelBarrio JC et al (2017) DnaSP 6: DNA sequence polymorphism analysis of large data sets. Mol Biol Evol 34:3299–3302. https://doi.org/10.1093/molbev/msx248
Sagendorf JM, Berman HM, Rohs R (2017) DNAproDB: An interactive tool for structural analysis of DNA-protein complexes. Nucleic Acids Res 45:W89–W97. https://doi.org/10.1093/nar/gkx272
Samach A, Onouchi H, Gold SE et al (2000) Distinct roles of constans target genes in reproductive development of Arabidopsis. Science 288:1613–1616. https://doi.org/10.1126/science.288.5471.1613
Sankoff D, Zheng C, Zhu Q (2010) The collapse of gene complement following whole genome duplication. BMC Genomics 11:313. https://doi.org/10.1186/1471-2164-11-313
Santelli E, Richmond TJ (2000) Crystal structure of MEF2A core bound to DNA at 1.5 Å Resolution. J Mol Biol 297:437–449. https://doi.org/10.1006/jmbi.2000.3568
Schiessl S, Huettel B, Kuehn D et al (2017) Targeted deep sequencing of flowering regulators in Brassica napus reveals extensive copy number variation. Sci Data 4:1–10. https://doi.org/10.1038/sdata.2017.13
Schranz ME, Lysak MA, Mitchell-Olds T (2006) The ABC’s of comparative genomics in the Brassicaceae: building blocks of crucifer genomes. Trends Plant Sci 11:535–542. https://doi.org/10.1016/j.tplants.2006.09.002
Seo E, Lee H, Jeon J et al (2009) Crosstalk between Cold Response and Flowering in Arabidopsis Is Mediated through the Flowering-Time Gene SOC1 and Its Upstream Negative Regulator FLC. Plant Cell 21:3185–3197. https://doi.org/10.1105/tpc.108.063883
Singh S, Singh A (2021) A prescient evolutionary model for genesis, duplication and differentiation of MIR160 homologs in Brassicaceae. Mol Genet Genomics 296:985–1003. https://doi.org/10.1007/s00438-021-01797-8
Sippl MJ (1993) Recognition of errors in three-dimensional structures of proteins. Proteins 17(4):355–362. https://doi.org/10.1002/prot.340170404
Solovyev V, Kosarev P, Seledsov I, Vorobyev D (2006) Automatic annotation of eukaryotic genes, pseudogenes and promoters. Genome Biol 7(Suppl 1):S10.1–S10.12. https://doi.org/10.1186/gb-2006-7-s1-s10
Soltis PS, Marchant DB, Van de Peer Y, Soltis DE (2015) Polyploidy and genome evolution in plants. Curr Opin Genet Dev 35:119–125. https://doi.org/10.1016/j.gde.2015.11.003
Sri T, Gupta B, Tyagi S, Singh A (2020) Homeologs of Brassica SOC1, a central regulator of flowering time, are differentially regulated due to partitioning of evolutionarily conserved transcription factor binding sites in promoters. Mol Phylogenet Evol 147:106777. https://doi.org/10.1016/j.ympev.2020.106777
Sri T, Mayee P, Singh A (2015) Sequence and expression variation in SUPPRESSOR of OVEREXPRESSION of CONSTANS 1 (SOC1): homeolog evolution in Indian Brassicas. Dev Genes Evol 225:287–303. https://doi.org/10.1007/s00427-015-0513-4
Srinivasarao C, Rakesh S, Ranjith Kumar G et al (2021) Soil degradation challenges for sustainable agriculture in tropical India. Curr Sci 120:492. https://doi.org/10.18520/cs/v120/i3/492-500
Strader L, Weijers D, Wagner D (2022) Plant transcription factors: being in the right place with the right company. Curr Opin Plant Biol 65:102136. https://doi.org/10.1016/j.pbi.2021.102136
Tao Z, Shen L, Liu C et al (2012) Genome-wide identification of SOC1 and SVP targets during the floral transition in Arabidopsis. Plant J 70:549–561. https://doi.org/10.1111/j.1365-313X.2012.04919.x
Tyagi S, Mazumdar PA, Mayee P et al (2018) Natural variation in Brassica FT homeologs influences multiple agronomic traits including flowering time, silique shape, oil profile, stomatal morphology and plant height in B. juncea. Plant Sci 277:251–266. https://doi.org/10.1016/j.plantsci.2018.09.018
Van De Peer Y, Mizrachi E, Marchal K (2017) The evolutionary significance of polyploidy. Nat Rev Genet 18:411–424. https://doi.org/10.1038/nrg.2017.26
van Dijk M, Bonvin AMJJ (2009) 3D-DART: A DNA structure modelling server. Nucleic Acids Res 37:235–239. https://doi.org/10.1093/nar/gkp287
Varadi M, Anyango S, Deshpande M et al (2022) AlphaFold Protein Structure Database: Massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res 50:D439–D444. https://doi.org/10.1093/nar/gkab1061
Wang H, You C, Chang F et al (2014) Alternative splicing during Arabidopsis flower development results in constitutive and stage-regulated isoforms. Front Genet 5:1–9. https://doi.org/10.3389/fgene.2014.00025
Waterhouse A, Bertoni M, Bienert S et al (2018) SWISS-MODEL: Homology modelling of protein structures and complexes. Nucleic Acids Res 46:W296–W303. https://doi.org/10.1093/nar/gky427
Wolberger C (2021) How structural biology transformed studies of transcription regulation. J Biol Chem 296:100741. https://doi.org/10.1016/j.jbc.2021.100741
Xu D, Tang Q, Leister D, Kleine T (2022) Response of the organellar and nuclear (post) transcriptomes of Arabidopsis to drought stress. bioRxiv. https://doi.org/10.1101/2022.08.09.503311
Xu S, Hong L (2021) Navigating flower development with a new atlas. Dev Cell 56:399–400. https://doi.org/10.1016/j.devcel.2021.02.001
Yan K, Li CC, Wang Y et al (2018) AGL18-1 delays flowering time through affecting expression of flowering-related genes in Brassica juncea. Plant Biotechnol 35:357–363. https://doi.org/10.5511/plantbiotechnology.18.0824a
Yang J, Roy A, Zhang Y (2013) Protein-ligand binding site recognition using complementary binding-specific substructure comparison and sequence profile alignment. Bioinformatics 29:2588–2595. https://doi.org/10.1093/bioinformatics/btt447
Yang W, Deng L (2020) PreDBA: A heterogeneous ensemble approach for predicting protein-DNA binding affinity. Sci Rep 10(1):1278. https://doi.org/10.1038/s41598-020-57778-1
Yoo SK, Chung KS, Kim J, Lee JH, Hong SM, Yoo SJ, Yoo SY, Lee JS, Ahn JH (2005) CONSTANS activates SUPPRESSOR OF OVEREXPRESSION OF CONSTANS 1 through FLOWERING LOCUS T to promote flowering in Arabidopsis. Plant Physiol 139(2):770–778. https://doi.org/10.1104/pp.105.066928
Zhang C, Freddolino PL, Zhang Y (2017) COFACTOR: Improved protein function prediction by combining structure, sequence and protein-protein interaction information. Nucleic Acids Res 45:W291–W299. https://doi.org/10.1093/nar/gkx366
Zhang N, Chen Y, Zhao F et al (2018) PremPDI estimates and interprets the effects of missense mutations on protein-DNA interactions. PLoS Comput Biol 14:1–15. https://doi.org/10.1371/journal.pcbi.1006615
Zhang Y, Xu Y, Nie J et al (2023) DNA – TCP complex structures reveal a unique recognition mechanism for TCP transcription factor families. Nucleic Acids Res 51(1):434–448. https://doi.org/10.1093/nar/gkac1171
Zheng W, Zhang C, Bell EW, Zhang Y (2019) I-TASSER gateway: A protein structure and function prediction server powered by XSEDE. Futur Gener Comput Syst 99:73–85. https://doi.org/10.1016/j.future.2019.04.011
Zhou W, Wei D, Jiang W et al (2018) The protein J3 regulates flowering through directly interacting with the promoter of SOC1 in Brassica juncea. Biochem Biophys Res Commun 496:1217–1221. https://doi.org/10.1016/j.bbrc.2018.01.174
Ziolkowski PA, Kaczmarek M, Babula D, Sadowski J (2006) Genome evolution in Arabidopsis/Brassica: Conservation and divergence of ancient rearranged segments and their breakpoints. Plant J 47:63–74. https://doi.org/10.1111/j.1365-313X.2006.02762.x

SupplementaryFig1.eps
Supplementary Fig1 Graph depicting nucleotide polymorphism as Pi (π) values across Brassicaceae SVP genomic DNA sequences. The window and step size were 25 and 2 nucleotides, respectively
SupplementaryFig2.eps
Supplementary Fig2 Multiple sequence alignment of 50 SVP protein sequences from 25 diverse species of Brassicaceae. The MADS and K domains are marked as orange and green bars, respectively
SupplementaryFig3.eps
Supplementary Fig3 MADS domain (red), I- domain (yellow), K- domain (green), and C-domain (grey) are represented in a schematic manner depicting differential presence of the respective domains across the 9 SVP proteins from B. juncea
SupplementaryFig4.eps
Supplementary Fig4 In silico homology modelling and assessment of SVP proteins from B. juncea cv. Varuna a. Models for BjuVAR1_SVP, BjuVAR2_SVP, BjuVAR3_SVP and BjuVAR4_SVP are marked as a1, a2, a3 and a4, respectively. The α-helices are marked in turquoise blue, β-strands in pink, and loops in brown. The C- and N-termini are labelled. b. Magnified view of the MADS domain (b1, b2, b3 and b4) from the models generated for B. juncea cv. Varuna SVP proteins (a1, a2, a3 and a4), respectively
SupplementaryFig5.eps
Supplementary Fig5 In silico homology modelling of hypothetical SVP proteins a. Models generated for hypothetical SVP proteins (Hyp1_SVP, Hyp2_SVP, Hyp3_SVP, Hyp4_SVP, Hyp5_SVP and Hyp6_SVP). The α-helices are depicted in blue, β-strands in pink, and loops in brown. The C- and N-termini are labelled. b. Magnified view of the MADS domain from the models generated for hypothetical SVP proteins c. Design of hypothetical proteins depicting differential presence of M-, I-, K- and C-domains. Differential presence of domains in natural SVP proteins Bju_B02_SVP and BjuVAR1_SVP is also illustrated
SupplementaryFig6.eps
Supplementary Fig6 Docked complexes between B. juncea var. tumida SVP proteins and SOC1 promoters (BjupSOC1_AALF, BjupSOC1_AAMF1 and BjupSOC1_AAMF2). The MADS domain α-helices are marked in blue, β-strands in pink and loops in brown. The TFBS for SVP is indicated in purple. a1. Bju_A04_SVP: BjupSOC1_AALF a2. Bju_A04_SVP: BjupSOC1_AAMF1 a3. Bju_A04_SVP: BjupSOC1_AAMF2 b1. Bju_A09_SVP: BjupSOC1_AALF b2. Bju_A09_SVP: BjupSOC1_AAMF1 b3. Bju_A09_SVP: BjupSOC1_AAMF2 c1. Bju_B01_SVP: BjupSOC1_AALF c2. Bju_B01_SVP: BjupSOC1_AAMF1 c3. Bju_B01_SVP: BjupSOC1_AAMF2 d1. Bju_B02_SVP: BjupSOC1_AALF d2. Bju_B02_SVP: BjupSOC1_AAMF1 d3. Bju_B02_SVP: BjupSOC1_AAMF2 e1. Bju_BjuB024255: BjupSOC1_AALF e2. Bju_BjuB024255: BjupSOC1_AAMF1 e3. Bju_BjuB024255: BjupSOC1_AAMF2
SupplementaryFig7.eps
Supplementary Fig7 Docked complexes between B. juncea cv. Varuna SVP proteins and SOC1 promoters (BjupSOC1_AALF, BjupSOC1_AAMF1 and BjupSOC1_AAMF2). The MADS domain α-helices are marked in turquoise blue, β-strands in pink and loops in brown. The TFBS of SVP is shown in purple. a1. BjuVAR1_SVP: BjupSOC1_AALF a2. BjuVAR1_SVP: BjupSOC1_AAMF1 a3. BjuVAR1_SVP: BjupSOC1_AAMF2 b1. BjuVAR2_SVP: BjupSOC1_AALF b2. BjuVAR2_SVP: BjupSOC1_AAMF1 b3. BjuVAR2_SVP: BjupSOC1_AAMF2 c1. BjuVAR3_SVP: BjupSOC1_AALF c2. BjuVAR3_SVP: BjupSOC1_AAMF1 c3. BjuVAR3_SVP: BjupSOC1_AAMF2 d1. BjuVAR4_SVP: BjupSOC1_AALF d2. BjuVAR4_SVP: BjupSOC1_AAMF1 d3. BjuVAR4_SVP: BjupSOC1_AAMF2
SupplementaryFig8.eps
Supplementary Fig8 Docked complexes between hypothetical SVP proteins and SOC1 promoters (BjupSOC1_AALF, BjupSOC1_AAMF1 and BjupSOC1_AAMF2). The MADS domain α-helices are marked in blue, β-strands in pink and loops in brown. The TFBS of SVP is shown in purple. a1. Hyp1_SVP: BjupSOC1_AALF a2. Hyp1_SVP: BjupSOC1_AAMF1 a3. Hyp1_SVP: BjupSOC1_AAMF2 b1. Hyp2_SVP: BjupSOC1_AALF b2. Hyp2_SVP: BjupSOC1_AAMF1 b3. Hyp2_SVP: BjupSOC1_AAMF2 c1. Hyp3_SVP: BjupSOC1_AALF c2. Hyp3_SVP: BjupSOC1_AAMF1 c3. Hyp3_SVP: BjupSOC1_AAMF2 d1. Hyp4_SVP: BjupSOC1_AALF d2. Hyp4_SVP: BjupSOC1_AAMF1 d3. Hyp4_SVP: BjupSOC1_AAMF2 e1. Hyp5_SVP: BjupSOC1_AALF e2. Hyp5_SVP: BjupSOC1_AAMF1 e3. Hyp5_SVP: BjupSOC1_AAMF2 f1. Hyp6_SVP: BjupSOC1_AALF f2. Hyp6_SVP: BjupSOC1_AAMF1 f3. Hyp6_SVP: BjupSOC1_AAMF2
SupplementaryFig9.eps
Supplementary Fig9 Mapping of SVP-specific transcription factor binding sites (TFBS) on SOC1 promoter fragments corresponding to BjupSOC1_AALF, BjupSOC1_AAMF1 and BjupSOC1_AAMF2, depicted as unaligned sequences (a) and aligned sequences (b). A flanking region of 10 bp has been taken along with the 10/11 bp SVP-specific TFBS on B. juncea SOC1 promoter homeologs AALF and AAMF1/AAMF2. The nucleotides at positions 15, 16, 17 and 18, found to interact with all SVP proteins, are marked. The consensus SVP-specific TFBS generated using WebLogo is depicted in (c), wherein the height of the letters used for nucleotides (A, G, C, T) indicates the degree of conservation
SupplementaryFigure10.pdf
Supplementary Fig10 2D representation of bimolecular interactions for 26 docked complexes generated using DNAproDB. a. Residue contact map depicts hydrogen bonds as orange lines. The Secondary Structure Elements (SSE) are depicted as coloured symbols (referred to as markers): helices (red circle), strands (green triangle) and loops (blue square). The relative sizes of these markers depict the number of nucleotide interactions. b. Helical Contact Map depicts the interactions of protein secondary structure elements (SSE) along the DNA helical axis. c. Helical Shape Overlay depicts major groove width along the DNA fragment (30 nt) harbouring the presumptive SVP binding site (CCAAAAATAGC). The interacting amino acid residues and their secondary structures are plotted to depict the position where each residue interacts at the interface. The docked complexes are indicated as: Bju_A04_SVP: BjupSOC1_AAMF1 (10.1) Bju_A04_SVP: BjupSOC1_AAMF2 (10.2) Bju_A09_SVP: BjupSOC1_AALF (10.3) Bju_A09_SVP: BjupSOC1_AAMF1 (10.4) Bju_A09_SVP: BjupSOC1_AAMF2 (10.5) Bju_B01_SVP: BjupSOC1_AALF (10.6) Bju_B01_SVP: BjupSOC1_AAMF1 (10.7) Bju_B01_SVP: BjupSOC1_AAMF2 (10.8) Bju_B02_SVP: BjupSOC1_AALF (10.9) Bju_B02_SVP: BjupSOC1_AAMF1 (10.10) Bju_B02_SVP: BjupSOC1_AAMF2 (10.11) Bju_BjuB024255: BjupSOC1_AALF (10.12) Bju_BjuB024255: BjupSOC1_AAMF1 (10.13) Bju_BjuB024255: BjupSOC1_AAMF2 (10.14) BjuVAR1_SVP: BjupSOC1_AALF (10.15) BjuVAR1_SVP: BjupSOC1_AAMF1 (10.16) BjuVAR1_SVP: BjupSOC1_AAMF2 (10.17) BjuVAR2_SVP: BjupSOC1_AALF (10.18) BjuVAR2_SVP: BjupSOC1_AAMF1 (10.19) BjuVAR2_SVP: BjupSOC1_AAMF2 (10.20) BjuVAR3_SVP: BjupSOC1_AALF (10.21) BjuVAR3_SVP: BjupSOC1_AAMF1 (10.22) BjuVAR3_SVP: BjupSOC1_AAMF2 (10.23) BjuVAR4_SVP: BjupSOC1_AALF (10.24) BjuVAR4_SVP: BjupSOC1_AAMF1 (10.25) BjuVAR4_SVP: BjupSOC1_AAMF2 (10.26)
SupplementaryFigure11.pdf
Supplementary Fig11 2D representation of bimolecular interactions for 26 docked complexes generated using LigPlot v.2.2.5 software. Green dotted and maroon lines indicate hydrogen bonds and hydrophobic interactions, respectively. The docked complexes are indicated as: Bju_A04_SVP: BjupSOC1_AAMF1 (11.1) Bju_A04_SVP: BjupSOC1_AAMF2 (11.2) Bju_A09_SVP: BjupSOC1_AALF (11.3) Bju_A09_SVP: BjupSOC1_AAMF1 (11.4) Bju_A09_SVP: BjupSOC1_AAMF2 (11.5) Bju_B01_SVP: BjupSOC1_AALF (11.6) Bju_B01_SVP: BjupSOC1_AAMF1 (11.7) Bju_B01_SVP: BjupSOC1_AAMF2 (11.8) Bju_B02_SVP: BjupSOC1_AALF (11.9) Bju_B02_SVP: BjupSOC1_AAMF1 (11.10) Bju_B02_SVP: BjupSOC1_AAMF2 (11.11) Bju_BjuB024255: BjupSOC1_AALF (11.12) Bju_BjuB024255: BjupSOC1_AAMF1 (11.13) Bju_BjuB024255: BjupSOC1_AAMF2 (11.14) BjuVAR1_SVP: BjupSOC1_AALF (11.15) BjuVAR1_SVP: BjupSOC1_AAMF1 (11.16) BjuVAR1_SVP: BjupSOC1_AAMF2 (11.17) BjuVAR2_SVP: BjupSOC1_AALF (11.18) BjuVAR2_SVP: BjupSOC1_AAMF1 (11.19) BjuVAR2_SVP: BjupSOC1_AAMF2 (11.20) BjuVAR3_SVP: BjupSOC1_AALF (11.21) BjuVAR3_SVP: BjupSOC1_AAMF1 (11.22) BjuVAR3_SVP: BjupSOC1_AAMF2 (11.23) BjuVAR4_SVP: BjupSOC1_AALF (11.24) BjuVAR4_SVP: BjupSOC1_AAMF1 (11.25) BjuVAR4_SVP: BjupSOC1_AAMF2 (11.26)
SupplementaryTable1.docx
Supplementary Table 1: Amino acids identified as active residues establishing molecular contacts with promoters.
SupplementaryTable2.docx
Supplementary Table 2: DNA fragments (~30 bp) corresponding to B. juncea SOC1 promoter homeologs used for generating DNA models. The nucleotides comprising TFBS are shown in red fonts while the flanking nucleotides are depicted in black.
SupplementaryTable3.xlsx
Supplementary Table 3: Information on SVP sequences from 25 species of Brassicaceae, retrieved from Brassica Database (BRAD). Information on genome coordinates, sub-genome identity, size and nomenclature is provided. The sub-genome identities of Bju_A04_SVP, Bju_A09_SVP, Bju_A09_1_SVP and Bju_A09_1_SVP interpreted from phylogenetic reconstruction are given. The rows highlighted in yellow denote SVP proteins from B. juncea var. tumida for which models were generated using I-TASSER.
SupplementaryTable4.xlsx
Supplementary Table 4: Sequence identity matrix for SVP sequences retrieved from 25 Brassicaceae species a. cDNA b. proteins.
SupplementaryTable5.xlsx
Supplementary Table 5: Domain analyses of SVP proteins derived from 25 species of Brassicaceae.
SupplementaryTable6.xlsx
Supplementary Table 6: Sequence identity matrix for MADS domain predicted in SVP sequences retrieved from 25 Brassicaceae species.
SupplementaryTable7.docx
Supplementary Table 7: Quality parameters of modelled, refined and energy-minimized SVP proteins from B. juncea var. tumida, B. juncea cv. Varuna and hypothetical SVP proteins.
SupplementaryTable8.docx
Supplementary Table 8: Amino acid sequences, length and nomenclature of hypothetical SVP proteins.
SupplementaryTable9.docx
Supplementary Table 9: Rigid body parameters of DNA model generated for BjupSOC1_AALF a. Detailed hydrogen bond information for BjupSOC1_AALF b. Local base-pair parameters for BjupSOC1_AALF c. Local base-pair step parameters for BjupSOC1_AALF d. Local base-pair helical parameters for BjupSOC1_AALF e. Minor and major groove widths for BjupSOC1_AALF
SupplementaryTable10.docx
Supplementary Table 10: Rigid body parameters of DNA model generated for BjupSOC1_AAMF1 a. Detailed hydrogen bond information for BjupSOC1_AAMF1 b. Local base-pair parameters for BjupSOC1_AAMF1 c. Local base-pair step parameters for BjupSOC1_AAMF1 d. Local base-pair helical parameters for BjupSOC1_AAMF1 e. Minor and major groove widths for BjupSOC1_AAMF1
SupplementaryTable11.docx
Supplementary Table 11: Rigid body parameters of DNA model generated for BjupSOC1_AAMF2 a. Detailed hydrogen bond information for BjupSOC1_AAMF2 b. Local base-pair parameters for BjupSOC1_AAMF2 c. Local base-pair step parameters for BjupSOC1_AAMF2 d. Local base-pair helical parameters for BjupSOC1_AAMF2 e. Minor and major groove widths for BjupSOC1_AAMF2
SupplementaryTable12.docx
Supplementary Table 12: The non-covalent interactions identified in docked complexes between 9 natural SVP proteins and 3 SOC1 promoter homeologs from B. juncea
