Mining MSDIN sequences for candidate cyclic peptides
Using multiple known MSDINs as queries, we found MSDIN genes in the genomes of all three sequenced Amanita species, i.e., As (strain AsYYH), Ae, and Ar, which yielded 27, 23 and 39 MSDIN genes, respectively (Tables 1–3). In total, 89 MSDINs were discovered, of which 37 were novel. All genomes possessed MSDIN genes coding for at least three major cyclic peptide toxins, namely α-amanitin, β-amanitin and phallacidin. Overall, the precursor genes share a similar structure, comprising a leader peptide, a core peptide and a recognition sequence [28]. Based on the newly discovered MSDINs, linear and cyclic peptides, with and without further modifications, were predicted. A schematic diagram of the genome-guided approach is shown in Fig. 2.
Table 1
The MSDIN gene family from two independent strains of Amanita subjunquillea
Leader Peptide | Core Peptide | Recognition Sequence | Monoisotopic Mass | Expressed |
MSDINATCLP | IWGIGCNP | CVGDEVAALLTRGEALC | 918.3541 (α-amanitin) | √ |
MSDINATRLP | IWGIGCDP | CVGDEVTALLTRGEALC | 919.3382 (β-amanitin) | √ |
MSDINATRLP | IWGIGCDP | CIGDDVTALLTRGEALC | 919.3382 (β-amanitin) | √ |
MSDINATRLP | AWLATCP | CAGDDVNPTLTRGESLC | 788.316 (phalloidin) | √ |
MSDINATRLP | AWLVDCP | CVGDDINRRVVSAFA-C | 846.3217 (phallacidin) | √ |
MSDMNATRLP | LIQRPFAP | CVSDDVDFALIRRCALVYAESSV | 922.5461 | √ |
MSDINTARLP | HFASFIPP | CIGDDIEMVLKRGESLC | 896.4617 | √ |
MSDINTARLP | TFLPPLFVPP★ | CVSDDIEMVLTRGESLC | 1108.6394 | √ |
MSDINATRLP | LNILPFMLPP | CVGDDVNPTLTRGEDLC | 1135.6536 | √ |
MSDMNATRLP | LIQRPYAP | CVSDDVNSPLTRGESLC | 938.5410 | √ |
MSDINTARLP | IGRPESIP | CVGDDIEMILERGQKLC | 849.4781 | √ |
MSDINTARLP | LRLPPFMIPP | CVGDDIGMVLTRGENLC | 1161.6805 | √ |
MSDVNATRLP | FNFFRFPYP | CIGDDSASVLGLGESLC | 1215.5938 | √ |
MSDINATRLP | SSVLPRP | CVGDVDNIILTSREKLC | 736.4304 | √ |
MSDMNVARLP | ISDPTAYP | CVGGDIHAVLRRGE | 844.4039 | × |
MSDMNVARLP | ISDPTAYP | CVGGDIHAVLRRGE | 844.4039 | × |
MSDINVTCLP | FIFWFFWPP | CVGDDAASIIK-GK | 1267.6291 | × |
MSDINAARLP | FIFPPFFIPP | CVSDDIEMVLTRGE | 1202.6601 | × |
MSDINTARLP^ | AFFPPFFIPP★ | CVSDDIEMVLTRGESLC | 1160.6131 | √ |
MSDINATRLP^ | IPILPIPP | YCSDDANTTLTLGESLC | 840.5545 | √ |
MSDINATRLP^ | LFLLAALGIP | --SDDADSTLTRGESLC | 1008.6444 | √ |
MSDTNDARLP^ | LFFWFWFLWP | SVSDDIDSVLNRGEDLC | 1469.7397 | √ |
MSDINTVCLP | VQKPWSRP | CVGDDIEMILERGEDLC | 978.5472 | √ |
MSDINTAALP | FFFPPFFIPP | CVSDDIEMVLTRGENLC | 1236.6444 | √ |
MSDINITRLP | FFPIVFIPP | CIGDDAASIVKQGENLC | 1057.6073 | √ |
| | | | |
MSDINTVCLP | LQKPWSRP | CVGDDIEMILERGE | 992.5628 | N/A |
MFDINITRLP | IFWFIYFP | CVGDDVTALLTRGE | 1113.5761 | N/A |
1 Gray shade indicates MSDIN genes shared by the two strains. Black box for MSDINs present only in strain AsYYH; box of dashed line for MSDINs present only in the previously reported strain; ^ for newly found MSDINs compared to the previous report.
2 Red letters present differences in MSDINs from the two strains.
3 Green letters indicate alternative splicing.
4 ★ and underlined letters indicate novel cyclic peptides detected with MS and MS/MS.
5 The monoisotopic masses are for unmodified cyclic peptides based on MSDIN core peptides, except for major toxins.
Table 2
The MSDIN gene family in Amanita rimosa
Leader Peptide | Core Peptide | Recognition Sequence | Monoisotopic Mass |
MSDINSTRLP | IWGIGCNP | SVGDEVTALLTRGEA | 918.3541 (α-amanitin) |
MSDINATRLP | IWGIGCNP | SVGDEVTALLASGEA | 918.3541 (α-amanitin) |
MSDINATRLP | IWGIGCDP | CVGDDVAALTTRGEA | 919.3382 (β-amanitin) |
MSDINATRVP | AWLVDCP | CVGDDISRLLTRGEK | 846.3217 (phallacidin) |
MSDINATRLP | AWDSKHP | CVGDDVSRLLTRGE | 821.3893 |
MSDINATRLP | AWDSKHP | CVGDDISRLLTRGE | 821.3893 |
MSDINATRVP | AWLAECP | CVGDDISHLLTRGE | 770.3494 |
MSDINASRLP | FFIIIVKP | CGNPYVSDDVNSTLTRGE | 957.6124 |
MSDINTSRLP | FIPLGIITILP★ | CVSDDVNTTITRGD | 1177.7547 |
MSDINTACLP | FLFPVIPP | CLSEDANVVVLNSGE | 910.5316 |
MSDINVTRLP | FFPIVFIPP | CI | 1057.6073 |
MSDINIARLP | IFWFIYFP | CVGDDVDNTLSRGE | 1113.576 |
MSDINVTRLP | IFLIMFIPP | CIGDDAASILKQGE | 1071.6263 |
MSDINTSCLP | IFIAFPIPP | CVSDDIQTVLTRGE | 995.5917 |
MSDTNTACLP | IFIAFPIPP | CVSDDIQTVLTRGE | 995.5917 |
MSDINASRLP | ILKKPWAP | SVCDDVNSTLTRGE | 933.5872 |
MSDINVARLP | ISDPTAYP★ | CVGDDIQAVVKRGE | 844.4039 |
MSDINATRLP | IIIVLGLIIP | LCVSDIEMILTRGE | 1044.7383 |
MSDINASRLP | IILAPIIP | CISDDVNTTLTCAE | 830.5702 |
MSDINTTGLP | HFYNLMPP | CFSDDTGMVLVRGE | 999.4709 |
MSDINATRLP | HPFPLGLQP | CAGDVDNFTLIKGE | 986.541 |
MSDINASCLP | LILVANGMAYV | --SDDVSPTLTRGE | 1144.6387 |
MSDINTARLP | SYIPFPPP | CLSEDTNAVLMLGE | 898.4661 |
MSDINTARLP | SYIPFPPP | CLSEDTNAVLMLGE | 898.4661 |
MSDINTSRFP | SYGYRAFP | CVGDDVEMVLMHGE | 941.4468 |
MSDINVTRLP | VLVFIFFLP | CISDDAASIIKLGE | 1075.6543 |
MSDIDTTRLP | LILFTLQP | SIGDDVNPTLTRGEK | 925.5709 |
MSDIHAARLP | FPTRPVFP★ | SAGDDMIEVVLGRGE | 941.5196 |
MSDNNAARLP | FYFYLGIP | SDDAHPILTRGERLA | 1000.5058 |
MSDTNTARLP | ILFIQLEIP | CISDDVHPVLTRGE | 1066.6426 |
MSDVNTTRLP | FNFFRFPYP | CICDDSEKVLELGE | 1215.5865 |
MSEINTARFP | NHGHRTIP | CVGDDIEMVLMHGE | 912.4678 |
MSEINTSRLP | LVFIPPYFAP | CVSDDIQMVLTLGE | 1144.632 |
MFDMNTTCLP | GFIIYAYV | --GDDVNHTLTRGE | 926.4901 |
MLDINTARLP | FSLPTFPP | CVSDEIDVVLKRGE | 886.4588 |
MLDINATRFP | LGRPTHLP | CVGDDVNYIL | 871.5027 |
MTDINDARLP | ILLLIFFWIP | CANDDDENILNRG | 1255.7732 |
MTDINDTRLP | FVWILWLWLA | CVGDDTSILNRGE | 1327.748 |
MPDINVTRLP | LLIIVLLTP | CISDDNNILNRGK | 975.6732 |
1 ★ and underlined letters indicate novel cyclic peptides detected with MS and MS/MS.
2 The monoisotopic masses are for unmodified cyclic peptides based on MSDIN core peptides, except for major toxins.
Table 3
The MSDIN gene family in Amanita exitialis
Leader Peptide | Core Peptide | Recognition Sequence | Monoisotopic Mass |
MSDINATRLP | IWGIGCNP | CVGDDVTSVLTRGEA | 918.3541 (α-amanitin) |
MSDINATRLP | IWGIGCDP | CVGDDVTALLTRGEA | 919.3382 (β-amanitin) |
MSDINATRLP | AWLVDCP | CVGDDVNRLLTRGE | 846.3217 (phallacidin) |
MSDINATRLP | AWLTDCP | CVGDDVNRLLTRGE | 786.3443 |
MSDINTTRLP | FVFVASPP★ | CVGDDIAMVLTRGE | 844.4556 |
MSDINTARLP | FIWVFGIP | G–DDIGTVLTRGEK | 959.5342 |
MSDINLTRLP | GIIAIIP | CVGDDDDVNSTLTRGQ | 677.4549 |
MSDINATRLP | IILAPVIP | CISDDNDP–TLTRGQ | 816.5546 |
MSDINTARLP | IPIPPFFFP | FVSDDIEIVLRRGEK | 1055.5917 |
MSDINTARLP | IPIPPFFFP | FVSDDIEIVLRRGEK | 1055.5917 |
MSDINATRLP | IGRPQLLP | CVGGDVNYILISGEK | 874.5461 |
MSDINPTRLP | IFWFIYFP | CVSDVDST-LTRGE | 1113.5761 |
MSDINTARLP | IYRPPFYALP | CVGDDIQAVLTRGE | 1217.6670 |
MSDINTARLP | IIWIIGNP | CVSDDVERILTRGE | 906.5400 |
MSDINVIRAP | LLILSILP | CVGDDIEV-LRRGE | 862.5964 |
MSDINATRLP | LFFPPDFRPP★ | CVGDADNFTLTRGEK | 1213.6357 |
MSDINATRLP | LFFPPDFRPP★ | CVGDADNFTLTRGE | 1213.6357 |
MSDINVIRLP | SMLTILPP | CVSDDASNTLTRGE | 852.4852 |
MSDINTARLP | VFSLPVFFP★ | --SDDIQAVLTRGE | 1033.5709 |
MSDINVTRLP | VFIFFFIPP | CVGDGTADIVRKGEK | 1107.6230 |
MSDINATRLP | VWIGYSP | CVGDDCIALLTRGE | 802.4086 |
MSDINATRLP | VWIGYSP | CVGDDCIALLTRGE | 802.4086 |
MTDINDTRLP | FIWLLWIWLP | SVGDD-NNILNRGEE | 1367.7867 |
1 ★ and underlined letters show novel and known cyclic peptides detected with MS and MS/MS.
2 The monoisotopic masses are for unmodified cyclic peptides based on MSDIN core peptides, except for major toxins.
Transcriptome of As
As was chosen for Illumina RNA-Seq because it was readily available. The sample was collected only a few meters from the strain AsYYH. In total, 11.7 Gb of clean reads were obtained. The assembled transcriptome was 46.1 Mb in size and contained 25,453 unigenes. A BLAST search against the transcriptome with MSDINs from the two As genomes produced 24 MSDINs, four more than in our previous result [14]. These four sequences were used as queries to re-search the genome, which confirmed their presence (Table 1). For the previously sequenced genome, 22 of the 24 MSDIN genes (91.7%) were expressed at the transcriptional level (Table 1). For the new genome of the strain AsYYH, 23 of 25 genes (92.0%) were expressed (Table 1). These results show that most MSDINs were expressed at the transcriptional level in As. Regarding gene structure, exons and introns were conserved among most of the MSDIN genes, while alternative splicing was detected in two of the MSDIN transcripts (marked in green in Table 1). The expressed MSDINs were the focus for potential cyclic peptide production in the following MS and MS/MS analyses.
LC-HRMS and LC-MS/MS analyses on novel cyclic peptides in As
With the Agilent LC-HRMS platform, correlations between measured masses and the peptides predicted from genomic data were carefully assessed. No linear versions of these peptides were detected. Further, hydroxylation(s), sulfoxidation and cross-bridging, alone or in combination, were not detected on the predicted linear or cyclic peptides. However, the measured masses matched the theoretical masses of two predicted cyclic peptides. These two matches corresponded to the cyclized core peptide sequences TFLPPLFVPP (named CylG1) and AFFPPFFIPP (named CylG2), respectively, without further modifications. The molecular formula of the candidate new cyclic peptide CylG1 is C59H84N10O11, and the theoretical m/z of its [M + H]+ ion is 1109.6394. The measured value was 1109.6398 [M + H]+, a mass discrepancy of 0.36 ppm (Fig. 3A). The molecular formula of CylG2 is C65H80N10O10, with a theoretical m/z of 1161.6131 [M + H]+; the measured value was 1161.6161 [M + H]+, a mass discrepancy of 2.53 ppm (Fig. 3B). The [M + Na]+ and [M + K]+ adduct ions are also shown in the figure. CylG1 and CylG2 were treated as candidate new cyclic peptides for further characterization.
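The mass matching described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes a head-to-tail cyclic peptide (so the neutral mass is the sum of residue masses with no terminal water), standard monoisotopic residue masses, and a singly protonated ion. The function names `cyclic_mh` and `ppm_error` are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
PROTON = 1.007276  # mass of a proton (Da)


def cyclic_mh(core: str) -> float:
    """m/z of [M + H]+ for an unmodified head-to-tail cyclic peptide.

    Assumption: cyclization removes the terminal water, so the neutral
    mass is simply the sum of the residue masses.
    """
    return sum(RESIDUE_MASS[aa] for aa in core) + PROTON


def ppm_error(measured: float, theoretical: float) -> float:
    """Signed mass discrepancy in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


if __name__ == "__main__":
    # Core peptides of CylG1 and CylG2 as given in the text.
    for name, core in [("CylG1", "TFLPPLFVPP"), ("CylG2", "AFFPPFFIPP")]:
        print(name, f"{cyclic_mh(core):.4f}")
    # Discrepancy for CylG1 from the measured and theoretical values above.
    print(f"{ppm_error(1109.6398, 1109.6394):.2f} ppm")
```

Values recomputed from the four-decimal numbers quoted in the text may differ slightly from the reported discrepancies, because the reported figures were presumably derived from unrounded instrument readings.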
To determine the amino acid composition of CylG1 and CylG2, the candidate cyclic peptides were submitted to LC-MS/MS. The resultant spectra were first analyzed via the Center for Computational Mass Spectrometry (CCMS) using the UniProtKB/Swiss-Prot database (Fig. 3C). The peptide sequence obtained for CylG1 was PPLVFTPPLE, which differed by only one amino acid at the C-terminus (E vs. F; one-letter codes are used here and below for easy comparison). This difference arose because CCMS only uses linear peptide databases (Fig. 3C, underlined). The molecular weight of F is 165.19, and loss of H2O reduces it to 147.18, close to that of E (147.13), which accounts for the assignment of E rather than F. For CylG2, the same process was applied, and a discrepancy was likewise found at the last predicted amino acid (Fig. 3D, underlined). Some of our peptides did not return any results with CCMS and were therefore analyzed with other platforms (Mascot, pNovo and XCMS). In general, the results were largely in line with the above. Although the automated pipelines offered some evidence, we relied mostly on the following manual process to determine the peptide sequences.
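The E-versus-F arithmetic above can be reproduced directly. The sketch below uses the two molecular weights quoted in the text; the average molecular weight of water (18.015) is not stated in the text and is supplied here as an assumption.

```python
# Average molecular weights quoted in the text for the free amino acids.
MW_PHE = 165.19  # phenylalanine (F)
MW_GLU = 147.13  # glutamic acid (E)

# Average molecular weight of water; assumed, not given in the text.
MW_WATER = 18.015

# A cyclic peptide has no terminal water, so a search against linear
# peptides effectively sees the last residue as "F minus H2O".
phe_minus_water = MW_PHE - MW_WATER

# Remaining gap between dehydrated F and E.
gap = phe_minus_water - MW_GLU

if __name__ == "__main__":
    print(f"F - H2O: {phe_minus_water:.3f}")
    print(f"gap to E: {gap:.3f}")
```

The remaining gap is a few hundredths of a mass unit, small enough that a linear-database search could plausibly prefer E at that position.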
Manual amino acid composition analysis was mostly based on y-type fragmentation of linear peptides. For CylG1, fragment ions were calculated using the core peptide, and the result of searching for these ions in the MS/MS spectrum is shown in Fig. 3E. Every fragment ion was manually checked and confirmed. As illustrated in Fig. 3E, the y-type fragments (y2 to y9) were in strong agreement with the core peptide, and all the amino acids could be readily explained by the y-type ions. Immonium ions for P, L and F were also identified for CylG1. As a result, CylG1 was assigned as a novel cyclic peptide whose amino acid composition and order match the circular MSDIN core peptide sequence Cyclo(TFLPPLFVPP) (Fig. 3C, E). CylG2 underwent the same analyses, in which immonium ions for P, I and F were identified. Similarly, the y-type fragments (y2 to y9) were in strong agreement with the core peptide, and all the amino acids were consistent with the y-type fragment ions (Fig. 3D, F). In conclusion, CylG2 was confirmed to be a novel cyclic peptide with the sequence Cyclo(AFFPPFFIPP). This peptide differed from antamanide, Cyclo(FFVPPAFFPP), by only one amino acid (I vs. V) when the two were compared as circular structures.
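The fragment-ion calculation underlying this manual check can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the core peptide is treated as a linear sequence (which ring bond opens in the experiment is not stated), ions are singly charged, and standard monoisotopic masses are used. The helper names `y_ions` and `immonium` are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565   # monoisotopic H2O
PROTON = 1.007276   # proton
CO = 27.994915      # monoisotopic carbon monoxide


def y_ions(peptide: str) -> dict:
    """Singly charged y-ion m/z values (y1..yn) of a linear peptide.

    y_i = sum of the i C-terminal residue masses + H2O + proton.
    """
    ions = {}
    running = WATER + PROTON
    for i, aa in enumerate(reversed(peptide), start=1):
        running += RESIDUE_MASS[aa]
        ions[f"y{i}"] = running
    return ions


def immonium(aa: str) -> float:
    """m/z of the immonium ion of one residue: residue - CO + proton."""
    return RESIDUE_MASS[aa] - CO + PROTON


if __name__ == "__main__":
    # Core peptide of CylG1, treated as linear for the calculation.
    for label, mz in y_ions("TFLPPLFVPP").items():
        print(f"{label}: {mz:.4f}")
    for aa in "PLF":
        print(f"immonium {aa}: {immonium(aa):.4f}")
```

Because L and I have identical residue masses, neither the y-ion ladder nor the immonium ions computed this way can distinguish them; that assignment rests on the genomic core peptide sequence.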
Cloning MSDIN genes and LC-HRMS analysis in As AsBJ strain
Because the two cyclic peptides, CylG1 and CylG2, were not detected in the AsBJ strain, we examined whether the corresponding MSDINs were present. All attempted PCR reactions were negative, while all the controls for α- and β-amanitin were positive. This result is consistent with the lack of the two cyclic peptides in AsBJ being due to the absence of the two corresponding MSDIN genes.
Novel cyclic peptides in other Amanita species
CylG1 and CylG2 in As were our first effort towards finding novel cyclic peptides in the genus. Taking advantage of this genome-guided approach, we analyzed three other Amanita species, i.e., A. pallidorosea (Apa), Ae, and Ar. In total, 10 additional new cyclic peptides were discovered (Table 4). The relevant mass spectrometry analyses are included in the supplementary information (Additional file 1: Figures S1–S6). Three new cyclic peptides were found in Ar, and their sequences, in the same order as their corresponding MSDIN core peptides, were Cyclo(ISDPTAYP), Cyclo(FIPLGIITILP) and Cyclo(FPTRPVFP) (Additional file 1: Figures S1 and S4), which were named CylH1, CylH2, and CylH3, respectively. Five new cyclic peptides were found in Apa, and their sequences were Cyclo(EFIVFGIFP), Cyclo(FVIIPPFIFP), Cyclo(YFFNDHPP), Cyclo(TIHLFSAP) and Cyclo(MHILAPPP) (Additional file 1: Figures S2 and S5), which were named CylI1, CylI2, CylI3, CylI4, and CylI5, respectively. Further information on MSDIN genes in Apa is included in Additional file 1: Table S1. Two new cyclic peptides were found in Ae, and their sequences were Cyclo(FVFVASPP) and Cyclo(LFFPPDFRPP) (Additional file 1: Figures S3 and S6), which were named CylJ1 and CylJ2, respectively. A previously known cyclic peptide called amanexitide, with the sequence Cyclo(VFSLPVFFP) [16], was also found in this study (Additional file 1: Figure S7). In conclusion, 12 novel cyclic peptides and one known cyclic peptide were discovered in the four sequenced species (Table 4).
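Because a cyclic peptide has no fixed start, comparing a detected ring with an MSDIN core peptide, or with a known compound such as antamanide, needs to be independent of where the ring is "opened". The sketch below shows one simple way to do this by trying every rotation. It is not taken from the study; the function names are hypothetical, and ring direction is assumed to be preserved (reversed sequences are not considered).

```python
def rotations(seq: str) -> list:
    """All cyclic rotations of a sequence, keeping its direction."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]


def canonical(seq: str) -> str:
    """Lexicographically smallest rotation; identical for any reading of one ring."""
    return min(rotations(seq))


def same_ring(a: str, b: str) -> bool:
    """True if a and b describe the same cyclic peptide."""
    return len(a) == len(b) and canonical(a) == canonical(b)


def min_ring_mismatches(a: str, b: str) -> int:
    """Fewest mismatched positions between a and any rotation of b."""
    if len(a) != len(b):
        raise ValueError("rings must have the same number of residues")
    return min(sum(x != y for x, y in zip(a, rot)) for rot in rotations(b))


if __name__ == "__main__":
    # CylG2 versus antamanide, using the sequences given in the text.
    print(min_ring_mismatches("AFFPPFFIPP", "FFVPPAFFPP"))
    # The same ring written from two different starting residues.
    print(same_ring("ISDPTAYP", "TAYPISDP"))
```

Trying all rotations is quadratic in ring length, which is negligible for peptides of 7 to 11 residues.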
Table 4
Twelve novel cyclic peptides and one known cyclic peptide discovered in Amanita subjunquillea, A. rimosa, A. exitialis and A. pallidorosea.
Species | Cyclopeptide sequence | Molecular formula | Theoretical (m/z) | Measured (m/z) | δ (ppm) |
A. subjunquillea | TFLPPLFVPP (CylG1) | C59H84N10O11 | 1109.6394 | 1109.6398 | 0.36 |
A. subjunquillea | AFFPPFFIPP (CylG2) | C65H80N10O10 | 1161.6131 | 1161.6161 | 2.53 |
A. rimosa | ISDPTAYP* (CylH1) | C39H56N8O13 | 845.4039 | 845.4040 | 0.36 |
A. rimosa | FIPLGIITILP (CylH2) | C61H99N11O12 | 1178.7547 | 1178.7555 | 0.64 |
A. rimosa | FPTRPVFP (CylH3) | C48H67N11O9 | 942.5196 | 942.5191 | 0.53 |
A. pallidorosea | EFIVFGIFP (CylI1) | C56H75N9O11 | 1050.5658 | 1050.5694 | 3.35 |
A. pallidorosea | FVIIPPFIFP (CylI2) | C65H90N10O10 | 1171.6914 | 1171.6941 | 2.30 |
A. pallidorosea | YFFNDHPP (CylI3) | C51H59N11O12 | 1018.4417 | 1018.4421 | 0.35 |
A. pallidorosea | TIHLFSAP (CylI4) | C42H62N10O10 | 867.4723 | 867.4733 | 1.14 |
A. pallidorosea | MHILAPPP (CylI5) | C41H64N10O8S | 857.4702 | 857.4714 | 1.39 |
A. exitialis | FVFVASPP (CylJ1) | C44H60N8O9 | 845.4556 | 845.4582 | 3.07 |
A. exitialis | LFFPPDFRPP# (CylJ2) | C63H83N13O12 | 1214.6357 | 1214.6357 | 0.00 |
A. exitialis | VFSLPVFFP^ | C56H75N9O10 | 1034.5709 | 1034.5734 | 1.97 |
1 * indicates that the sequence has been found in A. subjunquillea, A. pallidorosea and A. rimosa.
2 # indicates that this sequence has two copies.
3 ^ indicates amanexitide.
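The theoretical m/z values in Table 4 can be cross-checked from the molecular formulas alone. The following is a minimal sketch, assuming singly protonated [M + H]+ ions and standard monoisotopic atomic masses; the parser handles only simple formulas of the kind listed in the table (element symbols followed by optional counts, no brackets), and `formula_mh` is a hypothetical name.

```python
import re

# Monoisotopic atomic masses (Da) of the elements appearing in Table 4.
ATOM_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "S": 31.9720707,
}
PROTON = 1.007276  # mass of a proton (Da)


def formula_mh(formula: str) -> float:
    """m/z of [M + H]+ for a simple molecular formula such as 'C59H84N10O11'."""
    total = 0.0
    # Each match is an element symbol plus an optional count (default 1).
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOM_MASS[element] * (int(count) if count else 1)
    return total + PROTON


if __name__ == "__main__":
    for name, formula in [("CylG1", "C59H84N10O11"), ("CylI5", "C41H64N10O8S")]:
        print(name, f"{formula_mh(formula):.4f}")
```

An element missing from `ATOM_MASS` raises a `KeyError`, which is the intended behaviour for a cross-check: it flags a formula the table was not expected to contain.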
LC-HRMS analysis on As mycorrhizae and gene cloning for major toxins
As hyphae were successfully traced to plant roots in this study. Figure 4A shows a fresh fruiting body of As in apparent association with a network of plant roots, and Fig. 4B shows a cross section of a selected root tip with signs of mycorrhizal symbiosis. All four major toxins, i.e., α-amanitin, β-amanitin, phalloidin and phallacidin, were detected by LC-HRMS, with mass discrepancies of 2.12, 0.50, 0.76, and 2.93 ppm, respectively (Fig. 5). Cloning of the MSDINs for the major toxins was successful, and the corresponding MSDINs are shown above each section of the figure. We were not able to quantify the toxins because of the limited amount of material. However, α-amanitin was clearly the most prominent, as it appeared as a robust peak in the HPLC chromatograms.