Establishing compatibility between peptide-activating and peptide-coupling enzymes
At the outset of our studies, we searched for a broadly applicable enzyme for C-terminal peptide esterification to provide accessible reactive handles for Peptiligase. Accordingly, we explored the peptide amidase (PAM) from Stenotrophomonas maltophilia, which affords sequence-independent C-terminal peptide modification with absolute regioselectivity [24]. Using computational redesign, we significantly expanded the synthetic utility of PAM [25–26]. However, after exhausting different protein engineering strategies, our search for a mutant that catalyzes direct esterification in aqueous solution was unfruitful, so a chemical bridge joining the two biocatalysts was required. We therefore considered hydrazide chemistry, which was implemented by the Liu group and has become one of the most widely used extensions to NCL [27]. In this method, the thioester functionality of the acyl donor is initially masked as a C-terminal hydrazide and subsequently unmasked via a combination of nitrite oxidation and thiolysis. We envisioned that this strategy might be adapted for Peptiligase-catalyzed ligation, although it was unclear whether an appropriate alcohol reagent existed for the peptide acyl shift.
We initially explored the feasibility of using 2-hydroxyacetamide, which would afford the peptide carboxamidomethyl (Cam) ester (the standard Peptiligase substrate), for intermediate ester formation. The model peptide hydrazide Ac-DFSKL-N2H3 was oxidized using sodium nitrite in an acidic buffer solution at -15 °C, producing a peptide azide. Subsequently, 2-hydroxyacetamide was added to form the corresponding peptide Cam ester. Finally, the acyl acceptor ALKKA-NH2 (1.5 equiv.) and Omniligase-1 [28] (0.003 equiv., a commercially available enzyme from the Peptiligase family) were added to the reaction mixture, and the ligation was allowed to proceed for 30 min at pH 8.5 and room temperature. To our delight, the desired ligation product Ac-DFSKLALKKA-NH2 was formed (20% yield). This preliminary result demonstrated that peptide hydrazides can be used in Peptiligase-catalyzed peptide ligation in a one-pot approach, albeit with low ligation efficiency. Detailed analysis of the intermediates in the cascade revealed that multiple side products were formed during esterification, most likely due to Curtius rearrangement of the peptide azide. Therefore, for a high ligation yield and a clean reaction, it is crucial to rapidly convert the peptide azide into the corresponding ester with a strong nucleophile. The resulting peptide ester must also be stable in weakly alkaline solution and fit the binding sites of Peptiligase. In addition, the alcohol should be a good leaving group for the enzymatic S-O exchange.
With these stipulations in mind, we investigated a panel of alcohols for their efficiency in the model 5 + 5 reaction (Fig. 2), including aliphatic alcohols, aromatic alcohols, fluoroalcohols, and 2-hydroxyacetamide analogues. Most of the tested alcohols were able to mediate peptide ligation with moderate overall yields (10–50%). In particular, in phenol-mediated reactions, the peptide phenolic ester performed well in the ligation step. Accordingly, we further examined a series of phenol derivatives containing polar and electron-withdrawing substituents on the aromatic ring to improve their nucleophilicity and aqueous solubility. Gratifyingly, 4-hydroxybenzoic acid (HBA, e6), 4-hydroxyphenylacetic acid (e7), and 4-hydroxyphthalic acid (e8) gave almost quantitative conversion to the peptide ester within 5 min. The resulting peptide esters demonstrated good chemical stability under ligation conditions, and their ligation efficiencies (> 90%) were comparable to those of the standard peptide Cam esters. Considering its overall performance and commercial availability at scale, HBA was selected for further studies.
The scope of the enzymatic peptide hydrazide ligation was then investigated. To map the substrate profile of six binding pockets of Omniligase-1 in HBA-mediated peptide hydrazide ligations, we performed an extensive series of reactions with four acyl donor and two acyl acceptor site-saturation peptide libraries under identical reaction conditions (Fig. 3a). We were pleased to observe that the desired ligation products were obtained in all tested reactions of this thorough substrate scan, and a majority of ligations proceeded smoothly with high coupling yields (> 70%) in 1 h. Remarkably, using this activation-ligation approach, we were even able to ligate peptides that usually represent poor substrates for Omniligase-1, for example, P1 = Pro peptide, with over 80% coupling efficiency. These results indicated that the broad sequence compatibility of Omniligase-1 was well maintained and even partially improved in this newly devised HBA-mediated ligation process. In principle, the small portion of less efficient ligations may be accomplished using Peptiligase variants with different substrate profiles.
Having demonstrated the use of peptide hydrazides in Peptiligase-catalyzed ligation, we next investigated whether the peptide-modifying enzyme PAM is suitable for converting the most basic SPPS products (peptide amides/carboxylic acids) into the corresponding peptide hydrazides, using the computationally redesigned variant PAM12B [25]. Preliminary experiments showed that hydrazidation of the model peptide Ac-DFSKL-NH2 proceeded smoothly in aqueous solution at room temperature under kinetic control. In the presence of 0.5 M hydrazine and 0.00001 equiv. PAM12B, conversion of the peptide amide was complete after 45 min, giving the hydrazidation product with 96% efficiency. Based on this initial success, we next evaluated the substrate scope of this peptide-activating enzyme. Structural analysis revealed that peptide substrates interact with the enzyme mainly via the last two C-terminal residues. Accordingly, we investigated the sequence preference of the two terminal residue binding pockets of PAM12B by performing hydrazidation reactions with two site-saturation peptide amide libraries. In most reactions, hydrazidation products (except for P1 = Pro or Asn peptides) could be obtained in over 90% yield in 1 h (Fig. 3b), demonstrating the desired versatility of the peptide-activating enzyme.
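To put the enzyme loading quoted above into perspective, the following is a minimal back-of-the-envelope sketch. It assumes that the 0.00001 equiv. figure is a molar ratio relative to the peptide amide and that the 96% efficiency is the fraction of substrate converted to hydrazide; the function names are illustrative and not taken from the original work.

```python
# Illustrative arithmetic only; inputs are the values quoted in the text.
ENZYME_EQUIV = 0.00001       # PAM12B loading relative to peptide amide (assumed molar)
PRODUCT_FRACTION = 0.96      # fraction of substrate converted to hydrazide (assumed)
REACTION_TIME_S = 45 * 60    # 45 min expressed in seconds


def total_turnover_number(product_fraction, enzyme_equiv):
    """Moles of product formed per mole of enzyme."""
    return product_fraction / enzyme_equiv


def mean_turnover_frequency(ttn, seconds):
    """Average turnovers per enzyme per second over the whole reaction."""
    return ttn / seconds


ttn = total_turnover_number(PRODUCT_FRACTION, ENZYME_EQUIV)
tof = mean_turnover_frequency(ttn, REACTION_TIME_S)
print(f"total turnovers per enzyme: {ttn:,.0f}")
print(f"mean turnover frequency: {tof:.1f} per second")
```

Under these assumptions, each enzyme molecule performs on the order of 10^5 turnovers, which is an averaged figure and not a measured kinetic constant.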
However, PAM12B-catalyzed modification of peptide carboxylic acids is practical only in organic environments (H2O < 10%) owing to the thermodynamic barrier, which restricts the direct activation of recombinant proteins. To overcome this severe limitation, we sought to recruit an additional biocatalytic module for the selective functionalization of the peptide or protein carboxyl terminus. In the animal kingdom, many secreted peptides are processed by a peptidyl-glycine oxidation system [29]. During this transformation, peptidyl-glycine hydroxylating monooxygenase (PHM) catalyzes the stereospecific hydroxylation of the α-carbon of the terminal glycine with oxygen and ascorbate; subsequently, one molecule of glyoxylate is removed by peptidyl-α-hydroxyglycine amidating lyase (PAL) to form the des-glycine peptide amide, which is the ideal substrate for PAM. With the expectation that the oxidative and hydrolytic enzymes might work cooperatively, we prepared PHM from Rattus norvegicus and PAL from Exiguobacterium sp. by recombinant expression and tested their activities. Since both PHM and PAL interact mainly with the last two residues of peptide substrates, we investigated the substrate scope of this oxidative system at the penultimate position. In the presence of PHM and PAL, almost all 20 model peptides DLSYXG-OH (1 mM) were quantitatively converted to the corresponding peptide amides in 15 min under identical reaction conditions (4 h for the P1 = Cys peptide, Fig. 3c). In general, the resulting products could be swiftly (< 30 min) and efficiently (> 90% yield) hydrazinolyzed by PAM12B in a one-pot reaction, as expected (Fig. 3c).
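The 20-member penultimate-position library described above can be enumerated programmatically. The sketch below assumes that X stands for each of the 20 canonical amino acids in one-letter code; the helper name site_saturation_library is hypothetical.

```python
# The 20 canonical amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_saturation_library(template, placeholder="X"):
    """Return one peptide per canonical residue at the placeholder position."""
    if template.count(placeholder) != 1:
        raise ValueError("template must contain exactly one placeholder")
    return [template.replace(placeholder, aa) for aa in AMINO_ACIDS]


# Library used to probe the penultimate (P1) position of the PHM/PAL system.
library = site_saturation_library("DLSYXG-OH")
for peptide in library:
    print(peptide)
```

The same helper could be applied to any single-position template; multi-position libraries would need a different enumeration.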
Having acquired all of the activating modules needed to supply appropriate substrates for the coupling module, we finally tested the complete reaction route utilizing all catalytic modules with the model peptide Ac-DFSKVG-OH (1). Briefly, this native peptide carboxylic acid was successively converted to Ac-DFSKV-NH2 (2) by PHM and PAL and to Ac-DFSKV-N2H3 (3) by PAM12B. Then, excess HNO2 was used to remove residual hydrazine and oxidize 3 at -15 °C. Upon the addition of HBA, the corresponding peptide ester (4) was obtained and subsequently conjugated with an equivalent amount of ALKKA-NH2 by Omniligase-1 to produce Ac-DFSKVALKKA-NH2 (5, Fig. 3d). The whole process was conducted in one pot in 3 h with only trace amounts of enzymes, exhibiting excellent catalytic efficiency, chemoselectivity, and regioselectivity in the presence of a multitude of reactive side-chain functionalities. These results demonstrated that all catalytic modules exhibited a broad substrate spectrum and functioned well in series, and that the PALME platform was ready for further investigation.
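The modular route from 1 to 5 can be summarized as a chain of C-terminal transformations. The following sketch is a purely symbolic bookkeeping model, not a simulation of the chemistry: the Peptide class, the function names, and the "HBA-ester" label are illustrative assumptions, and the azide intermediate is folded into the esterification step.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Peptide:
    sequence: str      # one-letter residues
    c_term: str        # "OH", "NH2", "N2H3" or "HBA-ester"
    n_cap: str = ""    # e.g. "Ac"; empty string means a free N-terminus

    def __str__(self):
        prefix = f"{self.n_cap}-" if self.n_cap else ""
        return f"{prefix}{self.sequence}-{self.c_term}"


def phm_pal_amidation(p):
    """PHM + PAL: remove the C-terminal Gly and leave a peptide amide."""
    if p.c_term != "OH" or not p.sequence.endswith("G"):
        raise ValueError("substrate must be a Gly-extended carboxylic acid")
    return replace(p, sequence=p.sequence[:-1], c_term="NH2")


def pam12b_hydrazidation(p):
    """PAM12B: convert the peptide amide into a peptide hydrazide."""
    if p.c_term != "NH2":
        raise ValueError("substrate must be a peptide amide")
    return replace(p, c_term="N2H3")


def hba_esterification(p):
    """Nitrite oxidation followed by HBA addition (azide step not modeled)."""
    if p.c_term != "N2H3":
        raise ValueError("substrate must be a peptide hydrazide")
    return replace(p, c_term="HBA-ester")


def ligate(donor, acceptor):
    """Omniligase-1: join an activated donor to an acceptor with a free N-terminus."""
    if donor.c_term != "HBA-ester" or acceptor.n_cap:
        raise ValueError("need an HBA ester donor and an uncapped acceptor")
    return Peptide(donor.sequence + acceptor.sequence, acceptor.c_term, donor.n_cap)


p1 = Peptide("DFSKVG", "OH", "Ac")
p2 = phm_pal_amidation(p1)
p3 = pam12b_hydrazidation(p2)
p4 = hba_esterification(p3)
p5 = ligate(p4, Peptide("ALKKA", "NH2"))
for step in (p1, p2, p3, p4, p5):
    print(step)
```

Each function checks only the C-terminal state it expects, which mirrors the requirement that every module hands the next one a compatible substrate.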
Broad application scope of the PALME platform for protein synthesis and functionalization
We next examined PALME's utility for practical applications. Considering synthetic availability and cost, we planned to prepare long peptide/protein hydrazides (> 10 residues) using peptide-activating enzymes and to synthesize short peptide hydrazides directly via SPPS. Initially, we tested the feasibility of enzymatic N-to-C sequential peptide condensation and employed this strategy to synthesize exenatide [30], the API in the antidiabetes drugs Byetta® and Bydureon®. We divided exenatide into three segments, which were prepared in the form of a peptide hydrazide (N-part) or peptide amides (middle part and C-part). The N-terminal segment was transformed into its HBA ester and ligated with the middle segment, giving the conjugation product in 48% isolated yield. The resulting peptide hydrazide was then activated and conjugated with the C-terminal segment to produce 3.0 mg of purified exenatide in 63% isolated yield (Fig. 4a). Compared to multifragment condensation in the C-to-N direction [31], our strategy avoids multiple intractable metal-mediated deprotection steps at the N-termini, providing a more efficient protocol for the chemoenzymatic total synthesis of biomacromolecules.
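For a linear sequence of ligations, the cumulative isolated yield is the product of the step yields. The sketch below assumes that the 48% and 63% figures each refer to a single ligation step (and not to the overall route), which the text does not state explicitly; the function name is illustrative.

```python
import math

# Isolated yields quoted for the two successive ligations (assumed per-step values).
STEP_YIELDS = [0.48, 0.63]


def cumulative_yield(step_yields):
    """Overall yield of a linear sequence: the product of the per-step yields."""
    return math.prod(step_yields)


overall = cumulative_yield(STEP_YIELDS)
print(f"cumulative isolated yield over {len(STEP_YIELDS)} steps: {overall:.1%}")
```

Under this assumption, the two-step N-to-C assembly corresponds to an overall isolated yield of roughly 30%.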
In addition to intermolecular conjugation, we investigated intramolecular ligation, which generates cyclic peptides that are more rigid than their linear precursors [32]. As the test substrate, we selected sheep myeloid antimicrobial peptide (SMAP) [33], which does not contain the Cys/Ser/Asp/Asn residues required for sequence-limited chemical ligation or transpeptidation in aqueous solution. After esterification and ligation, SMAP was cyclized with 86% efficiency according to HPLC analysis (Fig. 4b), illustrating that our activation and ligation strategy is a useful supplement to current peptide cyclization methodologies.
Encouraged by the success of sequential peptide condensation and cyclization, we asked whether this strategy could be applied to substrates prepared by recombinant expression. Cys-free 4-oxalocrotonate tautomerase (4-OT) was therefore selected as the target for semisynthesis. 4-OT is a fascinating enzyme that promiscuously catalyzes various important synthetic reactions, including Michael addition [34], aldol condensation [35], and epoxidation [36]. Since 4-OT is considered to lie at the interface between organocatalysis and biocatalysis, this protein scaffold serves as an excellent template for chemical engineering to further broaden the synthetic versatility of biomacromolecules. We prepared the C-terminal part of 4-OT by recombinant expression followed by removal of the His-SUMO tag and then conjugated it with the SPPS-synthesized N-terminal fragment. The full-length protein was obtained after HPLC purification, and the Michaelase activity of the refolded semisynthetic enzyme resembled that of recombinant 4-OT (Fig. 4c). Subsequently, we performed N-terminal functionalization of much bulkier recombinant proteins. With 10 molar equivalents of biotin/FITC-modified peptide hydrazides, the 10 kDa ubiquitin-like modifier FAT10 [37] and the 12 kDa rationally designed HIV-1 immunogen C4S3 [38] could be labeled with an efficiency of up to 92% (Fig. 4c). The modification process also worked on the 248-mer enhanced green fluorescent protein (EGFP), which implies that our strategy is highly promising for the functionalization of the majority of human proteins (those with a mass of up to ~ 30 kDa).
Next, we tested whether proteins bearing post-translational modifications in the C-terminal region are accessible through the PALME platform. As one of the most widely investigated regulatory proteins [39], ubiquitin (Ub) has long been a classic target of chemical protein synthesis and semisynthesis, despite the tedious desulfurization process required after NCL. We divided Ser65-phosphorylated ubiquitin into two segments so that the major part of the target protein could be obtained via recombinant expression. Accordingly, recombinant Ub(1–59)-Gly was amidated by PHM and PAL, followed by PAM12B-mediated hydrazidation. Afterward, the purified protein hydrazide was esterified and ligated with the synthetic 17-mer phosphorylated peptide, successfully producing the full-length phosphorylated Ub (Fig. 4d). Overall, by harnessing multiple activating and coupling enzymes that combine strict regioselectivity with broad substrate specificity, we demonstrated that the PALME platform should be able to cover the expected full spectrum of applications.
Semisynthesis of intractable proteins through the PALME platform
Having verified the PALME platform's broad application scope, we next attempted to apply it to currently intractable targets. Recombinant proteins bearing multiple adjacent Cys residues are difficult to handle because cysteine-based chemical methods require pretreatment of the native proteins to reduce disulfide bonds [40]. When thiol-dependent chemical methods are applied to synthesize these targets, native Cys residues are often mutated or protected to avoid side reactions. With the thiol-free and native Cys-independent activation and ligation strategy in hand, we set out to activate and functionalize NrdH-redoxin, an electron donor bearing a CXXC catalytic motif at the active site that forms a disulfide bond in the oxidized state. This protein is a promising drug target, since it functions cooperatively with prokaryote-specific class Ib ribonucleotide reductase and is essential for cell metabolism [41]. By utilizing PHM, PAL, PAM12B, and Omniligase-1 together, we converted the recombinant NrdH-Gly to the corresponding protein hydrazide and labeled it with biotin/FITC at the C-terminus (Fig. 5a). The vital disulfide bond was not disturbed throughout the process, suggesting that the PALME platform could be a suitable supplement for handling intractable multiple-Cys proteins without protection/deprotection steps.
Finally, we examined PALME's utility in protein semisynthesis applied to internal regions, which is one of the most sought-after methodologies in protein synthesis [3]. As the demonstration target, we chose Lys56-acetylated human mitochondrial heat shock protein 10 (mHSP10), which participates in cellular protein folding by forming a symmetrical chaperonin "football" complex with mitochondrial heat shock protein 60 (mHSP60) [42]. The location of the desired modification site within the protein sequence most often determines whether semisynthesis is viable. Since the modified Lys56 is located in the internal region of mHSP10, a multistep ligation strategy involving the assembly of three segments was required. Unfortunately, there is no Cys or even Ala residue suitable for conventional NCL protocols between Lys56 and Asp102 at the C-terminus. Owing to the PALME platform's unprecedentedly broad substrate spectrum in terms of both sequence and C-terminal functionality, we could design a synthetic scheme that requires chemical synthesis of only one 16-mer peptide amide. First, we converted the synthetic peptide amide to a nearly equivalent amount of the corresponding peptide hydrazide. Next, the protein hydrazide of the N-part was produced smoothly from the recombinant glycine-extended protein. Afterward, two rounds of esterification, ligation, and HPLC purification were performed following a general protocol for sequential fragment condensation, yielding the full-length acetylated mHSP10 (Fig. 5b). Overall, the platform's modular nature could provide researchers with flexible choices of input substrates, output functions, and their combinations, which would create plentiful retrosynthetic disconnections for disassembling hard-to-access proteins.