Residues 452-567 of E. cuniculi CPSF73 (GenBank KMV65242.1) and residues 525-639 of E. cuniculi CPSF100 (GenBank NP_597379) were amplified by PCR from cDNA clones (Katinka et al, 2001). A ligation-independent cloning protocol was used with the NdeI and BamHI restriction sites present in modified pET vectors. ecCPSF73(452-567) was inserted into the ampicillin-resistant plasmid pET-MCN so that it was expressed downstream of a His6-tag and a tobacco etch virus (TEV) protease cleavage site. To allow co-expression, ecCPSF100(525-639) was inserted into the streptomycin-resistant plasmid pCDF-MCN and expressed without any purification tag.
Protein expression used Escherichia coli BL21(DE3) lysY (New England Biolabs) co-transformed with both expression vectors. For unlabelled protein, 500 mL of lysogeny broth (LB) containing 100 mg/L ampicillin and 50 mg/L streptomycin was inoculated with a 10 mL overnight LB culture grown at 37 °C, and incubated with shaking at 37 °C. Expression was induced with 0.25 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) at an OD600nm of 0.6, followed by overnight growth at 25 °C. Uniform stable-isotope labelling used the same protocol but with M9 medium supplemented with 1 g/L [15N]-ammonium chloride and 2 g/L [13C]-D-glucose. Fractional 13C-labelling used 0.2 g/L [13C]-D-glucose and 1.8 g/L natural-abundance D-glucose, together with 1 g/L [15N]-ammonium chloride. For amino acid-specific labelling of isoleucine, valine or leucine, the base medium was 500 mL of M9 containing 1 g/L [15N]-ammonium chloride and 2 g/L natural-abundance D-glucose; thirty minutes before induction, 100 mg/L [13C,15N]-Ile, [13C,15N]-Val or [13C,15N]-Leu was added. Selective incorporation of deuterated aromatic amino acids followed a similar strategy, using either 100 mg/L [2H]-Phe in 500 mL of M9, or a combination of 100 mg/L [2H]-Tyr and 100 mg/L [2H]-Trp in 250 mL of culture.
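The isotope arithmetic behind these media can be sketched in a few lines of Python. This is a minimal illustration, not part of the original protocol: the ~99% enrichment of the [13C]-glucose, the ~1.07% natural abundance of 13C and the molar mass used for U-[13C6]-D-glucose are assumed values. The first function converts the glucose masses to moles (the labelled glucose is slightly heavier) and returns the expected 13C fraction of glucose carbon; the second gives the mass of amino acid supplement needed for a given culture volume.

```python
GLUCOSE_MW = 180.16       # g/mol, natural-abundance D-glucose
GLUCOSE_13C6_MW = 186.11  # g/mol, approximate value for U-[13C6]-D-glucose (assumed)


def carbon13_fraction(labelled_g, unlabelled_g, enrichment=0.99, natural=0.0107):
    """Expected fraction of glucose carbon atoms that are 13C.

    enrichment: assumed 13C enrichment of the labelled glucose.
    natural: assumed natural abundance of 13C in unlabelled glucose.
    Both glucoses have six carbons, so that factor cancels out.
    """
    n_lab = labelled_g / GLUCOSE_13C6_MW  # mol of labelled glucose
    n_nat = unlabelled_g / GLUCOSE_MW     # mol of natural-abundance glucose
    return (n_lab * enrichment + n_nat * natural) / (n_lab + n_nat)


def supplement_mg(conc_mg_per_l, volume_ml):
    """Mass of supplement (mg) for a given concentration and culture volume."""
    return conc_mg_per_l * volume_ml / 1000.0


if __name__ == "__main__":
    # Fractional labelling medium: 0.2 g/L labelled + 1.8 g/L unlabelled glucose.
    print(f"13C fraction of glucose carbon: {carbon13_fraction(0.2, 1.8):.3f}")
    # Amino acid supplements at 100 mg/L in 500 mL and 250 mL cultures.
    print(supplement_mg(100, 500), supplement_mg(100, 250))  # 50.0 25.0
```

Under these assumptions the 0.2 g/1.8 g mixture lands close to the nominal 10% 13C content; calling `carbon13_fraction(0.2, 1.8, enrichment=1.0, natural=0.0)` isolates the small effect of the molar-mass difference.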
Following expression, the bacteria were collected by centrifugation at 4500 × g for 15 min at 4 °C. The cell pellet was resuspended in Binding Buffer (20 mM TRIS (pH 7.5), 500 mM NaCl, 2.5 mM imidazole, 5% (v/v) glycerol) supplemented with lysozyme, and stored at -80 °C. For purification, the frozen sample was thawed on ice and sonicated on ice at 20% power for 5 min, alternating 30 s pulse and recovery periods. The lysate was cleared by centrifugation at 20,000 × g for 1 h at 4 °C, passed through a Whatman GD/X 25 filter, and loaded onto a 1 mL Nuvia (Bio-Rad) Ni2+ column. After washing with 15 mL of Binding Buffer, the column was washed with 7 mL of the same buffer containing 25 mM imidazole. Protein was eluted in 1 mL fractions with buffer containing 500 mM imidazole. Protein-containing fractions were identified by a rapid Bradford assay, and 2.5 mL was pooled and exchanged back into Binding Buffer using PD-10 columns. After overnight digestion with 100 μg/mL His6-tagged TEV protease at 4 °C, the sample was reapplied to a 1 mL Nuvia Ni2+ column and the flow-through was collected; an additional 1.5 mL of Binding Buffer was applied to ensure that all cleaved protein was recovered. The sample was concentrated to 500 μL using a 4 mL Amicon Ultra centrifugal device (10 kDa cutoff), exchanged into the final NMR buffer (20 mM TRIS (pH 7.0), 150 mM NaCl, 2 mM DTT) using a NAP-5 column, and concentrated again with the same type of device. SDS-PAGE analysis confirmed sample quality and a 1:1 ratio of ecCPSF73(452-567):ecCPSF100(525-639). The heterodimer concentration was determined from the absorbance at 280 nm using an extinction coefficient of 22970 M⁻¹ cm⁻¹ calculated with ProtParam (https://web.expasy.org/protparam/). The final samples contained 10% (v/v) D2O for the lock, except for the unlabelled and aromatic-specific deuterated samples, which were prepared in 100% D2O.
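The concentration step is a direct application of the Beer-Lambert law, c = A280 / (ε · l). A minimal sketch follows, using the heterodimer extinction coefficient given above; the 1 cm path length, the optional dilution factor and the absorbance value in the example are assumptions for illustration, not measured data. Because ε was calculated for the 1:1 complex, the result is the heterodimer concentration.

```python
EPSILON_280 = 22970.0  # M^-1 cm^-1, ProtParam value for the heterodimer (from the text)


def concentration_uM(a280, path_cm=1.0, dilution=1.0, epsilon=EPSILON_280):
    """Beer-Lambert concentration in micromolar.

    a280: measured absorbance at 280 nm (of the possibly diluted sample).
    dilution: factor by which the sample was diluted before measuring.
    """
    molar = a280 * dilution / (epsilon * path_cm)  # mol/L
    return molar * 1e6


if __name__ == "__main__":
    # Hypothetical reading: A280 = 0.5 in a 1 cm cuvette, undiluted.
    print(f"{concentration_uM(0.5):.1f} uM")
```

For concentrated NMR samples, measuring a diluted aliquot and passing the `dilution` factor keeps the absorbance in the linear range of the spectrophotometer.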
Assignment spectra were collected at 298 K on Bruker Avance NEO spectrometers operating at 700 MHz or 800 MHz and equipped with a standard triple-resonance gradient probe or a cryoprobe, respectively. NMR data were processed using NMRPipe/NMRDraw (Delaglio et al. 1995), and spectra were visualized and analyzed using Sparky (T. D. Goddard & D. G. Kneller, University of California, San Francisco, USA).
The backbone 1HN, 1Hα, 13C’, 13Cα, 13Cβ and 15NH resonance assignments for the minimal C-terminal heterodimer were initially determined on a 350 μM [13C,15N]ecCPSF73(452-567)/ecCPSF100(525-639) sample, using a two-dimensional (2D) 1H,15N-TROSY spectrum and HSQC-based three-dimensional (3D) HNCO, HN(CA)CO, HNCA, HNCACB, CBCA(CO)NH, HNHA, HA(CACO)NH and N(COCA)NNH spectra. TROSY versions of the 3D HNCO and HNCA spectra were also collected to improve the sensitivity for certain resonances. Because of the large number of residues in the complex and the resulting spectral overlap, this standard approach was insufficient to complete the assignment. The complex can only be produced by co-expression, so it is not possible to label just one of the two peptides. We therefore produced three additional 15N-labelled samples in which only isoleucine, valine or leucine was 13C-labelled. The heterodimer concentrations in these 13C-Ile, 13C-Val and 13C-Leu samples were 625, 150 and 280 μM, respectively. TROSY 3D HNCO and HNCA spectra were acquired for all three samples, and 2D 1H,15N planes of the HSQC-HNCO and HSQC-HNCA spectra were also measured for the 13C-Ile and 13C-Val samples. Annotated overlays of these 2D spectra are shown as examples (Fig. 1A,B).
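Why the amino acid-selective samples help can be made concrete with a small predictor. The HNCO correlates the amide of residue i with the carbonyl of residue i-1, so it only shows residues that follow a 13C-labelled residue; the HNCA correlates the amide of residue i with the Cα of both i and i-1, so it shows the labelled residues and their successors. The sketch below assumes ideal labelling with no metabolic scrambling, treats the N-terminal residue and prolines as giving no amide cross-peak, and uses a hypothetical test sequence rather than the ecCPSF73/ecCPSF100 sequences.

```python
def expected_peaks(sequence: str, labelled: str) -> dict:
    """Residue numbers (1-based) expected to give HNCO/HNCA cross-peaks.

    Assumes only residues of type `labelled` carry 13C (no scrambling),
    all residues carry 15N, and that the N-terminal residue (free amine)
    and prolines (no amide proton) give no amide correlations.
    """
    hnco, hnca = [], []
    for i in range(1, len(sequence)):  # index 0 is the N-terminus
        res, prev = sequence[i], sequence[i - 1]
        if res == "P":  # proline: no amide 1H, so no HN-detected peak
            continue
        number = i + 1
        # HNCO: HN(i) -> C'(i-1); needs the preceding residue to be labelled
        if prev == labelled:
            hnco.append(number)
        # HNCA: HN(i) -> Ca(i) and Ca(i-1); either residue may be labelled
        if res == labelled or prev == labelled:
            hnca.append(number)
    return {"HNCO": hnco, "HNCA": hnca}


if __name__ == "__main__":
    # Hypothetical sequence for illustration only.
    print(expected_peaks("MKIAVIPIL", "I"))
```

In this toy example the Ile at position 6 gives no HNCO peak for its successor because residue 7 is a proline, which illustrates why some expected connectivities can be missing from the selectively labelled spectra.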
Aliphatic sidechain resonances were assigned using 2D 13C-HSQC and 3D H(CCO)NH, (H)C(CO)NH, H(C)CH-TOCSY and (H)CCH-TOCSY spectra measured on the 350 μM uniformly 13C,15N-labelled sample. The sidechain amides of asparagine and glutamine were non-stereospecifically assigned using the same sample and a 3D 15N-HSQC-NOESY spectrum (120 ms mixing time). Assignment of the isoleucine and valine sidechain resonances required 3D H(C)CH-TOCSY and (H)CCH-TOCSY spectra measured on the 625 μM 13C-Ile and 150 μM 13C-Val samples, respectively. In addition, a 3D (H)CCH-TOCSY spectrum was acquired on the 280 μM 13C-Leu sample. Methyl-centred constant-time 13C-HSQC spectra were collected for all four samples, and additionally for an 880 μM 10% 13C-labelled sample to enable stereospecific methyl assignment (Senn et al 1989). Because this latter sample was also uniformly 15N-labelled, it was used to collect a 3D 15N-HSQC-TOCSY spectrum (60 ms mixing time) to validate the connection between the sidechain assignments and the backbone amide resonances.
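In a 10% fractionally 13C-labelled sample, a labelled methyl carbon either does or does not carry a 13C neighbour from the same glucose-derived precursor unit, and Senn et al (1989) exploit this difference between the two prochiral methyls of valine and leucine. One common way to read it out is the sign of the peak in a constant-time 13C-HSQC: during a constant-time period T, coupling to each 13C neighbour scales the signal by cos(π·1J(CC)·T). The sketch below assumes 1J(CC) ≈ 35 Hz and T = 1/1J(CC); it does not encode which methyl is the coupled one.

```python
import math


def ct_modulation(n_coupled, j_cc_hz=35.0, ct_s=None):
    """Relative signal amplitude after a constant-time 13C evolution period.

    n_coupled: number of directly bonded 13C neighbours of the observed carbon.
    j_cc_hz: assumed one-bond 13C-13C coupling constant (Hz).
    ct_s: total constant-time period in seconds; defaults to 1/J.
    """
    if ct_s is None:
        ct_s = 1.0 / j_cc_hz
    return math.cos(math.pi * j_cc_hz * ct_s) ** n_coupled


if __name__ == "__main__":
    for n in (0, 1):
        print(n, round(ct_modulation(n), 3))  # uncoupled +1, one neighbour -1
```

With T = 1/J the coupled and uncoupled methyls differ only in sign, so the two prochiral methyls of each Val or Leu residue separate cleanly in a single spectrum.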
Aromatic sidechain resonances were assigned from 2D 1H,1H-NOESY (120 ms mixing time), 1H,1H-TOCSY (60 ms mixing time) and double-quantum-filtered 1H,1H-COSY spectra of a 458 μM unlabelled sample prepared in D2O. The assignment was aided by two additional samples in which selected aromatic amino acids were deuterated during expression. 1H,1H-TOCSY (60 ms mixing time) and double-quantum-filtered 1H,1H-COSY spectra were collected on a 150 μM [2H]-Phe sample and an 85 μM sample containing both [2H]-Tyr and [2H]-Trp, each prepared in D2O. The aromatic 1H spin systems were assigned by overlaying the spectra from all three samples (Fig. 1C,D). The aromatic assignments were then connected to the aliphatic part of each residue using the 2D 1H,1H-NOESY spectrum of the unlabelled heterodimer in D2O.
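The overlay logic amounts to asking which deuterated sample a spin system disappears from. A minimal sketch, assuming the spin systems have already been matched across the three overlaid spectra and given arbitrary identifiers (the identifiers in the example are hypothetical): a system lost only in the [2H]-Phe sample is Phe, one lost only in the [2H]-Tyr/[2H]-Trp sample is Tyr or Trp, and one present everywhere belongs to another aromatic residue such as His.

```python
def classify_aromatic(unlabelled, d_phe, d_tyr_trp):
    """Label aromatic 1H spin systems by the deuterated sample they vanish from.

    Each argument is a set of spin-system identifiers observed in that
    sample's TOCSY/COSY spectra (identifiers are hypothetical).
    """
    labels = {}
    for ss in sorted(unlabelled):
        lost_phe = ss not in d_phe          # missing when Phe rings are deuterated
        lost_tyr_trp = ss not in d_tyr_trp  # missing when Tyr/Trp rings are deuterated
        if lost_phe and not lost_tyr_trp:
            labels[ss] = "Phe"
        elif lost_tyr_trp and not lost_phe:
            # Tyr and Trp are then told apart by their spin-system topology
            labels[ss] = "Tyr or Trp"
        elif lost_phe and lost_tyr_trp:
            labels[ss] = "ambiguous"  # e.g. overlap or a weak signal
        else:
            labels[ss] = "not Phe/Tyr/Trp (e.g. His)"
    return labels


if __name__ == "__main__":
    print(classify_aromatic({"A", "B", "C"}, {"B", "C"}, {"A", "C"}))
```

Flagging systems that vanish from both deuterated samples as "ambiguous" keeps overlap or low signal-to-noise in the dilute (85 μM) sample from being misread as a residue-type assignment.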
Assignments, data deposition and chemical shift analysis
Using four different samples, the backbone resonance assignment is essentially complete (Fig. 2). Missing backbone resonances are mainly due to line broadening: in the ecCPSF73(452-567) peptide they comprise the N-terminal Gly-His-Met-Leu residues, and in the ecCPSF100(525-639) peptide the N-terminal Met-Ser-Asp, residues Pro570-Arg571, residue Gly612, and several residues within a likely loop from Gly623 to Tyr626. Fourteen other backbone resonances, scattered throughout the two peptides, could not be unambiguously assigned. Anomalous backbone chemical shifts for ecCPSF73 include the upfield-shifted Ser507 1HN (5.79 ppm) and Asp485 15NH (112.96 ppm), and the downfield-shifted Glu547 1Hα (5.24 ppm). For ecCPSF100, anomalous backbone resonances include the upfield-shifted Asp604 15NH (112.96 ppm) and the downfield-shifted Gly556 1Hα2 (5.24 ppm) and Asp618 1Hα (6.19 ppm). For the sidechains, assignment of the observable 1H nuclei and the protonated 13C and 15N nuclei is also nearly complete. All anomalous sidechain chemical shifts involve protons. Lys454 1Hβ2,3 (2.68 ppm) from ecCPSF73 is shifted downfield, whereas the others are shifted upfield: Glu499 1Hβ2 (-0.104 ppm), Glu499 1Hβ3 (0.837 ppm), Glu499 1Hγ2 (1.52 ppm) and Ala558 1Hβ (0.28 ppm) from ecCPSF73, and Leu601 1Hβ3 (-0.28 ppm), Leu601 1Hδ1 (-0.54 ppm) and Arg629 1Hγ2,3 (0.56 ppm) from ecCPSF100. The 1H, 13C and 15N chemical shifts have been deposited in the Biological Magnetic Resonance Data Bank (https://bmrb.io/) under BMRB accession code 51624.
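Calling a shift "upfield" or "downfield" implies a comparison with a typical value for that atom type. A minimal sketch of that comparison: the reference values in the example are approximate BMRB mean shifts assumed here for illustration, and the 0.5 ppm threshold for 1H is a hypothetical cutoff, not the criterion used in this study.

```python
def shift_direction(observed_ppm, reference_ppm, threshold_ppm):
    """Classify a chemical shift relative to a reference (e.g. database mean).

    Returns (label, delta) where delta = observed - reference in ppm.
    Lower ppm (more shielded) is upfield; higher ppm is downfield.
    """
    delta = observed_ppm - reference_ppm
    if abs(delta) < threshold_ppm:
        return "typical", delta
    return ("upfield" if delta < 0 else "downfield"), delta


if __name__ == "__main__":
    # Observed values from the text; reference values are approximate
    # BMRB means assumed for illustration only.
    examples = {
        "ecCPSF73 Ala558 1Hβ": (0.28, 1.39),
        "ecCPSF100 Gly556 1Hα2": (5.24, 3.97),
    }
    for name, (obs, ref) in examples.items():
        label, delta = shift_direction(obs, ref, threshold_ppm=0.5)
        print(f"{name}: {label} ({delta:+.2f} ppm)")
```

For 15N or 13C a larger threshold would be appropriate, since the spread of typical values is much wider than for protons.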
As a first step in characterizing the minimal C-terminal ecCPSF73-ecCPSF100 heterodimer, secondary structure predictions were obtained with TALOS-N (Fig. 3A,B) using the online server (https://spin.niddk.nih.gov/bax/nmrserver/talosn/; Shen and Bax, 2013). Secondary chemical shifts (Δδ) were also determined (Fig. 3C,D) relative to the random-coil backbone 13Cα and 13C’ values predicted by the online ncIDP server (https://st-protein02.chem.au.dk/ncIDP/; Tamiola et al, 2010). The overall pattern of secondary structure elements is similar for the ecCPSF73 and ecCPSF100 constructs, and is consistent with initial predictions made from the cryo-EM data (Zhang et al, 2020; Sun et al, 2020).
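The secondary chemical shift is simply Δδ = δ(observed) − δ(random coil), and runs of consecutive positive 13Cα or 13C’ values indicate helix while runs of negative values indicate β-strand. The sketch below is a simple run-length heuristic, not TALOS-N: the ±0.7 ppm threshold and the minimum run of three residues are assumed values in the spirit of chemical shift index methods, and the numbers in the example are hypothetical.

```python
def secondary_shifts(observed, random_coil):
    """Delta-delta = observed - random coil, per residue number (ppm)."""
    return {res: observed[res] - random_coil[res]
            for res in observed if res in random_coil}


def flag_runs(deltas, threshold=0.7, min_run=3):
    """Return (state, start, end) runs of consecutive residues.

    state is "helix" when delta >= +threshold and "strand" when
    delta <= -threshold; runs shorter than min_run are ignored.
    """
    runs = []
    current = None  # [state, start, end]

    def flush(run):
        if run and run[0] is not None and run[2] - run[1] + 1 >= min_run:
            runs.append(tuple(run))

    for res in sorted(deltas):
        d = deltas[res]
        state = "helix" if d >= threshold else "strand" if d <= -threshold else None
        if current and state == current[0] and res == current[2] + 1:
            current[2] = res  # extend the current run
            continue
        flush(current)
        current = [state, res, res]
    flush(current)
    return runs


if __name__ == "__main__":
    # Hypothetical 13C-alpha secondary shifts for residues 1-9.
    demo = {1: 1.0, 2: 1.2, 3: 0.9, 4: 0.1, 5: -1.0, 6: -0.8, 7: -1.1, 8: -0.9, 9: 0.8}
    print(flag_runs(demo))
```

Requiring a run of consecutive residues, rather than flagging single values, keeps isolated outliers such as the anomalous shifts listed above from being read as secondary structure.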