Three surviving phenotypes after the loss of > 99.9 % of viable cell culture
To test whether all transformed BL21(DE3) cells over-express target protein (Supplementary Fig. 1), E. coli was streaked onto LB agar plates with increasing IPTG concentration (Fig. 1a). A variety of sfGFP-tagged IMPs were streaked alongside soluble sfGFP[22] for comparison: an ammonium transporter (amtB)[23], three drug transporters (bcr[24], mdtL[25], mdtG[25]) and a sugar transporter (setB[26]). No exogenous overexpressed protein was observed at 0 mM IPTG. The expression of sfGFP is clearly visible throughout and, as anticipated, IMP expression is significantly lower. Against expectation, as the IPTG concentration increased to 1 mM, CFU counts fell dramatically from confluent growth to only two CFUs in some cases (Fig. 1a; setB-sfGFP). Phenotypically, the CFUs formed three categories described as large and green (LG), large and white (LW) or small colony variants (SCVs; Fig. 1b, 1c, Supplementary Fig. 2). This applied to both IMPs and the soluble protein.
Next, these results were quantified using the well-characterised protein targets sfGFP and mdfA[27,28]. Both proteins were chosen because they have been overexpressed to milligram quantities sufficient for structure determination by X-ray crystallography[29,30]. Two E. coli protein production strains, BL21(DE3) and BL21-Gold(DE3)pLysS (Agilent Technologies), were chosen. Surprisingly, the soluble protein (Fig. 2a) and the IMP (Fig. 2b) behaved identically: CFU numbers remained high from 0 to 0.1 mM IPTG and then collapsed at IPTG concentrations above 0.1 mM. In addition, both E. coli strains behaved identically (Fig. 2), suggesting that the strain does not account for this sharp drop in CFUs. For soluble sfGFP, averaging the CFU counts between 0 and 0.1 mM IPTG (Fig. 2a, red bar; 26.19 million) and between 0.2 and 1.0 mM IPTG (Fig. 2a, green bar; 3000 CFUs) gives a population decrease of 99.99 %. Similarly, for the IMP mdfA the 0 – 0.1 mM IPTG counts averaged 31.42 million CFUs (Fig. 2b, red bar) while the 0.2 – 1.0 mM IPTG counts averaged 9713 CFUs (Fig. 2b, green bar), which equates to a fall of 99.97 %. Hence, for both the IMP and the soluble target, exogenous protein production can only come from < 0.03 % of the initial total number of bacteria.
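The percentage decreases quoted above can be reproduced from the averaged CFU counts. The following is a minimal sketch of that arithmetic in Python; the function and variable names are illustrative assumptions and are not part of the original analysis.

```python
def percent_decrease(before: float, after: float) -> float:
    """Percentage fall in CFU count from `before` to `after`."""
    if before <= 0:
        raise ValueError("the starting CFU count must be positive")
    return (1.0 - after / before) * 100.0


# Averaged CFU counts quoted in the text (Fig. 2a and 2b, red and green bars).
sfgfp_drop = percent_decrease(before=26.19e6, after=3000)  # soluble sfGFP
mdfa_drop = percent_decrease(before=31.42e6, after=9713)   # IMP mdfA

print(f"sfGFP: {sfgfp_drop:.2f} % decrease")
print(f"mdfA:  {mdfa_drop:.2f} % decrease")

# The surviving fraction is simply the complement of the decrease.
print(f"sfGFP survivors: {100.0 - sfgfp_drop:.3f} %")
print(f"mdfA survivors:  {100.0 - mdfa_drop:.3f} %")
```

Rounded to two decimal places, the two decreases come out at 99.99 % and 99.97 %, matching the values in the text; the surviving fractions are of the order of 0.01 % and 0.03 % respectively.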
Theoretically, when using >0.1 mM IPTG, high expression levels of the T7 RNAP (Supplementary Fig. 1) could result in the observed ‘toxic’ effect (Fig. 2). To test this, fresh, untransformed BL21(DE3) cells (no expression vector present) were plated at an OD600 of 0.2 on LB agar plates containing 0 and 0.4 mM IPTG and CFUs were counted. If the overproduction of T7 RNAP were causing the toxic effect, we would expect 3 – 9 thousand CFUs at 0.4 mM IPTG. On the contrary, both IPTG concentrations resulted in high counts (0 mM IPTG: 29.3 ± 3.0 million (n = 6); 0.4 mM IPTG: 36.6 ± 5.0 million (n = 6)). This confirms the data of Schlegel et al. (2015)[15]; hence, toxicity is not due to T7 RNAP. Is the fall in CFUs instead caused by the presence of the vector? This can be determined from Fig. 2. The average combined CFU count at 0 mM IPTG (expression vector present but no protein induction) for both E. coli expression strains, 27.1 ± 8.4 million (n = 25), is at the same high level seen in the absence of vector. Therefore, we can exclude IPTG, the presence of uninduced vector, the type of protein (IMP or soluble), the E. coli strain and T7 RNAP as the cause of the toxic effect.
Genomic mutations are central to the E. coli response
To determine the potential genetic changes resulting from protein over-expression, the genomes of six samples (1 – 6) and two controls (7 – 8) were sequenced (Supplementary Table 1). Samples 1 – 4 were taken from single colonies that were exposed to 1.0 mM IPTG. Samples 5 and 6 were produced by exposing cells to only 0.01 mM IPTG. Based on the CFU counts (Fig. 2), this low level of IPTG was not expected to be a metabolic burden to the cells. This was confirmed by the generation of a lawn of visibly green cells for both types of protein (Supplementary Table 1). We therefore assumed that at 0.01 mM IPTG all cells behaved identically, and many cells, collected as a swipe, were used as the sample for genome sequencing. Samples 7 and 8 contained the expression vectors but were never exposed to IPTG.
Genomic sequencing of control samples 7 and 8 indicated that there were three missense background mutations not related to protein expression (Supplementary Table 2). The genome sequencing results confirm that, if uninduced, the expression vector causes no major genetic changes.
In comparison with the BL21-Gold(DE3)pLysS published genome template, there were eleven mutations of which nine were common to samples 1 – 6. These six samples were generated using two different IPTG concentrations and two vectors hence the common mutations can be considered as background mutations already present in our laboratory strain. The remaining two genetic changes were linked to the observed LW and LG phenotypes.
The lack of visible green colour from the LW colonies implies that they are not synthesising target protein. The sequencing results confirmed this assumption as, for both the soluble and membrane proteins, the T7 RNAP has been disrupted by a single 1293 base pair IS10 insertion within the gene (Fig. 3). IS10 is the insertion sequence element of the Tn10 transposon and encodes its transposase. The activity of these mutant T7 RNAPs was not measured, but the known properties of IS element integration[31–33] and the lack of colour indicate a non-functional enzyme.
The percentage of LW CFUs differs between BL21(DE3) and BL21-Gold(DE3)pLysS (Supplementary Table 3). Higher LW CFU numbers for BL21-Gold(DE3)pLysS may be due to the presence of IS10 in its genome. The BL21(DE3) genome does not carry the IS10 insertion sequence[34,35]. A possible explanation for the LW phenotype in BL21(DE3) is suggested by the C44(DE3) and C45(DE3) strains, which have stop codons within the T7 RNAP gene[16].
For sfGFP and mdfA-sfGFP, LG mutations were identical. As previously observed in C41(DE3) the mutation resulted in the conversion of the lacUV5 promoter’s -10 site (TATAAT) back to the lacI -10 sequence (TATGTT). This change is known to result in a 10-fold lower T7 RNAP transcriptional rate[12,13,15]. Surprisingly, the same mutation was also found in ~10 % of the multi-genome Sample 5 reads. Sample 5 was generated from sfGFP over-expression in BL21-Gold(DE3)pLysS using 0.01 mM IPTG. The fact that identical mutations are found in the LG colonies for both the soluble protein and the membrane protein as well as in the sample exposed to only 10 µM IPTG is consistent with the phenomenon of adaptive mutation i.e., non-toxic selection produces mutations that relieve the selection pressure[36]. This is supported by the presence of two additional T7 RNAP IS10 insertions identified in a 2 % subset of Sample 5 (Supplementary Table 2, Fig. 3). These two additional sites are also likely to render the T7 RNAP inactive.
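The promoter change described above can be made concrete by comparing the two -10 hexamers position by position. The short Python sketch below does this; the function name and the 1-based numbering within the hexamer are illustrative assumptions, not a convention taken from the original analysis.

```python
def minus10_differences(reference: str, variant: str) -> list[tuple[int, str, str]]:
    """Return (position, reference base, variant base) for each mismatch.

    Positions are 1-based within the hexamer (an illustrative convention).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        (index + 1, ref_base, var_base)
        for index, (ref_base, var_base) in enumerate(zip(reference, variant))
        if ref_base != var_base
    ]


LACUV5_MINUS10 = "TATAAT"  # lacUV5 promoter -10 site, as quoted in the text
LG_MINUS10 = "TATGTT"      # -10 sequence found in the LG colonies

for position, ref_base, var_base in minus10_differences(LACUV5_MINUS10, LG_MINUS10):
    print(f"hexamer position {position}: {ref_base} -> {var_base}")
```

The comparison shows that the two hexamers differ at two adjacent positions (A to G at position 4 and A to T at position 5), so the LG phenotype is associated with a change of only two bases in the promoter.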
High-level exogenous protein over-expression is bactericidal
As we have shown, induction of exogenous protein results in very large cell losses and a survival rate of < 0.03 %. Is the process bactericidal or bacteriostatic? Replica plating lawns of cells grown with 0.75 mM IPTG onto LB agar with no IPTG demonstrated that the process was bactericidal (Supplementary Fig. 3). Hence, from Fig. 2, the minimal bactericidal concentration for protein over-expression is 0.2 mM IPTG. To be clear, IPTG is not an antibiotic: the bactericidal effect is only seen when IPTG is used for high-level exogenous protein over-expression in combination with T7 RNAP and an expression vector containing an in-frame coding gene.
In liquid culture, ~50 % of all cells produce no exogenous target protein
Finally, we examined what happens in solution because this is the usual method of protein over-production. Using the recommended protocol[1], BL21(DE3) cells transformed with the sfGFP vector were grown in LB. Samples were taken before and after the induction of exogenous protein over-expression and plated on the appropriate LB agar plates (Fig. 4 and Supplementary Fig. 4). Phenotypes were scored and CFUs were counted. As expected, uninduced control samples produce CFU counts > 10^6 per mL, with higher OD600 values producing higher CFUs (Fig. 4a; blue circles). Plating these cells on LB agar + 0.4 mM IPTG should result in the 99.99 % decrease due to the bactericidal effect. These predictions are plotted as red circles in Fig. 4a. The measured CFU counts, when plated on 0.4 mM IPTG plates (green circles), confirm this prediction. Next, exposure of the cells to 0.4 mM IPTG in the liquid medium begins the mutant (LG, LW, SCV) selection process. Therefore, we expect that non-mutant cells would die and, in time, the mutant phenotypes would grow to numbers equivalent to WT levels (Supplementary Fig. 4). This is indeed observed since, in overnight cultures, the total number of CFUs reaches the trillions (Fig. 4, open diamonds), in line with the observed uninduced results (blue circles). The difference between these two groups is confirmed in the final phenotypes, as WT cultures (blue circles) were always uniform in size and colourless (Supplementary Fig. 2 at 0 mM IPTG) while the IPTG-exposed cultures were LG, LW and SCV variants.
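The predicted counts on IPTG plates follow directly from applying the 99.99 % decrease to the uninduced counts. The Python sketch below illustrates that calculation; the function name and the input values are hypothetical placeholders chosen for illustration and are not measured data from Fig. 4a.

```python
def predicted_cfu_on_iptg(uninduced_cfu_per_ml: float,
                          percent_killed: float = 99.99) -> float:
    """Predicted CFU per mL on IPTG plates, given the uninduced CFU per mL.

    The default of 99.99 % is the population decrease quoted in the text.
    """
    if not 0.0 <= percent_killed <= 100.0:
        raise ValueError("percent_killed must lie between 0 and 100")
    surviving_fraction = (100.0 - percent_killed) / 100.0
    return uninduced_cfu_per_ml * surviving_fraction


# Hypothetical uninduced counts (CFU per mL), for illustration only.
example_uninduced_counts = [1e6, 5e6, 2e7]

for count in example_uninduced_counts:
    prediction = predicted_cfu_on_iptg(count)
    print(f"uninduced {count:.1e} CFU/mL -> predicted {prediction:.0f} CFU/mL on IPTG")
```

Under this assumption, a culture at 10^6 CFU per mL would be expected to yield of the order of 100 CFU per mL when plated on 0.4 mM IPTG.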
Quantification of the various phenotypes throughout the over-expression trial showed a re-distribution of the proportion of mutants over time (Fig. 4b). The LG and LW mutants gradually become dominant, eventually resulting in an almost 50:50 LG:LW ratio in overnight cultures.
It is believed that E. coli over-expression of some exogenous target proteins can be toxic, particularly so for IMPs[12,37]. In addition, it is often found that reliable expression protocols suddenly fail, or that protein expression levels vary from prep to prep. Our data show that the manufacturer’s 0.4 mM IPTG recommendation[1] results in selection of the LW, LG and SCV phenotypes. The larger size of LG and LW colonies compared with SCVs implies a faster doubling rate. Therefore, by the end of a protein over-expression experiment, LW and LG numbers dominate (Fig. 4b). Even though the average overnight LW proportion was ~50 %, the measured values ranged from 30 – 70 %. The initial LW population also varied from ~3 – 16 % (Supplementary Table 3). This significant variability in the total amount of non-productive LW cells would contribute to the observed prep-to-prep inconsistencies. Our work points to the variation in LW, LG and SCV proportions rather than clonal instability[15] as the cause of variability, and provides an explanation for the data of Miroux and Walker (1996)[12], in which over-expression of the F-ATPase subunit b decreased over an extended period of time due to ‘lost expression capacity’.