This study shows that retrotransposition activity values obtained from cell-culture-based assays are roughly proportional to the estimated rates at which new L1s emerge in the human population. L1s with higher retrotransposition activities branch more frequently on the phylogenetic tree of human L1s. There appears to be an asymmetry in the evolution of retrotransposition activity, where L1s change more readily from high to low retrotransposition activity than from low to high. This asymmetry was shown by two different analysis methods. These two methods also showed that while low-activity L1s rarely turn into high-activity L1s, the rate for this transition is not zero. When combined, the estimated insertion rates and rates of L1 retrotransposition activity evolution suggest that the number of L1s in the human population continues to grow.
Comparative methods, such as the ones used in this analysis, have several limitations. They can provide misleading results when applied to un-replicated evolutionary events [22]. The lack of replication should not be a major concern in our dataset since the L1 tree indicates that there were several phylogenetically independent transitions between high and low retrotransposition activity (Fig. 1). A specific caveat of the BiSSE model is that unaccounted variation in the speciation rate can lead to a spurious correlation between specific character states and the speciation rate [23]. However, this is mainly a problem for analyzing speciation rates of complex organisms where myriads of traits could potentially affect the speciation rate. The context of our analysis is different. For one, L1s are not organisms and therefore harbor fewer traits that could be associated with speciation. Furthermore, cell-culture based retrotransposition estimates directly quantify insertion events. The most parsimonious expectation should therefore be that the speciation rate observed on the L1 phylogenetic tree is proportional to cell-culture based estimates. Our BiSSE results indicate that this expectation is consistent with the data.
There is an additional caveat for applying the BiSSE model to L1 retrotransposition. The BiSSE model interprets each internal node of the phylogenetic tree as a speciation (or in our case retrotransposition) event. The 155 different L1 loci studied in our analysis require 154 retrotransposition events. However, these 154 retrotransposition events do not have to coincide exactly with the 154 internal nodes of the L1 tree. Strictly speaking, the internal nodes correspond to coalescent rather than retrotransposition events, and the time scale of the coalescent process within the human population might be comparable to the time between different retrotransposition events. More accurate parameter estimation might therefore require a model that considers the coalescent and retrotransposition processes simultaneously. Nevertheless, the ratio of speciation rates estimated via the BiSSE model for high- and low-activity L1s is very close to the ratio of retrotransposition rates for these L1 classes obtained from cell cultures, suggesting that the results obtained by ignoring the coalescent process might still be reasonably accurate.
The results of the BiSSE and BayesTraits models also provide information on how retrotransposition activity evolves after insertion. Both approaches show clear statistical support for a model in which the evolutionary change from high to low retrotransposition activity is much more likely than the reverse. This is consistent with a priori expectations, since random mutations of L1 sequences are more likely to disrupt the retrotransposition machinery than to improve it. Both approaches nevertheless indicate that L1s occasionally change from low activity to high activity. Each model has its own strengths and weaknesses. The BiSSE model requires ultrametric trees, and hence a more restrictive phylogenetic estimation procedure, but it allows incorporating the effects of activity on branching. The BayesTraits model poses no restrictions on the tree branch lengths but does not incorporate the effects of activity on branching. The fact that both models arrive at qualitatively similar conclusions about the evolution of retrotransposition activity underscores the robustness of these results.
Interpretation of the BiSSE parameters requires consideration of the data collection process. Both studies whose data were used in this analysis [10, 11] searched for full-length L1s in a limited set of sample sequences. L1s that occur at low frequency in the human population have a low probability of being detected by this process. The best-fitting BiSSE model restricts the extinction rates to zero (µ0 = µ1 = 0), suggesting that L1s that reach a sufficient population frequency to be detected in these studies are unlikely to be lost from the population. An extinction rate of zero does not contradict a frequent loss of L1s shortly after insertion, because most of these low-frequency L1s would not be detected in the studies analyzed here.
According to the BiSSE model, an average full-length L1 generates 3.2 × 10⁻⁶ new L1 insertions per generation. The model furthermore estimates that at a steady state, 75% of L1s are low activity, leading to an average retrotransposition activity of 27%. Ewing & Kazazian estimated the L1 retrotransposition rate in humans to be between one insertion per 95 births and one per 270 births [24]. Our population-level estimates of insertion rates would be equivalent to the insertion rate per individual if L1 insertions were selectively neutral [25]. In that case, each individual would have to carry on the order of 10⁴ average retrotransposition-competent full-length L1s for our estimate to be compatible with the estimate by Ewing & Kazazian. This number of full-length L1s seems too high, and a more plausible explanation for the difference between our rate and the one by Ewing & Kazazian [24] is that L1s are under negative selection, as shown by Boissinot et al. [26]. Negative selection weeds out many L1s shortly after insertion, which would explain why the insertion rate at the individual level is much higher than the population-level substitution rate.
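The comparison with the per-birth estimate can be checked with a short calculation. The Python sketch below divides the per-birth insertion rate by the per-L1 insertion rate to obtain the number of average full-length L1s that each individual would need to carry under neutrality. The function and constant names are illustrative assumptions and not part of the original analysis. The calculation also ignores ploidy and any rounding conventions behind the figure quoted above, so it should be read only as an order-of-magnitude check.

```python
# Back-of-envelope check; names are illustrative, not taken from the study's code.

# New insertions per full-length L1 per generation (BiSSE estimate quoted in the text).
PER_L1_RATE = 3.2e-6

# Insertions per birth: upper and lower bound quoted from Ewing & Kazazian [24].
PER_BIRTH_RATES = (1 / 95, 1 / 270)


def implied_l1_count(per_birth_rate, per_l1_rate=PER_L1_RATE):
    """Number of average L1 copies per individual for which
    copies * per_l1_rate equals the per-birth insertion rate.

    Assumes selective neutrality, so that the population-level rate
    equals the individual-level rate.
    """
    return per_birth_rate / per_l1_rate


if __name__ == "__main__":
    for rate in PER_BIRTH_RATES:
        copies = implied_l1_count(rate)
        print(f"per-birth rate {rate:.4f} -> about {copies:.0f} L1 copies")
```

With the rates quoted above, the ratio comes out at a few thousand copies per individual, that is, between 10³ and 10⁴.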
It is currently unknown whether L1s are growing in the human population or are at a stable equilibrium. Linear models, such as the BiSSE model, only allow for exponential growth or decline. According to our parameter estimates, L1s grow exponentially with a doubling time in human genomes of 2.2 × 10⁵ generations. It is not clear what mechanism would lead to a negative feedback of L1 density on average retrotransposition rate that is required for a stable equilibrium. It has been suggested that a stable equilibrium for retrotransposition is obtained when the number of available genomic positions becomes limiting and L1s repeatedly insert into pre-existing L1s [7, 8]. However, the low density of active L1s in human genomes makes it unlikely that such a feedback is the driving force for an equilibrium. Alternatively, there might be no equilibrium for the number of L1s but instead co-evolutionary cycles where phases of high L1 retrotransposition lead to evolutionary adaptations in the host that suppress retrotransposition, which in turn increases selection for L1s that can escape the host suppression. There is some empirical evidence for such cycles [27]. A more complete understanding of the L1 dynamics in human genomes will require a model that combines the effects of L1 retrotransposition rate on L1 growth, the evolution of this rate and the fitness effects on the host. The results presented here are a first step in that direction by providing parameter estimates for the first two components.
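The doubling time quoted above can be related to the per-L1 insertion rate with a short calculation. The Python sketch below assumes that the extinction rates are zero and that the mix of high- and low-activity L1s has reached its steady state, so that the net exponential growth rate equals the average insertion rate of 3.2 × 10⁻⁶ per L1 per generation. The function name is illustrative, and this is a simplified check rather than the procedure used to obtain the published estimate.

```python
import math

# Average new insertions per full-length L1 per generation (BiSSE estimate
# quoted in the text). With zero extinction, this is taken here as the net
# exponential growth rate, which is a simplifying assumption.
AVERAGE_INSERTION_RATE = 3.2e-6


def doubling_time(growth_rate):
    """Generations needed for an exponentially growing quantity to double.

    Solves exp(growth_rate * t) = 2 for t.
    """
    if growth_rate <= 0:
        raise ValueError("growth_rate must be positive for a doubling time")
    return math.log(2) / growth_rate


if __name__ == "__main__":
    t = doubling_time(AVERAGE_INSERTION_RATE)
    print(f"doubling time: {t:.3g} generations")
```

Under these assumptions the result rounds to 2.2 × 10⁵ generations, in line with the doubling time stated above.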