Strategies for discriminating between introgression and ILS
Introgression is a common phenomenon in plant evolution[52] and distinguishing it from ILS is a crucial step in phylogenetic analysis. The processes of introgression and ILS both create gene tree discordance, but they predict differing gene tree distributions that can be distinguished using a variety of probabilistic approaches[53]. Common phylogenetic tools to distinguish ILS and introgression, such as D-statistics (a site-based method)[54] and quantifying introgression via branch lengths (QuIBL, a tree-based method)[47], can quickly detect the existence of introgression by testing whether the null hypothesis (that only ILS occurred) can be rejected. However, these methods cannot directly quantify the proportion of introgression present and often impose hard bounds on the maximum number of taxa and topologies that can be tested in one analysis, limiting their application in species-rich clades. On the other hand, although some phylogenetic tools for reticulate evolution, such as PHYLONETWORKS[45], can infer networks of arbitrary shape with quantified introgression, these methods require an unrealistically large amount of computational resources and time even with moderate numbers of taxa[55]. Hence, these methods are limited either by a priori sampling limitations (for D-statistics and QuIBL) or by indirect sampling limitations via computational resource scaling (for PHYLONETWORKS), creating the need for “divide-and-conquer” approaches for the common case where it is unclear which taxa specifically are involved in hybridization.
One feasible “dive-and-conquer” approach to roughly identify nodes affected by ILS/introgression, and thus to adjust the strategy and scope of introgression analysis accordingly, is to identify globally on a phylogeny those problematic nodes that would form the basis for smaller-scale tests with more elaborate probabilistic approaches. As implemented here, first we calculate for each node estimates of the site concordance factor (sCF) and two site discordance factors (sDF1 and sDF2). As sDF1/2 actually represents the proportion of two IGTs (detailed in §2.5), in a quartet (based on parsimony-informative sites), the taxa affected by ILS and introgression can be roughly determined from the high/imbalanced values of sDF1/2 respectively.
We then reduce the scope of hybridization inference to those nodes with suggestive patterns of discord on the basis of site discordance factors, in this case those nodes with imbalanced sDF1/2, yielding on a quantitative basis for identifying candidates for a more computationally feasible sub-analysis. Specifically, there are two possible scenarios to consider: for Amana (A), Erythronium (E), and Tulipa (T), we can determine that the bipartition EA|T corresponds to CGT based on a plastid derived tree topology, whereas the alternative bipartition TA|E corresponds to the IGT of higher proportion (versus TE|A). This indicates two possible introgressions: one is between A and T, and the other is directional gene flow from an unsampled common ancestor/sister of ‘E, A and T’ to E, two alternative scenarios that can be distinguished by divergence time, internal branch length, and other non-topological information.
Phylogenetic framework of tribe Tulipeae
The tribe Tulipeae consists of four monophyletic genera: Amana, Erythronium, Gagea (including Lloydia), and Tulipa. Gagea has always been regarded as sister to the others, which is supported by our phylogenetic study and others.[5-7]. The relationships among the three remaining genera, however, remains ambiguous. In addition to detecting high levels of ILS within this tribe, we also documented numerous cases of introgression among Amana, Erythronium, and Tulipa, which makes their phylogenetic relationship particularly challenging to resolve. Previous studies showed substantial disagreement, recovering all three possible resolutions of these genera, but there are very few studies that strongly support a sister relationship between Erythronium and Tulipa[6, 9]; neither does our result.
Most previous studies support a topology of EA|T with Erythronium sister to Amana[5, 7, 10, 11, 13] (Table 1). This is consistent with the plastid ML tree and the ASTRAL tree based on our lower copy nuclear dataset (1,594 OGs). The sDF values also show that the parsimony-informative sites slightly favor Amana as sister to Erythronium (Figures 5 & 6).
In contrast, most of the networks predicted by PhyloNetworks, including the best ones, indicate that Amana is sister to Tulipa (not Erythronium). Those networks with a structure of TA|E scored much lower (better) than the networks with structure of EA|T, even if we deliberately used the guide tree of EA|T structure. We must take this "dissent" seriously. When combined with high levels of ILS, even low levels of extant introgression (e.g. among non-sister clades) can cause AGTs and thus misleading species tree estimations, while even massive introgression may not necessarily lead to AGTs[56, 57]. With the simultaneous presence of ILS and introgression, neither concatenation nor coalescent methods are robust, while coalescent-based networks are consistent[49]. In essence, the MPL model has the ability to take both ILS and introgression into account, therefore potentially resolving discord quite differently from methods assuming a tree model, which gives the result substantial weight.
Because of the uncertain relationship, we used three topologies: ((Erythronium, Amana), Tulipa), ((Tulipa, Erythronium), Amana), and ((Tulipa, Amana), Erythronium) to carry out D-statistic analyses (shorthand as EA|T, TE|A and TA|E, respectively) and chose N. campanulatum (N, outgroup), all spp. in Erythronium and Amana, and 6 spp. in Tulipa as samples for 72 independent sets of D-statistic runs. All 72 test groups using topology TE|A detected significant introgression between Erythronium and Amana (Table S4), which indicated that TE|A is an unlikely evolutionary history because introgression only occurred between Erythronium and Amana, which would lead to the dominance of TEA (polytomy) or EA|T but would not explain a close relationship between Amana and Tulipa. Similar to TE|A, the results based on TA|E showed that there was highly significant introgression between Erythronium and Amana in all the 72 test groups (P-value = 0, Table S4). This result is unusual because the D values do not show a particularly obvious species correlation, if the structure of TA|E is true, then each branch of Amana and Erythronium (geographically isolated) all have experienced continuous introgressions, which is very doubtful. EA|T’s results showed that 38 out of 72 test groups detected introgression between Amana and Tulipa; 5 test groups showed introgression between Erythronium and Tulipa; the remaining 29 test groups did not detect introgression (Table S4). All E3 (18) and almost all E2 (17) test groups showed introgression between Amana and Tulipa, while most E1 (15) and E4 (13) test groups did not support the existence of any introgression. It is intriguing that all instances of introgression between Erythronium and Tulipa are exclusively linked to E4. Given the general belief that the divergence time of Erythronium is earlier than that of Tulipa and Amana[6, 12, 58], this result is reasonable, because the introgression signal is related to each branch of Erythronium, but is almost independent of Amana and Tulipa. These findings suggest that either Erythronium and Amana have undergone extensive and continuous gene flow in their history (TA|E), and/or they are actually sister taxa (EA|T).
However, the analyses based on internal branch lengths provided different results (Table S5). Based on the same groupings described above for the D-statistic, QuIBL were used for performing internal branch length analysis of the 72 test groups. The mean frequencies of gene trees supporting specific topologies are as follows: 28.07% (TE|A), 36.02% (TA|E), and 35.37% (EA|T). This indicates that both TA|E and EA|T are potential candidates for being consistent with the true species tree. However, given that introgression between Amana and Tulipa affects approximately 66.67% of triplets and 20.56% of loci, which is greater than the impact of introgression between Amana and Erythronium (47.22% and 19.76%), TA|E is the best supported topology.
Due to the close proximity of the proportions of gene trees that support the three triplets, it was difficult for us to interpret the results of QuIBL. As a result, we used ASTRAL for calculating the corresponding coalescent trees of 72 triplets (with N as an outgroup), and the phylogenetic relationships were inferred finally (Table 2). In terms of quantity, TA|E is more frequent EA|T, but interestingly, the Eurasian branch of Erythronium (E1 & E2) strongly supports Tulipa to be sister to Amana, while the Western North America branch (E4) strongly supports Erythronium and Amana as sister groups, and the Eastern North America branch (E3) supports both of them. As the basal taxon of gen. Tulipa, T. heterophylla (T1) seems to have a special status, because it supports the Eurasian branch of Erythronium to be sister to Amana in 2/3 cases.
In order to prevent the instability of phylogenetic tree reconstructed by only four species, we used N as an outgroup, together with all two Gagea spp., A1, A3, T1-T6, T. biflora and T. hissarica, a total of 13 species to tested different combinations of E1-E4 (Table 3). The result was decisive: all the trees including E1 or E2 showed sister relationship of Amana and Tulipa; However, E3, E4 and their combinations supported the sister relationship between Erythronium and Amana. Moreover, if the same taxa were reconstructed by ML method, Amana and Tulipa would be clustered into one branch with high support, which clearly showed that ILS has misled the ML inference.
Overall, the Eurasian group of Erythronium species possesses some unique characteristics compared to its North American counterpart. Specifically, it exhibits a disjunct distribution pattern, yet slightly overlaps with the distribution of Amana. In this scenario, Erythronium’s plastid and nuclear genes appear to have undergone different evolutionary trajectories, with the former supporting a sister relationship with Amana, whereas the latter favor it as a sister to both Tulipa and Amana. Thus, better understanding its genome may be one of the keys to resolving the phylogenetic problems among the three genera. Additionally, E. albidum (eastern North America) showed the characteristics of branches that are half like those of taxa from Eurasian and half like branches of Western North America taxa, which may be related to its homologous polyploidy[12]. In the reticulate evolution analysis not presented in the main text, introgression of about 50% of genetic components from the ancestor of Erythronium to E. albidum was also detected, indicating that E. albidum may retain a relatively ancient genome that sheds light on the phylogenetic history of this genus.
Reconstructing the evolutionary history of tribe Tulipeae
In our study, when using low-copy nuclear datasets (1,594 OGs and the datasets used in pre-experiments), ML trees always conflict with coalescent trees, which can be simply explained by ILS. However, when a further 1,000 OGs with slightly higher copies are added to the analyses, the support value of this node in coalescent tree decreases rapidly, and finally collapses to a sister relationship between Amana and Tulipa, albeit with poor support. We have reason to believe that although the additional 1,000 OGs added later are fewer in number, they may possess stronger phylogenetic signals than the initial 1,594 OGs. Moreover, due to their slightly higher copy numbers, they may evolve more quickly.We have noticed that when we add more OGs with higher copies or remove the branches of gene trees with low support values, then the support value at this node tends to increase. It can be predicted, therefore, that if whole genomes could be used in the future for reconstructing their phylogenetic relationships, then the sister relationship between Amana and Tulipa will likely emerge as true. This unusual situation is similar to a scenario discussed by Solís-Lemus et al. (2016)[49]. Under cases of high-level introgression (>30%), ASTRAL will be more and more firmly misled by the wrong structure of introgression with the addition of more well-reconstructed gene trees. Mallet et al. (2016)[21] put forward the same view based on a case of the Anopheles gambiae complex: abundant introgression may lead to a "tyranny of the majority", such that with an increasing number of genes involved, more and more gene trees that are inconsistent with the species tree will impose a quantitative advantage on the phylogenetic result. Therefore, the real history may be similar to the anomaly zone envisaged by Solís-Lemus et al. (2016)[49]. In brief they hypothesize that when introgression occurs between two rapid differentiation events, alleles affected by the introgression from another lineage may immediately experience the second differentiation event without sorting, which makes the two descendant populations (i.e., sister taxa, like Erythronium and Amana) inherit the alleles vertically and by introgression, respectively, and thus lead to an inconsistency between gene trees and the species tree.
With all this said, we want to emphasize that we are not arguing that TA|E must be the wrong topology for these genera. Considering the results of the Robinson-Foulds-based metric (see § 3.1), it is necessary to invoke introgression to account for the inconsistency between the plastid and nuclear gene topologies. Therefore, there is another possibility that Amana, indeed, may have experienced multiple hybridization and introgression events with Erythronium, eventually leading to the capture of its plastid. One reason we say this is that the coalescent tree results of 72 quartets showed that E. californicum (from western North America) may have an especially close nuclear relationship with Amana from Asia (Table 2 and trees in supplementary File S1). In fact, instances of introgression between Amana and North American Erythronium were frequently detected (Figure 8a). In D-statistic analyses, not only E. californicum but also E. japonicum (East Asia) showed a close relationship with Amana (East Asia). In addition, the average divergence time for Tulipa, Erythronium and Amana is about 22.82 Mya, 24.38 Mya (15.88 Mya for Eurasian branch) and 10.96 Mya respectively[6, 58]. Together these lines of evidence support one of the reasonable migration hypotheses[12] – that the ancestor of the three genera originated in the Far East near the western part of North America, with the ancestor of Erythronium diverging first and spreading westward to form the Eurasian branch, and eastward to form two North American branches. The ancestor of Tulipa also diverged around the same time, with one lineage moving westward to the Tianshan-Pamir region. Another lineage of Tulipa (or possibly Erythronium) remained in the Far East and diverged into early lineages of Amana during the middle to late Miocene.
Another fact to consider is that Amana is more similar to Tulipa in morphology, whereas Amana has a geographic distribution more similar to Erythronium[5, 12]; differing genomic partitions alternately correlating with morphology or geography is characteristic of many ancient plastid capture events[59, 60].
We further speculate that the historical distribution range of Amana may have been much broader and substantially overlapping with that of Tulipa and/or Erythronium. This is because under the influence of glaciation, lowland taxa likely underwent southward migration, range contraction, and fragmentation[61]. The migration history of Amana aligns with this scenario[62]. Additionally, plastid capture has been detected in other monocotyledonous herbs (e.g., Hemerocallis, Asphodelaceae)[60], and so a bolder hypothesis is that the ancestor of Amana once inhabited a shared distribution between East Asia and North America, captured its plastid by hybridization.
Phylogenetic Relationships within Tulipa
Following the classification framework of genus Tulipa proposed by Veldkamp et al. (2009, 2012)[14, 15] and Christenhusz et al. (2013)[1], our research tested the monophyly of the four currently recognized subgenera. Three of these, namely subg. Tulipa (including T. sinkiangensis), subg. Clusianae and subg. Erythronium, are strongly supported as monophyletic. The phylogenetic relationship among these three subgenera is clear, that is: (Tulipa, (Eriostemones, Clusianae)).
Subgenus Orithyia, however, is not monophyletic. This is not surprising, since there has been a lack of research on the systematics this subgenus. Zonneveld et al. [14, 15] defined subg. Orithyia with a combination of three characters: a style nearly as long as the ovary, bulb tunics usually naked inside, and DNA 2C value of only 38–39pg. They recognized three species, T. uniflora, T. heterophylla and T. heteropetala, as members of subg. Orithyia, but did not rule out the possibility of T. sinkiangensis and T. thianschanica also being part of this subgenus. However, we now know that T. thianschanica actually belongs to the subg. Tulipa. Subsequently, Christenhusz et al. used only T. uniflora to represent the subgenus in their phylogenetic analysis and warned that the remaining species need to be sampled in order to evalute the monophyly of the group[1]. Our study shows quite strikingly that T. heterophylla and T. heteropetala are successive sister species to the remainder of the genus Tulipa, whereas T. sinkiangensis is sister to subgenus Tulipa and might be best classified within it. Since T. heterophylla is similar to other Tulipa spp. in morphology, geographical distribution and phylogeny, there is no sufficient reason to separate it as a monotypic genus as some have proposed. Therefore, following the study by Wilson[58], we suggest the old taxon Eduardoregelia Popov (which includes a single species, Eduardoregelia heterophylla (Regel) Popov, a synonym of T. heterophylla) should be redefined as a new subgenus of Tulipa. As for T. heteropetala and T. uniflora, we only sampled the former, but they share a highly similar morphology, and considering that a previous study has shown their clustering together[6, 58], it is reasonable to continue recognizing them within a now smaller circumscribed subg. Orithyia.
In the small subg. Clusianae, we reconstructed a topology different from Christenhusz et al. [1]. Here we find that T. clusiana is sister to T. montana and T. linifolia. An introgression event, that T. liniflora acquired more than 20% of its genetic components from the ancestor of subg. Clusianae may be reasonable for this conflict (Figure 8c). Additionally, our plastid phylogenetic tree indicates that T. montana is the basal taxon of subg. Clusianae (Figure 2), conflicting with results based on nuclear genes. Wilson's study also demonstrates the same scenario[58], suggesting potential discordance in the evolutionary history between the plastid and nuclear genomes of T. montana.
In subg. Eriostemones, we found that T. turkestanica is embedded within T. biflora (2, 3), and they are supported to form a polytomy (Figure 8). Research also indicates that they form a complexes characterized by white flowers[58]. Furthermore, three species, T. patens, T. dasystemon and T. sprengeri have unstable phylogenetic positions, with some of the reasons explained by the results of reticulate evolution analysis in § 3.3 (Figure 8d). Based on the plastid phylogenetic tree (Figure 1), the T. patans sample from NCBI separates from our T. patans sample, but its position aligns with Wilson et al.'s study[58]. This suggests that the species referred to as "Tulipa patens" may not constitute a natural group, although more data are needed to confirm this. T. dasystemon exhibits cytonuclear conflict, a scenario consistent with findings in Wilson's study. Therefore, it is also possible that the nuclear and chloroplast genomes of T. dasystemon have different evolutionary history.
As mentioned earlier, we consider T. sinkiangensis to be a member of subg. Tulipa, even though T. sinkiangensis has a style (an original plesiomorphy). Although many phylogenetic trees in our study supported this anomalous species to be the basal taxon of Clade T1 (Figures 3 & 4), our plastid phylogenetic tree and the reticulate evolution analysis support it as the sister taxon of the entire subg. Tulipa (Figures 2 & 8e). Some other results that stand out within the topology of subg. Tulipa are that T. thianschanica is sister to T. iliensis and T. tetraphylla, rather than T. thianschanica var. sailimuensis (Figures 3 & 4). For this reason we feel that T. thianschanica should not be treated as a synonym of T. iliensis, and that T. thianschanica var. sailimuensis should be treated as a distinct species rather than a variety of T. thianschanica[1]. However, it is important to be cautious, the sample of T. thianschanica from NCBI shows a closer relationship with T. tetraphylla (Figure 1), indicating a potential for T. thianschanica to be part of species complexes. Additionally, we can safely conclude that the T. iliensis sample from NCBI has been misidentified; it should actually be classified as T. altaica. Such identification errors seem to be quite common in subg. Tulipa[56][58].
For Clade T2 (Figure 8f), reticulate evolution analysis suggests that T. armena is more likely the sister taxon of Clade acu. We also found that T. praecox and T. acuminata are not closely related, so at least one of them cannot be treated as a synonym of T. agenensis. We wish to point out that because we were unable to sample T. undulatifolia, it is still unclear whether T. eichleri is a synonym of that species.
Finally, we also evaluated the monophyly of the various sections classified within the larger subgenera, but most of these, except for sect. Neo-tulipae, Saxatiles (embedded in sect. Sylvestres), Multiflorae & Spiranthera (only one sample), were found to be not monophyletic. To highlight a few examples, T. tarda and T. urumiensis, which belong to sect. Biflores, are embedded in sect. Sylvestres; the species of sect. Kolpakowskianae are scattered across three clades; species of sect. Vinistriatae and sect. Tulipanum are mixed together; and sect. Tulipa is divided between two distant clades. Previous studies have delved deeper into the sections of tulips and offered specific recommendations[58]. Further examination of morphological, genetic, and other characters in light of these phylogenetic results is warranted in order to better classify the horticulturally popular tulips.