The HIV-1 epidemic in Argentina and the neighboring countries of Uruguay, Chile, Paraguay, and Bolivia is characterized by the cocirculation of subtype B and BF1 recombinant viruses [15-25]. Most of the BF1 recombinant forms in these countries appear to derive from a common recombinant ancestor, as inferred from coincident breakpoints and clustering in phylogenetic trees [16,17,22,26-28]. As the subtype F fragments of these recombinants cluster with viruses of the F subtype strain circulating in Brazil and there is no evidence of the circulation of this strain in other South American countries, it has been proposed that the common ancestor of these recombinants might have originated in Brazil, with subsequent recombination events giving rise to a great diversity of recombinant forms [16,28]. Some of these became circulating forms, of which CRF12_BF, CRF17_BF, CRF38_BF, and CRF44_BF had been identified previously [16,17,22,26,27]. Due to their common ancestry and similarity in recombination structures, all these viruses have been proposed to constitute a CRF “family” [29,30] (similarly, other CRF families could be the CRF_BGs from Cuba, numbers 20, 23, and 24 [47], CRF_BGs from Spain and Portugal, numbers 14 and 73 [48], and CRF_01Bs from Malaysia, numbers 33, 53, 58, and 74 [49,50]). The first to be identified in the CRF_BF family from the Southern Cone of South America was CRF12_BF, which is widely circulating in Argentina and Uruguay [15-17] and in lower proportions in Chile [25], Paraguay [23], and Bolivia [24]. The second was CRF17_BF, representing a small proportion of infections in Argentina, Paraguay, and Bolivia [26]. Two other members of the family, CRF38_BF and CRF44_BF, were identified in Uruguay [22] and Chile [27], respectively.
In a molecular epidemiological study in Bolivia, with samples collected in 1996 and 2005, a cluster of 4 BF recombinant viruses branching apart and differing in mosaic structure from CRF12_BF was identified among samples collected in the capital city of La Paz in 2005. The authors proposed that it could represent a new CRF of the CRF12_BF family [24]. Here, we show that this cluster, comprising the 4 viruses collected in 2005 and a fifth virus collected in 1999 [17] (Fig. 1), forms part of a larger cluster, comprising 38 viruses collected in two other South American countries (Peru and Argentina), three European countries (Spain, United Kingdom, and Sweden), and Japan. Samples collected in Spain represent a majority, although most of them are from Bolivian or Peruvian individuals (Fig. 1). We show through the analysis of 5 NFLG sequences (three newly derived from samples collected in Spain and two obtained from databases, from samples collected in Bolivia and Peru) that the identified cluster represents a new CRF derived from subtypes B and F1, designated CRF89_BF (Figs. 2 and 3). This CRF is closely related to CRF12_BF and CRF17_BF, as deduced from multiple breakpoint coincidences and close phylogenetic clustering, and more distantly to CRF38_BF and CRF44_BF. CRF89_BF has a complex mosaic structure with 13 breakpoints, delimiting 7 subtype F and 7 subtype B fragments. One of the subtype B segments, in gag, is absent from CRF12_BF and related CRFs, and another segment, in env, is absent from CRF12_BF but found in CRF17_BF and CRF38_BF. Breakpoint coincidence with different CRF_BFs from the Southern Cone suggests a complex scenario of BF recombinant generation in this area through successive rounds of recombination with subtype B viruses, as previously proposed [28].
However, it seems unlikely that CRF89_BF derives from CRF12_BF or CRF17_BF, since in the NFLG phylogenetic tree, the CRF89_BF clade is not nested within CRF12_BF or CRF17_BF radiations but forms a separate clade (Fig. 2), and it exhibits several differences in breakpoint locations from both CRFs (Fig. 5).
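The fragment count given above for the CRF89_BF mosaic (13 breakpoints delimiting 7 subtype F and 7 subtype B fragments) follows from simple arithmetic: with only two parental subtypes, each breakpoint switches from one subtype to the other, so n breakpoints yield n + 1 alternating fragments. The following minimal Python sketch illustrates this check. The function name and the choice of subtype for the first fragment are hypothetical, introduced only for illustration; they do not describe the actual genome map or the software used in this study.

```python
from itertools import cycle


def alternating_fragments(n_breakpoints, first_subtype="F"):
    """Label the fragments of a two-parent mosaic genome.

    Assumption (illustrative only): every breakpoint marks a switch
    between subtypes F and B, so consecutive fragments alternate.
    The subtype of the first fragment is a hypothetical choice.
    """
    order = ("F", "B") if first_subtype == "F" else ("B", "F")
    labels = cycle(order)
    # n breakpoints cut the genome into n + 1 fragments
    return [next(labels) for _ in range(n_breakpoints + 1)]


fragments = alternating_fragments(13)
counts = {subtype: fragments.count(subtype) for subtype in ("F", "B")}
print(len(fragments), counts)  # 14 {'F': 7, 'B': 7}
```

Because the number of fragments is even, the 7/7 split holds regardless of which subtype is assumed for the first fragment.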
CRF89_BF comprises three major clusters. One comprises exclusively samples collected in Western Europe (Spain, UK, and Sweden); however, out of 10 individuals with data on country of origin (all residing in Spain), 8 were Bolivian, and only 2 were Spaniards, whose viruses branch interspersed among those from Bolivian individuals. Therefore, it seems reasonable to assume that this cluster originated and spread initially in Bolivia, and that its detection in Western Europe reflects the importation of infections acquired in Bolivia rather than local circulation of CRF89_BF. Otherwise, clustering of CRF89_BF strains among native European individuals would be expected, but this was not seen. Failure to identify viruses collected in Bolivia within the Euro-Bolivian cluster may be due to the low number of HIV-1 sequences from Bolivia available in public databases. A second CRF89_BF cluster comprises all five samples collected in Bolivia, all from La Paz. The third cluster comprises 3 sequences from Peru, 6 from Peruvians residing in Spain, and 2 from Japan, the latter closely related to a Peruvian virus. As in the case of the Euro-Bolivian cluster, we assume that this cluster represents a variant originating and circulating in Peru, and that its presence in Spain and Japan probably reflects the importation of infections acquired in Peru rather than the local circulation of CRF89_BF. Although a small proportion of HIV-1 BF recombinant viruses have been identified in Peru (approximately 2% [18,51]), no evidence has been published of their circulation among the local Peruvian population. Therefore, the results presented here would be the first evidence indicating that an HIV-1 BF1 recombinant form, in this case CRF89_BF, is most likely circulating in Peru.
It is also interesting to note that although heterosexual transmission is predominant among CRF89_BF infections, in a subcluster of 4 individuals within the Peruvian cluster, all 3 infections with information on transmission route were in MSM. This suggests that CRF89_BF circulates among Peruvian MSM and that HIV-1 heterosexual and MSM transmission networks are linked. A similar linkage was observed in a CRF02_AG cluster in Spain, although in that case, the spread was from an MSM to a heterosexual network [52].
According to phylodynamic estimations, CRF89_BF probably emerged in Bolivia around the mid-1980s, with its major clusters emerging around the early 1990s, 2 of them in Bolivia and 1 in Peru (Fig. 7). These estimations were performed assuming that Bolivian and Peruvian individuals residing in Spain with CRF89_BF infections acquired them in their country of origin, which seems a reasonable assumption, as discussed above. However, since we could not rule out that subclusters of more recent origin comprising viruses sampled in Spain reflected local transmissions, we performed a second analysis assuming that the most recently diagnosed infections of subclusters comprising Bolivian or Peruvian individuals were acquired in Spain, which yielded similar results (Supplementary Fig. S4). The MRCA of CRF89_BF, according to our estimations, would be approximately 10 years more recent than that of CRF12_BF (Supplementary Fig. S5). However, we cannot rule out an earlier emergence of CRF89_BF, since estimations could change with more representative HIV-1 sampling in Bolivia.
In Bolivia, CRF89_BF was detected in only 5 samples from La Paz, 4 collected in 2005 and 1 in 1997. In 2005, CRF89_BF represented 13.3% of the HIV-1 samples collected in La Paz that were sequenced in Pr-RT. However, given the low proportion of Bolivian HIV-1 strains sequenced and the fact that no sequences from samples collected after 2005 are available in public databases, the current prevalence of CRF89_BF in Bolivia and its geographical spread in the country cannot be accurately estimated. Considering that in one of the major CRF89_BF clusters, 8 of 10 viruses, all of which were collected in Europe, were from Bolivian individuals, and that 18% of HIV-1-infected Bolivian individuals residing in Spain studied by us harbored CRF89_BF viruses, we hypothesize that CRF89_BF could be widely circulating in some areas of Bolivia.
The identification of CRF89_BF infections in Spain and other European countries, mainly in South American immigrants, reflects the increasing relationship between the South American and European HIV-1 epidemics. This relationship is also reflected in the expansion in Western Europe of clusters of South American strains of subtypes C [53-56] and F1 [31,57-59], of CRF12_BF [60], and of CRF17_BF [54], and in the identification in Western Europe of CRFs derived from parental strains of South American ancestry [61-63].
The identification of CRF89_BF and other CRFs in NFLG sequences is relevant for molecular epidemiological studies because it allows for the proper characterization of HIV-1 strains circulating in different geographic areas and population groups. In this regard, some CRF89_BF viruses were misclassified as CRF12_BF viruses in GenBank submissions (accessions MF403410, MF403416). Such misclassification may be relevant, since, even though both CRFs exhibit similar mosaic structures, these are not identical, and the two CRFs form separate clades. It should also be pointed out that even relatively minor genetic differences in viral genomes may result in important biological differences. Examples in HIV-1 are CXCR4 coreceptor usage in CRF14_BG, which is associated with only four amino acid residues in the Env V3 loop [64], all or most of which are absent in viruses of the closely related CRF73_BG [48], which has a very similar, but not identical, mosaic structure, and differences in pathogenic potential or therapeutic response associated with clusters within HIV-1 genetic forms [65,66]. The identification of CRF89_BF may also be relevant for the development and testing of vaccines intended for use in areas where this CRF circulates, considering the correlation of susceptibility to protective immune responses with HIV-1 clades and with intraclade genetic diversity [5].