Goats (Capra hircus), one of the most important livestock animals, were domesticated from a single wild ancestor 10,000 years ago [39]. Since then, due to their ability to adapt to various territories, goats have accompanied humans in all activities and have spread all over the world [40]. During domestication, some specialized goat breeds with high performance in many characteristics were generated which now provide important resources for humans. Nevertheless, genetic research on goat traits is lacking, and the limited information we currently have is mostly derived from sheep studies.
With the goal of contributing to our knowledge of goats, we studied the genomic organization of the TRB and TRG loci in Capra hircus, which revealed a surprising example of gene family expansion, comparable to that in the other ruminant species studied so far (i.e., sheep and cattle).
Given the complexity of the TR loci, the recent release of the highly contiguous reference goat genome generated by a combination of methods [41] improving the previous whole genome shotgun assembly [42] was a further stimulus for our analysis of the goat TR loci.
As expected, the general genomic organization of goat TRB is similar to that of the other artiodactyl species [11–14, 16–17], with three D-J-C clusters in tandem located downstream of an array of TRBV genes and upstream of a single TRBV gene, which is positioned in an inverted transcriptional orientation (Fig. 1). Analysis of a publicly available pool of goat cDNAs has shown that the presence of three D-J-C clusters offers this species, as in the other artiodactyls, a biological advantage from the improved combinatorial and junctional diversity of the CDR3 domain, involved in the antigen binding site, that results from the increased number of TRBD and TRBJ genes (Fig. 4). Moreover, expression analysis has shown that, besides the intra-cluster rearrangements, TRBD/TRBJ inter-cluster rearrangements occur during TRB recombination, thus further increasing the functional diversity of the TR β repertoire.
In contrast to the great diversity in the combinatory region, the three TRBC genes are highly similar to each other and to the other TRBC genes of artiodactyl species. This conservation may reflect a strong functional constraint linked to the role of the TR constant domain in signal transduction or in interactions with other molecules on the cell surface [43–44].
The structure of the TRBV region appears, however, to be unique within the goat TRB locus. Ninety-one goat TRBV genes, grouped into 27 distinct subgroups, lie in a region of approximately 434 kb. Of these, 182 kb are occupied by a massive expansion of the TRBV5 and TRBV6 gene subgroups, with 30 and 29 members, respectively. The 6 gene members of the TRBV21 subgroup represent the other multimember subgroup. As shown in the locus map (Fig. 1), the TRBV5 and TRBV6 genes alternate and are intermingled within the genomic region, indicating that shared duplicative events contributed to their extensive expansion, unlike the TRBV21 genes, which are clustered together as a result of tandem duplications of a single ancestral gene. Dot-plot analyses of the duplicated TRBV region reinforce the conclusion that the TRBV5 and TRBV6 genomic organization arose through a series of complex tandem duplication events, which rarely involved a TRBV5 or a TRBV6 gene alone but, more frequently, involved genes from both subgroups, generating duplication units of different sizes (Additional file 5–6: Figure S2A-B).
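To illustrate how a dot-plot exposes tandem duplication units such as those described above, the following is a minimal sketch of a self-comparison based on exact k-mer matches. It is not the procedure used for Figure S2; the toy sequence, the two "gene" fragments, the function names and the word size k are all hypothetical and chosen only for illustration. In a self-comparison, tandem repeats appear as lines parallel to the main diagonal, and the offset of each line from the diagonal gives the length of the duplication unit or a multiple of it.

```python
from collections import defaultdict


def dot_plot_matches(seq_a, seq_b, k=6):
    """Return all (i, j) with seq_a[i:i+k] == seq_b[j:j+k] (exact k-mer matches)."""
    # Index every k-mer of seq_b by its start positions.
    index = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)
    # Look up every k-mer of seq_a in that index.
    matches = []
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], []):
            matches.append((i, j))
    return matches


def off_diagonal_offsets(matches):
    """Count matches per offset (j - i), ignoring the main diagonal (offset 0)."""
    counts = defaultdict(int)
    for i, j in matches:
        if i != j:
            counts[j - i] += 1
    return dict(counts)


# Hypothetical toy data: a duplication unit made of two different "genes",
# repeated three times in tandem (standing in for an alternating V5/V6 unit).
gene_x = "ACGTTGCA"   # hypothetical fragment, not a real TRBV5 sequence
gene_y = "GGATCCTA"   # hypothetical fragment, not a real TRBV6 sequence
unit = gene_x + gene_y
locus = unit * 3

offsets = off_diagonal_offsets(dot_plot_matches(locus, locus, k=6))
for offset in sorted(o for o in offsets if o > 0):
    print(f"offset {offset}: {offsets[offset]} matching k-mers")
```

In this toy case the only off-diagonal signal lies at multiples of the unit length (16 bases), which is the signature of a duplication unit spanning both genes rather than either gene alone. Real loci would need inexact matching and a larger word size.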
The same extensive gene duplications occur in the sheep and bovine TRB loci [11, 13]. In sheep, 94 TRBV genes have been identified and grouped into 26 subgroups. Among these genes are 33 TRBV5 and 30 TRBV6 genes, which are intermingled as observed in the goat locus. Also in this species, six clustered genes form the TRBV21 subgroup. In this regard, our phylogenetic study (Fig. 3) demonstrates that the gene duplications occurred during the shared evolutionary history of sheep and goat. In fact, most of the sheep TRBV5, TRBV6 and TRBV21 genes correspond to the orthologous goat genes, indicating that the duplication events occurred in a common ancestor of the two species.
In cattle, although the TRB sequence in the third bovine assembly seems incomplete [13], 134 TRBV genes, distributed among 24 subgroups, have been found. In this case, the major germline repertoire is also attributable to the expansion of two TRBV subgroups whose genes alternate at the 5’ end of the V-cluster. One subgroup is TRBV6, as in goat and sheep, which consists of 40 members; the other subgroup, with 35 members, is classified as TRBV9 and likely corresponds to the goat TRBV5. As a matter of fact, the same authors mention that the identity between the nucleotide sequences of the TRBV9 and TRBV5 genes is often > 75%. In addition, dot-plot analyses of the bovine TRB locus (Fig. 3 in [13]) show the same duplication scheme observed in the goat TRBV5 and TRBV6 genomic region (Additional file 5–6: Figure S2A-B). Furthermore, the bovine TRBV21 subgroup contains 16 members, which, unlike in sheep and goat, appear to have been generated by duplications that also involved the bovine TRBV18, TRBV19 and TRBV20 genes [13].
If ancient gene duplications within the TRB locus led to the generation of the different TRBV subgroups shared among mammals [45], in ruminants the framework of the TRBV germline repertoire evolved with a more recent expansion of two main TRBV subgroups rather than with the emergence of diverse TRBV subgroups. Overall, this extensive gene expansion resulted in ruminant species (goat, sheep and cattle) possessing a germline TRBV repertoire with the highest number of genes among all the mammalian species studied so far [7, 11], including other artiodactyl species such as pigs and camels [14–17].
For comparison, it is interesting to note that TRBV5, TRBV6 and TRBV21 are also multimember subgroups in rabbits [8, 11], which possess 17 TRBV5, 14 TRBV6 and 7 TRBV21 genes. Humans and rhesus monkeys also show major expansions of the TRBV5 and TRBV6 subgroups [11], though the gene number of each subgroup is never as high as in ruminants. The human TRB locus contains 8 TRBV5 and 9 TRBV6 genes, whereas 10 TRBV5 and 8 TRBV6 genes are present in rhesus monkey.
However, the ruminant functional TRB repertoire is strongly conditioned by the proportion of non-functional germline TRBV5 and TRBV6 genes. In fact, the percentage of non-functional goat TRBV5 genes is 40% (12/30), while 62% (18/29) of the TRBV6 subgroup genes are non-functional. The percentage of non-functional genes for the two subgroups is similarly high in sheep and cattle: in sheep, 57.5% and 66.6% of TRBV5 and TRBV6 genes, respectively, are non-functional, whereas in cattle, 50% and 34.2% of TRBV6 and TRBV9 genes are non-functional. Therefore, it appears that the gene expansion of these subgroups might be related not to specific functional needs but rather to the scheme of gene duplications. In contrast, it has been reported [19, 22] that the sheep and cattle TRD repertoire is clearly determined by the high percentage of functional germline genes belonging to the multimember TRDV1 subgroup, where the percentage of non-functional genes is very low (22% (6/27) in sheep and 14.2% (8/56) in cattle).
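The proportions quoted above can be re-derived from the gene counts given in parentheses. The short sketch below does this for the four cases where the text reports both numerator and denominator; the dictionary layout and function name are illustrative assumptions, and the sheep and cattle TRBV percentages are left out because their underlying counts are not stated in this paragraph.

```python
# (non-functional genes, total genes), taken from the fractions quoted in the text
counts = {
    "goat TRBV5": (12, 30),
    "goat TRBV6": (18, 29),
    "sheep TRDV1": (6, 27),
    "cattle TRDV1": (8, 56),
}


def percent_nonfunctional(nonfunctional, total):
    """Percentage of germline genes in a subgroup that are non-functional."""
    return 100.0 * nonfunctional / total


percentages = {
    name: percent_nonfunctional(n, total) for name, (n, total) in counts.items()
}

for name, pct in percentages.items():
    print(f"{name}: {pct:.2f}% non-functional")
```

The computed values agree with those in the text to within rounding of the last digit, and they make the contrast explicit: the expanded TRBV subgroups carry a far larger share of non-functional genes than the TRDV1 subgroup.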
The organization of the TRG genes into two distinct and separate genomic regions was already known in sheep [26–27] and cattle [24–25]. However, in both species, the genomic structures of the two paralogous TRG loci were obtained by analysis of BAC clones, and the structural relationship between them had not yet been determined. The genomic organization of the TRG loci deduced from the goat genomic assembly allowed us to establish the precise chromosomal position of the two loci, their distance and their reciprocal transcriptional orientation. Moreover, in all mammalian species with a single TRG locus, the AMPH and STARD3NL genes represent the IMGT 5’ and IMGT 3’ borne, respectively, since they are located upstream of the first and downstream of the last TRG gene (IMGT®, http://www.imgt.org). In goat, and likely other ruminants, the synteny has been broken as a consequence of the evolutionary TRG split, with the AMPH gene located at the 5' end of the TRG1 locus and the STARD3NL gene at the 3' end of the TRG2 locus (Fig. 5). Taking into account that an intrachromosomal transposition seems to have moved the TRG2 genes to the current 4q15-22 position [27, 38], this implies that the split also involved the STARD3NL gene. Therefore, in goat and likely all ruminant species, two more gene boundaries should be defined: the IMGT 3’ borne of the TRG1 locus and the IMGT 5’ borne of the TRG2 locus. We propose the LSM8 gene, located 4.5 kb from the TRGC4 gene, as the 3’ borne of the TRG1 locus, whereas no gene was found in the vicinity of the TRGV5-1 gene that could be proposed as the 5' borne of the TRG2 locus.
The molecular characterization of the goat TRG loci showed that the TRG1 locus is very similar to the corresponding sheep locus in terms of gene content and genomic organization (Fig. 5 and Additional file 14–15: Figure S6A-B). However, the comparison between the goat and sheep TRG1 sequences (Additional file 18: Figure S8A) revealed traces of homology with the J-C regions between the TRGV2 and TRGV9 genes, likely due to an additional cassette, which is still present in the same position in the bovine TRG1 locus (TRGC7 cassette in https://www.imgt.org/IMGTrepertoire/).
An additional functional V-J-J-C cassette is, however, present in the goat TRG2 locus compared to that of sheep and cattle, as a result of a recent duplication event that involved the ancestral TRGC2 cassette, giving rise to TRGC2A and TRGC2B. As a matter of fact, the two TRGC2 cassettes show the highest nucleotide similarity to each other (> 97%), even if the complete deletion of EX2 in the TRGC2A gene probably makes this cassette non-functional.
In line with the evolutionary scenario proposed by Vaccarelli et al. [26] for the formation of the sheep TRG loci, we hypothesize that similar reiterated tandem duplications of V-J-J-C units may also have generated the goat TRG loci. Briefly, after the duplication of a minimum ancestral cassette consisting of one V, three J and one C gene, the ancestral TRG locus consisted of two cassettes, which were likely the forerunners of the TRGC5 and TRGC6 cassettes and were bordered by the AMPH and STARD3NL genes at their 5' and 3' ends, respectively. Subsequently, the TRGC5 cassette formed the TRGC3 cassette, which in turn duplicated to generate the TRGC4 cassette. A duplication of the TRGC4 cassette produced the TRGC2 cassette, which in turn generated the TRGC1 cassette. At this point, in the goat TRG2 locus, a further duplicative event involving the TRGC2 cassette may have generated the fourth cassette. However, given the high identity between the TRGC2A, TRGC2B and TRGC1 cassettes, it is also possible that the additional cassette resulted from an unequal crossing-over event between the ancestral TRGC1 and TRGC2 cassettes, and the reworked TRGC2A pseudogene may represent the outcome of this event. Unequal crossing-over events have previously been invoked in artiodactyls as the origin of the third TRBD-J-C cluster [12–13, 46].