Assembly and annotation of the cattle sex chromosomes
The bovine X and Y chromosomes were assembled from whole-genome sequence of a hybrid male with a Bos taurus taurus (Angus) sire and a Bos taurus indicus (Brahman) dam [23] (see URLs). The assembled chromosomes presented here are the Brahman X chromosome, which comprises 146 Mb in 106 contigs with 983 genes, and the Angus Y chromosome, which comprises 16 Mb in 67 contigs with 51 unique genes (Supplementary Table 1). These sequence assemblies have been deposited at NCBI (X: CM0011833.1; Y: CM0011803.1). The full length of the cattle Y chromosome has been estimated at ~50 Mb, at least half of which is in the highly repetitive region [27]. As in other species [19, 20, 28], even with long-read sequencing, we could not assemble heterochromatic regions. Full annotation of the Brahman X and Angus Y chromosomes is available from Ensembl release v97 (UOA_Brahman_1 and UOA_Angus_1). Analyses of the PAR and X-degenerate regions are presented below.
Identification of the cattle PAR
Alignment of the assembled Brahman X and Angus Y chromosomes to each other identified a 6.8 Mb region with 99% sequence identity that extends from the start of the assembled X chromosome sequence (CM0011833.1) to 2,933 bp distal to GPR143, after which sequence identity decreases to 86% for 348 bp and then drops abruptly to ~15% for the next 1 Mb (Figure 1). The X chromosome PAR is assembled in one contig, while the Y chromosome PAR has only two contig gaps. This enabled us to precisely define the PAR boundary and size. The PAR on the Brahman X and Angus Y chromosomes contains 31 genes in the same order. Of these, 29 are single-copy genes and two are multi-copy gene families: OBP, which has 3 copies, and BDA20, which has 4 copies (Supplementary Table 4). The Brahman X chromosome PAR contains 12 genes that are missing from the proximal end of the X chromosome in the current Hereford reference genome ARS-UCD1.2 (Supplementary Table 3).
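The boundary call described above can be illustrated with a minimal sketch, assuming the X-Y alignment has already been summarised as contiguous segments with a percent identity each. The function name, the 95% threshold and the segment coordinates are hypothetical and chosen only to mirror the identity values quoted in the text; this is not the procedure used to produce the reported boundary.

```python
from typing import List, Optional, Tuple

# Each segment is (start_bp, end_bp, percent_identity) along the X chromosome.
Segment = Tuple[int, int, float]


def find_par_boundary(segments: List[Segment],
                      min_identity: float = 95.0) -> Optional[int]:
    """Return the end coordinate of the leading run of high-identity segments.

    Segments are assumed to be sorted by start coordinate and to begin at the
    chromosome start. Returns None if the first segment is already below the
    (hypothetical) identity threshold.
    """
    boundary = None
    for _start, end, identity in segments:
        if identity < min_identity:
            break  # identity has dropped: the previous segment ended the PAR
        boundary = end
    return boundary


# Placeholder coordinates; only the identity pattern (99%, 86% for 348 bp,
# then ~15% for 1 Mb) follows the text.
par_end = 6_800_000
segments = [
    (0, par_end, 99.0),
    (par_end, par_end + 348, 86.0),
    (par_end + 348, par_end + 348 + 1_000_000, 15.0),
]
boundary = find_par_boundary(segments)
print(boundary)
```

With a threshold between 86% and 99%, the short 348 bp transition segment is treated as lying outside the PAR; a lower threshold would include it.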
Identification of cattle X-degenerate regions
Additional genes outside the PAR showed 60%-96% sequence identity between the X and Y chromosomes and are located in X-degenerate regions of the Y chromosome. The first of these regions, X-d1, is located distal to the PAR and spans 1.48 Mb, between 6.84 Mb and 8.32 Mb. X-d1 contains 11 single-copy protein-coding genes. The corresponding region on the X chromosome spans 35 Mb and contains 10 X-d1 homologs in a different order but lacks a homolog of RPL23AY; the homolog is located on chromosome 19 (Figure 2). A 3 Mb ampliconic region immediately distal to X-d1 contains the male-specific Y (MSY) gene families PRAMEY, TSPY and HSFY. At the distal end of this ampliconic region, the second X-degenerate region, X-d2a, spans 1.63 Mb and contains two single-copy genes, UBE1Y and TXLNGY. The X chromosome homologs of these two genes are separated by a 44 Mb interval that contains 285 X chromosome-specific genes. Distal to X-d2a lies a 4.5 Mb ampliconic segment containing the bovine-specific MSY genes ZNF280AY and ZNF280BY, which are equivalent to TSPY and HSFY found in other species. The copy numbers of multi-copy MSY gene families are listed in Supplementary Table 2, and the complex arrangement of multi-copy genes is presented in Supplementary Figure 3. The distal end of chromosome Y contains the third X-degenerate region, X-d2b, which extends over 1.3 Mb and includes SRY and two copies of RBMY. The X chromosome homologs of these, SOX3 and RBMX, are located in a 5 Mb segment at the distal end of the X chromosome.
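As a quick arithmetic check of the X-d1 figures quoted above, the following sketch recomputes the span from the stated coordinates and compares it with the 35 Mb interval occupied by the homologs on the X chromosome. The helper name and the "expansion" ratio are introduced here purely for illustration.

```python
# Coordinates and lengths in Mb, as quoted in the text.
XD1_START_MB = 6.84
XD1_END_MB = 8.32
X_HOMOLOG_SPAN_MB = 35.0


def span_mb(start_mb: float, end_mb: float) -> float:
    """Length of an interval in Mb, rounded to two decimals to hide float noise."""
    if end_mb < start_mb:
        raise ValueError("end must not precede start")
    return round(end_mb - start_mb, 2)


xd1_span = span_mb(XD1_START_MB, XD1_END_MB)
# How many times larger the X interval is than the Y interval (illustrative).
expansion = round(X_HOMOLOG_SPAN_MB / xd1_span, 1)
print(xd1_span, expansion)
```

The X-d1 homologs are thus spread over an interval roughly 24 times larger on the X chromosome than on the Y chromosome.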
Comparison of sex chromosome structure in mammals
Alignment of the Brahman X chromosome with the current Bos taurus taurus (Hereford) cattle reference sequence (ARS-UCD1.2) revealed a 4 Mb inversion as the major structural difference. In both assemblies this inverted region ends at contig breakpoints. Alignment of the Brahman X chromosome with the water buffalo X chromosome [29] revealed a high level of co-linearity, with one large inversion and five small inversions at the distal end of the chromosome. The Brahman and water buffalo X chromosomes are 30 and 25 Mb longer, respectively, than the goat X chromosome, which consists of two scaffolds with a combined length of 116 Mb [30]. The goat X chromosome shows excellent overall co-linearity with the sheep X chromosome (Supplementary Figure 1c-d), but both show numerous breakpoints and several inversions, particularly on the short arm, in comparison with the Brahman and water buffalo X chromosomes. Non-ruminant mammalian X chromosomes, i.e. human, pig, dog and horse, showed a strikingly similar pattern of rearrangements in comparison to the Bos taurus indicus (Brahman) X chromosome (Supplementary Figure 1f-i). These consisted predominantly of 5 large inversions.
Alignment of the Angus Y chromosome assembly with pig, horse and human Y chromosomes showed limited co-linearity, which was confined to the PAR and X-degenerate regions (Supplementary Figure 2b-d).
Gene content and order of the mammalian PAR
There is a very high level of conservation of synteny among mammalian PARs (Figure 3). PLCXD1 is the most proximal PAR gene in human, horse, Brahman cattle and water buffalo. At their proximal ends, the PARs in the Hereford cattle reference genome and the sheep, goat and pig assemblies are truncated distal to DHRSX, CRLF2, CD99 and GYG2, respectively (Supplementary Table 3). At their distal ends, the pig and dog PARs extend beyond GPR143, with a boundary distal to SHROOM2. In comparison to all the other species, the goat PAR has an inversion of three genes (TBL1X, GPR143, SHROOM2) close to the ruminant PAR boundary. This region is contained in one contig of the goat assembly and may thus be a contig orientation error rather than a goat-specific rearrangement. The human sex chromosomes are an exception amongst mammals and have PARs at both the proximal and distal ends [8]. Human PAR1 is equivalent to the single PAR of other mammalian species but is much shorter, with a distal boundary proximal to XG. The PAR of horse is the shortest, with the distal boundary at PRKX (Figure 3).
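An inversion of a few genes, such as the one noted in the goat PAR, can be detected from ordered gene lists alone. The sketch below is a minimal illustration, assuming two assemblies share the same gene set and differ by at most one contiguous reversal; the function name and the flanking placeholder genes are hypothetical, and the proximal-to-distal order TBL1X, GPR143, SHROOM2 is assumed from the order in which the text lists them.

```python
from typing import List, Optional, Tuple


def find_inverted_block(reference: List[str],
                        query: List[str]) -> Optional[Tuple[int, int]]:
    """Return inclusive (start, end) indices of a single reversed block.

    Returns None if the gene sets differ, the orders are identical, or the
    difference is not explained by one contiguous reversal.
    """
    if sorted(reference) != sorted(query) or reference == query:
        return None
    # Positions at which the two gene orders disagree.
    mismatches = [i for i, (r, q) in enumerate(zip(reference, query)) if r != q]
    start, end = mismatches[0], mismatches[-1]
    if query[start:end + 1] == reference[start:end + 1][::-1]:
        return start, end
    return None


# Flanking genes are placeholders, not real gene names.
reference_order = ["flank_proximal", "TBL1X", "GPR143", "SHROOM2", "flank_distal"]
goat_like_order = ["flank_proximal", "SHROOM2", "GPR143", "TBL1X", "flank_distal"]
block = find_inverted_block(reference_order, goat_like_order)
print(block)  # (1, 3)
```

Gene order alone cannot distinguish a true rearrangement from a misoriented contig, which is why the text treats the goat inversion with caution.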
PAR gene family expansions in different lineages
The OBP gene family, which is distal to PRKX in all species, is within the PAR of all ruminants, pig and dog, but is outside the PAR of horse and is missing from the human X chromosome. This gene family is expanded in ruminants (Figure 3). The BDA20 gene family is immediately distal to the OBP family and is present in all ruminants for which data are available, including yak [31], deer [32] and chiru [33], but is not found in other mammals (Figure 3). The BDA20 family shows differential expansion in the different ruminant species, with two or more copies sharing 74%-91% nucleotide sequence identity at the mRNA level in cattle [21], sheep [34], goat [30] and buffalo [29] (Figure 3, Supplementary Table 4). In contrast, ARSF, a member of the ARS family, has been reported as a PAR gene in other mammalian species but is not found in any of the ruminant PARs [29, 30, 34].
Comparison of X-degenerate Y chromosome regions
Most of the X-Y paired genes of cattle, pig and horse that are outside the PAR are found in the X-degenerate region X-d1, located adjacent to the PAR (Figure 4). Of the 11 genes in the cattle X-d1 region, 8 are in common with the horse and pig X-d1 regions, but the gene order differs between the three species. RPL23AY is found only in the cattle X-d1, while TMSB4Y is found in the horse and pig X-d1 regions and the human X-d3 region but is missing from the cattle X-d regions. Five additional bovine gametologs are found in two X-d2 regions, X-d2a and X-d2b, which correspond to the single X-d2 in horse and pig. Cattle X-d2a is distal to X-d1 and contains 2 genes, UBE1Y and TXLNGY. Both genes are found in the pig X-d2 region, but UBE1Y is in an ampliconic region of the horse Y chromosome. The cattle X-d2b region contains SRY and is in a telomeric position similar to the X-d2 region of pig. The cattle X-d2b region also contains two copies of RBMY, which is likewise duplicated in the horse X-d2 [20].