Identification of CNVs in the ampliconic region
To detect CNVs in the ampliconic region of the Y chromosome in the analysed samples (supplementary table 1), we normalised the mean depth of each selected amplicon by the mean depth of a unique region of the Y chromosome (supplementary tables 2 and 3) (9, 10). The resulting copy number calls are based on the expected value of the ratio between the copy number of the putative amplicon and its copy number in the reference genome (Fig. 2 and supplementary tables 4 and 5). Note that the reference sequence carries the ancestral number of amplicons (9); therefore, chromosomes whose copy number differs from the reference have undergone mutational events. We focused on the entire AZFc region (which includes palindromes P1, P2 and P3, as well as other segmental duplications), palindromes P4 and P5 of the AZFb region, and palindromes P6-P8 (Fig. 1). In this way, we identified a total of 21 males (24.1%) carrying at least one amplicon with a copy number different from the reference: 15 chromosomes with duplications, 2 with deletions and 4 carrying both deletions and duplications. Most CNVs (17 out of 21) were found within the AZFc region. For the amplicons in this particularly complex region, we confirmed our observations with the normalised EMA analysis (Fig. 2, Fig. 3B and supplementary Fig. 1). The proportion of Iranian individuals with structural rearrangements is higher than in the global population represented by the 1000 Genomes Project (24.1% vs. 16.0%), although this difference is not statistically significant (Fisher's exact test, p = 0.053). Interestingly, when deletions and duplications were considered separately, the incidence of duplications was significantly higher in the Iranian population than in the global population (17.2% vs. 7.8%, p = 0.005), whereas the incidence of deletions was lower, although not significantly so (2.3% vs. 6.9%, p = 0.116).
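As an illustration of the depth normalisation described above, the following Python sketch converts mean depths into an integer copy number. It is not the pipeline used in this study: the function name, the argument names and the example depth values are hypothetical, and the sketch assumes that the amplicon depth was averaged over all reference copies of the amplicon, so that the normalised depth approximates the ratio between the sample and reference copy numbers.

```python
import math


def estimate_copy_number(amplicon_depth, unique_depth, reference_copies):
    """Estimate the copy number of an amplicon from mean sequencing depths.

    amplicon_depth   -- mean depth over the amplicon (hypothetical input)
    unique_depth     -- mean depth over a unique region of the Y chromosome
    reference_copies -- copies of the amplicon in the reference sequence

    Assumption: amplicon_depth / unique_depth approximates
    sample_copies / reference_copies.
    """
    if unique_depth <= 0:
        raise ValueError("unique_depth must be positive")
    if amplicon_depth < 0 or reference_copies < 1:
        raise ValueError("invalid depth or reference copy number")
    normalised_depth = amplicon_depth / unique_depth
    # Round half up to the nearest integer copy number.
    return math.floor(normalised_depth * reference_copies + 0.5)


# Made-up depths, for illustration only.
unique_depth = 20.0
unchanged = estimate_copy_number(19.6, unique_depth, reference_copies=4)
duplicated = estimate_copy_number(30.0, unique_depth, reference_copies=2)
print(unchanged, duplicated)
```

A call that differs from `reference_copies` would then be flagged as a candidate duplication or deletion, to be confirmed by an independent analysis.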
Description of recombinational events
Some of the structural rearrangements identified here are easily explained by a single NAHR event occurring between amplicons with the same orientation. However, there are several cases for which more complex rearrangements are required to explain the observed pattern.
Surprisingly, we identified a large duplication (about 8.0 Mb) in sample R006 (haplogroup J-M47), resulting in a fourfold increase in the copy number of the genes BPY2, DAZ and CDY1 and possibly two additional copies of PRY (Fig. 2, Fig. 3). To our knowledge, this is the largest duplication ever reported in the AZFc region. This rearrangement is compatible with two NAHR events between sister chromatids: two successive b2/b4 duplications are required to explain this pattern (Fig. 3). Notably, genes in this region have been shown to be dosage sensitive; their duplication could therefore reduce fertility, at least in Asian men (14).
Amplicon copy number in samples R005, R075 and R091 indicates a complex evolutionary history (Fig. 2). Their rearrangement pattern is compatible with two successive NAHR events between amplicons, for example a common r2/r3 inversion (36) followed by a b2/b3 duplication, but these events cannot explain the presence of an extra grey amplicon. We argue that an additional duplication event involving this amplicon occurred (supplementary Fig. 2). This complex pattern has previously been observed in only one individual, of haplogroup O2 (9). In our data set, it is present in one J-L26* individual and two males belonging to haplogroup J-M67. The phylogenetic relationships of the two J-M67 males with respect to the eight J-M67 subjects analysed by Teitz et al. (2018), who do not carry the rearrangement, suggest that this mutation could be polyphyletic within J-M67 too. On the other hand, three individuals, namely R012, R056 and R083, show the same simple NAHR pattern without the extra grey amplicon (Fig. 2, supplementary Fig. 2). This arrangement of amplicons is relatively common and has previously been observed throughout the Y chromosome phylogenetic tree, in haplogroups A0, C1, C3, O2, O3 and R1b among the 1000 Genomes samples (9). Consistent with its polyphyletic nature, we found it in three additional haplogroups (J-L26*, J-M47 and T-L162; Fig. 2).
The pattern found in sample R008 (haplogroup C-M217) can be explained by two simple NAHR events: a g1/g2 duplication and a b1/b3 deletion (Fig. 2, supplementary Fig. 3). In this case, the two events may have occurred simultaneously, since the deletion can be either an inter-chromatid or an intra-chromatid event. This amplicon pattern was not observed in the 1000 Genomes dataset as reported by Teitz et al. (2018).
All the samples that we assigned to haplogroup N belong to the Turkmen ethnic group and carry a chromosomal rearrangement in the AZFc region (Fig. 2). The sample of haplogroup N-M46 (R063) has an amplicon copy number that is compatible with an r2/r3 inversion followed by a b2/b3 deletion (supplementary Fig. 4A). This rearrangement has previously been reported to be fixed in haplogroup N3 chromosomes (37). The three samples of haplogroup N-L666 (R061, R064 and R065) show a pattern that can be explained by the two events described above followed by a duplication between the blue amplicons (supplementary Fig. 4B). This event, described as b2/b3 rescue, has already been observed among haplogroup N individuals (supplementary Fig. 5) and can be interpreted as a way to partially or completely restore the dosage of the genes PRY, DAZ, CDY and BPY2 present in the amplicons (9).
The pattern observed in sample R077 (haplogroup J-L24) is partially compatible with a b1/b3 deletion, but the presence of an additional grey amplicon (2 instead of 1) suggests that a more complex, non-NAHR event is needed to explain the arrangement of the amplicons.
Individuals R014 and R086, who belong to two different haplogroups (R-M198 and G-M3406, respectively), show an arrangement that can be explained by a simple NAHR event between two green amplicons, resulting in a large duplication known as the gr/gr duplication (+1 b, +1 g, +2 r, +1 gr, +1 y). This pattern has previously been reported as the most common AZFc rearrangement and has been observed in several Y chromosome haplogroups as a consequence of multiple independent recombinational events (9). According to previous studies, men carrying the primary gr/gr duplication are at increased risk of infertility in Asia (14), whereas this association does not appear to be statistically significant in European populations (38).
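The amplicon signature quoted above can be turned into a simple check. The sketch below is only illustrative and is not part of the analysis described here: the dictionary and function names are hypothetical, the colour keys are shorthand for the blue, green, red, grey and yellow amplicons, the grey entry is read as +1, and the reference and sample copy numbers are assumed values used only to exercise the code.

```python
# Copy number changes of the gr/gr duplication as quoted in the text
# (grey entry read as +1; this reading is an assumption).
GR_GR_DUPLICATION = {"blue": 1, "green": 1, "red": 2, "grey": 1, "yellow": 1}


def copy_number_changes(sample, reference):
    """Return sample minus reference copy number for every reference amplicon."""
    return {amplicon: sample[amplicon] - reference[amplicon]
            for amplicon in reference}


def matches_signature(changes, signature):
    """True if the changes equal the signature and nothing else changed."""
    amplicons = set(changes) | set(signature)
    return all(changes.get(a, 0) == signature.get(a, 0) for a in amplicons)


# Assumed reference copy numbers and a made-up sample, for illustration only.
reference = {"blue": 4, "green": 3, "red": 4, "grey": 2, "yellow": 2}
sample = {"blue": 5, "green": 4, "red": 6, "grey": 3, "yellow": 3}

changes = copy_number_changes(sample, reference)
print(matches_signature(changes, GR_GR_DUPLICATION))
```

Requiring that no other amplicon changes keeps more complex patterns, such as those carrying an extra grey amplicon, from being labelled as a simple gr/gr duplication.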
Individuals R019 (haplogroup J-M267), R033 (haplogroup T-L162) and R042 (haplogroup E-M34) show three different patterns that are not compatible with simple NAHR events between amplicons. They can only be explained by micro-duplications involving one amplicon (samples R019 and R033) or two amplicons (sample R042) (Fig. 2, supplementary table 6). All these patterns have been observed previously, although in different Y chromosome backgrounds (9).
Notably, individuals R053, R058 and R059 share the same extra copy of a P4 palindrome arm in the AZFb region, which may result in an additional copy of the HSFY gene. These samples belong to the same haplogroup (T-Y11151) and ethnic group (Qashqai), suggesting that this mutation occurred only once in this population and likely spread due to patrilineality. Although this pattern has also been observed in one haplogroup T individual from the 1000 Genomes Project (9), that individual belongs to a paraphyletic haplogroup, suggesting that two independent recombinational events occurred. Since the well-known multicopy Y-STR DYS385 is located within amplicon P4, we performed a Y-STR analysis of 93 Qashqai individuals belonging to 9 different clans as a fast experimental approach to evaluate the frequency of this duplication within this ethnic group. The three sequenced subjects carrying the P4 duplication showed an unbalanced Y-STR pattern clearly due to the presence of an extra copy of DYS385 (supplementary Fig. 6). The same pattern was also observed in 19 additional subjects, all belonging to a specific Qashqai clan, in which carriers account for 56.4% (22 out of 39) of the males (supplementary table 7).