Kurds are divided geographically, linguistically and tribally as a consequence of earlier invasions and migrations. Kurdish tribes are found throughout Iran, Turkey, Syria and Iraq, and many of the tribes in Iraq live in and around Sulaymaniyah province in Iraqi Kurdistan [28, 29]. A population genetic study of ethnic Kurds was therefore needed to trace the paternal and maternal lineages of the Kurdish tribes.
In the present study, Y-STRs were used to determine the haplotype frequencies and genetic variation at 17 loci among Sorani Kurds in Sulaymaniyah province. The highest genetic diversities were observed at the DYS385a/b (0.848) and DYS458 (0.828) loci, and the lowest at DYS392 (0.406). These results are similar to those previously reported for the Iraqi Arab population and for Kurdish people in northern Iraq [12, 21, 30].
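Per-locus diversity values of this kind are commonly obtained with Nei's estimator, D = n/(n - 1) × (1 - Σ p_i²), where p_i is the frequency of allele i and n is the number of typed samples. This section does not state which formula was used, so the Python sketch below assumes that estimator; the function name and the toy allele list are illustrative and are not data from the study.

```python
from collections import Counter


def gene_diversity(alleles):
    """Nei's unbiased gene diversity for one locus (assumed estimator).

    `alleles` holds one allele call per sample, e.g. "13", or "11,14"
    when a two-copy locus such as DYS385a/b is treated as a single type.
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("at least two samples are required")
    counts = Counter(alleles)
    # Sum of squared allele frequencies.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Toy data, not taken from the study: three samples with allele 11, one with 13.
toy_calls = ["11", "11", "11", "13"]
print(round(gene_diversity(toy_calls), 3))
```

A locus at which every sample carries the same allele gives a diversity of 0, and the value approaches 1 as alleles become more numerous and more evenly distributed.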
STR duplications were confirmed in two individuals at two loci, DYS19 and YGATA H4. A double allele at locus DYS19 was observed in one sample. Double alleles at this locus, with the same duplicated alleles (15, 16), were previously observed in the Iraqi population [18], and other studies have also reported double alleles at DYS19 [22, 31, 32]. In YHRD Release 66, the mutation rate of the DYS19 locus was 2.12e-03 (42 in 19,807) and this duplication (15, 16) had a frequency of 0.051%. A duplication of the YGATA H4 locus, with alleles 11 and 12, was observed in one individual. A global study of Y-chromosomal haplotypes likewise reported one sample carrying duplicated alleles 11, 12 at YGATA H4 [22], and YHRD Release 66 contained a total of four observations (0.0013%) of the YGATA H4 11, 12 duplication. In addition, null alleles were found in the Sorani Kurdish population at two loci, DYS448 and YGATA H4. These null alleles are most likely due to deletion of the target region, or to deletions or mutations in the primer binding sites [33]. The previous global study of Y-chromosomal haplotypes showed that DYS448 had the highest level of null alleles across 51 countries (59/19,630) [22]. In YHRD Release 66, the mutation rate of the DYS448 locus was 1.37e-03 (15 in 10,935) and 827 null alleles were observed, whereas the mutation rate of YGATA H4 was 2.51e-03 (30 in 11,970) and 22 null alleles were reported.
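The mutation rates quoted from YHRD follow directly from the counts given in parentheses. The short Python check below reproduces them; it assumes that each denominator is the number of father-son transmissions (meioses) examined, which this section does not state explicitly, and the variable and function names are illustrative.

```python
# (mutations observed, transmissions examined) as quoted in the text
# from YHRD Release 66. Reading the denominator as meioses is an assumption.
mutation_counts = {
    "DYS19": (42, 19_807),
    "DYS448": (15, 10_935),
    "YGATA H4": (30, 11_970),
}


def mutation_rate(mutations, meioses):
    """Fraction of examined transmissions in which a mutation was seen."""
    return mutations / meioses


for locus, (mutations, meioses) in mutation_counts.items():
    print(f"{locus}: {mutation_rate(mutations, meioses):.2e}")
```

The printed values round to 2.12e-03, 1.37e-03 and 2.51e-03, matching the rates reported for the three loci.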
In this study, off-ladder alleles were observed at locus DYS635 (allele 27, three samples) and locus YGATA H4 (allele 14, two samples). Although these alleles fall outside the allelic ladder of the Yfiler kit, they are present in the allelic ladders of other commercial kits such as the Yfiler™ Plus kit and the PowerPlex® Y23 System [34, 35]. Their inclusion in these kits can help in designating rare alleles appropriately.
Y-STR haplogroups were inferred using Whit Athey's Haplogroup Predictor. The samples of the Sorani Kurds were classified into 18 different haplogroups. The major ones (>10%) were J (42.67%), R (18.47%), E (17.19%) and G (10.83%). The subclades of these haplogroups were J2 (28%), E1 (17.19%), R1 (14.64%), J1 (14%), G2 (10.83%) and R2 (3.8%). These results agree with previous findings that the most common haplogroups in the Kurdish population are J and R. Previous studies concluded that haplogroup J is a common male lineage in West Asia [36]. However, the phylogeography of this haplogroup is complex. The two sub-haplogroups J1 and J2 are similar in distribution, but J2 is most common among modern Kurdish [21], Jewish [37], Iranian [9] and South Asian populations [38], while the subclade J1 reaches its maximal frequency in Arabic-speaking populations. Predictions indicated the predominance of haplogroup J1 in Iraqi Arabs (36.6%) [18], in Saudi Arabia (71%) [24] and in Kuwait (37%) [39], whereas haplogroup prediction in the Bahraini population suggested that the most common subclade is J2 (27.6%), followed by J1 (23%) [26].
Members of haplogroup R are widespread in Europe, with R1a most common in eastern Europe and R1b in western Europe [40, 41]. Studies indicated that haplogroup R originated in north Asia about 27,000 years ago and spread into western, central and southern Asia [41, 42]. Members of clade R are also found at high frequencies in the central-western part of the African continent [43]. In the present study, the second major haplogroup among the Sorani Kurds was R, at 18.47% (29/157) (R1a = 12, R1b = 11, R2 = 6). The previous results on the Kurdish population in northern Iraq revealed that the major sub-haplogroup was R1a 17.17% (17/104); four other samples belonged to the R1b sub-haplogroup (4.04%), while R2 was not observed [21].
Genetic studies indicated that haplogroup E1b1b-M35 is most frequent in north Africa, where it reaches an average frequency of 42–45% across the region [15, 44]. In the current study, the E1b1b-M35 sub-haplogroup was observed at a high frequency of 16.56% (26/157), while the E1b1a sub-haplogroup was found in only one individual (0.64%).
Haplogroup G is most common in the Caucasus, the Near/Middle East and southern Europe. Archaeological research places the origin of clade G in a region adjacent to eastern Anatolia. Haplogroup G, together with haplogroup J2, has been associated with the spread of agriculture into Europe [45]. In the present study, haplogroup G, particularly sub-haplogroup G2a-P15, was frequently observed in the Sorani Kurdish population, at 10.83% (17/157). The sub-haplogroups E1b1b-M35 and G2a-P15 were also prominent in the earlier study on the paternal lineage of Northern Iraqi Kurds, at 12.5% (13/104) and 7.69% (8/104), respectively [21].
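The haplogroup percentages reported for the Sorani Kurds, and those quoted from the earlier northern Iraqi Kurd study [21], can be recomputed from the counts given in the text. The Python sketch below does only that arithmetic; the dictionary layout and the function name are illustrative choices, not part of the study's workflow.

```python
SORANI_N = 157    # Sorani Kurd samples in the present study
NORTHERN_N = 104  # denominator quoted for the earlier study [21]

# Counts as reported in the text.
sorani_counts = {
    "R1a": 12, "R1b": 11, "R2": 6,
    "E1b1b-M35": 26, "E1b1a": 1,
    "G2a-P15": 17,
}
northern_counts = {"E1b1b-M35": 13, "G2a-P15": 8}


def percent(count, total):
    """Frequency of a haplogroup as a percentage of the sample."""
    return 100.0 * count / total


# Haplogroup R is the sum of its three observed subclades.
r_total = sorani_counts["R1a"] + sorani_counts["R1b"] + sorani_counts["R2"]
print(f"R (Sorani): {percent(r_total, SORANI_N):.2f}%")

for name, count in sorani_counts.items():
    print(f"{name} (Sorani): {percent(count, SORANI_N):.2f}%")

for name, count in northern_counts.items():
    print(f"{name} (northern Iraq [21]): {percent(count, NORTHERN_N):.2f}%")
```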
Such slight differences in the genetic parameters were, however, expected. To our knowledge, the current study is the first to focus on a single group of the Kurdish population in Iraq (Sorani Kurds), which is separated from the Kurmanji Kurdish group in the northwest of Iraq on the border with Turkey. In addition, more samples were collected than in the previous paternal lineage studies of the Kurdish population [21, 30, 46], which is an important consideration for obtaining more reliable results with greater precision and power.
The Sorani Kurdish population was compared with 15 other populations in the YHRD database. Pairwise population genetic distances (Rst) revealed significant differences between Sorani Kurds and populations from western Europe and Africa, while similarities were observed with west and central Asian countries. However, genetic distance results are strongly influenced by the number of loci and the sample sizes. Increasing the number of loci improves the precision of the estimated genetic differences. In addition, larger sample sizes per population can provide more accurate mean values if insufficient loci are available [47].
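To give a sense of what an Rst distance measures, the following Python sketch computes a simplified, single-locus, two-population version based on the variance in repeat number: Rst = (S_total - S_within) / S_total. This is an illustration under stated assumptions only. The distances in this study come from a multi-locus comparison against YHRD populations, and the function name, the weighting by sample size and the toy repeat counts below are all hypothetical.

```python
from statistics import variance


def rst_single_locus(pop_a, pop_b):
    """Simplified variance-based Rst for one locus and two populations.

    Each argument is a list of integer repeat counts, one per sample.
    """
    if len(pop_a) < 2 or len(pop_b) < 2:
        raise ValueError("each population needs at least two samples")
    n_a, n_b = len(pop_a), len(pop_b)
    # Within-population component: twice the size-weighted mean variance.
    s_within = 2 * (n_a * variance(pop_a) + n_b * variance(pop_b)) / (n_a + n_b)
    # Total component: twice the variance of the pooled sample.
    s_total = 2 * variance(pop_a + pop_b)
    if s_total == 0:
        return 0.0  # no variation at all, so no measurable differentiation
    return (s_total - s_within) / s_total


# Toy repeat counts, not taken from the study.
similar = rst_single_locus([14, 15, 16], [14, 15, 16])
distinct = rst_single_locus([14, 14, 15, 15], [16, 16, 17, 17])
print(round(similar, 3), round(distinct, 3))
```

Values near or below zero indicate little differentiation, while values approaching 1 indicate that most of the variation in repeat number lies between the two populations. With few samples or few loci the estimate is noisy, which is the sensitivity to sample size and number of loci noted above.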
The Y-STR HapMap developed in this study reflected not only the geographical proximity of the sampled populations, but also a more distinct sub-grouping of the respective populations. The results of the present study show that the Sorani Kurdish population is part of the Middle Eastern population.