Y-chromosome diversity
To research the genetic variation within the Hungarian-speakers, we employed evolutionarily stable binary markers (SNPs) to define the haplogroup of each Y-chromosome. Subsequently, we examined the Y-STR variation of the groups, and specific phylogenetic analyses within certain haplogroups.
The Y haplogroup frequencies of the two populations are presented in Table 1. The most frequent haplogroups of the Zobor region, Slovakia population were R1a-Z280 (32.5%), R1a-M458 (25%), R1b-P312 (15.00%), G2a-L156 (7.5%).
In the case of the Baranja males, the most frequent haplogroups were I2a-P37 (21.95%), R1a-Z280 (17.07%). The overall pattern of Y-chromosomal haplogroup distributions in the two studied populations were similar, but haplogroups R1a-Z93, N1c-M46, C2-M217, J2b-M12 appeared only in the Baranja population (Table 1). Haplogroups G2a-L156 and R1b-M343/P25 (L23) were observed more frequently in the Baranja population. We focused on the genetic history of these specific haplogroups (G2a-L156, R1b-L23) beside the R1a-Z280 and I2a-P37, as they have been previously shown to represent phylogeographically relevant structures (Pamjav et al. 2017; 2022).
The Baranja group exhibited haplotype and haplogroup diversities of 0.99938 and 0.90586, respectively. In contrast, the Zobor region displayed lower values, with 0.98974 for haplotype diversity and 0.81154 for haplogroup diversity. The Y-STR and Y-SNP outcomes for the 40 samples from the Zobor region (Slovakia) and the 81 samples from Baranja (Croatia) are detailed in Supplementary Table S3 and S6. The diminished diversity observed in the Zobor region might be attributed to the smaller sample size. However, this reduced diversity is still more pronounced than what was observed in the Váh valley group, which was presented by n = 48 Y-STR haplotypes (Pamjav et al. 2022).
Table 1
Haplogroup frequencies of the Y-chromosomal haplogroups from Baranja and Zobor region
Haplogroup frequencies in Baranja, Croatia | Haplogroup frequencies in Zobor region, Slovakia |
Haplogroup | Sample number | Frequency | Haplogroup | Sample number | Frequency |
C-M217 | 1 | 1.22% | C-M217 | 0 | 0 |
E1b1b-M78 | 6 | 7.32% | E1b1b-M78 | 1 | 2.5% |
E1b1-M123 | 1 | 1.22% | E1b1-M123 | 0 | 0 |
G2a-L156 | 3 | 3.66% | G2a-L156 | 3 | 7.5% |
I1-M253 | 8 | 9.76% | I1-M253 | 0 | 0 |
I2a-P37 | 18 | 21.95% | I2a-P37 | 4 | 10.0% |
I2b-M223 | 1 | 1.22% | I2b-M223 | 0 | 0 |
J2b-M12 | 2 | 2.44% | J2b-M12 | 0 | 0 |
L-M11 | 1 | 1.22% | L-M11 | 0 | 0 |
N-VL29 | 1 | 1.22% | N-VL29 | 0 | 0 |
R1a-M458 | 4 | 4.88% | R1a-M458 | 10 | 25.0% |
R1a-Z280 | 14 | 17.07% | R1a-Z280 | 13 | 32.5% |
R1a-Z93 | 2 | 2.44% | R1a-Z93 | 0 | 0 |
R1b-M312 | 1 | 1.22% | R1b-M312 | 0 | 0 |
R1b-M343* | 1 | 1.22% | R1b-M343* | 0 | 0 |
R1b-M412 | 1 | 1.22% | R1b-M412 | 1 | 2.5% |
R1b-P25* | 8 | 9.76% | R1b-P25* | 1 | 2.5% |
R1b-P312 | 4 | 4.88% | R1b-P312 | 6 | 15.0% |
R1b-U106 | 4 | 4.88% | R1b-U106 | 0 | 0 |
T-M70 | 1 | 1.22% | T-M70 | 0 | 0 |
R1a-M198* | 0 | 0 | R1a-M198* | 1 | 2.5% |
Paternal genetic structure of the two populations
The Y haplogroup frequency data were calculated incorporating reference populations and used for a PCA plot (Supplementary Table S7 and Fig. S1). The location of the studied populations on the PCA plot is roughly consistent with the geographical distances between them. Populations from the same geographic region were clustered together and Hungarian populations overlapped with the surrounding Slavic populations, and the Zobor region shows more connections to the northern, northeastern populations. The resulted pattern with slight shift of the Zobor region sample set from the Baranja group is primarily due to the relatively high I2a, I1 and E1b1 haplogroup frequencies in Baranja populations. Further differences may be due to the preponderance (25%) of R1a-M458 in the Zobor region population, which is common among Western Slavs, the absence of the R1b subgroup (U106), which is common in Western Europe, but the small sample numbers of the tested groups might also influence these affinities.
Pairwise FST distances and p-values for 41 populations, including Baranja, Zobor region, and other Eurasian populations from published sources were calculated as shown in Supplementary Table S8 and presented in a heatmap plot with clustering (Fig. S2). The Zobor region shows significant genetic distance from almost every other group (p < 0.05), whereas the Baranja group is in non-significant distance from the population of Hungary, the Székelys, Moldovans and Slovenians. While small sample sizes limit the scope of definitive conclusions, the clustering groups populations with high genetic affinities to one another. Eastern Europeans and Hungarians from Hungary, the Baranja, and Zobor regions form one cohesive cluster. Another closely related clade comprises North Europeans. In contrast, the more Southeastern European populations, including the Székelys and Csángós, constitute a distinct cluster (Supplementary Table S8).
We further investigated these inter-population affinites with Y-STR data, calculating RST genetic distances. We constructed non-metric multidimensional scaling (MDS) plot based on Y-chromosomal haplotypes (n = 7,287) collected from YHRD.org, consisting of 23 STR available loci from geographically relevant populations (YHRD 2023) (Fig. 2). The RST genetic distances and RST p-values of the studied populations are presented in Supplementary Table S9.
Whereas the Székely population still shows connections toward southern populations and to diverse groups of the Carpathian Basin (like nonsignificant distance from the Váh valley population and Baranja) and the Slovenians, the Baranja population shows a stronger genetic similarity to Bodrogköz, Váh valley, Slovenian and Czech populations beside the Székelys. In the case of the Zobor region, Rétköz and Bodrogköz groups were the closest from the Carpathian Basin, although Polish population was also in nonsignificant distance (Supplementary Table S9).
Summarizing these results, we can conclude that the studied populations do not separate from their neighboring groups, and whereas different trends are present in the two new datasets, fine-scale geographic pattern is not decipherable through grouped Y-haplogroup or 23 STR data analyses and low-resolution SNP typing.
Phylogenetic analysis of the paternal lines
Based on the Y-STR haplotypes, median joining networks were constructed for the two investigated regions (Fig. S3), and for the available 21 Y-STR datasets of the Carpathian Basin (Fig. S4). We can observe on these networks that paternal haplotypes are spread throughout the studied regions. Furthermore, based on the data reviewed to date, the Carpathian Basin does not display specific Y-haplotype structure in the modern male population that corresponds with its geography. Only subtle differences are observable in the Székely population along with their shared paternal ancestries in the Bodrogköz/ Rétköz populations (J2 and N1a haplotypes).
Subsequently, specific haplogroups that can be linked to ancient Hungarian data were analyzed from phylogenetic aspects separately.
Generally, we used 10 Y-STR haplotypes from other Eurasian populations and aDNA results from published sources for the networks, in order to widen the geographical context. In some cases, where more high-resolution data were available, we also created networks using 15, 17 or 21 STRs. We constructed eight networks (R1a-Z93, N-M46, R1b-P25/M343, G2a-L156, R1a-Z280, J2b-M12, C-M217 and I2a-P37) that are potentially helpful in uncovering the genetic legacy of the populations being studied. Four of them (G2a-L156, R1a-Z280, J2b-M12, C-M217) can be found with descriptions in Supplementary Information as Fig. S5-S9. Founder (or modal) haplotypes (which form the biggest haplotype cluster and are shared by several populations in the network) were used to calculate the TMRCA and were constructed using the median value of the loci for the given haplogroups.
Median Joining network of 123 R1a-Z93 haplotypes
A MJ network of 123 R1a-Z93 haplotypes from the 18 populations tested in this study or published previously from modern (Underhill et al. 2015ró et al. 2015; Dudás et al. 2019) or ancient sources (Keyser et al. 2021; Olasz et al. 2018; Fóthi et al. 2020) is seen on Fig. 3. On the left side of the R1a-Z93 network, three modern Hungarian and one Xiongnu period (TUK04) aDNA samples, including one modern Baranja sample (DV 60) formed a common branch with Bashkirian Mari, Uzbek Khwarazm and Uzbeg Fergana samples. On this branch, one modern Hungarian sample shared two haplotypes (one TUK25 from the Xiongnu period and one Bashkirian Mari) in haplotype cluster 1. Two Uzbek (one Khwarazm and one Fergana sample) and one Baranja samples can be derived from this cluster (see Fig. 3). Cluster 2 includes three haplotypes: one Hungarian aDNA (Nagykőrös Gr2), one Xiongnu period aDNA (TUK09A) and one modern Armenian sample, separated by one molecular step (loci DYS389I and DYS390) from clusters 3 and 4. Cluster 2 is considered as the founding haplotype, as it contains two ancient haplotypes from males that lived 1,000 to 2,500 years ago. The Y-STR haplotype of Hungarian King Bela III (died in 1196 CE) is found three molecular steps away from cluster 2 (see KB3 in Fig. 3). Cluster 3 includes one Hungarian speaking Székely, one Mongolian, one Altaian, and one Bronze Age Andronovo aDNA (S10) samples. One ancient Hungarian sample (sample II54 from the Hungarian Royal Basilica of Székesfehérvár) is located one mutational step (DYS391) from cluster 5 and one mutational step (DYS19) from a Xiongnu period sample (TUK45).
The age of accumulated STR variation (TMRCA) within the R1a-Z93* lineage for 123 samples is estimated as 13 ± 3 kya (95% CI = 10.0–16.0 kya), considering that cluster 1 is the founder haplotype, which is higher than that of the SNP-based calculation (4.6 kya, 95% CI: 4.2-5.0 kya) (Adamov et al. 2015; “Yfull” 2023).
The paragroup R1a-Z93* is most common in the Altai region of Southern Siberia nowadays but it has also spread in Kyrgyzstan and in all Iranian populations (Underhill et al. 2015). Furthermore, the R1a-Z93 haplogroup is also common in Tajik ethnic groups in Afghan Pashtuns and Caucasus as well. Downstream haplogroup R1a-Z2125 occurred at highest frequencies in Kyrgyzstan and among Afghan Pashtuns (Underhill et al. 2015).
Keyser et al. (2021) demonstrated that ancient Xiongnu period (TUK45, TUK04, TUK25 and TUK09A) samples from Mongolia belonged to haplogroups R1a-Z93 (Z2125), which are also included in this network. Some of them shared the same haplotype on 10 Y-STR level with a present-day Hungarian (see cluster 1 in Fig. 3) or Hungarian aDNA (Nagykőrös Gr2) samples (see cluster 2 in Fig. 3), but they differ in deeper analyses when we consider their full available Y-STR profiles.
Ancient DNA studies showed that the Hungarian King Béla III, and another sample from Royal Basilica (II54) belonged to haplogroup R1a-Z93 (Olasz et al. 2018), and two other Z93 samples were found among the Hungarian Conqueror population as well (Fóthi et al. 2020). On 15 Y-STR level they are three steps away from each other.
Although, Y-STR analyses complicated in aDNA research due to degradation, SNP data are accumulating via whole genome sequencing and genome-wide capture approaches. We gained a more accurate haplogroup classification of the modern DV020 (Baranja) R1a-Z93 sample: DV020 is R1a1b2a2a3c2~ (FGC56440 terminal SNP), which subhaplogroup can be found in the Hun period Hungary and Kazakhstan as well (Gnecchi-Ruscone et al. 2021). From whole genome data, among the Z93 samples a Hun period sample from Romanian Marosszentgyörgy and three other middle-late Avar samples are classified as R-S23201*(xYP1348, Y65081), which means that it can be the same as haplogroup R1a1a1b2a2a3c ~ of ISOGG (Maróti et al. 2022). Another Hun period sample (Budapest Vezér street) completely matches the DV020 sample on ISOGG classification, its Y haplogroup marker is R-S12380/etc*(xY73177), which is a subbranch of the DV020-specific FGC56440 branch on Yfull. There are many examples of this haplogroup known from ancient genomic studies dated to the Bronze Age, found at Russian Krasnoyarsk, Kazakh Aktogay (1900 − 1400 BCE) and Early Iron Age Tasmola culture (700 − 500 BCE) (Narasimhan et al. 2019). Some samples from the (Middle) Late Bronze Age Mongolia and a few Xiongnu samples also show the Z2124 subgroup based on whole genome SNP data (Jeong et al. 2020; Wang et al. 2021). The R1a1a1b2a2a3 ~ subhaplogroup is present in an early medieval (second third 10th century to first third 11th century) Hungarian village cemetery site Homokmégy-Székes as well (Maár et al. 2021). We conclude that this Y haplogroup might have arrived at the Carpathian Basin in one of the eastern migrations and can have an origin in the Kazakh steppe.
Median-joining network of 180 N-M46 haplotypes
Based on 180 haplotypes, an N-M46 (ISOGG 2019–2020: N1a1-M46/Tat) MJ network was constructed using populations previously studied by Bíró et al. (2015); Fehér et al. (2015); Szeifert et al. (2022); Pimenoff et al. (2008); Ilumäe et al. (2016) (Fig. 4). The modal haplotype of the N-M46 network (former N1c-Tat) was shared by 32 samples from eight populations (haplotype cluster 1 on Fig. 4), recognized by researchers as the ancestral haplotype (Derenko et al. 2007)). Cluster 2, which is the largest cluster, includes 43 haplotypes from five populations, and located one molecular step (DYS391) from cluster 1. More than 70% of the studied Buryat samples belonged to cluster 2, indicating that the Buryats we studied belong to a young and isolated population (Dudás et al. 2019ró et al. 2015). One Hungarian aDNA haplotype from Örménykút (Ö52/50, Fóthi et al. (2020)) was positioned six mutational steps away from cluster 1 and formed a haplotype branch with two present-day Hungarian males.
From the perspective of Hungarians, haplotype cluster 3 is pertinent. Cluster 3 comprises three identical haplotypes: two Hungarian and one Northern Mansi, with our Baranja N-M46 haplotype included among them.
As it can be seen from Fig. 4, the right branch of the network has almost exclusively representatives of the Finno-Ugric language group, except for Bashkirs. Bashkirs have been living close to the Ugric peoples around the Ural Mountains, where admixture can be traced back for a millennium. This network branch can be derived from cluster 2, where the Buryats form the majority of the haplotypes.
The other population samples included in the network formed independent clusters, such as Bashkirian Mari, Finns, Bashkirians, and Khanties from the research of Ilumäe et al. (2016) and shared haplotypes with other populations or were scattered in the network.
The age of accumulated STR variation (TMRCA) within N-M46 lineage for 180 samples is estimated as 11.4 ± 3.3 kya (95% CI = 8.1–14.7 kya), considering cluster 1 is the founder haplotype, which is similar to its sequence-based calculation of 13 kya (95% CI:11.3–14.6 kya) (Ilumäe et al. 2016).
Macrohaplogroup N-M231 is widespread from Scandinavia to the Kamchatka in North Eurasia and it is the most frequent haplogroup in Siberia (Karafet et al. 2008; Rootsi et al. 2007). Based on the geographic distribution of the parahaplogroup N*-M231, it most likely originated in Southeast Asia, whereas its most widespread subgroup is N-M46 (Rootsi et al. 2007). Other authors consider Southern Siberia as its geographical origin (Derenko et al. 2007).
The present-day Hungarian Y-chromosomal gene pool contains only a small percentage of N-M46 (1%) and has a distribution typical of East-Central Europe (Völgyi et al. 2009). However, its incidence is higher among the Hungarian-speaking Bodrogköz in East Hungary (6.2%) and among Székelys from Miercurea Ciuc, Romania (6.3%) (Bíró et al. 2015; Pamjav et al. 2017). According to the results of the Hungarian aDNA studies, N-M46 is detected at a higher frequency (17–36%) in the Hungarian Conquest Period population in the Carpathian Basin (Fóthi et al. 2020; Neparáczki et al. 2019).
We also constructed this network using 15 Y-STRs, incorporating additional ancient samples related to ancient Hungarians from the Ural and Volga regions (Fig. S9). The DV sample diverges from a Bashkir haplotype by two mutational steps, while the other Hungarian males share their haplotype with medieval samples from the Volga and Ural regions. The Avar period samples belong to a different subgroup (N1a1a1a1a3a-F4205) of the N-M46 clade, as corroborated by genomic studies (Gnecchi-Ruscone et al. 2021; Csáky, Gerber, Koncz, et al. 2020).
Median-joining network of 196 I2a-P37 haplotypes
The MJ network of 196 I2a-P37 STR haplotypes from 14 populations is illustrated in Fig. 5. Greek, Irish (Hallast et al. 2015), Catalan samples (Solé-Morata et al. 2015), Croatian samples from FTDNA (“FTDNA I2a Project” 2023), Slovakian samples (Nováčková et al. 2015; Rębała et al. 2013) were used in the analyses. One sample (Karos II, grave 16) from the Hungarian Conquest Period (Fóthi et al. 2020) and all other samples originated from previously studied and published populations (Bíró et al. 2015; Pamjav et al. 2017; 2022; Borbély et al. 2023). In this network, Hungarian speakers residing in rural regions and adjacent countries were distinguished separately. While they are well-documented, there is limited data available that combines both haplotype and haplogroup information from other European populations.
Several haplotype clusters can be seen in Fig. 5, which shows different connections of the populations included in the study. Many shared haplotypes can be observed between the populations (circles of mixed colors). The haplotype cluster 1 is shared by 16 males from eight populations (one Csángó, two Székely, four Hungarian, one Váh valley, two Bodrogköz, one Slovakia, one Greek and four Baranja). A conquering period Hungarian ancient haplotype (Karos II aH) is located one mutational step (DYS389II) away from cluster 1, which indicates that it is genetically closely related to samples of cluster 1.
The age of accumulated STR variation within the I2a-P37 lineage for 196 samples is estimated to be 18.5 ± 3.5 kya (95% CI = 15.0–22.0 kya), considering the median haplotype (see M in Fig. 5) is the founding (one Hungarian male), which is consistent with the sequence-based calculation of 18.3 kya (“Yfull” 2023).
Haplogroup I-M170 is a fundamental element of the European Y-DNA gene pool, representing, on average, 18% of the total male lineages. Its near absence in other regions, including the Near East, indicates its likely emergence in Europe, potentially before the Last Glacial Maximum (LGM) (Semino et al. 2000).
Haplogroup I-M170 has two major subgroups: I1-M253, which is common in Scandinavia, and I2-M438. I2 subgroup I2a-P37 (formerly I1b, currently I2a1a), extends from the eastern Adriatic to eastern Europe and noticeably decreases towards the southern Balkans. I2a probably diffused from its homeland, Eastern Europe or the Balkans after the LGM (Rootsi et al. 2004).
In contrast, I2b-M223 (formerly I1c, currently I2a1b1) most likely arose in southern France/Iberia and similarly to the other subclades, it underwent a postglacial expansion (Rootsi et al. 2004). Taken together, these observations suggest that haplogroup I-M170 may have played a central role in the process of human recolonization of Europe from isolated refugia after the LGM and suggest that a comprehensive phylogeographic study should localize the in situ origin and spread of major male founders (Rootsi et al. 2004). According to the study of Peričić et al. (2005), the I2a haplogroup corresponds to the historic expansion of the Slavs that may have taken place in the middle of the first millennium AD and resulted in significant admixture with the substratum populations living in Eastern Europe. Their haplogroup, subgroup I2a, is widespread among the Slavs, especially the western south Slavs, but has also been detected up to 3-7.1% in populations of the Northern Caucasus (Rootsi et al. 2004).
I2a-P37 is found today with the highest frequency in Bosnians (40%), followed by Croatians (31.2%), other peoples in the Balkans, and Ukrainians (16.1%), and populations in Central and Eastern Europe (Rootsi et al. 2004).
Based on the studies we conducted, the frequency of I2a-P37 in Hungarian speakers was as follows: Hungarian (16%), Csángó in Ghimeş (26.6%), Székely from Csíkszereda (8%), Székelys around Székelyudvarhely (21.8%), Bodrogköz (19%), Rétköz (6.6%), Váh valley (16.67%) (Pamjav et al. 2022; Borbély et al. 2023; Völgyi et al. 2009ró et al. 2015; Pamjav et al. 2017). In the present study, it was also prevalent in Baranja (19.75%) and Zobor region (10%) populations.
Based on our network, there are admixtures and shared lineages between the Hungarian-speaking populations (clusters 1–11), and also with that of the neighbouring Slavic countries (clusters 1–4 and 7–8), included in the study (Fig. 5). Among the conqueror Hungarians, there were three samples that had haplogroup I2a, but due to the lack of overlapping STR loci, only one sample could be included in the study. According to the authors, these three I2a samples were close relatives on the paternal lineage (Fóthi et al. 2020). Based on another Hungarian study, six Conquest Period individuals also carried the I2a haplogroup, but two of them were duplicates of those included in the study of Fóthi et al. (2020) namely Karos II (graves 16 and 52) and Neparáczki et al. (2019). According to the archaeological records, the male from Karos II (grave 52) was a leader among the Hungarian conquerors (Révész 1996). He belonged to subgroup I2a1a2b1a1a2 (I-Y4460*(xY5598, Y13498, Y16810)) according to Maróti et al. (2022).
Based on the YSEQ I2 panel, the deeper classification of the DV046 (Baranja) sample is I2-Y125026, which is I2a1b2a2b2b ~ according to ISOGG 2019–2020. Modern-day Yfull data from this haplogroup are known from Hungary, Croatia, Romania, Serbia, Turkey, Montenegro, Greece, Poland, and Russia.
Median-joining network of 207 R1b-P25 haplotypes
An R1b-P25 MJ network (with R1b-L23 and R1b-M73 subbranches) was constructed using 207 samples from the present study, from FTDNA data and populations previously studied (Myres et al. 2011ró et al. 2015; Purps et al. 2014) (Fig. 6). The founding haplotype of the Eastern R1b*-P25 (L23, a subgroup of R1b-M269) was shared by 17 samples, including eight Hungarian and two Caucasian Avar, two German and five Czech samples, as shown in Fig. 6 (see cluster 1). All Hungarian haplotypes included in the network belong to the R1b-L23 cluster and most of them appear to be descended from the founding haplotype (cluster 1). The pattern of these haplotype clusters is starlike, representing a set of closely related haplotypes of Hungarian males. Out of ten samples from the present studied regions, six samples are located one or two mutation steps away from the founding haplotype cluster together with other Hungarian males indicating a close relationship with each other in space and time. In addition to the founding haplotype, some Hungarian haplotypes were shared with Europeans (cluster 2) or with Lezghians and Armenians from the Caucasus (cluster 3) as well as with Croatians (cluster 4). Hungarian, Belgian, Armenian, Croatian, German, and Scottish samples show the most similar haplotypes to Baranja and Zobor region samples within the R1b-L23 haplogroup if we compare them to samples from FTDNA at 17/21 STR level.
The age of accumulated STR variation (TMRCA) within the R1b-P25 lineage for 207 samples is estimated to be 25.23 ± 4.6 kya (95% CI = 20.64–29.8 kya), considering the founding haplotype is the ancestral one (cluster 1 in Fig. 6), which is higher than the time (20.4 kya) calculated on sequence data (“Yfull” 2023). For the R1b-L23 branch, the age of accumulated STR variation is 18.5 ± 3.8 kya (95% CI = 14.7–22.3 kya), which is higher than its sequence-based calculation of 6.4 kya (“Yfull” 2023). The time calculated by Myres et al. (2011) is 10.3 ± 1.7 kya (95% CI = 8.6–12 kya), which is between the two values calculated above.
Haplogroup R1b-M269 is the most frequent Western European lineage that was originally thought to have originated in the Paleolithic era, but recently suggests a Late Neolithic origin (Batini et al. 2017). Most of the R1b-M412 chromosomes belong to Western European males, but another subgroup, R1b-L23 (xM412, R1b1a1b1), is commonly referred to as “Eastern European R1b”, prevalent in the Caucasus, Turkey, and Ural, with about 10% frequency (Myres et al. 2011).
Olalde et al. (2018) have confirmed the role of R1b-L23 subclades in the expansion of the Eastern population of the Bell Beaker culture to Iberia. And it was also shown to be an important part of the Yamnaya-related Early Bronze Age paternal ancestry (Haak et al. 2015).
Based on recently published aDNA studies, haplogroup R1b-L23 was present in the territory of today’s Czechia and Poland in Corded Ware culture associated samples from 2000–3000 BC (Linderholm et al. 2020) and later in the Migration Period in Hun period, in the Avars and the Hungarian conquerors (Fóthi et al. 2020; Gnecchi-Ruscone et al. 2021; Csáky, Gerber, Koncz, et al. 2020; Neparáczki et al. 2019). Nowadays, examples of this subgroup are scattered throughout Europe, with the highest concentrations in the United Kingdom and Ireland, as per Yfull data.
In the two regions under investigation, five R1b-P25 samples were analyzed for the marker Z2103. All results fell under the Z2103 (R1b1a1b1b) subgroup. The paternal lineage of R1b-Z2103 is believed to have originated around 4250 BCE, branching off from the ancestral R1b-L23 line and distinguishing it from other lineages. This estimation for the most recent common ancestor (TMRCA) is derived from current-day testers' STR and SNP data from FTDNA (“FTDNA R1b” 2023).
Additional median-joining networks
Further analyses of G2-L156, R1a-Z280, J2b-M12, and C-M217 median-joining networks can be found in Supplementary Information of this paper. We provide a summary of these analyses in the subsequent sections.
In the Median-joining network of 280 G2-L156 haplotypes (Fig. S5), we included six G2-L156 samples from the Baranja and Zobor region populations, each of which falls into subclades L497, P303* and M406. Among the samples from the present study, four samples belong to the M406 branch, one Zobor region sample is identical to the assumed modal (founding) haplotype. Nearly all Hungarian males from this study cluster either with Caucasian population samples, such as the Balkarians/Karachays, or with samples from Tyroleans (as seen in the M497, M406 branches) and Germans.
The G2-L156 haplogroup was also found among aDNA samples from the Carpathian Basin. The researchers showed that one ancient Avar and four Hungarian conqueror samples belonged to the G2a-L293 subgroup, two Hungarian conqueror samples belonged to the G2a-U1 subgroup, and one Hungarian conqueror sample belonged to the G2a2b-L30 subgroup (Fóthi et al. 2020; Neparáczki et al. 2019).
A MJ network of R1a-Z280 (R1a1a1b1a2) samples was generated using 126 haplotypes from nine populations, including 25 samples from the present study (10 Zobor region and 15 Baranja). As illustrated in Fig. S6, the R1a-Z280 MJ network presents a notable pattern. The populations examined in this study share common haplotypes with not only Hungarian-speakers but also Finno-Ugric-speakers (haplotypes 1–3, 5, and 8). Intriguingly, samples from either the Baranja or Zobor regions also possess shared haplotypes with Uzbek samples (haplotypes 3 and 7). This suggests a deep-rooted common paternal genetic lineage, given the vast geographical distance separating these populations for at least a millennium (Róna-Tas 1999).
The median-joining network of 195 J2b-M102/M12 haplotypes was constructed from 12 populations (Fig. S7). Predominantly, European populations share the same haplotypes (clusters 1–2 and 4–5) in this network, suggesting that the ancestors' admixture occurred in Europe. An exception is the presence of an Indian sample in haplotype cluster 3. Haplogroup J2b-M12 is more prevalent in Hungarian-speaking minorities (such as Székely, Csángó, and Baranja) residing in countries bordering Hungary or in more isolated regions within Hungary (like Bodrogköz and Rétköz), with frequencies ranging from 3–5%, compared to the broader Hungarian population.
Median-joining network of 279 C2-M217 haplotypes (Fig. S8) was based on 15 populations. Several haplotype clusters can be seen in the figure, which shows different admixtures of Asian populations. Our Baranja Hungarian sample from Croatia (see DV in Fig. S8) is located on a branch with Hazara, Korean and Manchu samples far from the star-cluster. This branch can be derived from a haplotype consisting of two Mongolian and one Buryat samples. C-M217 haplogroup was detected in a rare 0.2% in the modern Hungarian samples submitted to the FTDNA database and had more prevalence in the Avar period of the region.
Evaluation of the mitochondrial DNA data
Haplogroup-based analyses
Altogether 168 newly reported high-coverage whole mitogenomes were analyzed in this study, 79 from Zobor region and 89 from Baranja with a mean mitogenome coverage of 209.05x (using Illumina NGS technology).
Table 2
Major mtDNA haplogroups and their frequencies in the Zobor region and Baranja populations. Subhaplogroup resolution is detailed in Supplementary Table S3.
Haplogroup mtDNA | n (absolute frequency, Zobor) | Frequency Zobor region | n (absolute frequency, Baranja) | Frequency Baranja region |
H | 34 | 43.04% | 37 | 41.57% |
K | 8 | 10.13% | 7 | 7.87% |
U5a | 7 | 8.86% | 5 | 5.62% |
U2 | 5 | 6.33% | 2 | 2.25% |
J | 4 | 5.06% | 7 | 7.87% |
T/ T1 | 4 | 5.06% | 3 | 3.37% |
U4 | 3 | 3.8% | 5 | 5.62% |
HV | 2 | 2.53% | 3 | 3.37% |
T2 | 2 | 2.53% | 4 | 4.49% |
V | 2 | 2.53% | 3 | 3.37% |
X | 2 | 2.53% | 2 | 2.25% |
Y | 2 | 2.53% | 0 | 0% |
D | 1 | 1.27% | 0 | 0% |
N1 | 1 | 1.27% | 0 | 0% |
U5b | 1 | 1.27% | 5 | 5.62% |
L | 1 | 1.27% | 0 | 0% |
R | 0 | 0% | 1 | 1.12% |
U3 | 0 | 0% | 2 | 2.25% |
W | 0 | 0% | 3 | 3.37% |
In the Zobor region, 79 mitogenome sequences revealed 377 polymorphic sites, corresponding to 63 distinct haplotypes. These exhibited a haplotype diversity (Hd) of 0.9932. On the other hand, 89 mitogenome sequences of the Baranja region population displayed 447 variable sites, clustering into 78 unique haplotypes with a marginally elevated haplotype diversity of Hd = 0.9969 compared to the Zobor region.
The median-joining network of mitogenomes from the investigated regions showed a large variety of different haplogroups among the villages, without any unique pattern in either case (see Fig. 7). Most of the samples belong to the typically European H and U macrohaplogroups. Most of the haplogroups were shared among the villages, and almost all villages have diverse haplogroup distribution of the maternal lines in both studied regions. Notably, the U macrohaplogroup was absent in the samples from the Pohranice municipality in the Zobor region. This absence however might be attributed to the limited sample size from Pohranice. In the Baranja dataset, the majority of samples associated with haplogroup K originate from a single community, specifically Suza. Nevertheless, the distribution of haplogroups among different villages exhibits clear diversity.
Due to the uneven and sometimes limited number of samples across villages, conducting an AMOVA test for heterogeneity wasn't feasible. However, the variations both within and between villages are distinctly illustrated in Fig. 7.
A single aDNA study from the 9th-12th century exists for the Zobor region, which served as a Hungarian-Slavic contact zone during that era. Although the ancient sample set is limited in size and restricted to hypervariable sequences, some parallels can be observed, notably within haplogroup U5a1b (Csákyová et al. 2016). From the Baranja region, mostly prehistoric sample sets are published yet, which attest among others for the Neolithic presence of haplogroups T2a-b, K1a-b, and K2b in the area, and the prehistoric prevalence of J1c in the broader North Balkan area (Mallick and Reich 2023). These lineages are also found in today’s Baranja population.
Although most haplogroups in our samples align with those predominantly found in Europe, several outlier haplogroups were identified, including haplogroups L1b, N1a, X2, Y1a, D4, U4b, and U3b3. The appearance of outlier maternal lineage L1b in the Zobor data set is noteworthy. In Europe, mtDNA macrohaplogroup L represents less than 1% of the total population. L1b subgroup, dated at about 10 kya, has its frequency maximum in West Africa (Cerezo et al. 2012). According to phylogeographic analyses carried out by Cerezo et al. (Cerezo et al. 2012), around 65% of the European L lineages are believed to have arrived during more recent historical periods, such as the Roman period, the Arab conquest of the Iberian Peninsula and Sicily, and the Atlantic slave trade era (Cerezo et al. 2012). Ancient DNA data are scarce from these periods of Europe yet, therefore the origin of this group in the Zobor dataset remains open.
Although the mitochondrial N1a haplogroup was prevalent among the ancient Hungarians, the N1a representative from the Zobor region belongs to the prehistoric branch of the haplogroup (N1a1a1a3). The closest parallel to this lineage is from the southern area of Transdaubia (Western Hungary) and dates to the transition between the 6th and 5th millennia BCE (sample I0176 in Haak et al. 2015 (Haak et al. 2015)).
Haplogroup X2 occurs in two-two cases from both regions (X2c1 and X2b). X2 is more prevalent in the populations of the Near East, Caucasus, and Mediterranean Europe compared to those of northern and northeastern Europe and rare among Eastern European populations. Furthermore, it is virtually absent in the Finno-Ugric and Turkic-speaking peoples residing in the Volga-Ural region (Reidla et al. 2003). Both detected subgroups have their parallels in prehistoric Europe, where X2b was more frequent. Two X2c1 samples from the Zobor region have close parallel from the conquest period Karos-Eperjesszög cemetery from northern Hungary (Karos 2/70) (Neparáczki et al. 2018).
The rear mitochondrial haplogroup Y1a is most probably a sign of the maternal continuity of the Avar population in the Zobor region, based on parallels in Gnecchi et al. 2022, Maróti et al. 2022 (Gnecchi-Ruscone et al. 2021; Maróti et al. 2022). Besides the Avar period of the Carpathian Basin, aDNA haplogroup matches are only known from Mongolia and Kazakhstan (Gnecchi-Ruscone et al. 2021; Jeong et al. 2020; Mallick and Reich 2023). Other outlier haplogroups (D4, U4b, U3b3) are discussed along phylogenetic analyses in the subsequent chapter.
We used PCA to visualize the population genetic relatedness based on mtDNA profiles and haplogroup frequencies of 42 different populations (Supplementary Table S5, Fig. 8).
The PCA analysis positions both the Baranja and Zobor region datasets within the European cluster, aligning closely with the Czech and Slovakian populations. Due to the applied resolution of the haplogroup data, finer differentiation within this European cluster is not discernible.
Sequence-based analyses of the mitogenomes
We conducted a comprehensive examination of complete mitogenomes, encompassing 16,569 base pairs, through DNA sequence level analysis. Subsequently, Slatkin FST values were computed and documented in Supplementary Table S10. A heatmap, illustrating the clustering of FST values, was generated to elucidate the genetic differentiation among the populations under investigation (see Fig. S10). Interesting is that among the included Conquest Period aDNA datasets (Szeifert et al. 2022) the KL6 (stands for larger village cemeteries from the 10-11th centuries) clusters with Baranja, Hungarian and Székely datasets.
The differences between the FST values are very small, whole mitochondrial data are missing from some neighboring regions and the Slovakian and Czech datasets are also limited; therefore, the resolution of that analysis is restricted to a broader scale.
We analyzed individual maternal lineages to discern the inter-regional relationships of contemporary Hungarian lineages and their ties to prehistoric and historic populations, among other associations. In the following we present those lineages that show diverse phylogenetic connections of the two study areas, including ancient reference samples as well (see references with non-NCBI IDs in Supplementary Tables S11).
On the NJ tree of haplogroup T1a, one individual from Baranja (DV082) can be found in the close proximity to one individual from archaeological site Bolshie Tigany (Volga-Kama region Early Medieval) on an excerpt of the phylogeographically very diverse and therefore less informative T1a tree (Fig. S11A). Another studied mitochondrial lineage from the Zobor region (ZB006) is situated close to an early medieval lineage from Bayanovo site in the Cis Ural region, associated with the late Lomovatovo culture and to one another sample from 9-10th century site Bolshie Tigany, both located in Russia (Fig. S11B). Whereas the structure of the whole T1a tree do not allow firm phylogenetic conclusions, these proximities on the tree might hint on the common history of these people (see further maternal connections of ancient Hungarians with the populations of the Cis Ural and Volga regions in Szeifert et al. (2022)).
One sample from the Zobor region belongs to lineage D4b2b. While haplogroup D4 is predominantly found across East Asia, Southeast Asia, Siberia, Central Asia, and among the indigenous populations of the Americas, its presence in Europe is notably sparse (Comas et al. 2004). The D4 mitogenome NJ tree (Fig. S12) shows that the D4b2b subgroup is rather disseminated in Eastern Eurasia nowadays. Although the currently known medieval ancient data (such as late medieval Mongolian sample) do not cluster with the examined ZB058, this lineage could reach the Carpathian Basin in the historically recorded migration weaves of the 1st millennium BC.
The U2e phylogenetic tree highlights the diversity observed within the Zobor and Baranja regions (Fig. S13). While the Baranja sample DV023, classified under lineage U2e2a1a, demonstrates northern affiliations, two samples from the Zobor region don't neatly fit into any subgroups currently recognized in the phylotree (falling into the U2e1'2'3 category). Notably, samples from both the Zobor and Baranja regions share the U2e1b1a subgroup with individuals from the 10th-11th centuries in the Carpathian Basin. Furthermore, representatives from the Zobor region and the steppe, associated with the U2e1a1 subgroup, are also evident in the U2e tree (refer to Fig. S13).
The U3 phylogenetic tree indicates that the U3b and U3b3 lineages in Baranja have connections primarily to the south and east (Fig. S14), where ancient haplogroup matches also coming from the Middle East and Caucasus (Mallick and Reich 2023).
The U4 haplogroup evolved during the Last Glacial Period, and spread in Northern Eurasia, having been a relatively common lineage among Mesolithic European hunter-gatherers (Mallick and Reich 2023). On the U4 phylogenetic tree the Baranja samples have rather southern (Bulgarian, Serbian) connections whereas the Zobor region lineages show toward Central and Eastern Europe (Fig. S15).
The U5a haplogroup, prevalent across Western Eurasia, is also well-represented in the modern Carpathian Basin. Notably, its U5a2 subclade establishes a clear link with ancient samples from the closer and wider region, with important examples from the 9-11th century cemeteries of ancient Hungarians (see Fig. 9, Fig. S16).
The H13 haplogroup is present in both the Zobor region and Baranja, with pairs of individuals in each. However, their phylogeographic patterns differ strikingly (Fig. S17). In Baranja, the H13 lineages branch off basally, preceding most contemporary lineages. Conversely, in the Zobor region, lineages either match Northern European examples (as seen in the H13a1a1a lineage of ZB013) or are akin to a Roman-era sample from Dobrudja and modern Polish, Russian samples (as observed in the H13a1a3 lineages of ZB042 and ZB047).
Mitogenome sequences from Hungarian populations from Hungary, Székely (Hungarian) people from Transylvania near Odorheiu Secuiesc, Romania (Borbély et al. 2023) and the here presented two populations from Baranja and Zobor region were tested in Arlequin for population differentiation and showed FST values below 0.0035 with significant p-values.