Population structure in Ireland and Britain
We inferred fine-scale population structure from the largest collection of Irish reference haplotypes (n = 3,502) with geographic provenance assembled to date, combining datasets of Irish and British ancestries (Table 1). We identified a total of 25 genetic communities using the Leiden network community detection algorithm(29) over three levels of recursive clustering, constructing a network of summed IBD-segment sharing amounts between individuals (see Methods). Identifying genetic communities based on haplotype sharing patterns then allowed us to study regional demographic histories of Ireland and Britain.
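As an illustration of the network construction described above, the following minimal sketch sums IBD segment lengths per pair of individuals to give weighted, undirected edges. The `(id_a, id_b, length_cm)` input layout, the function name and the toy values are assumptions made for this example and do not reflect the study's actual file formats or pipeline. The resulting edge weights could then be passed to a Leiden implementation, which is not shown here.

```python
from collections import defaultdict


def build_ibd_network(segments):
    """Sum IBD segment lengths (cM) for each unordered pair of individuals.

    `segments` is an iterable of (id_a, id_b, length_cm) tuples. This layout
    is a hypothetical one chosen for illustration.
    """
    weights = defaultdict(float)
    for id_a, id_b, length_cm in segments:
        if id_a == id_b:
            continue  # skip self-matches
        pair = tuple(sorted((id_a, id_b)))  # undirected edge key
        weights[pair] += length_cm
    return dict(weights)


# Toy segments (hypothetical values).
segments = [
    ("ind1", "ind2", 2.5),
    ("ind2", "ind1", 4.0),
    ("ind1", "ind3", 1.5),
]
network = build_ibd_network(segments)
```

Each key in `network` is one edge and each value is the total IBD length shared by that pair, which is the quantity a community detection step would use as the edge weight.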
Table 1
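Dataset | Population | Geographic data present | Geographic data absent |
ALS Case Control Cohort | Northern Ireland | 7 | 0 |
ALS Case Control Cohort | Republic of Ireland | 533 | 445 |
Irish DNA ATLAS | Northern Ireland | 24 | 0 |
Irish DNA ATLAS | Republic of Ireland | 166 | 5 |
People of the British Isles | England | 2405 | 29 |
People of the British Isles | Isle of Man | 56 | 0 |
People of the British Isles | Northern Ireland | 69 | 0 |
People of the British Isles | Orkney | 136 | 1 |
People of the British Isles | Republic of Ireland | 6 | 1 |
People of the British Isles | Scotland | 152 | 30 |
People of the British Isles | Wales | 290 | 5 |
TRINITY Student Study | Republic of Ireland | 0 | 2214 |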
Confirming previous reports(2, 3), these genetic communities segregated by geography and were thus assigned labels that broadly reflect these geographic affinities (Supplemental Table 1). The first recursion classified individuals into two communities (Supplemental Fig. 1a), one with predominantly Irish membership and the other with predominantly British membership. The second recursion largely split the communities by historical or administrative boundaries: provinces within Ireland (n = 3) and, generally, countries within the UK (n = 5) (Fig. 1a-b). The third recursion identified fine-scale structure within these groups (Supplemental Fig. 1b-i). We identified previously unobserved genetic structure within Ireland (N.Kerry, S.Leinster, three Dublin communities) and the UK (East Anglia).
We characterised the significance and reproducibility of this fine-scale structure by performing hierarchical clustering using pvclust(32) on the average inter-community haplotypic lengths(33) (see Methods). We find supporting evidence that these communities represent meaningful divisions in our sample of Irish and British haplotypes (Fig. 1c; see Supplemental Note 1 for the full analysis). Permutations of total-variation distances (TVD)(22) (see Methods) confirmed the robustness of the communities (p-value << 0.01, Supplemental Table 2).
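To make the TVD permutation idea concrete, here is a minimal sketch under stated assumptions: each individual is represented by a sharing profile that sums to 1, the TVD between two communities is half the summed absolute difference of their mean profiles, and significance is assessed by shuffling community membership. The function names, the toy profiles and the exact form of the test are illustrative assumptions; the procedure actually used is the one described in Methods.

```python
import random


def mean_profile(profiles):
    """Column-wise mean of a list of equal-length profiles."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]


def tvd(p, q):
    """Total variation distance between two profiles."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def tvd_permutation_pvalue(group_a, group_b, n_perm=1000, seed=0):
    """Return (observed TVD, permutation p-value) for two groups of profiles."""
    rng = random.Random(seed)
    observed = tvd(mean_profile(group_a), mean_profile(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of community membership
        permuted = tvd(mean_profile(pooled[:n_a]), mean_profile(pooled[n_a:]))
        if permuted >= observed - 1e-12:  # tolerance for float rounding
            hits += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy profiles (hypothetical values): two clearly distinct communities.
group_a = [[0.70, 0.20, 0.10], [0.80, 0.10, 0.10], [0.75, 0.15, 0.10]]
group_b = [[0.10, 0.20, 0.70], [0.10, 0.10, 0.80], [0.15, 0.10, 0.75]]
observed, p_value = tvd_permutation_pvalue(group_a, group_b)
```

With only three individuals per group the p-value cannot become very small, because few distinct relabellings exist; real communities with many members allow much smaller values.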
We then sought to demonstrate the value of these findings in the context of the UK Biobank (UKB), one of the largest accessible datasets of Irish and British ancestry(11). Using UKB individuals with an Irish or Northern Irish birthplace and Irish ancestry (“UKB Irish”), we show that the UKB self-reported “white British” or “white Irish” ethnicity label poorly captures the diversity of Irish ancestry (Fig. 1d). We trained a Naive Bayes classifier to predict regional Irish and British ancestry, dividing our Irish and British dataset into training and validation subsets (see Methods). We then applied the model to predict the regional ancestry of the UKB Irish. We find that 67.81% of these Irish or Northern Irish-labelled UKB participants are predicted to have regional ancestry most commonly found in the Republic of Ireland. Broken down by the broad regions defined by our second-level clusters (Fig. 1), 33.82% of UKB Irish are classified as of Western/North-Western Irish ancestry, 20.85% as of Southern Irish ancestry and 13.14% as of Eastern Irish ancestry. Of the remaining 32.19%, 6.68% were classified into one of the other British-majority ancestry clusters, and 25.51% were classified into the Northern Irish/South-West Scottish/Manx-like group (Supplemental Table 3). This group was predominantly made up of UKB Irish born in Northern Ireland (80.17%, vs 19.83% born in the Republic of Ireland).
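The classification step can be illustrated with a small Gaussian Naive Bayes sketch. The choice of a Gaussian likelihood, the two-feature layout (for example, sharing with two reference groups), the labels and all values are hypothetical and chosen only for illustration; the features and model settings actually used are described in Methods.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(features, labels, var_floor=1e-6):
    """Estimate per-class log prior, feature means and feature variances."""
    by_label = defaultdict(list)
    for row, label in zip(features, labels):
        by_label[label].append(row)
    total = len(labels)
    model = {}
    for label, rows in by_label.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor avoids zero variance on tiny training sets.
        variances = [
            sum((x - m) ** 2 for x in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / total), means, variances)
    return model


def predict(model, row):
    """Return the label with the highest log posterior for one sample."""
    def log_posterior(params):
        log_prior, means, variances = params
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(row, means, variances)
        )
        return log_prior + log_likelihood

    return max(model, key=lambda label: log_posterior(model[label]))


# Toy training data (hypothetical values and labels).
train_x = [[5.0, 1.0], [4.5, 1.2], [5.2, 0.8], [1.0, 4.8], [1.3, 5.1], [0.9, 5.0]]
train_y = ["region_A", "region_A", "region_A", "region_B", "region_B", "region_B"]
model = fit_gaussian_nb(train_x, train_y)
prediction_a = predict(model, [4.8, 1.1])
prediction_b = predict(model, [1.1, 4.9])
```

In practice, held-out validation samples would be passed through `predict` to estimate accuracy before applying the model to new individuals.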
Genealogy and population structure
Genealogical data can provide additional context to genetic communities detected through haplotype analysis. Genealogical data available from the Irish DNA Atlas allowed us to test the preponderance of different surname origins (i.e., English, Scottish, Irish) within each of the Irish third-level clusters (p-value < 2.2e-16 and adjusted Cramér’s V of 0.20, indicating a strong association) (Supplemental Fig. 3a). We observed a significant enrichment of Scottish and Gallowglass surnames in the SW.Scotland-N.Ireland community, matching the historical settlement of Scottish mercenaries in that area between the 13th and 16th centuries(4). Within the N.Munster and N.Kerry communities, we observe an enrichment of Welsh surnames. In Wexford, we find an increase of English, Anglo-Norman and Scandinavian surnames, which may reflect older Anglo-Norman settlements in these regions(5). When we extended this analysis to individual surnames (Supplemental Fig. 3b), we observed an enrichment of “Walshes” in N.Kerry and W.Leinster, “Sullivans” and their variants in N.Kerry and S.Munster, “Ryans” in N.Munster, and “O’Donnells” in W.Ulster-Argyll.
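For readers unfamiliar with the effect-size measure, the sketch below computes a chi-square statistic and a bias-corrected Cramér’s V from a contingency table of surname origin by cluster. The text does not specify which adjustment was used; the bias correction shown here (Bergsma's) is an assumption, and the counts are hypothetical.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and sample size for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def adjusted_cramers_v(table):
    """Bias-corrected Cramér's V (assumed correction; see lead-in)."""
    chi2, n = chi_square(table)
    r, k = len(table), len(table[0])
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


# Toy table (hypothetical counts): rows are clusters, columns are surname origins.
associated = [[30, 10], [10, 30]]
independent = [[20, 20], [20, 20]]
chi2_assoc, n_assoc = chi_square(associated)
v_assoc = adjusted_cramers_v(associated)
v_indep = adjusted_cramers_v(independent)
```

The correction shrinks V towards zero for small samples, so a table with no association returns exactly zero rather than a small positive value.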
Demographic Profiling of Genetic Communities Across Ireland and Britain
With a set of regional genetic communities established across Ireland and Britain, we sought to compare the demographic histories of these communities across different timescales by separating IBD segments into bins according to their lengths. We further used these bins to estimate temporal migration rates in Ireland and the UK, and the ancestral contributions from other European populations over three time periods. Since the time frames of IBD segment bins have wide ranges(34, 35), we refer to the bins by their length, noting their age relative to each other (i.e. more recent, older, etc.). We complemented these methods by estimating levels of Runs-of-Homozygosity (ROH) for signals of inbreeding and/or isolation (see Methods). Lastly, we estimated changes in effective population size (Ne) over the past 100 generations using IBD segment data (see Supplemental Note 2).
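The length-based binning can be sketched as follows, using the three bins referred to throughout this section ([1,3cM), [3,5cM) and ≥ 5cM). The input layout, function names and toy values are assumptions for illustration, and segments shorter than 1 cM are simply dropped in this sketch.

```python
from collections import defaultdict

BINS = ("[1,3cM)", "[3,5cM)", ">=5cM")


def length_bin(length_cm):
    """Return the bin label for a segment length in cM, or None if below 1 cM."""
    if length_cm < 1.0:
        return None
    if length_cm < 3.0:
        return "[1,3cM)"
    if length_cm < 5.0:
        return "[3,5cM)"
    return ">=5cM"


def sharing_by_bin(segments):
    """Total shared length per unordered pair, computed separately for each bin."""
    totals = {name: defaultdict(float) for name in BINS}
    for id_a, id_b, length_cm in segments:
        name = length_bin(length_cm)
        if name is None:
            continue
        pair = tuple(sorted((id_a, id_b)))
        totals[name][pair] += length_cm
    return {name: dict(pairs) for name, pairs in totals.items()}


# Toy segments (hypothetical values).
segments = [
    ("ind1", "ind2", 1.5),
    ("ind1", "ind2", 2.0),
    ("ind2", "ind3", 3.0),
    ("ind1", "ind3", 7.5),
    ("ind1", "ind3", 0.4),
]
by_bin = sharing_by_bin(segments)
```

Keeping one pairwise total per bin allows the same downstream analysis to be repeated for each time window.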
Changes in structure across time
Comparing IBD sharing profiles in Ireland and Britain, we find evidence of shifting demographic relationships over time. We observe that approximately 100 generations ago (IBD length bin [1,3cM)), Irish communities on average shared more and longer IBD segments than most English (right-tailed t-test p-value < 2.2 × 10⁻¹⁶) and Scottish (right-tailed t-test p-value < 2.2 × 10⁻¹⁶) communities (Fig. 2a and Supplemental Table 4). Further, the variation of sharing patterns within the Irish communities is subtler than in their British counterparts, suggesting greater homogeneity (Fig. 2a-c and Supplemental Table 4). Examining patterns of IBD sharing between the Irish and British communities reveals change across more recent IBD length bins, demonstrating subtle shifts in population structure. We observed that the elevated levels of shared IBD between Irish communities (Supplemental Fig. 4a-c and Supplemental Fig. 5a-c) are primarily due to sharing within the [1,3cM) bin and gradually decrease as the IBD bin lengths increase. This indicates an older signal of relative isolation in the Irish communities when compared to their British counterparts in general. This is consistent with our FST findings, in which the Irish communities have a lower median FST (2.20 × 10⁻⁴) than the UK communities (FST = 9.51 × 10⁻⁴; Supplemental Note 1 and Supplemental Table 5).
In the [1,3cM) IBD length bin, we observed that IBD sharing increases from the east to the west of the island (Fig. 2a), signalling greater isolation in the west of Ireland ~100 generations ago. Within-community sharing in Ireland, however, remains significantly lower than in highly isolated communities such as Orkney, N. and S. Wales and the Isle of Man across all length bins (Fig. 2a-c) (left-tailed t-test p-value < 2.2 × 10⁻¹⁶). Irish communities have a similarly high degree of haplotype sharing as the Scottish, Manx, Welsh, and Orcadian groups in the [1,3cM) IBD bin (Supplemental Fig. 4a & 5a). However, while the Irish show equivalent values with the Manx and SW.Scottish groups in the [3,5cM) bin, sharing diminishes with the NE.Scottish and Orcadian genetic communities in IBD bins ≥ 3 cM. In contrast, the N.Welsh genetic community appears to share slightly more IBD segments in the [3,5cM) and ≥ 5 cM bins (Supplemental Fig. 4a-c & 5a-c & 6a-c), as do the Cornish communities. These changing affinities between populations across time likely reflect complex demographic relationships, which may be better understood by considering migration rates and isolation across time.
Population size and isolation can leave detectable signals on the distribution of ROH segments in a community. ROH segments in the [1,3cM) length bin indicate signals of isolation in the English Midlands, Orkney, Cornwall, and Connacht genetic communities 100 generations ago (Fig. 2d). This signal persists in the [3,5cM) ROH length bin for the Midlands, Orkney and Connacht genetic communities. However, the distribution of ROH segments indicates recent inflation of parental relatedness in the S.English, Manx, and W.Ulster-Argyll groups in the > 5cM length bin. Overall, ROH levels within the Irish, Scottish and English groups are comparable (Supplemental Table 6), whereas the Orcadian, Manx and Welsh groups specifically show higher levels of ROH. The SNP-based inbreeding coefficients (Fis) and ROH-based coefficients (FROH) for the populations indicate that the observed patterns of ROH and IBD sharing are more likely due to small effective population sizes than to consanguinity(36) (Supplemental Fig. 8).
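As a brief illustration of an ROH-based coefficient, the sketch below computes FROH as the fraction of the genome covered by ROH above a minimum length. The function name, the minimum-length threshold, the genome length of 3000.0 and the ROH lengths are all hypothetical; ROH lengths and genome length must be in the same units, and the definition actually used is given in Methods.

```python
def f_roh(roh_lengths, genome_length, min_length=0.0):
    """Fraction of the genome covered by ROH of at least `min_length`.

    `roh_lengths` and `genome_length` must use the same units.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = sum(length for length in roh_lengths if length >= min_length)
    return covered / genome_length


# Toy individual (hypothetical ROH lengths and genome length).
individual_roh = [12.0, 18.0, 0.5]
froh = f_roh(individual_roh, genome_length=3000.0, min_length=1.0)
```

Comparing such a coefficient with a SNP-based estimate is one way to judge whether homozygosity reflects recent parental relatedness or a historically small population.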
Changes in population size, isolation and migration
Haplotype sharing patterns within Ireland and Britain can provide insight into the sizes and movements of populations, providing context for observed genetic structure. Irish communities show relatively high IBD sharing with little variation in sharing between Irish communities (Supplemental Fig. 4), indicative of low effective population sizes (Fig. 3). Using IBDNe, we estimated Ne over time. We observed that this relatively homogeneous sharing pattern is reflected in the similar Ne estimates of the Irish communities across the island 100 generations ago (Supplemental Fig. 7a-c, Fig. 4). Two-thirds of the Irish communities show a reduction in Ne 40 generations ago, specifically in the south and east of Ireland. The Wexford, N.Leinster, and S.Munster genetic communities show a 38–72% reduction in population size between 100 and 40 generations ago (Supplemental Table 7). We see a further reduction of 10% in the effective population size of the Wexford genetic community, while there is an exponential increase (18–100%) in Ne in the other Irish genetic communities (Supplemental Table 7). In addition, the N.Kerry community appears to have undergone a population expansion followed by a contraction within the past 30 generations (Supplemental Fig. 6a), which could be due to its small membership or may reflect cryptic relatedness just above our threshold of relatedness filtering (see Methods).
Within the UK, there are elevated levels of haplotype sharing across varying bins of IBD or ROH length. The Orcadian, Manx, and N. and S.Welsh genetic communities (and to a lesser extent Cornwall) demonstrate features of isolation (Fig. 2a-c and Supplemental Table 4). This is further reflected in their effective population sizes over time, which are consistently lower than those of other British communities and show evidence of recent population contraction (Fig. 3, Supplemental Fig. 7d-g and Supplemental Table 7), especially in N.Wales. In a PCA of an IBD-sharing matrix built from IBD segments in bins [1,3cM) and [3,5cM), principal components 1 and 2 resolved these communities from the Irish and other British communities (Supplemental Fig. 6a-c). PCA on the average total length of IBD shared also shows that the Scottish, Manx and N.Irish groups lie on a west-to-east cline between the Irish communities and British communities. The Cornish communities appear to be more recently isolated (Supplemental Fig. 6d), with their population sizes decreasing exponentially over the course of 80 generations, matching an enrichment of within-community IBD sharing > 5cM. These Cornish communities also share marginally more IBD with the Devon community, and then the S.English community, than with the other British communities (Supplemental Figs. 4 & 5). In contrast, the S.English communities see a steady increase in Ne over time (Supplemental Fig. 7e), though with a contraction in Ne followed by a recovery around 10 generations ago that is observed in nearly all clusters.
The observations of isolation in peripheral communities above are reinforced by the migration rate surfaces estimated from MAPS(35) (Fig. 4). We observed that the Orkney Islands, Isle of Man, Wales, Cornwall and Devon show low migration rates to and from the mainland in both the older [1,3cM) and more recent (≥ 5cM) IBD bins, further supporting their isolation. The effective population size dips consistently over time in NE.Scotland, the Isle of Man, the Orkney Islands and Cornwall (Supplemental Fig. 6). There are further stable migrational barriers between north and south Wales, confirming long-standing structure within Wales. Additionally, there was little migration between the Scottish Lowlands and Highlands and the rest of Britain in the older [1,3cM) IBD bin. However, we observed a new migration corridor opening between the Scottish Lowlands and N.England in the more recent IBD bin (≥ 5cM), while the Highlands remain isolated from the rest of Britain (Fig. 4a-b). Interestingly, and consistent with the analysis of IBD sharing across all the IBD length bins, N.Wales exhibits a historically smaller Ne around generations 30–50 than Orkney or the Isle of Man.
To further contextualise the structure and signals of isolation in Ireland, the migration rate surfaces (Fig. 4) show that the only stable migration corridor that consistently existed between Ireland and Britain is between northeast Ireland and southwest Scotland, demonstrating the isolation of the island and matching previous genetic evidence of migration to and from these regions(2, 8). The within-Ireland migrational cold spots, however, shifted across time from central-West Ireland in the [1,3cM) IBD bin to isolating Leinster from the rest of the island 15 generations ago (Fig. 4), separating Kerry from Galway, which may explain the signals of isolation in Connacht. Additionally, in the north of the country, a shift in migration corridors links the west and east of Ulster, likely reflecting gene flow from Scotland across Ulster.
Continental European ancestry in Irish and British communities
We also investigated the European ancestry contributions to our genetic communities across time using PCA of European IBD sharing (Supplemental Fig. 9 and Supplemental Table 8) to complement previous results(2, 3, 22). In the oldest [1,3cM) IBD bin, we observe that English genetic groups separate out from Irish communities on axes driven by Germanic-Swedish-Norwegian ancestries. Scottish/Northern Irish communities demonstrate more Swedish influence, while there appears to be strong North and North-Western Norwegian influence in the Manx and Orcadian groups (Supplemental Fig. 9a and Supplemental Fig. 10-13a-c). In the recent-history bin (> 5cM), however, there appears to be more of a West German ancestry signal in the English communities, while there is more of a Swedish-Finnish-Norwegian component in the Scottish groups. The Norwegian signal persists in the Manx and Orcadian groups (Supplemental Fig. 9b and Supplemental Fig. 10-13a-c).
Focussing on the contributions of European ancestry to Ireland over time, we find a strong signal from north Norway and western France, and to a lesser extent from Sweden, in the Irish communities in the [1,3cM) IBD bin (Supplemental Figs. 10–13). There appears to be a more recent contribution of ancestry in the [3,5cM) IBD bin from North-North-West Norway and Sweden in Ireland, specifically in the W.Leinster community (Supplemental Figs. 10–13). This further supports reports of gene flow from Ireland to Norway(9).