Globally, genetic variability within LRRK2 confers a high genotype and population attributable risk for PD, and the frequency of many variants appears to be population specific1. All pathogenic LRRK2 mutations elevate its kinase activity, whereas p.G2019S (rs34637584_A) directly breaks the hinge of the ‘activation segment’, keeping the enzyme constitutively active2,3. In Arab Berbers, rs34637584_A has a background frequency of 0.9% and accounts for >30% of sporadic patients and 40% of those with a family history of PD4. Despite the gene’s identification through linkage in a dominant Mendelian disorder5, the penetrance of rs34637584_A is incomplete6; while subtle prodromal signs, including hyposmia, REM sleep behavior disorder and orthostasis, may be missed, the expressivity of LRRK2 parkinsonism is as variable as that of idiopathic PD7. Conceivably, penetrance and expressivity are a function of other genetic and environmental modifiers. Notably, polymorphisms within the LRRK2 locus are associated with PD susceptibility1 and with age at onset (AOO) in progressive supranuclear palsy8, and these may influence LRRK2 expression, protein interactions, and kinase activation9. Penetrance may also be influenced by LRRK2’s role in intracellular innate immunity and pathogen response. In animal models, LRRK2 p.G2019S promotes survival in infectious disease10,11. Variability in the LRRK2 locus is also associated with inflammatory bowel disease12, pediatric immune disorders13, and type-1 response in leprosy14. Here we investigate the effects of genetic variability in cis and in trans of rs34637584_A in a sample from the Tunisian Arab-Berber population. We assess the relationship with AOO and the genetic evidence for ancestral LRRK2 haplotype selection.
Results for rs34637584_A were generated by direct genotyping and added to high-density array data. As there are no Arab-Berber reference genomes within public databases, we selected thirteen rs34637584_A heterozygotes with extreme AOO phenotypes (7 with young onset PD (mean AOO=34.6, SD=7.02 (22-42) years), and 6 elderly but clinically asymptomatic individuals (mean age=78.7, SD=7.0 (69-89) years)) for whole genome sequencing (WGS). Chromosome 12 SNP imputation and haplotype phasing were done using a European genome reference with and without Tunisian WGS. Results were similar regardless of the imputation reference used and yielded 16,997 SNPs on chromosome 12 (average minor allele frequency (MAF) = 0.25 ± 0.13 SD, range 0.017 - 0.50).
Data from LRRK2 p.G2019S heterozygotes and homozygotes were compared to define allelic variability in cis for the longest, most parsimonious allele for the majority of samples. This spanned a genomic distance of 396 kb, from rs878010 to rs73110066, and included a total of 69 markers in addition to the pathogenic LRRK2 c.6055A variant (rs34637584_A at 12:40340400 (GRCh38)) (Supplementary Table 1). The 396 kb cis haplotype included complete genotyping data for all samples (n=145) and was identical in all but one unaffected control with rs2404840_G>A, which may be due to recombination. SNP frequencies in the most parsimonious LRRK2 haplotype, versus allele frequencies in unrelated control participants without rs34637584_A, enabled the age of the mutation to be calculated at approximately 40 (95% CI 28-52) generations. Assuming 30 years per generation, the rs34637584_A ancestral allele in this sample originated approximately 1,200 ± 360 years ago. Within the same dataset we observe 81 alternate LRRK2 haplotypes in trans (unique haplotypes defined as having ≥ 1/69 difference in marker alleles). A variable length Markov chain Monte Carlo method15, implemented in Beagle3.3, was used to identify the shortest haplotype in trans most associated with AOO, but none was observed that reached significance after correction for multiple testing. Additionally, a maximum likelihood method was used to resolve haplotype relationships as a phylogenetic tree. The tree has 3 major clades from a central unrooted node, which can be partitioned by five major SNPs (rs2638245, rs10878199, rs2638271, rs2708438, and rs1388587) that span the 40.1 – 40.3 Mb interval. Nevertheless, no clade association with AOO was apparent (z(2)=0.40, p=0.69) (Supplementary Fig. 1).
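The conversion from generations to calendar years quoted above is simple arithmetic, and can be sketched as follows. This is only an illustration of the unit conversion, not of the mutation-age estimator itself; the function name and the default of 30 years per generation (the assumption stated in the text) are illustrative.

```python
def generations_to_years(point, ci_low, ci_high, years_per_generation=30):
    """Convert a mutation age in generations (with 95% CI) to years.

    years_per_generation=30 is the assumption stated in the text;
    other generation times simply rescale every value.
    """
    return (point * years_per_generation,
            ci_low * years_per_generation,
            ci_high * years_per_generation)


# Estimate reported in the text: 40 (95% CI 28-52) generations.
age, low, high = generations_to_years(40, 28, 52)
half_width = (high - low) / 2
print(f"{age} years (95% CI {low}-{high}), i.e. {age} +/- {half_width:.0f}")
```

With the reported interval of 28-52 generations, the bounds become 840 and 1,560 years, which is where the ± 360 years comes from.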
Lastly, we investigated whether the background frequency for the highly conserved rs34637584_A haplotype might be driven by recent positive selection. Integrated haplotype scores (iHS) summarize the evidence for the entirety of chromosome 12, as illustrated for affected heterozygotes and homozygotes (AG+AA), and wild type (GG) affected and unaffected individuals. A cluster of higher iHS values (>2.5) demarks an interval between 39.8 and 41.0 Mb (Fig. 1). The distribution of iHS values for the entirety of chromosome 12 (minus the LRRK2 locus) was bootstrapped, and suggests this cluster is highly significant compared to scores for the rest of the chromosome (p=4.50 E-18). Curiously, iHS values within the LRRK2 locus from 40.2 – 41.0 Mb were also significant for rs34637584_G wild type alleles (GGall=422, p=2.95 E-4 to 1.61 E-6; Table 2). As several inflammatory disorders are associated with the LRRK2 locus, we removed any individual with these disease-associated SNP alleles, namely rs11175593_T12, rs4768236_C16, and/or rs17466626_G13 (GGnim = 208), and the LRRK2 signal was ablated (Table 2). Despite the reduction in sample size, mean iHS values and their distributions were comparable in sub-groups with and without inflammatory markers (0.79 ± 0.59 SD vs 0.80 ± 0.59 SD) (Supplementary Fig. 2). Overall, these results are consistent with positive evolutionary selection for the LRRK2 region, not just for the rs34637584_A allele.
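The text does not specify how the chromosome-wide iHS distribution was bootstrapped, so the following is only a minimal sketch of one common approach: resample background scores with replacement, in sets the size of the locus, and ask how often the resampled mean reaches the observed locus mean. All data here are synthetic and the function name `bootstrap_p` is hypothetical. An empirical p-value of this kind cannot fall below 1/(n_boot + 1), so a value as small as the one reported would need a parametric or analytical approximation instead.

```python
import random
from statistics import mean


def bootstrap_p(locus_scores, background_scores, n_boot=2000, seed=0):
    """Empirical one-sided p-value for the mean score at a locus.

    Draws n_boot resamples (with replacement) from the background,
    each the same size as the locus, and counts how many reach the
    observed locus mean. The +1 terms keep the estimate above zero.
    """
    rng = random.Random(seed)
    observed = mean(locus_scores)
    size = len(locus_scores)
    hits = 0
    for _ in range(n_boot):
        resample = rng.choices(background_scores, k=size)
        if mean(resample) >= observed:
            hits += 1
    return (hits + 1) / (n_boot + 1)


# Synthetic stand-ins (NOT study data): absolute scores for the
# background, and a locus whose scores all exceed 2.5.
rng = random.Random(1)
background = [abs(rng.gauss(0, 1)) for _ in range(5000)]
locus = [2.5 + abs(rng.gauss(0, 0.3)) for _ in range(40)]

p = bootstrap_p(locus, background)
print(f"empirical p = {p:.5f}")
```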
In conclusion, this study supports and extends prior studies suggesting LRRK2 p.G2019S heterozygotes are descendants of a common ancestral founder who originated at least 40 (95% CI 28-52) generations ago (Supplementary Table 1). This result is within the confidence interval of prior estimates (Supplementary Tables 1 & 2). The LRRK2 locus includes SNPs that nominate genome-wide associations with several inflammatory disorders and traits (Crohn’s disease [rs11175593_T]12; inflammatory bowel disease [rs4768236_C]16; pediatric immune diseases [rs17466626_G]13; and platelet count [rs529898481_G]17). However, in our data, those alleles are not in linkage disequilibrium with rs34637584_A (the LRRK2 c.6055A haplotype). Rather, those alleles are captured on haplotypes in trans. Whether these variants confer a functional change in LRRK2 expression or activity has yet to be demonstrated.
Despite our limited sample size, the cluster of iHS values around the LRRK2 locus is indicative of positive selection for LRRK2 rs34637584_A. Although Tunisia has a high frequency of consanguineous marriages, neither isolation nor genetic drift is likely to produce the distribution of values observed. Overall, the burden of evidence from our data and others suggests rs34637584_A, and the constitutive LRRK2 kinase activity it confers, offers a survival advantage to reproductive age. To date, this has enabled a >19-fold increase in the background frequency of rs34637584_A in our Tunisian sample, as compared to the global mean (rs34637584_A MAFTunisia = 0.0094 (7/742)4; MAFgnomADr2.1-all = 0.0004884 (138/282542); MAFgnomADr2.1-African = 0.0001202 (3/24962); Fisher’s p=9.53 E-27).
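The allele-frequency comparison above can be checked from the quoted counts. The sketch below recomputes the fold increase and runs a two-sided Fisher exact test on a 2x2 table of Tunisian versus gnomAD r2.1 (all) allele counts. The table construction and the helper names are assumptions, as the text does not state which groups or test variant were used, so the p-value from this sketch need not match the reported figure.

```python
from math import exp, lgamma


def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def log_prob(k):
        # Hypergeometric log-probability of k column-1 counts in row 1.
        return (log_choose(row1, k) + log_choose(row2, col1 - k)
                - log_choose(total, col1))

    observed = log_prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    p = sum(exp(log_prob(k)) for k in range(k_min, k_max + 1)
            if log_prob(k) <= observed + 1e-9)
    return min(1.0, p)


# Allele counts quoted in the text (minor alleles / total alleles).
tunisia_a, tunisia_n = 7, 742
gnomad_a, gnomad_n = 138, 282542
african_a, african_n = 3, 24962

maf_tunisia = tunisia_a / tunisia_n
fold_all = maf_tunisia / (gnomad_a / gnomad_n)
fold_african = maf_tunisia / (african_a / african_n)

# Assumed table: rows = population, columns = A allele vs other alleles.
p_value = fisher_two_sided(tunisia_a, tunisia_n - tunisia_a,
                           gnomad_a, gnomad_n - gnomad_a)

print(f"MAF Tunisia = {maf_tunisia:.4f}")
print(f"fold vs gnomAD all = {fold_all:.1f}, vs African = {fold_african:.1f}")
print(f"Fisher two-sided p = {p_value:.3g}")
```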
The evolutionary forces driving positive selection are unknown, but epidemiologic and experimental research on pathogens restricted by LRRK2 kinase activity may be informative. Intriguingly, the retromer is also central to innate immune responses and often corrupted by intracellular pathogens18. Its core component, VPS35 p.D620N, is linked to PD and activates LRRK2 kinase19,20. RAB32 p.S71R, recently linked to and associated with PD, also causes LRRK2 kinase activation21. RAB32 is central to the biogenesis and transport of melanosomes in melanocytes, and similar components are deployed in catecholamine metabolism and pigment production22. RAB32 traffics mitochondrially derived itaconic acid to the pathogen-containing vacuole to inhibit bacterial growth23. It also interacts with PINK121, which instigates mitophagy and for which loss-of-function mutations are best described in Tunisian families with parkinsonism24. Hence, most Mendelian gene mutations that cause PD impinge on phagolysosome biology and intracellular innate immunity and may illustrate convergent evolution. Nevertheless, the ability to meaningfully investigate this for other linked loci that cause PD is limited by sample size. How peripheral and central immunity might influence the vulnerability of dopaminergic neurons in LRRK2 parkinsonism and idiopathic PD remains to be defined, but both cell autonomous25 and non-autonomous mechanisms evidently contribute11.