Over a two-year period (February 2017 to January 2019), all women who presented for antenatal care at Sonam Norboo Memorial Hospital in Leh were approached to take part in the study. In total, 316 families were recruited. Maternal and fetal baseline characteristics are summarized in Table 1. Mean birth weight was 3.18 kg (mean birth weight centile 44.5th) in term infants (>37 weeks’ gestation). According to Intergrowth charts (https://intergrowth21.tghn.org), 14% of infants were born below the 10th centile (which defines a small for gestational age [SGA] newborn) and only 7.8% of all infants were born with a low birth weight (LBW, defined as less than 2.5 kg). The mean age of the pregnant women was 28.9 years and 37.9% (120/316) were primigravida. The majority of subjects (306/316; 96.8%) reported >3 generations of Ladakhi ancestry. The ten women who reported Tibetan ancestry all gave birth at term, with a mean birth weight of 3.6 kg. Preterm birth was infrequent (15/316 [4.7%] infants were born before 37 weeks). Among these infants, birth weight was, as expected, lower (mean 2.4 kg) than in those born at term (mean 3.2 kg; P=0.0001), although birth weight centile was similar (52nd vs 45th; P=0.324), suggesting appropriate growth for gestational age. Because preterm birth would confound birth weight if used in isolation, further analyses instead used birth weight centile, which adjusts for gestational age and is therefore a more useful measure of growth potential.
Table 1
Maternal and fetal characteristics, summarized as mean [min/max] unless stated otherwise. Infant birth weight (kg) is shown separately for those born at term (>37 weeks' gestation) and those born preterm (<37 weeks). Birth weight centile (%) is derived from Intergrowth, and infants born at <10th centile are recorded (defined as small for gestational age). Ancestry (Ladakhi or Tibetan) is by history.
Characteristic | Term (>37 to <42+1 weeks), n=301 | Preterm (<37 weeks), n=15 | Overall, n=316 |
Maternal age (years) | 29.1 [18/40] | 26 [20/35] | 29.0 [18/40] |
Weight (kg) | 57.0 [37/92] | 56.9 [40/82] | 57.0 [37/92] |
Hb (g/dl) | 12.7 [10.4/14.8] | 12.3 [7.5/17] | 12.3 [7.5/17] |
Primiparous, n (%) | 111 (36.9) | 6 (40) | 117 (37) |
Birth weight (kg) | 3.18 [1.98/4.40] | 2.45 [1.5/3.2] | 3.15 [1.5/4.4] |
Birth weight centile (%) | 44.5 [0.13/98.7] | 49 [1.5/99] | 44.9 [0.13/99.9] |
<10th centile, n (%) | 42 (14) | 2 (13.3) | 44 (13.9) |
Tibetan by history, kg (centile) | | | 3.6 (70) [n=10] |
Ladakhi by history, kg (centile) | | | 3.17 (43.9) [n=306] |
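The three thresholds used above (preterm, LBW, SGA) are independent flags, which is why a preterm infant can be low birth weight without being small for gestational age. A minimal sketch of that logic follows; the `Newborn` class, the `classify` function and the example infant are hypothetical, and the centile is assumed to have been looked up from the Intergrowth charts beforehand.

```python
from dataclasses import dataclass

# Thresholds as stated in the text.
PRETERM_WEEKS = 37.0   # born before 37 weeks' gestation
LBW_KG = 2.5           # low birth weight: less than 2.5 kg
SGA_CENTILE = 10.0     # small for gestational age: below the 10th centile


@dataclass
class Newborn:
    """Hypothetical record; the centile is assumed to be precomputed."""
    gestation_weeks: float
    birth_weight_kg: float
    birth_weight_centile: float


def classify(nb: Newborn) -> dict:
    """Return the three independent flags for one infant."""
    return {
        "preterm": nb.gestation_weeks < PRETERM_WEEKS,
        "lbw": nb.birth_weight_kg < LBW_KG,
        "sga": nb.birth_weight_centile < SGA_CENTILE,
    }


# Invented example: a preterm infant whose weight is low in absolute terms
# but appropriate once gestational age is taken into account.
flags = classify(Newborn(gestation_weeks=35.5,
                         birth_weight_kg=2.4,
                         birth_weight_centile=52.0))
print(flags)
```

In this invented case the infant is flagged as preterm and LBW but not SGA, which mirrors the reasoning for preferring birth weight centile over raw birth weight.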
Reconstruction Of Ladakhi Population History
To optimize our GWAS, we first sought to understand the population history of the Ladakhi population. Using Principal Component Analysis (PCA), we compared Ladakhi with surrounding populations grouped by major language family: Tibeto-Burman (Tibetans, Sherpa, Ladakhi), Indo-European (Indo-Aryan, Hazara), Austro-Asiatic (Munda) and language isolates (Burushaski) (Fig. 1). To understand the nature and extent of Himalayan ancestry in the Ladakhi population, we compared them with high altitude populations (Sherpa, Tibetans) and other Tibeto-Burman populations adjacent to the foothills of the Himalayas in Nepal and Bhutan.
In the PCA, the Ladakhi individuals project together, forming a genetic cluster that is distinct from the neighboring Indo-Aryan Indian population. This Ladakhi cluster lies on a genetic cline between the two termini of principal component 1, which separates Indo-Aryan Indian individuals from individuals with Himalayan or East Asian ancestry. The second principal component appears to separate Han (East Asian) individuals from Sherpa (high altitude) individuals; on this axis the Ladakhi are placed towards the high-altitude-associated Sherpa population, alongside individuals from Himalayan regions such as Tibet, Nepal, and Bhutan. The intermediate position of Ladakhi individuals on principal component 1 is suggestive of admixture between Indo-Aryan and Himalayan populations. Their position on principal component 2 suggests that the Himalayan genetic group with which the Ladakhi share affinity is Tibetan or Sherpa, with no evidence in the PCA of admixture with Han-related East Asian ancestries.
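The reasoning that an intermediate position on principal component 1 suggests admixture can be illustrated on simulated data. The sketch below is not the pipeline used in the study: the function `genotype_pca`, the allele frequencies and the three simulated groups are all assumptions made for illustration, and the standardization by allele frequency is one common convention for genotype PCA.

```python
import numpy as np


def genotype_pca(genotypes: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project individuals onto the leading principal components.

    genotypes: (n_individuals, n_snps) allele counts coded 0/1/2,
    assumed to contain no missing values.
    """
    g = np.asarray(genotypes, dtype=float)
    freq = g.mean(axis=0) / 2.0
    keep = (freq > 0) & (freq < 1)            # drop monomorphic SNPs
    g, freq = g[:, keep], freq[keep]
    # Centre each SNP and scale by its expected binomial standard deviation.
    z = (g - 2 * freq) / np.sqrt(2 * freq * (1 - freq))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Simulated data: two source groups and one 50/50 admixed group.
rng = np.random.default_rng(0)
n_snps, n_per_group = 2000, 30
p_a = rng.uniform(0.05, 0.95, n_snps)
p_b = np.clip(p_a + rng.normal(0, 0.15, n_snps), 0.02, 0.98)
p_mix = 0.5 * p_a + 0.5 * p_b
geno = np.vstack([rng.binomial(2, p, size=(n_per_group, n_snps))
                  for p in (p_a, p_mix, p_b)])

pcs = genotype_pca(geno)
# Mean PC1 coordinate per group, in the order: source A, admixed, source B.
pc1_means = [pcs[i * n_per_group:(i + 1) * n_per_group, 0].mean()
             for i in range(3)]
print(pc1_means)
```

In this simulation the admixed group falls between the two source groups on PC1. The sign of a principal component is arbitrary, so only the relative ordering is meaningful.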
We further contextualized our population structure analysis with maximum-likelihood estimation of individual ancestries using ADMIXTURE (Fig. 2). At k=2, the two ancestral components are maximized in either the Himalayan or East Asian Han populations (red) or in Indo-Aryan Indians (green). Ladakhi individuals are modelled with a slightly higher proportion of the East Asian than of the South Asian ancestral component, in agreement with their average location on principal component 1. At k=3, one ancestry component (blue in Fig. 2) is maximized in the East Asian Han, the second (green) in Sherpa, and the third (red) in Indo-Aryan Indians. The largest average component in Ladakhi individuals is the Sherpa-maximized ancestry (52.4%), followed by the Indo-Aryan-maximized component (32.3%), suggesting that the majority of Ladakhi are closer to Tibeto-Burmans than to Indo-Aryans. At k=4, an additional component (purple) separates ancestry that is maximized in Tibetan-Bhutanese individuals from other East Asian sources. This component is also found in appreciable proportions in other high-altitude populations. Ladakhi individuals also show appreciable proportions of this component, which is common to both Nepalese Tibeto-Burmans and Tibetans. However, the blue ancestry component at k=4 in Ladakhi individuals continues to support admixture with other South Asian populations such as Hazara, Burusho or Indo-Aryan Indians. Thus, the Ladakhi appear genetically closer to Tibeto-Burman speakers of the Himalayan region than to Indo-Aryan populations of South Asia. Long-term isolation in remote highlands, cultural practices of endogamy and a founder effect might have contributed substantially to the formation of a distinct genetic cluster in the Ladakhi, in agreement with the PCA result (Fig. 1). In summary, the data support the Ladakhi as a genetically distinct Himalayan population, closer to Tibeto-Burman and Tibetan populations than to Indo-Aryan populations.
We next performed two additional analyses to further confirm that the Ladakhi are admixed between lowland Indo-Aryan and highland Tibeto-Burman sources. First, we leveraged f-statistics [20] in the form f3(X; A, B), where X is tested for evidence of admixture between sources A and B and a negative f3-statistic is indicative of admixture. We placed Ladakhi as X and tested combinations of lowland/Indo-Aryan populations (Burusho, Indo_Aryan, Hazara, Munda) as A and highland/East or North Asian populations (Han, Japanese, Sherpa, Tibetan, TB_Bhutanese, TB_Nepalese, Yakut_Siberian) as B, reporting those f3 results with an absolute Z score >3 (Fig. 3). The strongest evidence of admixture (most negative f3-statistic) is between Tibetan or Sherpa sources and Indo-Aryan or Burusho sources, which supports the PCA results. Second, we performed Runs of Homozygosity (ROH) detection using PLINK, comparing Ladakhi with a subset of neighbouring lowland and highland references. We detect elevated ROH (see Supplementary Fig. S1) in the high-altitude Tibeto-Burman and Sherpa populations, in agreement with previous estimates [21], but only modest levels of ROH in the Ladakhi, more comparable to those of the general Tibetan- or Indo-Aryan-labelled individuals. These modest levels of ROH are consistent with admixture between different ancestries.
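To make the sign convention of the f3-statistic concrete, the sketch below computes a naive f3(X; A, B) as the average over SNPs of (x − a)(x − b), where x, a and b are allele frequencies, together with a block-jackknife Z score. This is an illustration only and not the software used in the study: the function name `f3_with_z`, the block count and the simulated frequencies are assumptions, and the naive estimator omits the finite-sample correction that published estimators apply to the target population.

```python
import numpy as np


def f3_with_z(x, a, b, n_blocks=20):
    """Naive f3(X; A, B) and a delete-one-block jackknife Z score.

    x, a, b: per-SNP allele frequencies in the target (X) and the two
    candidate sources (A, B). SNPs are assumed to be in genomic order so
    that contiguous blocks are a reasonable jackknife unit.
    """
    x, a, b = (np.asarray(v, dtype=float) for v in (x, a, b))
    terms = (x - a) * (x - b)
    estimate = terms.mean()
    blocks = np.array_split(terms, n_blocks)
    # Re-estimate f3 with each block removed in turn.
    leave_one_out = np.array([(terms.sum() - blk.sum()) / (terms.size - blk.size)
                              for blk in blocks])
    se = np.sqrt((n_blocks - 1) / n_blocks
                 * np.sum((leave_one_out - leave_one_out.mean()) ** 2))
    return float(estimate), float(estimate / se)


# Simulated frequencies (hypothetical, not study data).
rng = np.random.default_rng(42)
n_snps = 5000
lowland = rng.uniform(0.1, 0.9, n_snps)
highland = np.clip(lowland + rng.normal(0, 0.1, n_snps), 0.01, 0.99)
admixed = 0.5 * lowland + 0.5 * highland                 # 50/50 mixture
drifted = np.clip(lowland + rng.normal(0, 0.05, n_snps), 0.01, 0.99)

f3_admixed, z_admixed = f3_with_z(admixed, lowland, highland)
f3_drifted, z_drifted = f3_with_z(drifted, lowland, highland)
print(f3_admixed, z_admixed)
print(f3_drifted, z_drifted)
```

For the simulated mixture, each per-SNP term equals −0.25(a − b)², so the statistic is negative by construction. The group that merely drifted from one source gives a positive value, which is why only a negative f3 is read as evidence of admixture.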
Birth Weight Analysis On Ladakhi Subjects
To determine whether there were any genome-wide significant predictors of birth weight in the Ladakhi population, we performed a GWAS of birth weight in the 176 individuals who were genetically identified as Ladakhi from the population study (see Supplementary Fig. S2). We did not find any signals that were genome-wide significant after correction for multiple testing (i.e. p < 5 x 10−8). However, in the tail of the association statistics from our birth weight GWAS (i.e. uncorrected p between 1 x 10−4 and 1 x 10−7), we noted multiple variants previously associated with traits relevant to high altitude adaptation, including body mass index (rs10968576 in LINGO2) and blood-related traits (rs16893892 in RP5-874C20.3, rs2298839 in AFP, rs9261425 in TRIM31 and rs362043 in TUBA8) [23].
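The per-variant test behind a quantitative-trait GWAS can be sketched as a simple linear regression of the phenotype on allele dosage, compared against the genome-wide threshold of 5 x 10−8 (conventionally 0.05 divided by roughly one million independent tests). The sketch below is an assumption-laden illustration, not the model fitted in the study: it includes no covariates, uses a large-sample normal approximation for the p-value, and the function `snp_association` and all simulated values are hypothetical.

```python
import math

import numpy as np

GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold


def snp_association(dosage, phenotype):
    """Regress phenotype on allele dosage (0/1/2) for a single SNP.

    Returns (beta, p) using a two-sided Wald test with a normal
    approximation. Assumes the SNP is polymorphic in the sample.
    """
    g = np.asarray(dosage, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    gc, yc = g - g.mean(), y - y.mean()
    sxx = float(gc @ gc)
    beta = float(gc @ yc) / sxx
    resid = yc - beta * gc
    se = math.sqrt(float(resid @ resid) / (len(y) - 2) / sxx)
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, p


# Simulated cohort of 176 individuals (values invented for illustration).
rng = np.random.default_rng(1)
n = 176
g_causal = rng.binomial(2, 0.3, n)   # SNP with a strong simulated effect
g_null = rng.binomial(2, 0.3, n)     # SNP with no effect
trait = 3.2 - 0.5 * g_causal + rng.normal(0, 0.4, n)

beta_causal, p_causal = snp_association(g_causal, trait)
beta_null, p_null = snp_association(g_null, trait)
print(beta_causal, p_causal, p_causal < GENOME_WIDE_P)
print(beta_null, p_null, p_null < GENOME_WIDE_P)
```

The simulated effect is deliberately far larger than anything expected for a real birth weight variant. With realistic effect sizes, a sample of 176 has little power to reach the genome-wide threshold, which is consistent with examining the tail of the distribution instead.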
Next, to determine whether the genetic architecture of birth weight in the Ladakhi population overlaps with that observed in lowland populations, we tested the 70 genetic signals previously associated with birth weight in lowland populations [8] for association with birth weight in the Ladakhi. Overall, 32 of these 70 signals were either directly genotyped or captured through linkage disequilibrium (r2>0.8) in our Ladakhi dataset. Seven of these 32 signals were nominally associated (P < 0.05, uncorrected) with birth weight in the Ladakhi GWAS (see Table 2). These variants mapped to five different birth weight-associated genes (ZBTB38, ZFP36L2, HMGA2, CDKAL1 and PLCG1), and the direction of effect was consistent with the original discovery report [8] in all seven cases.
Table 2
Single nucleotide polymorphisms (SNPs), mapped genes and annotations, with P-values and beta values for birth weight in 176 babies with >3 generations of ancestry in Leh, Ladakh, compared with UK Biobank and the GWAS Catalog.
SNP | Mapped gene | SNP annotation | P-value (Ladakhi) | P-value (UK Biobank) | P-value (GWAS Catalog) | Beta (GWAS Catalog) | Beta (Ladakhi) |
rs6440006 | ZBTB38 | Upstream | 0.03515 | 1.099e−5 | 4 x 10−12 | -0.021224 | -0.0818 |
rs8756 | HMGA2 | 3_prime_UTR | 0.000078 | 1.97e−138 | 1 x 10−19 | -0.027582 | -0.176 |
rs1351394 | HMGA2 | intronic | 0.000078 | 3.28e−137 | 2 x 10−33 | -0.043 | -0.176 |
rs7968682 | HMGA2 | intergenic | 0.000078 | 4.97e−130 | 4 x 10−60 | -0.041831 | -0.176 |
rs4952673 | ZFP36L2 | intergenic | 0.04 | NA (not available) | 2 x 10−11 | -0.020476 | -0.09315 |
rs35261542 | CDKAL1 | intronic | 0.01864 | NA | 3 x 10−45 | -0.040621 | -0.08755 |
rs753381 | PLCG1 | missense variant | 0.02988 | NA | 3 x 10−9 | -0.01512 | -0.0792 |
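The direction-of-effect comparison reported for these seven signals amounts to checking that the two beta values for each SNP share a sign. A minimal sketch using the values in Table 2 follows. It assumes that both betas are reported for the same effect allele, which the table does not state, and the variable names are hypothetical.

```python
# (SNP, mapped gene, beta in GWAS Catalog, beta in Ladakhi), copied from Table 2.
# Assumption: both betas refer to the same effect allele.
TABLE2 = [
    ("rs6440006", "ZBTB38", -0.021224, -0.0818),
    ("rs8756", "HMGA2", -0.027582, -0.176),
    ("rs1351394", "HMGA2", -0.043, -0.176),
    ("rs7968682", "HMGA2", -0.041831, -0.176),
    ("rs4952673", "ZFP36L2", -0.020476, -0.09315),
    ("rs35261542", "CDKAL1", -0.040621, -0.08755),
    ("rs753381", "PLCG1", -0.01512, -0.0792),
]

# A positive product means the two betas have the same sign.
concordant = [snp for snp, _, beta_ref, beta_ladakhi in TABLE2
              if beta_ref * beta_ladakhi > 0]
genes = sorted({gene for _, gene, _, _ in TABLE2})

print(f"{len(concordant)}/{len(TABLE2)} signals share direction of effect")
print(f"{len(genes)} distinct genes: {', '.join(genes)}")
```

The three HMGA2 variants carry identical P-values and betas in the Ladakhi column, so they may well represent a single underlying signal rather than three independent replications.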