Whole-genome sequencing and variant calling
After removing duplicate reads, whole genome sequencing of the 12 individuals from Iowa produced an average of 217,503,897 (± 131,486,905) reads per individual, out of which 96.27% were considered of high quality (quality score >20) and retained for further analyses. Here, an average of 86% (± 0.57) of the reads mapped to the reference Northern Pike genome. This produced an average depth of 26.26x that ranged from 8x to 49x. In the case of the samples from the Canadian population, 86% of the reads were considered high quality, 63% of the high-quality reads were successfully aligned to the Northern Pike genome and a depth of 11.44x was obtained for the sequenced sections of the genome. Details on the sequence data of each individual from Iowa and the averages for the Iowa and Canadian populations are shown in Table 1.
Table 1. Depth of coverage for raw whole-genome sequence data for Iowa samples
|
Sample #
|
Sex
|
Lake
|
Sequenced reads
(#)1
|
Aligned reads
(%)
|
High-quality reads
(%) 2
|
Depth
(x) 3
|
|
1
|
Female
|
Okoboji
|
203,824,613
|
86.34
|
96.85
|
16.23
|
|
2
|
Female
|
Okoboji
|
315,700,163
|
86.16
|
96.78
|
24.93
|
|
3
|
Female
|
Okoboji
|
306,850,333
|
86.04
|
96.99
|
24.34
|
|
4
|
Male
|
Okoboji
|
99,981,043
|
87.30
|
96.22
|
8.26
|
|
5
|
Male
|
Okoboji
|
150,029,226
|
86.38
|
96.75
|
12.12
|
|
6
|
Male
|
Okoboji
|
538,733,400
|
87.32
|
96.44
|
41.53
|
|
7
|
Female
|
Big Spirit
|
438,773,328
|
86.29
|
95.68
|
49.98
|
|
8
|
Female
|
Big Spirit
|
538,733,400
|
85.62
|
95.29
|
47.41
|
|
9
|
Female
|
Big Spirit
|
510,088,829
|
85.62
|
94.77
|
47.41
|
|
10
|
Male
|
Big Spirit
|
157,555,467
|
86.76
|
96.56
|
14.34
|
|
11
|
Male
|
Big Spirit
|
140,430,988
|
86.73
|
96.41
|
12.81
|
|
12
|
Male
|
Big Spirit
|
177,235,006
|
85.92
|
96.56
|
15.8
|
|
Iowa Average
|
---
|
---
|
217,503,897
|
86.37
|
96.27
|
26.26
|
|
Canada Average
|
---
|
---
|
1,022,373
|
63.11
|
86.15
|
11.44 4
|
|
1. Reads aligned to Northern Pike reference genome.
|
|
2. Quality score > 20.
|
|
3. Depth shown is prior to mapping quality filtering.
|
|
4. Depth was calculated for sequenced sections.
|
|
Breadth of coverage at different depth thresholds are shown in Table 2. After aligning the reads of Muskellunge from Iowa to the Northern Pike genome, 80% of the bases were covered with a depth larger than 0x. Overall, 66% of the bases were covered with a depth of 10x, which was used as a threshold to be included in all downstream analyses. Also, 0.03 percentage of the genome was covered at a depth higher than 1000x; potentially pointing at highly repetitive segments that showed issues with proper alignment and therefore were counted as the same segment [16].
Table 2. Average breadth of coverage across Iowa samples
|
Depth threshold
|
Bases above threshold (#)
|
Percentage (%)
|
>0x
|
735,465,012
|
80.05
|
10x
|
607,490,936
|
66.12
|
20x
|
539,961,789
|
58.77
|
50x
|
21,395,310
|
2.33
|
100x
|
7,981,231
|
0.87
|
1000x
|
260,029
|
0.03
|
Figure 2 shows the average depth coverage per mega base across the twelve individuals from Iowa after the sequencing was aligned to the Northern Pike genome. The overall average depth across all 1 mega base windows was of 21.16x with a standard deviation of 4.44x. Depth of coverage was highest at chromosome 9, mega base 18.5, reaching a depth of 40.4x. Additional peaks were observed in chromosome 11 mega base 22 and chromosome 23 mega base 10 with depths of 37.1 and 38.6, respectively. Additionally, chromosomes 2, 3, 4, 6, 8, 9, 12, 14, 15, 16, 18, 20, 22 and 25 produced a depth of 0x on the last mega base. Finally, all chromosomes showed higher depth towards the centromere compared to the telomeric regions.
The variant calling pipeline produced three large sets of Single Nucleotide Polymorphisms (SNPs) that were used in further analyses. When variants were called for the twelve whole genome sequenced samples from Iowa, a total of 36,627,942 biallelic SNPs were called out of which 8,218,039 were not monomorphic. Given the large number of SNPs, all SNPs that did not have a call rate of 100% were dropped, resulting in a final set of 7,886,471 SNPs that was used for all analyses involving only individuals from Iowa. The variant calling of Canadian samples produced 128,213 biallelic SNPs, out of which 16,059 were not biallelic.
The transitions/transversions ratio (Ti/Tv) was calculated for the two set of samples, and the results were of 1.09 and 1.29 for samples from Iowa and Canada, respectively.
When combining the two Canadian and Iowan dataset, a total of 108,132 biallelic SNPs were called with 22,705 SNPs not being monomorphic. After retaining only SNPs with a call rate larger or equal to 90%, a total of 16,867 SNPs were kept for further analyses. The number of Iowa-specific biallelic SNPs and biallelic SNPs called for Iowa and Canada along with the density of SNPs per mega base are shown in Figure 3 panels A and B, respectively. As shown in Figure 3A, chromosome 19 showed the highest number of biallelic SNPs with 5,675 SNPs, while chromosome 11 showed 1,125 SNPs that were common between Canada an Iowa samples, being the chromosome with the most SNPs. On the other hand, chromosome 25 was the chromosome with the least number of SNPs in both cases with 2,525 and 331 respectively. On average, 4,162 and 644 SNPs per chromosome were called for the samples from Iowa and the combined samples with a standard deviation of 872.84 and 216.78, respectively. When looking at the density of SNPs per Mb as shown in Figure 3B, the chromosome with the highest density of SNPs was chromosome 23 with 136.54 SNPs per mega base when SNPs were called for the Iowan population only and chromosome 20 with 23.41 SNPs per mega base when SNPs were called for the Canadian and Iowan populations simultaneously. On average 120 SNPs per mega base were called for the Iowa population and 18 SNPs per mega base were called for the Canadian and Iowan populations combined with a standard deviation of 10.44 and 2.62 SNPs per mega base, respectively.
Principal Component Analysis (PCA) results for the Iowa and Canadian populations are shown in Figure 4. No clustering was observed when comparing Iowan samples (Figure 4A). Given that individuals were sampled at two different lakes that are known to be connected, these results were not surprising although it is important to note that individuals 4 and 5 did not cluster with the rest of samples. However, in Figure 4B a clustering by sex can be observed with the exception of individuals 4 and 5 that did not cluster with the other males. This clustering may indicate some level of genetic differences between individuals of the different sexes and not sex differences themselves. Figure 4C shows several clusters that were identified in the Canadian population which are likely related to what part of the water system they were sampled from. When PCA was performed on Iowa and Canada samples combined (Figure 4D), populations form Iowa and Canada clustered separately, indicating genetic differences between the two populations.
Admixture analyses confirmed the findings from PCA. Results from admixture analyses are shown in Figure 5. When both the Iowa and Canada populations were analyzed, the likely number of subpopulations found was between 12 and 19 as these numbers produced the lowest values for the cross-validation error (Figure 5A). Here, admixture detected clear differences between populations from Iowa and Canada. The differences between populations were so large that when the results of Admixture with k =2 were plotted (Figure 5B), all 12 individuals from Iowa showed a composition >0.9 for the same subpopulation while all samples from Canada showed a composition of 1.0 for the second population. In a similar matter when ancestry estimations were plotted for k values of 12 and 19 (shown in Supplementary Figure panels A and B, respectively) all 12 individuals from Iowa grouped in the same subpopulation with a composition > 0.9 in both cases, illustrating the high degree of differentiation that exists between samples from Iowa and Canada.
Pooled heterozygosity and genome wide Fst.
Regardless the subgroup of the Iowa population that was considered for pooled heterozygosity (Hp) analyses (all individuals, females only and males only), the same windows were seen to be highly heterozygous in all cases, as shown in Figure 6. All these six windows showed normalized pooled heterozygosity scores that were more than three standard deviations from the mean and were therefore identified as outliers. Table 3 shows the six windows that had high heterozygosity. In total, 244 genes were found in these windows although only 53 have been previously annotated (Genes and coordinates shown in Supplemental Table 1). Given the small number of annotated genes found in these six windows, gene ontology analyses only showed one significantly enriched term, this being sensory perception of pain. Other enrichment terms found included sensory perception, positive regulation or synaptic transmission and regulation of glial cell proliferation (Complete list of enriched GO terms related to Hp analyses shown in Supplementary Table 2).
Table 3. Pooled heterozygosity values for individuals from Iowa.
|
|
Chromosome
|
Mega base
|
Minor allele counts
|
Major allele counts
|
Hp 1
|
zHp 2
|
17
|
48.5
|
746
|
1,366
|
0.0009
|
32.71
|
|
2
|
40.0
|
826
|
2,702
|
0.0006
|
19.24
|
|
14
|
37.5
|
1,622
|
3,994
|
0.0004
|
11.77
|
|
3
|
36.5
|
2,954
|
6,190
|
0.0002
|
6.90
|
|
4
|
35.0
|
2,605
|
9,131
|
0.0002
|
5.19
|
|
19
|
47.5
|
4,486
|
8,906
|
0.0001
|
4.44
|
|
1. Pooled heterozygosity
|
|
2. Normalized pooled heterozygosity values.
|
|
Although PCA showed a clear clustering according to sex in the Iowan population, Fst analyses did not provide any insight on the differences. As shown in Figure 7, there were no windows with mFst values above 0.9 and the overall Fst value between sexes was of 0.05.
As shown population stratification analyses, Iowa and Canada have markedly different genetic backgrounds, and this is reinforced by the mFst values obtained for the comparison between both populations, where the overall Fst value was 0.24. Window based Fst results are shown in Figure 8. In total, 14 windows produced a mFst value larger than 0.9 and 8 of these windows had an mFst value of 1, indicating that the majority of the SNPs in the window are fixed or almost fixed for opposite alleles. All windows that were deemed of interest after performing analyses of signatures of selection, had been sequenced at a depth that ranged from 17x to 32x and thus are considered as accurate results. This warrants a more in-depth analysis that might shed light on regions of the genome that are responsible for adaptation to the different specific environments. A total of 641 genes were identified in the 14 windows with mFst scores higher than 0.9. However, only 331 of these genes have been annotated and as in the case of Hp analyses, no statistically significant enriched terms were found (List of annotated genes found in mFst windows with score higher than 0.9 shown in Supplemental Table 3). Although not significant, several GO-terms associated with development and growth were enriched, these included negative regulation of developmental process, positive regulation of chondrocyte differentiation and positive regulation of cartilage development, among others (Complete list of enriched GO terms related to Fst shown in Supplemental Table 4).
Inbreeding and runs of homozygosity (ROH)
Inbreeding coefficients ranged from 0.00 to 0.44 depending on the level of stringency considered to call a segment as a run of homozygosity. Nevertheless, out of the six different stringency levels at which runs of homozygosity were analyzed, the level that was considered to produce the most realistic levels of inbreeding was with windows that included at least 20 SNPs while allowing a maximum of three heterozygotes. Individual details for the Iowa population and the average for the Canadian population are found in Table 4. On average, individuals from Iowa showed 3.5 ROH segments with a length of 36,699.20 Kb. The individual with the highest number of ROH segments was sample 9, a female from Big Spirit Lake with 7 segments of 50,042.8 Kb of length in average. In contrast, sample 1, a female from Okoboji did not show any ROH segments. On average, females showed slightly higher number of ROH segments than males; however these segments were approximately 2,000 Kb shorter than in males. As shown on Figure 9, individuals from Canada showed a markedly higher level of inbreeding than samples from Iowa, being on average 0.32. Additionally, the Canadian population showed a slightly wider distribution of estimated inbreeding coefficients, ranging from 0.25 to 0.38 while the Iowa population shows a very short range from 0.00 to 0.05. The length of the segments in both populations is very similar, spanning about 6,500 Kb.
Table 4. Individual and average Froh scores.
|
ID
|
Sex
|
Lake
|
# ROH 1
|
Total Kb
|
Av. length (Kb) 2
|
Froh
|
S1
|
Female
|
Okoboji
|
0
|
0.00
|
0.00
|
0.00
|
S2
|
Female
|
Okoboji
|
2
|
12,622.20
|
6,311.08
|
0.01
|
S3
|
Female
|
Okoboji
|
3
|
23,042.30
|
7,680.75
|
0.03
|
S4
|
Male
|
Okoboji
|
2
|
12,730.70
|
6,365.36
|
0.01
|
S5
|
Male
|
Okoboji
|
2
|
15,358.40
|
7,679.21
|
0.02
|
S6
|
Male
|
Okoboji
|
4
|
24,561.20
|
6,140.30
|
0.03
|
S7
|
Female
|
Big Spirit
|
5
|
36,351.40
|
7,270.28
|
0.04
|
S8
|
Female
|
Big Spirit
|
5
|
36,499.60
|
7,299.91
|
0.04
|
S9
|
Female
|
Big Spirit
|
7
|
50,042.80
|
7,148.97
|
0.05
|
S10
|
Male
|
Big Spirit
|
4
|
31,859.10
|
7,964.77
|
0.03
|
S11
|
Male
|
Big Spirit
|
3
|
21,129.30
|
7,043.11
|
0.02
|
S12
|
Male
|
Big Spirit
|
5
|
36,699.20
|
7,339.84
|
0.04
|
Canada average 3
|
--
|
--
|
46
|
294,087.37
|
6,446.51
|
0.32
|
1 Number of segments considered runs of homozygosity.
|
2 Average length of ROH segments in kilobases.
|
3. Average for all Canadian samples.
|