Genome re-sequencing and read-mapping
We conducted whole-genome re-sequencing of two closely related mosquito species, Ae. aegypti and Ae. albopictus, generating a total of 94.06 Gb and 137.22 Gb data, out of which 88482206354 and 833695702 base pair high quality reads were obtained, respectively. Around 98.5% of Ae. aegypti and 97.53% of Ae. albopictus reads were successfully mapped with their respective reference genomes [42, 43]. The GC percentage of Ae. aegypti and Ae. albopictus were 37.5% and 40.7% respectively, very similar to the GC content of their respective reference genomes. The genome coverage of 65.4% for Ae. aegypti and 41.4% for Ae. albopictus yielded high-quality bases for further analysis. A summary of the sequence reads and mapping data for both the mosquito genomes is presented in Table 1.
Table 1
Summary of mapping statistics of Ae. aegypti and Ae. albopictus in this study.
| Ae. aegypti | Ae. albopictus |
Mean read length (bp) | 147 | 145 |
Total number of bases | 94,06,59,44,374 | 1,37,22,86,64,100 |
Total number of mapped reads | 58,91,71,540 | 81,30,65,449 |
Total number of unmapped reads | 89,61,216 | 2,06,30,253 |
Mean mapping quality | 35 | 19 |
Detection of DNA polymorphism (SNPs and InDels)
Two different variant callers, Genome Analysis Toolkit (GATK) [45] and Samtools-mpileup [46] were employed to detect high-quality genome variants, including SNPs and InDels. Both methods identified a total 14,416,484 SNPs for Ae. aegypti and 28,940,433 SNPs for Ae. albopictus after applying a 10x filtration threshold (Figure S1). Additionally, total number of InDels for Ae. aegypti and Ae. albopictus were 156,487 and 188,987 respectively (Figure S1). Aedes albopictus exhibited about 1.9 times more SNPs than Ae. aegypti, while less InDels were seen in Ae. albopictus when compared to Ae. aegypti. A summary of SNPs and InDels frequency in Ae. aegypti and Ae. albopictus relative to their reference genomes is presented in Table 2.
Table 2
Frequency of SNPs and InDels in Ae. aegypti and Ae. albopictus genomes.
| SNPs | Insertion | Deletion |
| Total | Per 100 kb | Total | Per 100 kb | Total | Per 100 kb |
| | Max. | Min. | | Max. | Min. | | Max. | Min. |
Ae. aegypti | 1,44,16,484 | 33.41 | 0.01 | 74,427 | 27 | 0.01 | 82, 060 | 27 | 0.01 |
Ae. albopictus | 2,89,40,433 | 51.46 | 0.01 | 88,943 | 0.22 | 0.01 | 1,00,044 | 0.26 | 0.01 |
Characterisation of predicted DNA polymorphism
The DNA polymorphisms in Ae. aegypti and Ae. albopictus genomes were analysed based on their genomic location and their impact on protein function. The number of detected SNPs and InDels showed considerable variations across chromosomes and scaffolds in Ae. aegypti and Ae. albopictus genomes respectively, using SnpEff (Supplementary Table S1, S2, S3 and S4). We compared the number of transitions (Ts) and transversions (Tv) between both the mosquito species relative to their reference genomes. In Ae. aegypti, we identified 9,496,179 SNPs being transitions and 7,532,184 being transversions resulting in a Ts/Tv ratio of 1.26. Similarly, in the Ae. albopictus genome, we found 19,861,171 SNPs being transition and 14,032,475 being transversions, yielding a Ts/Tv ratio of 1.415 suggesting a very small influence of sequencing error on our variant calling [47]. Notably, the frequency of transitions (A/G and C/T; Ts) outnumbered transversions (A/T, G/C, A/C, and G/T; Tv), possibly due to frequent occurrence of tautomeric shifts and deamination [48, 49]. In the case of transition, frequencies of A/G and C/T were 1,934,862 and 2,085,766, respectively. However, frequency of A/T transversion (1,359,849) and G/C transversion (37,625) was much less than the frequency of transitions (Fig. 1A). This pattern has been observed previously in organisms such as An. funestus, Drosophila [50–52], where the higher prevalence of transition (C↔T and A↔G) compared to the transversion (A↔T, G↔C, A↔C and G↔T), is partially attributed to the frequent occurrence of 5-methylcytosine deamination reactions, particularly at CpG dinucleotides [50]. This preference for specific transition types is intricately linked to factors influencing codon degeneracy and selective pressures that govern gene preservation [52]. The length distribution for deletion ranged from 1bp-64bp in both the species, while insertion length varied in Ae. aegypti (2bp-46bp) and Ae. albopictus (2bp-43bp) (Fig. 1B and 1C). Dinucleotide insertions were prominent, accounting for 45.6% in Ae. aegypti and 43.8% in Ae. albopictus, while single-nucleotide deletions made up 42.3% and 41.6% in the respective species. Deletions slightly outnumbered insertions, with 82,060 in Ae. aegypti and 100,044 in Ae. albopictus, compared to 74,427 and 88,943 insertions, respectively.
Genomic distribution and annotation of DNA polymorphism
We conducted a comprehensive genome-wide annotation of predicted DNA polymorphisms in Ae. aegypti and Ae. albopictus, focusing on their genomic localization and effects on protein function. We found DNA polymorphisms occurred more frequently in noncoding region including intron, intergenic, 5’UTR and 3’ UTR region where as the coding region such as exonic regions had the lowest percentages, with 1% and 1.8% of SNPs, 0.11% and 0.45% of InDels in Ae. aegypti and Ae. albopictus, respectively (Fig. 2A, B). The significant genomic variation found in introns and intergenic regions could result from reduced natural selection pressure and/or domestication effects in these areas [53, 54]. We have selected DNA polymorphism from exonic region for further analysis. The functional impacts of exonic polymorphisms were classified into four categories: high, moderate, low, and modifier (Supplementary Table S5). Among all variants, modifier variants occupied nearly 99%, followed by low-impact variants at 0.9% and 1.5%, moderate impact at 0.17% and 0.27%, and high-impact variants being the least common around 0.002% and 0.003% in Ae. aegypti and Ae. albopictus genomes, respectively. High-impact variants were associated with disruptions in functional proteins, including splice sites and start/stop codons, Moderate impacts included non-disruptive changes in protein function (missense mutation), while low impacts were mostly harmless (e.g., synonymous coding), modifier impacts were observed in non-coding regions [53, 55, 56]. Frame-shift mutations were prevalent in the high-impact group, while missense mutations dominated the moderate-impact category. Synonymous mutations were most common in the low-impact group, while variation in intron regions were most common in the modifier group (Supplementary Table S6). Since high impact variants cause non-functional proteins leading to various phenotypic changes, we were motivated to investigate these variants among Ae aegypti and Ae. albopictus. The high impact variants include disruption of splicing sites, loss of translation start codon, and introduction of premature stop codon, and frame shift mutation [57].
Genes carrying high-impact polymorphism and their functional relevance
Comparative analysis revealed a slightly higher number of genes carrying high-impact SNPs/InDels in Ae. albopictus compared to Ae. aegypti. We separated mixed genetic polymorphism (SNPs and InDels) into distinct categories, identifying 1742 SNPs and 168 InDels affecting protein function in Ae. aegypti. Similarly, in Ae. albopictus, we found 3175 SNPs and 933 InDels with significant impacts on protein function. These SNPs included disruptions in splice sites, loss of translation initiation codons, introduction of premature stop codons, and loss of stop codons (Fig. 3). Notably, both the genomes exhibited a higher occurrence of premature stop codons (stop-gained), followed by splice site donor and acceptor sub-type of variants. The functional classification of these variant containing genes using cluster of orthologous gene (COG) with eggnog-mapper [58] revealed that a large group of variants affect the genes involved in signal transduction, post-translational modification, protein turnover, chaperones, transcription, cytoskeleton, intracellular trafficking, secretion, vesicular transport, lipid transport, and metabolism (Fig. 4A, B, C).
To assess the impact of these variants on protein function, we conducted an analysis where we filtered and determined the number of SNPs detected in each protein family (Pfam) groups of selected proteins utilizing the results from the COG analysis. We selected the Pfam group harbouring more than 10 SNPs (Fig. 5A, B) and we observed a significant presence of high-impact SNPs in various Pfam domains. Specifically, we identified 154 and 68 SNPs in genes containing zinc-finger domains, 143 and 43 peptidase family includes 74 and 28 SNPs belong to trypsin (pfam) like serine proteases or proteases belonging to S1 family, respectively. Additionally, 40 and 34 SNPs were in genes with protein kinase domains, 37 and 33 SNPs were identified in genes encompassing leucine-rich repeat domains and very small number of variants in protein coding for serine protease inhibitor (serpin) family in Ae. albopictus and Ae. aegypti, respectively. We majorly focused on variants (stop gained and frame-shift variant) containing genes with serine protease domains for further analysis as these proteins can have significant implication on the mosquito biology. For instance, stop-gain variants lead to pre-mature termination of protein translation resulting in protein truncation and frame-shift variants can disrupt the reading frame of the genetic code, leading to altered amino acid sequences and degradation of the transcript as studied earlier in human [57, 59]. Hence, investigating these variants can be helpful in gaining insights into the genetic basis of certain traits such as blood-digestion and immunity, enabling us to understand the consequence of these genetic alterations on protein functionality.
Sequence variation in genes encoding serine proteases and their inhibitors
Our study aimed to examine genetic variants of serine protease genes and their inhibitors, known as serpins. We focused on these genes due to their diverse functions, limited existing studies, and potential applications in mosquito population control. We identified 13 stop-gain and two frame-shift variants affecting 28 serine protease genes in Ae. aegypti, and 58 stop-gain and seven frame-shift variant sub-types impacting 74 serine protease genes in Ae. albopictus (Supplementary Table S7, S8). These stop-gain variant sub types within serine proteases underscore their evolutionary significance and adaptations to environmental pressures [60, 61]. According to SnpEff and COG classification [58, 62], these genes predominantly belong to the peptidase S1 protein family under the class “O” and “E”, associated with amino acid transport, metabolism, cellular processes, signalling, post-translational modification, and protein turnover (Fig. 4). Cross-referencing with databases like NCBI and Vector Base validated our findings, enabling characterization of sequence variations, domain regions, active sites, and mutation positions. Based on the presence or absence of the catalytic triad essential for the catalytic activity serine protease like genes classified into serine proteases and serine protease homologues, respectively [63, 64]. These proteases feature either or a combination of these specific modules, such as clip domain, complement control protein (CCP) domain, low-density lipoprotein receptor class A (LDLA) domain, myb/SANT-like (MADF) domain, complement CUB domain, frizzled (FRI) domain and scavenger receptor Cys-rich (SR) domain [63–65].
Among the SNP-bearing serine proteases, a shared lineage emerges in both genomes, including trypsin-3-like, trypsin-4-like, chymotrypsin-like, snake-like serine protease, and transmembrane serine protease-9. Furthermore, Ae. aegypti features additional proteases such as stubble-like and trypsin-5G1-like, while Ae. albopictus showcases melanisation protease, polyserase type, easter-like, grass-like, prostatin-like, masquerade-like, phenoloxidase activator-like, persephone-like, and acrosin-like and other clip-domain serine proteases. These proteins play various roles such as immune Toll pathway modulation [66–68], involvement in the melanisation pathway [65, 66, 69, and 70], dorsal-ventral patterning [63, 66, and 71], wing development [72], somatic muscle attachment [66, 73] and mosquito blood-feeding behaviour [66, 74]. Our findings call for more biochemical and expression studies on trypsin-like serine proteases due to their crucial role in mosquito blood-feeding behavior [74, 75].
Furthermore, our data uncovers polymorphisms in genes of the CLIP B and D like serine protease sub- families, corroborating earlier RNAseq reports [61]. Notably, CLIP-B is the major sub-family, members like CLIPB15, CLIPB9, and CLIPA14, alongside CLIPB8, participate in prophenoloxidase activation in melanisation process and play roles in the Toll and IMD pathways of the innate immune response in Aedes mosquitoes [76, 77]. Our results align with previous study [35], where Dengue virus and Rift valley fever virus infected Ae. aegypti cell line analyses identified up regulated CLIP-B and CLIP-D serine proteases (CLIP-B15, CLIP-B34, CLIP-B35, CLIP-B46, and CLIP-D1). Expansion in the clip domain serine protease gene family has been identified previously in Ae. aegypti and Ae. albopictus genomes [42, 43]. Our high-impact polymorphism data reveals more DNA polymorphisms in CLIPs of both Aedes genomes, including stop gained, stop lost, start lost, frame shift variant, splice site, and intron variants. These polymorphisms in Indian field isolates of Aedes may have implications for immune response functionality, warranting further molecular and biochemical exploration.
Additionally, our study revealed high-impact polymorphisms in serine protease inhibitor proteins, a larger superfamily involved in development, reproduction, and regulation of innate immunity across eukaryotes [78, 79]. The COG analysis classified these genes under "V" class, associated with defence mechanisms (Fig. 4) [58]. We observed more stop-gain and splice site polymorphisms in different serpins in both Ae. albopictus and Ae. aegypti genomes (Supplementary Table S9, S10). Notably, serpins named as SRPN27A found in our study have been implicated in innate immunity, activating phenoloxidase and stimulating melanization reactions [78]. Splice site variants in Serpin-10 in Ae. albopictus may be involved in immunity [79]. Identifying more of these serpin-serine protease pairs throughout the entire genome can enhance our understanding of how they work together in the immune system. This knowledge is particularly important for combating disease vectors like Aedes mosquitoes and can provide valuable insights into developing targeted bioinsecticides, contributing to effective strategies for mosquito control. On the other hand, in the field of vector population management, some plant protease inhibitors have emerged as promising targets, particularly those aimed at mosquitoes’ digestive proteases like trypsin [80]. Current research focuses on harnessing the insecticidal potential of several plant-derived protease inhibitors against these Aedes mosquito species [80–82] offering insights into enzyme specificity and design for larval control.