Strain Identity
In India, S. frugiperda was found to be a serious pest of maize and sorghum and was occasionally found on sugarcane and cotton in different parts of the country. A total of 53 FAW samples were collected from 35 different regions in India and Nepal during 2018-20 and subjected to molecular identification at the Division of Genomic Resources, ICAR-NBAIR, Bangalore, India. The generated COI sequences showed 100% similarity with S. frugiperda and were deposited in the NCBI GenBank database. The sequences and specimen details were submitted to the BOLD database, and DNA barcodes were generated. A list of all COI sequences generated at ICAR-NBAIR is provided in the Supplementary File (Table 1a; 1-53). An additional 105 COI sequences deposited in GenBank from different parts of India between 2018 and 2020 were retrieved and combined with the ICAR-NBAIR dataset to study the diversity of Indian populations. Furthermore, FAW sequences from America (n=163), Africa (n=148) and the rest of Asia [China, Japan, Vietnam, Korea, Bangladesh, Pakistan, Myanmar] (n=76) were collected for comparative sequence analysis and inter-population studies. The approximate locations from which the analysed specimens were collected are depicted in Figure 1, and the details of all retrieved sequences are provided in the Supplementary File (Table 1).
We began by investigating the polymorphisms in COI and Tpi gene sequences in India, using the strain-defining loci and polymorphic sites described by Nagoshi et al. (2019) [18]. The COIA region, or barcode segment, contains the polymorphic strain-defining locus mCOI602Y, which has been used to distinguish between the ‘R’ and ‘C’ sister strains of FAW in the Western Hemisphere. Of the 158 COIA sequences from India, 89.5% belonged to the ‘R’ strain and the rest to the ‘C’ strain, while 6 specimens could not be classified because of incomplete COI sequences. The observed structure is reminiscent of populations from East Africa and of the previously characterised Indian populations from the state of Karnataka [18]. Because the samples were mainly collected from corn crops, this predominance of the ‘R’ strain marker reflects discordance between the COI marker and the host association of FAW.
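The single-locus strain call described above can be illustrated with a minimal sketch. It assumes aligned sequences given as plain strings; the names `classify_strain`, `strain_proportions`, `site_index` and `BASE_TO_STRAIN` are hypothetical and not taken from the study. The base-to-strain mapping follows the statement elsewhere in the text that haplotypes carrying C at mCOI602 were called ‘C’ strain; treating T as the ‘R’ strain state is an assumption of this sketch.

```python
from collections import Counter

# Assumption: C at the diagnostic site marks the 'C' strain (as stated in the
# text for mCOI602C); T marking the 'R' strain is assumed for illustration.
BASE_TO_STRAIN = {"C": "C", "T": "R"}


def classify_strain(seq, site_index, base_to_strain=BASE_TO_STRAIN):
    """Return 'R', 'C' or 'undetermined' for one aligned COI sequence.

    site_index is the 0-based position of the diagnostic site within the
    supplied string (hypothetical; it depends on how the read is aligned).
    """
    if site_index >= len(seq):
        return "undetermined"  # read too short to cover the diagnostic site
    return base_to_strain.get(seq[site_index].upper(), "undetermined")


def strain_proportions(seqs, site_index):
    """Count strain calls and give each strain's share of the classified reads."""
    calls = Counter(classify_strain(s, site_index) for s in seqs)
    classified = calls["R"] + calls["C"]
    shares = {k: calls[k] / classified for k in ("R", "C")} if classified else {}
    return calls, shares


if __name__ == "__main__":
    # Toy reads, not data from the study; the diagnostic site is index 3 here.
    toy_reads = ["ACGT", "ACGC", "AC", "ACGN", "ACGT"]
    calls, shares = strain_proportions(toy_reads, site_index=3)
    print(dict(calls), shares)
```

Incomplete or ambiguous reads fall into the `undetermined` bin rather than being forced into a strain, mirroring the treatment of the six unclassified specimens.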
Another strain marker, COIB, a segment downstream of the barcode region, was also analysed. It contains the polymorphic loci mCOI1164D and mCOI1287R, which give rise to five haplotypes: four belonging to the corn strain (CSh1-4) and one to the rice strain. The relative distribution of the COIB haplotypes CSh4 and CSh2 has been used as an indicator of the descent of FAW populations. FAW populations collected from Florida and the east coast of America mostly belong to the CSh4 haplotype, while those from Texas and most other parts of America show a predominance of the CSh2 haplotype [23]. COIB haplotyping was done for 24 FAW populations in India: 22 belonged to the ‘R’ strain, and the remaining two were ‘C’ strain with the CSh4 haplotype. Overall, the distribution and composition of both COI markers at the strain-defining loci are strikingly similar to those reported by Nagoshi et al. (2019) and show little variation over the span of two years (Figure 2a,b).
Polymorphisms in the Tpi gene were studied in the fourth exon and intron segment. This analysis identified three ‘C’ strain haplotypes among the 17 specimens analysed: TpiCa1a (n=6), TpiCa2a (n=2) and TpiCa2b (n=2). In addition, seven specimens showed the hybrid TpiCa1a/TpiCa2a haplotype (Figure 2c). Unlike the previous study, we did not find any specimen belonging to the Tpi ‘R’ strain category, in either homozygous or hemizygous form. TpiCa1 is still the predominant haplotype, as observed by Nagoshi et al. (2019), but our data show a higher frequency of the TpiCa2a haplotype than observed earlier. As in the majority of invasive populations, we also observed discordance between the Tpi and COI markers for host association prediction [18-19]. In the field, 16 of the analysed Tpi specimens were collected from corn crops and one from a wheat crop. Thus, the Tpi marker correlated well with host strain preference in the Indian FAW populations in our dataset as well.
Polymorphism analysis for FAW populations in India
To understand the haplotype diversity of FAW in India, we analysed 460 bp of the COI barcode region for 144 specimens from India. We found 29 polymorphic sites (31 mutations) in the dataset, with a nucleotide diversity of 0.00313. Non-synonymous mutations (71%) were more frequent than synonymous ones (29%). We also identified 21 different haplotypes among the Indian COI sequences, with a haplotype diversity of 0.498; 17 haplotypes belonged to the rice strain and four to the corn strain. To our knowledge, this study catalogues the highest genetic diversity for FAW reported from India to date. The majority of the analysed COI sequences belonged to a single rice haplotype (65.2%; India_haplotype 1; n=94), which was distributed throughout the country. This was followed by another ‘R’ strain haplotype (India_haplotype 2; n=12), while the next largest haplotype was the predominant ‘C’ strain haplotype from India (India_haplotype 3; n=10). Most of the remaining haplotypes were represented by singletons (Figure 3a,b). We did not find a region-specific association for any haplotype in India. Fu’s Fs and Tajima’s D test statistics were significantly negative, suggesting that the FAW population in India is undergoing expansion (Table 1). Between the ‘R’ strain (n=131) and ‘C’ strain (n=13) populations, nucleotide diversity was higher in the ‘C’ strain. The mismatch distribution was largely unimodal for ‘R’ strain populations, consistent with population expansion within the strain (Figure 3c). This was corroborated by Fu’s Fs and Tajima’s D neutrality test statistics (Table 2). Surprisingly, we found two gene conversion tracts within India_haplotype 21 and one gene conversion tract in India_haplotype 20, indicative of mitochondrial recombination events between the ‘C’ and ‘R’ strains in India (Figure 3d).
Both haplotypes were identified as ‘C’ strain haplotypes on the basis of the mCOI602C locus but showed an inter-strain haplotypic signature. Both were found in close proximity to each other in the Indian state of Tamil Nadu.
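For readers unfamiliar with the summary statistics quoted in this section, the following minimal sketch shows how haplotype diversity, nucleotide diversity and the number of polymorphic sites are conventionally computed from an alignment. It assumes equal-length aligned sequences without gaps or ambiguity codes; the function names and the toy alignment are illustrative and do not reproduce the software or data used in the study.

```python
from collections import Counter
from itertools import combinations


def hamming(a, b):
    """Number of sites at which two equal-length aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    counts = Counter(seqs)  # identical strings are treated as one haplotype
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences, divided by alignment length (pi)."""
    pairs = list(combinations(seqs, 2))
    mean_diff = sum(hamming(a, b) for a, b in pairs) / len(pairs)
    return mean_diff / len(seqs[0])


def segregating_sites(seqs):
    """Number of alignment columns showing more than one base."""
    return sum(len(set(col)) > 1 for col in zip(*seqs))


if __name__ == "__main__":
    # Toy alignment, not data from the study.
    toy = ["AAAA", "AAAA", "AAAT", "AATT"]
    print(haplotype_diversity(toy), nucleotide_diversity(toy), segregating_sites(toy))
```

A population dominated by one haplotype with many singletons, as described for India, yields a moderate haplotype diversity together with a low nucleotide diversity, because most pairwise comparisons involve identical or nearly identical sequences.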
It is important to note that, because longer COI sequences were unavailable for several specimens, we chose a partial COIA region (460 bp) for polymorphism analysis so that the majority of Indian populations could be included in this study. To derive information from the missing region, we chose a subset of sequences (n=86) for which polymorphism analysis could be performed on 594 bp of the COI gene sequence. With this subset, we observed 49 polymorphic sites, 53 mutations, and haplotype and nucleotide diversities of 0.6440 and 0.005, respectively (Supplementary File Table 2). Taken together, the two datasets indicate that the haplotype diversity of FAW in India is higher than that projected by either dataset alone.
Comparative genetic analysis across the geographical range
We next compared the diversity of the Indian FAW population with that of other geographic regions to study the biogeographic patterning of FAW and the relatedness of the different sub-populations. For this, we divided the FAW population into four broad groups for which population-level studies were feasible: 1. America (populations from North and South America); 2. Africa; 3. India; and 4. Asia-II (populations from Bangladesh, China, Korea, Vietnam, Japan, Myanmar and Pakistan). Along with the previously defined 144 specimens from India, we analysed 163 COI sequences from America, 148 from Africa and 76 from the rest of the Asian countries for polymorphisms. For comparability, all sequences were analysed using the same 460 bp region of the COI gene that was used for the Indian populations. We observed 27, 3 and 5 haplotypes from America, Africa and Asia-II, respectively. The predominant ‘R’ and ‘C’ COI haplotypes from America [GenBank Accession: U72977.1, U72974.1] are the principal haplotypes in all the invaded regions and are presumed to be the ancestral haplotypes introduced to these regions. Other than the ancestral haplotypes, Africa, India and Asia-II had 1, 19 and 3 region-specific haplotypes, respectively. Among the invaded regions, the haplotype diversity of Africa and the rest of the Asian countries was surprisingly low compared with that of India. Neutrality test statistics for both the African and the rest-of-Asia populations suggested that these populations are still evolving largely neutrally (Table 1).
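The neutrality statistics referred to in Table 1 can be illustrated for Tajima’s D, which contrasts the mean number of pairwise differences with the number of segregating sites. The sketch below uses the standard formula and assumes an ungapped alignment of equal-length strings. It does not compute Fu’s Fs or any significance level, and the function name and toy alignments are illustrative only.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for an alignment of equal-length sequences (needs n >= 4)."""
    n = len(seqs)
    # S: number of segregating (polymorphic) alignment columns
    S = sum(len(set(col)) > 1 for col in zip(*seqs))
    if S == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    # k: mean number of pairwise differences
    pairs = list(combinations(seqs, 2))
    k = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    # Constants depending only on the sample size
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


if __name__ == "__main__":
    # Toy alignments, not data from the study.
    excess_singletons = ["AAAA", "AAAA", "AAAA", "AAAT", "AATA"]
    balanced = ["AAAA", "AAAA", "TTAA", "TTAA"]
    print(tajimas_d(excess_singletons), tajimas_d(balanced))
```

An excess of rare variants, such as the many singleton haplotypes reported for India, pushes D below zero, whereas variants at intermediate frequency push it above zero.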
We compared the polymorphic sites among the different sub-populations to assess the polymorphisms shared across the groups (Figure 4). The African and American populations shared the largest number of mutations (n=7), while India and Asia-II each shared 6 mutations with the American population. Among the invading populations, India also had the highest number of polymorphisms not shared with the other populations (n=25), consistent with the other statistics indicating population expansion in India.
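A comparison of this kind reduces to set operations on the polymorphic positions of each group. The sketch below is a simplified illustration that compares site positions only, whereas the analysis reported here may also have taken the specific base change into account. Group names, function names and sequences are toy values chosen for illustration.

```python
from itertools import combinations


def polymorphic_sites(seqs):
    """0-based alignment columns showing more than one base within a group."""
    return {i for i, col in enumerate(zip(*seqs)) if len(set(col)) > 1}


def shared_and_private(groups):
    """Return per-group sites, pairwise shared sites and group-private sites."""
    sites = {name: polymorphic_sites(seqs) for name, seqs in groups.items()}
    shared = {(a, b): sites[a] & sites[b] for a, b in combinations(sites, 2)}
    private = {
        name: s - set().union(*(t for other, t in sites.items() if other != name))
        for name, s in sites.items()
    }
    return sites, shared, private


if __name__ == "__main__":
    # Toy alignments, not data from the study.
    toy_groups = {
        "America": ["AAAA", "ATAT", "AAAT"],
        "India": ["AAAA", "AAAT", "CAAA"],
        "Africa": ["AAAA", "AAAT"],
    }
    sites, shared, private = shared_and_private(toy_groups)
    print(shared)
    print(private)
```

The private sets correspond to the unshared polymorphisms discussed above, and the pairwise intersections correspond to the overlaps that a Venn diagram such as Figure 4 would display.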
Haplotype network and genealogical inferences
Haplotype networks were constructed from the haplotypes and their frequencies in the populations of America, Africa, India and Asia-II using the R package pegas v0.14. The network constructed for 50 FAW haplotypes across all locations is shown in Figure 5. The network clearly shows a star-like expansion pattern around the two ancestral rice and corn strain COIA haplotypes. Both ancestral haplotypes are dominant haplotypes in each of the four geographical regions. However, there are clear differences in their distribution across the four regions. While the frequency of rice/corn haplotypes in America is 0.41, the rice haplotype is the dominant haplotype in the three invaded regions, with frequencies of 0.84, 0.89 and 0.79 in Africa, India and Asia-II, respectively. The network suggests the introduction of the same two maternal lineages in all the invaded regions. Beyond that, these data provide no evidence for multiple introduction events in the invaded regions. Most of the novel corn haplotypes were found in America, whereas most of the novel rice haplotypes were sequenced from India despite the much more recent invasion of the ancestral rice strain in India. Hap46 from India represents a link between the two ancestral haplotype networks. Hybrid haplotypes of this nature have not been reported from any other region before. Notably, gene conversion tracts were also observed in the same haplotype in the polymorphism analysis.
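To convey how a star-like pattern emerges from sequence data, the sketch below builds a minimum spanning tree over pairwise mutational steps using Prim's algorithm. This is a deliberately simpler stand-in and is not claimed to be the network method that pegas applied in this study. Haplotype names and sequences are invented for illustration.

```python
def hamming(a, b):
    """Number of mutational steps between two equal-length haplotypes."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_network(haplotypes):
    """Prim's algorithm over Hamming distances.

    haplotypes: dict mapping a haplotype name to its aligned sequence.
    Returns a list of edges (name_in_tree, new_name, steps).
    """
    names = list(haplotypes)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest link from the growing tree to any haplotype outside it;
        # ties are broken by name so the result is deterministic.
        steps, a, b = min(
            (hamming(haplotypes[a], haplotypes[b]), a, b)
            for a in in_tree
            for b in names
            if b not in in_tree
        )
        edges.append((a, b, steps))
        in_tree.add(b)
    return edges


if __name__ == "__main__":
    # Invented haplotypes: two ancestral types with one-step derivatives.
    toy = {
        "R_anc": "AAAAAA",
        "R_1": "AAAAAT",
        "R_2": "AAAATA",
        "C_anc": "TTTAAA",
        "C_1": "TTTAAC",
    }
    for edge in minimum_spanning_network(toy):
        print(edge)
```

In the toy input, the derived haplotypes attach directly to their ancestral haplotype by single steps, while the two ancestral haplotypes are joined by a longer branch, which is the qualitative shape described for Figure 5.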
Population structure
Genetic structure among the four geographical groups was analysed using AMOVA. We ran seven separate AMOVA analyses, first comparing all four geographical groups and then repeating the analysis for each pair of groups (i.e. America and Africa, America and India, America and Asia-II, Africa and India, Africa and Asia-II, India and Asia-II). The results are shown in Table 3. They indicate significant genetic differentiation among the four geographical groups (24.8%), pointing to population structure between regions. In addition, significant variation was observed between the native American and the invasive populations, with the highest differentiation seen for the Indian populations (39%). The results also suggest that both India and Asia-II are genetically more similar to the African population than to the American population.
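The percentages quoted above are the share of total variance attributed to differences among groups. A minimal single-level AMOVA sketch, assuming the number of pairwise differences as the squared distance between haplotypes, is given below. It omits the permutation test needed to assess significance, and the function names and toy groups are illustrative rather than the software or data used here.

```python
from itertools import combinations


def hamming(a, b):
    """Number of differing sites, used here as the squared distance."""
    return sum(x != y for x, y in zip(a, b))


def _ssd(seqs):
    """Sum of squared deviations: summed pairwise distances over sample size."""
    return sum(hamming(a, b) for a, b in combinations(seqs, 2)) / len(seqs)


def amova_one_level(groups):
    """Partition variance among and within groups; return components and Phi_ST.

    groups: dict mapping a group name to its list of aligned sequences.
    Raises ZeroDivisionError if every sequence is identical.
    """
    sizes = [len(g) for g in groups.values()]
    N, G = sum(sizes), len(sizes)
    all_seqs = [s for g in groups.values() for s in g]
    ssd_within = sum(_ssd(g) for g in groups.values())
    ssd_among = _ssd(all_seqs) - ssd_within
    msd_among = ssd_among / (G - 1)
    msd_within = ssd_within / (N - G)
    # n0: effective group size, correcting for unequal sample sizes
    n0 = (N - sum(n * n for n in sizes) / N) / (G - 1)
    var_among = (msd_among - msd_within) / n0
    var_within = msd_within
    phi_st = var_among / (var_among + var_within)
    return {
        "var_among": var_among,
        "var_within": var_within,
        "phi_st": phi_st,
        "pct_among": 100 * phi_st,
    }


if __name__ == "__main__":
    # Toy groups, not data from the study.
    toy = {"A": ["AAAA", "AAAT"], "B": ["AATA", "AATT"]}
    print(amova_one_level(toy))
```

Running the function on each pair of regions, as well as on all four together, corresponds to the seven comparisons described in this section.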
These results were corroborated by DAPC analysis and the resulting membership table, which indicated two clearly separated discriminant clusters for America and India (Figure 6).