Evaluation of introgression chromosome recombination fragments in CSSLs
After several generations of self-pollinated, 515 markers were selected to evaluate the locations of introgression segments from donor parent in the lines with multi-segments again. Based on the genotypes of the molecular markers and the basis of the physical locations, the lengths and the locations of the introgression segments in each line were determined (Table 1), and a physical map was constructed (MM-map) (Figure 1a). A total of 480 introgression segments were identified in the 325 CSSLs using SSR markers, with introgressions ranging from the least 10 ones on chromosome A03, D02 and D04 to the most 30 ones on chromosome D11. Among these, 222 lines carried one introgression segment despite the differences in lengths, and 103 lines were classified into the multi-segments group (Additional file 1: Table S1).
Based on SNPs from the sequencing data, 17,992 recombinant bins distributed on the 26 chromosomes were identified, which ultimately constructed 1,211 recombination chromosome introgression segments from Gb in the 313 CSSLs (Figure 1b and Additional file 2: Table S2). None chromosome introgression segments were detected in 10 lines in the CSSLs populations based on SNPs. The physical length of the introgression segments ranged from 97 kb to 104.23 Mb, with an average length of 4.43 Mb. Based on the physical map (GR-map), re-sequencing data significantly reduced the number of SSSLs, only 54 lines carried only one donor segment, and the lines with less than four segments just closed to half of the population (Additional file 1: Table S1). Significant difference of introgressions appeared in Dt-subgenome with 14 one on D02 and 126 ones on D07 (Table 1).
Comparison of the genome coverage between SSR markers and SNPs
Based on the marker position of the genetic map, 6175.33 cM of the total length of the donor segments was counted by SSR markers, with3462.62 cM of effective coverage length. The whole cotton genome coverage based on the genetic map was 78.42%, and At-subgenome had a lower coverage ratio of 73.73% compared with the 83.33% in Dt-subgenome. The lowest coverage was on chromosome A07 with only 25.46%, and the highest appeared in the Dt-subgenome with no missing on chromosome D08 (Table 1).
The physical map constructed by SNPs covered 2.24 times of the total length of the cotton genome (Additional file 3: Table S3), with 1922.93 Mb of effective coverage length and 86.11% whole genome coverage. Compared to the MM-map, GR-map had a higher percentage of coverage in At-subgenome (89.48% in At-subgenome vs 80.31% in Dt-subgenome). Although the coverage of 16 chromosomes exceeded 90%, there were still 4 chromosomes with coverage of less than 50%. Notably, chromosome A07 had the lowest coverage consistent with the MM-map result, and more than 98 CSSLs detected the same segment on the chromosome D07 located at 5.0-6.5 Mb.
Phenotypic variation in CSSLs
Significant differences were observed between the parents across multiple traits and multiple environments, such as seed cotton weight per boll (BWT), lint percentage (LP), seed oil content (SOC) and all fiber quality traits. Fourteen traits were evaluated in five environments except that SI was just investigated in two environments (Additional file 4: Table S4 and Additional file 5: Table S5), and all traits showed a continuous distribution in the CSSLs. The broad-sense heritability (H2) was lower than 50% for the yield-related traits, indicating that they were easily affected by the environment (Additional file 6: Table S6). Higher H2 value of the lint percentage (LP) (76%), fiber length (FL) (77%) and SOC (87%) indicated that they were more affected by the associated genes coming from the Gb-genome. Fiber quality of Gb was outstanding in all environments, while the mediocre level of the fiber traits was observed in the lots of the CSSLs. Interestingly, recombination of the interspecific genomes also produced various fuzz fiber mutations with different densities and colors (Additional file 7: Figure S1). The N29 line produced fuzz-less phenotype similar to the Gb reported previously [10].
Positive and negative correlations between evaluated traits were calculated (Table 2). Plant height (PH) and first fruit branch height (FFBH) showed weak correlations with each other and with the yield-related traits (BWT and LP). But significant correlations were observed between fiber quality traits. Fiber length (FL) was significant positively correlated with fiber strength (FS) and fiber uniformity (FU), while negatively with micronaire value (MIC), fiber elongation (FEL), short fiber content (SFC) and fiber mature content (FM). The higher value of the SI followed the principle of negative correlation between yield and fiber quality, which may in turn increase of SOC.
Genetic basis of the morphological mutation in the CSSLs
Although the donor parent 3-79, the genetic standard of Sea-island cotton, had undergone artificial selection, cognitive of the plant height type for Sea-island cotton still appeared in the CSSLs (Figure 2a). The “open-bud” floral buds phenotype was found during the flower development with the exposed stigma and dead anther (Figure 2b). The associated marker BNL3479 located on chromosome D13 was similar to the former research (Additional file 8: Table S7) [34].
By using the high resolution of recombination segments, the iconic characteristic of the Gb, sub-okra leaf trait was identified in the CSSLs. Two nearby KNOTTED1-LIKE HOMEOBOX Ⅰ transcription factors homologous to the LATE MERISTEM IDENTITY1 (LMI1), Ghir_D01G021810.1 and Ghir_D01G021830.1, were located near the 61.14 Mb on chromosome D01. An 8-bp deletion in the third exon of the gene Ghir_D01G021810.1 showed the same mutation as reported previously (Figure 2c and d) [37]. These examples showed that the high throughput detection methods could confirm an identified locus at a single gene-level resolution in this population.
QTL mapping yield-related and fiber quality traits in the CSSLs
To evaluate the valuable genetic loci of interspecific hybridization that are important in cotton breeding, QTL was mapped based on these CSSLs. The coverage fragments in the genome were divided into 620 blocks, with an average of 3.12 Mb ranging from 29 kb to 69.47 Mb (Additional file 9: Table S8). A total of 64 QTL for 14 traits were mapped on 20 chromosomes with 38 in At-subgenome and 26 in Dt-subgenome (Figure 3 and Table 3). The phenotypic variation explained by each QTL ranged from 0.73 to 14.67%. There were 19 QTL for four yield-related traits (BN, BWT, LP and SI) and the favorite alleles were from the Gh background. All the QTL for BWT and LP had negative alleles from Gh background, suggesting that the Gh has been domesticated for high yield. While, two QTL had positive alleles for BN indicating that Gb also had the potential to increase yield production. A total of 28 QTL were detected for fiber quality traits, most of which (18/28) had positive alleles from Gb. Of these, completely co-localization was observed for FL and FS, indicating that there was a significant correlation between them. Eight QTL for MIC were detected on seven chromosomes which explained phenotypic variation ranging from 2.54 to 7.09%. Contrary to FEL and FU, the positive alleles of SFC and FM were contributed by Gh. Poor fiber quality phenotype in the CSSLs declined that the genetic recession has occurred in the interspecific hybrids between Gh and Gb.
Genetic recession in the CSSLs
Genetic recession was a widespread phenomenon in the distant hybridization population. Fiber quality is one of the primary goals of cotton interspecific breeding. In this study, 7 lines with longer FL and 4 QTL for FL were identified in the CSSLs. Interestingly, two lines (N180 and R88) did not contain the QTL intervals, and two QTL intervals (on A01 and D06) also did not appear in the longer FL lines. The 13 fiber quality QTL identified in the single segment substitute lines (SSSLs) was inconsistent with the results of the same traits in this study except q-FLA02 [10]. So, we designed a weight mean of additive effects of fiber quality (WAF) value to analyze the source of additive effect for minor-effect genetic loci. Based on the correlations among the fiber traits, the additive effect of the genome was calculated (Additional file 10: Table S9). As a result, At-subgenome from Gh showed a higher additive contribution to fiber quality, while D-subgenome from Gb showed opposite results (Additional file 11: Table S10). In the Gb genome, more than 80% regions of chromosome A012, D02 and D12 had an additive effect on fiber quality improvement (Figure 3). In addition, there was no additive effect from Gb on chromosome D07. More than 90% regions of chromosome A11 showed the effect of Gh. Notably, the non-contribution effect for fiber quality in At-subgenome was signification higher than that in Dt-subgenome. Of these, both chromosome A08 and A12 from Gb or Gh had more than half of the regions contributing no effect for fiber improvement.
QTL mapping for SOC and substitution mapping of QTL locus q-SOCA01-1
Less concern of the SOC in Gb showed significant difference compared with the recurrent parent ‘Emian22’. A total of 12 lines showed extremely significant (p ≤ 0.001) and stable higher SOC than recurrent parent ‘Emian22’ (Additional file 12: Table S11), and 15 QTL were detected to be related to SOC using BLUPed data; of these QTL, 12 were firstly characterized and only two QTL for SOC have been reported previously in an interspecific population (Table 3) [43]. Fortunately, three SSSLs (N159, N160 and N161) contained the same block (block3) on chromosome A01, providing an excellent materials for further research. Compared with another 7 lines including the parents, these three lines showed extremely significant high SOC properties like the donor parent (Figure 4). In the associated interval (block 3 ≈ 1.08Mb), there were 69 and 70 annotated genes in the Gh reference genome TM-1 and Gb reference genome 3-79, respectively. A previously study showed that cottonseed oil accumulates rapidly at the middle-late stages (20 to 30 days post anthesis) [44]. Hence, we focused on the genes that are expressed in gradients in ovules with significantly higher expression levels than other tissues (root, stem, leaf and fiber)[10]. Among these genes, the Gene Ontology (GO) analysis indicated that only six were involved in fatty acid metabolism process in both genome (Additional file 13: Table S12). Unfortunately, it is not significant difference expression of these oil relate genes in ovule between Gh and Gb (Additional file 14: Figure S2). Intringuing, another gene, Gbar_A01G002860.1, encoding a predicted mitochondrial pyruvate dehydrogenase kinase (mtPDK), showed higher expression than its homologous gene Ghir_A01G003150.1. However, previous data from Marillia et al. reported that the seed-specific partial silencing of the mtPDK resulted in increased storage lipid accumulation in developing seeds [45]. Hence, this gene may play an important role in storage lipid accumulation in late developing stage of cotton seeds.