Materials and sample collection
TheCMS-D8 system, a sterile line (A), maintainer line (B) and restorer line (R) were provided by the Institute of Cotton Research (ICR), Chinese Academy of Agricultural Science, Anyang, Henan, China. A BC1F1((A line ×Rline) ×B line) segregation population was constructed, and all materials were grown at the Cotton Research Farm at the ICR. Fresh leaves were obtained from the parent lines and BC1F1 population. Anthers from buds1-2 mm, 3 mm, and 4 mm in length were collected and combined from 100 plants. All harvested samples were snap-frozen in liquid nitrogen and stored at −80°C before use.
Fertility segregation analysis
During anthesis, visual fertility surveys were conducted for 1623 individuals of the BC1F1 population of CMS-D8 under fieldtrial conditions, three times per plant. The presence of pollen in a plant indicated fertility and was determined by squeezing the anthers between the fingers because the male sterility of CMS-D8 occurs during meiosis, the S(rf) gametes are sterile, and S(rf2rf2) produces no pollen. So,the observed values of fertile and sterile plants were obtained, the fertility trait segregation ratio compatibility test was carried outusingExcel software, and the Chi-square value was obtained to determine the genetic model of Rf2.
DNA extraction, library construction andIllumina sequencing
DNA was extracted by the CTAB method[80], and the quality of DNA was assessed by 1.2% agarose gel electrophoresis. The purity of DNA was examined usingan Agilent Technologies 2100 Bioanalyzer. The DNA concentration was estimated using a Qubit® DNA Assay Kit in a Qubit® 2.0 Fluorometer (Life Technologies, CA, USA).Equal amounts of DNA (1.5 μg/sample) from 100 BC1F1 plants with fertile phenotypes were mixed to form the fertile sample (F-pool), and those from another 100 plants with sterile phenotypes were mixed to form the sterile sample (S-pool). Sequencing libraries were generated using the VAHTSTM Universal DNA Library Prep Kit for Illumina® V3(Vazyme Biotech) according to the manufacturer’s recommendations. Briefly, the DNA samples were fragmented by sonication to a size of 300-500 bp. Then, the DNA fragments were end-polished, A-tailed, and ligated with the full-length adapter for Illumina sequencing by PCR amplification. Consequently, the PCR products were purified (VAHTSTM DNA Clean Beads (Vazyme #N411)), and libraries were analysed for size distribution by an Agilent 2100 Bioanalyzer and quantified by real-time PCR. The libraries constructed above were sequenced by the Illumina HiSeq platform, and 150 bp paired-end reads were generated with an insert size of approximately 350 bp.Thesequenceraw reads were submitted to the SRA database of NCBI under the BioprojectPRJNA685585.
Data analysis, data filtering, and alignment
The Fast x-toolkit (v 0.0.14–1) was used to filter out the low-quality reads such as reads with ≥10% unidentified nucleotides (N), reads with > 50% bases having phredquality < 5,r reads with > 10 nt aligned to the adapter, allowing ≤10% mismatches, and putative PCR duplicates generated by PCR amplification in the library construction process (read 1 and read 2 of two paired-end reads were completely identical).The released genome of Gossypium hirsutum was downloaded from the Cotton Research Institute (CRI) of Nanjing Agricultural University of China (http://mascotton.njau.edu.cn/Data.htm, v1.1) and used as a reference genome[49].For mapping to the reference genome, BWA(Burrows-Wheeler Aligner)[81] was used to align the clean reads of each sample against thereference genome (settings: mem -t 4 -k 32 –M -R). Alignment files were converted to BAM files usingSAMtools software [82] (settings: –bS –t). In addition, potential PCR duplications were removed usingthe SAMtools command “rmdup”. If multiple read pairs had identical external coordinates, only thepair with the highest mapping qualitywas retained.
SNP/InDel detection and annotation
Variant calling was performed for all samples by using the unified genotyper function in GATK [83] software. SNP was used as the variant filtration parameter in GATK (settings: --filterExpression "QD < 4.0 || FS > 60.0 || MQ < 40.0", -G_filter "GQ<20", --cluster WindowSize 4). InDel was filtered by Variant Filtration parameter (settings: --filter Expression "QD < 4.0 || FS > 200.0||Read PosRankSum < -20.0 || Inbreeding Coeff < -0.8 "). ANNOVAR [84], an efficient software tool, wasused to annotate the SNPsandInDels based on the GFF3 files for the reference genome.
Determination of a candidate interval using SNP index and G’values
To obtain a highly accurate SNPset, a range of filters were also employed[85].The homozygous SNPs between two parents were extracted from the VCF files for SNPs.The read depth information for homozygous SNPs in the above offspring pools was obtained tocalculate the SNP index [86]. The SNP index method was used for the analysis, and the SNP-index dot matched curve was obtained by regression fitting as described by Abe et al.[87].The G’ values areused for noise reduction while also addressing linkage disequilibrium (LD) between SNPs[88].To rule out the effects of unreliable markers, we screened markers based on the SNPindex using the following conditions: 1) the sequencing depth of both parents was greater than 8; 2) both pools had a sequencing depth greater than 10;3) the SNPindex values of the two pools were not greater than 0.8 or less than 0.2 at the same time; and4) the SNPindex value difference was greater than 0.8.Sliding window methods were used to present theSNP index of the whole genome. Usually, we used a window size of 2 Mb as the default settings, andused QTL-seqr software to calculate Δ (SNPs-index) and G', based on the markers that were scanned[89].
Fine mapping ofRf2usinghigh-throughput SNP genotypingandInDel makers
For the analysis, DNA was isolated from the two parental lines and the BC1F1 population. The informative molecular markers were used for genotyping each plant of the BC1F1 population, various recombinants in the target region were identified, and the linkage relationship between markers and the Rf2 locus was analysed for gene mapping. According to the results of sequencing mutation detection and BSA of sequencing candidate intervals, one SNP was selected for approximately every 1.5 Mb of physical distance. The selected SNPs met the requirement of having a variation index close to 0.5 in the F-pool and0 in the S-pool.A subset of 24 selected SNPs was used for genotyping by the Illumina HiSeq PE150 sequence. Individual BC1F1 population (except 200 plants that formed pools) plants were genotyped using high-throughput SNP genotyping. Based on the SNP locus exchange of individual plants, we narrowed down the range in which the target gene was located.
TheInDel markers were developed based on the insertion deletion mutation within the selection interval. Primers were designed using Oligo7 software [90] and synthesized commercially (TSINGKE Biological Technology, Zhengzhou, China). The PCR system consisted of 20 μL of PCR mixturethat contained 1× reaction buffer, 2.0 mM MgCl2, 0.2 mM dNTPs, 0.5 mM each primer, 1 U Taq DNA polymerase (Takara, Japan), and 50 ng of DNA template.The PCR amplification conditions were as follows: 35 cycles of denaturation at 94°C for 30 s, annealing at 56°C-58°C for 30 s, and extension at 72°C for 30 s. Then, the reaction was held at 4°C. The PCR products were visualizedby3.5% agarose gel electrophoresis. Based on the difference between the genotypes as assessedusing polymorphic markers, recombinants were identified inthe BC1F1 population and used to fine mapRf2.
Marker-assisted breeding of restorer lines
The utility of the InDel markers for marker-assisted selection was determined in a segregating population. First, the restorer line [N(Rf2Rf2)] of CMS-D8 was crossed with the recurrent parent [N(rf2rf2)], which has excellent agronomic characteristics. Beginning in the BC1F1 generation, the codominantInDel marker 1327 was used to track the restorer gene in each generation, and the other markers were used for further verification. Only those individuals verified by the markers were chosen as the female parent for successive backcrosses. In the BC4F1 population, 120 individuals were randomly selected, and then InDel1327 was used to perform segregation analysis. The individuals verified by the markers as homozygous at the restorer gene locus were test-crossed with the sterile line [S(rf2rf2)] to determine the segregation of the fertility phenotype in the offspring under field conditions. The atpA SCAR marker distinguishes the CMS cytoplasm from other types of cytoplasm[2].Here, the InDel 1327 marker was combined with theatpASCAR markerto identify hybrids in the CMS-D8 system.
Real-time quantitative PCR (qRT-PCR) analysis
The annotated genesin the interval were analysed, and real-time quantitative PCR was performed to identify PPR family genes and differentially expressed genes that were selected based on anther transcriptome data of the CMS-D8 system(unpublished data).Total RNA was isolated using the Sigma Spectrum Plant Total RNA Kit (Sigma-Aldrich, USA) according to the manufacturer’s protocol. Reverse transcription was conducted using the PrimeScriptTM RT Reagent Kit (TaKaRa, Beijing, China). TransStartTopGreen qPCRSuperMix (Trans gen, Beijing, China) was used according to the manufacturer's instructions to conduct qRT-PCR of the genes.The internal control gene used for qRT-PCR was the cottonHis3 gene (i.e., histone 3) with primers of GhHIS3F and GhHIS3R, and relative gene expression levels were calculated using the 2-△△CT method[91],as described in detail in previous studies[92-94]. Each gene in each sample was analyzed with three replicates and two technical replicates. All primers are listed in Table S1.