Chromosomal composition of the BC4 hybrid, QC12-20006, by GISH
Genomic in situ hybridisation analysis of QC12-20006 with whole Erianthus DNA identified three Erianthus chromosomes in the sugarcane background. The red chromosomes are inherited from Erianthus and the green chromosomes are inherited from Saccharum (Figure 2). One of the Erianthus chromosomes, indicated with a yellow arrow, shows mostly red signal but also contains a small green introgression, indicating that it is a recombinant chromosome (Figure 2).
Flow cytometry and identification of peaks with Erianthus chromosomes
Five composite peaks (I, II, III, IV, V) have been identified in the flow karyotype of every sugarcane genotype examined to date, except for R570, which presents only the first four (Metcalfe et al. 2019). Each genotype also had a characteristic flow karyotype profile. Five composite peaks were also identified in the QC12-20006 flow karyotype, even though the genotype has three Erianthus chromosomes (Figure 3). The QC12-20006 flow karyotype is similar to those of other sugarcane cultivars but distinct from that of S. officinarum (Metcalfe et al. 2019). The highest composite peak in the S. officinarum flow karyotype is II, whereas in QC12-20006 composite peaks I and II are the highest and equally abundant, and composite peaks III and IV are also of equal height (Figure 3a).
To ensure enrichment of Erianthus chromosomes when sequencing, GISH with whole Erianthus genomic DNA was used to narrow down the screening of single flow sorted chromosomes to one or two composite peaks (Figure 3b). No signals were observed on chromosomes flow sorted onto slides from composite peaks I, II or V. Signals from the distinctive Saccharum-Erianthus recombinant chromosome were observed in QC12-20006 composite peaks III and IV (Table 1).
Amplification, screening and sequencing of chromosomes
Chromosomes from composite peaks III and IV were flow sorted into six 96-well plates and amplified. With the standard amplification reaction time suggested for the REPLI-g Single Cell Kit (Qiagen), the amplification success rate was very poor: only one of the five samples processed gave a product at the expected yield and was positive for the Saccharum/Erianthus transposable element (TE). After the amplification incubation time was increased to 16 h, as described in the supplementary protocol for maximum yield, the success rate improved to four successfully amplified products out of five samples.
Thirty single chromosomes from the two QC12-20006 flow karyotype composite peaks were purified, amplified and PCR screened. The DNA from the single chromosome amplifications was evaluated on a 1.5% agarose gel. An example of the amplifications is shown in Figure S1, in which all amplifications show high molecular weight DNA except for Lane 2, which represents an amplification failure. The negative control produces a product of the same size as the samples because DNA is generated during the REPLI-g Single Cell reaction by random extension of primer dimers. Total yields ranged from 6 to 31 ug, with an average of 19 ug.
Purified and amplified samples were PCR screened with the three sets of primers (Table S1). Wells with no chromosomes flow sorted into them were PCR negative, indicating no contamination from the flow sorter or amplification process. Of the 30 single chromosome amplification products screened, 24 (80%) were PCR positive for the Saccharum/Erianthus repeat, and 17 of these 24 (71%) were negative for the universal bacterial 16S rDNA sequence. After screening with the Erianthus specific repeat, two MDA products were positive for both the Saccharum/Erianthus TE and the Erianthus specific repeat, and negative for the universal bacterial 16S rDNA. We would expect 3 out of 105 chromosomes to be Erianthus (2.9%), whereas we identified 2 Erianthus chromosomes out of 17 tested (11.8%); flow sorting from a selected peak therefore gave an approximately 4-fold enrichment for Erianthus chromosomes. Amplification yields for the MDA products that were Saccharum/Erianthus PCR positive and bacterial 16S rDNA negative were between 15 and 31 ug in a total volume of 25 uL.
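The enrichment estimate above is a ratio of two proportions, and the arithmetic can be checked with a minimal sketch. The function name `fold_enrichment` is illustrative only and is not part of the analysis pipeline; the counts are those reported in the text.

```python
from fractions import Fraction


def fold_enrichment(observed_hits, observed_total, expected_hits, expected_total):
    """Return (observed proportion, expected proportion, fold enrichment)."""
    observed = Fraction(observed_hits, observed_total)
    expected = Fraction(expected_hits, expected_total)
    return observed, expected, observed / expected


# Counts from the text: 2 Erianthus chromosomes among 17 screened products,
# against an expectation of 3 Erianthus chromosomes among 105 in the genotype.
observed, expected, fold = fold_enrichment(2, 17, 3, 105)
print(f"fold enrichment: {float(fold):.1f}")
```

Exact fractions are used so that rounding of the intermediate percentages does not affect the fold value, which comes out at roughly 4.1.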
Read trimming, alignment and mapping
Around 9 Gb of sequence was obtained for both single chromosomes sequenced (Table S2), resulting in approximately 110x coverage, based on the estimated chromosome size of 80 Mb for sugarcane (cultivar R570: 10 Gb/112 chromosomes; Piperidis et al. 2010). After trimming and quality checking, the coverage dropped to approximately 62x, with a mean read length of 107 bp and approximately 80% of the reads retained (Table S2). In a blastn search of a random sample of 5,000 reads against the NCBI nucleotide database, the top hit for about half of the reads was Saccharum, followed by Sorghum (Table 2). The top hit was Saccharum rather than Erianthus because very little sequence information is available for Erianthus. There was no indication of bacterial or human contamination.
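As a rough check of the coverage figures, fold coverage is simply the number of sequenced bases divided by the size of the target. The sketch below uses only the values quoted in the text; the function and variable names are illustrative, and the 80 Mb chromosome size is the authors' estimate rather than a measured value.

```python
def fold_coverage(total_bases, target_size_bp):
    """Fold coverage = sequenced bases / target size."""
    return total_bases / target_size_bp


CHROMOSOME_SIZE_BP = 80_000_000   # estimated single-chromosome size used in the text
RAW_BASES = 9_000_000_000         # about 9 Gb of raw sequence

raw_coverage = fold_coverage(RAW_BASES, CHROMOSOME_SIZE_BP)

# Mean chromosome size implied by the R570 figures quoted in the text.
mean_r570_chromosome_bp = 10_000_000_000 / 112

# Bases that must remain after trimming to give the reported ~62x.
bases_at_62x = 62 * CHROMOSOME_SIZE_BP

print(f"raw coverage: {raw_coverage:.1f}x")
print(f"mean R570 chromosome: {mean_r570_chromosome_bp / 1e6:.1f} Mb")
print(f"bases at 62x: {bases_at_62x / 1e9:.2f} Gb")
```

The raw figure of about 112x agrees with the approximately 110x reported, and 62x corresponds to roughly 5 Gb of retained sequence on an 80 Mb target.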
Approximately 70% of the reads were properly paired and mapped to the R570 single tiling path (STP) (Garsmeur et al. 2018), which dropped to 43% for the first MDA product and 36% for the second MDA product after quality filtering (Table S2). For mapping to gene regions only, 24% and 18% of reads from MDA products one and two, respectively, were properly paired and mapped. These values dropped to 17% (7,602,141 reads) and 12% (10,056,900 reads) after BAM mapping quality filtering (Table S2).
Filtered MDA product reads were separately mapped to the R570 STP chromosomes and genes (Garsmeur et al. 2018). Mapping to genes was used to identify which chromosome had been amplified and to examine gene coverage. Mapping to STP chromosomes was used to identify resistance genes and SNPs and to examine overall coverage. The highest proportion of reads from the first MDA product mapped to genes from chromosome 7, consistent with the chromosome mapping results (Tables 3, S3 and S4). Results for the second MDA product were not concordant: reads mapped with higher frequency to gene sequences on chromosome 4, but overall mapped best across the entirety of chromosome 5. However, the highest mean depth and second highest mean breadth of read mapping were to chromosome 5 genes (Tables 3, S3 and S4). Results for the second MDA product are therefore shown for chromosome 5.
Coverage of STP chromosome 7 (MDA product one) and chromosome 5 (MDA product two) was uneven, with very high coverage in some regions and very low coverage in others (Figure 4a). Figure 4b shows that, for the first MDA product, 80% of chromosome 7 had no reads mapped, 16% had fewer than 200 reads and 2.4% had over 200 reads. For the second MDA product, 98% of chromosome 5 had no reads mapped, 1.39% had fewer than 200 reads and only 0.27% had over 200 reads.
The average gene read depth and percentage mapping breadth were much higher for the first MDA product (74.05 and 44.15%, respectively) than for the second MDA product (30.68 and 12.59%, respectively) (Tables 3 and S4). This is reflected in Figure 5a, where a higher proportion of reads from MDA product one map with higher coverage to chromosome 7 than to any other chromosome. Reads from the first MDA product covered 80-100% of the gene length for 5% of the R570 STP chromosome 7 genes, 40-80% for 16% of genes and >0-40% for 42% of genes, while 38% of genes had no coverage (Table S5). One hundred resistance genes were identified on chromosome 7 and 68 on chromosome 5. For the first MDA product, over half of the chromosome 7 resistance genes had some read coverage and 16% had over 40% coverage. Only 16% of the chromosome 5 resistance genes had some coverage with reads from the second MDA product (Figure 5 and Table S5).
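To make the two gene-level metrics concrete, the sketch below computes mean depth and percentage breadth from a list of per-base depths and assigns a gene to the breadth classes used in Table S5. It is an illustration under stated assumptions, not the pipeline used in this study: the function names, the toy depth values and the treatment of class boundaries (40% and 80% placed in the lower class) are all assumptions, and the text does not state whether mean depth was averaged over all bases or covered bases only.

```python
def depth_and_breadth(per_base_depth):
    """Return (mean depth over all bases, % of bases with at least one read)."""
    n = len(per_base_depth)
    if n == 0:
        raise ValueError("gene has no positions")
    mean_depth = sum(per_base_depth) / n
    breadth_pct = 100 * sum(1 for d in per_base_depth if d > 0) / n
    return mean_depth, breadth_pct


def breadth_class(breadth_pct):
    """Assign a breadth class; boundary handling is an assumption."""
    if breadth_pct == 0:
        return "missing"
    if breadth_pct <= 40:
        return ">0-40%"
    if breadth_pct <= 80:
        return "40-80%"
    return "80-100%"


# Hypothetical 10 bp "gene" with patchy coverage, for illustration only.
toy_depths = [0, 0, 3, 5, 2, 0, 0, 0, 10, 0]
mean_depth, breadth_pct = depth_and_breadth(toy_depths)
print(mean_depth, breadth_pct, breadth_class(breadth_pct))
```

A gene can therefore have a high mean depth with a low breadth when a few positions are deeply covered, which is the uneven pattern expected from MDA products.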
SNP identification and verification
As the test case was to determine whether we could identify SNP markers inherited from the Erianthus chromosomes, genes annotated as resistance genes were targeted. SNPs were identified in 24 classic resistance genes on chromosome 7 in MDA product one, i.e. nearly half of the resistance genes with some read coverage, and in 7 resistance genes on chromosome 5 in MDA product two, i.e. over half of the resistance genes with some read coverage. A total of 136 SNPs within 31 annotated resistance genes were identified as potential candidate markers (Table S6). The SNPs identified in these genes were supported by a coverage of between 3 and 19 or more reads. Four KASP primers were designed against four genes identified from the first MDA product and two KASP primers against one gene identified from the second MDA product (Table 4).
The KASP primers were optimised for annealing temperature and run across the parents of the BC3 Erianthus/Saccharum population. Five of the SNP markers did not segregate as single dose markers. One SNP, in gene ID Sh07_g011420, segregated 1:1 in a small BC3 population, as expected for a single dose marker (Figure 3f), which verified the high quality of the sequence information. A preliminary association test between these SNP data from the BC3 clones and their pachymetra root rot rating revealed a significant association (p≤0.01).
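A 1:1 segregation ratio of this kind is commonly assessed with a chi-square goodness-of-fit test on one degree of freedom. The sketch below shows one way to do this with the standard library only. The text does not report the test used or the genotype counts, so the function name and the counts (22 clones with the marker, 18 without) are hypothetical and chosen purely for illustration.

```python
import math


def chi_square_1to1(present, absent):
    """Chi-square goodness-of-fit test against a 1:1 ratio (1 degree of freedom)."""
    total = present + absent
    if total == 0:
        raise ValueError("no observations")
    expected = total / 2
    chi2 = ((present - expected) ** 2 + (absent - expected) ** 2) / expected
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, NOT data from this study.
chi2, p_value = chi_square_1to1(22, 18)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

A large p-value, as with these illustrative counts, means the observed ratio does not depart significantly from 1:1, which is the pattern expected for a single dose marker.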