Salix exigua genome assembly - Fresh frozen leaf tissues were collected for genome sequencing and assembled from one male (SE967M) and one female (SE916F) that originated from the Rio Bonito population in eastern New Mexico. High molecular weight DNA was extracted and sequenced by Nextomics Biosciences (Wuhan, China) using a MinIon platform (Oxford Nanopore Technologies, Oxford, UK). For sequencing error correction, 150bp pair-ended Illumina short-read sequencing was also performed by Nextomics Biosciences (Wuhan, China) on these same individuals using MiSeq 2000 (Illumina, Inc., San Diego, CA, US). Finally, for scaffolding Hi-C libraries were constructed for the male individual and sequenced on the MiSeq 2000 platform (Illumina, Inc., San Diego, CA, US).
The Oxford nanopore reads were filtered for average read quality > = 7 using Guppy (Oxford Nanopore Technologies, Oxford, UK), and adapters were trimmed by Porechop v0.2.4 using default parameters57. The raw assembly was completed using Flye v2.9-b1778 with no scaffolding mode58 and polished with Illumina NGS reads for 4 rounds using Nextpolish v1.3.1 59. In total, 122.5 X coverage (49 Gb) of long reads were used to assemble the raw genome of S. exigua, and 427.5 X coverage (171 Gb) short reads from Illumina were used for polishing. Hi-C reads were aligned with the nanopore raw assembly contigs using Juicer v1.560, and contig scaffolding into chromosomes was finished by 3d-dna (version 180922)61 using diploid mode and correction for 10 rounds. An all-to-all alignment using lastz v1.1.13 between our S. exigua assembly and the S. purpurea 94006 PacBio assembly v5.1 (https://phytozome-next.jgi.doe.gov/info/Spurpurea_v5_1) was conducted to confirm assembly quality using the following parameter settings: --chain --gapped --transition --maxwordcount = 4 --exact = 100 --step = 2062. Alignments were plotted using last-dotplot62. Chromosomes were numbered according to the S. purpurea 94006 PacBio assembly v5.134.
The initial Oxford nanopore assembly for the Salix exigua male resulted in an N50 of 351k. After scaffolding with Hi-C reads, we anchored 89.5% of the nanopore contigs into 19 chromosomes and 2,258 additional scaffolds. The BUSCO score of final assembly was 98.1% and synteny with the S. purpurea genome was high, indicating a reliable reference genome (Fig. S1). The female genome, which was assembled using only Nanopore and Illumina sequencing, consisted of 1,650 contigs with an N50 of 665k.
Sequence capture and SDR mapping - Salix exigua leaf tissue from 24 males and 24 females were collected near the Rio Bonito in Fort Stanton-Snowy River Cave National Conservation Area, Lincoln County, Ruidoso, New Mexico in the spring of 2017. Tissues were dried in silica gel prior to DNA extraction. Sampled individuals were > 25 m apart to minimize sampling from the same genotype. Sex was scored by the presence of male or female flowers on catkins. To phase sex-linked haplotypes, we grew 48 half-sibs from seeds collected from the same S. exigua female that was used for genome assembly (SE916F). Leaf tissues were collected for DNA libraries construction and then merged into the same sequence capture pipeline for the wild population. Finally, leaves were sampled for DNA extractions from a diverse set of 10 male S. purpurea in the germplasm collection of the willow breeding program at Cornell University in Geneva, NY.
For S. exigua samples, DNA was extracted from leaves following the protocol from the Qiagen Plant DNA Extraction Kit (Qiagen, Hilden, Germany), but modified with 2 extra hours of precipitation. Bar-coded libraries were made for each individual using the NEB Next Ultra II Library preparation Kit (New England Biolabs, Ipswitch, MA, USA). Library size and concentration were confirmed using Agilent Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA), and 8 libraries were pooled prior to hybridization during the sequence capture. 60,000 RNA baits were designed from coding regions of S. purpurea 94006 PacBio assembly as described in Sanderson et al. 2021 and were used to enrich 16580 target genes in library sample 28 using standard protocols provided by Arbor Biosiences myProbes protocol v 3.0.1. After hybridization, 48 hybridized libraries were pooled and sequenced to 60x on Illumina HiSeq 5000 Genome Analyzer (Illumina, Inc., San Diego, CA, USA). Leaf samples were also collected from ten unrelated male S. purpurea samples and DNA was extracted using a modified Qiagen protocol that used 500 uL of buffer AP1 instead of 400uL, and included three, 65uL elution steps, followed by an ethanol precipitation to purify DNA. Library preparations was conducted using SparQ enzymatic DNA fragmentation and illumina compatible iTru adapters. Libraries were sequenced using an illumina paired-end P3 300 cycle flow cell platform (Illumina, Inc., San Diego, CA, USA).
Paired-end reads were mapped to the S. purpurea version 5 genome assembly34 with the W chromosome (without Chr15Z) using bwa mem v0.7.17-r118863. S. exigua population were additionally mapped to S. exigua genome assembly to examine the position of SDR as well as S. purpurea version 5.1 genome assembly with W chromosome for analyzing homologies among X, Y, Z, and W. Mapping output was converted into BAM files following the Broad Glod Standard methods for variant processing64.65..
Variant calling was completed with the GATK variant calling pipeline66. HaploCaller module from GATK was employed to call variants from mapping results of 48 individuals under parameters of allowing minimum heterozygosity 0.015 for SNP and 0.01 for indels. The raw VCF file which includes genotypes was generated by GATK GenotypeGVCFs module. Mapping information including QualByDepth (QD), Depth (DP), FisherStrand (FS), StrandOddsRatio (SOR), RMSMappingQuality (MQ), MQRankSum and ReadPosRankSum was screened to determine the hard filtering criteria. To remove low quality data, initial filtering was performed under criteria of MQ < 15, SOR > 4, QD < 20, FS > 60, MQRankSum < -10, ReadPosRankSum < -2, ReadPosRankSum > 2. Individual DP values were also checked to avoid the removal of good quality variant calls from a few individuals. Final genotype filtering kept variants with DP between 6 to 100.
SNP loci with at least 75% genotypes in filtered VCF files of each species were used for downstream analyses. Sex was included and tested as a binary phenotype using the GWAS pipeline using PLINK v1.90b467. GWAS results were shown as p values for each SNP loci. Lower p values indicate large distinction found between males and females. Manhattan plot was generated in R68 using ggplot2 package69 based on the position and negative log-transformed p values of each SNP to show the boundary of SDR. After transformation, SNPs with high values on plot from upper outliers with a Bonferroni-adjusted significance threshold of P < 0.05 were selected out for screening allele preference in males and females. Genotype matrix of SNPs with high p-value using Bonferroni multiple testing correction were output from vcftools v0.1.170. Output of genotypes was reinterpreted as heterozygosity level on sex-associated SNP loci. Sex determination systems were determined by whether heterozygous happens on male (XY) or female (ZW).
Phasing sex-linked haplotypes in S. exigua - The same mapping and SNP calling pipeline was used on 48 half-sibs from S. exigua. First, sexes of each half-sib were determined using sex associated SNPs identified in our mapping study (see above). Alleles on the Y chromosome (hereafter Y-alleles) in the S. exigua SDR could be identified for some loci by subtracting the mother alleles from half-sib progeny genotypes. When the mother was 0/0, Y-alleles could be identified when male progeny were 0/0, 0/1, or 0/2; and when the mother was 0/1, Y-alleles could be identified when male progeny were 0/0, 1/1, 1/2, & 0/2. Y-alleles were confirmed only if all males in the half-sib progeny elicited one of these genotype patterns. X-alleles were identified as the alternate alleles in the female half-sib progeny at the same loci where the Y-alleles were identified. X- and Y-alleles were assigned as sex-linked if they were sex-linked in our GWAS analysis.
Flanking regions (150bp) of Y-linked SNPs that mapped onto the S. purpurea 15W chromosome were extracted to align in S. purpurea 15Z and S. exigua Chr15 to decide the movement of sex-linked region among different sex chromosomes using BLASTN71. Plotting the positions and connection between pairs of flanking regions were performed using a custom Visual Basic application (Balena, F., & Fawcette, J., 1999).
Identification of Y-specific contigs - Y-specific contigs from our S. exigua genome assembly were identified by comparing sequence coverage on Chr15 and all unassembled scaffolds between males and females. Y-specific regions are absent from females. Candidates for sex-specific regions were used to identify and extract specific contigs from the Nanopore assemblies. Additional alignments were performed against the genome assembly of the opposite sex to confirm that the Y-specific regions were absent from females. Synteny maps between candidate and sex chromosomes were generated using lastz62 and perl tools.
Phylogeny reconstruction - Eight species in Salicaceae were selected to reconstruct a phylogeny (Idesia polycarpa, Populus trichocarpa, S. nigra, S. chaenomeloides, S. arbutifolia, S. purpurea, S. exigua, S. viminalis). Each sample was hybridized against a sequence capture array targeting 1126 genes (Sanderson et al. 2020) and sequenced. Reads of all species were mapped to reference genes and assembled using Hybpiper72. All assembled genes in each species were concatenated into supermatrices. Species tree were inferred by IQTree under parameters of –bb 1000 -alrt 1000 -bnni -m MFP73–75.(Johnson et al., 2016). All assembled genes in each species were concatenated into supermatrices. The phylogenetic tree was inferred using IQTree under parameters of –bb 1000 -alrt 1000 -bnni -m MFP73–75.
Identifying ARR17 partial duplicates in S. exigua and S. purpurea - RR17 gene (P. trichocarpa: XM_002325453.3) copies were queried at both the chromosome and contig level in our S. exigua assembly, in S. purpurea Chr15 Z and Chr15W, and on Chr19 of both S. exigua and S. purpurea using BLASTN with -wordsize 871. Different partial duplicate copies on Chr15Y of S. exigua and Chr15Z of S. urpurea, and the full-length RR17 genes from Chr 15W of S. purpurea and Chr19 of P. trichocarpa (an outgroup) were aligned by ClustalW, respectively76. Phylogenies of each of the partial duplicates were reconstructed by MEGA11 using Maximum Likelihood with 100 bootstraps77.