2.1 Plant Materials
All plant samples in this study were sourced from the Bailai State-owned Forest Farm in Hutou Town, Anxi County, Quanzhou City, Fujian Province, China (118°3' E, 25°49' N, 590-600 m above sea level). The plant experiments and field studies, including the collection of plant materials, complied with relevant institutional, national, and international guidelines and regulations. The germination and sprouting experiments of cypress seeds were conducted in the greenhouse of Fujian Agriculture and Forestry University, with seeds disinfected before sowing. At the end of 2015, seed germination began under controlled environmental conditions, maintaining a temperature of 22°C to 25°C and a relative humidity of 70%-80%. Successfully germinated seeds were then transferred to seedling trays filled with standard nutrient soil to promote seedling growth. Seedlings received at least 8 hours of natural light, supplemented by grow lights to ensure adequate lighting, and were watered and fertilized according to standard care procedures. In April 2016, when the seedlings reached about 15cm in height, they were transplanted to outdoor plots with similar soil texture, drainage conditions, and light exposure. The experiment employed a completely randomized block design (CRD) with 8 plots, each containing representative seedlings from 14 different provenances (Table S1; Figure S1). Five seedlings from each provenance were randomly assigned to each plot to ensure even distribution across the experimental area, with a planting distance of 2m to minimize competition. Appropriate measures were taken to alleviate transplant shock, and the growth and adaptability of the seedlings were regularly monitored post-transplantation. In April 2020, four years after transplantation, key growth parameters of the surviving seedlings were measured, including Height, Diameter at Breast Height (DBH), Crown Width (CW), and the Average Angle of Dips, the latter obtained by calculating the average angle of the three lowest secondary branches of each plant.
2.2 Transcriptome Sequencing
In April 2020, we randomly selected seven individuals from each seed provenance and collected their stem-differentiating xylem, leaves, branches, and roots. These tissues were mixed by category and disinfected uniformly with alcohol for 30 seconds to one minute, then rinsed in deionized water for 2 minutes to remove any alcohol residue and dead microbes. One gram of mixed tissue sample from each type was immediately weighed, flash-frozen in liquid nitrogen, and stored at -80°C for later use. Total RNA was extracted using the RNAprep Pure Plant Kit (Tiangen, Beijing, China) and the quality of RNA was assessed by 2% agarose gel electrophoresis. The concentration of RNA was determined by the NanoPhotometer® spectrophotometer (IMPLEN, California, USA) and the Qubit® 2.0 Fluorometer in conjunction with the Qubit® RNA Assay Kit (Life Technologies, California, USA). RNA integrity was evaluated using the RNA Nano 6000 Assay Kit on the Agilent® 2100 Bioanalyzer (Agilent Technologies, California, USA), with an RNA integrity number (RIN) greater than 7 considered satisfactory. Subsequently, libraries were constructed using the NEB-Next® Ultra™ RNA Library Prep Kit for Illumina® (NEB, USA) and sequenced on the Illumina® NovaSeq 6000 platform to produce 150 bp paired-end reads.
2.3 RNA-seq read mapping and quantification
We performed initial processing of the raw RNA-Seq reads using fastp v0.23.2 (S. Chen et al. 2018) with parameters set to ensure quality (-qualified_quality_phred 15 --unqualified_percent_limit 50 --n_base_limit 15). The resulting clean reads, along with those from other C. hodginsii RNA-Seq datasets (NGDC: CRA013478), were aligned to the genome using the Two-pass alignment mode of STAR v2.7.8a (Dobin et al. 2013). The alignment parameters (--alignIntronMin 20 --alignIntronMax 50000 --sjdbOverhang 149 --outFilterMismatchNmax 2 --outSJfilterReads Unique --outSAMmultNmax 1 --outSAMmapqUnique 60) were chosen to enhance the accuracy and reliability of the alignment. After alignment, structural annotation files based on the new alignment results were generated using gffcompare v0.12.6 (Pertea and Pertea 2020). The clean reads were then realigned to the C. hodginsii reference genome, which included new annotations, to ensure the accuracy of the alignment. Finally, gene expression levels for each gene were quantified using featureCounts v2.0.1 (Liao, Smyth, and Shi 2014) with default parameters.
2.4 Static analysis
First, we obtained information on 19 environmental factors from various seed provenance locations (Table S1), sourced from the WorldClim database (Fick and Hijmans 2017). Spearman correlation analysis was employed to assess the interrelationships among these environmental factors. Subsequently, we conducted gradient forest model analysis using the R package gradientForest v0.1-37 (Ellis, Smith, and Pitcher 2012), setting the tree count to 500, to evaluate the impact of each environmental factor on gene expression levels and determine their significance. Based on the results of the gradient forest analysis and the correlations between environmental factors, we selected factors that were both significant and had low intercorrelations to reduce potential collinearity issues. These selected environmental factors were then used as explanatory variables in subsequent Redundancy Analysis (RDA) to delve deeper into their relationship with gene expression data. RDA, known for its low false-positive rate (Capblancq and Forester 2021), was conducted using the R package vegan v2.6-4 (Jari Oksanen et al. 2017) to further reveal how environmental factors influence gene expression patterns. Significant environment-associated expressed genes were defined by their loadings on the tails of the distribution, using a standard deviation cutoff of 3 along one or more RDA axes.
To gain a deeper understanding of the modular characteristics of gene expression, this study employed WGCNA. We utilized the Python package PyWGCNA v2.0.1 (Rezaie, Reese, and Mortazavi 2023) with default parameter settings. This analysis clusters genes into modules based on similar expression patterns, aiding in exploring the correlations among these modules, environmental factors, and morphological traits of each seed provenance. It helps identify groups of genes that may be influenced by specific environmental factors. Within each module significantly related to environmental and phenotypic factors, we further filtered to select genes with a KME value greater than 0.6 as key genes. These key genes are considered to play central roles within the modules and may be sensitive to environmental changes. Finally, we selected these key genes representing modules with high correlations to environmental factors to construct a Partial Least Squares Path Modeling (PLS-PM) using the R package plspm v0.5.1 (Tenenhaus et al. 2005), with the maximum iteration count set to 300. This model integrates the relationships among environmental factors, functional genes, and phenotypic changes. The PLS-PM model reveals the direct and indirect effects among these variables, offering profound insights into how gene expression is regulated by environmental factors at the seed provenance location, impacting the phenotype.