Safflower (Carthamus tinctorius L.) is a member of the Compositae family and is grown as a vegetable, cut flower, herbal medicine, animal feed, birdseed, and oilseed crop in over 60 geographical regions across the Middle East, Africa, the Americas, Europe, and Asia [1]. In recent years, with growing demand for healthy cooking oil, clean biofuel, and bio-lubricants, safflower has emerged as a modern industrial oilseed crop owing to its higher oleic and linoleic acid content compared with other oilseed crops [2, 3]. According to FAO data, worldwide safflower seed production in 2019 was approximately 0.6 million tonnes, and the four largest growers (Kazakhstan, United States, Russian Federation, and Mexico) accounted for over 75% of total production [4]. The Australian safflower growing area is currently about 40,000 ha, down from its peak of 74,688 ha in 1979 [5]. Because it has the potential to grow in drier environments, safflower is attracting increasing research attention [6].
To date, genetic analyses of agronomic traits in safflower have largely been undertaken using conventional family-based methods. These have allowed the identification of genes and quantitative trait loci (QTL) for traits such as plant height (PH), seed oil content, and days to flowering (DF) [7–9]. Association mapping approaches have also been used to identify QTL in safflower. Six marker-trait associations (MTAs) for PH and five MTAs for DF were identified in an association study using microsatellite markers [10], while another study using amplified fragment length polymorphism (AFLP) markers detected four MTAs for PH under drought conditions [11]. The Fad2 gene family (fatty acid desaturases, FAD) in safflower has been sequenced, and its member genes have been isolated and cloned [12, 13]. In an association study using safflower core collections genotyped with simple sequence repeat (SSR) markers, several MTAs were discovered for oil content, oleic acid content, and linoleic acid content [10]. However, no genome-wide association studies (GWAS) based on single nucleotide polymorphism (SNP) markers have been reported in safflower.
The statistical methods used in GWAS are important for identifying MTAs for complex traits [14, 15]. Single-locus GWAS with mixed linear models (MLM-GWAS) has been widely used to detect MTAs for agronomic traits in a variety of plants, including wheat [16], rapeseed [17], and soybean [18]. To increase the power to discover SNPs with small effects and to reduce false-positive associations, summary statistic-based methods (meta-GWAS) have been adopted in some studies [19, 20]. In canola, a meta-GWAS analysis identified 79 candidate genomic regions potentially conferring resistance to blackleg disease and detected more significant SNPs than single-locus GWAS [21]. In contrast to single-locus MLM-GWAS, which tests one marker at a time, multi-locus GWAS methods fit all loci simultaneously to improve fine-mapping [22, 23]. BayesR is a multi-locus Bayesian method that accommodates all SNPs in the model simultaneously and assumes that SNP effects follow a mixture of four normal distributions, covering SNPs with zero, small, and moderate effects. Across these distributions, progressively fewer SNPs each explain a progressively larger share of the genetic variance [24, 25]. BayesR has been used to identify QTL or associations in dairy cattle and wheat [26, 27].
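The contrast between single-locus MLM-GWAS and BayesR can be written out as a short sketch. The notation below is introduced here for illustration only and is not taken from the cited studies. The four variance classes shown are the values commonly used in the BayesR literature and are an assumption, since the text does not state them.

```latex
% Single-locus MLM-GWAS: marker j is tested one at a time
\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{s}_j a_j + \mathbf{u} + \mathbf{e},
\qquad \mathbf{u} \sim N(\mathbf{0}, \mathbf{K}\sigma_u^2),
\qquad \mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)

% BayesR: all m SNPs are fitted jointly under a four-component mixture prior
% (variance classes 0, 10^{-4}, 10^{-3}, 10^{-2} are assumed, not from the text)
\mathbf{y} = \mathbf{X}\mathbf{b} + \sum_{j=1}^{m} \mathbf{s}_j \beta_j + \mathbf{e},
\qquad
\beta_j \sim \pi_1 N(0,\, 0 \cdot \sigma_g^2) + \pi_2 N(0,\, 10^{-4}\sigma_g^2)
         + \pi_3 N(0,\, 10^{-3}\sigma_g^2) + \pi_4 N(0,\, 10^{-2}\sigma_g^2),
\qquad \sum_{k=1}^{4} \pi_k = 1
```

Here y is the vector of phenotypes, Xb the fixed effects, s_j the genotype vector of SNP j, K a kinship matrix, and σ_g² the total genetic variance. The first mixture component is a point mass at zero, so most SNPs can be assigned no effect, while the remaining components allow a small number of SNPs to carry larger effects.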
The variation in phenotypes among genotypes across different environments reflects the extent of the genotype-by-environment (G × E) interaction, which is also referred to as the trait's phenotypic plasticity [28]. Identifying G × E interaction patterns and their genetic basis in multi-environment trials can deepen knowledge of the genetic architecture of traits [29, 30]. In a canola study, 12 environment-stable QTL and 43 environment-specific QTL were detected for flowering time under three different ecological conditions, providing new insights into the genetic regulatory network controlling flowering time [31]. Few studies of G × E interaction patterns have been reported in safflower, and those that exist focused on evaluating genotypes and yield stability [32, 33].
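To make the idea of G × E interaction concrete, the following is a minimal sketch that splits a balanced table of genotype-by-environment trait means into genotype, environment, and interaction sums of squares. The function name `gxe_decomposition` and the two small data sets are hypothetical and chosen only for illustration; this simple two-way decomposition is not the analysis used in the studies cited above, which would typically rely on mixed models.

```python
from statistics import mean


def gxe_decomposition(table):
    """Split a balanced genotype-by-environment table into additive parts.

    table: dict mapping genotype name -> list of trait means,
           one value per environment (same environment order for all).
    Returns sums of squares for genotype, environment, and interaction,
    plus the interaction residual for every genotype/environment cell.
    """
    genotypes = list(table)
    n_env = len(table[genotypes[0]])
    grand = mean(v for row in table.values() for v in row)

    # Main effects: deviation of each genotype/environment mean from the grand mean
    g_eff = {g: mean(table[g]) - grand for g in genotypes}
    e_eff = [mean(table[g][j] for g in genotypes) - grand for j in range(n_env)]

    # Interaction: what remains after removing the grand mean and both main effects
    inter = {
        g: [table[g][j] - grand - g_eff[g] - e_eff[j] for j in range(n_env)]
        for g in genotypes
    }

    return {
        "ss_genotype": n_env * sum(v ** 2 for v in g_eff.values()),
        "ss_environment": len(genotypes) * sum(v ** 2 for v in e_eff),
        "ss_interaction": sum(v ** 2 for row in inter.values() for v in row),
        "interaction": inter,
    }


# Hypothetical plant heights (cm) for two genotypes in two environments.
# Additive case: both genotypes respond identically to the environment.
additive = {"A": [100, 110], "B": [90, 100]}
# Crossover case: the genotype ranking changes between environments.
crossover = {"A": [100, 110], "B": [110, 100]}

additive_result = gxe_decomposition(additive)
crossover_result = gxe_decomposition(crossover)
```

In the additive table the interaction sum of squares is zero because genotype rankings are stable, whereas in the crossover table all of the variation falls into the interaction term. Environment-stable and environment-specific QTL correspond loosely to these two kinds of pattern.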
In Australia, crop production is challenged by spatial drought patterns arising from seasonal rainfall and high temperatures [34]. Therefore, understanding the G × E interaction and the genetic basis underlying grain yield and related agronomic traits is important for safflower breeding. In this study, a globally diverse genebank collection of 406 accessions was grown in four field environments (two trials at one location under different field management in 2017, and two locations in 2018). We examined the G × E patterns for grain yield (YP), plant height (PH), days to flowering (DF), 500-seed weight (SW), seed protein content (PR), and seed oil content (OL). GWAS was conducted for each environment using three methods. The aims were to: 1) assess genetic variability in the different environments and the level of G × E interaction for the agronomic traits measured; and 2) identify MTAs for these traits and the genetic basis of their G × E interaction.