We have conducted a genomic prediction study for solid wood properties assessed on increment cores from Norway spruce trees with SilviScan derived data from pith to bark, using properties of annual rings formed up to tree age 19 years as the reference age. We also compared this with genomic prediction of proxies for density, MFA and MOE estimated with data from same trees measured at the bark of the standing trees with Pilodyn and Hitman instruments. The study was conducted on 62 open-pollinated (OP) families.
On Norway spruce operational breeding, the use of OP families is preferable because it does not require expensive control crosses. The only action required is to collect cones where progenies are typically assumed to be half-sibs. Thus, OP families permit evaluation of large numbers of trees at lower cost and efforts than structured crossing designs. We investigated narrow-sense heritability estimation with ABLUP and marker-based GBLUP and the effect on predictive ability (PA) from using different training-to-validation set ratios, as well as different statistical methods. Further, we investigated what precision of GS can be reached when training the models with data from trees of different ages, also comparing results for the solid wood properties to those for their proxies. We also estimated PAs reached when coring to different depths from the bark at different tree ages, in order to find cost-effective methods for GS with minimum impact on the trees on the acquisition of data for training the prediction models.
Narrow-sense heritability (h 2 )
In our study, PA estimates for both pedigree and marker-based methods were consistent with their respective h2 estimates. A conifer literature review indicates that the level of consistency varies across studies 8, 18−20. When comparing between the ABLUP and GBLUP methods, our estimates for density, MOE and Pilodyn the h2 were on similar levels for Velocity and MOEind higher based on ABLUP and for MFA higher for GBLUP. Using GBLUP the estimated h2 values were clearly lower for the standing tree-based methods than for measured density, MFA and MOE: 54%, 35% and 45% lower, respectively. In a previous study conducted on full-sib progenies in Norway spruce, however, the ABLUP-based h2 were reported higher in all three standing-tree-based measurements 11. Instead, other conifer studies based on full- or half-sib progenies reported a comparable performance of A-matrix and G-matrix based methods in Pinus taeda 18, 23, Douglas-fir 29 and Picea mariana 15 for growth related traits and wood properties. Moreover, ABLUP accuracies were lower for growth, form and wood quality in Eucalyptus nitens 24. Experimental design factors such as number progenies and their level of coancestry, statistical method and the traits and pedigree errors under study may account for the apparent inconsistence in the relative performance of both methods 51.
Our results indicate that for more heritable traits ABLUP and GBLUP capture similar levels of additive variance, whereas for traits with very low heritability using ABLUP, such as MFA, the markers are able to capture additional genetic variance probably in the form of historical pedigree reflected in the G matrix. Less obvious is the case for Velocity and MOEind where GBLUP seems to capture lower values of additive variance. It is possible that at intermediate values of h2 the benefits of capturing historical consanguinity is overcome by possible confounding effects caused by markers which are identical by state (IBS) or simply due to genotyping errors. The h2 values obtained with ABLUP and GBLUP is the result of a balance between multiple factors such as the genetic structure of the trait, the historical pedigree, and the possible model overfitting to spurious effects or genotyping errors.
Effects on GS model predictive ability (PA) of training-to-validation sets ratios and statistical methods
In conifers and Eucalyptus cross-validation is often performed on 9/1 training to validation sets ratio 8, 12, 15, 16, 28. This coincide with the general conclusion from the present study, with exception for MFA and MOE, for which the best results were obtained at ratio 8/2. It has been suggested that when the trait has large standard deviation, more training data is needed to cover the variance in order to get high predictive ability 52. So, for density, Pilodyn and Velocity, PA kept increasing with the size of the training set. But for other traits with smaller standard deviation, (4.44 and 2.28 for MFA and MOE), PA decreased when increasing the training set from 80–90%, which may indicate that too much noise was introduced during model training.
The fact that the estimated PAs for all the solid wood properties as measured in the lab instrument are 25–30% higher than their proxies estimated from measurements of penetration depths and sound velocity at the bark may reflect the indirect nature of their proxies: the correlations calculated for the almost 6000 trees initially sampled were between − 0.62 Pilodyn and density, -0.4 between Velocity and MFA and 0.53 between MOEind and MOE 40.
In the conifer literature it has more often been reported similar performance of different marker-based statistical models for wood properties 11, 12, 18, 28, 53. This general conclusion agrees with our findings for all our traits with the exception of Velocity and to a less extent MOEind. For these, GBLUP and rrBLUP performed better than the other GS methods, which could be the result of a highly complex genetic structure where a large number of genes of similar and low effect are responsible of the control of the trait. For traits affected by major genes the variable selection methods, for example BayesB or LASSO, have been reported to perform better 18, whereas for additive traits the use of nonparametric models may not yield the expected accuracy 54.
Comparison of PA and PC from methods based on pedigree (ABLUP) and markers
Generally, pedigree-based PA estimates in conifer species have been reported to be higher or comparable to marker-based models 11, 15, 16, 19, 20, 23, but there are also some studies reporting marker-based PA estimates to be higher 13, 24, 55. Our results for density and Pilodyn follow the general finding in forest trees, whereas for MFA, a low heritability trait, the PA estimate based on GBLUP model is substantially higher (0.16) compared to the ABLUP model (0.04). As we already discussed in relation to h2, additional genetic variance is being captured by the G kinship matrix. When PA is standardized with h, the predictive accuracies of the methods become more similar across traits, indicating that proportionally similar response to GS can be expected for all traits.
Use of tree age versus cambial age (ring number)
From a quick look at Fig. 2, one may get the impression that breeding based on cambial age data allows earlier selection than using tree age data. That would however be a too rushed conclusion. At tree age 3 years, after the vegetation period of 1993, only 12.5% of the trees had formed a first annual ring at breast height. Not until tree age 6 years, more than 90% of the trees had done so, that is by number 5 years “higher age”. In contrast, all the trees had obviously a ring of cambial age 1. But if aiming for 90% representation, one must wait several years more for more rings were formed at breast height, more precisely, from 1993 to end of growth season 1996, tree age 6. And to train models based on data from 90% of the trees for cambial age say, 6 at breast height, samples cannot be collected until the end of growth season at tree age 11 years, or if a representation of 80% is judged as satisfactory, at tree age 10 years. This has to be considered if selection efficiencies are calculated based on cambial age data, which is common. Such results have for instance been published based on the almost 6000 trees sampled 2011 and 2012 30.
Correctly compared based on minimum 90% of the trees, the estimated PAs shown in Fig. 2 are similar between the age alternatives, or slightly better for use of tree age. For example, the PA for MOE using cambial age data shows a smooth increase, reaching above 0.2 at cambial age (ring number) 7, which needs data from the tree of age 12. The corresponding curve from using tree age passed above 0.2 already at age 8 years. However, curves based on tree age often show larger year-to-year variation. This is most likely an effect of the fact that the rings of same cambial age represent wood formed across a span of years with different weather. Thus, cambial age data reflect annual weather across a range of years, which does not happen when using tree age data. On the other hand, from a practical point of view, methods based on use of tree age may be easier to apply in operational breeding, especially in light of results as in Fig. 3, indicating that high PAs can be reached without coring all the way to the pith. To number the rings for precise cambial age, you need to find the innermost ring at the pith, but that may not be necessary for good results.
Implementation of GS for solid wood into operational breeding
The results indicate that GS can result in similar early selection efficiency or even higher than traditional pedigree-based breeding and offers further possibilities. Previously, in loblolly pine it was reported that models developed for diameter at breast height (DBH) and height with data collected on 1 to 4-year old trees had limited accuracy in predicting phenotypes at age 6-year old 21. In British Colombia Interior spruce, the predictive accuracy for tree height of models trained at ages 3 to 40 years, at certain intervals, and validated at 40 years revealed less opportunities for early model training, since the plateau was not reached until 30 years 28.
In our study, the highest PA values were obtained for the subsets of fast-growing trees which had reached breast height already at tree age 3 and 4 years, 12% and 51% of the total number of trees, representing a limited number of the OP families included in the analysis. The trees of this fast-growing group are affected by high intensity of selection for alleles accelerating growth within each OP family. Also, on cross-validation the prediction abilities for this group were calculated based on the trees within the same group. In this elite group different factors could account for a higher PA value, such as lower phenotypic variance, decreased number of alleles of minor effect could also facilitate identification of major effects and/or higher consanguinity between those families which may share alleles for growth. These models are shown for completeness, but as they cannot be used for operational breeding they are not further discussed.
Models for genetic selection are useful in different steps of a breeding program. A first type of prediction models, here illustrated with Table 1, can be trained from existing trials, preferably based on trees of as old age as available, as the aim of the breeding is to predict tree qualities at age of harvesting when the major part of the stem will be dominated by mature wood. Training the models in older trees for wood properties also allows considering other properties which cannot be easily observed from trees of very young age, such as stem straightness and health. For wood density, the results indicate that models can be built without coring very deep into the stem. It may be expected that this is valid also for instance for tracheid dimensions which in combination determines the wood density 34.
Due to the juvenility of the trees used in our study, the wood includes considerable proportions of both high MFA core wood and of lower MFA outer wood, with a pronounced shift in between. The results indicate that for such trees, information from the area around where the shift occurs is more useful for training of the GS model than information from the outermost rings, albeit resulting in generally low prediction abilities. This is an unexpected result. It may be expected that models for MFA and MOE trained on older trees would rely more on data from the mature wood outside the shift, similar to the models for density. Apart from models predicting traits of mature wood, ideally in trees of harvesting age, there is also an interest for models focusing on local features which can be detrimental for quality of wood-based products, such as traits of the inner core with high MFA and low MOE, and the size of this core with problematic properties for many solid wood products.
As illustrated in this work, two aspects of incorporating wood properties into operational GS breeding programs can be addressed with the same set of data. Firstly, as mentioned above, models for cost-effective selection based on genomic information from existing trees. In that case, models from data at old ages would normally be preferred, for example for wood density some model at bottom line of Fig. 3a. Secondly, models providing guidance on at what age it is reasonable to approach young trees for training of GS models for specific traits: a) trees in existing juvenile trials, or b) trees of new generations with different pools of genetics. As an example, the same Fig. 3b for wood density suggests GS model training at tree ages 10 to 12 on the third to fifth outermost rings to reduce costs and the negative impact on the tree.