This study applied a decision tree technique to establish a multilevel, data-driven model that predicts PA behavior, defined on the basis of activity profiles, and to methodologically identify the correlates of PA behavior. The decision tree was fitted using 168 factors from different domains as input variables to predict physically active and inactive individuals. The final model selected 36 factors from different domains, which formed 54 subgroups of subjects. The factors emerging from the decision tree model, such as body fat percentage, normalized heart rate recovery 60 seconds after exercise, urban-rural area, average weekday total sitting time, and extravagance score, were associated with SED, LPA, and/or MVPA time. The multilevel model can potentially inform both the allocation and the design of multilevel interventions, since it specifies the correlates of PA behavior at different levels in each subgroup.
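As a rough illustration of how a decision tree keeps only some of its input factors and partitions subjects into subgroups (leaves), the following minimal sketch grows a small classification tree on invented data. It is not the fitting procedure used in this study: the splitting criterion (Gini impurity, CART-style), the depth limit, the function names, and all data values are assumptions made for illustration only. The three toy input columns stand for body fat percentage, weekday sitting hours, and an uninformative noise variable.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best

def fit(rows, labels, depth=0, max_depth=3):
    """Grow the tree recursively; each leaf stores the majority class."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    j, t = split
    left = [i for i, r in enumerate(rows) if r[j] <= t]
    right = [i for i, r in enumerate(rows) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": fit([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
        "right": fit([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Follow the splits from the root down to a leaf."""
    while "leaf" not in tree:
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["leaf"]

def count_leaves(tree):
    """Number of leaves, i.e. subgroups of subjects."""
    if "leaf" in tree:
        return 1
    return count_leaves(tree["left"]) + count_leaves(tree["right"])

def used_features(tree):
    """Indices of the input factors that the tree actually selected."""
    if "leaf" in tree:
        return set()
    return {tree["feature"]} | used_features(tree["left"]) | used_features(tree["right"])

# Invented toy data: [body fat %, weekday sitting hours, noise variable]
rows = [
    [18, 5, 3], [20, 9, 1], [22, 7, 4], [24, 10, 2], [28, 4, 3],
    [30, 5, 1], [29, 9, 4], [33, 8, 2], [35, 10, 3], [31, 11, 1],
]
labels = ["active"] * 6 + ["inactive"] * 4

tree = fit(rows, labels)
print("selected factor indices:", sorted(used_features(tree)))
print("number of subgroups:", count_leaves(tree))
```

On this toy data the tree never splits on the noise column, which mirrors, at a much smaller scale, how only a subset of the input factors ends up in a fitted model and how each leaf corresponds to one subgroup with its own path of splitting factors.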
In agreement with the results of prior studies focused on understanding the causation of PA behaviors (6,9,12), the model established in the present study indicates that PA behavior is explained by a multilevel hierarchy composed of various factors from different domains. However, our results go beyond previous reports by showing that the predictors of PA behavior differ between subgroups and come from various domains. This is new and noteworthy because it has potential implications for designing targeted, multilevel interventions, including effective screening of subgroups followed by the identification of key modifiable factors for each subgroup. The data-driven nature of our model is also important, since prior studies have conceptualized the influences on PA behaviors mainly through a theoretical combination of common sense and well-established evidence, and have therefore primarily provided a broad view of PA behavior and its causation for general populations (5,9). Even though previous multilevel models have been successful in hypothesizing the interaction between factors of different domains, their implications in practice have remained limited (9), partially due to their theoretical nature. Two studies have applied a data-driven approach and established a decision tree-based model, but with self-reported measurement of PA and a limited number of factors: one used only demographical factors (29) and the other only sociodemographic factors (30).
The factors that emerged in the decision tree model included well-established, less established, and novel (unknown) correlates of PA behavior with regard to previous studies focusing on both identifying and prioritizing the correlates of PA and sedentary behavior (5,6,9,10,31). The well-established correlates are factors that have been assessed in several studies and recognized as correlates. Most of the demographical, psychological, and environmental factors in the decision tree model have been recognized in past work as factors associated with PA behavior (5,6,10). Examples of these factors are education level, profession, overall health status, fitness status, and population density. The decision tree model also included some less established factors, such as those related to personality and temperament, for example extravagance, impulsiveness, and explorative excitability (5). Such factors (or factors similar to them) have been assessed in a few or several studies but, mostly due to limited or sometimes contradictory evidence, have been neither identified as correlates nor rejected. The body composition measures (i.e., lean body mass and skeletal muscle mass) and a few of the psychological and environmental factors, such as enjoyment of daily activities and number of road accidents, can also be categorized as less established factors (5,9,31). The decision tree model also included a few measures related to heart rate recovery. Even though the association of PA with heart rate recovery measures has been well studied (32), these measures can be considered novel factors associated with PA behavior identified in the present study, because our results indicate the existence of another direction of relationship that has not been previously examined.
The less established and previously undiscovered factors found here are potential candidates for the next generation of correlates. These factors were selected by the decision tree for the final model from a wide list of input (independent) variables. This suggests that the less established and novel factors that emerged in the decision tree model might be relatively more important correlates and are likely surrogates for other less established or well-established factors that were not selected by the decision tree, such as behavioral attributes (e.g., alcohol, smoking, etc.) or socioeconomic status (5). The less established and novel factors found here have most probably remained underreported (or unexamined) due to a subjective tendency in the existing literature towards examining only those factors for which there is very well-established evidence of significant associations (positive or negative) with different indices of PA behavior (10). This has resulted in numerous correlates studies that typically focus only on psychosocial and environmental factors (6,10) and, accordingly, in calls for correlates research across different domains to discover the next generation of correlates (6).
Nevertheless, the relative importance of the factors that emerged should be inferred with caution. In at least some cases, a factor may have appeared in the final model for reasons other than being more important than the factors that did not appear. First, the study’s subjects had a narrow age range (46–48 years). This might explain why some very well-known correlates of PA behavior, including age and gender, did not appear in our final model (5,6,10,33). According to a previous systematic review (10), age was inversely associated with PA participation, and when the age of the study’s subjects was diverse enough, there were significant differences in PA participation between men and women (higher in men). Second, the outcome variables in our study were defined differently compared to prior works. To date, there is no agreed-upon approach to differentiate between individuals with different PA behaviors. Previous works have used different, sometimes arbitrary, cut-offs (e.g., 150 minutes of MVPA per week, more than 7.5 hours of SED time, etc.) to define PA behavior (30,34). It has been argued that cut-offs are inappropriate for defining PA behavior because they do not reflect the whole range of activity intensities performed by individuals in everyday life (8). Instead, we defined PA behavior based on activity profiles derived from a range of activity intensities including SED, LPA, and MVPA, assessed objectively over the course of one full week (26). Therefore, the relative importance of the factors that appeared in the final model should be interpreted with respect to our definition of PA behavior, which is more complex and spans activity across the intensity spectrum.
Body fat percentage, a direct measure of adiposity, was the most important factor explaining PA behavior in the decision tree model. Even though it is typically assumed that PA impacts adiposity-related measures, this result is consistent with the findings of a previous systematic review that suggested a possible bidirectional relationship between adiposity and PA behavior on the basis of satisfactory evidence from longitudinal studies in which PA was predicted by adiposity (6). Several other factors for which the opposite direction of the relationship is generally assumed, including muscle strength and heart rate recovery measures, also appeared in other layers of the final model. Of note is the prognostic value of most of these factors for several chronic health conditions. For example, attenuated heart rate recovery is associated with an increased risk of diabetes (35) and can even indicate the presence of coronary artery disease (36). Another example is low muscle mass, which is associated with an increased risk of type 2 diabetes, metabolic syndrome, and coronary artery disease (37). Chronic health conditions have been identified both as barriers to and as motivators of PA in different populations (38). Even though the self-reported measures addressed the prevalence of diagnosed diseases (e.g., having diabetes, hypertension, etc.), these direct measures were eliminated from the list of input variables due to the high number of missing values. In addition, the study’s subjects did not consist of only healthy individuals. As a result, the factors in our model that have prognostic value for chronic diseases may be acting as partial surrogates for chronic health conditions or risks and their effects on different PA behaviors.
We also performed an association analysis between all the factors that emerged in the decision tree model and three PA metrics. Almost all of these factors were significantly associated with SED, LPA, and/or MVPA. The results of the association analyses were, at least for the well-established factors, in line with previous studies. For instance, a better health-related quality of life score was associated with lower levels of SED (39) and higher levels of LPA and MVPA (5). The results of the association analyses also indicated the relative importance of the identified factors, supporting the use of our results to prioritize the factors associated with PA behavior.
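To make the direction of such associations concrete, the sketch below computes a rank correlation between one factor and two PA metrics. The statistic (Spearman rank correlation), the function names, and all data values are assumptions chosen for illustration; this section does not state which association measure was used, and the numbers are invented rather than taken from the study.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Invented values for eight hypothetical subjects.
quality_of_life = [55, 60, 62, 70, 75, 80, 88, 90]       # questionnaire score
sed_minutes = [640, 655, 600, 610, 570, 580, 540, 520]   # daily sedentary time
mvpa_minutes = [20, 25, 22, 35, 40, 38, 55, 60]          # daily MVPA time

print("QoL vs SED: ", round(spearman(quality_of_life, sed_minutes), 2))
print("QoL vs MVPA:", round(spearman(quality_of_life, mvpa_minutes), 2))
```

With these invented values, the coefficient is negative for sedentary time and positive for MVPA, the same direction as the quality of life example described above. A full analysis would also need a significance test and, given the number of factors, some handling of multiple comparisons.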
The main strength of the present study is the inclusion of a wide list of potentially modifiable factors rather than a few subjectively selected factors (6,10), which resulted in the discovery of novel predictors. The use of objective measurement of daily PA is also a strength, since previous studies have typically used self-reported PA measures that are known to be imprecise and biased (40). Another strength is the discrimination of PA behaviors based on activity profiles built using the whole activity intensity spectrum over the course of one full week (8).
This study is not without limitations. The study design was cross-sectional, and therefore the causality of the identified factors remains unknown. The causality of the identified factors, especially the less established and novel ones, needs to be tested in future studies. Additionally, some of the factors that emerged in the final model were related to cultural and health behaviors. The results may therefore be specific to the current study population and may not generalize to populations with different cultural and health behaviors. Further studies are required to confirm the generalizability of the identified factors across different populations.