With the SSG model, we augmented the scarce student BD dataset, guided by its intrinsic statistical features. The DT can then be exploited for its intuitive ability to extract hidden information from complex data in a form that humans readily understand. For comparison, Fig. 4a and Table S2 show the performance of the DT trained on the augmented dataset, evaluated by receiver operating characteristic (ROC) curves and the F1 score. The DT trained on the augmented dataset generalizes better than the one trained on the scarce dataset, with area under the curve (AUC) values of 0.90 for the SSG (with FS) and 0.76 for the scarce dataset.
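As a rough illustration of the two metrics reported here, the following sketch computes the AUC as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, and the F1 score from hard predictions. The function names and the toy labels and scores are assumptions made for illustration; they are not the evaluation code or data of this study.

```python
def roc_auc(y_true, scores):
    """AUC via pairwise ranking: P(score of positive > score of negative)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical values, not from the study)
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(roc_auc(y_true, scores), f1_score(y_true, y_pred))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the values of 0.90 and 0.76 above should be read.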
To gain insight into the dataset, we visualized the optimized DT trained on the augmented dataset as a tree diagram. In Fig. 4b, the topmost node, which has a Gini index of 0.5, corresponds to “happiness,” and it branches into two child nodes corresponding to “subjective norm” and “intention.” The tree grows until its leaf nodes reach a Gini index of zero (see https://github.com/lchlyw/SSG_augmentation for the fully grown trees). Generally, features nearer the topmost node are better at classifying BD behavior. As such, positive emotion (happiness) is the most important factor in predicting the students’ BD behavior. The results show that students are more likely to engage in BD when they are in a positive mood and surrounded by an encouraging environment. Students may feel happy for many reasons; one good example is the end of the academic term. With peers’ encouragement or approval, it might be difficult to resist the temptation of excessive drinking, which agrees with previous research on enhancement motives and social-contextual factors49, 50.
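For readers unfamiliar with the Gini index, the sketch below shows how it is computed for the samples at a single node. The function name and the toy labels are assumptions for illustration only. For two classes, the index reaches its maximum of 0.5 when the classes are evenly split, and it is zero when a node contains a single class.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini index of one node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical labels, not the study data
balanced_node = ["BD", "BD", "non-BD", "non-BD"]  # evenly split node
pure_node = ["BD", "BD", "BD"]                    # single-class leaf
print(gini_impurity(balanced_node))  # 0.5
print(gini_impurity(pure_node))      # 0.0
```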
In contrast, from the neuroscience perspective, negative emotions, such as stress, are often hypothesized to be indicators of BD behavior6, 51. According to the rodent alcohol dependency model, the brain releases the corticotropin-releasing factor signal during a BD session, which governs both the desire for alcohol and anxiety51–53. Such negative reward-seeking behavior can further intensify the negative emotional state and increase alcohol consumption.
However, the current study shows that stress is only the third most important factor in BD behavior, after happiness and subjective norms. We propose two possible reasons for this observation: (1) Our survey was conducted during the university holiday (June 2020), at the start of the summer semester. Students therefore likely did not feel much academic stress, making stress less significant than happiness in determining their BD behavior. (2) Rather than serving as a tool to cope with stress, students’ drinking occasions are usually related to social activities, which are often motivated by positive enhancement.
We completed the design-to-device pipeline by transferring the knowledge captured in the DT diagram, extracted from complex and abstract data, into an interactive mobile-app-based chatbot. Figure 4c illustrates the interface of an accessible Telegram chatbot (@DrinkingBehavior_bot) built with Dialogflow. Following the DT classification diagram, the chatbot uses a simple yes/no question–answer format related to BD behavior. Because the current study aims only to lay the foundation for building a viable problem-specific data augmentation model, a clinical trial is needed before the chatbot can be officially used as a professional diagnostic tool for mental health issues.
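The mapping from a DT to a yes/no dialogue can be sketched as a walk from the root to a leaf, asking one question per internal node. The code below is a minimal, library-free illustration and does not use Dialogflow or Telegram. The `Node` class, the `classify` function, the question wording, the branch directions, and the leaf labels are all hypothetical; they do not reproduce the tree in Fig. 4b or the deployed chatbot.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Node:
    question: Optional[str] = None  # yes/no question; None for a leaf
    yes: Optional["Node"] = None    # child followed on a "yes" answer
    no: Optional["Node"] = None     # child followed on a "no" answer
    label: Optional[str] = None     # outcome; set only for a leaf


def classify(root: Node, answers: Iterable[bool]) -> Tuple[str, List[str]]:
    """Follow yes/no answers from the root to a leaf.

    Returns the leaf label and the list of questions that were asked.
    """
    remaining = iter(answers)
    asked: List[str] = []
    node = root
    while node.label is None:
        asked.append(node.question)
        node = node.yes if next(remaining) else node.no
    return node.label, asked


# Hypothetical two-level tree with placeholder questions and outcomes
TOY_TREE = Node(
    question="Have you felt happy lately?",
    yes=Node(
        question="Do your friends approve of heavy drinking?",
        yes=Node(label="higher BD risk"),
        no=Node(label="lower BD risk"),
    ),
    no=Node(
        question="Do you intend to drink heavily soon?",
        yes=Node(label="higher BD risk"),
        no=Node(label="lower BD risk"),
    ),
)

label, asked = classify(TOY_TREE, [True, False])
print(label, asked)
```

In a chatbot, each answer would arrive interactively rather than as a prepared list, but the traversal logic is the same: the number of questions a user sees equals the depth of the path taken through the tree.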
Limitations of the Current Work. Our study has several limitations. First, because our data were collected at a single location (KAIST), data from other universities are needed to validate our results. Second, although ML is often regarded as a black box that can uncover hidden patterns within a dataset, it is always difficult to extrapolate into unseen regions. Therefore, our SSG system can only generate new data within preset boundaries, with statistical characteristics similar to those of its training dataset. Third, to increase the Cronbach’s α, the questions in the perceived stress scale, subjective happiness scale, and perceived control assessments should be modified. In addition, to capture a wider variance of behavior, more feature vectors should be included in the data collection process.
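For reference, Cronbach’s α for a scale with k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The sketch below implements this standard formula with sample variances; the function name and the toy responses are assumptions for illustration and are not the survey data of this study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of respondents and columns of scale items."""
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: three respondents, two perfectly consistent items
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))
```

Items that vary together raise the total-score variance relative to the summed item variances and therefore raise α, which is why revising poorly aligned questions is expected to improve it.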
Future work could administer this system in an actual clinical trial, in which the SSG-generated data can be compared with actual observations to validate their accuracy. Based on the proposition that students’ emotional states and social norms play an important role in motivating BD, intervention designs should focus on these two factors.