DDdeep: deep learning-based text analysis for depression illness detection on social media posts

doi:10.21203/rs.3.rs-2313393/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

Depression has recently emerged as one of the most prevalent mental health disorders in the world. Social networks are a valuable resource for mental health research because users tend to share their thoughts and feelings there, and text analysis of user posts with neural networks is increasingly used for such research. Neural networks have achieved significant success in text analysis because they can automatically extract distinguishing features from data. However, existing neural network methods often ignore the temporal and sequential nature of users' posts on social networks, which affects the accuracy of the results. This shortcoming prompted us to present a more efficient method that takes the sequential and temporal nature of social media users' posts into account. We therefore propose a deep learning-based hybrid method called DDdeep. Our method has three main features: (1) text analysis that relies on the temporal and sequential nature of posts, (2) identification of depressed users solely from how they use language, and (3) memory of earlier decisions, because each post depends on previous posts. DDdeep integrates a convolutional neural network (CNN), which extracts the most important features, with long short-term memory (LSTM), which remembers previous decisions. Our method identifies depressed users with 78% precision, 70% recall, and a 73% F1-score. These results are acceptable and competitive with other established methods in this field.

text analysis

deep learning

social networks

depression detection

Mental disorders are an important challenge in public health care (Astleitner et al., 2023). Depression is one of the most common mental disorders, and suicide and self-harm are important issues in this field (Ben Hassine et al., 2022; Kabir et al., 2022). Depression and anxiety disorders can cause suicidal tendencies or even suicide attempts (Faisal et al., 2022; Marcus et al., 2012). Extensive studies on detecting mental disorders have been performed, and reviews of them show that suicide is the second leading cause of death among people aged 15 to 25 years globally (Faisal et al., 2022; Hamilton, 1967). The literature shows that, because of their impact on society, mental disorders require prevention strategies, and that early diagnosis is a fundamental step toward improving the situation of people with these disorders. The literature also shows that several validated questionnaires have been designed to examine mental health and to analyze specific emotional patterns or social interactions so that appropriate action can be taken (Hamilton, 1967; Jokelainen et al., 2019; Wang et al., 2022).

The popularity of social media for sharing user-generated data has been increasing in recent years. The users update their status, upload photos, share their geographic location, or interact with others by commenting on others' posts and creating conversations. Through these interactions, users can express their feelings and thoughts and report on their daily activities, which can provide rich information about a person's social behavior (De et al., 2022; Zucco et al., 2020). Users turn to online forums to express feelings and seek social support about mental health concerns. These support communities include people such as mental health professionals, trained volunteers, or experienced users. The task of the support groups is to check the posts of the members of the association so that if there are signs of the risk of mental disorders, they can take measures to support the members.

Unlike physical diseases, which can be diagnosed from laboratory results, mental disorders are diagnosed from expert judgment and self-reported information. As a result, the usual clinical procedure, which relies on these two kinds of information, may not give an accurate picture of the patient's condition (Su et al., 2020).

Recently, many researchers have used machine learning techniques to explore and process a large amount of social network data. The generation rate of data is growing at a high speed in social networks. Data mining techniques allow researchers to extract information from complex data sets. These techniques help to interpret data and generate predictive models in financial (Li et al., 2020; Nassirtoussi et al., 2014), economic (Keyvanpour et al., 2020; Yu et al., 2014), political (Jungherr, 2016), and medical (Keyvanpour et al., 2022; Mehrmolaei and Keyvanpour, 2019; Sood et al., 2021) domains and social networks (Dai and Hao, 2017; Khetarpaul, 2021; Taghvaei et al., 2021). Since users express their feelings and moods on social networks daily and impartially, data mining and machine learning techniques can be used to develop automatic diagnosis systems for mental disorders. For example, unusual activities and unusual patterns extracted by text mining, exploring social network interactions, and photo analysis can be used to diagnose mental disorders (Kumari, 2022).

In recent years, many studies have aimed to identify depressed people in social networks (Bouarara, 2021; Hemanandhini and Padmavathy, 2022; Kim et al., 2021; William and Suhartono, 2021). For example, in (Bouarara, 2021), the authors used a recurrent neural network (RNN) to monitor danger conditions such as suicide, stress, or other psychological problems by analyzing tweets. They asserted that the RNN gives better results than other techniques in the literature, such as decision trees and naïve Bayes, for detecting mental disorders. In (William and Suhartono, 2021), the authors prepared a systematic review of learning-based approaches to identifying depressive disorder in social network users. As a general result, they stated that deep learning approaches have recently been widely used to identify disorders in social network users. This popularity seems logical because the data modeling capabilities of deep learning approaches can overcome many of the limitations of earlier data and techniques (Cong et al., 2018; Su et al., 2020).

Investigating the proposed methods of previous studies in the field of identifying mental disorders reveals that some of those methods are weak in feature extraction and others cannot model the data well due to ignoring the temporal and sequential nature of users' posts on social networks. The mentioned weaknesses have affected the accuracy of the classifiers and therefore the previous classification methods do not achieve proper accuracy when classifying depressed people (William and Suhartono, 2021).

Therefore, this research presents a deep learning method for depression diagnosis that combines a convolutional neural network (CNN) with long short-term memory (LSTM). The proposed method, called CNN-LSTM, addresses both of the aforementioned challenges: the CNN extracts the features of posts, and the LSTM improves the efficiency of the method by using the temporal and sequential nature of the posts. The proposed model is trained on the textual content of users' posts on the social network and identifies depressed people by learning linguistic patterns in the text of their posts. The considerable strength of convolutional layers in deep networks prompted us to use a CNN for efficient feature extraction. The main innovation of this research is the use of LSTM alongside the CNN to remember the temporal sequence of social network posts in order to better identify depressed people.

The rest of the article is organized as follows. The second section presents a general and formal definition of the problem of mental disorder detection. The third section reviews the literature to provide an understanding of previous ideas. The fourth section presents the proposed method and its components. The fifth section describes experiments designed to evaluate the efficiency of the proposed method. The sixth section discusses the superiority of the proposed method's efficiency over previous related methods. Finally, the results of the current research and examples of future work are highlighted.

In this section, the main goal is to provide a general and formal definition for the problem of detecting mental disorders. Data explorers use machine learning and statistical models to discover hidden patterns in massive and big data (De et al., 2022; Tan et al., 2016). In social network mining, the content produced by users is used to extract patterns about each person or the relationship between people (Ayadi et al., 2022; Hao et al., 2018). Most of the extracted patterns are used to make advertising decisions or to be used in different research.

A review of previous studies shows that the exploration of social networks has been performed in two ways, structural and content. In the first type, social networks have been explored from the perspective of links and structures in areas such as the discovery of frequent subgraphs and dense subgraphs. In the second type, mining techniques are focused on mining different types of content, such as text and images, and people's interactions (Tabassum et al., 2018).

Social networks are an example of environment where people freely share their thoughts, feelings, and moments in their lives. Therefore, processing the information of posts of social network users can provide unbiased and close to real results in the field of mental health analysis and diagnosis of mental disorders for researchers. By applying the concepts and techniques related to data mining and machine learning, researchers can perform significant actions in line with the automatic diagnosis of mental disorders in social networks. The advantages of mining social networks to diagnose mental disorders include early diagnosis, timely intervention to solve the problem, low-cost diagnosis, the possibility of continuous monitoring of the patient and elimination of bias from the report (Conway and O’Connor, 2016; Wongkoblap et al., 2022).

Because of the importance of mental health problems, in this research we examine the detection of depression in social network users by exploring their posts with learning techniques. The problem under study can be formulated as follows:

$$f=\left\{x|F\left(x\right)=l\right\}$$

Where $x$ is defined as $x=\left\{\forall\, \text{user}\ \exists\, \text{set of terms}\right\}$ and $l$ is defined as $l=\left\{\text{set of labels: depressed, control}\right\}$. A general schema of the problem definition is presented in Fig. 1.

As seen in Fig. 1, the output of the mental disorder detection problem in social networks depends on the number of disorders investigated. Since only depression is detected in this research, the result is a binary classification, as highlighted in Fig. 1 (depressed and control classes). Here, $F(x)$ is a function that receives the input and identifies mental disorders by applying the chosen learning technique.

As observed in Fig. 1, a hybrid technique based on deep learning is presented to solve the problem of diagnosing users with depression in social networks. The reason for this choice is the successful and acceptable performance of neural networks in the field of text processing in social networks (Babu and Kanaga, 2022; Su et al., 2020).

In most of the previous studies, people's information including images, text, and social interactions of network users are examined to detect mental disorders of users through social networks mining. As shown in Fig. 1, we processed the text of users' posts in this study. To extract features from the text of posts, there are different methods such as sentiment analysis, emoticon interpretation, and language features. In this article, we have used linguistic features to extract features from the text of users' posts.

In this article, the problem of detecting depression through text mining of users' posts on social networks is investigated in order to support the mental health of society. The literature shows that there have been extensive studies on detecting mental disorders in social networks (Bauer et al., 2018; Hassantabar et al., 2022; Wongkoblap et al., 2022), and a glance at the previous methods reveals a variety of effective ideas. It is therefore necessary to become familiar with the problem-solving space and the previous solutions in order to present a more efficient and effective method. Accordingly, this section has two goals: first, to classify the previous studies according to the scope covered by each method; second, to review examples of previous experimental studies in order to become familiar with the variety of earlier ideas in this field.

According to the first goal, we present a classification of previous studies on mental disorder detection based on the scope covered by the methods. This classification is one of the contributions of our work. It divides mental disorder detection methods into four general classes, as shown in Fig. 2.

As seen in Fig. 2, the methods are divided into four classes based on the scope they cover. Methods in the prevention class can help in the early diagnosis of mental disorders. This class of methods can be useful for preventing the progression of a mental disorder, preventing an increase in its severity, or even preventing suicide. For example, one study helped experts prevent suicide by finding early signs of suicidal tendencies (Homan et al., 2014). In another study, this class of methods was used to find signs of the disorder months before diagnosis by experts, enabling timely action to prevent its progression (Reece et al., 2017).

In the prediction class, studies predict mental disorders in social networks from signs in the data (Huang, 2022; Large, 2022). These signs can be hidden in the texts or photos posted by the user or in the user's interactions with others. Signs of mental disorders in text include direct mentions of the name of the disorder, references to symptoms such as lack of sleep or anorexia, and linguistic characteristics of the user such as the frequent use of negative sentences. Image-related features include the tendency to use certain color combinations. The user's interactions with other users can also help to investigate mental disorders.

In the monitoring class, studies monitor the status of a pre-confirmed disorder. This class of methods can help doctors to check and control the patient's condition as well as treat the patient (Bauer et al., 2018; Fuller-Tyszkiewicz et al., 2018).

Some studies deal with intervention in the course of mental disorders. This intervention takes two forms: warning and treatment. In warning-oriented intervention, the proposed systems generate warnings that either inform the doctor of the patient's condition (Guntuku et al., 2017) or inform those around the patient of his or her condition (Jia, 2018).

In studies based on treatment-oriented intervention, the intervention takes place in the form of treatment of mental disorders. For example, in (Park et al., 2015), after the diagnosis of the disorder, a free consultation link is sent to the patients for treatment. In another research, users are sent a link to information and services related to mental health (Wongkoblap et al., 2017).

According to the second goal, we review some experimental studies on mental disorder detection using machine learning methods.

(Wongkoblap et al., 2022) have analyzed the big data available in social networks in the field of mental health. The authors have stated that although social networks can be a powerful source in mental health-related research, the topic of exploring textual big data in these networks will be a challenging issue.

In (Su et al., 2020), Su et al. presented a review article focused on the use of deep learning in identifying mental disorders. The authors stated that their review follows three main directions: “investigating techniques”, “identifying challenges”, and “presenting several suggestions for better using deep learning techniques in mental health problems”. They investigated the overall function of most deep learning techniques in the field of mental health and then identified challenges in the problem under study.

In (Yates et al., 2017), the authors proposed methods to identify text posts harmful to the mental health of society. They believe that identifying depressed people and devising supportive measures to prevent them from harming themselves and others can play an important role in ensuring the mental health of society. The authors presented a neural network architecture for classifying the texts of users' posts in social networks, which leads to the identification of depressed people at risk of self-harm. In their architecture, each input is processed by a convolutional network, and after these processed inputs are merged, a representation vector of the user's activities is built.

(Cong et al., 2018) presented a deep learning-based technique called X-A-BiLSTM for depression detection on imbalanced social network data. The proposed method has two main parts: XGBoost and an Attention-BiLSTM neural network. According to the authors, the first part reduces the data imbalance and the second part increases classification power. Like other related studies, the authors evaluated their method on the RSDD (Reddit Self-Reported Depression Diagnosis) dataset. Based on the reported results, they claimed that their method performed better than the other methods tested.

In (Bouarara, 2022), the authors considered the main goal of their research to identify the behavior of people suffering from mental problems among Twitter users in order to support them. For this purpose, they have used text mining strategy relying on machine learning techniques such as naïve Bayes and k-nearest neighbours. By observing the performance evaluation results of their proposed method, they have concluded that machine learning techniques can play an important role in the text processing process and provide acceptable results in mental disorders detection.

(Babu and Kanaga, 2022) provided a review of sentiment analysis using artificial intelligence on social networks. The authors stated that most previous studies followed six steps to investigate raw data: gathering requirements, collecting data, cleaning data, analyzing data, representing data, and visualizing data. They also reported that most of the techniques used for text processing and data classification in this field are multi-class rather than binary machine learning techniques.

(Aguilera et al., 2021) detected depression and anorexia problem using a one-class classification technique in social networks. The authors stated that their proposed method evaluated the relationship of documents based on their strengths and relying on the gravitational attraction. They said that their proposed one-class approach considers the similarity between the input texts and the relationship between them to make a decision in the context of recognizing the desired class. Also, the authors proposed a new criterion to identify the relationship between documents for the task under study.

(Gupta et al., 2021) intend to identify people with adverse mental conditions by examining the posts of users in the Reddit dataset. For this purpose, they have tried to reveal the emotional mentality of people in social networks by processing the text of users' posts. To achieve this goal, the authors have used six different classification techniques. After evaluating the effectiveness of different classifiers and sentiment analysis and studying the evaluation results, they have stated that the Naïve Bayes classifier has provided an acceptable performance.

In (Kim et al., 2020), Kim et al. used a deep learning-based model to identify people with mental illness by analyzing the content of users' posts. They employed XGBoost and CNN models for classification, using TF-IDF vectors in the XGBoost classifier to convert words into n-dimensional vectors. They stated that in the future they intend to present an ensemble approach based on multi-class classification models to solve the problem of identifying mental disorders. The authors also acknowledged some weaknesses in their research. For example, they did not consider the impact of factors such as socio-demographic and regional differences on the classification task. Addressing this limitation in order to increase classification accuracy can therefore be considered future work.

In (Hassantabar et al., 2022), Hassantabar et al. investigated a mental health system to detect mental disorders. For this purpose, the authors presented a hybrid method called Mhdeep, which uses commercially available WMSs and efficient deep neural network models. The authors also stated that they used a synthetic dataset to pre-train their model weights, and they claimed that their proposed model achieved acceptable results compared to other methods of identifying mental disorders.

By studying the empirical literature, it is concluded that most previous sources have received acceptable results from machine learning methods, especially techniques based on neural networks. But the problem of the sequential and temporal nature of data has been ignored in previous related studies. Therefore, providing an efficient solution based on the effectiveness of the sequential and temporal nature of data in the process of identifying people's mental disorders in social networks can be raised as a research innovation.

Based on the study of previous research on diagnosing mental disorders through social network mining, neural networks are used to detect depression in the current research. They were chosen for their strong performance, because they are up to date, and because they extract features automatically and require no domain knowledge of psychology (Hassantabar et al., 2022). To detect depressed people, unlike previous studies that use prior knowledge (Rao et al., 2020), this research considers only the linguistic features of the user (Yates et al., 2017).

In this research, among the different types of neural networks, the convolutional neural network (CNN) was chosen for its strong feature extraction capability. The convolutional layer extracts the features of the post text by applying filters over a sliding window. The aggregation (pooling) layer then keeps only the important features. This step reduces the dimensionality and makes feature extraction more efficient.
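The sliding-window filtering and aggregation described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `conv1d_relu_maxpool`, the array shapes, and the choice of global max pooling as the aggregation operator are assumptions made for illustration.

```python
import numpy as np

def conv1d_relu_maxpool(embedded_post, filters, bias):
    """Slide each filter over the token axis, rectify, and keep the max response.

    embedded_post: (num_tokens, embed_dim) token embeddings of one post
    filters:       (num_filters, window, embed_dim)
    bias:          (num_filters,)
    returns:       (num_filters,) one pooled feature per filter
    """
    num_tokens, _ = embedded_post.shape
    num_filters, window, _ = filters.shape
    positions = num_tokens - window + 1          # number of window placements
    responses = np.empty((positions, num_filters))
    for start in range(positions):
        patch = embedded_post[start:start + window]   # (window, embed_dim)
        # Dot product of every filter with the current window of tokens.
        responses[start] = np.tensordot(filters, patch, axes=([1, 2], [0, 1])) + bias
    responses = np.maximum(responses, 0.0)       # rectified activation
    return responses.max(axis=0)                 # aggregation: global max pooling

rng = np.random.default_rng(0)
post = rng.normal(size=(12, 8))                  # hypothetical post: 12 tokens, 8-dim embeddings
features = conv1d_relu_maxpool(post, rng.normal(size=(4, 3, 8)), np.zeros(4))
```

Each filter contributes a single number per post, so a post of any length is reduced to a fixed-size feature vector.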

The literature shows that previous studies that used CNNs to detect mental disorders did not also exploit the temporal and sequential nature of the data (Cong et al., 2018; Yates et al., 2017). Therefore, in this research, we try to increase prediction accuracy by adding LSTM (Hochreiter and Schmidhuber, 1997) to the CNN in order to remember past predictions and take advantage of the temporal and sequential nature of the data. The proposed hybrid method is called CNN-LSTM, and its general architecture is presented in Fig. 3.

In the proposed CNN-LSTM method, depressed people are identified with an approach inspired by (Yates et al., 2017) that combines two convolutional layers, one to extract the features of the posts and the other to integrate these features. As seen in Fig. 3, the innovation of this article is the use of LSTM, which is drawn in bold black. The LSTM part receives the output of the first convolutional layer as its input and memorizes it. The output of the LSTM then enters the second convolutional layer, which integrates it into a representation of the user's information. Finally, the users are classified and labeled as depressed or non-depressed (control).

4.1 Pre-processing Unit

The pre-processing unit is the first part of our proposed method; its pseudocode is shown in Fig. 4.

Pre-processing is one of the most important stages in most text processing techniques; it prepares the text for processing and analysis (Anandarajan et al., 2019). As seen in Fig. 4, posts without text are removed first. Then, if the variable that controls post shuffling is set, the order of the posts is randomized. Because our method uses the temporal order of the posts, we avoid disturbing their order. In the next step, if the number of posts exceeds the allowed maximum, posts are sampled at equal intervals up to that maximum. Next, if the variable that controls post reversal is set, the order of the posts is reversed. In this research, this variable is not set because of the importance of temporal order.

Finally, the posts are split into their constituent tokens to prepare the data for processing, and tokens whose term frequency (TF) or document frequency (DF) is below the specified minimum are removed. Here, TF is the number of times a token occurs in a document and DF is the number of documents that contain the token.
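The pre-processing steps described above can be sketched in plain Python as follows. This is an illustrative sketch rather than the pseudocode of Fig. 4: the function names, the default thresholds, whitespace tokenization, and the choice to count frequencies over the whole corpus with each post treated as one document are all assumptions.

```python
import random
from collections import Counter

def preprocess_user_posts(posts, max_posts=4, shuffle=False, reverse=False, seed=0):
    """Filter, optionally shuffle, subsample at equal intervals, optionally reverse."""
    posts = [p for p in posts if p and p.strip()]     # drop posts without text
    if shuffle:                                       # left unset to keep temporal order
        random.Random(seed).shuffle(posts)
    if len(posts) > max_posts:                        # evenly spaced sample of max_posts posts
        step = len(posts) / max_posts
        posts = [posts[int(i * step)] for i in range(max_posts)]
    if reverse:                                       # also left unset in the paper
        posts = posts[::-1]
    return [p.lower().split() for p in posts]         # whitespace tokenization (assumption)

def filter_rare_tokens(users_tokens, min_tf=2, min_df=2):
    """Remove tokens whose TF or DF falls below the given minimum.

    users_tokens: list of users, each a list of tokenized posts.
    """
    tf, df = Counter(), Counter()
    for user in users_tokens:
        for post in user:
            tf.update(post)            # total occurrences of each token
            df.update(set(post))       # number of posts containing each token
    keep = {t for t in tf if tf[t] >= min_tf and df[t] >= min_df}
    return [[[t for t in post if t in keep] for post in user] for user in users_tokens]
```

Sampling by index rather than at random keeps the retained posts in their original chronological order, which is the property the method relies on.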

4.2 The Proposed Neural Network Component

In this section, a neural network architecture is presented for processing the text of Reddit users' posts with the aim of detecting depressed users. The architecture of the neural network component of the proposed method is shown in Fig. 5.

As seen in Fig. 5, convolutional layers are used in the neural network component of the proposed method to extract the features of the input data, integrate the information of the posts, and provide a comprehensive representation of the user's activity. LSTM is combined with the CNN in order to pair the feature extraction power of convolutional layers with the temporal and sequential features of the data (Alzubaidi et al., 2021; Rhanoui et al., 2019; Zeberga et al., 2022). The first convolutional layer extracts the features of each of the user's posts by applying a filter over a sliding window, and the subsequent pooling (aggregation) layer keeps only the important features. The second convolutional layer receives the remaining features from all of the user's posts and, by applying a filter, selects the important ones and keeps them as a representation of the user's activity. Both convolutional layers use the rectified linear unit (ReLU) activation function. The resulting representation is processed by one or more dense layers before being passed to a final output layer for classification.

Unlike a traditional recurrent neural network unit, which only computes the weighted sum of the input signals and passes it through an activation function, each LSTM unit maintains a memory cell $c_t$ at time $t$ (Sarker, 2021; Uddin et al., 2022). The output $h_t$ of the LSTM unit is obtained as follows:

$${h}_{t}={\tau }_{o}\cdot \text{tanh}\left({c}_{t}\right)$$

Where ${\tau }_{o}$ is the output gate, which controls how much of the memory content is exposed as output. The output gate is computed as follows:

$${\tau }_{o}=\sigma ({W}_{o}.\left[{h}_{t-1} , {x}_{t}\right]+{b}_{o})$$

Where $\sigma$ denotes the sigmoid activation function, ${W}_{o}$ is a weight matrix, and ${b}_{o}$ is a bias vector. The memory cell $c_t$ is updated by partially forgetting the existing memory and adding new memory content ($\widehat{{c}_{t}}$) as follows:

$${c}_{t}={\tau }_{forget}\,{c}_{t-1}+{\tau }_{u}\,\widehat{{c}_{t}}$$

Also, the content of the new memory ($\widehat{{c}_{t}}$) is computed as follows:

$$\widehat{{c}_{t}}=\text{tanh}({W}_{c}.\left[{h}_{t-1} , {x}_{t}\right]+{b}_{c})$$

The amount of current memory that should be forgotten is controlled by the forget gate, which has been computed as follows:

$${\tau }_{forget}=\sigma ({W}_{forget}.\left[{h}_{t-1} , {x}_{t}\right]+{b}_{forget})$$

On the other hand, the amount of new memory content that should be added to the memory cell is computed by the updating gate as follows.

$${\tau }_{u}=\sigma ({W}_{u}.\left[{h}_{t-1} , {x}_{t}\right]+{b}_{u})$$
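To make the gate equations above concrete, the following NumPy sketch performs a single LSTM time step. It follows the equations as written, but the function name `lstm_step`, the parameter dictionary layout, and the vector sizes are assumptions for illustration; a real model would use a library LSTM layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step.

    params maps "forget", "u", "o", "c" to a (W, b) pair, where W has shape
    (hidden, hidden + input) and b has shape (hidden,).
    """
    concat = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    W_f, b_f = params["forget"]
    W_u, b_u = params["u"]
    W_o, b_o = params["o"]
    W_c, b_c = params["c"]
    tau_forget = sigmoid(W_f @ concat + b_f)          # how much old memory to keep
    tau_u = sigmoid(W_u @ concat + b_u)               # how much new content to add
    tau_o = sigmoid(W_o @ concat + b_o)               # how much memory to expose
    c_hat = np.tanh(W_c @ concat + b_c)               # candidate memory content
    c_t = tau_forget * c_prev + tau_u * c_hat         # memory cell update
    h_t = tau_o * np.tanh(c_t)                        # unit output
    return h_t, c_t

hidden, n_in = 3, 2
zero_params = {k: (np.zeros((hidden, hidden + n_in)), np.zeros(hidden))
               for k in ("forget", "u", "o", "c")}
h_t, c_t = lstm_step(np.ones(n_in), np.zeros(hidden), np.ones(hidden), zero_params)
```

With all weights set to zero, every gate evaluates to 0.5 and the candidate content to 0, so the cell simply halves its previous memory, which is a convenient sanity check of the update rule.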

Model building and training is the most important part of the implementation of the proposed method; its pseudocode is presented in Fig. 6.

As seen in Fig. 6, a Keras model is built using the selected variables, and then the characteristics of the training phase (optimizer type and loss function type) are specified. Keras is a powerful library that provides high-level building blocks for constructing deep neural networks (Manaswi, 2018; Muhammad et al., 2020). In this step, the number of layers, their order, the input and output structure, and the neural network variables are displayed. The pseudocode for building a Keras model is presented in Fig. 7.

As seen in the pseudo-code presented in Fig. 7, in step 4, the data is converted into batch using a generator and provided to the model. At the end of each epoch of training, a summary of model status information such as precision, recall, F1-score, number of predicted labels, the actual number of labels, etc. is shown. In fact, at the end of each training phase, the proposed method is tested with the evaluation data set and the evaluation criteria are computed. In this way, it is possible to see the process of changes in the performance of the method. The purpose of checking the performance changes of the model is to adjust the neural network variables in order to optimize the training. The pseudo-code of the method status summary is shown in Fig. 8.
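The evaluation criteria reported at the end of each epoch can be computed from the true and predicted labels as in the following sketch. It is not the status summary of Fig. 8; the function name and the encoding of the depressed class as `1` are assumptions.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1-score for the positive (depressed) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0    # correct among predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0       # found among actual positives
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)             # harmonic mean of the two
    return precision, recall, f1

p, r, f1 = precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

Tracking these three values per epoch on the evaluation set is what makes it possible to see how the method's performance changes during training.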

When the Keras model is built, a tensor is created for each user's data; its number of rows is the maximum number of posts and its number of columns is the maximum post length. The pseudocode for generating each tensor is shown in Fig. 9.
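A per-user tensor of this shape could be built as in the sketch below. It is not the pseudocode of Fig. 9: the function name, the use of integer token ids, and padding with `0` and truncating overly long inputs are assumptions.

```python
import numpy as np

def build_user_tensor(token_id_posts, max_posts, max_post_len, pad_id=0):
    """Return a (max_posts, max_post_len) integer array for one user.

    token_id_posts: list of posts, each a list of integer token ids.
    Extra posts and extra tokens are truncated; missing ones are padded.
    """
    tensor = np.full((max_posts, max_post_len), pad_id, dtype=np.int64)
    for row, post in enumerate(token_id_posts[:max_posts]):
        ids = post[:max_post_len]
        tensor[row, :len(ids)] = ids        # left-aligned, padded on the right
    return tensor

user_tensor = build_user_tensor([[5, 6, 7], [8]], max_posts=3, max_post_len=2)
```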

The constructed tensor is passed through the embedding layer and then, via a time-distributed layer, through the first convolutional layer, for which the activation function and the aggregation operator are specified. The aggregation operator reduces the size of the convolutional output. The data then pass through another time-distributed layer and a long short-term memory layer, and enter a second convolutional layer that merges them into a representation of the user's activity. Finally, the output tensor of this convolutional layer is flattened to one dimension and enters the dense layer for classification.
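One possible reading of this layer sequence is sketched below with the Keras functional API. It is an interpretation, not the model of Fig. 7: every size (vocabulary, embedding dimension, filter counts, LSTM units, dense width), the kernel size of 3, the use of global max pooling, and running the LSTM over the sequence of posts are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn_lstm(vocab_size=5000, max_posts=50, max_post_len=40, embed_dim=50,
                   post_filters=25, lstm_units=32, user_filters=50):
    """Hypothetical CNN-LSTM classifier over a (max_posts, max_post_len) id tensor."""
    inputs = keras.Input(shape=(max_posts, max_post_len), dtype="int32")
    x = layers.Embedding(vocab_size, embed_dim)(inputs)            # (posts, len, embed)
    # First convolution and pooling, applied to each post independently.
    x = layers.TimeDistributed(layers.Conv1D(post_filters, 3, activation="relu"))(x)
    x = layers.TimeDistributed(layers.GlobalMaxPooling1D())(x)     # (posts, post_filters)
    # LSTM over the chronological sequence of post features.
    x = layers.LSTM(lstm_units, return_sequences=True)(x)          # (posts, lstm_units)
    # Second convolution merges post-level features into a user representation.
    x = layers.Conv1D(user_filters, 3, activation="relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(50, activation="relu")(x)
    outputs = layers.Dense(2, activation="softmax")(x)             # depressed vs. control
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    return model

model = build_cnn_lstm(vocab_size=100, max_posts=6, max_post_len=8)
```

Setting `return_sequences=True` keeps one LSTM output per post, so the second convolution still has a sequence to slide over.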

4.3 Describing Method Parameters

In this section, some parameters of the proposed method are briefly explained. More explanation about other parameters and their setting is provided in the pseudocodes.

The loss function: Cross-entropy is used as the loss function of the model. For each sample, this function compares the predicted probability of each class with the target label and sums the results over the classes (Zhang and Sabuncu, 2018). It is computed as follows:

$$CE\left(p,t\right)= -\sum _{c=1}^{C}{t}_{o,c}\text{log}\left({p}_{o,c}\right)$$

where ${p}_{o,c}$ is the predicted probability that sample $o$ belongs to class $c$, ${t}_{o,c}$ is the corresponding target value, and $C$ is the number of classes.
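As a small numerical illustration of this loss, the sketch below evaluates the cross-entropy of a single sample with a one-hot target. The function name, the clipping constant `eps`, and the two example predictions are assumptions chosen for illustration only.

```python
import math
from typing import Sequence


def cross_entropy(target: Sequence[float], predicted: Sequence[float],
                  eps: float = 1e-12) -> float:
    """Cross-entropy for one sample: minus the sum over classes of t_c * log(p_c)."""
    # eps guards against log(0) when a predicted probability is exactly zero.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


# One-hot target [control, depressed] for a depressed user, two hypothetical predictions.
confident = cross_entropy([0.0, 1.0], [0.1, 0.9])
unsure = cross_entropy([0.0, 1.0], [0.6, 0.4])
```

The loss is smaller for the prediction that assigns more probability to the correct class, which is the behaviour the training procedure exploits.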

The activation function of the convolutional operator layer: Both convolutional layers use the rectified linear unit (ReLU) activation function. The literature shows that most recent neural network based techniques prefer this activation function for hidden layers (Bottou, 2012). It is defined as follows:

$$f\left(x\right)=\text{max}\left(x,0\right)=\left\{\begin{array}{ll}x, & x>0\\ 0, & x\le 0\end{array}\right.$$

The main advantage of this piecewise-linear activation function is that it has a constant derivative (equal to 1) for all inputs greater than 0, which speeds up the learning of the network.
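The definition above translates directly into code. The following sketch is illustrative only; the derivative at exactly 0 is set to 0 by convention, which is an assumption, since the function is not differentiable at that point.

```python
def relu(x: float) -> float:
    """Rectified linear unit: max(x, 0)."""
    return max(x, 0.0)


def relu_derivative(x: float) -> float:
    """Slope of ReLU: 1 for x > 0, otherwise 0 (value at x = 0 chosen by convention)."""
    return 1.0 if x > 0 else 0.0


values = [-2.0, 0.0, 0.5, 3.0]
outputs = [relu(v) for v in values]            # [0.0, 0.0, 0.5, 3.0]
slopes = [relu_derivative(v) for v in values]  # [0.0, 0.0, 1.0, 1.0]
```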

Stochastic gradient descent: It is an iterative method for optimizing the objective function. This method provides a stochastic approximation of gradient descent: it uses only a randomly selected part of the training data at each step to move toward the minimum of the function (Ketkar, 2017).
Adam optimizer: It is a gradient-based method for optimizing stochastic objective functions that computes an adaptive learning rate for each weight of the network separately. This method is computationally efficient and requires little memory, so it provides acceptable performance for problems with high data volumes (Sharma et al., 2017).
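The two optimizers described above can be compared on a toy problem. The sketch below is not the training code of this article: the one-dimensional objective, the data, the learning rates, and the seeds are assumptions, and the Adam constants are the commonly used defaults.

```python
import math
import random

# Toy objective (hypothetical): find w minimising the mean of (w - y)^2 over the data.
data = [1.0, 2.0, 3.0, 4.0, 5.0]   # the minimiser is the mean of the data, 3.0


def grad(w: float, batch) -> float:
    """Gradient of the mean squared error on the given batch."""
    return sum(2.0 * (w - y) for y in batch) / len(batch)


def sgd(steps: int = 2000, lr: float = 0.01, seed: int = 0) -> float:
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        batch = rng.sample(data, 2)          # randomly selected part of the data
        w -= lr * grad(w, batch)             # plain gradient step
    return w


def adam(steps: int = 2000, lr: float = 0.01, b1: float = 0.9,
         b2: float = 0.999, eps: float = 1e-8, seed: int = 0) -> float:
    rng = random.Random(seed)
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w, rng.sample(data, 2))
        m = b1 * m + (1 - b1) * g            # running mean of gradients
        v = b2 * v + (1 - b2) * g * g        # running mean of squared gradients
        m_hat = m / (1 - b1 ** t)            # bias correction for the early steps
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)   # per-weight scaled step
    return w


w_sgd = sgd()
w_adam = adam()
```

Both runs end close to the minimiser 3.0; the difference is that Adam rescales each step by the running magnitude of past gradients, so in a network every weight effectively receives its own learning rate.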

The main goal of this section is to evaluate the effectiveness of the proposed method for detecting depressed people by processing the text of users' posts in social networks. To this end, the section is organized into five sub-sections, each of which is described in more detail below.

5.1 Datasets

In this article, the RSDD dataset (Aguilera et al., 2021; Safa et al., 2022), in which users have self-reported suffering from depression, has been used to test the proposed method. We have implemented and tested our method on this dataset because most valid and related studies refer to it. In fact, it has been used as a reliable source in most references related to detecting mental disorders (Aguilera et al., 2021; Tadesse et al., 2019; Uddin et al., 2022; Zeberga et al., 2022). The dataset was constructed by annotating users' information on the Reddit social network (Babu and Kanaga, 2022). Users who published a post between January 2006 and October 2016 were selected for annotation. Users who shared fewer than 100 posts before the post related to their depression diagnosis were removed. The dataset includes 9,210 depressed users and 107,274 control users, the details of which are presented in Table 1. Each user in the dataset shared an average of 969 posts.

Table 1

Description of the details of the data set used in the current research
Part	The number of depressed users	The number of non-depressed users (control users)	Total
Train	3070	35753	38823
Evaluation	3070	35746	38816
Test	3070	35775	38845
Total	9210	107274	116484

The average and median post lengths are 148 and 74 tokens, respectively. The RSDD dataset is larger than the datasets used in previous research on self-reported diagnoses. Also, the posts in this dataset are annotated to confirm the claim of depression.

It should be noted that the available dataset includes some limitations, which are listed below:

It only includes the subpopulation of depressed people who have self-reported depression. Reddit users may not be a good sample of the general population.
In this data set, there is no way to recognize whether people were truthful in reporting depression or not.

Each row of the data set contains the raw information of a user, which is encoded into one-hot vectors. At the beginning of each row, the ordered pair (post, timestamp) is given for each user post. Then the user's random ID is given and, at the end, the user's label is determined (depressed or control). Users whose labels are outside of these two classes are ignored. In the one-hot encoding method, there are as many rows and columns as there are words. Each row belongs to the embedding of one word, which is equivalent to a vector with one row and n columns.

Here, n is the number of words in the embedding. In fact, n is the number of unique words in user posts and does not have a fixed value. This value can vary between 1 and the maximum allowed post length, that is, the number of terms per post (n_term). For each word, the column corresponding to that word is set to 1 and the rest of the columns are set to 0. In Table 2, an example of one-hot vectors for three words is shown.

Table 2

One-hot vectors for three words
	1	2	3
One-word	1	0	0
Two-word	0	1	0
Three-word	0	0	1
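The encoding shown in Table 2 can be reproduced with a few lines of Python. This is a minimal sketch for illustration: the helper names and the toy token list are assumptions, and each unique word is assigned a column in order of first appearance.

```python
from typing import Dict, List


def build_vocabulary(tokens: List[str]) -> Dict[str, int]:
    """Map each unique token to a column index in order of first appearance."""
    vocab: Dict[str, int] = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


def one_hot(token: str, vocab: Dict[str, int]) -> List[int]:
    """Return a vector with 1 in the token's own column and 0 elsewhere."""
    vec = [0] * len(vocab)
    vec[vocab[token]] = 1
    return vec


# Hypothetical post with a repeated word; n is the number of unique words (3).
vocab = build_vocabulary(["one", "two", "three", "two"])
rows = [one_hot(tok, vocab) for tok in vocab]
```

Here `rows` is the 3 x 3 identity pattern of Table 2; repeating a word in the post does not add a new column.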

5.2 Evaluation Measures

In this article, three criteria of precision, recall, and F1 score are used to evaluate the effectiveness of the proposed method. These criteria have been used in most previous related studies to evaluate the effectiveness of the methods (Aguilera et al., 2021; Safa et al., 2022; Srividya et al., 2018). In the following, the details of computing each of the efficiency evaluation criteria are presented for the current research.

Precision: This criterion measures how often the model is correct when it predicts the positive class (Su et al., 2020). It is an informative criterion when the cost of false positives is high (Cong et al., 2018; Uddin et al., 2022). The criterion is calculated as follows:

$$\text{precision}=\frac{TP}{TP+FP}$$

where TP is the number of true positives and FP is the number of false positives.

Recall: This measure is the proportion of positive cases that are correctly classified as positive (Islam et al., 2018). The recall criterion is informative when the cost of false negatives is high (Lovejoy, 2019; Skaik and Inkpen, 2020). The recall criterion is calculated as follows:

$$\text{recall}=\frac{TP}{TP+FN}$$

where TP is the number of true positives and FN is the number of false negatives.

F1 score: It is considered a suitable criterion for evaluating the accuracy of a test because precision and recall are simultaneously considered in the computation of this criterion (Tadesse et al., 2019; Uddin et al., 2022). It is one of the common and practical criteria in issues related to classification and detection (Imani and Noferesti, 2022; Savargiv et al., 2020). The F1 score is computed as follows:

$$F1\ \text{score}=2\times \frac{\text{precision} \times \text{recall}}{\text{precision}+\text{recall}}$$
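The three criteria can be computed from the confusion counts of the positive (depressed) class as sketched below. The counts are hypothetical and serve only to illustrate the arithmetic; returning 0 when a denominator is zero is an assumed convention.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 score from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: 60 depressed users found, 20 false alarms, 40 missed.
p, r, f1 = precision_recall_f1(tp=60, fp=20, fn=40)

# A model that never predicts the positive class scores 0 on all three criteria.
p0, r0, f0 = precision_recall_f1(tp=0, fp=0, fn=5)
```

With these counts, precision is 0.75, recall is 0.60, and the F1 score is their harmonic mean, about 0.67.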

5.3 Environment and Test Method

The proposed method is implemented in Python. It has been run on the Google Colab cloud service platform with a 2-core Intel(R) Xeon(R) CPU @ 2.20 GHz. Also, we used a Tensor Processing Unit (TPU) with 36 GB of RAM.

In this research, the test method is chosen to follow the one used in valid and practical articles such as (Aguilera et al., 2021; Tadesse et al., 2019; Uddin et al., 2022; Zeberga et al., 2022). There are two main reasons for this choice. First, there are similarities between the functional structure of our proposed method and the methods used in the mentioned sources. Second, our method has been applied to the same dataset as the mentioned sources. For this purpose, the dataset is divided into three parts: training, validation, and testing. The constructed model has been trained in 25 separate rounds. Evaluation criteria are calculated for each round using the validation data set.

Since the Python language and the TensorFlow and Keras libraries have been used to implement the neural network in this article, the values of the evaluation criteria have been calculated with the ready-made model.summary() function.

5.4 Setting Method Parameters

The goal of this part is to adjust the parameters of the proposed method. The hyperparameters (super-variables) used in the method, such as the size of the sliding window, the number of convolutional filters, and the type of aggregation (pooling) operator, are presented in Table 3.

Table 3

Hyperparameters (super-variables) used in the proposed method
Random removing (dropout)	Dense layers	Convolutional layer: aggregate length	Convolutional layer: filters	Convolutional layer: size	Loss function
0.0	One neuron for every 50 neurons from the previous layer	All (average)	25	3	Cross Entropy

The hyperparameters of the depression detection method have been selected by evaluating the method on the validation set during training. The second convolutional layer of the depression detection model uses filters with a length of 15 and a stride length of 15.
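To give a feeling for what a filter length of 15 with a stride of 15 implies, the sketch below counts the positions visited by a one-dimensional filter. It assumes no padding ("valid" mode) and assumes that the second convolutional layer slides over the post axis of up to 400 posts (n_post, the input size used in this article); neither assumption is confirmed by the text.

```python
def conv1d_output_length(input_length: int, kernel_size: int, stride: int) -> int:
    """Number of positions a 1-D filter visits when no padding is applied."""
    if input_length < kernel_size:
        return 0
    return (input_length - kernel_size) // stride + 1


# Second convolutional layer: length 15, stride 15, over 400 posts (assumed axis).
post_windows = conv1d_output_length(400, kernel_size=15, stride=15)

# First convolutional layer: size 3 (Table 3), assumed stride 1, over 100 terms.
term_windows = conv1d_output_length(100, kernel_size=3, stride=1)
```

Because the stride equals the filter length, the windows do not overlap: under these assumptions each of the 26 outputs summarises a separate block of 15 consecutive posts, and the last 10 posts fall outside the final window.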

We have considered 400 posts (n_post) with a maximum length of 100 terms (n_term) as input to the method. An experiment is designed to find the optimal value of the maximum number of terms. In this experiment, the number of terms is varied, the efficiency evaluation criteria of the proposed method are calculated for each setting, and the resulting values are analyzed. The number of terms is varied from 25 to 150. The results of this assessment are presented in Fig. 10.

As seen in Fig. 10, as the number of terms increases up to 80, the precision criterion does not change significantly, which could be due to the insufficient number of terms in the analysis process. As the number of terms increases from 80 to 100, the precision criterion shows an upward trend. Beyond 100 terms, however, this criterion decreases. Therefore, the optimal number of terms to achieve the highest precision is 100.

The recall changes only slightly for different numbers of terms in the range of 0 to 160, with a slight increase in the range of 120 to 160. However, we should note that increasing the number of terms causes computational overhead, which is undesirable. Therefore, it can be concluded that the recall criterion is not effective for choosing the optimal value of the number of terms.

As shown in Fig. 10, the value of the F1-score increases as the number of terms grows from 0 to 50, and it does not change once the number of terms exceeds 50. This suggests that choosing 50 terms from each post can provide enough information about that post. Therefore, it can be concluded that any value of 50 terms or more is adequate for this criterion.

Considering the changes in the efficiency evaluation criteria in this experiment, the number of terms is set to 100 for the training and testing phases of the proposed method.

5.5 Experimental Results

As mentioned earlier, the dataset is divided into three parts: training, validation, and testing. We have used the evaluation (validation) part of the dataset to tune the method in the training phase. The method has been trained for 25 separate rounds, and the evaluation criteria have been calculated for each round using the evaluation part. The results of the method evaluation for the different rounds of the training phase are presented in Table 4.

Table 4

Evaluation of the proposed method in different training rounds
Round number	Class	Precision	Recall	F1 score
1	non-depressed users	0.91	1.00	0.95
	depressed users	0.00	0.00	0.00
	Average	0.45	0.50	0.48
2	non-depressed users	0.91	1.0	0.95
	depressed users	0.00	0.00	0.00
	Average	0.45	0.50	0.48
3	non-depressed users	0.91	1.0	0.95
	depressed users	0.00	0.00	0.00
	Average	0.45	0.50	0.48
4	non-depressed users	0.91	1.0	0.95
	depressed users	0.00	0.00	0.00
	Average	0.45	0.50	0.48
5	non-depressed users	0.91	1.0	0.95
	depressed users	0.00	0.00	0.00
	Average	0.45	0.50	0.48
6	non-depressed users	0.91	1.0	0.95
	depressed users	0.00	0.00	0.00
	Average	0.45	0.50	0.47
7	non-depressed users	0.91	1.0	0.95
	depressed users	0.00	0.00	0.00
	Average	0.45	0.50	0.47
8	non-depressed users	0.91	1.0	0.95
	depressed users	0.00	0.00	0.00
	Average	0.45	0.50	0.47
9	non-depressed users	0.91	0.99	0.95
	depressed users	0.00	0.00	0.00
	Average	0.45	0.50	0.47
10	non-depressed users	0.91	0.97	0.94
	depressed users	0.18	0.06	0.09
	Average	0.55	0.51	0.51
11	non-depressed users	0.91	0.98	0.94
	depressed users	0.25	0.06	0.09
	Average	0.58	0.52	0.52
12	non-depressed users	0.91	0.99	0.95
	depressed users	0.00	0.00	0.00
	Average	0.45	0.50	0.47
13	non-depressed users	0.91	0.97	0.94
	depressed users	0.10	0.03	0.04
	Average	0.50	0.50	0.49
14	non-depressed users	0.91	0.97	0.94
	depressed users	0.17	0.06	0.08
	Average	0.54	0.51	0.51
15	non-depressed users	0.91	0.96	0.94
	depressed users	0.19	0.08	0.12
	Average	0.55	0.52	0.53
16	non-depressed users	0.91	0.99	0.95
	depressed users	0.29	0.06	0.09
	Average	0.60	0.52	0.52
17	non-depressed users	0.91	0.97	0.94
	depressed users	0.25	0.08	0.12
	Average	0.58	0.53	0.53
18	non-depressed users	0.91	0.97	0.94
	depressed users	0.18	0.06	0.09
	Average	0.55	0.51	0.51
19	non-depressed users	0.91	0.96	0.93
	depressed users	0.17	0.08	0.11
	Average	0.54	0.52	0.52
20	non-depressed users	0.91	0.95	0.93
	depressed users	0.14	0.08	0.14
	Average	0.52	0.51	0.52
21	non-depressed users	0.91	0.97	0.94
	depressed users	0.17	0.06	0.08
	Average	0.54	0.51	0.51
22	non-depressed users	0.91	0.97	0.94
	depressed users	0.21	0.08	0.12
	Average	0.56	0.53	0.53
23	non-depressed users	0.91	0.99	0.95
	depressed users	0.33	0.06	0.10
	Average	0.62	0.52	0.52
24	non-depressed users	0.91	0.97	0.94
	depressed users	0.18	0.06	0.09
	Average	0.55	0.51	0.51
25	non-depressed users	0.91	0.97	0.94
	depressed users	0.18	0.06	0.09
	Average	0.55	0.51	0.51

As seen in Table 4, in the first rounds, the model identifies most of the samples as non-depressed due to insufficient training. Therefore, the accuracy of detecting non-depressed users is very high and the accuracy of detecting depressed users is very low. The reason for this is that the number of non-depressed users is about 12 times that of depressed users, so the model is trained mostly with non-depressed users.
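The degree of class imbalance mentioned above can be checked directly from the counts in Table 1. The short sketch below only repeats that arithmetic; the variable names are chosen for illustration.

```python
# Counts copied from Table 1: (depressed users, control users) for each part.
splits = {
    "train": (3070, 35753),
    "evaluation": (3070, 35746),
    "test": (3070, 35775),
}

# Control users per depressed user in each part of the dataset.
ratios = {name: control / depressed for name, (depressed, control) in splits.items()}

# Share of depressed users in the training part.
train_depressed, train_control = splits["train"]
depressed_share = train_depressed / (train_depressed + train_control)
```

In every part there are between 11.6 and 11.7 control users per depressed user, and depressed users make up roughly 8% of the training part, which is why a model that labels almost everyone as non-depressed already looks accurate for the majority class.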

It has also been observed that, as the training continues, the accuracy of detecting depressed users increases while the accuracy of detecting non-depressed users decreases slightly, which indicates that the performance of the method is improving.

As a general conclusion, it can be said that the proposed method has been more successful in detecting non-depressed users. For each round of training in which the F1 score has increased compared to its previous best value, the weight values of the neurons of the network have been stored. The goal was to save the best state of the network in the training phase.
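The rule for storing weights described above can be sketched as a simple "keep the best so far" loop. This is an illustration only: the F1 values are hypothetical (they are not taken from Table 4), and recording the round number stands in for actually writing a weight file.

```python
from typing import List, Optional, Tuple


def track_best(f1_per_round: List[float]) -> Tuple[Optional[int], float, List[int]]:
    """Return (best round, best F1, rounds at which the weights would be saved)."""
    best_f1 = float("-inf")
    best_round = None
    saved_rounds = []
    for round_number, f1 in enumerate(f1_per_round, start=1):
        if f1 > best_f1:                       # strict improvement over the previous best
            best_f1, best_round = f1, round_number
            saved_rounds.append(round_number)  # stand-in for saving the network weights
    return best_round, best_f1, saved_rounds


# Hypothetical validation F1 scores for seven training rounds.
history = [0.10, 0.10, 0.25, 0.20, 0.31, 0.31, 0.28]
best_round, best_f1, saved_rounds = track_best(history)
```

With a strict comparison, a round that only ties the previous best does not overwrite the stored weights, so the earliest round reaching the best score is the one kept.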

Next, the stored weight file with the highest F1 score is used to evaluate the proposed method on the test data. As seen in Table 4, the best round of the training phase in the implementation of the proposed method is round 17, and the weights of the neurons in this round are stored as the final trained weights.

The results of testing the proposed method on the testing part of the dataset are presented in Fig. 11.

As seen in Fig. 11, the proposed method has provided high accuracy on the test dataset when identifying non-depressed users. This is to be expected because the number of depressed users available for training our method is much smaller than the number of non-depressed users, which can affect the output results. Also, the best performance of the proposed method has been provided for the recall index (0.98) compared to the other performance evaluation indexes in identifying non-depressed people.

As a general result of testing the proposed method on the test data set, it can be said that our method can provide a successful performance in identifying mental disorders of users.

5.5.1 Discussion

In this article, our most important goal is to present an efficient and accurate method for the text analysis of users' posts to detect mental disorders in social networks. For this purpose, a hybrid method based on deep learning called DDdeep is presented. One of the most important strengths of our proposed method is its remarkable power in automatically extracting important and effective features and thus presenting reliable results. This strength is provided by the convolutional layers and the architecture of convolutional neural networks (CNN) in our proposed method. Also, by using a CNN, no prior technical knowledge of the field under study is needed.

On the other hand, the concurrent use of the sequential and temporal nature of data in the process of text analysis of users' posts and detecting mental disorders of users is another strength of the proposed method. This strength is supported by applying LSTM. In fact, the time-distributed layer has been used to take the temporal ordering of the data into account. Also, the sequential nature of users' posts is captured by the long short-term memory layer and its ability to remember past decisions. This ability puts the model in a better position to make accurate decisions. Therefore, more reliable results are provided in data classification tasks and in the detection of depressed users.

As seen in Fig. 11, the evaluation results of the proposed method on the test dataset can well reflect these theoretical findings experimentally.

Although there are many strengths in our method, the proposed method suffers from some shortcomings. For example, the computational complexity of our proposed method is high due to the use of deep learning methods as the basis of our approach. Therefore, in a situation where the speed factor is of particular importance, this method cannot be considered a suitable alternative in the field under study to detect depressed users.

On the other hand, if only the accuracy factor is considered the most important factor for solving the problem, then it can be said that the proposed method will be a suitable choice. As seen in the outline of the proposed method (see Fig. 3), our proposed method does not use optimization concepts to achieve more accurate results. However, optimization techniques can provide more acceptable results as a powerful tool in the field of text analysis in combination with other methods (Kumari et al., 2021).

Here, the best values of the evaluation metrics for 25 training rounds of the proposed method on validation data are shown in Fig. 12.

As seen in Fig. 12, the proposed method is more successful in detecting non-depressed users than in detecting depressed users, which can be considered a shortcoming of our method: it provided higher values for all three evaluation metrics when detecting non-depressed users. This weakness comes from the fact that there were far fewer depressed users than non-depressed users. In fact, the model has been trained mostly with samples of non-depressed users, so addressing this challenge can be investigated in future work.

Another weakness of our method is not paying attention to the knowledge of experts and combining the knowledge of these experts with machine learning approaches. In fact, an optimal combination can lead us to achieve more reliable results in the process of identifying users with mental disorders. On the other hand, our proposed method has only used linguistic features to identify depressed users in social networks, while benefiting from the vocabulary of words related to depression can help a better diagnosis.

In this part of the article, we provide a comparative evaluation between our proposed method and other mental disorder detection methods. The main goal of this evaluation is to investigate the effectiveness of the proposed method in comparison with previous methods in the field of text mining of users' posts in social networks and detecting mental disorders. The results of this comparative evaluation are presented in Table 5.

Table 5

The results of the comparative evaluation of mental disorder detection methods on the Reddit test set
Method	Precision	Recall	F1 score
FastText (Joulin et al., 2016)	0.37	0.70	0.49
BOW-SVM (Staiano and Guerini, 2014)	0.72	0.49	0.46
X-A-BiLSTM (Cong et al., 2018)	0.69	0.53	0.60
LSTM	0.59	0.42	0.47
Bi-LSTM (Zhang et al., 2021)	0.62	0.39	0.48
BERT + knowledge distillation (Zeberga et al., 2022)	0.76	0.68	0.70
Our proposed method	0.78	0.70	0.73

In this comparative evaluation, we considered mental disorder detection methods such as FastText (Joulin et al., 2016), LSTM, BOW-SVM (Staiano and Guerini, 2014), X-A-BiLSTM (Cong et al., 2018), Bi-LSTM (Zhang et al., 2021), and BERT + knowledge distillation (Zeberga et al., 2022). To make the comparison fair, these methods were chosen for two main reasons. First, all of them have been applied to the RSDD dataset to evaluate their efficiency; in other words, their authors used the same dataset in the comparative evaluation of their methods. Second, there is a functional similarity between the methods chosen for the comparison. In fact, most of these methods have used a convolutional neural network or long short-term memory in their studies.

As an important achievement, it can be concluded that our proposed method has been able to achieve significant success compared to other methods under comparison in the field of mental disorders detection. Also, the results of Table 5 reveal that our method has achieved acceptable results in all three evaluation criteria.

6.1 Discussion

In this section, we discuss the results of the comparative evaluation and the positive effects of our method on each of the performance criteria separately. For this purpose, the data is divided into training, validation, and testing datasets, each containing about 3,000 depressed users and about 35,750 control users (Table 1). The validation set has been used to set the parameters of the method. The results of Table 5 are reported on the test data set. For each user, the convolutional neural network has received up to n_post = 400 posts, each with a maximum of n_term = 100 terms.

As the results of Table 5 show, the proposed method has provided a successful performance in identifying depressed users compared with the other methods. For example, the proposed method provides an improvement of about 2 percentage points in precision, 2 in recall, and 3 in F1 score in comparison with the BERT + knowledge distillation method (Zeberga et al., 2022). Table 5 also shows that the highest precision (0.78), recall (0.70, matched by FastText), and F1 score (0.73) are provided by the proposed method, which supports the superiority of the proposed method over the other compared methods.

From the values in Table 5, it can be concluded that the proposed method has achieved an improvement of 6 percentage points in precision, 21 percentage points in recall, and 27 percentage points in F1 score compared to the BOW-SVM (Staiano and Guerini, 2014) method. This amount of improvement indicates that considering the temporal and sequential order of users' posts in the process of text analysis can be effective in providing stronger results. In the following, the effect of the proposed method on each of the performance indicators is investigated separately.

First, the role of the proposed method in improving the precision index in comparison with other methods is investigated, and the evaluation results are presented in Fig. 13.

As shown in Fig. 13, the precision of the proposed method is 6 percentage points higher than that of BOW-SVM. This means that the number of false positives predicted by the proposed method has decreased compared to previous methods. In fact, with this method, fewer non-depressed people are mistakenly identified as depressed. Considering that the goal is to detect depressed users, this criterion is less valuable than the other two criteria. Also, the precision of our method has increased by 2 percentage points compared to the BERT + knowledge distillation method (Zeberga et al., 2022). In fact, considering the temporal and sequential nature of the posts has caused the number of false positives predicted by our method to decrease in comparison with that method.

Next, the role of the proposed method is evaluated to improve recall index over other methods then evaluation results are reported in Fig. 14.

As seen in Fig. 14, the proposed method has provided a recall value equal to the highest value reported for recall in previous methods (FastText). In fact, with these two methods, the number of depressed users who are predicted as non-depressed has decreased. This achievement is important for the purpose of the current research, which is to diagnose depressed people.

Also, the results of Fig. 14 show that, after the proposed method and FastText, the highest value of the recall criterion is provided by the BERT + knowledge distillation method (Zeberga et al., 2022). This result seems logical because the overall performance of that method is similar to the proposed method, with the main difference that it ignores the temporal and sequential nature of users' posts. On the other hand, our method has increased the recall criterion by 2 percentage points over the BERT + knowledge distillation method.

From the evaluation results in Fig. 14, it can be concluded that the proposed method matches the best previous recall (FastText) and improves the recall criterion compared to all the other methods under comparison.

Finally, the effectiveness of the proposed method in improving the F1 score over other methods is investigated, and the comparison results are presented in Fig. 15.

As seen in the results of Fig. 15, the F1 value of the proposed method is 13 percentage points higher than that of X-A-BiLSTM. This means that our method has performed better than the previous methods (the methods under comparison in this article) when precision and recall are considered simultaneously.

According to the results of Figs. 13 and 15, the precision and F1 criteria of the proposed method have increased by 9 and 13 percentage points, respectively, compared to the previously mentioned methods. Also, in the recall criterion, the performance of our method is similar to the best recall reported by previous methods, about 70%, which is achieved by FastText (Joulin et al., 2016).

The reason for the common good performance between all the methods in Table 5 is the use of convolutional layers to extract the features in the text.

Also, the reason for the improvement obtained by the proposed method is the use of long-short-term memory to remember previous decisions in user classification, because remembering previous decisions has led to better training of the deep convolutional neural network.

Many studies in the field of mental health and mental disorders show that people with mental disorders turn to social media to express their feelings and receive support from others. By providing a rich source of information in the form of users' posts, social media offers a suitable platform for developing research on identifying many mental disorders.

The importance of mental health in society and the problem of detecting people's mental disorders prompted us to investigate the problem of detecting depression in this article. For this purpose, we have presented a CNN-based architecture, called CNN-LSTM, that mines the text of users' posts in social networks.

So far, many methods have been used in the field of detecting mental disorders of people. The most important shortcoming of the previous methods is that they ignore the temporal and sequential nature of the data, and as a result, their classification accuracy is low. In order to reduce the negative effects of this challenge, we have presented a hybrid CNN-LSTM approach in this article with two major goals. The first is to use the convolutional layer to automatically extract features without the need for basic knowledge in the field of psychology. The second is to improve accuracy by remembering past classification decisions. The results of the evaluation show the successful performance of our method compared to previous methods.

Although the results of detecting depression with our proposed method are encouraging, the values obtained for the evaluation criteria show that this research area requires more study. Therefore, some research activities that can be pursued in the future for the development and progress of this research field are presented below:

Using only linguistic features extracted by CNN in the classification process can be considered as a limitation of the current research. Therefore, the use of words related to depression and deep neural networks can be investigated as future work to improve the accuracy of diagnosis results.
We believe that advanced machine teaching methods that draw on the knowledge of expert psychologists can help improve the results of diagnosing mental disorders. Therefore, this issue can also be investigated in future work.
Our proposed method detects depressed people without the need of psychological knowledge, which can be improved as a weakness in future work by involving psychological knowledge in the process of diagnosing mental disorders.

Funding

The authors declare that their research work is not supported by anybody or any organization.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.

Aguilera, J., Farías, D.I.H., Ortega-Mendoza, R.M., Montes-y-Gómez, M., 2021. Depression and anorexia detection in social media as a one-class classification problem. Applied Intelligence 51, 6088-6103.
Alzubaidi, L., Zhang, J., Humaidi, A.J., Al-Dujaili, A., Duan, Y., Al-Shamma, O., Santamaría, J., Fadhel, M.A., Al-Amidie, M., Farhan, L., 2021. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. Journal of big Data 8, 1-74.
Anandarajan, M., Hill, C., Nolan, T., 2019. Text preprocessing, Practical Text Analytics. Springer, pp. 45-59.
Astleitner, H., Bains, A., Hörmann, S., 2023. The effects of personality and social media experiences on mental health: Examining the mediating role of fear of missing out, ghosting, and vaguebooking. Computers in Human Behavior 138, 107436.
Ayadi, M.G., Bouslimi, R., Akaichi, J., 2022. Medical social networks content mining for a semantic annotation. Social Network Analysis and Mining 12, 1-12.
Babu, N.V., Kanaga, E., 2022. Sentiment analysis in social media data for depression detection using artificial intelligence: A review. SN Computer Science 3, 1-20.
Bauer, A.M., Baldwin, S.A., Anguera, J.A., Areán, P.A., Atkins, D.C., 2018. Comparing approaches to mobile depression assessment for measurement-based care: Prospective study. Journal of Medical Internet Research 20, e10001.
Ben Hassine, M.A., Abdellatif, S., Ben Yahia, S., 2022. A novel imbalanced data classification approach for suicidal ideation detection on social media. Computing 104, 741-765.
Bottou, L., 2012. Stochastic gradient descent tricks, Neural networks: Tricks of the trade. Springer, pp. 421-436.
Bouarara, H.A., 2021. Recurrent neural network (RNN) to analyse mental behaviour in social media. International Journal of Software Science and Computational Intelligence (IJSSCI) 13, 1-11.
Bouarara, H.A., 2022. Sentiment Analysis Using Machine Learning Algorithms and Text Mining to Detect Symptoms of Mental Difficulties Over Social Media, Research Anthology on Implementing Sentiment Analysis Across Multiple Disciplines. IGI Global, pp. 581-595.
Cong, Q., Feng, Z., Li, F., Xiang, Y., Rao, G., Tao, C., 2018. XA-BiLSTM: A deep learning approach for depression detection in imbalanced data, 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, pp. 1624-1627.
Conway, M., O’Connor, D., 2016. Social media, big data, and mental health: current advances and ethical implications. Current opinion in psychology 9, 77-82.
Dai, H., Hao, J., 2017. Mining social media data on marijuana use for post traumatic stress disorder. Computers in Human Behavior 70, 282-290.
De, S., Dey, S., Bhatia, S., Bhattacharyya, S., 2022. An introduction to data mining in social networks, Advanced Data Mining Tools and Methods for Social Computing. Elsevier, pp. 1-25.
Faisal, R.A., Jobe, M.C., Ahmed, O., Sharker, T., 2022. Mental health status, anxiety, and depression levels of Bangladeshi university students during the COVID-19 pandemic. International journal of mental health and addiction 20, 1500-1515.
Fuller-Tyszkiewicz, M., Richardson, B., Klein, B., Skouteris, H., Christensen, H., Austin, D., Castle, D., Mihalopoulos, C., O'Donnell, R., Arulkadacham, L., 2018. A mobile app–based intervention for depression: end-user and expert usability testing study. JMIR mental health 5, e9445.
Guntuku, S.C., Yaden, D.B., Kern, M.L., Ungar, L.H., Eichstaedt, J.C., 2017. Detecting depression and mental illness on social media: an integrative review. Current Opinion in Behavioral Sciences 18, 43-49.
Gupta, S., Das, D., Chatterjee, M., Naskar, S., 2021. Machine Learning-Based Social Media Analysis for Suicide Risk Assessment, Emerging Technologies in Data Mining and Information Security. Springer, pp. 385-393.
Hamilton, M., 1967. Development of a rating scale for primary depressive illness. British journal of social and clinical psychology 6, 278-296.
Hao, T., Chen, X., Li, G., Yan, J., 2018. A bibliometric analysis of text mining in medical research. Soft Computing 22, 7875-7892.
Hassantabar, S., Zhang, J., Yin, H., Jha, N.K., 2022. Mhdeep: Mental health disorder detection system based on wearable sensors and artificial neural networks. ACM Transactions on Embedded Computing Systems (TECS).
Hemanandhini, I., Padmavathy, C., 2022. Mental Health Prediction Using Data Mining, Inventive Computation and Information Technologies. Springer, pp. 711-720.
Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural computation 9, 1735-1780.
Homan, C., Johar, R., Liu, T., Lytle, M., Silenzio, V., Alm, C.O., 2014. Toward macro-insights for suicide prevention: Analyzing fine-grained distress at scale, Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, pp. 107-117.
Huang, P., 2022. A Mental Disorder Prediction Model with the Ability of Deep Information Expression Using Convolution Neural Networks Technology. Scientific Programming 2022.
Imani, M., Noferesti, S., 2022. Aspect extraction and classification for sentiment analysis in drug reviews. Journal of Intelligent Information Systems, 1-21.
Islam, M., Kabir, M.A., Ahmed, A., Kamal, A.R.M., Wang, H., Ulhaq, A., 2018. Depression detection from social network data using machine learning techniques. Health information science and systems 6, 1-12.
Jia, J., 2018. Mental Health Computing via Harvesting Social Media Data, IJCAI, pp. 5677-5681.
Jokelainen, J., Timonen, M., Keinänen-Kiukaanniemi, S., Härkönen, P., Jurvelin, H., Suija, K., 2019. Validation of the Zung self-rating depression scale (SDS) in older adults. Scandinavian journal of primary health care 37, 353-357.
Joulin, A., Grave, E., Bojanowski, P., Mikolov, T., 2016. Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759.
Jungherr, A., 2016. Twitter use in election campaigns: A systematic literature review. Journal of information technology & politics 13, 72-91.
Kabir, M., Ahmed, T., Hasan, M.B., Laskar, M.T.R., Joarder, T.K., Mahmud, H., Hasan, K., 2022. DEPTWEET: A typology for social media texts to detect depression severities. Computers in Human Behavior, 107503.
Ketkar, N., 2017. Stochastic gradient descent, Deep learning with Python. Springer, pp. 113-132.
Keyvanpour, M.R., Barani Shirzad, M., Mahdikhani, L., 2022. WARM: a new breast masses classification method by weighting association rule mining. Signal, Image and Video Processing 16, 481-488.
Keyvanpour, M.R., Mehrmolaei, S., Etaati, A., 2020. PLI-X: temporal association rules mining in customer relationship management systems. Computer and Knowledge Engineering 2, 29-48.
Khetarpaul, S., 2021. Mining location based social networks to understand the citizen’s check-in patterns. Computing 103, 2967-2993.
Kim, J., Lee, D., Park, E., 2021. Machine learning for mental health in social media: bibliometric study. Journal of Medical Internet Research 23, e24870.
Kim, J., Lee, J., Park, E., Han, J., 2020. A deep learning model for detecting mental illness from user content on social media. Scientific reports 10, 1-6.
Kumari, K., Singh, J.P., Dwivedi, Y.K., Rana, N.P., 2021. Multi-modal aggression identification using convolutional neural network and binary particle swarm optimization. Future Generation Computer Systems 118, 187-197.
Kumari, S., 2022. Text Mining and Pre-Processing Methods for Social Media Data Extraction and Processing, Handbook of Research on Opinion Mining and Text Analytics on Literary Works and Social Media. IGI Global, pp. 22-53.
Large, M.M., 2022. The role of prediction in suicide prevention. Dialogues in clinical neuroscience.
Li, Y., Ni, P., Chang, V., 2020. Application of deep reinforcement learning in stock trading strategies and stock forecasting. Computing 102, 1305-1322.
Lovejoy, C.A., 2019. Technology and mental health: the role of artificial intelligence. European Psychiatry 55, 1-3.
Manaswi, N.K., 2018. Understanding and working with Keras, Deep Learning with Applications Using Python. Springer, pp. 31-43.
Marcus, M., Yasamy, M.T., van Ommeren, M.v., Chisholm, D., Saxena, S., 2012. Depression: A global public health concern.
Mehrmolaei, S., Keyvanpour, M.R., 2019. An enhanced hybrid model for event prediction in healthcare time series. International Journal of Knowledge-based and Intelligent Engineering Systems 23, 131-147.
Muhammad, W., Ullah, I., Ashfaq, M., 2020. An introduction to deep convolutional neural networks with Keras, Machine learning and deep learning in real-time applications. IGI Global, pp. 231-272.
Nassirtoussi, A.K., Aghabozorgi, S., Wah, T.Y., Ngo, D.C.L., 2014. Text mining for market prediction: A systematic review. Expert Systems with Applications 41, 7653-7670.
Park, S., Kim, I., Lee, S.W., Yoo, J., Jeong, B., Cha, M., 2015. Manifestation of depression and loneliness on social networks: a case study of young adults on Facebook, Proceedings of the 18th ACM conference on computer supported cooperative work & social computing, pp. 557-570.
Rao, G., Peng, C., Zhang, L., Wang, X., Feng, Z., 2020. A knowledge enhanced ensemble learning model for mental disorder detection on social media, International Conference on Knowledge Science, Engineering and Management. Springer, pp. 181-192.
Reece, A.G., Reagan, A.J., Lix, K.L., Dodds, P.S., Danforth, C.M., Langer, E.J., 2017. Forecasting the onset and course of mental illness with Twitter data. Scientific reports 7, 1-11.
Rhanoui, M., Mikram, M., Yousfi, S., Barzali, S., 2019. A CNN-BiLSTM model for document-level sentiment analysis. Machine Learning and Knowledge Extraction 1, 832-847.
Safa, R., Bayat, P., Moghtader, L., 2022. Automatic detection of depression symptoms in twitter using multimodal analysis. The Journal of Supercomputing 78, 4709-4744.
Sarker, I.H., 2021. Deep learning: a comprehensive overview on techniques, taxonomy, applications and research directions. SN Computer Science 2, 1-20.
Savargiv, M., Masoumi, B., Keyvanpour, M.R., 2020. A new ensemble learning method based on learning automata. Journal of Ambient Intelligence and Humanized Computing, 1-16.
Sharma, M., Pachori, R., Rajendra, A., 2017. Adam: a method for stochastic optimization. Pattern Recogn. Lett 94, 172-179.
Skaik, R., Inkpen, D., 2020. Using social media for mental health surveillance: a review. ACM Computing Surveys (CSUR) 53, 1-31.
Sood, S.K., Sood, V., Mahajan, I., 2021. An intelligent healthcare system for predicting and preventing dengue virus infection. Computing, 1-39.
Srividya, M., Mohanavalli, S., Bhalaji, N., 2018. Behavioral modeling for mental health using machine learning algorithms. Journal of medical systems 42, 1-12.
Staiano, J., Guerini, M., 2014. Depechemood: a lexicon for emotion analysis from crowd-annotated news. arXiv preprint arXiv:1405.1605.
Su, C., Xu, Z., Pathak, J., Wang, F., 2020. Deep learning in mental health outcome research: a scoping review. Translational Psychiatry 10, 1-26.
Tabassum, S., Pereira, F.S., Fernandes, S., Gama, J., 2018. Social network analysis: An overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8, e1256.
Tadesse, M.M., Lin, H., Xu, B., Yang, L., 2019. Detection of depression-related posts in reddit social media forum. IEEE Access 7, 44883-44893.
Taghvaei, N., Masoumi, B., Keyvanpour, M.R., 2021. Analytical framework for mental health feature extraction methods in social networks. Intelligent Decision Technologies, 1-14.
Tan, P.-N., Steinbach, M., Kumar, V., 2016. Introduction to data mining. Pearson Education India.
Uddin, M.Z., Dysthe, K.K., Følstad, A., Brandtzaeg, P.B., 2022. Deep learning for prediction of depressive symptoms in a large textual dataset. Neural Computing and Applications 34, 721-744.
Wang, Y.Y., Yan, J.C., Li, C.Y., Zhong, L., Sun, Y., Fu, L.L., 2022. Development and preliminary validation of a self-rating anxiety inventory for maintenance haemodialysis patients. Psychology, Health & Medicine 27, 1482-1494.
William, D., Suhartono, D., 2021. Text-based depression detection on social media posts: A systematic literature review. Procedia Computer Science 179, 582-589.
Wongkoblap, A., Vadillo, M.A., Curcin, V., 2017. Detecting and treating mental illness on social networks, 2017 IEEE International Conference on Healthcare Informatics (ICHI). IEEE, pp. 330-330.
Wongkoblap, A., Vadillo, M.A., Curcin, V., 2022. Social media big data analysis for mental health research, Mental Health in a Digital World. Elsevier, pp. 109-143.
Yates, A., Cohan, A., Goharian, N., 2017. Depression and self-harm risk assessment in online forums. arXiv preprint arXiv:1709.01848.
Yu, Q., Miche, Y., Séverin, E., Lendasse, A., 2014. Bankruptcy prediction using extreme learning machine and financial expertise. Neurocomputing 128, 296-302.
Zeberga, K., Attique, M., Shah, B., Ali, F., Jembre, Y.Z., Chung, T.-S., 2022. A Novel Text Mining Approach for Mental Health Prediction Using Bi-LSTM and BERT Model. Computational Intelligence and Neuroscience 2022.
Zhang, D., Shi, N., Peng, C., Aziz, A., Zhao, W., Xia, F., 2021. Mam: A metaphor-based approach for mental illness detection, International Conference on Computational Science. Springer, pp. 570-583.
Zhang, Z., Sabuncu, M., 2018. Generalized cross entropy loss for training deep neural networks with noisy labels. Advances in neural information processing systems 31.
Zucco, C., Calabrese, B., Agapito, G., Guzzi, P.H., Cannataro, M., 2020. Sentiment analysis for mining texts and social networks data: Methods and tools. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 10, e1333.
