2.2.1 Text Categorization: textCNN
During typhoon disasters, people post a variety of disaster-related information on social media platforms. Timely analysis of this information can help governments and social groups develop awareness of the disaster, understand conditions at different locations, and formulate corresponding rescue and recovery actions based on the analysis results.
This study uses the textCNN model to explore the relationship between social media data and disasters. Yoon Kim first applied Convolutional Neural Networks (CNN) to the task of text classification (Kim 2014). The model uses multiple kernels of different sizes to extract key information from sentences (similar to n-grams with multiple window sizes), which allows it to capture local correlations well. The network structure of textCNN is unchanged from that of a traditional image CNN. The traditional textCNN model consists of four parts: an input layer, a convolutional layer, a pooling layer, and a fully connected layer. The input layer is an n×k matrix, where n is the number of words in a sentence and k is the dimension of the word vector corresponding to each word; the original sentences are padded so that all input vectors have the same length. In the convolutional layer, each convolution operation extracts one feature vector, and kernels with different window sizes extract different feature vectors, which together form the layer's output. The pooling layer reduces sentences of different lengths to fixed-length vector representations; commonly used pooling methods include 1-max pooling, k-max pooling, and average pooling. The last layer is the fully connected layer, which maps the learned feature representation to the label space of the samples and uses the softmax activation function to output the classification category probabilities (Kalchbrenner N et al. 2014).
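The convolution-and-pooling steps described above can be illustrated with a minimal pure-Python sketch. This is not the trained model: the sentence matrix and kernel weights below are hypothetical placeholders that would normally be learned and supplied by word embeddings.

```python
def conv_1max(sent_matrix, kernel, h):
    """Slide a window of h words over the n x k sentence matrix,
    take the dot product with a flattened h*k kernel, and apply
    1-max pooling over the resulting feature vector."""
    n = len(sent_matrix)
    feats = []
    for i in range(n - h + 1):
        window = [v for row in sent_matrix[i:i + h] for v in row]
        feats.append(sum(w * x for w, x in zip(kernel, window)))
    return max(feats)  # 1-max pooling: keep the strongest activation

def textcnn_features(sent_matrix, kernels):
    """One pooled feature per (window size, kernel) pair; the
    concatenated features would feed the fully connected layer."""
    return [conv_1max(sent_matrix, k, h) for h, k in kernels]

# Toy sentence of n=3 words with k=2-dimensional word vectors,
# already padded to a fixed length.
sent = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical kernels for window sizes h=2 and h=3 (flattened h*k weights).
kernels = [(2, [1.0, 1.0, 1.0, 1.0]), (3, [0.5] * 6)]
pooled = textcnn_features(sent, kernels)
```

Because each kernel is 1-max pooled, the length of `pooled` depends only on the number of kernels, not on the sentence length, which is what lets the fully connected layer accept sentences of varying lengths.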
Compared with traditional models, CNN models do not rely on carefully engineered features or complex natural language processing tools, and they offer a simple network structure, less computation, and faster training. By introducing pre-trained word vectors, the CNN model performs well on multiple datasets. The word vectors used in this study were produced by the Word2Vec model (Mikolov T et al. 2013). The model has two channels, one of static word vectors and one of dynamic word vectors: the static channel is kept fixed while the dynamic channel is fine-tuned through backpropagation during training. In this two-channel architecture, each filter is applied to both channels and the results are added.
2.2.2 Disaster Assessment Based on BP Neural Network Model
The back-propagation (BP) neural network continuously corrects the network weights and thresholds through training on sample data, so that the error function decreases along the negative gradient direction and approaches the desired output. It is a widely used neural network model, applied mainly to function approximation, pattern recognition and classification, data compression, and time-series prediction (YE X et al. 2011). The BP network consists of an input layer, one or more hidden layers, and an output layer. Figure 3 shows the three-layer m×L×n BP network model used in this study, in which the related factors form the input layer and the economic loss from the disaster forms the output layer. Establishing the BPNN model involves two main steps. (1) Model establishment and correlation analysis: train the network on sample data to determine its parameters and obtain a trained neural network. (2) Model evaluation: input new related-factor data to obtain the estimated economic loss from the output layer, then compare the estimated values with the observed values to evaluate the accuracy of the trained network.
To obtain a better fit and avoid overfitting, we validate the model with K-fold cross-validation: the training set is randomly divided into K groups, K−1 groups are used for modeling, the remaining group is used for prediction, and the predicted results are compared with the actual values. These steps are repeated until every sample has been predicted.
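The K-fold procedure above can be sketched as follows. This is a generic illustration rather than the study's exact implementation; the sample count, fold count, and random seed are arbitrary.

```python
import random

def kfold_splits(samples, k, seed=0):
    """Randomly partition the sample indices into K folds; each round
    uses K-1 folds for modeling and the remaining fold for prediction,
    so every sample is predicted exactly once across the K rounds."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield ([samples[j] for j in train_idx],
               [samples[j] for j in test_idx])

samples = list(range(10))                 # ten hypothetical training samples
splits = list(kfold_splits(samples, k=5))
```

Each of the 5 rounds trains on 8 samples and predicts the remaining 2, and the union of all test folds recovers the full sample set.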
In building a BP neural network model, determining the number of hidden neuron nodes (L) is very important. The following two empirical formulas are commonly used to calculate the number of hidden nodes (Zhuo et al. 2011).
Formula (1):
$$\text{L}=\sqrt{m+n}+a$$
Formula (2):
$$\text{L}=\sqrt{0.43mn+0.12{n}^{2}+2.54m+0.77n+0.35}+0.51$$
In these two formulas, the parameter m represents the number of neurons in the input layer, the parameter n represents the number of neurons in the output layer, and a is an empirical integer between 1 and 10.
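As a quick numeric check, the two formulas can be transcribed directly into code. The values of m, n, and a below are illustrative only, not the configuration used in this study.

```python
import math

def hidden_nodes_1(m, n, a):
    # Formula (1): L = sqrt(m + n) + a, with a an empirical integer in [1, 10]
    return math.sqrt(m + n) + a

def hidden_nodes_2(m, n):
    # Formula (2): L = sqrt(0.43mn + 0.12n^2 + 2.54m + 0.77n + 0.35) + 0.51
    return math.sqrt(0.43 * m * n + 0.12 * n ** 2
                     + 2.54 * m + 0.77 * n + 0.35) + 0.51

# Example: m = 8 input factors, n = 1 output (economic loss), a = 3
L1 = hidden_nodes_1(8, 1, 3)   # sqrt(9) + 3 = 6.0
L2 = hidden_nodes_2(8, 1)      # sqrt(25.0) + 0.51 = 5.51
```

The two estimates are then rounded to a nearby integer and compared empirically when selecting the final network size.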
The BP network uses the sigmoid transfer function \(\text{f}\left(\text{x}\right)=\frac{1}{1+{e}^{-x}}\) and the error function \(\text{E}=\frac{\sum _{i}{({t}_{i}-{O}_{i})}^{2}}{2}\) for backpropagation (where \({t}_{i}\) is the expected output and \({O}_{i}\) is the calculated output of the network). The BP neural network minimizes the error function E by continuously adjusting the network weights and thresholds.
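In code, the transfer function and error function are direct transcriptions of the two formulas (this is only the forward evaluation, not the full weight-update loop; the target and output values below are made up):

```python
import math

def sigmoid(x):
    """Transfer function f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def bp_error(targets, outputs):
    """Error function E = sum_i (t_i - O_i)^2 / 2, which training
    drives toward a minimum by adjusting weights and thresholds."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / 2.0

e_perfect = bp_error([1.0, 0.0], [1.0, 0.0])   # exact fit: E = 0
e_poor    = bp_error([1.0, 0.0], [0.5, 0.5])   # E = (0.25 + 0.25) / 2 = 0.25
```

Because the squared differences are non-negative, E reaches its minimum of zero exactly when every network output matches its expected value.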
This study uses the root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (\({R}^{2}\)) to evaluate the performance of the model. RMSE represents the standard deviation of the differences between the actual loss and the evaluation results, MAE represents the mean absolute difference between the actual loss and the evaluation results (Willmott and Matsuura 2005), and \({R}^{2}\) represents the proportion of the variance in the actual loss that is explained by the evaluation results (Draper and Smith 1998). The formulas for the three metrics are as follows:
$$\text{R}\text{M}\text{S}\text{E}=\sqrt{\frac{1}{n}{\sum }_{i=1}^{n}{({y}_{i}-{\tilde{y}}_{i})}^{2}}$$
$$\text{M}\text{A}\text{E}=\frac{1}{n}{\sum }_{i=1}^{n}|{y}_{i}-{\tilde{y}}_{i}|$$
$${R}^{2}=1-\frac{{\sum }_{i=1}^{n}{({y}_{i}-{\tilde{y}}_{i})}^{2}}{{\sum }_{i=1}^{n}{({y}_{i}-\stackrel{-}{y})}^{2}}$$
where \({y}_{i}\) is the actual damage, \({\tilde{y}}_{i}\) is the assessed damage, \(\stackrel{-}{y}\) is the mean value of the actual damage, and \(n\) is the number of samples used for calculating model performance.
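The three evaluation metrics can be computed directly from the formulas above. The transcription is straightforward; the sample damage values are made up for illustration.

```python
import math

def rmse(y, y_hat):
    """Root mean square error between actual and assessed damage."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def mae(y, y_hat):
    """Mean absolute error between actual and assessed damage."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)                          # mean of actual damage
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical actual vs. assessed damage values
y     = [1.0, 2.0, 3.0]
y_hat = [1.0, 2.0, 3.0]   # a perfect assessment: RMSE = MAE = 0, R^2 = 1
```

A useful sanity check on \(R^2\): an assessor that always predicts the mean of the actual damage scores exactly 0, so positive values indicate the model explains variance that the mean alone cannot.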