2.1 Data preprocessing
Classifying earthquake and blasting events differs from conventional classification tasks: a single event is recorded by multiple stations, each station contributes a three-channel waveform, and all of these waveforms share the same class label. If waveforms belonging to the same event but recorded by different stations were distributed across any two of the training, validation, and test sets, the measured accuracy of the model would be inflated. To avoid this, the data are partitioned by event in advance, during the preprocessing stage.
The data are divided into training, validation, and test sets by event, and the union of the training and validation sets is then randomly divided into five parts. To ensure that the input to the convolutional neural network consists of valid signals, the waveforms of the stations triggered by each seismic or blasting event are preprocessed as follows (a sketch of the pipeline appears after this list):
(1) Waveform truncation: based on the arrival times of the seismic phases in the event waveform, a 9000-point segment is cut from each station waveform, starting 30 s before the phase arrival.
(2) Detrending: remove the linear trend from the truncated waveforms.
(3) Filtering: apply a high-pass filter to remove long-period components from the waveform.
(4) Normalization: map the waveform amplitudes uniformly to the range -1 to 1.
(5) Automatic screening: compute the signal-to-noise ratio of each waveform and discard waveforms that fall below an appropriately chosen threshold.
(6) Manual confirmation: plot the screened station waveforms in batches and inspect them manually, removing stations with abnormal waveforms.
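A minimal NumPy/SciPy sketch of steps (1)-(5) is given below. The sampling rate, high-pass corner frequency, filter order, and SNR threshold are not stated in the text, so the values used here are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, detrend, filtfilt

# Hypothetical parameters: the paper does not give the sampling rate,
# corner frequency, or SNR threshold, so the values below are placeholders.
FS = 100.0           # sampling rate (Hz), assumed
CORNER_HZ = 1.0      # high-pass corner frequency, assumed
SNR_THRESHOLD = 3.0  # screening threshold, assumed
N_POINTS = 9000      # segment length from the paper

def preprocess(trace, phase_idx):
    """Steps (1)-(5) for one channel of one station waveform."""
    # (1) cut a 9000-point window starting 30 s before the phase arrival
    start = max(0, phase_idx - int(30 * FS))
    seg = trace[start:start + N_POINTS].astype(float)

    # (2) remove the linear trend
    seg = detrend(seg, type="linear")

    # (3) high-pass filter to suppress long-period components
    b, a = butter(4, CORNER_HZ / (FS / 2), btype="highpass")
    seg = filtfilt(b, a, seg)

    # (4) normalize amplitudes to [-1, 1]
    seg /= np.max(np.abs(seg)) + 1e-12

    # (5) simple SNR screen: window after the arrival vs. noise before it
    noise, signal = seg[: int(30 * FS)], seg[int(30 * FS):]
    snr = np.std(signal) / (np.std(noise) + 1e-12)
    return seg if snr >= SNR_THRESHOLD else None
```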
2.2 Convolutional neural network model
The convolutional neural network model for waveform recognition is shown in Figure 1. The model consists of three parts: an input layer, a feature extraction layer, and an output layer. Three-component seismic data with a length of 9000 sampling points, normalized as described above, enter the network through the input layer. The feature extraction layer comprises three convolution layers, three max-pooling layers, one dropout layer, and two fully connected layers. All convolution layers use the ReLU activation function; the first fully connected layer uses ReLU, and the second uses softmax. To avoid overfitting, the dropout layer randomly "discards" some nodes, which improves the generalization ability of the model. In total, the model has 18,562 trainable parameters. The output layer is a vector of length 2: a 1 in the first position indicates a blasting event, and a 1 in the second position indicates an earthquake.
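The following Keras sketch mirrors the architecture described above. The filter counts, kernel sizes, pool sizes, and dropout rate are not given in the text and are chosen here only for illustration, so this sketch will not reproduce the reported 18,562 parameters unless the layer sizes are matched to Figure 1.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model():
    # Layer sizes below are ASSUMED, not taken from the paper.
    model = models.Sequential([
        layers.Input(shape=(9000, 3)),            # three-component waveform
        layers.Conv1D(8, 16, activation="relu"),  # conv layer 1
        layers.MaxPooling1D(4),                   # max-pooling layer 1
        layers.Conv1D(16, 8, activation="relu"),  # conv layer 2
        layers.MaxPooling1D(4),                   # max-pooling layer 2
        layers.Conv1D(16, 4, activation="relu"),  # conv layer 3
        layers.MaxPooling1D(4),                   # max-pooling layer 3
        layers.Flatten(),
        layers.Dropout(0.5),                      # dropout rate assumed
        layers.Dense(32, activation="relu"),      # fully connected layer 1
        layers.Dense(2, activation="softmax"),    # blasting vs. earthquake
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```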
2.3 Model training and parameter adjustment
The convolutional neural network model is trained using the true event types determined in routine rapid earthquake reporting. The preprocessed waveforms are divided into training, validation, and test sets by event. The union of the training and validation sets is trained with five-fold cross-validation: it is randomly divided into five subsets, and training is run five times, each time taking one subset as the validation set and the remaining four as the training set. The average result of the five runs is taken as the accuracy and loss of the training, and the network structure and hyperparameters are optimized according to this value and the loss and accuracy curves observed during training. Once the hyperparameters have been tuned to their optimum, the test set is fed to the network to evaluate its performance objectively.
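A sketch of the event-based five-fold scheme follows, using scikit-learn's GroupKFold so that waveforms of the same event never appear on both sides of a train/validation split. The variable names and the epoch count are illustrative, and GroupKFold's deterministic grouping stands in for the paper's random partition.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

def five_fold(waveforms, labels, event_ids, build_model):
    """Average validation accuracy over five event-grouped folds.

    `event_ids` maps each station waveform to its event, so grouping by it
    keeps all waveforms of one event on the same side of every split.
    """
    accs = []
    gkf = GroupKFold(n_splits=5)
    for train_idx, val_idx in gkf.split(waveforms, labels, groups=event_ids):
        model = build_model()  # e.g. the sketch from Section 2.2
        model.fit(waveforms[train_idx], labels[train_idx],
                  validation_data=(waveforms[val_idx], labels[val_idx]),
                  batch_size=128, epochs=50, verbose=0)  # epoch count assumed
        _, acc = model.evaluate(waveforms[val_idx], labels[val_idx], verbose=0)
        accs.append(acc)
    return float(np.mean(accs))  # averaged across the five folds
```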
Figure 2 shows the main data processing framework of this study; the preprocessing stage is omitted for space. The framework comprises three flows: the training set flow, the validation set flow, and the test set flow. All three share the same input method: the waveforms of each event are labeled, all labeled waveforms are shuffled, batches of 128 waveforms are fed into the network, forward propagation produces predicted labels, and the predicted and true labels are used to compute the cross-entropy loss and the accuracy.
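As a sketch, this shared input method might look as follows in TensorFlow; the helper names are illustrative, not from the paper.

```python
import tensorflow as tf

def make_dataset(waveforms, labels):
    # Shuffle the labeled waveforms and batch them in groups of 128.
    ds = tf.data.Dataset.from_tensor_slices((waveforms, labels))
    return ds.shuffle(len(waveforms)).batch(128)

loss_fn = tf.keras.losses.CategoricalCrossentropy()
acc_metric = tf.keras.metrics.CategoricalAccuracy()

def evaluate_batch(model, x_batch, y_batch):
    preds = model(x_batch, training=False)   # forward propagation
    loss = loss_fn(y_batch, preds)           # cross-entropy loss
    acc_metric.update_state(y_batch, preds)  # running accuracy
    return loss
```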
The feature unique to the training flow is backpropagation, which updates the network parameters; one pass of all waveforms through the network is called an epoch of training. After each training epoch, the validation set waveforms are fed into the network in the same way, and once all of them have been processed, the cross-entropy loss and accuracy are computed from the predicted labels and the labels of the corresponding input waveforms. The test flow differs from the validation flow in that it uses the final trained model, and its predicted labels are reorganized by event: within each event, the predictions for the individual station waveforms are integrated into a single final prediction. Finally, event-based accuracy, precision, recall, and F1 score are computed from the predicted and true labels of each event in the test set. Here the ground-truth labels are one-hot encoded, and each predicted label is a two-element array produced by the softmax layer.
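The event-level evaluation could be sketched as follows. The text does not specify how the station predictions are integrated, so averaging the softmax probabilities per event is assumed here; majority voting would be an alternative.

```python
import numpy as np
from collections import defaultdict
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

def event_metrics(station_probs, station_events, event_truth):
    """Aggregate station-level softmax outputs per event, then score.

    station_probs:  list of 2-element softmax arrays, one per waveform
    station_events: event id of each waveform
    event_truth:    dict of event id -> 0 (blasting) or 1 (earthquake)
    """
    by_event = defaultdict(list)
    for probs, ev in zip(station_probs, station_events):
        by_event[ev].append(probs)
    events = sorted(by_event)
    # ASSUMED aggregation rule: mean softmax probability, then argmax
    y_pred = [int(np.mean(by_event[ev], axis=0).argmax()) for ev in events]
    y_true = [event_truth[ev] for ev in events]
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }
```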
In each training epoch, the data are randomly truncated into 9000-point waveforms, and each waveform has a 30% probability of being augmented. The augmentation consists of randomly shifting a designated region of the data and adding Gaussian noise at a fixed signal-to-noise ratio. This 'creates' additional seismic data, reduces overfitting, and improves the generalization ability of the model.
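A sketch of the augmentation step is given below. The shift range and the fixed SNR of the added noise are not specified in the text and are assumed here, with the SNR interpreted as a ratio of standard deviations.

```python
import numpy as np

rng = np.random.default_rng()

MAX_SHIFT = 500   # maximum random shift in samples, assumed
NOISE_SNR = 10.0  # fixed SNR of the added Gaussian noise, assumed

def augment(waveform):
    """Apply augmentation with 30% probability to one 9000-point waveform."""
    if rng.random() >= 0.3:
        return waveform
    # random translation: roll the trace along the time axis
    out = np.roll(waveform, rng.integers(-MAX_SHIFT, MAX_SHIFT + 1), axis=0)
    # additive Gaussian noise at a fixed signal-to-noise ratio
    noise_std = np.std(out) / NOISE_SNR
    return out + rng.normal(0.0, noise_std, size=out.shape)
```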