Hierarchy-based domain adversarial neural network for bearing fault diagnosis under variable working conditions

doi:10.21203/rs.3.rs-4942209/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

Bearing faults are among the primary factors affecting the safe and stable running of mechanical systems. To guarantee the normal and reliable running of the entire equipment, it is crucial to monitor the operating conditions of bearings promptly and accurately. Conventional fault diagnosis methods usually assume that the training and test data are independent and identically distributed. However, this assumption does not hold under changeable running conditions, which makes fault diagnosis difficult. To tackle this problem, a novel hierarchy-based domain adversarial neural network (H-DANN) is introduced in this paper. The proposed H-DANN model is built mainly on the DANN. The domain discriminator drives the feature extractor to learn domain-independent features and allows the classifier to transfer across different operating environments. Furthermore, to extract rich discriminative features, a hierarchy-based feature extractor is proposed based on a novel feature pyramid network (FPN) modified by the CNN-BiLSTM network. Finally, the results on two bearing datasets indicate that the H-DANN model precisely recognizes bearing fault categories under different running environments and outperforms several state-of-the-art models.

Variable working conditions

domain adversarial neural network

fault diagnosis

bearing

Bearings are an indispensable supporting element in mechanical systems, and their running status is directly related to the overall operational performance of the equipment [1]. However, adverse operating environments significantly increase the risk of bearing failure. Such bearing faults reduce operational reliability and can even cause disastrous accidents [2–4]. Therefore, it is essential to develop a high-performance fault diagnosis approach that identifies bearing faults promptly and accurately to ensure smooth operation of mechanical equipment [5, 6]. With the rapid advancement of information technology, approaches based on deep learning have seen a dramatic rise in popularity and are increasingly used for industrial process fault diagnosis. However, many conventional fault diagnosis approaches depend on the assumption of independent, identically distributed training and test sets. This implies that the training and test sets are made up of vibration data measured under identical working conditions. In practical engineering applications, mechanical equipment generally runs with different operating parameters to adapt to varying task requirements. This leads to variable operating conditions (including radial force, speed, load torque, etc.) for bearings, which limits the applicability of conventional deep learning-based methods [7–9]. Therefore, it is necessary to develop a diagnosis method with strong generalization that still performs well in one working condition after being trained in another.

The adversarial-based domain adaptation strategy, also known as the domain adversarial neural network (DANN) approach, can effectively classify fault types under a variety of operating conditions. It employs an adversarial learning mechanism to reduce the discrepancy between cross-domain distributions and thereby extract domain-independent features that are invariant across domains [20–23]. Ganin et al. [24] first introduced the adversarial mechanism into domain adaptation to propose the DANN method, which has been successfully applied to sentiment analysis and image recognition tasks. Mao et al. [25] combined the strong adaptation capability of DANN with structured correlation information between multiple fault types to construct a new loss function with discriminative regularizers, enhancing the effectiveness of transfer learning. Chen et al. [26] developed a collaborative diagnosis framework that integrates domain knowledge from multiple sources by combining the edge confrontation module and an internal confrontation mechanism. Different from metric-based methods, DANN utilizes neural networks and adversarial training to achieve adaptive feature alignment. Nevertheless, the effectiveness of the DANN method depends heavily on the quality of the raw data and on the distribution discrepancy between domains in fault diagnosis tasks. For some complex diagnosis tasks, the DANN method needs to be further improved to extract domain-independent features with rich discriminative information.

In the realm of fault diagnosis, numerous studies have demonstrated the outstanding performance of the convolutional neural network (CNN) in extracting local or spatial features of vibration data. The feature pyramid network (FPN) [27], built on CNN, is essentially a hierarchical neural network. Its top-down architecture with lateral connections significantly enhances the model's ability to mine abstract features. However, a CNN cannot fully exploit the temporal characteristics of vibration data, which are what most distinguish vibration data from other data types such as images [28]. The long short-term memory (LSTM) network and its improved versions are variants of the recurrent neural network (RNN) that mitigate the gradient explosion/vanishing problem encountered by RNNs and extract hidden temporal features from input data [29, 30]. The LSTM and its improved versions are therefore suitable for handling data with significant temporal characteristics. Hence, a hybrid network combining CNN and bidirectional LSTM (BiLSTM) is constructed to form the CNN-BiLSTM module, which allows comprehensive extraction of temporal and spatial features.

In summary, this paper proposes a new hierarchy-based domain adversarial neural network (H-DANN) method to meet the challenge of bearing fault diagnosis under variable operating environments. The H-DANN model consists of three modules, i.e., a hierarchy-based feature extractor, a fault classifier and a domain discriminator. The hierarchy-based feature extractor is built on the FPN model and uses its hierarchical structure to extract multi-scale fault characteristics. A CNN-BiLSTM module that extracts spatio-temporal features is integrated into the FPN model, enhancing its feature extraction capability. The fault classifier processes the features obtained by the feature extractor to identify the different fault types. The contributions of this article are as follows:

1) The CNN-BiLSTM network is constructed and added to the FPN model to form a hierarchy-based feature extractor, which can mine the multi-scale spatio-temporal features from the raw vibration data.

2) A novel H-DANN method, based on a modified FPN model and the DANN model, is developed in this paper to capture rich domain-independent features and realize bearing fault diagnosis under different running environments.

The rest of this article is structured as follows. Relevant theories are briefly introduced in Section 2. Section 3 details the fault diagnosis architecture of the H-DANN model. In Section 4, case studies are conducted on the Case Western Reserve University (CWRU) and Paderborn University (PU) datasets. Section 5 concludes the paper.

2.1 Feature pyramid network (FPN)

The FPN was first presented by Lin et al. in 2017 [27] and is mainly applied to object detection in computer vision. FPN overcomes the memory and computational limitations of traditional image pyramids. Its top-down structure with lateral connections fuses shallow and deep features, resulting in output features with strong semantic information at all scales or hierarchies.

The architecture of FPN incorporates three modules: the bottom-up pathway module, the top-down pathway module and the lateral connection module. The bottom-up pathway module is formed by a classical backbone network, such as VGGNet or ResNet [31]. The feedforward computation of the backbone yields features at different scales. In Fig. 1, {A1, A2, A3, A4, A5} are the output features from the five modules of the backbone network. The features {A2, A3, A4} are each processed by a lateral 1×1 convolutional layer (lateral connection module). Each result is then added to the upsampled feature from the level above to obtain the features {M2, M3, M4} (top-down pathway module). The feature M5 is obtained by attaching only a 1×1 convolutional layer to A5. The lateral 1×1 convolutions unify the number of channels (the second dimension) of the features {A2, A3, A4, A5} to enable top-down feature fusion, and they do not change the height or width of these features. Additionally, because of its large memory footprint, feature A1 is neither processed further nor output. Finally, convolutional layers are attached to {M2, M3, M4, M5} to obtain the final output features {P2, P3, P4, P5}. At this stage, the convolutional layers further abstract the features on the one hand and reduce the aliasing effect of upsampling on the other. In summary, the FPN extracts multi-hierarchy features that have strong semantics at all hierarchies.
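A minimal, framework-free sketch of the top-down fusion step described above may help. It assumes single-channel 1-D feature maps whose lengths double from one level to the next, nearest-neighbour upsampling, and lateral projections that have already been applied. The function names `upsample2` and `top_down_merge` are illustrative and do not come from the original FPN implementation.

```python
def upsample2(x):
    """Nearest-neighbour upsampling by a factor of 2 along the length axis."""
    return [v for v in x for _ in range(2)]


def top_down_merge(laterals):
    """Fuse feature maps from the deepest level to the shallowest.

    laterals: 1-D feature maps ordered deepest (shortest) first, each assumed
    to be the output of its 1x1 lateral convolution already.
    Returns the merged maps in the same order (e.g. M5, M4, M3).
    """
    merged = [list(laterals[0])]            # deepest level: lateral output only
    for lateral in laterals[1:]:
        upsampled = upsample2(merged[-1])   # bring the coarser map to this scale
        if len(upsampled) != len(lateral):
            raise ValueError("each level must be twice as long as the previous one")
        merged.append([a + b for a, b in zip(lateral, upsampled)])
    return merged


# Toy lateral outputs for three levels (hypothetical values).
a5, a4, a3 = [1.0], [1.0, 2.0], [0.0, 0.0, 0.0, 0.0]
m5, m4, m3 = top_down_merge([a5, a4, a3])
print(m5, m4, m3)
```

In this toy case the information in the deepest map propagates into every shallower level, which is the mechanism that gives all levels of the pyramid access to the high-level features.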

In the problem of fault diagnosis, the input data of various intelligent diagnosis models are the temporal signals, which are non-stationary and complex. Moreover, the fault features are usually weak and vulnerable to influence from working conditions. Therefore, enhancing the feature extraction performance is necessary for fault diagnosis under different running environments. Considering the multi-hierarchy nature of FPN, it can be embedded into the feature extractor to wholly and precisely capture the discriminative features, thereby improving diagnosis accuracy.

Figure 1 The schematic diagram of FPN

2.2 Domain adversarial neural network (DANN)

The purpose of the DANN method presented by Ganin et al. [24] is to map fault data with distinct distributions into a common fault feature space so that their distributions in that space are as consistent as possible. The key lies in the domain discriminator. It embodies the idea of adversarial training: cross-domain distributional discrepancies are reduced adaptively and domain-independent features are extracted, which allows the classifier to transfer across variable working conditions.

As shown in Fig. 2, DANN is made up of a feature extractor, a discriminator and a classifier. The dataset is divided into a source domain and a target domain, whose data distributions typically differ. The former is labeled, and the latter is not. The optimization objective is to develop an intelligent diagnosis model that accurately identifies the unlabeled target domain data using the knowledge learned from the labeled source domain. The feature extractor is used in this study to abstract fault features at different hierarchies, and the extracted features are used to train the classifier. The fault features abstracted from the two domains are used for adversarial training. The feature extractor attempts to produce features whose domain cannot be accurately identified by the discriminator. Conversely, the discriminator attempts to precisely identify which domain each fault feature originates from. This mutual adversarial pattern enables the feature extractor to abstract domain-independent fault features, which are crucial for accurate fault classification. Finally, the classifier trained on the source domain can be used to identify fault types accurately in the target domain.

3.1 Model structure

This paper develops a new H-DANN method for intelligent bearing fault diagnosis, and its framework is shown in Fig. 3. The H-DANN method mainly contains three modules, namely a hierarchy-based feature extractor ${G_f}$, a fault classifier ${G_y}$ and a domain discriminator ${G_d}$. First, the measured original vibration data are used to form the fault dataset. Second, ${G_f}$ is constructed from a novel FPN model modified with a CNN-BiLSTM module and is used to capture features from the original vibration data. Third, ${G_y}$ is trained using the features captured by ${G_f}$. Adversarial training is then conducted between ${G_f}$ and ${G_d}$. In this way, ${G_f}$ gradually learns to identify and extract domain-independent features. Finally, the well-trained ${G_f}$ and ${G_y}$ form the intelligent diagnosis model for different operating environments.

3.2 Hierarchy-based feature extractor

A novel hierarchy-based feature extractor is developed in this article, which aims to extract rich and multi-hierarchy features from the data. Due to the hierarchical architecture of the FPN, the hierarchy-based feature extractor is formed on the basis of the FPN.

As shown in Fig. 3, the classical ResNet18 is used to construct the bottom-up pathway module of the hierarchy-based feature extractor. ResNet18 is a residual network that contains 18 layers with weights. In Fig. 3, C1 denotes the first module of ResNet18, which is formed by a convolutional layer and a max-pooling layer. The parameters of the former are set to 64@7×2 (channels@kernel size×stride), and those of the latter to 2×2 (size×stride). C2, C3, C4 and C5 are the other four modules of ResNet18, and each is formed by two residual blocks. Every residual block consists of two convolutional layers with parameters of 3×1 (kernel size×stride). For these four modules, the number of output channels is 64, 128, 256 and 512, respectively. For module C2, the stride of all four convolutional layers is 1. The specific operations of ResNet18 are detailed in Ref. [32].

In the subsequent experimental study, the input size is set to 1024×1. The output size of each module along the bottom-up pathway of the hierarchy-based feature extractor is shown in Fig. 3. The CNN-BiLSTM module is introduced into the lateral connection module of the proposed feature extractor to comprehensively extract spatio-temporal features. In the CNN-BiLSTM module, one branch is formed by a 1×1 convolution, which sets the channel dimension of the multi-scale features to the same value (512). This facilitates the subsequent top-down feature fusion. Additionally, the output of the 1×1 convolutional layer can be viewed as the spatial feature information obtained by the bottom-up pathway. Considering the rich temporal information in the vibration signal, a BiLSTM network forms the other branch to extract temporal features. Because the BiLSTM concatenates its forward and backward outputs, its output is twice as long as its input; a 2×2 pooling operation is therefore used for down-sampling. The temporally correlated features and spatially correlated features are fused to obtain comprehensive features (${f_1}$, ${f_2}$, ${f_3}$ and ${f_4}$). The detailed model parameters are presented in Table 1.

Additionally, the top-down pathway is used to fuse the shallow and deep features, resulting in output features (${f^{\prime}_1}$, ${f^{\prime}_2}$, ${f^{\prime}_3}$ and ${f^{\prime}_4}$) with strong semantics at all scales. Then, a convolutional layer with 512@3×1 and padding = 1 is applied to each of them to decrease the aliasing effect and further extract features. At this stage, max-pooling layers (with the sizes listed in Table 1) follow the other three convolutional layers, but not the first, to reduce the feature length to a common value. Finally, the output feature ${F_f}$ is obtained by feature fusion (matrix addition).

Table 1

Model parameters of the hierarchy-based feature extractor
Output	Operation	Parameters	Output shape
${f_1}$	Convolutional layer	512@1×1	512×32×1
	BiLSTM layer	32
	Max-pooling layer	2×2
${f_2}$	Convolutional layer	256@1×1	512×64×1
	BiLSTM layer	64
	Max-pooling layer	2×2
${f_3}$	Convolutional layer	128@1×1	512×128×1
	BiLSTM layer	128
	Max-pooling layer	2×2
${f_4}$	Convolutional layer	64@1×1	512×256×1
	BiLSTM layer	256
	Max-pooling layer	2×2
${F_1}$	Convolutional layer	512@3×1 padding = 1	512×32×1
${F_2}$	Convolutional layer	512@3×1 padding = 1	512×32×1
${F_2}$	Max-pooling layer	2×2	512×32×1
${F_3}$	Convolutional layer	512@3×1 padding = 1	512×32×1
${F_3}$	Max-pooling layer	4×4	512×32×1
${F_4}$	Convolutional layer	512@3×1 padding = 1	512×32×1
${F_4}$	Max-pooling layer	8×8	512×32×1
${F_f}$	Feature fusion	——	512×32×1
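The feature lengths in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below assumes that padding keeps lengths exact, that C3 to C5 each halve the length as in the standard ResNet18 layout, and that ${f_1}$ to ${f_4}$ are taken from C5 down to C2; these points are inferred from Table 1 and are not stated explicitly in the text. The names `STAGES` and `stage_lengths` are illustrative.

```python
INPUT_LENGTH = 1024

# (stage name, output channels, total stride of the stage).
# The stride-2 values for C3-C5 are an assumption based on standard ResNet18.
STAGES = [
    ("C1", 64, 4),    # stride-2 convolution followed by stride-2 max pooling
    ("C2", 64, 1),
    ("C3", 128, 2),
    ("C4", 256, 2),
    ("C5", 512, 2),
]


def stage_lengths(input_length=INPUT_LENGTH):
    """Return {stage: (channels, length)}, dividing the length by each stage stride."""
    shapes, length = {}, input_length
    for name, channels, stride in STAGES:
        length //= stride
        shapes[name] = (channels, length)
    return shapes


shapes = stage_lengths()

# Pooling factor needed to bring each level to the C5 length before fusion.
target = shapes["C5"][1]
pool_factors = {name: shapes[name][1] // target for name in ("C5", "C4", "C3", "C2")}

print(shapes)
print(pool_factors)
```

Under these assumptions the lengths 32, 64, 128 and 256 agree with the output shapes of ${f_1}$ to ${f_4}$, and the factors 2, 4 and 8 agree with the max-pooling sizes listed for ${F_2}$ to ${F_4}$.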

3.3 Fault classifier

The multi-hierarchy fusion features acquired from the hierarchy-based feature extractor ${G_f}$ are fed into the fault classifier ${G_y}$, which identifies the fault types. The training objective of both ${G_f}$ and ${G_y}$ is accurate fault type identification based on the extracted features. Therefore, there is no adversarial relationship between ${G_y}$ and ${G_f}$.

The fault classifier ${G_y}$ of the proposed H-DANN model uses ReLU as the activation function. Additionally, to prevent over-fitting, dropout with a rate of 0.5 is applied after each ReLU activation. The detailed parameters are given in Table 2.

Table 2

The model parameters of the fault classifier
	No.	Operation	Parameters
Module 1	1	Fully connected layer	1024
	2	ReLU	/
	3	dropout	0.5
Module 2	4	Fully connected layer	10
	5	ReLU	/
	6	dropout	0.5
	7	softmax	/

3.4 Domain discriminator

The multi-hierarchy fusion features captured by ${G_f}$ are passed to ${G_d}$, which aims to recognize the domain from which these features originate. Different from ${G_y}$, ${G_d}$ receives input data from both the source and the target domains during training. The training goal of ${G_f}$ is to prevent ${G_d}$ from accurately recognizing the domain of the extracted features. On the contrary, ${G_d}$ aims to identify the domain of the features as accurately as possible. Through this adversarial training mechanism, ${G_f}$ learns to extract domain-independent features under cross-domain conditions. As a result, a fault classifier trained only on labeled source domain data can be transferred to the target domain. For the proposed H-DANN model, the detailed structure of ${G_d}$ is presented in Table 3.

In Table 3, ${G_d}$ primarily consists of four fully connected layers. The first three layers have 1024 neurons each, while the fourth layer has 1 neuron with a sigmoid activation function for binary classification. Dropout with a rate of 0.5 is added to ${G_d}$ to suppress overfitting. Additionally, to conduct adversarial training, a gradient reversal layer (GRL) needs to be integrated between ${G_f}$ and ${G_d}$. During back propagation, this layer multiplies the incoming gradient by a negative number. This is because the optimization target of ${G_f}$ is to maximize the domain classification loss, which is opposite to that of ${G_d}$.

Table 3

The model parameters of the domain discriminator
	No.	Operation	Parameters
Module 1	1	Fully connected layer	1024
	2	ReLU	/
	3	Dropout	0.5
Module 2	4	Fully connected layer	1024
	5	ReLU	/
	6	Dropout	0.5
Module 3	7	Fully connected layer	1024
	8	ReLU	/
	9	Dropout	0.5
Module 4	10	Fully connected layer	1
	11	ReLU	/
	12	Dropout	0.5
	13	Sigmoid	/

3.5 Model training and test

To enable model training and testing, the vibration signals collected under variable working environments are first segmented using a sliding window of length 1024. A non-overlapping sampling strategy is followed, so each input sample in the subsequent experimental study has a length of 1024. The bearing fault dataset is then divided into source and target domain sets according to the working condition. The two sets contain the same fault types but are recorded under different running conditions, which means that their data distributions differ.
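The segmentation step can be sketched as follows. This is a minimal illustration rather than the authors' preprocessing code: the function name `segment_signal` is hypothetical, the record is synthetic, and discarding an incomplete final window is an assumption, since the text does not say how the remainder is handled.

```python
def segment_signal(signal, window=1024):
    """Split a 1-D signal into consecutive, non-overlapping windows.

    Trailing points that do not fill a complete window are discarded
    (an assumption; the source does not specify this).
    """
    n_windows = len(signal) // window
    return [signal[i * window:(i + 1) * window] for i in range(n_windows)]


# Synthetic stand-in for a measured vibration record (hypothetical data).
record = [float(i) for i in range(5000)]
samples = segment_signal(record)
print(len(samples), len(samples[0]))
```

Because the stride equals the window length, no data point appears in more than one sample.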

During training, the labeled source domain fault data are used to optimize ${G_f}$ and ${G_y}$. Meanwhile, the data from both the source and target domains, along with their domain labels, are used to optimize ${G_f}$ and ${G_d}$. The aggregate loss function ${\mathbf{L}}({\theta _f},{\theta _c},{\theta _d})$ contains two parts, namely the fault classification loss ${{\mathbf{L}}_c}({\theta _f},{\theta _c})$ and the domain identification loss ${{\mathbf{L}}_d}({\theta _f},{\theta _d})$, and it is given by

$$\mathbf{L}(\theta_f,\theta_c,\theta_d)=\mathbf{L}_c(\theta_f,\theta_c)+\mathbf{L}_d(\theta_f,\theta_d) \tag{1}$$

in which

$$\mathbf{L}_c(\theta_f,\theta_c)=\sum_{i=1}^{n_s} L_c\left(G_y\left(G_f(x_i^s;\theta_f);\theta_c\right),\, y_i^s\right) \tag{2}$$

$$\mathbf{L}_d(\theta_f,\theta_d)=\sum_{j=1}^{n_s+n_t} L_d\left(G_d\left(R_\lambda\left(G_f(x_j;\theta_f)\right);\theta_d\right),\, d_j\right) \tag{3}$$

$$R_\lambda(X)=X;\quad \frac{dR_\lambda(X)}{dX}=-\lambda_{GRL} I \tag{4}$$

where ${\theta _f}$, ${\theta _c}$ and ${\theta _d}$ denote the model parameters of ${G_f}$, ${G_y}$ and ${G_d}$, respectively. ${n_s}$ and ${n_t}$ are the numbers of samples in the source and target domains, respectively. $x_{i}^{s}$ and $y_{i}^{s}$ are the i-th sample and its fault category label in the source domain. ${x_j}$ and ${d_j}$ are the j-th sample and its domain label. ${R_\lambda }$ denotes the GRL. It can be seen from Eq. (4) that, in forward propagation, the GRL directly outputs its input. In back propagation, the GRL multiplies the incoming gradient by a negative number ($- {\lambda _{GRL}}$), where ${\lambda _{GRL}}$ denotes the weight coefficient of the GRL. Both ${{\mathbf{L}}_c}({\theta _f},{\theta _c})$ and ${{\mathbf{L}}_d}({\theta _f},{\theta _d})$ use the cross-entropy loss. Additionally, the H-DANN model is optimized using the Adam optimizer, and the ‘step’ learning rate strategy is used to further tune its performance. In the test stage, the well-trained ${G_f}$ and ${G_y}$ form the final intelligent fault diagnosis model. ${G_d}$ is used only in the training stage.
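The behaviour of the GRL can be illustrated with a scalar toy model in which the backward pass is written out by hand. This is a sketch under strong simplifications and not the H-DANN implementation: the feature extractor is a single weight `w`, the discriminator is a single weight `v` followed by a sigmoid, and all numeric values are hypothetical. The finite-difference check confirms that the gradient reaching the feature extractor is $-{\lambda _{GRL}}$ times the ordinary gradient of the domain loss.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def domain_loss(w, v, x, d):
    """Binary cross-entropy of a toy discriminator p = sigmoid(v * (w * x))."""
    p = sigmoid(v * (w * x))
    return -(d * math.log(p) + (1 - d) * math.log(1 - p))


def grads_with_grl(w, v, x, d, lam):
    """Manual backward pass with a gradient reversal layer between G_f and G_d."""
    f = w * x                    # toy feature extractor G_f
    r = f                        # GRL forward pass: identity
    p = sigmoid(v * r)           # toy domain discriminator G_d
    dloss_dz = p - d             # derivative of the loss w.r.t. the logit z = v * r
    grad_v = dloss_dz * r        # discriminator gradient is left untouched
    dloss_dr = dloss_dz * v
    dloss_df = -lam * dloss_dr   # GRL backward pass: multiply by -lambda
    grad_w = dloss_df * x
    return grad_w, grad_v


w, v, x, d, lam = 0.5, 1.5, 2.0, 1.0, 0.3
grad_w, grad_v = grads_with_grl(w, v, x, d, lam)

# Central finite differences of the plain (unreversed) loss.
eps = 1e-6
plain_grad_w = (domain_loss(w + eps, v, x, d) - domain_loss(w - eps, v, x, d)) / (2 * eps)
plain_grad_v = (domain_loss(w, v + eps, x, d) - domain_loss(w, v - eps, x, d)) / (2 * eps)
print(grad_w, -lam * plain_grad_w)
print(grad_v, plain_grad_v)
```

With a single gradient-descent step on all parameters, the discriminator weight therefore moves to reduce the domain loss while the feature extractor weight moves to increase it, which is the adversarial behaviour described above.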

The performance of the H-DANN method in different running environments is validated in two experimental cases on the classical CWRU and PU datasets. Additionally, several classical methods, including CNN, CORAL [15], MK-MMD [16] and DANN [24], are used in comparative studies to indicate the advantage of the H-DANN method. The case studies are performed on PCs with RTX 3090 Ti GPUs, and all diagnosis models are constructed using PyTorch 2.0.1.

4.1 Case1: CWRU dataset

4.1.1 Description of the dataset

The CWRU dataset was acquired on the test bench shown in Fig. 4. The bench consists of a torque transducer, an electric motor, a dynamometer, an encoder, a test bearing and a controller. The test bearing used in this study is an SKF 6205 deep groove ball bearing, which acts as the supporting bearing for the motor. The simulated faults are single-point failures located on the rolling elements, the outer race and the inner race. Additionally, each fault location on the test bearings was machined with three fault sizes. Together with the normal state, the nine fault types (three locations with three sizes each) give a total of 10 bearing data types in the CWRU dataset. Detailed information on the CWRU dataset is given in Ref. [33].

As shown in Table 4, the CWRU dataset contains data for four different loads and speeds, corresponding to four working conditions. To conduct experiments on variable working condition diagnosis, six tasks for variable working conditions are set in this study, including 0→1, 0→2, 0→3, 1→2, 1→3, and 2→3. Specifically, 0→1 means that data from running environment 0 forms the source domain, and data from running environment 1 forms the target domain.

Table 4

The variable operating environments of CWRU dataset
Condition No.	Speed (rpm)	Load (hp)
0	1797	0
1	1772	1
2	1750	2
3	1730	3
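The six transfer tasks defined for the conditions in Table 4 are exactly the source/target pairs in which the source condition number is lower than the target condition number. A short sketch that enumerates them (variable names are illustrative):

```python
from itertools import combinations

conditions = [0, 1, 2, 3]   # condition numbers from Table 4

# Each pair is (source condition, target condition) with source < target.
pairs = list(combinations(conditions, 2))
tasks = [f"{src}→{tgt}" for src, tgt in pairs]
print(tasks)
```

Reverse-direction tasks such as 1→0 are not part of this set.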

4.1.2 Results and discussions

Table 5 shows the diagnosis results for the six tasks, where each value is the average accuracy of five experiments under the same conditions. The diagnosis accuracy of the H-DANN method for the six tasks ranges from 98.25% to 99.70%, with a mean accuracy of 99.27%. This suggests that the H-DANN method can precisely classify the fault categories under different running environments.

To illustrate the advantage of the H-DANN method, it is compared with popular methods, namely CNN, CORAL [15], MK-MMD [16] and DANN [24]. These four comparison methods share the same feature extractor, which is mainly built from five convolutional layers with parameters of 16@15×1, 16@3×1, 32@3×1, 32@3×1 and 64@3×1, respectively. The CNN method comprises a feature extraction network and a classifier, and the classifier is the same as the fault classifier ${G_y}$ of the H-DANN method shown in Table 2. The domain adaptation methods with CORAL and MK-MMD metrics mainly contain two branch networks with shared weights, each consisting of a module for fault feature mining and a fully connected layer for fault identification. The source domain data for model training and the target domain data for final fault identification are fed into the two branches, respectively. The output features are used to calculate the distribution difference via the CORAL or MK-MMD loss function. Additionally, in the source domain branch, the final output features are also fed into a fault classifier to classify fault types and calculate the classification loss. These two losses jointly guide the training of the domain adaptation method, and the optimized branch network serves as the final diagnosis model, which is tested in the target domain. The framework of the DANN method is presented in Fig. 2, and it mainly contains a domain discriminator given in Table 3, a classifier given in Table 2, and a feature extractor in Table 1.

In Table 5, the average accuracy of the CNN method over the six tasks is 90.68%. Specifically, its diagnosis accuracy for task 0→3 is only 86.78%, significantly lower than that of the transfer learning-based methods. This is because ordinary deep learning-based diagnosis methods usually assume that the test and training data come from identical running environments and share the same distribution. Traditional non-transfer learning methods are therefore challenged by bearing fault diagnosis across different running environments. For the metric-based domain adaptation methods using the CORAL and MK-MMD metrics, the mean accuracies are 94.38% and 95.78%, respectively. This illustrates that adding a metric loss (CORAL or MK-MMD), which measures the distribution difference between the two domains, to the classification loss can effectively enhance the generalization of the model. Therefore, a diagnosis model trained in one running environment can perform well in another. Different from the metric-based methods, the DANN method utilizes neural networks and adversarial training to adaptively achieve feature alignment. Such a feature alignment approach can flexibly and precisely judge the distribution differences, and it is less sensitive to data quality. Accordingly, the accuracy of the DANN method ranges from 97.11% to 98.38% with a mean of 97.86%, which exceeds that of the metric-based methods but is still inferior to the H-DANN method. This is because the constructed hierarchy-based feature extractor can extract multi-hierarchy spatio-temporal feature information to increase diagnosis accuracy. Additionally, to demonstrate the stability of the H-DANN method under variable working conditions, the diagnosis accuracy of five repeated experiments on task 0→3 is shown in Fig. 5, and the corresponding standard deviations are given in Table 6. The standard deviation of the H-DANN method is only ± 0.17, which is smaller than that of every compared method. Based on this, it is evident that the H-DANN method is practical for bearing fault diagnosis under differing running conditions.

Table 5

The diagnosis results for variable working conditions on the CWRU dataset
Tasks	CNN/%	CORAL/%	MK-MMD/%	DANN/%	Proposed/%
0→1	94.12	96.80	96.44	98.05	99.68
0→2	88.31	91.25	97.08	97.73	98.25
0→3	86.78	92.42	95.50	98.25	99.40
1→2	92.41	94.16	92.45	98.38	99.58
1→3	88.92	95.53	96.12	97.11	99.03
2→3	93.56	96.12	97.10	97.62	99.70
Mean	90.68	94.38	95.78	97.86	99.27
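The "Mean" row of Table 5 can be reproduced directly from the six per-task accuracies. The snippet below copies the values from the table; the variable names are illustrative.

```python
# Per-task accuracies (%) from Table 5, in the order 0→1, 0→2, 0→3, 1→2, 1→3, 2→3.
results = {
    "CNN":      [94.12, 88.31, 86.78, 92.41, 88.92, 93.56],
    "CORAL":    [96.80, 91.25, 92.42, 94.16, 95.53, 96.12],
    "MK-MMD":   [96.44, 97.08, 95.50, 92.45, 96.12, 97.10],
    "DANN":     [98.05, 97.73, 98.25, 98.38, 97.11, 97.62],
    "Proposed": [99.68, 98.25, 99.40, 99.58, 99.03, 99.70],
}

means = {method: sum(acc) / len(acc) for method, acc in results.items()}

for method, mean in means.items():
    print(f"{method}: {mean:.2f}")
```

Rounded to two decimals, the computed means agree with the values reported in the table.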

Table 6

Diagnosis results of five experiments for the same external conditions on task 0→3
Method	Mean Accuracy	Standard Deviation
CNN	86.78%	± 1.40
CORAL	92.42%	± 0.94
MK-MMD	95.50%	± 1.26
DANN	98.25%	± 0.39
Proposed	99.40%	± 0.17

4.2 Case2: PU dataset

4.2.1 Description of the dataset

The variable working condition experiments are also carried out on the PU dataset to further demonstrate the advantage of the H-DANN method. The PU dataset, which uses type 6203 bearings, was obtained on the test bench shown in Fig. 6. In the PU dataset, the test bearings have either artificially induced or real-life damage. The real-life damage stems from accelerated lifetime tests. Ten bearings, listed in Table 7, are used to conduct the experimental studies; apart from the normal bearing, all of them have real-life damage. IR and OR denote the inner and outer ring of the bearing, respectively. Additionally, damage extents 1, 2 and 3 indicate damage lengths of less than 2 mm, between 2 mm and 4.5 mm, and greater than 4.5 mm, respectively. Detailed information on the PU bearing fault dataset is given in Ref. [34].

As listed in Table 8, the PU dataset contains data under four running conditions that differ in load torque, radial force and rotational speed. To conduct variable running condition diagnosis experiments, five tasks were set: 3→0, 0→2, 0→3, 2→0 and 2→3.

Table 7

The detailed collecting information of bearings with real-life damages
No.	Damage	Damage element	Extent	Bearing code
0	Normal	—	—	K001
1	Fatigue: pitting	OR	1	KA04
2	Plastic deformation: indentations	OR	1	KA15
3	Fatigue: pitting	OR	2	KA16
4	Fatigue: pitting	IR(+ OR)	2	KB23
5	Fatigue: pitting	IR(+ OR)	3	KB24
6	Plastic deformation: indentations	OR + IR	1	KB27
7	Fatigue: pitting	IR	3	K116
8	Fatigue: pitting	IR	2	K118
9	Fatigue: pitting	IR	1	K121

Table 8

The variable working conditions of PU dataset
Condition No.	Radial Force (N)	Speed (rpm)	Load torque (Nm)
0	1000	1500	0.7
1	1000	900	0.7
2	1000	1500	0.1
3	400	1500	0.7
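To make the difference between source and target domains explicit, the following sketch encodes Table 8 as a dictionary and lists which operating parameters change in each of the five tasks evaluated in Table 9. The names `CONDITIONS`, `TASKS` and `changed_parameters` are illustrative and do not come from the paper.

```python
# Operating conditions transcribed from Table 8.
CONDITIONS = {
    0: {"radial_force_N": 1000, "speed_rpm": 1500, "load_torque_Nm": 0.7},
    1: {"radial_force_N": 1000, "speed_rpm": 900, "load_torque_Nm": 0.7},
    2: {"radial_force_N": 1000, "speed_rpm": 1500, "load_torque_Nm": 0.1},
    3: {"radial_force_N": 400, "speed_rpm": 1500, "load_torque_Nm": 0.7},
}

# The five source→target tasks, in the order used in Table 9.
TASKS = [(3, 0), (0, 2), (0, 3), (2, 0), (2, 3)]


def changed_parameters(source, target):
    """Map each parameter that differs to its (source value, target value) pair."""
    src, tgt = CONDITIONS[source], CONDITIONS[target]
    return {key: (src[key], tgt[key]) for key in src if src[key] != tgt[key]}


for source, target in TASKS:
    print(f"{source}→{target}: {changed_parameters(source, target)}")
```

Read this way, the five tasks vary the radial force, the load torque, or both (task 2→3), while the rotational speed stays at 1500 rpm because condition 1 is not used in any task.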

4.2.2 Results and discussions

For each task, the reported result is the average diagnosis accuracy over five repeated tests. The diagnosis accuracy of all five methods on the PU dataset is substantially lower than on the CWRU dataset presented in Table 5. One reason is that the working conditions in the CWRU dataset differ only slightly, as evident from Table 4, so the distribution differences between variable working condition tasks are small. Furthermore, the bearing failures in the CWRU dataset are artificial damages produced by electrical discharge machining, so the corresponding fault features are distinct. In contrast, as shown in Table 8, the working conditions in the PU dataset differ significantly, and the experiments use vibration signals from bearings with real damage (from accelerated lifetime tests), which increases the difficulty of the variable working condition tasks.

As shown in Table 9, for the traditional CNN model, which lacks feature transfer ability, the diagnosis accuracy over the five variable working condition tasks ranges from 52.59% to 76.61%, and the mean accuracy is only 64.08%, much lower than that of the transfer learning-based methods. Introducing domain adaptation to achieve feature alignment significantly enhances the diagnosis accuracy under changed working conditions. In particular, the DANN method uses neural networks and adversarial training to achieve the feature alignment adaptively, and its mean diagnosis accuracy is 79.06%. In the H-DANN method, a hierarchy-based feature extractor combining the CNN-BiLSTM network and the FPN model is used to extract multi-hierarchy spatio-temporal feature information. As a result, its diagnosis accuracy is over 80% for all five tasks, and the mean accuracy is 86.38%. For task 0→2 in particular, the diagnosis accuracy is 94.12%, and Fig. 7 shows that the H-DANN method has excellent diagnosis performance across varying running conditions and tasks. The diagnosis accuracies are also compared in Fig. 8. The analysis shows that the H-DANN method has the highest diagnosis accuracy for every variable working condition task. In conclusion, the proposed H-DANN method can precisely identify bearing faults under different operating environments.

Table 9

The diagnosis results for the variable working conditions on the PU dataset
Tasks	CNN/%	CORAL/%	MK-MMD/%	DANN/%	Proposed/%
3→0	55.08	62.98	70.97	73.27	80.55
0→2	76.61	78.17	85.19	84.73	94.12
0→3	52.59	60.82	70.95	77.91	82.54
2→0	76.22	82.80	87.25	86.48	92.60
2→3	59.88	63.36	73.22	72.92	82.10
Mean	64.08	69.63	77.52	79.06	86.38
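The "Mean" row of Table 9 can be reproduced directly from the per-task accuracies. The short sketch below recomputes it and derives the gain of the proposed method over DANN. The variable names are illustrative; the numbers are transcribed from Table 9.

```python
# Per-task accuracies (%) from Table 9, in task order 3→0, 0→2, 0→3, 2→0, 2→3.
TABLE_9 = {
    "CNN": [55.08, 76.61, 52.59, 76.22, 59.88],
    "CORAL": [62.98, 78.17, 60.82, 82.80, 63.36],
    "MK-MMD": [70.97, 85.19, 70.95, 87.25, 73.22],
    "DANN": [73.27, 84.73, 77.91, 86.48, 72.92],
    "Proposed": [80.55, 94.12, 82.54, 92.60, 82.10],
}


def mean_accuracy(values):
    """Arithmetic mean of a list of accuracies."""
    return sum(values) / len(values)


# Mean accuracy per method, rounded to two decimals as in the table.
means = {method: round(mean_accuracy(acc), 2) for method, acc in TABLE_9.items()}

# Improvement of the proposed method over plain DANN, in percentage points.
gain_over_dann = round(means["Proposed"] - means["DANN"], 2)

# Lowest per-task accuracy of the proposed method.
worst_proposed = min(TABLE_9["Proposed"])

print(means, gain_over_dann, worst_proposed)
```

The recomputed means agree with the table, and the lowest per-task accuracy of the proposed method (task 3→0) remains above 80%.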

To investigate the contribution of each component module in the H-DANN method, contribution analysis experiments were conducted on task 0→2. The diagnosis results are presented as a bar chart in Fig. 9. ‘No Hierarchy-based’, ‘No DANN’ and ‘No CNN-BiLSTM’ denote the diagnosis model with the hierarchy-based feature extractor, the domain discriminator and the CNN-BiLSTM module removed, respectively. Specifically, ‘No Hierarchy-based’ is essentially the same as the original DANN model used in the previous experiments, and its feature extractor consists only of five convolutional layers. ‘No DANN’ means that there is no adversarial training (no domain discriminator), so the diagnosis model is formed only by the hierarchy-based feature extractor shown in Table 1 and the fault classifier shown in Table 2. ‘No CNN-BiLSTM’ means that, in contrast to the H-DANN method, the original FPN model is used as the feature extractor. Fig. 9 indicates that the hierarchy-based feature extraction module, the CNN-BiLSTM module and the domain discriminator all have a positive impact on the diagnosis performance.

Moreover, adding the domain discriminator and the hierarchy-based feature extraction module together enhances the diagnosis performance significantly more than adding either module alone. This is because the domain discriminator enables the feature extractor to abstract domain-independent fault features, so as to achieve classifier transfer under variable working conditions, while the hierarchy-based feature extractor extracts rich discriminative features that enhance the diagnosis accuracy. Furthermore, considering the rich temporal information in vibration signals, introducing the CNN-BiLSTM module further helps the H-DANN model capture features that are conducive to fault classification. To sum up, the proposed H-DANN method, which combines a novel hierarchy-based feature extractor with DANN, is reasonable and superior for fault diagnosis under variable running environments.
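The role of the domain discriminator can be illustrated with the gradient reversal idea commonly used in DANN-style training: features pass through unchanged in the forward pass, while the gradient coming back from the discriminator is sign-flipped before it reaches the feature extractor. The sketch below is a framework-free illustration, not the authors' implementation. The class name `GradientReversal` is hypothetical, and the `dann_lambda` schedule follows the original DANN paper; whether H-DANN uses the same schedule is an assumption.

```python
import math


class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, features):
        # Features reach the domain discriminator unchanged.
        return list(features)

    def backward(self, grad_from_discriminator):
        # The feature extractor receives the negated, scaled gradient, so it is
        # pushed to make the domains harder to tell apart.
        return [-self.lam * g for g in grad_from_discriminator]


def dann_lambda(progress, gamma=10.0):
    """Adaptation weight that grows from 0 towards 1 as training progress goes 0 -> 1.

    Schedule taken from the original DANN paper (assumed here, not confirmed for H-DANN).
    """
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


grl = GradientReversal(lam=dann_lambda(0.5))
features = [0.2, -1.3, 0.7]
assert grl.forward(features) == features  # forward pass is the identity
reversed_grad = grl.backward([0.1, -0.4, 0.0])
print(reversed_grad)
```

In a deep learning framework the same behaviour would normally be implemented as a custom autograd operation placed between the feature extractor and the domain discriminator.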

5 Conclusions

This paper presents a novel H-DANN method for bearing fault diagnosis under different operating environments. The central contribution is a hierarchy-based feature extractor combining FPN and CNN-BiLSTM, which is integrated into the DANN model to form the H-DANN model. It can align feature distributions and extract rich discriminative features to boost the diagnosis accuracy under different running environments. The results demonstrate that the H-DANN method can recognize bearing faults across varying operating environments, and that its diagnosis capability surpasses that of the classical CNN, CORAL, MK-MMD and DANN methods.

However, the cross-domain scenario studied in this paper is limited to different working conditions on the same test bench or test object. In engineering practice, it is usually difficult to carry out vibration tests directly on mechanical equipment with various faults. Therefore, in future work, we will develop an intelligent diagnosis model for such complex cross-domain tasks, in which the network is trained with vibration data from a test bench and the trained model is then used to diagnose faults in actual mechanical equipment.

Author Contributions Yuanlin Zheng put forward constructive suggestions for the research conception. Jie Liu provided the methodology, designed the network structure and revised the manuscript. Ting Wang made the main contributions to the experimental verification and the initial draft. All authors reviewed the manuscript.

Funding This work is supported by the Natural Science Basic Research Program of Shaanxi (Grant No. 2024JC-YBMS-343), the National Natural Science Foundation of China (No. 51905422), and the Scientific Research Program Funded by Shaanxi Provincial Education Department (No. 23JY062).

Data Availability All or part of the data, models, or programs generated or used in this study can be acquired by contacting the corresponding author.

Conflict of interest The authors declare that they have no conflict of interest.

Chen, X., Yang, R., Xue, Y., Huang, M., Ferrero, R., Wang, Z.: Deep Transfer Learning for Bearing Fault Diagnosis: A Systematic Review Since 2016. IEEE Trans. Instrum. Meas. 72, 1-21 (2023)
Meng, H., Zhang, J., Zhao, J., Wang, D.: Multi-scale feature extraction and fusion method for bearing fault diagnosis based on hybrid attention mechanism. Signal Image Video Process. 1-11 (2024)
Xie, J., Liu, J., Ding, T., Wang, T., Yu, T.: Self-Attention Metric Learning Based on Multiscale Feature Fusion for Few-Shot Fault Diagnosis. IEEE Sensors J. 23(17), 19771-19782 (2023)
Zhao, K., Feng, J., Shao, H.: A novel conditional weighting transfer Wasserstein auto-encoder for rolling bearing fault diagnosis with multi-source domains. Knowl-Based Syst. 262, 110203 (2022)
Tao, H., Qiu, J., Chen, Y., Stojanovic, V., Cheng, L.: Unsupervised cross-domain rolling bearing fault diagnosis based on time-frequency information fusion. J. Franklin Inst. 360(2), 1454-1477 (2022)
Zhang, X., He, C., Lu, Y., Chen, B., Zhu, Le., Zhang, L.: Fault diagnosis for small samples based on attention mechanism. Meas. 187, 110242 (2022)
Miao, Y., Zhang, B., Lin, J., Zhao, M., Liu, H., Liu, Z., Li, Hao.: A review on the application of blind deconvolution in machinery fault diagnosis. Mech. Syst. Signal Proc. 163, 108202 (2022)
Fang, H., Deng, J., Chen, D., Jiang, W., Shao, S., Tang, M., Liu, J.: You can get smaller: A lightweight self-activation convolution unit modified by transformer for fault diagnosis. Adv. Eng. Inform. 55, 101890 (2023)
Yang, J., Liu, J., Xie, J., Wang, C., Ding, T.: Conditional GAN and 2-D CNN for bearing fault diagnosis with small samples. IEEE Trans. Instrum. Meas. 70, 1-12 (2021)
An, Y., Zhang, K., Chai, Y., Liu, Q., Huang, X.: Domain adaptation network base on contrastive learning for bearings fault diagnosis under variable working conditions. Expert Syst. Appl. 212, 118802 (2023)
Su, K., Liu, J., Xiong, H.: A multi-level adaptation scheme for hierarchical bearing fault diagnosis under variable working conditions. J. Manuf. Syst. 64, 251-260 (2022)
Liu, S., Jiang, H., Wu, Z., Yi, Z., Wang, R.: Intelligent fault diagnosis of rotating machinery using a multi-source domain adaptation network with adversarial discrepancy matching. Reliab. Eng. Syst. Saf. 231, 109036 (2022)
Che, Z., He, L., Liu, Y., Bao, C.: Transferable mapping shift network for unsupervised domain adaptation using in vibration signal fault diagnosis under variable conditions. Signal Image Video Process. 1-11 (2024)
Borgwardt, K. M., Gretton, A., Rasch, M. J., Kriegel, H.-P., Schölkopf, B., Smola, A. J.: Integrating structured biological data by Kernel Maximum Mean Discrepancy. Bioinformatics 22(14), e49–e57 (2006)
Sun, B., Saenko, K.: Deep coral: Correlation alignment for deep domain adaptation. Eur. Conf. Comput. Vis. (ECCV) 443-450 (2016)
Long, M., Cao, Y., Cao, Z., Wang, J., Jordan, M. I.: Transferable representation learning with deep adaptation networks. IEEE Trans. Pattern Anal. Mach. Intell. 41(12), 3071-3085 (2019)
Cao, X., Wang, Y., Chen, B., Zeng, N.: Domain-adaptive intelligence for fault diagnosis based on deep transfer learning from scientific test rigs to industrial applications. Neural. Comput. Appl. 33, 4483-4499 (2020)
Wang, X., He, H., Li, L.: A hierarchical deep domain adaptation approach for fault diagnosis of power plant thermal system. IEEE Trans. Industr. Inform. 15(9), 5139-5148 (2019)
Wan, L., Li, Y., Chen, K., Gong, K., Li, C.: A novel deep convolution multi-adversarial domain adaptation model for rolling bearing fault diagnosis. Meas. 191, 110752 (2022)
Zhou, K., Diehl, E., Tang, J.: Deep convolutional generative adversarial network with semi-supervised learning enabled physics elucidation for extended gear fault diagnosis under data limitations. Mech. Syst. Signal Proc. 185, 109772 (2023)
Hu, Q., Si, X., Qin, A., Lv, Y., Liu, M.: Balanced adaptation regularization based transfer learning for unsupervised cross-domain fault diagnosis. IEEE Sensors J. 22(12), 12139-12151 (2022)
Jiang, X., Wang, X., Han, B., Wang, J., Zhang, Z., Ma, H., Xing, S., Man, K.: A novel hybrid distance guided domain adversarial method for cross domain fault diagnosis of gearbox. Meas. Sci. Technol. 34, 065115 (2023)
Shen, C., Tian, J., Zhu, J., Shi, J., Zhu, Z., Wang, D.: A new multisource domain bearing fault diagnosis method with adaptive dual-domain obfuscation weighting strategy. 72, 1-11 (2023)
Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., Lempitsky, V.: Domain-adversarial training of neural networks. 189-209 (2015)
Mao, W., Liu, Y., Ding, L., Safian, A., Liang, X.: A new structured domain adversarial neural network for transfer fault diagnosis of rolling bearings under different working conditions. IEEE Trans. Instrum. Meas. 70, 1-13 (2020)
Chen, X., Shao, H., Xiao, Y., Yan, S., Cai, S., Liu, B.: Collaborative fault diagnosis of rotating machinery via dual adversarial guided unsupervised multi-domain adaptation network. Mech. Syst. Signal Proc. 198, 110427 (2023)
Lin, T., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature Pyramid Networks for Object Detection. IEEE Conf. Comput. Vision Pattern Recognit. (CVPR), 936-944 (2017)
Lopes, O. I., Zou, D., Abdulqadder, H. I., Akbar, S., Li, Z., Ruambo, F., Pereira, W.: Network intrusion detection based on the temporal convolutional model. Comput. Secur. 135, 103465 (2023)
Gao, D., Zhu, Y., Ren, Z., Yan, K., Kang, W.: A novel weak fault diagnosis method for rolling bearings based on LSTM considering quasi-periodicity. 231, 107413 (2021)
Shen, J., Zhao, D., Liu, S., Cui, Z.: Multiscale attention feature fusion network for rolling bearing fault diagnosis under variable speed conditions. Signal Image Video Process. 1-13 (2024)
Zhu, Z., Lei, Y., Qi, G., Chai, Y., Mazur, N., An, Y., Huang, X.: A review of the application of deep learning in intelligent fault diagnosis of rotating machinery. Meas. 112346 (2022)
Gao, M., Song, P., Wang, F., Liu, J., Mandelis, A., Qi, D.: A novel deep convolutional neural network based on ResNet-18 and transfer learning for detection of wood knot defects. J. Sensors 2021, 16 (2021)
Smith, W. A., Randall, R. B.: Rolling element bearing diagnostics using the Case Western Reserve University data: A benchmark study. Mech. Syst. Signal Proc. 64, 100-131 (2015)
Chen, Y., Peng, G., Xie, C., Zhang, W., Li, C., Liu, S.: ACDIN: Bridging the gap between artificial and real bearing damages for bearing fault diagnosis. Neurocomputing, 294, 61-71 (2018)

Table 1

Output	Operation	Parameters	Output shape
\({f_1}\)	Convolutional layer	512@1×1	512×32×1
	BiLSTM layer	32
	Max-pooling layer	2×2
\({f_2}\)	Convolutional layer	256@1×1	512×64×1
	BiLSTM layer	64
	Max-pooling layer	2×2
\({f_3}\)	Convolutional layer	128@1×1	512×128×1
	BiLSTM layer	128
	Max-pooling layer	2×2
\({f_4}\)	Convolutional layer	64@1×1	512×256×1
	BiLSTM layer	256
	Max-pooling layer	2×2
\({F_1}\)	Convolutional layer	512@3×1 padding = 1	512×32×1
\({F_2}\)	Convolutional layer	512@3×1 padding = 1	512×32×1
\({F_2}\)	Max-pooling layer	2×2	512×32×1
\({F_3}\)	Convolutional layer	512@3×1 padding = 1	512×32×1
\({F_3}\)	Max-pooling layer	4×4	512×32×1
\({F_4}\)	Convolutional layer	512@3×1 padding = 1	512×32×1
\({F_4}\)	Max-pooling layer	8×8	512×32×1
\({F_f}\)	Feature fusion	——	512×32×1
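The pooling sizes listed for F2, F3 and F4 (2, 4 and 8) match the ratios between the sequence lengths of f2, f3 and f4 (64, 128 and 256) and that of f1 (32), so every level can be brought to a common length of 32 before fusion. The sketch below checks this with a plain-Python 1-D max pooling on a single dummy channel. It reads the pooling sizes as acting along the length axis and uses an element-wise sum for the fusion step; both are assumptions, since the table above gives only the fused output shape.

```python
def max_pool_1d(signal, window):
    """Non-overlapping max pooling (stride equal to window) over a 1-D list."""
    return [
        max(signal[i:i + window])
        for i in range(0, len(signal) - window + 1, window)
    ]


# (sequence length, pooling window) per pyramid level, read from the table above.
# F1 has no pooling layer, which is equivalent to a window of 1.
LEVELS = {"F1": (32, 1), "F2": (64, 2), "F3": (128, 4), "F4": (256, 8)}

pooled = {}
for name, (length, window) in LEVELS.items():
    channel = [float(i % 7) for i in range(length)]  # dummy single-channel data
    pooled[name] = max_pool_1d(channel, window)

# Assumed fusion: element-wise sum of the length-aligned levels.
fused = [sum(values) for values in zip(*pooled.values())]

print({name: len(values) for name, values in pooled.items()}, len(fused))
```

With these windows all four levels end up with 32 positions, which is consistent with the 512×32×1 shape listed for the fused feature.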