Bearing Fault Diagnosis Based on Improved Morlet Wavelet Transform and Shallow Residual Neural Network (2024)

1. Introduction

Rolling bearings are a core component of rotating mechanical systems. Bearing failure results in safety issues [1]. Therefore, the development of accurate and effective bearing fault diagnosis techniques is crucial for ensuring the operational safety of rotating machinery.

Traditional approaches to extracting vibration signal features commonly rely on time–frequency analysis, including frequency domain statistical feature analysis [2], fast Fourier transform [3], wavelet transform [4], and time–frequency map analysis [5]. However, manual feature extraction has obvious shortcomings. First, it requires manual feature selection, design, and extraction, all of which consume a great deal of time. Second, it is based on human assumptions and simplified models, which may fail to capture nonlinear relationships in complex fault modes and adapt poorly to new types of faults. Therefore, it is necessary to reduce dependence on empirical knowledge and improve the recognition rate of complex faults [6].

With recent developments in deep learning, deep-learning-based bearing fault diagnosis has been applied extensively. This approach is driven by raw signal data and uses a neural network model to extract features and complete classification tasks [7]. However, when the training dataset is unbalanced, with limited samples for certain fault labels, the performance of convolutional neural networks (CNNs) tends to decline. A common remedy for dataset imbalance is to employ a generative adversarial network (GAN) for data generation. However, existing GAN-based studies primarily focus on the overall similarity of the generated data to the original measurements while overlooking the similarity in fault characteristics, which results in a loss of crucial information for fault diagnosis. To address this issue, Ruan et al. [8] proposed two modifications to the general GAN. Zhang et al. [9] mitigated the vanishing-gradient problem in the training of deep neural networks by introducing residual connections, which further improved classification performance. Among CNN-based developments, Hoang et al. [10] established a two-dimensional CNN model for rolling bearing fault diagnosis. Zhao et al. [11] added a dynamically weighted layer to a residual neural network model to improve its diagnosis accuracy. CNNs perform well in practical diagnosis tasks, but several problems remain. First, as the number of network layers increases, gradients vanish, which makes training more difficult. Second, the ability of CNNs to extract fault features and generalize is weakened under multiple operating conditions and complex environmental noise [12,13].

To address these problems, a model based on wavelet transform and a residual neural network is proposed and applied to bearing fault diagnosis under multiple working conditions. First, the original vibration signal is transformed into a two-dimensional time–frequency representation via wavelet transform, which is then used as the input for residual network feature extraction. Feeding the wavelet-transformed signal into the residual network combines the multi-scale frequency information extracted by the wavelet transform with the advantages of the residual network, thus achieving better feature learning.

The main innovations of this paper are as follows: (1) Setting the central frequency $ω_{0} = 2 π$ and the parameter $β = 1.82$ in the modified Morlet wavelet makes it approximately satisfy the admissibility condition, thereby reducing aliasing in the time–frequency map. (2) The improved Morlet wavelet transform is combined with a residual neural network and applied to bearing fault diagnosis, so that the residual network can further extract fault features from the time–frequency maps.

The remainder of this paper is organized as follows. Section 2.1 introduces the theory and method of the improved Morlet wavelet in detail. Section 2.2 presents the wavelet residual network. Section 2.3 describes the test bench and experimental data set. Section 3 presents the experimental results. Section 4 discusses and analyzes the results. Section 5 concludes the paper.

2. Methods

2.1. Improved Morlet Wavelet Transform

Wavelet transform provides time domain and frequency domain information simultaneously and can capture the instantaneous features of a signal. Compared with Fourier transform, wavelet transform offers finer time–frequency resolution and is better suited to analyzing nonstationary signals. By selecting an appropriate mother wavelet function, continuous wavelet transform (CWT) can restrict the analysis to the frequency band of interest, thus providing more accurate frequency information, which makes it well suited to fault diagnosis.

The Morlet wavelet is a particular type of Gabor wavelet, which was proposed by Morlet for seismic signal analysis [14]. The real and imaginary parts of a Morlet wavelet differ in phase by 90°, so it is easier to obtain instantaneous information. However, the direct current component of a Morlet wavelet is not zero, so it does not satisfy the wavelet admissibility condition, which may distort the time–frequency map to some extent.

In the improved wavelet, the wavelet frequency parameter is fixed at $ω_{0} = 2 π$ , so that the wavelet scale parameter is equal to the wavelet principal frequency. The function space generated by the improved Morlet wavelet is equal to that generated by the original Morlet wavelet.

2.1.1. Wavelet Theory

Assume that $φ \in L^{2} \cap L^{1}$ and $\overset{\land}{φ} (0) = 0$ . Then $φ$ generates the following family of functions $\{φ_{a, b}\}$ :

$φ_{a, b} (t) = {|a|}^{- \frac{1}{2}} φ (\frac{t - b}{a})$

(1)

Equation (1) defines a continuous wavelet, where $a \in R \setminus \{0\}$ , $b \in R$ , and $φ$ is the basic wavelet. If $φ$ is a double-window function, then $φ$ is called a window wavelet function.

For a signal $f$ , the continuous wavelet transform $W_{f} (a, b)$ is defined as

$W_{f} (a, b) = 〈f, φ_{a, b}〉 = {|a|}^{- \frac{1}{2}} \int_{R} f (t) \bar{φ (\frac{t - b}{a})} d t$

(2)

Suppose that $φ \in L^{2} \cap L^{1}$ satisfies the admissibility condition

$C_{φ} = \int_{R} \frac{{|\overset{\land}{φ} (ω)|}^{2}}{|ω|} d ω < \infty$

(3)

If $f$ is continuous at $t$ , then there is a wavelet reconstruction formula for signal $f$ :

$f (t) = \frac{1}{C_{φ}} \iint_{R^{2}} W_{f} (a, b) φ_{a, b} (t) \frac{d a}{a^{2}} d b$

(4)

In the above discussion, $\overset{\land}{φ} (ω)$ is the Fourier transform of $φ (t)$ , and $\bar{φ (t)}$ is the conjugate of $φ (t)$ .

2.1.2. Morlet Wavelet

Gabor introduced a Gauss envelope complex wavelet for signal time–frequency analysis. Morlet introduced this wavelet into seismic signal analysis, and its analytical formula is

$m (t) = e^{j ω_{0} t} e^{- t^{2} / 2}$

(5)

Equation (5) is a Morlet wavelet whose Fourier transform is

$\overset{\land}{m} (ω) = \int_{- \infty}^{+ \infty} e^{j ω_{0} t} e^{- t^{2} / 2} e^{- j ω t} d t = \sqrt{2 π} e^{- {(ω - ω_{0})}^{2} / 2}$

(6)

That is

$\overset{\land}{m} (0) = \sqrt{2 π} e^{- ω_{0}^{2} / 2} \neq 0$

(7)

Strictly speaking, Morlet wavelets therefore do not satisfy the admissibility condition of Equation (3), so there is no reconstruction or inverse transformation formula in the form of Equation (4). Chakraborty et al. [15] proposed that Morlet wavelets approximately satisfy the admissibility condition when $ω_{0} \geq 5$ .
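To make the size of the residual direct current term in Equation (7) concrete, the following minimal sketch evaluates $\overset{\land}{m} (0)$ for several centre frequencies. The function name `morlet_dc` and the list of $ω_{0}$ values are illustrative assumptions; only the closed form is taken from Equation (7).

```python
import math


def morlet_dc(omega0: float) -> float:
    """Value of the Morlet spectrum at omega = 0, following Equation (7)."""
    return math.sqrt(2.0 * math.pi) * math.exp(-omega0 ** 2 / 2.0)


if __name__ == "__main__":
    # Hypothetical comparison set: the three centre frequencies compared in
    # the time-frequency maps, plus the threshold value 5 quoted from [15].
    candidates = [
        ("pi", math.pi),
        ("3*pi/2", 1.5 * math.pi),
        ("5", 5.0),
        ("2*pi", 2.0 * math.pi),
    ]
    for label, w0 in candidates:
        print(f"omega0 = {label:7s} -> m_hat(0) = {morlet_dc(w0):.3e}")
```

The term is never exactly zero, but it shrinks monotonically and very quickly as $ω_{0}$ grows, which is the sense in which the admissibility condition can only be met approximately.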

2.1.3. Morlet Wavelet Improvement

Taking the parameter $ω_{0} = 2 π$ in the Morlet wavelet given by Equation (5) and normalizing the amplitude, the improved Morlet basic wavelet is obtained as follows:

$φ (t) = \frac{1}{\sqrt{2 π}} \exp (- \frac{t^{2}}{2} + j 2 π t)$

(8)

In Equation (8), $\frac{1}{\sqrt{2 π}}$ is the wavelet amplitude parameter such that

$\int_{R} |φ (t)| d t = 1$

(9)

Applying dilation and translation to the modified Morlet wavelet of Equation (8) yields the improved Morlet wavelet function family:

$φ_{f, τ} (t) = \frac{|f|}{\sqrt{2 π}} \exp [- \frac{f^{2} {(t - τ)}^{2}}{2} + j 2 π f (t - τ)]$

(10)

In Equation (10), $t$ and $τ$ denote time in seconds (s), and $f$ is the wavelet dominant frequency in Hz.
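As an illustration of how the family in Equation (10) can turn a one-dimensional signal into a time–frequency matrix, the sketch below samples $φ_{f, τ}$ and evaluates the inner product of Equation (2) for every shift by direct convolution. This is a minimal NumPy sketch, not the authors' implementation: the function names, the truncation of the wavelet to $\pm 4 / f$ seconds, and the choice of frequency grid are all assumptions.

```python
import numpy as np


def improved_morlet(t, f, tau=0.0):
    """Improved Morlet wavelet family of Equation (10)."""
    u = t - tau
    return (abs(f) / np.sqrt(2.0 * np.pi)) * np.exp(
        -0.5 * (f * u) ** 2 + 2j * np.pi * f * u
    )


def time_frequency_matrix(x, fs, freqs):
    """Return |W(f, tau)|: rows are dominant frequencies, columns are samples."""
    x = np.asarray(x, dtype=float)
    n = x.size
    tf = np.empty((len(freqs), n))
    for i, f in enumerate(freqs):
        # Assumption: the Gaussian envelope is truncated at +-4/f seconds.
        half = int(np.ceil(4.0 / f * fs))
        u = np.arange(-half, half + 1) / fs
        kernel = improved_morlet(u, f)
        # <x, phi_{f,tau}> for every shift tau: convolve with the
        # time-reversed complex conjugate and scale by the sample period.
        full = np.convolve(x, np.conj(kernel)[::-1], mode="full") / fs
        tf[i] = np.abs(full[half:half + n])
    return tf


if __name__ == "__main__":
    fs = 1000.0                                   # hypothetical sampling rate
    t = np.arange(1000) / fs
    x = np.cos(2.0 * np.pi * 50.0 * t)            # synthetic 50 Hz test tone
    tf = time_frequency_matrix(x, fs, [20.0, 50.0, 100.0])
    print(tf.shape, tf[:, 500])
```

Because Equation (10) scales the wavelet by $|f|$ , a unit-amplitude cosine at the dominant frequency gives a coefficient magnitude close to 0.5 away from the record edges, independent of $f$ , which keeps rows of the matrix comparable.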

The Morlet wavelet kernel function transform can be used for signal time–frequency analysis. A Morlet wavelet is a double-window function. The time window center $t^{*}$ , frequency window center $f^{*}$ , time window radius $Δ t$ , and frequency window radius $Δ f$ of $φ_{f, τ} (t)$ are, respectively,

$t^{*} = τ, f^{*} = f, Δ t = \frac{1}{\sqrt{2} f}, Δ f = \frac{f}{2 \sqrt{2} π}$

(11)
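As a numerical cross-check of the window centres and radii, the following sketch estimates them as the means and standard deviations of $|φ_{f, τ} (t)|^{2}$ and of its discrete power spectrum. The expected radii $1 / (\sqrt{2} f)$ and $f / (2 \sqrt{2} π)$ are the half-widths that appear in the time–frequency window of Equation (12). The function name, sampling rate, and time span are assumptions chosen only so that the Gaussian envelope is well resolved.

```python
import numpy as np


def window_radii(f, tau=0.0, fs=2000.0, half_span=2.0):
    """Estimate (t*, f*, delta_t, delta_f) of the wavelet in Equation (10)."""
    t = np.arange(-half_span, half_span, 1.0 / fs) + tau
    u = t - tau
    phi = (abs(f) / np.sqrt(2.0 * np.pi)) * np.exp(
        -0.5 * (f * u) ** 2 + 2j * np.pi * f * u
    )

    # Time window: first and second moments of the normalised energy density.
    p_t = np.abs(phi) ** 2
    p_t /= p_t.sum()
    t_c = np.sum(t * p_t)
    dt = np.sqrt(np.sum((t - t_c) ** 2 * p_t))

    # Frequency window: the same moments of the normalised power spectrum (Hz).
    spec = np.fft.fft(phi)
    freqs = np.fft.fftfreq(t.size, d=1.0 / fs)
    p_f = np.abs(spec) ** 2
    p_f /= p_f.sum()
    f_c = np.sum(freqs * p_f)
    df = np.sqrt(np.sum((freqs - f_c) ** 2 * p_f))
    return t_c, f_c, dt, df


if __name__ == "__main__":
    f0 = 10.0
    t_c, f_c, dt, df = window_radii(f0, tau=0.3)
    print("numerical:", t_c, f_c, dt, df)
    print("expected :", 0.3, f0, 1.0 / (np.sqrt(2.0) * f0),
          f0 / (2.0 * np.sqrt(2.0) * np.pi))
```

The product of the two radii is $1 / (4 π)$ for every $f$ , so the window becomes narrower in time and wider in frequency as the dominant frequency increases.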

The time–frequency window is

$[τ - \frac{1}{\sqrt{2} f}, τ + \frac{1}{\sqrt{2} f}] \times [f - \frac{f}{2 \sqrt{2} π}, f + \frac{f}{2 \sqrt{2} π}]$

(12)

and there are

2.2. Residual Neural Network

ResNets were proposed by He et al. [16]. In traditional deep networks, gradients vanish as the number of layers increases, which leads to poor training results. To solve this degradation problem, residual blocks were introduced into the network. A residual block (RB) contains a shortcut connection and a main path composed of several convolutions. The structure is shown in Figure 2a.

He et al. further optimized the RB structure, as shown in Figure 2b. First, the identity mapping range of the shortcut connection is enlarged. Second, the order of the batch normalization (BN), ReLU, and convolution layers is adjusted. The training effect is improved after these changes.
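The role of the shortcut connection can be illustrated with a deliberately simplified residual block. In the sketch below, the convolutions are replaced by dense matrix products and batch normalization is omitted, so it is an assumption-laden toy rather than the block of Figure 2b; it only shows the pre-activation ordering (activation before each weight layer) and the identity shortcut.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def preact_residual_block(x, w1, w2):
    """y = x + relu(relu(x) @ w1) @ w2, with an identity shortcut.

    Simplifications (assumptions): dense layers stand in for the 3 x 3
    convolutions, and batch normalization is left out.
    """
    h = relu(x) @ w1      # activation first, then the first weight layer
    h = relu(h) @ w2      # activation, then the second weight layer
    return x + h          # the shortcut carries x through unchanged


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 16))
    zeros = np.zeros((16, 16))
    # With zero weights the main path contributes nothing: the block is the identity.
    print(np.allclose(preact_residual_block(x, zeros, zeros), x))
```

Because the output is the input plus a learned correction, a block whose weights are near zero passes its input through almost unchanged, which is why stacking such blocks does not by itself degrade the signal reaching deeper layers.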

Bearing fault signals are typically nonstationary and nonlinear, with fault characteristics in different signal frequency ranges [17]. Therefore, feature extraction methods are used for extracting feature information from data at different scales. Features at various scales can capture data structures and patterns at different levels. Wavelet transform is a multi-scale analysis method that can decompose a signal into low-frequency and high-frequency subbands, the former corresponding to the overall trend of the signal and the latter to its details and edge features [18].

Based on [19], wavelet transform and residual neural networks are combined in this paper so that their respective advantages can be fully utilized. Wavelet transform can decompose signals into subbands with different frequency ranges and provide abundant information in the frequency domain. These subbands can then be passed as inputs to a residual neural network for further feature extraction and learning. The network can learn the correlation between, and the importance of, different frequency subbands to obtain a more discriminative feature representation [20]. At the same time, residual connections can reduce the information loss when the signal propagates through the deep network and accelerate the network’s convergence and training process.

The diagnosis process of the proposed wavelet transform and residual network method is as follows. First, wavelet transform is applied to the original one-dimensional vibration signal to obtain a time–frequency matrix. The resulting two-dimensional signal is then input into the ResNet, which learns the energy characteristics of different frequencies in the time–frequency matrix. Finally, the output of the fully connected layer is passed to softmax for fault-type classification. It should be noted that the wavelet bases used in this paper can be altered flexibly according to the characteristics of the data. The flow chart of the method is shown in Figure 3, and the parameters of the model are shown in Table 2.

The model consists of 4 convolution layers, 2 residual blocks, and 1 fully connected layer. The first convolution layer has a 3 × 3 kernel and 8 channels and is followed by a 2 × 2 pooling layer. The second convolution layer has a 3 × 3 kernel and 16 channels and is also followed by a 2 × 2 pooling layer. The residual blocks form the third and fourth layers, with 3 × 3 convolution kernels and 16 channels. Their output is fed to the fully connected layer, whose output size is 32. Finally, the output layer has size 10; that is, the signals are classified into 10 classes.
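The layer sizes described above can be traced with a few lines of plain Python. The sketch assumes a single-channel 52 × 52 input, stride-1 convolutions with "same" padding, and 2 × 2 pooling with stride 2; these assumptions are inferred from the output sizes listed in Table 2 and are not stated explicitly in the text. The function name `trace_shapes` is hypothetical.

```python
def trace_shapes(input_hw=52, in_channels=1):
    """Follow (height, width, channels) through the layers listed in Table 2."""
    layers = [
        ("Conv-1", "conv", 8),
        ("Pooling-1", "pool", None),
        ("Conv-2", "conv", 16),
        ("Pooling-2", "pool", None),
        ("RB1", "conv", 16),
        ("RB2", "conv", 16),
    ]
    hw, ch = input_hw, in_channels
    shapes = []
    for name, kind, out_ch in layers:
        if kind == "pool":
            hw = hw // 2      # 2 x 2 pooling with stride 2 halves each side
        else:
            ch = out_ch       # 3 x 3, stride 1, "same" padding keeps the side length
        shapes.append((name, (hw, hw, ch)))
    flat = hw * hw * ch       # features entering the fully connected layer
    shapes.append(("Dense", (32,)))
    shapes.append(("Output", (10,)))
    return shapes, flat


if __name__ == "__main__":
    shapes, flat = trace_shapes()
    for name, shape in shapes:
        print(f"{name:10s} {shape}")
    print("flattened features:", flat)
```

Under these assumptions the spatial size goes 52 → 26 → 13 and stays at 13 × 13 × 16 through both residual blocks, so the fully connected layer receives 2704 features.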

2.3. Introduction to the Experiment

The Case Western Reserve University rolling bearing data set was used in the experiment. The dataset has four states: normal (NO), inner ring fault (IR), outer ring fault (OR), and rolling element fault (BA). These states are shown in Figure 4. The rotational speed is 1772 r/min, the sampling frequency is 12 kHz, and the load is 0, 1, 2, and 3 HP.

The fault diameters are 0.007 mm, 0.014 mm, and 0.021 mm. The inner ring faults are marked IR1, IR2, and IR3. The outer ring faults are marked OR1, OR2, and OR3. The rolling element faults are marked BA1, BA2, and BA3. The normal state is marked NO. The sample classification is shown in Table 3. Figure 5 shows the ten vibration signals in the time domain.

3. Results

3.1. Comparison and Analysis of Time–Frequency Maps

The original Morlet wavelet’s direct current component is not zero, so the wavelet does not satisfy the admissibility condition. This has two consequences: (1) the time–frequency map is blurred, and (2) different frequency components interfere with each other, resulting in aliasing or crossing of the energy distribution in the time–frequency diagram.

Figure 6 is the time–frequency diagram of the original Morlet wavelet transform for four signals ( $ω_{0} = π, β = 1.82$ ). Figure 7 is the time–frequency diagram of the Morlet wavelet transform for four signals ( $ω_{0} = 3 π / 2, β = 1.82$ ). Figure 8 is a time–frequency diagram of the modified Morlet wavelet transform ( $ω_{0} = 2 π, β = 1.82$ ).

In Figure 6, the original Morlet wavelet does not satisfy the admissibility condition, resulting in fuzzy features, aliasing, and crossing in the time–frequency map. Figure 7 shows the same effect as Figure 6, and aliasing still exists in the time–frequency diagram. Figure 8 shows the time–frequency map obtained with the improved Morlet wavelet, which approximately satisfies the admissibility condition and reduces feature ambiguity and overlap in the time–frequency map.

According to the above analysis, the improved Morlet wavelet can reduce the aliasing effect and improve the time–frequency map’s resolution. However, in Figure 8b,c, a small amount of feature aliasing remains because the modified Morlet wavelet satisfies the admissibility condition only approximately. To further separate and deepen the features, the time–frequency maps can be input into the residual network, which applies deeper nonlinear transformations and feature extraction to them. This enables more advanced feature separation and enhancement on the time–frequency map to better capture the time–frequency characteristics of the signal.

3.2. Wavelet Residual Network

For each type of working condition, the sample length is 784 points, and there are 600 samples per class. The dataset is divided into a training set, a validation set, and a test set in a ratio of 0.5:0.25:0.25. Two experiments are detailed in this paper. Experiment 1 uses a residual neural network on its own, and Experiment 2 applies the wavelet transform first and feeds the result to a residual neural network.
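The 0.5:0.25:0.25 division can be sketched as a per-class random split. The text does not say whether the split is stratified or how samples are ordered, so the sketch assumes a stratified split over samples stored class by class; the function name and the random seed are also hypothetical.

```python
import numpy as np


def split_indices(n_per_class=600, n_classes=10, ratios=(0.5, 0.25, 0.25), seed=0):
    """Return train/validation/test index arrays with the same ratio in every class."""
    rng = np.random.default_rng(seed)
    n_train = int(round(ratios[0] * n_per_class))
    n_val = int(round(ratios[1] * n_per_class))
    train, val, test = [], [], []
    for c in range(n_classes):
        # Assumption: samples of class c occupy one contiguous block of indices.
        idx = c * n_per_class + rng.permutation(n_per_class)
        train.append(idx[:n_train])
        val.append(idx[n_train:n_train + n_val])
        test.append(idx[n_train + n_val:])
    return np.concatenate(train), np.concatenate(val), np.concatenate(test)


if __name__ == "__main__":
    tr, va, te = split_indices()
    print(len(tr), len(va), len(te))
```

With 600 samples in each of 10 classes, this gives 300 training, 150 validation, and 150 test samples per class.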

The sample signal is normalized, and the Adam optimization algorithm is adopted. The initial learning rate of the model is 0.001 and is reduced to one-tenth of its value every five iterations. A total of 10 types of sample signals are input into the model. The ResNets model was used in Experiment 1. The accuracy of the training set was 100%, and that of the test set was 93.80%, as shown in Figure 9. In Experiment 2, the accuracy of the training set and test set was 99.15% and 99.12%, respectively, as shown in Figure 10. The results show that the test accuracy of the proposed method is 5.32 percentage points higher (99.12% versus 93.80%).
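The learning-rate schedule described above is a step decay. A minimal sketch, assuming that the rate is multiplied by 0.1 at iterations 5, 10, 15, and so on (the function name and the zero-based counting are assumptions):

```python
def step_decay_lr(iteration, initial_lr=1e-3, drop=0.1, every=5):
    """Learning rate after `iteration` completed iterations (zero-based)."""
    return initial_lr * drop ** (iteration // every)


if __name__ == "__main__":
    for it in (0, 4, 5, 9, 10):
        print(it, step_decay_lr(it))
```

Such a schedule could be passed to any optimizer that accepts a per-iteration learning rate; how it is wired into Adam depends on the framework and is not specified here.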

In summary, the model proposed in this paper can identify faults more accurately than ResNet under variable load conditions. The proposed method can be effectively applied to diagnosis.

Figure 11 shows the confusion matrix of the residual network training results. As apparent from the graph, prediction errors exist in the categories labeled 0, 1, 2, and 8. Specifically, the prediction accuracy rates were 94.67%, 88.00%, 77.33%, and 98.67%, respectively. Figure 12 shows the confusion matrix of the wavelet residual network’s training results. As observed in the figure, only the classification labeled as 1 has errors in the prediction, and the accuracy rate was 96.00%.
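Per-class accuracy rates of this kind are obtained by dividing each diagonal entry of the confusion matrix by its row total. The sketch below assumes that rows are true labels and columns are predicted labels; the 3 × 3 matrix is hypothetical and is not taken from Figure 11 or Figure 12.

```python
import numpy as np


def per_class_accuracy(cm):
    """Fraction of each true class (row) that was predicted correctly."""
    cm = np.asarray(cm, dtype=float)
    return np.diag(cm) / cm.sum(axis=1)


if __name__ == "__main__":
    # Hypothetical counts with 150 samples per class (600 x 0.25).
    cm = [
        [142, 8, 0],
        [3, 147, 0],
        [0, 0, 150],
    ]
    print(np.round(100.0 * per_class_accuracy(cm), 2))
```

For example, 142 correct predictions out of 150 samples correspond to an accuracy of 94.67%.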

Classification performance usually varies with the number of training samples in the training set. Table 4 shows the training results for 300, 600, 900, and 1200 samples per class. The results include accuracy and loss for the training and validation sets. Figure 13 shows the training set’s accuracy trend, while Figure 14 shows that of the validation set. These trends indicate whether the model converges during training and how well it generalizes to the validation set.

As can be seen from Figure 13 and Figure 14, as the number of training samples increases from 300 to 900, the accuracy of both the training and validation sets improves, and the model performance is enhanced. However, when the sample size reaches 1200, the training and validation accuracy decrease by 0.56% and 1.85%, respectively, relative to 900 samples. As the number of training samples increases, the classification model can more easily capture the overall characteristics and general rules of the data, thus reducing the risk of overfitting. However, if the number of training samples is too large, the model may also overfit the training data and perform poorly on unseen data. Therefore, there is a trade-off between overfitting and underfitting. A total of 600 samples per class are selected for the following reasons: (1) As seen in Figure 14, 600 samples provide enough information for learning the characteristics and laws of the data. (2) A smaller sample size means shorter training time. The time required to train a model is proportional to the size of the dataset, so choosing a smaller number of samples can significantly reduce training time and improve efficiency.

Figure 15 shows the distribution of data points at each stage after dimensionality reduction with the t-SNE algorithm [21]. Ten colors are used to represent the ten categories of fault signals. By observing the relative positions and clustering structure of the data points, the classification can be analyzed more intuitively. If the data points of the same category are clustered together in the dimensionality-reduced space, forming a tight cluster structure, the model classification ability for this category can be considered good. Conversely, if data points of the same category are scattered in different regions of the space, or those of different categories are clustered together, there may be misclassification or confusion.

Figure 15a shows the original vibration signal’s distribution. Since these signals are not processed, the data points of different categories are randomly aliased together, and no obvious classification effect can be observed. Figure 15b shows the classification results after the original Morlet wavelet transform. It is apparent that there is aliasing between the data points of 0 and 1, and there is also aliasing between the data points of 7 and 8. This shows that the original Morlet wavelet transform has not been able to effectively separate different types of data points. Figure 15c shows the classification results of the improved Morlet wavelet transform. Compared to Figure 15b, data points with different labels are more distantly separated, and data points with the same label are more closely clustered. This shows that the improved Morlet wavelet transform has achieved some improvement in reducing the aliasing effect between features. However, a small amount of aliasing can still be observed for data points 1 and 5. Figure 15d shows the classification result after further processing by the residual network. It can be seen that data points of different types are completely separated, and those of the same type are closely clustered. This indicates that the model has good classification performance after residual network processing.

In summary, Figure 15 shows the effect of different processing steps on vibration signal classification. The introduction of improved Morlet wavelet transform and residual network helps improve classification performance and feature separation so that different types of data points can be better distinguished.

4. Analysis and Discussion

To emulate practical operating conditions, Gaussian white noise is added to the original signal at signal-to-noise ratios (SNRs) of −10 dB, −8 dB, −6 dB, −2 dB, and 2 dB, and the model’s reliability under noise is verified.
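Adding Gaussian white noise at a prescribed SNR amounts to scaling the noise variance to the measured signal power. The following is a minimal sketch under the usual definition $SNR = 10 \log_{10} (P_{signal} / P_{noise})$ ; the function name and the use of the mean square of the whole record as the signal power are assumptions.

```python
import numpy as np


def add_awgn(signal, snr_db, rng=None):
    """Return signal plus zero-mean Gaussian white noise at the requested SNR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    p_signal = np.mean(signal ** 2)                   # average signal power
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))    # noise power for the target SNR
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(200_000) / 12_000.0                 # 12 kHz sampling, as in the data set
    clean = np.sin(2.0 * np.pi * 100.0 * t)           # synthetic stand-in signal
    noisy = add_awgn(clean, -10.0, rng)
    measured = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
    print(f"measured SNR: {measured:.2f} dB")
```

At −10 dB the noise power is ten times the signal power, so the fault signature is heavily masked in the time domain.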

The CNN convolutional layer can extract features. Under complex conditions, the features to be extracted are increasingly complex, so it is necessary to increase the number of convolution layers to improve the model’s generalization ability. However, as the number of convolution layers increases, the model becomes more difficult to train. In this paper, the wavelet layer is combined with different numbers of convolution layers, namely 1, 2, 4, 6, and 8. Table 5 shows that, under the different noise conditions, the accuracy first increases with the number of convolution layers and then fluctuates slightly. With four convolution layers, the comprehensive performance of the model is optimal.

To further analyze the model’s performance, CNNs, LeNets, and ResNets were selected for comparison with the proposed model. The diagnostic results for different signal-to-noise ratios are shown in Figure 16. As can be seen from the figure, the diagnostic accuracy decreases as the noise increases. When the SNR is −10 dB, the accuracy of CNNs and ResNets is 60% and 63%, respectively, and the accuracy of the present method is higher by 10% and 3%, respectively. In conclusion, the proposed model has better anti-noise performance.

To further verify the generalization ability of the model, the XJTU-SY bearing data set was selected, which contains fault vibration signals of 15 rolling bearings under three different operating conditions. The test bearing was an LDK UER204 rolling bearing; its relevant parameters are given in Table 6. Three types of working conditions were designed for the test, as shown in Table 7, with five bearings under each type of working condition. Table 8 shows the XJTU-SY bearing data set information.

Table 9 shows that, under the different noise conditions, the accuracy generally first increases with the number of convolution layers and then fluctuates slightly. With four convolution layers, the comprehensive performance of the model is optimal.

The diagnostic results for different signal-to-noise ratios are shown in Figure 17. When the SNR is −10 dB, LeNets, CNNs, and ResNets achieved accuracy rates of 47%, 61%, and 63%, respectively. In contrast, the proposed method achieved an accuracy rate of 64%, outperforming LeNets by 17%, CNNs by 3%, and ResNets by 1%. In summary, the proposed method shows strong generalization ability when validated on a new data set.

5. Conclusions

To address the original Morlet wavelet’s failure to satisfy the admissibility condition, which leads to blurred and aliased time–frequency images, a series of improvement measures is detailed in this paper. First, the Morlet wavelet is improved by setting its central frequency to $ω_{0} = 2 π$ so that it approximately satisfies the admissibility condition. Comparing the time–frequency characteristics of different versions of the Morlet wavelet shows that the improved Morlet wavelet can effectively alleviate the feature ambiguity and aliasing problem, and thus possesses better expression and resolution abilities in extracting the time–frequency features of vibration signals. Second, to further enhance feature learning, a fusion method of the Morlet wavelet and a residual network is proposed. As a deep learning model, the residual network can improve model performance by learning residual information. Feeding the output of the improved Morlet wavelet transform into the residual neural network makes full use of the advantages of both and improves the classification of vibration signals. In addition, the relationship between sample size and classification performance is explored by training with different numbers of samples. After the comparative analysis, 600 samples per class are selected to reduce training time and improve efficiency while still learning the features fully. The results show that the proposed method achieves the optimal outcome of a deep network after about 100 iterations, and the accuracy rate is 5.32% higher after 200 iterations. The convergence speed is faster, and the accuracy also reaches the learning effect of a deep network.

To further illustrate the method’s reliability, its performance is compared with that of three network structures under different signal-to-noise ratios. The results show that the method has better anti-noise performance than the compared network structures.

Author Contributions

Conceptualization, B.H. and Q.H.; methodology, B.H.; software, B.H. and Q.H.; validation, B.H.; formal analysis, B.H.; investigation, B.H. and Q.H.; resources, B.H., L.G. and Q.H.; data curation, B.H. and Q.H.; writing—original draft preparation, B.H.; writing—review and editing, B.H., L.G. and Q.H.; visualization, B.H. and Q.H.; supervision, L.G.; project administration, B.H. and L.G.; funding acquisition, L.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China (grant number 52077155). The authors acknowledge the anonymous reviewers for their helpful comments on the manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The source code presented in this study is available upon request from the corresponding author.

Acknowledgments

The authors would like to thank Tiangong University for technical support and all members of our team for their contribution to the experiments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Sun, R.-B.; Yang, Z.-B.; Zhai, Z.; Chen, X.-F. Sparse representation based on parametric impulsive dictionary design for bearing fault diagnosis. Mech. Syst. Signal Process. 2019, 122, 737–753.
2. Cascales-Fulgencio, D.; Quiles-Cucarella, E.; García-Moreno, E. Computation and Statistical Analysis of Bearings’ Time- and Frequency-Domain Features Enhanced Using Cepstrum Pre-Whitening: A ML- and DL-Based Classification. Appl. Sci. 2022, 12, 10882.
3. He, Y.; Huang, J.; Zhang, B. Approximate entropy as a nonlinear feature parameter for fault diagnosis in rotating machinery. Meas. Sci. Technol. 2012, 23, 045603.
4. Zhao, L.-Y.; Wang, L.; Yan, R.-Q. Rolling bearing fault diagnosis based on wavelet packet decomposition and multi-scale permutation entropy. Entropy 2015, 17, 6447–6461.
5. Zhang, Y.; Zhang, P.; Wu, D.; Li, B. Time-frequency feature extraction method based on CSLBP for bearing signals. J. Vib. Meas. Diagn. 2016, 36, 22–27.
6. Chandrashekar, G.; Sahin, F. A survey on feature selection methods. Comput. Electr. Eng. 2014, 40, 16–28.
7. Ruan, D.; Wang, J.; Yan, J.; Gühmann, C. CNN parameter design based on fault signal analysis and its application in bearing fault diagnosis. Adv. Eng. Inform. 2023, 55, 101877.
8. Ruan, D.; Song, X.; Gühmann, C.; Yan, J. Collaborative Optimization of CNN and GAN for Bearing Fault Diagnosis under Unbalanced Datasets. Lubricants 2021, 9, 105.
9. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
10. Hoang, D.-T.; Kang, H.-J. Rolling element bearing fault diagnosis using convolutional neural network and vibration image. Cogn. Syst. Res. 2019, 53, 42–50.
11. Zhao, M.; Kang, M.; Tang, B.; Pecht, M. Deep residual networks with dynamically weighted wavelet coefficients for fault diagnosis of planetary gearboxes. IEEE Trans. Ind. Electron. 2017, 65, 4290–4300.
12. Carlini, N.; Wagner, D. Towards evaluating the robustness of neural networks. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–24 May 2017.
13. Nguyen, A.; Yosinski, J.; Clune, J. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015.
14. Morlet, J.; Arens, G.; Fourgeau, E.; Giard, D. Wave propagation and sampling theory; Part II, Sampling theory and complex waves. Geophysics 1982, 47, 222–236.
15. Chakraborty, A.; Okaya, D. Frequency-time decomposition of seismic data using wavelet-based methods. Geophysics 1995, 60, 1906–1916.
16. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Proceedings of the 14th European Conference on Computer Vision (ECCV 2016), Amsterdam, The Netherlands, 11–14 October 2016.
17. Huang, W.; Cheng, J.; Yang, Y.; Guo, G. An improved deep convolutional neural network with multi-scale information for bearing fault diagnosis. Neurocomputing 2019, 359, 77–92.
18. Li, P.; Kong, F.; He, Q.; Liu, Y. Multiscale slope feature extraction for rotating machinery fault diagnosis using wavelet analysis. Measurement 2013, 46, 497–505.
19. Kumar, A.; Zhou, Y.; Gandhi, C.; Kumar, R.; Xiang, J. Bearing defect size assessment using wavelet transform based Deep Convolutional Neural Network (DCNN). Alex. Eng. J. 2020, 59, 999–1012.
20. Yan, J.; Kan, J.; Luo, H.J. Rolling bearing fault diagnosis based on Markov transition field and residual network. Sensors 2022, 22, 3936.
21. Van Der Maaten, L. Accelerating t-SNE using tree-based algorithms. J. Mach. Learn. Res. 2014, 15, 3221–3245.

Figure 1. Approximate satisfaction of the zero-mean condition.

Figure 2. Residual block structure: (a) before optimization; (b) optimized structure.

Figure 3. Residual network structure.

Figure 4. The experimental test rig.

Parameters	Before Improvement	Improved
Normalization factor	1	$β / \sqrt{2 π}$
Center frequency $ω_{0}$	$π$	$2 π$
Parameter $β$ range	[0.1, 4]	[0.5, 1.82]
$β$ Number of test values	40	7

Layer	Kernel Size/Stride	Filters	Output Channels	Output Size	Padding
Conv-1	3 × 3/1	8	8	52 × 52 × 8	yes
Pooling-1	2 × 2/2	/	8	26 × 26 × 8	yes
Conv-2	3 × 3/1	16	/	26 × 26 × 16	yes
Pooling-2	2 × 2/2	/	/	13 × 13 × 16	yes
RB1	3 × 3/1	16	/	13 × 13 × 16	yes
RB2	3 × 3/1	16	/	13 × 13 × 16	yes
Dense	/	/	32	1	/
Output	/	/	10	1	/

Index	State	Label
c1	0.007 mm inner ring fault	0
c2	0.014 mm inner ring fault	1
c3	0.021 mm inner ring fault	2
c4	0.007 mm outer ring fault	3
c5	0.014 mm outer ring fault	4
c6	0.021 mm outer ring fault	5
c7	0.007 mm roller fault	6
c8	0.014 mm roller fault	7
c9	0.021 mm roller fault	8
c10	normal	9

Number of Samples	Accuracy	Loss	Val_Accuracy	Val_Loss
300	99.00%	0.12%	98.80%	2.05%
600	99.15%	0.10%	99.12%	1.72%
900	99.50%	0.07%	99.25%	1.84%
1200	98.94%	0.09%	97.40%	2.31%

Convolution layers	SNR −10 dB	SNR −8 dB	SNR −6 dB	SNR −2 dB	SNR 2 dB
1	72.35%	84.50%	91.02%	98.80%	98.35%
2	73.26%	85.31%	91.87%	99.05%	99.75%
4	75.28%	88.69%	93.52%	99.43%	99.89%
6	75.77%	88.45%	93.48%	99.00%	99.69%
8	75.60%	88.52%	93.11%	98.31%	98.88%

Parameter	Value	Parameter	Value
Inner ring diameter/mm	29.30	Ball diameter/mm	7.92
Outer ring diameter/mm	39.80	Number of balls	8
Center diameter/mm	34.55	Contact angle/(°)	0
Dynamic load rating/N	12,820	Static load rating/kN	6.65

Conditions	Data Set	Number of Samples	Fault Location
1	Bearing 1_1	123	outer ring
	Bearing 1_2	161	outer ring
	Bearing 1_3	158	outer ring
	Bearing 1_4	122	bearing cage
	Bearing 1_5	52	inner ring, outer ring
2	Bearing 2_1	491	inner ring
	Bearing 2_2	161	outer ring
	Bearing 2_3	533	bearing cage
	Bearing 2_4	42	outer ring
	Bearing 2_5	339	outer ring
3	Bearing 3_1	2538	outer ring
	Bearing 3_2	2496	inner ring, outer ring, rolling elements
	Bearing 3_3	371	inner ring
	Bearing 3_4	1515	inner ring
	Bearing 3_5	114	outer ring

Condition No.	1	2	3
Speed/(r/min)	2100	2250	2400
Radial force/kN	12	11	10

Convolution layers	SNR −10 dB	SNR −8 dB	SNR −6 dB	SNR −2 dB	SNR 2 dB
1	68.30%	67.54%	76.44%	83.55%	94.58%
2	69.15%	69.22%	79.63%	85.11%	95.37%
4	71.40%	72.71%	80.20%	91.32%	97.10%
6	68.97%	75.90%	81.43%	90.60%	98.24%
8	70.20%	73.50%	80.00%	88.68%	96.39%