Enhanced anomaly detection and normal behaviour power curve modelling in wind farm SCADA data: A hybrid approach

doi:10.21203/rs.3.rs-5288737/v1

To achieve optimal performance and reduce the maintenance costs of wind turbines, anomaly detection and power curve modelling are crucial. The supervisory control and data acquisition (SCADA) system provides continuous, real-time insight into wind turbine operation by collecting different operational parameters. This study introduces a novel strategy that combines the strengths of Isolation Forest (iForest) and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to identify and isolate anomalous data. The hybrid iForest-DBSCAN model processes large volumes of SCADA data to detect outliers and anomalies of wind turbines under different operating conditions. Using the normal data with minimal anomalies, normal behaviour power curves (NBPC) were modelled with a robust Locally Estimated Scatterplot Smoothing (LOESS) technique. Robust power curves allow the performance of wind turbines to be compared and help ensure optimized operation with minimal maintenance. The proposed method was validated on different datasets, achieving higher accuracy with fewer computational resources than traditional methods. For the two wind farms, the iForest-DBSCAN model effectively identified anomalies in the datasets and successfully generated NBPCs with a 95% confidence interval. This study demonstrates the effectiveness of advanced data-driven models and techniques for optimizing the efficiency and performance of wind farms.

Wind energy

data analysis

anomaly detection

hybrid model

power curve

wind turbine optimization

data cleaning

SCADA

Energy plays a crucial role in modern societies and economies, and its significance is growing worldwide as more technologies become electricity-powered. Electricity production remains the world’s leading source of carbon dioxide emissions; however, the rapid development of renewable sources, especially solar and wind power, facilitates the transition to carbon neutrality (IEA, 2024). In 2023, the global wind energy industry installed a record 117 GW of new capacity, led by China with 75 GW (GWEC, 2024). Operation and maintenance (O&M) have a significant impact on the lifespan of wind farms and considerably affect the levelized cost of energy owing to the regular inspection, repair, and replacement of faulty parts (Ren et al., 2021). The O&M of fast-growing wind farms has attracted extensive attention, with advanced methods being used to ensure the maximum availability, condition monitoring, and anomaly detection of wind turbines (Chen et al., 2020). To avoid disastrous operational mishaps, regular monitoring is necessary to improve the reliability of wind turbines (Xiang et al., 2022). Thus, to prevent economic losses and support the development of wind farms, wind turbine monitoring and anomaly detection are necessary for operation and maintenance planning that achieves maximum energy production (Qiao & Lu, 2015; Q. Yao et al., 2023). Improving efficiency and optimizing wind turbine operation are therefore essential for advancing the transition toward renewable energy.

Supervisory Control and Data Acquisition (SCADA) systems are installed in wind farms to acquire the large quantity of data continuously generated for the real-time control and operation of wind turbines. However, these data are subject to various uncertainties arising from different factors. Currently, two types of methodologies are adopted for anomaly detection and wind turbine monitoring: physical or mathematical modelling and data-driven modelling. Physically based models of wind turbines and wind energy systems are difficult to develop (Q. Yao et al., 2023; Zhang et al., 2022). Consequently, with the advancement of big data technology and artificial intelligence, extracting useful information from SCADA data has become increasingly feasible. Data-driven methods are more practical and flexible than knowledge-based methods. Real-time monitoring and analysis of SCADA data provide unique leverage for detecting and identifying inefficiencies, facilitating maintenance, and supporting wind farm management.

Anomalies are data that do not fit the normal pattern of the operational data of wind turbines. In SCADA data, anomalies are commonly classified as zero power values at wind speeds above the cut-in speed, constant positive power values at wind speeds below the rated wind speed, and power values randomly scattered around the power curve (Chandola et al., 2009; Lin et al., 2020; Morrison et al., 2022). After the anomalies associated with failures or malfunctions have been detected and removed, the remaining normal (healthy) data can be used to model the normal behavior of a wind turbine's power curve. SCADA data comprise many parameters acquired at different sampling frequencies, which can be used for anomaly detection and power-curve-based condition monitoring. Various data-driven methods have been employed to detect abnormal data, and the refined operating data have been used to obtain normal-behavior wind turbine power curves. Recent studies dealing with anomalies and power curve modelling based on SCADA data are summarized below. Ohunakin et al. (2024) applied the Kolmogorov-Smirnov method together with a quantile-based filtration technique for anomaly detection and power curve modelling. Zhang et al. (2024) used a support vector machine and k-nearest neighbors to identify anomalous signals and the normal operational behavior of wind turbines. Marti-Puig et al. (2024) removed anomalous data with a filtration approach based on the manufacturer's power curve and estimated the power curve with an artificial neural network. Zhang et al. (2024) proposed a long short-term memory-based asymmetric variational autoencoding Gaussian mixture model for anomaly detection and used the Shapley Additive Explanations (SHAP) technique to determine the underlying causes. Dao et al. (2024) introduced a sliding-window approach for wind turbine anomaly identification and condition monitoring using SCADA data.
The Dickey-Fuller test was used to monitor different operating parameters without requiring normal-behavior models; in this method, faults and anomalous data are identified from the stationarity of the SCADA data points accumulated over the sliding window. Feng et al. (2023) evaluated the wind speed-power characteristics of wind turbines using kernel density estimation for outlier detection and a Gaussian distribution assumption for the distribution of wind power outputs. Yao et al. (2023) used the Thompson tau-local outlier factor to clean SCADA data for anomaly characterization and assessment. Similarly, Morrison et al. (2022) used isolation forest (iForest), local outlier factor, Gaussian mixture models, and k-nearest neighbors to clean SCADA data, both to evaluate wind turbine performance and as input for forecasting models. Du et al. (2023) developed a multi-turbine approach that merges variables obtained from different wind turbines located on the same wind farm. Their approach consisted of clustering the wind turbines, detecting faults, and finally developing an autoregressive neural network model for a single wind turbine; anomalies were identified from the residuals between the output of the targeted wind turbine and those of the other wind turbines in the same cluster. An artificial neural network model by Santolamazza et al. (2021) was also tested on SCADA data to identify anomalies based on the behavior of the main components of the wind turbine. Liu et al. (2020) used k-means clustering with the silhouette coefficient, combined with dimension reduction of wind turbine attributes, to classify data as abnormal or normal (binary labels 0 and 1); the feature-extraction-based framework was validated on three SCADA datasets. Liu et al. (2023) built a triplet-convolutional deep autoencoder to distinguish abnormal from normal data behavior based on discriminative deep-embedding features. Wang et al. (2022) presented a data-driven hybrid framework that uses an anti-interference cascade for condition monitoring and LightGBM for anomaly classification of wind turbines. They concluded that coupling condition monitoring with the LightGBM abnormality classifier improves anomaly recognition. McKinnon et al. (2020) compared the performance of different data-driven models, including isolation forest, support vector machine, and elliptic envelope, using the same parameters under different operating conditions. Similarly, Moreno et al. (2020) compared various machine learning approaches for anomaly detection based on wind power curves; the methods were tested on a small SCADA dataset to assess the abnormality of data signals. Models such as the Kolmogorov-Smirnov test, SVM, and k-NN struggle to handle the complex monitoring data obtained from wind turbines operating under dynamic environmental conditions, because their assumptions about the distribution of the acquired SCADA data may not hold under dynamic operating conditions. Other methods, such as neural networks and autoencoders, are computationally expensive, highly sensitive to hyperparameters, and prone to overfitting.

To overcome the limitations of the conventional data-driven anomaly detection techniques above, a more sophisticated and accurate methodology is needed, one that can handle the complex data of wind turbines operating under varying environmental conditions. The main aim of this study is to develop an efficient and accurate methodology to detect anomalies in SCADA data and to model normal-behavior power curves from the normal data points, thereby helping to reduce operation and maintenance costs and optimize the performance of wind farms. The combination of isolation forest and density-based spatial clustering of applications with noise (DBSCAN) significantly improves the accuracy and efficiency of anomaly detection in SCADA databases. After the anomalies have been identified, a robust power curve modelling technique is applied to obtain a smooth power curve under different operating conditions. The methodology is used to analyze the performance of different wind turbines under various operating conditions, and a comparative analysis offers insight into the influence of environmental and mechanical factors on the anomalies and power curves. This study lays the foundation for identifying low-performing wind turbines and for predictive maintenance strategies based on deviations from the normal-behavior power curve and on the detected anomalous data.

The novelties of this study are as follows:

The isolation forest is a decision-tree-based method that is very fast and requires minimal computational resources. DBSCAN can efficiently handle complex SCADA data clusters of arbitrary shape and size. By leveraging the strengths of both techniques, the hybrid model provides a robust and distinctive method for isolating and detecting anomalous data compared with traditional approaches.

Compared with traditional curve estimation techniques, the proposed iForest-DBSCAN approach yields interpretable and accurate normal behavior power curves for wind turbines operating under different conditions.

By identifying anomalous data with decision trees and density-based clusters, the approach provides a cleaner dataset for power curve modelling and for analyzing wind turbine performance with respect to faults and malfunctions.

The methodology used in this study is illustrated in Fig. 2. It consists of the following four parts.

Treatment of raw data for missing values

Data filtration

Development of a hybrid model for anomaly detection

Modelling of normal behaviour wind turbine power curve

Treatment for missing values

SCADA data contain missing entries for various reasons arising during data collection at the wind farm, so missing values had to be treated before the data could be processed. In this study, missing SCADA entries were filled in MATLAB using shape-preserving piecewise cubic Hermite interpolation (PCHIP). On the segment $[a, b]$ with grid nodes $a = x_0 < x_1 < \dots < x_{n+1} = b$, the data values can be written as (Romadanova, 2023; Volkov et al., 2010):

$$(x_i, F_i), \quad i = 0, \dots, n+1 \tag{1}$$

The first divided differences are denoted as follows:

$$F_i[x_i, x_{i+1}] = \frac{F_{i+1} - F_i}{h_i}, \quad h_i = x_{i+1} - x_i, \quad i = 0, \dots, n$$

If $F_i[x_i, x_{i+1}] \ge 0$ for $i = 0, \dots, n$, the data in Eq. 1 are called monotonic (non-decreasing). The shape-preserving interpolating function $S$, with $S(x_i) = F_i$ for $i = 0, \dots, n+1$, is required to preserve the monotonicity of the initial data. With weights $w_i > 0$, $i = 0, \dots, n$, the function $S$ is a weighted cubic spline satisfying the following conditions:

On each subinterval $[x_i, x_{i+1}]$, $S$ is a cubic polynomial;

$S \in C^k[a, b]$, $k \ge 1$;

$w_{i-1} S''(x_i^-) = w_i S''(x_i^+)$, $i = 1, \dots, n$.

The shape-preserving cubic spline interpolant therefore satisfies the interpolation conditions

$$S(x_i) = F_i, \quad i = 0, \dots, n+1$$

The following boundary conditions were imposed on the spline:

Boundary values set through the first derivative: $S'(a) = F_0'$, $S'(b) = F_{n+1}'$;
Boundary values set through the second derivative: $S''(a) = F_0''$, $S''(b) = F_{n+1}''$.
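As an illustration of the gap-filling step, the following sketch uses SciPy's PchipInterpolator as a stand-in for MATLAB's pchip; it is not the authors' code. It assumes each SCADA channel is a 1-D array sampled at times t, with missing entries stored as NaN, and it fills only interior gaps (leading and trailing gaps stay NaN because extrapolation is disabled). PCHIP is only once continuously differentiable, so it does not enforce the weighted second-derivative condition above. The helper name fill_missing_pchip is hypothetical.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator


def fill_missing_pchip(t, values):
    """Fill interior NaN gaps in one SCADA channel with shape-preserving PCHIP.

    Hypothetical helper: t are sample times (e.g. minutes), values may contain NaN.
    """
    t = np.asarray(t, dtype=float)
    values = np.asarray(values, dtype=float)
    known = ~np.isnan(values)
    # Fit the monotonicity-preserving piecewise cubic Hermite interpolant on known samples.
    interp = PchipInterpolator(t[known], values[known], extrapolate=False)
    filled = values.copy()
    # Evaluate only where data are missing; times outside the known range stay NaN.
    filled[~known] = interp(t[~known])
    return filled


# Ten-minute samples with two missing power readings (kW).
t = np.arange(0, 60, 10)
power = np.array([0.0, 0.0, np.nan, 1000.0, np.nan, 1000.0])
print(fill_missing_pchip(t, power))
```

Because PCHIP sets the slope to zero at local extrema and flat segments, the filled values stay within the range of their neighbours instead of overshooting, which is why a shape-preserving scheme is preferred over an ordinary cubic spline for power readings.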

Data filtration

SCADA data were filtered to remove incorrect records and obvious anomalies arising from faults during the measurement campaign, as well as data collected during non-operational periods. The operational data were screened against a blade pitch angle threshold of 30° and the cut-in and cut-out wind speeds.
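The filtration rule can be sketched as a boolean mask over the ten-minute records. This is a minimal sketch under stated assumptions: the 30° value is treated as an upper bound on the blade pitch angle during operation, the cut-in and cut-out wind speeds are taken as 3 m/s and 20 m/s from the turbine specification given with the data description, and records with missing values are dropped. The function and constant names are hypothetical.

```python
import numpy as np

# Assumed thresholds (hypothetical names); values taken from the turbine specification.
CUT_IN_MS = 3.0       # cut-in wind speed, m/s
CUT_OUT_MS = 20.0     # cut-out wind speed, m/s
PITCH_MAX_DEG = 30.0  # assumed upper bound on blade pitch during operation


def operational_mask(wind_speed, pitch_deg, power_kw):
    """Return True for records kept as operational before anomaly detection."""
    ws = np.asarray(wind_speed, dtype=float)
    pitch = np.asarray(pitch_deg, dtype=float)
    power = np.asarray(power_kw, dtype=float)
    complete = np.isfinite(ws) & np.isfinite(pitch) & np.isfinite(power)
    in_speed_range = (ws >= CUT_IN_MS) & (ws <= CUT_OUT_MS)
    pitch_ok = pitch <= PITCH_MAX_DEG  # NaN compares as False, so incomplete rows drop out
    return complete & in_speed_range & pitch_ok


mask = operational_mask(
    wind_speed=[2.0, 5.0, 12.0, 25.0, 10.0],
    pitch_deg=[0.0, 2.0, 45.0, 0.0, np.nan],
    power_kw=[0.0, 600.0, 3000.0, 0.0, 2000.0],
)
print(mask.tolist())  # → [False, True, False, False, False]
```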

Development of a hybrid model for anomaly detection

A hybrid data-driven model based on isolation forest and density-based spatial clustering of applications with noise (DBSCAN) was proposed to identify anomalies in the SCADA database. The isolation forest first isolates data points that are few and different as outliers, based on their path lengths in randomly built decision trees. DBSCAN then analyzes the remaining data points to identify clusters with varying densities, shapes, and sizes.

Isolation forest differs from other popular methods in that it explicitly isolates anomalies instead of profiling the normal data. It is a relatively recent method introduced by Liu et al. (2008). In general, normal data points are far more frequent than abnormal values, which differ from the rest and are called outliers. In the feature space, abnormal data points lie far from regular data values, so fewer random partitions are needed to isolate them, whereas normal data points require many splits.
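The idea that anomalies need fewer random partitions can be illustrated with a toy, single-feature version of an isolation tree. This is a simplified sketch, not the scikit-learn implementation: each tree splits a 1-D sample at a value drawn uniformly between its minimum and maximum until the query point is alone, and trees are not height-limited (the depth cap only guarantees termination). The helper names and the synthetic data are hypothetical.

```python
import numpy as np


def isolation_depth(x, sample, rng, depth=0, max_depth=50):
    """Number of random splits needed to isolate x within a 1-D sample containing x."""
    if len(sample) <= 1 or depth >= max_depth:
        return depth
    lo, hi = sample.min(), sample.max()
    if lo == hi:  # identical values cannot be separated further
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the side of the split that still contains x.
    side = sample[sample < split] if x < split else sample[sample >= split]
    return isolation_depth(x, side, rng, depth + 1, max_depth)


def mean_isolation_depth(x, sample, n_trees=200, seed=0):
    """Average isolation depth of x over n_trees independent random trees."""
    rng = np.random.default_rng(seed)
    return float(np.mean([isolation_depth(x, sample, rng) for _ in range(n_trees)]))


rng = np.random.default_rng(42)
wind_power = np.append(rng.normal(2000.0, 300.0, 100), 4500.0)  # last value is an outlier
typical = wind_power[np.argmin(np.abs(wind_power - 2000.0))]
print(mean_isolation_depth(4500.0, wind_power), mean_isolation_depth(typical, wind_power))
```

The outlier is typically separated within one or two splits, while a point near the centre of the distribution needs many more, which is the property Eq. 2 turns into an anomaly score.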

The isolation forest method is an ensemble of randomly built binary decision (isolation) trees. At each node, a feature is selected at random and a split value is drawn randomly between the minimum and maximum values of that feature (Lin et al., 2020). The isolation forest uses anomaly scores for decision-making: for an instance $X$ in a sample of $N$ instances, the anomaly score is defined as

$$AS(X, N) = 2^{-\frac{E(h(X))}{C(N)}} \tag{2}$$

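where $h(X)$ is the path length of point $X$ (the number of edges from the root to the leaf that isolates it), $E(h(X))$ is the average of $h(X)$ over all isolation trees, and $C(N)$ is the average path length of an unsuccessful search in a binary search tree built from $N$ points, which normalizes $E(h(X))$.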
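To make Eq. 2 concrete, the sketch below evaluates the normalizing term with the closed form given by Liu et al. (2008), $C(N) = 2H(N-1) - 2(N-1)/N$, where the harmonic number $H(i)$ is approximated by $\ln(i) + 0.5772156649$ (Euler's constant). In practice $N$ is the sub-sample size used to grow each tree (256 in Liu et al. (2008); scikit-learn's max_samples="auto" uses at most 256). The function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329


def average_path_length(n):
    """C(n): average path length of an unsuccessful search in a binary search tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path, n):
    """Eq. 2: AS(X, N) = 2 ** (-E(h(X)) / C(N))."""
    return 2.0 ** (-mean_path / average_path_length(n))


n = 256
c = average_path_length(n)      # about 10.24
print(anomaly_score(c, n))      # → 0.5 (average depth: not distinguishable)
print(anomaly_score(2.0, n))    # short paths push the score toward 1
print(anomaly_score(3 * c, n))  # long paths push the score toward 0
```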

In terms of Eq. 2, scores close to 1 indicate anomalies, whereas scores well below 0.5 indicate normal points; the scikit-learn implementation used in this study labels each point as -1 (outlier) or 1 (inlier). The isolation forest method is computationally inexpensive and fast. It was implemented with the IsolationForest class of the sklearn.ensemble module in Python (Pedregosa et al., 2011).

Clustering is an important unsupervised learning approach to anomaly detection. Two common families of clustering methods are distance-based and density-based. Distance-based clustering handles data with a spherical structure but becomes inefficient when the data have a non-spherical structure (Zhao et al., 2018). In contrast, density-based clustering techniques can handle clusters of arbitrary shape: high-density regions of a dataset can be readily distinguished from low-density regions (Kusiak et al., 2009; Yesilbudak, 2018).

DBSCAN uses density to cluster the data without requiring the number of clusters to be specified in advance. It divides the dataset into regions of different density to identify clusters of arbitrary shape and size, where a cluster is a set of density-connected data points. Each point is classified as a core, boundary, or noise point. A core point ($P_c$) has at least a minimum number of points ($P_{min}$) within a maximum radius (Eps). Boundary points lie within the Eps-neighborhood of a core point but have fewer than $P_{min}$ points within their own neighborhood. Noise points are neither core nor boundary points. The point types and the DBSCAN framework are illustrated in Fig. 1.
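The three point types can be recovered from scikit-learn's DBSCAN, which exposes the indices of core samples (core_sample_indices_) and the cluster labels (labels_, with -1 for noise); any clustered point that is not a core point is a boundary (border) point. The toy example below is illustrative only. Note that scikit-learn's min_samples counts the point itself, so it plays the role of $P_{min}$ here, and eps plays the role of Eps. The helper name is hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def dbscan_point_types(X, eps, min_samples):
    """Label each point as 'core', 'border' or 'noise' using a fitted DBSCAN."""
    db = DBSCAN(eps=eps, min_samples=min_samples).fit(X)
    types = np.full(len(X), "border", dtype=object)
    types[db.labels_ == -1] = "noise"        # neither core nor reachable from a core point
    types[db.core_sample_indices_] = "core"  # at least min_samples points within eps
    return db.labels_, types


X = np.array([[0.0], [0.1], [0.2], [0.3], [5.0]])
labels, types = dbscan_point_types(X, eps=0.15, min_samples=3)
print(labels.tolist())  # → [0, 0, 0, 0, -1]
print(types.tolist())   # → ['border', 'core', 'core', 'border', 'noise']
```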

Modelling of normal behaviour wind turbine power curve

After missing values had been treated and most outliers and anomalies removed, the resulting normal dataset was used as input to model a robust normal-behavior power curve for each wind turbine. A locally weighted regression (LOESS) method was applied, in which each smoothed point is obtained from the neighboring data points within the selected span. For each data point, the regression weight within a given span was calculated using Eq. 3 (Bilendo et al., 2022).

$$w_i = \left(1 - \left|\frac{x - x_i}{d(x)}\right|^3\right)^3 \tag{3}$$

where $x$ is the predictor value associated with the point to be smoothed, $x_i$ denotes the nearest neighbors of $x$ within the span, and $d(x)$ is the horizontal distance between $x$ and the most distant predictor value within the span. The upper and lower limits of the robust power curve are given by Equations 4 and 5.

$$U_L = w_i + n\left(\frac{\sigma_{w_i}}{k}\right) \tag{4}$$

$$L_L = w_i - n\left(\frac{\sigma_{w_i}}{k}\right) \tag{5}$$

where $U_L$ is the upper limit, $L_L$ is the lower limit, and $\sigma_{w_i}$ is the standard deviation of $w_i$ for the normal-behavior power curve. The width of the limits is controlled by $n$, an integer multiplier, and $k$ is the number of data samples in a given span.
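A minimal sketch of the locally weighted regression step is given below, assuming a local linear fit at each evaluation point with the tricube weights of Eq. 3 over the k nearest wind-speed values (span fraction frac). It is a plain, non-robust LOESS: the bisquare re-weighting iterations of robust LOESS and the limits of Equations 4 and 5 are not reproduced, and the function name, the synthetic data, and the default span are assumptions. Library implementations, such as MATLAB's smooth with the 'rloess' method or the lowess function in statsmodels, could be used instead.

```python
import numpy as np


def loess(x, y, x_eval, frac=0.3):
    """Local linear regression with tricube weights (Eq. 3), evaluated at x_eval."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = min(len(x), max(3, int(np.ceil(frac * len(x)))))  # points per span
    fitted = np.empty(len(x_eval))
    for j, x0 in enumerate(np.asarray(x_eval, dtype=float)):
        dist = np.abs(x - x0)
        idx = np.argsort(dist)[:k]                 # nearest neighbours of x0 in the span
        d_max = max(dist[idx].max(), 1e-12)        # d(x): distance to furthest point in span
        w = (1.0 - (dist[idx] / d_max) ** 3) ** 3  # tricube weights, Eq. 3
        # Weighted least squares for a local straight line centred at x0.
        A = np.column_stack([np.ones(k), x[idx] - x0])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y[idx] * sw, rcond=None)
        fitted[j] = coef[0]                        # intercept = fitted value at x0
    return fitted


# Synthetic normal-data power curve (kW) versus wind speed (m/s), for illustration only.
rng = np.random.default_rng(1)
ws = np.sort(rng.uniform(3.0, 20.0, 500))
power = 4000.0 / (1.0 + np.exp(-(ws - 7.0))) + rng.normal(0.0, 80.0, ws.size)
grid = np.arange(3.0, 20.5, 0.5)
nbpc = loess(ws, power, grid, frac=0.1)
```

A smaller span follows the steep part of the power curve more closely but passes more noise through; a larger span gives a smoother curve at the cost of flattening the transition to rated power.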

The SCADA data used in this study were collected from two wind farms at ten-minute intervals and cover one year; the measurement campaign was planned for one and a half years. Both wind farms use three-bladed wind turbines with a rated power of 4000 kW (4 MW). The cut-in, cut-out, and rated wind speeds are 3 m/s, 20 m/s, and 8.5 m/s, respectively.

3.1 Operating conditions of wind turbine

The operating conditions of wind turbines vary owing to their complex structures and their nonlinear, constantly changing environment. SCADA data contain many features related to the condition of wind turbines that can be used for different analyses. The current study obtained the following operational parameters from the SCADA system: wind speed, wind direction, rotor speed, temperature, yaw error, blade pitch angle, and power. Based on these parameters, the overall operation of a wind turbine can be divided into two operating conditions, normal operation and operation at or above the rated wind speed, as shown in Table 1. During normal operation, the wind speed ranges from 3 to 17.5 m/s, the blade pitch angle is regulated between 0 and 90 degrees, and the rotor speed is 10–16 RPM. When operating at or above the rated wind speed, the pitch control mechanism varies the pitch angle between 0 and 30 degrees to maintain a rotor speed of 16 RPM. Owing to the changing operational environment, the power produced by the wind turbines varies from month to month.

Table 1: Operating conditions of the wind turbines

Operational scenario	Wind speed (m/s)	Rotor speed (RPM)	Blade pitch angle (°)	Power (kW)
Normal operation (wind farms 1 & 2)	3–17.5	10–16	0–90	0–4000
Operation at or above rated	7–20	16	0–30	> 4000

3.2 Wind velocity and power distribution of monitoring data

Figure 3 shows the distribution of wind power with wind velocity for wind turbines T₁, T₂, and T₃ in wind farm 1. Among these turbines, T₁ shows the largest fluctuation of power with wind velocity, indicating greater variability and operational inconsistency. In contrast, T₂ and T₃ have more stable and narrower distributions of power with wind velocity. Greater variability can indicate sensor failures, wake effects, and miscommunication of sensor measurements during data collection.

Similarly, Fig. 4 presents the distribution of wind power with wind velocity for wind farm 2, comparing two different cases. Case 1 shows a more stable operational scenario than Case 2. Like wind turbine T1, Case 2 of wind farm 2 shows greater spread and variability due to environmental factors, so mechanical faults are more likely than in the other cases.

Dataset 1

SCADA dataset 1, collected from wind farm 1, was processed according to the methodology described in Section 2. The database comprised data from three wind turbines, each with erroneous and anomalous records. The original data plots for wind turbines T1, T2, and T3 are shown in the figure.

For wind farm 1, three turbines were inspected using the measurement data, and the model parameters adopted for the iForest-DBSCAN method are presented in Table 2. As shown on the left side of Fig. 6, Fig. 7, and Fig. 8, all processing steps were applied in sequence to detect the anomalous data and model a smooth power curve. Red points denote outliers and anomalous data, and blue points denote the normal data used as input for modelling the smooth, normal-behavior power curve. These erroneous behaviors can be attributed to sensor failure, wake effects, instrumental faults, or accuracy degradation.

The data spread was greatest for T1, smaller for T2, and smallest for T3. The modelled smooth, normal behavior power curves are shown on the right side of Fig. 6, Fig. 7, and Fig. 8. The normal behavior power curve (NBPC) is plotted with a 95% confidence interval, shown as upper and lower bounds. The NBPCs showed similar and consistent trends for all three turbines, demonstrating the effectiveness of the hybrid model for power curve modelling. The method is highly effective and computationally inexpensive for data-driven power curve modelling.

Wind turbine T1 shows greater operational variation and less consistent environmental effects, which cause the wide spread of data points in its SCADA data. This turbine has the most anomalies, compared with T2 and T3. T2 has fewer anomalies, indicating stable operation, and T3 has the fewest anomalies, with the most stable operating conditions and the smallest deviation.

After the original data were processed for power curve modelling, T1 had a wider spread of normal-behavior data and higher uncertainty in the power curve, as shown in Fig. 6 (right). The power curve for T2 has a narrower spread of normal data and less irregularity than T1, as shown in Fig. 7 (right). Among the three turbines, T3 has the smoothest power curve with the least spread of normal data, as shown in Fig. 8 (right). Compared with T2 and T3, T1 therefore requires closer maintenance and monitoring of operational and environmental factors to support decision-making and ensure optimal performance of the wind farm.

Dataset 2

SCADA dataset 2 was collected from wind farm 2, which has the same wind turbine capacity as wind farm 1. The wind turbine behavior was analyzed, and the proposed methodology was applied to two cases with different data trends, denoted Case 1 and Case 2. Data processing, anomalous data detection, and power curve modelling for Cases 1 and 2 are shown in Fig. 9 and Fig. 10, respectively.

Case 1

Following the methodology illustrated in Fig. 2, the hybrid isolation forest and DBSCAN approach was applied to detect and remove anomalous data. Figure 9 (a) shows the original SCADA plot of wind power against wind speed; the data spread indicates that the anomalies were possibly due to wake effects, sensor failure, or other instrumental malfunctions. As shown in Fig. 9 (b), data filtration and the isolation forest technique yielded a partially smoother curve. In the next step, shown in Fig. 9 (c), further anomalies were detected with the clustering approach after data filtration and outlier removal, which confirms the effectiveness of the iForest-DBSCAN model. In Fig. 9 (d), a normal behavior power curve is obtained with a 95% confidence interval; the upper and lower bounds of the power output represent the operation of the wind turbine under changing environmental conditions. Finally, Fig. 9 (d) and (e) show the robust smoothed power curves obtained after data cleaning, filtration, and anomalous data detection. Extracting the normal power data through anomaly detection and then modelling a robust normal behavior power curve demonstrates the effectiveness and accuracy of the proposed methodology.

Case 2

The behavior and trends of the wind turbine data in Case 2 differ from those in Case 1: the operating conditions vary more strongly and give rise to different anomalies. The same approach as in Case 1 was applied. As in Case 1, Fig. 10 (a) shows the original SCADA data under different operating conditions. The data were filtered and outliers were removed, as shown in Fig. 10 (b): gray points show the data filtered out owing to sensor miscommunication or faults, red points show outliers, and blue points show normal data. Compared with Case 1, more anomalies were clustered, as shown in Fig. 10 (c). Although the data behaved differently in Case 2, the iForest-DBSCAN model correctly identified the anomalous data and captured the normal behavior of the wind turbine, yielding a smooth power curve, as shown in Fig. 10 (d) and (e).

The proposed hybrid model effectively detected and removed anomalous data, although the behavior and data spread differed between the two cases. The power output near the rated wind speed was lower in Case 2, but the final power curve models of both cases were closely aligned. The main difference lies in the operational characteristics of the two cases, in terms of the spread and types of anomalous data.

The tuning parameters of the proposed hybrid model are listed in Table 2.

Table 2

Model parameters for iForest-DBSCAN
Model/Algorithm	Parameter	Adopted value
iForest	Contamination	0.023–0.038
	Max. samples	Auto
	Max. features	All
	Random state	None
	No. of estimators	100
DBSCAN	eps	0.031–0.055
	Min. samples	8
	Metric	Euclidean
	Algorithm	Auto
	Leaf size	30
	Power parameter (p)	2 (Euclidean)
	Standardization	Applied
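The two-stage detector can be sketched with scikit-learn using the settings of Table 2. This is a minimal sketch under stated assumptions, not the authors' implementation: the inputs are assumed to be wind speed and power only, both stages operate on the standardized data, single values are picked from the reported ranges (contamination 0.03, eps 0.04), and a record is treated as normal only if the isolation forest labels it an inlier and DBSCAN then assigns it to a cluster rather than to noise. The function name and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler


def iforest_dbscan_normal_mask(wind_speed, power, contamination=0.03, eps=0.04,
                               min_samples=8, random_state=None):
    """Hypothetical two-stage detector: True marks records kept as normal."""
    X = np.column_stack([wind_speed, power]).astype(float)
    X = StandardScaler().fit_transform(X)  # "Standardization: Applied" (Table 2)

    # Stage 1: isolation forest; fit_predict returns -1 for outliers, 1 for inliers.
    iforest = IsolationForest(n_estimators=100, max_samples="auto", max_features=1.0,
                              contamination=contamination, random_state=random_state)
    inlier = iforest.fit_predict(X) == 1

    # Stage 2: DBSCAN on the remaining points; label -1 marks low-density noise.
    dbscan = DBSCAN(eps=eps, min_samples=min_samples, metric="euclidean",
                    algorithm="auto", leaf_size=30, p=2)
    cluster_labels = dbscan.fit_predict(X[inlier])

    normal = np.zeros(len(X), dtype=bool)
    normal[np.flatnonzero(inlier)] = cluster_labels != -1
    return normal


# Synthetic example: a logistic-shaped power curve plus one obvious anomaly
# (zero power at 15 m/s). Values are illustrative, not from the study.
rng = np.random.default_rng(0)
ws = rng.uniform(3.0, 20.0, 2000)
power = 4000.0 / (1.0 + np.exp(-(ws - 7.0))) + rng.normal(0.0, 50.0, ws.size)
ws = np.append(ws, 15.0)
power = np.append(power, 0.0)

normal = iforest_dbscan_normal_mask(ws, power, random_state=0)
print(f"kept {normal.sum()} of {normal.size} records as normal")
```

Because eps is applied after standardization, its value depends on the scale and density of each dataset, which is consistent with Table 2 reporting a range of eps values rather than a single setting.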

Table 3

Performance metrics of the iForest-DBSCAN model
Case	Accuracy	Precision	F1-score	Recall
Wind farm 1	0.998	0.986	0.9876	0.973
Wind farm 2, Case 1	0.9912	0.9830	0.9830	0.9710
Wind farm 2, Case 2	0.9891	0.9801	0.9789	0.9701

Figure 11 presents the evaluation of the iForest-DBSCAN method for anomaly detection in the SCADA data of wind farm 1. The confusion matrices highlight the performance of the proposed model on the acquired dataset. In Fig. 11, T1 shows more data spread than T2 and T3; its larger numbers of false positives and false negatives confirm the inconsistent operating conditions and greater amount of anomalous data for this turbine. In contrast, T2 and T3 had fewer anomalies, indicating more stable operating conditions.
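For reference, the metrics in Table 3 follow the standard confusion-matrix definitions, which can be computed with scikit-learn as sketched below. The labels here are made up for illustration: the sketch assumes a reference labelling of each record is available (1 = anomaly, 0 = normal) and treats the anomaly class as the positive class.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical reference labels and detector output (1 = anomaly, 0 = normal).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(tn, fp, fn, tp)                   # → 3 1 1 3
print(accuracy_score(y_true, y_pred))   # (tp + tn) / total → 0.75
print(precision_score(y_true, y_pred))  # tp / (tp + fp) → 0.75
print(recall_score(y_true, y_pred))     # tp / (tp + fn) → 0.75
print(f1_score(y_true, y_pred))         # 2PR / (P + R) → 0.75
```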

Figure 12 illustrates the performance of iForest-DBSCAN for wind farm 2, Cases 1 and 2. In both cases, the model detected the anomalies well. Data spread and anomaly detection were more manageable in Case 1 than in Case 2; in Case 2, the operating conditions varied significantly, leading to more false positives and false negatives than in Case 1. Despite the irregularities and larger number of anomalies in Case 2, the proposed model still performed well in anomaly detection and normal behavior power curve modelling.

The results from both wind farms confirm the effectiveness of the proposed methodology for wind turbines under various environmental conditions, although some datasets are more challenging owing to their greater spread. iForest-DBSCAN consistently demonstrated high accuracy and precision, indicating its suitability for real-time SCADA data.

This study aimed to develop a robust and efficient model to detect anomalies and to model a normal behavior power curve using SCADA data. The proposed hybrid iForest-DBSCAN methodology effectively identified anomalous and normal data with high precision and accuracy. Using the normal data, a locally estimated scatterplot smoothing (LOESS) technique was applied to model the normal behavior power curve under various operating conditions. Based on the SCADA data used in this study, the following conclusions were drawn:

By combining the strengths of both algorithms, the proposed hybrid model provides more reliable and effective results. Owing to its decision tree structure, the iForest model detects anomalous data with minimal computational resources, making it useful for real-time anomaly detection. The ability of DBSCAN to detect clusters of different shapes and sizes in complex, nonlinear datasets further strengthens the methodology.
Removing incorrect records and obvious anomalies from the SCADA data before applying the anomaly detection method improves efficiency. It is also important to select an effective pitch angle threshold based on wind turbine operation.
The proposed hybrid iForest-DBSCAN method performed exceptionally well for the different types of datasets. This indicates that the method accurately distinguished normal and anomalous data, even though the wind turbines were operating under different environmental conditions and operating factors.
After the anomalies were identified, the LOESS regression technique was applied to model a normal behavior power curve (NBPC) for each wind turbine. This technique successfully modelled the power curve with a 95% confidence interval under different operational scenarios and data variabilities.
The main application of this research lies in wind farm maintenance, through the detection of anomalies and deviations in the power curves. The methodology can help identify faults before failure: indicators such as anomalies and a wide power curve spread, as observed for T1, signal the need for careful monitoring and maintenance. This approach can contribute to optimizing wind farms and reducing maintenance and operational expenses through timely, proactive maintenance and fault/anomaly management strategies.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Author Contribution

Conceptualization, Z.M. and Z.W.; methodology, Z.M.; software, Z.M.; validation, Z.M., Z.W.; formal analysis, Z.M.; investigation, Z.M.; resources, Z.W.; data curation, Z.W.; writing—original draft preparation, Z.M.; writing—review and editing, Z.M.; visualization, Z.M.; supervision, Z.W.; project administration, Z.W.; funding acquisition, Z.W. All authors have read and agreed to the published version of the manuscript.

Data availability

Data will be made available on request.

Bilendo F, Badihi H, Lu N, Cambron P, Jiang B (2022) Power Curve-Based Fault Detection Method for Wind Turbines. IFAC-PapersOnLine 55(6):408–413. https://doi.org/10.1016/j.ifacol.2022.07.163
Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: A survey. ACM Comput Surv 41(3):15. https://doi.org/10.1145/1541880.1541882
Chen J, Li J, Chen W, Wang Y, Jiang T (2020) Anomaly detection for wind turbines based on the reconstruction of condition parameters using stacked denoising autoencoders. Renewable Energy 147:1469–1480. https://doi.org/10.1016/j.renene.2019.09.041
Dao PB, Barszcz T, Staszewski WJ (2024) Anomaly detection of wind turbines based on stationarity analysis of SCADA data. Renewable Energy 232:121076. https://doi.org/10.1016/j.renene.2024.121076
Du B, Narusue Y, Furusawa Y, Nishihara N, Indo K, Morikawa H, Iida M (2023) Clustering Wind Turbines for SCADA Data-Based Fault Detection. IEEE Trans Sustain Energy 14(1):442–452. https://doi.org/10.1109/TSTE.2022.3215672
Feng C, Liu C, Jiang D, Kong D, Zhang W (2023) Multivariate Anomaly Detection and Early Warning Framework for Wind Turbine Condition Monitoring Using SCADA Data. J Energy Eng 149(5):04023028. https://doi.org/10.1061/JLEED9.EYENG-4843
GWEC (2024) Global Wind Report. Global Wind Energy Council
IEA (2024) Electricity 2024: Analysis and forecast to 2026. International Energy Agency
Kusiak A, Zheng H, Song Z (2009) Models for monitoring wind farm power. Renewable Energy 34(3):583–590. https://doi.org/10.1016/j.renene.2008.05.032
Lin Z, Liu X, Collu M (2020) Wind power prediction based on high-frequency SCADA data along with isolation forest and deep learning neural networks. Int J Electr Power Energy Syst 118:105835. https://doi.org/10.1016/j.ijepes.2020.105835
Liu FT, Ting KM, Zhou ZH (2008) Isolation forest. In: Proceedings of the IEEE International Conference on Data Mining (ICDM)
Liu J, Yang G, Li X, Wang Q, He Y, Yang X (2023) Wind turbine anomaly detection based on SCADA: A deep autoencoder enhanced by fault instances. ISA Trans 139:586–605. https://doi.org/10.1016/j.isatra.2023.03.045
Liu X, Lu S, Ren Y, Wu Z (2020) Wind Turbine Anomaly Detection Based on SCADA Data Mining. Electronics 9(5):751. https://www.mdpi.com/2079-9292/9/5/751
Marti-Puig P, Hernández JÁ, Solé-Casals J, Serra-Serra M (2024) Enhancing Reliability in Wind Turbine Power Curve Estimation. Appl Sci 14(6):2479. https://doi.org/10.3390/app14062479
McKinnon C, Carroll J, McDonald A, Koukoura S, Infield D, Soraghan C (2020) Comparison of New Anomaly Detection Technique for Wind Turbine Condition Monitoring Using Gearbox SCADA Data. Energies 13(19):5152. https://www.mdpi.com/1996-1073/13/19/5152
Moreno SR, Coelho LdS, Ayala HVH, Mariani VC (2020) Wind turbines anomaly detection based on power curves and ensemble learning. IET Renew Power Gener 14(19):4086–4093. https://doi.org/10.1049/iet-rpg.2020.0224
Morrison R, Liu X, Lin Z (2022) Anomaly detection in wind turbine SCADA data for power curve cleaning. Renewable Energy 184:473–486. https://doi.org/10.1016/j.renene.2021.11.118
Ohunakin OS, Henry EU, Matthew OJ, Ezekiel VU, Adelekan DS, Oyeniran AT (2024) Conditional monitoring and fault detection of wind turbines based on Kolmogorov–Smirnov non-parametric test. Energy Rep 11:2577–2591. https://doi.org/10.1016/j.egyr.2024.01.081
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V (2011) Scikit-learn: Machine learning in Python. J Mach Learn Res 12:2825–2830
Qiao W, Lu D (2015) A Survey on Wind Turbine Condition Monitoring and Fault Diagnosis - Part II: Signals and Signal Processing Methods. IEEE Trans Industr Electron 62(10):6546–6557. https://doi.org/10.1109/TIE.2015.2422394
Ren Z, Verma AS, Li Y, Teuwen JJE, Jiang Z (2021) Offshore wind turbine operations and maintenance: A state-of-the-art review. Renew Sustain Energy Rev 144:110886. https://doi.org/10.1016/j.rser.2021.110886
Romadanova M (2023) Wind velocity data interpolation using a weighted cubic spline. E3S Web of Conferences
Santolamazza A, Dadi D, Introna V (2021) A Data-Mining Approach for Wind Turbine Fault Detection Based on SCADA Data Analysis Using Artificial Neural Networks. Energies 14(7):1845. https://www.mdpi.com/1996-1073/14/7/1845
Volkov YS, Bogdanov VV, Miroshnichenko VL, Shevaldin VT (2010) Shape-preserving interpolation by cubic splines. Math Notes 88(5):798–805. https://doi.org/10.1134/S0001434610110209
Wang L, Jia S, Yan X, Ma L, Fang J (2022) A SCADA-Data-Driven Condition Monitoring Method of Wind Turbine Generators. IEEE Access 10:67532–67540. https://doi.org/10.1109/ACCESS.2022.3185259
Xiang L, Yang X, Hu A, Su H, Wang P (2022) Condition monitoring and anomaly detection of wind turbine based on cascaded and bidirectional deep learning networks. Appl Energy 305:117925. https://doi.org/10.1016/j.apenergy.2021.117925
Yao Q, Hu Y, Liu J, Zhao T, Qi X, Sun S (2023) Power Curve Modeling for Wind Turbine Using Hybrid-driven Outlier Detection Method. J Mod Power Syst Clean Energy 11(4):1115–1125. https://doi.org/10.35833/MPCE.2021.000769
Yao Q, Zhu H, Xiang L, Su H, Hu A (2023) A novel composed method of cleaning anomy data for improving state prediction of wind turbine. Renewable Energy 204:131–140. https://doi.org/10.1016/j.renene.2022.12.118
Yesilbudak M (2018) Implementation of novel hybrid approaches for power curve modeling of wind turbines. Energy Conv Manag 171:156–169. https://doi.org/10.1016/j.enconman.2018.05.092
Zhang C, Hu D, Yang T (2022) Anomaly detection and diagnosis for wind turbines using long short-term memory-based stacked denoising autoencoders and XGBoost. Reliab Eng Syst Saf 222:108445. https://doi.org/10.1016/j.ress.2022.108445
Zhang C, Hu D, Yang T (2024) Research of artificial intelligence operations for wind turbines considering anomaly detection, root cause analysis, and incremental training. Reliab Eng Syst Saf 241:109634. https://doi.org/10.1016/j.ress.2023.109634
Zhang S, Robinson E, Basu M (2024) Wind turbine condition monitoring based on three fitted performance curves. Wind Energy 27(5):429–446. https://doi.org/10.1002/we.2859
Zhao Y, Ye L, Wang W, Sun H, Ju Y, Tang Y (2018) Data-Driven Correction Approach to Refine Power Curve of Wind Farm Under Wind Curtailment. IEEE Trans Sustain Energy 9:95–105

No competing interests reported.


Status: Version 1
