Studies show that fault detection for wind turbine generators is affected by time-varying working conditions. In the present study, generator data recorded during normal operation are combined with a spatiotemporal attention mechanism to construct a long short-term memory auto-encoder network (AM-LSTM). The network captures the spatiotemporal correlation of the generator operational data and extracts the deep features of the generator under time-varying working conditions. The Mahalanobis distance between deep features is then calculated, and the health threshold is determined by kernel density estimation. To evaluate the performance of the proposed scheme, supervisory control and data acquisition (SCADA) data from a 2.0 MW doubly-fed asynchronous wind turbine generator are used. The results show that the accuracy of fault detection is not affected by time-varying operating conditions and that generator faults are detected 4.75 and 8.5 hours in advance, respectively. Furthermore, visualization of the attention map helps interpret the cause of failure in the wind turbine generator.
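The health-indicator step described above (Mahalanobis distance followed by a threshold from kernel density estimation) can be illustrated with a minimal sketch. It is not the implementation used in this study. Several things are assumptions made for illustration only: random vectors stand in for the AM-LSTM deep features, distances are measured from the mean of the healthy features, the kernel is Gaussian with a normal-reference bandwidth, and the 0.99 confidence level and all function names are hypothetical.

```python
import math
import numpy as np


def mahalanobis_distances(features, mean, cov_inv):
    """Mahalanobis distance of each row of `features` from `mean`."""
    diff = features - mean
    # d_i = sqrt(diff_i^T * cov_inv * diff_i), evaluated row by row
    sq = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(sq, 0.0))


def kde_threshold(distances, confidence=0.99):
    """Distance at which the Gaussian-KDE CDF reaches `confidence` (assumed level)."""
    d = np.asarray(distances, dtype=float)
    # Normal-reference rule-of-thumb bandwidth (an assumed choice)
    h = 1.06 * d.std(ddof=1) * d.size ** (-1 / 5)

    def cdf(x):
        # The KDE CDF is the mean of the per-sample normal CDFs
        z = (x - d) / (h * math.sqrt(2.0))
        return float(np.mean([0.5 * (1.0 + math.erf(v)) for v in z]))

    lo, hi = d.min() - 5 * h, d.max() + 5 * h
    for _ in range(60):  # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if cdf(mid) < confidence:
            lo = mid
        else:
            hi = mid
    return hi


# Synthetic stand-ins for deep features (assumption: not real SCADA data)
rng = np.random.default_rng(0)
healthy = rng.normal(size=(500, 4))
mean = healthy.mean(axis=0)
cov_inv = np.linalg.pinv(np.cov(healthy, rowvar=False))

healthy_d = mahalanobis_distances(healthy, mean, cov_inv)
threshold = kde_threshold(healthy_d)

# Shifted samples imitate a fault; an alarm is raised above the threshold
faulty = healthy[:20] + 6.0
alarms = mahalanobis_distances(faulty, mean, cov_inv) > threshold
```

Fitting the mean, covariance, and threshold on healthy data only mirrors the use of normal-operation data for training; a sample is flagged when its distance exceeds the threshold.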