Title: COMPARISON OF TIME SERIES IMPUTATION METHODS - USING SAITS FOR DATA RECOVERY
Author(s): Draylon Vieira Lopes, Lucas Gonçalves Brach, Emerson Cassiano da Silva, Peterson Gonçalves Alano and Rafael Stubs Parpinelli
ISBN: 978-989-8704-71-9
Editors: Paula Miranda and Pedro Isaías
Year: 2025
Edition: Single
Keywords: Data Imputation, Time Series, Missing Data, Industrial Monitoring, Machine Learning, Predictive Maintenance
Type: Full Paper
First Page: 29
Last Page: 36
Language: English

Paper Abstract:
Missing data in wind turbine Supervisory Control and Data Acquisition (SCADA) systems is a common issue caused by sensor
faults, communication losses, and maintenance downtime. These gaps reduce the reliability of condition monitoring,
anomaly detection, and predictive maintenance, all of which depend on complete, high-quality data. This work addresses
the imputation of missing values in turbine datasets to improve the quality and usability of the data for machine
learning applications and operational decision-making. We collect multivariate SCADA data from real turbine
operations and artificially introduce block gaps of 6 to 60 samples to replicate realistic sensor interruptions. Several
gap-filling strategies are evaluated: time-based linear interpolation; multivariate linear regression; MIARMA,
which uses ARMA models to preserve spectral properties; and SAITS, a modern self-attention-based architecture. All
methods are compared under identical missingness conditions, and accuracy is assessed only at the masked points using
MAE, RMSE, and sMAPE. The results, aggregated across more than 200 million masked points, show that multivariate
linear regression is the most effective of the classical methods, outperforming simple time interpolation, while
MIARMA delivers similar results at a much higher computational cost in multivariate settings. SAITS achieves the best
overall performance and fidelity on the datasets, confirming that deep learning models are highly effective at
reconstructing complex turbine data, although they require greater computational resources to produce results quickly.
The findings highlight the importance of exploiting cross-variable relationships in turbine monitoring and show that
the proposed pipeline can serve as a reproducible framework for evaluating imputation methods in other industrial domains.
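The evaluation protocol described in the abstract (remove contiguous blocks of 6 to 60 samples, fill them, and score only the masked positions with MAE, RMSE, and sMAPE) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the uniform sampling of gap lengths, the possibility of overlapping gaps, the single-target masking in `regression_impute` (predictor columns assumed fully observed), the particular sMAPE convention, and the synthetic data are all hypothetical. MIARMA and SAITS are not reproduced here.

```python
import numpy as np


def make_block_mask(n, n_gaps, min_len=6, max_len=60, rng=None):
    """Boolean mask (True = artificially removed) made of contiguous blocks.

    Assumption: gap lengths are drawn uniformly from [min_len, max_len] and
    blocks are placed at random, so they may overlap. Requires n >= max_len.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = np.zeros(n, dtype=bool)
    for _ in range(n_gaps):
        length = int(rng.integers(min_len, max_len + 1))  # inclusive upper bound
        start = int(rng.integers(0, n - length + 1))
        mask[start:start + length] = True
    return mask


def time_interpolate(t, y, mask):
    """Fill masked samples by linear interpolation over timestamps t.

    np.interp holds the nearest observed value constant beyond the first or
    last observed sample, so gaps at the series edges become flat.
    """
    filled = y.copy()
    filled[mask] = np.interp(t[mask], t[~mask], y[~mask])
    return filled


def regression_impute(X, target, mask):
    """Fill masked rows of column `target` with an OLS fit on the other columns.

    Assumption: only the target column is masked; predictors are fully observed.
    """
    y = X[:, target]
    predictors = np.delete(X, target, axis=1)
    A = np.column_stack([np.ones(len(X)), predictors])  # intercept + predictors
    coef, *_ = np.linalg.lstsq(A[~mask], y[~mask], rcond=None)  # fit on observed rows only
    filled = y.copy()
    filled[mask] = A[mask] @ coef
    return filled


def masked_metrics(y_true, y_pred, mask):
    """MAE, RMSE and sMAPE (in %) evaluated only at masked positions.

    sMAPE uses the common 2|e| / (|y| + |y_hat|) form; terms with a zero
    denominator contribute 0 (one of several conventions).
    """
    err = y_true[mask] - y_pred[mask]
    denom = np.abs(y_true[mask]) + np.abs(y_pred[mask])
    ratio = np.divide(2 * np.abs(err), denom,
                      out=np.zeros_like(denom), where=denom > 0)
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "sMAPE": float(100 * np.mean(ratio)),
    }


# Usage on hypothetical synthetic data (not SCADA): power depends on wind and temperature.
rng = np.random.default_rng(0)
n = 2000
t = np.arange(n) * 600.0  # assumed 10-minute sampling, in seconds
wind = 8 + 2 * np.sin(2 * np.pi * np.arange(n) / 144) + rng.normal(0, 0.5, n)
temp = 15 + rng.normal(0, 1, n)
power = 50 * wind + 5 * temp + rng.normal(0, 10, n)
X = np.column_stack([wind, temp, power])

mask = make_block_mask(n, n_gaps=20, rng=rng)
results = {
    "interpolation": masked_metrics(power, time_interpolate(t, power, mask), mask),
    "regression": masked_metrics(power, regression_impute(X, 2, mask), mask),
}
for name, scores in results.items():
    print(name, scores)
```

Because the synthetic power column is generated from the other two columns, the regression fill should score lower errors than interpolation on this toy data. That illustrates why cross-variable information can help, but it does not reproduce the paper's experiments or numbers.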