A Drift-based Dynamic Ensemble Members Selection using Clustering for Time Series Forecasting

Authors: A. Saadallah, F. Priebe, K. Morik
Journal: ECML PKDD 2019: Machine Learning and Knowledge Discovery in Databases
Year: 2019

Citation information

A. Saadallah, F. Priebe, K. Morik,
ECML PKDD 2019: Machine Learning and Knowledge Discovery in Databases,
2019,
678-694,
Springer, Cham,
https://doi.org/10.1007/978-3-030-46150-8_40

The complex and evolving structure of time series makes forecasting one of the most important and challenging tasks in time series analysis. Typical forecasting methods are designed to model time-evolving dependencies between data observations. However, it is generally accepted that none of them is universally valid for every application. Methods that learn heterogeneous ensembles by combining a diverse set of forecasts therefore appear to be a promising solution. So far, in the classical ML literature, ensemble techniques such as stacking, cascading and voting have mostly been restricted to operating in a static manner. To deal with changes in the relative performance of models as well as changes in the data distribution, we propose a drift-aware meta-learning approach for adaptively selecting and combining forecasting models. Our assumption is that different forecasting models have different areas of expertise and varying relative performance. Our method dynamically selects an initial set of candidate base models through a performance drift detection mechanism. Since diversity is a fundamental component of ensemble methods, we propose a second selection stage based on clustering, which is recomputed after each detected drift. The predictions of the finally selected models are combined into a single prediction. We tested the method extensively, evaluating both the generalization error and the scalability of the approach on time series from several real-world domains. The empirical results show that the method is competitive with state-of-the-art approaches for combining forecasters.
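
The abstract names three steps (performance drift detection, a diversity-oriented second selection stage, and combination of the selected forecasts) without giving their details. The following Python sketch illustrates how such a pipeline could be wired together; it is not the authors' implementation. Every name and parameter in it (`recent_rmse`, `drift_detected`, `select_members`, `combine`, `window`, `top_k`, `threshold`, `max_corr`) is hypothetical. The drift test is a simple threshold on the change in each model's share of the total error, the clustering step is replaced by a greedy grouping on error correlation, and the combination is a plain average; all three are assumptions made for illustration.

```python
import numpy as np


def recent_rmse(preds, y, window):
    """RMSE of each model over the last `window` steps.

    preds has shape (n_models, n_steps); y has shape (n_steps,).
    """
    err = preds[:, -window:] - y[-window:]
    return np.sqrt(np.mean(err ** 2, axis=1))


def drift_detected(prev_rmse, curr_rmse, threshold=0.1):
    """Assumed drift test: flag drift when any model's share of the
    total error has moved by more than `threshold`."""
    prev_share = prev_rmse / prev_rmse.sum()
    curr_share = curr_rmse / curr_rmse.sum()
    return bool(np.max(np.abs(curr_share - prev_share)) > threshold)


def select_members(preds, y, window, top_k=3, max_corr=0.9):
    """Two-stage selection, meant to be rerun whenever drift is flagged."""
    rmse = recent_rmse(preds, y, window)
    # Stage 1: keep the top_k models with the lowest recent error.
    candidates = np.argsort(rmse)[:top_k]
    # Stage 2 (stand-in for clustering): walk the candidates from best
    # to worst and skip any model whose recent errors are almost
    # perfectly correlated with those of an already selected model.
    errors = preds[:, -window:] - y[-window:]
    corr = np.corrcoef(errors)  # rows are models
    selected = []
    for m in candidates:
        if all(corr[m, s] < max_corr for s in selected):
            selected.append(int(m))
    return selected


def combine(next_preds, selected):
    """Average the selected models' forecasts for the next step."""
    return float(np.mean(next_preds[selected]))
```

In an online setting, one would call `drift_detected` on the error profiles of two consecutive windows and rerun `select_members` only when it returns `True`, reusing the previous selection otherwise.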