TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods

Journal: Proceedings of the VLDB Endowment, Volume: 17, Issue: 9
DOI: https://doi.org/10.14778/3665844.3665863
Publication Date: 2024-05-01
Author(s): Xiangfei Qiu et al.
Primary Topic: Time Series Analysis and Forecasting

Overview

The research introduces TFB, an automated benchmark for Time Series Forecasting (TSF) methods, designed to enhance empirical comparison and evaluation of forecasting techniques across diverse domains such as traffic, energy, health, and economics. TFB addresses critical shortcomings in existing benchmarks, including limited data domain coverage, biases against traditional forecasting methods, and inconsistencies in evaluation pipelines. By incorporating datasets from ten distinct domains and providing a comprehensive time series characterization, TFB ensures a broad representation of data.

Furthermore, TFB includes a diverse array of forecasting methods—spanning statistical learning, machine learning, and deep learning—alongside various evaluation strategies and metrics to facilitate fair comparisons. The benchmark’s flexible and scalable pipeline aims to eliminate biases and inconsistencies, ultimately offering a robust framework for evaluating 21 Univariate Time Series Forecasting (UTSF) methods across 8,068 univariate time series and 14 Multivariate Time Series Forecasting (MTSF) methods on 25 datasets. The findings from these evaluations contribute to a deeper understanding of forecasting methods, guiding researchers in selecting the most appropriate techniques for specific datasets and contexts, thereby fostering the advancement of new TSF methodologies.
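The idea of pairing an evaluation strategy with a metric can be sketched as follows. This is a minimal illustration, not TFB's code: the summary above does not name the strategies or metrics TFB uses, so the "fixed" (single hold-out) and "rolling" (several forecast origins) functions, the choice of MAE and MSE, the `seasonal_naive` baseline, and the toy series are all assumptions made for the example.

```python
from statistics import fmean
from typing import Callable, Dict, List, Sequence

# Hypothetical common signature: (history, horizon) -> list of forecasts.
Forecaster = Callable[[Sequence[float], int], List[float]]


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return fmean(abs(a - p) for a, p in zip(actual, predicted))


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error."""
    return fmean((a - p) ** 2 for a, p in zip(actual, predicted))


def seasonal_naive(history: Sequence[float], horizon: int, period: int = 4) -> List[float]:
    """Illustrative baseline: repeat the last observed season."""
    last_season = history[-period:]
    return [last_season[i % period] for i in range(horizon)]


def fixed_forecast(series: Sequence[float], horizon: int, forecaster: Forecaster) -> Dict[str, float]:
    """Score one forecast of the final `horizon` points."""
    history, future = series[:-horizon], series[-horizon:]
    predicted = forecaster(history, horizon)
    return {"mae": mae(future, predicted), "mse": mse(future, predicted)}


def rolling_forecast(series: Sequence[float], horizon: int, forecaster: Forecaster,
                     start: int, stride: int = 1) -> Dict[str, float]:
    """Average the scores over forecast origins start, start + stride, ..."""
    maes, mses = [], []
    for origin in range(start, len(series) - horizon + 1, stride):
        history, future = series[:origin], series[origin:origin + horizon]
        predicted = forecaster(history, horizon)
        maes.append(mae(future, predicted))
        mses.append(mse(future, predicted))
    return {"mae": fmean(maes), "mse": fmean(mses), "windows": len(maes)}


# Toy series: three identical seasons, then a final season shifted by +2.
series = [10, 20, 30, 40] * 3 + [12, 22, 32, 42]
print(fixed_forecast(series, 4, seasonal_naive))
print(rolling_forecast(series, 4, seasonal_naive, start=8, stride=4))
```

On this toy series the single hold-out only sees the shifted final season, while the rolling variant averages it with an earlier, unshifted window, so the two strategies report different errors for the same method. That difference is one reason a benchmark needs to fix the strategy before comparing methods.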

Introduction

The introduction highlights the significance of Time Series Forecasting (TSF) across domains including economics, traffic, health, energy, and AIOps. Because these applications depend on predicting future values from historical data, numerous forecasting methods have been developed, categorized into Univariate Time Series Forecasting (UTSF) and Multivariate Time Series Forecasting (MTSF). Traditional methods such as Autoregressive Integrated Moving Average (ARIMA) and Vector Autoregression (VAR) have been followed by machine learning and deep learning approaches, which are commonly reported to perform better. However, the paper identifies three critical issues in existing evaluation frameworks: insufficient coverage of diverse data domains, stereotype bias against traditional methods, and a lack of consistent and flexible evaluation pipelines.

To address these challenges, the authors propose the Time series Forecasting Benchmark (TFB), which aims to enhance empirical evaluations of TSF methods. TFB offers a comprehensive collection of datasets from various domains, ensuring robust evaluations, and includes a wide range of methods from statistical learning to deep learning. The benchmark features a flexible and scalable evaluation pipeline that standardizes the assessment process, thereby improving fairness in comparisons. Key findings from experiments utilizing TFB indicate that traditional methods can outperform state-of-the-art techniques in certain contexts, and that methods considering inter-channel dependencies significantly enhance MTSF performance. The paper outlines its structure for the subsequent sections, which include a review of related work, definitions, TFB design, benchmarking existing methods, and concluding remarks.

Methods

In this section, the authors investigate the strengths and limitations of 22 forecasting methods categorized into statistical learning, machine learning, and deep learning approaches. The statistical methods include ARIMA, ETS, Kalman Filter (KF), and VAR. The machine learning methods encompass XGBModel, Linear Regression (LR), and Random Forest (RF). Deep learning methods are further divided into RNN-based models, CNN-based models (including MICN, TimesNet, and TCN), MLP-based models (such as NLinear, DLinear, TiDE, N-HiTS, and N-BEATS), Transformer-based models (including PatchTST, Crossformer, FEDformer, Non-stationary Transformer, Informer, and Triformer), and Model-Agnostic models like FiLM.

The experimental results compare Transformer-based and linear methods across various datasets. Linear methods perform better on datasets with increasing trends or significant shifts, an advantage attributed to their linear modeling capabilities. In contrast, Transformer-based methods excel on datasets characterized by seasonality, stationarity, and nonlinear patterns, which is attributed to their stronger ability to model complex time series dynamics. The findings suggest that the choice of forecasting method should be informed by the specific characteristics of the time series data.
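To make "characteristics of the time series" concrete, the sketch below computes two crude proxies and maps them to the method families discussed above. It is an illustration under stated assumptions, not TFB's characterization procedure: the R² of a straight-line fit stands in for trend, the autocorrelation at a known period stands in for seasonality, and the function names, the 0.5 threshold, and the decision rule are all hypothetical.

```python
from statistics import fmean
from typing import Sequence


def trend_r2(series: Sequence[float]) -> float:
    """R^2 of a least-squares straight line fitted against the time index."""
    n = len(series)
    t_mean, y_mean = (n - 1) / 2, fmean(series)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    syy = sum((y - y_mean) ** 2 for y in series)
    if syy == 0:  # constant series: no trend to explain
        return 0.0
    return (sxy ** 2) / (sxx * syy)


def autocorr(series: Sequence[float], lag: int) -> float:
    """Sample autocorrelation at the given lag."""
    y_mean = fmean(series)
    dev = [y - y_mean for y in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0
    return sum(dev[i] * dev[i + lag] for i in range(len(dev) - lag)) / denom


def suggest_family(series: Sequence[float], period: int, threshold: float = 0.5) -> str:
    """Hypothetical rule of thumb mirroring the reported findings."""
    if trend_r2(series) > threshold:
        return "linear"                # strong trend dominates
    if autocorr(series, period) > threshold:
        return "transformer-based"     # strong repeating pattern, little trend
    return "undecided"


trending = [float(t) for t in range(24)]
seasonal = [0.0, 1.0, 0.0, -1.0] * 6
print(suggest_family(trending, period=4))
print(suggest_family(seasonal, period=4))
```

The trend check runs first because a trending series is also highly autocorrelated at any short lag. A real characterization would more likely rely on a decomposition and a stationarity test than on two scalar thresholds.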

Discussion

The authors categorize existing time series forecasting (TSF) methods into three main groups: statistical learning, machine learning, and deep learning. Traditional statistical methods such as ARIMA and the Kalman Filter have been foundational, but with advancements in machine learning, techniques such as XGBoost and Random Forests have gained prominence for their ability to model complex, nonlinear relationships. Deep learning methods, particularly those built on architectures such as Convolutional Neural Networks (CNNs) and Transformers, have shown superior forecasting accuracy by effectively capturing temporal dependencies in time series data.

The authors also critique existing benchmarking proposals for TSF, noting their limitations in method diversity and dataset categorization. Many benchmarks focus on either univariate or multivariate time series, and few address both comprehensively. Furthermore, these benchmarks often lack scalable and unified evaluation pipelines, which hinders the integration of new methods and limits their applicability. In contrast, the proposed TFB benchmark aims to provide a more inclusive and user-friendly evaluation framework that encompasses a broader range of forecasting methods, employs scalable implementation pipelines, and offers fine-grained dataset classification, thereby facilitating more reliable comparisons and advancing the field of TSF.
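What a unified pipeline buys can be sketched with a small method registry: every method is called through the same signature, on the same split, and scored with the same metric, so adding a method means registering one function. This is a minimal sketch under assumptions, not TFB's architecture: `REGISTRY`, `register`, `benchmark`, the two baseline forecasters, and the toy datasets are hypothetical names, and only the univariate case is shown.

```python
from statistics import fmean
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical common interface: (history, horizon) -> list of forecasts.
Forecaster = Callable[[Sequence[float], int], List[float]]
REGISTRY: Dict[str, Forecaster] = {}


def register(name: str) -> Callable[[Forecaster], Forecaster]:
    """Decorator that adds a forecaster to the shared registry."""
    def decorator(fn: Forecaster) -> Forecaster:
        REGISTRY[name] = fn
        return fn
    return decorator


@register("naive_last")
def naive_last(history: Sequence[float], horizon: int) -> List[float]:
    """Repeat the last observed value."""
    return [history[-1]] * horizon


@register("drift")
def drift(history: Sequence[float], horizon: int) -> List[float]:
    """Extend the straight line through the first and last observations."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (h + 1) for h in range(horizon)]


def benchmark(datasets: Dict[str, Sequence[float]], horizon: int) -> Dict[Tuple[str, str], float]:
    """Score every registered method on every series with one split and one metric (MAE)."""
    table = {}
    for data_name, series in datasets.items():
        history, future = series[:-horizon], series[-horizon:]
        for method_name, method in REGISTRY.items():
            predicted = method(history, horizon)
            table[(data_name, method_name)] = fmean(
                abs(a - p) for a, p in zip(future, predicted)
            )
    return table


datasets = {
    "univariate_trend": [float(t) for t in range(12)],
    "univariate_flat": [5.0] * 12,
}
results = benchmark(datasets, horizon=3)
for key, score in sorted(results.items()):
    print(key, round(score, 3))
```

Because the split and the metric live in `benchmark` rather than in each method, no method can be favored by a different look-back window or scoring rule, which is the kind of inconsistency the critique above points to.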