Mind the naive forecast! A rigorous evaluation of forecasting models for time series with low predictability

Journal: Applied Intelligence, Volume: 55, Issue: 6
DOI: https://doi.org/10.1007/s10489-025-06268-w
Publication Date: 2025-02-03
Author(s): Niels C. Beck et al.
Primary Topic: Forecasting Techniques and Applications

Overview

The authors critique a prevailing practice in time series forecasting research, particularly in macroeconomics and finance: complex machine learning models are often evaluated only against one another, with simpler statistical baselines left out. This can obscure the practical relevance of the findings, especially for highly volatile time series. The study evaluates a range of statistical and machine learning forecasting methods, including ARIMA, ETS, BVAR, LSTM, HCNN, DeepVAR, TFT, and XGBoost, against naive (no-change) forecasts.

The results show that none of the complex methods consistently outperforms the naive forecast on highly volatile datasets such as exchange rates and stock prices, which calls into question the value of comparing sophisticated models only with one another in such settings. Conversely, on more stable macroeconomic price indices, many of these methods significantly outperform the naive forecast. Notably, the performance of machine learning models tends to decline more sharply than that of traditional statistical models in high-volatility scenarios. This underscores the need to include appropriate benchmark models, including simple and inexpensive ones, in order to draw meaningful conclusions from forecasting studies.

Introduction

The introduction of this research paper highlights the growing complexity of deep learning models in time series forecasting, particularly with the rise of transformer-based neural networks. These models are often evaluated using datasets from macroeconomics and finance, such as exchange rates and stock prices, which are crucial for decision-making in various sectors. However, the paper identifies a significant oversight in the literature: many studies fail to include a naive forecast as a benchmark, which is essential for assessing the true predictive power of complex models. The authors argue that this omission can lead to misleading claims of superiority over other models.

The paper contributes to the field by demonstrating that, in two common applications, the naive forecast outperforms several complex machine learning models, thereby questioning the practical relevance of the latter’s claimed advantages. Through a pseudo-out-of-sample analysis, the authors emphasize the necessity of including naive forecasts in benchmark evaluations, particularly for volatile datasets like exchange rates and the NASDAQ100. They also advocate for the use of higher granularity datasets, such as those from the Federal Reserve Economic Data monthly database (FRED-MD), to enhance model comparisons. The study ultimately reveals that while complex models often fail to surpass the naive benchmark, certain models do show promise on more stable datasets, thus underscoring the importance of rigorous evaluation methods in time series forecasting.
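A pseudo-out-of-sample analysis can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes an expanding training window and one-step-ahead forecasts, and the names `pseudo_out_of_sample`, `last_value`, `historical_mean`, and the toy series are hypothetical. The key property is that each forecast is produced using only observations that were available before the target date.

```python
from typing import Callable, List, Sequence, Tuple


def pseudo_out_of_sample(
    series: Sequence[float],
    forecaster: Callable[[Sequence[float]], float],
    min_train: int,
) -> Tuple[List[float], List[float]]:
    """Expanding-window, one-step-ahead evaluation (illustrative assumption).

    At each origin t the forecaster sees only series[:t] and predicts
    series[t], so no future information leaks into the forecast.
    """
    if not 1 <= min_train < len(series):
        raise ValueError("min_train must be in [1, len(series) - 1]")
    forecasts: List[float] = []
    actuals: List[float] = []
    for t in range(min_train, len(series)):
        forecasts.append(forecaster(series[:t]))
        actuals.append(series[t])
    return forecasts, actuals


def last_value(history: Sequence[float]) -> float:
    """Naive (no-change) forecast: carry the last observation forward."""
    return history[-1]


def historical_mean(history: Sequence[float]) -> float:
    """A second simple forecaster, used here only for comparison."""
    return sum(history) / len(history)


# Toy series standing in for an exchange rate (made-up numbers).
series = [1.10, 1.12, 1.11, 1.15, 1.14, 1.18]
naive_fc, actuals = pseudo_out_of_sample(series, last_value, min_train=3)
mean_fc, _ = pseudo_out_of_sample(series, historical_mean, min_train=3)
```

Any model can be plugged in as `forecaster`, which makes it straightforward to run the naive forecast through exactly the same evaluation loop as the complex models.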

Results

The Results section reports the benchmark findings of the study, beginning with the forecasting performance of the evaluated models relative to the naive forecast. A detailed analysis of the results follows, highlighting the key insights and implications derived from the performance data. This analysis places the findings in the context of the broader research landscape and identifies where the models are strong and where they fall short.

Discussion

The authors discuss the forecasting models applied to multivariate time series data, focusing on exchange rates, stock prices, and macroeconomic indicators. They categorize the models into univariate statistical methods (such as ETS and ARIMA), multivariate statistical models (such as BVAR), univariate machine learning models (including XGBoost and TFT), and multivariate machine learning models (such as LSTM and HCNN). The naive forecast serves as the benchmark: it predicts no change, that is, it carries the last observed value forward. The authors note that statistical models such as ETS and ARIMA are effective for linear processes but may struggle with complex dependencies, whereas machine learning models can capture non-linear relationships but may overfit in volatile environments.
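The two descriptions of the naive forecast ("no change" and "last observed value") are the same rule. A small sketch, with hypothetical function names and made-up data, shows this for a multi-step horizon:

```python
from typing import List, Sequence


def naive_forecast(history: Sequence[float], horizon: int = 1) -> List[float]:
    """Repeat the last observed value for every step of the horizon."""
    if not history:
        raise ValueError("history must contain at least one observation")
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    return [history[-1]] * horizon


def naive_forecast_from_changes(
    history: Sequence[float], horizon: int = 1
) -> List[float]:
    """Same forecast, written as 'the predicted change is zero' at each step."""
    if not history:
        raise ValueError("history must contain at least one observation")
    level = history[-1]
    path: List[float] = []
    for _ in range(horizon):
        level = level + 0.0  # predicted change is zero
        path.append(level)
    return path


# Illustrative values only.
history = [1.0, 1.2, 1.1]
levels = naive_forecast(history, horizon=3)
via_changes = naive_forecast_from_changes(history, horizon=3)
```

The forecast requires no estimation and no hyperparameters, which is why it is a cheap benchmark to include in any comparison.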

Forecasting performance is evaluated using relative mean squared error (rMSE) and relative mean absolute error (rMAE), and the results indicate that no model consistently outperforms the naive forecast across all datasets. ARIMA, however, shows significant predictive capability on the FRED-MD dataset, outperforming the naive forecast for several variables. The analysis shows that the performance of machine learning models deteriorates on highly volatile datasets, suggesting that their complexity may lead to overfitting when faced with noise. Conversely, the FRED-MD dataset, characterized by stronger interdependencies, allows machine learning models to use their capabilities more effectively, resulting in better forecasting performance. Overall, the findings underscore how difficult it is to achieve reliable forecasts in financial and macroeconomic contexts, particularly under varying volatility conditions.
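The summary does not spell out how rMSE and rMAE are computed. A common convention, assumed here, is to divide the model's error by the naive forecast's error on the same test points, so that a value below 1 means the model beats the naive benchmark and a value above 1 means it does worse. The helper names and numbers below are hypothetical.

```python
from typing import Dict, Sequence


def mse(actuals: Sequence[float], forecasts: Sequence[float]) -> float:
    """Mean squared error over paired observations."""
    return sum((a - f) ** 2 for a, f in zip(actuals, forecasts)) / len(actuals)


def mae(actuals: Sequence[float], forecasts: Sequence[float]) -> float:
    """Mean absolute error over paired observations."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)


def relative_errors(
    actuals: Sequence[float],
    model_forecasts: Sequence[float],
    naive_forecasts: Sequence[float],
) -> Dict[str, float]:
    """Model error divided by naive error (assumed definition of rMSE/rMAE)."""
    if not (len(actuals) == len(model_forecasts) == len(naive_forecasts)):
        raise ValueError("all inputs must have the same length")
    if not actuals:
        raise ValueError("inputs must not be empty")
    naive_mse = mse(actuals, naive_forecasts)
    naive_mae = mae(actuals, naive_forecasts)
    if naive_mse == 0 or naive_mae == 0:
        raise ValueError("naive forecast is perfect; ratios are undefined")
    return {
        "rMSE": mse(actuals, model_forecasts) / naive_mse,
        "rMAE": mae(actuals, model_forecasts) / naive_mae,
    }


# Made-up numbers for illustration.
actuals = [2.0, 4.0, 6.0]
naive_fc = [1.0, 2.0, 4.0]   # previous observation carried forward
model_fc = [2.0, 3.0, 5.0]   # output of some hypothetical model
scores = relative_errors(actuals, model_fc, naive_fc)
```

In this toy case both ratios are below 1, so the hypothetical model would count as beating the naive forecast. On the volatile series discussed above, the reported pattern is the opposite for most models.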