Time series forecasting model for non-stationary series pattern extraction using deep learning and GARCH modeling

Journal: Journal of Cloud Computing Advances Systems and Applications, Volume: 13, Issue: 1
DOI: https://doi.org/10.1186/s13677-023-00576-7
Publication Date: 2024-01-02
Author(s): Huimin Han et al.
Primary Topic: Stock Market Forecasting Methods

Overview

This paper introduces a hybrid model for time series forecasting that integrates Generalized Autoregressive Conditional Heteroskedasticity (GARCH), Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN), and Graph Convolutional Networks (GCN). The model addresses the complexities inherent in time series data, such as trends and non-stationarity, by first utilizing GARCH to learn volatility and then applying CEEMDAN for effective data decomposition. This preprocessing significantly simplifies the data, allowing the GCN to efficiently learn the underlying features and correlations.
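As a rough orientation for the decomposition and graph-learning stages named above, the two relations below are the standard textbook forms. They are assumptions about the general techniques, not formulas quoted from the paper: the summary does not say which GCN variant is used, so the layer rule shown is the widely used symmetric-normalized form, and the symbols $K$, $r_K$, $\tilde{A}$, $\tilde{D}$, $H^{(l)}$, $W^{(l)}$ are introduced here for illustration.

```latex
% CEEMDAN (like EMD) yields an additive decomposition of the input signal x(t)
% into K intrinsic mode functions plus a final residue r_K(t).
x(t) = \sum_{k=1}^{K} \mathrm{IMF}_k(t) + r_K(t)

% One graph-convolution layer (assumed symmetric-normalized form):
% \tilde{A} = A + I adds self-loops to the adjacency matrix A,
% \tilde{D}_{ii} = \sum_j \tilde{A}_{ij} is the matching degree matrix,
% H^{(l)} are the node features at layer l and W^{(l)} the trainable weights.
H^{(l+1)} = \sigma\!\left( \tilde{D}^{-1/2} \, \tilde{A} \, \tilde{D}^{-1/2} \, H^{(l)} \, W^{(l)} \right)
```

The first identity is why the decomposed components can stand in for the original series: summing all IMFs and the residue recovers the signal.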

The findings indicate that the proposed GARCH-CEEMDAN-GCN model outperforms traditional single prediction models in predictive accuracy and stability. The preprocessing steps make the data easier to analyze and enable the model to handle complex raw data more effectively. Experimental results confirm that this hybrid approach achieves both higher prediction accuracy and strong data-fitting capability, making it applicable to practical scenarios such as stock price and temperature forecasting. This research contributes to the advancement of time series forecasting methodologies and highlights the potential of hybrid models in big data analysis.

Introduction

The introduction highlights the complexity and significance of time series forecasting across domains including finance, healthcare, meteorology, and industrial production. The authors emphasize the challenges posed by the inherent nonlinearity and non-stationarity of time series data, which complicate accurate prediction despite advances in forecasting techniques such as linear regression, autoregressive models, and deep learning. Traditional methods like the Fourier transform and empirical mode decomposition have been instrumental in breaking time series data down into manageable components, yet they often fall short in extracting sufficient features for precise forecasting.

To address these challenges, the authors propose a novel forecasting model that integrates signal processing with deep learning. This model features a custom-designed network structure aimed at extracting relevant features from sequence data at various intervals. Key contributions include the application of Generalized Autoregressive Conditional Heteroskedasticity (GARCH) to analyze volatility, the use of Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) for effective signal decomposition, and the implementation of Graph Convolutional Networks (GCN) for enhanced feature learning. The proposed model is rigorously evaluated against multiple time series datasets, demonstrating improved predictive accuracy and stability, thereby showcasing its potential to transform time series forecasting methodologies.

Methods

This section outlines the methodology of the proposed hybrid model for time series prediction, which integrates the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model, Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN), and the Graph Convolutional Network (GCN). The GARCH model captures the dynamic volatility inherent in time series data, which often exhibit nonlinearity, non-stationarity, and noise. Processing the data through GARCH first makes the series more amenable to the analysis that follows.
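To make the volatility step concrete, here is a minimal sketch of the conditional-variance recursion for a GARCH(1,1) model. The order (1,1), the function name `garch11_filter`, the initialization at the unconditional variance, and the demo parameters are all assumptions for illustration; the summary does not state which GARCH order or fitting procedure the authors used, and this sketch filters with given parameters rather than estimating them.

```python
from typing import List, Tuple


def garch11_filter(
    returns: List[float], omega: float, alpha: float, beta: float
) -> Tuple[List[float], List[float]]:
    """Run the GARCH(1,1) variance recursion over a demeaned series.

    sigma2[t] = omega + alpha * returns[t-1]**2 + beta * sigma2[t-1]

    Returns the conditional variances and the standardized residuals
    returns[t] / sqrt(sigma2[t]).
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1")
    if not returns:
        return [], []

    # Assumption: initialize at the unconditional variance omega / (1 - alpha - beta).
    sigma2 = [omega / (1.0 - alpha - beta)]
    for t in range(1, len(returns)):
        sigma2.append(omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1])

    # Scale each observation by its conditional standard deviation.
    standardized = [r / (s ** 0.5) for r, s in zip(returns, sigma2)]
    return sigma2, standardized


# Hypothetical demeaned series and hand-picked (not fitted) parameters.
demo_returns = [0.5, -1.2, 0.3, 2.0, -0.4]
variances, z = garch11_filter(demo_returns, omega=0.1, alpha=0.1, beta=0.8)
```

In practice the parameters would be estimated by maximum likelihood with a dedicated library; the recursion above only shows how past shocks and past variance feed into the current variance estimate.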

Subsequently, the residuals from the GARCH model are subjected to CEEMDAN, which decomposes the complex signal into intrinsic mode functions (IMFs), thereby revealing hidden frequency components and simplifying the data structure. The IMFs serve as input features for the GCN, which performs feature extraction and generates predictions. The methodology includes several preprocessing steps: filling missing values, standardizing the data, and organizing it chronologically. The final prediction results are derived from the GCN model based on the decomposed modal components. The overall process is visually summarized in a structural diagram (Fig. 3).
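The three preprocessing steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: forward fill is assumed for the missing values, z-score scaling is assumed for "standardizing" (the paper may use another scheme such as min-max), and the `Record` type and `preprocess` name are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Optional, Tuple

# Hypothetical record layout: (ISO-8601 timestamp, observed value or None).
Record = Tuple[str, Optional[float]]


def preprocess(records: List[Record]) -> List[Tuple[str, float]]:
    # 1. Organize chronologically. ISO-8601 strings in one format sort correctly as text.
    ordered = sorted(records, key=lambda rec: rec[0])

    # 2. Fill missing values: forward fill, back-filling any leading gap
    #    with the first observed value (assumed strategy).
    values = [v for _, v in ordered]
    first_known = next((v for v in values if v is not None), None)
    if first_known is None:
        raise ValueError("series has no observed values")
    filled = []
    last = first_known
    for v in values:
        if v is not None:
            last = v
        filled.append(last)

    # 3. Standardize to zero mean and unit (population) standard deviation.
    mu = mean(filled)
    sd = pstdev(filled)
    if sd == 0:
        raise ValueError("constant series cannot be standardized")
    return [(ts, (v - mu) / sd) for (ts, _), v in zip(ordered, filled)]


# Hypothetical out-of-order input with one gap.
raw = [
    ("2024-01-03", 4.0),
    ("2024-01-01", 2.0),
    ("2024-01-02", None),
    ("2024-01-04", 8.0),
]
cleaned = preprocess(raw)
```

Sorting before filling matters: a forward fill on unsorted data would copy values from the wrong neighbor.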

Results

Experiments on three datasets (Air Quality, Energy, and Traffic) compare five models: GCN, EMD-GCN, EEMD-GCN, EMD-CEEMDAN-GCN, and the newly proposed GARCH-CEEMDAN-GCN. While GARCH-CEEMDAN-GCN performs suboptimally on the Air Quality dataset, it significantly outperforms the other models on the Energy and Traffic datasets across multiple evaluation metrics, including Mean Absolute Error (MAE), Mean Squared Error (MSE), Mean Absolute Percentage Error (MAPE), and R².
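For reference, the four evaluation metrics named above have the following standard definitions. The sketch uses plain Python and made-up numbers; the function names and the demo values are illustrative assumptions and are not taken from the paper's experiments.

```python
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Squared Error."""
    _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent. Undefined if any true value is 0."""
    _check(y_true, y_pred)
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean_true = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant target")
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


# Made-up values, for illustration only.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.0]
scores = {
    "MAE": mae(y_true, y_pred),
    "MSE": mse(y_true, y_pred),
    "MAPE": mape(y_true, y_pred),
    "R2": r2(y_true, y_pred),
}
```

Lower is better for MAE, MSE, and MAPE, while higher is better for R², which is worth keeping in mind when reading "better than" comparisons across the four.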

Specifically, on the Air Quality dataset, GARCH-CEEMDAN-GCN shows an MAE that is 7.5% better than GCN but worse than EMD-GCN and EEMD-GCN. In contrast, on the Energy dataset, GARCH-CEEMDAN-GCN achieves an MAE that is 61.1% better than GCN and outperforms all other models. The MSE and MAPE metrics also indicate substantial improvements, with GARCH-CEEMDAN-GCN being 86.9% better than GCN in MSE. Overall, the results affirm that integrating the GARCH model enhances the forecasting capability of GARCH-CEEMDAN-GCN, particularly on the Energy and Traffic datasets, thereby validating its effectiveness in hybrid forecasting applications.
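The summary does not say how the "X% better" figures are computed. One common reading, assumed here, is the relative reduction in error with respect to the baseline. The helper name and the numbers below are hypothetical and are not the paper's measurements.

```python
def relative_improvement(baseline_error: float, model_error: float) -> float:
    """Percentage reduction in an error metric relative to a baseline.

    Positive values mean the model's error is lower than the baseline's.
    Assumes a lower-is-better metric such as MAE, MSE, or MAPE.
    """
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline_error - model_error) / baseline_error * 100.0


# Hypothetical errors: a baseline of 2.0 reduced to 0.5 is a 75% improvement.
improvement = relative_improvement(baseline_error=2.0, model_error=0.5)
```

Under this reading, an improvement close to 100% means the error has almost vanished relative to the baseline, and a negative value means the model is worse than the baseline.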

Discussion

The discussion traces the evolution of time series forecasting methodologies, from traditional statistical models to machine learning and hybrid approaches. The authors note that while early models like ARMA and ARIMA effectively addressed certain forecasting challenges, they struggled with non-stationary data. The advent of machine learning techniques, particularly Support Vector Machines (SVMs), K-Nearest Neighbors (KNN), and Artificial Neural Networks (ANNs), marked a significant improvement in handling complex datasets. The authors emphasize the emergence of deep learning methods, such as Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs), which further enhanced predictive capabilities, especially when integrated with Convolutional Neural Networks (CNNs) and Graph Convolutional Networks (GCNs).

The authors advocate for the use of hybrid models that combine signal decomposition techniques, such as Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN), with GARCH modeling to effectively manage volatility in time series data. They present empirical evidence supporting the superiority of these hybrid approaches over traditional models, particularly in financial forecasting and other domains characterized by non-stationary behaviors. However, they also acknowledge the complexities and computational demands associated with these models, including risks of overfitting and challenges in model interpretability. The authors call for future research to focus on enhancing the adaptability and interpretability of these models while addressing the ethical implications of their application in sensitive areas like economic forecasting.