A novel transformer-based dual attention architecture for the prediction of financial time series

Journal: Journal of King Saud University – Computer and Information Sciences, Volume: 37, Issue: 5
DOI: https://doi.org/10.1007/s44443-025-00045-y
Publication Date: 2025-06-16
Author(s): Zhenyun Du et al.
Primary Topic: Stock Market Forecasting Methods

Overview

The paper presents a dual-attention architecture for financial forecasting, built on an enhanced encoder-decoder framework to handle complex market dynamics. The proposed model comprises two parallel networks: the Price Attention Network (PAN) and the Nonprice Attention Network (NAN). PAN extracts features from historical price data using a Masked Self-Attention module and Multi-head Attention, while NAN processes related financial features with ConvLSTM, BiGRU, and Self-Attention. Combining the two lets the model draw on both price and nonprice data, which the authors report improves prediction accuracy.
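To make the masking idea concrete, the following is a minimal sketch of single-head masked (causal) self-attention in plain Python. It is not the paper's PAN implementation: the identity projections (queries, keys, and values all equal to the input), the single head, and the toy price vectors are assumptions made for illustration only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def masked_self_attention(x):
    """Single-head causal self-attention with identity projections.

    x is a list of T feature vectors, one per time step. Step t may attend
    only to steps 0..t, so no future value leaks into an earlier position.
    Assumption: queries, keys and values are x itself (no learned weights).
    """
    d = len(x[0])
    weights, outputs = [], []
    for t, query in enumerate(x):
        # Scaled dot-product scores against the visible (non-future) steps.
        scores = [sum(q * k for q, k in zip(query, x[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        # Masked (future) steps receive an attention weight of exactly 0.
        attn = softmax(scores) + [0.0] * (len(x) - t - 1)
        weights.append(attn)
        outputs.append([sum(a * x[s][j] for s, a in enumerate(attn))
                        for j in range(d)])
    return outputs, weights

# Toy, made-up price features for three consecutive days.
prices = [[0.10, 0.20], [0.15, 0.25], [0.30, 0.10]]
outputs, weights = masked_self_attention(prices)
```

Each row of `weights` sums to 1 and is zero to the right of the diagonal. A multi-head version would run several such attentions on learned projections of the input and concatenate the results.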

Experimental results show the proposed model outperforming five state-of-the-art forecasting methods, with a Mean Absolute Error (MAE) of 0.01991 and a Mean Squared Error (MSE) of 0.00084 across diverse financial datasets. The study highlights the model’s ability to capture complex, non-linear dependencies while remaining interpretable through its attention mechanisms. As future work, the authors suggest improving directional accuracy and adapting the model for real-time prediction, potentially with additional features such as market sentiment. Overall, the framework is presented as a robust tool for financial analysts and stakeholders that promotes transparency and trust in high-stakes decision-making.
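For readers less familiar with the error metrics quoted above, here is a small sketch of how MAE and MSE (and RMSE, the square root of MSE) are computed. The target and prediction values are invented for illustration and are not taken from the paper.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean Squared Error: average squared error, penalising large misses."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error: MSE expressed in the units of the target."""
    return math.sqrt(mse(y_true, y_pred))

# Hypothetical normalised prices and forecasts (not from the paper).
y_true = [0.50, 0.52, 0.49, 0.55]
y_pred = [0.51, 0.50, 0.50, 0.53]

print(mae(y_true, y_pred))   # about 0.015
print(mse(y_true, y_pred))   # about 0.00025
print(rmse(y_true, y_pred))  # about 0.0158
```

Because MSE squares each error, its magnitude depends on the scale of the target, so values like those reported are only meaningful relative to how the prices were normalised.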

Introduction

The introduction addresses the challenges of financial prediction, stressing the need for accurate forecasting methods because market fluctuations increase investment risk. It reviews existing approaches, focusing on deep learning techniques such as recurrent neural networks (RNNs), long short-term memory networks (LSTMs), and convolutional neural networks (CNNs), and highlights their limitations, including overfitting, long training times, and difficulty capturing market volatility. The paper proposes a hybrid dual attention-based model that integrates several data sources (daily price data, technical indicators, and news) to improve predictive accuracy.

The proposed model has a dual architecture comprising a Price Attention Network (PAN) and a Nonprice Attention Network (NAN), which process price features and related financial features in parallel. It employs Multi-head Masked Self-Attention (MMSA) and Multi-head Cross-Attention (MCA) mechanisms to improve computational efficiency and focus on the most relevant relationships in the data. The study also introduces a new news impact indicator to better account for the influence of news events on market predictions. The model is evaluated with several metrics and shows significant improvements over existing methods, including a 79.7% reduction in Mean Absolute Error (MAE) and improvements in Mean Squared Error (MSE) and Root Mean Squared Error (RMSE). The work thus addresses shortcomings of traditional models and lays a foundation for further exploration of transformer-based architectures in financial forecasting.
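The cross-attention idea can be sketched as follows: queries come from one feature stream while keys and values come from another. This is an illustrative single-head version without learned projections. The choice of price features as queries and nonprice features as keys and values is an assumption, since the summary does not say how the two networks are wired together, and all numbers are made up.

```python
import math

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention (single head, no learned weights).

    Each query vector produces one output vector: a weighted average of the
    value vectors, weighted by the query's similarity to each key.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        attn = [e / total for e in exps]
        out.append([sum(w * v[j] for w, v in zip(attn, values))
                    for j in range(len(values[0]))])
    return out

# Hypothetical inputs: 2 price steps as queries, 3 nonprice steps as keys/values.
price_feats = [[1.0, 0.0], [0.0, 1.0]]
nonprice_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
nonprice_values = [[0.2, 0.4, 0.6], [0.1, 0.3, 0.5], [0.0, 0.2, 0.4]]

fused = cross_attention(price_feats, nonprice_keys, nonprice_values)
```

The output has one row per query and one column per value dimension, so `fused` here is a 2 x 3 list. Every entry is a convex combination of the corresponding value column.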

Methods

In this section, the authors detail the experimental methods used to evaluate the proposed model against state-of-the-art financial prediction techniques. Following the CRISP-DM process model (Wirth et al. 2000), they describe the experimental setup, which uses the scikit-learn Python library together with supporting libraries for model development and optimization. The dataset consists of historical financial data segmented into distinct time intervals, from which discriminative features are extracted for prediction. The experiments were run on Google Cloud Platform, specifically Google Colab, using a Tesla V100-SXM2 GPU with 25 GB of RAM and 100 GB of storage.
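One common way to segment a price history into model inputs is a sliding window, sketched below. The summary does not specify how the authors segmented their data, so the function name `make_windows`, the `lookback` and `horizon` parameters, and the closing prices are all hypothetical.

```python
def make_windows(series, lookback, horizon=1):
    """Split a series into (input window, target) pairs.

    Each sample uses `lookback` consecutive observations as input and the
    value `horizon` steps after the window as the prediction target.
    """
    samples = []
    for start in range(len(series) - lookback - horizon + 1):
        window = series[start:start + lookback]
        target = series[start + lookback + horizon - 1]
        samples.append((window, target))
    return samples

# Made-up daily closing prices.
closes = [10.0, 10.5, 10.2, 10.8, 11.0, 10.9]
samples = make_windows(closes, lookback=3)

for window, target in samples:
    print(window, "->", target)
```

With six observations and a lookback of three, this yields three samples. For time series, any train/test split would normally keep the samples in chronological order so that the test period follows the training period.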

The authors tuned hyperparameters systematically, adjusting one parameter at a time while holding the others constant to identify the best configuration. For example, they varied the D-model parameter from 1 to 50 in increments of 5 and recorded the results to determine the best-performing settings, which are summarized in Table 2. This tuning process contributed to the model’s effectiveness, as reflected in the performance metrics from the final evaluation. Overall, the experimental design provides a sound basis for assessing the model and offers insight into its operational behavior.
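The one-parameter-at-a-time procedure can be sketched as below. This is an illustration under stated assumptions, not the authors' tuning code: `toy_validation_error` is a made-up stand-in for training and validating a model, the `heads` parameter and its grid are hypothetical, and each tuned value is carried forward while the next parameter is swept. Note that `range(1, 51, 5)` yields 1, 6, 11, ..., 46, so the exact grid the authors used for D-model is also an assumption here.

```python
def one_at_a_time_search(evaluate, defaults, grids):
    """Tune each hyperparameter in turn while the others stay fixed.

    evaluate: function mapping a parameter dict to a validation error.
    defaults: starting value for every parameter.
    grids:    candidate values to try for each parameter, in tuning order.
    """
    best = dict(defaults)
    for name, candidates in grids.items():
        scores = {}
        for value in candidates:
            trial = dict(best, **{name: value})  # change only this parameter
            scores[value] = evaluate(trial)
        best[name] = min(scores, key=scores.get)  # keep the lowest error
    return best

def toy_validation_error(params):
    # Hypothetical error surface with its minimum at d_model=16, heads=4.
    return (params["d_model"] - 16) ** 2 + (params["heads"] - 4) ** 2

grids = {"d_model": list(range(1, 51, 5)), "heads": [1, 2, 4, 8]}
best = one_at_a_time_search(toy_validation_error, {"d_model": 1, "heads": 1}, grids)
print(best)
```

This approach needs far fewer runs than a full grid search (here 10 + 4 instead of 10 x 4), at the cost of missing interactions between parameters.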

Discussion

The discussion highlights the difficulty of predicting financial market trends, emphasizing the limitations of traditional models and the growing reliance on machine learning (ML) and artificial intelligence (AI) techniques. Early studies, such as Fama (1995), suggest that market movements often resemble a random walk, which makes accurate prediction difficult without new information. Recent literature indicates that ML, particularly deep learning models such as Long Short-Term Memory (LSTM) networks and transformers, shows promise for forecasting financial prices because these models can handle large datasets and capture complex, non-linear relationships. However, overfitting, computational cost, and the need for adaptability remain significant challenges.

The paper also discusses the effectiveness of various ML models, including LSTMs and multi-head attention mechanisms, in predicting stock returns and developing trading strategies. It notes that while deep learning models can improve predictive accuracy, model choice should depend on context, as no single approach applies universally. Advanced techniques such as dual transformers and attention mechanisms are highlighted as a way to improve accuracy by letting models focus on relevant data while handling the complexity of financial time series. The authors call for further research on the limitations of transformer-based models, particularly their computational demands and data preprocessing challenges, and for studies in less developed markets where historical data may be limited.