Supervised autoencoder MLP for financial time series forecasting

Journal: Journal Of Big Data, Volume: 12, Issue: 1
DOI: https://doi.org/10.1186/s40537-025-01267-7
Publication Date: 2025-08-22
Author(s): Bartosz Bieganowski et al.
Primary Topic: Stock Market Forecasting Methods

Overview

This research paper investigates the development of an Algorithmic Investment Strategy (AIS) utilizing Supervised Autoencoder-Multi-Layer Perceptron (SAE-MLP) networks, focusing on high-frequency price data rather than traditional daily closing prices. The study addresses three key research questions: the impact of data augmentation and denoising on strategy performance, the effectiveness of triple barrier labeling compared to simple direction classification, and the role of hyperparameter tuning in enhancing investment strategy outcomes. The findings indicate that data augmentation with Gaussian noise and denoising via autoencoders significantly improve performance metrics, particularly the Information Ratio, while triple barrier labeling generally outperforms simple classification methods. Hyperparameter tuning is also crucial, with optimal performance linked to specific combinations of noise levels and autoencoder bottleneck sizes.
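The overview attributes part of the improvement to augmenting the training data with Gaussian noise. Below is a minimal plain-Python sketch of that idea. It assumes the features are already standardized and treats the noise level (`noise_std`) and the number of noisy copies (`copies`) as tunable hyperparameters. The function name, and the choice to append noisy copies rather than replace rows, are illustrative assumptions, not the paper's implementation.

```python
import random
from typing import List, Sequence, Tuple


def augment_with_gaussian_noise(
    rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    noise_std: float,
    copies: int = 1,
    seed: int = 0,
) -> Tuple[List[List[float]], List[int]]:
    """Return the original rows followed by `copies` noisy duplicates.

    Each duplicate adds independent N(0, noise_std**2) noise to every
    feature; labels are copied unchanged. (Hypothetical helper.)
    """
    rng = random.Random(seed)  # seeded so augmented sets are reproducible
    out_rows = [list(r) for r in rows]
    out_labels = list(labels)
    for _ in range(copies):
        for row, label in zip(rows, labels):
            out_rows.append([v + rng.gauss(0.0, noise_std) for v in row])
            out_labels.append(label)
    return out_rows, out_labels
```

For example, two feature rows with `noise_std=0.05` and `copies=2` yield six rows and the labels `[1, -1, 1, -1, 1, -1]`. Augmentation of this kind is normally applied only to the training split, so that validation and test data stay clean. The noise level is the kind of hyperparameter the overview describes as tuned jointly with the autoencoder bottleneck size.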

The research highlights the novelty of combining noise augmentation, supervised autoencoders, and triple barrier labeling, demonstrating that this ensemble approach yields superior risk-adjusted returns compared to traditional models. However, the study acknowledges limitations, such as reliance on historical data and assumptions regarding market execution conditions. The authors advocate for the adoption of algorithmic models in asset management to enhance market efficiency and investment returns, while also emphasizing the need for ethical considerations and regulatory frameworks to ensure fair market practices. Future research directions include exploring different noise types, integrating slippage into models, and further developing optimization metrics for triple barrier labeling, which could provide deeper insights into the performance of algorithmic trading strategies across various financial contexts.
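A supervised autoencoder trains a single network to reconstruct its input and predict the target at the same time. This pushes the bottleneck toward features that are both compact and predictive. The plain-Python sketch below shows that joint objective for one sample. The single tanh bottleneck, the bias-free layers, the loss weights `recon_weight`/`clf_weight`, and the classifier reading directly from the bottleneck are all illustrative assumptions, not the paper's SAE-MLP architecture. A denoising variant would feed a noise-augmented input to the encoder while still reconstructing the clean input; the sketch omits that step.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> List[float]:
    """Bias-free dense layer: returns W @ x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def sae_loss(
    x: Sequence[float],
    y_class: int,
    w_enc: Matrix,
    w_dec: Matrix,
    w_clf: Matrix,
    recon_weight: float = 1.0,
    clf_weight: float = 1.0,
) -> float:
    """Joint supervised-autoencoder loss for one sample (hypothetical sketch).

    encoder:    h = tanh(W_enc x)       bottleneck, fewer units than x
    decoder:    x_hat = W_dec h         reconstruction of the input
    classifier: p = softmax(W_clf h)    class probabilities from the bottleneck
    loss = recon_weight * MSE(x, x_hat) + clf_weight * cross_entropy(y, p)
    `y_class` is a class index, e.g. 0, 1, 2 for short / flat / long.
    """
    h = [math.tanh(v) for v in matvec(w_enc, x)]
    x_hat = matvec(w_dec, h)
    p = softmax(matvec(w_clf, h))
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    cross_entropy = -math.log(p[y_class])
    return recon_weight * mse + clf_weight * cross_entropy
```

Training would minimize this loss over all samples with respect to the three weight matrices, typically using a deep-learning framework's automatic differentiation. The sketch stops at the objective itself. With all-zero weights, for instance, the reconstruction term is the mean squared input and the classification term is the log of the number of classes.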

Methods

In this section, the authors describe their labeling methodology for algorithmic trading with machine learning, framing the trading decision as a classification problem. The predicted classes correspond to market positions: a prediction of 1 signifies a long position (buying the contract), while -1 signifies a short position (selling the contract). The paper employs the triple barrier labeling method, defined by a specified window size and a maximum trade length of $n$ minutes.
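The sketch below makes the class-to-position mapping concrete. It builds baseline "simple direction" labels, the alternative the overview contrasts with triple barrier labeling, and turns a sequence of positions into per-period strategy returns. The assumptions are: the label compares the price `horizon` steps ahead with the current price, ties count as -1, a position of 0 means out of the market, and costs and slippage are ignored. The paper's exact baseline definition is not given here, and both function names are hypothetical.

```python
from typing import List, Sequence


def direction_labels(prices: Sequence[float], horizon: int) -> List[int]:
    """Baseline label: 1 if the price `horizon` steps ahead is higher, else -1.

    Ties are labeled -1 here (an arbitrary choice). The last `horizon`
    prices have no future price and therefore get no label.
    """
    return [
        1 if prices[t + horizon] > prices[t] else -1
        for t in range(len(prices) - horizon)
    ]


def strategy_returns(prices: Sequence[float], positions: Sequence[int]) -> List[float]:
    """Simple return of holding positions[t] from step t to step t + 1.

    positions: 1 = long, -1 = short, 0 = out of the market.
    Transaction costs and slippage are ignored.
    """
    return [
        positions[t] * (prices[t + 1] / prices[t] - 1.0)
        for t in range(min(len(positions), len(prices) - 1))
    ]
```

With prices `[100.0, 102.0, 99.96]` and `horizon=1`, the labels are `[1, -1]`. Using those labels as perfect-foresight positions gives returns of roughly 0.02 in both periods, since the short position profits from the 2% fall.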

The triple barrier labeling approach assigns each observation to one of three scenarios according to which barrier the price reaches first: (1) if the upper barrier is reached, the preferred position is long; (2) if neither horizontal barrier is breached within the maximum trade length, the strategy stays out of the market rather than trading on noise; and (3) if the lower barrier is reached, the preferred position is short. The two horizontal barriers serve as take-profit and stop-loss levels, set at $S_t \cdot (1 + \epsilon)$ and $S_t \cdot (1 - \epsilon)$ respectively, where $S_t$ is the price at time $t$ and $\epsilon$ is the relative barrier width. The window size and maximum trade length can either be optimized or fixed as constants.
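The labeling rule can be sketched in plain Python. This sketch assumes one price per step (for example, per minute), a constant barrier width `epsilon`, and a vertical barrier of `max_len` steps standing in for the maximum trade length $n$. It also assumes that label 0 encodes "stay out of the market". This summary does not say how the paper uses the window size (for example, to set `epsilon` from recent volatility), so the sketch keeps `epsilon` fixed. The function name is hypothetical.

```python
from typing import List, Sequence


def triple_barrier_labels(
    prices: Sequence[float], epsilon: float, max_len: int
) -> List[int]:
    """Label each step t by the first barrier the later price path touches.

    upper barrier     S_t * (1 + epsilon)            -> 1  (long)
    lower barrier     S_t * (1 - epsilon)            -> -1 (short)
    vertical barrier  max_len steps after t, with
                      neither price level touched    -> 0  (stay out)

    Only steps with a full max_len look-ahead window are labeled, so the
    result has len(prices) - max_len entries.
    """
    labels: List[int] = []
    for t in range(len(prices) - max_len):
        upper = prices[t] * (1.0 + epsilon)
        lower = prices[t] * (1.0 - epsilon)
        label = 0
        for k in range(t + 1, t + max_len + 1):
            if prices[k] >= upper:  # upper barrier touched first
                label = 1
                break
            if prices[k] <= lower:  # lower barrier touched first
                label = -1
                break
        labels.append(label)
    return labels
```

For prices `[100.0, 100.5, 101.5, 100.0, 98.5, 98.8, 99.2]` with `epsilon=0.01` and `max_len=2`, the labels are `[1, 0, -1, -1, 0]`. The zeros mark windows in which the price stayed within 1% of its starting level. Dropping the last `max_len` steps avoids assigning a spurious 0 to observations whose look-ahead window is cut off by the end of the data.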

Results

The “Results” section reports how the SAE-MLP strategy performed across the experimental settings. Consistent with the overview, augmenting the training data with Gaussian noise and denoising it with autoencoders improved performance, most notably the Information Ratio. Triple barrier labeling generally outperformed simple direction classification, and the best results depended on specific combinations of noise level and autoencoder bottleneck size.
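The overview singles out the Information Ratio as the metric that improved most. Definitions vary across the literature, and this summary does not give the one the paper uses. A common definition is the annualized mean active return (strategy minus benchmark) divided by the annualized tracking error. The sketch below implements that definition. The function name and the `periods_per_year` parameter (for example, 252 for daily returns) are assumptions.

```python
import math
import statistics
from typing import Sequence


def information_ratio(
    strategy: Sequence[float],
    benchmark: Sequence[float],
    periods_per_year: int = 252,
) -> float:
    """Annualized mean active return over annualized tracking error.

    Active return = strategy return - benchmark return for each period.
    Uses the sample standard deviation, so at least two periods are needed.
    """
    active = [s - b for s, b in zip(strategy, benchmark)]
    tracking_error = statistics.stdev(active)
    if tracking_error == 0.0:
        raise ValueError("tracking error is zero; the ratio is undefined")
    # mean * N / (std * sqrt(N)) simplifies to (mean / std) * sqrt(N)
    return statistics.mean(active) / tracking_error * math.sqrt(periods_per_year)
```

Passing a benchmark of zeros reduces this to a return-to-volatility ratio, which some trading studies also report under the name Information Ratio. Check which variant a given paper uses before comparing numbers.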

Discussion

The discussion section of the paper highlights the ongoing debate over financial market efficiency, particularly in relation to the Efficient Market Hypothesis (EMH). The EMH, usually stated in weak, semi-strong, and strong forms, holds that stock prices reflect all available information, so that future price changes cannot be systematically predicted. The traditional view, associated with Fama and Malkiel, holds that active management cannot consistently outperform passive strategies. Behavioral finance theorists, by contrast, argue that market inefficiencies exist and make stock prices partly predictable. The paper also emphasizes the integration of machine learning techniques into financial forecasting, noting that models such as ARIMA, Support Vector Machines (SVM), and Artificial Neural Networks (ANN) have shown promise in predicting asset prices. Recent studies indicate that deep learning models, particularly Long Short-Term Memory (LSTM) networks, outperform traditional statistical methods, especially in volatile markets.

Furthermore, the paper discusses various studies that apply machine learning to financial time series forecasting, revealing that models leveraging historical price data can yield superior results compared to conventional approaches. For instance, LSTM networks have demonstrated significant improvements in predictive accuracy, while other machine learning algorithms have been shown to outperform passive investment strategies in terms of risk-adjusted returns. The findings suggest that machine learning models, particularly when trained on price data rather than returns, can enhance trading strategies and provide deeper insights into market dynamics. The research aims to build on these findings by comparing different neural network models to develop effective trading strategies using a comprehensive set of features derived from various economic indicators.