Machine learning, stock market forecasting, and market efficiency: a comparative study

Journal: International Journal of Data Science and Analytics, Volume: 20, Issue: 7
DOI: https://doi.org/10.1007/s41060-025-00854-4
Publication Date: 2025-07-25
Author(s): Oscar H. Bustos et al.
Primary Topic: Stock Market Forecasting Methods

Overview

This paper investigates how accurately various machine learning algorithms predict stock market indices. It addresses a gap in the existing literature, which often covers only a few indices and short timeframes and neglects the impact of market efficiency. The study uses a dataset of 55 markets over 65 two-month periods from January 2011 to July 2022. It evaluates support vector machines (SVM), artificial neural networks (ANN), gradient-boosted trees (GB), random forests (RF), logistic regression, and long short-term memory (LSTM) models, all using technical indicators as inputs. SVM achieved the highest average accuracy at 51.64%, closely followed by GB and ANN at 51.61% and 51.60%, respectively. Although the average accuracies are nearly identical, the high standard deviation points to the difficulty of modeling across diverse markets and periods.
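The headline figures are averages over many market and period combinations. As a minimal sketch of how such a summary could be computed, assuming accuracy means the share of correctly predicted up/down moves (the function names and sample values below are hypothetical and are not taken from the paper):

```python
from statistics import mean, stdev
from typing import Dict, Sequence, Tuple


def directional_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of observations where the predicted direction matches the actual one."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for actual, predicted in zip(y_true, y_pred) if actual == predicted)
    return hits / len(y_true)


def summarize(acc_by_cell: Dict[Tuple[str, int], float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of accuracy over (market, period) cells."""
    values = list(acc_by_cell.values())
    return mean(values), stdev(values)


# Hypothetical accuracies for two markets over two periods (illustrative only).
example = {("A", 1): 0.5, ("A", 2): 0.6, ("B", 1): 0.4, ("B", 2): 0.5}
avg, spread = summarize(example)
```

Reporting the spread next to the mean matters here: when averages differ only in the second decimal place, the dispersion across cells says more about reliability than the ranking does.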

Notably, the study finds no significant relationship between market efficiency and algorithm accuracy, based on dynamic panel data analysis and Granger non-causality tests. This suggests that the accuracy of algorithms relying on technical indicators is unaffected by market efficiency, which contradicts traditional efficient market theory. The authors propose enriching model inputs with fundamental data and information from news and social media to potentially improve predictive accuracy. They also suggest including efficiency metrics as input data, positing that lower market efficiency might enhance algorithm performance. Future research directions include refining the weighting of accuracy measures and exploring time-varying autoregressive distributed lag (ARDL) models to better understand how market efficiency influences algorithm accuracy under different market volatility regimes.
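For orientation, a generic dynamic panel specification of the kind such an analysis might use is sketched below. The symbols are assumptions made for illustration and are not the paper's notation: $A_{i,t}$ is an algorithm's accuracy in market $i$ and period $t$, $E_{i,t}$ is an efficiency estimate for the same market and period, $\mu_i$ is a market-specific effect, and $\varepsilon_{i,t}$ is an error term.

$$A_{i,t} = \alpha A_{i,t-1} + \beta E_{i,t} + \mu_i + \varepsilon_{i,t}$$

Under this form, finding no significant relationship corresponds to failing to reject $\beta = 0$.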

Introduction

The introduction highlights the importance of accurately predicting stock market movements for stakeholders such as investors and pension funds. Despite the difficulty of forecasting, machine learning algorithms have become the predominant tools for stock market prediction, overshadowing traditional statistical methods. The authors identify a gap in the existing literature, which mostly covers a limited number of markets and short time spans, with most studies covering about five years or less. Furthermore, the influence of market efficiency on prediction accuracy has been largely overlooked, even though theory suggests that more efficient markets should be harder to predict.

To address these shortcomings, the paper evaluates the performance of various machine learning algorithms (such as support vector machines, artificial neural networks, and long short-term memory networks) across a diverse panel of 55 stock markets over 65 two-month periods from January 2011 to July 2022. The study systematically compares algorithmic accuracy while also examining the impact of market efficiency, quantified using the Hurst exponent, the Hall-Wood estimator, and the Robust Genton estimator. This approach is intended to yield more generalizable findings on the efficacy of machine learning in stock market prediction, particularly in relation to market efficiency.

Methods

The authors employ a three-layer methodology for analyzing stock market indices, as illustrated in Figure 1 of the paper. The first layer gathers data from a diverse range of stock indices and trading days. The second layer constructs two distinct datasets: a labeled dataset for training the machine learning algorithms and a dataset containing efficiency estimates for each stock index. The final layer applies widely recognized forecasting algorithms to the sample of indices.
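The labeled dataset in the second layer can be pictured with a small sketch. It assumes the label is the next-day price direction and uses two simple indicators (a moving-average ratio and momentum) chosen only for illustration; the paper's actual indicator set and labeling rule are not given in this summary, and `build_labeled_dataset` is a hypothetical name.

```python
from typing import List, Tuple


def build_labeled_dataset(
    closes: List[float], window: int = 5
) -> Tuple[List[List[float]], List[int]]:
    """Turn a closing-price series into (features, labels) for direction prediction.

    Features (illustrative, not the paper's indicator set):
      - ratio of today's close to its `window`-day simple moving average
      - `window`-day momentum (relative price change)
    Label: 1 if the next day's close is higher than today's, else 0.
    """
    features: List[List[float]] = []
    labels: List[int] = []
    # Start once `window` days of history exist; stop one day before the end
    # so that every sample has a next-day close to label it with.
    for t in range(window, len(closes) - 1):
        sma = sum(closes[t - window + 1 : t + 1]) / window
        momentum = closes[t] / closes[t - window] - 1.0
        features.append([closes[t] / sma, momentum])
        labels.append(1 if closes[t + 1] > closes[t] else 0)
    return features, labels


# Toy price series (not real market data).
X, y = build_labeled_dataset([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], window=3)
```

Each feature row uses only prices up to day `t`, while the label looks one day ahead, which keeps future information out of the inputs.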

To improve the evaluation of algorithm performance, the authors introduce a novel set of metrics that go beyond the average-based assessments commonly used in the existing literature. This approach not only identifies the most effective algorithms but also explores the potential relationship between stock market efficiency and the accuracy of these algorithms through dynamic panel data analysis. Throughout the study, models were systematically trained and tested across different periods, with efficiency estimators calculated for each period. The terms “stock market index,” “stock market,” and “market” are used interchangeably, as each index corresponds to a specific market.
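Splitting the sample into periods can be sketched as follows. How the paper delimits its two-month periods is not stated in this summary, so calendar blocks counted from January 2011 are an assumption here, and both function names are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List


def two_month_period(day: date, start: date = date(2011, 1, 1)) -> int:
    """0-based index of the two-month calendar block containing `day`."""
    months = (day.year - start.year) * 12 + (day.month - start.month)
    if months < 0:
        raise ValueError("day precedes the start of the sample")
    return months // 2


def group_by_period(days: Iterable[date]) -> Dict[int, List[date]]:
    """Group trading days by period so models and estimators can be run per block."""
    groups: Dict[int, List[date]] = defaultdict(list)
    for day in sorted(days):
        groups[two_month_period(day)].append(day)
    return dict(groups)
```

With the days grouped this way, a model can be trained and tested, and an efficiency estimate computed, separately for each block.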

Results

The Results section begins by comparing the accuracy of the algorithms (subsection 5.1) to identify which one performs best under the specified conditions. Subsection 5.2 then examines the relationship between algorithm accuracy and market efficiency, that is, whether the effectiveness of these algorithms depends on the efficiency of the market in which they are applied. Together, these analyses clarify the practical implications of algorithm performance in real-world financial contexts.

Discussion

In this section, the authors discuss methods for estimating market efficiency, focusing on the Hurst exponent and fractal dimension estimators. The Hurst exponent, originally proposed for hydrological studies, is calculated using rescaled range analysis (R/S analysis), which segments the time series and applies linear regression to derive the exponent $H$. The authors use the R function `hurstexp` from the `pracma` package for this estimation. In addition, two fractal dimension estimators are employed: the Hall-Wood estimator, which uses box-counting techniques, and the Robust Genton (RG) estimator, a robust variogram approach. These estimators quantify market efficiency, which is hypothesized to correlate with the predictive accuracy of machine learning algorithms in stock market forecasting.
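The R/S procedure described above can be sketched in a few lines. This is a plain textbook version under stated assumptions (non-overlapping chunks, chunk sizes doubling from `min_chunk`, ordinary least squares on the log-log points); it is not a reimplementation of `hurstexp`, and the function names are hypothetical. It would typically be applied to a return series rather than to price levels, which is also an assumption here.

```python
import math
from typing import List, Sequence


def rescaled_range(chunk: Sequence[float]) -> float:
    """R/S of one chunk: range of cumulative mean-adjusted sums over the std. dev."""
    n = len(chunk)
    m = sum(chunk) / n
    cum, lo, hi = 0.0, 0.0, 0.0
    for x in chunk:
        cum += x - m
        lo, hi = min(lo, cum), max(hi, cum)
    sd = math.sqrt(sum((x - m) ** 2 for x in chunk) / n)
    return (hi - lo) / sd if sd > 0 else float("nan")


def hurst_rs(series: Sequence[float], min_chunk: int = 8) -> float:
    """Estimate H as the OLS slope of log(mean R/S) against log(chunk size)."""
    n = len(series)
    xs: List[float] = []
    ys: List[float] = []
    size = min_chunk
    while size <= n // 2:
        # Average R/S over all non-overlapping chunks of this size.
        values = []
        for start in range(0, n - size + 1, size):
            rs = rescaled_range(series[start : start + size])
            if not math.isnan(rs):
                values.append(rs)
        if values:
            xs.append(math.log(size))
            ys.append(math.log(sum(values) / len(values)))
        size *= 2
    if len(xs) < 2:
        raise ValueError("series too short for a regression")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope_num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope_den = sum((x - mx) ** 2 for x in xs)
    return slope_num / slope_den
```

Values of $H$ near 0.5 are usually read as consistent with a random walk in prices, while values further from 0.5 suggest persistence or mean reversion. The plain R/S estimate is known to be biased in short samples, so it is best treated as a rough indicator.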

The authors also highlight the variability in the accuracy rates of machine learning algorithms applied to stock market prediction, referencing studies from 2000 to 2023. They note that some algorithms achieve high accuracy rates while others perform poorly, which indicates a lack of consensus on the most effective methods across different markets. The research addresses this gap by investigating the relationship between algorithmic prediction accuracy and market efficiency, as measured by the Hurst exponent and the fractal dimension estimators. The study is based on a dataset of daily closing prices from 55 stock indices spanning more than 3,000 trading days, and it employs a robust cross-validation technique to ensure the reliability of the machine learning models.