Forecasting relative returns for S&P 500 stocks using machine learning

Journal: Financial Innovation, Volume: 10, Issue: 1
DOI: https://doi.org/10.1186/s40854-024-00644-0
Publication Date: 2024-04-20
Author(s): Htet Htet Htun et al.
Primary Topic: Stock Market Forecasting Methods

Overview

The research paper addresses the complexities of stock market forecasting, highlighting the challenges posed by the nonlinear and non-stationary nature of stock data, as well as external influences such as economic conditions and investor sentiment. It critiques the random walk hypothesis and the efficient market hypothesis, which suggest that stock price movements are inherently unpredictable. However, the authors argue that stock prices may exhibit some predictability due to systematic investor behaviors.

In their study, the authors use multiple historical relative returns as input features for Random Forest (RF), Support Vector Machine (SVM), and Long Short-Term Memory (LSTM) classifiers to identify stocks likely to exceed a 2% return relative to the S&P 500 index over a ten-day period. RF and SVM yield comparable classification accuracy, but RF achieves better recall. The LSTM classifier outperforms both RF and SVM across all metrics, and its advantage over a random choice classifier is statistically significant. The findings suggest that machine learning classifiers built on historical relative returns can improve stock selection strategies, which challenges the random walk and efficient market hypotheses. The authors plan to extend these findings by incorporating various technical indicators and exploring different training methodologies.
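The classification target described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the relative return is the stock's simple return minus the index's simple return over the same horizon, a definition the summary does not spell out. The function names and the toy prices are hypothetical.

```python
from typing import List


def period_return(prices: List[float], start: int, horizon: int) -> float:
    """Simple return from day `start` to day `start + horizon`."""
    return prices[start + horizon] / prices[start] - 1.0


def outperformance_labels(stock: List[float], index: List[float],
                          horizon: int = 10, threshold: float = 0.02) -> List[int]:
    """Label a day 1 if the stock's forward return beats the index's
    forward return by more than `threshold`, else 0.

    Assumption: relative return = stock return - index return.
    """
    if len(stock) != len(index):
        raise ValueError("price series must be aligned and equally long")
    labels = []
    for t in range(len(stock) - horizon):
        relative = period_return(stock, t, horizon) - period_return(index, t, horizon)
        labels.append(1 if relative > threshold else 0)
    return labels


# Toy data (hypothetical): the index is flat; the stock is flat for ten days,
# then closes at 105, 101, and 100.
stock_prices = [100.0] * 10 + [105.0, 101.0, 100.0]
index_prices = [100.0] * 13
labels = outperformance_labels(stock_prices, index_prices)
```

Only the first day is labeled positive here, because its ten-day relative return of about 5% is the only one above the 2% threshold.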

Methods

The Methods section outlines the data and the analytical techniques used in the study. The dataset covers 494 stocks in the S&P 500 index from January 2017 to January 2022. Multiple historical relative returns serve as input features for three classifiers (RF, SVM, and LSTM), each of which predicts whether a stock’s return will exceed that of the S&P 500 index by more than 2% over a ten-day period.

The study also details the experimental design. Classification performance is measured with accuracy, precision, and recall, and the classifiers are compared with one another and with a random choice (RC) baseline using statistical significance tests. This systematic approach to gathering and analyzing the data underpins the study’s contributions to the field.

Results

In this section, the authors evaluate the classification performance of three classifiers, Random Forest (RF), Support Vector Machine (SVM), and Long Short-Term Memory (LSTM), using accuracy (A), precision (P), and recall (R) across 483 trading days and 23 classifiers. RF and SVM perform similarly in accuracy and precision, while RF shows a statistically significant advantage in recall (p = 0.0005). LSTM, in turn, outperforms RF on all three metrics, with statistically significant differences (p = 0.004 for accuracy, p < 0.0001 for precision, and p < 0.0001 for recall). The LSTM classifier is also compared with a random choice (RC) classifier and outperforms it on all metrics across the years 2019 to 2021, with strong statistical support (e.g., t = 8.63, p < 1e-16 for accuracy). The yearly breakdown shows that LSTM performs better during the pandemic years (2020 and 2021) than in 2019. Overall, the findings support the conclusion that the LSTM classifier significantly outperforms the RF, SVM, and RC classifiers in stock market prediction.
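For readers who want the three metrics made concrete, the following Python sketch computes accuracy, precision, and recall from binary labels and then compares two classifiers' per-day accuracies with a paired t statistic. The summary reports t and p values but does not name the test, so the paired t-test here is an assumption, and all data and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Dict, List


def classification_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision, and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when nothing is predicted/labeled positive.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def paired_t_statistic(a: List[float], b: List[float]) -> float:
    """t statistic for the mean of the paired differences a[i] - b[i].

    Assumption: a paired t-test; the source does not state which test was used.
    """
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Toy labels (hypothetical): 1 = stock beat the index by more than the threshold.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
metrics = classification_metrics(y_true, y_pred)

# Toy per-day accuracies (hypothetical) for two classifiers on the same days.
acc_a = [0.60, 0.70, 0.65, 0.70]
acc_b = [0.50, 0.55, 0.60, 0.50]
t_stat = paired_t_statistic(acc_a, acc_b)
```

Precision penalizes false positives (stocks flagged that did not outperform), while recall penalizes missed outperformers, which is why two classifiers can match on accuracy and precision yet differ on recall.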

Discussion

In this study, the authors investigate how well three machine learning (ML) classifiers, Random Forest (RF), Support Vector Machine (SVM), and Long Short-Term Memory (LSTM), predict whether a stock's relative return will exceed a 2% threshold over a ten-day period, using historical relative returns as input features. The analysis is based on data from 494 stocks in the S&P 500 index from January 2017 to January 2022. RF and SVM exhibit comparable classification accuracy and precision, but RF achieves better recall. The LSTM classifier outperforms both RF and SVM on accuracy, precision, and recall, and its advantage over random selection is statistically significant.
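One way to picture "historical relative returns as input features" is a vector of relative returns measured over several lookback windows ending on the prediction day. The Python sketch below is illustrative only: the lookback lengths (1, 5, and 10 days) are hypothetical, the relative return is assumed to be the stock's return minus the index's return, and the summary does not specify which windows the authors used.

```python
from typing import List, Sequence


def relative_return(stock: List[float], index: List[float],
                    t: int, lookback: int) -> float:
    """Stock return minus index return over the `lookback` days ending at day t.

    Assumption: relative return is a simple difference of simple returns.
    """
    stock_ret = stock[t] / stock[t - lookback] - 1.0
    index_ret = index[t] / index[t - lookback] - 1.0
    return stock_ret - index_ret


def feature_vector(stock: List[float], index: List[float], t: int,
                   lookbacks: Sequence[int] = (1, 5, 10)) -> List[float]:
    """One historical relative return per lookback window (hypothetical windows)."""
    if t < max(lookbacks):
        raise ValueError("not enough history for the longest lookback")
    return [relative_return(stock, index, t, k) for k in lookbacks]


# Toy data (hypothetical): the stock gains one point per day, the index is flat.
stock_prices = [100.0 + day for day in range(11)]
index_prices = [100.0] * 11
features = feature_vector(stock_prices, index_prices, t=10)
```

Each row of a training set would pair such a vector with the label for the following ten days, so that a classifier only ever sees information available on the prediction day.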

The study underscores the potential of ML classifiers in stock selection, challenging the random walk and efficient market hypotheses. The authors suggest that future research could enhance predictive performance by incorporating diverse technical indicators and varying training/test periods, as well as exploring different training window sizes and return thresholds. Overall, the results advocate for the use of ML techniques in financial forecasting, particularly in identifying stocks likely to outperform the market index.