Leveraging large language model as news sentiment predictor in stock markets: a knowledge-enhanced strategy

Journal: Discover Computing, Volume: 28, Issue: 1
DOI: https://doi.org/10.1007/s10791-025-09573-7
Publication Date: 2025-05-09
Author(s): Weisi Chen et al.
Primary Topic: Stock Market Forecasting Methods

Overview

Amid rapid advances in artificial intelligence, this paper addresses the integration of natural language processing (NLP) and financial analysis, specifically through sentiment analysis of financial news articles. Because traditional sentiment analysis methods often struggle to capture context, the authors propose a novel framework called Domain Knowledge Chain-of-Thought (DK-CoT), which combines domain-specific financial knowledge with chain-of-thought (CoT) reasoning to improve the sentiment analysis performance of large language models (LLMs). The study also introduces a weighted F1 score as a more relevant evaluation metric, since it reflects the outsized impact of negative news on financial markets, and shows that DK-CoT outperforms established models such as BERT and RoBERTa in accuracy and reliability.

The findings indicate that LLMs, particularly when enhanced with DK-CoT, can effectively capture market sentiment from financial news and thereby improve sentiment prediction accuracy. The paper suggests several avenues for future research, including applying DK-CoT to other financial domains, integrating financial news in real time, and addressing ethical considerations in sentiment analysis. It also proposes using LLMs for forecasting tasks such as stock trend prediction by combining sentiment data with fundamental financial metrics. Overall, the research underscores the value of incorporating domain-specific knowledge into prompt design to optimize LLM performance, while promoting sustainable AI practices by relying on prompting rather than resource-intensive fine-tuning.

Introduction

The introduction highlights the growing interest in the convergence of natural language processing (NLP) and financial analysis and its potential to yield significant insights into stock market dynamics. This intersection is particularly relevant because it enables the extraction of meaningful information from vast amounts of textual data, such as news articles, social media posts, and financial reports, that can influence market trends and investor behavior.

The section then sets the stage for exploring how NLP techniques can be applied to analyze sentiment, predict stock price movements, and support decision-making in finance. By leveraging advanced algorithms and machine learning models, researchers aim to uncover patterns and correlations that traditional analytical methods may overlook, contributing to a more nuanced understanding of market fluctuations.

Methods

In this study, several models were evaluated: BERT, fine-tuned BERT (FinBERT), FLANG-RoBERTa, fine-tuned FLANG-RoBERTa (Fin-FLANG-RoBERTa), GLM2, and GLM3. The experiments applied the prompting techniques described in Section 3.3 of the paper to assess these models. Performance was measured with Accuracy, F1 Score, Precision, and Recall, for which the paper gives the standard formulas. For instance, Accuracy is calculated as \( \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \), while the F1 Score is defined as \( F1_{\text{Score}} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \), where \( \text{Precision} = \frac{TP}{TP + FP} \) and \( \text{Recall} = \frac{TP}{TP + FN} \).
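
As a minimal sketch of how these metrics can be computed for a multi-class sentiment task, the Python below derives one-vs-rest TP, FP, and FN counts for each label. The label names (`positive`, `negative`, `neutral`) and all function names are assumptions for illustration, not the paper's code. With more than two classes, overall accuracy is taken as the share of correct predictions, which reduces to the Accuracy formula above in the binary case.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> Tuple[int, int, int, int]:
    """One-vs-rest (TP, FP, FN, TN) counts for a single sentiment label."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn


def precision_recall_f1(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for one label; empty denominators yield 0.0."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Share of items whose predicted label matches the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    y_true = ["positive", "negative", "neutral", "negative", "positive"]
    y_pred = ["positive", "negative", "negative", "negative", "neutral"]
    print(accuracy(y_true, y_pred))  # 0.6
    for label in ("positive", "negative", "neutral"):
        p, r, f = precision_recall_f1(y_true, y_pred, label)
        print(label, round(p, 3), round(r, 3), round(f, 3))
```

Computing the metrics per label keeps them available for class-weighted aggregates, which matter here because the study treats some sentiments as more consequential than others.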

To account for variability in the content generated by the GLM models, the sentiment label of each news item was determined by a voting mechanism over twenty execution rounds of each prompt template. The paper then reports detailed experimental results for each model on these metrics.
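
The voting step can be sketched as follows, assuming each of the twenty runs returns free text that states its verdict as a sentiment word near the end (for example `Sentiment: negative`). The parsing rule (take the last sentiment word mentioned), the plurality vote, and the tie-breaking behavior are hypothetical choices; the summary does not say how the authors parsed outputs or resolved ties.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Assumed label set; matched as whole words, case-insensitively.
_LABEL_RE = re.compile(r"\b(positive|negative|neutral)\b", re.IGNORECASE)


def extract_label(response: str) -> Optional[str]:
    """Return the last sentiment word in a free-text response, or None if there is none.

    Taking the last mention assumes the model reasons first and states its verdict at the end.
    This is crude: a phrase such as "not negative" would still be read as negative.
    """
    matches = _LABEL_RE.findall(response)
    return matches[-1].lower() if matches else None


def vote(responses: Iterable[str]) -> Optional[str]:
    """Plurality vote over the labels parsed from repeated runs of one prompt template."""
    labels = [label for label in map(extract_label, responses) if label is not None]
    if not labels:
        return None
    # Counter.most_common lists equal counts in first-seen order, so ties go to the earliest label.
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    # Twenty invented runs: 12 negative, 6 neutral, 2 unparseable.
    runs = ["Sentiment: negative"] * 12 + ["Sentiment: neutral"] * 6 + ["unclear"] * 2
    print(vote(runs))  # negative
```

Discarding unparseable runs before voting is one option; counting them as a separate "abstain" vote would be an equally reasonable design.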

Results

The results for the DK-CoT prompting strategy yield several key findings about its use in financial sentiment analysis. Incorporating domain-specific knowledge into the prompt significantly reduces ambiguity, allowing the model to better interpret financial terminology and expressions of sentiment. Because the approach relies on prompting rather than extensive fine-tuning, it is also computationally efficient and conserves resources compared with traditional model adaptation. Ablation studies show that DK-CoT consistently outperforms the other prompting strategies, indicating that it is robust to variations in prompt design. Furthermore, the method performs stably across diverse financial news samples, including out-of-distribution texts, which suggests it remains effective in dynamic financial scenarios.
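
To make the idea concrete, here is a hypothetical sketch of how a DK-CoT-style prompt might be assembled from three parts: a block of financial domain knowledge, optional few-shot demonstrations, and explicit step-by-step reasoning instructions. The knowledge snippets, reasoning steps, wording, and the `build_dk_cot_prompt` name are all invented for illustration; the paper's actual templates are not reproduced here.

```python
from typing import Sequence, Tuple

# Hypothetical domain-knowledge notes, not the paper's actual knowledge content.
DOMAIN_KNOWLEDGE = (
    "- An earnings miss or a profit warning is usually negative for the stock.\n"
    "- A dividend increase or a share buyback is usually positive.\n"
    "- Routine announcements with no financial impact are usually neutral."
)

# Hypothetical chain-of-thought instructions.
COT_STEPS = (
    "1. Identify the company and the event described.\n"
    "2. Relate the event to the financial knowledge above.\n"
    "3. Judge the likely effect on investors, then give the label."
)


def build_dk_cot_prompt(news: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Assemble a DK-CoT-style prompt: task, domain knowledge, optional few-shot examples, CoT steps."""
    parts = [
        "You are a financial analyst. Classify the sentiment of the news as positive, negative, or neutral.",
        "Relevant financial knowledge:\n" + DOMAIN_KNOWLEDGE,
    ]
    for text, label in examples:  # few-shot demonstrations, if any
        parts.append(f"News: {text}\nSentiment: {label}")
    parts.append("Reason step by step:\n" + COT_STEPS)
    parts.append(f"News: {news}\nEnd your answer with 'Sentiment: <label>'.")
    return "\n\n".join(parts)


if __name__ == "__main__":
    demo = [("Y Ltd issues a profit warning.", "negative")]  # invented example
    print(build_dk_cot_prompt("X Corp raises its dividend.", demo))
```

Passing an empty `examples` sequence gives a zero-shot DK-CoT prompt; supplying labeled examples gives a few-shot variant. Placing the knowledge block before the reasoning steps is one plausible ordering, chosen so the model reads the domain rules before it starts reasoning.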

The research also examines the relationship between prompt engineering and LLM performance. Few-shot prompting combined with DK-CoT prompts yields the best results, particularly for the more advanced models, GLM2 and GLM3. The findings also indicate that larger models, with more parameters, generally perform better on sentiment analysis tasks. The authors justify the weighted F1 score on the grounds that different sentiments affect market behavior differently; weighting the classes accordingly aligns the evaluation metric with real-world financial decision-making. Finally, the study suggests that ensemble prompting and further refinement of the DK-CoT strategy could improve performance further, although these explorations are secondary to the main findings.
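
One plausible form of such a metric is a weighted average of per-class F1 scores with user-chosen class weights. In the sketch below, the negative-heavy weights (0.5 for negative, 0.25 each for positive and neutral) are hypothetical, since the summary does not state the paper's values. This differs from scikit-learn's `average="weighted"` option for `f1_score`, which weights each class by its support (number of true instances) rather than by its assumed market impact.

```python
from typing import Dict, Sequence


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str], weights: Dict[str, float]) -> float:
    """Weighted average of per-class F1 scores; the weights are normalised to sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("class weights must sum to a positive number")
    score = 0.0
    for label, weight in weights.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        # 2TP / (2TP + FP + FN) is algebraically equal to 2PR / (P + R);
        # a class absent from both lists scores 0.
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (weight / total) * f1
    return score


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    y_true = ["positive", "negative", "neutral", "negative", "positive"]
    y_pred = ["positive", "negative", "negative", "negative", "neutral"]
    equal = {"positive": 1, "negative": 1, "neutral": 1}
    negative_heavy = {"negative": 0.5, "positive": 0.25, "neutral": 0.25}  # hypothetical weights
    print(round(weighted_f1(y_true, y_pred, equal), 3))           # 0.489
    print(round(weighted_f1(y_true, y_pred, negative_heavy), 3))  # 0.567
```

On these toy labels the model classifies negative items best, so shifting weight toward the negative class raises the score from the equal-weight (macro) value; a model that missed negative news would be penalized correspondingly more.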

Discussion

In this section, the paper discusses advances in sentiment analysis, particularly in the financial domain, emphasizing the limitations of traditional machine learning approaches and the potential of large language models (LLMs) and prompt engineering techniques. The authors argue that while conventional methods often struggle to capture the nuanced context of financial news, LLMs, especially when guided by structured prompt engineering, can significantly improve sentiment analysis accuracy. The proposed Domain Knowledge Chain-of-Thought (DK-CoT) framework integrates domain-specific financial knowledge with CoT reasoning to strengthen the ability of LLMs to infer market sentiment from financial news.

The study addresses two primary research questions: (1) how LLMs can be used effectively as financial news sentiment predictors to improve stock market prediction, and (2) how the DK-CoT strategy compares with other prompting techniques such as zero-shot, few-shot, and CoT prompting. The findings suggest that DK-CoT outperforms existing models such as BERT and RoBERTa; in addition, the study introduces a weighted F1 score as a new evaluation metric that better reflects the impact of negative news on financial markets. Overall, the research promotes the practical application of advanced AI techniques in finance and advocates sustainable AI practices that improve accessibility and reduce computational demands.
