A multimodal approach to cross-lingual sentiment analysis with ensemble of transformer and LLM

Journal: Scientific Reports, Volume: 14, Issue: 1
DOI: https://doi.org/10.1038/s41598-024-60210-7
PMID: https://pubmed.ncbi.nlm.nih.gov/38671064
Publication Date: 2024-04-26
Author(s): Md Saef Ullah Miah et al.
Primary Topic: Sentiment Analysis and Opinion Mining

Overview

The paper presents an ensemble model that performs sentiment analysis on foreign-language text by first translating it into English. The study targets Arabic, Chinese, French, and Italian, particularly when labeled training data is unavailable. Using two neural machine translation systems, LibreTranslate and Google Translate, the authors translated sentences from these languages and classified their sentiment with an ensemble of pre-trained models, including Twitter-Roberta-Base-Sentiment-Latest, bert-base-multilingual-uncased-sentiment, and OpenAI’s GPT-3. The ensemble reached an accuracy above 86%, outperforming both the individual pre-trained models and the standalone LLM.

The study concludes that combining machine translation with sentiment analysis is an effective way to assess sentiment in foreign languages. The ensemble model, particularly with Google Translate, achieved the highest precision, recall, and F1 score. The findings suggest that translating into a base language such as English is a viable strategy for sentiment analysis across diverse languages. The authors note potential applications in business, social media analysis, and government intelligence. They also acknowledge limitations related to dataset size and to the specific sentiment analysis models used, which they leave for future research.

Methods

The study uses a multi-step methodology that analyzes the sentiment of foreign-language texts by translating them into English. It consists of five phases: data collection, data cleaning and pre-processing, translation to English, sentiment analysis, and result evaluation. Data was first gathered in the target language from diverse sources, including social media, news articles, and online forums. A cleaning and pre-processing stage then removed noise, duplicate content, and irrelevant information.

The cleaned data was then translated into English with a machine translation system, and the translated texts were classified by a sentiment analysis model designed for English. In the final phase, the sentiment results were evaluated to assess the accuracy and effectiveness of the whole approach. Figure 1 of the paper gives an overview of this methodology, and later sections of the paper describe each phase and its tools and techniques in detail.
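
The middle three phases (cleaning, translation, sentiment analysis) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: `clean`, `run_pipeline`, `toy_translate`, `toy_classify`, and `TOY_DICTIONARY` are hypothetical names, and the two toy functions stand in for a real translation service and a real English sentiment model. Data collection and evaluation are left out.

```python
import re
from typing import Callable, Iterable, List, Tuple


def clean(texts: Iterable[str]) -> List[str]:
    """Remove URLs and extra whitespace, then drop empty and duplicate texts."""
    seen = set()
    cleaned = []
    for text in texts:
        text = re.sub(r"https?://\S+", "", text)  # links are one kind of noise
        text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


def run_pipeline(
    texts: Iterable[str],
    translate: Callable[[str], str],
    classify: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Return (source_text, english_text, label) for every cleaned input."""
    results = []
    for source in clean(texts):
        english = translate(source)
        results.append((source, english, classify(english)))
    return results


# Hypothetical stand-ins: a real system would call a translation service
# and an English sentiment model here.
TOY_DICTIONARY = {
    "c'est génial": "this is great",
    "c'est terrible": "this is terrible",
}


def toy_translate(text: str) -> str:
    return TOY_DICTIONARY.get(text.lower(), text)


def toy_classify(text: str) -> str:
    if "great" in text:
        return "positive"
    if "terrible" in text:
        return "negative"
    return "neutral"


rows = run_pipeline(
    ["C'est génial  http://example.com", "C'est génial", "C'est terrible"],
    toy_translate,
    toy_classify,
)
for source, english, label in rows:
    print(f"{source!r} -> {english!r} -> {label}")
```

Passing the translator and classifier as plain callables keeps the pipeline independent of any particular service, so either component can be swapped without touching the cleaning step.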

Results

The authors report experiments on sentiment analysis of text translated from Arabic, Chinese, French, and Italian into English. They paired two translation services, LibreTranslate and Google Translate, with three sentiment analysis models (Twitter-RoBERTa-Base, BERTweet-Base, and GPT-3) and with the proposed ensemble model. In total they ran 32 experiments, a count that matches 2 translation services × 4 sentiment models × 4 languages, and computed Accuracy, Precision, Recall, F1 Score, and Specificity for each combination.
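
For reference, the five metrics can be computed from the counts of a confusion matrix. The sketch below assumes a binary positive/negative task, which the summary does not confirm; `binary_metrics` and the sample labels are hypothetical and are not taken from the paper's data.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "positive"
) -> Dict[str, float]:
    """Compute accuracy, precision, recall, F1, and specificity for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(numerator: float, denominator: float) -> float:
        # Return 0.0 instead of raising when a class never occurs.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,  # also called sensitivity
        "f1": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),
    }


# Made-up labels for illustration only.
y_true = ["positive", "positive", "positive", "negative",
          "negative", "negative", "negative", "positive"]
y_pred = ["positive", "positive", "negative", "negative",
          "negative", "positive", "negative", "positive"]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Specificity is the only one of the five that depends on true negatives relative to false positives, which is why it is reported alongside recall rather than instead of it.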

Google Translate combined with the proposed ensemble model achieved the highest accuracy, 0.8671, while the LibreTranslate and BERTweet-Base combination recorded the lowest, 0.5638. GPT-3 performed strongly with both translation services, which points to its robustness on sentiment analysis tasks. The proposed ensemble consistently outperformed the other models across the reported metrics. Language-specific results show that the ensemble and GPT-3 achieved high accuracy in all four languages, with the best results in Chinese. Overall, the study underscores that both the translation service and the sentiment analysis model materially affect prediction accuracy, which motivates further research on multilingual sentiment analysis.
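
This summary does not state how the ensemble combines the predictions of its three member models. One common choice is majority voting, sketched below purely as an assumption; `majority_vote`, the `fallback_index` tie-break rule, and the sample predictions are hypothetical.

```python
from collections import Counter
from typing import Sequence


def majority_vote(labels: Sequence[str], fallback_index: int = 0) -> str:
    """Return the most frequent label.

    On a tie for first place, fall back to the label predicted by the model
    at `fallback_index` (an assumed tie-break rule, not one from the paper).
    """
    if not labels:
        raise ValueError("at least one label is required")
    ranked = Counter(labels).most_common()
    top_label, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return labels[fallback_index]
    return top_label


# Each inner list holds one sentence's labels from three hypothetical models.
per_sentence_predictions = [
    ["positive", "negative", "positive"],
    ["negative", "negative", "neutral"],
    ["positive", "negative", "neutral"],  # three-way tie
]

# Trust the third model (index 2) whenever the vote is tied.
ensemble_labels = [
    majority_vote(labels, fallback_index=2) for labels in per_sentence_predictions
]
print(ensemble_labels)
```

With three voters and three possible labels a full tie is possible, so any voting scheme of this kind needs an explicit tie-break; deferring to one designated model is only one option.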

Discussion

The authors survey sentiment analysis for foreign languages in which annotated data is scarce. They observe that most prior work relies on multilingual pre-trained models for transfer learning, which has not significantly improved accuracy. They propose instead to translate foreign-language texts into English before running sentiment analysis, an approach they describe as underexplored in the existing literature. They cite studies such as Salameh et al., which show that sentiment analysis on English translations of Arabic texts can yield results competitive with analysis of the original Arabic, supporting the feasibility of the proposed method.

The authors also discuss related work on how well machine translation preserves sentiment. For instance, they mention a study that curated a sentiment gold-standard corpus and evaluated several machine translation engines, finding that certain engines preserved sentiment effectively. They also note advances in code-switched sentiment analysis, where fine-tuning multilingual models improves performance. The paper emphasizes the potential of its ensemble model, which combines transformer models with a large language model, to improve sentiment analysis accuracy across multiple languages. Overall, the findings underscore the importance of translation in sentiment analysis and offer a framework for future research on cross-lingual sentiment analysis.

Limitations

The limitations reported across related studies point to several challenges in applying large language models (LLMs) to tasks such as sentiment analysis, medical text analysis, phishing detection, and multilingual content classification. For instance, Wahidur et al. note that although their fine-tuning techniques significantly improve sentiment analysis performance, the findings generalize only to a limited extent, particularly for smaller models. They also find that complex instructions reduce model accuracy, which suggests that instruction design needs further optimization.

Similarly, Xing’s study on financial sentiment analysis reports that specialized LLM agents scale inefficiently in computational terms, and raises concerns that the confidentiality of evaluation datasets could lead to biased results. Xu et al. point out that their medical text analysis framework lacks a “human-in-the-loop” component, which may affect the clinical utility of AI-generated labels, and they call for applications beyond chest X-ray reports. Uddin et al. and Rehan et al. likewise report limitations such as the lack of human expert involvement and a narrow focus on specific languages or domains, which restricts the generalizability of their findings. Collectively, these studies indicate that future research needs to address these limitations to make LLMs more robust and more widely applicable.