Bad Actor, Good Advisor: Exploring the Role of Large Language Models in Fake News Detection

Journal: Proceedings of the AAAI Conference on Artificial Intelligence, Volume: 38, Issue: 20
DOI: https://doi.org/10.1609/aaai.v38i20.30214
Publication Date: 2024-03-24
Author(s): Beizhe Hu et al.
Primary Topic: Misinformation and Its Impacts

Overview

The authors examine how effective large language models (LLMs) are at detecting fake news, a task on which small language models (SLMs) have known limitations. Their empirical study shows that although a sophisticated LLM such as GPT-3.5 can identify fake news and offer multi-perspective rationales, it still underperforms a fine-tuned SLM such as BERT. They attribute this gap to the LLM’s difficulty in selecting and integrating its rationales properly when reaching a judgment.

To address these limitations, the authors propose an adaptive rationale guidance network (ARG), in which an SLM selectively draws on insights from the LLM’s rationales when analyzing news. They also derive a rationale-free variant, ARG-D, by distillation; it is intended for cost-sensitive scenarios in which the LLM is not queried at prediction time. Experiments on two real-world datasets show that both ARG and ARG-D outperform the baseline methods, including SLM-based, LLM-based, and combined approaches, which supports using LLMs as advisors rather than as the final detector.

Introduction

The introduction describes the rapid and widespread proliferation of fake news across online platforms, which has become a serious threat in several domains, particularly politics and the economy. Citing Fisher et al. (2016), the authors emphasize the harm that misinformation does to public perception and decision-making. The section stresses the urgency of the problem, since fake news can undermine democratic institutions and disrupt economic stability. The research aims to explore the mechanisms behind the spread of misinformation and propose strategies for mitigation.

Methods

The authors describe the setup of their study of fake news detection with large language models (LLMs) and small language models (SLMs). They use two datasets, Weibo21 (Chinese) and GossipCop (English), and apply deduplication and temporal data splitting to mitigate data leakage. The primary LLM evaluated is GPT-3.5-turbo, which is prompted with four strategies: Zero-Shot, Zero-Shot Chain-of-Thought (CoT), Few-Shot, and Few-Shot CoT. Each strategy is meant to elicit better reasoning and detection performance from the model; preliminary tests determined the optimal few-shot setting to be four examples.
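The four prompting strategies differ only in whether demonstrations are included and whether the model is asked to reason before answering. The following is a minimal sketch of how such prompts could be assembled. The task wording, the `Demo` type, and the `build_prompt` helper are illustrative assumptions, not the prompts used in the paper; only the cue "Let's think step by step." is the standard zero-shot CoT trigger. No API call is made here.

```python
from typing import NamedTuple, Sequence

# Hypothetical task instruction; the paper's exact wording is not given here.
TASK = "Decide whether the following news item is real or fake. Answer with 'real' or 'fake'."
# Standard zero-shot chain-of-thought cue.
COT_CUE = "Let's think step by step."


class Demo(NamedTuple):
    """One labeled demonstration for few-shot prompting (hypothetical type)."""
    news: str
    label: str           # "real" or "fake"
    rationale: str = ""  # only shown in few-shot CoT prompts


def build_prompt(news: str, demos: Sequence[Demo] = (), cot: bool = False) -> str:
    """Assemble a prompt for one of the four strategies.

    demos empty, cot False -> Zero-Shot
    demos empty, cot True  -> Zero-Shot CoT
    demos given, cot False -> Few-Shot
    demos given, cot True  -> Few-Shot CoT
    """
    parts = [TASK]
    for demo in demos:
        parts.append(f"News: {demo.news}")
        if cot and demo.rationale:
            parts.append(f"Reasoning: {demo.rationale}")
        parts.append(f"Answer: {demo.label}")
    parts.append(f"News: {news}")
    if cot and demos:
        parts.append("Reasoning:")   # continue the demonstrated pattern
    elif cot:
        parts.append(COT_CUE)        # no demonstrations, so use the generic cue
    else:
        parts.append("Answer:")
    return "\n".join(parts)


if __name__ == "__main__":
    four_shots = [Demo(f"example item {i}", "fake", f"example reasoning {i}") for i in range(4)]
    print(build_prompt("Some headline to check.", four_shots, cot=True))
```

With four demonstrations, as in the setting the authors found optimal, the prompt contains four labeled items followed by the item to be judged.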

The authors compare the LLM-based methods against several SLM baselines, including BERT and models that add adversarial training or emotional feature fusion. The comparisons fall into three groups: LLM-Only, SLM-Only, and a combination of both (LLM+SLM). The results, reported as macro F1 scores, indicate that the few-shot prompting methods yield competitive performance, with the best results highlighted in the analysis. Implementation details are also provided to keep model settings and optimization consistent across experiments, including the use of a four-head transformer block in the rationale-aware feature simulator.
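Because the comparison rests on macro F1, it may help to recall how that metric is computed: F1 is calculated separately for each class and the per-class scores are averaged without weighting, so the minority class counts as much as the majority class. The sketch below uses only the standard library; the labels and predictions are made-up toy values, not results from the paper.

```python
from typing import Hashable, Sequence


def f1_for_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], cls: Hashable) -> float:
    """F1 for one class, treating `cls` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never appears.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over all classes seen in labels or predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


if __name__ == "__main__":
    # Toy data for illustration only.
    y_true = ["fake", "fake", "real", "real", "real", "real"]
    y_pred = ["fake", "real", "real", "real", "real", "fake"]
    print(macro_f1(y_true, y_pred))  # fake: 0.5, real: 0.75 -> 0.625
```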

Discussion

The authors discuss the challenges and findings of using large language models (LLMs) for fake news detection. Although LLMs such as GPT-3.5 show impressive analytical capabilities, they underperform task-specific small language models (SLMs) such as BERT in making veracity judgments. The authors point out that LLMs can generate informative rationales from multiple perspectives, which can improve understanding of news content. The LLM itself, however, does not turn these rationales into accurate judgments, primarily because it cannot select and use them properly.

To address these limitations, the authors propose the Adaptive Rationale Guidance (ARG) network, which combines the strengths of LLMs and SLMs. ARG selectively incorporates insights from LLM-generated rationales to improve the SLM’s performance in fake news detection. Experimental results show that ARG outperforms existing methods, including those that use only LLMs or only SLMs. A rationale-free version of the model, ARG-D, is also introduced for cost-sensitive scenarios; it remains competitive while reducing reliance on LLM queries. Overall, the findings suggest that although LLMs alone may not suffice for effective fake news detection, their rationales can significantly strengthen SLMs when integrated appropriately.
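To make "selectively incorporates" concrete, the toy sketch below scales each rationale's feature vector by a usefulness gate in (0, 1) before adding it to the news feature vector, so a rationale judged unhelpful contributes almost nothing. This is an illustration of the general gating idea under simplifying assumptions, not the ARG architecture: the `fuse` function, the plain-list vectors, and the usefulness logits (which a real model would have to learn) are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fuse(
    news_vec: Sequence[float],
    rationale_vecs: Sequence[Sequence[float]],
    usefulness_logits: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Add each rationale vector to the news vector, scaled by its gate.

    usefulness_logits are assumed to come from some learned scorer (hypothetical);
    a large negative logit effectively switches that rationale off.
    """
    if len(rationale_vecs) != len(usefulness_logits):
        raise ValueError("need one usefulness logit per rationale")
    gates = [sigmoid(z) for z in usefulness_logits]
    fused = list(news_vec)
    for gate, vec in zip(gates, rationale_vecs):
        if len(vec) != len(fused):
            raise ValueError("all vectors must share one dimension")
        for i, value in enumerate(vec):
            fused[i] += gate * value
    return fused, gates


if __name__ == "__main__":
    news = [1.0, 0.0]
    rationales = [[1.0, 1.0], [0.0, 2.0]]  # e.g. two rationales from different perspectives
    fused, gates = fuse(news, rationales, usefulness_logits=[0.0, -50.0])
    print(gates)  # first gate is 0.5, second is close to 0
    print(fused)  # close to [1.5, 0.5]
```

The fused vector would then feed a classifier; a gate near zero leaves the SLM's own representation nearly untouched, which is the sense in which the LLM acts as an advisor rather than the decision-maker.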