Human detection of political speech deepfakes across transcripts, audio, and video

Journal: Nature Communications, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41467-024-51998-z
PMID: https://pubmed.ncbi.nlm.nih.gov/39223110
Publication Date: 2024-09-02
Author(s): Matthew Groh et al.
Primary Topic: Misinformation and Its Impacts

Methods

The “Methods” section outlines the study's experimental design and analytical techniques. The researchers took a quantitative approach, using statistical analyses to evaluate data collected across several experiments. The core method was the controlled experiment, in which variables were systematically manipulated to measure their effects on the outcomes of interest.

Data collection relied on standardized instruments to support reliability and validity. The analysis used software tools to apply appropriate statistical tests, such as t-tests or ANOVA, to determine whether differences between groups were statistically significant. The section emphasizes replicability and transparency so that the findings can be independently verified.
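As a rough illustration of the kind of group comparison mentioned above, the following sketch computes Welch's t statistic for two independent samples using only the Python standard library. The summary names t-tests only as an example, so this is not the authors' analysis; the function name `welch_t` and the two score lists are invented purely for illustration and are not data from the study.

```python
import math
import statistics


def welch_t(a, b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    # Sample variances (n - 1 in the denominator), scaled by sample size.
    se_a = statistics.variance(a) / len(a)
    se_b = statistics.variance(b) / len(b)
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical per-participant scores for two conditions (NOT study data).
group_a = [0.90, 0.80, 0.85, 0.95, 0.80]
group_b = [0.60, 0.55, 0.70, 0.50, 0.65]

t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A positive statistic means the first group's mean is higher; turning it into a p-value would require a t-distribution, which a statistics package would normally supply.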

Results

Experiments 1a, 2, 3, 4, and 5 were pre-registered and recruited participants from Prolific. Across these experiments, participants showed varying levels of accuracy in identifying real and fabricated videos. In Experiment 1a, participants correctly identified real Presidential Deepfakes Dataset (PDD) videos and deepfakes at rates of 85% and 87%, respectively. In Experiment 2, accuracy was 86% for real PDD videos, 83% for enhanced PDD voice actor deepfakes, 72% for enhanced PDD text-to-speech deepfakes, 85% for real other videos, and 83% for other deepfakes.

These findings suggest that although participants performed significantly better than random guessing (which would yield 50% accuracy), they were notably less accurate on enhanced PDD text-to-speech deepfakes, which appear harder to distinguish than the other stimuli. Overall, the results show that human accuracy in detecting deepfakes varies across video types.
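The comparison against the 50% chance baseline can be made concrete with a small calculation. The sketch below takes the Experiment 2 accuracy rates reported in this summary and expresses each as percentage points above chance; the variable and function names are illustrative assumptions, and the calculation makes no claim about statistical significance, since the summary does not give per-condition sample sizes.

```python
CHANCE = 0.50  # accuracy expected from random guessing on a real/fake judgment

# Experiment 2 accuracy rates as reported in the summary above.
exp2_accuracy = {
    "real PDD videos": 0.86,
    "enhanced PDD voice actor deepfakes": 0.83,
    "enhanced PDD text-to-speech deepfakes": 0.72,
    "real other videos": 0.85,
    "other deepfakes": 0.83,
}


def points_above_chance(accuracy, chance=CHANCE):
    """Margin over the chance baseline, in whole percentage points."""
    return round((accuracy - chance) * 100)


margins = {name: points_above_chance(acc) for name, acc in exp2_accuracy.items()}
hardest = min(exp2_accuracy, key=exp2_accuracy.get)

for name, margin in margins.items():
    print(f"{name}: +{margin} points over chance")
print(f"Lowest accuracy: {hardest}")
```

On these figures, text-to-speech deepfakes sit 22 points above chance, 11 points below the voice actor deepfakes.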

Discussion

The discussion section of the paper presents findings from a series of experiments on how different media modalities affect participants’ ability to discern real from fabricated political speeches. Experiment 1a, with 501 participants, found that accuracy in judging the authenticity of speeches improved significantly as audio and visual elements were added, with combined video and audio yielding the highest accuracy at 86%. Experiment 1b, with 41,313 participants, corroborated this trend, albeit with slightly lower accuracy rates across modalities. The subsequent experiments (2-5) explored further factors, including the type of deepfake (text-to-speech vs. voice actor), the influence of base rates of misinformation, and how participants engaged with content when given no explicit prompts about authenticity. Notably, participants were less accurate with text-to-speech deepfakes than with voice actor deepfakes, and the presence of audio and video consistently improved discernment of fabricated content.

Overall, the research underscores the importance of multimodal communication in enhancing the public’s ability to identify misinformation, particularly in the context of political discourse. The findings challenge the simplistic “seeing is believing” narrative by showing that while visual and auditory cues can bolster the identification of both authentic and fabricated content, the effectiveness varies based on the type of media and the specific manipulations employed. The authors suggest that future research should explore additional contexts and types of deepfake manipulations to further understand the complexities of media effects on public perception.