An explainable RoBERTa approach to analyzing panic and anxiety sentiment in oral health education YouTube comments

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-06560-2
PMID: https://pubmed.ncbi.nlm.nih.gov/40594864
Publication Date: 2025-07-01
Author(s): Zhenyun Du et al.
Primary Topic: Media Influence and Health

Overview

The paper examines the role of online videos in health education, focusing on how misinformation in comment sections can induce stress and anxiety among viewers. The study uses the RoBERTa language model to classify comments on oral health education videos, with the aim of supporting viewers' mental health and improving the educational experience. The RoBERTa model, with 12 transformer blocks and a vocabulary of 50,265 tokens, was fine-tuned for this classification task. The workflow comprised data ingestion, token normalization, and output generation, and achieved an overall accuracy of 75.00% in identifying panic- and anxiety-inducing comments, with a precision of 74.76% and a recall of 80.0% for positive cases.
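Overall accuracy and the precision and recall of a "positive" class are all ratios over the gold and predicted labels, often reported as percentages. As a minimal, stdlib-only sketch (not the authors' evaluation code), they can be computed as follows; the label strings and toy data are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive):
    """Overall accuracy plus precision and recall for one chosen 'positive' class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)  # correct positive predictions
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed positives
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical toy labels, not data from the study
y_true = ["panic", "panic", "panic", "informative", "confused", "unproductive"]
y_pred = ["panic", "panic", "informative", "panic", "confused", "panic"]
print(classification_metrics(y_true, y_pred, positive="panic"))
```

In a four-class setting, "positive" is one class treated one-vs-rest; every other label counts as negative for precision and recall, while accuracy is computed over all classes.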

In conclusion, the study highlights the application of RoBERTa-based deep learning to classifying comments about dental health, offering useful insights for healthcare providers while identifying areas that need further refinement. Integrating automated comment moderation into dental practice could substantially improve patient care and communication. However, scalability, class imbalance, and the lack of multilingual support currently stand in the way of real-world deployment. Future research should refine the model on larger, more diverse datasets and validate its performance in live settings to improve health communication and patient outcomes.

Methods

The study applies a structured methodology to analyze sentiment in YouTube comments on oral health education. Comments are first retrieved and preprocessed, then classified with natural language processing (NLP) techniques, specifically the RoBERTa model, into four categories: panic-inducing, informative, confused, and unproductive.
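The summary does not include the retrieval or preprocessing code. The sketch below assumes the public YouTube Data API v3 `commentThreads` endpoint as "the YouTube API" and shows only request-URL construction, response parsing, and a light text cleanup. The cleanup rules (`normalize`) are illustrative assumptions, not the authors' preprocessing, and no network call is made.

```python
import html
import re
from urllib.parse import urlencode

# Assumed endpoint: YouTube Data API v3, commentThreads.list
API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"
URL_RE = re.compile(r"https?://\S+")


def build_request_url(video_id, api_key, page_token=None):
    """Build one page request for the top-level comments of a video."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": 100,          # largest page size the endpoint accepts
        "textFormat": "plainText",  # plain text instead of HTML markup
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token  # continue from a previous page
    return API_URL + "?" + urlencode(params)


def extract_comments(response):
    """Return (comment texts, nextPageToken or None) from a parsed JSON response."""
    texts = [
        item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
        for item in response.get("items", [])
    ]
    return texts, response.get("nextPageToken")


def normalize(text):
    """Illustrative cleanup: unescape HTML entities, drop URLs, collapse whitespace.

    Case is preserved because RoBERTa's byte-level BPE tokenizer is case-sensitive.
    """
    text = html.unescape(text)
    text = URL_RE.sub(" ", text)
    return " ".join(text.split())


# Usage with a hand-written, hypothetical response (no network call is made)
sample = {
    "items": [{"snippet": {"topLevelComment": {"snippet": {
        "textDisplay": "Is fluoride   safe?? see https://example.com &amp; more"}}}}],
    "nextPageToken": "abc",
}
texts, next_token = extract_comments(sample)
print([normalize(t) for t in texts], next_token)
```

Paging continues by passing each returned `nextPageToken` back into `build_request_url` until it is `None`.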

The approach aims to improve the quality of online health discourse and to provide actionable insights for public health communication. The findings point to the potential of NLP techniques for understanding and addressing the sentiments expressed in digital health discussions.

Results

The results show which linguistic markers drive the RoBERTa model's classifications. Key features include technical terminology, action verbs, and clinical language, with category-specific markers: ambiguous question words for confused comments, dental vocabulary for informative comments, and emotional cues for panic comments. The mean confidence score is 0.561 (range 0.315 to 0.776), indicating that the model's certainty varies across comment types.
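The summary does not define the confidence score. A common definition, assumed here, is the softmax probability of the predicted class. The sketch below computes that value from raw logits and summarizes a batch by mean, minimum, and maximum. Under this definition, a four-class model can never report less than 0.25 (a uniform prediction), which puts the reported minimum of 0.315 close to maximal uncertainty. The logits in the usage lines are hypothetical.

```python
import math
from statistics import mean


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(logits):
    """Return (index of the predicted class, its probability)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


def summarize(confidences):
    """Mean, minimum, and maximum confidence over a set of predictions."""
    return {"mean": mean(confidences), "min": min(confidences), "max": max(confidences)}


# Hypothetical logits for four classes (panic, informative, confused, unproductive)
batch = [[2.0, 0.0, 0.0, 0.0], [0.1, 0.0, 0.0, 0.0], [0.0, 1.0, 0.5, 0.0]]
confs = [predict_with_confidence(logits)[1] for logits in batch]
print(summarize(confs))
```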

The ten most influential features were “rinsing,” “should,” “effectiveness,” “products,” “me,” “fluoride,” “your,” “reduces,” “twice,” and “before.” The findings suggest that informative comments are characterized by technical dental terms, panic comments often include informal language and personal pronouns, confused comments are marked by question words and jargon, and unproductive comments typically feature negative language and first-person pronouns. The SHAP (SHapley Additive exPlanations) analysis highlights the model's reliance on specific vocabulary and structural elements, pointing to areas for improvement in future iterations and underscoring the importance of context in comment categorization.
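A top-ten feature list like the one above is typically produced by aggregating per-token attributions across many comments. The sketch below assumes token-level attribution values (for example, SHAP values for the predicted class) have already been computed elsewhere. It ranks tokens by mean absolute attribution, which is one common aggregation choice and not necessarily the authors' choice. The toy attributions are hypothetical.

```python
from collections import defaultdict


def rank_features(attributions, top_k=10):
    """Rank tokens by mean absolute attribution across all their occurrences.

    attributions: iterable of per-comment lists of (token, value) pairs,
    e.g. token-level SHAP values for the predicted class.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for comment in attributions:
        for token, value in comment:
            totals[token] += abs(value)  # sign ignored: magnitude of influence
            counts[token] += 1
    scores = {tok: totals[tok] / counts[tok] for tok in totals}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical attributions for two comments, not values from the study
toy = [
    [("rinse", 0.40), ("your", -0.10), ("mouth", 0.05)],
    [("fluoride", -0.30), ("your", 0.20)],
]
print(rank_features(toy, top_k=3))
```

Averaging rather than summing keeps frequent function words such as "your" from dominating the ranking only because they occur often.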

Discussion

In this section, the authors discuss the methodology and findings of their study of YouTube comments on oral health education, focusing on panic, confusion, and misinformation. They retrieved comments from four instructional videos using the YouTube API and classified them into Panic-Driven, Informative, Confused, and Unproductive categories. The dataset of 251 comments is markedly imbalanced: 165 comments (about 66%) are panic-related. Using a RoBERTa-based model for sentiment analysis, the authors achieved an overall accuracy of 75.00%. The model showed high precision on confused and unproductive comments but struggled with panic comments, indicating that the classification strategy needs further refinement.
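With 165 of the 251 comments in the panic-related class, a trivial classifier that always predicts that class would be correct on about 65.7% of the full dataset. This baseline is a useful reference point for the 75.00% overall accuracy. The comparison is approximate because the summary does not state which split that accuracy was measured on. The sketch computes this baseline and, as one common remedy for imbalance (not necessarily used by the authors), inverse-frequency class weights of the form scikit-learn uses for `class_weight="balanced"`. The counts passed to the weight function in the test are hypothetical.

```python
def majority_baseline(n_majority, n_total):
    """Accuracy of always predicting the most frequent class."""
    return n_majority / n_total


def inverse_frequency_weights(counts):
    """Per-class loss weights n_total / (n_classes * n_c); rarer classes get larger weights."""
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}


baseline = majority_baseline(165, 251)  # panic-related comments / all comments
print(f"majority-class baseline: {baseline:.1%}")  # prints "majority-class baseline: 65.7%"
```

Such weights would be passed to a weighted cross-entropy loss during fine-tuning, so that errors on the three minority classes cost more than errors on the dominant panic class.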

The discussion emphasizes the importance of understanding emotional responses to oral health education, as anxiety and misinformation can deter individuals from seeking necessary dental care. The findings suggest that sentiment analysis can serve as a valuable tool for educators and healthcare professionals to identify and address psychological barriers in oral health discourse. The authors propose future work to enhance model performance, particularly in accurately classifying panic-driven comments, thereby improving the effectiveness of educational interventions in digital health communication.