Comparison between AI and human expert performance in acute pain assessment in sheep

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-024-83950-y
PMID: https://pubmed.ncbi.nlm.nih.gov/39754012
Publication Date: 2025-01-03
Author(s): Marcelo Feighelstein et al.
Primary Topic: Veterinary Pharmacology and Anesthesia

Overview

This study investigates the capability of Artificial Intelligence (AI) to surpass human experts in recognizing pain in sheep, utilizing a dataset of 48 sheep subjected to surgery. Video recordings were captured before and after the surgical procedures to assess pain levels. Four veterinary experts employed two established pain scoring systems: the Sheep Facial Expression Scale (SFPES) and the Unesp-Botucatu Composite Behavioral Scale (USAPS), the latter being the recognized standard for sheep pain evaluation.

The AI pipeline, built on a CLIP encoder, significantly outperformed human facial scoring, with an area under the curve (AUC) difference of 0.115 (p < 0.001) when both AI and humans analyzed the same visual data (front and lateral images of the sheep's face). On the USAPS, the AI's performance was comparable to human scoring: the AUC difference of 0.027 (p = 0.163) was a small improvement that was not statistically significant. These findings suggest that AI can effectively enhance pain recognition in sheep, with important implications for clinical practice, and they prompt further exploration in the field.
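To make the AUC comparison concrete, the following is a minimal sketch of how an AUC difference between two scorers rated on the same cases could be estimated. It is not the authors' procedure: this summary does not state which statistical test produced the reported p-values, so a paired bootstrap is used here as a stand-in. The function names (`auc`, `bootstrap_auc_difference`) and all scores are hypothetical illustrations, not study data.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random pain case (label 1) scores higher
    than a random no-pain case (label 0); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_difference(labels, scores_a, scores_b, n_boot=2000, seed=0):
    """Paired bootstrap of AUC(a) - AUC(b).

    The same resampled cases are used for both scorers, which keeps the
    comparison paired. Returns the observed difference and an approximate
    95% percentile interval.
    """
    rng = random.Random(seed)
    n = len(labels)
    observed = auc(labels, scores_a) - auc(labels, scores_b)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined unless both classes are present
        a = [scores_a[i] for i in idx]
        b = [scores_b[i] for i in idx]
        diffs.append(auc(ys, a) - auc(ys, b))
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return observed, (lower, upper)


# Hypothetical toy data: 0 = before surgery (no pain), 1 = after surgery (pain).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
ai_scores = [0.1, 0.3, 0.2, 0.6, 0.7, 0.9, 0.8, 0.4]   # model probabilities
human_scores = [1, 3, 2, 4, 2, 5, 3, 1]                # summed scale scores

observed, interval = bootstrap_auc_difference(labels, ai_scores, human_scores)
print(f"AUC difference: {observed:.3f}, 95% interval: {interval}")
```

An interval that excludes zero would point to a real difference between the two scorers. With only eight toy cases the interval is very wide, which illustrates why a small AUC gain such as 0.027 can fail to reach significance.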

Methods

The “Methods” section outlines the experimental and analytical procedures employed in the study. It details the selection of participants, the design of the experiments, and the statistical techniques used for data analysis. The researchers utilized a randomized controlled trial framework to ensure the validity of their findings, with specific attention to controlling for confounding variables.

Data collection involved standardized measures and protocols to ensure consistency across trials. The analysis employed advanced statistical methods, including regression analysis and hypothesis testing, to evaluate the significance of the results. The section emphasizes the robustness of the methodology, highlighting how it supports the reliability and generalizability of the findings.

Results

The results indicate that the machine scoring system outperformed human scoring across several metrics, including accuracy, precision, specificity, and F1 score, as detailed in Table 1. Furthermore, Table 2 reports the area under the curve (AUC) comparisons, showing that the machine significantly outperformed the SFPES human scoring method, with an AUC difference of 0.115 (p < 0.001). These findings suggest that the machine scoring system may be a reliable alternative to human facial scoring with the SFPES, whereas its advantage over USAPS behavioral scoring (AUC difference of 0.027, p = 0.163) was not statistically significant.
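For readers less familiar with the metrics named above, the sketch below shows how accuracy, precision, recall (sensitivity), specificity, and F1 are derived from binary pain / no-pain predictions. The function name `binary_metrics` and the example labels are hypothetical and do not reproduce the values in Table 1.

```python
def binary_metrics(y_true, y_pred):
    """Compute standard binary classification metrics.

    Labels are 1 for pain and 0 for no pain. A ratio with a zero
    denominator is reported as 0.0 instead of raising an error.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # pain correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # no pain correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed pain

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical example: four pain cases followed by four no-pain cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
for name, value in binary_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.3f}")
```

Unlike AUC, these metrics depend on the threshold used to turn a score into a pain / no-pain decision, so they are best read alongside the AUC comparison and not in place of it.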

Discussion

In this study, the authors explored the efficacy of an AI pipeline for recognizing pain in sheep compared to human experts using the same visual data. The dataset comprised video recordings and images of 48 sheep, analyzed at two time points: before and after abdominal surgery. The researchers established a ‘ground truth’ for pain classification based on these time points but acknowledged potential inaccuracies in this method. They created a reduced dataset to address these concerns, ultimately retaining 39 individuals for analysis. The AI model outperformed human scoring methods, achieving higher accuracy, precision, and recall (sensitivity), particularly when compared to the facial scoring method (SFPES).

The findings indicate that the AI’s ability to detect pain in sheep facial expressions surpasses that of human experts, even under stricter conditions. The authors suggest that the AI’s performance may stem from its capacity to identify subtle facial features that humans might overlook. They also highlight the importance of using both frontal and lateral views in pain assessment, as each perspective provides unique insights into facial expressions. While the AI model shows promise, the authors caution that further research is needed to validate its effectiveness across varying pain levels and to explore the underlying mechanisms of its decision-making process. The study suggests a potential shift towards AI-assisted pain recognition in veterinary practice, although the authors emphasize the need for continued development and validation of pain assessment tools.