Developing an AI-powered wound assessment tool: a methodological approach to data collection and model optimization

Journal: BMC Medical Informatics and Decision Making, Volume: 25, Issue: 1
DOI: https://doi.org/10.1186/s12911-025-03144-y
PMID: https://pubmed.ncbi.nlm.nih.gov/40783534
Publication Date: 2025-08-09
Author(s): Alessio Stefanelli et al.
Primary Topic: Pressure Ulcer Prevention and Management

Overview

The study addresses the significant challenge of chronic wounds (CWs) in healthcare, which are often exacerbated by inadequate assessment from healthcare professionals (HCPs) due to limited training and high workloads. To mitigate these issues, the researchers developed an artificial intelligence (AI)-powered wound assessment tool integrated into a mobile application. A multicenter observational study was conducted in Western Switzerland, compiling a hybrid dataset of approximately 4,000 wound images through both retrospective and prospective methods. This dataset included high-resolution images, videos, and 3D scans, all annotated for wound segmentation and tissue classification to train deep learning models.

The AI-based wound segmentation model, which uses the DeepLabv3+ architecture with a ResNet50 backbone, achieved a Dice score of 92% and an Intersection-over-Union (IoU) score of 85%. Tissue classification showed a preliminary mean Dice score of 78%, with variability across tissue types. The models were optimized for mobile use, achieving real-time inference with minimal loss of performance. The dual data collection approach enhanced model generalizability by ensuring both image standardization and real-world variability. The findings suggest that AI can significantly improve diagnostic precision and personalized care in wound management, although challenges remain in tissue classification and clinical validation. Future research should focus on refining AI algorithms and expanding dataset diversity to enhance clinical decision-making and patient outcomes.
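To make the two reported metrics concrete, the following is a minimal sketch of how a Dice score and an IoU score can be computed for a predicted binary wound mask against a reference annotation. The function name `dice_and_iou` and the toy 4x4 masks are illustrative assumptions, not the study's evaluation code or data. For a single image the two metrics are linked by Dice = 2 * IoU / (1 + IoU), so an IoU of 0.85 corresponds to a Dice of about 0.92, which is consistent with the reported pair.

```python
def dice_and_iou(pred, truth):
    """Return (Dice, IoU) for two binary masks given as nested lists of 0/1."""
    p = [v for row in pred for v in row]
    t = [v for row in truth for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same shape")
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    pred_area = sum(1 for a in p if a)
    truth_area = sum(1 for b in t if b)
    union = pred_area + truth_area - intersection
    if union == 0:
        return 1.0, 1.0  # both masks empty: treat as perfect agreement
    dice = 2 * intersection / (pred_area + truth_area)
    iou = intersection / union
    return dice, iou


# Toy masks for illustration only (1 = wound pixel, 0 = background).
truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 1],  # one extra pixel predicted as wound
    [0, 0, 0, 0],
]

dice, iou = dice_and_iou(pred, truth)
print(f"Dice = {dice:.3f}, IoU = {iou:.3f}")
```

Dice is never lower than IoU for the same pair of masks, which is why the Dice figure in a report is usually the higher of the two.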

Introduction

The management of chronic wounds (CWs) poses significant challenges in healthcare due to their complex pathophysiology and slow healing processes, often exacerbated by comorbid conditions such as diabetes and cardiovascular diseases. The prevalence of CWs is notable, affecting approximately 2.2 per 1,000 individuals globally, with substantial economic implications, such as the estimated £8.3 billion annual cost in the UK. The aging population further complicates wound management, as older adults are more susceptible to CWs due to physiological changes and chronic illnesses. Current assessment methods for wound healing, including ruler-based measurements and manual digital imaging, are limited in accuracy and efficiency, often leading to overestimations of wound size and introducing risks of contamination.
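The overestimation attributed to ruler-based measurement can be illustrated with simple geometry. The sketch below assumes, purely for illustration, a wound shaped like an ellipse whose axes equal the measured greatest length and greatest width; the dimensions are hypothetical and not taken from the study. Multiplying length by width gives the area of the enclosing rectangle, which exceeds the ellipse area by a fixed factor of 4/pi.

```python
import math


def ruler_area(length_cm, width_cm):
    """Ruler method: area approximated as greatest length times greatest width."""
    return length_cm * width_cm


def ellipse_area(length_cm, width_cm):
    """Area of an ellipse whose two axes equal the measured length and width."""
    return math.pi / 4 * length_cm * width_cm


# Hypothetical wound dimensions in centimetres.
length_cm, width_cm = 4.0, 3.0

ruler = ruler_area(length_cm, width_cm)
reference = ellipse_area(length_cm, width_cm)
overestimate_pct = (ruler / reference - 1) * 100

print(f"ruler estimate: {ruler:.2f} cm^2")
print(f"ellipse area:   {reference:.2f} cm^2")
print(f"overestimate:   {overestimate_pct:.1f}%")
```

Under this assumption the ruler method overestimates the area by roughly 27% regardless of wound size; irregular wound shapes can deviate further, which is the gap that pixel-level segmentation is meant to close.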

Recent advancements in artificial intelligence (AI) and machine learning (ML) present promising solutions for improving wound assessment. AI-driven tools have shown potential in automating the evaluation of wound images, enhancing accuracy in measurements and tissue classification, and reducing variability among clinicians. Research indicates that AI algorithms can achieve performance comparable to that of expert human annotators. However, the clinical applicability of these AI solutions is contingent upon the availability of high-quality, diverse datasets for training and validation, which are often limited in scope and size. This study outlines the methodology and challenges of data collection for developing an AI-based wound assessment tool in collaboration with a Swiss startup, detailing the steps taken, barriers faced, and facilitators that contributed to efficient data acquisition.

Methods

In this section, the authors discuss the methodologies employed for data collection, highlighting the advantages of combining prospective and retrospective approaches. The prospective collection method yielded high-quality, standardized images due to consistent imaging conditions, resulting in superior resolution, lighting, and color calibration. Conversely, the retrospective method facilitated the rapid gathering of a diverse array of wound images from various clinical settings, albeit with increased variability in imaging conditions and occasional deficiencies in metadata completeness.

The findings underscore the critical trade-offs between standardization and scalability in the development of AI-driven clinical tools. By leveraging both methods, the researchers aimed to create a comprehensive dataset that balances the need for high-quality imaging with the practicalities of real-world data collection, ultimately enhancing the efficacy of AI applications in clinical environments.

Results

The results section reports the successful integration of retrospective and prospective data collection methods, which produced a dataset of approximately 4,000 wound images. This dataset encompasses a diverse range of acute and chronic wounds, annotated for both wound segmentation and tissue type classification. A curated subset of 1,200 images was specifically selected for tissue classification, focusing on clinically relevant tissue types such as slough, granulation, and epithelialization, while addressing challenges related to class imbalance and visual similarity among tissue types. The dataset’s composition reflects clinical realities, which is crucial for training AI models effectively.
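The summary does not say how class imbalance was handled. One common option, shown here only as a sketch, is to weight each tissue class inversely to its frequency so that rare classes contribute more to the training loss. The pixel counts below are hypothetical and the helper `inverse_frequency_weights` is an illustrative name, not part of the study.

```python
def inverse_frequency_weights(pixel_counts):
    """Weight each class by total / (n_classes * count), so a balanced class gets 1.0."""
    total = sum(pixel_counts.values())
    n_classes = len(pixel_counts)
    return {name: total / (n_classes * count) for name, count in pixel_counts.items()}


# Hypothetical annotated pixel counts per tissue type (not study data).
pixel_counts = {
    "granulation": 600_000,
    "slough": 250_000,
    "epithelialization": 100_000,
    "necrosis": 50_000,
}

weights = inverse_frequency_weights(pixel_counts)
for name, weight in weights.items():
    print(f"{name:>18}: {weight:.2f}")
```

With these made-up counts the rarest class receives a weight twelve times larger than the most common one; in practice such weights are often clipped or smoothed to keep training stable.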

The wound border segmentation model demonstrated robust performance, achieving an average Dice score of 92% and an Intersection-over-Union (IoU) score of 85%. Preliminary results for tissue segmentation yielded a mean Dice score of 78%, with performance varying across tissue types. Notably, the model performed best for slough and granulation, while epithelialization and necrosis showed lower scores due to their visual subtleties. The AI model has been integrated into a mobile application and optimized for efficient deployment: quantization reduced the model size by 75% while maintaining high predictive capacity, and the average inference time is 0.3 seconds on standard smartphones. Ongoing efforts aim to enhance the tissue type segmentation model’s robustness and clinical reliability by expanding the annotated dataset and refining label quality.
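The summary does not state which quantization scheme or bit width was used. As a sketch under the assumption that 32-bit floating-point weights are stored as 8-bit integers, the example below applies symmetric per-tensor quantization to a handful of hypothetical weights. Going from 4 bytes to 1 byte per weight gives a 75% reduction, which matches the reported figure if the small overhead of storing the scale is ignored.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map integer codes back to approximate float values."""
    return [q * scale for q in quantized]


# Hypothetical layer weights (not taken from the study's model).
weights = [0.82, -0.40, 0.05, -1.27, 0.63]

quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

bytes_fp32 = 4 * len(weights)  # 32-bit floats
bytes_int8 = 1 * len(weights)  # 8-bit integers
size_reduction = 1 - bytes_int8 / bytes_fp32

print("int8 codes:", quantized)
print(f"max round-trip error: {max_error:.5f}")
print(f"size reduction: {size_reduction:.0%}")
```

The round-trip error is bounded by half the scale, which is one reason accuracy can stay close to the full-precision model when the weight distribution is well behaved.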

Discussion

The discussion highlights the development and evaluation of the AI-powered wound assessment tool, emphasizing its potential to enhance clinical wound care through improved accuracy and efficiency. The study utilized a comprehensive dataset of approximately 4,000 wound images, collected through both retrospective and prospective methods across multiple healthcare institutions. This dual approach ensured a diverse representation of wound types, including acute and chronic wounds, while maintaining standardized imaging protocols. The AI models demonstrated promising performance, achieving a Dice score of 92% for wound segmentation and a mean Dice score of 78% for tissue classification, results that are comparable to or exceed those of previous studies.

The discussion also addresses the challenges encountered, particularly in tissue segmentation, where differentiating between similar tissue types like fibrin and slough proved difficult. Despite these challenges, the study’s findings underscore the importance of accurate wound assessment in clinical decision-making, as precise segmentation and classification can guide treatment strategies. Future research directions include expanding the dataset to include a broader range of wound types, enhancing tissue classification algorithms, and integrating AI tools into mobile health applications to improve access to wound care, particularly in underserved areas. Overall, the study advocates for the transformative potential of AI in wound management, aiming to support clinicians and optimize patient outcomes.

Limitations

The section on limitations outlines several challenges faced in optimizing AI-driven wound assessment models. A primary concern is the resource-intensive nature of prospective data collection, which necessitates dedicated personnel, standardized imaging protocols, and real-time patient interactions. This approach is significantly more time-consuming than retrospective data collection, which allows for rapid dataset accumulation. To address this, the authors suggest the need for scalable AI training strategies that can effectively utilize existing clinical datasets while ensuring data quality, potentially through semi-supervised learning techniques.
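The authors mention semi-supervised learning without naming a specific technique. Pseudo-labeling is one widely used variant, sketched here as an assumption: a model trained on the labeled images predicts on unlabeled ones, and only high-confidence predictions are kept as provisional labels for the next training round. The image identifiers, probabilities, and the 0.9 threshold are hypothetical, and the example works at image level for brevity rather than per pixel.

```python
def select_pseudo_labels(predictions, threshold=0.9):
    """Keep unlabeled samples whose most likely class meets the confidence threshold.

    predictions maps an image id to a dict of class name -> predicted probability.
    Returns a dict of image id -> accepted pseudo-label.
    """
    selected = {}
    for image_id, probs in predictions.items():
        label, confidence = max(probs.items(), key=lambda item: item[1])
        if confidence >= threshold:
            selected[image_id] = label
    return selected


# Hypothetical model outputs on unlabeled retrospective images.
predictions = {
    "img_001": {"granulation": 0.95, "slough": 0.04, "necrosis": 0.01},
    "img_002": {"granulation": 0.55, "slough": 0.40, "necrosis": 0.05},
    "img_003": {"granulation": 0.02, "slough": 0.05, "necrosis": 0.93},
}

pseudo_labels = select_pseudo_labels(predictions)
print(pseudo_labels)
```

The ambiguous image is left unlabeled, so it could be routed to a human annotator instead of adding noise to the training set.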

Another limitation discussed is inter-rater and intra-rater variability in wound and tissue segmentation. While standardized annotation protocols have been employed for delineating wound boundaries, the subjective nature of tissue type segmentation remains a challenge. The authors propose that incorporating ensemble learning techniques, where multiple AI models are trained on various segmentation strategies, could mitigate annotation bias and enhance consistency in segmentation across different tissue types. Furthermore, the paper emphasizes the importance of regulatory approval and clinical validation for the widespread adoption of these AI tools, advocating for further clinical trials to assess model performance in diverse healthcare settings. Lastly, the integration of AI-generated assessments into electronic health record (EHR) systems should prioritize transparency, interpretability, and clinical actionability, aligning with the growing focus on explainable AI (XAI) in medical applications.
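The summary does not specify how the outputs of an ensemble would be combined. A simple possibility, offered as a sketch and not as the authors' method, is a per-pixel majority vote across the binary masks produced by several models. The three toy masks below stand in for hypothetical models trained on different annotation strategies.

```python
def majority_vote(masks):
    """Fuse binary masks pixel by pixel: a pixel is kept if more than half the models mark it."""
    n_models = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    fused = []
    for r in range(rows):
        fused.append(
            [1 if 2 * sum(m[r][c] for m in masks) > n_models else 0 for c in range(cols)]
        )
    return fused


# Toy outputs from three hypothetical models (1 = tissue of interest).
mask_a = [[1, 1, 0], [0, 1, 0]]
mask_b = [[1, 0, 0], [0, 1, 1]]
mask_c = [[1, 1, 0], [0, 0, 1]]

fused = majority_vote([mask_a, mask_b, mask_c])
print(fused)
```

Pixels on which only one model disagrees are resolved by the other two, which dampens the effect of any single model's annotation bias. Using an odd number of models avoids ties.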