Vision transformer and CNN-based skin lesion analysis: classification of monkeypox

Journal: Multimedia Tools and Applications, Volume: 83, Issue: 28
DOI: https://doi.org/10.1007/s11042-024-19757-w
Publication Date: 2024-07-09
Author(s): Gözde Yolcu
Primary Topic: Poxvirus research and outbreaks

Overview

The research paper addresses the critical need for rapid and accurate diagnosis of monkeypox and other skin lesions, particularly in distinguishing potentially fatal conditions like melanoma. Traditional diagnostic methods, such as dermoscopy and high-resolution ultrasound imaging, are often qualitative, subjective, and time-consuming. To overcome these limitations, the study introduces a quantitative and objective classification tool that utilizes advanced deep learning techniques, specifically a Vision Transformer model and various convolutional neural networks, trained through transfer learning. The ensemble model achieved an accuracy of 81.91%, along with competitive metrics: 65.94% Jaccard, 87.16% Precision, 74.12% Recall, and 78.16% F-score.
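The five reported metrics can all be derived from a multi-class confusion matrix. The following is a minimal sketch, assuming macro averaging (per-class scores averaged with equal weight); the summary does not state which averaging the paper used, and the function name `macro_metrics` and the toy matrix are illustrative only. As a side note, the harmonic mean of the reported precision and recall is about 80.1%, not 78.16%, which is consistent with the F-score having been averaged per class, though that is an inference rather than something the summary confirms.

```python
from typing import Dict, List


def macro_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Compute accuracy and macro-averaged metrics.

    cm[i][j] is the number of samples of true class i predicted as class j.
    Macro averaging is an assumption, not taken from the paper.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))

    precisions, recalls, f_scores, jaccards = [], [], [], []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted otherwise
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
        f_scores.append(2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0)
        jaccards.append(tp / (tp + fp + fn) if tp + fp + fn else 0.0)

    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_score": sum(f_scores) / n,
        "jaccard": sum(jaccards) / n,
    }


# Toy 3-class confusion matrix (made-up numbers, not data from the paper).
toy_cm = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
metrics = macro_metrics(toy_cm)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Per class, Jaccard and F-score are monotonically related (J = F / (2 - F)), so the Jaccard index is always the lower of the two.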

In the conclusion, the authors highlight the development of an automated system capable of classifying images of monkeypox and six other skin lesions, using a combined dataset built from PAD-UFES-20 and MSLD. The Vision Transformer model, pre-trained on the large ImageNet-21k dataset, was fine-tuned on the much smaller combined dataset. The system outperformed existing literature in terms of accuracy, precision, and F-score, while achieving comparable recall. Additionally, the study found that the pre-trained DenseNet201 model yielded the best accuracy among the tested CNNs. The ensemble model, which combines DenseNet201 and the Vision Transformer, shows promise in assisting healthcare professionals with rapid diagnosis and isolation of patients, particularly in urgent cases like monkeypox.

Introduction

The introduction highlights the prevalence and severity of skin diseases, particularly malignant melanoma and monkeypox, which necessitate early detection for effective treatment. Traditional diagnostic methods, such as dermoscopy and high-resolution ultrasound imaging, are often subjective and time-consuming, underscoring the urgent need for automated, objective, and non-invasive diagnostic tools. The development of Computer-aided Diagnosis (CAD) systems is presented as a promising solution to alleviate the burden of skin cancer screening and assist dermatologists in their decision-making processes.

The study proposes a novel system that classifies both monkeypox and other skin lesions with a single unified approach based on deep learning. It emphasizes the challenges posed by irrelevant image content and the need for large datasets when training robust computer vision systems. To overcome data limitations, the research employs a transfer learning strategy: a Vision Transformer model pre-trained on the ImageNet-21k dataset is fine-tuned on the skin lesion data, and its performance is compared with that of convolutional neural networks (CNNs). The results indicate that the proposed ensemble model, which combines the strengths of Vision Transformers and CNNs, outperforms existing state-of-the-art algorithms, offering dermatologists a cost-effective and efficient tool for faster and more accurate skin lesion recognition.

Methods

In this study, a comprehensive dataset was constructed by merging two distinct skin lesion datasets. The researchers employed a transfer learning approach, utilizing several pre-trained Convolutional Neural Network (CNN) models alongside the Vision Transformer model to perform classification tasks on skin lesions. The performance of these models was systematically evaluated and compared, leading to the identification of the Vision Transformer and DenseNet201 models as the top performers based on accuracy.
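Merging the two sources into one seven-class label space can be sketched as follows. This is an illustration only: the six class abbreviations are those published with PAD-UFES-20, while the record format, the `MONKEYPOX` label string, the file names, and the decision to keep only the monkeypox images from MSLD are assumptions that the summary does not confirm.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (image path, class name); hypothetical record format

# Six diagnostic classes of PAD-UFES-20, plus one class for monkeypox.
PAD_CLASSES = ["ACK", "BCC", "MEL", "NEV", "SCC", "SEK"]
MONKEYPOX = "MONKEYPOX"  # assumed label string for MSLD monkeypox images


def merge_datasets(
    pad_records: List[Record], msld_records: List[Record]
) -> Tuple[List[Tuple[str, int]], Dict[str, int]]:
    """Map both datasets onto one shared integer label space."""
    class_to_index = {name: i for i, name in enumerate(PAD_CLASSES + [MONKEYPOX])}
    merged = [(path, class_to_index[label]) for path, label in pad_records]
    # Assumption: non-monkeypox MSLD images are dropped rather than relabeled.
    merged += [
        (path, class_to_index[label])
        for path, label in msld_records
        if label == MONKEYPOX
    ]
    return merged, class_to_index


# Hypothetical file names, for illustration only.
pad = [("pad/img_001.png", "BCC"), ("pad/img_002.png", "MEL")]
msld = [("msld/m01.jpg", "MONKEYPOX"), ("msld/o01.jpg", "OTHERS")]
merged, class_to_index = merge_datasets(pad, msld)
print(len(class_to_index), merged)
```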

To enhance classification outcomes, the final decision-making process involved ensembling the results from the Vision Transformer and DenseNet201 models. The methodology and experimental results are visually represented in the accompanying figure (Fig. 1), which outlines the system pipeline utilized in the study.
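One common way to ensemble two classifiers is soft voting, where the class probabilities from each model are averaged before taking the arg max. The sketch below assumes that scheme with equal weights; the summary does not specify how the paper combined the two models, and the probability vectors are invented for illustration.

```python
from typing import List, Tuple


def soft_vote(
    probs_a: List[float], probs_b: List[float], weight_a: float = 0.5
) -> Tuple[int, List[float]]:
    """Combine two probability vectors by a weighted average.

    Returns the winning class index and the combined probabilities.
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same set of classes")
    combined = [
        weight_a * a + (1.0 - weight_a) * b for a, b in zip(probs_a, probs_b)
    ]
    best = max(range(len(combined)), key=combined.__getitem__)
    return best, combined


# Invented softmax outputs over seven classes for a single image.
vit_probs = [0.10, 0.05, 0.40, 0.05, 0.05, 0.05, 0.30]
densenet_probs = [0.05, 0.05, 0.25, 0.05, 0.05, 0.05, 0.50]

predicted, combined = soft_vote(vit_probs, densenet_probs)
print(predicted, [round(p, 3) for p in combined])
```

In this toy case the first model alone would choose class 2, but the averaged scores favor class 6, which shows how a second model can overturn a low-confidence prediction.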

Results

In this study, various pre-trained models were evaluated on a combined dataset for skin lesion classification, with results indicating that the Vision Transformer model achieved the highest performance. Specifically, the Vision Transformer trained from scratch reached an accuracy of 60.49%, while a deep ensemble learning technique combining DenseNet201 and the Vision Transformer raised accuracy to 81.91%. The confusion matrix showed that while the ensemble model made some classification errors, it generally produced promising results, particularly in cases where the lesion occupied a larger portion of the image.

The ensemble model’s performance was also compared with other studies that use the same PAD-UFES-20 dataset. Although it classified seven lesion types, it surpassed previous studies that classified only six types in terms of accuracy, precision, and F-score. When trained on a six-class dataset, the Vision Transformer-based system outperformed several other models, achieving superior results in most metrics, except for recall, where it ranked second. The enhanced performance of the Vision Transformer is attributed to its attention mechanisms, which effectively highlight critical features, while the ensemble approach leverages the strengths of both DenseNet201 and the Vision Transformer, leading to improved classification outcomes.

Discussion

In this study, the authors present a novel approach to classifying skin lesions, including human monkeypox, using a transfer learning-based Vision Transformer model. Unlike previous studies that focused on binary classifications, this research emphasizes a multi-class classification system. The authors compare the performance of a model initialized with random weights against one utilizing transfer learning, demonstrating that the latter yields superior results. An ensemble method combining a deep Vision Transformer and a deep CNN model is employed to enhance robustness, ultimately achieving better performance than prior studies.

The Vision Transformer model processes images by dividing them into fixed-size patches, which are then embedded and fed into a Transformer encoder. This architecture uses attention mechanisms to capture dependencies between image patches, akin to how words are processed in natural language. The study fine-tunes a pre-trained Vision Transformer on a combined dataset drawn from PAD-UFES-20 and MSLD, which together comprise seven classes of skin lesions. The ensemble model achieves an accuracy of 81.91%, alongside notable precision and recall metrics, indicating its potential utility in clinical settings for rapid diagnosis and isolation of patients with skin lesions, particularly in the context of monkeypox outbreaks.
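The patch-splitting step described above can be illustrated with a short sketch. It covers only the division of an image into flattened, non-overlapping patches; the learned linear embedding, position embeddings, and the Transformer encoder are omitted. The single-channel list-of-lists image format and the function name `split_into_patches` are assumptions made for illustration, and the 224-pixel input with 16-pixel patches is a commonly used Vision Transformer configuration rather than one confirmed by this summary.

```python
from typing import List


def split_into_patches(image: List[List[float]], patch: int) -> List[List[float]]:
    """Split a single-channel image into flattened, non-overlapping patches.

    Patches are returned in row-major order (left to right, top to bottom),
    which is the order in which they would form the token sequence.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append(
                [
                    image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)
                ]
            )
    return patches


# A 4x4 toy image with pixel values 0..15, split into 2x2 patches.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
patches = split_into_patches(image, 2)
print(len(patches), patches[0])

# With a 224x224 input and 16x16 patches the sequence has 14 * 14 = 196 tokens.
blank = [[0.0] * 224 for _ in range(224)]
print(len(split_into_patches(blank, 16)))
```

Each flattened patch plays the role of one token, so attention is computed between patches rather than between individual pixels.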