ASF-YOLO: A novel YOLO model with attentional scale sequence fusion for cell instance segmentation

Journal: Image and Vision Computing, Volume: 147
DOI: https://doi.org/10.1016/j.imavis.2024.105057
Publication Date: 2024-05-01
Author(s): Ming Kang et al.
Primary Topic: AI in cancer detection

Overview

ASF-YOLO is an instance segmentation framework tailored to cell image analysis. It combines spatial and scale features to improve the detection and segmentation of small, densely packed objects in cell images. Its key components are the Scale Sequence Feature Fusion (SSFF) module and the Triple Feature Encoder (TFE) module, which together improve multiscale segmentation performance. A channel and position attention mechanism (CPAM) further extracts informative features. The resulting model achieves higher accuracy and faster inference than existing state-of-the-art methods.
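
The toy, single-channel sketch below illustrates the scale-sequence idea as we understand it. Pyramid levels of different resolutions (P3, P4 and P5 in YOLO terms) are resized to a common size. They are then stacked along a new "scale" axis and mixed across that axis. Several things here are illustrative assumptions and not the authors' PyTorch implementation: the list-based feature maps, the function names, and the use of one fixed weight per level in place of the learned 3D convolution and normalization.

```python
import math
from typing import List

Grid = List[List[float]]  # one single-channel feature map, rows of values


def upsample_nearest(grid: Grid, factor: int) -> Grid:
    """Nearest-neighbour upsampling of a 2D grid by an integer factor."""
    out: Grid = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        for _ in range(factor):
            out.append(list(wide))
    return out


def scale_sequence_fuse(levels: List[Grid], weights: List[float]) -> Grid:
    """Toy SSFF-style fusion of pyramid levels ordered fine to coarse.

    levels[i] is assumed to be 2**i times smaller than levels[0] (e.g. P3, P4, P5).
    Every level is upsampled to levels[0]'s size and stacked along a "scale" axis;
    one weight per level mixes that axis at every pixel (a stand-in for the learned
    3D convolution), followed by SiLU.
    """
    if len(levels) != len(weights):
        raise ValueError("need one weight per pyramid level")
    h, w = len(levels[0]), len(levels[0][0])
    stack = [upsample_nearest(g, 2 ** i) for i, g in enumerate(levels)]
    if any(len(g) != h or len(g[0]) != w for g in stack):
        raise ValueError("each level must be half the size of the previous one")
    fused: Grid = []
    for y in range(h):
        row = []
        for x in range(w):
            s = sum(wt * stack[k][y][x] for k, wt in enumerate(weights))
            row.append(s / (1.0 + math.exp(-s)))  # SiLU(s) = s * sigmoid(s)
        fused.append(row)
    return fused


p3 = [[0.0] * 4 for _ in range(4)]
p4 = [[1.0, 0.0], [0.0, 1.0]]
p5 = [[2.0]]
print(scale_sequence_fuse([p3, p4, p5], [0.2, 0.3, 0.5]))
```

Every output pixel here mixes all three levels. In the real module the mixing is learned and the maps have many channels.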

The experimental results validate the effectiveness of ASF-YOLO, demonstrating its capability to outperform traditional YOLO models in cell segmentation tasks. However, the authors acknowledge limitations due to the small dataset size, which affects the model’s generalization performance. They also discuss the computational trade-offs associated with the CPAM (Channel and Position Attention Mechanism) and suggest future enhancements, such as incorporating hierarchical convolutional structures, dilated convolutions, and transfer learning techniques. These improvements aim to expand the model’s receptive field and enhance its ability to capture global contextual information, ultimately facilitating its practical application in clinical settings.
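
Dilated convolutions, one of the suggested future directions, enlarge the receptive field without adding parameters. A k-tap kernel with dilation d spans k + (k - 1)(d - 1) input pixels. The sketch below applies the standard receptive-field recurrence to two three-layer stacks. These layer configurations are arbitrary examples, not layers taken from ASF-YOLO.

```python
from typing import Iterable, Tuple


def effective_kernel(k: int, d: int) -> int:
    """Span covered by a k-tap kernel with dilation d: k + (k - 1)(d - 1)."""
    return k + (k - 1) * (d - 1)


def receptive_field(layers: Iterable[Tuple[int, int, int]]) -> int:
    """Receptive field (in input pixels) of stacked convs given as (kernel, stride, dilation)."""
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for k, s, d in layers:
        rf += (effective_kernel(k, d) - 1) * jump
        jump *= s
    return rf


plain = [(3, 1, 1)] * 3
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]
print(receptive_field(plain), receptive_field(dilated))  # 7 15
```

Both stacks have the same number of weights, but the dilated one sees a 15-pixel-wide window instead of a 7-pixel one.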

Introduction

The introduction of this research paper highlights the significance of advanced sample preparation and imaging technologies in the quantitative analysis of cell images, particularly in medicine and cell biology. It discusses the limitations of traditional Convolutional Neural Network (CNN) frameworks, such as the two-stage R-CNN series, in achieving optimal performance for real-time cell instance segmentation, especially with dense and small cells. In contrast, the You Only Look Once (YOLO) series has emerged as a leading approach for real-time instance segmentation due to its one-stage design and efficient feature extraction capabilities. However, the application of YOLO models to small object segmentation in medical images remains largely unaddressed.

The authors propose an improved YOLO-based model, Attentional Scale Sequence Fusion YOLO (ASF-YOLO), designed specifically for cell instance segmentation. It builds on the YOLOv5 architecture and adds new modules to the neck that perform multiscale feature fusion and apply an attention mechanism. The key additions are the Scale Sequence Feature Fusion (SSFF) module and the Triple Feature Encoder (TFE) module. Together they make the model more robust to variations in object scale and focus it on the features that matter for small objects. In addition, the Enhanced Intersection over Union (EIoU) loss is used to optimize bounding boxes, and Soft Non-Maximum Suppression (Soft-NMS) is used in post-processing. These two changes address the problems caused by densely overlapping cells. On benchmark cell datasets, ASF-YOLO achieves higher detection accuracy and speed than existing state-of-the-art methods.
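
For reference, the sketch below computes the EIoU loss in its commonly used form: 1 - IoU, plus three penalties. The first is a centre-distance penalty normalised by the diagonal of the smallest enclosing box. The other two are separate width and height penalties, normalised by that box's width and height. It illustrates a single pair of boxes in (x1, y1, x2, y2) format in plain Python. The function name and the eps guard are assumptions, and ASF-YOLO's exact loss weighting may differ.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def eiou_loss(pred: Box, target: Box, eps: float = 1e-9) -> float:
    """EIoU loss: 1 - IoU plus centre, width and height penalties."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1
    # Overlap and IoU.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    iou = inter / (pw * ph + tw * th - inter + eps)
    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    # Squared distance between the two box centres.
    rho2 = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4.0
    return (
        1.0
        - iou
        + rho2 / (cw ** 2 + ch ** 2 + eps)  # centre penalty, normalised by enclosing diagonal
        + (pw - tw) ** 2 / (cw ** 2 + eps)  # width penalty
        + (ph - th) ** 2 / (ch ** 2 + eps)  # height penalty
    )


print(eiou_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

Width and height errors are penalised directly. As a result, a small predicted box with the right centre but the wrong shape still incurs a clear penalty, which matters for small cells.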

Methods

The methods section reports an ablation study of how much each proposed module contributes to segmentation performance. Table 3 shows that adding Soft-NMS to the YOLOv5l-seg baseline mitigates a specific failure of standard NMS: when cells occlude one another, it wrongly suppresses correct detections. This matters most for densely packed small objects, and the change yields a notable performance improvement.
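
A small sketch of Gaussian Soft-NMS shows how it differs from standard NMS. Standard NMS deletes every box whose IoU with a higher-scoring box exceeds a threshold. Soft-NMS instead multiplies each remaining box's score by exp(-IoU^2 / sigma). The sigma of 0.5 and the score threshold are common illustrative defaults, not values reported for ASF-YOLO, and the box coordinates are made up.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(
    boxes: Sequence[Box], scores: Sequence[float], sigma: float = 0.5, score_thresh: float = 0.001
) -> List[Tuple[int, float]]:
    """Gaussian Soft-NMS: decay the scores of boxes overlapping a selected box
    instead of discarding them. Returns (index, final score) in selection order."""
    scores = list(scores)
    remaining = list(range(len(boxes)))
    kept: List[Tuple[int, float]] = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        if scores[best] < score_thresh:
            break  # no remaining box scores higher than this one
        remaining.remove(best)
        kept.append((best, scores[best]))
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            scores[i] *= math.exp(-(overlap ** 2) / sigma)
    return kept


cells = [(0, 0, 10, 10), (1, 0, 11, 10), (50, 50, 60, 60)]
print(soft_nms(cells, [0.9, 0.8, 0.7]))
```

Boxes 0 and 1 overlap with an IoU of about 0.82. Hard NMS at a 0.5 threshold would therefore discard box 1, even if it were a genuine neighbouring cell. Soft-NMS keeps it with a reduced score of about 0.21 and leaves the distant box 2 untouched.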

Adopting the EIoU loss function further improves bounding box accuracy for small objects, raising mAP50:95 by 1.8%. This metric is the mean Average Precision averaged over IoU thresholds from 0.50 to 0.95. Adding the SSFF, TFE, and CPAM modules further addresses the difficulties of small-object instance segmentation in cell images and improves overall model performance.
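
mAP50:95 averages the average precision over ten IoU thresholds (0.50, 0.55, ..., 0.95). A detection counts at the strict end only if its box overlaps the ground truth very closely, so a box-regression change such as EIoU shows up in this metric. The AP values below are hypothetical and only illustrate the averaging.

```python
from typing import List, Sequence

# COCO-style IoU thresholds 0.50, 0.55, ..., 0.95 (ten values).
IOU_THRESHOLDS: List[float] = [0.50 + 0.05 * i for i in range(10)]


def map_50_95(ap_per_threshold: Sequence[float]) -> float:
    """Mean of the AP values measured at each IoU threshold in IOU_THRESHOLDS."""
    if len(ap_per_threshold) != len(IOU_THRESHOLDS):
        raise ValueError("expected one AP value per IoU threshold")
    return sum(ap_per_threshold) / len(ap_per_threshold)


# Hypothetical AP values that fall as the required box overlap tightens.
aps = [0.90, 0.88, 0.85, 0.82, 0.78, 0.73, 0.66, 0.57, 0.44, 0.27]
print(f"mAP50 = {aps[0]:.2f}, mAP50:95 = {map_50_95(aps):.3f}")
```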

Results

In the results section, ASF-YOLO is compared quantitatively on the DSB2018 dataset with classical and state-of-the-art methods, including Mask R-CNN, Cascade Mask R-CNN, and SOLO. With 46.18 million parameters, ASF-YOLO achieves a box mean Average Precision (mAP) of 0.91 and a mask mAP of 0.887 at an inference speed of 47.3 frames per second (FPS). This outperforms Mask R-CNN with a Swin Transformer backbone as well as classical one-stage algorithms such as SOLO and YOLACT. ASF-YOLO also achieves the best instance segmentation performance on the BCC dataset, indicating that it generalizes across datasets with different cell types.
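
Simple arithmetic turns the reported figures into per-image latency and weight storage. The sketch assumes every parameter is stored at a single precision (4 bytes for FP32, 2 for FP16) and ignores activations and runtime overhead. The 47.3 FPS figure also depends on the hardware and input size used in the experiments.

```python
def ms_per_image(fps: float) -> float:
    """Average wall-clock time per image implied by a throughput in frames per second."""
    return 1000.0 / fps


def weight_megabytes(num_params: float, bytes_per_param: int) -> float:
    """Storage for the weights alone (10**6 bytes per MB), ignoring activations."""
    return num_params * bytes_per_param / 1e6


params = 46.18e6  # parameter count reported for ASF-YOLO
print(f"{ms_per_image(47.3):.2f} ms per image")        # 21.14 ms per image
print(f"{weight_megabytes(params, 4):.2f} MB in FP32")  # 184.72 MB in FP32
print(f"{weight_megabytes(params, 2):.2f} MB in FP16")  # 92.36 MB in FP16
```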

The qualitative results in Figure 6 show that ASF-YOLO segments cells effectively, particularly dense and small ones, which the authors attribute to the TFE module. The SSFF module strengthens multiscale feature extraction, yielding accurate segmentation of larger cells against complex backgrounds. In the visual comparisons, all models perform well on simpler cell images. However, Mask R-CNN produces a high rate of false detections. Both SOLO and YOLOv5l-seg miss detections and struggle to segment cells with blurred boundaries. These findings underscore the robustness and adaptability of ASF-YOLO across diverse segmentation tasks.

Discussion

The discussion section reviews progress in cell instance segmentation, emphasizing the limitations of traditional methods and the need for better techniques. Deep learning approaches such as Mask R-CNN, SSD, and U-Net have improved nucleus segmentation accuracy. However, they struggle with dense and small cells, which motivates more efficient frameworks. ASF-YOLO addresses this by integrating spatial and multiscale features through the Scale Sequence Feature Fusion (SSFF) and Triple Feature Encoder (TFE) modules. These modules improve segmentation accuracy for small and overlapping cells.
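
The following toy sketch illustrates the triple-feature idea. A larger, a medium and a smaller feature map are brought to the medium resolution and concatenated along the channel axis. As a result, fine detail from the large map and context from the small map travel together. Several choices here are assumptions made for illustration:

- the hybrid downsampling (max plus mean of each 2x2 block);
- the nearest-neighbour upsampling;
- the omission of the per-branch convolutions.

The function names are also hypothetical.

```python
from typing import List

Grid = List[List[float]]
FeatureMap = List[Grid]  # channels -> rows -> values


def hybrid_pool_2x(grid: Grid) -> Grid:
    """Halve the resolution; each output is max + mean of a 2x2 block (assumed mix)."""
    out: Grid = []
    for y in range(0, len(grid) - 1, 2):
        row = []
        for x in range(0, len(grid[0]) - 1, 2):
            block = (grid[y][x], grid[y][x + 1], grid[y + 1][x], grid[y + 1][x + 1])
            row.append(max(block) + sum(block) / 4.0)
        out.append(row)
    return out


def nearest_up_2x(grid: Grid) -> Grid:
    """Double the resolution by repeating each value in a 2x2 block."""
    out: Grid = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]
        out.extend([wide, list(wide)])
    return out


def triple_feature_encode(large: FeatureMap, medium: FeatureMap, small: FeatureMap) -> FeatureMap:
    """Bring large and small maps to the medium resolution and concatenate channels."""
    down = [hybrid_pool_2x(g) for g in large]
    up = [nearest_up_2x(g) for g in small]
    h, w = len(medium[0]), len(medium[0][0])
    for g in down + up:
        if len(g) != h or len(g[0]) != w:
            raise ValueError("large/small maps must be 2x / 0.5x the medium resolution")
    return down + medium + up


large = [[[float(4 * r + c) for c in range(4)] for r in range(4)]]  # 1 x 4 x 4
medium = [[[1.0, 1.0], [1.0, 1.0]]]                                 # 1 x 2 x 2
small = [[[3.0]]]                                                   # 1 x 1 x 1
fused = triple_feature_encode(large, medium, small)
print(len(fused), fused[0])  # 3 [[7.5, 11.5], [23.5, 27.5]]
```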

The ASF-YOLO framework employs a Channel and Position Attention Mechanism (CPAM) to extract informative features, significantly improving detection and segmentation performance. Experimental results demonstrate that ASF-YOLO outperforms existing state-of-the-art methods in both accuracy and inference speed, although the authors acknowledge the need for further enhancements in generalization performance due to the limited dataset size. Future work will focus on incorporating hierarchical convolutional structures and exploring advanced techniques like dilated convolutions and Transformer-based attention mechanisms to further improve the model’s efficacy in clinical applications.
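
The sketch below is a deliberately simplified stand-in for channel and position attention, in two steps. First, each channel is rescaled by a sigmoid of its global average, in squeeze-and-excitation style but without the learned layers. Then each pixel is rescaled by sigmoids of its row and column means, in coordinate-attention style. The functions, their ordering, and the absence of learned weights are assumptions. The sketch does not reproduce how ASF-YOLO wires CPAM to the SSFF and TFE outputs.

```python
import math
from typing import List

Grid = List[List[float]]
FeatureMap = List[Grid]  # channels -> rows -> values


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def channel_attention(fmap: FeatureMap) -> FeatureMap:
    """Rescale each channel by sigmoid(its global average), SE-style without learned layers."""
    out: FeatureMap = []
    for grid in fmap:
        mean = sum(sum(row) for row in grid) / (len(grid) * len(grid[0]))
        weight = sigmoid(mean)
        out.append([[v * weight for v in row] for row in grid])
    return out


def position_attention(fmap: FeatureMap) -> FeatureMap:
    """Rescale each pixel by sigmoid(row mean) * sigmoid(column mean), per channel."""
    out: FeatureMap = []
    for grid in fmap:
        h, w = len(grid), len(grid[0])
        row_w = [sigmoid(sum(row) / w) for row in grid]
        col_w = [sigmoid(sum(grid[y][x] for y in range(h)) / h) for x in range(w)]
        out.append([[grid[y][x] * row_w[y] * col_w[x] for x in range(w)] for y in range(h)])
    return out


def cpam_like(fmap: FeatureMap) -> FeatureMap:
    """Channel attention followed by position attention (hypothetical ordering)."""
    return position_attention(channel_attention(fmap))


features = [
    [[2.0, 2.0], [2.0, 2.0]],      # a strongly activated channel
    [[-2.0, -2.0], [-2.0, -2.0]],  # a weakly activated channel
]
print(cpam_like(features))
```

Even without learned weights, the channel step scales the strongly activated channel by about 0.88 and the weakly activated one by about 0.12. This is the qualitative behaviour attention is meant to provide.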