Residual Channel-attention (RCA) network for remote sensing image scene classification

Journal: Multimedia Tools and Applications, Volume: 84, Issue: 28
DOI: https://doi.org/10.1007/s11042-024-20546-8
Publication Date: 2025-01-14
Author(s): Ahmed Gomaa et al.
Primary Topic: Remote-Sensing Image Classification

Overview

The research paper presents a novel Residual Channel-Attention (RCA) network aimed at enhancing high-resolution remote sensing (HRRS) image scene classification. Traditional convolutional neural networks (CNNs) face challenges in capturing complex semantic relationships and long-distance dependencies in HRRS images, which often exhibit substantial intra-class variation and inter-class similarity. The RCA network addresses these limitations by incorporating a lightweight residual structure for improved multi-scale feature extraction and a channel attention mechanism that emphasizes relevant features while suppressing noise. Additionally, a squeeze-and-excitation (SE) mechanism is integrated to further refine feature selection.

Experimental evaluations on three public datasets (RSSCN7, PatternNet, and EuroSAT) yielded classification accuracies of 97%, 99%, and 96.3%, respectively, which the authors report as higher than existing state-of-the-art methods. Grad-CAM++ visualizations supported the effectiveness of the channel attention mechanism and the network’s robust feature representation. The findings suggest that the RCA network not only improves classification performance but also reduces computational complexity, making it suitable for challenging scenarios with limited training data. The authors recommend extending the RCA network to other remote sensing tasks and exploring domain adaptation techniques to improve robustness across environments.

Introduction

The introduction discusses advances in remote sensing technology, particularly improvements in temporal, spatial, and spectral resolution driven by advances in computing and a growing number of satellites. Traditional classification methods, which rely on low-level features, are becoming inadequate for the complex data generated by high-resolution remote sensing images. Consequently, there is a shift towards deep learning techniques, especially Convolutional Neural Networks (CNNs), which can automatically learn high-level features and capture intricate relationships across various resolutions. The paper emphasizes the importance of image classification in remote sensing applications, such as land resource management and environmental monitoring, while highlighting the challenges posed by redundant backgrounds and significant intra-class variability.

To address these challenges, the authors propose the Residual Channel-attention network (RCA), which integrates a lightweight residual network with a channel attention mechanism. This novel approach aims to enhance the extraction of discriminative features while minimizing background interference, thus improving classification performance. The RCA model is designed to learn end-to-end from input data to predictions, making it adaptable to diverse input distributions. The paper outlines four main contributions: the development of a self-attention deep network structure, a new architecture of residual connections, end-to-end learning capabilities, and validation of the model’s effectiveness through experiments on benchmark datasets. Future work will focus on domain adaptation strategies and uncertainty estimation to further enhance the robustness and generalization of the proposed network.

Methods

In this section, the authors detail the experimental methodology used to evaluate the proposed RCA network on three public remote sensing scene image datasets: RSSCN7, PatternNet, and EuroSAT. The model was implemented in TensorFlow and trained for 500 epochs with a batch size of 4, using the Adam optimizer with a fixed learning rate of 0.001. Data augmentation was applied during training, and the loss function combined cross-entropy loss ($L_{CE}$) and center loss ($L_{center}$), weighted by 1 and 0.008 respectively. The total loss was computed as $L = L_{CE} + \lambda L_{center}$, where the trade-off parameter $\lambda = 0.008$ sets the weight of the center loss.
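The combined loss can be illustrated with a small NumPy sketch. This is not the authors' TensorFlow code: the function names are hypothetical, and the center loss is assumed to take the common form of half the mean squared distance between each feature vector and its class center. The summary does not state the exact form used in the paper, nor how the class centers are updated, so the centers are simply passed in here.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def center_loss(features, labels, centers):
    """Assumed form: half the mean squared distance to each sample's class center."""
    diffs = features - centers[labels]
    return 0.5 * (diffs ** 2).sum(axis=1).mean()

def total_loss(logits, features, labels, centers, lam=0.008):
    """L = L_CE + lam * L_center, with lam = 0.008 as reported in the text."""
    return cross_entropy(logits, labels) + lam * center_loss(features, labels, centers)
```

With such a small $\lambda$, the center term acts as a mild regularizer that pulls features of the same class together without dominating the classification objective.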

The experimental results demonstrated the RCA network’s effectiveness across the datasets. On RSSCN7, the model attained an accuracy of 97% across seven categories, with the “Industry” category showing a lower accuracy of 93% because its features resemble those of other categories. On PatternNet, the proposed method achieved 99% accuracy, outperforming several state-of-the-art methods by margins of 2 to 7 percentage points. Similarly, on EuroSAT, the model reached an accuracy of 96.3%, surpassing existing approaches by 1.55 to 7.13 percentage points. These results underscore the robustness and generalization capabilities of the proposed RCA network in remote sensing image classification tasks.

Discussion

In the discussion section of the paper, the authors analyze various strategies for remote sensing image scene classification, categorizing them into low-level, mid-level, and high-level feature extraction methods. Low-level techniques utilize fundamental image properties, employing local descriptors like Histogram of Oriented Gradients (HOG) and Scale-Invariant Feature Transform (SIFT) for detailed structural representation, while global descriptors focus on overall texture and color. Mid-level methods, such as the Bag of Visual Words (BoVW), face challenges like quantization errors that can lead to information loss. In contrast, high-level feature extraction through deep learning, particularly Convolutional Neural Networks (CNNs), has shown significant advancements, automatically capturing abstract and discriminative features. Notable architectures like AlexNet, VGGNet, GoogLeNet, ResNet, and DenseNet have been developed to enhance feature extraction capabilities, although they still struggle with the complexities of remote sensing data.
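To make the quantization issue in BoVW concrete, the following NumPy sketch assigns each local descriptor to its nearest codeword and reports the mean squared distance lost in that assignment. It is a toy illustration only: the function name, the two-dimensional descriptors, and the fixed codebook are assumptions, and a real pipeline would typically learn the codebook by clustering descriptors such as SIFT.

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Hard-assign descriptors to nearest codewords; return counts and mean squared error."""
    # Pairwise squared distances, shape (n_descriptors, n_codewords)
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(codebook))
    # Everything that distinguishes a descriptor from its codeword is discarded
    quantization_error = d2[np.arange(len(descriptors)), nearest].mean()
    return hist, quantization_error
```

Two descriptors that fall in the same cell become indistinguishable in the histogram, which is the information loss the text refers to.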

The authors propose a novel Residual Channel-attention network (RCA) that integrates the strengths of CNNs and channel attention mechanisms to improve scene classification performance. The RCA network combines residual connections to facilitate multi-scale feature extraction with a squeeze-and-excitation (SE) mechanism that emphasizes important feature channels while suppressing irrelevant ones. This dual approach addresses the limitations of traditional CNNs, particularly in capturing long-distance dependencies and critical local features, thereby enhancing the model’s ability to classify complex remote sensing images accurately. The proposed architecture is designed to be lightweight and efficient, minimizing the risk of overfitting while maximizing classification accuracy across various datasets, including RSSCN7, PatternNet, and EuroSAT. The results indicate that the RCA network outperforms existing methods, achieving an overall accuracy of 99% on the PatternNet dataset, showcasing its effectiveness in remote sensing image classification.
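The combination of a residual connection with squeeze-and-excitation channel attention can be sketched as a NumPy forward pass. This is a minimal illustration, not the paper's architecture. The function names, the weight shapes, the bottleneck size, and the placement of the SE gate on the residual branch (as in SE-ResNet-style blocks) are all assumptions, and `transform` is a hypothetical stand-in for the block's convolutional layers.

```python
import numpy as np

def se_channel_attention(x, w1, w2):
    """x: feature map of shape (H, W, C). Rescales each channel by a gate in (0, 1)."""
    squeezed = x.mean(axis=(0, 1))                # squeeze: global average pool -> (C,)
    hidden = np.maximum(0.0, squeezed @ w1)       # excitation: bottleneck layer + ReLU
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w2)))  # second layer + sigmoid -> (C,)
    return x * gates                              # broadcast gates over H and W

def residual_se_block(x, transform, w1, w2):
    """Skip connection around `transform`, whose output is reweighted by SE gates."""
    return x + se_channel_attention(transform(x), w1, w2)
```

For a map with `C` channels and an assumed bottleneck of `r` units, `w1` has shape `(C, r)` and `w2` has shape `(r, C)`. Channels whose gate is close to 0 contribute little to the block's output, which is how irrelevant channels are suppressed while the skip path preserves the original signal.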