AER U-Net: attention-enhanced multi-scale residual U-Net structure for water body segmentation using Sentinel-2 satellite images

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-99322-z
PMID: https://pubmed.ncbi.nlm.nih.gov/40341885
Publication Date: 2025-05-08
Author(s): Naga Surekha Jonnala et al.
Primary Topic: Remote Sensing and LiDAR Applications

Overview

The study addresses the automatic segmentation of water bodies from remote-sensing satellite images, a critical task for water resource management and environmental monitoring. Traditional methods, including threshold-based and machine-learning techniques, often fail to delineate irregularly shaped water bodies accurately because of varying environmental conditions and image artifacts such as cloud cover and shadows. To address these limitations, the authors propose a deep learning model, the AER U-Net architecture, which improves segmentation accuracy by integrating residual blocks, self-attention mechanisms, and dropout layers.

The AER U-Net architecture employs a contracting path with convolutional layers, batch normalization, and activation functions to extract multi-scale features efficiently. Residual blocks mitigate the vanishing gradient problem, while dropout layers help prevent overfitting. The attention mechanism refines the skip connections, further improving segmentation performance. Trained with the Adam optimizer and a binary cross-entropy loss function, the model achieves an Intersection over Union (IoU) score of 0.94, indicating that it detects water bodies accurately even in challenging conditions. The authors conclude that AER U-Net is a robust tool for environmental applications, capable of processing high-resolution imagery and delivering precise segmentation results, particularly for water bodies near land boundaries.
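As an illustration of the binary cross-entropy loss mentioned above, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function name, the flat-list input format, and the clipping constant `eps` are assumptions made for this example.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over flat sequences of labels and probabilities.

    y_true: ground-truth labels, each 0 (non-water) or 1 (water).
    y_prob: predicted water probabilities in [0, 1].
    eps:    clipping constant (an assumption of this sketch) that avoids log(0).
    """
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip the probability away from 0 and 1
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)
```

A prediction of 0.5 for every pixel gives a loss of ln 2 (about 0.693) regardless of the labels, and the loss shrinks as the predicted probabilities move toward the correct labels.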

Methods

The authors outline the methodology for segmenting water bodies from Sentinel-2 satellite imagery, as depicted in Figure 1. The process begins with the acquisition of high-resolution multi-spectral data, which is then divided into training and testing sets to develop and evaluate the segmentation model. The dataset, sourced from the Kaggle repository titled “Satellite Images of Water Bodies,” is organized into two folders: Images and Masks. Masks are generated using the Normalized Difference Water Index (NDWI), a technique that exploits the spectral differences between water and non-water surfaces by comparing reflectance in the green and near-infrared (NIR) bands.

The NDWI is calculated using the formula:

\[
\mathrm{NDWI} = \frac{\mathrm{Band3} - \mathrm{Band8}}{\mathrm{Band3} + \mathrm{Band8}}
\]

where Band 3 corresponds to the green channel and Band 8 to the NIR channel of the Sentinel-2 satellite. This index capitalizes on the property that water absorbs infrared wavelengths while reflecting green light, facilitating effective detection. Figures 2 and 3 illustrate the workflow of the AER U-Net model and provide examples of water body images alongside their corresponding masks, respectively.
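The index and the resulting mask can be sketched in a few lines of plain Python. This is a hedged illustration rather than the procedure used to build the dataset: the function names, the nested-list band format, the handling of a zero denominator, and the threshold of 0.0 (a common convention, not stated in the source) are all assumptions.

```python
def ndwi(green, nir):
    """NDWI for a single pixel from green (Band 3) and NIR (Band 8) reflectance.

    Returns 0.0 when both bands are zero (an assumption of this sketch)
    to avoid dividing by zero.
    """
    denom = green + nir
    if denom == 0:
        return 0.0
    return (green - nir) / denom


def water_mask(green_band, nir_band, threshold=0.0):
    """Binary mask (1 = water, 0 = non-water) from two equally shaped 2-D lists.

    threshold is hypothetical: pixels whose NDWI exceeds it are labeled water.
    """
    return [
        [1 if ndwi(g, n) > threshold else 0 for g, n in zip(g_row, n_row)]
        for g_row, n_row in zip(green_band, nir_band)
    ]
```

For example, a pixel with green reflectance 0.3 and NIR reflectance 0.1 has an NDWI of about 0.5 and is labeled water, while a pixel with green 0.1 and NIR 0.4 has a negative NDWI and is labeled non-water.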

Results

The results section presents a comprehensive analysis of the proposed model’s performance in predicting water body areas from satellite images, as detailed in Table 3. The methodology involved preprocessing the images through resizing, scaling, and padding, followed by the application of a modified U-Net model designed to automatically learn relevant features across multiple hidden layers. The model was trained using the backpropagation algorithm, which facilitated effective feature extraction.
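Two of the preprocessing steps named above, scaling and padding, can be sketched as follows. The source does not give the value range, the target size, or the padding scheme, so the 8-bit maximum of 255, the zero fill value, and the bottom/right padding are assumptions of this example; resizing is omitted because it requires an interpolation choice the source does not specify.

```python
def scale_pixels(image, max_value=255.0):
    """Scale a 2-D list of pixel values to the range [0, 1].

    max_value=255.0 assumes 8-bit input, which the source does not state.
    """
    return [[px / max_value for px in row] for row in image]


def pad_to(image, height, width, fill=0.0):
    """Pad a 2-D image on the bottom and right up to (height, width).

    Bottom/right zero padding is a hypothetical choice for this sketch.
    """
    if not image or len(image) > height or len(image[0]) > width:
        raise ValueError("image must be non-empty and fit inside the target size")
    # Extend every existing row to the target width.
    padded = [list(row) + [fill] * (width - len(row)) for row in image]
    # Append rows of fill values until the target height is reached.
    padded += [[fill] * width for _ in range(height - len(image))]
    return padded
```

Padding to a fixed size lets images of different dimensions be batched together; padding, unlike resizing, leaves the original pixel values unchanged.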

Performance evaluation was conducted using several metrics, including Intersection over Union (IoU), Dice coefficient, precision, recall, F1-score, and accuracy, with results summarized in Table 7. The dataset was split into 80% for training and 20% for testing, allowing for a robust assessment of the model’s capabilities. The findings indicate that the proposed approach outperforms existing methods in the field, highlighting its effectiveness in accurately detecting water regions in satellite imagery.
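The metrics listed above all derive from the pixel-level confusion counts of a predicted mask against its ground-truth mask. The sketch below uses the standard definitions; the function names, the flat 0/1 list format, and the convention of returning 0.0 for an empty denominator are assumptions of this example, not details taken from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for flat binary masks (1 = water)."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def _ratio(num, den):
    """Return num / den, or 0.0 when the denominator is zero (sketch convention)."""
    return num / den if den else 0.0


def segmentation_metrics(y_true, y_pred):
    """IoU, Dice, precision, recall, F1-score, and accuracy for binary masks."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "iou": _ratio(tp, tp + fp + fn),
        "dice": _ratio(2 * tp, 2 * tp + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
    }
```

For a single binary mask, the Dice coefficient and the F1-score are the same quantity, so the two values agree whenever both are computed from the same pixel counts.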

Discussion

The discussion highlights the limitations of various deep learning models in image segmentation tasks involving diverse and complex visual data. Notably, existing architectures such as SegNet, MC-WBDN, and Deep U-Net show deficiencies in handling cluttered images, irregular shapes, and boundary detection. The analysis underscores the need for further advances in algorithmic capabilities, expanded training data, and the integration of more sophisticated techniques to improve the generalization and interpretability of these systems. The findings suggest that architectural improvements, such as attention mechanisms and residual blocks, can significantly improve segmentation accuracy, particularly in challenging scenarios like water body detection from satellite imagery.

The proposed AER U-Net model demonstrates a marked improvement over traditional architectures by integrating features that address common pitfalls, such as the vanishing gradient problem and overfitting. Through a comprehensive preprocessing pipeline and the use of the Adam (adaptive moment estimation) optimizer, the model achieves an Intersection over Union (IoU) score of 0.94, indicating superior performance in accurately delineating water bodies. The ablation studies reveal that combining attention-enhanced connections with residual mechanisms leads to substantial gains in precision, recall, and F1-score, affirming the model’s robustness and effectiveness in real-world applications. This research contributes valuable insights into the optimization of deep learning frameworks for environmental monitoring and resource management.