Improved convolutional neural networks for aircraft type classification in remote sensing images

Journal: IAES International Journal of Artificial Intelligence, Volume: 14, Issue: 2
DOI: https://doi.org/10.11591/ijai.v14.i2.pp1540-1547
Publication Date: 2025-02-17
Author(s): Yousef Alraba’nah et al.
Primary Topic: Advanced Measurement and Detection Methods

Overview

This section gives an overview of the growing importance of deep convolutional neural networks (CNNs) across applications, particularly in remote sensing imagery, which has gained prominence over the last decade. Aircraft classification in remote sensing images is difficult because of variations in resolution, size, and aircraft type, as well as complex backgrounds, and it therefore calls for robust models.

To address these challenges, the study introduces an enhanced CNN architecture that aims to improve classification accuracy while mitigating overfitting and underfitting. The model is validated on a newly established public dataset, Multi-Type Aircraft Remote Sensing Images 2 (MTARSI2). Experimental results show that the proposed CNN model outperforms existing state-of-the-art deep learning methods on aircraft classification tasks.

Introduction

The introduction of this research paper highlights the growing significance of satellite image processing, particularly in object detection and identification, driven by advancements in satellite technology that enable high-resolution imaging of Earth. This field encompasses various applications, including weather forecasting, agricultural monitoring, and military operations, where accurate target detection is crucial. Traditional object detection methods face challenges such as limited generalization and insufficient rotation invariance, primarily relying on low-level feature extraction. In contrast, deep learning techniques, particularly convolutional neural networks (CNNs), have emerged as superior alternatives, demonstrating enhanced accuracy and efficiency in tasks such as object identification and image segmentation.

The paper reviews several recent studies that leverage deep learning for aircraft detection in remote sensing images, noting the effectiveness of models like YOLOv3 and EfficientNet in achieving high detection rates with reduced processing times. It also discusses various architectural advancements in CNNs, including models such as ResNet and VGG, and their application in improving detection performance. The research aims to investigate the application of CNN architectures for aircraft classification, utilizing the newly introduced MTARSI2 dataset to enhance accuracy rates. The subsequent sections of the paper will outline the methodology, present results, and discuss findings, ultimately providing insights and recommendations for future research in this domain.

Methods

The proposed CNN architecture for aircraft type classification was evaluated on the MTARSI2 dataset on the Kaggle platform, using a GPU environment with 13 GB of RAM and 73 GB of disk space. Preprocessing consisted of resizing images to $224 \times 224$ pixels and normalizing pixel values to the range 0 to 1 through min-max scaling. The dataset was divided into 70% for training and 30% for testing, with the training portion further split into 90% for training and 10% for validation. The model was trained for up to 50 epochs using the categorical cross-entropy loss function and the Adamax optimizer with an initial learning rate of 0.001, a value determined empirically to balance convergence speed and model accuracy. Early stopping was used to prevent overfitting, halting training when validation performance declined.
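
The preprocessing, splitting, and early-stopping steps described above can be sketched in plain Python. This is a minimal illustration and not the authors' code: the helper names (`min_max_scale`, `split_indices`, `EarlyStopping`), the 8-bit pixel range, the random seed, and the patience value are all assumptions. Reading the 90%/10% split as applying to the training portion is also an assumption. The sample count of 10,483 is the dataset size reported in the Discussion.

```python
import random


def min_max_scale(pixels, lo=0.0, hi=255.0):
    """Map raw pixel values from [lo, hi] to [0, 1] (8-bit range assumed)."""
    span = hi - lo
    return [(p - lo) / span for p in pixels]


def split_indices(n_samples, test_frac=0.30, val_frac=0.10, seed=0):
    """Shuffle indices, hold out test_frac for testing, then hold out
    val_frac of the remaining training indices for validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_frac)
    test, remaining = indices[:n_test], indices[n_test:]
    n_val = round(len(remaining) * val_frac)
    val, train = remaining[:n_val], remaining[n_val:]
    return train, val, test


class EarlyStopping:
    """Signal a stop once validation accuracy has failed to improve for
    `patience` consecutive epochs (patience is a hypothetical setting)."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def should_stop(self, val_accuracy):
        if val_accuracy > self.best:
            self.best = val_accuracy
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


train_idx, val_idx, test_idx = split_indices(10_483)
print(len(train_idx), len(val_idx), len(test_idx))  # 6604 734 3145
```

The printed counts only illustrate the arithmetic under these rounding choices; they are not figures taken from the paper.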

The model reached a training accuracy of 100% and a validation accuracy of 97.9% at epoch 19. Precision and recall were also calculated, with detailed results presented in Table 1. In a comparison with state-of-the-art models, the proposed architecture achieved the highest accuracy, 93.21%, surpassing models such as ResNet50 and MobileNetV2. The architecture’s effectiveness is attributed to its feature extraction through depthwise separable convolutions and batch normalization, which also makes it suitable for real-time applications in resource-constrained environments. The study highlights the challenges posed by the MTARSI2 dataset, including variations in size, orientation, and lighting, and suggests that the proposed preprocessing techniques address these issues effectively. Future research will explore multi-task learning and the application of the model to other datasets for broader generalization.
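
To make the efficiency argument for depthwise separable convolutions concrete, the sketch below compares weight counts for a standard convolution and its depthwise separable counterpart. The formulas are the standard ones for these layer types; the function names and the layer shape (3 x 3 kernel, 64 input channels, 128 output channels) are hypothetical and are not taken from the paper's architecture.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise k x k convolution followed by a 1 x 1
    pointwise convolution (biases ignored)."""
    depthwise = kernel * kernel * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out            # 1 x 1 convolution mixes channels
    return depthwise + pointwise


# Hypothetical layer shape, chosen only for illustration.
kernel, c_in, c_out = 3, 64, 128
standard = standard_conv_params(kernel, c_in, c_out)
separable = depthwise_separable_params(kernel, c_in, c_out)
print(standard, separable)  # 73728 8768
```

For this illustrative layer the separable form needs roughly one eighth of the weights, which is the kind of saving that matters on resource-constrained hardware.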

Discussion

The discussion describes the architecture and operation of convolutional neural networks (CNNs), emphasizing their effectiveness in computer vision tasks such as image classification, segmentation, and object detection. CNNs are inspired by the organization of the visual cortex, which lets them extract hierarchical features through multiple layers. The architecture typically consists of convolutional, pooling, and fully connected layers: convolutional layers apply filters to produce feature maps that capture local patterns, while pooling layers downsample these maps to reduce dimensionality and computational cost. The rectified linear unit (ReLU) activation function, which passes positive values through unchanged and sets negative values to zero, is used to improve training speed and performance.
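
The convolution, ReLU, and pooling stages described above can be illustrated with a small pure-Python sketch. It is a toy example under stated assumptions, not the proposed architecture: the function names, the 5 x 5 input, and the 2 x 2 edge kernel are hypothetical, and real models use learned filters and an optimized library.

```python
def conv2d(image, kernel):
    """Valid cross-correlation of a 2D image with a 2D kernel (stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Keep positive values and replace negative values with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows or columns that do not fill a
    complete window are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy image with a bright vertical stripe, and a kernel that responds
# to dark-to-bright transitions from left to right.
image = [[0, 0, 1, 1, 0] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]

features = conv2d(image, kernel)  # each row: [0, 2, 0, -2]
activated = relu(features)        # each row: [0, 2, 0, 0]
pooled = max_pool(activated)      # [[2, 0], [2, 0]]
print(pooled)
```

The negative response at the bright-to-dark edge is removed by ReLU, and pooling halves each spatial dimension while keeping the strongest response in each window.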

The proposed CNN architecture is specifically designed for aircraft type classification using the MTARSI2 dataset, an enhanced version of the original MTARSI dataset. This dataset comprises 10,483 labeled remote sensing images of 40 different aircraft types, with variations in color, pose, and background. The researchers implemented data augmentation techniques to address class imbalance and improve model robustness. The architecture includes multiple convolutional layers with varying filter sizes and pooling layers, culminating in fully connected layers that classify the images into the respective aircraft types. The study underscores the importance of CNNs in effectively processing and classifying complex visual data.
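
One common way to use augmentation against class imbalance is to oversample the smaller classes with transformed copies. The sketch below shows that idea only; the source does not say which augmentations the authors used, so the flip and 90-degree rotation, the function names, and the class labels are all assumptions made for illustration.

```python
import random
from collections import Counter


def hflip(image):
    """Mirror a 2D image left to right."""
    return [row[::-1] for row in image]


def rotate90(image):
    """Rotate a 2D image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def balance_by_augmentation(samples, seed=0):
    """Add randomly augmented copies of minority-class images until every
    class has as many samples as the largest class.

    `samples` is a list of (image, label) pairs.
    """
    rng = random.Random(seed)
    by_label = {}
    for image, label in samples:
        by_label.setdefault(label, []).append(image)
    target = max(len(images) for images in by_label.values())
    balanced = list(samples)
    for label, images in by_label.items():
        for _ in range(target - len(images)):
            source = rng.choice(images)
            transform = rng.choice([hflip, rotate90])
            balanced.append((transform(source), label))
    return balanced


# Hypothetical two-class toy set: three images of "type_a", one of "type_b".
tiny = [[1, 2], [3, 4]]
samples = [(tiny, "type_a")] * 3 + [(tiny, "type_b")]
balanced = balance_by_augmentation(samples)
print(Counter(label for _, label in balanced))
```

Rotations are a natural choice for overhead imagery because aircraft can appear at any heading, but the right set of transforms depends on the data.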