SSATNet: Spectral-spatial attention transformer for hyperspectral corn image classification

Journal: Frontiers in Plant Science, Volume: 15
DOI: https://doi.org/10.3389/fpls.2024.1458978
PMID: https://pubmed.ncbi.nlm.nih.gov/39886680
Publication Date: 2025-01-16
Author(s): Bin Wang et al.
Primary Topic: Spectroscopy and Chemometric Analyses

Overview

The research paper presents a novel approach for hyperspectral corn image classification through the development of a spectral-spatial attention transformer network (SSATNet). This method addresses the challenges posed by the large volume and complex features of hyperspectral data, which often hinder accurate classification of corn seed varieties. SSATNet employs a combination of 3D and 2D convolutions to extract local spatial, spectral, and textural features, while also integrating spectral and spatial morphological structures to enhance understanding of the data’s internal characteristics. A transformer encoder with cross-attention further refines feature extraction from a global perspective, leading to improved classification performance compared to existing methods.

In conclusion, the SSATNet framework supports non-destructive identification of corn varieties from hyperspectral images by using a 3D-2D cascade structure to simplify the image data and extract local features. Incorporating spectral-spatial morphology through dilation and erosion operations improves the model’s ability to capture the intrinsic structure of the data. The transformer structure, which relies on self-attention, captures global dependencies within the corn spectra. Ablation studies confirm that each component contributes to feature extraction and classification, positioning SSATNet as a promising advance in intelligent agriculture.
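The morphological operations mentioned in this overview can be illustrated with a minimal sketch of grayscale dilation (a local maximum, which expands bright regions) and erosion (a local minimum, which shrinks them) applied to a single spectral band. The 3 x 3 square window, the border clipping, and the toy `band` values are assumptions chosen for illustration; the summary does not state the structuring element or the way SSATNet combines the two outputs.

```python
def _window(image, r, c, size):
    """Values in the size x size neighbourhood centred on (r, c), clipped at the borders."""
    half = size // 2
    rows, cols = len(image), len(image[0])
    return [
        image[i][j]
        for i in range(max(0, r - half), min(rows, r + half + 1))
        for j in range(max(0, c - half), min(cols, c + half + 1))
    ]


def dilate(image, size=3):
    """Grayscale dilation: each pixel becomes the maximum of its neighbourhood."""
    return [
        [max(_window(image, r, c, size)) for c in range(len(image[0]))]
        for r in range(len(image))
    ]


def erode(image, size=3):
    """Grayscale erosion: each pixel becomes the minimum of its neighbourhood."""
    return [
        [min(_window(image, r, c, size)) for c in range(len(image[0]))]
        for r in range(len(image))
    ]


# Toy 5 x 5 band with one bright pixel (illustrative values, not real data).
band = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

dilated = dilate(band)  # the bright pixel grows into a 3 x 3 block
eroded = erode(band)    # the isolated bright pixel is removed
```

In a hyperspectral setting the same functions would be applied band by band, so that dilation emphasises bright structures and erosion suppresses isolated responses before further feature extraction.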

Introduction

The introduction of the paper discusses the advances in and applications of hyperspectral imaging technology, which captures an object’s spectral properties across multiple bands and provides rich feature information for many fields, including intelligent agriculture. The technology helps address the challenge of accurately identifying corn seed varieties in a growing market, which highlights the importance of hyperspectral image classification techniques based on machine learning and deep learning. Notable contributions from recent studies include algorithms that improve classification accuracy while reducing computational overhead, such as autoencoders, deep support vector machines, and various convolutional neural network (CNN) architectures.

The paper emphasizes the transition from traditional machine learning methods, which rely on manual feature extraction, to deep learning approaches that automate this process and thereby improve classification performance. It reviews various deep learning frameworks, including hybrid CNN-Transformer models and specialized architectures designed to capture both spectral and spatial features. The authors propose a novel approach that combines 3D-2D convolution and Transformer architectures to exploit spectral-spatial morphological features for identifying hyperspectral corn seed varieties. Key contributions include a 3D-2D convolutional cascade structure for feature extraction, a spectral-spatial morphology structure for improved data understanding, and a Transformer Encoder with Cross-Attention for global feature refinement.

Methods

The methodology section outlines the architecture and dataset used for the proposed Spectral-Spatial Attention Transformer Network (SSATNet) for hyperspectral corn image classification. The network comprises four key components: a 3D-2D Convolutional Module, Spectral-Spatial Morphology, a Transformer Encoder with Cross-Attention, and a Classifier, as depicted in Figure 1.

To evaluate the performance of SSATNet, the researchers employed a hyperspectral corn image dataset sourced from SSTNet (Zhang et al., 2022b), which includes images of 10 distinct corn varieties, each represented by 120 samples. The dataset spans a spectral range of 400 to 1000 nm across 128 bands. To optimize computational efficiency while retaining essential features, the original image resolution of 696 × 520 was downsampled to 210 × 200. The corn seed images were collected from various planting areas in Henan Province, featuring varieties such as FengDa601, BaiYu9284, and BaiYu607, among others. Figure 3 illustrates the spectral band maps for selected samples from the dataset.
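The dataset figures quoted above imply a few derived quantities that are easy to check with a short calculation. The sketch below uses only the numbers stated in this summary; the evenly spaced band centres with both endpoints included are an assumption, since the summary gives only the spectral range and the band count.

```python
# Figures taken from the dataset description above.
num_varieties = 10
samples_per_variety = 120
num_bands = 128
wavelength_min_nm, wavelength_max_nm = 400.0, 1000.0
original_size = (696, 520)
downsampled_size = (210, 200)

# Total number of hyperspectral samples across all varieties.
total_samples = num_varieties * samples_per_variety

# Spatial pixel counts before and after downsampling, and the reduction factor.
original_pixels = original_size[0] * original_size[1]
downsampled_pixels = downsampled_size[0] * downsampled_size[1]
pixel_reduction = original_pixels / downsampled_pixels

# Number of values in one downsampled data cube (height x width x bands).
values_per_cube = downsampled_pixels * num_bands

# ASSUMPTION: bands are evenly spaced and include both range endpoints.
band_spacing_nm = (wavelength_max_nm - wavelength_min_nm) / (num_bands - 1)

print(f"samples: {total_samples}")
print(f"pixel reduction factor: {pixel_reduction:.2f}")
print(f"values per cube: {values_per_cube}")
print(f"approximate band spacing: {band_spacing_nm:.2f} nm")
```

Under these figures, downsampling reduces the number of spatial pixels per band by a factor of roughly 8.6, which is the source of the computational saving the authors describe.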

Results

The results of extensive experiments on hyperspectral corn image classification demonstrate the effectiveness of the proposed model compared to various established methods, including KNN, SGD, RFA, HybridNet, SSTNet, CTMixer, MSTNet, MATNet, and 3DCT. As shown in Table 1, traditional machine learning models such as KNN, RFA, and SGD exhibited poor performance across all evaluation metrics, with RFA performing the worst due to its inability to effectively extract deep spectral-spatial features. In contrast, models that utilize 3D convolution, like HybridNet, SSTNet, and 3DCT, achieved superior results by capturing both spectral and spatial features simultaneously.

Furthermore, models leveraging the Transformer architecture, including CTMixer, MSTNet, and MATNet, effectively addressed the complex relationships in hyperspectral data. Notably, the proposed model, which integrates convolutional networks with Transformers and employs a novel spectral-spatial attention mechanism, outperformed all comparison methods across all metrics, significantly enhancing Precision, Recall, F1-Score, and Kappa. These findings underscore the model’s capability to capture intricate spectral-spatial features and its superior generalization to high-dimensional datasets, establishing it as a state-of-the-art approach in hyperspectral image classification.
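For readers unfamiliar with the reported metrics, the following sketch computes Precision, Recall, F1-Score, and Cohen's Kappa from a confusion matrix. Macro averaging over classes is an assumption, because the summary does not say how the per-class scores are aggregated, and the 3-class `toy_confusion` matrix is made up for illustration and is not taken from the paper.

```python
def classification_metrics(confusion):
    """Macro precision/recall/F1 and Cohen's kappa.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    true_counts = [sum(row) for row in confusion]
    pred_counts = [sum(confusion[i][j] for i in range(n)) for j in range(n)]

    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        p = tp / pred_counts[k] if pred_counts[k] else 0.0
        r = tp / true_counts[k] if true_counts[k] else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    # Kappa compares observed agreement with agreement expected by chance.
    observed = sum(confusion[k][k] for k in range(n)) / total
    expected = sum(true_counts[k] * pred_counts[k] for k in range(n)) / total ** 2
    kappa = (observed - expected) / (1 - expected) if expected != 1 else 0.0

    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
        "kappa": kappa,
    }


# Made-up 3-class confusion matrix (NOT results from the paper).
toy_confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 1, 9],
]
metrics = classification_metrics(toy_confusion)
print(metrics)
```

Kappa is useful alongside accuracy-style metrics because it discounts the agreement a classifier would reach by chance given the class frequencies.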

Discussion

The discussion section of the research paper emphasizes the integration of hyperspectral imaging technology and deep learning, particularly through the use of transformer models, for the classification of corn seeds. This approach is critical for enhancing agricultural productivity and preserving genetic resources. Hyperspectral imaging captures detailed spectral information across various wavelengths, which, when combined with the powerful feature extraction capabilities of transformer models, allows for improved classification accuracy. The self-attention mechanism inherent in transformers enables the model to focus on significant areas within hyperspectral images, thereby optimizing the use of spectral data for corn seed classification.
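The attention mechanism described above can be sketched as scaled dot-product attention. This is a simplified illustration and not the authors' implementation: the learned query, key, and value projections and the multi-head structure of a real transformer are omitted, the toy `tokens` are invented, and using a single `class_token` as the query for cross-attention is an assumption about one common way such a module is wired.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention; returns (outputs, weights)."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [
            sum(w[j] * values[j][t] for j in range(len(values)))
            for t in range(len(values[0]))
        ]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Toy token embeddings (illustrative only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Self-attention: queries, keys, and values all come from the same tokens.
self_out, self_weights = attention(tokens, tokens, tokens)

# Cross-attention (assumed wiring): one query attends over all tokens.
class_token = [[0.5, 0.5]]
cross_out, cross_weights = attention(class_token, tokens, tokens)
```

Each row of `self_weights` shows how strongly one token attends to every other token, which is the sense in which the model can focus on the most informative regions of the input.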

The paper also details the implementation of a 3D-2D convolution module, which leverages both 3D and 2D convolutions to enhance feature extraction. The 3D convolution captures complex spectral-spatial relationships, while the 2D convolution efficiently extracts local spatial features. This combination is essential for maintaining computational efficiency while maximizing classification performance. Additionally, the integration of spectral-spatial morphology techniques with convolutional methods further refines feature extraction by enhancing spatial structures and managing spectral relationships. The proposed SSATNet method, which incorporates these advanced techniques, demonstrates significant improvements in classification accuracy and robustness against noise, thereby contributing to the field of intelligent agriculture.
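To make the 3D-2D cascade more concrete, the sketch below traces tensor shapes through one 3D convolution, a reshape that folds the spectral axis into the channel axis, and one 2D convolution. All sizes here (the 30-band 13 x 13 input patch, the kernel sizes, and the channel counts) are hypothetical values chosen for illustration; the summary does not give the layer configuration of SSATNet.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def conv3d_shape(shape, out_channels, kernel):
    """Shape after a 3D convolution on (channels, bands, height, width)."""
    _, d, h, w = shape
    kd, kh, kw = kernel
    return (out_channels, conv_out(d, kd), conv_out(h, kh), conv_out(w, kw))


def fold_spectral(shape):
    """Merge the channel and spectral axes so a 2D convolution can follow."""
    c, d, h, w = shape
    return (c * d, h, w)


def conv2d_shape(shape, out_channels, kernel):
    """Shape after a 2D convolution on (channels, height, width)."""
    _, h, w = shape
    kh, kw = kernel
    return (out_channels, conv_out(h, kh), conv_out(w, kw))


# HYPOTHETICAL input patch: 1 channel, 30 spectral bands, 13 x 13 pixels.
patch = (1, 30, 13, 13)

after_3d = conv3d_shape(patch, out_channels=8, kernel=(7, 3, 3))
folded = fold_spectral(after_3d)
after_2d = conv2d_shape(folded, out_channels=64, kernel=(3, 3))

print(after_3d)  # joint spectral-spatial features
print(folded)    # spectral axis folded into channels
print(after_2d)  # spatial features from the cheaper 2D convolution
```

The 3D stage mixes information along the spectral and spatial axes together, while the 2D stage operates only spatially on the folded channels, which reflects the efficiency trade-off described above.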