Cross-Domain Hyperspectral Image Classification Based on Bi-Directional Domain Adaptation

Journal: IEEE Transactions on Circuits and Systems for Video Technology, Volume: 35, Issue: 12
DOI: https://doi.org/10.1109/tcsvt.2025.3586282
Publication Date: 2025-07-07
Author(s): Yuxiang Zhang et al.
Primary Topic: Remote Sensing and Land Use

Overview

The paper presents a Bi-directional Domain Adaptation (BiDA) framework for cross-domain hyperspectral image (HSI) classification. The framework addresses the spectral shift that land cover classes exhibit when training and testing data are acquired from different regions or at different times. The architecture is a triple-branch transformer with a source branch, a target branch, and a coupled branch, which together extract both domain-invariant features and domain-specific information. A Coupled Multi-head Cross-attention (CMCA) mechanism enables feature interaction and inter-domain correlation mining, while a bi-directional distillation loss guides learning in the adaptive space.

Extensive experiments on cross-temporal and cross-scene datasets indicate that BiDA significantly outperforms existing state-of-the-art domain adaptation methods, with improvements of 3% to 5% in the cross-temporal tree species classification task. In addition, an Adaptive Reinforcement Strategy (ARS) encourages the model to focus on generalized feature extraction in the presence of noise, which supports the use of the framework in real-world applications.

Introduction

The introduction discusses the significance of hyperspectral images (HSIs) in applications such as resource management and biodiversity monitoring, emphasizing their ability to differentiate materials through subtle spectral information. HSI classification is a core task in remote sensing, yet existing single-scene classification methods struggle when training and testing data do not share the same distribution, particularly in cross-scene or cross-temporal tasks. Variations in illumination and spectral characteristics between acquisitions, a phenomenon known as spectral shift, produce this discrepancy and lead to high generalization error and poor classification performance.

To address these challenges, the paper proposes the Bi-directional Domain Adaptation (BiDA) framework, which uses a triple-branch architecture consisting of source, target, and coupled branches. The source and target branches independently learn adaptive spaces for their respective domains, while a Coupled Multi-head Cross-attention (CMCA) mechanism mines inter-domain correlations in both directions. A bi-directional distillation loss is introduced to guide training, and an Adaptive Reinforcement Strategy (ARS) is designed to improve the extraction of intra-domain generalized features. The authors expect BiDA to improve the adaptability and classification performance of HSI models across varying domains and conditions.
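The summary does not give the form of the bi-directional distillation loss, so the following is only an illustrative sketch of one common way to distill between branches. It assumes, hypothetically, that the coupled branch acts as a teacher whose temperature-softened class distribution is matched by both the source and the target branch through a KL divergence. The function names, the temperature, and the choice of KL divergence are assumptions for illustration, not the formulation used in the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of class logits."""
    scaled = [v / temperature for v in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def bidirectional_distillation(coupled_logits, source_logits, target_logits,
                               temperature=2.0):
    """Hypothetical loss: coupled-branch predictions guide both domain branches."""
    teacher = softmax(coupled_logits, temperature)
    to_source = kl_divergence(teacher, softmax(source_logits, temperature))
    to_target = kl_divergence(teacher, softmax(target_logits, temperature))
    return to_source + to_target

# Toy logits for a three-class problem (made-up values).
loss = bidirectional_distillation(
    coupled_logits=[2.0, 0.5, -1.0],
    source_logits=[1.5, 0.7, -0.5],
    target_logits=[0.2, 1.8, -0.3],
)
```

In this sketch the loss is zero when all three branches agree and grows as either domain branch drifts away from the coupled branch.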

Methods

In this section, the authors present the experimental results and analysis of the proposed BiDA method on three datasets: the MFF cross-temporal airborne dataset, the Houston cross-temporal airborne dataset, and the HyRANK cross-scene satellite dataset. These datasets are used to assess the effectiveness of BiDA in fine-grained cross-domain classification tasks. Several comparison algorithms, including transformer-based methods and unsupervised deep domain adaptation methods, serve as benchmarks. The evaluation metrics are per-class Classification Accuracy (CA), Overall Accuracy (OA), and the Kappa Coefficient (KC). All algorithms are trained on labeled source domain (SD) data and unlabeled target domain (TD) data.
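As a minimal sketch of how these three metrics are conventionally computed from a confusion matrix, the example below uses the standard definitions of overall accuracy and Cohen's kappa. Treating CA as the per-class recall is an assumption about the convention used here, and the function name and the toy confusion matrix are made up for illustration.

```python
def classification_metrics(confusion):
    """Compute (CA, OA, kappa) from a square confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[i][j] for i in range(n_classes))
                for j in range(n_classes)]
    diagonal = [confusion[k][k] for k in range(n_classes)]

    # CA (assumed convention): fraction of each true class that is labeled correctly.
    ca = [diagonal[k] / row_sums[k] if row_sums[k] else 0.0
          for k in range(n_classes)]
    # OA: fraction of all samples that are labeled correctly.
    oa = sum(diagonal) / total
    # Kappa: agreement corrected for the agreement expected by chance.
    expected = sum(row_sums[k] * col_sums[k] for k in range(n_classes)) / (total * total)
    kappa = (oa - expected) / (1.0 - expected) if expected < 1.0 else 0.0
    return ca, oa, kappa

# Toy two-class example with imbalanced class sizes (100 vs. 20 samples).
confusion = [[90, 10],
             [5, 15]]
ca, oa, kappa = classification_metrics(confusion)
# ca is [0.9, 0.75], oa is 0.875, kappa is 13/22 (about 0.591)
```

The toy matrix also shows why OA alone can be misleading on imbalanced data: the majority class dominates OA, while CA and kappa expose the weaker minority class.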

The MFF dataset, collected at Mengjiagang Forest Farm in Heilongjiang Province, China, covers five tree species and consists of hyperspectral data captured by an aircraft-mounted AISA Eagle II imager. Although the acquisitions fall within a short growth period, the temporal difference between them produces significant spectral variability. The Houston dataset comprises airborne imagery from 2013 and 2018 with a consistent set of classes across both years, while the HyRANK dataset consists of satellite hyperspectral imagery with 12 classes, focusing on various vegetation types. The paper reports the detailed distribution of labeled samples for each dataset, highlighting the challenge posed by imbalanced sample sizes, particularly in the MFF dataset.

Discussion

The discussion section elaborates on the design and functionality of the proposed BiDA framework for hyperspectral image (HSI) classification. The framework uses a semantic tokenizer that generates spatial-spectral tokens from HSI data through a learned spatial-spectral projection rather than traditional patch-based splitting. The tokenizer outputs compact vocabulary sets of tokens, which are then processed by a triple-branch encoder. The encoder consists of source and target branches that capture intra-domain correlations, and a coupled multi-head cross-attention (CMCA) branch that explores inter-domain correlations, thereby addressing spectral shifts between the source and target domains.
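To make the idea of inter-domain cross-attention concrete, the sketch below applies plain scaled dot-product attention in both directions between two small token sets. It is a deliberate simplification and not the CMCA module itself: it uses a single head, omits the learned query, key, and value projections, and any coupling between the two directions is left out. The token values and function names are illustrative assumptions.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys_values):
    """Each query token attends over the tokens of the other domain.

    Keys and values share the same vectors here because no learned
    projections are applied in this simplified sketch.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in keys_values]
        weights = softmax(scores)
        outputs.append([sum(w * kv[d] for w, kv in zip(weights, keys_values))
                        for d in range(dim)])
    return outputs

# Toy token sets: 3 source tokens and 2 target tokens of dimension 2.
source_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target_tokens = [[0.5, 0.5], [2.0, 0.0]]

# Direction 1: source tokens query the target domain.
src_from_tgt = cross_attention(source_tokens, target_tokens)
# Direction 2: target tokens query the source domain.
tgt_from_src = cross_attention(target_tokens, source_tokens)
```

Each output token is a weighted average of the other domain's tokens, so the source representation is rewritten in terms of target content and vice versa, which is the sense in which the interaction is bi-directional.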

The framework incorporates an Adaptive Reinforcement Strategy (ARS) to strengthen the extraction of domain-generalized features while reducing the effect of spectral shifts. ARS applies different types of noise to the input tokens and enforces intra-domain consistency constraints. The overall training loss combines a classification loss, a maximum mean discrepancy (MMD) loss that aligns the marginal distributions of the two domains, the bi-directional distillation loss, and a consistency loss, which together support learning of domain-invariant representations. The results indicate that BiDA significantly outperforms existing methods in classification accuracy across the evaluated datasets, demonstrating the effectiveness of its architecture and the importance of tokenization tailored to HSI data.
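The loss terms described above can be sketched as follows, under several stated assumptions. The MMD term uses the standard biased estimator with a Gaussian kernel. The noise is Gaussian and the consistency term is a mean squared difference, applied here directly to token vectors as a stand-in for encoder features. The loss weights are hypothetical hyperparameters. None of these choices is confirmed by the summary, which does not specify the kernel, the noise types, or the weighting.

```python
import math
import random

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mmd_squared(source, target, bandwidth=1.0):
    """Biased estimate of the squared MMD between two feature sets."""
    def mean_kernel(xs, ys):
        return sum(rbf_kernel(x, y, bandwidth)
                   for x in xs for y in ys) / (len(xs) * len(ys))
    return (mean_kernel(source, source) + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))

def add_gaussian_noise(tokens, std, rng):
    """Perturb every token entry with zero-mean Gaussian noise (assumed noise type)."""
    return [[v + rng.gauss(0.0, std) for v in token] for token in tokens]

def consistency_loss(clean, noisy):
    """Mean squared difference between clean and perturbed representations."""
    count = sum(len(token) for token in clean)
    return sum((a - b) ** 2
               for c, z in zip(clean, noisy)
               for a, b in zip(c, z)) / count

def total_loss(classification, mmd, distillation, consistency,
               weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the four terms; the weights are hypothetical."""
    w_mmd, w_dis, w_con = weights
    return classification + w_mmd * mmd + w_dis * distillation + w_con * consistency

# Toy features for two domains with a clear shift between them.
source_features = [[0.0, 0.0], [1.0, 0.0]]
target_features = [[5.0, 5.0], [6.0, 5.0]]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
noisy_source = add_gaussian_noise(source_features, std=0.1, rng=rng)

mmd = mmd_squared(source_features, target_features)
consistency = consistency_loss(source_features, noisy_source)
loss = total_loss(classification=0.7, mmd=mmd, distillation=0.2,
                  consistency=consistency)
```

In a real training loop the consistency term would compare the encoder outputs for clean and perturbed tokens rather than the tokens themselves, and the classification and distillation values would come from the network instead of the fixed numbers used here.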