Feature Importance-Aware Deep Joint Source-Channel Coding for Computationally Efficient and Adjustable Image Transmission

Journal: IEEE Internet of Things Journal
DOI: https://doi.org/10.1109/jiot.2026.3680582
Publication Date: 2026-01-01
Author(s): H M Choi et al.
Primary Topic: Wireless Signal Modulation Classification

Overview

The research paper introduces the Feature Importance-Aware deepJSCC (FAJSCC) model, which enhances image transmission efficiency while allowing for adjustable computational complexity. FAJSCC employs axis-dimension specialized computation to perform operations tailored to each spatial and channel axis, thereby reducing computational costs without sacrificing feature representation. Additionally, it incorporates a selective deformable self-attention mechanism that focuses on important features, enabling the model to capture complex correlations effectively. Notably, FAJSCC is the first deepJSCC architecture that permits independent control over the computational resources of both the encoder and decoder, adapting to varying computational budgets.
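
As a rough illustration of why treating the spatial and channel axes separately can cut cost, the sketch below counts multiply-accumulate operations (MACs) for a standard convolution versus a depthwise convolution (spatial axes only) followed by a pointwise 1x1 convolution (channel axis only). This is a generic cost model, not the FAJSCC layer itself; the feature-map size, channel count, and kernel size are hypothetical values chosen for the example.

```python
def conv_macs(height, width, c_in, c_out, kernel):
    """MACs of a standard kernel x kernel convolution (stride 1, 'same' padding)."""
    return height * width * c_in * c_out * kernel * kernel


def separable_macs(height, width, c_in, c_out, kernel):
    """MACs of a depthwise convolution followed by a pointwise 1x1 convolution."""
    depthwise = height * width * c_in * kernel * kernel  # one filter per input channel
    pointwise = height * width * c_in * c_out            # mixes channels at each position
    return depthwise + pointwise


# Hypothetical feature map: 32 x 32 positions, 256 channels, 3 x 3 kernel.
h, w, c, k = 32, 32, 256, 3
standard = conv_macs(h, w, c, c, k)
separable = separable_macs(h, w, c, c, k)
ratio = separable / standard  # equals 1/c_out + 1/k**2

print(f"standard: {standard:,} MACs, separable: {separable:,} MACs, ratio: {ratio:.4f}")
```

Under these assumptions the separable form needs a fraction 1/C_out + 1/k^2 of the standard convolution's MACs, roughly 11.5% for the values used here.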

Experimental results demonstrate that FAJSCC outperforms existing state-of-the-art models, such as SwinJSCC, in image transmission performance across diverse channel conditions while maintaining lower computational complexity. The analysis reveals that the decoder’s processing of noisy features is the most computationally intensive aspect. Future work aims to enhance FAJSCC by dynamically allocating computational resources to the decoder’s signal perception module based on channel conditions, thereby optimizing efficiency and adaptability in practical communication scenarios. The code for FAJSCC is publicly accessible at github.com/hansung-choi/FAJSCCv2.

Introduction

The introduction discusses advances in high-performance embedded hardware and next-generation wireless communication technologies, particularly 6G, which have facilitated the proliferation of Internet of Things (IoT) devices across sectors such as smart healthcare and smart cities. Vision IoT devices, such as surveillance cameras and UAVs, are highlighted for their critical role in providing visual data for security and safety monitoring. However, the growing number of these devices has led to bandwidth constraints, particularly for devices that transmit large volumes of data such as visual information. The paper argues that traditional separation-based digital communication systems cannot meet the demand for high-quality vision data transmission, which motivates a shift towards joint source-channel coding (JSCC) methods, particularly those built on deep learning (deepJSCC).

The authors propose a novel framework called Feature Importance-Aware deepJSCC (FAJSCC) to address the challenges of computational complexity and performance in deepJSCC systems. FAJSCC employs axis-dimension specialized computation and selective deformable self-attention to improve computational efficiency while maintaining high transmission performance. The framework allows computational complexity to be adjusted dynamically and independently at the transmitter and the receiver, which is crucial for applications with varying requirements. The introduction also outlines the significance of the findings: the framework makes it possible to analyze computational complexity in deepJSCC systems, and it reveals that the decoder is sensitive to how computational resources are allocated, which provides insight for future work on efficient communication systems.

Methods

In this section, the authors describe the experimental setup and methodologies used to evaluate their proposed framework, FAJSCC, alongside various baseline architectures. Two versions of FAJSCC are implemented: the full version, which includes selective deformable self-attention, and a simplified version termed light-attention deepJSCC (LAJSCC), which omits this module to assess computational efficiency. The study also compares FAJSCC against established deepJSCC variants, including ConvJSCC, ResJSCC, SwinJSCC, and LICRFJSCC, with specific scaling configurations based on prior research. The architecture details are summarized in Table I, and the implementation code is made publicly available.

To contextualize FAJSCC’s performance, the authors compare it with modern separation-based methods under an AWGN channel at a signal-to-noise ratio (SNR) of 10 dB, using rate-2/3 LDPC codes for channel coding and 16-QAM for modulation. Performance is quantified with Bjøntegaard delta (BD) metrics, which assess the percentage reduction in coding bits per pixel (CPP) and the improvements in peak signal-to-noise ratio (PSNR) and multiscale structural similarity (MS-SSIM). Results indicate that FAJSCC achieves competitive performance, particularly in terms of BD-CPP and BD-MS-SSIM, outperforming both BPG and VTM, especially under structural similarity evaluations. Notably, FAJSCC also achieves lower latency, making it a promising solution for real-time image transmission in IoT applications.
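
For context on the separation-based baseline, the following sketch works out the information rate implied by 16-QAM with a rate-2/3 code and compares it with the Shannon capacity of a complex AWGN channel at 10 dB. It assumes the SNR is expressed in dB and ignores finite-blocklength effects; the function names are illustrative only.

```python
import math


def awgn_capacity_bits(snr_db):
    """Shannon capacity of a complex AWGN channel in bits per channel use."""
    snr_linear = 10.0 ** (snr_db / 10.0)
    return math.log2(1.0 + snr_linear)


def coded_modulation_rate(bits_per_symbol, code_rate):
    """Information bits carried per channel use by a coded constellation."""
    return bits_per_symbol * code_rate


bits_16qam = int(math.log2(16))                  # 16-QAM carries 4 coded bits per symbol
rate = coded_modulation_rate(bits_16qam, 2 / 3)  # about 2.667 information bits per use
capacity = awgn_capacity_bits(10.0)              # log2(11), about 3.459 bits per use

print(f"baseline rate: {rate:.3f} bits/use, AWGN capacity at 10 dB: {capacity:.3f} bits/use")
```

The chosen operating point sits below capacity, which leaves the margin a practical LDPC code needs to decode reliably at this SNR.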

Discussion

In this section, the authors present a comprehensive formulation of a point-to-point image transmission system utilizing deep Joint Source-Channel Coding (deepJSCC) over noisy channels, specifically focusing on the additive white Gaussian noise (AWGN) and fast Rayleigh fading channels. The deepJSCC encoder maps an RGB source image to a channel input while adhering to a power constraint. The performance of the system is evaluated through the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM), which assess pixel-wise distortion and structural accuracy, respectively. The authors emphasize the importance of minimizing distortion for applications such as wireless surveillance, and they propose a computationally efficient deepJSCC model that balances image quality and resource usage.
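
The system model described above can be sketched in a few lines. The example below is a minimal plain-Python illustration, assuming unit average transmit power, complex baseband symbols, and a fading gain drawn independently per symbol for the fast Rayleigh case. The encoder output is replaced by arbitrary hypothetical values, and all function names are illustrative rather than taken from the paper's code.

```python
import math
import random


def power_normalize(symbols, power=1.0):
    """Scale complex symbols so that their average power equals `power`."""
    avg = sum(abs(s) ** 2 for s in symbols) / len(symbols)
    scale = math.sqrt(power / avg)
    return [s * scale for s in symbols]


def complex_gaussian(rng, variance):
    """One sample from a circularly symmetric complex Gaussian CN(0, variance)."""
    sigma = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def awgn_channel(symbols, snr_db, rng, power=1.0):
    """y = x + n, with the noise variance set by the target SNR in dB."""
    noise_var = power / (10.0 ** (snr_db / 10.0))
    return [s + complex_gaussian(rng, noise_var) for s in symbols]


def fast_rayleigh_channel(symbols, snr_db, rng, power=1.0):
    """y = h * x + n with an independent gain h ~ CN(0, 1) for every symbol."""
    noise_var = power / (10.0 ** (snr_db / 10.0))
    gains = [complex_gaussian(rng, 1.0) for _ in symbols]
    received = [g * s + complex_gaussian(rng, noise_var) for g, s in zip(gains, symbols)]
    return received, gains


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


rng = random.Random(0)
# Hypothetical encoder output: 4096 arbitrary complex values standing in for a real latent.
latent = [complex(rng.uniform(-3, 3), rng.uniform(-3, 3)) for _ in range(4096)]
x = power_normalize(latent)
y_awgn = awgn_channel(x, snr_db=10.0, rng=rng)
y_fading, fading_gains = fast_rayleigh_channel(x, snr_db=10.0, rng=rng)
```

A decoder would take `y_awgn`, or `y_fading` together with whatever channel knowledge it is assumed to have, and produce the reconstructed image that `psnr` then scores against the source.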

The proposed FAJSCC framework incorporates advanced techniques such as axis-dimension specialized computation and selective deformable self-attention to enhance computational efficiency while maintaining high performance. By employing depthwise and pointwise convolutions, the model reduces computational complexity without sacrificing representational capability. The selective deformable self-attention mechanism allows for adaptive processing of important features, thereby optimizing resource allocation. The authors also introduce an attention family tree to streamline the extraction of attention features, offsets, and window importance, further minimizing computational overhead. The training process utilizes Gumbel-Softmax sampling to facilitate gradient flow while optimizing a feature importance-aware loss function, ensuring effective learning of the model’s parameters. Overall, the proposed methods aim to achieve superior image transmission performance with reduced computational costs, marking a significant advancement in deepJSCC methodologies.
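
The Gumbel-Softmax step mentioned above can be sketched as follows. This is a minimal forward-pass illustration in plain Python, assuming a two-way choice per window (skip or apply attention). The logits, the temperature, and the two-way framing are hypothetical and not taken from the paper. In an autodiff framework the same relaxed sample is differentiable with respect to the logits, which is what lets gradients flow through a discrete-looking selection; PyTorch, for instance, provides `torch.nn.functional.gumbel_softmax`.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample via the inverse-CDF trick."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # clamp to avoid log(0)
    return -math.log(-math.log(u))


def gumbel_softmax(logits, temperature, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    perturbed = [(l + sample_gumbel(rng)) / temperature for l in logits]
    peak = max(perturbed)                            # subtract the max for numerical stability
    exps = [math.exp(p - peak) for p in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
# Hypothetical per-window logits for the two options [skip, apply attention].
window_logits = [0.2, 1.5]
soft_choice = gumbel_softmax(window_logits, temperature=1.0, rng=rng)

# With well-separated logits and a low temperature the sample is close to one-hot.
sharp_choice = gumbel_softmax([10.0, -10.0], temperature=0.1, rng=rng)
```

Lower temperatures push the sample towards a hard selection, while higher temperatures give smoother outputs.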