Multi-Hop Deep Joint Source-Channel Coding With Deep Hash Distillation for Semantically Aligned Image Recovery

Journal: ICASSP 2026 – 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: https://doi.org/10.1109/icassp55912.2026.11462344
Publication Date: 2026-04-21
Author(s): D. E. BERGSTROM et al.
Primary Topic: Advanced Data Compression Techniques

Overview

The authors study image transmission with deep joint source-channel coding (DeepJSCC) over multi-hop additive white Gaussian noise (AWGN) channels. They augment the DeepJSCC encoder-decoder pair with a pre-trained deep hash distillation (DHD) module, which clusters images semantically. This improves perceptual reconstruction quality and, through better semantic consistency, supports security-oriented applications. Training minimizes both the mean squared error (MSE) between the source and reconstructed images and the cosine distance between their DHD hashes. The results show that the proposed method significantly mitigates the noise accumulation that classical DeepJSCC typically suffers from, as evidenced by better learned perceptual image patch similarity (LPIPS) scores across multi-hop configurations.
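
The training objective described above can be sketched in NumPy as follows. This is only an illustration under assumptions, not the authors' implementation: the function names are hypothetical, the hashes are treated as real-valued vectors, and using a weight `lam` on the hash term (with the value 0.06 borrowed from the regularization parameter reported in the Methods) is an assumption about how the two terms are combined.

```python
import numpy as np

def cosine_distance(a, b, eps=1e-12):
    """Return 1 - cosine similarity, computed row-wise for batches of vectors."""
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
    return 1.0 - num / den

def combined_loss(source, recon, hash_src, hash_rec, lam=0.06):
    """MSE between images plus lam times the mean cosine distance between hashes.

    Assumption: lam weights the hash term; the summary does not state this.
    """
    mse = np.mean((source - recon) ** 2)
    cos = np.mean(cosine_distance(hash_src, hash_rec))
    return mse + lam * cos
```

With a perfect reconstruction and identical hashes the loss is zero; if the hashes point in opposite directions, the cosine distance reaches its maximum of 2, so the hash term contributes `2 * lam`.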

The findings indicate that the multi-hop DeepJSCC-DHD scheme improves the semantic alignment and perceptual quality of reconstructed images in both decode-and-forward (DF) and quantize-and-forward (QF) relay settings. The DHD module strengthens semantic clustering and also provides robustness against quantization effects. The approach thus extends DeepJSCC with semantic alignment, which facilitates perception-oriented training and enables secure authentication applications.

Introduction

In the introduction, the authors discuss the limitations of conventional communication systems, which typically separate source coding from channel coding. Shannon’s separation theorem shows that this separation is asymptotically optimal, but it falls short in practical settings with finite block lengths. The authors highlight the advantages of joint source-channel coding (JSCC), particularly the DeepJSCC framework, which uses deep learning to outperform traditional separation-based methods. However, DeepJSCC struggles in multi-hop relaying scenarios because noise accumulates across hops, which degrades the quality of reconstructed images and complicates cryptographic authentication.

To address these issues, the authors propose a novel architecture that integrates Deep Hash Distillation (DHD) into the DeepJSCC framework, facilitating semantic communication by focusing on the underlying meaning of the data. This integration allows for semantic clustering, which helps mitigate noise accumulation and enhances perceptual quality in multi-hop decode-and-forward (DF) relaying. The paper further explores the effects of channel output quantization on semantic alignment in multi-hop quantize-and-forward (QF) relaying. The findings suggest that the proposed approach not only improves the robustness of communication systems against noise but also preserves the semantic integrity of the transmitted data, thereby enabling security-oriented applications in noisy environments.

Methods

In this section, the authors describe the experimental setup for their study, utilizing a subset of the NUS-WIDE dataset, which comprises 9,450 training images, 1,050 validation images, and 2,100 test images, all with a resolution of $256 \times 256$. The images are associated with multi-hot encoded class labels across 21 classes. A pre-trained ResNet50 model serves as the encoder, yielding a feature dimension of $N_E = 2048$. The hash length is set to $N_H = 64$ bits, and various parameters, including the bandwidth ratio $\rho = \frac{1}{3}$ and regularization parameter $\lambda = 0.06$, are specified based on previous works.
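
To make the bandwidth ratio concrete, the short calculation below works out how many channel uses $\rho = \frac{1}{3}$ would correspond to for one image. It assumes the definition common in the DeepJSCC literature, $\rho = k/n$ with $n$ the number of source values and $k$ the number of channel uses, and it assumes three color channels per image. Neither assumption is stated in this summary, and `channel_uses` is a hypothetical helper.

```python
from fractions import Fraction

def channel_uses(height, width, channels, rho):
    """Return (n, k): n source values per image and k = rho * n channel uses."""
    n = height * width * channels
    k = rho * n
    return n, k

# Assumption: RGB images (3 channels) at the stated 256 x 256 resolution.
n, k = channel_uses(256, 256, 3, Fraction(1, 3))
print(n, k)  # 196608 65536
```

Using `Fraction` keeps the result exact, so it is easy to check that the ratio yields a whole number of channel uses.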

Training uses the Adam optimizer with a learning rate of $10^{-4}$, adjusted by a multiplicative learning rate scheduler. To ensure a fair comparison, the baseline models use identical hyperparameter settings and architectures. For decode-and-forward (DF) relaying, mini-batch sizes are adjusted according to the number of relays to prevent memory overflow on the GPU. The models are trained and tested at signal-to-noise ratios (SNRs) of -5, -10, and -15 dB, which results in 12 distinct DF models. For the quantize-and-forward (QF) multi-hop setting, K-means clustering is applied to the validation outputs to compute quantization centers, and the test set is then evaluated through a quantization-dequantization process, with results detailed in subsequent sections.
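
The QF evaluation step can be illustrated with a small NumPy sketch: fit quantization centers on validation outputs with K-means, then quantize and dequantize test outputs against those centers. Everything specific here is an assumption made for illustration: the quantizer is scalar (applied per element), the number of centers is arbitrary, the data are random stand-ins, and the function names are hypothetical. The summary does not say how the authors configure K-means.

```python
import numpy as np

def fit_centers(samples, num_centers, iters=50, seed=0):
    """Plain Lloyd's k-means on scalar samples; returns the centers, sorted."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(samples, size=num_centers, replace=False)
    for _ in range(iters):
        # Assign every sample to its nearest center.
        idx = np.argmin(np.abs(samples[:, None] - centers[None, :]), axis=1)
        for j in range(num_centers):
            members = samples[idx == j]
            if members.size:  # keep the old center if a cluster is empty
                centers[j] = members.mean()
    return np.sort(centers)

def quantize(values, centers):
    """Map each value to the index of its nearest center."""
    return np.argmin(np.abs(values[..., None] - centers), axis=-1)

def dequantize(indices, centers):
    """Look the center value back up from its index."""
    return centers[indices]

rng = np.random.default_rng(1)
val_outputs = rng.normal(size=5000)   # stand-in for validation-set channel outputs
test_outputs = rng.normal(size=1000)  # stand-in for test-set channel outputs
centers = fit_centers(val_outputs, num_centers=8)
recovered = dequantize(quantize(test_outputs, centers), centers)
```

Fitting the centers on validation data and applying them to test data mirrors the split described above, so the test set never influences the quantizer.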

Discussion

In this section, the authors discuss a multi-hop wireless image transmission framework that uses Deep Joint Source-Channel Coding (DeepJSCC) and Deep Hash Distillation (DHD) to enhance semantic alignment in reconstructed images. The problem is formulated as transmitting a source image $ S $ through $ r $ relay nodes connected by additive white Gaussian noise (AWGN) channels. Performance is evaluated using average Peak Signal-to-Noise Ratio (PSNR) and Learned Perceptual Image Patch Similarity (LPIPS), the latter of which aligns better with human perception. The proposed scheme integrates a DHD module that generates semantic hashes to improve image reconstruction quality, particularly in terms of perceptual similarity.
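
The noise accumulation that motivates this formulation can be seen in a toy model. The sketch below passes a signal over several AWGN links with naive forwarding, meaning each relay retransmits what it received without any decoding or power normalization. This is deliberately not the DF or QF relaying studied in the paper; it only illustrates why unprocessed forwarding degrades with the number of hops. The function names and the random stand-in signal are hypothetical.

```python
import numpy as np

def noise_std_for_snr(signal_power, snr_db):
    """Noise standard deviation that gives the requested SNR for this signal power."""
    return np.sqrt(signal_power / 10 ** (snr_db / 10))

def forward_through_hops(signal, num_hops, snr_db, rng):
    """Toy model: add independent Gaussian noise of equal power on every hop."""
    std = noise_std_for_snr(np.mean(signal ** 2), snr_db)
    out = signal
    for _ in range(num_hops):
        out = out + rng.normal(scale=std, size=signal.shape)
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)  # stand-in for unit-power channel symbols
for hops in (1, 2, 4):
    y = forward_through_hops(x, hops, snr_db=-5, rng=rng)
    print(hops, np.mean((y - x) ** 2))  # error grows roughly linearly with hops
```

Because the per-hop noise terms are independent, their variances add: at -5 dB each hop contributes noise power of about 3.16 times the signal power in this model.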

The experimental results indicate that the proposed DeepJSCC-DHD scheme achieves higher perceptual quality (lower LPIPS scores) than the baseline methods, but at the cost of PSNR, particularly as the number of hops increases. This suggests that the system prioritizes semantic alignment over pixel-wise accuracy. In the Quantize-and-Forward (QF) setting, the proposed method maintains stable semantic alignment even under high noise conditions, demonstrating its robustness and potential applicability in security-oriented contexts. Overall, the findings highlight the effectiveness of combining DeepJSCC with DHD for enhancing semantic alignment in multi-hop relaying systems, paving the way for future applications in perceptually driven and secure image transmission.
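
For reference, the pixel-wise side of this trade-off is easy to compute. The sketch below implements the standard PSNR definition, $10 \log_{10}(\text{peak}^2 / \text{MSE})$, assuming image values normalized to $[0, 1]$; that normalization is an assumption, as the summary does not state the pixel range. LPIPS is not sketched because it depends on a pre-trained network.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, peak]."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

A uniform error of 0.1 on a unit-range image gives an MSE of 0.01 and therefore a PSNR of 20 dB. Higher PSNR means smaller pixel-wise error, whereas lower LPIPS means closer perceptual similarity.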