Contrastive calibration on consensus and complementary multi-view representations

Journal: Pattern Recognition, Volume: 176
DOI: https://doi.org/10.1016/j.patcog.2026.113291
Publication Date: 2026-02-11
Author(s): Negin Jabari et al.
Primary Topic: Domain Adaptation and Few-Shot Learning

Overview

The paper presents C4MV, a framework for multi-view representation learning (MRL) that targets a key limitation of existing methods: most focus on consensus learning and neglect the complementary information carried by view-specific representations. C4MV learns consensus and complementary representations together by combining joint and disjoint self-representation factorizations, using coordinated nonnegative matrix factorizations with a diversity regularizer that reduces redundancy across views. It also introduces a contrastive calibration regularizer, which aligns intra- and inter-view representations through contrastive graph constraints, improving sample-level discriminability and reducing dependence on negative pairs.
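To make the idea of shared and view-specific factors concrete, here is a minimal sketch of a nonnegative factorization in which every view is approximated by a shared (consensus) code plus a view-specific (complementary) code. It is not the authors' algorithm: the function name, the parameters `k_shared` and `k_specific`, and the plain Frobenius objective with multiplicative updates are assumptions made for illustration, and the diversity and contrastive calibration regularizers described above are left out.

```python
import numpy as np

def shared_specific_nmf(views, k_shared=3, k_specific=2, n_iter=200, eps=1e-9, seed=0):
    """Illustrative sketch: X_v ~= W_v @ H_c + U_v @ H_s[v] with all factors >= 0.

    views: list of nonnegative arrays, each of shape (d_v, n_samples).
    H_c is shared by all views; H_s[v] belongs to view v only.
    """
    rng = np.random.default_rng(seed)
    n = views[0].shape[1]
    H_c = rng.random((k_shared, n))
    W = [rng.random((X.shape[0], k_shared)) for X in views]
    U = [rng.random((X.shape[0], k_specific)) for X in views]
    H_s = [rng.random((k_specific, n)) for _ in views]

    def recon(v):
        return W[v] @ H_c + U[v] @ H_s[v]

    def loss():
        return sum(np.linalg.norm(X - recon(v)) ** 2 for v, X in enumerate(views))

    history = [loss()]
    for _ in range(n_iter):
        for v, X in enumerate(views):
            # Multiplicative updates keep every factor nonnegative.
            W[v] *= (X @ H_c.T) / (recon(v) @ H_c.T + eps)
            U[v] *= (X @ H_s[v].T) / (recon(v) @ H_s[v].T + eps)
            H_s[v] *= (U[v].T @ X) / (U[v].T @ recon(v) + eps)
        # The shared code pools numerator and denominator over all views.
        num = sum(W[v].T @ X for v, X in enumerate(views))
        den = sum(W[v].T @ recon(v) for v in range(len(views)))
        H_c *= num / (den + eps)
        history.append(loss())
    return H_c, H_s, W, U, history

# Usage on two synthetic views of the same 30 samples (random data, for illustration).
rng = np.random.default_rng(1)
views = [rng.random((8, 30)), rng.random((5, 30))]
H_c, H_s, W, U, history = shared_specific_nmf(views)
print(f"reconstruction loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

In this sketch the shared code `H_c` is the only quantity coupled across views, which is what makes it a consensus representation; the per-view codes in `H_s` are free to absorb whatever the shared part cannot explain.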

Evaluations on nine diverse real-world datasets show that C4MV consistently outperforms 12 state-of-the-art unsupervised MRL methods, indicating that it captures both shared and view-specific features effectively. The framework does require careful hyperparameter tuning, which may limit its adaptability. Future research directions include adaptive regularization techniques, an extension of C4MV to semi-supervised settings, and anchor graph-based methods for improved scalability. The authors also point to combining C4MV with transform learning-inspired architectures or contrastive dictionary fusion as routes to broader representation learning applications.

Introduction

The introduction discusses the growing interest in multi-view representation learning (MRL), which integrates diverse data sources describing the same entity. Methods developed for MRL include Canonical Correlation Analysis, Subspace Clustering, Graph-based learning, and Nonnegative Matrix Factorization (NMF). Recent work has focused on incomplete multi-view problems, on enhancing agreement across views, and on tensor learning techniques. NMF in particular has emerged as a powerful tool for clustering and representing multi-view data, and Collective Matrix Factorization (CMF) improves performance further by jointly factorizing matrices that share entities.

The paper highlights the limitations of existing consensus-based methods, which often overlook unique view-specific information and so produce redundant, suboptimal representations. To address this, the authors propose a self-representation model, Contrastive Calibration on Consensus and Complementary Multi-View representations (C4MV). The model combines feature-level consensus and complementary representation learning with sample-level discrimination through contrastive learning. Joint and disjoint encoder-decoder factorizations capture both shared and view-specific information, and contrastive graph regularization strengthens discriminative capability by aligning similar samples while separating dissimilar ones. Experimental results show improved clustering performance across multiple datasets.
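One common way to express "align similar samples, separate dissimilar ones" with graphs is a pair of Laplacian penalties: one over a graph of positive pairs that pulls embeddings together, and one over a graph of negative pairs that pushes them apart. The sketch below illustrates that general idea only. The graph construction (nearest neighbours as positives, farthest samples as negatives), the weight `beta`, and all function names are assumptions, since this summary does not give the paper's exact formulation.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between rows of X (n_samples, n_features)."""
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.maximum(D, 0.0)

def knn_graphs(X, k=3):
    """Assumed construction: k nearest samples are positives, k farthest are negatives."""
    D = pairwise_sq_dists(X)
    np.fill_diagonal(D, -np.inf)          # forces each sample to sort first in its own row
    n = D.shape[0]
    order = np.argsort(D, axis=1)
    rows = np.repeat(np.arange(n), k)
    A_pos = np.zeros((n, n))
    A_neg = np.zeros((n, n))
    A_pos[rows, order[:, 1:k + 1].ravel()] = 1.0   # column 0 is the sample itself
    A_neg[rows, order[:, -k:].ravel()] = 1.0
    # Symmetrize so both graphs are undirected.
    return np.maximum(A_pos, A_pos.T), np.maximum(A_neg, A_neg.T)

def laplacian(A):
    """Unnormalized graph Laplacian L = D - A."""
    return np.diag(A.sum(axis=1)) - A

def contrastive_graph_penalty(H, A_pos, A_neg, beta=0.5):
    """H has shape (k, n_samples); columns are sample embeddings.

    tr(H L H^T) equals 0.5 * sum_ij A_ij * ||h_i - h_j||^2 for symmetric A,
    so the first term shrinks distances between positive pairs and the
    subtracted term rewards distance between negative pairs.
    """
    pull = np.trace(H @ laplacian(A_pos) @ H.T)
    push = np.trace(H @ laplacian(A_neg) @ H.T)
    return pull - beta * push

# Usage on random data (for illustration only).
rng = np.random.default_rng(0)
X = rng.random((12, 4))
H = rng.random((2, 12))
A_pos, A_neg = knn_graphs(X, k=3)
print(contrastive_graph_penalty(H, A_pos, A_neg))
```

A penalty of this form can be added to a factorization objective as a regularizer on the learned codes; how C4MV weights and builds its graphs should be taken from the paper itself.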

Methods

This section describes the experimental evaluation of C4MV on nine multi-view datasets. Performance is measured with four standard metrics: Accuracy (ACC), Normalized Mutual Information (NMI), Adjusted Rand Index (ARI), and F1-score, which together cover clustering effectiveness, label correlation, pairwise agreement, and the balance between precision and recall. Results are averaged over 10 runs of 400 iterations each and compared against 12 established multi-view representation methods. K-means clustering is applied to the representations learned by every method except CCNMF, which uses spectral clustering.
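Because cluster indices are arbitrary, clustering accuracy is usually computed after matching predicted clusters to ground-truth classes, while ARI compares pairwise co-assignments and needs no matching. The sketch below shows both under simple assumptions: the function names are illustrative, and the brute-force search over label permutations is only practical for a handful of clusters (an assignment solver such as the Hungarian algorithm is the usual choice at scale). It is not the evaluation code used in the paper.

```python
from collections import Counter
from itertools import permutations
from math import comb

def clustering_accuracy(y_true, y_pred):
    """Fraction of samples correct under the best one-to-one cluster-to-class mapping."""
    true_ids = sorted(set(y_true))
    pred_ids = sorted(set(y_pred))
    # Pad with dummy targets so every predicted cluster can be mapped somewhere.
    targets = true_ids + [None] * max(0, len(pred_ids) - len(true_ids))
    counts = Counter(zip(y_pred, y_true))
    best = 0
    for perm in permutations(targets, len(pred_ids)):
        matched = sum(counts[(p, t)] for p, t in zip(pred_ids, perm))
        best = max(best, matched)
    return best / len(y_true)

def adjusted_rand_index(y_true, y_pred):
    """ARI from the contingency table of the two labelings."""
    n = len(y_true)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(y_true, y_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(y_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(y_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case (e.g. a single cluster), treated as perfect agreement
    return (sum_cells - expected) / (max_index - expected)

# Usage: the prediction is correct up to a renaming of cluster labels.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]
print(clustering_accuracy(y_true, y_pred), adjusted_rand_index(y_true, y_pred))
```

In a protocol like the one described above, these functions would be called on the K-means labels from each run and the scores averaged across runs.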

The section also outlines the baseline methods against which C4MV is evaluated. These include DiNMF, which enhances feature distinctiveness; LP-DiNMF, which preserves local geometric structures; and several others that rely on strategies such as co-orthogonal constraints, non-redundancy regularization, and dual-graph approaches to optimize multi-view representation. Each method is described briefly, with its main contribution to multi-view clustering. The paper's subsequent subsections detail the datasets, baseline methods, main experimental results, parameter sensitivity, an ablation study, and convergence behavior.

Results

The results section evaluates the clustering performance of the proposed C4MV model against 12 state-of-the-art multi-view representation methods across nine real-world datasets. The findings, summarized in Table 3, indicate that C4MV consistently outperforms its competitors, achieving the best results in 32 out of 36 evaluation cases (nine datasets × four metrics). On average, C4MV surpasses the second-best method by approximately 3.2% in Normalized Mutual Information (NMI), 2.14% in Accuracy (ACC), 5.8% in Adjusted Rand Index (ARI), and 1.74% in F1-score, demonstrating its effectiveness in extracting discriminative and consensus-preserving representations.

Notably, on datasets with moderate redundancy and complementary structures, such as 3Sources and BBCSport, C4MV improves on all metrics: ACC rises from 0.672 to 0.704 on 3Sources, and NMI reaches 0.550 on BBCSport. The model also performs well on visual datasets such as Caltech101 and Coil100, including a 6.5% increase in ARI on Coil100. On graph-based datasets such as Cora and subsets of WebKB, C4MV again achieves top performance, with ARI improvements of 7.4% on Texas and 4.8% on Wisconsin. Overall, these results support the C4MV design, which separates consensus and complementary information and thereby improves clustering stability and clarity across diverse modalities.

Discussion

In the discussion section, the authors review the evolution of multi-view learning approaches, focusing on Nonnegative Matrix Factorization (NMF) and its adaptations for representation and clustering. They highlight foundational works that have shaped the field, such as Liu et al.’s model, which aligns multiple views through a shared consensus matrix, and later advances that integrate manifold learning and diversity constraints to extract more complementary information from the different views. The authors emphasize the importance of balancing shared and view-specific representations in order to capture the complexity of multi-view data.

The section also discusses the integration of modern techniques, such as autoencoders and contrastive learning, into multi-view representation learning. The authors note that while these methods improve alignment and feature extraction, they often compromise interpretability and may not adequately preserve view-specific information. To address these limitations, C4MV combines contrastive calibration with NMF, which allows a clear distinction between shared and complementary components while maintaining nonnegativity and low-rank constraints. The approach aims to unite the interpretability of factorization-based models with the alignment capabilities of contrastive learning, offering a more coherent and effective solution for multi-view data representation.