Dimensions underlying the representational alignment of deep neural networks with humans

Journal: Nature Machine Intelligence, Volume: 7, Issue: 6
DOI: https://doi.org/10.1038/s42256-025-01041-7
PMID: https://pubmed.ncbi.nlm.nih.gov/40567352
Publication Date: 2025-06-23
Author(s): Florian P Mahner et al.
Primary Topic: Neural dynamics and brain function

Methods

The study takes a quantitative approach, using statistical analyses to evaluate data collected across several experiments. These included controlled laboratory experiments in which variables were systematically manipulated to observe their effects on the outcomes of interest.

Data were collected with standardized instruments to support reliability and validity, and the results were then subjected to statistical testing. Techniques such as regression analysis and ANOVA were applied to assess the significance of the findings and the relationships between the variables studied. Overall, the methods were designed to yield robust, reproducible results.

Results

The authors compare human and deep neural network (DNN) representations using a triplet odd-one-out task, in which the goal is to pick the image that differs from the other two in a set of three. For the DNN, the authors constructed a dot-product similarity space from the network's features, identified the most similar pair within each triplet, and designated the remaining image as the odd one out. For humans, the odd-one-out choices came from observed behavior, which served as a measure of their cognitive representations.
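The DNN side of the odd-one-out rule described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `odd_one_out`, the toy feature matrix, and the assumption that the three indices are distinct are all hypothetical.

```python
import numpy as np

def odd_one_out(features, triplet):
    """Return the image index in `triplet` that is the odd one out.

    features: (n_images, n_features) array of DNN activations (toy data here).
    triplet:  three distinct image indices (i, j, k).
    """
    i, j, k = triplet
    # Dot-product similarity of each pair, keyed by the image left out of the pair.
    pair_similarity = {
        k: float(features[i] @ features[j]),
        j: float(features[i] @ features[k]),
        i: float(features[j] @ features[k]),
    }
    # The image left out of the most similar pair is the odd one out.
    return max(pair_similarity, key=pair_similarity.get)

# Toy example: images 0 and 1 have nearly identical features, image 2 differs.
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
choice = odd_one_out(features, (0, 1, 2))
```

Here images 0 and 1 form the most similar pair, so `choice` is image 2. Ties between pairs are resolved arbitrarily in this sketch.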

To facilitate comparison, the authors learned lower-dimensional embeddings for both humans and the DNN, optimized to predict the choices made in the triplet task. They imposed sparsity and non-negativity constraints on the embeddings to enhance interpretability, reflecting the idea that real-world objects are characterized by a limited number of properties. The optimization yielded stable embeddings with 70 dimensions for the DNN and 68 for humans, capturing 84.03% and 82.85% of the total variance in image similarity, respectively. Notably, the human embedding accounted for 91.20% of the explainable variance, indicating that it came close to the empirical noise ceiling of the dataset.
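One way to picture an objective of this kind is a softmax over the three pairwise similarities in a triplet, combined with a sparsity penalty and a non-negativity step. The sketch below is an assumption-laden illustration, not the authors' optimization procedure: the function `triplet_loss`, the L1 penalty, the clipping at zero, and the weight `l1_weight` are all hypothetical choices made here for clarity.

```python
import numpy as np

def triplet_loss(X, triplets, l1_weight=0.01):
    """Mean cross-entropy over triplets plus an L1 sparsity penalty.

    X:        (n_images, n_dims) embedding matrix.
    triplets: (n_triplets, 3) integer array; in each row (i, j, k) the pair
              (i, j) was chosen as most similar, so k is the odd one out.
    """
    X = np.maximum(X, 0.0)  # assumed way of enforcing non-negativity
    i, j, k = triplets.T
    # Dot-product similarity of the three possible pairs in each triplet.
    logits = np.stack(
        [np.sum(X[i] * X[j], axis=1),
         np.sum(X[i] * X[k], axis=1),
         np.sum(X[j] * X[k], axis=1)],
        axis=1,
    )
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    # Log-probability that the chosen pair (i, j) is the most similar one.
    log_p = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    sparsity = l1_weight * np.abs(X).sum() / X.shape[0]  # assumed L1 penalty
    return -log_p.mean() + sparsity

# Toy embedding: images 0 and 1 share a dimension, image 2 uses another one.
X = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
loss_consistent = triplet_loss(X, np.array([[0, 1, 2]]))
loss_inconsistent = triplet_loss(X, np.array([[0, 2, 1]]))
```

A choice that agrees with the embedding (image 2 as the odd one out) yields a lower loss than one that contradicts it, which is the signal an optimizer would follow when fitting the embedding to behavioral or DNN-derived choices.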

Discussion

The authors discuss the interpretability of the DNN dimensions in relation to human judgments in the triplet odd-one-out task. They found that while the DNN embedding captured both semantic and visual properties of objects, it showed a notable bias towards visual dimensions, in contrast to the predominantly semantic dimensions identified in human judgments. Evaluations by independent raters confirmed this, revealing that DNN dimensions were less interpretable and often reflected a mixture of visual and semantic information. The study also used several techniques, including Grad-CAM and generative models, to analyze how specific image properties contribute to the DNN dimensions, demonstrating that certain visual attributes strongly influenced the embeddings.

The authors also report a moderate correlation (Pearson’s r = 0.55) between the human and DNN representational similarity matrices, indicating some alignment but also substantial differences in how each system categorizes images. The analysis showed that while humans relied heavily on semantic dimensions for categorization, the DNN tended to prioritize visual features. This divergence in representational strategies carried over to behavioral choices, where similar outcomes could arise from different underlying representations. The authors propose a framework for directly comparing human and AI representations, suggesting that understanding these differences can inform the development of more human-like AI systems and improve their alignment with human cognition.
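A correlation between two representational similarity matrices (RSMs) is commonly computed over their off-diagonal entries. The sketch below assumes that convention and assumes each RSM is the dot product of an embedding with itself; the function `rsm_correlation` and the random toy embeddings are hypothetical and do not reproduce the reported value.

```python
import numpy as np

def rsm_correlation(rsm_a, rsm_b):
    """Pearson correlation between the upper triangles of two square RSMs."""
    # k=1 skips the diagonal, which holds self-similarities.
    upper = np.triu_indices_from(rsm_a, k=1)
    return np.corrcoef(rsm_a[upper], rsm_b[upper])[0, 1]

# Toy embeddings for 6 images; dimensionalities are arbitrary placeholders.
rng = np.random.default_rng(0)
emb_human = rng.random((6, 4))
emb_dnn = rng.random((6, 3))

rsm_human = emb_human @ emb_human.T
rsm_dnn = emb_dnn @ emb_dnn.T
r = rsm_correlation(rsm_human, rsm_dnn)
```

Because the RSMs are symmetric, using only the upper triangle avoids counting each image pair twice.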