A foundation model to predict and capture human cognition

Journal: Nature, Volume: 644, Issue: 8078
DOI: https://doi.org/10.1038/s41586-025-09215-4
PMID: https://pubmed.ncbi.nlm.nih.gov/40604288
Publication Date: 2025-07-02
Author(s): Marcel Binz et al.
Primary Topic: Neural and Behavioral Psychology Studies

Overview

In the conclusion of the study, the authors revisit a concern raised when a unified model of cognition was first proposed: that established areas of cognitive science might reject such a model as incompatible with existing theories. The cognitive decathlon was proposed in response to this concern. It is a rigorous evaluation framework in which competing cognitive models are compared across a series of ten experiments.

The authors report that their model, Centaur, was subjected to the equivalent of 16 cognitive decathlons, in which it was compared against various established models and won in every instance. They take this consistent success as evidence that data-driven approaches can uncover domain-general models of cognition, and they suggest that the next phase of research should build a unified theory of human cognition on this computational model.

Introduction

To compare Centaur against domain-specific cognitive models, the authors selected 14 cognitive and statistical models as baselines. Together, these baselines cover the majority of experiments in Psych-101. The analysis focused on predicting the behavior of participants who were not included in the training data. For this purpose, the authors fitted a joint set of parameters across all training participants and then measured predictive performance on held-out participants using the average negative log-likelihood, where lower values indicate better predictions.

For out-of-distribution evaluations, the parameters of the baseline models were fitted on the most similar experiment in the training set. For instance, parameters for the magic-carpet version of the two-step task came from a two-step task experiment with the default spaceship cover story, while parameters for Maggie’s farm came from the horizon task. The authors did not include a baseline model for the logical reasoning task because the training data contained no comparable experiment.
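The held-out evaluation described above can be illustrated with a deliberately tiny example. This is a minimal sketch, not the authors' code: the "model" is a hypothetical one-parameter choice bias shared by all training participants, and the function names (`fit_shared_bias`, `average_nll`) and all data values are invented for illustration.

```python
import math


def fit_shared_bias(training_choices):
    """Fit one parameter jointly across all training participants.

    The parameter is the probability of choosing option 0; its maximum
    likelihood estimate is the pooled fraction of option-0 choices.
    """
    pooled = [c for participant in training_choices for c in participant]
    p0 = pooled.count(0) / len(pooled)
    eps = 1e-6  # keep the estimate away from 0 and 1 so logs stay finite
    return min(max(p0, eps), 1.0 - eps)


def average_nll(predicted_probs, choices):
    """Mean negative log-likelihood of the observed choices (lower is better)."""
    if len(predicted_probs) != len(choices):
        raise ValueError("one choice per trial is required")
    total = 0.0
    for probs, choice in zip(predicted_probs, choices):
        total -= math.log(probs[choice])
    return total / len(choices)


# Hypothetical data: two training participants, one held-out participant.
training_choices = [[0, 0, 1, 0], [0, 1, 0, 0]]
held_out_choices = [0, 1, 0, 0]

p0 = fit_shared_bias(training_choices)
predictions = [[p0, 1.0 - p0] for _ in held_out_choices]
held_out_nll = average_nll(predictions, held_out_choices)
chance_nll = math.log(2)  # NLL of guessing uniformly between two options

print(f"shared bias: {p0:.2f}")
print(f"held-out NLL: {held_out_nll:.4f} (chance: {chance_nll:.4f})")
```

The parameter is estimated only from the training participants and is then scored on choices it has never seen, so a model that merely memorizes individual participants gains nothing on this metric.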

Discussion

The study presents Centaur, a foundation model of human cognition obtained by fine-tuning the Llama 3.1 70B language model on Psych-101, a dataset that covers a large array of human behavioral data. Fine-tuning used a parameter-efficient technique known as quantized low-rank adaptation (QLoRA), which adds low-rank adapters to the model rather than updating all of its weights. Training masked out non-human responses, so that Centaur was fitted only to what participants actually did. In the evaluations, Centaur predicted human responses better than both the base Llama model and the domain-specific cognitive models across multiple experimental paradigms.

Centaur’s robustness was further tested with open-loop simulations and out-of-distribution evaluations, in which it maintained high predictive accuracy under novel experimental conditions. It generalized beyond its training data, capturing human behavior in settings with different cover stories and task structures. Centaur’s internal representations also showed alignment with human neural activity, suggesting that fine-tuning on behavioral data enhances its cognitive modeling capabilities. Overall, the authors position Centaur as a versatile tool for understanding decision-making processes and for supporting scientific discovery in cognitive science.
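To make the idea of a low-rank adapter concrete, the sketch below shows the forward pass of a single adapted linear layer in plain Python. It is an illustration under stated assumptions, not the authors' training setup: the dimensions, rank, scale, and weight values are hypothetical, and the quantization of the frozen base weights that distinguishes QLoRA from plain low-rank adaptation is omitted entirely.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, scale=1.0):
    """Compute y = W x + scale * B (A x).

    W is the frozen base weight matrix (d_out x d_in).
    A (r x d_in) and B (d_out x r) are the small trainable adapter factors.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


# Hypothetical layer: d_in = 4, d_out = 3, adapter rank r = 1.
W = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]
A = [[0.5, -0.5, 0.0, 1.0]]       # r x d_in
B_init = [[0.0], [0.0], [0.0]]    # d_out x r, zero at the start of training
x = [1.0, 2.0, 3.0, 4.0]

# With B at zero the adapter contributes nothing, so the layer
# initially behaves exactly like the frozen base layer.
y_initial = lora_forward(W, A, B_init, x)

# After (hypothetical) training, B is non-zero and shifts the output.
B_trained = [[1.0], [0.0], [-1.0]]
y_adapted = lora_forward(W, A, B_trained, x)

full_params = len(W) * len(W[0])                      # weights in W
adapter_params = len(A) * len(A[0]) + len(B_init) * len(B_init[0])

print("initial output:", y_initial)
print("adapted output:", y_adapted)
print(f"trainable parameters: {adapter_params} of {full_params}")
```

Only `A` and `B` would receive gradient updates in this arrangement. For a layer this small the saving is modest, but the adapter size grows with the sum of the layer dimensions while the full matrix grows with their product.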

Keywords: Cognition, Artificial intelligence, Scale (ratio), Humans, Male, Choice behavior, Cognitive science, Psychology, Computer science, Language, Computer simulation, Task (project management), Range (aeronautics), Psychological models, Computational model