Uncertainty and reward histories have distinct effects on decisions after wins and losses

Journal: Scientific Reports, Volume: 16, Issue: 1
DOI: https://doi.org/10.1038/s41598-026-37554-3
PMID: https://pubmed.ncbi.nlm.nih.gov/41620492
Publication Date: 2026-01-31
Author(s): Shivam Kalhan et al.
Primary Topic: Neurotransmitter Receptor Influence on Behavior

Overview

In this study, the authors investigate how uncertainty and reward history influence asymmetric learning from wins and losses in rats. They propose a reinforcement learning model that incorporates both an unsigned average reward prediction error and a subjective reward history component. Using a dynamic probabilistic reversal learning task, the researchers found that rats’ sensitivity to wins and losses was distinctly modulated by ongoing estimates of uncertainty and reward history. Specifically, in environments with high predictability and low uncertainty, rats tended to weight wins more heavily than losses, as shown by a higher probability of win-stay and a lower probability of lose-shift behavior. This strategy allowed them to persist with correct actions while minimizing the impact of infrequent losses.

Additionally, the study revealed sex-specific differences in how uncertainty history affected decision-making, with male rats showing a greater sensitivity to uncertainty when making win-stay choices compared to females. Overall, the findings suggest that the ability to asymmetrically weight wins and losses is a crucial behavioral strategy for adapting to fluctuating reward and uncertainty conditions, highlighting the complex interplay between these factors in shaping intelligent behavior in animals.

Introduction

In this study, the authors investigated the use of asymmetric learning strategies in rats, focusing on how these strategies influenced decision-making following reward reversals. The findings revealed that rats shifted toward win-stay (WS) responding for ‘better’ actions in the late phase of trials, suggesting a strategic adaptation to maximize rewards. Specifically, WS probability was significantly higher in the late phase than in the early phase (p = 0.0001, Cohen’s d = 1.33), whereas lose-shift (LS) probability did not differ across phases (p = 0.95, Cohen’s d = 0.01). This pattern indicates that rats became more attuned to wins as they grew familiar with the action contingencies.

Additionally, the analysis showed that better actions had a higher WS probability and a lower LS probability than worse actions, particularly in high-contrast blocks (main effect of action type: WS: F(1,36) = 57.53, p = 5.68e-09; LS: F(1,36) = 436.29, p = 2.20e-12). Notably, males showed a greater reduction in LS probability for better actions than females did, whereas both sexes applied the asymmetric learning strategy similarly in terms of WS behavior. This divergence in LS strategies points to sex-specific differences in decision-making. Overall, the results support the hypothesis that asymmetric learning strategies increase the likelihood that rats persist with high-reward actions, particularly in environments with a clear task structure.
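As a minimal sketch of how WS and LS probabilities like those above could be computed from a trial sequence, the Python function below counts how often a choice is repeated after a win and switched after a loss. The function name, the 0/1 coding of choices and rewards, and the example data are illustrative assumptions, not the authors' analysis code.

```python
from typing import Sequence, Tuple


def win_stay_lose_shift(choices: Sequence[int], rewards: Sequence[int]) -> Tuple[float, float]:
    """Return (P(win-stay), P(lose-shift)) for one sequence of trials.

    choices[t] is the action taken on trial t (e.g. 0 = left, 1 = right).
    rewards[t] is 1 for a win and 0 for a loss.
    The last trial is excluded because it has no following choice.
    """
    if len(choices) != len(rewards):
        raise ValueError("choices and rewards must have the same length")

    wins = stays_after_win = 0
    losses = shifts_after_loss = 0
    for t in range(len(choices) - 1):
        repeated = choices[t + 1] == choices[t]
        if rewards[t] == 1:
            wins += 1
            stays_after_win += int(repeated)
        else:
            losses += 1
            shifts_after_loss += int(not repeated)

    # NaN signals that the sequence contained no wins (or no losses).
    p_ws = stays_after_win / wins if wins else float("nan")
    p_ls = shifts_after_loss / losses if losses else float("nan")
    return p_ws, p_ls


# Hypothetical six-trial sequence, for illustration only.
example_choices = [0, 0, 1, 1, 0, 0]
example_rewards = [1, 0, 1, 1, 0, 1]
p_ws, p_ls = win_stay_lose_shift(example_choices, example_rewards)
```

In this toy sequence, two of the three wins are followed by a repeated choice and one of the two losses is followed by a switch. Splitting trials by phase (early versus late) or by action type (better versus worse) before calling the function would give the kind of comparison summarized above.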

Methods

The Methods section of the paper describes the experimental design and the analytical techniques used to address the research question. It covers the selection of subjects, including inclusion and exclusion criteria, the procedures followed during data collection, and the statistical tests and software used for data analysis.

The section may also describe experimental controls implemented to mitigate confounding variables, along with the rationale for the chosen methods.

Results

The authors analyzed data from a previous experiment involving 14 water-restricted Long-Evans rats (9 males and 5 females) trained on a dynamic probabilistic reversal learning (dynaPRL) paradigm. The experiment used three types of reward probability blocks: high contrast (80% vs. 10%), low contrast (60% vs. 30%), and no contrast (45% vs. 45%). Both male and female rats adapted to reward reversals by the sixth trial. Notably, rats were more sensitive to wins than to losses, with males showing a significantly higher win-stay (WS) probability and a lower lose-shift (LS) probability than females, suggesting a sex-specific difference in reward processing.
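The block structure described above can be sketched in Python as follows. The reward probabilities come from the summary; everything else (a two-action task, the function names, the block length, and how a reward is drawn) is an assumption made for illustration, not a description of the authors' apparatus or schedule.

```python
import random

# (P(reward | better action), P(reward | worse action)) for each block type,
# taken from the probabilities reported in the summary.
BLOCK_TYPES = {
    "high_contrast": (0.80, 0.10),
    "low_contrast": (0.60, 0.30),
    "no_contrast": (0.45, 0.45),
}


def make_block(block_type: str, better_action: int, n_trials: int) -> dict:
    """Build one block of a two-action task (the two-action layout is assumed)."""
    p_better, p_worse = BLOCK_TYPES[block_type]
    probs = [p_worse, p_worse]
    probs[better_action] = p_better
    return {"type": block_type, "probs": probs, "n_trials": n_trials}


def sample_reward(block: dict, action: int, rng: random.Random) -> int:
    """Draw a win (1) or loss (0) for the chosen action in this block."""
    return 1 if rng.random() < block["probs"][action] else 0


# Hypothetical usage: a high-contrast block where action 1 is the better one.
rng = random.Random(0)
block = make_block("high_contrast", better_action=1, n_trials=20)
outcomes = [sample_reward(block, 1, rng) for _ in range(block["n_trials"])]
```

A reversal can be represented by building the next block with the other `better_action`; the block length of 20 trials here is a placeholder, since the summary does not state how block lengths were set.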

Further analysis using a reinforcement learning model revealed that males had higher learning rates from both positive (α⁺) and negative (α⁻) outcomes, as well as a higher value decay rate from negative outcomes (1 − γ⁻). This indicates that while males are more responsive to wins, their action values also decline more quickly after losses, which may account for their reduced sensitivity to losses. A directed Bayes factor analysis supported these findings, with males being approximately 11 times more likely than females to have a higher learning rate from wins. Additionally, latency data showed that males were quicker to initiate trials following a loss, reinforcing the notion that losses affected their motivation less than they affected that of females. Overall, these results highlight significant sex differences in reward sensitivity and learning dynamics in rats.
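To make the roles of α⁺, α⁻, and γ⁻ concrete, here is a minimal Python sketch of one common way to write an asymmetric value update. The summary does not give the model equations, so the exact form below is an assumption: the chosen action's value moves toward the outcome at rate α⁺ after a win or α⁻ after a loss, and the unchosen action's value is multiplied by γ⁺ or γ⁻ (so it decays at rate 1 − γ). All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AsymmetricParams:
    alpha_pos: float  # learning rate after a win
    alpha_neg: float  # learning rate after a loss
    gamma_pos: float  # retention of the unchosen value after a win (assumed form)
    gamma_neg: float  # retention after a loss; the decay rate is 1 - gamma_neg


def update_values(q: List[float], action: int, reward: int, p: AsymmetricParams) -> List[float]:
    """Return updated action values after one trial (assumed update rule)."""
    alpha = p.alpha_pos if reward == 1 else p.alpha_neg
    gamma = p.gamma_pos if reward == 1 else p.gamma_neg
    new_q = []
    for a, value in enumerate(q):
        if a == action:
            # Chosen action: move toward the outcome by the outcome-specific rate.
            new_q.append(value + alpha * (reward - value))
        else:
            # Unchosen action: decay toward zero.
            new_q.append(gamma * value)
    return new_q


# Hypothetical parameter values, for illustration only.
params = AsymmetricParams(alpha_pos=0.5, alpha_neg=0.2, gamma_pos=1.0, gamma_neg=0.8)
after_win = update_values([0.4, 0.6], action=0, reward=1, p=params)
after_loss = update_values([0.4, 0.6], action=0, reward=0, p=params)
```

With these placeholder parameters, a win moves the chosen value further than a loss does, which is the kind of asymmetry the fitted α⁺ and α⁻ are meant to capture.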

Discussion

The authors investigated how rats adapt their decision-making strategies in response to varying levels of uncertainty and reward history, focusing on win-stay (WS) and lose-shift (LS) behaviors. The findings indicate that rats employed asymmetric learning strategies influenced by the global reward state (GRS) and the unsigned average reward prediction error (avgRPE). Specifically, the GRS, which reflects reward history, significantly affected WS decisions, particularly in male rats, while the avgRPE, which indexes uncertainty, influenced both WS and LS behaviors, especially in low-stochasticity environments. Notably, male rats were more sensitive than females to uncertainty when making WS decisions, suggesting sex-specific differences in how these computations are used.

Further analysis revealed that the interaction between GRS and avgRPE modulated WS probabilities differently across sexes and block types. For males, a high GRS increased WS likelihood regardless of uncertainty, while in high-contrast blocks, both a low GRS and a high avgRPE were necessary to reduce WS probability. In contrast, females were more influenced by GRS alone, indicating a reliance on reward history over uncertainty in their decision-making. These results highlight the distinct roles of uncertainty and reward history in shaping adaptive behavior and suggest that sex differences in decision-making strategies may be influenced by task volatility and structure. Overall, the findings improve understanding of the latent computations underlying asymmetric learning and decision-making in changing reward environments.
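The two quantities discussed in this section can be illustrated with a short Python sketch. It assumes, purely for illustration, that GRS is tracked as an exponentially weighted running average of rewards and avgRPE as an exponentially weighted running average of the absolute reward prediction error. The function name and the two smoothing rates are hypothetical, and the authors' actual definitions may differ.

```python
from typing import Tuple


def update_trackers(
    grs: float,
    avg_rpe: float,
    reward: int,
    chosen_value: float,
    rate_grs: float = 0.1,  # hypothetical smoothing rate for reward history
    rate_rpe: float = 0.1,  # hypothetical smoothing rate for uncertainty
) -> Tuple[float, float]:
    """Update running estimates of reward history and uncertainty after one trial."""
    rpe = reward - chosen_value  # signed reward prediction error
    new_grs = grs + rate_grs * (reward - grs)  # recent reward history
    new_avg_rpe = avg_rpe + rate_rpe * (abs(rpe) - avg_rpe)  # unsigned average RPE
    return new_grs, new_avg_rpe


# An expected win barely moves the uncertainty estimate; a surprising loss raises it.
grs_win, rpe_win = update_trackers(grs=0.5, avg_rpe=0.2, reward=1, chosen_value=0.8)
grs_loss, rpe_loss = update_trackers(grs=0.5, avg_rpe=0.2, reward=0, chosen_value=0.8)
```

Under these assumptions the reward-history estimate rises after a win and falls after a loss, while the uncertainty estimate depends only on how surprising the outcome was, which is why the two signals can dissociate.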