Shaping the learning signal in a combined Q-learning rule to improve structured cooperation

Journal: Chaos Solitons & Fractals, Volume: 206
DOI: https://doi.org/10.1016/j.chaos.2026.117986
Publication Date: 2026-01-28
Author(s): Chunpeng Du et al.
Primary Topic: Evolutionary Game Theory and Cooperation

Overview

The research investigates the role of reputation in promoting cooperation within a Q-learning framework, where agents update their action values based on a reward signal that combines game payoffs and a bounded reputation score. By maintaining a fixed game and network structure, the study isolates the effects of reputation on learning dynamics. The findings reveal that increasing the weight of the reputation signal generally enhances cooperation by consolidating clusters of cooperative behavior. However, this effect is contingent upon specific learning dynamics: it diminishes when the learning rate is very low, which hampers effective information dissemination, and when the discount factor approaches one, as this obscures immediate reputational benefits.
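
As a minimal sketch of the update described above, the following assumes the reward is a convex mix of a payoff and a reputation score, both scaled to [0, 1], with weight beta on reputation. The summary states only that the signal combines the two, so the exact form, the function names, and the two-state tabular encoding are illustrative assumptions rather than the authors' implementation.

```python
def combined_reward(payoff_norm, reputation_norm, beta):
    """Mix a normalized payoff and a normalized reputation score.

    Assumption: a convex combination with weight beta on reputation.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return (1.0 - beta) * payoff_norm + beta * reputation_norm


def q_update(q_table, state, action, reward, next_state, alpha, gamma):
    """Standard tabular Q-learning step; returns the new Q(state, action)."""
    best_next = max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


# Hypothetical encoding: states and actions are 0 (defect) or 1 (cooperate).
q = [[0.0, 0.0], [0.0, 0.0]]
r = combined_reward(payoff_norm=0.25, reputation_norm=1.0, beta=0.5)
new_q = q_update(q, state=0, action=1, reward=r, next_state=1,
                 alpha=0.5, gamma=0.9)
```

Under this assumed form, beta = 0 recovers payoff-only Q-learning and beta = 1 makes the signal depend on reputation alone, while the game and the network are left untouched.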

The study emphasizes that the integration of reputation into the reinforcement signal is a significant control parameter for fostering cooperation, contrasting with previous approaches that altered payoffs or partner selection. While the results affirm the expectation that social information can enhance prosocial behavior, they also highlight the nuanced interplay between learning parameters and reputational influence. The authors suggest avenues for future research, including the exploration of heterogeneous networks, more complex reputation assessments, and adaptive learning rates, to further understand the implications of reputation-integrated Q-learning on cooperative behavior in diverse contexts.

Introduction

The introduction discusses the critical role of cooperation in biological and social systems and the challenge it faces from exploitation by non-contributors. Evolutionary game theory serves as a framework for understanding how cooperation can emerge and stabilize, identifying key mechanisms such as kin selection, direct and indirect reciprocity, network reciprocity, and group selection. Among these, network reciprocity is emphasized for its ability to foster cooperation in structured interactions, where local clustering of cooperators can mitigate exploitation risks.

The paper further explores the integration of reputation into value-based learning on networks, noting that reputation can significantly influence cooperative behavior. Previous studies have shown that reputation-based partner choice and locally observable reputation cues enhance cooperation. The authors propose a novel approach in which the reinforcement signal is a weighted combination of reputation and game payoffs, leaving the game's payoffs and the interaction partners unchanged. This keeps the role of reputation explicit while facilitating cooperation in the spatial prisoner’s dilemma. The findings indicate that cooperation increases with the weight on reputation, although this effect is moderated by the learning rate and the discount factor, which influence how reputation advantages propagate and how they affect network reciprocity.

Results

The results of the study reveal a significant relationship between the reputation weight ($\beta$) and the level of cooperation in a reinforcement learning framework, particularly under varying dilemma strengths ($b$). The findings indicate that as $\beta$ increases, the fraction of cooperation also rises, demonstrating that reputation can be more influential than game payoffs in fostering cooperative behavior. This effect persists even in scenarios with high dilemma strength. The analysis, illustrated in Figure 2, shows that while $\beta$ consistently enhances cooperation, the effects of other parameters, such as the learning rate ($\alpha$) and discount factor ($\gamma$), exhibit more complex interactions. Specifically, an optimal range for both $\alpha$ and $\gamma$ exists, where cooperation peaks, while very low or high values diminish the positive impact of reputation.

Further exploration of spatiotemporal patterns reveals that without reputation ($\beta = 0$), cooperation remains low and fragmented. As $\beta$ increases, cooperative clusters begin to form and expand, although defectors can still disrupt these clusters. Notably, at $\beta = 1.0$, defectors struggle to invade cooperative territories, leading to a stable cooperative state. The study also highlights that a non-zero learning rate is essential for the reinforcement signal to influence behavior, while higher rates can destabilize cooperative clusters. Additionally, the discount factor $\gamma$ plays a crucial role: intermediate values promote cooperation, while values approaching one weaken the effect of reputation. Overall, these results underscore the importance of reputation and learning dynamics in promoting cooperation within competitive environments.
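
To make the roles of $\alpha$ and $\gamma$ more concrete, the sketch below relies on two textbook properties of the Q-learning update rather than on anything taken from the paper's simulations. With a fixed target, the gap between Q and the target shrinks by a factor (1 - alpha) per update. With a constant per-step reward r, the discounted value is r / (1 - gamma), so the immediate reward accounts for a share (1 - gamma) of it. The function names are hypothetical.

```python
def q_after_updates(alpha, target, n_updates, q0=0.0):
    """Apply Q <- Q + alpha * (target - Q) n_updates times to a fixed target."""
    q = q0
    for _ in range(n_updates):
        q += alpha * (target - q)
    return q


def immediate_share(gamma):
    """Share of the immediate reward in the value r / (1 - gamma)
    of a constant reward stream."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 - gamma


slow = q_after_updates(alpha=0.01, target=1.0, n_updates=10)
fast = q_after_updates(alpha=0.5, target=1.0, n_updates=10)
```

In this idealized setting, ten updates at alpha = 0.01 close less than a tenth of the gap to the target, whereas alpha = 0.5 closes nearly all of it. Likewise, at gamma = 0.99 the immediate reward is about 1% of the discounted value. This is one way to read the summary's statements that very low learning rates and discount factors near one blunt the reputation signal; it does not model the destabilizing effect of high learning rates.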

Discussion

In this study, the authors investigate the influence of reputation on cooperation within a spatially structured population, using a reinforcement learning framework based on Q-learning. Agents are arranged on a square lattice and engage in a prisoner’s dilemma game with their nearest neighbors, accumulating payoffs that are influenced by their actions and reputations. The model incorporates a reputation variable that evolves deterministically based on each agent’s cooperative or defecting actions. This variable is then integrated into the reinforcement signal through a weighted combination of normalized payoffs and reputation scores. The findings reveal that cooperation increases with the weight assigned to reputation; however, this effect diminishes under certain conditions, specifically when the learning rate is low or the discount factor is high.
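
The model ingredients listed above can be sketched for a single round as follows. Several details are assumptions, since the summary does not specify them: a weak prisoner's dilemma with R = 1, T = b, S = P = 0, four nearest neighbors with periodic boundaries, a reputation that moves by one step per round and is clipped to [0, r_max], and payoffs normalized by the maximum 4 * b. All names are hypothetical.

```python
C, D = 1, 0  # hypothetical action encoding: 1 = cooperate, 0 = defect


def neighbors(i, j, size):
    """Four nearest neighbors on a size x size lattice, periodic boundaries."""
    return [((i - 1) % size, j), ((i + 1) % size, j),
            (i, (j - 1) % size), (i, (j + 1) % size)]


def pd_payoff(own, other, b):
    """Assumed weak prisoner's dilemma: R = 1, T = b, S = P = 0."""
    if own == C:
        return 1.0 if other == C else 0.0
    return b if other == C else 0.0


def normalized_payoffs(actions, b):
    """Sum payoffs over the four neighbors, scaled by the maximum 4 * b (b >= 1)."""
    size = len(actions)
    out = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            total = sum(pd_payoff(actions[i][j], actions[x][y], b)
                        for x, y in neighbors(i, j, size))
            out[i][j] = total / (4.0 * b)
    return out


def update_reputation(reputation, action, r_max=10):
    """Assumed rule: +1 after cooperating, -1 after defecting, clipped to [0, r_max]."""
    step = 1 if action == C else -1
    return min(r_max, max(0, reputation + step))


def reward_signals(actions, reputations, b, beta, r_max=10):
    """Update reputations in place and return the per-site signal
    (1 - beta) * payoff + beta * reputation, both terms in [0, 1]."""
    size = len(actions)
    payoffs = normalized_payoffs(actions, b)
    signals = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            reputations[i][j] = update_reputation(
                reputations[i][j], actions[i][j], r_max)
            signals[i][j] = ((1.0 - beta) * payoffs[i][j]
                             + beta * reputations[i][j] / r_max)
    return signals


# A 3 x 3 lattice of cooperators with one defector in the center.
actions = [[C, C, C], [C, D, C], [C, C, C]]
reputations = [[5] * 3 for _ in range(3)]
signals = reward_signals(actions, reputations, b=1.5, beta=0.5)
```

In this toy configuration the lone defector still receives the largest signal at beta = 0.5, because its payoff advantage outweighs its reputation loss. At beta = 1 the signal depends on reputation alone, so the cooperators receive more than the defector. Each signal would then feed a Q-learning update for the corresponding agent.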

The results highlight the significance of the reputation parameter as a control mechanism for enhancing cooperation, emphasizing that the internal weighting of reputation within the Q-learning framework is crucial for fostering cooperative behavior. Unlike previous studies that altered the strategic environment through reputation, this work focuses on modifying the learning signal itself while keeping the game dynamics constant. The authors suggest that future research should explore more complex network structures, adaptive learning rates, and advanced reputation assessments to further understand the dynamics of reputation-integrated learning in promoting cooperation in diverse real-world scenarios.