A fuzzy-TD3 hybrid reinforcement learning framework for robust trajectory tracking of the Mitsubishi RV-2AJ robotic arm

Journal: Scientific Reports, Volume: 16, Issue: 1
DOI: https://doi.org/10.1038/s41598-026-42615-8
PMID: https://pubmed.ncbi.nlm.nih.gov/41792354
Publication Date: 2026-03-06
Author(s): Zied Ben Hazem
Primary Topic: Adaptive Dynamic Programming Control

Overview

This research paper presents a hybrid control architecture that combines a fuzzy logic system with the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm to improve trajectory tracking for a 5-degree-of-freedom (5-DOF) robotic manipulator. The framework pairs the interpretability and rapid response of fuzzy logic with the adaptive optimization capabilities of deep reinforcement learning. A fuzzy supervisor provides immediate corrective actions based on real-time error states, while the TD3 agent learns a control policy that accounts for the system's nonlinear dynamics. In simulation, the hybrid controller reduces tracking error by 27.8-50% relative to standalone TD3 and by 14.8-28.6% relative to a hybrid PID-TD3 controller. Under parametric uncertainties and disturbances it retains improvements of 23.5-34.2% over TD3 and 11.0-16.7% over PID-TD3.
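
The percentages above are relative reductions in tracking error. The sketch below shows how they can be computed. It assumes that tracking error is measured as RMSE over all joints and time samples; the exact aggregation used in the paper is not stated here. The function names are illustrative.

```python
import numpy as np

def rmse(reference, actual) -> float:
    """Root mean square tracking error over all samples (and joints, if 2-D)."""
    err = np.asarray(reference, dtype=float) - np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

def relative_reduction(baseline_error: float, proposed_error: float) -> float:
    """Percentage by which proposed_error improves on baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline_error - proposed_error) / baseline_error
```

For example, with made-up RMSE values of 0.036 rad for a baseline and 0.026 rad for the proposed controller, `relative_reduction(0.036, 0.026)` is about 27.8, i.e. a 27.8% improvement.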

The study validates the hybrid fuzzy-TD3 controller through comprehensive benchmarking and sensitivity analysis. Key findings include a 23% reduction in torque at the cost of only an 8% increase in root mean square error (RMSE), indicating that control effort can be cut substantially with little loss of precision. Numerical stability assessments support stable closed-loop operation, with the Lyapunov derivative remaining negative for most of the simulation duration. In addition, an analysis of fuzzy rule activation gives insight into the controller's decision-making during critical phases. Overall, the hybrid fuzzy-TD3 controller is presented as a significant advance in robotic control, combining adaptability, precision, and interpretability. Future work will explore real-world deployment, multi-robot coordination, and the integration of meta-learning techniques to improve performance in dynamic environments.
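
The stability claim rests on the sign of a Lyapunov derivative evaluated along the simulated trajectory. The source does not say which Lyapunov function is used. This sketch therefore assumes the simple quadratic candidate V = 0.5 * ||e||^2, estimates its derivative by forward differences, and reports the fraction of time steps on which it is negative. This is numerical evidence only, not a stability proof.

```python
import numpy as np

def lyapunov_negative_fraction(errors, dt: float) -> float:
    """Fraction of steps where the finite-difference dV/dt of V = 0.5*||e||^2 is negative.

    errors: tracking error e(t), shape (T,) for one joint or (T, n_joints).
    Assumption: the quadratic candidate V is illustrative, not the paper's choice.
    """
    e = np.asarray(errors, dtype=float)
    if e.ndim == 1:
        e = e[:, None]                       # treat a 1-D signal as a single joint
    v = 0.5 * np.sum(e ** 2, axis=1)         # candidate Lyapunov function V(t)
    v_dot = np.diff(v) / dt                  # forward-difference estimate of dV/dt
    return float(np.mean(v_dot < 0.0))
```

An exponentially decaying error gives a fraction of 1.0 (V decreases at every step), while a growing error gives 0.0.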

Introduction

The introduction traces the evolution of robotics from simple automated machines to sophisticated autonomous systems capable of operating in complex environments. Driven by advances in sensing, actuation, and artificial intelligence, this transformation has made robots essential tools in sectors including industrial manufacturing, healthcare, logistics, and personal assistance. A critical challenge in this domain is precise and resilient trajectory tracking for robotic manipulators, which exhibit nonlinear dynamics, strong coupling between joints, and complex multi-input, multi-output (MIMO) relationships. Traditional methods such as PID control often fall short under dynamic and uncertain conditions, which has prompted the exploration of alternatives like adaptive control and fuzzy logic. However, achieving consistently high precision remains an open problem.
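
For context, the nonlinear, coupled MIMO behavior described above is usually expressed with the standard rigid-body manipulator model below. This is the textbook form, not necessarily the exact model used in the paper. The disturbance term \tau_d and the PID gains are generic symbols introduced here for illustration.

```latex
% Standard rigid-body dynamics of an n-joint manipulator (n = 5 for the RV-2AJ)
M(q)\,\ddot{q} + C(q,\dot{q})\,\dot{q} + G(q) = \tau + \tau_d,
\qquad q,\ \tau,\ \tau_d \in \mathbb{R}^{5}

% Classical PID law acting on the tracking error e = q_d - q
\tau_{\mathrm{PID}} = K_p\, e + K_i \int_0^{t} e(\sigma)\, d\sigma + K_d\, \dot{e}
```

The inertia matrix M(q) and the Coriolis/centrifugal matrix C(q, q̇) depend on the configuration and are generally non-diagonal. As a result, each joint torque affects the acceleration of every joint, which is the coupling referred to above. Fixed gains K_p, K_i, K_d are tuned around one operating point, which is one reason a PID loop degrades when payload or inertia changes.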

Deep reinforcement learning (DRL) has emerged as a promising way to learn control policies through interaction with the environment. Among DRL algorithms, Twin Delayed Deep Deterministic Policy Gradient (TD3) has proven particularly effective in continuous control tasks because it mitigates Q-value overestimation and training instability. Pure reinforcement learning is nonetheless limited by unstable early-stage learning and the difficulty of exploring safely. This motivates hybrid frameworks that combine several methodologies to make control systems more robust and adaptable. The paper also presents fuzzy logic control (FLC) as a viable alternative to conventional methods for handling the complexity of robotic manipulators, particularly when accurate mathematical modeling is difficult. The fuzzy controller for the 5-DOF Mitsubishi RV-2AJ robotic arm consists of fuzzification, a knowledge base of IF-THEN rules, a Mamdani-type inference mechanism, and defuzzification.
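
The four FLC stages listed above can be sketched for a single joint as follows. The inputs are the normalized error e = q_d - q and its rate de. The sketch uses triangular membership functions, min for AND, min-implication, max-aggregation, and centroid defuzzification. The three input terms, five output terms, and 3x3 rule table are hypothetical placeholders; the paper's actual membership functions and rule base are not given in this summary.

```python
import numpy as np

def trimf(x, a, b, c):
    """Triangular membership function with feet a, c and peak b (requires a < b < c)."""
    x = np.asarray(x, dtype=float)
    return np.clip(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0)

# Hypothetical linguistic terms on normalized universes [-1, 1].
# Shoulder terms extend past the universe so that +/-1 has full membership.
TERMS_IN = {"N": (-2.0, -1.0, 0.0), "Z": (-1.0, 0.0, 1.0), "P": (0.0, 1.0, 2.0)}
TERMS_OUT = {
    "NB": (-2.0, -1.0, -0.5), "NS": (-1.0, -0.5, 0.0), "Z": (-0.5, 0.0, 0.5),
    "PS": (0.0, 0.5, 1.0), "PB": (0.5, 1.0, 2.0),
}
# Hypothetical rule base: IF e is <first> AND de is <second> THEN torque is <value>.
# With e = q_d - q, a positive error (joint lagging) calls for positive torque.
RULES = {
    ("N", "N"): "NB", ("N", "Z"): "NS", ("N", "P"): "Z",
    ("Z", "N"): "NS", ("Z", "Z"): "Z", ("Z", "P"): "PS",
    ("P", "N"): "Z", ("P", "Z"): "PS", ("P", "P"): "PB",
}
UNIVERSE = np.linspace(-1.0, 1.0, 401)  # discretized output universe

def mamdani_torque(e, de):
    """Return (normalized torque, per-rule firing strengths) for one joint."""
    # Fuzzification of the clipped, normalized inputs
    e, de = float(np.clip(e, -1.0, 1.0)), float(np.clip(de, -1.0, 1.0))
    mu_e = {k: float(trimf(e, *p)) for k, p in TERMS_IN.items()}
    mu_de = {k: float(trimf(de, *p)) for k, p in TERMS_IN.items()}
    aggregated = np.zeros_like(UNIVERSE)
    activations = {}
    for (te, tde), out in RULES.items():
        w = min(mu_e[te], mu_de[tde])  # AND via min gives the firing strength
        activations[(te, tde)] = w
        if w > 0.0:
            # Mamdani min-implication, then max-aggregation across rules
            clipped = np.minimum(w, trimf(UNIVERSE, *TERMS_OUT[out]))
            aggregated = np.maximum(aggregated, clipped)
    if aggregated.sum() == 0.0:
        return 0.0, activations
    # Centroid defuzzification over the discretized universe
    u = float(np.sum(UNIVERSE * aggregated) / np.sum(aggregated))
    return u, activations
```

For instance, `mamdani_torque(0.5, 0.0)` fires the rules ("Z", "Z") and ("P", "Z") at strength 0.5 each and returns a small positive torque. The returned `activations` dictionary is the kind of per-rule data a rule-activation analysis would inspect. Scaling the normalized output to joint torque limits is left out.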

Results

The results section compares three control strategies (standalone TD3, hybrid PID-TD3, and the proposed fuzzy-TD3 hybrid) applied to trajectory tracking on the 5-DOF Mitsubishi RV-2AJ robotic arm. Simulations were run under varying operating conditions, including dynamic payload changes and external disturbances. The fuzzy-TD3 architecture significantly outperforms both pure TD3 and the PID-TD3 hybrid, with better tracking precision, convergence behavior, and disturbance rejection. The authors attribute this improvement to combining the interpretability and adaptability of fuzzy logic with deep reinforcement learning.

Further analysis shows that the hybrid fuzzy-TD3 controller generates effective continuous torque commands for all joints, maintaining accurate trajectory tracking even under dynamic uncertainties. Simulations on helical, spiral, and N-shaped trajectory profiles show that the fuzzy-TD3 controller consistently achieves higher tracking precision and stability than the other two controllers. Robustness was assessed with two kinds of disturbance. Internal disturbances were variations in link mass and inertia parameters. External disturbances were modeled as pulsed torque perturbations. Based on joint angle responses and standard error calculations, the evaluation confirms the fuzzy-TD3 controller's resilience to both internal and external disturbances during the demanding spiral trajectory tracking task.
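
The two disturbance types described above can be injected into a simulation roughly as follows. The pulse amplitude, period, width, and start time, and the ±20% parameter spread, are hypothetical values chosen for illustration; the source does not report them here.

```python
import numpy as np

def pulsed_torque_disturbance(t, n_joints=5, amplitude=2.0, period=2.0, width=0.1, start=1.0):
    """External disturbance: rectangular torque pulses (N*m) on every joint.

    All default values are hypothetical placeholders, not the paper's settings.
    """
    if t < start:
        return np.zeros(n_joints)
    phase = (t - start) % period             # time since the start of the current pulse cycle
    return np.full(n_joints, amplitude if phase < width else 0.0)

def perturb_link_params(masses, inertias, scale=0.2, rng=None):
    """Internal disturbance: scale each link mass and inertia by a factor in [1 - scale, 1 + scale)."""
    rng = np.random.default_rng() if rng is None else rng
    masses = np.asarray(masses, dtype=float)
    inertias = np.asarray(inertias, dtype=float)
    mass_factors = rng.uniform(1.0 - scale, 1.0 + scale, size=masses.shape)
    inertia_factors = rng.uniform(1.0 - scale, 1.0 + scale, size=inertias.shape)
    return masses * mass_factors, inertias * inertia_factors
```

In a simulation loop, the disturbance would be added to the commanded torque as \tau + \tau_d at each step, and the perturbed parameters would be used when building the plant model for a robustness run.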

Discussion

The discussion highlights significant advances in deep reinforcement learning (DRL) for robotic control, particularly the development of the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. TD3 addresses two critical weaknesses of the original Deep Deterministic Policy Gradient (DDPG) algorithm, overestimation bias and training instability. It does so through clipped double-Q learning, delayed policy updates, and target policy smoothing. Combining TD3 with fuzzy logic yields hybrid control frameworks that pair TD3's long-term policy optimization with the real-time adaptability and interpretability of fuzzy systems. This hybrid approach mitigates early-stage learning instability and makes robotic controllers more adaptable, which suits complex, nonlinear systems such as robotic manipulators.
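
The three TD3 mechanisms that address DDPG's weaknesses are sketched below with NumPy. Plain callables stand in for the actor and critic networks. This is a framework-agnostic illustration, not the paper's training code. The default hyperparameters are the commonly used values from the original TD3 publication, not settings reported for this study.

```python
import numpy as np

def td3_target(reward, next_state, done, actor_target, q1_target, q2_target,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5, max_action=1.0, rng=None):
    """Critic target y = r + gamma * (1 - done) * min(Q1', Q2') at a smoothed target action."""
    rng = np.random.default_rng() if rng is None else rng
    a_next = np.asarray(actor_target(next_state), dtype=float)
    # Target policy smoothing: add clipped Gaussian noise, then respect action bounds
    noise = np.clip(rng.normal(0.0, policy_noise, size=a_next.shape), -noise_clip, noise_clip)
    a_next = np.clip(a_next + noise, -max_action, max_action)
    # Clipped double-Q learning: use the smaller of the two target critics
    q_next = np.minimum(q1_target(next_state, a_next), q2_target(next_state, a_next))
    return reward + gamma * (1.0 - done) * q_next

def should_update_actor(step, policy_delay=2):
    """Delayed policy updates: actor and target networks change once every policy_delay critic steps."""
    return step % policy_delay == 0

def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging of target parameters toward the online parameters."""
    return [tau * p + (1.0 - tau) * tp for tp, p in zip(target_params, online_params)]
```

Taking the minimum of two critics biases the target downward rather than upward, which counters the overestimation that destabilizes DDPG. Delaying actor updates lets the critics settle before the policy chases their estimates.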

The paper proposes a fuzzy-TD3 hybrid control framework designed specifically for trajectory tracking on the 5-DOF Mitsubishi RV-2AJ robotic arm. In this architecture, a Mamdani-type fuzzy logic controller (FLC) generates an immediate compensatory torque from real-time tracking errors, while the TD3 agent learns to output a complementary residual torque that optimizes long-term performance. This design improves tracking accuracy and disturbance rejection while also making the control system more interpretable and robust. The framework is validated through extensive simulations, which show better performance than standalone TD3 and hybrid PID-TD3 controllers across a range of complex trajectories. The key contributions are a fully integrated control architecture, an engineered fuzzy-shaped reward mechanism for improved learning, and a comprehensive simulation platform that captures realistic dynamics, enabling safe and efficient training of the DRL agent.
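
The residual structure described above amounts to adding the TD3 output to the fuzzy torque before actuator saturation. The source does not give the form of its fuzzy-shaped reward. The reward below is therefore a hypothetical example: a quadratic error and effort penalty plus a bonus weighted by the membership degree of "tracking error is small". The helper names, weights, and Gaussian membership width are all assumptions.

```python
import numpy as np

def hybrid_torque(tau_fuzzy, tau_residual, tau_max):
    """Sum the fuzzy compensation and the TD3 residual, then saturate per joint."""
    tau = np.asarray(tau_fuzzy, dtype=float) + np.asarray(tau_residual, dtype=float)
    limit = np.asarray(tau_max, dtype=float)
    return np.clip(tau, -limit, limit)

def mu_small(err_norm, sigma=0.05):
    """Hypothetical Gaussian membership degree of the fuzzy set 'tracking error is small'."""
    return float(np.exp(-(err_norm / sigma) ** 2))

def shaped_reward(error, torque, w_err=1.0, w_tau=1e-3, bonus=1.0, sigma=0.05):
    """Hypothetical fuzzy-shaped reward: penalties plus a bonus scaled by mu_small."""
    e = np.asarray(error, dtype=float)
    u = np.asarray(torque, dtype=float)
    err_norm = float(np.linalg.norm(e))
    penalty = w_err * err_norm ** 2 + w_tau * float(u @ u)   # tracking error + control effort
    return -penalty + bonus * mu_small(err_norm, sigma)
```

Because the membership bonus rises smoothly as the error approaches zero, it gives the agent a dense signal near the reference instead of a single sparse threshold. Saturating after the sum keeps the combined command within actuator limits whatever the residual outputs.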