DRoPE: Directional Rotary Position Embedding for Efficient Agent Interaction Modeling

Journal: IEEE Transactions on Vehicular Technology
DOI: https://doi.org/10.1109/tvt.2026.3696986
Publication Date: 2026-01-01
Author(s): J. W. Zhao et al.
Primary Topic: Human Pose and Action Recognition

Overview

The authors present Directional Rotary Position Embedding (DRoPE), an adaptation of Rotary Position Embedding (RoPE) designed to improve agent interaction modeling for trajectory generation in autonomous driving systems. The three existing frameworks (scene-centric, agent-centric, and query-centric) each trade off accuracy, computation time, and memory use against one another. DRoPE addresses this trade-off by introducing a uniform identity scalar into RoPE’s 2D rotary transformation, which encodes relative angular information while keeping space complexity low.

The theoretical analysis shows that DRoPE retains the time and space efficiency of scene-centric methods while improving trajectory generation accuracy by integrating relative positional information and agent headings. Empirical evaluations against state-of-the-art models confirm DRoPE’s superior performance and reduced space complexity. The authors suggest that future work will optimize DRoPE for more complex interaction scenarios and explore its use in other domains that require efficient modeling of periodic angles. Overall, the method is presented as improving the balance among accuracy, time complexity, and space complexity in trajectory generation for autonomous driving.

Introduction

In the introduction, the authors emphasize the importance of modeling agent interactions for trajectory generation in deep learning-based autonomous driving (AD). They group existing methods into three frameworks: scene-centric, agent-centric, and query-centric. Scene-centric methods are criticized for poor accuracy because they rely on absolute positions. Agent-centric methods are accurate but suffer from high time complexity. Query-centric methods, which use Relative Position Embeddings (RPE) to encode inter-agent relationships, incur higher space complexity. The result is an “impossible triangle” in which no single method optimizes accuracy, time complexity, and space complexity simultaneously.

To address these challenges, the authors introduce Directional Rotary Position Embedding (DRoPE), an adaptation of Rotary Position Embedding (RoPE) that models the periodic angles essential for trajectory prediction. DRoPE incorporates a uniform identity scalar into the 2D rotary transformation, which aligns rotation angles with agent headings and gives a consistent mapping from real-valued inputs to the periodic angular domain. This allows relative positions and headings to be embedded together without significantly increasing time or space complexity. The authors provide theoretical analysis and empirical validation of DRoPE, showing advantages over existing models in space complexity and prediction performance. The key contributions are the introduction of DRoPE, a thorough analysis of RoPE’s limitations, and two practical DRoPE-RoPE architectures for agent interaction modeling.
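The rotation mechanism described above can be illustrated with a minimal sketch. It assumes, as one reading of this summary rather than the paper's exact formulation, that the "uniform identity scalar" means every 2D feature pair is rotated by the agent's heading itself, instead of by the heading scaled with per-pair frequencies as in standard RoPE. The helper names (`rotate_pairs`, `score`) and the example values are hypothetical.

```python
import cmath
import math

def rotate_pairs(vec, angle, scales):
    """Rotate each consecutive 2D pair of `vec` by angle * scales[i]."""
    out = []
    for i, s in enumerate(scales):
        # Treat the pair as a complex number; multiplying by e^{i*angle*s} rotates it.
        z = complex(vec[2 * i], vec[2 * i + 1]) * cmath.exp(1j * angle * s)
        out.extend([z.real, z.imag])
    return out

def score(q, k, heading_q, heading_k, scales):
    """Dot product of a rotated query and a rotated key."""
    rq = rotate_pairs(q, heading_q, scales)
    rk = rotate_pairs(k, heading_k, scales)
    return sum(a * b for a, b in zip(rq, rk))

q = [1.0, 0.0, 1.0, 0.0]
k = [1.0, 0.0, 1.0, 0.0]

# Assumption: a uniform scale of 1 for every pair (rotation angle == heading).
uniform = [1.0, 1.0]
# RoPE-style per-pair frequencies, base^(-2i/d) with base 10000 and d = 4.
rope_like = [1.0, 0.01]

a, b = 0.3, 1.0
wrap = 2 * math.pi  # the same physical heading, expressed one full turn later

# Effect of wrapping one heading by a full turn, under each scaling scheme.
uniform_gap = abs(score(q, k, a, b, uniform) - score(q, k, a + wrap, b, uniform))
rope_gap = abs(score(q, k, a, b, rope_like) - score(q, k, a + wrap, b, rope_like))
# Effect of turning both agents by the same amount (relative heading unchanged).
shift_gap = abs(score(q, k, a, b, uniform) - score(q, k, a + 0.5, b + 0.5, uniform))
```

Under these assumptions `uniform_gap` and `shift_gap` vanish up to floating-point error, so the score depends only on the relative heading and is unaffected by angle wrapping. `rope_gap` does not vanish, because a fractional frequency turns a full turn into a partial rotation.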

Discussion

The discussion section outlines methodologies for modeling agent interactions in autonomous driving and groups them into three primary frameworks: scene-centric, agent-centric, and query-centric. Scene-centric methods achieve computational efficiency by using a single coordinate system centered on the ego vehicle, but they suffer from data distribution imbalances that reduce predictive accuracy for distant agents. Agent-centric models instead normalize the coordinate system around each individual agent, which improves predictive accuracy but increases computational cost, scaling linearly with the number of agents, $N$. This cost poses challenges for real-time applications.
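The difference between the two coordinate conventions can be made concrete with a small sketch. It assumes 2D positions and a heading in radians measured counterclockwise from the x-axis; `to_agent_frame` is a hypothetical helper, not a function from the paper.

```python
import math

def to_agent_frame(points, origin, heading):
    """Express scene-frame 2D points in the local frame of one agent.

    The agent sits at `origin` and faces `heading` (radians); in its local
    frame the x-axis points straight ahead.
    """
    cos_h, sin_h = math.cos(heading), math.sin(heading)
    local = []
    for x, y in points:
        dx, dy = x - origin[0], y - origin[1]
        # Rotate the offset by -heading.
        local.append((cos_h * dx + sin_h * dy, -sin_h * dx + cos_h * dy))
    return local

# Scene-centric: one shared frame, so every agent is encoded once.
scene = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0)]
headings = [0.0, math.pi / 2, math.pi]

# Agent-centric: the whole scene is re-expressed once per agent (N passes).
per_agent_views = [to_agent_frame(scene, pos, h) for pos, h in zip(scene, headings)]
```

In this sketch an agent-centric model repeats the normalization once per agent, which is where the linear growth in $N$ comes from. In the second agent's view, the third agent lies 5 units straight ahead, regardless of where the pair sits in the scene frame.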

The query-centric paradigm, exemplified by QCNet, aims to balance efficiency and accuracy by decoupling shape encoding from spatial relationships. However, existing query-centric methods often incur high memory overhead because relative position matrices have quadratic space complexity. To address this limitation, the paper introduces Directional Rotary Position Embedding (DRoPE), which maintains linear space complexity while achieving competitive performance. DRoPE encodes both spatial positions and angular information, improving the modeling of interactions among agents in complex traffic scenarios. The authors position the method as a significant advance over traditional relative positional encoding techniques, which struggle with the periodicity of angular representations.
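The memory argument can be illustrated with a rough count. The sketch assumes a single scene, one `dim`-sized embedding per ordered agent pair for the explicit relative position approach, and one rotated `dim`-sized vector per agent for the rotary approach. It ignores batch size, attention heads, and any implementation detail of QCNet or DRoPE, and both function names are hypothetical.

```python
def pairwise_rpe_floats(num_agents, dim):
    """Floats needed to store an explicit embedding for every ordered agent pair."""
    return num_agents * num_agents * dim

def rotary_floats(num_agents, dim):
    """Floats needed to store one rotated vector per agent."""
    return num_agents * dim

dim = 64
# Map each agent count to (pairwise count, per-agent count).
counts = {
    n: (pairwise_rpe_floats(n, dim), rotary_floats(n, dim))
    for n in (10, 100, 1000)
}
```

The ratio between the two counts equals the number of agents, which is the gap between quadratic and linear space complexity.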