Scene2Hap: Generating Scene-Wide Haptics for VR from Scene Context with Multimodal LLMs

Journal: Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems
DOI: https://doi.org/10.1145/3772318.3791297
Publication Date: 2026-04-13
Author(s): Arata Jingu et al.
Primary Topic: Tactile and Sensory Interactions

Overview

The paper introduces Scene2Hap, a system built around large language models (LLMs) that automatically generates vibrotactile feedback for virtual reality (VR) environments. The system has two components: LLM-based haptic inference and physics-inspired haptic rendering. The inference component uses a multimodal LLM to infer the semantic attributes and physical context of each object in a VR scene, including its material properties and vibration characteristics. From this information, the system produces vibrotactile signals by generating or retrieving matching audio signals and converting them into haptic feedback.
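The audio-to-haptic conversion step can be illustrated with a minimal sketch. The summary does not describe how Scene2Hap performs this conversion, so everything below is an assumption: the one-pole low-pass filter, the 500 Hz cutoff, and the names `one_pole_lowpass` and `audio_to_vibration` are hypothetical. The sketch only shows the general idea of keeping low-frequency content, since the skin is most sensitive to vibrations far below the upper limit of hearing.

```python
import math


def one_pole_lowpass(samples, sample_rate_hz, cutoff_hz):
    """Apply a simple one-pole (RC-style) low-pass filter to a list of samples."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = dt / (rc + dt)  # smoothing factor in (0, 1)
    filtered = []
    y = 0.0
    for x in samples:
        y += alpha * (x - y)  # move the output a fraction of the way toward the input
        filtered.append(y)
    return filtered


def audio_to_vibration(samples, sample_rate_hz, cutoff_hz=500.0, gain=1.0):
    """Hypothetical conversion: low-pass the audio, apply gain, clamp to [-1, 1].

    The cutoff and gain are illustrative defaults, not values from the paper.
    """
    filtered = one_pole_lowpass(samples, sample_rate_hz, cutoff_hz)
    return [max(-1.0, min(1.0, gain * s)) for s in filtered]


def sine(freq_hz, sample_rate_hz, n_samples):
    """Generate a unit-amplitude test tone."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz)
            for i in range(n_samples)]


def rms(samples):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

With these helpers, a 100 Hz tone passes through `audio_to_vibration` almost unchanged, while a 3 kHz tone is strongly reduced. A real pipeline would likely also consider the actuator's frequency response, which this sketch ignores.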

The physics-inspired rendering component makes the vibrotactile experience more realistic by simulating how vibrations propagate and attenuate across objects in the scene, taking into account their material properties and spatial relationships. Across three studies, the authors report that Scene2Hap effectively captures object semantics and physical context, significantly enhances the immersive quality of VR haptic experiences, and improves user interaction in full VR scenes. The end-to-end pipeline strengthens users’ sense of materiality and spatial awareness and improves the overall user experience in virtual environments.
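To make the idea of propagation and attenuation concrete, here is a minimal sketch that assumes a simple exponential decay of amplitude with distance, scaled by a per-material damping coefficient. The summary does not give the actual model used by Scene2Hap, so the decay law, the `Material` class, the function `attenuated_amplitude`, and all coefficient values are hypothetical and illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Material:
    name: str
    damping_per_m: float  # hypothetical attenuation coefficient, in 1/m


# Illustrative values only; not measured and not taken from the paper.
METAL = Material("metal", 0.5)
WOOD = Material("wood", 2.0)
RUBBER = Material("rubber", 8.0)


def attenuated_amplitude(source_amplitude, source_pos, touch_pos, material):
    """Amplitude felt at touch_pos for a vibration originating at source_pos.

    Assumes amplitude decays as exp(-damping * distance) along a single
    material. Positions are (x, y, z) tuples in meters.
    """
    distance_m = math.dist(source_pos, touch_pos)
    return source_amplitude * math.exp(-material.damping_per_m * distance_m)


# Example: a vibrating source at the origin, touched 0.5 m away.
source = (0.0, 0.0, 0.0)
touch = (0.5, 0.0, 0.0)
for material in (METAL, WOOD, RUBBER):
    amplitude = attenuated_amplitude(1.0, source, touch, material)
    print(f"{material.name}: {amplitude:.3f}")
```

In this sketch a heavily damped material lets less of the vibration reach the touch point than a lightly damped one at the same distance. Propagation across several objects with different materials would need a path-based model, which is beyond this example.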

Introduction

The introduction discusses the challenges of designing 3D virtual worlds, particularly creating realistic haptic feedback for VR scenes. Recent work has used artificial intelligence (AI) and LLMs to automate the design of visual and auditory elements, but the generation of haptic properties remains underexplored. Existing generative models often fail to make full use of objects’ semantic information and physical context, both of which are crucial for realistic haptic experiences. For instance, the vibrotactile feedback of an object such as a pot varies significantly depending on its context within a scene.

To address these limitations, the authors propose Scene2Hap, an LLM-centered system that automatically generates object-level vibrotactile feedback for entire VR scenes. Scene2Hap uses a multimodal LLM to infer each object’s semantics and material properties from its context, and produces vibrotactile signals that account for vibration propagation and attenuation as the user interacts with the scene. The architecture combines LLM-based haptic inference with physics-inspired haptic rendering, so feedback is modulated in real time according to where the user touches and the materials involved. Empirical studies show that Scene2Hap significantly enhances the immersive quality of VR experiences by improving users’ sense of materiality and spatial awareness; the authors position it as a novel approach to scalable haptic design in virtual and mixed reality environments.
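One way to picture the output of the inference stage is as a typed record per object. The sketch below assumes the LLM returns a JSON object per scene object; that format, the key names, and the `ObjectHapticInference` and `parse_inference` names are hypothetical. Only the attribute names (Object Category, Material Category, Usage, Vibrate-Or-Not) come from the study results reported in this summary.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectHapticInference:
    object_category: str
    material_category: str
    usage: str
    vibrates: bool  # corresponds to the Vibrate-Or-Not attribute


def parse_inference(raw_json: str) -> ObjectHapticInference:
    """Parse one (assumed) JSON response into a typed record.

    Raises KeyError if a field is missing and ValueError if
    vibrate_or_not is not a JSON boolean.
    """
    data = json.loads(raw_json)
    vibrates = data["vibrate_or_not"]
    if not isinstance(vibrates, bool):
        raise ValueError("vibrate_or_not must be a JSON boolean")
    return ObjectHapticInference(
        object_category=str(data["object_category"]),
        material_category=str(data["material_category"]),
        usage=str(data["usage"]),
        vibrates=vibrates,
    )


# Made-up example response; not an actual output from the paper.
example = (
    '{"object_category": "miniature toy truck", '
    '"material_category": "plastic", '
    '"usage": "pushed along the floor by hand", '
    '"vibrate_or_not": false}'
)
record = parse_inference(example)
```

Validating the response before it reaches the rendering stage keeps malformed LLM output from silently producing wrong haptic signals.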

Methods

The studies used two sets of questions to evaluate how haptic feedback affects user experience in virtual environments. In Study 2, participants responded to a series of statements about vibration propagation on a continuous line scale ranging from “Strongly Disagree” to “Strongly Agree.” The statements covered several dimensions of haptic feedback: utility, causality, consistency, saliency, materiality, and spatial awareness. Each response was converted to a percentage for analysis.
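Converting a mark on a continuous line scale to a percentage is a linear rescaling. The sketch below assumes the mark position and the two line endpoints are measured in the same unit (for example, pixels); the function name `line_scale_to_percent` and the clamping of out-of-range marks are assumptions, not details from the paper.

```python
def line_scale_to_percent(mark_pos, line_start, line_end):
    """Map a mark on a line scale to 0-100.

    line_start corresponds to "Strongly Disagree" (0 %) and line_end to
    "Strongly Agree" (100 %). Marks outside the line are clamped.
    """
    if line_end == line_start:
        raise ValueError("line_start and line_end must differ")
    fraction = (mark_pos - line_start) / (line_end - line_start)
    return 100.0 * min(1.0, max(0.0, fraction))


# A mark at 75 on a line running from 0 to 150 sits at the midpoint.
midpoint = line_scale_to_percent(75, 0, 150)  # 50.0
```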

In Study 3, a five-point Likert scale ranging from “Disagree” to “Agree” was used to gauge participants’ perceptions of haptic feedback in full VR scenes. The questions focused on realism, immersion, presence, feedback clarity, engagement, and overall satisfaction with the virtual experience. Together, these evaluations were intended to quantify how haptic feedback affects users’ interaction with and perception of the virtual environment.

Results

Participants rated the LLM-inferred Scene Categories highly, with a mean score of 4.68 (SD = 0.67), indicating that the Scene Analyzer effectively estimates the semantics of VR scenes from multimodal information. Despite discrepancies between the defined Scene Names and the estimated categories, the LLM performed strongly in inferring Object Category, Material Category, Usage, and Vibrate-Or-Not, with mean ratings above 4 on a 5-point Likert scale. Ratings of the inferred vibration properties were slightly lower, at around 3.6, reflecting the complexity of these judgments.
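The figures above are means and standard deviations of Likert ratings. As a small illustration of how such a summary is computed, the sketch below uses Python's `statistics` module on made-up ratings. The summary does not state whether the paper reports sample or population standard deviation; this sketch assumes the sample version (`statistics.stdev`), and the function name `summarize_ratings` is hypothetical.

```python
import statistics


def summarize_ratings(ratings, scale_min=1, scale_max=5):
    """Return (mean, sample SD) for a list of Likert ratings.

    Raises ValueError if a rating falls outside the scale or if fewer
    than two ratings are given (sample SD needs at least two values).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    for r in ratings:
        if not scale_min <= r <= scale_max:
            raise ValueError(f"rating {r} is outside {scale_min}-{scale_max}")
    return statistics.mean(ratings), statistics.stdev(ratings)


# Made-up ratings for illustration; not data from the study.
mean_score, sd = summarize_ratings([5, 4, 5, 5, 4])
print(f"AVG = {mean_score:.2f}, SD = {sd:.2f}")
```

Swapping `statistics.stdev` for `statistics.pstdev` would give the population standard deviation instead.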

A detailed analysis revealed three clusters of objects. The majority (24 of 30 objects) received ratings between 4.30 and 4.88, showing that the LLM can accurately infer their properties, including vibration potential (AVG = 4.61, SD = 0.87). In contrast, a cluster of three objects, all variations of a Hand Towel Rack, was misclassified, resulting in low ratings (1.10 to 2.63) due to their similar appearance. For a further three objects, the vibration properties were difficult to determine, leading to varied participant ratings (Vibrate-Or-Not AVG = 3.23, SD = 1.48). The LLM’s context-sensitive understanding of object semantics was further illustrated by its ability to distinguish a “dump truck” from a “miniature toy truck” based on context, suggesting that LLMs hold significant promise for automatic haptic design that accounts for the diverse semantics of objects.

Discussion

The discussion highlights how difficult it is to design effective haptic feedback for VR scenes, given the interplay of object semantics, physical properties, and spatial context. Traditional GUI-based haptic design tools have made it easier to create and edit haptic signals, but they still require significant manual effort, particularly in complex scenes with many interactive elements. Scene2Hap addresses this limitation by automating object-level haptic design through LLM-based inference, enabling scalable, context-aware haptic generation without low-level parameter tuning.

The paper also examines vibration propagation across surfaces, which existing haptic design methods have largely overlooked. Because the mechanical properties of materials influence how vibrations are perceived, incorporating them into haptic feedback can enhance user experience. Scene2Hap uses a physics-inspired model to simulate vibration propagation and attenuation based on material properties, delivering real-time, contextually relevant haptic feedback that reflects physical interactions within the VR environment. This approach turns visual-only scenes into multimodal experiences, enhancing the realism and immersion of VR interactions.