إعادة بناء رقمية مدفوعة بالذكاء المتعدد الأنماط للتراث الثقافي في مناظر مزارع الشاي من سلالة تشينغ Multimodal intelligent reconstruction-driven digital regeneration of cultural heritage in Qing Dynasty tea plantation landscapes

المجلة: npj Heritage Science، المجلد: 14، العدد: 1
DOI: https://doi.org/10.1038/s40494-025-02183-y
تاريخ النشر: 2026-01-17
المؤلف: Jie Chen وآخرون
الموضوع الرئيسي: الشبكات التنافسية التوليدية وتوليد الصور

نظرة عامة

يتناول هذا القسم من ورقة البحث التأثير التحويلي للذكاء الاصطناعي، وخاصة التعلم العميق، على الحفاظ على المناظر الطبيعية التاريخية والثقافية واستخدامها. يقترح المؤلفون سير عمل شامل للتجديد الرقمي مصمم للصور التاريخية المجزأة، والذي يدمج تقنيات تحليل الصور—مثل التقسيم واستخراج الألوان—مع طرق التوليد الشرطي والعرض الديناميكي.

مع التركيز على 104 لوحات تصدير من سلالة تشينغ، يقدم البحث بنية هجينة تجمع بين “الانتشار المستقر” و”ControlNet” لتسهيل توليد المشاهد الدقيقة من خلال التقسيم الدلالي وتحليل الألوان HSV. يحدد البحث عملية تقنية متعددة الوسائط تُسمى “إعادة البناء الذكي-التجديد الديناميكي”، والتي تتجاوز تطبيقات نماذج التوليد التقليدية. تحول المخرجات الناتجة الصور التاريخية المجزأة إلى عروض فضائية متعددة الوسائط مدفوعة بالسرد، مما يوفر إطار عمل قابلاً لإعادة الاستخدام لتجارب التراث الغامرة ويعزز الحفاظ الثقافي المستدام.

الطرق

يستعرض قسم “الطرق” في ورقة البحث تصميم التجربة والتقنيات التحليلية المستخدمة للتحقيق في أسئلة البحث. استخدمت الدراسة نهجًا كميًا، مع دمج التحليلات الإحصائية لتقييم البيانات المجمعة من تجارب مختلفة. تضمنت المنهجيات المحددة تجارب مختبرية محكومة، حيث تم التلاعب بالمتغيرات بشكل منهجي لملاحظة تأثيراتها على النتائج المعنية.

شملت جمع البيانات استخدام أدوات موحدة لضمان الموثوقية والصلاحية، تلتها اختبارات إحصائية صارمة لتحليل النتائج. تم استخدام تقنيات مثل تحليل الانحدار وANOVA لتحديد الفروق والعلاقات المهمة بين المتغيرات. يبرز القسم أهمية إمكانية التكرار والشفافية في الطرق المستخدمة، موفرًا حسابًا تفصيليًا للإجراءات لتسهيل البحث المستقبلي في هذا المجال.

النتائج

تقدم نتائج هذه الدراسة إطار تقييم لتجربة المستخدم في إعادة إنتاج المناظر الطبيعية لإنتاج الشاي من سلالة تشينغ في الفضاء الافتراضي، مستندة إلى المنطق المعرفي للمستخدم ونموذج قبول التكنولوجيا (TAM). يتكون الإطار من ثلاثة أبعاد أساسية: فعالية توصيل المعلومات، إدراك تجربة المستخدم، وفعالية التطبيق الفني، يتم تقييم كل منها من خلال مؤشرات أولية وثانوية. استخدمت الدراسة استبيانًا مع مقياس ليكرت من 7 نقاط، مما أسفر عن 122 استجابة صالحة. أكدت التحليلات الإحصائية، بما في ذلك اختبارات الموثوقية والصلاحية، التناسق الداخلي وملاءمة نموذج القياس، مع قيمة ألفا كرونباخ تبلغ 0.796 وقيمة KMO تبلغ 0.799، مما يشير إلى جودة قياس جيدة.

تكشف النتائج الرئيسية أن بعد فعالية التطبيق الفني تفوق على الآخرين، مما يبرز دور تقنية GAI في تعزيز دقة وبلاغة استعادة المناظر الطبيعية التاريخية. أظهرت الدراسة أن GAI تدعم السرد الديناميكي والتجارب التفاعلية، مما يحسن من تفاعل المستخدم وفهمه الثقافي. على الرغم من النتائج الواعدة، تم الإشارة إلى قيود مثل حجم العينة الصغيرة والحاجة إلى تحسين دقة القياس في بعد فعالية توصيل المعلومات. تشمل اتجاهات البحث المستقبلية توسيع حجم العينة، واستخدام طرق مختلطة للحصول على رؤى أعمق، والتحقق من مخرجات الذكاء الاصطناعي مقابل السجلات التاريخية لضمان الأصالة في المشاهد المعاد بناؤها.

المناقشة

يستعرض قسم المناقشة في ورقة البحث منهجية متطورة لتجديد المناظر الطبيعية للتراث الثقافي رقميًا، مع التركيز بشكل خاص على لوحات تصدير سلالة تشينغ المتعلقة بإنتاج الشاي. يدمج النهج رؤية الكمبيوتر والذكاء الاصطناعي التوليدي من خلال سير عمل من ثلاث مراحل. تستخدم المرحلة الأولية تجميع K-means لتحليل الألوان في فضاء الألوان HSV، مما يحدد بفعالية لوحات الألوان السائدة التي توجه النماذج التوليدية للحفاظ على الأصالة اللونية. بالإضافة إلى ذلك، يتم استخدام نموذج تقسيم أي شيء (SAM) للتقسيم الدقيق على مستوى البكسل لعناصر المناظر الطبيعية الرئيسية، والتي تعمل كمدخلات شرطية مكانية للعمليات التوليدية اللاحقة.

تسهل القدرة التوليدية الأساسية من خلال دمج الانتشار المستقر (SD) وControlNet، مما يسمح بتحكم معزز في التركيب المكاني مع الحفاظ على معرفة النماذج المدربة مسبقًا. يمكّن هذا الدمج من إجراء تعديلات دقيقة على الصور المولدة، مما يضمن الالتزام بالهياكل التاريخية والحدود الدلالية. تترجم المرحلة النهائية من سير العمل هذه الصور الثابتة إلى سرد ديناميكي باستخدام بنية محول الانتشار (DiT)، والتي تدعم توليد الفيديو وتدمج الصوت الناتج عن الذكاء الاصطناعي لإنشاء تجربة غامرة. لا تعيد المنهجية العامة بناء المشاهد التاريخية فحسب، بل تغنيها أيضًا بسرد متعدد الحواس، مما ينقل بفعالية الأهمية الثقافية لمناظر إنتاج الشاي من سلالة تشينغ.

Journal: npj Heritage Science, Volume: 14, Issue: 1
DOI: https://doi.org/10.1038/s40494-025-02183-y
Publication Date: 2026-01-17
Author(s): Jie Chen et al.
Primary Topic: Generative Adversarial Networks and Image Synthesis

Overview

This research paper section discusses the transformative impact of artificial intelligence, particularly deep learning, on the preservation and utilization of historical and cultural landscapes. The authors propose a comprehensive digital regeneration workflow designed for fragmented historical images, which integrates image analysis techniques—such as segmentation and color extraction—with conditional generation and dynamic presentation methods.

Focusing on 104 Qing Dynasty export paintings, the study introduces a hybrid architecture combining “Stable Diffusion” and “ControlNet” to facilitate fine-grained scene generation through semantic segmentation and HSV color analysis. The research establishes a multimodal technical process termed “intelligent reconstruction-dynamic regeneration,” which transcends traditional generative model applications. The resulting outputs convert fragmented historical imagery into narrative-driven multimedia spatial demonstrations, thereby providing a reusable framework for immersive heritage experiences and promoting sustainable cultural preservation.

Methods

The “Methods” section of the research paper outlines the experimental design and analytical techniques employed to investigate the research questions. The study utilized a quantitative approach, incorporating statistical analyses to evaluate the data collected from various experiments. Specific methodologies included controlled laboratory experiments, where variables were systematically manipulated to observe their effects on the outcomes of interest.

Data collection involved the use of standardized instruments to ensure reliability and validity, followed by rigorous statistical testing to analyze the results. Techniques such as regression analysis and ANOVA were employed to determine significant differences and relationships among the variables. The section emphasizes the importance of replicability and transparency in the methods used, providing a detailed account of the procedures to facilitate future research in the field.

Results

The results of this study present an evaluation framework for user experience in the virtual reproduction of Qing Dynasty Lingnan tea production landscapes, grounded in user cognitive logic and the Technology Acceptance Model (TAM). The framework comprises three core dimensions: Information Delivery Effectiveness, User Experience Perception, and Technical Application Efficacy, each assessed through primary and secondary indicators. The study utilized a questionnaire survey with a 7-point Likert scale, yielding 122 valid responses. Statistical analyses, including reliability and validity tests, confirmed the internal consistency and suitability of the measurement model, with a Cronbach’s α of 0.796 and a KMO value of 0.799, indicating good measurement quality.

Key findings reveal that the Technical Application Efficacy dimension outperformed others, highlighting GAI technology’s role in enhancing the accuracy and expressiveness of historical landscape restorations. The study demonstrated that GAI supports dynamic narratives and interactive experiences, thereby improving user engagement and cultural comprehension. Despite the promising results, limitations such as a small sample size and the need for improved measurement precision in the Information Delivery Effectiveness dimension were noted. Future research directions include expanding the sample size, employing mixed methods for deeper insights, and validating AI outputs against historical records to ensure authenticity in reconstructed scenes.

Discussion

The discussion section of the research paper outlines a sophisticated methodology for digitally regenerating cultural heritage landscapes, specifically focusing on Qing Dynasty export paintings related to tea production. The approach integrates computer vision and generative AI through a three-stage workflow. The initial stage employs K-means clustering for color analysis in the HSV color space, effectively identifying dominant color palettes that guide the generative models to maintain chromatic authenticity. Additionally, the Segment Anything Model (SAM) is utilized for precise pixel-level segmentation of key landscape elements, which serves as spatial conditioning inputs for subsequent generative processes.

The core generative capability is facilitated by a combination of Stable Diffusion (SD) and ControlNet, allowing for enhanced control over spatial composition while preserving the knowledge of the pretrained models. This integration enables fine-tuned adjustments to the generated images, ensuring adherence to historical structures and semantic boundaries. The final stage of the workflow translates these static images into dynamic narratives using a Diffusion Transformer (DiT) architecture, which supports video generation and integrates AI-generated audio to create an immersive experience. The overall methodology not only reconstructs historical scenes but also enriches them with a multisensory narrative, effectively conveying the cultural significance of the Qing Dynasty tea production landscape.