Research on computer vision in intelligent damage monitoring of heritage conservation: the case of Yungang Cave Paintings

Journal: npj Heritage Science, Volume: 13, Issue: 1
DOI: https://doi.org/10.1038/s40494-025-01567-4
Publication Date: 2025-03-01
Author(s): Jiawei Zhan et al.
Primary Topic: Conservation Techniques and Studies

Overview

This paper presents YOLO CP (Cave Paintings), a computer vision algorithm designed to improve the precision and speed of damage monitoring for cave paintings, a vital aspect of cultural heritage protection. The algorithm incorporates a newly developed C2f-FasterEMA Block module, which refines the residual module in the deep layers to improve target feature extraction while reducing parameters. The RepGD (Rep Gather-and-Distribute) mechanism is integrated into the feature fusion network, strengthening cross-layer feature fusion and detection accuracy. The Inner-SIoU loss function improves how bounding-box aspect ratio is measured and accelerates model convergence. Experimental results indicate that YOLO CP reduces parameters and floating-point operations by 8.14% and 7.14%, respectively, compared to the original YOLOv10 baseline, while improving precision and recall by 1.82 and 7.05 percentage points, respectively, at a real-time detection speed of 277.39 FPS.
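As a small worked example of the figures above, the sketch below converts the reported throughput into an average per-frame latency and the reported relative reductions into the fraction of the baseline that remains. The function names are illustrative assumptions, and no absolute parameter or FLOP counts are implied, since the summary does not state them.

```python
# Figures quoted in the summary; nothing here is measured.
REPORTED_FPS = 277.39
PARAM_REDUCTION = 0.0814  # relative reduction vs. the YOLOv10 baseline
FLOP_REDUCTION = 0.0714   # relative reduction vs. the YOLOv10 baseline


def latency_ms(fps: float) -> float:
    """Average time per frame in milliseconds for a given throughput."""
    return 1000.0 / fps


def remaining_fraction(relative_reduction: float) -> float:
    """Fraction of the baseline quantity left after a relative reduction."""
    return 1.0 - relative_reduction


print(f"{latency_ms(REPORTED_FPS):.2f} ms per frame")
print(f"{remaining_fraction(PARAM_REDUCTION):.4f} of baseline parameters")
print(f"{remaining_fraction(FLOP_REDUCTION):.4f} of baseline FLOPs")
```

Note that the precision and recall gains are stated in percentage points (absolute differences), whereas the parameter and FLOP figures are relative reductions.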

The paper concludes by emphasizing the algorithm’s potential for digital preservation and restoration of cave paintings, supported by a newly created dataset of Yungang Cave paintings for academic and technological development. Despite its advantages in accuracy and speed, the algorithm faces challenges in detecting subtle damage, such as micro-cracks, due to environmental factors like noise and variable lighting. Future research directions include optimizing the algorithm for real-time application on embedded platforms, adapting it for various cultural heritage types, and exploring advanced technologies such as multispectral imaging and self-supervised learning to enhance detection capabilities. This work lays a foundational framework for ongoing advancements in the field of cultural heritage conservation.

Methods

In this section, the authors outline the methods used for damage detection in cave paintings and compare several deep learning detectors. The study builds on YOLOv10, a single-stage detection framework whose efficiency is particularly advantageous for real-time applications. While the YOLO series, including YOLOv10, excels in speed and lightweight architecture, the Real-time Detection Transformer (RT-DETR) and Mamba algorithms are also discussed for their strengths in handling complex textures and achieving high accuracy, respectively. However, the computational demands of RT-DETR and Mamba limit their practical deployment on resource-constrained devices, so the authors adopt the lightweight YOLOv10n variant as the baseline for this research.

The experimental setup is standardized across all tests, and performance is evaluated with a confusion matrix that shows how well the model distinguishes between damage types such as fissures and shedding. The results indicate a detection precision of 0.79 for fissures and 0.84 for shedding. During training, loss values decrease consistently and precision improves overall, stabilizing between 0.8349 and 0.8923. Recall rises from 0.3472 to 0.7446, indicating the model’s growing ability to identify damaged areas. These findings suggest that the YOLOv10n model learns the complexities of cave painting damage while balancing efficiency and accuracy, which is critical for heritage conservation.
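To make the two metrics concrete, here is a minimal sketch of how per-class precision and recall follow from confusion-matrix counts. The counts used are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) for one class from confusion-matrix counts.

    tp: detections that match a real damage region of this class
    fp: detections of this class with no matching real region
    fn: real regions of this class that the model missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for a single damage class (not from the paper).
p, r = precision_recall(tp=80, fp=20, fn=40)
print(f"precision={p:.3f} recall={r:.3f}")
```

Precision falls when the model raises false alarms, while recall falls when it misses real damage, which is why the summary tracks the two curves separately.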

Results

In this section, the study presents the results of experiments conducted using the proposed YOLO CP algorithm for detecting and classifying damage in cave paintings, comparing its performance against baseline models YOLOv8n, YOLOv9, and YOLOv10n. The experiments, carried out over 500 iterations, reveal that while the YOLOv9 model struggles with duplicate detections and inaccurate bounding box localization, particularly for cracks that resemble background features, the YOLO CP algorithm achieves a superior confidence level of 0.84.

Further analysis indicates that all baseline models, including YOLOv8n, YOLOv9, and YOLOv10n, exhibit missed detections for finer crack damage, with lower confidence levels in their results. In contrast, the YOLO CP algorithm, utilizing the FasterEMA Block mechanism, effectively distinguishes cracks from similar backgrounds, enhancing detection accuracy. The findings illustrate that the YOLO CP algorithm not only improves bounding box localization but also excels in identifying and localizing damage amidst complex textures and background disturbances, underscoring its significant potential for digital artifact analysis in the context of cave paintings.

Discussion

The discussion section describes the modules introduced to improve the detection and segmentation of damage in cave paintings. The FasterEMA Block module refines image features through a combination of 1 × 1 and 3 × 3 convolutions, which lets it capture spatial information at multiple scales. It uses the EMA (Efficient Multi-Scale Attention) mechanism to enable cross-spatial learning without compressing channel dimensions, thereby enriching feature aggregation and improving pixel-level attention. The RepGD mechanism, in turn, is introduced to improve information fusion across layers, addressing inefficiencies and potential information loss during feature extraction. It uses shallow and deep aggregation strategies to standardize feature dimensions and inject global features into the individual layers, significantly improving detection performance for targets of varying sizes.
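The gather-and-distribute idea described above can be sketched on single-channel feature maps. This is a simplified illustration under stated assumptions, not the paper's RepGD implementation: the helper names are hypothetical, maps are square with power-of-two sizes, and plain averaging and element-wise addition stand in for the learned fusion and injection layers.

```python
def avg_pool(fmap, factor):
    """Downsample a 2D map by averaging non-overlapping factor x factor tiles."""
    h, w = len(fmap) // factor, len(fmap[0]) // factor
    return [
        [
            sum(fmap[i * factor + di][j * factor + dj]
                for di in range(factor) for dj in range(factor)) / factor ** 2
            for j in range(w)
        ]
        for i in range(h)
    ]


def upsample_nearest(fmap, factor):
    """Upsample a 2D map by repeating each value along both axes."""
    return [
        [fmap[i // factor][j // factor] for j in range(len(fmap[0]) * factor)]
        for i in range(len(fmap) * factor)
    ]


def resize(fmap, target_size):
    """Bring a square map to target_size x target_size."""
    size = len(fmap)
    if size > target_size:
        return avg_pool(fmap, size // target_size)
    if size < target_size:
        return upsample_nearest(fmap, target_size // size)
    return fmap


def gather(levels, target_size):
    """Align all levels to one size and fuse them into a single global map."""
    aligned = [resize(f, target_size) for f in levels]
    return [
        [sum(a[i][j] for a in aligned) / len(aligned) for j in range(target_size)]
        for i in range(target_size)
    ]


def inject(level, global_map):
    """Distribute the global map back into one level (addition as a stand-in)."""
    g = resize(global_map, len(level))
    return [[x + y for x, y in zip(row, g_row)] for row, g_row in zip(level, g)]


# Three toy pyramid levels at different resolutions (constant values for clarity).
levels = [
    [[1.0] * 8 for _ in range(8)],
    [[2.0] * 4 for _ in range(4)],
    [[3.0] * 2 for _ in range(2)],
]
global_map = gather(levels, target_size=4)
enriched = [inject(level, global_map) for level in levels]
print(global_map[0][0], [e[0][0] for e in enriched])
```

The point of the design is that every level receives information from all other levels in one step, instead of passing it between neighbouring levels only.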

The section also discusses the optimization of the loss function through the Inner-SIoU approach, which enhances convergence rates by calculating Intersection over Union (IoU) using an auxiliary bounding box. This method allows for more effective regression, particularly in high and low IoU scenarios, ultimately improving the model’s detection capabilities. The YOLO CP algorithm, which integrates these advancements, demonstrates superior performance in precision, recall, and mean average precision (mAP) compared to existing models, achieving 89.2% precision and 82.7% mAP@0.5. The algorithm’s lightweight design and high detection speed (277.39 FPS) make it particularly suitable for real-time monitoring of cave paintings, showcasing its practical applicability in cultural heritage preservation. Overall, the enhancements proposed in this study significantly bolster the model’s robustness and generalization capabilities in detecting intricate damage patterns in cave art.
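The auxiliary-box idea behind the loss can be sketched as follows: both boxes are rescaled about their own centres by a common ratio, and IoU is computed on the rescaled boxes. This is a minimal sketch, assuming corner-format boxes `(x1, y1, x2, y2)`; the function names and the default ratio are illustrative assumptions, and the additional SIoU terms of the full loss are omitted.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def scale_box(box, ratio):
    """Return an auxiliary box with the same centre and scaled width and height."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    half_w = (box[2] - box[0]) * ratio / 2
    half_h = (box[3] - box[1]) * ratio / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def inner_iou(pred, target, ratio=0.75):
    """IoU computed on auxiliary boxes; ratio=0.75 is an assumed example value."""
    return iou(scale_box(pred, ratio), scale_box(target, ratio))


pred, target = (0.0, 0.0, 4.0, 4.0), (1.0, 0.0, 5.0, 4.0)
print(iou(pred, target))                   # plain IoU
print(inner_iou(pred, target, ratio=0.5))  # smaller auxiliary boxes
print(inner_iou(pred, target, ratio=1.5))  # larger auxiliary boxes
```

A ratio below 1 shrinks the auxiliary boxes and a ratio above 1 enlarges them, which is how a single parameter can adapt the regression signal to high-IoU and low-IoU samples.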