Combining 3D Urban Objects from All Around the World to Improve Object Classification and Semantic Segmentation

Journal: PFG – Journal of Photogrammetry Remote Sensing and Geoinformation Science, Volume: 94, Issue: 3
DOI: https://doi.org/10.1007/s41064-025-00374-7
Publication Date: 2026-01-14
Author(s): Onur Can Bayrak et al.
Primary Topic: Advanced Neural Network Applications

Overview

The paper presents the ESTATE dataset and its role in improving urban point cloud classification, particularly for under-represented urban objects such as traffic lights and electrical poles. The dataset comprises over 6,000 instances across 13 classes. It targets a known weakness of existing neural networks, which generalize poorly across datasets because of differences in sensors, object shapes, and class balance. Experimental results indicate that integrating ESTATE with state-of-the-art classification algorithms improves performance, and that the increased representation of under-represented classes also supports better semantic segmentation.

The findings suggest that using only the coordinate values (XYZ input) from ESTATE yields the highest classification scores, although the paper stresses the importance of exploring input configurations tailored to specific urban objects. Future research directions include evaluating additional neural network architectures, assessing robustness to noise, expanding the dataset with more under-represented objects, and applying it to instance or panoptic segmentation. The study highlights that combining data from various sources can improve model generalization, a benefit also observed with the diverse characteristics of ESTATE. All data and results are publicly available at the GitHub link provided in the paper.

Introduction

The introduction highlights the growing availability and use of urban 3D data, particularly point clouds, in fields such as autonomous driving, robotics, and urban management. Recent technological advances have added colorimetric information to these data, which improves the understanding of urban environments. Much of the research community's effort goes into the semantic enrichment and classification of these point clouds. Traditional methods rely on hand-crafted features and machine learning classifiers. Deep learning has since transformed the field, but challenges remain, particularly class imbalance and generalization across datasets and sensors.

The paper identifies two critical issues in classifying 3D point clouds: the under-representation of small urban objects and extreme class imbalance. Current deep learning methods struggle to generalize across benchmarks, often performing well on specific datasets but failing on data from other sources. To address this, the authors propose incorporating a diverse set of under-represented urban elements into the learning process. The paper aims to make datasets more realistic and more widely applicable by introducing the ESTATE dataset, which focuses on improving the classification of under-represented urban objects. The subsequent sections describe the dataset's creation, its integration into neural network training, and the evaluation of its impact on classification and semantic segmentation performance.

Results

The results section compares three neural network architectures (KPConv, Octformer, and Minkowski) across three input configurations (XYZ, XYZ + Intensity, XYZ + RGB) and three training/testing strategies (STST, ATST, and ATAT) on the ESTATE dataset. The ATST strategy consistently improves performance for all networks: it draws on a broader training set, which improves generalization. KPConv's performance with XYZ input, for instance, varies considerably across source datasets, with high accuracy on Paris-Lille3D (0.910) but lower accuracy on TR-MLS (0.611). In contrast, Octformer performs better, with an overall score of 0.87 under the STST configuration, and does particularly well on datasets such as Swiss3DCities (0.983).
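The three input configurations differ only in which per-point channels are passed to the network. A minimal sketch of that selection follows, assuming each point is stored as a dictionary of named channels; the `INPUT_CONFIGS` table, the channel names, and `build_features` are illustrative and not taken from the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical channel layout for each input configuration named in the summary.
INPUT_CONFIGS: Dict[str, Sequence[str]] = {
    "XYZ": ("x", "y", "z"),
    "XYZ+Intensity": ("x", "y", "z", "intensity"),
    "XYZ+RGB": ("x", "y", "z", "r", "g", "b"),
}


def build_features(points: List[Dict[str, float]], config: str) -> List[List[float]]:
    """Return one feature row per point, keeping only the channels of `config`."""
    channels = INPUT_CONFIGS[config]
    # A sensor may not record every channel, so fail early if one is absent.
    missing = [c for c in channels if any(c not in p for p in points)]
    if missing:
        raise ValueError(f"points lack channels {missing} required by {config}")
    return [[float(p[c]) for c in channels] for p in points]


# Toy cloud with intensity but no colour (values are invented).
cloud = [
    {"x": 1.0, "y": 2.0, "z": 0.5, "intensity": 0.8},
    {"x": 1.1, "y": 2.1, "z": 0.6, "intensity": 0.7},
]
xyz = build_features(cloud, "XYZ")
xyzi = build_features(cloud, "XYZ+Intensity")
```

Requesting `"XYZ+RGB"` for this toy cloud raises a `ValueError`, which mirrors the practical constraint that intensity and colour are only available from sensors that record them.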

The effect of intensity data depends on the network. KPConv's overall score drops from 0.85 to 0.82 when intensity is added to the XYZ input, whereas Minkowski's rises markedly from 0.41 to 0.77. Results with RGB input are less consistent, which points to potential difficulties in sensor-specific applications. Overall, the ATST strategy improves model accuracy and generalization by exposing the networks to the variation across source datasets, underscoring the importance of aggregated training data for classification performance.
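To put the two intensity results on a common footing, the overall scores quoted above can be expressed as relative changes. The short sketch below uses only the four figures from the summary; the helper name `relative_change` is illustrative.

```python
def relative_change(before: float, after: float) -> float:
    """Change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


# Overall scores quoted in the summary: XYZ input vs. XYZ + Intensity input.
kpconv = relative_change(0.85, 0.82)
minkowski = relative_change(0.41, 0.77)

print(f"KPConv:    {kpconv:+.1%}")     # prints "KPConv:    -3.5%"
print(f"Minkowski: {minkowski:+.1%}")  # prints "Minkowski: +87.8%"
```

Seen this way, intensity costs KPConv a few percent but nearly doubles Minkowski's score.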

Discussion

The discussion section reviews the strengths and limitations of existing datasets and benchmarks for 3D object classification and point cloud semantic segmentation, particularly in urban environments. It examines notable datasets such as Sydney Urban Objects, ModelNet40, and ShapeNet. Datasets like ModelNet40 provide structured synthetic data that is useful for benchmarking algorithms but lacks real-world applicability. Conversely, datasets like ScanNet and ScanObjectNN offer real-world data but contain few outdoor urban objects, which are critical for applications such as autonomous driving. The section emphasizes the persistent class imbalance in urban datasets, where common objects like buildings and roads dominate and under-represented classes, such as street furniture, are inadequately sampled.
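The imbalance described here can be made concrete by counting labels. The sketch below also derives inverse-frequency class weights, a generic remedy that is not the paper's approach (the paper instead adds more samples of rare classes). The label counts are invented for illustration.

```python
from collections import Counter
from typing import Dict, List


def inverse_frequency_weights(labels: List[str]) -> Dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {cls: total / (n_classes * count) for cls, count in counts.items()}


# Invented toy distribution: dominant classes vs. a rare street-furniture class.
labels = ["building"] * 90 + ["road"] * 8 + ["traffic_light"] * 2
weights = inverse_frequency_weights(labels)
```

With these counts the rare `traffic_light` class receives the largest weight and `building` a weight below 1, which shows how strongly a loss function would have to be rescaled to compensate for missing samples.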

The paper introduces the ESTATE dataset to address these limitations. It provides over 6,000 annotated urban objects across 13 classes, sourced from diverse datasets from around the world, and aims to improve how well neural networks generalize when classifying under-represented urban objects. Preparing the dataset involved meticulous manual extraction and visual checks to ensure that objects are of good quality and distinguishable, with a focus on keeping a realistic representation of urban environments. The findings underscore the need for datasets that better reflect the complexity of real-world urban scenes, which in turn supports progress in 3D object classification and semantic segmentation.
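Pooling objects from several source datasets requires mapping each source's labels onto one shared set of classes. The following is a minimal sketch of that bookkeeping step only. The source names, label strings, and the `UrbanObject` and `unify` helpers are hypothetical; the summary does not describe the authors' actual tooling, which relied on manual extraction and visual checks.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Tuple


class UrbanObject(NamedTuple):
    source: str     # name of the source dataset (hypothetical here)
    label: str      # label string as used by that source
    object_id: int  # identifier of the extracted object


# Hypothetical per-source label names mapped onto a shared taxonomy.
LABEL_MAPS: Dict[str, Dict[str, str]] = {
    "source_a": {"traffic light": "traffic_light", "pole": "pole"},
    "source_b": {"tl": "traffic_light", "mast": "pole"},
}


def unify(objects: List[UrbanObject]) -> List[Tuple[str, UrbanObject]]:
    """Pair each object with its shared class, dropping unmapped labels."""
    unified = []
    for obj in objects:
        target = LABEL_MAPS.get(obj.source, {}).get(obj.label)
        if target is None:
            continue  # no counterpart in the shared taxonomy
        unified.append((target, obj))
    return unified


objects = [
    UrbanObject("source_a", "traffic light", 1),
    UrbanObject("source_a", "building", 2),  # not in the map, so it is dropped
    UrbanObject("source_b", "tl", 3),
    UrbanObject("source_b", "mast", 4),
]
merged = unify(objects)
class_counts = Counter(cls for cls, _ in merged)
```

Keeping the original `UrbanObject` alongside the shared class preserves provenance, so later experiments can still split the pooled data by source dataset.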