Explainable machine learning-based classification of traditional Korean ceramics using XRF chemical composition data

Journal: npj Heritage Science, Volume: 14, Issue: 1
DOI: https://doi.org/10.1038/s40494-026-02301-4
Publication Date: 2026-01-14
Author(s): Ye Eun Cho et al.
Primary Topic: Cultural Heritage Materials Analysis

Overview

This study introduces an explainable machine learning framework for classifying three types of traditional Korean ceramics (celadon, buncheong, and white porcelain) from X-ray fluorescence (XRF) chemical composition data. A dataset of 624 samples was analyzed with six machine learning algorithms: principal component analysis-linear discriminant analysis, decision tree, random forest, extreme gradient boosting, k-nearest neighbors, and support vector machine. The random forest and extreme gradient boosting models achieved the highest classification accuracy, 95.8% on the test set. White porcelain was identified consistently by all models, whereas celadon and buncheong were partially misclassified because their chemical characteristics overlap.

To make the models interpretable, Shapley additive explanations (SHAP) were applied. They showed that Fe₂O₃ and TiO₂ were the components with the greatest influence on type differentiation, consistent with known mechanisms of ceramic coloration. These findings underscore the potential of explainable machine learning for chemistry-based ceramic classification, offering a quantitative framework that complements traditional typological methods in the study of cultural heritage.
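SHAP attributions are built on Shapley values, which average each feature's marginal contribution over all orders in which features can be revealed. The sketch below computes them exactly for a toy scoring function. The function `toy_score`, its coefficients, and the sample and baseline compositions are all hypothetical and are not taken from the paper; practical SHAP implementations use faster algorithms than this brute-force enumeration.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    features = list(instance)
    totals = {name: 0.0 for name in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)            # start from the reference composition
        previous = predict(current)
        for name in order:
            current[name] = instance[name]  # reveal one feature at a time
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_score(x):
    """Hypothetical score with an Fe2O3-TiO2 interaction (illustration only)."""
    return 0.8 * x["Fe2O3"] + 1.5 * x["TiO2"] + 0.5 * x["Fe2O3"] * x["TiO2"] + 0.01 * x["K2O"]


# Made-up oxide contents in wt%, not measured values.
baseline = {"Fe2O3": 1.0, "TiO2": 0.1, "K2O": 3.0}
sample = {"Fe2O3": 2.0, "TiO2": 0.5, "K2O": 3.5}

phi = shapley_values(toy_score, sample, baseline)
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the score of the sample and the score of the baseline, which is the additivity property that makes per-oxide contributions comparable.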

Methods

The study is built on XRF measurements of the elemental composition of 624 samples of celadon, buncheong, and white porcelain from the Goryeo and Joseon dynasties. The data were curated and normalized, and ten major elemental oxides served as the input features. Six classical machine learning models were trained and compared: PCA-LDA, decision tree, random forest, extreme gradient boosting, k-nearest neighbors, and support vector machine. Classical models were chosen because the dataset is relatively small and compositional, and because interpretability mattered alongside predictive performance.

Model performance was evaluated in two stages, first on an internal test set and then on an external validation dataset, to assess generalizability. Statistical tests were used to compare the models, and SHAP values were used to quantify how much each oxide contributed to the predictions.

Results

A comparative evaluation of six machine learning (ML) models for ceramic classification based on X-ray fluorescence (XRF) compositional data showed that Random Forest (RF) and Extreme Gradient Boosting (XGB) achieved the highest classification accuracy on the test set, 95.8%, followed by Support Vector Machine (SVM) at 93.3%. K-Nearest Neighbors (KNN), Decision Tree (DT), and PCA-Linear Discriminant Analysis (PCA-LDA) scored lower, with reported accuracies between 85.0% and 88.3%. The small gaps between training and test accuracy across all models indicated that overfitting was well controlled and that generalization was stable. PCA-LDA’s lower performance was attributed to its reliance on linear decision boundaries and to the possible loss of critical chemical variation during the PCA transformation.

Further analysis of the DT model showed that it classified white porcelain with simpler decision rules than the more complex pathways needed for celadon and buncheong, which suggests that white porcelain is compositionally distinct. The superior performance of RF and XGB was linked to their ensemble structures, which model intricate feature interactions effectively: RF reduces variance through bagging, while XGB iteratively corrects residuals. Statistical tests confirmed that RF and XGB significantly outperformed PCA-LDA, DT, and KNN, and that SVM significantly outperformed PCA-LDA and DT. These results underscore the methodological advantages of ensemble learning in chemometric applications and the importance of model architecture for robust classification.
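The summary does not say which statistical test was used to compare the models. As an illustration only, the sketch below assumes McNemar's exact test, a common choice for comparing two classifiers evaluated on the same test samples. The disagreement counts are made up and do not come from the paper.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value for paired classifier predictions.

    b: samples model A classified correctly and model B misclassified
    c: samples model A misclassified and model B classified correctly
    """
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree, so there is no evidence of a difference
    k = min(b, c)
    # Binomial tail under the null hypothesis that disagreements split 50/50.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical disagreement counts between an ensemble model and a single tree.
p_value = mcnemar_exact(b=12, c=2)
print(f"p = {p_value:.4f}")
```

Only the samples on which the two models disagree enter the test; samples that both models get right or both get wrong carry no information about which model is better.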

Discussion

X-ray fluorescence (XRF) analysis was used to investigate the elemental composition of traditional Korean ceramics, specifically celadon, buncheong, and white porcelain from the Goryeo and Joseon dynasties. The dataset comprised 624 samples, curated and normalized to ensure data quality, with a focus on ten major elemental oxides. A two-stage validation strategy used both internal and external datasets to evaluate model performance and generalizability. Six classical machine learning (ML) models were selected for classification because the dataset is relatively small and compositional, with emphasis on interpretability and predictive performance.
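The summary does not describe the normalization procedure. One common convention for compositional XRF data, assumed here, is to rescale the oxide weight percentages of each sample so that they sum to 100. The oxide list and the raw values below are hypothetical; the summary does not name the ten oxides.

```python
# Assumed set of ten major oxides (not listed in the summary).
OXIDES = ["SiO2", "Al2O3", "Fe2O3", "TiO2", "MnO", "MgO", "CaO", "Na2O", "K2O", "P2O5"]


def normalize_to_100(sample):
    """Rescale oxide weight percentages so that they sum to 100."""
    total = sum(sample[oxide] for oxide in OXIDES)
    if total <= 0:
        raise ValueError("oxide total must be positive")
    return {oxide: 100.0 * sample[oxide] / total for oxide in OXIDES}


# Made-up raw measurement in wt%; the total is below 100.
raw = {
    "SiO2": 70.0, "Al2O3": 18.0, "Fe2O3": 1.5, "TiO2": 0.5, "MnO": 0.05,
    "MgO": 0.45, "CaO": 1.0, "Na2O": 1.0, "K2O": 4.0, "P2O5": 0.1,
}

normalized = normalize_to_100(raw)
print(round(sum(normalized.values()), 6))
```

Rescaling preserves the ratios between oxides, so a sample's Fe₂O₃ to TiO₂ ratio is the same before and after normalization.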

Tree-based ensemble models, particularly Random Forest (RF) and Extreme Gradient Boosting (XGB), outperformed the others in classification accuracy, achieving 93.2% and 91.5%, respectively, on the external validation dataset. Misclassification trends showed that celadon and buncheong were often confused because of their compositional similarity, while white porcelain was clearly separated. Feature importance analysis using SHAP values highlighted TiO₂ and Fe₂O₃ as critical determinants in classification, consistent with established knowledge of ceramic coloration mechanisms. The study underscores the potential of explainable ML in archaeological contexts, providing insights into the material properties and production technologies of historical ceramics.
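The misclassification pattern described above can be read directly from a confusion matrix. The sketch below uses made-up counts, not the paper's results, to show how overall accuracy and per-class recall are derived and how confusion between celadon and buncheong appears as off-diagonal entries.

```python
LABELS = ["celadon", "buncheong", "white porcelain"]

# Rows are true classes, columns are predicted classes (hypothetical counts).
confusion = [
    [17, 3, 0],   # celadon: 3 samples predicted as buncheong
    [2, 18, 0],   # buncheong: 2 samples predicted as celadon
    [0, 0, 20],   # white porcelain: no errors
]


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def recall_per_class(matrix, labels):
    """For each true class, the fraction of its samples predicted correctly."""
    return {label: matrix[i][i] / sum(matrix[i]) for i, label in enumerate(labels)}


print(f"accuracy = {accuracy(confusion):.3f}")
for label, value in recall_per_class(confusion, LABELS).items():
    print(f"{label}: recall = {value:.2f}")
```

In this illustration every error falls in the celadon and buncheong block, while the white porcelain row is error-free.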