An assessment framework for explainable AI with applications to cybersecurity

Journal: Artificial Intelligence Review, Volume: 58, Issue: 5
DOI: https://doi.org/10.1007/s10462-025-11141-w
Publication Date: 2025-02-27
Author(s): Maria Carla Calzarossa et al.
Primary Topic: Adversarial Robustness in Machine Learning

Overview

This paper addresses the need for a systematic comparison of explainable AI methods in cybersecurity, with a focus on phishing website detection. The authors propose a framework that evaluates alternative explanations by their complexity and robustness, using a publicly available dataset of features extracted from both malicious and legitimate web pages. The methodology identifies the explainability method that best balances low complexity with high robustness, which supports both the detection of fraudulent attacks and the design of security mechanisms.

The methodology is grounded in two key concepts: Lorenz Zonoids, which measure the concentration of explanations related to a model (with higher concentration indicating lower complexity), and a bootstrap procedure that generates confidence intervals for explanations, represented as boxplots. The results demonstrate that their approach can select machine learning models that are not only accurate but also robust and simple. The authors emphasize that while parsimony is a critical criterion for model selection, it should be complemented by domain-specific considerations. They also acknowledge the computational demands of bootstrapping, which, while intensive, is less burdensome than model-agnostic techniques like Shapley values. Future research will aim to adapt this methodology for unstructured inputs and explore adversarial attacks, enhancing the overall trustworthiness of machine learning applications in phishing detection.

Introduction

The introduction of the paper highlights the growing significance of Machine Learning (ML) methods in enhancing Artificial Intelligence (AI) applications across various sectors. Despite their impressive predictive capabilities, many ML models suffer from a lack of intuitive interpretability, which has raised concerns among regulators and authorities regarding the risks associated with unexplainable AI systems. In response, the European Commission has proposed the AI Act, which emphasizes the need for machine learning models to meet standards of accuracy, explainability, fairness, and robustness. Explainability is defined as the ability of stakeholders to understand both the outputs and the rationale behind them, necessitating clear explanations of model operations.

The paper outlines various metrics for measuring explainability across different types of ML models, such as regression, ensemble methods, and neural networks. These metrics, while not model-agnostic, provide insights into the importance of predictors and their contributions to model decisions. The authors propose a methodology for assessing and comparing explanations from different ML models, aiming to identify the most parsimonious and robust explanations. They apply this framework to a case study in the cybersecurity domain: the detection of phishing, an attack that poses significant risks to individuals and businesses. The results demonstrate the effectiveness of the proposed approach in identifying simpler and more reliable models, setting the stage for further exploration in subsequent sections of the paper.

Methods

In this section, the authors outline their methodological contributions to the evaluation of explainable AI methods, focusing on the comparison of explanations derived from different machine learning models trained on the same dataset. They emphasize two key principles: model parsimony, guided by Occam’s razor, and model robustness. Parsimony is assessed through the concentration of explanations, quantified using the Lorenz Zonoid, which serves as a generalization of the Gini coefficient. The authors detail the calculation of the Lorenz Zonoid and illustrate its application to feature explanations in machine learning models. Robustness is evaluated with a bootstrap procedure that generates multiple explanations for each variable; these are then visualized as boxplots to determine their significance against a baseline model.
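
To make the concentration idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the one-dimensional case, in which the Lorenz Zonoid value reduces to a Gini-type coefficient computed from the Lorenz curve of the feature importances. The function names and the importance vectors are hypothetical and are not taken from the paper.

```python
def lorenz_curve(importances):
    """Return cumulative shares of total importance, sorted ascending."""
    values = sorted(abs(v) for v in importances)
    total = sum(values)
    if total == 0:
        raise ValueError("importances must not all be zero")
    shares, running = [0.0], 0.0
    for v in values:
        running += v
        shares.append(running / total)
    return shares


def concentration(importances):
    """Gini-type concentration: 1 minus twice the area under the Lorenz curve."""
    shares = lorenz_curve(importances)
    n = len(shares) - 1
    # Trapezoidal area under the Lorenz curve on a grid of n equal steps.
    area = sum((shares[i] + shares[i + 1]) / (2 * n) for i in range(n))
    return 1.0 - 2.0 * area


# Hypothetical importance vectors for two models (illustrative values only).
spread_out = [0.3, 0.3, 0.2, 0.2]     # importance shared by all features
concentrated = [0.7, 0.1, 0.1, 0.1]   # one feature dominates

print(concentration(spread_out), concentration(concentrated))
```

Under this sketch, a model whose importance sits on a few features scores higher, which corresponds to the summary's reading of higher concentration as lower complexity.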

The methodology is applied to the detection of phishing websites, utilizing five classifiers to train 100 machine learning models. The results indicate high accuracy and Area Under the ROC Curve (AUC) values for most classifiers, with Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM) selected for further analysis. The feature importance analysis reveals variability across classifiers, with RF demonstrating the most concentrated and parsimonious explanations. Boxplot representations further validate the significance of the explanations, confirming that RF models provide the most robust insights into feature importance, thereby enhancing the interpretability of the model outputs in the context of cybersecurity.
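
The bootstrap robustness check described in the Methods can be sketched as follows. This is an illustration under stated assumptions, not the paper's pipeline: the data are synthetic, and the importance score (the absolute gap between class-wise feature means) is a simple stand-in for the model-specific explanations used by the authors. All names are hypothetical.

```python
import random
import statistics


def importance_scores(rows, labels):
    """Stand-in explanation: absolute gap between class-wise feature means."""
    scores = []
    for j in range(len(rows[0])):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        if not pos or not neg:  # degenerate resample containing one class only
            scores.append(0.0)
        else:
            scores.append(abs(statistics.fmean(pos) - statistics.fmean(neg)))
    return scores


def bootstrap_importances(rows, labels, n_boot=200, seed=0):
    """Recompute the scores on n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    draws = [[] for _ in rows[0]]
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores = importance_scores([rows[i] for i in idx], [labels[i] for i in idx])
        for j, s in enumerate(scores):
            draws[j].append(s)
    return draws


def boxplot_summary(sample):
    """Five-number summary that a boxplot would display."""
    q1, median, q3 = statistics.quantiles(sample, n=4)
    return {"min": min(sample), "q1": q1, "median": median, "q3": q3, "max": max(sample)}


# Synthetic data (hypothetical): feature 0 separates the classes, feature 1 is noise.
rng = random.Random(42)
labels = [i % 2 for i in range(200)]
rows = [[y + rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for y in labels]

draws = bootstrap_importances(rows, labels)
summaries = [boxplot_summary(d) for d in draws]
for j, s in enumerate(summaries):
    print(f"feature {j}: q1={s['q1']:.3f} median={s['median']:.3f} q3={s['q3']:.3f}")
```

A feature whose box stays well away from the box of a pure-noise feature can be read as robustly important. In practice the stand-in score would be replaced by the explanation produced by each fitted classifier.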

Discussion

In the discussion section of the paper, the authors review the current landscape of explainability in machine learning, particularly in the context of phishing detection. They emphasize that for a machine learning model to be deemed trustworthy, it must meet various internal requirements, including explainability, which has gained prominence alongside the rise of “black-box” models. The authors highlight that while some models, such as linear and logistic regression, are inherently explainable, others, such as neural networks and support vector machines, require additional computational effort to become interpretable. They introduce Shapley values as a method for understanding feature contributions to model predictions, while noting how computationally intensive these values are to calculate.
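
The computational cost of Shapley values can be seen in a small exact calculation. The sketch below uses the standard Shapley formula on a hypothetical three-feature value function; it is not code from the paper, and the toy payoffs are invented for illustration. Each feature requires a pass over every coalition of the remaining features, so the work grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(n_players, value):
    """Exact Shapley values; `value` maps a frozenset of player indices to a payoff."""
    players = range(n_players)
    phi = [0.0] * n_players
    for i in players:
        others = [p for p in players if p != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n_players - size - 1) / factorial(n_players)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_value(coalition):
    """Hypothetical payoff: additive effects plus one interaction between 0 and 1."""
    payoff = 0.0
    if 0 in coalition:
        payoff += 2.0
    if 1 in coalition:
        payoff += 1.0
    if 0 in coalition and 1 in coalition:
        payoff += 1.0  # interaction term, shared equally by features 0 and 1
    return payoff


phi = shapley_values(3, toy_value)
print(phi)
```

Here the contributions come out close to 2.5, 1.5 and 0, and they sum to the payoff of the full coalition. With n features, each contribution averages over 2^(n-1) coalitions, which is the burden the authors' approach aims to avoid.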

The authors then transition to phishing detection, underscoring the importance of timely identification of malicious websites that mimic legitimate ones. They discuss various machine learning approaches that have been employed to extract features from URLs and HTML content, noting that while these models demonstrate high accuracy, they often lack transparency regarding their decision-making processes. The paper presents a novel methodology that evaluates model explainability without the computational burden associated with model-agnostic techniques like Shapley values. This methodology utilizes Lorenz Zonoids to assess the concentration of explanations and employs a bootstrap procedure to generate confidence intervals for these explanations. The authors conclude that their approach effectively identifies the most accurate and robust models while maintaining simplicity, and they outline future research directions, including extending their methodology to unstructured inputs and addressing adversarial attacks in machine learning.