Feature importance guided autoencoder for dimensionality reduction in intrusion detection systems

Journal: Scientific Reports, Volume: 16, Issue: 1
DOI: https://doi.org/10.1038/s41598-026-36695-9
PMID: https://pubmed.ncbi.nlm.nih.gov/41639261
Publication Date: 2026-02-04
Author(s): M A Abdel-Rahman et al.
Primary Topic: Network Security and Intrusion Detection

Overview

The authors highlight the role of intrusion detection systems (IDS) in protecting computer networks. They also show how dimensionality reduction can improve the performance of machine learning-based IDS. They introduce a new approach, the feature importance-based autoencoder (FI-AE), built on a feature importance measure they call one-versus-all (OVA) feature importance, which is derived from the random forest algorithm. FI-AE trains the autoencoder with a weighted loss function in which each feature's reconstruction error is weighted according to its OVA importance.
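
The per-class idea behind OVA importance can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It fits one binary scikit-learn `RandomForestClassifier` per class (that class versus all others) and reads each forest's impurity-based `feature_importances_`. It then averages the per-class vectors with equal weight and renormalizes. The equal-weight averaging, the function name `ova_feature_importance`, and the forest settings are hypothetical choices; the paper may aggregate per-class importances differently.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def ova_feature_importance(X, y, n_estimators=100, random_state=0):
    """Hypothetical sketch of one-versus-all (OVA) feature importance.

    Fits one binary random forest per class (class c vs. all other
    classes) and averages the per-class impurity-based importances with
    equal weight. Requires at least two distinct classes in y.
    Returns (weights, per_class), where weights sums to 1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    per_class = np.zeros((len(classes), X.shape[1]))
    for k, c in enumerate(classes):
        # Binary relabelling: 1 for the current class, 0 for every other class.
        y_bin = (y == c).astype(int)
        forest = RandomForestClassifier(
            n_estimators=n_estimators, random_state=random_state
        )
        forest.fit(X, y_bin)
        per_class[k] = forest.feature_importances_
    # Equal weight per class, so rare attack classes count as much as common ones.
    weights = per_class.mean(axis=0)
    return weights / weights.sum(), per_class


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 4))
    # Toy labels: only feature 0 determines whether a record is an "attack".
    y = np.where(X[:, 0] > 1.0, "attack", "normal")
    weights, per_class = ova_feature_importance(X, y)
    print(weights.round(3))
```

Every class contributes equally to the average. As a result, a feature that matters only for a rare attack class is not drowned out by features that separate the majority class, which is the bias the OVA measure is meant to reduce.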

The proposed FI-AE was evaluated on three benchmark datasets: NSL-KDD, UNSW-NB15, and CIC-IDS2017. Pairing a random forest classifier with FI-AE significantly outperformed existing dimensionality reduction methods, achieving higher accuracy and F1-scores. The study underscores the potential of FI-AE for improving IDS performance and lays the groundwork for future research in this area.
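
Accuracy and F1-score are the headline metrics here. The sketch below shows how they can be computed for multi-class IDS labels in plain Python. It assumes macro-averaged F1, the unweighted mean of per-class F1 scores. The summary does not say which averaging the authors used, and weighted or per-class F1 would give different numbers. The function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, with F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    y_true = ["normal", "normal", "dos", "probe"]
    y_pred = ["normal", "dos", "dos", "probe"]
    print(accuracy(y_true, y_pred))  # 0.75
    print(macro_f1(y_true, y_pred))  # 7/9, about 0.778
```

Macro averaging gives each attack class the same weight as normal traffic. This makes the score sensitive to performance on rare classes, not just the majority class.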

Results

In the results section, the authors report how their feature importance-based autoencoder (FI-AE) improves classification accuracy on the benchmark datasets. After preprocessing each dataset, they trained the autoencoder and used its encoder to reduce the dimensionality of the feature space. The reduced datasets were then used to train a random forest classifier. The study compares FI-AE with conventional autoencoders and principal component analysis (PCA) in Figures 7 and 8 and in Tables 4, 5, and 6.
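
The reduce-then-classify pipeline can be sketched with a deliberately simple stand-in for the paper's network. The stand-in is a single-layer linear autoencoder without biases, trained by full-batch gradient descent on a feature-weighted squared error. The architecture, learning rate, epoch count, and names such as `train_linear_fi_ae` are assumptions for illustration; they are not the authors' FI-AE configuration. The encoder output `Z` is the reduced feature matrix that would then be passed to the random forest.

```python
import numpy as np


def weighted_loss(X, X_hat, w):
    """Mean over records of sum_j w_j * (x_ij - x_hat_ij)^2."""
    return float(np.mean(np.sum(w * (X - X_hat) ** 2, axis=1)))


def train_linear_fi_ae(X, w, code_dim, lr=0.1, epochs=2000, seed=0):
    """Fit encoder/decoder matrices by gradient descent on weighted_loss.

    Hypothetical linear stand-in for a deeper FI-AE; assumes X is
    standardized (zero mean), so no bias terms are used.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, code_dim))
    W_dec = rng.normal(scale=0.1, size=(code_dim, d))
    for _ in range(epochs):
        Z = X @ W_enc                    # encode: (n, code_dim)
        X_hat = Z @ W_dec                # decode: (n, d)
        G = 2.0 * w * (X_hat - X) / n    # gradient of weighted_loss w.r.t. X_hat
        grad_dec = Z.T @ G               # chain rule through X_hat = Z @ W_dec
        grad_enc = X.T @ (G @ W_dec.T)   # chain rule through Z = X @ W_enc
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec


def encode(X, W_enc):
    """Reduced representation that would be fed to the downstream classifier."""
    return X @ W_enc


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 6))
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # Hypothetical importance weights (e.g. from OVA), summing to 1.
    w = np.array([0.4, 0.3, 0.1, 0.1, 0.05, 0.05])
    W_enc, W_dec = train_linear_fi_ae(X, w, code_dim=2)
    Z = encode(X, W_enc)
    print(Z.shape)
    print(weighted_loss(X, Z @ W_dec, w))
```

Setting every weight to 1/d turns the same code into an unweighted linear autoencoder, which gives a simple counterpart for comparison. With only two code dimensions, the weighted loss steers the encoder toward the two highest-weighted features.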

The authors also benchmark FI-AE against state-of-the-art dimensionality reduction techniques for intrusion detection systems (IDS) on the NSL-KDD, UNSW-NB15, and CIC-IDS2017 datasets, with detailed comparisons in Tables 7, 8, and 9. Because the current implementation of FI-AE relies on labeled datasets, it is suited to classifying known attack types. The authors acknowledge that detecting zero-day attacks remains a challenge. They propose future work that combines one-versus-all (OVA) weighting with semi-supervised or unsupervised anomaly detection to improve detection of previously unseen attack types.

Discussion

In the discussion section, the authors review existing dimensionality reduction methods for intrusion detection systems (IDS) and distinguish feature selection from feature extraction. Feature selection keeps a subset of the original features, chosen by algorithms that score their importance. Feature extraction instead transforms the original features into a lower-dimensional space, with Principal Component Analysis (PCA) and autoencoders as prominent examples. The authors note that PCA is favored for its simplicity, while autoencoders have shown promise in many applications, including IDS. They also summarize hybrid methods that combine different selection strategies, as well as more advanced techniques such as metaheuristics and deep feature extraction.
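
The contrast between the two families can be made concrete in a few lines of NumPy. For feature selection, the sketch keeps the k highest-scoring original columns. For feature extraction, it projects centered data onto its top k principal directions, computed with `numpy.linalg.svd`. The importance scores are assumed to come from some external ranking, such as a random forest, and the function names are hypothetical.

```python
import numpy as np


def select_top_k(X, importances, k):
    """Feature selection: keep the k original columns with the highest scores."""
    idx = np.sort(np.argsort(importances)[::-1][:k])  # preserve column order
    return X[:, idx], idx


def pca_project(X, k):
    """Feature extraction: project centered data onto the top-k principal axes."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 4)) * np.array([5.0, 1.0, 0.5, 0.1])
    scores = np.array([0.1, 0.5, 0.3, 0.1])  # hypothetical importance scores
    X_sel, kept = select_top_k(X, scores, 2)
    print(kept)                      # [1 2]: original, interpretable columns
    print(pca_project(X, 2).shape)   # (100, 2): new, mixed coordinates
```

Selection returns a subset of the original, interpretable columns. PCA, like an autoencoder's encoder, instead returns new coordinates that mix all of the inputs.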

The authors propose an autoencoder model built around a new feature importance measure, one-versus-all (OVA) feature importance. It is intended to reduce the bias of standard random forest feature importance toward majority classes. OVA computes feature importance separately for each class, giving a more balanced representation of the features that matter for minority classes. The authors also introduce a weighted mean squared error (WMSE) loss function that prioritizes reconstruction of the important features during training. The resulting feature importance-based autoencoder (FI-AE) is designed to focus learning on these features and improve reconstruction accuracy, ultimately improving IDS performance. The section closes by describing the experimental setup, including the datasets and the evaluation metrics used to assess the proposed model.
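
For concreteness, one plausible form of the WMSE loss is written out below. It assumes the weights w_j are the OVA importances normalized to sum to one. It also assumes the error is averaged over the N training records with d input features. The paper's exact normalization may differ.

```latex
\mathcal{L}_{\mathrm{WMSE}}
  = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{d} w_j \,\bigl(x_{ij} - \hat{x}_{ij}\bigr)^2,
\qquad w_j \ge 0, \quad \sum_{j=1}^{d} w_j = 1 .

% Uniform weights recover the ordinary mean squared error:
w_j = \tfrac{1}{d} \;\Longrightarrow\;
\mathcal{L}_{\mathrm{WMSE}} = \frac{1}{Nd} \sum_{i=1}^{N} \sum_{j=1}^{d} \bigl(x_{ij} - \hat{x}_{ij}\bigr)^2 .
```

A feature with twice the weight contributes twice as much loss for the same squared deviation. This pushes the autoencoder to spend its limited capacity on the features that OVA ranks highest.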

Limitations

The limitations section discusses the study's main constraints, particularly the evaluation of the proposed methods against existing approaches on the NSL-KDD dataset. Table 7 presents this comparison, with bold values marking the best result for each metric among the evaluated methods. The comparison supports the effectiveness of the proposed techniques. It also indicates that further validation on more diverse datasets is needed to establish generalizability and robustness.

Moreover, while the results are promising, they may not fully capture the complexity of real-world scenarios. Further research is needed to address potential shortcomings and improve the practical applicability of the findings.