A new intrusion detection method using ensemble classification and feature selection

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-98604-w
PMID: https://pubmed.ncbi.nlm.nih.gov/40254667
Publication Date: 2025-04-20
Author(s): Pooyan Azizi Doost et al.
Primary Topic: Network Security and Intrusion Detection

Overview

This research presents a hybrid Intrusion Detection System (IDS) that combines Convolutional Neural Networks (CNNs) for feature extraction and the Random Forest (RF) algorithm for classification, aimed at enhancing network security by effectively identifying and mitigating cyber threats. The proposed method leverages CNNs to automatically extract relevant features from network data, thereby reducing dimensionality and noise, which allows the RF classifier to achieve robust intrusion classification. Evaluations conducted on the KDD99 and UNSW-NB15 datasets indicate that the model achieves an accuracy of 97% and precision exceeding 98%, outperforming traditional machine learning-based IDS solutions.
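The two-stage idea (a convolutional stage that compresses each raw record into a short feature vector, followed by a Random Forest classifier) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' architecture: the convolution kernels are random and untrained (the paper's CNN is trained), the records are synthetic toy data with 41 numeric features, and the helper `conv_pool_features` and all sizes are hypothetical. It uses NumPy and scikit-learn's `RandomForestClassifier`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def conv_pool_features(X, kernels):
    """Toy convolutional feature extractor (hypothetical; kernels are untrained).

    For each kernel: 1-D 'valid' convolution over every record, ReLU,
    then global max pooling. Output shape: (n_samples, n_kernels).
    """
    cols = []
    for k in kernels:
        maps = np.array([np.convolve(row, k, mode="valid") for row in X])
        cols.append(np.maximum(maps, 0.0).max(axis=1))  # ReLU + global max pool
    return np.column_stack(cols)

# Hypothetical toy data: 41 numeric features per record (KDD99 records have 41);
# "attack" records have a shifted block of features.
n_per_class, n_features = 300, 41
X_normal = rng.normal(0.0, 1.0, size=(n_per_class, n_features))
X_attack = rng.normal(0.0, 1.0, size=(n_per_class, n_features))
X_attack[:, 5:10] += 8.0
X = np.vstack([X_normal, X_attack])
y = np.array([0] * n_per_class + [1] * n_per_class)

kernels = [rng.normal(size=5) for _ in range(16)]  # 16 random 5-tap kernels
F = conv_pool_features(X, kernels)                 # 41 raw features -> 16 pooled features

# Random split, then train the Random Forest on the pooled features
idx = rng.permutation(len(y))
train, test = idx[:450], idx[450:]
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(F[train], y[train])
acc = rf.score(F[test], y[test])
print(F.shape, f"test accuracy: {acc:.3f}")
```

Global max pooling is what reduces the 41 raw inputs to one value per kernel, which mirrors the dimensionality reduction the overview attributes to the CNN stage; in a real system the kernels would be learned rather than drawn at random.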

The study highlights the model’s superior performance across various evaluation metrics, including detection rate, accuracy, precision, recall, and F1-score, compared to feature reduction methods such as Principal Component Analysis (PCA) and genetic algorithms. However, it acknowledges limitations, particularly the reliance on the outdated KDD99 dataset, which may not adequately represent contemporary cyber threats. Future research directions include testing the model with more current datasets, exploring its scalability in dynamic environments like IoT and cloud networks, and optimizing execution time for large-scale applications. The proposed IDS framework is positioned as a practical solution for real-world scenarios, emphasizing its adaptability to evolving traffic patterns and the necessity for ongoing enhancements to address modern cybersecurity challenges effectively.
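For comparison, a PCA-based feature-reduction baseline of the kind the study compares against can be sketched with scikit-learn. The synthetic data from `make_classification`, the choice of 10 components, and the Random Forest settings are all assumptions for illustration; the summary does not give the paper's baseline configuration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for a preprocessed IDS table: 41 numeric features, binary label
X, y = make_classification(n_samples=1000, n_features=41, n_informative=10,
                           class_sep=2.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0, stratify=y)

# Baseline: unsupervised PCA reduction to 10 components (assumed value) + RF
pca_rf = make_pipeline(StandardScaler(),
                       PCA(n_components=10, random_state=0),
                       RandomForestClassifier(n_estimators=100, random_state=0))
pca_rf.fit(X_tr, y_tr)
acc = pca_rf.score(X_te, y_te)
n_reduced = pca_rf[:-1].transform(X_te).shape[1]  # width after scaler + PCA
print(n_reduced, round(acc, 3))
```

Unlike a learned convolutional stage, PCA chooses directions by variance alone, without looking at the labels, which is one reason a supervised feature extractor can retain more of the class signal.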

Methods

In this section, the authors compare several approaches, namely Random Forest (RFO), Genetic Algorithm (GA), Principal Component Analysis (PCA), and their proposed hybrid, on accuracy, precision, recall, F1-score, and execution time. The proposed method scores highest, with an accuracy of 97.36%, precision of 98.46%, recall of 95.62%, and an F1-score of 96.27%, at the cost of a slightly longer execution time of 14.21 seconds. Random Forest is the fastest method, but the proposed approach strikes a good balance between detection quality and efficiency.
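The metrics in this comparison follow the standard definitions, computed from confusion-matrix counts with attacks as the positive class. The counts below are hypothetical, chosen to show that on imbalanced traffic a high accuracy can coexist with a noticeably lower F1 score; the helper name `binary_metrics` is illustrative.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall (detection rate) and F1 from confusion counts,
    treating 'attack' as the positive class."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts (not from the paper): 110 attacks among 1000 records
m = binary_metrics(tp=90, fp=10, tn=880, fn=20)
print({k: round(v, 4) for k, v in m.items()})
# accuracy 0.97, precision 0.9, recall 0.8182, f1 0.8571
```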

Further evaluations on the KDD99 and UNSW-NB15 datasets report, for KDD99, a recognition rate of 98.38% and a precision of 84.36%, with a test precision of 78.27% and a recall of 98.44%. For UNSW-NB15, the test recognition rate is 96.71%, with a precision of 82.7%, a recall of 96.65%, and an F1 score of 86.99%. The authors emphasize that their CNN-RF hybrid model substantially improves intrusion detection by automating feature extraction with Convolutional Neural Networks (CNNs) and performing robust classification with Random Forest (RF). According to the authors, the approach improves classification speed and accuracy and addresses challenges faced by traditional Intrusion Detection Systems (IDS) by filtering noisy data and reducing overfitting, which improves the model's generalizability and efficiency.
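The F1 score is normally the harmonic mean of precision and recall. Applied to the reported precision and recall of the comparative study (98.46%, 95.62%) and of UNSW-NB15 (82.7%, 96.65%), that formula gives about 97.0% and 89.1%, while the reported F1 scores are 96.27% and 86.99%. The summary does not say how the scores were averaged; one possible explanation, offered only as an assumption, is per-class (macro) averaging, under which F1 need not equal the harmonic mean of the averaged precision and recall. The sketch shows both computations; `f1_from` and `macro_f1` are illustrative helpers and the per-class values are hypothetical.

```python
def f1_from(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def macro_f1(per_class):
    """Mean of per-class F1 scores (macro averaging)."""
    return sum(f1_from(p, r) for p, r in per_class) / len(per_class)

# Reported precision/recall pairs from the summary above
print(round(f1_from(0.9846, 0.9562), 4))  # 0.9702 (reported F1: 0.9627)
print(round(f1_from(0.827, 0.9665), 4))   # 0.8913 (reported F1: 0.8699)

# Hypothetical two-class example: macro F1 vs. F1 of macro-averaged P and R
per_class = [(1.0, 0.5), (0.5, 1.0)]      # (precision, recall) per class
macro_p = sum(p for p, _ in per_class) / len(per_class)   # 0.75
macro_r = sum(r for _, r in per_class) / len(per_class)   # 0.75
print(round(macro_f1(per_class), 4), round(f1_from(macro_p, macro_r), 4))  # 0.6667 0.75
```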

Results

In the evaluation, the proposed intrusion detection method outperforms traditional algorithms such as NBFS, C4.5, and NBTree across multiple metrics. Its recall rates show that it identifies true positive cases effectively, and it achieves over 99% precision and an accuracy exceeding 97%. The authors attribute this performance to integrating convolutional neural networks with the Random Forest (RF) algorithm, a combination that selects the features most relevant to intrusion detection and thereby improves classification accuracy.
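One common way a Random Forest is used to pick relevant features is to rank them by impurity-based importance and keep those above a threshold. The summary does not say whether the paper does exactly this, so the sketch below is an assumption: it uses scikit-learn's `SelectFromModel` with `threshold="mean"` on synthetic data where, because `shuffle=False`, the first six columns are the informative ones.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data: with shuffle=False, columns 0-5 are informative, 6-40 are noise
X, y = make_classification(n_samples=800, n_features=41, n_informative=6,
                           n_redundant=0, shuffle=False, random_state=0)

# Keep features whose RF importance is at least the mean importance (1/41)
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=0), threshold="mean")
X_sel = selector.fit_transform(X, y)
kept = selector.get_support(indices=True)
print(X_sel.shape, kept)
```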

Further analysis using the F-Score metric confirms the proposed method’s effectiveness, as it outperforms other classifiers by focusing on the most informative features. The results indicate that the proposed solution not only excels in detecting known threats but also shows promise in identifying unknown attacks, with a detection rate significantly higher than conventional methods. Additionally, the proposed approach’s ability to reduce the feature set while maintaining high accuracy underscores its potential as a robust solution for intrusion detection in cybersecurity applications. Overall, the findings highlight the effectiveness of combining neural networks with RF algorithms for improved performance in identifying both normal and intrusive instances.
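Claims about unknown attacks are usually evaluated by withholding some attack categories from training and measuring the detection rate (the fraction of attack records flagged) separately for seen and unseen categories. The summary does not describe the paper's protocol; the sketch assumes such a held-out-category split, uses the four KDD99 attack categories (DoS, Probe, R2L, U2R) as labels, and scores hypothetical predictions with an illustrative helper, `detection_rates`.

```python
from collections import defaultdict

def detection_rates(records, known_categories):
    """Detection rate (share of attack records flagged) for attack categories
    seen during training vs. categories held out as 'unknown'."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for category, flagged in records:
        if category == "normal":
            continue  # detection rate only counts attack records
        group = "known" if category in known_categories else "unknown"
        totals[group] += 1
        hits[group] += int(flagged)
    return {g: hits[g] / totals[g] for g in totals}

# Hypothetical predictions: (true category, flagged_as_attack)
records = [("dos", True), ("dos", True), ("probe", True), ("probe", False),
           ("r2l", True), ("r2l", False), ("u2r", False), ("normal", False)]
rates = detection_rates(records, known_categories={"dos", "probe"})
print(rates)  # {'known': 0.75, 'unknown': 0.3333333333333333}
```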

Discussion

The discussion reviews advances in intrusion detection systems (IDS) built on machine learning and deep learning techniques. It focuses on robust models that detect unauthorized access and malicious activity, particularly in complex environments such as IoT and 5G networks. Notable methods include Convolutional Neural Networks (CNNs) for adaptive feature extraction, which improve the accuracy and efficiency of Random Forest (RF) classifiers. Other related work includes a hybrid autoencoder and modified particle swarm optimization (HAEMPSO) approach for feature selection, and an ensemble method based on CatBoost that achieved 99.96% accuracy on the KDDCup99 dataset.
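HAEMPSO pairs an autoencoder with a modified particle swarm optimizer, and its modifications are not described here. As a rough illustration of PSO-based feature selection only, the sketch implements the standard binary PSO, in which each particle is a 0/1 feature mask and a sigmoid of the velocity gives the probability that a bit is set. The fitness function `toy_fitness` and its `USEFUL` feature set are hypothetical; a real wrapper would instead score a classifier on validation data.

```python
import math
import random

def binary_pso(fitness, n_features, n_particles=20, iters=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain binary PSO (sigmoid transfer) that maximizes fitness(mask)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, vel[i][d]))  # clamp velocity
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))   # sigmoid transfer
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:                            # personal best
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:                           # global best
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

# Hypothetical fitness: reward hitting a "useful" subset, penalize extra features
USEFUL = {0, 3, 7, 12}
def toy_fitness(mask):
    chosen = {d for d, bit in enumerate(mask) if bit}
    return len(chosen & USEFUL) - 0.1 * len(chosen - USEFUL)

best_mask, best_fit = binary_pso(toy_fitness, n_features=20)
print(sorted(d for d, b in enumerate(best_mask) if b), best_fit)
```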

Despite these advances, the paper identifies persistent challenges in existing IDS, such as high computational complexity and class imbalance in datasets like KDD99. The proposed CNN-RF hybrid model targets these issues by automating feature selection, which substantially reduces data dimensionality and improves classification accuracy. The CNN component filters out irrelevant features so that the RF algorithm can focus on the most informative attributes, improving overall performance. Experimental results on UNSW-NB15 and KDD99 show this approach outperforming traditional methods, and the authors present it as a scalable and efficient solution to modern network security challenges.
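The summary notes class imbalance in KDD99 but does not say how the proposed model handles it. One common remedy, shown here as an assumption rather than the paper's method, is to weight each class inversely to its frequency. The function reproduces the formula scikit-learn documents for `class_weight="balanced"`, n_samples / (n_classes × class_count); the helper name and the label counts are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# Hypothetical, heavily skewed label mix using KDD99-style category names
labels = ["dos"] * 80 + ["normal"] * 15 + ["probe"] * 4 + ["u2r"] * 1
weights = balanced_class_weights(labels)
print(weights)
```

Passing `class_weight="balanced"` to scikit-learn's `RandomForestClassifier` applies the same weighting during training, so rare categories such as U2R contribute more per sample to each split.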