Enhancing intrusion detection: a hybrid machine and deep learning approach

Journal: Journal of Cloud Computing Advances Systems and Applications, Volume: 13, Issue: 1
DOI: https://doi.org/10.1186/s13677-024-00685-x
Publication Date: 2024-07-17
Author(s): Muhammad Sajid et al.
Primary Topic: Network Security and Intrusion Detection

Overview

The paper presents a hybrid model for Intrusion Detection (ID) that integrates Machine Learning (ML) and Deep Learning (DL) techniques to address the growing challenges in network security caused by increasing data volumes and sophisticated attacks. The model uses Extreme Gradient Boosting (XGBoost) and Convolutional Neural Networks (CNN) for feature extraction, and the extracted features are then passed to Long Short-Term Memory (LSTM) networks for classification. The study uses four benchmark datasets (CIC IDS 2017, UNSW NB15, NSL KDD, and WSN DS) for both binary and multi-class classification tasks. The findings indicate that the proposed hybrid model achieves a high detection rate and accuracy while maintaining a relatively low False Acceptance Rate (FAR), demonstrating its effectiveness in identifying new threats.

Despite the promising results, the paper acknowledges limitations, particularly in handling data imbalance and in the complexity of the model, which requires more training time than traditional methods. The authors suggest future research directions, including the use of updated datasets to detect emerging threats such as zero-day attacks, further investigation into attack patterns, and enhancements to improve performance on minority classes. Overall, the hybrid model performs better than existing state-of-the-art approaches, yet the paper highlights the need for ongoing improvements in detection rates and false alarms.

Introduction

The introduction of this research paper highlights the rapid evolution of technologies such as cloud computing, the Internet, and industrial control systems, which have led to increased reliance on sophisticated communication networks like 5G. This evolution has also heightened the risk of network intrusions, as malicious actors develop new methods to exploit vulnerabilities in these systems. Traditional security measures are proving inadequate against increasingly complex threats, necessitating the integration of advanced techniques such as machine learning into Network Intrusion Detection Systems (NIDS) and Intrusion Prevention Systems (IPS). The paper discusses various approaches that leverage machine learning and deep learning models, including Decision Trees, Support Vector Machines, and hybrid models combining Long Short-Term Memory (LSTM) networks and autoencoders, to enhance detection capabilities and address the challenges posed by evolving attack vectors.

The authors propose a novel hybrid model that combines XGBoost and LSTM with CNN to improve the accuracy and generalization of intrusion detection systems. This model aims to effectively identify both known and unknown threats by utilizing feature engineering and sequential pattern recognition. The research emphasizes the importance of reducing false positive rates, which are a significant issue in current systems, and highlights the model’s ability to adapt to new attack patterns. The paper outlines its contributions, including the use of benchmark datasets for feature selection and the practical applicability of the proposed model through extensive evaluation. The structure of the paper is also outlined, detailing the methodology and results that follow the introduction.

Methods

In this study, the experimental setup centered on tuning the hyperparameters of the hybrid model to optimize performance. Key hyperparameters included `max_depth`, which controls the complexity of the XGBoost trees (trees that are too deep risk overfitting, trees that are too shallow risk underfitting), and `n_estimators`, which sets the number of trees in the ensemble. The latter was fine-tuned to balance model accuracy and training efficiency. Additionally, the `gamma` hyperparameter, the minimum loss reduction required to split a node, was adjusted to further prevent overfitting by limiting the number of splits.
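As a hedged illustration of how `gamma` limits splits: in the regularized objective described in the XGBoost documentation, a candidate split is scored by a gain expression and `gamma` is subtracted from it. The minimal Python sketch below reproduces that arithmetic for a single node. The function names, the gradient and Hessian sums, and the `reg_lambda` value are assumptions chosen for illustration; the summary does not report the values the authors used.

```python
def leaf_score(grad_sum, hess_sum, reg_lambda=1.0):
    """Structure score of a leaf: G^2 / (H + lambda)."""
    return grad_sum ** 2 / (hess_sum + reg_lambda)


def split_gain(g_left, h_left, g_right, h_right, gamma=0.0, reg_lambda=1.0):
    """Gain of splitting one node into two children, minus the gamma penalty."""
    parent = leaf_score(g_left + g_right, h_left + h_right, reg_lambda)
    children = (leaf_score(g_left, h_left, reg_lambda)
                + leaf_score(g_right, h_right, reg_lambda))
    return 0.5 * (children - parent) - gamma


def keep_split(g_left, h_left, g_right, h_right, gamma=0.0, reg_lambda=1.0):
    """A split is kept only if its penalized gain is positive."""
    return split_gain(g_left, h_left, g_right, h_right, gamma, reg_lambda) > 0.0


if __name__ == "__main__":
    # Hypothetical gradient/Hessian sums for the two children of one node.
    stats = dict(g_left=-4.0, h_left=4.0, g_right=4.0, h_right=4.0)
    for gamma in (0.0, 5.0):
        print(gamma, split_gain(gamma=gamma, **stats), keep_split(gamma=gamma, **stats))
```

With these illustrative numbers the split has a gain of 3.2 before the penalty, so it is kept at `gamma=0` and rejected at `gamma=5`. A larger `gamma` therefore yields fewer splits and simpler trees.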

The model architecture incorporated stacked Long Short-Term Memory (LSTM) layers to capture complex temporal dependencies, with experiments conducted to identify the depth that best extracts significant patterns. Furthermore, the number of convolutional layers, along with various combinations of stride and kernel size, was adjusted to capture relevant temporal and spatial patterns in the data. The careful tuning of these hyperparameters not only improved feature selection and classification accuracy but also reduced computational costs by focusing on essential features. For binary classification, the datasets were divided into benign and attack classes; for multi-class classification, records were labeled as normal or as one of several attack types, as detailed in Table 3.
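To make the effect of stride and kernel size concrete, the sketch below computes how the length of a 1D feature vector shrinks through a stack of convolutional layers, using the standard output-length formula for a convolution without dilation. The helper names, the input length of 60, and the layer settings are hypothetical; the summary does not list the configurations the authors tried.

```python
def conv1d_output_length(input_length, kernel_size, stride=1, padding=0):
    """Output length of a 1D convolution (no dilation)."""
    padded = input_length + 2 * padding
    if kernel_size > padded:
        raise ValueError("kernel does not fit in the padded input")
    return (padded - kernel_size) // stride + 1


def stacked_output_lengths(input_length, layers):
    """Apply several (kernel_size, stride) layers in order and record each length."""
    lengths = []
    current = input_length
    for kernel_size, stride in layers:
        current = conv1d_output_length(current, kernel_size, stride)
        lengths.append(current)
    return lengths


if __name__ == "__main__":
    # Hypothetical setup: 60 input features, two unpadded convolutional layers.
    print(stacked_output_lengths(60, [(3, 1), (3, 2)]))
```

For this hypothetical stack the lengths are 58 after the first layer and 28 after the second. A larger stride or kernel shortens the sequence handed to the LSTM layers, which is one way such choices affect computational cost.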

Results

The results of the study highlight the performance of various deep learning models, specifically LSTM, GRU, and RNN, in classifying data from the NSL KDD dataset. The optimal LSTM model, utilizing 180 units across hidden layers and a ReLU activation function, achieved a test accuracy of 89.26% and an F1 score of 98.64%, with a training time of 253.76 seconds. In contrast, the best GRU model, which employed 120 units and a Softmax activation function, recorded an F1 score of 95.04% and a test accuracy of 86.10%, taking 189.10 seconds to train. The RNN model demonstrated a test accuracy of 87.21% and an F1 score of 94.03%, with a training duration of 139.55 seconds.

Further analysis involved the CNN-LSTM hybrid structure, which outperformed other configurations in terms of accuracy on the CIC IDS 2017 dataset, achieving 98.55% with a five-layer setup. The study also examined the impact of feature selection on model performance, revealing that using 52 features yielded the highest F1 score and detection rate, while 60 features provided the best accuracy. For the WSN dataset, the optimal model was identified with 18 features, achieving an accuracy of 96.95% and a detection rate of 96.10%. Overall, the findings underscore the effectiveness of deep learning models in intrusion detection tasks, with specific configurations and feature selections significantly influencing performance outcomes.
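The metrics quoted in this section can all be derived from the four cells of a binary confusion matrix. The following Python sketch shows those calculations under stated assumptions: the function name and the counts are made up for illustration, and FAR is computed as FP / (FP + TN), the usual false-positive-rate definition, because the summary does not give the paper's exact formula.

```python
def binary_ids_metrics(tp, fp, tn, fn):
    """Compute common IDS metrics from confusion-matrix counts.

    tp: attacks flagged as attacks      fn: attacks missed
    fp: benign flagged as attacks       tn: benign passed as benign
    """
    def safe_div(num, den):
        # Return 0.0 instead of raising when a class is empty.
        return num / den if den else 0.0

    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    detection_rate = safe_div(tp, tp + fn)   # also called recall
    false_alarm_rate = safe_div(fp, fp + tn)  # assumed definition of FAR
    precision = safe_div(tp, tp + fp)
    f1 = safe_div(2 * precision * detection_rate, precision + detection_rate)
    return {
        "accuracy": accuracy,
        "detection_rate": detection_rate,
        "false_alarm_rate": false_alarm_rate,
        "precision": precision,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    for name, value in binary_ids_metrics(tp=90, fp=5, tn=95, fn=10).items():
        print(f"{name}: {value:.4f}")
```

Because accuracy counts correct benign predictions while F1 ignores true negatives, the two can diverge on imbalanced data, so the paper's reporting of detection rate and FAR alongside accuracy is informative.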

Discussion

In the discussion section of the research paper, the authors review recent advancements in deep learning (DL) and machine learning (ML) methodologies for anomaly detection within intrusion detection systems (IDS) that leverage artificial intelligence (AI). They highlight various approaches, such as the integration of recurrent neural networks (RNNs) and feed-forward deep neural networks (FFDNNs) for detecting network attacks, as well as the use of sparse autoencoders and generative adversarial networks (GANs) for feature extraction and model enhancement. The authors emphasize the importance of utilizing diverse datasets, including CIC IDS 2017, UNSW NB15, NSL KDD, and WSN DS, to evaluate the effectiveness of their proposed hybrid model, which combines XGBoost and convolutional neural networks (CNNs) for feature extraction followed by classification using long short-term memory (LSTM) networks.

The authors also detail their data preprocessing techniques, including normalization and scaling, which are crucial for optimizing the performance of neural networks. They employ L2 regularization to mitigate overfitting and improve model generalization. The hybrid model architecture is designed to leverage the strengths of CNNs for spatial feature extraction and LSTMs for temporal analysis, ultimately improving the detection of intrusions in real time. The discussion concludes by stressing the importance of selecting datasets that contain both benign and malicious records, so that the model remains applicable to real-world scenarios. Overall, the proposed hybrid IDS aims to strengthen the security of network devices against a variety of attacks, thereby supporting robust communication across the system.
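As a rough illustration of the preprocessing step, the sketch below applies two common forms of feature scaling to a single numeric column: min-max normalization to the range [0, 1] and z-score standardization. Both function names and the sample values are hypothetical, and the summary does not say which exact scaling formulas the authors applied, so this is only one plausible reading.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    # Hypothetical feature column, e.g. a per-flow packet count.
    column = [10, 20, 30]
    print(min_max_scale(column))
    print(standardize(column))
```

In practice the minimum, maximum, mean, and standard deviation would be computed on the training split only and reused on the test split, so that no information leaks from test data into training.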