Harnessing advanced hybrid deep learning model for real-time detection and prevention of man-in-the-middle cyber attacks

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-85547-5
PMID: https://pubmed.ncbi.nlm.nih.gov/39799192
Publication Date: 2025-01-11
Author(s): Vijayalakshmi Kandasamy et al.
Primary Topic: Network Security and Intrusion Detection

Overview

The paper presents the AEXB Model, a hybrid deep learning framework that combines an AutoEncoder with XGBoost to detect and prevent Man-in-the-Middle (MitM) attacks in smart home environments. The model achieves an accuracy of 97.24% on the Intrusion Detection in Smart Home (IDSH) dataset. The authors attribute this result to careful data preprocessing, feature engineering, and optimization. Together these improve the model’s reliability while reducing false positives and computational cost. The authors also argue that the model scales to other Internet of Things (IoT) networks, including healthcare and industrial systems, where MitM attacks are a significant concern.

Looking ahead, the authors propose incorporating additional deep learning architectures, such as convolutional neural networks (CNNs), to improve feature extraction on larger datasets. They also aim to extend the dataset to a broader range of IoT devices and attack scenarios in order to validate the model’s adaptability. Finally, they identify deploying the AEXB Model in real-time systems with continuous learning as a promising way to keep pace with evolving cyber threats and to support a more comprehensive cybersecurity strategy.

Introduction

The introduction presents the AEXB Model, a novel hybrid framework that integrates AutoEncoder and XGBoost to improve real-time detection and prevention of Man-in-the-Middle (MitM) attacks, particularly in smart home environments. The model pairs unsupervised feature extraction by the AutoEncoder with the efficient classification of XGBoost, drawing on the strengths of both. Key contributions include a feature engineering pipeline of correlation analysis, tree-based feature importance, and Recursive Feature Elimination (RFE). This pipeline reduces the dataset to its most informative features, which improves accuracy and lowers computational overhead.
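The correlation-analysis step can be pictured as a greedy redundancy filter. Walk through the features in order and keep each one only if its absolute Pearson correlation with every already-kept feature stays at or below a cutoff. The sketch below is a minimal, standard-library version under stated assumptions. The 0.9 cutoff and the feature names (`bytes_sent`, `packets_sent`, `duration`) are hypothetical, and the paper's exact procedure is not described here.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_a * var_b)


def drop_correlated(columns, threshold=0.9):
    """Greedy filter: keep a feature unless |r| > threshold with an already-kept one."""
    kept = []
    for name, values in columns.items():  # dicts preserve insertion order
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical columns: packets_sent is a scaled copy of bytes_sent.
columns = {
    "bytes_sent": [100, 200, 300, 400, 500],
    "packets_sent": [1, 2, 3, 4, 5],
    "duration": [5, 1, 4, 2, 3],
}
kept = drop_correlated(columns)
```

Here `packets_sent` is dropped because it is perfectly correlated with `bytes_sent`. `duration` (|r| = 0.3) survives.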

The AEXB Model also relies on robust data preprocessing: median imputation for missing values, scaling of numerical features, and encoding of categorical data. The model is designed to be scalable and adaptable. It can therefore be applied to Internet of Things (IoT) environments beyond smart homes, such as industrial control systems and healthcare networks. The authors distinguish their work from existing literature on three points. First, it integrates unsupervised feature extraction and supervised classification in a single framework. Second, it addresses scalability and adaptability. Third, it incorporates flexible data preprocessing and feature selection, which current intrusion detection systems often overlook. The rest of the paper covers related work, methodology, results, and future research directions.
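The three preprocessing steps named above can be sketched with Python's `statistics` module. This is an illustration, not the authors' code. The column-as-list layout, the use of the population standard deviation for scaling, and one-hot encoding (rather than another categorical scheme) are assumptions. In a real detector, these statistics would be fitted on the training split only and then reused on new data.

```python
import statistics


def median_impute(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # constant column: map everything to 0
    return [(v - mean) / std for v in values]


def one_hot(values):
    """One-hot encode a categorical column; categories are sorted for a stable order."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


raw = [1.0, None, 3.0, 10.0]
imputed = median_impute(raw)  # the gap becomes 3.0, the median of 1, 3, 10
scaled = standardize(imputed)
cats, encoded = one_hot(["tcp", "udp", "tcp"])  # hypothetical protocol column
```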

Methods

The methodology centers on the AEXB Model, which couples an AutoEncoder for feature extraction and dimensionality reduction with XGBoost for classification. The first phase preprocesses the Intrusion Detection in Smart Home (IDSH) dataset by handling missing values, scaling numerical features, and encoding categorical variables.
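To make the feature-extraction step concrete, here is a deliberately tiny stand-in: a linear, single-hidden-layer autoencoder trained with per-sample gradient descent in pure Python. The layer sizes, learning rate, epoch count, and synthetic data are all assumptions for illustration. The paper's AutoEncoder architecture is not given here and is likely deeper and nonlinear. The code vector returned by `encode(x)` is what would be handed to the XGBoost classifier.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LinearAutoEncoder:
    """Single-hidden-layer linear autoencoder: z = W x, x_hat = V z."""

    def __init__(self, n_in, n_code, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_code)]
        self.V = [[rng.uniform(-0.1, 0.1) for _ in range(n_code)] for _ in range(n_in)]

    def encode(self, x):
        return matvec(self.W, x)

    def reconstruct(self, x):
        return matvec(self.V, self.encode(x))

    def mse(self, data):
        """Mean squared reconstruction error per feature."""
        total = 0.0
        for x in data:
            total += sum((r - xi) ** 2 for r, xi in zip(self.reconstruct(x), x))
        return total / (len(data) * len(data[0]))

    def fit(self, data, epochs=100, lr=0.01):
        """Per-sample gradient descent on squared error (factor 2 folded into lr)."""
        for _ in range(epochs):
            for x in data:
                z = self.encode(x)
                err = [r - xi for r, xi in zip(matvec(self.V, z), x)]  # x_hat - x
                # Error propagated to the code layer, V^T err, taken before V changes.
                back = [sum(self.V[i][j] * err[i] for i in range(len(err)))
                        for j in range(len(z))]
                for i in range(len(self.V)):
                    for j in range(len(z)):
                        self.V[i][j] -= lr * err[i] * z[j]
                for j in range(len(self.W)):
                    for k in range(len(x)):
                        self.W[j][k] -= lr * back[j] * x[k]
        return self


# Synthetic data: 4 features with only 2 underlying degrees of freedom.
rng = random.Random(1)
data = []
for _ in range(100):
    a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    data.append([a, b, a + b, a - b])

ae = LinearAutoEncoder(n_in=4, n_code=2)
before = ae.mse(data)
ae.fit(data)
after = ae.mse(data)
codes = [ae.encode(x) for x in data]  # 2-D features for the downstream classifier
```

With linear layers and squared error, the learned code spans the same subspace as the top principal components. Nonlinear activations are what let a deeper AutoEncoder go beyond PCA.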

To enhance the dataset’s quality, the authors employ advanced feature engineering and selection techniques, such as correlation analysis, tree-based feature importance, and Recursive Feature Elimination (RFE). These steps are crucial for optimizing the model’s performance. The architecture of the proposed model is illustrated in Figure 2, highlighting the hybrid nature of the approach.
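Recursive Feature Elimination can be illustrated without any library: fit a model, discard the least important feature, refit, and repeat. This sketch uses an ordinary least-squares model, with coefficient magnitude as importance, in place of whatever estimator the authors wrapped. It assumes zero-mean features on comparable scales, so that coefficients are comparable and no intercept is needed. The synthetic data is made up.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(rows, y):
    """OLS coefficients from the normal equations X^T X w = X^T y (no intercept)."""
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    return solve(xtx, xty)


def rfe(rows, y, n_keep):
    """Refit, drop the feature with the smallest |coefficient|, repeat until n_keep remain."""
    active = list(range(len(rows[0])))
    while len(active) > n_keep:
        coefs = least_squares([[r[i] for i in active] for r in rows], y)
        weakest = min(range(len(active)), key=lambda j: abs(coefs[j]))
        active.pop(weakest)
    return active  # original column indices, in their original order


# Hypothetical data: only features 0 and 2 influence the target.
rng = random.Random(0)
rows = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(200)]
y = [3 * r[0] - 2 * r[2] + rng.gauss(0, 0.1) for r in rows]
selected = rfe(rows, y, n_keep=2)
```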

Results

The results show how data preprocessing and feature extraction each improve the performance of machine learning models for detecting cyber-attacks in smart home environments. The model was run on an Intel® Core™ i9 processor with only 4 GB of RAM, which made efficient memory management necessary. The dataset, the Smart Home ID Dataset from Kaggle, contains 4143 records with 22 features. It covers normal operation plus two types of cyber-attack: Man-in-the-Middle (MitM) and Denial of Service (DoS). Preprocessing consisted of handling missing values, normalization, categorical encoding, and data augmentation. With all of these techniques applied, model accuracy rose from 89.00% to a peak of 97.24%.
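Data augmentation is listed among the preprocessing steps, but the method used is not specified here. One common option for tabular traffic features is to oversample minority classes with jittered copies. The sketch below assumes that technique, a noise scale of `sigma=0.05` on already-scaled features, and made-up rows and labels. Apply augmentation of this kind to the training split only, so that synthetic near-duplicates cannot leak into the test set.

```python
import random
from collections import Counter


def jitter_oversample(rows, labels, sigma=0.05, seed=0):
    """Balance classes by appending noisy copies of randomly chosen minority rows.

    Each synthetic row is an existing row of that class plus independent
    Gaussian noise (std `sigma`) on every numeric feature.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # grow every class to the majority count
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, lab in zip(rows, labels) if lab == cls]
        for _ in range(target - count):
            base = rng.choice(pool)
            out_rows.append([v + rng.gauss(0.0, sigma) for v in base])
            out_labels.append(cls)
    return out_rows, out_labels


rows = [[0.0, 1.0], [0.1, 0.9], [0.2, 1.1], [5.0, 5.0]]
labels = ["normal", "normal", "normal", "mitm"]
aug_rows, aug_labels = jitter_oversample(rows, labels)
```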

Feature extraction, in particular the combination of an AutoEncoder with Principal Component Analysis (PCA), improved performance further. This hybrid approach achieved an accuracy of 97.24%, with precision, recall, and F1 scores of 0.96, 0.95, and 0.96, respectively. Among the feature selection methods evaluated, the meta approach produced the highest scores. Final classification was performed with XGBoost, which showed robust handling of missing values and resistance to overfitting. In this evaluation it reached an accuracy of 93.00% and an ROC-AUC of 0.93. These findings underscore the role of systematic data preprocessing and feature extraction in building accurate predictive models for smart home cybersecurity.
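The reported metrics follow standard definitions. Below is a minimal pure-Python sketch for a binary attack-versus-normal view. It includes the false alarm rate (false-positive rate) that the model aims to reduce, and computes ROC-AUC as the probability that a randomly chosen attack record outscores a randomly chosen normal one. The labels and scores are invented. Because the dataset has three classes (normal, MitM, DoS), the paper's figures are presumably averaged over classes in a way not specified here.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, F1 and false alarm rate for one positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,  # harmonic mean of precision and recall
        "false_alarm_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


def roc_auc(y_true, scores, positive=1):
    """P(random positive scores above random negative); ties count as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # 1 = attack, 0 = normal (toy labels)
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4, 0.5, 0.05]
m = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

As a check on the formula, plugging the rounded values P = 0.96 and R = 0.95 into F1 = 2PR/(P + R) gives about 0.955. The reported 0.96 presumably comes from unrounded or class-averaged values.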

Discussion

The discussion emphasizes the growing importance of cloud security in the digital age. It points to the vulnerabilities created by widespread data access and increasingly sophisticated cyberattacks, particularly Man-in-the-Middle (MitM) and session hijacking attacks. Reviewing existing literature, the paper notes a decline in research publications on these topics from 2018 to 2021. It identifies India as the leading contributing country and IEEE as the leading publisher. Cryptographic methods and hybrid intrusion detection systems (IDS) have been developed to improve security. Even so, the authors conclude that stronger cybersecurity measures are still needed, especially for emerging technologies such as the Industrial Internet of Things (IIoT) and smart grids.

The proposed AEXB Model combines AutoEncoder and XGBoost to overcome the limitations of traditional rule-based methods and existing machine learning approaches in detecting complex cyber threats. The AutoEncoder reduces dimensionality by extracting meaningful features from high-dimensional data. XGBoost then improves classification accuracy and lowers the false alarm rate. The study reports that the AEXB Model identifies the attack types in its smart home dataset with high detection accuracy and real-time response capability. Overall, the research argues for advanced intrusion detection systems that can adapt to evolving cyber threats, and presents the AEXB Model as a contribution toward them.
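Real-time use adds one practical requirement to the pipeline. Each incoming record must be transformed with the statistics fitted during training and then scored immediately. The sketch below makes several assumptions. `FrozenPreprocessor` is a hypothetical holder of training-set medians, means, and standard deviations. `toy_score` is a placeholder standing in for the trained AutoEncoder + XGBoost model. The 0.5 alert threshold is likewise an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class FrozenPreprocessor:
    """Training-set statistics reused unchanged on every live record."""
    medians: Sequence[float]
    means: Sequence[float]
    stds: Sequence[float]

    def transform(self, record: Sequence[Optional[float]]) -> List[float]:
        out = []
        for v, med, mu, sd in zip(record, self.medians, self.means, self.stds):
            v = med if v is None else v  # impute with the training median
            out.append((v - mu) / sd if sd else 0.0)  # z-score with training mean/std
        return out


def detect_stream(
    records: Iterable[Sequence[Optional[float]]],
    prep: FrozenPreprocessor,
    score: Callable[[List[float]], float],
    threshold: float = 0.5,
) -> Iterator[Tuple[int, float, float]]:
    """Score records one at a time; yield (index, score, latency in seconds) per alert."""
    for i, record in enumerate(records):
        start = time.perf_counter()
        s = score(prep.transform(record))
        latency = time.perf_counter() - start
        if s >= threshold:
            yield i, s, latency


# Hypothetical usage with placeholder statistics and a placeholder scorer.
prep = FrozenPreprocessor(medians=[0.0, 0.0], means=[0.0, 0.0], stds=[1.0, 1.0])


def toy_score(x: List[float]) -> float:
    return 1.0 if x[0] > 2.0 else 0.0


records = [[0.1, 0.2], [3.0, None], [None, 1.0], [2.5, 0.0]]
alerts = list(detect_stream(records, prep, toy_score))
```

Because `detect_stream` is a generator, alerts are produced as records arrive rather than after a whole batch is collected. This is the property a real-time deployment needs.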