A hybrid machine learning model for intrusion detection in wireless sensor networks leveraging data balancing and dimensionality reduction

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-87028-1
PMID: https://pubmed.ncbi.nlm.nih.gov/39920281
Publication Date: 2025-02-07
Author(s): Md. Alamin Talukder et al.
Primary Topic: Network Security and Intrusion Detection

Overview

This research presents a novel hybrid machine learning model designed to enhance intrusion detection systems (IDS) for wireless sensor networks (WSNs) and Internet of Things (IoT) environments. The model combines KMeans-SMOTE (KMS) for data balancing with principal component analysis (PCA) for dimensionality reduction, and passes the result to classifiers such as Decision Tree, Random Forest Classifier (RFC), and gradient boosting models like XGBoost (XGBC). Evaluated on the WSN-DS and TON-IoT datasets, the hybrid approach (KMS + PCA + RFC) achieves an accuracy of 99.94% and an F1-score of 99.94% on WSN-DS, and an accuracy and F1-score of 99.97% on TON-IoT. The model outperforms traditional data balancing methods, addresses both class imbalance and high dimensionality, and reduces training and prediction times, which makes it suitable for real-time applications.

In conclusion, the proposed IDS significantly enhances security in WSNs. Future research will focus on adaptive feature selection methods that can respond to real-time network conditions and the exploration of federated learning techniques to bolster privacy in large-scale, distributed WSN environments.

Methods

The methodology presented in this study aims to improve the performance of machine learning (ML) models for intrusion detection in wireless sensor networks (WSNs), using the WSN-DS and TON-IoT Network datasets. The approach comprises four key steps:

1. **Data Collection**: Two benchmark datasets, WSN-DS and TON-IoT Network, were used. Both feature varied traffic patterns and attack scenarios and serve as a robust foundation for model training and evaluation.

2. **Data Preprocessing**: The study employed Standard-Scaler to standardize numerical features and Label Encoding to convert categorical variables into numerical form. This preprocessing puts all features on a common scale and in numeric form, which supports model performance. To address class imbalance, three techniques were implemented: SMOTE TomekLink (STL), Generative Adversarial Network (GAN), and KMeans-SMOTE (KMS), the last of which uses k-means clustering to guide synthetic data generation.

3. **Dimensionality Reduction and Classification**: Principal Component Analysis (PCA) was applied to reduce dimensionality while retaining significant variance, thus minimizing redundancy. The Random Forest Classifier (RFC) was selected as the primary model due to its robustness and ability to manage high-dimensional data, with additional models like Decision Tree Classifier and various Gradient Boosting models evaluated for comparison.

4. **Performance Evaluation**: The models’ performance was assessed using metrics such as Accuracy, Precision, Recall, F1-score, and ROC-AUC, alongside computational complexity measures. The proposed methodology, integrating KMS, PCA, and RFC, was validated through K-fold cross-validation (K=5) and demonstrated superior performance and efficiency compared to existing methods, establishing its suitability for real-world IoT network security applications. The architecture of the hybrid ML model is visually represented in Figure 1, illustrating the comprehensive workflow from data processing to intrusion detection.
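The threshold metrics named in step 4 can be illustrated with a minimal sketch. It assumes a binary view (attack vs. normal), whereas the datasets in the study may contain several attack classes, and it omits ROC-AUC. The function name `binary_metrics` and the toy labels are hypothetical and are not taken from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 with one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Guard against empty denominators (e.g. no positive predictions at all).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical imbalanced test set: 95 normal records (0) and 5 attacks (1).
y_true = [0] * 95 + [1] * 5

# Detector A labels every record as normal.
always_normal = binary_metrics(y_true, [0] * 100)

# Detector B raises 2 false alarms and catches 4 of the 5 attacks.
y_pred_b = [1] * 2 + [0] * 93 + [1] * 4 + [0] * 1
detector_b = binary_metrics(y_true, y_pred_b)

print(always_normal)
print(detector_b)
```

On this toy data, Detector A reaches 95% accuracy while detecting no attacks, so its recall and F1-score are both zero. This is the reason F1-score is reported alongside accuracy when classes are imbalanced.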

Discussion

The paper develops a hybrid machine learning (ML) model aimed at enhancing intrusion detection in Wireless Sensor Networks (WSNs). The model integrates advanced data balancing techniques, including SMOTE TomekLink (STL), a variant of the Synthetic Minority Oversampling Technique (SMOTE), Generative Adversarial Networks (GANs), and KMeans-SMOTE (KMS), alongside Principal Component Analysis (PCA) for dimensionality reduction. This approach addresses the challenges of class imbalance and high-dimensional data, leading to improved detection accuracy, fewer false positives, and better computational efficiency, which makes it suitable for real-time applications in WSN security.
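The cluster-then-interpolate idea behind KMeans-SMOTE can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the implementation used in the study: it picks a random partner inside the same cluster instead of one of the nearest neighbours, it does not weight clusters by sparsity, and the names `kmeans_labels`, `cluster_oversample`, and `min_ratio` as well as the toy data are hypothetical.

```python
import random
from math import dist


def kmeans_labels(points, k, iters=10):
    """Lloyd's k-means; returns one cluster index per point."""
    # Farthest-first initialisation: an assumption made here only to keep
    # the example deterministic.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster_oversample(X, y, minority, n_new, k=2, min_ratio=0.5, seed=0):
    """Add n_new synthetic minority samples inside minority-dominated clusters."""
    rng = random.Random(seed)
    labels = kmeans_labels(X, k)
    pools = []
    for c in range(k):
        members = [i for i in range(len(X)) if labels[i] == c]
        mino = [X[i] for i in members if y[i] == minority]
        # Keep clusters that have at least two minority samples and in which
        # the minority share exceeds min_ratio (an assumed threshold).
        if len(mino) >= 2 and len(mino) / len(members) > min_ratio:
            pools.append(mino)
    if not pools:
        raise ValueError("no minority-dominated cluster with at least 2 samples")
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(rng.choice(pools), 2)
        t = rng.random()
        # SMOTE-style interpolation: a random point on the segment from a to b.
        synthetic.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return X + synthetic, y + [minority] * n_new


# Toy data: 20 majority points on a grid near the origin, 4 minority points far away.
majority = [(0.5 * i, 0.5 * j) for i in range(5) for j in range(4)]
minority = [(10.0, 10.0), (10.5, 10.0), (10.0, 10.5), (10.5, 10.5)]
X = majority + minority
y = [0] * len(majority) + [1] * len(minority)

X_bal, y_bal = cluster_oversample(X, y, minority=1, n_new=16)
print(y_bal.count(0), y_bal.count(1))
```

Because every synthetic point is interpolated between two minority samples of the same cluster, new samples stay inside regions the minority class already occupies instead of bridging into majority territory, which is the motivation for clustering before oversampling.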

Key contributions of the study include a novel hybrid approach that combines data balancing and dimensionality reduction and is tailored to WSN intrusion detection. The integration of STL, GANs, and KMS enhances the representation of minority classes, thereby improving the model’s ability to detect intrusions accurately. PCA further reduces dimensionality by projecting the data onto the principal components that capture most of its variance, which suppresses noise and redundant information and supports the model’s scalability and reliability. The findings underscore the effectiveness of these techniques in overcoming common challenges in intrusion detection systems, thereby advancing the security of WSNs.
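For reference, the PCA projection mentioned above can be written out as follows. The notation is an assumption introduced here for illustration and is not taken from the paper: X is the n × d matrix of standardized features, and k is the number of retained components, which this summary does not state.

```latex
% Assumed notation (not from the paper): X is the n x d standardized data matrix.
\begin{aligned}
C &= \frac{1}{n-1} X^{\top} X = W \Lambda W^{\top},
  \qquad \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_d),
  \quad \lambda_1 \ge \dots \ge \lambda_d \ge 0, \\
Z &= X W_k,
  \qquad W_k = [\, w_1 \;\cdots\; w_k \,] \in \mathbb{R}^{d \times k}, \\
\text{retained variance} &= \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{d} \lambda_i}.
\end{aligned}
```

Each column of Z is a linear combination of all d original features, so PCA produces new components and does not select a subset of the existing features.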