An optimal federated learning-based intrusion detection for IoT environment

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-93501-8
PMID: https://pubmed.ncbi.nlm.nih.gov/40082567
Publication Date: 2025-03-13
Author(s): A. Karunamurthy et al.
Primary Topic: Network Security and Intrusion Detection

Overview

The paper presents a federated learning-based intrusion detection system (IDS) for Internet of Things (IoT) environments, designed to cope with evolving attack patterns. Traditional machine learning approaches are typically tuned to specific datasets and parameters, which limits their ability to detect novel intrusions. The proposed system instead trains deep learning classifiers collaboratively across distributed networks, making the model more adaptable to diverse attack patterns. The Chimp Optimization Algorithm selects the feature subset used for classification, which the authors report significantly improves detection accuracy.

Experiments on the MQTT dataset show that the proposed federated learning-based IDS reaches a maximum detection accuracy of 95.59%, outperforming conventional machine learning models. The architecture aggregates the parameters of the locally trained networks into a single global model, which is then used for intrusion detection. The authors acknowledge limitations: malicious data can corrupt a local model, and that corruption can degrade the accuracy of the global model once the parameters are aggregated. The optimization step also adds computational complexity, which points to the need for more efficient federated learning techniques suited to resource-constrained IoT devices.
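The aggregation step and the poisoning risk described above can be illustrated with a small, dependency-free sketch. It assumes sample-weighted averaging in the style of FedAvg; this summary does not state which aggregation rule the paper uses, and the function name, layer name, and toy weights below are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: layer name -> flattened list of weights.
Params = Dict[str, List[float]]


def federated_average(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Sample-weighted average of client parameters (FedAvg-style, assumed)."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    global_params: Params = {}
    for name in client_params[0]:
        averaged = [0.0] * len(client_params[0][name])
        for params, size in zip(client_params, client_sizes):
            weight = size / total  # clients with more data count for more
            for i, value in enumerate(params[name]):
                averaged[i] += weight * value
        global_params[name] = averaged
    return global_params


# Three hypothetical IoT clients, each holding a toy two-weight "model".
honest = [
    {"conv1": [0.10, 0.20]},
    {"conv1": [0.12, 0.18]},
    {"conv1": [0.11, 0.19]},
]
sizes = [100, 100, 100]
clean_global = federated_average(honest, sizes)

# Same round, but the third client submits a corrupted update.
poisoned = honest[:2] + [{"conv1": [5.0, -5.0]}]
poisoned_global = federated_average(poisoned, sizes)

print("clean global:   ", clean_global)
print("poisoned global:", poisoned_global)
```

With plain averaging, a single corrupted client pulls the global weights far from the honest consensus, which is the limitation the authors point to.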

Results

This section reports the results of the proposed model on IoT device communications, evaluated with an MQTT protocol dataset. The dataset, sourced from Kaggle, simulates both normal and attack traffic and covers several attack types: Brute Force, Flooding DoS, MQTT Publish Flood, Malformed Data, and SlowITe. The experiments were implemented in Python, using PyTorch for the deep learning model and SciPy and NumPy for mathematical operations. They ran on an Intel i5 processor with 32 GB of memory. The dataset was balanced before training and split 70% for training and 30% for testing.
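The data preparation described above might look roughly like the following sketch. The summary does not say how the dataset was balanced, so random undersampling to the smallest class is an assumption here, as are the helper names and the toy records; only the 70/30 ratio comes from the text.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, class label)


def balance_by_undersampling(samples: List[Sample], rng: random.Random) -> List[Sample]:
    """Keep an equal number of samples per class (assumed balancing strategy)."""
    if not samples:
        raise ValueError("no samples to balance")
    by_label: Dict[str, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    smallest = min(len(group) for group in by_label.values())
    balanced: List[Sample] = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, smallest))
    return balanced


def train_test_split(
    samples: List[Sample], train_fraction: float, rng: random.Random
) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle a copy of the data and cut it at the requested fraction."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Toy, imbalanced stand-in for the MQTT traffic records.
toy = (
    [([0.0], "legitimate")] * 60
    + [([1.0], "flood")] * 20
    + [([2.0], "slowite")] * 20
)

balanced = balance_by_undersampling(toy, rng)
train, test = train_test_split(balanced, 0.7, rng)
print(len(balanced), len(train), len(test))
```

This is a simple random split; a stratified split would additionally keep the class proportions identical in both partitions.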

The Chimp Optimization Algorithm was used for dimensionality reduction, selecting only the most relevant features from the network traffic data to improve computational efficiency and model accuracy. In the federated setup, each IoT instance trained its own Convolutional Neural Network (CNN) locally, and the parameters were then aggregated at a central server into a global model. The global model was evaluated on accuracy, precision, recall, and F1-score and compared against existing machine learning models: Multilayer Perceptron, Decision Tree, Random Forest, Neural Network, Gradient Boost, and Naïve Bayes. The simulation parameters for the proposed and existing models are detailed in the paper's accompanying tables.
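For reference, the four evaluation metrics named above can be computed from predictions as in this minimal sketch. It treats one class as positive (here a hypothetical "attack" label); the function name and the toy labels are illustrative and are not taken from the paper.

```python
from typing import Dict, Hashable, Sequence


def binary_metrics(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with `positive` as the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against empty denominators when a class is never predicted or seen.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels: 3 true positives, 1 false positive, 1 false negative, 3 true negatives.
y_true = ["attack", "attack", "attack", "normal", "normal", "normal", "normal", "attack"]
y_pred = ["attack", "attack", "normal", "normal", "normal", "attack", "normal", "attack"]

report = binary_metrics(y_true, y_pred, positive="attack")
print(report)
```

For a multi-class setting such as the individual attack types, the same function can be called once per class with that class as `positive`.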

Discussion

The discussion reviews recent advances in intrusion detection systems (IDS) for Internet of Things (IoT) environments and stresses the importance of choosing an appropriate machine learning (ML) model for classifying malicious traffic. The reviewed studies examine a range of ML algorithms, including federated learning (FL) approaches that strengthen detection by aggregating local model parameters into a global model. Among these, an FL-based IDS reached a detection accuracy of 92.49% on the Edge IIoT dataset, outperforming traditional ML methods. Deep learning techniques such as convolutional neural networks (CNNs) have further improved detection accuracy while addressing data privacy and computational efficiency.

The proposed optimal FL-based IDS architecture detects intrusions across distributed IoT instances. Its federated setup trains local models on each instance's network data, which maintains data privacy. A feature selection module driven by the Chimp Optimization Algorithm reduces dimensionality and improves classification accuracy. In experiments, the model reached a maximum detection accuracy of 95.59% and improved on existing models across the evaluated attack types. The study notes two limitations: compromised local models can degrade the accuracy of the global model, and the optimization algorithm adds computational complexity. Future work will focus on more efficient FL techniques for resource-constrained IoT devices.
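The role of the feature selection module can be sketched as a search over binary feature masks. The sketch below is not the Chimp Optimization Algorithm: its update rules are not given in this summary, so plain random search stands in for it. The fitness function (accuracy of a nearest-centroid classifier, minus a penalty for keeping more features), the weight `alpha`, and the toy data are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Row = Tuple[Sequence[float], int]  # (features, label in {0, 1})


def centroid_accuracy(rows: List[Row], mask: Sequence[int]) -> float:
    """Accuracy of a nearest-centroid classifier restricted to masked-in features."""
    keep = [i for i, bit in enumerate(mask) if bit]
    if not keep:
        return 0.0
    centroids = {}
    for label in (0, 1):  # assumes both classes are present in `rows`
        members = [x for x, y in rows if y == label]
        centroids[label] = [sum(x[i] for x in members) / len(members) for i in keep]
    correct = 0
    for x, y in rows:
        dists = {
            label: sum((x[i] - c) ** 2 for i, c in zip(keep, centre))
            for label, centre in centroids.items()
        }
        correct += min(dists, key=dists.get) == y
    return correct / len(rows)


def fitness(rows: List[Row], mask: Sequence[int], alpha: float = 0.9) -> float:
    """Higher is better: reward accuracy, penalise the fraction of features kept."""
    kept_ratio = sum(mask) / len(mask)
    return alpha * centroid_accuracy(rows, mask) + (1 - alpha) * (1 - kept_ratio)


def random_search(
    rows: List[Row], n_features: int, iterations: int, rng: random.Random
) -> Tuple[List[int], float]:
    """Stand-in optimiser: sample random masks and keep the best one seen."""
    best_mask = [1] * n_features
    best_score = fitness(rows, best_mask)
    for _ in range(iterations):
        candidate = [rng.randint(0, 1) for _ in range(n_features)]
        score = fitness(rows, candidate)
        if score > best_score:
            best_mask, best_score = candidate, score
    return best_mask, best_score


# Toy data: feature 0 separates the classes, features 1 and 2 are noise.
rows = [
    ([0.0, 5.0, 1.0], 0),
    ([0.1, 1.0, 5.0], 0),
    ([1.0, 5.0, 5.0], 1),
    ([0.9, 1.0, 1.0], 1),
]

best_mask, best_score = random_search(rows, n_features=3, iterations=50, rng=random.Random(0))
print(best_mask, round(best_score, 4))
```

A metaheuristic such as the Chimp Optimization Algorithm would replace `random_search` with a guided update of candidate masks, while the idea of scoring each mask with a fitness function stays the same. Every fitness evaluation trains and tests a classifier, which is one source of the computational cost the study mentions.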