Optimizing IoT intrusion detection system: feature selection versus feature extraction in machine learning

Journal: Journal Of Big Data, Volume: 11, Issue: 1
DOI: https://doi.org/10.1186/s40537-024-00892-y
Publication Date: 2024-02-24
Author(s): Jing Li et al.
Primary Topic: Network Security and Intrusion Detection

Overview

The paper investigates the effectiveness of two feature reduction techniques, feature selection (FS) and feature extraction (FE), in network intrusion detection systems (NIDS) for Internet of Things (IoT) environments. It uses the heterogeneous Network TON-IoT dataset to evaluate these techniques on metrics such as accuracy, F1-score, and runtime for both binary and multiclass classification tasks. The findings indicate that FE generally outperforms FS in detection accuracy and efficiency when the number of features is reduced sharply (e.g., to 9 or 22 features). Conversely, FS performs better with larger feature sets (e.g., 33 or more features), which highlights its potential to improve accuracy when more features are retained.

The study also identifies the Multi-Layer Perceptron (MLP) as the most effective classifier for FE, while Decision Trees (DT) excel in the context of FS, providing the highest accuracy in attack detection. Notably, FE is less sensitive to variations in feature count and is capable of detecting a wider array of attack types compared to FS. Both methods tend to enhance detection rates, particularly for abnormal attack classes, as the number of features increases. The research emphasizes the importance of selecting appropriate feature reduction techniques tailored to specific IoT scenarios and outlines plans for future investigations into additional methods and datasets to bridge the gap between theoretical analysis and real-time application in IoT security.

Introduction

The introduction of the paper discusses the rapid growth of the Internet of Things (IoT), emphasizing its potential to connect billions of devices and generate vast amounts of data. Despite this expansion, IoT devices often lack robust security measures, necessitating the development of effective network intrusion detection systems (NIDS). The paper highlights the importance of employing machine learning techniques for intrusion detection, while also addressing the challenges posed by irrelevant or redundant features in existing datasets. To mitigate these issues, feature reduction methods, specifically feature selection and feature extraction, are proposed as essential strategies for enhancing the efficiency and interpretability of NIDS.

The research aims to fill a gap in the comparative analysis of feature selection and feature extraction techniques, particularly in the context of modern IoT datasets. It seeks to evaluate the trade-offs between detection accuracy and computational complexity under consistent experimental conditions. The findings indicate that feature selection generally yields higher detection accuracy, with reduced training and inference time, when a larger number of features is retained. Conversely, feature extraction performs better as the feature count decreases and can detect a broader range of attack types. The study also identifies the Decision Tree classifier as the most effective for NIDS in IoT environments. Overall, the paper presents a structured approach to guide the selection of feature reduction methods, contributing valuable insights for optimizing intrusion detection in IoT networks.

Methods

In this section, the authors outline the methodology for integrating feature selection (FS) and feature extraction (FE) techniques into a machine learning-based network intrusion detection system (NIDS). The methodology is structured into three phases: data preprocessing, feature reduction, and classification. During data preprocessing, the dataset undergoes cleansing, partitioning, and normalization, resulting in a training set for feature reduction and a testing set for model prediction. The feature reduction phase employs FS or FE techniques to identify key attributes, thereby reducing dimensionality and preparing the data for classification. The classification phase utilizes various machine learning models, including Decision Tree, Random Forest, k-Nearest Neighbors, Naive Bayes, and Multi-Layer Perceptron, to assess the effectiveness of the feature reduction methods through binary and multiclass classification tasks.
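The three phases can be illustrated with a minimal scikit-learn sketch. It is not the authors' implementation: the data is synthetic rather than TON-IoT, the ANOVA F-score filter (`f_classif`) and PCA are stand-ins for the FS and FE steps, and `K = 9`, the split ratio, and the helper `run` are illustrative assumptions.

```python
import time

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a cleansed traffic table (assumption: NOT the TON-IoT data).
X, y = make_classification(n_samples=600, n_features=40, n_informative=8, random_state=0)

# Phase 1: partition the data; normalization is fitted on the training split
# only, because the scaler lives inside the pipeline built below.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

K = 9  # number of reduced features (illustrative choice)


def run(reducer):
    """Fit scaler -> reducer -> Decision Tree and report accuracy plus timings."""
    model = make_pipeline(MinMaxScaler(), reducer, DecisionTreeClassifier(random_state=0))
    start = time.perf_counter()
    model.fit(X_train, y_train)            # Phases 2 and 3: reduce, then train
    train_s = time.perf_counter() - start
    start = time.perf_counter()
    pred = model.predict(X_test)           # inference on the held-out split
    infer_s = time.perf_counter() - start
    return accuracy_score(y_test, pred), train_s, infer_s


results = {
    "FS (ANOVA F-score filter)": run(SelectKBest(f_classif, k=K)),
    "FE (PCA)": run(PCA(n_components=K, random_state=0)),
}
for name, (acc, train_s, infer_s) in results.items():
    print(f"{name}: accuracy={acc:.3f} train={train_s:.3f}s infer={infer_s:.4f}s")
```

Swapping `DecisionTreeClassifier` for another estimator, or changing `K`, reruns the same comparison under otherwise identical conditions.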

The experimental setup evaluates NIDS performance with the FS and FE methods above, analyzing accuracy, precision, recall, F1-score, and the Matthews Correlation Coefficient (MCC). The authors also measure model training and inference times to assess the efficiency of the detection methods. A detailed comparison of the performance metrics with and without feature reduction shows how effectively FS and FE optimize NIDS for specific IoT network scenarios. Additionally, Table 3 presents the computing platform and software details used in constructing the NIDS framework.
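For reference, the five detection metrics can be computed from the confusion counts of a binary task as in the sketch below. The function name `binary_metrics`, the convention that label 1 means attack, and the toy label vectors are assumptions made for illustration, not values from the paper.

```python
import math


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 and MCC for 0/1 labels (1 = attack)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)

    def safe_div(a, b):
        # Return 0.0 instead of raising when a denominator is empty.
        return a / b if b else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "mcc": safe_div(
            tp * tn - fp * fn,
            math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        ),
    }


# Toy labels: 4 attacks and 6 benign flows, with one miss and one false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

Unlike accuracy, MCC uses all four confusion counts, which makes it informative when attack and benign traffic are imbalanced.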

Discussion

In the discussion section, the paper reviews various studies on Network Intrusion Detection Systems (NIDS) that utilize feature reduction methods, highlighting the prevalent use of feature selection techniques to simplify traffic data complexity. Notably, filter-based methods, such as Mutual Information (MI) and correlation-based techniques, have shown superior performance in attack detection accuracy compared to traditional methods. For instance, MI-based approaches have been combined with classifiers like support vector machines to enhance model accuracy while reducing complexity across datasets like KDD Cup 99 and UNSW-NB15. Additionally, wrapper-based methods, including CorrAUC and heuristic optimization algorithms, have been employed to identify effective feature subsets, although they often incur high computational costs.
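A filter-based method such as MI scores each feature against the label independently of any classifier and keeps the top-ranked ones. The sketch below shows the idea for discrete features only; continuous traffic features would need discretization first. The feature names (`flag_a`, `bucket_b`, `noise_c`) and values are hypothetical toy data, not taken from any of the datasets mentioned.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """I(X;Y) in bits for two equally long sequences of discrete values."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    # Sum p(x,y) * log2(p(x,y) / (p(x) p(y))) over observed (x, y) pairs.
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


labels = [0, 0, 0, 0, 1, 1, 1, 1]           # 0 = benign, 1 = attack (toy data)
features = {
    "flag_a":   [0, 0, 0, 0, 1, 1, 1, 1],   # fully determines the label
    "bucket_b": [0, 0, 0, 1, 1, 1, 1, 0],   # partially informative
    "noise_c":  [0, 1, 0, 1, 0, 1, 0, 1],   # independent of the label
}

scores = {name: mutual_information(col, labels) for name, col in features.items()}

# Filter step: rank by MI and keep the k highest-scoring features.
k = 2
top_k = sorted(scores, key=scores.get, reverse=True)[:k]
print(scores, top_k)
```

Because each feature is scored once, the cost grows linearly with the number of features, whereas wrapper-based methods retrain a model for every candidate subset.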

The paper emphasizes the need for efficient feature selection and extraction methods tailored for IoT environments, where computational resources are limited. It discusses the application of correlation-based feature selection, which balances detection accuracy with computational efficiency, and contrasts it with feature extraction techniques such as Principal Component Analysis (PCA) and Autoencoders (AEs). The authors propose a comprehensive evaluation of these methods using the contemporary TON-IoT dataset, focusing on performance metrics like accuracy, precision, recall, and runtime complexities. This study aims to fill the gap in existing literature by providing a detailed comparison of feature selection and extraction methods, ultimately contributing to the development of lightweight and effective NIDS for IoT security.
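The difference between the two families can be made concrete with NumPy. Correlation-based selection keeps a subset of the original columns, while PCA builds new columns as linear combinations of all of them. This is a sketch on synthetic data; the Pearson-to-label ranking rule, `K = 2`, and the toy feature construction are assumptions, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)        # toy binary labels
X = rng.normal(size=(n, 6))           # six toy features
X[:, 0] += 2.0 * y                    # feature 0 is strongly label-related
X[:, 3] += 1.0 * y                    # feature 3 is weakly label-related

K = 2                                 # number of reduced features (illustrative)

# Feature selection: keep the K original columns with the largest
# absolute Pearson correlation to the label.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
selected = np.sort(np.argsort(-np.abs(r))[:K])
X_fs = X[:, selected]                 # still interpretable as original features

# Feature extraction: project the centered data onto the top-K principal
# components, obtained from the SVD of the centered matrix.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X_fe = Xc @ Vt[:K].T                  # new, mutually uncorrelated features

print("selected columns:", selected)
print("shapes:", X_fs.shape, X_fe.shape)
```

The selected columns keep their original meaning, which helps interpretability. The PCA projection ignores the labels and mixes every input column, so all original features must still be collected at inference time.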