Enhancing malware detection through self-union feature selection using gray wolf optimizer

Journal: Indonesian Journal of Electrical Engineering and Computer Science, Volume: 37, Issue: 1
DOI: https://doi.org/10.11591/ijeecs.v37.i1.pp197-205
Publication Date: 2024-10-31
Author(s): Mosleh M. Abualhaj et al.
Primary Topic: Advanced Malware Detection Techniques

Overview

This research presents the RFGWO-Mal system, an innovative approach for detecting and classifying malware using a combination of a random forest (RF) classifier and a gray wolf optimizer (GWO). The system employs a novel self-union feature selection method, which integrates features from various subsets identified by the GWO for both binary and multiclass classification tasks. The effectiveness of the RFGWO-Mal system was evaluated using the comprehensive Obfuscated-MalMem2022 dataset, achieving remarkable accuracy rates of 99.95% for binary classification and 86.57% for multiclass classification.

The findings underscore the significant enhancement in malware detection performance attributed to the self-union feature selection approach, marking a valuable contribution to the field of cybersecurity. Future research directions include testing the self-union method with additional machine learning classifiers such as K-nearest neighbors (KNN), decision trees (DT), and support vector machines (SVM), as well as evaluating the RFGWO-Mal system against other malware datasets.

Introduction

The paper's introduction outlines the escalating threat of malware as internet use grows. Cyberattacks were once infrequent, but malware designed for espionage, financial gain, or data theft has proliferated, with resulting economic losses estimated at $6 trillion in 2021. The market for malware analysis is projected to grow from $3 billion in 2019 to $11.7 billion by 2024. Traditional detection methods, which rely on manually constructed rules, have become inadequate because advanced malware can bypass standard security measures. Consequently, security providers are increasingly adopting machine learning (ML) techniques to improve malware detection and classification.

The paper emphasizes the challenges posed by the vast and diverse datasets involved in malware detection, necessitating effective feature selection to improve ML performance. Feature selection is highlighted as a complex task, particularly in high-dimensional datasets, prompting researchers to explore metaheuristic algorithms for optimization. The gray wolf optimizer (GWO) is identified as a promising approach for feature selection in this context. The introduction also reviews various methodologies for malware detection, including frameworks that minimize false positives and innovative classification techniques that leverage permission usage analysis and deep learning paradigms. Notable findings from previous studies demonstrate high accuracy rates in malware detection, underscoring the effectiveness of these advanced methodologies.
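
To make the idea of GWO-driven feature selection concrete, the following is a minimal sketch, not the paper's implementation. It assumes a simple binarization (continuous positions in [0, 1] thresholded at 0.5) and a hypothetical toy fitness function, `toy_fitness`, that rewards a made-up set of "informative" feature indices and penalizes subset size. In a real pipeline the fitness would more plausibly be a classifier's validation score on the selected features.

```python
import random


def gwo_feature_selection(fitness, n_features, n_wolves=8, n_iters=30, seed=0):
    """Toy GWO: positions live in [0, 1] and are thresholded at 0.5 into a mask."""
    rng = random.Random(seed)

    def to_mask(position):
        return [value > 0.5 for value in position]

    wolves = [[rng.random() for _ in range(n_features)] for _ in range(n_wolves)]
    best_mask, best_score = None, float("-inf")

    for it in range(n_iters + 1):
        # Rank the pack (higher fitness is better); the top three wolves lead.
        ranked = sorted(wolves, key=lambda w: fitness(to_mask(w)), reverse=True)
        alpha, beta, delta = ranked[:3]
        alpha_score = fitness(to_mask(alpha))
        if alpha_score > best_score:
            best_mask, best_score = to_mask(alpha), alpha_score
        if it == n_iters:
            break  # the final pass only evaluates the last pack

        a = 2.0 * (1 - it / n_iters)  # shrinks linearly from 2 towards 0
        new_wolves = []
        for wolf in wolves:
            position = []
            for d in range(n_features):
                estimate = 0.0
                for leader in (alpha, beta, delta):
                    A = 2 * a * rng.random() - a  # step coefficient in [-a, a)
                    C = 2 * rng.random()          # leader weight in [0, 2)
                    distance = abs(C * leader[d] - wolf[d])
                    estimate += leader[d] - A * distance
                # Average the three leader-guided estimates, clipped to [0, 1].
                position.append(min(1.0, max(0.0, estimate / 3)))
            new_wolves.append(position)
        wolves = new_wolves

    return best_mask, best_score


# Hypothetical toy objective: indices 0, 3 and 5 are "useful" (an assumption).
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits - 0.1 * sum(mask)  # small penalty per selected feature


mask, score = gwo_feature_selection(toy_fitness, n_features=8)
selected = [i for i, keep in enumerate(mask) if keep]
print(selected, round(score, 2))
```

The fixed seed makes the run repeatable. The search is stochastic, so a different seed or pack size may return a different subset.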

Results

The results of the study demonstrate the effectiveness of the proposed RFGWO-Mal system in both binary and multiclass classification tasks. The system uses the gray wolf optimizer (GWO) to select feature subsets, which are then compared under two feature selection schemes: C-FS and the newly introduced self-union approach, U-FS.
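
One plausible reading of the self-union (U-FS) step, based only on this summary, is a set union of the feature subsets that GWO selects for the binary task and for the multiclass task. The sketch below illustrates that reading; the helper `self_union` and the feature names are hypothetical and are not taken from the paper or the dataset.

```python
def self_union(*subsets):
    """Return the sorted union of several selected-feature subsets."""
    merged = set()
    for subset in subsets:
        merged |= set(subset)
    return sorted(merged)


# Hypothetical feature names, for illustration only.
binary_selected = ["feat_a", "feat_c", "feat_d"]
multiclass_selected = ["feat_b", "feat_c", "feat_e"]

union_features = self_union(binary_selected, multiclass_selected)
print(union_features)  # ['feat_a', 'feat_b', 'feat_c', 'feat_d', 'feat_e']
```

A feature chosen for both tasks, such as `feat_c` here, appears only once in the merged set, so the union is never larger than the two subsets combined.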

The classification tasks were executed using a random forest (RF) classifier, as outlined in section 3.3 of the paper. The findings indicate that the RFGWO-Mal system, through its optimized feature selection and classification methodology, shows promise in improving classification performance across both binary and multiclass tasks.

Discussion

The discussion highlights the effectiveness of the proposed RFGWO-Mal system for malware detection, which integrates a random forest (RF) classifier with a gray wolf optimizer (GWO) for feature selection. The study reports that traditional machine learning classifiers also achieved high accuracy, with decision trees (DT) reaching 99%. The RFGWO-Mal system performs better still, achieving 99.95% accuracy in binary classification and 86.57% in multiclass classification and outperforming conventional feature selection methods. The self-union feature selection approach, which combines the features selected for the binary and multiclass classifications, significantly improves the system's accuracy, recall, and precision, particularly in multiclass scenarios.
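
For readers less familiar with how accuracy, precision, and recall are computed in a multiclass setting, here is a small self-contained sketch. It assumes macro averaging (each class weighted equally), which this summary does not confirm as the paper's choice. The helper `macro_report` and the tiny label lists are made up for illustration.

```python
def macro_report(y_true, y_pred):
    """Return accuracy plus macro-averaged precision and recall."""
    labels = sorted(set(y_true) | set(y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    precisions, recalls = [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)  # times the label was predicted
        actual = sum(t == label for t in y_true)     # times the label truly occurred
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / actual if actual else 0.0)
    return {
        "accuracy": correct / len(y_true),
        "precision": sum(precisions) / len(labels),
        "recall": sum(recalls) / len(labels),
    }


# Made-up labels and predictions, for illustration only.
y_true = ["benign", "benign", "spyware", "spyware", "trojan", "trojan"]
y_pred = ["benign", "benign", "spyware", "trojan", "trojan", "trojan"]

report = macro_report(y_true, y_pred)
print({name: round(value, 3) for name, value in report.items()})
# {'accuracy': 0.833, 'precision': 0.889, 'recall': 0.833}
```

In this toy case one spyware sample is misclassified as a trojan, which lowers spyware recall and trojan precision while leaving the benign class untouched.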

The research emphasizes the importance of effective feature selection for machine learning performance in cybersecurity. GWO's ability to adapt dynamically to evolving malware attributes and its efficiency in resource-limited environments are noted as key advantages. The findings suggest that the self-union feature selection method matches traditional methods in binary classification and provides substantial improvements in multiclass classification. This positions the RFGWO-Mal system as a robust tool for malware detection and lays the groundwork for future research in this domain.