Real time identification of phishing attacks through machine learning enhanced browser extensions

Journal: Scientific Reports, Volume: 16, Issue: 1
DOI: https://doi.org/10.1038/s41598-026-35655-7
PMID: https://pubmed.ncbi.nlm.nih.gov/41611809
Publication Date: 2026-01-29
Author(s): Monika Dandotiya et al.
Primary Topic: Spam and Phishing Detection

Overview

The paper addresses the persistent threat of phishing attacks, which use fake websites to steal personal information. It proposes a Google Chrome extension that applies machine learning to URLs and visual page elements in real time to identify phishing sites. The system extracts hybrid features (lexical, structural, and visual) and evaluates them with classifiers such as Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF). A Grey Wolf Optimizer (GWO) performs feature selection, which improves performance and reduces computational cost.
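To make the idea of lexical URL features concrete, the following is a minimal sketch using only the Python standard library. The function name `lexical_features` and the particular features chosen are assumptions for illustration; the summary does not list the exact features the authors use.

```python
import re
from urllib.parse import urlsplit

# Matches a dotted-quad host such as 192.168.0.1 (no range check; illustration only).
IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def lexical_features(url):
    """Return a few simple lexical features of a URL (hypothetical feature set)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_dots": host.count("."),          # many dots can indicate deep subdomains
        "hyphens_in_host": host.count("-"),
        "digit_count": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in parts.netloc,  # userinfo part can hide the real host
        "host_is_ipv4": bool(IPV4.match(host)),
        "uses_https": parts.scheme == "https",
    }


if __name__ == "__main__":
    print(lexical_features("http://192.168.0.1/login"))
```

A feature dictionary like this could be turned into a numeric vector and passed to any of the classifiers named above; structural and visual features would need the page's HTML and a rendering, which this sketch does not cover.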

On benchmark datasets, the proposed system achieved a Matthews correlation coefficient (MCC) of 0.96 and an accuracy of 98.7%. The authors report that it outperforms existing anti-phishing solutions by assessing URLs in real time, adapting warnings to user actions, and handling obfuscated URLs. The work highlights the value of combining feature fusion with machine learning classifiers to improve phishing detection accuracy, contributing to a user-centered defense against evolving phishing threats.
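For readers unfamiliar with the two reported metrics, here is a small sketch of how accuracy and MCC are computed from a binary confusion matrix. The function names are illustrative, and the numbers in the example are made up to show the behaviour of the metrics; they are not the paper's data.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count (tp, tn, fp, fn) for binary labels where 1 = phishing, 0 = legitimate."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy imbalanced case: a model that labels everything "legitimate".
    print(accuracy(0, 95, 0, 5), mcc(0, 95, 0, 5))
```

In the toy case the always-legitimate model scores 0.95 accuracy but an MCC of 0.0, which illustrates why MCC is a useful companion to accuracy on imbalanced data.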

Introduction

In the introduction, the authors acknowledge that the proposed system, although it performs well at detecting phishing attempts, has limitations. One significant concern is its reliance on datasets that may not adequately reflect continuously evolving phishing strategies. Adapting the system to mobile platforms and to other web browsers also presents potential challenges. To remain effective against new phishing techniques, the authors emphasize that the system needs regular updates.

Results

In the results section, the study examines the characteristics of phishing emails by analyzing a substantial dataset of fraudulent communications to identify common linguistic patterns and deceptive phrases. Using a term frequency-inverse document frequency (tf-idf) approach, it highlights specific words and phrases that appear frequently in phishing attempts. The K-Nearest Neighbors (KNN) algorithm achieved 100% accuracy in unit testing, where its predicted outputs matched the actual outputs. This algorithm classifies URLs by comparing them to known phishing and legitimate sites.
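The two techniques named above can be sketched together in a few lines of standard-library Python. This is an illustration under assumptions, not the paper's pipeline: the toy messages, the smoothed IDF formula, the cosine-similarity neighbour search, and all function names are chosen here for brevity, and the study applies tf-idf to emails and KNN to URLs rather than chaining them as this sketch does.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(text, idf):
    """Sparse tf-idf vector; terms unseen during fitting are dropped."""
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(text, train, idf, k=3):
    """Majority label among the k training items most similar to `text`."""
    query = tfidf(text, idf)
    ranked = sorted(train, key=lambda item: cosine(query, tfidf(item[0], idf)), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy, hand-written training data (hypothetical, not from the paper).
TRAIN = [
    ("verify your account password now", "phishing"),
    ("urgent account suspended verify login", "phishing"),
    ("confirm password to restore account access", "phishing"),
    ("team lunch scheduled for friday", "legitimate"),
    ("meeting notes attached for review", "legitimate"),
    ("quarterly report draft for friday review", "legitimate"),
]
IDF = fit_idf([text for text, _ in TRAIN])

if __name__ == "__main__":
    print(knn_predict("please verify your password", TRAIN, IDF))
```

Because KNN only compares a query against stored examples, a perfect score on a handful of unit-test inputs says little about inputs that look unlike anything in the training set.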

The performance of the KNN algorithm depends on the quality of the dataset, and accuracy may decline when it encounters novel phishing tactics that differ significantly from previously identified cases. The study also reports that the Support Vector Machine (SVM) model consistently matched expected outputs, supporting its reliability in detecting potentially fraudulent URLs. Notably, the model correctly flagged the input ‘paypal.de@secure-’ as phishing, which shows that it can identify deceptive web addresses.
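The flagged input follows a well-known obfuscation pattern: everything before the `@` in a URL's authority is userinfo, not the host, so a trusted-looking domain can be placed there as a decoy. The sketch below shows how the standard `urllib.parse.urlsplit` exposes this. The URL in the example is hypothetical (the input quoted above is truncated in the source and is not completed here), and `inspect_userinfo` is an illustrative helper, not the authors' detector.

```python
from urllib.parse import urlsplit


def inspect_userinfo(url):
    """Report the real host and any domain-like decoy placed before '@'."""
    parts = urlsplit(url)
    real_host = (parts.hostname or "").lower()
    decoy = (parts.username or "").lower()
    # Heuristic (assumption): a dotted userinfo that differs from the host is suspicious.
    suspicious = "." in decoy and decoy != real_host
    return {"real_host": real_host, "decoy": decoy, "suspicious": suspicious}


if __name__ == "__main__":
    # Hypothetical URL using the reserved .example TLD.
    print(inspect_userinfo("http://paypal.de@secure-login.example/signin"))
```

A rule like this is only one hand-written signal; a learned model such as the SVM described above can pick up the same pattern from features like the presence of `@` in the authority.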

Discussion

The discussion highlights the increasing sophistication of phishing attacks and the inadequacy of traditional detection methods, such as blacklists and heuristic models, which struggle to adapt to evolving threats. The proposed solution is a Chrome extension that uses machine learning techniques, including URL analysis and visual similarity recognition, to improve real-time phishing detection. The extension aims to give users immediate alerts and actionable safety tips, improving both online security and user awareness.
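The summary does not say which visual-similarity method the extension uses. As one possible illustration, the sketch below uses average hashing, a simple perceptual-hash technique swapped in here as an assumption: each image is reduced to one bit per pixel (brighter or darker than the mean), and two images are compared by the fraction of matching bits. The tiny 4x4 grayscale grids and all names are hypothetical.

```python
def average_hash(pixels):
    """One bit per pixel: 1 if brighter than the image's mean, else 0."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def visual_similarity(img_a, img_b):
    """Fraction of matching hash bits, from 0.0 (opposite) to 1.0 (identical)."""
    ha, hb = average_hash(img_a), average_hash(img_b)
    return 1 - hamming(ha, hb) / len(ha)


# Hypothetical 4x4 grayscale thumbnails (0 = black, 255 = white).
REFERENCE = [[10, 10, 200, 200], [10, 10, 200, 200], [200, 200, 10, 10], [200, 200, 10, 10]]
CLONE = [[12, 9, 198, 205], [11, 10, 201, 199], [202, 197, 9, 12], [200, 203, 11, 10]]
UNRELATED = [[10, 10, 10, 10], [10, 10, 10, 10], [200, 200, 200, 200], [200, 200, 200, 200]]

if __name__ == "__main__":
    print(visual_similarity(REFERENCE, CLONE), visual_similarity(REFERENCE, UNRELATED))
```

A page that looks nearly identical to a known brand's login page while being served from an unrelated host is a strong phishing signal, and it is one that an exact-match blacklist cannot capture for a URL it has never seen.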

The study emphasizes the integration of varied features, such as HTML attributes, visual characteristics, and URL properties, with a Grey Wolf Optimizer (GWO) used for feature selection. According to the authors, this approach reduces redundancy and overfitting while improving detection accuracy. The paper reports that the proposed model outperforms existing methods by using advanced machine learning algorithms and a comprehensive dataset, contributing to a more robust defense against phishing attacks. The findings underscore the need for innovative, dynamic cybersecurity solutions to protect individuals and organizations from the growing threat of phishing.
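To show what GWO-based feature selection looks like in practice, here is a compact sketch of the standard Grey Wolf Optimizer update adapted to feature masks. Several parts are assumptions made for illustration: positions live in [0, 1] and are thresholded at 0.5 to obtain a mask, one wolf starts at the all-features baseline, and `toy_fitness` is a stand-in for what would normally be a classifier's validation score. None of these details are taken from the paper.

```python
import random


def to_mask(position):
    """A feature is selected when its coordinate exceeds 0.5 (assumed threshold)."""
    return [x > 0.5 for x in position]


def gwo_select(fitness, n_features, n_wolves=8, n_iters=30, seed=0):
    """Maximise `fitness(mask)` with a basic Grey Wolf Optimizer."""
    rng = random.Random(seed)
    wolves = [[rng.random() for _ in range(n_features)] for _ in range(n_wolves - 1)]
    wolves.append([1.0] * n_features)  # baseline wolf: keep every feature
    best_mask, best_score = None, float("-inf")

    def remember_best():
        nonlocal best_mask, best_score
        for wolf in wolves:
            score = fitness(to_mask(wolf))
            if score > best_score:
                best_mask, best_score = to_mask(wolf), score

    for it in range(n_iters):
        remember_best()
        ranked = sorted(wolves, key=lambda w: fitness(to_mask(w)), reverse=True)
        # Copy the three leaders, since the wolves are updated in place below.
        alpha, beta, delta = (list(w) for w in ranked[:3])
        a = 2.0 * (1 - it / n_iters)  # decreases linearly from 2 towards 0
        for wolf in wolves:
            for d in range(n_features):
                pulled = []
                for leader in (alpha, beta, delta):
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    dist = abs(C * leader[d] - wolf[d])
                    pulled.append(leader[d] - A * dist)
                # New coordinate is the mean of the three leader-guided moves, clipped.
                wolf[d] = min(1.0, max(0.0, sum(pulled) / 3))
    remember_best()
    return best_mask, best_score


# Hypothetical per-feature usefulness; real code would score a trained classifier.
RELEVANCE = [0.9, 0.8, 0.05, 0.02, 0.7, 0.01]


def toy_fitness(mask):
    """Reward useful features and charge 0.1 per selected feature."""
    return sum(r for r, keep in zip(RELEVANCE, mask) if keep) - 0.1 * sum(mask)


if __name__ == "__main__":
    print(gwo_select(toy_fitness, len(RELEVANCE)))
```

The per-feature penalty in the toy objective plays the role of the computational cost mentioned above: dropping weak features raises the score, which is how a wrapper-style selector can reduce redundancy without hurting accuracy.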