Robust fault detection and classification in power transmission lines via ensemble machine learning models

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-86554-2
PMID: https://pubmed.ncbi.nlm.nih.gov/39833299
Publication Date: 2025-01-20
Author(s): Zhenyun Du et al.
Primary Topic: Power Systems Fault Detection

Overview

The research addresses fault detection in transmission lines, which are essential for long-distance electricity delivery but whose reliability is frequently compromised by faults. The study proposes an approach that applies machine learning algorithms, including Random Forest (RF), K-Nearest Neighbors (KNN), and Long Short-Term Memory (LSTM) networks, to voltage and current patterns in order to classify faults. An ensemble method, termed RF-LSTM Tuned KNN, reaches 99.96% accuracy in multi-label classification, well above the individual RF (97.50%) and KNN (96.55%) models. In binary classification, KNN leads with 99.85% accuracy, followed closely by RF at 99.72%. These results indicate that machine learning techniques can improve the reliability and stability of power grids.
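To make the KNN component concrete, here is a minimal from-scratch sketch of a k-nearest-neighbour classifier that labels a sample as fault (1) or normal (0) from three phase currents and three phase voltages. The feature layout (Ia, Ib, Ic, Va, Vb, Vc), the toy values in `TRAIN`, and the helper names are assumptions for illustration, not the study's data or code.

```python
import math
from collections import Counter

# Hypothetical toy samples: features are (Ia, Ib, Ic, Va, Vb, Vc) in per-unit,
# label 1 = fault, 0 = normal. Values are illustrative, not from the study.
TRAIN = [
    ((0.1, -0.1, 0.0, 0.5, -0.4, -0.1), 0),
    ((0.2, -0.1, -0.1, 0.4, -0.5, 0.1), 0),
    ((0.1, 0.0, -0.1, 0.5, -0.3, -0.2), 0),
    ((3.0, -0.2, 0.1, 0.05, -0.4, 0.3), 1),
    ((2.5, -2.4, 0.0, 0.1, -0.1, 0.0), 1),
    ((2.8, -0.1, -2.6, 0.0, 0.3, -0.05), 1),
]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    neighbours = sorted(train, key=lambda s: euclidean(s[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# A sample with a large phase-A current is voted a fault.
print(knn_predict(TRAIN, (2.9, -0.3, 0.0, 0.05, -0.3, 0.2)))  # → 1
```

Because KNN compares raw distances, features on different scales are usually standardised first, and `k` is a hyperparameter worth tuning rather than fixing at 3.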

The conclusion emphasizes the role of machine learning in improving fault detection for power systems. The study evaluates several individual models and ensemble techniques and reports that the RF-LSTM Stack Tune KNN model performs best in multi-label classification, with an accuracy of 99.93%. It highlights the benefit of combining RF and LSTM models to handle complex datasets and temporal dependencies, and reports higher performance than previous studies. It also notes limitations of ensemble models, including the need for large training datasets and higher computational cost, which may hinder real-time use in large-scale power systems. Despite these challenges, the findings point to a promising direction for improving grid reliability and operational efficiency.

Methods

The study used Random Forest (RF), K-Nearest Neighbors (KNN), and Long Short-Term Memory (LSTM) networks to identify and classify transmission line faults, and developed a hybrid model that combines RF and LSTM with a tuned KNN ensemble. The framework covers data preprocessing and predictive modeling, with Particle Swarm Optimization (PSO) used for hyperparameter tuning. Models were evaluated with confusion matrices, cross-validation, ROC curves, and learning-curve analysis. Development and training used TensorFlow on a system with an Intel Core i5-4590 CPU, an Nvidia GeForce GTX 750Ti GPU, and 8 GB of RAM.
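The summary states that PSO tuned the hyperparameters but gives neither the objective nor the swarm settings. The sketch below is a generic, minimal PSO in plain Python, not the authors' implementation; the function name `pso_minimize`, the inertia and acceleration coefficients (`w`, `c1`, `c2`), the swarm size, and the toy objective `toy_error` (a hypothetical stand-in for cross-validated error as a function of KNN's `k`) are all assumptions.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=60,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimisation over a box-constrained space."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Start particles uniformly inside the bounds, at rest
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # best position seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # keep inside bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_error(p):
    # Hypothetical stand-in for cross-validated error as a function of KNN's k
    return (p[0] - 7.0) ** 2 / 100 + 0.01


best, err = pso_minimize(toy_error, [(1.0, 25.0)])
print(round(best[0]))  # best k found on this toy surface
```

In a real tuning loop, each objective call would train and cross-validate the model for the candidate setting, and a discrete hyperparameter such as `k` would be rounded before use, so every particle evaluation costs a full training run.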

Several metrics were used to assess model performance: accuracy, precision, recall, F1 score, and the Area Under the ROC Curve (AUC). The confusion matrix breaks predictions down into true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Accuracy, defined as \( \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \), gives an overall measure of correctness but can be misleading on imbalanced datasets, where always predicting the majority class already scores well. Precision, \( \text{Precision} = \frac{TP}{TP + FP} \), matters most when false positives are costly, and recall (sensitivity), \( \text{Recall} = \frac{TP}{TP + FN} \), matters most when false negatives are costly. The F1 score, \( F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \), is the harmonic mean of the two and balances them. The ROC curve plots the true positive rate (recall) against the false positive rate, \( \text{FPR} = \frac{FP}{FP + TN} \), as the decision threshold varies; the AUC summarizes the curve in one number, where 1 indicates perfect discrimination and 0.5 indicates no discriminative ability. Together, these metrics evaluate the model's fault identification and classification performance on transmission lines more thoroughly than accuracy alone.
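The metric definitions translate directly into code. This sketch computes them from confusion-matrix counts and estimates AUC through its ranking interpretation (the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half), which equals the trapezoidal area under the empirical ROC curve. The function names, counts, and scores are made up for illustration; the counts show how accuracy can look high on an imbalanced set while recall lags.

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(scores, labels):
    """AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up counts for an imbalanced set: 110 faults among 1000 samples.
m = binary_metrics(tp=90, tn=880, fp=10, fn=20)
print({name: round(v, 3) for name, v in m.items()})
# → {'accuracy': 0.97, 'precision': 0.9, 'recall': 0.818, 'f1': 0.857}

print(round(roc_auc([0.9, 0.8, 0.4, 0.35, 0.1], [1, 1, 0, 1, 0]), 3))  # → 0.833
```

The pairwise form of `roc_auc` is quadratic in the number of samples; for large sets a sort-based rank computation gives the same value more cheaply.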

Results

In the Results section, the authors present their methodology step by step, starting with binary classification and extending to multi-class classification. Model performance is evaluated with the metrics described above, which put the findings in context. This examination shows how effective the methodology is and indicates how well it applies across different classification scenarios.

Discussion

The Discussion evaluates how well different machine learning models classify faults in power transmission lines, in both binary and multi-label settings. The study uses the Electrical Fault Detection and Classification dataset from Kaggle, which contains data from a system with multiple generators and transformers. The dataset is split into a binary task (fault vs. no fault) and a multi-class task (fault type). For binary classification, K-Nearest Neighbors (KNN) achieved 99.85% accuracy, clearly outperforming traditional methods such as Support Vector Machines (SVM) combined with Principal Component Analysis (PCA). The Random Forest (RF) model also performed strongly, at 99.72% accuracy.
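The summary uses "multi-label" and "multi-class" for the same task. The sketch below shows how one set of targets can serve all three framings. It assumes, as a hypothesis about the Kaggle file rather than something stated in this summary, that each sample carries four 0/1 flags (G, C, B, A) marking ground and phase involvement. Predicting the four flags directly is a multi-label problem; collapsing them into one fault-type name gives a multi-class problem; collapsing to "any fault" gives the binary task. `fault_type` and `is_fault` are hypothetical helpers.

```python
def fault_type(g, c, b, a):
    """Collapse four hypothetical 0/1 flags (G, C, B, A) into one fault-type name."""
    n_phases = a + b + c
    if n_phases == 0:
        return "No fault" if g == 0 else "Unrecognised"
    if n_phases == 1 and g == 0:
        # One phase with no ground involvement is not a valid class in this toy scheme
        return "Unrecognised"
    # One "L" per faulted phase, plus "G" if ground is involved: LG, LL, LLG, LLL, LLLG
    return "L" * n_phases + ("G" if g else "")


def is_fault(g, c, b, a):
    """Binary target: 1 if any flag is set, else 0."""
    return int(any((g, c, b, a)))


for flags in [(0, 0, 0, 0), (1, 0, 0, 1), (0, 0, 1, 1), (1, 0, 1, 1), (0, 1, 1, 1), (1, 1, 1, 1)]:
    print(flags, fault_type(*flags), is_fault(*flags))
```

Keeping the four flags as the stored target and deriving the other two views from them avoids maintaining three separately labelled copies of the data.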

For multi-label classification, the study introduces an ensemble that combines RF and LSTM, reaching 99.96% accuracy. According to the authors, this RF-LSTM Stacked Tune KNN ensemble improves predictive performance while being less computationally complex than more elaborate deep learning architectures. ROC curves and confusion matrices confirm the robustness of the models across fault categories. The findings show that ensemble learning can raise fault classification accuracy while remaining operationally efficient, making it a promising option for real-world power system fault detection. The study nevertheless acknowledges that ensemble models remain computationally demanding, particularly for real-time use.
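This summary does not spell out how RF, LSTM, and KNN are wired together. One plausible reading of a "stacked" ensemble is sketched below: each base model turns a raw sample into a fault probability, those outputs form a meta-feature vector, and a KNN meta-learner votes on it. `rf_stub` and `lstm_stub` are hypothetical hand-written rules standing in for trained RF and LSTM models, and the hold-out samples are toy values; none of this is the authors' code.

```python
import math
from collections import Counter


def stack_features(base_models, x):
    """Meta-feature vector: each base model's fault probability for sample x."""
    return tuple(model(x) for model in base_models)


def fit_meta(base_models, holdout):
    """Meta-training set built from a hold-out split the base models did not train on."""
    return [(stack_features(base_models, x), y) for x, y in holdout]


def stacked_predict(base_models, meta_train, x, k=3):
    """KNN meta-learner: majority vote of the k nearest meta-feature vectors."""
    z = stack_features(base_models, x)
    nearest = sorted(meta_train, key=lambda s: math.dist(s[0], z))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


# Hypothetical stand-ins for trained models; features are (Ia, Ib, Ic, Va, Vb, Vc).
def rf_stub(x):
    # Fraction of phase currents above 1.0 pu
    return sum(abs(i) > 1.0 for i in x[:3]) / 3


def lstm_stub(x):
    # Residual (summed) phase current, scaled and capped at 1.0
    return min(1.0, abs(sum(x[:3])) / 2)


BASE = [rf_stub, lstm_stub]
HOLDOUT = [
    ((0.1, -0.1, 0.0, 0.5, -0.4, -0.1), 0),
    ((0.2, -0.1, -0.1, 0.4, -0.5, 0.1), 0),
    ((3.0, -0.2, 0.1, 0.05, -0.4, 0.3), 1),
    ((2.5, -2.4, 0.0, 0.1, -0.1, 0.0), 1),
]
meta = fit_meta(BASE, HOLDOUT)
print(stacked_predict(BASE, meta, (2.9, -0.3, 0.0, 0.05, -0.3, 0.2)))  # → 1
```

Building the meta-training set from a hold-out split (or out-of-fold predictions) keeps the meta-learner from fitting base-model outputs that are overconfident on their own training data; the meta-learner's `k` would itself be a tuning target.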