Hybrid metaheuristic optimization for detecting and diagnosing noncommunicable diseases

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-91136-3
PMID: https://pubmed.ncbi.nlm.nih.gov/40050658
Publication Date: 2025-03-06
Author(s): Saleem Malik et al.
Primary Topic: Artificial Intelligence in Healthcare

Overview

The study addresses challenges in the early detection and management of non-communicable diseases (NCDs), which were exacerbated by the COVID-19 pandemic. It proposes a framework that combines data mining, feature selection, and metaheuristic optimization to improve disease prediction and treatment. The study introduces two novel hybrid algorithms: the Hierarchical Genetic Multiple Reduct Selection Algorithm (H-GMRA) and the Customized Function-based Particle Swarm Optimization with Rough Set Theory for NCD Feature Selection (CPSO-RST-NFS). Both target feature selection, computational complexity, and classification accuracy, and H-GMRA is reported to perform particularly well at identifying minimal feature sets with high dependency ratios.

The experimental results indicate that the proposed framework improves classification accuracy across several NCD datasets, particularly when combined with classifiers such as Support Vector Machines (SVM). CPSO-RST-NFS uses an optimized fitness function based on the Golden Ratio, which is reported to give faster convergence and better feature subsets. The study emphasizes that feature selection improves diagnostic accuracy and thereby contributes to better patient outcomes. Overall, the work offers healthcare practitioners and data analysts tools for NCD research, with the aim of improving clinical predictions and patient care.

Methods

The methods section discusses heuristic algorithms for problem-solving and decision-making, emphasizing their efficiency on complex, time-sensitive problems for which an exact solution is not required. The study focuses on chronic NCDs such as cardiovascular disease, diabetes, cancer, and liver disease, which account for 71% of global deaths. The proposed CPSO-RST-NFS algorithm was implemented in MATLAB 2016a and evaluated on datasets from the UC Irvine Machine Learning Repository. Each dataset is a binary classification problem whose target variable indicates health status, and each was analyzed to evaluate the algorithm's efficacy in feature selection and disease diagnosis.

CPSO-RST-NFS was compared against QRA and H-GMRA and showed improvements in both feature selection and dependency values across the datasets. For instance, on the SPECTF dataset it achieved a 42.86% reduction in the number of features and a 28.43% increase in the dependency value. Similar trends were observed on the other datasets, with notable improvements in dependency ratios even when the number of selected features varied. Overall, the findings indicate that CPSO-RST-NFS reduces feature set size while raising dependency values, which suggests it is well suited to feature selection in NCD-related healthcare applications.
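The percentages above are relative changes. As a rough illustration of the arithmetic only, the sketch below computes a relative reduction and a relative increase. The helper names and the example counts (7 features reduced to 4) are hypothetical, chosen because 3/7 rounds to 42.86%; they are not taken from the paper's tables.

```python
def relative_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    if before == 0:
        raise ValueError("`before` must be non-zero")
    return (before - after) / before * 100.0


def relative_increase(before: float, after: float) -> float:
    """Percentage by which `after` is larger than `before`."""
    if before == 0:
        raise ValueError("`before` must be non-zero")
    return (after - before) / before * 100.0


# Hypothetical feature counts, for illustration only.
reduction_pct = round(relative_reduction(7, 4), 2)
# Hypothetical dependency values, for illustration only.
increase_pct = relative_increase(0.5, 0.75)
print(reduction_pct, increase_pct)
```

Note that a large relative increase in dependency can correspond to a small absolute change when the baseline value is low, so both figures are worth reading together.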

Discussion

In the discussion, the authors evaluate the contributions and limitations of the two algorithms, CPSO-RST-NFS and H-GMRA, which address feature selection and multiple reduct selection, respectively. Neither is considered novel in itself, since both build on established methodologies such as particle swarm optimization (PSO) and rough set theory (RST). The innovation lies in adapting these methods to the specific challenges of non-communicable disease (NCD) datasets, with an emphasis on practicality and efficiency in complex environments.
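For readers less familiar with rough set theory, the dependency measure and the notion of a reduct can be stated compactly. The notation below (universe U, condition attributes C, decision attribute D, attribute subset B, and equivalence class [x]_B) is the usual textbook notation and is assumed here; the paper's own symbols may differ.

```latex
\[
\underline{B}X = \{\, x \in U : [x]_B \subseteq X \,\},
\qquad
\mathrm{POS}_B(D) = \bigcup_{X \in U/D} \underline{B}X,
\qquad
\gamma_B(D) = \frac{\lvert \mathrm{POS}_B(D) \rvert}{\lvert U \rvert}
\]
\[
B \subseteq C \text{ is a reduct} \iff
\gamma_B(D) = \gamma_C(D)
\ \text{ and } \
\gamma_{B'}(D) < \gamma_C(D) \ \text{ for every } B' \subsetneq B
\]
```

In words, the dependency is the fraction of records whose class is determined unambiguously by the attributes in B, and a reduct is a minimal attribute subset that preserves the dependency of the full attribute set.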

The paper highlights the importance of data preprocessing, particularly handling missing values and reducing noise, for the validity of conclusions drawn from NCD datasets. The authors apply mean imputation and min-max normalization to improve data quality before feature selection. RST is noted for its ability to handle incomplete and inconsistent data, which allows effective attribute reduction and improved classification accuracy. The Quick Reduct Algorithm (QRA) is introduced as a method for selecting relevant features, which improves model interpretability and reduces overfitting.
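The preprocessing and reduction steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: missing values are represented as `None`, the equal-width `discretize` step is an added assumption (classical rough sets need discrete attributes), `quick_reduct` is the common greedy forward-selection form of the algorithm, and the toy data and all function names are hypothetical.

```python
def mean_impute(rows):
    """Replace None in each column with the mean of that column's observed values."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def min_max_normalize(rows):
    """Rescale each column to [0, 1]; constant columns map to 0.0."""
    n_cols = len(rows[0])
    lo = [min(r[j] for r in rows) for j in range(n_cols)]
    hi = [max(r[j] for r in rows) for j in range(n_cols)]
    return [[(r[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
             for j in range(n_cols)] for r in rows]


def discretize(rows, bins=2):
    """Equal-width binning of [0, 1] values (an assumption of this sketch)."""
    return [[min(int(v * bins), bins - 1) for v in r] for r in rows]


def dependency(rows, labels, subset):
    """Fraction of records whose equivalence class under `subset` has a single label."""
    classes = {}
    for r, y in zip(rows, labels):
        key = tuple(r[j] for j in sorted(subset))
        classes.setdefault(key, []).append(y)
    consistent = sum(len(ys) for ys in classes.values() if len(set(ys)) == 1)
    return consistent / len(rows)


def quick_reduct(rows, labels):
    """Greedily add the feature that raises dependency most until it matches the full set."""
    n = len(rows[0])
    target = dependency(rows, labels, set(range(n)))
    reduct = set()
    while dependency(rows, labels, reduct) < target:
        best = max((j for j in range(n) if j not in reduct),
                   key=lambda j: dependency(rows, labels, reduct | {j}))
        reduct.add(best)
    return reduct


# Hypothetical toy data: three features, one missing value, binary labels.
raw = [[1.0, 10.0, 5.0], [2.0, None, 5.0], [8.0, 30.0, 5.0], [9.0, 40.0, 5.0]]
labels = [0, 0, 1, 1]
imputed = mean_impute(raw)
normalized = min_max_normalize(imputed)
discrete = discretize(normalized)
print(quick_reduct(discrete, labels))
```

The greedy search returns a subset that preserves the full-set dependency but is not guaranteed to be the smallest such subset, which is one motivation for pairing rough sets with a metaheuristic search.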

The integration of PSO with RST in CPSO-RST-NFS is presented as a significant advance in feature selection for NCD datasets. This hybrid approach combines the strengths of both methods, using a customized objective function to optimize feature subsets according to their dependency ratios. The proposed two-phase architecture, in which H-GMRA performs the initial feature selection and CPSO-RST-NFS refines it, aims to improve classification accuracy and computational efficiency in NCD research. The authors assert that these tailored feature selection strategies are essential for addressing the particular complexities of NCD datasets and, ultimately, for improving clinical predictions and patient outcomes.
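To make the idea of PSO searching over feature subsets with a rough-set objective more concrete, here is a small sketch. It is not CPSO-RST-NFS: it uses the common sigmoid-transfer binary PSO, and the `fitness` function is an assumed weighted sum of dependency and sparsity standing in for the paper's Golden Ratio based function, whose exact form is not given in this summary. All names, parameter values, and the toy data are hypothetical.

```python
import math
import random


def dependency(rows, labels, mask):
    """Rough-set dependency of the label on the features where `mask` is True."""
    keys = [tuple(v for v, keep in zip(r, mask) if keep) for r in rows]
    seen = {}
    for k, y in zip(keys, labels):
        seen.setdefault(k, set()).add(y)
    return sum(1 for k in keys if len(seen[k]) == 1) / len(rows)


def fitness(rows, labels, mask, alpha=0.9):
    """Assumed objective: reward high dependency and, to a lesser degree, fewer features."""
    n = len(mask)
    return alpha * dependency(rows, labels, mask) + (1 - alpha) * (n - sum(mask)) / n


def binary_pso(rows, labels, n_particles=10, n_iter=30,
               w=0.7, c1=1.5, c2=1.5, v_max=6.0, seed=0):
    rng = random.Random(seed)
    n = len(rows[0])
    # The first particle selects every feature, so the result is never worse
    # than the full feature set under this fitness.
    pos = [[True] * n] + [[rng.random() < 0.5 for _ in range(n)]
                          for _ in range(n_particles - 1)]
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(rows, labels, p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n):
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-v_max, min(v_max, v))
                # Sigmoid transfer: velocity becomes the probability of selecting feature j.
                pos[i][j] = rng.random() < 1.0 / (1.0 + math.exp(-vel[i][j]))
            f = fitness(rows, labels, pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical discrete toy data in which the label equals the first feature.
rows = [[0, 0, 1, 0], [0, 1, 1, 1], [1, 0, 0, 0],
        [1, 1, 0, 1], [0, 0, 0, 1], [1, 1, 1, 0]]
labels = [0, 0, 1, 1, 0, 1]
best_mask, best_fit = binary_pso(rows, labels)
print(best_mask, round(best_fit, 3))
```

In a two-phase setup like the one described above, the initial swarm could be seeded with reducts from the first phase rather than with random masks; that seeding strategy is a design suggestion of this sketch, not a detail confirmed by the summary.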

Limitations

The study's limitations stem primarily from its narrow focus on feature selection and classification accuracy, which leaves out the interpretability of the selected features and the clinical relevance of the outcomes in NCD diagnosis. Although the proposed hybrid algorithms show promise, their effectiveness is inconsistent across NCD datasets, indicating a need for further tuning and for validation on a broader range of real-world clinical data.

Moreover, the datasets used do not adequately represent the complexity and diversity of actual clinical scenarios, so additional studies are needed to validate the findings in practical healthcare settings. Future research should consider interpretability and clinical relevance alongside feature selection and classification accuracy. To improve the robustness and generalizability of the algorithms, the authors recommend incorporating diverse data sources, such as electronic health records and patient demographics. The ultimate goal is to develop user-friendly healthcare software tools or platforms that support better identification and management of NCDs.