Prediction and classification of obesity risk based on a hybrid metaheuristic machine learning approach

Journal: Frontiers in Big Data, Volume: 7
DOI: https://doi.org/10.3389/fdata.2024.1469981
PMID: https://pubmed.ncbi.nlm.nih.gov/39403430
Publication Date: 2024-09-30
Author(s): Zarindokht Helforoush et al.
Primary Topic: Artificial Intelligence in Healthcare

Overview

The paper investigates machine learning techniques for obesity risk prediction, motivated by the limitations of traditional regression models, which struggle to account for the multifaceted interactions among genetic, environmental, and behavioral factors. The study introduces a hybrid model combining Artificial Neural Networks (ANN) and Particle Swarm Optimization (PSO), which, after thorough data preprocessing, achieved an accuracy of 91.79% (roughly 92%), surpassing other evaluated algorithms such as Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), XGBoost, LightGBM (LGBM), and CATBoost.

The findings emphasize the effectiveness of the ANN-PSO model in accurately classifying various obesity levels, particularly in underrepresented severe obesity categories, while minimizing misclassifications as demonstrated by confusion matrices. Additionally, SHAP analysis provided insights into feature importance, revealing the significant factors influencing obesity risk. This research highlights the transformative potential of advanced machine-learning models in public health, advocating for their integration into personalized healthcare strategies to improve treatment and prevention of obesity. Future research directions include exploring other optimization algorithms and network architectures to further enhance predictive performance in complex healthcare classification tasks.

Introduction

The introduction of the paper addresses the global public health crisis of obesity, which has significant health and economic repercussions. Defined by the World Health Organization (WHO) as an excess of body fat, obesity is operationalized using body mass index (BMI), with individuals classified as obese if their BMI is 30 kg/m² or higher. The prevalence of obesity has nearly doubled since 1980, affecting over 200 million adult males and approximately 300 million adult females worldwide. The condition is associated with numerous health issues, including hypertension, cardiovascular disease, diabetes, and increased mortality rates, as well as psychological effects and financial burdens on healthcare systems.
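As a small illustration of the BMI definition above, the following sketch computes BMI and maps it to the standard WHO adult categories. The function names are hypothetical, and the paper's seven-class scheme additionally splits overweight into two levels whose thresholds are not given in this summary, so they are not reproduced here.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2: weight divided by height squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Standard WHO adult cut-offs: (exclusive upper bound, label).
WHO_CATEGORIES = [
    (18.5, "underweight"),
    (25.0, "normal weight"),
    (30.0, "overweight"),
    (35.0, "obesity class I"),
    (40.0, "obesity class II"),
]


def who_category(bmi_value: float) -> str:
    """Return the WHO category for a BMI value (illustrative helper)."""
    for upper, label in WHO_CATEGORIES:
        if bmi_value < upper:
            return label
    return "obesity class III"


print(who_category(bmi(70.0, 1.75)))  # BMI of about 22.9
```

A BMI of exactly 30 kg/m² falls into "obesity class I" under these cut-offs.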

The paper highlights the limitations of traditional regression models in assessing obesity risk, prompting a shift towards machine learning (ML) techniques that can capture complex, non-linear interactions in clinical data. The authors propose a comprehensive analysis of obesity risk using seven supervised ML algorithms, including Logistic Regression, Random Forest, and Support Vector Machine, among others. Notably, they introduce a novel hybrid model, ANN-PSO (Artificial Neural Networks-Particle Swarm Optimization), which combines the optimization capabilities of Particle Swarm Optimization with the learning strengths of artificial neural networks. This hybrid approach demonstrates superior performance in predicting obesity risk, suggesting significant potential for enhancing predictive analytics in obesity research.
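To make the PSO half of the hybrid more concrete, here is a minimal, generic particle swarm sketch in plain Python. It is not the authors' implementation: the objective `toy_validation_loss` is a hypothetical stand-in for "train the network with these hyperparameters and return the validation loss", and the swarm size, coefficients, and search bounds are assumed values chosen only for illustration.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Minimize objective(position) inside box bounds with a basic PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # each particle's personal best
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]  # swarm-wide best

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to personal best + pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + social * r2 * (gbest_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest_pos, gbest_val = pos[i][:], val
    return gbest_pos, gbest_val


def toy_validation_loss(params):
    """Hypothetical smooth loss with its minimum at log10(lr) = -2, 32 hidden units."""
    log_lr, hidden_units = params
    return (log_lr + 2.0) ** 2 + ((hidden_units - 32.0) / 16.0) ** 2


best, loss = pso_minimize(toy_validation_loss, bounds=[(-4.0, 0.0), (4.0, 128.0)])
print(best, loss)
```

In a real pipeline the objective would train and evaluate a network for each candidate position, which is why the number of particles and iterations drives the computational cost.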

Methods

The study employs machine learning (ML) techniques to evaluate obesity levels based on dietary patterns and physical conditions, categorizing individuals into seven BMI-defined classes: underweight, normal weight, overweight (levels I and II), and obese (types I, II, and III). The classification adheres to WHO guidelines, with the aim of developing a prognostic model that facilitates early obesity detection and personalized treatment strategies. The dataset was divided into training (70%), validation (15%), and testing (the remaining 15%) sets, with the training data further subjected to 10-fold cross-validation to enhance model reliability. Seven ML algorithms were used as baselines: Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Multi-Layer Perceptron (MLP), LightGBM (LGBM), XGBoost, and CATBoost. They were compared against a hybrid model combining Artificial Neural Networks (ANN) with Particle Swarm Optimization (PSO) to optimize performance.
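The splitting scheme described above can be sketched with the standard library alone. This is an illustrative outline, not the authors' code: the sample count of 1000 is a placeholder, the helper names are hypothetical, and the split is a plain shuffle without stratification by class, which a real pipeline on imbalanced classes would likely want.

```python
import random


def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)
    # Whatever remains after train and validation becomes the test set.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def k_fold(indices, k=10):
    """Yield (fit, held_out) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, folds[i]


train, val, test = split_indices(1000)  # 1000 is a placeholder sample count
for fit, held_out in k_fold(train, k=10):
    pass  # fit a model on `fit`, score it on `held_out`
print(len(train), len(val), len(test))
```

Each training sample is held out exactly once across the ten folds, while the validation and test sets stay untouched until model selection and final evaluation.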

The results indicate that while several models achieved accuracies between 85% and 89%, the ANN-PSO hybrid model outperformed them with an accuracy of 91.79%. This model demonstrated superior predictive capabilities, particularly in identifying minority classes such as “Overweight Level I” and “Obesity Type I,” where traditional models struggled. Precision, recall, and F1 scores were consistently high across all classes for the ANN-PSO model, and its confusion matrices showed few misclassifications. Additionally, Receiver Operating Characteristic (ROC) and Precision-Recall curves further validated the model’s discrimination power and ability to handle imbalanced classes, underscoring its potential for accurate obesity classification and risk assessment.

Results

The results section evaluates the proposed model’s effectiveness in classifying obesity through four key performance metrics: Accuracy, Precision, Recall, and F1 Score. Accuracy measures the proportion of correctly predicted samples, calculated as the ratio of the sum of true positives (tp) and true negatives (tn) to the total number of samples. Precision quantifies how many of the cases predicted as positive are actually positive, defined as the ratio of tp to the sum of tp and false positives (fp). A higher precision means fewer false alarms among the predicted positive cases.
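Written out as formulas, using the counts defined above together with fn for false negatives, the first two metrics are:

```latex
\[
\text{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn},
\qquad
\text{Precision} = \frac{tp}{tp + fp}
\]
```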

Recall, on the other hand, assesses the model’s capability to identify true positives among all actual positive instances, represented as the ratio of tp to the sum of tp and false negatives (fn). A high recall value signifies that the model captures a large proportion of positive cases, meaning that few actual positives are missed as false negatives. The F1 Score is the harmonic mean of precision and recall. A high F1 Score indicates that the model balances precision and recall well, highlighting its overall robustness in obesity classification.
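For a multi-class problem such as the seven obesity levels, these metrics are typically computed per class in a one-vs-rest fashion from the confusion matrix. The sketch below shows one way to do that; the 3×3 matrix is made-up toy data, not a result from the paper, and the function name is hypothetical. Overall accuracy in the multi-class case is the sum of the diagonal divided by the total count.

```python
def per_class_metrics(confusion):
    """confusion[i][j] = number of samples with true class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(n)) / total
    report = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, true class differs
        fn = sum(confusion[c]) - tp                       # true c, predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report.append({"precision": precision, "recall": recall, "f1": f1})
    return accuracy, report


# Toy confusion matrix for three classes (illustrative numbers only).
toy = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
accuracy, report = per_class_metrics(toy)
print(accuracy)
for row in report:
    print(row)
```

Averaging the per-class values with equal weight gives macro-averaged scores, which treat minority classes on par with frequent ones.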

Discussion

The discussion section of the paper reviews the extensive prior work on machine learning (ML) models for predicting obesity across various demographics. Numerous studies have drawn on diverse data sources, including Electronic Health Records (EHR) and public health data, and applied a range of ML algorithms such as ensemble methods, decision trees, and support vector machines (SVM). Notably, Muhamad Adnan et al. (2012) developed a hybrid model combining Naïve Bayes with genetic algorithms, achieving a prediction accuracy of 92%. Other studies, like those by Dugan et al. (2015) and Zheng and Ruggiero (2017), reported accuracies of 85% and 88.92%, respectively, using different ML techniques. The findings underscore the effectiveness of various algorithms, with SVM and Random Forest frequently emerging as top performers.

Despite the successes, the literature reveals common challenges, including low accuracy, overfitting, and limited adaptability to new data. This study aims to address these issues by implementing an Artificial Neural Network optimized through Particle Swarm Optimization (ANN-PSO), which seeks to enhance prediction accuracy and robustness. The dataset utilized encompasses a wide range of physical and dietary characteristics from individuals aged 14 to 61 in Colombia, Peru, and Mexico, allowing for a comprehensive analysis of obesity predictors. The research emphasizes the importance of feature engineering and data preprocessing, which included the creation of new variables to better capture the complexities of obesity, thereby setting a solid foundation for improved predictive modeling in this domain.

Limitations

The proposed hybrid Artificial Neural Network-Particle Swarm Optimization (ANN-PSO) model for obesity classification has several limitations that may affect its performance and applicability. Notably, the decision to exclude height from the dataset was intended to avoid the simplifications associated with Body Mass Index (BMI); however, this exclusion may also remove critical information that could enhance the model’s accuracy. The model still incorporates features such as weight, dietary habits, physical activity, genetic markers, and demographic factors to provide a more nuanced understanding of obesity, but dropping height could compromise its overall predictive performance.

Additionally, the reliance on Particle Swarm Optimization (PSO) for hyperparameter tuning may lead to suboptimal convergence, which can hinder the model’s effectiveness. The high computational cost associated with training the model raises concerns about its feasibility for real-time applications. Lastly, the model’s ability to generalize across diverse populations without requiring retraining remains uncertain, which could limit its broader applicability and effectiveness in various demographic contexts.