Comparative analysis of heart disease prediction using logistic regression, SVM, KNN, and random forest with cross-validation for improved accuracy

scientific reports

OPEN

Methodology

Data preparation flow chart

Validation design diagram

Results and discussion

Baseline machine learning model without cross validation

Machine learning model using cross-validation

Learning curve of all models

Conclusion

Data availability

References

Acknowledgements

Author contributions

Declarations

Competing interests

Additional information

Abstract

Journal: Scientific Reports, Volume 15, Issue 1
DOI: https://doi.org/10.1038/s41598-025-93675-1
PMID: https://pubmed.ncbi.nlm.nih.gov/40251253
Published: 2025-04-18

Yagyanath Rimal, Navneet Sharma, Siddhartha Paudel, Abeer Alsadoon, Madhav Prasad Koirala, and Sumeet Gill

This primary research paper emphasizes cross-validation, in which the data samples are reshuffled at each iteration to form random subsets divided into n folds. This method improves model performance and achieves higher accuracy than the baseline model. The novelty lies in the data preparation process: numerical features were imputed using the mean, categorical features were imputed using chi-square methods, and normalization was applied. The study transforms the original datasets and compares four models under cross-validation, logistic regression (LR), support vector machine (SVM), k-nearest neighbors (KNN), and random forest (RF), on open heart disease datasets. The objective is to identify the average prediction accuracy of each model and then to recommend a model, given that cross-validation on the prepared data raises accuracy (by 5 to [...]) over the baseline model. Comparing the accuracy scores of each model, the logistic regression and KNN models achieved the highest accuracy among the four models when individual accuracy is the concern. However, the random forest summary statistics show a 95% F1 score together with precision and recall ([...]), indicating the highest overall macro-averaged score. These results can be compared further using learning-curve validation. Conversely, the support vector machine model showed the lowest accuracy of [...] among the four machine learning models. This study does not cover hyperparameter tuning, which could further improve model performance.

Keywords: machine learning, cross-validation, accuracy validation, learning curve, health informatics

Machine learning has transformed healthcare and promises to improve diagnostic accuracy for complex conditions such as heart disease. Heart disease remains a leading cause of death worldwide, so the need for accurate diagnostic tools has never been greater. Yet the optimal validation techniques for robust heart disease prediction models remain underexplored, and this is the knowledge gap addressed here. Machine learning is the process of designing a model from training and test datasets, whose value is later assessed on validation samples. The train-test split is a widely used method for dividing research datasets into training and testing subsets. Model accuracy depends primarily on the input and validation sets. Cross-validation, with multiple iterations, helps tune the model toward the best performance scores. Machine learning workflows usually start by splitting the available dataset into training, validation, and test sets, often in a 70:15:15 ratio. The model is built and trained on the training data, evaluated on the validation set to tune its performance, and finally tested on the unseen test set to assess its ability to generalize.

Multi-grade classification and prediction based on previously recorded grade patterns is more accurate when physicians apply the treatment while working through solutions to critical questions during the examination. However, this approach may be less effective when treatment is prescribed without prior exposure to those critical questions.

Cross-validation trains a model by splitting the dataset into several folds, using the portion of the data held out in each split as the validation set while the remaining data trains the model, which gives a more reliable accuracy estimate.

The most common variant is stratified cross-validation, which splits the data so that each fold contains a similar proportion of the target classes and which tends to give the best average result across the prediction samples.

The holdout method sets aside part of the training data for the model, while stratified n-fold splitting handles imbalanced datasets by giving each partition approximately the same proportion of every output class. Leave-p-out cross-validation uses n − p samples for training and p points for validation, repeats this for all combinations, and averages the error so that the result does not depend on a single random split.

Logistic regression, random forest, support vector machine, bootstrapping, and cross-validation techniques are used to address overfitting problems in medical research. Bootstrapping resamples the data from a minimal sample, whereas cross-validation uses the many contributing features to compare target responses.

One author describes the procedure as follows: split the dataset into folds, train the model on the training set, and validate it on the test set. Repeat these steps from 3 to 6000 times, assigning the first fold to testing and using the rest to train the model. Bias measures the difference between the model's predictions and the actual target values, while variance reflects the inconsistency of the model's predictions across different datasets. Ideally, a model balances bias and variance, which gives it good explanatory power and the best overall performance on the dataset.
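The fold-splitting procedure described above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the authors' code: the function name k_fold_indices, the seed, and the choice of five folds over 303 samples are assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation.

    Hypothetical helper for illustration; not taken from the paper.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reshuffle once before splitting
    # Spread the remainder so fold sizes differ by at most one sample.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Assumed example: 303 samples split into 5 folds.
folds = list(k_fold_indices(n_samples=303, n_folds=5))
fold_sizes = [len(test) for _, test in folds]
```

Each sample lands in exactly one test fold, so every record is used for validation once and for training in the remaining rounds.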

Similarly, other authors used generalization to assess how effectively a trained model identifies meaningful data patterns and classifies unseen data samples. Overfitted models memorize the patterns of the training dataset but do not generalize to unseen data, which results in high variance.

Underfitting occurs when a model fails to capture patterns from the dataset, often because the training data are insufficient or of low quality. Too few training samples can likewise cause underfitting. Researchers aim for a good fit by identifying patterns in the training data, which depends on the quality of the input data and on how the folds are split during model development. Similarly, one author proposes a three-stage model based on artificial neural networks (feed-forward and back-propagation). This analytical model achieved a classification accuracy of [...] on the university dataset, and the back-propagation neural network showed [...] accuracy when tested on unseen data records.

Similarly, another study examined non-smoking among children aged 12 to 19 and reported accuracy varying from [...] (minimum) to [...] (maximum). Aggregate physical activity, body mass index, and blood glucose level did not improve; prevalence fell from 70 to [...] over the same period. The classification framework and the completed framework show [...] accuracy, although this varies considerably by model. Using ventricular systolic performance individually within the distribution, widely distributed reports range from 13 to [...], and the detailed annual mortality rate shifts from 1.3 to [...].

Similarly, other authors integrated medical decisions based on classifiers for a cardiac infection symptom framework, multilayer methods, artificial neural networks for heart disease, machine learning algorithms, and artificial neural networks combined with a fuzzy analytic hierarchy process. The proposed classification system achieved a classification accuracy of [...]. That work mainly addresses model selection and accuracy, without handling the different cases of overfitting and underfitting in binary classification, where y [...] denotes a negative history, together with an estimate of the future variable y for a single positive class (AUC-ROC). Multi-class estimates of y [...] increase variance. A prediction was drawn for two-class and one-class classification, with the classifier output thresholded at 0.5. The support vector machine is a machine learning algorithm commonly used for classification problems, and it has applied the most advanced margin techniques [...] to image analysis.

According to one study, imbalanced datasets were used to classify primary and higher education with online multiple-choice data from Bharathiar University. Research on the share of students who do not complete graduation puts it at [...] in the United States and about [...] in Europe for finishing their studies on time, drawing on multi-grade target features. Similarly, related studies published between 2000 and 2018 indicate that multiple factors affect performance in nonlinear ways in online learning; they analyze performance and identify influencing factors based on assessment behavior, teaching, and association rule mining. Across regression analysis, classification analysis, and performance prediction, the majority of [...] modeling studies also prefer to classify performance as pass or fail.

Similarly, one author used supervised learning algorithms to improve a predictive model for the Federal Board of Intermediate and Secondary Education in Islamabad, Pakistan, using the 10-fold technique. Under k-fold cross-validation, a support vector machine based on reduced training vectors was able to predict at-risk and marginal students; it reduced the training vectors by at least fifty-nine point [...] without changing the margin or the classifier's accuracy. The results also confirmed that the proposed approach can achieve a baseline accuracy of [...] in risk prediction, respectively, which are good examples of independent accuracy variance.

Similarly, other authors used light gradient boosting, extreme gradient boosting, random forests, and multilayer perceptron classifiers on UCI records, and used three ensembles with stacked generalization from the machine learning repository to predict errors for the target features. They achieved a mean sensitivity of [...], a combined accuracy of [...], a classification F1 of [...], and a mean of [...] for the neural network algorithm, with a rate of decrease from 12 to [...]. Similarly, one author used an ensemble version of a neural network in which the proposed model reaches an accuracy of up to [...], higher than many existing methods for stroke disorder. Similarly, other authors identified and extracted nine clusters of informative genetic data with DBSCAN, selected through differential gene expression analysis, for five class-specific models; a deep learning method was then used to aggregate the outputs of the five classifiers. Similarly, other authors used a modified J48 classifier to increase the accuracy rate of the data mining technique. The MATLAB data mining tool is used to generate decision classifiers, and naive Bayes classifiers are generated in WEKA. The overall accuracy is about eighty-three.

The memetic rule set improved in accuracy from [...] to ninety-three point [...]; the same work found that the memetic algorithm had higher accuracy than the genetic rule-set version and the regression model. Similarly, another study uses a genetic-algorithm-based regression model to predict inflation levels; the model is trained and evaluated on the data. Similarly, other authors used a prediction model built from unique combinations of features and several recognized classification strategies, which yields an improved performance level with an accuracy of [...] for predicting coronary heart disease using a hybrid random forest with a linear model.

Similarly, other authors used k-nearest neighbor performance prediction for a general performance class, and the overall accuracy of the tested classifiers was [...]. The decision tree classified about [...] for the ten-fold validation test and sixty-nine [...] for the percentage-split test. Accuracy is high for the basic class ([...]) and the second class ([...]). Similarly, other authors investigated which feature has the strongest effect on the target class and which method performs best among the commonly used RF, J48, and Bayes classifiers on social, financial, demographic, and educational data; random forest gave [...] accuracy. Internal assessment affects the final semester percentage.

Similarly, one author used demographic data, with random forest performing best at [...] accuracy on the training information with ten-fold cross-validation and [...] accuracy with the holdout method. When the guidelines are applied, the accuracy of the simplest artificial neural network improves by up to [...], and the reverse holds for other methods. Self-efficacy and motivation to succeed are good indicators when correlated with GPA. As with demographic features, the independent features of heart disease are strongly related to the heart disease target and, together with the choice of model, affect accuracy. This study therefore validates cross-validation scores through a comparison of four common models.

Machine learning plays a major role in extracting hidden insights from clinical records, which supports early detection from repositories of heart disease reports; heart disease accounts for about 12 million deaths worldwide.

Coronary artery disorders are observed to cause death more often in the United States than in other developed European countries.

This research therefore aims to obtain the most accurate estimate of validation accuracy for machine learning models using four techniques: logistic regression, support vector machines, k-nearest neighbors, and random forests. The validation focuses on identifying coronary artery conditions, making it easier for medical practitioners to choose the most suitable model for classification and prediction. The four models are compared using cross-validation to determine their predictive effectiveness.

The heart disease dataset comprises 13 categorical attributes that require preprocessing before the machine learning models are tested and evaluated, as shown in Figure 1. After the dataset is loaded in the Python console, df.dtypes lists the data type of each column. Each feature describes a distinct property of the dataset. The one-hot encoder, OneHotEncoder(categories='auto'), is applied with fit_transform(df[[sex, cp, fbs, restecg, exang, slope, thal, ca]]). After OneHotEncoder is applied to the categorical features, the dataset has 76 columns. Cross-validation gives a better tuning of the heart disease model with the logistic regression and support vector machine models before the best model for the research data is finalized. The dataset was read with read_csv('https://raw.githubusercontent.com/kb22/Heart-Disease-Prediction/master/dataset.csv'). The most important fields are age, chest pain, trestbps, cholesterol, maximum heart rate, oldpeak, male, female, typical angina, atypical angina, non-anginal pain, asymptomatic, normal, abnormal, normal, abnormal, yes, no, upsloping, flat, downsloping, normal, fixed defect, reversible defect, no, Ca0, Ca1, Ca2, Ca3, and Ca4.
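To make the one-hot encoding step concrete, the following minimal sketch applies scikit-learn's OneHotEncoder to a small toy array rather than to the real dataset. The toy values and the variable names are assumptions for illustration; only OneHotEncoder(categories='auto') and fit_transform come from the text above.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy rows with two categorical columns (illustrative values only):
# column 0 mimics `sex` (0/1), column 1 mimics `cp` (0-3).
toy = np.array([[1, 3], [1, 2], [0, 1], [1, 1], [0, 0]])

encoder = OneHotEncoder(categories="auto")
# fit_transform returns a sparse matrix by default; convert it to a dense array.
encoded = encoder.fit_transform(toy).toarray()

# One output column is created per distinct value in each input column.
n_output_columns = sum(len(c) for c in encoder.categories_)
```

Here the two input columns expand to six indicator columns (two for the first, four for the second), which is the same mechanism that widens the heart disease table after encoding.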

Similarly, one author used a heart disease dataset with 14 independent samples and a final target variable coded 1 for heart disease and 0 for no heart disease.

Figure 1. Data processing steps for the heart disease dataset.

Before					After
age	trestbps	chol	thalach	oldpeak	age	trestbps	chol	thalach	oldpeak
63	145	233	150	2.3	0.952	0.763	-0.256	0.0154	1.08
37	130	250	187	3.5	-1.91	-0.092	0.0721	1.633	2.122
41	130	204	172	1.4	-1.47	-0.092	-0.816	0.977	0.3109
56	120	236	178	0.8	0.18	-0.663	-0.198	1.239	-0.206

Table 1. Data before and after preprocessing.
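The "after" columns of Table 1 are standardized values. A minimal standard-library sketch of z-score standardization is shown below, assuming the population standard deviation (the convention scikit-learn's StandardScaler uses). The helper name standardize is hypothetical, and because only the four age values printed in the table are used, the output will not reproduce the table's numbers, which were computed over the full dataset.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Return z-scores using the population standard deviation (ddof=0).

    Hypothetical helper for illustration; not taken from the paper.
    """
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / std for v in values]


# The four `age` values shown in the "before" half of Table 1.
ages = [63, 37, 41, 56]
z_ages = standardize(ages)
```

After the transformation the values have zero mean and unit variance, which puts features measured on different scales (age, blood pressure, cholesterol) on a comparable footing.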

Figure 2. Validation design diagram.

The dataset contained 303 heart patient records and had been viewed and downloaded 381,647 and 62,705 times, respectively, as of January 2023. After StandardScaler was loaded from the sklearn preprocessing library in the Python console, the scaled columns were relabeled as final2[age, trestbps, chol, thalach, oldpeak, m, f, and normal].

After being merged with the target column and normalized, the dataset was ready for model comparison, as shown in Table 1. A machine learning model always needs training and test splits. This study compared four models on the prepared data with 80:20 splits, and the stratify parameter ensures that in each of these sets the proportion of target labels is preserved as [...] for classes [...]. This means there will be no oversampling or undersampling problems in either the training or the test set. With random state [...], the training and test sets are identical across different runs, but the training and validation sets differ from the previous case with random state [...]. The training and test sets directly affect the model's performance score.
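The effect of the stratify and random-state settings can be illustrated with scikit-learn's train_test_split on synthetic data. This is a sketch under stated assumptions: the random features, the 165/138 label counts, and random_state=42 are placeholders chosen for the example, since the values used in the study are not recoverable from the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared data: 303 rows with an assumed
# 165/138 split between the two classes (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(303, 5))
y = np.array([1] * 165 + [0] * 138)

# 80:20 split; stratify=y keeps the class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Repeating the call with the same random_state reproduces the same split.
_, X_test_again, _, _ = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```

Changing random_state produces a different but equally stratified partition, which is why a single split can give noticeably different accuracy from run to run.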

This study aims to validate the most effective machine learning models using four widely recognized algorithms, as shown in Figure 2. It compares baseline results with cross-validation on the prepared datasets. The datasets were resampled with different n-fold splits and tested across the models: logistic regression, random forest, support vector machine, and k-nearest neighbors. The process begins by preparing the data with one-hot encoding of the categorical variables and normalization. The preprocessed heart disease data are then split into training and test sets, each model's individual accuracy is predicted, and 5-fold cross-validation is performed. These outputs are plotted as learning curves using the same cross-validation. The confusion matrix and ROC-AUC curve of each model were compared alongside the model summaries, with the learning curve included at each step.

The study takes a rigorous approach to data preparation and model validation. The dataset, which includes 13 categorical attributes, goes through preprocessing and feature engineering, in which one-hot encoding expands the data to 76 columns. An 80:20 train-test split with stratification preserves the class proportions and is followed by 5-fold cross-validation for robust evaluation. The same 5-fold cross-validation is used to plot the learning curves, giving a comprehensive assessment of model performance. Heatmaps are drawn with Python's Seaborn library. In this way, data scientists can obtain the best model choice for similar data samples.

After the datasets are prepared, the relationship between the dependent and independent variables is described with a heatmap, heatmap(final2.corr(), cmap='coolwarm'), which displays each correlation value. The call used is sns.heatmap(final2.corr(), annot=True). Correlation plots are used to understand which variables are strongly related to one another and how strongly each relationship bears on heart disease.

The baseline correlation coefficients between the dependent and independent features of the heart disease dataset, derived from the logistic regression analysis (Figure 3), show a mean squared error below 10 for features such as sex, cp, trestbps, chol, thalach, and ca, which indicates satisfactory performance. In contrast, independent features such as exang (38), oldpeak (29), and slope (58) show much higher error values. Likewise, the model's coefficient of determination for the dependent variable, as explained by the independent variables, shows considerable variance.

Similarly, the relationship between the dependent and independent features is examined with binary logistic regression to assess feature association. The R-squared measure gives the proportion of variance in the dependent variable that is explained by the independent variables, as shown in Table 2. In this analysis, 58% of the variance is explained by the independent variables. The adjusted R-squared, which provides a more accurate measure, stands at 0.54, which was satisfactory. The p-value (P) is close to zero, which is strong evidence against the null hypothesis and indicates a significant contribution from the independent features. The F-statistic (F) tests the overall significance of the regression model. A higher F-statistic indicates a more significant relationship between the independent variables and the dependent variable; here it was low, 17.71 overall. The intercept represents the expected value of the dependent variable when all independent variables are zero. In this case the intercept is 0.54, with a high t-value and a low p-value. The coefficients indicate that a one-unit increase in age is associated with an increase of 0.20 units in the dependent variable, but this is not statistically significant. A one-unit increase in trestbps is associated with a decrease of 0.41 units in the dependent variable. The variable Sex_1 is statistically significant, with a t-value of 2.64 and a low p-value (0.001). The other variables (age, trestbps, chol, thalach, oldpeak) do not appear to have a significant relationship with the dependent variable. This study therefore needs to evaluate the machine learning models further with cross-validation to assess prediction accuracy for heart patients.

Figure 3. Correlation heatmap of the features.

The default machine learning setup used four models with different parameters, models = [LogisticRegression with max_iter=1000, SVC with a linear kernel, KNeighborsClassifier, RandomForestClassifier], and a loop computed each model's accuracy after fitting. The process involved splitting the data into training and test sets, building the model, predicting on the test data, and computing each model's accuracy score. In the workflow, each model was fitted on the training features and labels and then used to predict, from which the accuracy was computed. The console output showed that the logistic regression and k-nearest neighbors models achieved the highest accuracy (81.9%), followed by the SVM with a linear kernel (78.6%). Specifically, logistic regression scored 81.9%, k-nearest neighbors 81.9%, the linear-kernel SVM 78.6%, and random forest 78.6% (Table 3). These accuracy scores show how the models perform with default settings. In addition, the accuracy classification score for a multilabel problem is computed per sample against y_true, counting a sample as correct only when the predicted labels match the true labels exactly. This metric effectively highlights how the models behave on the given dataset.

Figure 4: Confusion matrix for the four models.

A confusion matrix was generated for each target class with the confusion matrix plotting function, which visualizes class-specific problems and helps assess the models' error patterns, as shown in Figure 4. The rows represent the actual classes, and the columns the predicted classes of the binary target. The off-diagonal elements are the incorrect predictions, which makes errors easy to locate. Model accuracy, assessed with the classification report, was initially computed under train-test splits of the full dataset (x and y). Because the results varied considerably, cross-validation was then used. In this approach the test samples change across five iterations, which gives a more robust evaluation. The summary statistics for all models, including the classification reports, were printed with classification_report(y, model.predict(x)).
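The baseline loop described above can be sketched with scikit-learn as follows. This is an illustrative sketch, not the authors' script: it runs on synthetic data from make_classification instead of the heart disease file, and the random seeds and the dictionary name baseline are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic binary data standing in for the prepared dataset (assumption).
X, y = make_classification(n_samples=303, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = [
    LogisticRegression(max_iter=1000),
    SVC(kernel="linear"),
    KNeighborsClassifier(),
    RandomForestClassifier(random_state=0),
]

baseline = {}
for model in models:
    model.fit(X_train, y_train)        # fit on the training split only
    y_pred = model.predict(X_test)     # predict the held-out split
    baseline[type(model).__name__] = {
        "accuracy": accuracy_score(y_test, y_pred),
        # Rows are actual classes, columns are predicted classes.
        "confusion": confusion_matrix(y_test, y_pred),
    }
```

Because these scores come from a single split, they can shift with the random seed, which is the variability that motivates the cross-validated comparison later in the paper.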

Table 3 summarizes the maximum accuracy achieved by the models and highlights the minimum performance across all four algorithms. The logistic regression and k-nearest neighbors models show the highest predictive accuracy (81.9%) compared with the support vector machine and random forest models. However, on the macro-averaged metrics of precision, recall, and F1 score, the random forest model performs best, with a precision range of [...] for true matches. This indicates that logistic regression is preferable when accuracy alone is the main concern, whereas the random forest model is better suited to scenarios in which overall performance, including precision, recall, and F1 score, is decisive.
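Macro-averaged precision, recall, and F1 treat each class equally, which is why they can rank models differently from plain accuracy. The sketch below computes them from a confusion matrix in plain Python. The function name macro_scores and the toy matrix are hypothetical and are not values from the paper.

```python
def macro_scores(confusion):
    """Macro-averaged precision, recall and F1 from a square confusion matrix.

    Rows are actual classes, columns are predicted classes.
    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(confusion)
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[r][k] for r in range(n))  # column total
        actual_k = sum(confusion[k])                          # row total
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy binary confusion matrix (illustrative counts only).
toy_confusion = [[25, 3], [4, 29]]
macro_p, macro_r, macro_f1 = macro_scores(toy_confusion)
```

Each per-class score is computed first and the unweighted mean is taken afterwards, so a model that does well on one class only is penalized even when its overall accuracy looks acceptable.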

Another way to resample the heart disease datasets for machine learning is cross-validation. Cross-validation evaluates a model over K folds by training it on subsets of the data and evaluating it on the complementary subset, after which the final majority vote is predicted. The procedure is well suited to detecting overfitting when assessing how well a learned pattern generalizes. Each model was fitted and evaluated individually, and its accuracy score was computed with the cross-validation and accuracy functions.

R-squared: 0.58	Adjusted R-squared: 0.54			F-statistic: 17.71
	Coefficient	Std. error	t	P
Constant	0.54	0.019	28.29	0.00
Age	0.20	0.024	1.023	0.30
Trestbps	-0.41	0.021	-0.92	0.56
Chol	-0.016	0.026	1.67	0.42
Thalach	0.042	0.027	-1.80	0.9
Oldpeak	-0.041	0.022	-3.40	0.07
Sex_1	-0.07	0.020	2.64	0.001
Fbs_1	0.011	0.043	0.58	0.00

Table 2. Logistic regression summary statistics.

Figure 4. ROC/AUC curve for each model.

	Minimum				Maximum
	Accuracy %	Precision	Recall	F1	Precision	Recall	F1
Logistic regression	81.9	86	88	87	86	83	84
SVC classifier	78.6	88	91	88	85	81	85
KNN classifier	81.9	89	92	89	86	83	86
RF classifier	78.6	96	97	96	95	93	95

Table 3. Minimum and maximum accuracy.

Model/iteration	1	2	3	4	5	Average
Logistic regression	88	88	80	83	78	83.81
Support vector	88	88	75	81	78	82.49
K-nearest neighbors	85	86	81	85	81	84.15
Random forest	83	90	80	85	81	84.15

Table 4. Machine learning model accuracy using cross-validation.

Table 4 shows the average accuracy obtained through five-fold cross-validation. The random forest and k-nearest neighbors models achieved the highest accuracy at 84.15%, because the model can accommodate both datasets by applying decision tree splits in tree-like structures. Close behind, logistic regression achieved an accuracy of 83.81%, while the support vector machine model achieved 82.49% accuracy on the standardized data sample. Notably, the random forest algorithm recorded a maximum accuracy of 90% in certain folds. The five-fold cross-validation was carried out on the updated data sample to evaluate and compare the machine learning models.

Model/iteration	1	2	3	4	5	Average %
Logistic regression	88.5	88	80	83	78	83.81
Support vector	88.5	88	75	81	78	82.49
K-nearest neighbors	85.2	86	81	85	81	84.15
Random forest	86.8	86	78	86	87	84.73

Table 5. Machine learning model accuracy using ensemble validation.

Figure 5. Min-max accuracy comparison bar chart.

The models compared were logistic regression with max_iter=1000, SVM with kernel='linear', k-nearest neighbors, and random forest. For each model, the cross-validation scores were computed with cross_val_score at 5 folds. The mean accuracy was computed as the average of the cross-validation scores and expressed as a percentage to two decimal places. The resulting table highlights the accuracy of each fold and the mean accuracy on the sample data.
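A minimal sketch of this five-fold comparison with scikit-learn's cross_val_score is given below. It runs on synthetic data from make_classification rather than the heart disease file, and the dictionary names and random seeds are assumptions for illustration, so the numbers it produces are not those of Tables 4 and 5.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic binary data standing in for the prepared dataset (assumption).
X, y = make_classification(n_samples=303, n_features=13, random_state=0)

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Support vector": SVC(kernel="linear"),
    "K-nearest neighbors": KNeighborsClassifier(),
    "Random forest": RandomForestClassifier(random_state=0),
}

cv_table = {}
for name, model in models.items():
    # With an integer cv and a classifier, scikit-learn uses stratified folds.
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    cv_table[name] = {
        "folds": [round(100 * s, 2) for s in scores],
        "mean": round(100 * scores.mean(), 2),  # percentage, two decimals
    }
```

Reporting the per-fold scores next to the mean, as the tables do, shows how much a model's accuracy depends on which records end up in the test fold.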

In Table 5 above, the ensemble machine learning models with different sub-parameters show the accuracy scores of each model. The k-nearest neighbors model achieves the highest accuracy at [...], followed by the logistic regression and random forest models, which achieved the second-lowest accuracy at [...]. The support vector machine model shows the lowest accuracy, at [...]. A researcher may therefore take either the highest or the lowest score when assessing model accuracy for heart disease, and choosing between the maximum and the minimum of the five cross-validation accuracies can be confusing. The accuracy obtained from the maximum or minimum of each model's returned values depends on the regularization parameter setting, because the sample is reshuffled with stratification at each cross-validation iteration; the differences between models matter for the accuracy on correctly classified samples.

From Figure 4, the best-tuned models, tested with the prediction values and the tuned parameters, achieved the following accuracy scores: logistic regression ([...]), random forest [...], and support vector machine [...]. All models had an accuracy above 90%, which indicates satisfactory classification performance when combined and tested.

In Figure 5 above, the bar chart compares each model's highest and lowest accuracy scores, with and without cross-validation. The logistic regression and support vector models showed a large difference between the case without cross-validation and the case with it. The support vector machine model had the largest difference, while the k-nearest neighbors model achieved the best result among the four models. Likewise, when a single standalone model is compared against multiple models, the support vector model with validation differs by [...], compared with a difference of less than 5% for the k-nearest neighbors model, as shown in Figure 6. The accuracy of the single model is lower than the cross-validation accuracy because of the standard reshuffling with n folds during model tuning.

Likewise, the single model was found to have the lowest accuracy in terms of average values. It is therefore recommended to use the multi-fold validation model, which yields the highest accuracy. The random forest model with cross-validation recorded [...] accuracy compared with [...]. The random forest model performed better because of its ensemble nature, which helped reduce variance and handle imbalanced data. Cross-validation compares model performance by splitting the data into training and test sets multiple times.

Learning curves represent the computational cost and effort expended in acquiring knowledge or skills over time or through repeated trials. They depict the difficulty of mastering a subject over a given period and the relative progress made during learning. The concept assumes a doubling of output: for example, a [...] learning curve means that the cumulative average time per unit falls to [...] of the previous average each time output doubles. This is measured from the first unit produced.
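The doubling rule described above corresponds to the classical cumulative-average learning-curve model, in which the cumulative average time per unit after x units is T1 · x^b with b = log2(rate). The sketch below is a worked illustration only: the function name is hypothetical, and the 0.8 rate and the first-unit time of 100 are assumed values, since the rate quoted in the source is not recoverable.

```python
import math


def cumulative_average_time(first_unit_time, units, learning_rate):
    """Cumulative average time per unit after `units` units.

    Uses the cumulative-average learning-curve model T1 * units**b with
    b = log2(learning_rate). Hypothetical helper for illustration.
    """
    exponent = math.log(learning_rate, 2)  # negative for rates below 1
    return first_unit_time * units ** exponent


# Assumed values for illustration: first unit takes 100 time units, rate 0.8.
avg_after_4 = cumulative_average_time(100.0, 4, 0.8)
avg_after_8 = cumulative_average_time(100.0, 8, 0.8)
doubling_ratio = avg_after_8 / avg_after_4
```

Under this model the ratio between the averages at any two output levels that differ by a factor of two equals the learning rate, which is the doubling property stated in the text.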

Figure 6. Bar chart comparison of the ensemble model.

Figure 7. Learning curve for k-nearest neighbors (a) and random forest (b).

Learning curves are often generated alongside methods such as grid search with cross-validation, which evaluates models or functions across a range of parameters, using absolute numbers of training examples to plot and assess performance trends. Scoring is the metric used to evaluate the model's performance when fine-tuning hyperparameters; if nothing is specified, the estimator's default scorer is used. The number of cross-validation folds defaults to five; here, however, the researcher chose 10 iterations. The number of jobs is the number of tasks run in parallel, and -1 means that all processors are used. After the learning curve package is imported in the Python console, the standardized data are first split into the dependent heart disease target and the independent heart disease features, final4.drop([..., 'target'], axis=[...]) and final4 with target. The learning curve is computed with accuracy scoring, training sizes starting from [...], and 100 splits for iteration. After the training and test data are split, the mean accuracy of the k-nearest neighbors model is computed and plotted. The learning curve reports the training and validation metric in order to reveal overfitting and underfitting.
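A learning curve of this kind can be computed with scikit-learn's learning_curve function, as in the sketch below. It is an illustration under stated assumptions: the data are synthetic, the training sizes np.linspace(0.1, 1.0, 5) are placeholders because the starting value is missing from the text, and n_jobs=1 is used here for portability where the text mentions -1 to use all processors.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary data standing in for the standardized dataset (assumption).
X, y = make_classification(n_samples=303, n_features=13, random_state=0)

train_sizes, train_scores, val_scores = learning_curve(
    KNeighborsClassifier(),
    X,
    y,
    cv=10,                                  # 10 folds, as chosen in the text
    scoring="accuracy",
    train_sizes=np.linspace(0.1, 1.0, 5),   # assumed grid of training fractions
    n_jobs=1,                               # the text uses -1 (all processors)
)

# Average over the folds to get one training and one validation point per size.
mean_train = train_scores.mean(axis=1)
mean_val = val_scores.mean(axis=1)
```

A persistent gap between mean_train and mean_val suggests overfitting (high variance), while two low curves that sit close together suggest underfitting (high bias).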

In Figure 7 above, the two lines represent the validation curve, which changes gradually; the lower line indicates the training error or accuracy score. The curve describes how the error metrics change as training and validation of the model increase. Each line describes the combined effects of the model on the heart datasets. Initially, when the model reached 100 training samples, it produced a jagged line; after both curves narrowed toward convergence, reaching 250 iterations produced a stable output, with the heart disease datasets indicating high variance. Similarly, when the learning curve for the random forest was produced, the accuracy of the model increased markedly after 50 training samples ([...]) but indicated high bias compared with the line.

In Figure 8 above, the maximum variation is depicted as each cross-validation round changes during model training and testing. The support vector machine and logistic regression models examine the heart disease dataset step by step for validation after 50 generations. The training curve improves gradually and, at 200 steps, faster than the validation curve, which ultimately indicates overfitting. The system recognizes the curve, which is useful for several purposes: evaluating different algorithms, choosing model parameters during design, and determining how much data to use for training. This variation in the relationship between practice and proficiency over time is referred to as the 'learning curve'. Logistic regression was tuned more than [...], with the best fit when both curves converged after 175 iterations.

Figure 8. Learning curve for support vector machine (a) and logistic regression (b).

Figure 9. Accuracy score and threshold (a) and accuracy at 12 iterations (b).

The dataset consists of 303 records, divided into 165 and 138 records for testing purposes. The discrimination threshold plots, with 100 trials, show precision, recall, and F1 score curves on the training data and the unseen test data, revealing the best fit at [...]. The cross-validation scores may vary within [...], as shown in Figure 9(a). Likewise, the accuracy scores from 12 iterations show a mean squared score of [...], which gives similar results, as shown in Figure 9(b). Using cross-validated error curves for the random forest, [...] 16 features were assessed as best when 5 features were combined at each step.

Similarly, in Figure 10(a), the prediction error and residual error reach 85.5 when the number of features is 16, as observed in the support vector machine model. In addition, both the training and test samples show symmetric histograms, which indicates an even distribution of predicted values, as shown in Figure 10(b) and (c).

After the dependent and independent features are separated, the logistic regression model in Figure 11(a) reveals the feature importance and the confusion matrix analysis. The accuracy score reaches [...], while the predicted AUC reaches [...], indicating high accuracy. Feature importance in logistic regression shows both positive and negative contributions to detecting the heart disease target. Similarly, the random forest model shown in Figure 11(b) achieves the second-highest accuracy at [...], with a commendable AUC score. The overall accuracy of heart disease prediction reaches [...]. The model's precision, recall, and F1 scores were plotted accordingly. However, the k-nearest neighbors model predicts with lower accuracy than the other models analyzed.

Figure 10. Prediction error (a), residual plot (b), and histogram (c).

Figure 11. Logistic regression (a) and random forest (b) summary statistics.

In Figure 12(a), the support vector machine (SVM) model achieved an accuracy of [...] across four test levels, with an area under the curve (AUC) of [...] for predicting the absence of heart disease. In addition, the macro-average accuracy improved to [...]. Similarly, Figure 12(b) highlights the performance of the k-nearest neighbors (KNN) model, which achieved a macro-average accuracy of [...]. Its prediction accuracy was [...] for detecting heart disease cases and [...] for identifying non-disease cases. The superior performance of the random forest model also underlines its potential as a reliable tool for the early detection of heart disease, especially in resource-limited settings. However, deploying machine learning in healthcare must address critical issues such as bias, interpretability, and data privacy to ensure fair and effective outcomes.

Cross-validation is a robust statistical approach for evaluating machine learning models by systematically partitioning the data into training and testing sets. This method helps ensure that models generalize effectively to unseen data, reducing the risk of overfitting and underfitting. Among the models tested, the random forest (RF) model achieves the highest macro accuracy

) and precision (

), outperforming logistic regression

support vector machine (SVM), and k-nearest neighbors (KNN). SVM shows

precision in identifying heart disease cases and

in distinguishing non-disease cases, while cross-validation reveals

accuracy variance for SVM and less than

for the KNN model. These insights underscore the importance of cross-validation in improving model accuracy and reliability. Learning curves also help in understanding how models optimize their parameters over time, with higher scores reflecting better performance. The baseline model without normalization achieved a cross-validation accuracy variance of

, indicating potential for

Fig. 12. Support vector machine (a) and k-nearest neighbor (b) statistics.

improvements through advanced feature selection and ensemble methods. Integrating machine learning into healthcare offers immense potential in terms of predictive accuracy and decision support. However, challenges such as bias, interpretability, and data privacy must be addressed to ensure fair and reliable application in clinical settings. Future recommendations include improving feature selection techniques, leveraging ensemble models, and incorporating larger real-world datasets to enhance model robustness and generalizability.
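The learning curves mentioned in this discussion could be produced along the following lines. This is a minimal sketch that assumes scikit-learn's `learning_curve` with 5-fold cross-validation; the synthetic data from `make_classification` and the random forest settings are placeholders, not the study's dataset or configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

# Synthetic stand-in data (assumption): 300 samples, 13 features, binary target.
X, y = make_classification(n_samples=300, n_features=13, random_state=0)

# Train on growing fractions of the data and score each with 5-fold CV.
train_sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X,
    y,
    cv=5,
    train_sizes=np.linspace(0.2, 1.0, 5),
    scoring="accuracy",
)

# Average over the folds to get one point per training size.
mean_train = train_scores.mean(axis=1)
mean_val = val_scores.mean(axis=1)

for n, tr, va in zip(train_sizes, mean_train, mean_val):
    print(f"{n:4d} samples: train={tr:.3f}, validation={va:.3f}")
```

A large, persistent gap between the training and validation curves would suggest overfitting, while two low curves that converge would suggest underfitting.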

The open-source heart disease dataset, containing 13 features, is freely available at https://raw.githubusercontent.com/kb22/Heart-Disease-Prediction/master/dataset.csv. In addition, the Python source code for transforming the source data into the research data is publicly available in my GitHub repository: https://github.com/yagyarimal/Heart22.

Received: 21 April 2024; Accepted: 10 March 2025
Published online: 18 April 2025

Stone, M. Cross-validatory choice and assessment of statistical predictions. J. R. Stat. Soc. Ser. B Methodol. 36(2), 111-133. https://doi.org/10.1111/j.2517-6161.1974.tb00994.x (1974).
Chin, C. & Osborne, J. Students’ questions: a potential resource for teaching and learning science. Stud. Sci. Educ. 44(1), 1-39. https://doi.org/10.1080/03057260701828101 (2008).
Maldonado, S., López, J. & Iturriaga, A. Out-of-time cross-validation strategies for classification in the presence of dataset shift. Appl. Intell. 52(5), 5770-5783. https://doi.org/10.1007/s10489-021-02735-2 (2022).
Mahesh, T. R., Geman, O., Margala, M. & Guduri, M. The stratified K-folds cross-validation and class-balancing methods with high-performance ensemble classifiers for breast cancer classification. Healthc. Anal. 4, 100247. https://doi.org/10.1016/j.health.2023.100247 (2023).
Barrow, D. K. & Crone, S. F. Cross-validation aggregation for combining autoregressive neural network forecasts, vol. 32, no. 4. 1120-1137 (Accessed 14 Jan 2025). https://www.sciencedirect.com/science/article/pii/S0169207016300188 https://doi.org/10.1016/j.ijforecast.2015.12.011 (Elsevier, 2016).
Schmidt, J., Marques, M. R., Botti, S. & Marques, M. A. Recent advances and applications of machine learning in solid-state materials science. Npj Comput. Mater. 5(1), 1-36. https://doi.org/10.1038/s41524-019-0221-0 (2019).
Ye, Z. et al. Predicting beneficial effects of atomoxetine and citalopram on response inhibition in Parkinson's disease with clinical and neuroimaging measures. Hum. Brain Mapp. 37(3), 1026-1037. https://doi.org/10.1002/hbm.23087 (2016).
Gimenez-Nadal, J. I., Lafuente, M., Molina, J. A. & Velilla, J. Resampling and bootstrap algorithms to assess the relevance of variables: applications to cross section entrepreneurship data. Empir. Econ. 56(1), 233-267. https://doi.org/10.1007/s00181-017-1355-x (2019).
Dodge, J., Gururangan, S., Card, D., Schwartz, R. & Smith, N. A. Expected validation performance and estimation of a random variable’s maximum. (Accessed 04 Feb 2024) http://arxiv.org/abs/2110.00613 (2021).
Belkin, M., Hsu, D., Ma, S. & Mandal, S. Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proc. Natl. Acad. Sci. 116(32), 15849-15854. https://doi.org/10.1073/pnas.1903070116 (2019).
Kernbach, J. M. & Staartjes, V. E. Foundations of machine learning-based clinical prediction modeling: Part II-generalization and overfitting. In Machine Learning in Clinical Neuroscience Acta Neurochirurgica Supplement, vol. 134, (eds Staartjes, V. E. et al.) 15-21. https://doi.org/10.1007/978-3-030-85292-4_3 (Springer International Publishing, 2022).
Olaniyi, E. O., Oyedotun, O. K., Ogunlade, C. A. & Khashman, A. In-line grading system for Mango fruits using GLCM feature extraction and soft-computing techniques. Int. J. Appl. Pattern Recognit. 6(1), 58-75. https://doi.org/10.1504/IJAPR.2019.104294 (2019).
Benjamin, E. J. et al. Heart disease and stroke statistics-2019 update: a report from the American Heart Association. Circulation. 139(10), e56-e528 (2019).
Arora, S., Santiago, J. A., Bernstein, M. & Potashkin, J. A. Diet and lifestyle impact the development and progression of Alzheimer’s dementia. Front. Nutr. 10, https://doi.org/10.3389/fnut.2023.1213223 (2023).
Zuhair, M. et al. Estimation of the worldwide seroprevalence of cytomegalovirus: A systematic review and meta-analysis. Rev. Med. Virol. 29(3), e2034. https://doi.org/10.1002/rmv.2034 (2019).
Xiong, B., Jiang, W. & Zhang, F. Semi-supervised classification considering space and spectrum constraint for remote sensing imagery. In 2010 18th International Conference on Geoinformatics, 1-6. https://doi.org/10.1109/GEOINFORMATICS.2010.5567981 (IEEE, 2010).
Nadar, N. & Kamatchi, R. A novel student risk identification model using machine learning approach. Int. J. Adv. Comput. Sci. Appl. 9, 305-309. https://doi.org/10.14569/IJACSA.2018.091142 (2018).
Khan, A. & Ghosh, S. K. Student performance analysis and prediction in classroom learning: A review of educational data mining studies. Educ. Inf. Technol. 26(1), 205-240. https://doi.org/10.1007/s10639-020-10230-3 (2021).
Yousafzai, B. K., Hayat, M. & Afzal, S. Application of machine learning and data mining in predicting the performance of intermediate and secondary education level student. Educ. Inf. Technol. 25(6), 4677-4697. https://doi.org/10.1007/s10639-020-10189-1 (2020).
Smirani, L. K., Yamani, H. A., Menzli, L. J. & Boulahia, J. A. Using ensemble learning algorithms to predict student failure and enabling customized educational paths, Sci. Program. 2022, 1-15. https://doi.org/10.1155/2022/3805235 (2022).
Usama, M., Ahmad, B., Xiao, W., Hossain, M. S. & Muhammad, G. Self-attention based recurrent convolutional neural network for disease prediction using healthcare data. Comput. Methods Programs Biomed. 190, 105191 (2020).
Shukla, N., Hagenbuchner, M., Win, K. T. & Yang, J. Breast cancer data analysis for survivability studies and prediction, Comput. Methods Programs Biomed., 155, 199-208, https://doi.org/10.1016/j.cmpb.2017.12.011 (2018).
Kaur, G. & Chhabra, A. Improved J48 classification algorithm for the prediction of diabetes. Int. J. Comput. Appl. 98(22), 13-17. https://doi.org/10.5120/17314-7433 (2014).
Naz, H. et al. Deep learning approach for diabetes prediction using PIMA Indian dataset. J. Diabetes Metab. Disord. 19(1), 391-403 https://doi.org/10.1007/s40200-020-00520-5.
Dharma, F. et al. Prediction of Indonesian inflation rate using regression model based on genetic algorithms. J. Online Inform. 5(1), 45-52 https://doi.org/10.15575/join.v5i1.532 (2020).
Touzani, S., Granderson, J. & Fernandes, S. Gradient boosting machine for modeling the energy consumption of commercial buildings. Energy Build. 158, 1533-1543 https://doi.org/10.1016/j.enbuild.2017.11.039 (2018).
Mohan, S., Thirumalai, C. & Srivastava, G. Effective heart disease prediction using hybrid machine learning techniques. IEEE Access. 7, 81542-81554. https://doi.org/10.1109/ACCESS.2019.2923707 (2019).
Anuradha, C. & Velmurugan, T. A comparative analysis on the evaluation of classification algorithms in the prediction of students performance. Indian J. Sci. Technol. 8(15). https://doi.org/10.17485/ijst/2015/v8i15/74555 (2015).
Hussain, A. A. & Dimililer, K. Student grade prediction using machine learning in iot era. In International Conference on Forthcoming Networks and Sustainability in the IoT Era, 65-81. https://doi.org/10.1007/978-3-030-69431-9_6 (Springer, 2021).
Mathers, C. D., Boerma, T. & Ma Fat, D. Global and regional causes of death. vol. 92, no. 1, 7-32 (Accessed 14 Jan 2025). https://academic.oup.com/bmb/article-abstract/92/1/7/332071 https://doi.org/10.1093/bmb/ldp028 (Oxford University Press, 2009).
Chowdhury, R. et al. Dynamic interventions to control COVID-19 pandemic: a multivariate prediction modelling study comparing 16 worldwide countries. Eur. J. Epidemiol. 35(5), 389-399. https://doi.org/10.1007/s10654-020-00649-w (2020).
Townsend, N. et al. Epidemiology of cardiovascular disease in Europe. Nat. Rev. Cardiol. 19(2), 2 https://doi.org/10.1038/s41569-021-00607-3 (2022).
Ansari, M. F., Alankar, B. & Kaur, H. A prediction of heart disease using machine learning algorithms. In Image Processing and Capsule Networks, vol. 1200, (eds Chen, J. I. Z. et al.) in Advances in Intelligent Systems and Computing, vol. 1200, 497-504. https://doi.org/10.1007/978-3-030-51859-2_45 (Springer International Publishing, 2021).
Amarbayasgalan, T., Pham, V. H., Theera-Umpon, N., Piao, Y. & Ryu, K. H. An efficient prediction method for coronary heart disease risk based on two deep neural networks trained on well-ordered training datasets. IEEE Access. 9, 135210-135223. https://doi.org/10.1109/ACCESS.2021.3116974 (2021).
Barhoom, A. M., Almasri, A., Abu-Nasser, B. S. & Abu-Naser, S. S. Prediction of Heart Disease Using a Collection of Machine and Deep Learning Algorithms (Accessed 04 Feb 2024) https://philpapers.org/rec/BARPOH-4 (2022).

I hope it will be published.

Author contributions: Yagyanath Rimal: experiment design, data interpretation, AI model design. Navneet Sharma: supervision, machine learning selection. Siddhartha Paudel: Python code. Abeer Alsadoon: English proofreading, model validation. Madhav Parsad Koirala: Word formatting. Sumeet Gill: review.

The authors declare no competing interests.

Correspondence and requests for materials should be addressed to Y.R.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access. This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
© The Author(s) 2025

IIS (Deemed to be University), Jaipur, India. Pokhara University, Pokhara, Nepal. IOE, Pulchowk Campus, Patan, Nepal. Western Sydney University (WSU), Sydney, Australia. Asia Pacific International College (APIC), Sydney, Australia. Maharshi Dayanand University, Rohtak, India. email: rimal.yagya@gmail.com

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-93675-1
PMID: https://pubmed.ncbi.nlm.nih.gov/40251253
Publication Date: 2025-04-18

Yagyanath Rimal, Navneet Sharma, Siddhartha Paudel, Abeer Alsadoon, Madhav Parsad Koirala & Sumeet Gill

This primary research paper emphasizes cross-validation, in which data samples are reshuffled in each iteration to form randomized subsets divided into n folds. This method improves model performance and achieves higher accuracy than the baseline model. The novelty lies in the data preparation process: numerical features were imputed using the mean, categorical features were imputed using chi-square methods, and normalization was applied. The study transforms the original datasets and compares four models, Logistic Regression (LR), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Random Forest (RF), under cross-validation on open heart disease datasets. The objective is to identify the average accuracy of model predictions and then to make recommendations for model selection, given that with data preprocessing the cross-validated models improved ( 5 to ) over the baseline model. Comparing each model's accuracy scores shows that the logistic regression and k-nearest neighbor models achieved the highest accuracy of among the four models when single accuracy is the concern. However, the random forest model's summary statistics attained an F1 score of 95%, precision , and recall ( ), indicating the highest overall macro accuracy score. These findings can be further compared using learning curve validation. Conversely, the logistic regression model exhibited the lowest accuracy of among the four machine learning models. This research does not cover hyperparameter optimization, which could potentially improve model performance.

Keywords Machine learning, Cross-validation, Accuracy-precision, Learning curve, Health informatics
Healthcare has been transformed by machine learning, which holds promise for better diagnostic precision in complicated illnesses such as heart disease. As heart disease continues to be a leading cause of death globally, the need for precise diagnostic tools has never been greater. However, optimal validation techniques for robust heart disease prediction models remain underexplored, and this is the knowledge gap the study addresses. Machine learning is the process of designing a model from training and testing datasets, whose performance is further evaluated on validation sets drawn from the sample. The train-test split is a widely used method for dividing research datasets into training and testing subsets. Model accuracy depends primarily on the input and validation sets. Cross-validation, with multiple-fold iterations, helps fine-tune the model to achieve optimal performance scores. Machine learning workflows typically begin by splitting the available dataset into training, validation, and testing sets, often in a 70:15:15 ratio. The model is trained on the training data, evaluated on the validation set to fine-tune its performance, and finally tested on the unseen testing set to assess its generalization capability

. Multi-grade classification and prediction based on previously scored grade patterns are more accurate when doctors use the medicine on asking some critical question solutions during examination. However, this approach may be less effective when

doctors are prescribed medicine without prior exposure to critical question papers

. Cross-validation trains and evaluates a model by dividing the dataset into multiple folds: in each split, one portion of the data serves as the validation set while the remaining data is used to train the model, which gives a more reliable accuracy estimate

. Stratified cross-validation, the most common variant, splits the data so that each fold keeps a similar ratio of target classes, which gives a more representative average score

. The hold-out method reserves part of the data for evaluation and trains the model on the rest, while stratified n-fold suits imbalanced datasets because each fold contains approximately the same proportion of each output class. Leave-p-out cross-validation trains on n - p samples and validates on the remaining p points; this is repeated for all combinations and the error is averaged, so the averaged result involves no randomness

. Logistic regression, random forest, support vector machine, bootstrapping, and cross-validation techniques are used to address overfitting problems in medical research. Bootstrapping resamples a small dataset with replacement, while cross-validation techniques use many contributing features to compare target responses

. The author

split the dataset into folds, trained the model on the training set, and validated it on the test set, repeating these steps from 3 to 6,000 times with the first fold reserved for model testing and the rest used for model training. Bias measures the difference between the model's predictions and the actual target values, while variance reflects the inconsistency of the model's predictions across different datasets. Ideally, the model balances bias and variance, which gives good explanatory power and the best overall performance on the dataset

. Similarly, the authors

utilized the generalization process to evaluate how effectively a model is trained to identify meaningful data patterns and classify unseen data samples. Overfitted models memorize the patterns of the training dataset but do not generalize to unseen data, leading to high variance

. Underfitting occurs when the model fails to capture patterns from the dataset, often because the training data is insufficient or of low quality. Researchers aim to achieve a good fit by identifying patterns in the training data, which depends on the quality of the data inputs and the splitting of folds during model development. Similarly, the author

proposes a three-phase model based on artificial neural networks (forward, backward). This model achieved a classification accuracy of

using the university dataset, while the neural network backpropagation model showed

accuracy when tested on records of unseen data. Similarly

, conducted research on nonsmoking among children aged 12 to 19 years, reporting accuracies ranging from

(minimum) to

(maximum). Measures of physical activity, body mass index, and blood glucose did not improve; the prevalence declined from 70 to

over the same period

. The classification framework and accomplished framework show

accuracy; however, it differs widely between models, by

individually. For ventricular systolic function, published reports vary widely, from 13 to

; the reported annual mortality rate also varies, from 1.3 to

. Similarly, authors

integrated medical decision-making based on heart disease symptom classifiers: multi-layer perceptrons, artificial neural network-driven feature methods for heart disease, machine learning algorithms, and artificial neural networks using fuzzy analytic hierarchy processing. Their proposed classification system achieved a classification accuracy of

. This work mainly discusses the model selection and its accuracy, but without dealing with various cases of overfitting and underfitting classification computation double classification problems, y

, negative history, and an estimate of the forward variable y for one positive course (AUCROC). The multiclass to predict estimates of y for y

increased variance. A prediction is assigned to one of the two classes; the output of the classifier is compared against a 0.5 threshold value. A support vector machine is a machine learning algorithm commonly used for classification problems. The support vector machine uses the maximum-margin technique, modified

of image analysis. According to

, imbalanced data sets classified primary school and higher education using the multiple-choice online of Bharathiar University. The research conducted on the number of students not completing graduation in the USA is

, and in Europe, it is around

to finish their studies on time that utilizing multigrade target features. Similarly, the authors

of relevant studies published between 2000 and 2018 indicate that multiple factors influence performance in non-linear ways in online learning of performance analysis and influence factor identification based on the behavior of assessments, teaching, and association rule mining. The regression, classification analysis, and performance prediction of a majority of

of modeling studies prefer to classify the performance as success or failure too. Similarly, author

used supervised learning algorithms to develop a predictive model for the Federal Board of Intermediate and Secondary Education, Islamabad, Pakistan, using 10 folds. With k-fold cross-validation, a reduced-training-vector support vector machine was able to predict at-risk and marginal students, achieving a training vector reduction of at least fifty-nine point

without changing the margin or accuracy of the classifier. Moreover, the results showed that the proposed approach achieved an overall accuracy of

and

in predicting at-risk students, respectively, which are good examples of standalone accuracy variation. Similarly, the authors

used light gradient boosting, extreme gradient boosting, random forests, and multilayer perceptron classifiers on records from the UCI machine learning repository, and used three groups for error prediction with stacked generalization on the target features. They achieved an average sensitivity of

, joint accuracy of

classification, an F1 score of

, and an average

neural network algorithm. The dropout rate fell from 12 to

. Similarly, author

used a neural network model and showed that the proposed model reaches an accuracy of up to

, which is higher than many existing methods for cerebral infarction disease. Similarly, the authors

used DBSCAN to identify and extract nine clusters of informative gene data, selected by differential gene expression analysis, and passed them to five different classification models. A deep learning method was then employed to ensemble the outputs of the five classifiers. Similarly, the authors

used a modified J48 classifier to increase the accuracy rate of the data mining procedure. The data mining tools MATLAB and WEKA were used to generate the decision tree and Naive Bayesian classifiers. The overall accuracy is around eighty-three. Similarly, in the study by the authors

, a memetic algorithm improved the accuracy from

to ninety-three point

. The study also found that the memetic algorithm had a higher accuracy than the model from the genetic algorithm and a regression model. Similarly

, uses a genetic algorithm-based regression model for predicting inflation levels. The model was trained and evaluated on the data. Similarly, the authors

introduced a prediction model with different combinations of features and several known classification techniques, which produced an improved performance level with an accuracy of

through the prediction model for heart disease using a hybrid random forest with a linear model. Similarly, the authors

used the k-nearest neighbor algorithm to predict performance classes, and the overall accuracy of the tested classifiers was

. The decision tree classified approximately

for the 10-fold cross-validation testing and sixty-nine

for the percentage split testing. The precision is high for the first class (

) and the second class (

). Similarly, the authors

investigated which feature has the greatest effect on the target class and which of the most widely used methods performs best, among them RF, J48, and Bayes net. Using socio-economic, demographic, and educational data, random forest provided

accuracies. The internal assessment influences the final semester percentage. Similarly, the author

used the demographics to outperform random forest by providing

accuracy on training information with 10 -fold cross-validation and

accuracy on the holdout method. While the guidance is implemented, the accuracy of the simplest ANN improves by up to

, and vice versa for other methods. Self-efficacy and motivation for success are good predictors and correlate with GPA. As with demographic features, the independent features in heart disease data are strongly related to the target, and accuracy also varies between models, so this research validates cross-validation scores across four popular models.
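The cross-validation variants described earlier in this section (stratified folds and leave-p-out) can be illustrated with a small sketch. It assumes scikit-learn's `StratifiedKFold` and `LeavePOut` splitters; the ten-sample toy dataset with a 60:40 class ratio is invented purely for illustration.

```python
import numpy as np
from sklearn.model_selection import LeavePOut, StratifiedKFold

# Toy data (assumption): 10 samples, 6 of class 0 and 4 of class 1.
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 6 + [1] * 4)

# Stratified folds keep the class ratio of y in every validation fold.
skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
fold_ratios = [y[test_idx].mean() for _, test_idx in skf.split(X, y)]
print("share of class 1 per fold:", fold_ratios)

# Leave-p-out validates on every possible subset of p samples,
# so the number of splits is "n choose p" and no randomness is involved.
lpo = LeavePOut(p=2)
n_splits = lpo.get_n_splits(X)
print("leave-2-out splits for 10 samples:", n_splits)
```

The number of leave-p-out splits grows combinatorially with the sample size, which is one reason k-fold schemes are usually preferred on datasets of a few hundred records.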

Machine learning plays a major role in extracting hidden patterns from medical records, which is useful for the early detection of heart disease, a condition responsible for approximately 12 million deaths globally

. Deaths from coronary disease are more common in the USA than in developed European countries

. Therefore, this research aims to obtain the most reliable validation accuracy for machine learning prediction using four techniques: logistic regression, support vector machines, k-nearest neighbors, and random forests. The validation focuses on identifying coronary artery conditions, making it easier for medical practitioners to select the most suitable model for classification and prediction. The four models are compared using cross-validation to determine their predictive efficacy.

The heart disease dataset comprises 13 categorical attributes that require preprocessing before machine learning model testing and evaluation, as depicted in Fig. 1. After loading the dataset into the Python console, the command df.dtypes lists the data type of each column. The unique command on each feature describes the categories it contains. OneHotEncoder(categories='auto') is applied with fit_transform(df[[sex, cp, fbs, restecg, exang, slope, thal, ca]]). After one-hot encoding of the categorical columns, the dataset has 76 columns. Cross-validation provides better model optimization for heart disease prediction with logistic regression and support vector machine models before the best model for the research data is finalized. A dataset of

.read_csv('https://raw.githubusercontent.com/kb22/Heart-Disease-Prediction/master/dataset.csv') was used. The resulting columns are age, chest pain, trestbps, chol, thalach, oldpeak, m, f, typical angina, atypical angina, non-anginal pain, asymptomatic, normal, abnormal, normal, abnormal, yes, no, upsloping, flat, down, normal, fixed defect, reversible defect, non, Ca0, Ca1, Ca2, Ca3, and Ca4

. Similarly, author

used a heart disease dataset with 14 variables, the last being the target variable (1 for heart disease, 0 for no heart disease), which

Fig. 1. Steps in data preprocessing for the heart disease dataset.

Before normalization					After normalization				
Age	trestbps	chol	thalach	oldpeak	Age	trestbps	chol	thalach	oldpeak
63	145	233	150	2.3	0.952	0.763	-0.256	0.0154	1.08
37	130	250	187	3.5	-1.91	-0.092	0.0721	1.633	2.122
41	130	204	172	1.4	-1.47	-0.092	-0.816	0.977	0.3109
56	120	236	178	0.8	0.18	-0.663	-0.198	1.239	-0.206

Table 1. Data preparation before and after normalization.

Fig. 2. Validation design diagram.

had 303 records of heart disease patients and had been viewed and downloaded 381,647 and 62,705 times, respectively, as of January 2023. After importing StandardScaler from the sklearn preprocessing library and scaling the numerical features, the respective columns were renamed, giving final2 with columns age, trestbps, chol, thalach, oldpeak, m, f, and normal.

After being combined with the target column and normalized, the dataset is ready for model comparison, as illustrated in Table 1. A machine learning model always needs a train-test split. This research compared four models using normalized data with

splits; the stratify parameter ensures that in each of these subsets the target/label proportion is preserved as

for the classes

. This means that neither the training set nor the test set needs oversampling or undersampling. With the random state

, we get identical train and test sets across different executions, but these sets differ from those obtained in the preceding case with random state

. The train and test sets directly affect the model's performance score.
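The effect of the `stratify` and `random_state` parameters described above can be checked with a short sketch. It assumes scikit-learn's `train_test_split` with the 80:20 ratio mentioned later in the methodology; the synthetic data, the 55:45 class ratio, and the seed values 42 and 7 are assumptions chosen for illustration, since the paper's actual values are not recoverable here.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 100 samples with a 55:45 class ratio.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.array([1] * 55 + [0] * 45)


def split(seed):
    """80:20 split that preserves the class ratio of y in both subsets."""
    return train_test_split(X, y, test_size=0.2, stratify=y, random_state=seed)


X_train, X_test, y_train, y_test = split(seed=42)
_, X_test_again, _, _ = split(seed=42)   # same seed
_, X_test_other, _, _ = split(seed=7)    # different seed

print("class-1 share in train/test:", y_train.mean(), y_test.mean())
print("same seed, same test set:", np.array_equal(X_test, X_test_again))
print("different seed, same test set:", np.array_equal(X_test, X_test_other))
```

Fixing the seed makes a single split reproducible, but the score still depends on which samples land in the test set, which is the motivation for cross-validation.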

This study aims to validate the most effective machine learning model by employing four widely recognized algorithms, as depicted in Fig. 2. It compares baseline results with cross-validation results on normalized datasets. The datasets were resampled with various n-splits and tested across machine learning models, including logistic regression, random forests, support vector machines, and k-nearest neighbors. The process begins with preparing the datasets by applying one-hot encoding to categorical variables and normalizing the data. This is followed by predicting individual model accuracies and performing a 5-fold cross-validation after splitting the preprocessed heart disease datasets into training and testing sets. The output is then plotted as a learning curve using the same cross-validation. The confusion matrix and ROC-AUC curve for each model were compared alongside their respective model summaries, incorporating the learning curve at each step. The study adopts a rigorous approach to data preparation and model validation. The dataset, consisting of 13 categorical attributes, undergoes preprocessing and feature engineering, where one-hot encoding expands the data to 76 columns. An 80:20 train-test split with stratification is used to maintain class proportions, followed by a 5-fold cross-validation procedure for robust evaluation. Additionally, a 5-fold cross-validation is employed to plot learning curves, ensuring a comprehensive assessment of model performance. Heatmaps were drawn with Python's Seaborn library. In this way, data scientists can select the best model for similar data samples.
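The 5-fold cross-validation step in this design can be sketched as below. It assumes scikit-learn's `cross_val_score` and, for brevity, applies it to the whole synthetic dataset rather than to the training portion of an 80:20 split; the data from `make_classification` and the default model settings are placeholders, so the printed scores say nothing about the paper's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed heart disease data (assumption).
X, y = make_classification(n_samples=303, n_features=13, random_state=0)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="linear"),
    "KNN": KNeighborsClassifier(),
    "RF": RandomForestClassifier(random_state=0),
}

# For classifiers, cv=5 uses stratified 5-fold splitting by default.
cv_results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    cv_results[name] = (scores.mean(), scores.std())
    print(f"{name}: mean accuracy={scores.mean():.3f}, std={scores.std():.3f}")
```

Reporting the standard deviation alongside the mean shows how much each model's accuracy varies between folds, which a single train-test split cannot reveal.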

After the datasets are prepared, the correlation between the dependent and independent variables is visualized with a heatmap, sns.heatmap(final2.corr(), cmap='coolwarm'). To display each correlation value, the call sns.heatmap(final2.corr(), annot=True) is used. Correlation plots show which variables are significantly related to each other and how strongly they are related to heart disease.
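The correlation matrix behind such a heatmap can be computed with pandas alone, as in the sketch below. The three synthetic columns and the way they are generated are assumptions for illustration; only the `final2.corr()` call mirrors the text.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in columns (assumption): thalach falls as age rises.
rng = np.random.default_rng(0)
age = rng.normal(54, 9, 200)
thalach = 220 - age + rng.normal(0, 10, 200)
target = (thalach > np.median(thalach)).astype(int)

final2 = pd.DataFrame({"age": age, "thalach": thalach, "target": target})

# Pairwise Pearson correlations between all columns.
corr = final2.corr()

# Features ranked by the strength of their correlation with the target.
target_corr = corr["target"].drop("target").abs().sort_values(ascending=False)
print(corr.round(2))
print(target_corr)
```

Passing `corr` to `sns.heatmap(corr, annot=True, cmap="coolwarm")` would draw the annotated heatmap; the plotting call is left out here so the sketch runs without a display.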

The baseline correlation coefficients between the dependent and independent features of the heart disease dataset, derived from logistic regression analysis (Fig. 3), show a mean squared error below 10 for features such as sex, cp, true slope, fbs, thalach, and ca, which indicates satisfactory performance. In contrast, independent features such as exang (38), oldpeak (29), and slope (58) exhibit significantly higher error values. Similarly, the model's coefficient of determination for the dependent variable, as explained by the independent variables, varies substantially.

Similarly, the researcher examines the relationship between the dependent and independent features using binary logistic regression to evaluate feature associations. The R-squared metric measures the proportion of variance in the dependent variable explained by the independent variables, as shown in Table 2. In this analysis,


Fig. 3. Data correlation heat map.

of the variance is explained by the independent variables. The adjusted R-squared, which provides a more accurate measure, stands at

, which was satisfactory. The p-value (P) is close to zero, which is strong evidence against the null hypothesis and indicates a significant contribution from the independent features. The F-statistic (F) tests the overall significance of the regression model; a higher F-statistic indicates a more significant relationship between the independent variables and the dependent variable, and here it is low, at 17.71 in total. The intercept represents the expected value of the dependent variable when all independent variables are zero; in this case it is 0.54, with a high t-value and a low p-value. The coefficients indicate that a one-unit increase in age is associated with a 20-unit increase in the dependent variable, although this is not statistically significant, and that a one-unit increase in trestbps is associated with a 41-unit decrease in the dependent variable. The Sex_1 variable is statistically significant, with a t-value of 2.64 and a low p-value (0.001). The other variables (age, trestbps, cholesterol, thalach, oldpeak) do not appear to have a significant relationship with the dependent variable. Therefore, this research further evaluates the machine learning models with cross-validation to assess the prediction accuracy for heart disease patients.
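For reference, the summary statistics discussed above have standard textbook definitions for a least-squares regression. The symbols below are notation introduced here, not taken from the paper: n is the number of observations, p the number of independent variables, y_i the observed target, \hat{y}_i the fitted value, and \bar{y} the mean of the target.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
R^2_{\mathrm{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1},
\qquad
F = \frac{R^2 / p}{(1 - R^2) / (n - p - 1)}
```

The adjusted R-squared penalizes additional predictors, which is why it is the more conservative of the two when many one-hot encoded columns are included.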

The default configuration comprised four machine learning models (models = [logistic regression with a maximum of 1000 iterations, SVC with a linear kernel, K-neighbors classifier, random forest]), and a loop fitted each model and calculated its accuracy score. This process involved splitting the data into training and testing subsets, designing the model, making predictions on the test data, and calculating each model’s accuracy score. In the workflow, each model was fitted on the training features and training labels and then used to predict the test data, from which its accuracy was computed. The console output revealed that the logistic regression and nearest neighbors models achieved the highest accuracy

, followed by SVM with a linear kernel and scoring

. Specifically, logistic regression is

, K Neighbors is

, the kernel is linear at

, and Random Forest scored

. These accuracy scores demonstrate the models’ performance using default settings. Additionally, the classification accuracy score in the multilabel case is calculated on the y_true sample, counting a prediction as correct only when the predicted labels match the true labels exactly. This measure effectively highlights the models’ behavior on the given dataset (Fig. 4: confusion matrix of four models). The confusion matrix for each target class is generated using the plot confusion matrix function, which visualizes class-specific issues and evaluates the model’s error patterns, as shown in Fig. 4. The rows represent the actual classes, while the columns indicate the predicted classes for a binary target. The off-diagonal elements highlight the incorrect predictions, making it easier to identify errors. The model’s accuracy, evaluated using the classification report, was initially calculated under train-test splits of the entire dataset (x and y). However, due to significant variation in the results, cross-validation was employed. In this approach, test samples were varied across five iterations, ensuring a more robust evaluation. The summary statistics for all models, including classification reports, were printed using the command classification_report(y, model.predict(x)).
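The baseline workflow described above (split, fit, predict, score, confusion matrix, classification report) can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the authors' script: the data come from `make_classification` as a synthetic stand-in for the heart disease table, and the 80/20 split, the `random_state` values, and the `baseline` dictionary are choices made here for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the dataset (assumption: 303 rows, 13 features, binary target).
X, y = make_classification(n_samples=303, n_features=13, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The four models named in the text, otherwise with default settings.
models = [
    LogisticRegression(max_iter=1000),
    SVC(kernel="linear"),
    KNeighborsClassifier(),
    RandomForestClassifier(random_state=0),
]

baseline = {}
for model in models:
    model.fit(X_train, y_train)          # fit on the training subset
    y_pred = model.predict(X_test)       # predict the held-out subset
    name = type(model).__name__
    baseline[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        # Rows are actual classes, columns are predicted classes.
        "confusion": confusion_matrix(y_test, y_pred),
    }
    print(f"{name}: accuracy = {baseline[name]['accuracy']:.3f}")
    print(classification_report(y_test, y_pred))
```

A single split like this gives one accuracy figure per model, which is why the result can shift noticeably when the split changes.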

Table 3 summarizes the maximum accuracy achieved by the models and highlights the minimum performance across all four algorithms. Logistic regression and K-nearest neighbor models demonstrate the highest predictive accuracy

compared to support vector machine and random forest models. However, when evaluating macro-average metrics such as precision, recall, and F1-score, the Random Forest model outperforms the other models, achieving an accuracy range of

for true matches. This suggests that while logistic regression is preferable when accuracy alone is the primary concern, the random forest model is more suitable for scenarios where overall performance, including precision, recall, and F1-score, is critical for decision-making.

Another way of resampling the heart disease data set for machine learning is cross-validation. Cross-validation evaluates a model over K folds by repeatedly training it on one subset of the data and evaluating it on the complementary subset; the final estimate is obtained by combining the results from all folds. This process is well suited to detecting overfitting and checking that a model generalizes a pattern. Each model was individually fitted and evaluated, and its accuracy score was calculated using the cross-validation and accuracy functions.
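To make the fold mechanics concrete, the loop below writes out K-fold cross-validation by hand. It is a sketch under assumptions: synthetic data stand in for the heart disease set, logistic regression is used as the example model, and the shuffle seed is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Synthetic stand-in for the dataset (assumption for illustration only).
X, y = make_classification(n_samples=303, n_features=13, n_informative=6, random_state=0)

# Reshuffle once, then cut the samples into 5 folds.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
for train_idx, test_idx in kfold.split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])                       # train on 4 folds
    fold_scores.append(model.score(X[test_idx], y[test_idx]))   # test on the 5th

mean_accuracy = float(np.mean(fold_scores))
print("per-fold accuracy:", [round(s, 3) for s in fold_scores])
print("mean accuracy:", round(mean_accuracy, 3))
```

Every sample is used for testing exactly once, so the mean is less sensitive to one lucky or unlucky split than a single train-test evaluation.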

R-squared: 0.58	Adjusted R-squared: 0.54			F-statistic: 17.71
Variable	Coef	Std	t	P
Const	0.54	0.019	28.29	0.00
Age	0.20	0.024	1.023	0.30
Trestbps	-0.41	0.021	-0.92	0.56
Chol	-0.016	0.026	1.67	0.42
Thalach	0.042	0.027	-1.80	0.9
Oldpeak	-0.041	0.022	-3.40	0.07
Sex_1	-0.07	0.020	2.64	0.001
Fbs_1	0.011	0.043	0.58	0.00

Table 2. Logistic regression summary statistics.

Fig. 4. ROC/AUC curve of each model.

	Minimum				Maximum
	Accuracy %	Precision	Recall	F1	Precision	Recall	F1
Logistic regression	81.9	86	88	87	86	83	84
SVC classifier	78.6	88	91	88	85	81	85
KNN classifier	81.9	89	92	89	86	83	86
RF classifier	78.6	96	97	96	95	93	95

Table 3. Minimum and maximum accuracy.

Table 4. Machine learning model accuracy score using cross-validation.

Table 4 outlines the average accuracy obtained through 5-fold cross-validation. The random forest and k-nearest neighbor models achieved the highest accuracy at

because the model can accept both data sets by applying decision tree separation in tree-like structures. Following closely, logistic regression attained an accuracy of

, while the support vector machine model achieved

accuracy with the normalized data sample. Notably, the random forest algorithm recorded a maximum accuracy of

in certain instances. Using the upgraded data sample, a 5-fold cross-validation process was performed to evaluate and compare

Model/iteration	Fold 1	Fold 2	Fold 3	Fold 4	Fold 5	Average %
Logistic regression	88.5	88	80	83	78	83.81
Support vector	88.5	88	75	81	78	82.49
K-nearest neighbor	85.2	86	81	85	81	84.15
Random forest	86.8	86	78	86	87	84.73

Table 5. Machine learning model accuracy using combined validation.

Fig. 5. Min-Max bar chart accuracy comparison.

machine learning models, including logistic regression with a maximum of 1000 iterations, SVM with kernel=’linear’, K-neighbors, and random forest algorithms. For each model, the cross-validation scores were computed using the cross-validation score function with 5 folds. The mean accuracy was calculated as the average of the cross-validation scores, expressed as a percentage rounded to two decimal places. The resulting table highlights the accuracy for each fold and the average accuracy of the sample data.
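The per-fold table described above can be produced with scikit-learn's `cross_val_score`. The sketch below assumes synthetic stand-in data and a hypothetical `cv_table` dictionary for the results; the row labels mirror the table, but the scores will differ from the published ones.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the normalized dataset (assumption for illustration only).
X, y = make_classification(n_samples=303, n_features=13, n_informative=6, random_state=0)

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Support vector": SVC(kernel="linear"),
    "K-nearest neighbor": KNeighborsClassifier(),
    "Random forest": RandomForestClassifier(random_state=0),
}

cv_table = {}
for name, model in models.items():
    # One accuracy value per fold; cv=5 gives five values.
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    cv_table[name] = {
        "folds": [round(100 * s, 1) for s in scores],
        "average": round(100 * scores.mean(), 2),   # percentage, two decimals
    }
    print(name, cv_table[name]["folds"], cv_table[name]["average"])
```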

In Table 5 above, the combined machine-learning models with various hyperparameters display the accuracy scores for each model. The k-nearest neighbor model yields the highest accuracy at

, followed by logistic regression and random forest models with the second lowest accuracy at

. The support vector machine model exhibits the lowest accuracy, scoring at

. Therefore, a researcher might take either the highest or the lowest fold score when evaluating model accuracy for heart disease, and a model’s ranking can be misread when only the maximum or minimum of the five cross-validated accuracies is taken. The maximum and minimum scores returned for each model depend on the normalization setting and on the sample being reshuffled, with stratification set to true, at each cross-validation iteration; the difference between models matters for the overall accuracy on correctly classified samples.
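How much the minimum and maximum fold scores move when the sample is reshuffled can be checked with a sketch like the one below. It assumes stratified, shuffled 5-fold splitting, a k-nearest neighbor model as the example, synthetic stand-in data, and three arbitrary shuffle seeds; none of these choices come from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the dataset (assumption for illustration only).
X, y = make_classification(n_samples=303, n_features=13, n_informative=6, random_state=0)

model = KNeighborsClassifier()
spread = {}
for seed in (0, 1, 2):
    # Stratified folds keep the class ratio; the seed controls the reshuffle.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    spread[seed] = {
        "min": float(scores.min()),
        "mean": float(scores.mean()),
        "max": float(scores.max()),
    }
    print(seed, {k: round(v, 3) for k, v in spread[seed].items()})
```

The mean typically moves less between seeds than the extremes do, which is one reason to report the fold average rather than the best or worst fold.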

From Fig. 4, the best-tuned models tested with prediction values and inversely tuned parameters achieved the following accuracy scores: Logistic Regression (

), Random Forest (

), Support Vector (

), and K-Nearest Neighbors (slightly lower). All models demonstrated accuracy scores exceeding 90%, indicating satisfactory classification performance when combined and tested.

In Fig. 5 above, a bar diagram compares the highest and lowest accuracy scores of each model, both with and without the cross-validation loop. The logistic regression and support vector models showed a large difference between the results without cross-validation and with cross-validation. The support vector machine had the largest difference, whereas the nearest neighbors model produced the best result among the four models. Similarly, when comparing a single independent model against the multiple-validation model, the support vector model differs by

as compared to the k-nearest neighbor model, whose accuracy difference is below 5%, as depicted in Fig. 6. The baseline accuracy of a machine learning model is lower than the cross-validated accuracy because of the standard reshuffling across n folds during model tuning.

Similarly, it is concluded that the individual model has the lowest accuracy in terms of mean values. Therefore, the max-multiple cross-validation model is recommended, as it produces the highest accuracy. The random forest model with CV scored

accuracy as compared to

. Random Forest performed better due to its ensemble nature, effectively reducing variance and handling imbalanced data. Cross-validation compares model performance by splitting data into training and testing sets multiple times.

The learning curve represents the computational cost and effort involved in acquiring knowledge or skills over time or through repeated experiences. Learning curves visualize the challenges associated with mastering a subject over a given period and the relative progress made during the learning process. The concept assumes a doubling of output, where, for example, a

learning curve indicates that the cumulative average time per unit decreases to

of the previous average as the output doubles. This is measured from the first unit produced.
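The doubling rule described above corresponds to the cumulative-average form of the classical learning curve, in which the average time per unit after n units is T1 * n^(log2 r) for a learning rate r. The sketch below works through that arithmetic; the function name and the 80% rate are hypothetical, chosen only for illustration, since the percentages in the text are not recoverable from the source.

```python
import math


def cumulative_average_time(first_unit_time, units, learning_rate):
    """Average time per unit after `units` units under the doubling rule."""
    # Each doubling of output multiplies the cumulative average by learning_rate.
    exponent = math.log(learning_rate, 2)
    return first_unit_time * units ** exponent


# Illustrative values only: first unit takes 100 time units, 80% learning rate.
for n in (1, 2, 4, 8):
    avg = cumulative_average_time(100.0, n, 0.8)
    print(f"units={n}: cumulative average time per unit = {avg:.1f}")
```

With these illustrative inputs the average falls by a fifth at each doubling (about 100, 80, 64, and so on), measured from the first unit produced.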

Fig. 6. Ensemble model bar chart comparison.

Fig. 7. Learning curve k-nearest (a) and Random Forest (b).

Learning curves are often generated using methods like grid search cross-validation, which evaluates models or functions across a range of parameters, using absolute numbers of training examples to plot and assess performance trends. The scoring parameter is the metric used to evaluate model performance and determine the best hyperparameters; if it is not specified, the estimator’s own score is used. By default, the number of cross-validation folds is set to five; however, here the researcher chose 10 repetitions. The jobs parameter represents the number of jobs to be run in parallel, and -1 indicates the use of all processors. After importing the learning curve package in the Python console, the normalized heart disease data is first split into a dependent and an independent set,

final4.drop([non, target], axis

), and

final4 with the target. The learning rate splits with scoring accuracy, and the learning rate starts from

, with 100 iteration splits. After the train-test splits, the mean accuracy of the K-nearest model is calculated and plotted. The learning curve shows the training and validation metrics, which are used to identify overfitting and underfitting.
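The procedure above maps onto scikit-learn's `learning_curve` function. The following sketch is an approximation under assumptions: synthetic data replace the `final4` table, the training-size grid (`np.linspace(0.1, 1.0, 10)`) is chosen here because the starting value is missing from the source, and `n_jobs=1` is used for portability where the text describes -1 (all processors).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the normalized table (assumption for illustration only).
X, y = make_classification(n_samples=303, n_features=13, n_informative=6, random_state=0)

train_sizes, train_scores, val_scores = learning_curve(
    KNeighborsClassifier(),
    X,
    y,
    cv=10,                                   # 10 folds, as chosen in the text
    scoring="accuracy",
    n_jobs=1,                                # the text describes -1 (all processors)
    train_sizes=np.linspace(0.1, 1.0, 10),   # assumed grid of training fractions
    shuffle=True,
    random_state=0,
)

# Average over the folds to get one training and one validation point per size.
train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)
for size, tr, va in zip(train_sizes, train_mean, val_mean):
    print(f"n_train={size}: train accuracy={tr:.3f}, validation accuracy={va:.3f}")
```

A persistent gap between the two curves points to overfitting (high variance), while two low curves that meet early point to underfitting (high bias).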

In Fig. 7 above, the two lines represent the training and validation curves, which change gradually, with the lower line indicating the training error or accuracy score. These curves describe how the error metrics change as the amount of training and validation data increases until the model reaches its best fit. Each line describes the combined behavior of the model on the heart data set. Initially, up to about 100 training samples, the model produced a straight line; both curves then decreased toward convergence, and beyond 250 iterations the output became constant, which for the heart disease data set indicates high variance. Similarly, for the Random Forest learning curve, after 50 training samples the machine learning model reached a high accuracy score (

) but indicated high bias as compared to the line.

Fig. 8. Learning curve of support vector (a) and Logistic regression (b).

Fig. 9. Accuracy score and threshold (a) Accuracy at 12 iterations (b).

In Fig. 8 above, the maximum variation is depicted as each cross-validation iteration changes during the model training and testing process. The support vector machine and logistic regression models learn from the heart disease data set step by step and begin to validate after 50 iterations. Up to 200 steps, the training curve improves more rapidly than the validation curve, which ultimately suggests overfitting. The learning curve is useful for many purposes, including comparing different algorithms, choosing model parameters during design, and determining the amount of data used for training. This variation in the relationship between practice and proficiency over time is referred to as the ‘learning curve’. The logistic regression tuned more than

and was best tuned where both curves crossed, after 175 iterations.

The dataset, consisting of 303 records, is further divided into 165 and 138 records for testing purposes. Discrimination threshold plots with 100 trials showcase precision, recall, and F1 score plots with both training and testing unseen datasets, revealing the best fit at

. Cross-validation scores might vary within

, as illustrated in Fig. 9(a). Similarly, accuracy scores from 12 iterations show a mean squared score of

, yielding similar results, as shown in Fig. 9(b). After using random forest error cross-validation curves,

with 16 features scored best when 5 features were folded in each step.
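A discrimination threshold plot of the kind mentioned above traces precision, recall, and F1 as the decision threshold moves. The sketch below computes those curves for a single split; it assumes synthetic stand-in data, a logistic regression model, the 165/138 split sizes quoted in the text, and a hand-picked grid of thresholds, and it does not reproduce the 100 repeated trials.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 303-record dataset (assumption for illustration only).
X, y = make_classification(n_samples=303, n_features=13, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=165, test_size=138, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]   # predicted probability of class 1

rows = []
for threshold in np.linspace(0.05, 0.95, 19):
    # Predict class 1 only when the probability reaches the threshold.
    y_pred = (proba >= threshold).astype(int)
    rows.append((
        float(threshold),
        precision_score(y_test, y_pred, zero_division=0),
        recall_score(y_test, y_pred, zero_division=0),
        f1_score(y_test, y_pred, zero_division=0),
    ))

best_threshold, _, _, best_f1 = max(rows, key=lambda r: r[3])
print(f"best F1 = {best_f1:.3f} at threshold {best_threshold:.2f}")
```

Raising the threshold can only remove positive predictions, so recall never increases along the grid, while precision usually moves the other way; the F1 peak marks the compromise between the two.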

Likewise, in Fig. 10(a), the prediction error and residual error reach 85.5 when the number of features becomes 16, as observed in the support vector machine model. Additionally, both the training and test samples exhibit symmetric histograms, indicating a uniform distribution for predicting values, as depicted in Fig. 10(b) and (c).

Following the separation of dependent and independent features, the logistic regression model in Fig. 11(a) reveals feature importance and confusion matrix analysis. The accuracy score reaches

, while the predicted AUC score attains

, signifying high precision. The feature importance of logistic regression shows both positive and negative contributions to target heart disease detection. Likewise, the random forest model shown in Fig. 11(b) achieves the second-highest accuracy at

, with a commendable AUC score. The macro accuracy for heart disease prediction reaches

, and precision, recall, and F1 scores are plotted accordingly. However, the k-nearest neighbor model predicts with comparatively lower accuracy among the models analyzed.

Fig. 10. Prediction error (a) and residual plot (b) histogram (c).

Fig. 11. Logistic regression (a) Random Forest summary statistics (b).

In Fig. 12(a), the Support Vector Machine (SVM) model achieved an accuracy of

across four testing levels, with an Area Under the Curve (AUC) of

for predicting the absence of heart disease. Additionally, the macro average accuracy improved to

. Similarly, Fig. 12(b) highlights the performance of the K-Nearest Neighbors (KNN) model, which achieved a macro average accuracy of

. Its prediction accuracy was

for detecting heart disease cases and

for identifying non-disease cases. The superior performance of the Random Forest model further underscores its potential as a dependable tool for early heart disease detection, particularly in resource-constrained environments. However, the successful implementation of machine learning in healthcare must address critical concerns such as bias, interpretability, and data privacy to ensure equitable and effective outcomes.
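The per-class scores, macro averages, and AUC values quoted in this section can be obtained from scikit-learn's reporting functions. The sketch below shows one way to extract them, assuming synthetic stand-in data, a random forest as the example model, and an 80/20 split; the variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the dataset (assumption for illustration only).
X, y = make_classification(n_samples=303, n_features=13, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)

# output_dict=True returns nested dictionaries keyed by class label and average type.
report = classification_report(y_test, y_pred, output_dict=True)
recall_no_disease = report["0"]["recall"]   # class 0: no heart disease
recall_disease = report["1"]["recall"]      # class 1: heart disease
macro_recall = report["macro avg"]["recall"]
macro_f1 = report["macro avg"]["f1-score"]

# AUC is computed from predicted probabilities, not from hard labels.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(f"recall (disease)={recall_disease:.3f}, recall (no disease)={recall_no_disease:.3f}")
print(f"macro recall={macro_recall:.3f}, macro F1={macro_f1:.3f}, AUC={auc:.3f}")
```

The macro average weights both classes equally, so it can differ from overall accuracy when the disease and non-disease groups are of unequal size.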

Cross-validation is a robust statistical approach to evaluating machine learning models by systematically splitting data into training and testing subsets. This method ensures models generalize effectively to unseen data, minimizing the risks of overfitting and underfitting. Among multiple models tested, the Random Forest (RF) model achieves the highest macro accuracy (

) and precision (

), outperforming Logistic Regression

, Support Vector Machine (SVM), and K-Nearest Neighbors (KNN). SVM demonstrates

accuracy in identifying heart disease cases and

in distinguishing non-disease cases, while cross-validation highlights a

accuracy variation for SVM and less than

for KNN. These insights emphasize the importance of cross-validation in improving model accuracy and reliability. Learning curves further aid in understanding how models optimize parameters over time, with higher scores reflecting better performance. The baseline model without normalization achieved a cross-validation accuracy variation of

, indicating potential improvements through advanced feature selection and ensemble methods. The integration of machine learning into healthcare offers immense potential for predictive accuracy and decision support. However, challenges such as bias, interpretability, and data privacy must be addressed to ensure equitable and reliable application in clinical settings. Future recommendations include refining feature selection techniques, leveraging ensemble models, and incorporating larger, real-world datasets to enhance model robustness and generalizability.

Fig. 12. Support vector (a) and K nearest summary statistics (b).

The open-source dataset on heart disease, containing 13 features, is freely accessible at the following link: https://raw.githubusercontent.com/kb22/Heart-Disease-Prediction/master/dataset.csv. Additionally, the Python source code for migrating the source data to research data is openly available in the author’s GitHub repository: https://github.com/yagyarimal/Heart22.

Received: 21 April 2024; Accepted: 10 March 2025
Published online: 18 April 2025


Wish to published.

Author contributions: Yagyanath Rimal: experimental design, data interpretation, AI model design. Navneet Sharma: supervisor, ML selection. Siddhartha Paudel: Python code. Abeer Alsadoon: English proofreading, model validation. Madhav Parsad Koirala: Word formatting. Sumeet Gill: revision.

The authors declare no competing interests.

Correspondence and requests for materials should be addressed to Y.R.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
© The Author(s) 2025

IIS (Deemed to be University), Jaipur, India. Pokhara University, Pokhara, Nepal. IOE, Pulchowk Campus, Patan, Nepal. Western Sydney University (WSU), Sydney, Australia. Asia Pacific International College (APIC), Sydney, Australia. Maharshi Dayanand University, Rohtak, India. email: rimal.yagya@gmail.com