Machine Learning Approaches for Enhanced Diagnosis of Hematological Disorders

Journal: Computational Systems and Artificial Intelligence, Volume: 1, Issue: 1
DOI: https://doi.org/10.69882/adba.csai.2025072
Publication Date: 2025-07-26
Author(s): Yiğitcan Çakmak
Primary Topic: Artificial Intelligence in Healthcare

Overview

This research investigates the application of machine learning (ML) algorithms for the early detection and classification of anemia and other blood disorders, utilizing various traditional and advanced ML models including LightGBM, CatBoost, Decision Tree, Gradient Boosting, Random Forest, and XGBoost. The study found that LightGBM achieved the highest accuracy at 98.38%, closely followed by CatBoost at 98.37%. Other models, such as Decision Tree and Gradient Boosting, demonstrated accuracies of 98.05%, while Random Forest and XGBoost recorded 97.72%. These findings suggest that ML techniques can effectively identify complex patterns in hematological data, thereby enhancing diagnostic accuracy and facilitating timely clinical interventions.

The implications of this research are significant for clinical practice, as the high accuracy of the identified models can lead to improved treatment planning for conditions like anemia, ultimately reducing diagnostic errors and treatment delays. Future research should focus on refining these algorithms for broader applicability across diverse datasets and clinical settings, as well as exploring advanced deep learning techniques. Additionally, the integration of explainable AI methods, such as SHAP or LIME, is crucial for ensuring transparency and fostering trust among healthcare professionals. This study highlights the growing role of ML in medical diagnostics, promoting a shift towards precision and personalized healthcare solutions for hematological disorders.

Introduction

The introduction of the research paper highlights the widespread prevalence of anemia, a blood disorder characterized by a deficiency in red blood cells or hemoglobin, leading to symptoms such as fatigue and pallor. Anemia arises from various causes, including nutritional deficiencies, chronic diseases, and inherited conditions, necessitating accurate classification for effective diagnosis and treatment. The paper emphasizes the challenges in interpreting complete blood count (CBC) tests due to the complexity of patient data and the increasing volume of cases.

To address these challenges, the study explores the application of machine learning (ML) and deep learning (DL) techniques in classifying anemia and related hematologic disorders. The author reports that the LightGBM algorithm achieved the highest accuracy of 98.38%, closely followed by CatBoostClassifier and other models. This finding supports the notion that ML can improve diagnostic accuracy and timeliness in clinical settings. The introduction also references previous studies that demonstrate the effectiveness of various ML algorithms in anemia classification, reinforcing the potential of these technologies to improve healthcare outcomes, particularly in resource-limited environments. Overall, the research underscores the growing role of data-driven approaches in hematological diagnostics, suggesting that ML models can significantly aid in the early and precise identification of anemia and its associated conditions.

Methods

In this study, the Light Gradient Boosting Machine (LightGBM) was employed as a classifier for various types of anemia, achieving an impressive overall accuracy of 98.38%. The model excelled in precision, recall, and F1-score for specific classes, notably Class 1 (Iron deficiency anemia), Class 2 (Leukemia), and Class 6 (Normocytic normochromic anemia), where it attained perfect scores. However, performance was somewhat diminished for Class 3 (Leukemia with thrombocytopenia) and Class 4 (Macrocytic anemia), with recall rates of 80% and 60%, respectively.

The model’s macro F1-score of 0.95 and weighted average F1-score of 0.98 reflect its balanced performance across classes of varying sizes, underscoring LightGBM’s efficacy in multi-class classification tasks within medical datasets. Detailed performance metrics are summarized in Table 3, and a confusion matrix illustrating the model’s classification results is presented in Figure 3.
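The gap between a macro F1-score and a weighted F1-score can be illustrated with a small sketch. The labels below are hypothetical toy data, not the study's predictions, and the helper names (`per_class_scores`, `macro_f1`, `weighted_f1`) are made up for illustration. Macro averaging gives every class equal weight, while weighted averaging scales each class by its support, so a small class with low recall lowers the macro score much more than the weighted one.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for each true label."""
    support = Counter(y_true)
    predicted = Counter(y_pred)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    scores = {}
    for label in sorted(support):
        tp = correct[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / support[label]
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1, support[label])
    return scores


def macro_f1(scores):
    """Unweighted mean of per-class F1: every class counts equally."""
    return sum(v[2] for v in scores.values()) / len(scores)


def weighted_f1(scores):
    """Support-weighted mean of per-class F1: large classes dominate."""
    total = sum(v[3] for v in scores.values())
    return sum(v[2] * v[3] for v in scores.values()) / total


# Hypothetical toy labels: a large class "A" and a small class "B"
# whose recall is 60% (3 of 5 instances recovered).
y_true = ["A"] * 95 + ["B"] * 5
y_pred = ["A"] * 95 + ["B"] * 3 + ["A"] * 2

scores = per_class_scores(y_true, y_pred)
print(f"macro F1:    {macro_f1(scores):.3f}")
print(f"weighted F1: {weighted_f1(scores):.3f}")
```

In this toy case the weighted score stays close to the overall accuracy while the macro score drops, which is the same direction of difference as the 0.95 and 0.98 figures reported above.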

Results

In this section, the study presents an independent diagnostic method that uses machine learning (ML) techniques to diagnose anemia, a prevalent blood disorder. The research evaluated several ML algorithms, and the LightGBM model achieved the highest accuracy at 98.38%. This model's efficiency in managing large datasets and delivering precise predictions positions it as a robust option for complex classification tasks. The CatBoostClassifier followed closely with an accuracy of 98.37%, demonstrating its effectiveness in handling categorical features through a gradient boosting approach. Decision Tree and Gradient Boosting each achieved an accuracy of 98.05%, while Random Forest and XGBoost recorded 97.72%.
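To put these accuracy differences in perspective, the following sketch converts each reported percentage into an approximate count of correct test predictions. It assumes the test set held exactly 25% of the 1,232 instances described in the Discussion (308 cases). The source does not state the test-set size directly, so the counts are an inference rather than reported results. The helper `nearest_correct_count` is a hypothetical name used only here.

```python
# Figures taken from the text: 1,232 instances, 25% held out for testing.
N_TOTAL = 1232
TEST_FRACTION = 0.25
n_test = round(N_TOTAL * TEST_FRACTION)  # assumed test-set size

reported_accuracy = {
    "LightGBM": 98.38,
    "CatBoost": 98.37,
    "Decision Tree": 98.05,
    "Gradient Boosting": 98.05,
    "Random Forest": 97.72,
    "XGBoost": 97.72,
}


def nearest_correct_count(accuracy_percent, n):
    """Number of correct predictions out of n closest to the given accuracy."""
    return round(accuracy_percent / 100 * n)


implied = {
    name: nearest_correct_count(acc, n_test)
    for name, acc in reported_accuracy.items()
}

# Accuracy change, in percentage points, caused by one test instance.
one_instance = 100 / n_test

for name, correct in implied.items():
    print(f"{name}: ~{correct}/{n_test} correct ({100 * correct / n_test:.2f}%)")
print(f"One test instance is worth about {one_instance:.2f} percentage points")
```

Under this assumption, the 0.01-point gap between LightGBM and CatBoost is smaller than the effect of a single test instance, and the full range across all six models corresponds to roughly two test cases. The ranking should therefore be read with some caution.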

The study highlights that the performance of these models is influenced by factors such as feature selection, data preprocessing, and hyperparameter optimization. While LightGBM excels in accuracy, it may require significant computational resources with large datasets, whereas Random Forest and XGBoost offer better scalability for practical applications. Future research directions include enhancing feature engineering, developing hybrid models that integrate deep learning, and employing explainability measures like SHAP values or LIME to improve user trust in AI-based diagnostic systems.

Discussion

In this study, a dataset of 1,232 unique instances was used to classify various types of anemia and related hematological conditions with machine learning (ML) algorithms. The dataset included 15 distinct features and covered diagnostic categories such as Healthy, Iron Deficiency Anemia, and Leukemia, among others. The research emphasizes the importance of accurate and timely diagnosis of anemia, particularly in low- and middle-income regions, to guide effective treatment strategies. Preprocessing steps were applied to improve data quality, including handling missing values, feature scaling, and encoding categorical variables. The dataset was then split into training (75%) and testing (25%) subsets for model evaluation.
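The preprocessing steps named above can be sketched in plain Python. This is a minimal illustration under several assumptions, not the study's pipeline. The toy values are invented, the split is stratified by class (the source does not say whether it was), missing values are filled with the training median, and scaling is standardization. The function names are hypothetical. Imputation and scaling parameters are learned from the training subset only, so no information from the test subset leaks into them.

```python
import random
import statistics
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx), sampling the test share within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def fit_imputer_scaler(column):
    """Learn the median, mean, and standard deviation from training values."""
    observed = [v for v in column if v is not None]
    median = statistics.median(observed)
    filled = [median if v is None else v for v in column]
    mean = statistics.fmean(filled)
    std = statistics.pstdev(filled) or 1.0  # guard against zero variance
    return median, mean, std


def transform(column, median, mean, std):
    """Fill missing values with the training median, then standardize."""
    return [((median if v is None else v) - mean) / std for v in column]


# Hypothetical toy data: one feature (hemoglobin, g/dL) with a missing value.
hgb = [13.5, None, 9.8, 14.1, 8.9, 10.2, 15.0, 9.1]
diagnosis = [
    "Healthy", "Healthy", "Iron deficiency anemia", "Healthy",
    "Iron deficiency anemia", "Iron deficiency anemia", "Healthy",
    "Iron deficiency anemia",
]

# Encode the categorical diagnosis as integer class ids.
label_to_id = {name: i for i, name in enumerate(sorted(set(diagnosis)))}
y = [label_to_id[name] for name in diagnosis]

train_idx, test_idx = stratified_split(y, test_fraction=0.25, seed=0)
params = fit_imputer_scaler([hgb[i] for i in train_idx])
x_train = transform([hgb[i] for i in train_idx], *params)
x_test = transform([hgb[i] for i in test_idx], *params)
print(len(train_idx), "training rows,", len(test_idx), "test rows")
```

Applied to 1,232 instances, a 75/25 split of this kind would give roughly 924 training and 308 test cases. Exact per-class counts depend on rounding within each class.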

The study employed several ML algorithms, including LightGBM, CatBoost, Decision Trees, and Gradient Boosting, chosen for their effectiveness in handling complex, high-dimensional data. Results indicated that LightGBM and CatBoost achieved the highest diagnostic accuracies of 98.38% and 98.37%, respectively, demonstrating their capability to identify intricate patterns in clinical data. Feature importance analysis revealed that Mean Corpuscular Volume (MCV), Mean Corpuscular Hemoglobin (MCH), and Hemoglobin (HGB) were pivotal in the classification process. The findings underscore the potential of ML models to enhance diagnostic accuracy and efficiency in clinical settings, paving the way for future research to refine these algorithms and explore advanced techniques such as deep learning and explainable AI (XAI) for broader application in healthcare.
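The source does not say how feature importance was computed. One common model-agnostic option is permutation importance: shuffle one feature at a time and measure how much accuracy drops. The sketch below illustrates that idea only. The rule-based `toy_model`, its 12 g/dL threshold, the `noise` column, and all data values are hypothetical stand-ins chosen for illustration, not clinical guidance or the study's method.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean accuracy drop when each feature's values are shuffled across rows."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = {}
    for name in rows[0]:
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[name] for r in rows]
            rng.shuffle(shuffled)
            permuted = [{**r, name: v} for r, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances[name] = sum(drops) / n_repeats
    return importances


def toy_model(row):
    """Hypothetical stand-in for a trained classifier; uses HGB only."""
    return "Anemia" if row["HGB"] < 12.0 else "Healthy"


# Hypothetical toy rows: HGB drives the label, "noise" is ignored by the model.
rows = [
    {"HGB": 9.5, "noise": 0.3},
    {"HGB": 14.2, "noise": 0.9},
    {"HGB": 10.1, "noise": 0.1},
    {"HGB": 13.8, "noise": 0.7},
    {"HGB": 8.7, "noise": 0.5},
    {"HGB": 15.0, "noise": 0.2},
]
labels = [toy_model(r) for r in rows]

importances = permutation_importance(toy_model, rows, labels)
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: mean accuracy drop {value:.3f}")
```

A feature the model relies on shows a clear accuracy drop when shuffled, while an ignored feature shows none. Tree-based libraries also expose built-in split- or gain-based importances, which may rank features differently.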