Enhanced thyroid disease prediction using ensemble machine learning: a high-accuracy approach with feature selection and class balancing

Journal: Discover Artificial Intelligence, Volume: 5, Issue: 1
DOI: https://doi.org/10.1007/s44163-025-00225-9
Publication Date: 2025-01-31
Author(s): Md. Rezaul Islam et al.
Primary Topic: Artificial Intelligence in Healthcare

Overview

The study addresses the rising prevalence of thyroid disorders and the need for early detection to reduce mortality and complications. It uses a machine learning framework that combines a comprehensive analysis of clinical features with an ensemble learning strategy to improve diagnostic accuracy and predict disease progression. After evaluating seventeen machine learning models, the authors built an ensemble classifier based on hard voting; together with class balancing by random oversampling, this significantly improved classification performance. The proposed model, particularly when using the XGBoost algorithm and SelectKBest feature selection, achieved 100% sensitivity and 99.72% accuracy, outperforming existing approaches.
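Hard voting means that each base classifier casts one vote per sample and the ensemble returns the majority label. The following is a minimal sketch of that idea in plain Python, not the authors' implementation: the function name `hard_vote`, the tie-breaking rule, and the toy predictions are assumptions made for illustration.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Return the majority label for each sample.

    predictions_per_model: list of equal-length label lists, one per model.
    Ties go to the label that appears first among the models' votes
    (an arbitrary choice made for this sketch).
    """
    n_samples = len(predictions_per_model[0])
    voted = []
    for i in range(n_samples):
        votes = [preds[i] for preds in predictions_per_model]
        # most_common lists equal counts in first-seen order
        voted.append(Counter(votes).most_common(1)[0][0])
    return voted


# Hypothetical outputs of three base classifiers on four patients
# (1 = thyroid disease, 0 = healthy); these are not the paper's data.
model_a = [1, 0, 1, 0]
model_b = [1, 1, 0, 0]
model_c = [0, 1, 1, 0]

ensemble = hard_vote([model_a, model_b, model_c])
print(ensemble)  # [1, 1, 1, 0]
```

With an odd number of binary classifiers no ties can occur, which is one reason such ensembles often use three or five base models.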

In conclusion, the study highlights the importance of accurate assessment in diagnosing thyroid abnormalities and the effectiveness of machine learning in identifying risk factors. Combining XGBoost with hard voting over Random Forest and Decision Tree models yielded high classification performance, with 100% sensitivity and 99.62% accuracy. Future research will focus on expanding the dataset to further improve model performance and on exploring more advanced class balancing techniques and ensemble-based classifiers.

Introduction

Thyroid hormones play a crucial role in regulating reproductive functions, brain development, and metabolism. Dysfunction in thyroid hormone production can lead to significant health issues, including thyroid cancer and thyroiditis, with hyperthyroidism and hypothyroidism being the most prevalent disorders. Thyroid dysfunction is becoming more prevalent worldwide: it affects 30 to 40% of patients in endocrine clinics, and in the United States around 60% of the estimated twenty million people with thyroid disorders remain undiagnosed.

The overlapping symptoms of thyroid diseases complicate the diagnostic process, often leading to misdiagnosis or delayed treatment. Traditional diagnostic methods rely on a combination of blood tests and clinical examinations, which can be extensive and time-consuming. This highlights the need for improved diagnostic strategies to enhance the identification and management of thyroid-related conditions.

Methods

The workflow for developing and evaluating the thyroid disease prediction model is depicted in Figure 3. It begins with dataset preprocessing: handling missing values, encoding the data, selecting features, resampling, and normalizing. Key features are identified using techniques such as XGBoost and SelectKBest, and random oversampling is used to balance the classes in the dataset.
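Random oversampling balances a dataset by duplicating randomly chosen rows of the smaller classes until every class matches the largest one. The sketch below illustrates the idea with the standard library only; the function name `random_oversample`, the fixed seed, and the toy readings are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)

    # Group the rows by their class label
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Draw with replacement from the class until it reaches the target size
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: [TSH, T3] readings, 5 healthy (0) vs 2 diseased (1)
X = [[1.2, 2.0], [0.9, 1.8], [1.5, 2.1], [1.1, 1.9], [1.3, 2.2],
     [8.4, 0.6], [9.1, 0.5]]
y = [0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # both classes now have 5 rows
```

Because the added rows are exact copies, a common precaution is to oversample only the training portion of the data so that duplicates of a training row cannot end up in the test set.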

The processed dataset is then split into training (80%) and testing (20%) subsets. Several machine learning algorithms are trained on the training subset and evaluated on the test subset. Each model’s performance is measured from its confusion matrix, which allows a comparative analysis to identify the most effective predictive model for thyroid disease.
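Sensitivity and accuracy both follow from the four confusion-matrix counts: sensitivity is TP / (TP + FN) and accuracy is (TP + TN) / (TP + FP + TN + FN). The following sketch computes them for a binary problem; the helper names and the toy labels are assumptions for illustration, not the study's results.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Share of actual positives that the model detected (recall)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp, fp, tn, fn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fp + tn + fn)


# Hypothetical test labels: every diseased case is caught,
# but one healthy case is flagged by mistake.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(sensitivity(tp, fn))       # 1.0
print(accuracy(tp, fp, tn, fn))  # 0.875
```

The toy example shows how 100% sensitivity can coexist with accuracy below 100%: no diseased case is missed, yet a false positive still lowers accuracy.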

Discussion

The authors emphasize the importance of early detection and accurate diagnosis of thyroid disorders, which can lead to severe complications if left untreated. Traditional diagnosis relies mainly on blood tests that measure thyroid-related markers such as TSH, T3, T4, and TSI, and these tests do not always yield a definitive diagnosis. The authors highlight artificial intelligence (AI) and machine learning (ML) techniques as a promising way to improve the classification and diagnosis of thyroid disorders. Various studies have demonstrated the effectiveness of ML algorithms, with notable results including 94.8% accuracy using Random Forest and 99.35% accuracy from feature selection techniques combined with classifiers.

The authors outline a systematic methodology for developing a machine learning model for thyroid disease prediction, covering data preprocessing, feature selection, and model evaluation. They used random oversampling to address class imbalance and applied XGBoost and SelectKBest for feature selection. The study’s contributions include an automated and reliable predictive model, visualizations that clarify the significance of individual features, and the deployment of the model as a web application for practical use. Overall, the findings underscore the potential of AI and ML to improve diagnostic accuracy and patient outcomes in thyroid disease management.
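The idea behind SelectKBest-style feature selection is to score each feature on its own against the class label and keep the k highest-scoring ones. The sketch below mimics that idea in plain Python. It is not the scikit-learn class, and the score used here (difference of class means divided by the column's spread) is a simple stand-in chosen for illustration, since the summary does not state which score function the authors used. The function name, feature columns, and values are hypothetical.

```python
from statistics import mean, pstdev


def select_k_best(rows, labels, k):
    """Return (indices of the k best-scoring features, all scores).

    Assumes binary labels 0/1 with at least one row in each class.
    """
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        pos = [v for v, label in zip(column, labels) if label == 1]
        neg = [v for v, label in zip(column, labels) if label == 0]
        # Fall back to 1.0 so constant columns do not divide by zero
        spread = pstdev(column) or 1.0
        scores.append(abs(mean(pos) - mean(neg)) / spread)

    # Rank feature indices by score, best first, and keep the top k
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


# Hypothetical columns: [TSH, T3, age]; age is unrelated to the label here
X = [[1.0, 2.0, 30],
     [1.2, 2.1, 60],
     [9.0, 0.5, 31],
     [8.5, 0.6, 59]]
y = [0, 0, 1, 1]

selected, scores = select_k_best(X, y, k=2)
print(selected)  # [0, 1]
```

In this toy data the two hormone columns separate the classes and the age column does not, so the age column is dropped.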