Machine learning-based prediction model for chronic brucellosis: a multi-feature approach using clinical and laboratory data

Journal: Frontiers in Cellular and Infection Microbiology, Volume: 15
DOI: https://doi.org/10.3389/fcimb.2025.1700233
PMID: https://pubmed.ncbi.nlm.nih.gov/41346366
Publication Date: 2025-11-19
Author(s): Rong Wang et al.
Primary Topic: Brucella: diagnosis, epidemiology, treatment

Overview

This study addresses the significant clinical challenge of chronic progression in human brucellosis (HB), which affects approximately one-third of patients and can lead to long-term disability. The authors aimed to develop and validate machine learning (ML) models for predicting chronic progression using readily available clinical and laboratory data. A retrospective analysis of 555 patients with confirmed brucellosis admitted between 2019 and 2024 was conducted, focusing on clinical characteristics and laboratory indicators at admission. Feature selection was performed using Boruta and recursive feature elimination, and six supervised ML models were evaluated: random forest (RF), LightGBM, XGBoost, logistic regression (LR), multilayer perceptron (MLP), and support vector machine (SVM).
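The shadow-feature idea behind Boruta can be illustrated with a minimal sketch. This is not the authors' pipeline: Boruta normally ranks features by random forest importance and applies a statistical test over many runs, whereas this sketch swaps in absolute Pearson correlation as the importance score and simply counts "hits". The function names, the toy data, and the iteration count are all assumptions made for illustration.

```python
import random


def abs_corr(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def shadow_hits(features, outcome, n_iter=50, seed=0):
    """Count how often each real feature beats the best shuffled (shadow) copy."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_iter):
        # Shadows: shuffled copies of every column, which destroys any real signal.
        shadow_max = 0.0
        for col in features.values():
            shuffled = col[:]
            rng.shuffle(shuffled)
            shadow_max = max(shadow_max, abs_corr(shuffled, outcome))
        # A real feature scores a hit when it outperforms the best shadow.
        for name, col in features.items():
            if abs_corr(col, outcome) > shadow_max:
                hits[name] += 1
    return hits


# Deterministic toy data (hypothetical, not from the study).
outcome = [i % 2 for i in range(40)]
features = {
    "informative": [2.0 * y + 0.1 * (i % 5) for i, y in enumerate(outcome)],
    "noise": [float((7 * i) % 11) for i in range(40)],
}
hits = shadow_hits(features, outcome)
print(hits)
```

A feature that beats the best shadow in nearly every iteration is a candidate to keep; one that rarely does is a candidate to reject. The full algorithm makes that decision with a formal test rather than a raw count.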

A total of 144 patients (25.9%) progressed to chronic brucellosis. Chronic cases showed distinct clinical features, including more frequent arthralgia and arthritis, and a specific biochemical profile: lower levels of alanine aminotransferase (ALT), aspartate aminotransferase (AST), and triglycerides (TG), and higher levels of high-density lipoprotein cholesterol (HDL-C), albumin (ALB), blood urea nitrogen (BUN), and uric acid (UA). Among the evaluated models, RF performed best, achieving the highest area under the curve (AUC) of 0.782 (95% CI: 0.701–0.856) and the best calibration (Emax = 0.155). SHAP analysis identified TG, HDL-C, UA, eosinophil count, prealbumin (PA), ALT, BUN, and globulin (GLB) as the most influential predictors, supporting their biological relevance. The study concludes that the RF model, using eight routinely available variables, provides moderate discrimination and well-calibrated probability estimates for predicting chronic progression in HB.
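To make the reported metric concrete, the sketch below computes a rank-based AUC and a percentile bootstrap confidence interval on toy data. The summary does not say how the authors derived their 95% CI, so the bootstrap is an assumption, and the labels and scores are invented for illustration.

```python
import random


def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    low = stats[int((alpha / 2) * n_boot)]
    high = stats[int((1 - alpha / 2) * n_boot) - 1]
    return low, high


# Toy data: 1 = chronic, 0 = recovered; scores are hypothetical predicted risks.
labels = [0, 0, 1, 0, 1, 1, 0, 1, 0, 0]
scores = [0.10, 0.35, 0.40, 0.20, 0.80, 0.65, 0.55, 0.30, 0.15, 0.45]
point = auc(labels, scores)
low, high = bootstrap_auc_ci(labels, scores)
print(f"AUC = {point:.3f} (95% CI: {low:.3f}-{high:.3f})")
```

With only ten toy cases the interval is very wide; the study's narrower interval reflects a much larger test set.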

Introduction

Brucellosis, caused by Brucella spp., is a significant zoonotic infection with an estimated 1.6 to 2.1 million new human cases annually, predominantly in regions such as the Middle East, Central Asia, South America, and China. The disease is often underreported because of diagnostic challenges, and it presents with a range of symptoms that can progress to chronic conditions affecting multiple organ systems. Acute brucellosis typically manifests with fever and myalgia, while chronic cases may involve persistent fatigue and neuropsychiatric complications, with osteoarticular involvement being the most common. Although chronicity is well recognized, existing research has largely focused on molecular differences between disease stages, and validated prognostic tools for predicting chronic progression at diagnosis are still lacking.

To address this gap, the current study aims to develop a machine learning (ML)-based predictive model to identify patients at risk for chronic brucellosis. By utilizing explainable artificial intelligence techniques, the study seeks to highlight key features associated with chronicity and evaluate the performance of various algorithms. This approach aims to create a clinically interpretable framework that enhances early risk stratification and personalized intervention, ultimately improving patient outcomes in brucellosis management.

Results

Of the 555 patients analyzed, 144 (25.9%) progressed to chronic disease. Compared with the recovery group, the chronic group had higher rates of arthralgia, myalgia, and arthritis and lower rates of fever, headache, and splenomegaly (p < 0.05). Significant laboratory differences were also noted, including elevated platelet counts, eosinophils, and various biochemical markers in the chronic group, indicating alterations in hepatic function, coagulation, lipid metabolism, and systemic inflammation.

Feature selection with the Boruta algorithm identified 14 important variables, and the top eight (BUN, HDL-C, ALT, eosinophil count, TG, UA, PA, and GLB) were selected for model construction. Six supervised machine learning models were then evaluated. The ensemble methods (RF, XGBoost, and LightGBM) achieved high discrimination in the training set (AUC > 0.93), while RF performed best in the test set (AUC = 0.782). Calibration analyses confirmed RF’s superior probability estimates across datasets, and decision curve analysis indicated that RF provided the highest net clinical benefit, reinforcing its potential utility in predicting chronic brucellosis.

The study also developed an online risk prediction tool based on the RF model, which lets clinicians assess patient risk in real time using routinely available clinical indicators. SHAP analysis enhanced the model’s interpretability by showing how individual predictors influence chronic disease risk, supporting its application in clinical decision-making.
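Decision curve analysis compares models by net benefit at a chosen risk threshold. The sketch below uses the standard definition, net benefit = TP/n - (FP/n) × pt/(1 - pt), where pt is the threshold probability. The toy labels, the predicted risks, and the threshold of 0.3 are assumptions for illustration and do not come from the study.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating everyone whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    # False positives are penalized by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data: 1 = chronic, 0 = recovered; probs are hypothetical predicted risks.
labels = [0, 0, 1, 0, 1, 1, 0, 1, 0, 0]
probs = [0.10, 0.35, 0.40, 0.20, 0.80, 0.65, 0.55, 0.30, 0.15, 0.45]

pt = 0.3
model_nb = net_benefit(labels, probs, pt)
treat_all_nb = net_benefit_treat_all(labels, pt)
treat_none_nb = 0.0  # treating nobody yields no true or false positives
print(model_nb, treat_all_nb, treat_none_nb)
```

A decision curve repeats this calculation over a range of thresholds; a model is considered useful where its net benefit exceeds both the treat-all and treat-none strategies.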

Discussion

In this study, a machine learning-based model was developed to predict chronic progression in patients with brucellosis, using a dataset of 555 participants diagnosed at the First Hospital of Shanxi Medical University. Six algorithms were compared, and the random forest (RF) model performed best in terms of discrimination, calibration, and clinical utility. Feature selection identified eight key predictors: blood urea nitrogen (BUN), high-density lipoprotein cholesterol (HDL-C), alanine aminotransferase (ALT), eosinophil count, triglycerides (TG), uric acid (UA), prealbumin (PA), and globulin (GLB). Notably, the model’s interpretability was enhanced through SHapley Additive exPlanations (SHAP), which elucidated the contributions of individual features to the risk of chronic brucellosis.
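SHAP attributes a prediction to individual features using Shapley values. The sketch below computes exact Shapley values by enumerating every feature coalition for a tiny hand-written risk function. It does not use the SHAP library, and `toy_risk` is a hypothetical stand-in rather than the study's random forest; its coefficients, the standardized feature values, and the all-zero baseline are invented for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values: features outside a coalition take their baseline value."""
    names = list(x)
    n = len(names)

    def value(coalition):
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_risk(f):
    """Hypothetical risk score with one interaction term (NOT the paper's model)."""
    return (0.26 - 0.05 * f["TG"] + 0.04 * f["HDL_C"]
            + 0.03 * f["UA"] + 0.02 * f["HDL_C"] * f["UA"])


# Hypothetical standardized values for one patient; the baseline is the cohort mean (0).
patient = {"TG": -1.0, "HDL_C": 1.0, "UA": 0.5}
baseline = {"TG": 0.0, "HDL_C": 0.0, "UA": 0.0}

phi = shapley_values(toy_risk, patient, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.4f}")
```

The contributions sum to the difference between the patient's predicted risk and the baseline prediction, which is the property that lets SHAP plots decompose a single prediction feature by feature. Exact enumeration grows exponentially with the number of features, so practical tools rely on model-specific or sampling-based approximations.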

The findings underscore the significant public health implications of brucellosis, particularly in endemic regions where chronic cases can lead to increased morbidity and healthcare burdens. The study highlights the potential of using routinely collected clinical and laboratory data for risk prediction, contrasting with previous research that focused on molecular markers. While the RF model showed promise, its limited sensitivity suggests it should serve as a supplementary tool for clinicians rather than a standalone diagnostic method. The study’s strengths include its real-world cohort and the development of an accessible web-based prediction tool, while limitations such as the retrospective design and potential misclassification of chronic cases warrant cautious interpretation of the results. Further validation and optimization of the model are necessary to enhance its clinical applicability.