Assessing dengue forecasting methods: a comparative study of statistical models and machine learning techniques in Rio de Janeiro, Brazil

Journal: Tropical Medicine and Health, Volume: 53, Issue: 1
DOI: https://doi.org/10.1186/s41182-025-00723-7
PMID: https://pubmed.ncbi.nlm.nih.gov/40211309
Publication Date: 2025-04-10
Author(s): Xiang Chen et al.
Primary Topic: Mosquito-borne diseases and control

Overview

This study compares the predictive performance and computational efficiency of statistical models and machine learning techniques for forecasting dengue outbreaks in Rio de Janeiro, Brazil. It emphasizes the importance of accurate dengue forecasting for public health planning and intervention, particularly when forecasts incorporate climate factors known to influence disease transmission. Weekly forecasts were generated with a dynamic window approach, and the methods compared include Autoregressive Integrated Moving Average (ARIMA), Seasonal ARIMAX (SARIMAX), and machine learning techniques such as Long Short-Term Memory (LSTM) networks and Random Forest.

The results indicate that ARIMA outperformed other statistical models when using only historical case data, while SARIMAX significantly enhanced predictive accuracy by including climate covariates. Among machine learning models, LSTM, especially when combined with climate data, emerged as the most accurate, albeit with longer training and prediction times. For long-term forecasts, the Prophet model with climate covariates proved to be the most effective. Additionally, ensemble models, particularly those combining LSTM and ARIMA, demonstrated notable improvements over individual approaches. This research underscores the potential of integrating machine learning techniques and climate factors to enhance dengue forecasting, providing critical insights for public health officials.

Introduction

The introduction of the paper addresses the significant public health challenge posed by dengue fever, a viral infection transmitted by Aedes mosquitoes, particularly Aedes aegypti. The dengue virus comprises four serotypes (DENV-1, DENV-2, DENV-3, and DENV-4), with infections ranging from mild to severe, including potentially fatal forms such as dengue hemorrhagic fever. The disease is endemic in over a hundred countries and is expected to spread further due to climate change, which influences mosquito breeding and viral transmission. Current prevention strategies include personal protection, chemical control, and enhanced surveillance, but no universal treatment exists.

The authors evaluate a range of statistical and machine learning methods for dengue forecasting, aiming to identify the approaches that do best on predictive performance and computational efficiency. They note that traditional statistical models such as ARIMA may struggle with non-linear interactions, and contrast them with machine learning techniques such as Random Forests and Long Short-Term Memory (LSTM) networks, which can capture complex patterns. A distinguishing feature of the study is its focus on weekly rather than monthly forecasts, with models trained under a moving window strategy that mirrors how data are updated in real time. The research also emphasizes uncertainty quantification in forecasts and explores ensemble methods that combine multiple models to improve accuracy. The analysis uses eight years of data from Rio de Janeiro, Brazil.
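
The ensemble idea can be illustrated with a small sketch. This summary does not say how the authors combine their models, so the inverse-error weighting below is an assumption chosen for illustration, and the function names and numbers are hypothetical rather than taken from the paper.

```python
from typing import Dict, List


def inverse_error_weights(validation_mae: Dict[str, float]) -> Dict[str, float]:
    """Give each model a weight proportional to 1 / its validation MAE."""
    inverse = {name: 1.0 / mae for name, mae in validation_mae.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def combine_forecasts(forecasts: Dict[str, List[float]],
                      weights: Dict[str, float]) -> List[float]:
    """Weighted average of the member forecasts, horizon by horizon."""
    horizon = len(next(iter(forecasts.values())))
    return [
        sum(weights[name] * forecasts[name][h] for name in forecasts)
        for h in range(horizon)
    ]


# Made-up validation errors and forecasts, not results from the paper.
weights = inverse_error_weights({"arima": 20.0, "lstm": 10.0})
ensemble = combine_forecasts(
    {"arima": [100.0, 120.0], "lstm": [130.0, 150.0]}, weights
)
```

With these made-up inputs the model with the lower validation error receives twice the weight of the other. A plain unweighted mean would be an equally reasonable starting point.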

Methods

The authors forecast dengue cases in Rio de Janeiro using a moving window strategy, which lets the models adapt to changing data patterns over time. A fixed window of 6 years was chosen to capture long-term trends and seasonal variation in dengue incidence while ensuring sufficient data for model training. Forecasts were made 1 to 12 weeks ahead, so that model performance can be evaluated across short- to medium-term horizons. The study treats the window size as a tunable parameter that health practitioners can adjust to their specific needs to improve forecasting accuracy.
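
The moving window scheme can be sketched as an index generator. This is a minimal illustration under stated assumptions, not the authors' code: it treats a year as 52 weeks, advances the window one week at a time, and keeps only forecast origins for which all 12 target weeks are observed. The name `moving_window_splits` is hypothetical.

```python
from typing import Iterator, List, Tuple

WEEKS_PER_YEAR = 52  # assumption: the summary does not say how years map to weeks


def moving_window_splits(
    n_weeks: int,
    window_weeks: int = 6 * WEEKS_PER_YEAR,
    max_horizon: int = 12,
) -> Iterator[Tuple[range, List[int]]]:
    """Yield (training indices, target indices) for each forecast origin.

    The training window has a fixed length and slides forward one week at a
    time; the targets are the 1..max_horizon weeks that follow the window.
    """
    last_origin = n_weeks - max_horizon
    for end in range(window_weeks, last_origin + 1):
        train = range(end - window_weeks, end)
        targets = [end + h - 1 for h in range(1, max_horizon + 1)]
        yield train, targets


# Eight years of weekly data, matching the span mentioned in the summary.
splits = list(moving_window_splits(n_weeks=8 * WEEKS_PER_YEAR))
```

Each model would be refitted on `train` and scored on `targets` at every origin. Changing `window_weeks` is how the tunable window size would be explored.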

The authors implemented various statistical and machine learning models, including AR(1), MA(1), ARIMA, SARIMAX, ETS, Random Forest, XGBoost, Support Vector Machine (SVM), and Long Short-Term Memory (LSTM) networks. Statistical models were executed using the R package `forecast`, while machine learning models utilized Python libraries such as `scikit-learn` and `xgboost`. The LSTM network was built with TensorFlow, featuring a single layer of 1,000 units and a dense output layer. The choice of model architectures and parameters was made to facilitate straightforward comparisons between methods, prioritizing general performance over extensive parameter tuning. This approach aims to provide health practitioners with accessible forecasting tools that can be further optimized for specific contexts.
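
To make the simplest baseline concrete, the sketch below fits an AR(1) model and iterates it forward 12 weeks. It is an illustration only: the authors used the R package `forecast`, whereas this version swaps in plain ordinary least squares in Python, and the helper names and toy series are hypothetical.

```python
from typing import List, Tuple


def fit_ar1(series: List[float]) -> Tuple[float, float]:
    """Estimate y_t = c + phi * y_{t-1} + e_t by ordinary least squares."""
    x = series[:-1]  # lagged values y_{t-1}
    y = series[1:]   # current values y_t
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def forecast_ar1(last_value: float, c: float, phi: float,
                 horizon: int = 12) -> List[float]:
    """Iterate the fitted recursion to get 1..horizon step-ahead forecasts."""
    forecasts = []
    current = last_value
    for _ in range(horizon):
        current = c + phi * current
        forecasts.append(current)
    return forecasts


# Toy series generated from y_t = 10 + 0.5 * y_{t-1}; not dengue data.
series = [100.0]
for _ in range(9):
    series.append(10.0 + 0.5 * series[-1])

c, phi = fit_ar1(series)
forecasts = forecast_ar1(series[-1], c, phi, horizon=12)
```

Because the toy series has no noise, the fit recovers the generating coefficients. On real weekly counts the residual variance would also be needed to build uncertainty intervals.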

Results

In this section, the authors assess the predictive performance of various statistical models, machine learning techniques, and ensemble methods by evaluating their accuracy using metrics such as Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE) across different forecast horizons. Additionally, they report the 95% coverage probability and the average width of the uncertainty intervals for each model, providing a comprehensive overview of their reliability. A comparative analysis of the computational efficiency of the methods is also included.
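
The evaluation metrics named above have standard definitions, sketched here in plain Python. Skipping weeks with zero observed cases in MAPE is an assumption made so the ratio stays defined (the summary does not say how the authors handle them), and the example numbers are made up.

```python
import math
from typing import Sequence


def mae(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Mean Absolute Error."""
    return sum(abs(a - f) for a, f in zip(y, yhat)) / len(y)


def rmse(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Root Mean Square Error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(y, yhat)) / len(y))


def mape(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent.

    Assumption: observations equal to zero are skipped to avoid dividing by zero.
    """
    ratios = [abs(a - f) / abs(a) for a, f in zip(y, yhat) if a != 0]
    return 100.0 * sum(ratios) / len(ratios)


def coverage(y: Sequence[float], lower: Sequence[float],
             upper: Sequence[float]) -> float:
    """Fraction of observations that fall inside their interval."""
    hits = sum(1 for a, lo, hi in zip(y, lower, upper) if lo <= a <= hi)
    return hits / len(y)


def mean_interval_width(lower: Sequence[float], upper: Sequence[float]) -> float:
    """Average width of the uncertainty intervals."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


# Made-up observations, forecasts, and intervals for illustration.
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
lower = [90.0, 150.0, 350.0]
upper = [120.0, 190.0, 450.0]

scores = {
    "MAE": mae(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "MAPE": mape(observed, predicted),
    "coverage": coverage(observed, lower, upper),
    "width": mean_interval_width(lower, upper),
}
```

For a nominal 95% interval, an empirical coverage well below 0.95 suggests intervals that are too narrow. A wide average width with high coverage suggests a reliable but uninformative forecast.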

Supplementary Information includes visual representations of the results, with Figure S1 illustrating the predictions from the top-performing methods across various time horizons. Figures S2 to S5 present boxplots of the absolute errors, \(|y_i - \hat{y}_i|\), and absolute percentage errors, \(\frac{|y_i - \hat{y}_i|}{y_i}\), categorized by methods that utilize only dengue case data versus those that incorporate additional covariates. The boxplots are organized from the most effective to the least effective forecasting methods, highlighting the performance differences across the evaluated approaches.

Discussion

The discussion highlights the strong relationship between dengue fever outbreaks and climatic conditions in Rio de Janeiro, Brazil. The city’s subtropical climate, characterized by high temperatures and humidity, creates an ideal environment for the Aedes aegypti mosquito, the primary vector of dengue transmission. Historical data indicate that Rio de Janeiro has faced multiple dengue epidemics since the mosquito’s reintroduction in 1977, with notable peaks in cases during the warmer months, particularly between March and May. The InfoDengue system plays a crucial role in monitoring dengue incidence by integrating reported cases with climate data, which facilitates timely public health responses.

The section further elaborates on the statistical and machine learning models employed for dengue forecasting, including Autoregressive Integrated Moving Average (ARIMA), Seasonal ARIMAX (SARIMAX), and machine learning techniques such as Random Forest and Long Short-Term Memory (LSTM) networks. The models incorporate climatic variables such as temperature and humidity, with lagged effects considered to improve predictive accuracy. Results indicate that the SARIMAX model, particularly when it includes lagged covariates, outperforms the others in short- to medium-term forecasts. Among machine learning models, SVM and LSTM demonstrate strong predictive capabilities, with LSTM excelling in medium-term forecasts. The study underscores the importance of integrating climatic factors into predictive models to improve dengue case forecasting, thereby helping public health initiatives manage outbreaks effectively.
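
The lagged climate covariates can be made concrete with a small sketch that pairs each week's case count with temperature and humidity values from earlier weeks. The lag lengths, the function name `build_lagged_rows`, and the numbers are hypothetical, since this summary does not state which lags the authors used.

```python
from typing import Dict, List, Sequence, Tuple


def build_lagged_rows(
    cases: Sequence[float],
    covariates: Dict[str, Sequence[float]],
    lags: Sequence[int],
) -> Tuple[List[str], List[List[float]], List[float]]:
    """Pair each week's case count with covariate values from earlier weeks.

    Weeks that do not yet have a full lag history are dropped.
    """
    max_lag = max(lags)
    names = [f"{name}_lag{lag}" for name in covariates for lag in lags]
    rows: List[List[float]] = []
    targets: List[float] = []
    for t in range(max_lag, len(cases)):
        rows.append(
            [covariates[name][t - lag] for name in covariates for lag in lags]
        )
        targets.append(cases[t])
    return names, rows, targets


# Made-up weekly values for illustration only.
names, rows, targets = build_lagged_rows(
    cases=[5, 8, 13, 21, 34],
    covariates={
        "temperature": [24, 25, 26, 27, 28],
        "humidity": [70, 72, 74, 76, 78],
    },
    lags=(1, 2),
)
```

The resulting rows could serve as the exogenous regressors of a SARIMAX model or as input features for a Random Forest or an LSTM. Which lags help is an empirical question for the data at hand.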