Infectious disease prediction model based on optimized deep learning algorithm

Journal: Frontiers in Public Health, Volume: 13
DOI: https://doi.org/10.3389/fpubh.2025.1703506
PMID: https://pubmed.ncbi.nlm.nih.gov/41613087
Publication Date: 2026-01-14
Author(s): Qian Cao et al.
Primary Topic: COVID-19 epidemiological studies

Overview

The research presents a hybrid forecasting model, GA-BiLSTM-ARIMA, designed to improve the accuracy of COVID-19 case forecasts for Japan. The model combines a genetic algorithm (GA) with a Bidirectional Long Short-Term Memory (BiLSTM) network and an Autoregressive Integrated Moving Average (ARIMA) model. The GA optimizes hyperparameters iteratively, improving the model’s ability to capture both the linear and the nonlinear patterns in the data. The GA-BiLSTM-ARIMA model achieved an RMSE of 2,262.42, an MAE of 1,672.07, a MAPE of 6.81, and an R² of 0.9764, outperforming the standalone BiLSTM and ARIMA models.

The study highlights the model’s practical implications in public health, emphasizing its potential to provide early warnings for disease control, guide resource allocation, and simulate policy interventions. However, it acknowledges limitations related to data quality, the exclusion of external variables such as virus mutations, and the assumption of stable linear and nonlinear features throughout the prediction period. Future research aims to develop a dynamic prediction framework that incorporates multi-source data and adapts to changing conditions, thereby enhancing the model’s robustness and transferability in epidemic forecasting.

Introduction

The introduction outlines the global impact of the COVID-19 pandemic, which began in late 2019 with a novel coronavirus outbreak in Wuhan, China. The virus’s high transmissibility and potential lethality prompted widespread governmental responses, including lockdowns and travel restrictions. In light of these challenges, extensive research has been conducted on epidemic data prediction, leading to the development of various models and methodologies aimed at forecasting epidemic trends. Notable contributions include the use of generalized additive models (GAMs) for dengue fever, machine learning approaches for COVID-19, and the calibration of Susceptible-Infected-Recovered-Deceased (SIRD) models to estimate key epidemiological parameters.

The introduction further highlights the complexity of accurately predicting epidemic trends due to factors such as policy interventions and population mobility. Traditional models like ARIMA are effective for linear components, while BiLSTM captures nonlinear features. However, single models often fall short, necessitating innovative approaches. This paper proposes a weighted hybrid model, GA-BiLSTM-ARIMA, which utilizes a genetic algorithm (GA) to optimize the fusion of ARIMA and BiLSTM predictions. The study aims to enhance predictive accuracy for COVID-19 data in Japan and Germany, comparing the performance of the hybrid model against traditional methods using metrics such as Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE).
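In symbols, the weighted fusion and the two named metrics can be sketched as follows. This assumes the hybrid forecast is a linear combination of the two component forecasts; the weight symbols $w_1, w_2$ and the notation are illustrative and not taken from the paper. Here $y_t$ is the observed case count on day $t$, $\hat{y}_t$ the forecast, and $n$ the number of evaluated days.

$$\hat{y}_t = w_1\,\hat{y}_t^{\mathrm{ARIMA}} + w_2\,\hat{y}_t^{\mathrm{BiLSTM}}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}, \qquad \mathrm{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right|$$

Under this reading, the GA's task is to choose $w_1$ and $w_2$ so that the RMSE of $\hat{y}_t$ is as small as possible.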

Methods

The “Materials and Methods” section describes the data and the modelling pipeline. The data are daily new confirmed COVID-19 cases. An ARIMA model is fitted to capture the linear structure of the series and a BiLSTM network to capture its nonlinear structure; a GA then searches for the weights that combine the two sets of predictions.

Model performance is assessed with RMSE, MAE, MAPE, and R², and the hybrid is compared against the standalone BiLSTM and ARIMA models.

Results

In this study, a Bidirectional Long Short-Term Memory (BiLSTM) model was developed to predict daily new confirmed COVID-19 cases in Japan from April 20, 2022, to March 5, 2023. The dataset was divided into a training set (two-thirds) and a test set (one-third), with the training window set to 60% of the total data to capture historical trends effectively. The model was configured for a prediction horizon of 30 time steps and utilized a sliding window approach with a step size of 7 days. The performance of the BiLSTM model was visually compared to actual values in a time series plot (Figure 5), and its evaluation metrics are summarized in Table 2.
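The sliding-window setup can be illustrated with a minimal sketch. The function name `sliding_windows` and the `lookback` of 60 values are hypothetical choices made for illustration; only the 30-step horizon and the 7-day step come from the summary above. The toy series has 320 values, the number of days from April 20, 2022, to March 5, 2023, inclusive, but it is not the study's data.

```python
import numpy as np

def sliding_windows(series, lookback, horizon=30, step=7):
    """Split a 1-D series into (input, target) pairs.

    Each input holds `lookback` consecutive values and each target holds the
    `horizon` values that follow it; consecutive windows start `step` apart.
    """
    series = np.asarray(series, dtype=float)
    inputs, targets = [], []
    last_start = len(series) - lookback - horizon
    for start in range(0, last_start + 1, step):
        split = start + lookback
        inputs.append(series[start:split])
        targets.append(series[split:split + horizon])
    return np.array(inputs), np.array(targets)

# Toy series standing in for daily case counts (not the study's data).
daily_cases = np.arange(320, dtype=float)
X, y = sliding_windows(daily_cases, lookback=60)  # lookback=60 is an assumption
print(X.shape, y.shape)
```

Each row of `X` would be one input sequence for a recurrent model and the matching row of `y` its 30-step target. How the paper itself sizes the input window is not stated precisely enough in the summary to reproduce.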

Additionally, an ARIMA model was constructed on the same dataset, achieving stationarity through first-order differencing. The optimal ARIMA model identified was ARIMA(1, 1, 2), which demonstrated a good fit with white noise residuals. A comparison of the ARIMA model’s fitted values against actual cases is illustrated in Figure 6, with performance metrics detailed in Table 3. To enhance predictive accuracy, the outputs from both the BiLSTM and ARIMA models were combined using a genetic algorithm (GA), which optimized the coefficients for each model to minimize the root mean square error (RMSE) of the ensemble predictions.
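The GA-based fusion step might look roughly like the sketch below. This is not the authors' implementation: the population size, number of generations, truncation selection, blend crossover, Gaussian mutation, and the restriction of each coefficient to [0, 1] are all assumptions, and the two prediction arrays are synthetic stand-ins for ARIMA and BiLSTM output.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def ga_fusion_weights(actual, pred_a, pred_b, pop_size=40, generations=60,
                      mutation_scale=0.05, seed=0):
    """Search for (w_a, w_b) in [0, 1]^2 minimizing RMSE of w_a*pred_a + w_b*pred_b."""
    rng = np.random.default_rng(seed)
    actual, pred_a, pred_b = (np.asarray(v, dtype=float)
                              for v in (actual, pred_a, pred_b))

    def fitness(pop):
        # Broadcast each candidate's weights over the whole series: (pop, n).
        combined = pop[:, :1] * pred_a + pop[:, 1:] * pred_b
        return np.sqrt(np.mean((combined - actual) ** 2, axis=1))

    pop = rng.random((pop_size, 2))
    pop[0] = (1.0, 0.0)  # candidate equal to the first standalone model
    pop[1] = (0.0, 1.0)  # candidate equal to the second standalone model
    for _ in range(generations):
        order = np.argsort(fitness(pop))
        elite = pop[order[:2]]                    # elitism: keep the two best
        parents = pop[order[:pop_size // 2]]      # truncation selection
        pairs = rng.integers(0, len(parents), size=(pop_size - 2, 2))
        mix = rng.random((pop_size - 2, 1))
        children = mix * parents[pairs[:, 0]] + (1 - mix) * parents[pairs[:, 1]]
        children += rng.normal(0.0, mutation_scale, children.shape)  # mutation
        pop = np.vstack([elite, np.clip(children, 0.0, 1.0)])
    scores = fitness(pop)
    return pop[np.argmin(scores)], float(scores.min())

# Synthetic stand-ins for the two models' predictions (not the study's data).
data_rng = np.random.default_rng(1)
t = np.arange(200)
actual = 1000 + 5 * t + 200 * np.sin(t / 10)
pred_a = actual + data_rng.normal(0, 80, t.size)  # plays the role of ARIMA
pred_b = actual + data_rng.normal(0, 40, t.size)  # plays the role of BiLSTM
weights, best_rmse = ga_fusion_weights(actual, pred_a, pred_b)
print(weights, best_rmse, rmse(actual, pred_a), rmse(actual, pred_b))
```

Because the initial population contains both standalone models and the best candidates are carried over unchanged each generation, the returned RMSE on the fitting data cannot be worse than that of either input. For two weights a least-squares fit would also work; the GA is used here only to mirror the approach described in the text.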

Discussion

In this section, the authors compare three models for forecasting COVID-19 cases: the BiLSTM model, the ARIMA model, and a hybrid that combines the two using a genetic algorithm (GA). BiLSTM was the more accurate of the two standalone models, with RMSE = 2405.98, MAE = 1836.8, MAPE = 7.39, and $R^2 = 0.9734$, whereas ARIMA performed markedly worse (RMSE = 7385.03, MAE = 5156.67, MAPE = 20.57, and $R^2 = 0.7491$). The hybrid GA-BiLSTM-ARIMA model outperformed both, achieving RMSE = 2262.42, MAE = 1672.07, MAPE = 6.81, and $R^2 = 0.9764$; relative to BiLSTM alone, this is roughly a 6% reduction in RMSE and a 9% reduction in MAE.
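For reference, the four metrics can be computed as in the sketch below, which uses their standard definitions. The helper names `evaluate` and `relative_reduction` are illustrative, MAPE is expressed in percent (assumed to be the convention behind the reported values), and the toy numbers are not from the study.

```python
import numpy as np

def evaluate(actual, predicted):
    """Return RMSE, MAE, MAPE (in percent) and R^2 for one set of forecasts."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        # MAPE is undefined when any actual value is zero.
        "MAPE": float(np.mean(np.abs(err / actual)) * 100),
        "R2": float(1 - np.sum(err ** 2) / np.sum((actual - actual.mean()) ** 2)),
    }

def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100

# Toy forecasts to exercise the metric definitions (not the study's data).
toy = evaluate([100, 200, 300, 400], [110, 190, 320, 380])
print(toy)

# Reductions implied by the RMSE values reported in the text.
print(relative_reduction(2405.98, 2262.42))  # hybrid vs. BiLSTM
print(relative_reduction(7385.03, 2262.42))  # hybrid vs. ARIMA
```

Lower RMSE, MAE, and MAPE indicate better forecasts, while $R^2$ closer to 1 indicates that more of the variance in the observed series is explained.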

The authors emphasize the hybrid model’s ability to effectively capture both linear and nonlinear patterns in the data, leveraging the strengths of BiLSTM for nonlinear fitting and ARIMA for linear trends. They also highlight the model’s practical implications for public health, including its potential to inform early intervention strategies, optimize resource allocation, and simulate policy impacts. However, they acknowledge limitations related to data quality, the exclusion of external variables like virus mutations, and the static nature of the model’s assumptions. Future work is proposed to enhance the model’s adaptability and robustness through a dynamic prediction framework that integrates multi-source data and allows for real-time updates.