Likelihood-based inference for the Gompertz model with Poisson errors

Journal: Statistics and Computing, Volume: 36, Issue: 3
DOI: https://doi.org/10.1007/s11222-026-10878-w
Publication Date: 2026-04-25
Author(s): Paolo Onorati et al.
Primary Topic: Markov Chains and Monte Carlo Methods

Overview

This section outlines why population dynamics models matter in fields such as actuarial science, demography, and ecology: they help explain historical population changes and forecast future trends. It also highlights sampling error, which introduces uncertainty and bias into statistical inference and can lead to incorrect conclusions about population behavior.

The Gompertz model, a widely used tool for analyzing population size dynamics, becomes computationally challenging when sampling error is incorporated through a full likelihood approach. To address this, the authors present efficient computational methods for statistical inference in the Gompertz model with Poisson sampling error. They demonstrate the methods through simulations and data analysis, in both Bayesian and frequentist settings.

Introduction

The introduction of this research paper emphasizes the significance of population dynamics models in mathematical biology, highlighting their applications in species conservation, biological invasions, and environmental response assessments. It identifies two primary sources of uncertainty affecting model accuracy: stochastic processes related to demographic and environmental factors, and sampling error. The paper notes advancements in ecological modeling, particularly through state-space models, but acknowledges ongoing challenges in incorporating both process noise and observation error, which complicate computational efforts.

The Gompertz model is presented as a widely utilized framework for characterizing population dynamics across various fields, including actuarial science and life sciences. The authors discuss modifications to the Gompertz model that accommodate sampling variability, particularly when observation noise follows a log-normal distribution, allowing for the formulation of a likelihood function suitable for Kalman filtering. However, complexities arise when sampling errors deviate from log-normality, as highlighted by Lele’s work on high-dimensional integrals and composite likelihood approaches. The main contribution of this paper is the development of computational algorithms enabling full likelihood-based analysis of the Gompertz model with Poisson errors, facilitating both frequentist and Bayesian inference. The paper outlines its structure, detailing the statistical model, estimation procedures, and numerical experiments, culminating in a discussion of future research directions.

Results

The simulation results indicate that all three algorithms perform better as the time series gets longer. Notably, the Gibbs sampler showed greater uncertainty in its estimates of the parameter $\theta_2$ than the other algorithms. While all three methods achieved coverage close to 0.95 for the parameter $b$, the maximum likelihood estimation (MLE) approach showed lower coverage, especially at $T = 30$ under high correlation, likely because the sample is too small for reliable asymptotic variance estimates. At $T = 100$, the Gibbs sampler provided the most accurate coverage across all parameters and correlation scenarios.

Under model misspecification, performance differences among the algorithms were evident, particularly between the high and moderate correlation scenarios. Lower correlations improved performance for all algorithms, with reduced mean squared error (MSE), lower uncertainty, and more accurate coverage. Computational efficiency also varied: the MLE approach was the slowest, taking two to nine times longer than the Gibbs sampler, which was in turn approximately five times faster than Stan. Furthermore, the Gibbs sampler consistently yielded larger effective sample sizes than Stan, an advantage attributed to its problem-specific design. Although both algorithms reach similar posterior conclusions given enough iterations, the Gibbs sampler mixes faster and is therefore more efficient for shorter runs, which shows that algorithm choice has practical consequences in Bayesian inference.
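To make the effective sample size comparison concrete, the following is a minimal sketch of one common way to estimate it from a single chain: divide the chain length by one plus twice the sum of the leading positive autocorrelations. The function name `effective_sample_size` and the truncation rule are assumptions made for illustration; this is not the estimator Stan reports, and not necessarily the one used in the paper.

```python
import numpy as np

def effective_sample_size(chain):
    """Crude ESS estimate: n / (1 + 2 * sum of leading positive autocorrelations)."""
    x = np.asarray(chain, dtype=float)
    n = x.size
    xc = x - x.mean()
    var = np.dot(xc, xc) / n
    if var == 0.0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        rho = np.dot(xc[:-lag], xc[lag:]) / (n * var)
        if rho <= 0.0:  # illustrative rule: stop at the first non-positive autocorrelation
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)

# Toy comparison: an independent chain versus a strongly autocorrelated AR(1) chain.
rng = np.random.default_rng(1)
iid_chain = rng.normal(size=2000)
ar_chain = np.empty(2000)
ar_chain[0] = 0.0
for t in range(1, 2000):
    ar_chain[t] = 0.9 * ar_chain[t - 1] + rng.normal()

ess_iid = effective_sample_size(iid_chain)
ess_ar = effective_sample_size(ar_chain)
```

A slowly mixing chain carries far fewer effectively independent draws than its nominal length, which is why a sampler with larger effective sample sizes can be preferable for short runs.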

Discussion

In this section, the authors discuss the application of the Gompertz model with stochastic errors to infer population dynamics, particularly when the true population sizes are unobserved. The model is defined by the equation $N_{t+1} = N_t \exp(a + b \log(N_t) + \epsilon_{t+1})$, where $\epsilon_t$ denotes normally distributed errors. Taking logarithms turns the model into an autoregressive process, which allows the parameters to be estimated by maximum likelihood estimation (MLE) or by Bayesian inference. The MLE approach uses a data augmentation strategy via the Expectation-Maximization (EM) algorithm, while the Bayesian framework uses Markov Chain Monte Carlo (MCMC) methods to sample from the posterior distribution, which is analytically intractable.
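The model equation can be illustrated with a short simulation. The sketch below is written under stated assumptions: the errors have mean zero and standard deviation `sigma`, and the observed counts satisfy $Y_t \mid N_t \sim \text{Poisson}(N_t)$, which is one natural reading of "Poisson sampling error" but is not spelled out in this summary. The function name, the parameter values, and the symbol $Y_t$ are hypothetical.

```python
import numpy as np

def simulate_gompertz_poisson(a, b, sigma, n0, T, seed=0):
    """Simulate latent sizes N_t and Poisson counts Y_t (assumed observation model)."""
    rng = np.random.default_rng(seed)
    x = np.empty(T)  # x[t] = log N_t
    x[0] = np.log(n0)
    for t in range(T - 1):
        # Log scale of N_{t+1} = N_t * exp(a + b*log(N_t) + eps_{t+1}):
        # x_{t+1} = a + (1 + b) * x_t + eps_{t+1}, an AR(1) recursion.
        x[t + 1] = a + (1.0 + b) * x[t] + rng.normal(0.0, sigma)
    n = np.exp(x)
    y = rng.poisson(n)  # assumption: Y_t | N_t ~ Poisson(N_t)
    return n, y

# Illustrative values only; they are not taken from the paper.
n, y = simulate_gompertz_poisson(a=0.5, b=-0.1, sigma=0.1, n0=100.0, T=30)
```

Only `y` would be available to the analyst; `n` plays the role of the unobserved population sizes that the EM and MCMC procedures have to integrate over or impute.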

The authors highlight the computational challenges associated with the Poisson sampling error distribution, particularly the lack of a closed-form likelihood function. To address them, they propose a Monte Carlo EM (MCEM) method, which alternates between imputing the missing data and updating the parameter estimates. The section also outlines initialization strategies for the algorithms and discusses the performance of the proposed methods in numerical experiments. These experiments compare the frequentist and Bayesian approaches, assess the impact of prior specifications, and evaluate the algorithms' efficiency against established tools such as Stan. The results indicate that the proposed MCMC sampler is computationally efficient and provides reliable estimates, even under model misspecification.
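The missing closed form can be written out explicitly. The following is a sketch in notation introduced here rather than taken from the paper: $X_t = \log N_t$, $\epsilon_t \sim \mathcal{N}(0, \sigma^2)$, $\theta = (a, b, \sigma^2)$, and an assumed observation model $Y_t \mid X_t \sim \text{Poisson}(e^{X_t})$. The second display is the generic MCEM approximation with $M$ imputed latent paths, not necessarily the authors' exact scheme.

```latex
\begin{align}
L(\theta) &= \int p(x_1 \mid \theta)
  \prod_{t=1}^{T-1} \mathcal{N}\!\left(x_{t+1} \mid a + (1+b)\,x_t,\ \sigma^2\right)
  \prod_{t=1}^{T} \mathrm{Poisson}\!\left(y_t \mid e^{x_t}\right)\, dx_{1:T}, \\
\widehat{Q}\!\left(\theta \mid \theta^{(k)}\right) &= \frac{1}{M} \sum_{m=1}^{M}
  \log p\!\left(y_{1:T},\, x_{1:T}^{(m)} \mid \theta\right),
  \qquad x_{1:T}^{(m)} \sim p\!\left(x_{1:T} \mid y_{1:T},\, \theta^{(k)}\right), \\
\theta^{(k+1)} &= \arg\max_{\theta}\ \widehat{Q}\!\left(\theta \mid \theta^{(k)}\right).
\end{align}
```

The integral in the first line runs over all $T$ latent states and mixes Gaussian and Poisson terms, so it has no closed form. MCEM sidesteps it by replacing the exact E-step expectation with an average over simulated latent paths.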