Development of the generalized ridge estimator for the Poisson-Inverse Gaussian regression model with multicollinearity

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-15334-9
PMID: https://pubmed.ncbi.nlm.nih.gov/40850969
Publication Date: 2025-08-25
Author(s): Fatimah A. Almulhim et al.
Primary Topic: Advanced Statistical Methods and Models

Overview

The Poisson-Inverse Gaussian regression model (PIGRM) is widely used for over-dispersed count data, but its maximum likelihood estimator (MLE) can be unreliable when the explanatory variables are multicollinear. This study introduces the Poisson-Inverse Gaussian generalized ridge estimator (PIGGRE), a biased estimator that uses shrinkage parameters to mitigate the effects of multicollinearity. PIGGRE was evaluated through simulation studies across a range of scenarios, where it consistently outperformed both the MLE and the ordinary Poisson-Inverse Gaussian ridge regression estimator (PIGRRE). The lowest mean squared error (MSE) was obtained with the shrinkage parameters $K_1$ and $K_3$.

Additionally, the practical applicability of PIGGRE was confirmed through analyses of two real datasets, reinforcing the simulation findings and highlighting the estimator’s advantages in addressing multicollinearity issues within PIGRM. While the results advocate for the use of PIGGRE with specific $K$ values, the study notes a critical limitation: the estimator’s performance is sensitive to the choice of these parameters, which can influence the balance between bias and variance. Future research could explore extending this methodology to other models, such as zero-inflated negative binomial and Poisson models, while also considering enhancements through advanced estimation techniques.

Methods

The methodology section describes the construction and estimation of the PIG distribution, a mixture of the Poisson and Inverse Gaussian distributions. Conditional on $ \nu $, the random variable $ Y $ follows a Poisson distribution with mean $ \mu \nu $, where $ \nu $ is drawn from an Inverse Gaussian distribution with mean 1 and dispersion parameter $ \phi $. The conditional probability mass function (PMF) of $ Y $ given $ \nu $ is

\[
f(y|\mu, \nu) = \frac{(\mu \nu)^y}{y!} \exp(-\mu \nu),
\]

and the marginal PMF of $ Y $ is derived through integration over $ \nu $, resulting in the PIG distribution characterized by

\[
P(Y = y|\mu) = \frac{\sqrt{2\phi}}{\pi \mu^y} \exp(\phi) R_s(\alpha) \frac{1}{y!},
\]

where $ s = y - 0.5 $ and $ \alpha^2 = \phi^2(1 + 2\mu\phi) $. The mean and variance of this distribution are $ E(Y) = \mu $ and $ V(Y) = \mu + \mu^3 \phi $, respectively.
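The mixture construction and the quoted moments can be checked by simulation. The sketch below is illustrative only and rests on an assumption: it draws the Poisson rate directly from an Inverse Gaussian distribution with mean $ \mu $ and shape $ 1/\phi $, which is the parameterization that reproduces the quoted variance $ \mu + \mu^3 \phi $ through the law of total variance. The function names `sample_pig` and `pig_moments` are hypothetical.

```python
import numpy as np

def sample_pig(mu, phi, size, rng):
    """Draw PIG counts as a Poisson mixture over an Inverse Gaussian rate.

    Assumption (not taken from the paper): the rate has mean mu and
    shape 1/phi, so that Var(rate) = mu**3 * phi.
    """
    rate = rng.wald(mean=mu, scale=1.0 / phi, size=size)
    return rng.poisson(rate)

def pig_moments(mu, phi):
    """Mean and variance as quoted in the text: E(Y) = mu, V(Y) = mu + mu^3 * phi."""
    return mu, mu + mu**3 * phi

rng = np.random.default_rng(0)
y = sample_pig(mu=2.0, phi=0.5, size=200_000, rng=rng)
print("sample mean, variance:", y.mean(), y.var())
print("quoted mean, variance:", pig_moments(2.0, 0.5))
```

The sample variance exceeds the sample mean by a wide margin, which is the over-dispersion that motivates the PIG model over a plain Poisson model.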

Parameters are estimated by maximum likelihood (MLE). The log-likelihood is written in terms of $ \mu $ and $ \phi $, and its derivatives with respect to the parameters are set to zero. Because the resulting equations are non-linear, they are solved numerically, for example with the Newton-Raphson algorithm or iteratively reweighted least squares (IRLS). The accuracy of the resulting estimator $ \hat{\beta}_{MLE} $ is assessed through its mean squared error matrix (MSEM) and its scalar mean squared error (MSE). Both are derived from the spectral decomposition of the matrix $ X^T \mathbf{W} X $, where $ \mathbf{W} $ is a diagonal weight matrix reflecting the variance structure of the model.
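To see why the spectral decomposition matters, the following sketch computes the trace of $ (X^T \mathbf{W} X)^{-1} $ as the sum of reciprocal eigenvalues, which is the usual asymptotic scalar MSE of the MLE up to a dispersion factor that is omitted here. Several things are assumptions made for illustration: the weights `w` are placeholder positive numbers rather than the PIG working weights, and the design generator, which gives pairwise correlation of about $ \rho^2 $, is a common simulation scheme that may differ from the paper's.

```python
import numpy as np

def collinear_design(n, p, rho, rng):
    """Columns share one common component; pairwise correlation is about rho**2."""
    z = rng.standard_normal((n, p + 1))
    return np.sqrt(1.0 - rho**2) * z[:, :p] + rho * z[:, [p]]

def mle_scalar_mse(X, w):
    """Return trace((X^T W X)^{-1}) via the eigenvalues of X^T W X."""
    XtWX = X.T @ (w[:, None] * X)
    eigvals = np.linalg.eigvalsh(XtWX)  # symmetric matrix, real eigenvalues
    return np.sum(1.0 / eigvals), eigvals

rng = np.random.default_rng(1)
w = rng.uniform(0.5, 1.5, size=200)  # hypothetical positive weights
for rho in (0.5, 0.99):
    X = collinear_design(200, 4, rho, rng)
    mse, eigvals = mle_scalar_mse(X, w)
    print(f"rho={rho}: smallest eigenvalue={eigvals.min():.3f}, sum(1/eig)={mse:.4f}")
```

Strong collinearity pushes the smallest eigenvalues toward zero, and their reciprocals then dominate the sum. This is the mechanism by which multicollinearity inflates the MSE of the MLE.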

Results

The simulation results in Tables 2 through 10 compare the Maximum Likelihood Estimator (MLE), PIGRRE, and PIGGRE under different levels of multicollinearity, dispersion, and sample size. An increase in multicollinearity (denoted by $\rho$), in the dispersion parameter ($\phi$), or in the number of explanatory variables ($p$) raises the Mean Squared Error (MSE) of every estimator, so all of them perform worse under these conditions. Conversely, larger sample sizes ($n$) are associated with lower MSE values and therefore with more reliable estimates.

The PIGRRE consistently outperformed the MLE, yielding lower MSE values across all scenarios. The PIGGRE performed best overall, achieving the lowest MSE values regardless of the values of $\phi$, $\rho$, $p$, and $n$. It was particularly effective under high multicollinearity and high dispersion, the settings in which traditional methods are most likely to fail. Figures 1a-d illustrate these findings by plotting the MSEs of the estimators across the simulation settings. PIGGRE again performs best, especially with the ridge parameters $K_1$ and $K_4$.

Discussion

In this section, the authors discuss the development and evaluation of the Poisson-Inverse Gaussian Generalized Ridge Estimator (PIGGRE) as an enhancement of the existing Poisson-Inverse Gaussian Ridge Regression Estimator (PIGRRE). The PIGGRE assigns a separate shrinkage parameter to each regression coefficient, which gives greater flexibility in handling multicollinearity in Generalized Linear Models (GLMs). The mathematical formulation of the PIGGRE is presented, including its bias and covariance structures, which are needed to compare its performance with that of the Maximum Likelihood Estimator (MLE) and the PIGRRE. The authors establish conditions under which PIGGRE outperforms these alternatives and support them with theoretical proofs and empirical simulations.
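The difference between a single shrinkage parameter and one parameter per coefficient can be sketched as follows. The formulas are the standard ridge and generalized ridge forms from the GLM ridge literature, written in terms of the eigen-decomposition $ X^T \mathbf{W} X = Q \Lambda Q^T $. It is an assumption that the paper's PIGRRE and PIGGRE take exactly these forms. The function names and the numbers in the example are hypothetical.

```python
import numpy as np

def ridge_estimate(XtWX, beta_mle, k):
    """Ordinary ridge: (X^T W X + k I)^{-1} X^T W X beta_mle, one scalar k."""
    p = XtWX.shape[0]
    return np.linalg.solve(XtWX + k * np.eye(p), XtWX @ beta_mle)

def generalized_ridge_estimate(XtWX, beta_mle, k_vec):
    """Generalized ridge: shrink each canonical coefficient by its own k_j."""
    lam, Q = np.linalg.eigh(XtWX)      # XtWX = Q diag(lam) Q^T
    alpha = Q.T @ beta_mle             # coefficients in canonical coordinates
    shrunk = lam / (lam + np.asarray(k_vec, dtype=float)) * alpha
    return Q @ shrunk

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 3))
XtWX = A.T @ A                          # stand-in for X^T W X
beta_mle = np.array([1.0, -2.0, 0.5])   # made-up MLE values
print(ridge_estimate(XtWX, beta_mle, 0.3))
print(generalized_ridge_estimate(XtWX, beta_mle, [0.1, 0.3, 2.0]))
```

When every entry of `k_vec` is the same value, the generalized estimator reduces to the ordinary ridge estimator, and with all entries equal to zero it returns the MLE unchanged.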

The performance of the PIGGRE is validated through Monte Carlo simulations and applications to real-world datasets, including analyses of citation data in evolutionary biology and athlete performance metrics. Results indicate that PIGGRE consistently yields lower Mean Squared Errors (MSE) compared to MLE and PIGRRE, particularly when employing specific biasing parameters. The findings underscore the efficacy of PIGGRE in mitigating the adverse effects of multicollinearity, while also highlighting the importance of selecting appropriate shrinkage parameters to optimize estimator performance. The authors recommend further exploration of PIGGRE’s application to other models, acknowledging its sensitivity to parameter choices and suggesting avenues for future research to enhance its robustness.
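The sensitivity to the shrinkage parameter can be illustrated for a single canonical coefficient. Under the standard generalized ridge form with unit dispersion (an assumption, not a formula quoted from the paper), the MSE contribution of coefficient $ \alpha_j $ with eigenvalue $ \lambda_j $ is $ \lambda_j/(\lambda_j + k_j)^2 + k_j^2 \alpha_j^2/(\lambda_j + k_j)^2 $. The first term is variance and the second is squared bias. The values of `lam` and `alpha` below are made up, and the minimizer $ k_j = 1/\alpha_j^2 $ is the classical result for this expression, not necessarily one of the paper's $K$ parameters.

```python
import numpy as np

def component_mse(lam, alpha, k):
    """Variance plus squared bias of one shrunken canonical coefficient."""
    variance = lam / (lam + k) ** 2
    bias_sq = (k * alpha) ** 2 / (lam + k) ** 2
    return variance + bias_sq

lam, alpha = 0.05, 1.5                  # small eigenvalue: strong collinearity
k_grid = np.linspace(0.0, 5.0, 501)
mse = component_mse(lam, alpha, k_grid)
k_best = k_grid[np.argmin(mse)]
print("grid minimizer:", k_best, "analytic 1/alpha^2:", 1.0 / alpha**2)
print("MSE at k=0:", mse[0], "MSE at grid minimizer:", mse.min())
```

Too little shrinkage leaves the variance term large, while too much lets the bias term take over, so a poorly chosen parameter can forfeit most of the gain over the MLE.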