Semi-parametric bulk and tail regression using spline-based neural networks

Journal: Extremes
DOI: https://doi.org/10.1007/s10687-026-00533-y
Publication Date: 2026-02-25
Author(s): Reetam Majumder et al.
Primary Topic: Statistical Methods and Inference

Overview

The paper presents a novel approach to density regression, semi-parametric quantile regression for extremes (SPQRx), which integrates semi-parametric quantile regression (SPQR) with a blended generalized Pareto (GP) distribution. SPQR models conditional density functions flexibly using spline representations and neural networks, but it struggles with heavy-tailed data and with extrapolation beyond the observed range because it does not comply with extreme value theory (EVT). The blended GP distribution addresses these limitations by providing a model for the full range of the data that transitions smoothly to exact GP upper tails without requiring intermediate threshold selection.

SPQRx extends SPQR by ensuring compliance with EVT, which improves the reliability of tail extrapolation. The model is also interpretable through model-agnostic variable importance scores, which assess how much each covariate contributes to both the bulk and the tail of the conditional density. The effectiveness of SPQRx is demonstrated through simulations and an application to U.S. wildfire burnt areas from 1990 to 2020.

Introduction

The introduction discusses the significance of extreme quantile regression models in fields including econometrics, insurance, and natural hazard modeling. These models typically rely on a parametric assumption that excesses of a response variable $Y$ above a threshold $u(x)$ follow a generalized Pareto (GP) distribution with covariate-dependent shape and scale parameters $\xi(x)$ and $\sigma_u(x)$. However, the choice of the threshold $u(x)$ introduces uncertainty and complicates model interpretation. To address this, the authors propose the blended GP (bGP) distribution, which models the entire range of the data without requiring threshold estimation while maintaining exact GP upper tails.
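For reference, the GP assumption described above can be written out in its standard textbook form. The exceedance probability $\zeta_u(x) = \Pr\{Y > u(x) \mid X = x\}$ is notation introduced here for illustration and does not come from the paper.

$$
\Pr\{Y - u(x) \le y \mid Y > u(x),\, X = x\} \approx 1 - \left(1 + \frac{\xi(x)\, y}{\sigma_u(x)}\right)_+^{-1/\xi(x)}, \qquad y \ge 0,
$$

with the limit $1 - \exp\{-y/\sigma_u(x)\}$ as $\xi(x) \to 0$. Inverting this expression gives the extreme conditional quantile at level $\tau > 1 - \zeta_u(x)$ for $\xi(x) \neq 0$:

$$
Q(\tau \mid x) = u(x) + \frac{\sigma_u(x)}{\xi(x)} \left[ \left( \frac{1 - \tau}{\zeta_u(x)} \right)^{-\xi(x)} - 1 \right].
$$

Every term on the right depends on the chosen threshold $u(x)$, which is why threshold uncertainty carries over directly to the estimated quantiles.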

The paper introduces a novel framework termed semi-parametric quantile regression for extremes (SPQRx), which combines a deep GP regression model for the upper tails with a semi-parametric quantile regression model for the bulk of the distribution. This approach leverages the flexibility of the semi-parametric quantile regression (SPQR) model, which has previously shown promise for spatial extremal processes. The contributions of this work are the introduction of the bGP distribution, the development of the SPQRx framework for density regression over the full range of the response, and improved model interpretability through accumulated local effects (ALEs). The subsequent sections of the paper elaborate on the theoretical background, methodology, and applications of the proposed models.

Results

The results in Table 1 of the simulation study indicate that the SPQRx model with the sigmoid activation function consistently outperforms both standard SPQR and the deep GP regression model, as measured by the integrated Wasserstein distance (IWD) and the tail IWD (tIWD). In particular, SPQRx is better at modeling heavy-tailed conditional densities. The optimal hyper-parameters show that a lower value of $c_1$ is advantageous in small-sample settings, while a higher value is preferred with more data. The preferred blending quantile levels for SPQRx are $p_a = 0.9$ and $p_b = 0.99$ when $n = 1000$, which allows the constituent distribution functions to be blended effectively.
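To make the evaluation metric more concrete, here is a minimal sketch of a 1-Wasserstein distance computed from two quantile functions, with an optional restriction to upper quantile levels. The exact definitions of IWD and tIWD are not given in this summary, so the function name `wasserstein1`, the `lower` cutoff, and the toy exponential example are all illustrative assumptions. The "integrated" part presumably averages such a distance over covariate values, which the sketch leaves out.

```python
import numpy as np

def wasserstein1(q_true, q_est, lower=0.0, n_grid=100_000):
    """Approximate the integral of |Q_true(tau) - Q_est(tau)| over tau in (lower, 1)."""
    # A midpoint grid avoids evaluating the quantile functions at tau = 1.
    tau = lower + (np.arange(n_grid) + 0.5) * (1.0 - lower) / n_grid
    gap = np.abs(q_true(tau) - q_est(tau))
    return gap.mean() * (1.0 - lower)

# Toy check: Exp(1) truth against an estimate whose quantiles are twice as large.
q_true = lambda tau: -np.log1p(-tau)
q_est = lambda tau: -2.0 * np.log1p(-tau)

full = wasserstein1(q_true, q_est)              # whole distribution
tail = wasserstein1(q_true, q_est, lower=0.99)  # upper 1% of quantile levels only
```

In this toy case the full distance equals the mean of an Exp(1) variable, and the tail version isolates the part of the discrepancy that comes from the largest quantiles.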

Further analysis, illustrated in Figures 3 and 4, shows that SPQRx captures the upper tails of the true log-normal distribution more effectively than SPQR, which fails to provide useful estimates for conditional distributions where $y > 1$. Additionally, variable importance scores derived from the SPQRx model reveal that covariates $X_1$ and $X_2$ significantly influence the conditional distribution across all quantile levels, while $X_3$ has no effect. Subsequent simulation studies in higher dimensions confirm that SPQRx consistently outperforms SPQR, particularly in heavy-tailed scenarios, although it encounters challenges in bounded upper-tail conditions. The application of these findings to U.S. wildfire burnt areas underscores the relevance of SPQRx in analyzing heavy-tailed data distributions.

Discussion

In this section, the authors discuss the semi-parametric quantile regression (SPQR) framework, introduced by Xu and Reich (2021), which allows for flexible conditional density estimation without parametric assumptions about the underlying distribution. The SPQR model represents the conditional density of a response variable $Y$ given covariates $X$ as a convex combination of M-spline basis functions. The weights of this combination are modeled as functions of the covariates by a multi-layer perceptron (MLP), which facilitates efficient inference and prediction. The authors also highlight the use of accumulated local effects (ALEs) to assess the relative importance of covariates for specific quantiles of the response distribution, which quantifies how sensitive the quantile function is to changes in each covariate.
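The SPQR construction described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: the response is assumed to be scaled to $[0, 1]$, the network is untrained with random weights, and the names `mspline_basis` and `mlp_weights`, the knot placement, and the layer sizes are hypothetical. The softmax output layer is one way to keep the weights on the simplex so that the combination stays convex.

```python
import numpy as np

def mspline_basis(y, knots, order):
    """Evaluate every M-spline basis function of the given order at the points y."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Order 1: normalised indicators of the non-empty knot intervals.
    M = np.zeros((y.size, len(t) - 1))
    for i in range(len(t) - 1):
        width = t[i + 1] - t[i]
        if width > 0:
            M[(y >= t[i]) & (y < t[i + 1]), i] = 1.0 / width
    # Raise the order one step at a time with the M-spline recursion.
    for k in range(2, order + 1):
        M_next = np.zeros((y.size, len(t) - k))
        for i in range(len(t) - k):
            span = t[i + k] - t[i]
            if span > 0:
                M_next[:, i] = k * ((y - t[i]) * M[:, i]
                                    + (t[i + k] - y) * M[:, i + 1]) / ((k - 1) * span)
        M = M_next
    return M  # shape (len(y), len(knots) - order); each column integrates to 1

def mlp_weights(x, W1, b1, W2, b2):
    """One hidden sigmoid layer followed by a softmax, so the weights lie on the simplex."""
    hidden = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))
    logits = hidden @ W2 + b2
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    expd = np.exp(logits)
    return expd / expd.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
order = 3
knots = np.r_[[0.0] * order, [0.25, 0.5, 0.75], [1.0] * order]
n_basis = len(knots) - order  # 6 basis functions

# Untrained, randomly initialised network: 3 covariates -> 8 hidden units -> 6 weights.
W1, b1 = rng.normal(size=(3, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, n_basis)), np.zeros(n_basis)

x = np.array([0.2, -1.0, 0.5])
w = mlp_weights(x, W1, b1, W2, b2)

y_grid = (np.arange(2000) + 0.5) / 2000            # midpoints of [0, 1]
density = mspline_basis(y_grid, knots, order) @ w  # f(y | x) = sum_k w_k(x) M_k(y)
integral = density.mean()                          # midpoint-rule integral over [0, 1]
```

Because each M-spline integrates to one and the weights sum to one, the combination is a valid density for any covariate value without further normalisation. In practice the network parameters would be fitted by maximising the likelihood.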

Furthermore, the section introduces the blended generalized Pareto distribution (bGP) and its application in semi-parametric quantile regression for extremes (SPQRx). This approach combines the SPQR framework with a generalized Pareto regression model, allowing for flexible modeling of both the bulk and tail behaviors of the distribution. The authors describe the construction of the bGP distribution, which blends a bulk distribution with the upper tail of a generalized Pareto distribution, ensuring continuity and appropriate support. The SPQRx model is designed to accommodate covariate effects on both the body and tail of the distribution, enabling a nuanced assessment of relative variable importance across different quantile levels. The section concludes with a discussion on the inference process for SPQRx, emphasizing the importance of model regularization and the potential challenges associated with negative shape parameters in the generalized Pareto distribution.
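The blending idea can be illustrated with a toy example. The summary does not give the authors' blending function or the role of $c_1$, so everything specific below is an assumption made for illustration: the geometric weighting of the two distribution functions, the hypothetical `smoothstep` weight, the Exp(1) bulk, and the choice of scale so that the GP tail matches the bulk at both blending points $a$ and $b$ (the bulk quantiles at levels $p_a$ and $p_b$).

```python
import numpy as np

def gp_cdf(z, sigma, xi):
    """Generalized Pareto CDF for excesses z >= 0 with shape xi > 0."""
    z = np.maximum(z, 0.0)
    return 1.0 - (1.0 + xi * z / sigma) ** (-1.0 / xi)

def smoothstep(s):
    """Hypothetical weight: rises smoothly from 0 to 1 as s goes from 0 to 1."""
    s = np.clip(s, 0.0, 1.0)
    return s * s * (3.0 - 2.0 * s)

def blended_cdf(y, bulk_cdf, a, b, p_a, sigma, xi):
    """Geometric blend: bulk below a, GP tail above b, weighted mix in between."""
    w = smoothstep((y - a) / (b - a))
    tail = p_a + (1.0 - p_a) * gp_cdf(y - a, sigma, xi)
    return bulk_cdf(y) ** (1.0 - w) * tail ** w

# Toy setting: Exp(1) bulk, blending between its 0.9 and 0.99 quantiles.
p_a, p_b, xi = 0.9, 0.99, 0.2
bulk_cdf = lambda y: 1.0 - np.exp(-y)
a, b = -np.log(1.0 - p_a), -np.log(1.0 - p_b)
# Pick sigma so the GP tail also passes through the point (b, p_b).
sigma = xi * (b - a) / (((1.0 - p_b) / (1.0 - p_a)) ** (-xi) - 1.0)

y = np.linspace(0.0, 10.0, 2001)
H = blended_cdf(y, bulk_cdf, a, b, p_a, sigma, xi)
```

Below $a$ the blended function coincides with the bulk, and above $b$ it is exactly the GP tail, so no single threshold has to be selected. In this toy setting the result is non-decreasing, but for other component choices monotonicity in the blending zone would need to be checked.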