COMPARATIVE ANALYSIS OF RIDGE AND PRINCIPAL COMPONENT REGRESSION IN ADDRESSING MULTICOLLINEARITY

Journal: FUDMA Journal of Sciences, Volume: 9, Issue: 1
DOI: https://doi.org/10.33003/fjs-2025-0901-2981
Publication Date: 2025-01-31
Author(s): Kingsley Chinedu Arum et al.
Primary Topic: Advanced Statistical Methods and Models

Overview

The paper addresses multicollinearity in multiple linear regression models (MLRM): when two or more regressors are highly correlated, coefficient estimates become inefficient and model performance suffers. The study compares two estimators, the ridge estimator (RE) and the principal components estimator (PCE), in mitigating these effects. Using secondary data from the World Bank, the International Monetary Fund (IMF), and the Nigerian Debt Management Office, the researchers established the presence of multicollinearity through a correlation matrix and variance inflation factors (VIF).

To evaluate the performance of the two methods, ridge regression and principal components regression were applied to the dataset, and their mean squared errors (MSE) were calculated as a measure of model performance. The findings indicate that while both methods effectively addressed multicollinearity, the ridge estimator outperformed the principal components estimator, exhibiting a smaller mean squared error. This suggests that ridge regression may be a more reliable approach for handling multicollinearity in MLRM.

Introduction

The introduction of the research paper outlines the significance of regression analysis in understanding the relationships among variables across various fields, including engineering and social sciences. It emphasizes the utility of linear regression models (LRM) in estimating relationships between a response variable ($y$) and one or more explanatory variables, with coefficients typically estimated using the least squares method. However, the paper highlights the challenge of multicollinearity, a condition where explanatory variables are highly correlated, which can lead to unstable coefficient estimates and reduced model reliability. The authors note that multicollinearity can arise from various sources, including data collection methods and model specification issues.

To detect multicollinearity, the paper discusses several diagnostic tools: the correlation matrix, the variance inflation factor (VIF), eigenvalues, and condition numbers. The study aims to compare the effectiveness of the ridge estimator (RE) and the principal components estimator (PCE) in addressing multicollinearity within financial datasets, using the mean squared error (MSE) as the performance criterion. The authors cite previous research showing that ridge-type estimators outperform ordinary least squares (OLS) in the presence of multicollinearity, and they note the development of newer estimators that combine several approaches to improve robustness. The overarching goal is to identify the most suitable estimator for fitting regression models when multicollinearity is present, thereby contributing to more reliable statistical analysis in diverse applications.
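The diagnostics listed above can be illustrated with a minimal sketch. It assumes NumPy and uses synthetic data rather than the paper's dataset; the helper names `vif` and `condition_number` are hypothetical, and the cut-off of 10 mentioned afterwards is a common rule of thumb, not a threshold taken from this summary.

```python
import numpy as np

def vif(X):
    """Variance inflation factors: the diagonal of the inverse correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))

def condition_number(X):
    """Square root of the largest-to-smallest eigenvalue ratio of the correlation matrix."""
    eig = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
    return float(np.sqrt(eig.max() / eig.min()))

# Synthetic regressors (assumption): x2 is almost a copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

vifs = vif(X)
kappa = condition_number(X)
print("VIF:", np.round(vifs, 1))
print("condition number:", round(kappa, 1))
```

In this toy setup the two near-duplicate columns produce large VIF values while the unrelated column stays close to 1; VIF values above roughly 10 are often read as a sign of problematic collinearity.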

Methods

In this section, the authors outline the methodology used to analyze the effect of multiple regressors on a response variable using a multiple linear regression model (MLRM). The model is $Y = X\beta + \epsilon$, where $Y$ is an $n \times 1$ vector of responses, $X$ is an $n \times p$ matrix of regressors, $\beta$ is a $p \times 1$ vector of unknown coefficients, and $\epsilon$ is an $n \times 1$ vector of random errors with expected value $E(\epsilon) = 0$ and variance $\operatorname{Var}(\epsilon) = \sigma^2 I_n$, where $I_n$ is the $n \times n$ identity matrix.

The unknown coefficients $\beta$ are estimated using the least squares estimator (LSE), given by $\hat{\beta} = (X'X)^{-1}X'Y$. However, the authors note that the LSE is adversely affected by multicollinearity among the explanatory variables: as the columns of $X$ approach linear dependence, $X'X$ becomes nearly singular and the variances of the estimated coefficients inflate. To mitigate this issue, they employ ridge regression and principal components regression as alternative estimation methods. The research aims to compare the effectiveness of these two methods in the context of financial data analysis, addressing the limitations that multicollinearity imposes on the traditional LSE.
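For reference, the two alternative estimators are commonly written in the following textbook forms. The summary does not reproduce the paper's own notation, so the ridge parameter $k$, the eigenvector matrix $T_r$, and the number of retained components $r$ are assumed symbols here.

$$\hat{\beta}_{RE} = (X'X + kI_p)^{-1}X'Y, \qquad k > 0$$

$$\hat{\beta}_{PCE} = T_r\,(T_r'X'XT_r)^{-1}\,T_r'X'Y$$

Here $I_p$ is the $p \times p$ identity matrix and $T_r$ is the $p \times r$ matrix whose columns are the eigenvectors of $X'X$ associated with its $r$ largest eigenvalues. Setting $k = 0$ in the first expression, or $r = p$ in the second, recovers the LSE whenever $X'X$ is invertible.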

Discussion

This section details the analysis of how various economic indicators affect aggregate investment, using ridge regression and principal components regression (PCR). The study uses secondary data spanning 51 years (1972-2022) from reputable sources, including the World Bank and the International Monetary Fund. The data were standardized to ensure comparability among the predictor variables, which are real GDP, net exports, interest rates, money supply, inflation rates, external reserves, exchange rates, and debt. The analysis was conducted using SPSS for PCR and R for ridge regression, with descriptive statistics and correlation matrices employed to assess multicollinearity among the predictors.

The findings indicate substantial multicollinearity, particularly between real GDP and several other predictors, which motivates the use of ridge regression to mitigate its effects. The ridge regression estimator, which introduces bias in order to reduce variance and improve the mean squared error (MSE), outperformed the PCR approach, achieving an MSE of 0.3777 compared with 0.571 for PCR. The study identifies three principal components that explain approximately 88% of the variance in the independent variables, categorized as growth indices, monetary policies, and rates. Ultimately, the ridge regression model is favored for its superior performance in addressing multicollinearity, providing a robust framework for understanding the dynamics of aggregate investment in the context of the analyzed economic indicators.
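The kind of comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' SPSS or R workflow: the data are synthetic, the ridge parameter `k = 1.0` and the choice of `r = 3` components are arbitrary, the function names are hypothetical, and the MSE is computed as in-sample prediction error because the summary does not say how the paper computed it.

```python
import numpy as np

def standardize(A):
    """Center each column and scale it to unit sample standard deviation."""
    return (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)

def ridge_fit(X, y, k):
    """Ridge estimator (X'X + kI)^{-1} X'y; k = 0 gives least squares."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def pcr_fit(X, y, r):
    """Regress y on the first r principal components, then map back to X's scale."""
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    T_r = eigvecs[:, order[:r]]                  # p x r loading matrix
    Z = X @ T_r                                  # component scores
    alpha = np.linalg.solve(Z.T @ Z, Z.T @ y)    # coefficients on the components
    explained = eigvals[order[:r]].sum() / eigvals.sum()
    return T_r @ alpha, float(explained)

def mse(X, y, beta):
    """In-sample mean squared prediction error."""
    return float(np.mean((y - X @ beta) ** 2))

# Synthetic data (assumption): 51 observations, two nearly collinear regressors.
rng = np.random.default_rng(42)
n = 51
base = rng.normal(size=n)
X_raw = np.column_stack([
    base + 0.1 * rng.normal(size=n),
    base + 0.1 * rng.normal(size=n),
    rng.normal(size=n),
    rng.normal(size=n),
])
y_raw = X_raw @ np.array([1.0, 0.5, -0.8, 0.3]) + rng.normal(size=n)

X = standardize(X_raw)
y = standardize(y_raw)

beta_ols = ridge_fit(X, y, k=0.0)
beta_ridge = ridge_fit(X, y, k=1.0)
beta_pcr, explained = pcr_fit(X, y, r=3)

for name, b in [("OLS", beta_ols), ("ridge", beta_ridge), ("PCR", beta_pcr)]:
    print(f"{name:5s} MSE = {mse(X, y, b):.4f}")
print(f"variance explained by 3 components: {explained:.1%}")
```

In-sample error always favors least squares, so a comparison like the paper's only becomes informative when the criterion reflects estimator accuracy or out-of-sample error; the sketch shows the mechanics of the two estimators rather than reproducing the reported figures.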