Exploratory structural equation modeling and the curse of dimensionality

Journal: Behavior Research Methods, Volume: 58, Issue: 3
DOI: https://doi.org/10.3758/s13428-026-02960-y
PMID: https://pubmed.ncbi.nlm.nih.gov/41814057
Publication Date: 2026-03-11
Author(s): Trà T. Lê et al.
Primary Topic: Psychometric Methodologies and Testing

Overview

The paper addresses a problem that is increasingly common in the behavioral sciences: high-dimensional data combined with limited sample sizes, which strain traditional latent-variable methods. These conventional methods often yield unstable solutions when the number of variables exceeds the sample size. To overcome this limitation, the authors propose a two-stage regularized approach to exploratory structural equation modeling (ESEM).

In the first stage, a novel exploratory approximate factor analysis technique estimates both the measurement model and the factor scores. It uses regularization, specifically either a LASSO penalty or cardinality constraints, to resolve the indeterminacy of the measurement model by imposing a simple structure. In the second stage, the estimated factor scores are used to estimate the structural model. Extensive simulations show that the proposed method recovers the underlying simple structure of the measurement model more accurately than existing approaches across varying dimensions and sample sizes. The method is further illustrated on two empirical datasets, and an R implementation is publicly available on GitHub.
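To make the two regularization strategies concrete, the sketch below contrasts them on a small loading matrix. It is an illustration only, not the authors' R implementation: the function names `soft_threshold` and `keep_largest`, the example matrix, and the tuning values are all hypothetical. The LASSO-style operator shrinks every loading toward zero, while the cardinality-style operator fixes the number of nonzero loadings and leaves the retained ones unshrunk.

```python
def soft_threshold(loadings, lam):
    """LASSO-style operator: shrink each loading toward zero by lam;
    loadings whose absolute value is at most lam become exactly zero."""
    shrunk = []
    for row in loadings:
        new_row = []
        for v in row:
            if abs(v) > lam:
                new_row.append(v - lam if v > 0 else v + lam)
            else:
                new_row.append(0.0)
        shrunk.append(new_row)
    return shrunk


def keep_largest(loadings, k):
    """Cardinality-style operator: keep the k loadings with the largest
    absolute value (unshrunk) and set all other loadings to zero."""
    cells = [(i, j) for i, row in enumerate(loadings) for j in range(len(row))]
    cells.sort(key=lambda ij: abs(loadings[ij[0]][ij[1]]), reverse=True)
    kept = set(cells[:max(k, 0)])
    return [
        [v if (i, j) in kept else 0.0 for j, v in enumerate(row)]
        for i, row in enumerate(loadings)
    ]


# Hypothetical 4-item, 2-factor loading matrix with small cross-loadings.
P = [[0.80, 0.10], [0.70, -0.05], [0.15, 0.90], [-0.20, 0.60]]

lasso_P = soft_threshold(P, 0.2)   # sparse, but retained loadings are shrunk
sparse_P = keep_largest(P, 4)      # exactly 4 nonzero loadings, not shrunk

for lasso_row, card_row in zip(lasso_P, sparse_P):
    print(lasso_row, card_row)
```

In this toy case both operators recover the same simple structure (each item loads on one factor), but only the cardinality-style result keeps the original magnitudes of the retained loadings.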

Methods

The authors outline methods for high-dimensional, low-sample-size (HDLSS) settings, with the aim of deriving interpretable factor structures and exploring relationships among factors and observed variables. They sort existing related methods into three groups: penalized maximum likelihood (ML) methods, penalized least-squares methods, and Bayesian methods, and summarize their key features in Table 3. Notable penalized ML approaches include RegSEM, lslx, and PSEM-Mplus, which use different optimization algorithms and penalty strategies to stabilize estimation and improve interpretability. However, these methods were not designed specifically for HDLSS problems and have been evaluated mainly in low-dimensional contexts.

The authors also discuss Structured Factor Analysis (SFA) and Regularized Integrated Generalized Structured Component Analysis (Regularized IGSCA), both of which incorporate penalties to address problems such as multicollinearity and improper solutions in traditional covariance-based SEM. The Bayesian framework is mentioned as a viable alternative for complex SEM models, although it raises challenges of prior specification and computational cost. The authors emphasize that their method differs from these approaches in combining a two-stage, least-squares-based framework with regularization through a LASSO penalty or cardinality constraints, which allows for exploratory measurement models and scales to large datasets. The strength of the imposed sparsity is chosen with a data-driven criterion, the Index of Sparseness, which favors an interpretable loading matrix.

Results

In the simulation study, RegSEM encountered serious convergence problems and failed to yield reliable results even in the simplest scenarios, those with no cross-loadings and high reliability. The authors therefore excluded the RegSEM results from the overall analysis. Readers who want to explore the simulation outcomes further can use an interactive R Shiny application (Le et al., 2025).

Discussion

The authors discuss the proposed structural equation model (SEM) for high-dimensional data in terms of its measurement and structural components. The measurement model is defined by the equation \( \mathbf{y} = \mathbf{P} \boldsymbol{\eta} + \boldsymbol{\epsilon} \), where \( \mathbf{P} \) is the loading matrix, \( \boldsymbol{\eta} \) represents the factor scores, and \( \boldsymbol{\epsilon} \) denotes the residuals. The authors adopt the approximate factor model, which allows weak correlations among residuals, a necessary relaxation in high-dimensional settings where traditional assumptions may not hold. The measurement model is estimated by least squares with a LASSO penalty that enforces sparsity in the loading matrix, which makes the factor structure both interpretable and unique.
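The penalized least-squares problem described above can be written out as follows. This is a sketch under stated assumptions rather than the authors' exact formulation: the data matrix \( \mathbf{Y} \) (observations in rows), the factor-score matrix \( \mathbf{H} \) (one row of scores \( \boldsymbol{\eta}^{\top} \) per observation), the tuning parameter \( \lambda \), and the orthonormality condition \( \mathbf{H}^{\top}\mathbf{H} = \mathbf{I} \) are notation and assumptions introduced here for illustration. Stacking the measurement model over observations gives \( \mathbf{Y} = \mathbf{H}\mathbf{P}^{\top} + \mathbf{E} \), and a LASSO-penalized estimate of the loadings for fixed \( \mathbf{H} \) solves

\[
\min_{\mathbf{P}} \; \lVert \mathbf{Y} - \mathbf{H}\mathbf{P}^{\top} \rVert_F^2 + \lambda \sum_{j,k} \lvert p_{jk} \rvert .
\]

If \( \mathbf{H}^{\top}\mathbf{H} = \mathbf{I} \), the fit term expands to

\[
\lVert \mathbf{Y} - \mathbf{H}\mathbf{P}^{\top} \rVert_F^2
= \lVert \mathbf{Y} \rVert_F^2 - 2 \sum_{j,k} a_{jk}\, p_{jk} + \sum_{j,k} p_{jk}^2 ,
\qquad a_{jk} = \left( \mathbf{Y}^{\top}\mathbf{H} \right)_{jk} ,
\]

so the problem separates over the individual loadings, and each one has the closed-form soft-thresholding solution

\[
\hat{p}_{jk} = \operatorname{sign}(a_{jk}) \, \max\!\left( \lvert a_{jk} \rvert - \tfrac{\lambda}{2},\; 0 \right).
\]

Loadings with \( \lvert a_{jk} \rvert \le \lambda/2 \) are set exactly to zero, which is how the penalty produces a sparse, simple-structure loading matrix.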

The authors then describe the estimation of the structural model, \( \boldsymbol{\eta} = \mathbf{B} \boldsymbol{\eta} + \boldsymbol{\zeta} \), where \( \mathbf{B} \) contains the path coefficients and \( \boldsymbol{\zeta} \) denotes the structural residuals. They propose a sequential model selection strategy that first determines the number of factors and then the tuning parameter of the LASSO penalty, both based on the Index of Sparseness (IS), which balances model fit against complexity. A simulation study compares the proposed methods with existing approaches and shows that the cardinality-constrained method consistently outperforms the other methods in recovering the true loading structure, particularly in low-dimensional settings. Both proposed methods yield reliable estimates, but the cardinality-constrained approach performs better in terms of recovery rates and bias, especially under challenging conditions such as small sample sizes and high item correlations.
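As an illustration of the second stage, the sketch below estimates the path coefficients of one structural equation by ordinary least squares on already-estimated factor scores. It rests on assumptions not taken from the paper: the structural model is recursive, the factor scores are centered (so no intercept is needed), and each equation is fitted separately by OLS. The helper names `solve` and `path_coefficients` and the toy scores are hypothetical.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination
    with partial pivoting (a is a list of rows, b a list of values)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def path_coefficients(scores, outcome, predictors):
    """OLS estimate of one row of B: regress the factor-score column
    `outcome` on the columns listed in `predictors` (centered scores)."""
    xtx = [[sum(row[p] * row[q] for row in scores) for q in predictors]
           for p in predictors]
    xty = [sum(row[p] * row[outcome] for row in scores) for p in predictors]
    return solve(xtx, xty)


# Toy centered factor scores for three factors (columns 0, 1, 2).
# Column 2 was constructed as 0.5 * column 0 - 0.25 * column 1.
eta = [
    [-2.0, 1.0, -1.25],
    [-1.0, -2.0, 0.0],
    [0.0, 0.0, 0.0],
    [1.0, 2.0, 0.0],
    [2.0, -1.0, 1.25],
]

b = path_coefficients(eta, outcome=2, predictors=[0, 1])
print(b)
```

Because the toy outcome has no residual, the estimated coefficients match the values used to construct it (0.5 and -0.25, up to floating-point error). With real factor scores the residual \( \boldsymbol{\zeta} \) would be nonzero and the estimates would carry sampling error.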