Multiple imputation and pooling strategies for handling wide-format missing data in latent growth curve modeling

Journal: Frontiers in Psychology, Volume: 17
DOI: https://doi.org/10.3389/fpsyg.2026.1614844
PMID: https://pubmed.ncbi.nlm.nih.gov/41929795
Publication Date: 2026-03-18
Author(s): Fan Jia et al.
Primary Topic: Psychometric Methodologies and Testing

Overview

In this article, the authors investigate the challenges that missing data pose for latent growth curve modeling (LGCM), a common method in psychological research. They compare the dominant full information maximum likelihood (FIML) approach with several multiple imputation (MI) methods: two standard wide-format methods (MI-wEMB and MI-wFCS) and two multilevel long-format methods (MI-llMLM and MI-lqMLM). In a simulation study spanning various missing data conditions and model specifications, FIML generally yields unbiased estimates and accurate confidence intervals (CIs), and most MI methods perform comparably across many scenarios. The proportion of missing data strongly influences how well these methods perform.
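"Unbiased estimates" and "accurate CIs" refer to standard simulation criteria. As a minimal sketch, assuming each replication produces one point estimate and one (lower, upper) interval for a parameter whose population value is known, relative bias and CI coverage can be computed as below. The function names and example values are hypothetical, not taken from the paper.

```python
from statistics import mean


def relative_bias(estimates, true_value):
    """Relative bias: (average estimate - true value) / true value."""
    return (mean(estimates) - true_value) / true_value


def ci_coverage(intervals, true_value):
    """Share of (lower, upper) intervals that contain the true value."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return hits / len(intervals)


# Hypothetical replication results for a parameter whose population value is 0.5.
estimates = [0.48, 0.52, 0.50, 0.54]
intervals = [(0.40, 0.60), (0.45, 0.55), (0.51, 0.70)]
print(relative_bias(estimates, 0.5))  # about 0.02
print(ci_coverage(intervals, 0.5))    # 2 of the 3 intervals cover 0.5
```

Coverage near the nominal level (for example 0.95 for 95% intervals) indicates accurate CIs; coverage well below it usually signals biased estimates or underestimated standard errors.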

The study also examines how different pooling strategies affect the likelihood ratio test (LRT) statistic: the strategies perform similarly, but the long-format MI methods exhibit conservative Type I error rates. When the analysis model is misspecified, FIML has the highest statistical power, followed by MI-wEMB, MI-wFCS, and MI-lqMLM, with sample size and the proportion of missing data being the critical factors affecting power. These findings help researchers select appropriate methods for handling missing data in LGCM.
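One widely used way to pool an LRT statistic across m imputed datasets is the D2 statistic of Li, Meng, Raghunathan, and Rubin (1991), which needs only the m chi-square values and their degrees of freedom. The sketch below uses that strategy purely for illustration; it is not necessarily one of the pooling strategies compared in the paper, and the function name is hypothetical. The pooled value is referred to an F distribution with `df` and `v2` degrees of freedom (the p-value itself would come from a statistics library).

```python
import math


def pool_lrt_d2(chisq_values, df):
    """D2 pooling of m chi-square statistics (Li, Meng, Raghunathan & Rubin, 1991).

    Returns (D2, v2): the pooled F-type statistic and its denominator df.
    """
    m = len(chisq_values)
    if m < 2:
        raise ValueError("need at least two imputed datasets")
    d_bar = sum(chisq_values) / m
    roots = [math.sqrt(d) for d in chisq_values]
    root_mean = sum(roots) / m
    # Average relative increase in variance, based on the square roots of the statistics.
    r2 = (1 + 1 / m) * sum((r - root_mean) ** 2 for r in roots) / (m - 1)
    d2 = (d_bar / df - (m + 1) / (m - 1) * r2) / (1 + r2)
    d2 = max(d2, 0.0)  # assumption: truncate negative values at zero
    # With no between-imputation variability, the reference F has infinite denominator df.
    v2 = math.inf if r2 == 0 else df ** (-3 / m) * (m - 1) * (1 + 1 / r2) ** 2
    return d2, v2


# Hypothetical LRT chi-square values (df = 4) from three imputed datasets.
d2, v2 = pool_lrt_d2([10.0, 12.0, 14.0], df=4)
print(d2, v2)
```

Because D2 works on the chi-square values alone, it can be applied to any SEM software output, at the cost of being less accurate than strategies that re-evaluate the likelihood at pooled parameter values.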

Introduction

The introduction discusses how latent growth curve modeling (LGCM) is used in psychological research to analyze change over time through latent growth factors, such as intercepts and slopes, while accommodating individual differences. As a structural equation modeling approach, LGCM offers advantages over multilevel modeling (MLM): it explicitly addresses measurement error and allows relationships between the latent growth components and other variables to be explored. The paper highlights the challenges that missing data pose in longitudinal studies, noting that traditional methods such as listwise deletion and single imputation are often inadequate. Modern techniques such as full information maximum likelihood (FIML) and multiple imputation (MI) have instead gained traction: FIML is recognized for producing unbiased estimates when data are missing at random (MAR) and multivariate normal, while MI offers flexibility and better handling of complex data structures.
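To make FIML concrete: under multivariate normality, FIML maximizes the sum of casewise log-likelihoods, each computed from only the variables observed for that case. The pure-Python sketch below evaluates that objective for a given mean vector and covariance matrix (in an LGCM these would be model-implied); `None` marks a missing value, and all function names are hypothetical. An optimizer would then search over the model parameters to maximize this sum.

```python
import math


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def casewise_loglik(y, mu, sigma):
    """Log-likelihood of one case using only its observed entries (None = missing)."""
    obs = [i for i, v in enumerate(y) if v is not None]
    if not obs:
        return 0.0  # a case with no observed data contributes nothing
    resid = [y[i] - mu[i] for i in obs]
    low = cholesky([[sigma[i][j] for j in obs] for i in obs])
    # Forward substitution solves L z = resid, so z'z = resid' Sigma_obs^{-1} resid.
    z = []
    for i in range(len(obs)):
        z.append((resid[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(len(obs)))
    p = len(obs)
    return -0.5 * (p * math.log(2 * math.pi) + log_det + sum(v * v for v in z))


def fiml_loglik(data, mu, sigma):
    """FIML objective: sum of casewise log-likelihoods over all cases."""
    return sum(casewise_loglik(row, mu, sigma) for row in data)


# Hypothetical two-wave data: the second case is missing wave 2.
mu = [0.0, 0.0]
sigma = [[4.0, 1.0], [1.0, 2.0]]
print(fiml_loglik([[1.0, 1.0], [1.0, None]], mu, sigma))
```

No case is discarded: a person observed at only some waves still contributes the likelihood of the waves they have.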

The authors aim to evaluate the practical applicability and comparative performance of FIML and MI in LGCM contexts, particularly focusing on the intricacies of imputation strategies for longitudinal data organized in wide versus long formats. They note that while MI is advantageous in many scenarios, it involves a more complex three-step process of imputation, analysis, and pooling, which must align with the research context and missing data mechanisms. The paper also addresses recent advancements in pooling likelihood ratio test statistics and fit indices, which have not been extensively evaluated within the LGCM framework. Ultimately, the article seeks to guide researchers in effectively managing missing data in LGCM studies by recommending appropriate imputation methods and pooling strategies, supported by a simulation study and empirical examples.
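Wide format stores one row per person and one column per occasion, which is the layout LGCM software and the wide-format imputation methods work with; long format stores one row per person-occasion, the layout used by multilevel (long-format) imputation. A minimal sketch of converting between the two, assuming hypothetical `(id, values)` rows and known time scores:

```python
def wide_to_long(rows, time_scores, drop_missing=False):
    """rows: list of (id, [y_t1, ..., y_tT]) -> list of (id, time, y)."""
    long_rows = []
    for pid, ys in rows:
        for t, y in zip(time_scores, ys):
            if drop_missing and y is None:
                continue  # skip unobserved occasions
            long_rows.append((pid, t, y))
    return long_rows


def long_to_wide(long_rows, time_scores):
    """Inverse of wide_to_long; occasions absent from long_rows become None."""
    position = {t: k for k, t in enumerate(time_scores)}
    wide = {}
    for pid, t, y in long_rows:
        wide.setdefault(pid, [None] * len(time_scores))[position[t]] = y
    return sorted(wide.items())


# Hypothetical three-wave data with None marking missing values.
times = [0, 1, 2]
rows = [(1, [2.0, None, 3.5]), (2, [1.0, 1.5, None])]
long_rows = wide_to_long(rows, times)
print(long_rows)
print(long_to_wide(long_rows, times))
```

With `drop_missing=False` (the default) the missing occasions stay as rows so a long-format imputation model can fill them; dropping them gives the available-data layout a multilevel analysis would use. Converting back reinstates `None` either way.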

Results

The simulation results show that FIML generally produced unbiased parameter estimates and accurate confidence intervals, and that most MI methods performed comparably across many conditions; the proportion of missing data was a key factor in how well the methods performed. For the pooled likelihood ratio test, the different pooling strategies performed similarly, whereas the long-format MI methods (MI-llMLM and MI-lqMLM) yielded conservative Type I error rates.

When the analysis model was misspecified, FIML had the highest power, followed by MI-wEMB, MI-wFCS, and MI-lqMLM; sample size and the proportion of missing data were the main factors affecting power. Together, these results inform the choice of missing data method and pooling strategy in LGCM applications.
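Type I error rates and power in a simulation are both empirical rejection rates: the share of replications in which the pooled test rejects at level alpha, computed under a correctly specified model for Type I error and under a misspecified one for power. The sketch below computes that rate and labels a Type I error rate using Bradley's (1978) liberal criterion, [0.5α, 1.5α]; that band is an assumption for illustration, not necessarily the criterion the authors used, and the function names are hypothetical.

```python
def rejection_rate(p_values, alpha=0.05):
    """Share of replications whose pooled test rejects at level alpha."""
    return sum(1 for p in p_values if p < alpha) / len(p_values)


def classify_type1(rate, alpha=0.05):
    """Label an empirical Type I error rate with Bradley's liberal band [0.5a, 1.5a]."""
    if rate < 0.5 * alpha:
        return "conservative"  # rejects the true model too rarely
    if rate > 1.5 * alpha:
        return "liberal"  # rejects the true model too often
    return "acceptable"


# Hypothetical p-values from replications under a correctly specified model.
p_values = [0.30, 0.72, 0.04, 0.55, 0.91, 0.18, 0.66, 0.43, 0.87, 0.25]
rate = rejection_rate(p_values)
print(rate, classify_type1(rate))
```

A conservative test protects against false rejections, but usually at the cost of lower power when the model is in fact misspecified.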

Discussion

This section describes how LGCM is applied to continuous repeated-measures data, estimating intraindividual growth patterns over time while accounting for interindividual variability. The latent intercept and slope(s) are the key components, and model fit is evaluated with likelihood ratio test (LRT) statistics. The section also reviews missing data mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Modern techniques such as full information maximum likelihood (FIML) and multiple imputation (MI) can handle missing data effectively under the ignorable mechanisms (MCAR and MAR), whereas MNAR scenarios require more complex approaches.
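For a linear LGCM with time scores t_1, ..., t_T, the model-implied mean vector is μ = Λα and the covariance matrix is Σ = ΛΨΛ' + Θ, where each row of Λ is (1, t_j), α holds the mean intercept and slope, Ψ is their covariance matrix, and Θ is a diagonal matrix of residual variances. The LRT compares these implied moments with the observed data. The pure-Python sketch below assumes linear growth only, and the function name and parameter values are hypothetical.

```python
def implied_moments(times, alpha, psi, theta):
    """Model-implied means and covariances of a linear LGCM.

    times: time scores; alpha: [intercept mean, slope mean];
    psi: 2x2 covariance of intercept and slope; theta: residual variances.
    """
    lam = [[1.0, t] for t in times]  # loadings: intercept fixed at 1, slope at t
    n_t = len(times)
    mu = [sum(row[k] * alpha[k] for k in range(2)) for row in lam]
    sigma = [
        [
            sum(lam[i][a] * psi[a][b] * lam[j][b] for a in range(2) for b in range(2))
            + (theta[i] if i == j else 0.0)  # residual variance on the diagonal only
            for j in range(n_t)
        ]
        for i in range(n_t)
    ]
    return mu, sigma


# Hypothetical three-wave linear growth model.
mu, sigma = implied_moments([0, 1, 2], [10.0, 2.0], [[4.0, 1.0], [1.0, 1.0]], [1.0, 1.0, 1.0])
print(mu)
print(sigma)
```

Because the variances grow with t whenever the slope varies across people, the implied covariance structure is what lets LGCM separate individual differences in change from occasion-specific error.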

The discussion then covers imputation algorithms and models, including joint modeling (JM) and fully conditional specification (FCS), which differ in how they fill in missing data: JM draws imputations from a single joint model for all incomplete variables, whereas FCS imputes each incomplete variable in turn from its own conditional model. The choice between wide-format and long-format imputation is examined, with the conclusion that the optimal method depends on how well the imputation model aligns with the analysis model. The section closes by describing pooling strategies for combining results across multiply imputed datasets, stressing that the chosen method must be considered carefully to obtain accurate parameter estimates and test statistics in LGCM. The subsequent simulation studies evaluate these imputation and pooling strategies under various conditions, including model misspecification.
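For individual parameters, the classic pooling approach is Rubin's rules: average the m point estimates, then combine the average within-imputation variance Ū with the between-imputation variance B as T = Ū + (1 + 1/m)B. A minimal sketch using Rubin's (1987) large-sample degrees of freedom; the function name and example numbers are hypothetical.

```python
import math
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool m estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)  # pooled point estimate
    u_bar = mean(variances)  # within-imputation variance
    b = variance(estimates)  # between-imputation variance (denominator m - 1)
    t = u_bar + (1 + 1 / m) * b  # total variance
    if b == 0:
        df = math.inf  # imputations agree exactly: no extra uncertainty
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, math.sqrt(t), df


# Hypothetical slope-mean estimates and squared standard errors from 3 imputations.
q_bar, total_var, se, df = rubin_pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q_bar, total_var, se, df)
```

The between-imputation term B carries the extra uncertainty due to missing data, which single imputation ignores; in practice more than three imputations would normally be used.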