Conceptual framework as a guide to choose appropriate imputation method for missing values in a clinical structured dataset

Journal: BMC Medical Research Methodology, Volume: 25, Issue: 1
DOI: https://doi.org/10.1186/s12874-025-02496-3
PMID: https://pubmed.ncbi.nlm.nih.gov/39979819
Publication Date: 2025-02-20
Author(s): Marziyeh Afkanpour et al.
Primary Topic: Statistical Methods and Bayesian Inference

Overview

This study addresses the prevalent issue of missing data in structured datasets by proposing a conceptual framework that integrates various imputation methods. The research is grounded in a systematic review of 58 studies, which identified key factors influencing the selection of optimal imputation techniques. The framework is designed to guide researchers in choosing appropriate methods based on the specific characteristics of their datasets, thereby enhancing the reliability of analytical results.

The methodology involved two main steps: first, defining the conceptual components and their interrelationships through secondary analysis; and second, analyzing the implementation process by examining the properties of missing values and verifying assumptions in accordance with the estimand framework from the ICH E9(R1) Guideline. The findings indicate that the identified concepts significantly influence the choice of imputation methods, ultimately leading to more valid and trustworthy outcomes in clinical research.

In conclusion, the proposed framework aims to streamline data preprocessing, improve clinical decision-making, and assist researchers in systematically addressing missing data. By promoting transparent reporting of missing data’s impact on study results, the framework seeks to enhance the confidence and reproducibility of research findings.

Introduction

The introduction addresses the problem of missing values in data analysis, particularly in healthcare. Missing data can arise from incomplete data collection, patient non-compliance, or equipment failure, and they lead to biased parameter estimates, reduced statistical power, and compromised validity of study findings. The paper emphasizes the importance of addressing missing data through appropriate methods and robust data management practices, as outlined in the ICH E9(R1) Guideline, which provides strategies for handling intercurrent events and conducting sensitivity analyses to ensure the reliability of findings across different research contexts.

The authors propose a conceptual framework aimed at simplifying the selection process for appropriate imputation methods for missing values, catering to health researchers who may lack extensive background knowledge in statistical theories. This framework is designed to facilitate the identification of the missing data mechanism—whether missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR)—and guide researchers in choosing suitable imputation techniques accordingly. By providing a structured approach, the framework aims to enhance the credibility of research findings and improve decision-making in clinical research, ultimately supporting better patient outcomes.
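To make the three mechanisms concrete, the following minimal sketch simulates each one on synthetic data. It is an illustration only and is not taken from the paper: the field names `age` and `bp`, the helper `make_rows`, the median cut-offs, and the missingness rate are all assumptions chosen for readability.

```python
import random
from statistics import mean, median


def make_rows(n, seed=1):
    """Generate n synthetic records; 'age' and 'bp' are hypothetical fields."""
    rng = random.Random(seed)
    return [{"age": rng.randint(20, 80), "bp": rng.gauss(120, 15)} for _ in range(n)]


def simulate_missingness(rows, mechanism, rate=0.3, seed=0):
    """Return copies of rows in which some 'bp' values are replaced by None.

    mechanism is 'MCAR', 'MAR', or 'MNAR'; rate should be at most 0.5
    because the MAR and MNAR branches use 2 * rate as a probability.
    """
    rng = random.Random(seed)
    age_cut = median(r["age"] for r in rows)
    bp_cut = median(r["bp"] for r in rows)
    out = []
    for r in rows:
        if mechanism == "MCAR":
            p = rate  # same probability for every record
        elif mechanism == "MAR":
            # depends only on a fully observed variable (age)
            p = 2 * rate if r["age"] >= age_cut else 0.0
        elif mechanism == "MNAR":
            # depends on the value that is about to go missing (bp itself)
            p = 2 * rate if r["bp"] >= bp_cut else 0.0
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        new = dict(r)  # copy so the input rows are not mutated
        if rng.random() < p:
            new["bp"] = None
        out.append(new)
    return out


def observed_mean(rows, key="bp"):
    """Mean of the values that remain observed for the given key."""
    return mean(r[key] for r in rows if r[key] is not None)


if __name__ == "__main__":
    rows = make_rows(200)
    print("full data:", round(observed_mean(rows), 2))
    for mech in ("MCAR", "MAR", "MNAR"):
        masked = simulate_missingness(rows, mech)
        print(mech, round(observed_mean(masked), 2))
```

In the MNAR branch only the higher `bp` values can be removed, so the mean of what remains observed drops below the full-data mean. This is the kind of bias that makes identifying the mechanism a prerequisite for choosing an imputation method.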

Methods

In this study, the methodology was executed in two primary phases. The first phase involved defining conceptual components and their interrelationships, which were derived from a secondary analysis of a prior systematic review encompassing 58 studies. Key concepts were identified, categorized, and integrated to elucidate their relationships. A table was created to outline essential assumptions for selecting appropriate imputation methods, based on insights from previous research. Three independent experts evaluated these assumptions for each study, leading to a consensus on the relevant factors influencing the choice of imputation methods. Additionally, the imputation algorithms utilized in the studies were documented as foundational models.

The second phase focused on the implementation process: analyzing the properties of the missing values and selecting suitable imputation methods. The study aligns this step with the ICH E9(R1) Guideline, framing the appropriate imputation method as the estimand and the various candidate imputation methods as estimators. Sensitivity analyses are required to evaluate the assumptions made about the characteristics of the missing data. Assessing these assumptions rigorously helps ensure that the estimates derived from the imputation methods are unbiased and reliable, preserving the integrity of the estimand and of the study’s overall findings.
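One common way to run such a sensitivity analysis is delta adjustment: impute under a baseline assumption, then shift the imputed values by a range of offsets and check how far the estimate moves. The sketch below is a simplified illustration of that idea, not the procedure used in the paper. The function names, the use of the observed mean as the baseline imputation, and the chosen offsets are all assumptions.

```python
from statistics import mean


def delta_adjusted_mean(values, delta):
    """Estimate the mean after imputing each missing entry (None)
    with the observed mean shifted by delta.

    delta = 0 corresponds to plain mean imputation; non-zero deltas
    probe departures from that baseline assumption.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed) + delta
    completed = [fill if v is None else v for v in values]
    return mean(completed)


def sensitivity_table(values, deltas):
    """Map each delta to the resulting estimate of the mean."""
    return {d: delta_adjusted_mean(values, d) for d in deltas}


if __name__ == "__main__":
    # Hypothetical measurements with two missing entries.
    data = [10.0, 12.0, None, 14.0, None]
    for delta, estimate in sensitivity_table(data, [-5.0, 0.0, 5.0]).items():
        print(f"delta={delta:+.1f} -> estimated mean {estimate:.2f}")
```

If the estimate changes little across plausible offsets, the conclusion is robust to the missing-data assumption. A large swing signals that the result depends heavily on it.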

Results

The results section delineates the essential components of the proposed conceptual framework for addressing missing data, categorizing them into primary concepts and a final key concept. Figures 1 and 2 illustrate the characteristics of missing data, including mechanisms, patterns, and various imputation methods, which are classified into traditional statistical and learning-based approaches, with a hybrid method that integrates both. The framework’s definitions and theoretical foundations are elaborated in Appendix 1, while Appendix 2 provides detailed information on data imputation methods.

A secondary analysis of 58 relevant studies revealed substantial interrelationships among the primary concepts, such as the mechanism, pattern, and ratio of missingness, all of which are critical for selecting appropriate imputation methods. Table 1 summarizes these findings, showing how each study’s characteristics, such as the mechanism of missingness (e.g., missing at random (MAR) or missing not at random (MNAR)), the data type, and the correlation among variables, inform the choice of imputation method. The analysis indicates that variables with missing values often serve different roles (predictors or outcomes) across studies and that the distribution of variables varies, both of which affect the imputation strategy. Figures 4 and 5 further illustrate the relationships within the framework, focusing on the mechanisms and patterns of missing values and their corresponding imputation methods.
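Two of the characteristics named above, the ratio of missingness per variable and the pattern of missingness across variables, can be computed directly from a dataset. The following sketch shows one way to do so. The function name `missingness_profile`, the column names, and the convention that a missing value is stored as `None` are assumptions made for illustration.

```python
from collections import Counter


def missingness_profile(rows, columns):
    """Summarize missingness for a list of dict records.

    Returns (ratio, patterns):
      ratio    - fraction of records missing each column
      patterns - Counter of per-record tuples, one boolean per column,
                 where True marks a missing value
    """
    if not rows:
        raise ValueError("rows must not be empty")
    n = len(rows)
    ratio = {c: sum(r.get(c) is None for r in rows) / n for c in columns}
    patterns = Counter(tuple(r.get(c) is None for c in columns) for r in rows)
    return ratio, patterns


if __name__ == "__main__":
    # Hypothetical clinical records; None marks a missing value.
    records = [
        {"age": 50, "bp": 120, "chol": None},
        {"age": 61, "bp": None, "chol": None},
        {"age": 45, "bp": 118, "chol": 190},
        {"age": None, "bp": 130, "chol": 200},
    ]
    ratio, patterns = missingness_profile(records, ["age", "bp", "chol"])
    print(ratio)
    for pattern, count in patterns.most_common():
        print(pattern, count)
```

A profile like this does not identify the mechanism on its own, but it supplies the ratio and pattern inputs that the framework treats as selection criteria.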

Discussion

The discussion section of the paper emphasizes the critical importance of effectively managing missing data in healthcare research to ensure unbiased and reliable analytical outcomes. The authors highlight that imputation methods, which allow for the analysis of complete datasets, are essential for maintaining the validity and reliability of findings that inform medical decisions. They outline various imputation approaches, including traditional statistical and learning-based methods, and stress the necessity of understanding the characteristics and mechanisms of missing values to avoid introducing bias and diminishing statistical power.
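The contrast between the two families can be illustrated with one simple representative of each: mean imputation for the traditional statistical side and k-nearest-neighbour imputation for the learning-based side. These two methods are stand-ins chosen for brevity and are not claimed to be the ones the paper recommends. The function names, the field names, and the assumption that the feature columns are fully observed are all hypothetical.

```python
from statistics import mean


def mean_impute(rows, target):
    """Fill missing target values with the mean of the observed ones."""
    observed = [r[target] for r in rows if r[target] is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    return [dict(r, **{target: fill}) if r[target] is None else dict(r) for r in rows]


def knn_impute(rows, target, features, k=2):
    """Fill missing target values with the mean target of the k records
    closest in squared Euclidean distance over the given features.

    Assumes every feature is observed in every record.
    """
    donors = [r for r in rows if r[target] is not None]
    if not donors:
        raise ValueError("no observed values to impute from")
    out = []
    for r in rows:
        new = dict(r)
        if r[target] is None:
            nearest = sorted(
                donors,
                key=lambda d: sum((d[f] - r[f]) ** 2 for f in features),
            )[:k]
            new[target] = mean([d[target] for d in nearest])
        out.append(new)
    return out


if __name__ == "__main__":
    # Hypothetical records: blood pressure rises with age, one value missing.
    records = [
        {"age": 30, "bp": 110.0},
        {"age": 32, "bp": 112.0},
        {"age": 70, "bp": 150.0},
        {"age": 72, "bp": 152.0},
        {"age": 71, "bp": None},
    ]
    print("mean imputation:", mean_impute(records, "bp")[-1]["bp"])
    print("kNN imputation: ", knn_impute(records, "bp", ["age"])[-1]["bp"])
```

Mean imputation ignores the relationship between age and blood pressure, while the neighbour-based fill uses it. This is one reason the correlation among variables matters when choosing a method.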

The authors propose a conceptual framework designed to guide researchers in selecting appropriate imputation methods based on key factors such as the mechanism of missingness, the pattern and proportion of missing values, and the properties of the dataset. This framework is built upon insights from a systematic review, which identified foundational concepts and their interrelationships. While the framework serves as a robust tool for addressing missing data, the authors acknowledge its limitations and recommend further systematic reviews to enhance its comprehensiveness. Future developments include a web application to facilitate the selection of imputation methods based on user-defined parameters, thereby improving the practical application of the framework in clinical research.