Evaluation of clinical prediction models (part 3): calculating the sample size required for an external validation study

Journal: BMJ
DOI: https://doi.org/10.1136/bmj-2023-074821
PMID: https://pubmed.ncbi.nlm.nih.gov/38253388
Publication Date: 2024-01-22
Author(s): Richard D Riley et al.
Primary Topic: Meta-analysis and systematic reviews

Overview

In the third article of their series on model evaluation, Riley and colleagues address how to determine an appropriate sample size for external validation studies of prediction models. They emphasize that many existing validation studies are too small, which leads to imprecise and unreliable performance estimates. The authors advocate tailored sample size calculations rather than generic rules of thumb, so that the sample is large enough to give precise estimates of model performance measures such as calibration, discrimination, overall fit, and clinical utility.

The rationale is that small samples produce wide confidence intervals, which can lead to misleading conclusions about a model's reliability. For instance, a simulation involving 100 participants showed substantial variability in calibration curves, indicating that such a sample size is inadequate for stable performance assessment. For binary or time-to-event outcomes, a minimum of 100 events and 100 non-events is a commonly cited rule of thumb; given the authors' preference for tailored calculations, it is best read as a rough floor rather than a guarantee of precise estimates of key performance metrics. The article concludes the series by reinforcing the importance of robust sample size calculations in external validation studies to enhance the credibility and applicability of clinical prediction models.
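The effect of sample size on the stability of calibration estimates can be illustrated with a small simulation. This is a minimal sketch, not the authors' simulation: it uses the observed/expected (O/E) ratio as a simpler stand-in for a full calibration curve, and the risk distribution (uniform between 0.05 and 0.35), the number of replications, and all function names are assumptions chosen for illustration.

```python
import random
import statistics


def simulate_oe_ratios(n, n_reps=500, seed=1):
    """O/E ratios from repeated validation samples of size n.

    Outcomes are generated from the predicted risks themselves, so the
    hypothetical model is perfectly calibrated and the true O/E is 1.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_reps):
        # Assumed distribution of predicted risks (illustrative only).
        risks = [rng.uniform(0.05, 0.35) for _ in range(n)]
        observed = sum(rng.random() < p for p in risks)
        expected = sum(risks)
        ratios.append(observed / expected)
    return ratios


def central_95_interval(values):
    """Empirical 2.5th and 97.5th percentiles of the simulated values."""
    cuts = statistics.quantiles(values, n=40)  # cut points every 2.5%
    return cuts[0], cuts[-1]


interval_small = central_95_interval(simulate_oe_ratios(100))
interval_large = central_95_interval(simulate_oe_ratios(1000))

print("n = 100 :", tuple(round(x, 2) for x in interval_small))
print("n = 1000:", tuple(round(x, 2) for x in interval_large))
```

Even though the simulated model is perfectly calibrated, the spread of O/E estimates at 100 participants is much wider than at 1000, so a single small validation study could easily suggest miscalibration that is not there.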

Discussion

In the discussion section, the authors turn to sample size for the external validation of prediction models with continuous outcomes, such as blood pressure or pain scores. They highlight the need to evaluate several performance measures: overall fit (R²), calibration (calibration-in-the-large and calibration slope), and residual variance. The authors propose four tailored sample size calculations, one for each measure, which require researchers to specify assumed true values for these measures based on the original model development study. To guide the calculations, they recommend using optimism-adjusted estimates of R² and assuming well-calibrated predictions in the external validation population.

An applied example illustrates the external validation of a machine learning model that predicts pain intensity in patients with chronic low back pain. An initial validation with a small sample yielded wide confidence intervals for R², indicating the need for a larger external validation study. The authors calculated the sample sizes needed for precise estimates of R², calibration-in-the-large, calibration slope, and residual variance, and ultimately recommended a minimum of 886 participants to ensure reliable performance assessment. This section underscores the role of sample size in validating prediction models accurately, particularly in clinical settings where precise calibration is essential for effective decision-making.
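The kind of calculation described above can be sketched with standard large-sample approximations. This is an illustrative sketch under stated assumptions, not necessarily the exact formulas used in the article: it takes the variance of R² to be roughly 4R²(1−R²)²/n, the variance of the calibration slope to follow the usual least-squares result, and the variance of calibration-in-the-large to be the residual variance divided by n. The function names are hypothetical, and the inputs (R² = 0.4, target 95% confidence interval widths of 0.1 and 0.2, outcome variance of 1) are assumptions that do not appear in this summary. The residual variance criterion is omitted.

```python
import math

Z_95 = 1.96  # normal quantile for a 95% confidence interval


def target_se(ci_width):
    """Standard error implied by a target 95% confidence interval width."""
    return ci_width / (2 * Z_95)


def n_for_r_squared(r2, ci_width):
    """Participants needed, assuming var(R^2) is about 4*R^2*(1-R^2)^2 / n."""
    return math.ceil(4 * r2 * (1 - r2) ** 2 / target_se(ci_width) ** 2)


def n_for_calibration_slope(r2, ci_width, slope=1.0):
    """Participants needed, assuming var(slope) is about
    slope^2 * (1-R^2) / ((n-1) * R^2), as in a least-squares
    regression of the outcome on the predicted value."""
    return math.ceil(slope ** 2 * (1 - r2) / (r2 * target_se(ci_width) ** 2) + 1)


def n_for_citl(r2, outcome_variance, ci_width):
    """Participants needed, assuming var(CITL) is about
    outcome_variance * (1-R^2) / n (residual variance over n)."""
    return math.ceil(outcome_variance * (1 - r2) / target_se(ci_width) ** 2)


# Hypothetical inputs, chosen for illustration and not taken from this summary.
assumed_r2 = 0.4
print("R^2:              ", n_for_r_squared(assumed_r2, ci_width=0.1))
print("Calibration slope:", n_for_calibration_slope(assumed_r2, ci_width=0.2))
print("CITL:             ", n_for_citl(assumed_r2, outcome_variance=1.0, ci_width=0.2))
```

With these assumed inputs the R² criterion is the most demanding and gives 886 participants, which matches the recommended minimum quoted above. The required sample size is taken as the largest value across the criteria, so a researcher would compute all of them and use the maximum.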

Keywords: predictive modelling, humans, model validation, diagnosis, sample size, sample (material), rule of thumb, statistical models