Inconsistent multiple testing corrections: The fallacy of using family-based error rates to make inferences about individual hypotheses

Journal: Methods in Psychology, Volume: 10
DOI: https://doi.org/10.1016/j.metip.2024.100140
Publication Date: 2024-03-28
Author(s): Mark Rubin
Primary Topic: Statistical Methods in Clinical Trials

Overview

The Overview concerns the error rates associated with statistical comparisons within a specified family of tests. Its apparent focus is the Type I error rate, the probability of incorrectly rejecting a true null hypothesis, and what that rate implies for the validity of statistical inferences drawn from the data.

Although the details are incomplete, the central point is that the Type I error rate must be understood correctly to judge how reliable the results of multiple comparisons are, and hence whether the conclusions drawn from the analyses are sound. The summary does not describe how these error rates are calculated or how they affect the study as a whole.

Discussion

The Discussion addresses the multiple testing problem in the context of union-intersection testing. When researchers run several significance tests on constituent null hypotheses (e.g., $H_{0,1}$, $H_{0,2}$, $H_{0,3}$) and treat at least one significant result as sufficient to reject the joint null hypothesis, the Type I error rate for that joint decision can exceed the nominal alpha level (typically 0.050). An alpha adjustment is therefore needed to control the familywise error rate (FWER) when the inference concerns the joint null hypothesis. For instance, a researcher who finds a significant result in one of several tests can reject the joint null hypothesis, but the familywise error rate applies to that joint inference only and does not by itself license separate inferences about the individual hypotheses. The section emphasizes that multiple testing increases the probability of at least one Type I error across the tests, but it does not inflate the Type I error rate of any individual test.
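
The gap between the per-test and familywise rates can be made concrete with a short calculation. The sketch below is illustrative only and is not taken from the paper: it assumes the tests are independent and that every constituent null hypothesis is true, and the function names `familywise_error_rate` and `bonferroni_alpha` are hypothetical.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Probability of at least one Type I error across k tests.

    Assumption (not from the source): the k tests are independent and
    all k constituent null hypotheses are true.
    """
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-test alpha under a Bonferroni adjustment for k tests."""
    return alpha / k


if __name__ == "__main__":
    alpha = 0.05
    for k in (1, 2, 3, 10):
        unadjusted = familywise_error_rate(alpha, k)
        adjusted = familywise_error_rate(bonferroni_alpha(alpha, k), k)
        # The per-test Type I error rate is alpha in every row;
        # only the "at least one error" probability changes with k.
        print(
            f"k={k:2d}  per-test alpha={alpha:.3f}  "
            f"FWER unadjusted={unadjusted:.4f}  "
            f"FWER with Bonferroni={adjusted:.4f}"
        )
```

Under these assumptions the per-test rate stays at alpha for every k, while the probability of at least one false rejection grows with k. Dividing alpha by k keeps that familywise probability at or below the nominal level, which matters only when the inference being made is about the joint null hypothesis.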

The Discussion also highlights common inconsistencies in how multiple testing corrections are applied. Researchers often adjust their alpha level as they would for a joint hypothesis, then interpret the results of the individual tests without making any inference about the joint hypothesis. This inconsistency leads to unjustified reductions in statistical power: a correction such as the Bonferroni adjustment, applied unnecessarily, can cause individual results that would otherwise be significant to be missed. The section concludes that such practices are prevalent in the published literature, where they can undermine the validity of statistical inferences and reduce the overall power of studies.
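
The power cost of an unnecessary correction can be sketched for a single test. The example below is a simplified illustration rather than anything reported in the paper: it assumes a two-sided z-test whose statistic is normally distributed with unit variance, and the names `z_test_power` and `effect_z` (the expected value of the test statistic under the alternative) are hypothetical.

```python
from statistics import NormalDist


def z_test_power(effect_z: float, alpha: float) -> float:
    """Power of a two-sided z-test at significance level alpha.

    Assumption (not from the source): the test statistic is distributed
    as Normal(effect_z, 1), where effect_z is its expected value.
    """
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Reject when the statistic falls above +critical or below -critical.
    upper_tail = 1.0 - std_normal.cdf(critical - effect_z)
    lower_tail = std_normal.cdf(-critical - effect_z)
    return upper_tail + lower_tail


if __name__ == "__main__":
    alpha, k, effect_z = 0.05, 3, 2.8  # illustrative values only
    unadjusted = z_test_power(effect_z, alpha)
    adjusted = z_test_power(effect_z, alpha / k)
    print(f"power at alpha={alpha:.3f}: {unadjusted:.3f}")
    print(f"power at alpha/{k}={alpha / k:.4f}: {adjusted:.3f}")
```

For any fixed nonzero effect, the test run at the Bonferroni-adjusted level has lower power than the same test at the unadjusted level. That loss is a cost without a matching benefit if the researcher only makes inferences about the individual hypotheses.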

Keywords: logical fallacy