Univariate normality checking practices in L2 research: An AI-assisted systematic review

Journal: Studies in Second Language Acquisition
DOI: https://doi.org/10.1017/s0272263126101600
Publication Date: 2026-02-25
Author(s): Vahid Aryadoust et al.
Primary Topic: Psychometric Methodologies and Testing

Overview

This paper examines the role of normality testing in statistical analysis within second language (L2) research, using a review process that combines generative artificial intelligence (GenAI) with traditional methods. Although more than 60 methods exist for assessing univariate normality, the authors note that earlier systematic reviews found these checks to be substantially underreported in L2 studies. The paper reviews 12 prominent normality assessment methods (five graphical and seven analytical), describing how each works and how sensitive it is to different forms of nonnormality, such as skewness and multimodality.

The authors reviewed 237 empirical articles published between 2020 and 2025 in ten Q1 L2-focused journals and found inconsistent reporting of normality checks. Studies relied heavily on the Kolmogorov-Smirnov (KS) and Shapiro-Wilk (SW) tests, often without adequately considering sample-size effects or the assumptions underlying these tests. The study concludes with recommendations for improving normality testing in L2 research, emphasizing the need for clearer methodological guidance and for re-evaluating normality after data transformation. The authors also note that the AI-assisted annotation used in their review did not achieve perfect accuracy. Overall, the findings point to a disconnect between recommended statistical practice and the methods actually used in L2 research, underscoring the need for greater rigor in normality testing.
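The recommendation to re-check normality after transforming data can be sketched as a check, transform, check-again workflow. This is a minimal Python sketch assuming NumPy and SciPy, not the authors' procedure. The sample is a hypothetical stand-in for a right-skewed variable such as reaction times: evenly spaced quantiles of a lognormal distribution (parameters chosen for illustration) replace random draws so that the output is deterministic.

```python
import numpy as np
from scipy import stats

def shapiro_p(x):
    """Return the Shapiro-Wilk p-value for a 1-D sample."""
    return stats.shapiro(x).pvalue

# Hypothetical right-skewed sample: evenly spaced lognormal quantiles
# (illustrative parameters, not data from the review).
n = 500
probs = (np.arange(1, n + 1) - 0.5) / n
raw = stats.lognorm.ppf(probs, s=0.8, scale=600.0)

# Step 1: check normality on the original scale.
p_before = shapiro_p(raw)

# Step 2: transform the data.
logged = np.log(raw)

# Step 3: re-check normality on the transformed scale.
p_after = shapiro_p(logged)

print(f"raw:    SW p = {p_before:.3g}")
print(f"logged: SW p = {p_after:.3g}")
```

Because the log of a lognormal variable is normal, the transformed values pass the check here; with real data a transformation may only partly fix the problem, which is why the second check matters.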

Introduction

The introduction discusses the role of normality in statistical analysis. A normal distribution is symmetric and bell-shaped, with observations clustering around the mean. For many statistical methods, including t-tests, ANOVA, and linear regression, the assumption concerns the normality of the errors (residuals) rather than of the raw data, and whether it holds bears on the choice between parametric and nonparametric techniques. Although more than 60 normality checking methods have been developed, they differ in implementation complexity and reliability, which makes choosing among them difficult. Some methods are descriptive and rely on visual assessment; others, such as the Shapiro-Wilk and Kolmogorov-Smirnov tests, yield quantitative results but can be complex and less intuitive to interpret.
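Why the assumption targets errors rather than raw scores can be shown with two hypothetical groups whose errors are exactly normal but whose means differ. In this minimal sketch (assuming NumPy and SciPy; group labels and the 6-unit mean difference are illustrative), evenly spaced standard-normal quantiles stand in for random errors so the result is deterministic. The pooled raw scores are bimodal and fail the Shapiro-Wilk test, while the residuals from each group's mean, as in a one-way ANOVA model, do not.

```python
import numpy as np
from scipy import stats

# Deterministic "perfectly normal" errors: evenly spaced standard-normal quantiles.
n_per_group = 50
z = stats.norm.ppf((np.arange(1, n_per_group + 1) - 0.5) / n_per_group)

# Two hypothetical groups with identical normal errors but different means.
control = 0.0 + z
treatment = 6.0 + z

# Raw scores pooled across groups form a two-humped (bimodal) distribution.
pooled = np.concatenate([control, treatment])

# Residuals: each score minus its own group mean.
residuals = np.concatenate([control - control.mean(),
                            treatment - treatment.mean()])

p_raw = stats.shapiro(pooled).pvalue
p_resid = stats.shapiro(residuals).pvalue
print(f"pooled raw scores: SW p = {p_raw:.3g}")
print(f"residuals:         SW p = {p_resid:.3g}")
```

Testing the raw scores here would wrongly suggest that a parametric group comparison is inappropriate.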

The paper highlights a concerning trend in second language (L2) research: insufficient attention to the normality assumption. Previous studies indicate that few L2 studies report checking normality, with only 21 of 96 quantitative studies documenting this practice. A similar pattern appears in other fields, threatening research transparency and reproducibility. The present study aims to fill this gap by reviewing common univariate normality assessment methods, analyzing how effectively they detect departures from normality, and evaluating how they were applied in 237 empirical studies published in ten Q1 L2-focused journals from 2020 to 2025. The review focuses on the specific tests used, the decision criteria adopted, and adherence to statistical guidelines, with the aim of describing current practice and recommending appropriate normality checking methods for different research contexts.

Results

The results cover normality checking across the 237 studies, comprising 382 instances in total. Analytical methods predominated, accounting for 75.92% of instances, while graphical methods accounted for 24.08%. Among analytical techniques, moment-based methods, particularly skewness and kurtosis, were used most often, alongside the Shapiro-Wilk and Kolmogorov-Smirnov tests, which were the leading goodness-of-fit (GoF) tests. Among graphical methods, Q-Q plots and histograms were the most common.
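The moment-based and goodness-of-fit checks named above can be computed side by side. This is a minimal Python sketch assuming NumPy and SciPy; the gamma-distributed scores are simulated for illustration only. The z-ratios divide by the common large-sample standard errors √(6/n) and √(24/n), which are approximations (exact small-sample formulas differ slightly). The KS call supplies the sample's own mean and SD, which is precisely the case where the plain KS p-value is too conservative; a Lilliefors-corrected test is the usual remedy.

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed scores; a seeded generator keeps the run reproducible.
rng = np.random.default_rng(2024)
scores = rng.gamma(shape=2.0, scale=10.0, size=150)
n = scores.size

# Moment-based checks: sample skewness and excess kurtosis (both 0 for a
# normal curve), each divided by an approximate large-sample standard error.
skewness = stats.skew(scores)
excess_kurtosis = stats.kurtosis(scores)  # Fisher definition: normal -> 0
z_skew = skewness / np.sqrt(6.0 / n)
z_kurt = excess_kurtosis / np.sqrt(24.0 / n)

# Goodness-of-fit tests.
sw = stats.shapiro(scores)
# Caution: the mean and SD passed here are estimated from the same sample,
# so this KS p-value is too conservative (Lilliefors' correction fixes this).
ks = stats.kstest(scores, "norm", args=(scores.mean(), scores.std(ddof=1)))

print(f"skewness = {skewness:.2f} (z = {z_skew:.2f})")
print(f"excess kurtosis = {excess_kurtosis:.2f} (z = {z_kurt:.2f})")
print(f"Shapiro-Wilk:       W = {sw.statistic:.3f}, p = {sw.pvalue:.3g}")
print(f"Kolmogorov-Smirnov: D = {ks.statistic:.3f}, p = {ks.pvalue:.3g}")
```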

The results also include a “Combined” category for studies that used both analytical and graphical methods to check normality. To avoid double-counting, these combined instances were excluded from the percentages in the analytical and graphical summaries. Table 3 lists the specific test methods used and gives a fuller breakdown of the findings.
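One plausible reading of the exclusion rule is that “Combined” instances are tallied but left out of the denominator used for the analytical and graphical percentages. The Python sketch below illustrates that reading on a hypothetical list of coded instances; the labels and counts are invented for illustration and are not the review's data.

```python
from collections import Counter

# Hypothetical coded instances (not the review's data): one label per
# normality-checking instance.
instances = [
    "Analytical", "Analytical", "Graphical", "Combined",
    "Analytical", "Graphical", "Analytical", "Combined",
]

counts = Counter(instances)

# Leave "Combined" out of the denominator to avoid double-counting.
summarized = {label: c for label, c in counts.items() if label != "Combined"}
denominator = sum(summarized.values())
shares = {label: round(100 * c / denominator, 2) for label, c in summarized.items()}

print(counts)
print(shares)
```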

Discussion

The discussion emphasizes the central role of univariate normality in statistical analysis, particularly in parametric inferential statistics. The shape of a univariate distribution, and hence its departure from normality, is characterized by three main features: asymmetry (skewness), tail weight (kurtosis), and the shape of its peak. Skewness indicates the degree of asymmetry around the mean, while kurtosis measures the heaviness of the distribution’s tails. The Central Limit Theorem (CLT) states that, for populations with finite variance, the sampling distribution of the mean approaches normality as sample size increases; how large a sample is needed for this approximation to be adequate remains debated, with recommendations ranging from 30 to more than 200. Normality matters because it affects parameter estimation, confidence intervals, and hypothesis testing, so departures from it can alter statistical conclusions.
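The CLT claim can be checked by simulation. This minimal Python sketch (assuming NumPy and SciPy) draws repeated samples from a strongly right-skewed exponential population, chosen here purely for illustration, and measures the skewness of the resulting sample means at n = 5, 30, and 200. For this population the skewness of the mean is known to be 2/√n, so the simulation can be compared with theory.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
reps = 20_000  # simulated samples per sample size

results = {}
# Exponential population: skewness = 2, far from a normal curve.
for n in (5, 30, 200):
    # Each row is one sample of size n; average across the row.
    means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    observed = stats.skew(means)
    theoretical = 2.0 / np.sqrt(n)  # skewness of the mean of n exponentials
    results[n] = observed
    print(f"n = {n:>3}: skewness of sample means = {observed:.3f} "
          f"(theory {theoretical:.3f})")
```

The skewness of the sampling distribution shrinks as n grows but is still visibly nonzero at n = 30 for this population, which illustrates why a single sample-size cutoff is hard to justify for every distribution.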

The paper also reviews a range of normality tests, grouping them into graphical methods and analytical tests. Graphical methods, such as histograms and Q-Q plots, offer intuitive visual assessments of distribution shape, while analytical tests, including the Shapiro-Wilk and Jarque-Bera tests, provide quantitative measures of normality. The performance of these tests varies considerably with sample size and distributional characteristics: small samples may lack the power to detect real departures, whereas very large samples can flag trivial ones, so test results need careful interpretation. The findings suggest that, although normality is an important assumption, it should not be treated as a strict binary criterion for selecting statistical methods. Instead, researchers should consider the nature of their data and the robustness of their analytical methods when normality is violated.
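The dependence of test results on sample size can be illustrated by applying the same tests to the same mildly skewed shape at two sample sizes. In this minimal Python sketch (assuming NumPy and SciPy), evenly spaced quantiles of a gamma distribution with shape 10 (population skewness 2/√10 ≈ 0.63, an illustrative choice) serve as deterministic samples of size 20 and 2000. The sketch also computes the Jarque-Bera statistic by hand from moment-based skewness S and kurtosis K, JB = n/6 · (S² + (K − 3)²/4), and compares it with SciPy's value.

```python
import numpy as np
from scipy import stats

def gamma_quantiles(n, shape=10.0):
    """Evenly spaced quantiles of a mildly right-skewed gamma distribution,
    used here as a deterministic stand-in for a random sample."""
    probs = (np.arange(1, n + 1) - 0.5) / n
    return stats.gamma.ppf(probs, a=shape)

def jarque_bera_manual(x):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4) with moment-based S and K."""
    n = x.size
    s = stats.skew(x)             # biased (moment) skewness S
    k_excess = stats.kurtosis(x)  # biased excess kurtosis, i.e. K - 3
    return n / 6.0 * (s**2 + k_excess**2 / 4.0)

results = {}
for n in (20, 2000):
    x = gamma_quantiles(n)
    jb = stats.jarque_bera(x)
    results[n] = {
        "sw_p": stats.shapiro(x).pvalue,
        "jb_p": jb.pvalue,
        "jb_manual": jarque_bera_manual(x),
        "jb_scipy": jb.statistic,
    }
    print(f"n = {n:>4}: SW p = {results[n]['sw_p']:.3g}, "
          f"JB = {jb.statistic:.2f}, JB p = {jb.pvalue:.3g}")
```

With the same underlying shape, neither test flags the small sample, while both reject decisively at n = 2000, which is the pattern that makes a p-value alone a weak basis for choosing between parametric and nonparametric methods.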