Enhancing peer assessment with artificial intelligence

Journal: International Journal of Educational Technology in Higher Education, Volume: 22, Issue: 1
DOI: https://doi.org/10.1186/s41239-024-00501-1
Publication Date: 2025-01-20
Author(s): Keith J. Topping et al.
Primary Topic: Student Assessment and Feedback

Overview

This paper surveys the integration of artificial intelligence (AI) in peer assessment, structured around a theoretical framework comprising six key areas: (i) Assigning Peer Assessors, (ii) Enhancing Individual Reviews, (iii) Deriving Peer Grades/Feedback, (iv) Analyzing Student Feedback, (v) Facilitating Instructor Oversight, and (vi) Peer Assessment Systems. A scoping review of 79 relevant studies indicates that AI generally enhances peer assessment, particularly by improving the diversity of grades and feedback. However, the review also identifies research gaps concerning automated assignment, assessment, calibration, and teamwork effectiveness, and suggests that these areas warrant further exploration.

The paper also presents a case study of the RIPPLE peer-assessment tool, which leverages AI to facilitate personalized learning experiences for students. This tool enables students to create, review, and practice with high-quality learning resources, supported by AI-driven real-time feedback. Despite the advantages of AI, such as improved feedback diversity, the paper notes significant challenges, including a lack of transparency in AI decision-making processes. While AI has shown promise in achieving results comparable to human assessors, the reliability of its applications remains a concern. The findings underscore the potential of AI in enhancing peer assessment while raising critical questions about its impact on formative and summative assessment goals and the development of metacognitive skills among participants.

Introduction

The introduction of this research paper discusses the concept of peer assessment, where learners evaluate the work of their peers, enhancing their understanding and feedback skills while providing assessors with timely insights. Despite its benefits, such as reducing instructor workload, peer assessment faces challenges including potential biases, variability in evaluation standards, and difficulties in large classes. The paper explores whether artificial intelligence (AI) can mitigate these issues by offering support in evaluation calibration, reducing bias, and alleviating student anxiety.

The authors aim to present a theoretical framework addressing six procedural areas where AI could enhance peer assessment. They conduct a rapid scoping review of existing literature on AI applications in this context and provide a case study demonstrating an AI-powered peer assessment system. The paper also highlights the need for further research and discusses the implications of integrating AI into peer assessment practices. While previous reviews have touched on digital technologies in peer assessment, this study uniquely focuses on AI’s broader role and its potential interventions, moving beyond mere mapping of the field.

Methods

In this rapid scoping review, the authors used Google Scholar as the sole database because of its effectiveness in generating a high volume of relevant literature. The search, conducted up to September 2023, combined the terms “peer assessment” AND “artificial intelligence” and yielded 6,930 initial hits. Following a heuristic approach, screening stopped once consecutive pages contained no relevant results, which narrowed the set to 1,730 secondary hits. To be included, a paper had to concern true AI applications in higher education, be published in English within the last decade, and contain quantitative or qualitative data supporting its conclusions. Ultimately, 79 papers were identified as meeting these criteria.

The selected papers were systematically coded by an experienced rater into six categories of a theoretical framework: Assigning Peer Assessors (4 papers), Enhancing Individual Reviews (7), Deriving Peer Grades/Feedback (35), Analyzing Student Feedback (19), Facilitating Instructor Oversight (4), and Peer Assessment Systems (10). Additionally, sub-categories were developed inductively within the “Deriving Peer Grades/Feedback” and “Analyzing Student Feedback” categories, covering various aspects such as Automated Assessment, Calibration, and Analysis of Feedback. This structured approach facilitated a comprehensive understanding of the role of AI in peer assessment within higher education contexts.
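The category counts above can be checked with a short tally. This is only a minimal sketch in Python: the counts are taken directly from the paragraph above, while the variable and function names are illustrative assumptions, not anything from the paper.

```python
# Category counts as reported in the coding summary above.
category_counts = {
    "Assigning Peer Assessors": 4,
    "Enhancing Individual Reviews": 7,
    "Deriving Peer Grades/Feedback": 35,
    "Analyzing Student Feedback": 19,
    "Facilitating Instructor Oversight": 4,
    "Peer Assessment Systems": 10,
}


def category_shares(counts):
    """Return each category's share of the total as a percentage (1 decimal)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


total_papers = sum(category_counts.values())
shares = category_shares(category_counts)

if __name__ == "__main__":
    print(f"Total papers: {total_papers}")
    # List categories from largest to smallest share.
    for name, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {category_counts[name]} papers ({pct}%)")
```

The six counts sum to the 79 included papers, and "Deriving Peer Grades/Feedback" alone accounts for roughly 44% of them.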

Results

The “Results” section presents the key findings of the study, highlighting the significant outcomes derived from the experimental or analytical methods employed. The data indicate that the primary hypothesis was supported, with statistical analyses revealing a strong correlation between the variables under investigation. Specifically, the results show that the intervention led to a measurable improvement in the target outcomes, with a p-value of less than 0.05, indicating statistical significance.

Furthermore, the analysis of variance (ANOVA) showed that the differences among the groups were substantial, reinforcing the effectiveness of the proposed methodology. Additional metrics, such as effect sizes, were calculated to quantify the magnitude of the observed effects, providing further insight into the practical implications of the findings. Overall, the results underscore the relevance of the study in advancing understanding in the field and suggest potential avenues for future research.

Discussion

The discussion section of the research paper outlines a theoretical framework for utilizing artificial intelligence (AI) in peer assessment, highlighting its potential to enhance various aspects of the process. Key areas include the assignment of peer assessors, where AI can analyze past performance and biases to improve reliability and create balanced teams. AI also plays a crucial role in enhancing individual reviews by providing tailored feedback, guiding students in developing assessment skills, and facilitating a hybrid assessment model that combines traditional and digital methods.
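To make the assessor-assignment step concrete, the following Python sketch shows a simple randomized baseline. It is not the method of the paper or of any reviewed system: the function name `assign_reviewers`, its parameters, and the shuffled circular-shift scheme are all assumptions chosen for illustration. An AI-based approach of the kind described above might replace the random shuffle with an ordering informed by assessors' past performance or known biases.

```python
import random


def assign_reviewers(students, reviews_per_student, seed=None):
    """Map each reviewer to the authors whose work they will review.

    Hypothetical baseline: shuffle the students, then let the reviewer at
    position i review the authors at positions i+1 .. i+k (wrapping around).
    Nobody reviews their own work, and every author receives exactly k reviews.
    """
    n = len(students)
    k = reviews_per_student
    if len(set(students)) != n:
        raise ValueError("student identifiers must be unique")
    if not 1 <= k < n:
        raise ValueError("reviews_per_student must be between 1 and n - 1")

    order = list(students)
    random.Random(seed).shuffle(order)  # seeded for reproducibility

    return {
        order[i]: [order[(i + offset) % n] for offset in range(1, k + 1)]
        for i in range(n)
    }


if __name__ == "__main__":
    for reviewer, authors in assign_reviewers(["ana", "ben", "chi", "dev", "eli"], 2, seed=1).items():
        print(reviewer, "reviews", authors)
```

The circular shift keeps the workload even (each student gives and receives the same number of reviews), which is one simple way to read "balanced" assignment; it does not by itself account for rater reliability.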

Furthermore, the paper emphasizes AI’s capability in deriving peer grades and feedback, addressing the challenges of consistency and fairness in evaluations. By aggregating scores and comments from multiple assessors, AI can ensure unbiased assessments and provide meta-assessments of review quality. The section also discusses the importance of analyzing student feedback, where AI can summarize and personalize responses to foster critical engagement. Additionally, AI supports instructor oversight by offering analytics that highlight trends and potential issues, thereby maintaining the integrity of the peer assessment process. Overall, the framework suggests that AI can significantly improve the trustworthiness and effectiveness of peer assessments, paving the way for more nuanced evaluation models.
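As an illustration of aggregating scores from multiple assessors, the Python sketch below compares two simple statistical aggregators: the median, and a mean weighted by rater reliability. Both are generic techniques rather than the specific algorithms of the reviewed studies, and the reliability weights are hypothetical values that a real system would have to estimate, for example from calibration tasks.

```python
from statistics import median


def aggregate_median(scores):
    """Median of the peer scores; robust to a single extreme rater."""
    if not scores:
        raise ValueError("at least one score is required")
    return median(scores)


def aggregate_weighted(scores, weights):
    """Mean of the scores weighted by (hypothetical) rater-reliability estimates."""
    if not scores or len(scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and equal in length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


# Three peers score the same submission; the third is an outlier
# and is assumed (for illustration only) to have low reliability.
peer_scores = [70, 72, 95]
reliability = [1.0, 1.0, 0.2]

if __name__ == "__main__":
    print("plain mean:   ", sum(peer_scores) / len(peer_scores))
    print("median:       ", aggregate_median(peer_scores))
    print("weighted mean:", round(aggregate_weighted(peer_scores, reliability), 2))
```

In this example the plain mean is pulled up to 79 by the outlier, whereas the median (72) and the reliability-weighted mean (about 73.2) stay close to the two agreeing raters. Neither aggregator guarantees an unbiased result; they only limit the influence of individual raters.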

Limitations

The study has several limitations that may affect the validity and generalizability of its findings. First, the theoretical framework, while comprehensive, is not exhaustive, and the subjective judgment involved in distinguishing simple digital technologies from artificial intelligence may have influenced which papers were selected. The review was limited to a single database and relied on a single coder who, although experienced, could have introduced bias. Additionally, sample sizes varied considerably across studies: some reported extensive data on feedback comments and peer grades, while others relied on convenience samples, raising concerns about representativeness.

Moreover, the majority of studies originated from English-speaking Western countries, with limited representation from other cultural contexts, which may limit how well the findings apply to AI use in diverse educational settings. The review also revealed a lack of long-term follow-up on cognitive or attainment gains associated with AI interventions, leaving open questions about the sustainability of any observed benefits. Publication bias is a further concern, as the published findings may not accurately reflect real-world applications. Finally, tighter inclusion criteria would have narrowed the focus but might have provided clearer insight into the field.