GLAT: The generative AI literacy assessment test

Journal: Computers and Education Artificial Intelligence, Volume: 9
DOI: https://doi.org/10.1016/j.caeai.2025.100436
Publication Date: 2025-06-09
Author(s): Yueqiao Jin et al.
Primary Topic: Explainable Artificial Intelligence (XAI)

Overview

The paper presents the GenAI Literacy Assessment Test (GLAT), a 20-item multiple-choice instrument designed to measure generative artificial intelligence (GenAI) literacy among higher education students. The study argues that precise assessment tools are needed given the rapid integration of GenAI technology in education. The GLAT was developed following rigorous psychological and educational measurement protocols, and analyses involving 355 students demonstrated strong structural validity and reliability. The analyses yielded a reliable two-parameter logistic (2PL) model, with a Cronbach’s alpha of 0.80 and an omega total of 0.81, alongside a robust factor structure (RMSEA = 0.03; CFI = 0.97). Notably, GLAT scores significantly predicted learners’ performance in GenAI-supported tasks and outperformed self-reported measures, which supports the test’s external validity.
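As an illustration of the reliability statistic reported above, the following minimal sketch computes Cronbach’s alpha from a matrix of scored item responses. It is not the authors’ analysis code: the function name `cronbach_alpha` and the toy response matrix are assumptions made for this example, and the study’s actual data and software are not described in this summary.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row per test taker).

    Each row holds that person's item scores, e.g. 0/1 for multiple-choice
    items scored as incorrect/correct.
    """
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")

    # Variance of each item across test takers.
    item_variances = [
        pvariance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Variance of the total (sum) score across test takers.
    total_variance = pvariance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical toy data: 5 test takers x 3 items (NOT the GLAT data).
    toy = [
        [1, 1, 1],
        [1, 1, 0],
        [1, 0, 0],
        [0, 0, 0],
        [1, 1, 1],
    ]
    print(round(cronbach_alpha(toy), 3))
```

The same variance estimator is used for the items and for the total score, so the choice between population and sample variance does not change the resulting ratio.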

In conclusion, the GLAT is a performance-based assessment tool that effectively evaluates GenAI literacy, particularly among students with varying levels of expertise. The study advocates adopting such performance-based assessments to complement self-reported measures and emphasizes that evaluation tools must be adapted continuously to keep pace with technological advances. The findings underscore the importance of equipping both educators and students with the skills needed to navigate an AI-enhanced future. Future research is encouraged to explore the GLAT’s applicability across different educational contexts and levels while accounting for the rapidly changing landscape of GenAI technologies.

Introduction

The introduction highlights the transformative impact of generative artificial intelligence (GenAI) on higher education, which brings both opportunities and challenges. Tools such as ChatGPT and Google’s Gemini can enhance personalized tutoring, instructional material generation, and accessibility, while also raising ethical concerns, misinformation risks, and academic integrity issues. To leverage GenAI effectively, educators and learners need stronger AI literacy, defined as the competencies necessary for meaningful interaction with AI technologies, including an understanding of AI concepts and ethical usage.

The paper identifies a gap in the current assessment of GenAI literacy, which is often measured through self-reported instruments that may not accurately reflect actual competencies. In contrast, performance-based assessments provide a more reliable evaluation of skills. This matters particularly for GenAI literacy, where discrepancies between perceived and actual abilities are common. To address this gap, the study introduces the GenAI Literacy Assessment Test (GLAT), a performance-based tool designed to rigorously assess the competencies required for effective engagement with GenAI technologies in educational contexts. The GLAT aims to ensure validity and reliability, focusing on essential skills such as technical proficiency, ethical awareness, and critical evaluation of AI outputs, thereby informing targeted educational interventions.

Methods

This section describes the development of the Generative AI Literacy Assessment Test (GLAT), which followed the test development procedures established by Thorndike et al. (1991). The process comprised three main steps: first, a blueprint outlining relevant generative AI concepts was created; second, an initial set of test items was generated based on this blueprint; and third, face and content validity were evaluated through expert reviews and pilot studies.

To ensure the quality of the test items, an item analysis was performed using Classical Test Theory (CTT), focusing on item difficulty and discrimination indices. The structural validity and reliability of the GLAT were assessed through Item Response Theory (IRT), addressing Research Question 1 (RQ1). Additionally, the external validity of the GLAT was examined by analyzing its predictive effectiveness regarding learners’ performance on tasks involving interaction with a generative AI chatbot, in comparison to a self-reported generative AI literacy instrument (Lee & Park, 2024), which corresponds to Research Question 2 (RQ2). Further details are provided in subsequent sections.
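The item statistics named above can be sketched in a few lines. The example below is an illustration under stated assumptions, not the study’s procedure: it uses the proportion correct as the CTT difficulty index, the corrected item-total correlation as one common discrimination index (the summary does not say which index the authors used), and the standard 2PL item response function. All function names and the toy data are hypothetical.

```python
import math
from statistics import mean, pstdev


def item_difficulty(responses, item):
    """CTT difficulty: proportion of test takers who answered the item correctly."""
    return mean([row[item] for row in responses])


def item_discrimination(responses, item):
    """Corrected item-total correlation (one common CTT discrimination index).

    Pearson correlation between the item score and the total score on the
    remaining items. Returns NaN when either variable has zero variance.
    """
    x = [row[item] for row in responses]
    y = [sum(row) - row[item] for row in responses]  # rest score
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return float("nan")
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)


def prob_correct_2pl(theta, a, b):
    """2PL item response function: P(correct) = 1 / (1 + exp(-a * (theta - b))).

    theta: person ability, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


if __name__ == "__main__":
    # Hypothetical toy data: 5 test takers x 3 items (NOT the GLAT data).
    toy = [
        [1, 1, 1],
        [1, 1, 0],
        [1, 0, 0],
        [0, 0, 0],
        [1, 1, 1],
    ]
    for i in range(3):
        print(i, item_difficulty(toy, i), round(item_discrimination(toy, i), 3))
    # A person whose ability equals the item difficulty has a 50% chance of success.
    print(prob_correct_2pl(theta=0.0, a=1.5, b=0.0))
```

In the 2PL model, a larger `a` makes the probability of a correct answer rise more steeply around `theta = b`, which is why `a` is read as the item’s discrimination. Estimating `a` and `b` from real responses requires a dedicated IRT fitting routine, which this sketch does not attempt.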

Discussion

The discussion emphasizes the critical need for AI and generative AI (GenAI) literacy, particularly in educational contexts. AI literacy is defined as a set of competencies that enable individuals to evaluate, communicate with, and effectively use AI technologies, as articulated by Long and Magerko (2020). Recent studies have expanded this definition to include essential skills such as recognition, application, evaluation, creation, and ethical navigation, particularly in light of the unique challenges posed by GenAI technologies like ChatGPT. Existing frameworks for AI literacy are often deemed too general because they fail to address the specific competencies needed to engage effectively with GenAI. This calls for a more integrated approach that combines theoretical knowledge, practical skills, and critical reflection.

The paper also reviews existing AI literacy assessment instruments, such as the Scale for the Assessment of Non-Experts’ AI Literacy (SNAIL) and the AI Literacy Scale (AILS), which evaluate different facets of AI literacy. However, a significant gap remains in the assessment of GenAI literacy, particularly regarding performance-based evaluations. The Generative AI Literacy Assessment Test (GLAT) aims to fill this gap by measuring GenAI literacy among higher education students through a robust validation process grounded in Classical Test Theory (CTT) and Item Response Theory (IRT). The GLAT is designed to assess foundational knowledge, application, ethical awareness, and critical evaluation capabilities, addressing the urgent need for reliable and valid measures of GenAI literacy in educational settings.

Limitations

The limitations section highlights several critical aspects of the study on measuring GenAI literacy in higher education. First, the GenAI Literacy Assessment Test (GLAT) was primarily developed for higher education students, which limits its applicability to younger K-12 students and to educators. The use of specialized GenAI terminology in the GLAT items may further restrict its relevance across academic disciplines and educational settings. Future research should aim to adapt and expand the GLAT to cover a broader range of educational levels, participant demographics, and subject areas.

Additionally, the study’s external validity is based on specific tasks related to visual analytics and chatbot interactions, suggesting a need for future investigations to include diverse contexts and task complexities to fully understand the impact of GenAI literacy on learning performance. The focus on particular types of domain knowledge, such as visualization literacy, without considering other relevant areas, also presents a limitation. Future studies should explore a wider array of domain knowledge to better assess its influence on GenAI literacy outcomes. Moreover, the evolving nature of GenAI technology necessitates continuous updates to the GLAT to maintain its effectiveness. Lastly, the assessment’s English language requirement may hinder accessibility for non-English speakers, indicating a need for adaptations in different languages to enhance its validity across diverse linguistic populations. Integrating data-mining and visual analytics techniques could further allow for longitudinal tracking of GLAT usage, providing insights into the relationship between assessment scores and authentic learning behaviors in real-world educational contexts.