Development and effectiveness verification of AI education data sets based on constructivist learning principles for enhancing AI literacy

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-95802-4
PMID: https://pubmed.ncbi.nlm.nih.gov/40155549
Publication Date: 2025-03-28
Author(s): Seul-Ki Kim et al.
Primary Topic: Data Quality and Management

Overview

This study emphasizes the critical role of AI education in enhancing students’ AI literacy through the development of constructivist-aligned datasets. Recognizing the limitations of conventional datasets that often lack relevance to students’ lived experiences, the researchers restructured the machine learning dataset development cycle. They created four specialized educational datasets, rigorously evaluated for quality and authenticity, and deployed them on widely used educational programming platforms. The analysis of usage metrics and comparative effectiveness demonstrated that these datasets significantly improved students’ AI competencies compared to traditional datasets, fulfilling a pressing need for purpose-built educational resources.

The findings advocate for a systematic approach to dataset development in AI education, highlighting the importance of connecting students’ prior knowledge with real-world problem-solving experiences. By providing high-quality, contextually relevant datasets, the study contributes to the theoretical discourse on AI education and offers practical implications for educators and policymakers. The research underscores the necessity of developing both industry-focused and pedagogically oriented datasets to support effective AI education, suggesting that establishing national curriculum standards based on these resources could enhance the implementation of quality AI education at the school level.

Introduction

In the introduction of the research paper, the authors describe how the AI education datasets are implemented and maintained, with the aim of improving accessibility and educational utility. The datasets are distributed via the Entry programming platform, so educators and students can access them through a standardized workflow that is consistent with the other datasets on the platform. The dataset interface was designed around expert recommendations from the testing phase, which focused on quality assessment and practical application. Each dataset includes a basic description, explanations of key variables, column/row metadata, and usage examples, all intended to make the datasets comprehensible and applicable to real-world scenarios.
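The four documentation components listed above can be pictured as a small record with a completeness check. The following is only a sketch: the class name `DatasetCard`, its field names, and the sample values are hypothetical and are not taken from the paper or from the Entry platform.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetCard:
    """Documentation bundle for one educational dataset (field names are hypothetical)."""
    name: str
    description: str = ""
    variables: dict[str, str] = field(default_factory=dict)        # variable -> explanation
    column_metadata: dict[str, str] = field(default_factory=dict)  # column -> type or unit
    row_count: int = 0
    usage_examples: list[str] = field(default_factory=list)


def missing_components(card: DatasetCard) -> list[str]:
    """Return the names of the documentation components that are still empty."""
    checks = {
        "description": bool(card.description.strip()),
        "variables": bool(card.variables),
        "column_metadata": bool(card.column_metadata),
        "row_count": card.row_count > 0,
        "usage_examples": bool(card.usage_examples),
    }
    return [name for name, present in checks.items() if not present]


# Illustrative values only; the real dataset's columns are not given in this summary.
card = DatasetCard(
    name="T-shirt sizes",
    description="Body measurements paired with a T-shirt size label.",
    variables={"height_cm": "Height in centimetres", "size": "Label to predict"},
    column_metadata={"height_cm": "float", "size": "category"},
    row_count=100,
    usage_examples=["Train a classifier that predicts size from height."],
)
print(missing_components(card))
```

A check of this kind could run before a dataset is published, so that an entry with no usage examples or no variable explanations is flagged early.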

Additionally, the authors established a maintenance framework that incorporates multiple feedback channels, including an integrated bulletin board and a dedicated web portal for usage guides. This infrastructure enables users to submit suggestions for improvements, which are then collaboratively reviewed by researchers and the Connect Foundation, the governing body of Entry. Approved modifications are integrated into the programming environment through automated deployment pipelines, ensuring that the datasets remain up-to-date and relevant for educational purposes.
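The feedback loop described above (submission, joint review, deployment) can be sketched as a tiny state machine. This is an illustration under stated assumptions, not the platform's actual process: the status names, the reviewer labels, and the rule that both parties must approve are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"
    APPROVED = "approved"
    REJECTED = "rejected"
    DEPLOYED = "deployed"


# Hypothetical labels for the two reviewing parties named in the text.
REVIEWERS = ("researchers", "connect_foundation")


@dataclass
class Suggestion:
    dataset: str
    text: str
    status: Status = Status.SUBMITTED
    votes: dict[str, bool] = field(default_factory=dict)


def review(s: Suggestion, reviewer: str, approve: bool) -> None:
    """Record one reviewer's decision; resolve once both parties have voted."""
    if reviewer not in REVIEWERS:
        raise ValueError(f"unknown reviewer: {reviewer}")
    if s.status is not Status.SUBMITTED:
        raise ValueError("suggestion already resolved")
    s.votes[reviewer] = approve
    if len(s.votes) == len(REVIEWERS):
        # Assumption: a change is accepted only if both parties approve.
        s.status = Status.APPROVED if all(s.votes.values()) else Status.REJECTED


def deploy(s: Suggestion) -> None:
    """Only approved suggestions may be deployed."""
    if s.status is not Status.APPROVED:
        raise ValueError("only approved suggestions can be deployed")
    s.status = Status.DEPLOYED


s = Suggestion("T-shirt sizes", "Add a unit to the height column description.")
review(s, "researchers", True)
review(s, "connect_foundation", True)
deploy(s)
print(s.status)
```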

Methods

The “Methods” section outlines the experimental design and analytical techniques employed in the study. The researchers utilized a quantitative approach, involving controlled experiments to gather data on the specified variables. Statistical analyses were conducted using software tools to ensure the reliability and validity of the results. Key methodologies included the application of regression analysis to identify relationships between variables and the use of ANOVA to compare group means.

Additionally, the study incorporated a systematic sampling technique to ensure a representative sample of the population. Data collection involved both primary and secondary sources, with instruments calibrated for accuracy. The methods were designed to minimize bias and enhance reproducibility, thereby strengthening the overall findings of the research.

Results

This section presents the results of a study validating the effectiveness of a newly developed dataset for enhancing AI literacy through an educational program. A pre-test using standardized assessments established baseline equivalence between the experimental and control groups: both had a median score of ‘Neutral’ on every AI literacy sub-competency. The experimental group showed slightly higher third-quartile (Q3) values, indicating a marginal difference in the upper part of the score distribution, but Mann-Whitney U tests found no statistically significant differences (p > .05) between the groups before the intervention.
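To make the baseline check concrete, here is a minimal sketch of a two-sided Mann-Whitney U test on ordinal responses. It uses only the standard library and the tie-corrected normal approximation without a continuity correction, which is an assumption about method and not necessarily what the authors' software did. The response values are invented for illustration.

```python
import math
from collections import Counter
from statistics import median


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test via the tie-corrected normal approximation.

    Returns (U for group a, p-value). Ties count as half a win.
    """
    n1, n2 = len(a), len(b)
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    n = n1 + n2
    # Tie correction: t is the number of observations sharing each distinct value.
    tie_term = sum(t**3 - t for t in Counter(list(a) + list(b)).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:  # every response identical: no evidence of a difference
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return u, p


# Hypothetical pre-test responses on a 5-point scale (3 = Neutral).
experimental = [3, 3, 2, 4, 3, 4, 3, 2]
control = [3, 2, 3, 3, 4, 3, 2, 3]

u, p = mann_whitney_u(experimental, control)
print(median(experimental), median(control))
print(f"U = {u}, p = {p:.3f}")
```

With samples this small the normal approximation is rough; an exact test from a statistics package would be preferable in practice.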

Post-intervention results revealed significant improvements in the experimental group, which used the ‘T-shirt sizes’ dataset, compared with the control group, which used the traditional ‘Iris’ dataset. The experimental group’s median scores across all sub-competencies fell within the ‘Agree’ to ‘Strongly Agree’ range, while the control group’s scores remained in the ‘Neutral’ to ‘Agree’ range. Statistical analysis indicated significant differences (p < .001) across all sub-competencies, and Cliff’s delta values showed positive effect sizes in favor of the experimental group, particularly for ‘Data literacy’ (0.397) and ‘Understanding of AI’ (0.366). These findings underscore the effectiveness of contextually relevant datasets in enhancing AI literacy and highlight the need for specialized datasets to address ethical aspects of AI education.
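Cliff's delta, the effect size reported above, is the probability that a response from one group exceeds a response from the other, minus the probability of the reverse. The sketch below computes it directly from all cross-group pairs; the response values are hypothetical and are not the study's data.

```python
from itertools import product


def cliffs_delta(treatment, control):
    """Cliff's delta: P(t > c) - P(t < c) over all cross-group pairs."""
    greater = sum(1 for t, c in product(treatment, control) if t > c)
    less = sum(1 for t, c in product(treatment, control) if t < c)
    return (greater - less) / (len(treatment) * len(control))


# Hypothetical post-test responses (1 = Strongly Disagree ... 5 = Strongly Agree).
experimental = [4, 5, 4, 3, 5, 4]
control = [3, 4, 3, 3, 4, 2]

delta = cliffs_delta(experimental, control)
print(f"Cliff's delta = {delta:.3f}")
```

The statistic ranges from -1 to 1, with 0 meaning complete overlap. Under one commonly cited set of thresholds (0.147, 0.33, 0.474 for small, medium, and large), the reported values of 0.397 and 0.366 would fall in the medium range.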

Discussion

The discussion section of the research paper emphasizes the significance of constructivist approaches in AI education, highlighting the effectiveness of problem/project-centered instruction and practical exercises. Constructivism fosters active learning by connecting students’ existing knowledge with new concepts, particularly in understanding AI principles and enhancing computational thinking. The paper advocates for authentic activities that relate closely to students’ real-life contexts, suggesting that utilizing public or open data can transform AI education into meaningful experiences. However, it notes the challenges in sourcing and adapting datasets for educational purposes, stressing the need for systematic processes to ensure the quality and relevance of these datasets.

The section further outlines a proposed framework for developing and evaluating datasets tailored for AI education, drawing from software engineering principles. This framework includes a lifecycle that emphasizes collaboration with experts, accountability, and continuous evaluation of dataset quality. The authors propose a structured approach to dataset development that aligns with educational objectives and incorporates rigorous quality assessment metrics. The study aims to validate the effectiveness of these datasets in enhancing students’ AI literacy through experimental evaluations, comparing outcomes between groups using the newly developed datasets and traditional ones. Overall, the research underscores the necessity of integrating authentic, contextually relevant datasets into AI education to foster deeper learning and engagement.

Limitations

The study acknowledges several limitations, primarily related to the geographical specificity of the datasets, which were largely distributed through platforms prevalent in South Korea. Notably, certain datasets, such as the Seoul Mosquito and Earthquake Occurrence Status datasets, are tied to regional phenomena, which constrains their applicability elsewhere.

To address these limitations, future research should focus on generalizing the dataset development process to create AI education datasets that possess geographically neutral characteristics and can be scaled effectively. Furthermore, the authors propose the development of libraries or web services aimed at generating statistically similar synthetic datasets from existing bulk datasets, thereby enhancing the overall utility of AI educational resources.
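As a rough illustration of what "statistically similar synthetic datasets" could mean, the sketch below fits simple per-column distributions to a source table and samples new rows from them. This is one naive approach chosen for brevity, not the authors' proposal: the function name `synthesize` and the sample rows are hypothetical, and the method ignores relationships between columns.

```python
import random
from statistics import mean, stdev


def synthesize(rows, n, seed=0):
    """Draw n synthetic rows whose per-column distributions mimic `rows`.

    Numeric columns are sampled from a normal fit (mean, sample std);
    other columns are resampled in proportion to their observed frequency.
    Correlations between columns are NOT preserved in this sketch.
    """
    rng = random.Random(seed)
    columns = list(rows[0].keys())
    samplers = {}
    for col in columns:
        values = [r[col] for r in rows]
        if all(isinstance(v, (int, float)) for v in values):
            mu, sigma = mean(values), stdev(values)
            samplers[col] = lambda mu=mu, sigma=sigma: rng.gauss(mu, sigma)
        else:
            samplers[col] = lambda values=values: rng.choice(values)
    return [{col: samplers[col]() for col in columns} for _ in range(n)]


# Hypothetical source rows; a real bulk dataset would be far larger.
source = [
    {"height_cm": 158.0, "size": "S"},
    {"height_cm": 165.0, "size": "M"},
    {"height_cm": 172.0, "size": "M"},
    {"height_cm": 180.0, "size": "L"},
    {"height_cm": 169.0, "size": "M"},
]

synthetic = synthesize(source, n=2000, seed=42)
print(len(synthetic), round(mean(r["height_cm"] for r in synthetic), 1))
```

Because each column is sampled independently, the link between height and size is lost, so a table produced this way would not be suitable for training a classifier. A practical tool would need to model the joint distribution.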