Task automation and instructional planning support with large language models: a systematic review

Journal: Frontiers in Education, Volume: 11
DOI: https://doi.org/10.3389/feduc.2026.1733861
Publication Date: 2026-02-05
Author(s): Giovanni Luna Chontal et al.
Primary Topic: AI in Service Interactions

Overview

This systematic review examines the impact of large language models (LLMs) on educational practice, particularly in secondary and higher education. Drawing on sixteen studies published between 2023 and 2025, it finds that LLMs may reduce the time teachers spend producing educational materials and may improve the perceived quality of those materials. The findings are heterogeneous, however: they rely heavily on self-reported data and carry an unclear risk of bias, particularly in higher education contexts with small sample sizes. Key challenges include the non-reproducibility of LLM outputs, dependence on prompt design, and ethical concerns about their use.

The review highlights four core themes: (1) LLMs can alleviate teachers’ workloads while improving the quality of educational materials; (2) effective adoption requires training in prompt engineering and user-friendly interface design; (3) there is an urgent need for standardized metrics and validation mechanisms to ensure the reliability of generated content; and (4) the potential benefits of LLMs must be balanced against risks such as incomplete content coverage and lack of ethical protocols. The findings underscore the necessity for transparency and validation mechanisms as LLMs become integral to instructional design, emphasizing that without these safeguards, the trustworthiness of LLM-generated outputs remains questionable.

Introduction

The introduction highlights the challenges faced by traditional educational systems in adapting to the demands of the 21st century, particularly in developing countries. Key issues include the insufficient integration of digital technologies, which is attributed to limited resources, institutional resistance, and inadequate teacher training in digital competencies. The persistence of conventional teaching methods further restricts pedagogical innovation, underscoring the need for educational environments that are more dynamic, adaptive, and inclusive.

Large language models (LLMs) such as GPT-4 are presented as a potential solution, offering personalized learning and real-time pedagogical support. Built on deep neural networks, these tools can generate context-aware content and facilitate interactive learning experiences. Their applications range from virtual patient simulations in medical education to pedagogical assistants in higher education. However, challenges persist, including bias reproduction, unreliable content, and the need for substantial technological infrastructure. The “black-box” nature of these algorithms also raises ethical concerns about privacy and transparency, prompting calls for regulatory frameworks and bias mitigation strategies, which are not implemented uniformly across regions and educational contexts.

Methods

The review was registered with the Open Science Framework and followed the PRISMA 2020 guidelines, focusing on the application of large language models (LLMs) in educational contexts. Study selection and analysis relied on specialized digital tools. Rayyan Web supported the initial screening of titles and abstracts, aiding collaboration and reducing bias through features such as blinded decision-making. RevMan Web was used for data extraction and qualitative analysis, to organize study characteristics and synthesize findings. Quality was assessed with standardized instruments: the ROBINS-I tool, which evaluates risk of bias across seven domains, for non-randomized studies, and the CASP checklist for qualitative and mixed-methods studies.
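To make the seven-domain assessment concrete, the sketch below records one judgement per ROBINS-I domain and derives an overall rating. The aggregation rule is a simplified assumption (the worst-rated domain drives the overall judgement, and missing information matters only when no domain is serious or critical); it is not taken from the review, and the domain labels are abbreviated.

```python
# Ordered from least to most severe.
SEVERITY = ["low", "moderate", "serious", "critical"]
NO_INFO = "no information"

# Abbreviated labels for the seven ROBINS-I domains.
DOMAINS = (
    "confounding",
    "selection of participants",
    "classification of interventions",
    "deviations from intended interventions",
    "missing data",
    "measurement of outcomes",
    "selection of the reported result",
)

def overall_risk(judgements):
    """Simplified overall rating: the worst domain wins (assumed rule)."""
    missing = [d for d in DOMAINS if d not in judgements]
    if missing:
        raise ValueError(f"missing domains: {missing}")
    values = [judgements[d] for d in DOMAINS]
    invalid = [v for v in values if v not in SEVERITY and v != NO_INFO]
    if invalid:
        raise ValueError(f"unknown judgements: {invalid}")
    rated = [v for v in values if v in SEVERITY]
    worst = max(rated, key=SEVERITY.index, default=None)
    if worst in ("serious", "critical"):
        return worst
    if NO_INFO in values:
        return NO_INFO
    return worst

# Hypothetical study, not one of the sixteen included in the review.
example = dict.fromkeys(DOMAINS, "low")
example["confounding"] = "moderate"
example["measurement of outcomes"] = "serious"
overall = overall_risk(example)
print(overall)
```

A per-study table of such judgements is one way to make an "unclear risk of bias" finding traceable to specific domains.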

To retrieve the literature comprehensively, the search strategy combined free-text terms with controlled vocabulary from the IEEE Thesaurus, organized into two conceptual clusters: educational applications of LLMs and core artificial intelligence terminology. This approach aimed to maximize the identification of relevant studies, while acknowledging that the specificity of the search terms could introduce selection bias. The same search framework was applied across multiple databases, including Scopus and the ACM Digital Library, to provide broad and consistent coverage of the literature on educational applications of LLMs.
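A two-cluster strategy of this kind is commonly expressed as a Boolean string: synonyms within a cluster are joined with OR, and the clusters are joined with AND. The sketch below illustrates that pattern only. The terms are hypothetical placeholders, not the review's actual search strings, and each database has its own field syntax that is not modeled here.

```python
def build_query(clusters):
    """Join terms within a cluster with OR, and clusters with AND."""
    groups = []
    for terms in clusters:
        # Quote multi-word phrases so they are searched as exact phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)

# Hypothetical terms for the two conceptual clusters.
education_cluster = ["lesson planning", "instructional design", "teaching materials"]
ai_cluster = ["large language model", "LLM", "generative AI"]

query = build_query([education_cluster, ai_cluster])
print(query)
```

Keeping the clusters as data makes it easier to apply the same logical query to each database while adapting only the syntax.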

Results

The results section summarizes key findings on the use of large language models (LLMs) in educational contexts, highlighting both their potential benefits and their practical limitations. Table 5 outlines the main trends and measurement approaches, emphasizing that findings should be viewed as indicative rather than definitive. The evidence suggests that LLMs can assist in instructional planning and reduce content preparation time; however, several limitations must be addressed.

First, the studies indicate that LLM-generated materials often lack essential pedagogical elements and are sensitive to minor changes in prompts, so a systematic human review is needed to ensure alignment with learning objectives and assessment criteria. This favors a human-in-the-loop workflow (draft → verify → adapt) over complete automation. Second, the effectiveness of LLMs depends heavily on implementation factors such as teachers’ prompt literacy and the availability of user-friendly templates, which can affect the time savings reported. Finally, institutional constraints, including model access, data privacy concerns, and governance policies, can limit the deployment of LLMs in educational settings. Collectively, these findings show that while LLMs hold promise, their successful integration into instructional practice requires careful consideration of both model capabilities and contextual readiness.
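The draft → verify → adapt workflow can be sketched as three small steps. This is a minimal illustration under stated assumptions: `draft` is a stand-in for an LLM call, returning a fixed and deliberately incomplete plan, and `REQUIRED_ELEMENTS` is a hypothetical checklist rather than one proposed by the review.

```python
from dataclasses import dataclass, field

# Hypothetical checklist of elements a reviewer expects in a lesson plan.
REQUIRED_ELEMENTS = ("learning objectives", "activities", "assessment criteria")

@dataclass
class LessonDraft:
    topic: str
    sections: dict = field(default_factory=dict)
    approved: bool = False

def draft(topic):
    """Stand-in for an LLM call; returns an intentionally incomplete draft."""
    return LessonDraft(topic, {
        "learning objectives": f"Explain the basics of {topic}.",
        "activities": "Group discussion and worked examples.",
    })

def verify(lesson):
    """Return the required elements that are missing or empty."""
    return [e for e in REQUIRED_ELEMENTS if not lesson.sections.get(e, "").strip()]

def adapt(lesson, teacher_edits):
    """Apply the teacher's edits, then approve only if nothing is missing."""
    lesson.sections.update(teacher_edits)
    lesson.approved = not verify(lesson)
    return lesson

lesson = draft("photosynthesis")
missing = verify(lesson)  # the stub draft omits the assessment criteria
lesson = adapt(lesson, {"assessment criteria": "Short quiz scored with a rubric."})
print(missing, lesson.approved)
```

The point of the design is that approval is never set by the drafting step: a draft becomes usable only after the verification check passes on the teacher-edited version.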

Discussion

The discussion highlights the complexities and challenges of integrating large language models (LLMs) into educational settings. While LLMs offer potential benefits, such as streamlining content generation and enhancing pedagogical interactions, they also raise significant ethical and practical concerns. The authors emphasize biased responses, errors in generated content, and the risk of student dependency on these tools. They also question the validity of automated assessments and note that the opacity of LLMs complicates explainability and pedagogical control. The study underscores the need for robust pedagogical strategies, regulatory frameworks, and comprehensive teacher training to support the effective and responsible use of LLMs in education.

The systematic review aims to synthesize existing literature on LLMs in educational contexts, guided by specific research questions framed within the PICOS framework. These questions focus on the impact of LLMs on the time teachers spend on content generation and the perceived quality of educational materials, as well as the automation of educational tasks and planning. The review includes a rigorous selection process for studies, ensuring that only peer-reviewed articles relevant to the use of LLMs in education are considered. Ultimately, the findings aim to provide a nuanced understanding of how LLMs can be leveraged to enhance educational practices while acknowledging the limitations and risks associated with their use.

Limitations

The limitations section identifies several constraints on the use of large language models (LLMs) in educational contexts, as reported in studies conducted between 2023 and early 2025. These limitations are not inherent to LLMs as a whole but are specific to the models and configurations evaluated. Notable constraints include limited context windows that hinder the integration of multiple sources, knowledge cutoffs that lead to outdated responses, and output inaccuracies such as hallucinations and biases. In addition, instructional design limitations, such as reliance on textual prompts that require high digital literacy and the absence of integrated graphical interfaces, impede effective adoption by educators.

The output limitations identified across the studies include a lack of reproducibility in responses, insufficient content coverage, factual inaccuracies, and security concerns regarding sensitive data. These issues underscore the necessity for careful validation of generated content and highlight the dependence of LLM effectiveness on the quality of input prompts. Furthermore, the review itself acknowledges limitations related to the heterogeneity of the evidence base, the focus on higher education contexts, and potential biases in study selection. Overall, the findings emphasize that the constraints of LLMs are time-bound and context-dependent, necessitating ongoing research to adapt to the rapidly evolving landscape of educational technology.
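One simple way to quantify the reproducibility concern is to submit the same prompt several times and measure how similar the responses are. The sketch below uses `difflib.SequenceMatcher` from the Python standard library to compute a mean pairwise similarity. It is an illustrative surface-level measure, not one used in the review, and the sample responses are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

def mean_pairwise_similarity(outputs):
    """Average SequenceMatcher ratio over all pairs (1.0 means identical)."""
    pairs = list(combinations(outputs, 2))
    if not pairs:
        raise ValueError("need at least two outputs to compare")
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)

# Hypothetical responses to one prompt, collected from repeated runs.
runs = [
    "Objective: students describe the water cycle.",
    "Objective: students describe the water cycle.",
    "Objective: learners explain how the water cycle works.",
]
score = mean_pairwise_similarity(runs)
print(f"mean pairwise similarity: {score:.2f}")
```

Character-level similarity says nothing about whether differently worded responses are pedagogically equivalent, so a low score signals that human review is needed rather than that the content is wrong.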