Artificial intelligence in scale development: evaluating AI-generated survey items against gold standard measures

Journal: Current Psychology, Volume: 44, Issue: 20
DOI: https://doi.org/10.1007/s12144-025-08240-w
Publication Date: 2025-08-25
Author(s): John D. Terry et al.
Primary Topic: Grit, Self-Efficacy, and Motivation

Overview

The research investigates the use of ChatGPT (version 3.5) to develop survey items measuring grit. It compares the psychometric properties of those items with those of a validated instrument, the Short Grit Scale. In Study 1, an exploratory factor analysis of responses from 180 college students showed that the AI-generated items reproduced the two-factor structure of the Short Grit Scale. The items achieved high internal consistency (Factor 1, $\alpha = 0.94$; Factor 2, $\alpha = 0.93$) and correlated moderately to strongly with the established scale. Study 2 drew on a larger sample of 366 participants and used confirmatory factor analysis to test the two-factor structure. It found strong factor loadings (0.78 to 0.88) and generally acceptable model fit (CFI = 0.97, TLI = 0.95, SRMR = 0.04), although the RMSEA of 0.09 lies above the 0.08 cutoff commonly used for reasonable fit. Finally, hierarchical regression analysis indicated that the AI-generated items were stronger predictors of academic performance, explaining more variance in grade point average (GPA) than the Short Grit Scale.
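
The internal-consistency values above are Cronbach's alpha coefficients. For a subscale of $k$ items, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma^2_i}{\sigma^2_X}\right)$, where $\sigma^2_i$ is the variance of item $i$ and $\sigma^2_X$ is the variance of the summed score. The following is a minimal Python sketch of that formula. It assumes a complete respondents-by-items matrix with no missing responses. The function name `cronbach_alpha` and the toy data are illustrative and are not taken from the paper.

```python
from statistics import pvariance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    Population variances are used throughout; sample variances would give the
    same alpha because the n / (n - 1) factor cancels in the ratio.
    """
    k = len(scores[0])  # number of items in the (sub)scale
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*scores))  # transpose: one tuple of responses per item
    item_var_sum = sum(pvariance(item) for item in items)
    totals = [sum(row) for row in scores]  # each respondent's summed score
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var_sum / total_var)


if __name__ == "__main__":
    # Toy data: 4 respondents x 3 items on a 1-5 scale (not data from the paper).
    toy = [[4, 5, 4], [2, 2, 3], [5, 4, 5], [1, 2, 1]]
    print(round(cronbach_alpha(toy), 3))
```

As a quick check, the two-item toy matrix `[[1, 2], [2, 1], [3, 3]]` yields an alpha of about 0.667. Items that rise and fall together across respondents push alpha toward 1.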

The study suggests that future research should focus on improving the originality and quality of AI-generated outputs. It points to three means: more advanced AI models, more sophisticated prompt engineering, and the integration of human expertise. Researchers are encouraged to write detailed prompts that specify the desired item characteristics and include examples of originality, which can improve the creativity of AI-generated content. The authors emphasize that AI can assist in generating survey items but should complement, not replace, traditional theory-driven approaches to scale development. This keeps the items aligned with the theoretical construct, which the validity of the instrument depends on.
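
The recommendation about detailed prompts can be made concrete by assembling the prompt from an explicit specification: the construct, its facets, the number of items, the response format, constraints, and style examples. This summary does not reproduce the prompt the authors used. The names `ItemPromptSpec` and `build_item_prompt`, along with all of the prompt wording, are therefore hypothetical. The facet names in the demo are the two Short Grit Scale subscales (consistency of interest and perseverance of effort), and the style example is invented for illustration. The function only builds text; it does not call any model API.

```python
from dataclasses import dataclass, field


@dataclass
class ItemPromptSpec:
    """Hypothetical specification for an item-generation prompt."""
    construct: str
    facets: list[str]
    items_per_facet: int
    response_scale: str
    constraints: list[str] = field(default_factory=list)
    style_examples: list[str] = field(default_factory=list)


def build_item_prompt(spec: ItemPromptSpec) -> str:
    """Assemble a detailed prompt; this builds text only and calls no model."""
    lines = [
        f"You are helping to develop a self-report scale measuring {spec.construct}.",
        f"Write {spec.items_per_facet} original items for each facet listed below.",
        f"Respondents will answer each item on {spec.response_scale}.",
        "Facets:",
    ]
    lines += [f"- {facet}" for facet in spec.facets]
    if spec.constraints:
        lines.append("Requirements:")
        lines += [f"- {rule}" for rule in spec.constraints]
    if spec.style_examples:
        # Examples convey tone and length; the model is told not to copy them.
        lines.append("Examples of the desired style (do not copy or paraphrase them):")
        lines += [f'- "{example}"' for example in spec.style_examples]
    return "\n".join(lines)


if __name__ == "__main__":
    spec = ItemPromptSpec(
        construct="grit",
        facets=["consistency of interest", "perseverance of effort"],
        items_per_facet=4,
        response_scale="a 5-point scale from 'not at all like me' to 'very much like me'",
        constraints=[
            "Do not reproduce or lightly reword items from existing published scales.",
            "Use one idea per item and avoid double negatives.",
        ],
        style_examples=["I keep working on long projects even when progress is slow."],
    )
    print(build_item_prompt(spec))
```

Keeping the specification separate from the wording makes it easy to vary one element at a time, such as adding constraints or examples, and then compare the psychometric quality of the resulting item pools.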

Results

The “Results” section presents the findings of the two studies. It reports the outcomes of the statistical tests, highlighting significant correlations and patterns in the data. The results indicate that the hypotheses were supported, with quantitative measures showing a clear relationship between the variables examined, for example between scores on the AI-generated grit items and grade point average.
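
The comparison of the two grit measures as predictors of GPA rests on hierarchical regression. In this method, predictor blocks are entered in steps, and the change in $R^2$ ($\Delta R^2$) shows how much additional variance a block explains beyond the earlier ones. The pure-Python sketch below assumes ordinary least squares with an intercept, solved through the normal equations. An illustrative ordering is Short Grit Scale subscale scores in step 1 and AI-generated subscale scores in step 2. This summary does not describe the paper's exact model specification or any covariates. The helpers `r_squared` and `hierarchical_delta_r2` are hypothetical.

```python
def _solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(predictors: list[list[float]], y: list[float]) -> float:
    """R^2 of an OLS fit of y on an intercept plus the given predictor columns."""
    X = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    beta = _solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("outcome has zero variance")
    return 1 - ss_res / ss_tot


def hierarchical_delta_r2(block1, block2, y):
    """Return R^2 at step 1 (block1), at step 2 (block1 + block2), and their change."""
    r2_step1 = r_squared(block1, y)
    r2_step2 = r_squared(block1 + block2, y)
    return r2_step1, r2_step2, r2_step2 - r2_step1
```

Calling `hierarchical_delta_r2(block1, block2, gpa)` and then swapping the two blocks gives each measure's unique contribution over the other. A dedicated statistics package would also report the F test for $\Delta R^2$, which this sketch omits.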

The section also includes figures and tables that summarize the data, allowing a visual interpretation of the findings. Statistics such as p-values and confidence intervals are reported to convey the statistical significance and precision of the estimates. Overall, the findings add to existing knowledge and point to implications for future research in the field.
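
Confidence intervals for correlations, such as those between the AI-generated items and the Short Grit Scale, are commonly computed with the Fisher z-transformation. The transformed value $z = \operatorname{artanh}(r)$ is approximately normal with standard error $1/\sqrt{n-3}$, and the interval endpoints are mapped back to the correlation scale with $\tanh$. The sketch below assumes bivariate normality and uses illustrative inputs rather than values reported in the paper. The function name `correlation_ci` is hypothetical, and the authors may have used a different method.

```python
import math
from statistics import NormalDist


def correlation_ci(r: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Approximate confidence interval for a Pearson r via the Fisher z-transformation."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must exceed 3")
    z = math.atanh(r)                             # Fisher z-transform of r
    se = 1 / math.sqrt(n - 3)                     # standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return math.tanh(z - crit * se), math.tanh(z + crit * se)
```

For instance, `correlation_ci(0.60, 180)` gives a 95% interval for a hypothetical correlation of 0.60 in a sample the size of Study 1. Because $\tanh$ is nonlinear, the interval is asymmetric around $r$ whenever $r \neq 0$.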

Discussion

The Discussion highlights the growing integration of artificial intelligence (AI) into psychological science and its potential to improve research methods, particularly scale development. The authors note that AI, especially generative AI and large language models (LLMs) such as OpenAI’s ChatGPT, has shown promise for generating survey items and analyzing psychometric properties. Even so, a substantial gap remains in empirical research assessing both the benefits and the risks of AI in psychological research. The study illustrates how AI can streamline item generation, potentially improving the efficiency and quality of psychological assessments.

The authors also stress the need for rigorous evaluation of AI applications in research and call for ethical guidelines and comprehensive assessments of AI’s impact. They argue that empirical validation of AI-generated items is essential to establish their reliability and validity and to mitigate risks such as bias. The paper's two studies evaluate the psychometric properties of an AI-generated grit scale and show that AI can produce items whose psychometric properties are comparable to those of established measures. This underscores AI’s potential to transform scale development in psychology. It also highlights the need for continued scrutiny and validation of AI-generated content to safeguard research integrity.