On the emergent capabilities of ChatGPT 4 to estimate personality traits

Journal: Frontiers in Artificial Intelligence, Volume: 8
DOI: https://doi.org/10.3389/frai.2025.1484260
PMID: https://pubmed.ncbi.nlm.nih.gov/40017486
Publication Date: 2025-02-13
Author(s): Marco Piastra et al.
Primary Topic: Personality Traits and Psychology

Overview

This study explores the efficacy of ChatGPT 4 in assessing personality traits through written texts, utilizing two publicly available datasets that include both textual samples and self-assessments based on the Big Five personality model. The researchers sought to evaluate ChatGPT 4’s predictive performance by generating numerical predictions on an eleven-point scale for each text and comparing these predictions with the authors’ self-assessments. Additionally, confidence scores were requested from ChatGPT 4 for each prediction.

The findings indicate that ChatGPT 4 demonstrates moderate yet statistically significant capabilities in inferring personality traits from written content. However, the model exhibits limitations in determining the appropriateness and representativeness of the input texts, which may affect the accuracy of its inferences. The study also highlights the potential for enhanced benchmarking methods to improve the evaluation process’s efficiency and reliability. Overall, these results contribute to a deeper understanding of the capabilities of Large Language Models in the domain of personality assessment based on textual analysis.

Introduction

The introduction of this research paper discusses the capabilities of Generative Large Language Models (LLMs), particularly ChatGPT, in generating valuable content and assisting with various tasks. The authors highlight the increasing effectiveness of LLMs as conversational agents, citing studies that demonstrate their ability to create persuasive messages tailored to individual psychological traits and engage in discussions that foster consensus on contentious issues. The potential applications of LLMs range from personalized psychological support to adaptive learning systems, emphasizing the need for these models to assess users’ psychological characteristics and adapt their responses accordingly without requiring retraining.

The study specifically investigates how accurately ChatGPT 4 can assess the Big Five personality dimensions from written text, utilizing a “zero-shot” prompting strategy without prior examples or enhancement techniques. The authors pose two research questions: (RQ1) how ChatGPT 4’s zero-shot assessments of the Big Five compare to other methods using an extended numeric scale, and (RQ2) how effectively ChatGPT 4 can assess its own confidence in its estimates based on the amount and representativeness of the analyzed text. The research employs two datasets: the essays dataset for evaluating personality traits and the pan15 dataset for assessing confidence, with a structured prompting strategy designed to yield numerical ratings and confidence values. The introduction sets the stage for the subsequent sections, which will detail the methods, results, and implications of the findings.
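A structured zero-shot prompt of the kind described above could be sketched as follows. This is only an illustration: the prompt wording, the -5 to +5 integer scale (one possible eleven-point scale), the reply format, and the helper names `build_prompt` and `parse_reply` are all assumptions, not the authors' actual prompt or code.

```python
import re

# Big Five trait names; the exact labels used in the study's prompt are an assumption.
TRAITS = ["Openness", "Conscientiousness", "Extraversion", "Agreeableness", "Neuroticism"]


def build_prompt(text: str, with_confidence: bool = True) -> str:
    """Assemble a zero-shot prompt: task description, output format, then the text.

    No worked examples are included, which is what makes the prompt zero-shot.
    """
    lines = [
        "Estimate the author's Big Five personality traits from the text below.",
        "Rate each trait with an integer from -5 (very low) to +5 (very high).",
    ]
    if with_confidence:
        lines.append("After each rating, give a confidence between 0 and 1.")
    fmt = "<trait>: <rating>" + (", <confidence>" if with_confidence else "")
    lines.append(f"Answer with one line per trait, formatted as '{fmt}'.")
    lines.append("Traits: " + ", ".join(TRAITS))
    lines.append("Text:")
    lines.append(text)
    return "\n".join(lines)


# One reply line, e.g. "Openness: 3, 0.8" or "Openness: 3" (hypothetical format).
_LINE = re.compile(r"^\s*(\w+)\s*:\s*([+-]?\d+)\s*(?:,\s*([01](?:\.\d+)?))?\s*$")


def parse_reply(reply: str) -> dict:
    """Return {trait: (rating, confidence or None)} for well-formed reply lines."""
    parsed = {}
    for line in reply.splitlines():
        match = _LINE.match(line)
        if match and match.group(1) in TRAITS:
            rating = int(match.group(2))
            if -5 <= rating <= 5:  # discard ratings outside the eleven-point scale
                conf = float(match.group(3)) if match.group(3) else None
                parsed[match.group(1)] = (rating, conf)
    return parsed
```

Keeping the reply format rigid makes the model output easy to parse and lets malformed lines be skipped rather than guessed at.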

Methods

The study takes a quantitative approach based on zero-shot prompting. Text samples from two publicly available datasets, the essays dataset and the pan15 dataset, were submitted to ChatGPT 4 with a structured prompt that asked for a numerical rating of each Big Five trait on an eleven-point scale and, optionally, a confidence value for each rating. No prior examples or enhancement techniques were used.

The estimates were then compared with the authors’ self-reported scores, normalized to the range [-0.5, +0.5], using two measures: numerical accuracy, expressed as \(1 - \text{NRMSE}\), and the Pearson correlation. To examine the confidence values, the amount of text provided for analysis was also reduced, and the resulting correlations were compared with the confidence that ChatGPT 4 reported.

Results

The results presented in Table 1 address the research question regarding the effectiveness of ChatGPT 4 in predicting the Big Five personality traits from text data across two datasets. The table outlines mean self-report scores and standard deviations, alongside ChatGPT 4’s estimated scores, numerical accuracy (expressed as \(1 - \text{NRMSE}\)), and Pearson correlations between the estimated and self-reported scores. The findings indicate that ChatGPT 4 tends to overestimate traits such as Neuroticism, Extraversion, and Openness, with varying degrees of correlation between self-reported and estimated scores across datasets. Notably, while the numerical accuracy remains high, it does not necessarily reflect the strength of the correlation, particularly in cases where the correlation is low or negative.
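The gap between numerical accuracy and correlation can be illustrated with a small sketch. It assumes that NRMSE is the root-mean-square error divided by the width of the score range (1.0 for scores in [-0.5, +0.5]); the summary does not state which normalization the authors used, and the sample scores below are invented purely for illustration.

```python
import math


def one_minus_nrmse(estimated, reported, value_range=1.0):
    """Numerical accuracy as 1 - NRMSE, with RMSE divided by the score range.

    Range normalization is an assumption; other NRMSE conventions exist.
    """
    n = len(reported)
    rmse = math.sqrt(sum((e - r) ** 2 for e, r in zip(estimated, reported)) / n)
    return 1.0 - rmse / value_range


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Invented scores in [-0.5, +0.5]: both series stay close to zero,
# so the errors are small, yet the estimates move against the self-reports.
reported = [-0.1, 0.0, 0.1, 0.0, -0.1, 0.1]
estimated = [0.1, 0.0, -0.1, 0.1, 0.0, 0.0]

accuracy = one_minus_nrmse(estimated, reported)  # high: errors are small
r = pearson_r(estimated, reported)               # negative: wrong direction
print(f"1 - NRMSE = {accuracy:.3f}, Pearson r = {r:.3f}")
```

When scores cluster near the middle of the scale, any estimate near the middle yields a small error, so 1 - NRMSE says little about whether the estimates track individual differences.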

Further analysis reveals that the confidence values assigned by ChatGPT 4 do not correlate with the reliability of its trait estimates. For instance, in the pan15 dataset, high confidence values for Extraversion and Openness were observed despite low correlations. Additionally, the graphical representations in Figures 1 and 2 illustrate the distribution of self-reported and estimated scores, highlighting a systematic bias in ChatGPT 4’s estimates, particularly a tendency to avoid zero-value predictions. As the amount of text provided for analysis decreases, the correlation between estimated and self-reported scores approaches zero, while confidence values remain relatively high, indicating a disconnect between confidence and accuracy in trait estimation.
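One way to check for such a disconnect is to group the estimates by how much text was supplied and to compare the correlation with the mean reported confidence in each group. The sketch below assumes a hypothetical record layout `(text_amount, estimated, reported, confidence)` and uses invented numbers; it is not the authors' analysis code or data.

```python
import math
from collections import defaultdict


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either sequence has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread = math.sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))
    return cov / spread if spread > 0 else 0.0


def summarize_by_text_amount(records):
    """Group records by text amount; report correlation and mean confidence per group.

    records: iterable of (text_amount, estimated, reported, confidence) tuples.
    """
    groups = defaultdict(list)
    for amount, est, rep, conf in records:
        groups[amount].append((est, rep, conf))
    summary = {}
    for amount, rows in sorted(groups.items()):
        est, rep, conf = zip(*rows)
        summary[amount] = {
            "r": pearson_r(est, rep),
            "mean_confidence": sum(conf) / len(conf),
        }
    return summary


# Invented records: with 100 messages the estimates follow the self-reports,
# with 5 messages they do not, yet the confidence barely changes.
records = [
    (100, -0.2, -0.3, 0.8), (100, 0.0, -0.1, 0.8), (100, 0.2, 0.1, 0.8), (100, 0.4, 0.3, 0.8),
    (5, 0.2, -0.3, 0.7), (5, -0.2, -0.1, 0.8), (5, -0.2, 0.1, 0.7), (5, 0.2, 0.3, 0.8),
]
summary = summarize_by_text_amount(records)
for amount, stats in summary.items():
    print(amount, round(stats["r"], 3), round(stats["mean_confidence"], 3))
```

A well-calibrated estimator would show mean confidence falling together with the correlation as the text shrinks; the pattern reported in the study is the flat confidence seen in the invented data.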

Discussion

The study explored the capabilities of ChatGPT 4 in estimating the Big Five personality traits from written texts, using two datasets: the essays dataset and the pan15 dataset. The essays dataset, comprising 2,347 participants, presented challenges due to its binary scoring system for personality traits, which was derived from varying raw test scores collected over several years. To address this, the authors normalized the scores to a range of [-0.5, +0.5]. The pan15 dataset included 294 participants’ Twitter messages, with Big Five scores already normalized, allowing for a more straightforward analysis. The prompting strategy involved a structured format that included various task descriptions and optional confidence assessments to guide ChatGPT 4 in its evaluations.
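Putting self-reports and model ratings on a common [-0.5, +0.5] scale can be sketched as below. Both mappings are assumptions made for illustration: a min-max rescale for raw test scores, and an eleven-point rating expressed as an integer from -5 to +5. The summary does not specify the authors' exact normalization procedure, and the function names are hypothetical.

```python
def rescale(value: float, lo: float, hi: float) -> float:
    """Min-max rescale a raw score from [lo, hi] to [-0.5, +0.5]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return (value - lo) / (hi - lo) - 0.5


def rating_to_score(rating: int) -> float:
    """Map an integer rating in -5..+5 (eleven points) to [-0.5, +0.5]."""
    if not -5 <= rating <= 5:
        raise ValueError("rating must be between -5 and +5")
    return rating / 10


# A raw score of 4 on a hypothetical 1-to-5 questionnaire scale.
print(rescale(4, 1, 5))
# The eleven ratings cover the target range in equal steps.
print([rating_to_score(k) for k in range(-5, 6)])
```

With both quantities on the same scale, error measures such as NRMSE compare like with like, and a rating of 0 corresponds to the midpoint of the self-report range.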

The findings indicated that ChatGPT 4 demonstrated a promising ability to estimate personality traits, achieving correlation coefficients between 0.25 and 0.29, which align with previous studies indicating small to moderate effect sizes. However, the confidence values provided by ChatGPT 4 were not reliably correlated with the accuracy of its predictions, suggesting a disconnect between the model’s confidence and its performance. This limitation raises concerns about the practical application of ChatGPT 4 in psychological assessments, as it may struggle to gauge the adequacy of input texts for accurate analysis. Future research should focus on refining methodologies, improving benchmark datasets, and exploring alternative comparison strategies to enhance the reliability of ChatGPT 4’s personality trait assessments.