Investigating machine moral judgement through the Delphi experiment

Journal: Nature Machine Intelligence, Volume: 7, Issue: 1
DOI: https://doi.org/10.1038/s42256-024-00969-6
Publication Date: 2025-01-13
Author(s): Liwei Jiang et al.
Primary Topic: Psychology of Moral and Emotional Judgment

Methods

The Methods section describes how the system was trained and evaluated. Delphi learns from the crowdsourced moral judgments collected in the Commonsense Norm Bank, and its predictions are compared against baseline language models using both automatic metrics and human ratings.

Data collection relied on standardized measurements and assessments, and the analysis combined descriptive and inferential statistics to assess the significance of the findings. The section stresses methodological rigor as the basis for robust, generalizable results.

Results

The Results section evaluates Delphi against several baselines, including alternative Delphi configurations and GPT-3, GPT-3.5 and GPT-4, using both automatic and human assessment. Evaluations are split into free-form and yes/no modes, with performance metrics computed on the Commonsense Norm Bank. Classification tasks (C) are scored in two settings: a three-class setting (good, discretionary, bad) and a two-class setting in which good and discretionary are merged into a single class. Open-text tasks (T) are scored by heuristic string matching that determines the polarity of the generated judgment. Human evaluators also rated how correct each model prediction appeared. Delphi’s results are set in bold in the results table, indicating superior performance on the validation set.
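The two scoring settings and the string-matching idea described above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the integer label encoding, the cue phrases, and the function names (`accuracy`, `to_binary`, `text_polarity`) are all assumptions made for illustration, and treating "no cue found" as the middle class is a simplification.

```python
from typing import Sequence

# Label encoding assumed for this sketch: good = 1, discretionary = 0, bad = -1.
GOOD, DISCRETIONARY, BAD = 1, 0, -1

# Hypothetical cue phrases; a real matcher would need a much larger list.
NEGATIVE_CUES = ("it's wrong", "it's bad", "it's rude", "not okay", "shouldn't")
POSITIVE_CUES = ("it's good", "it's okay", "it's fine", "it's kind")


def accuracy(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction of positions where the predicted label equals the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def to_binary(label: int) -> int:
    """Merge good and discretionary into one class; bad stays on its own."""
    return BAD if label == BAD else GOOD


def text_polarity(judgment: str) -> int:
    """Return -1, 1, or 0 (no cue found) for a free-text judgment."""
    text = judgment.lower().replace("\u2019", "'")  # normalise curly apostrophes
    if any(cue in text for cue in NEGATIVE_CUES):  # negatives first: "not okay"
        return BAD
    if any(cue in text for cue in POSITIVE_CUES):
        return GOOD
    return DISCRETIONARY  # assumption: no cue is treated as the middle class


gold = [GOOD, DISCRETIONARY, BAD, BAD]
pred = [GOOD, GOOD, BAD, DISCRETIONARY]

three_class = accuracy(gold, pred)
two_class = accuracy([to_binary(g) for g in gold], [to_binary(p) for p in pred])
print(three_class, two_class)  # 0.5 0.75
```

In this toy example the two-class score is higher than the three-class score because a good/discretionary confusion no longer counts as an error once those classes are merged.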

Additionally, Extended Data Table 9 reports the classification accuracies of the Delphi and Delphi-hybrid models on adversarial evaluation datasets, which were derived from real-world user queries and subsampled from the Commonsense Norm Bank. This analysis indicates that the Delphi models remain robust across diverse evaluation contexts and supports their effectiveness at commonsense reasoning.

Discussion

In the Discussion section, the authors explore the implications of Rawls’s ethical framework, particularly his decision procedure for ethics, which emphasizes learning from a diverse set of moral examples rather than adhering strictly to prescriptive rules. They argue that modern computational methods, and the Delphi system in particular, operationalize Rawls’s ideas by using large-scale crowdsourced moral judgments to identify patterns in human ethics. The authors acknowledge that this bottom-up approach, while effective, is vulnerable to systemic biases inherent in the crowd, and they call for hybrid models that integrate top-down ethical principles to mitigate those biases.
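The general idea of a hybrid system, in which explicit principles can constrain a learned predictor, can be sketched as follows. This is a toy illustration under stated assumptions and does not describe how Delphi-hybrid works: the `Judgment` type, the `hybrid_judge` function, the keyword rule, and the stand-in "learned" model are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

GOOD, DISCRETIONARY, BAD = 1, 0, -1  # label encoding assumed for this sketch


@dataclass(frozen=True)
class Judgment:
    label: int
    source: str  # "top-down" if a rule decided, otherwise "bottom-up"


# A top-down rule returns a label when it applies and None when it abstains.
Rule = Callable[[str], Optional[int]]


def hybrid_judge(
    situation: str,
    bottom_up: Callable[[str], int],
    rules: Sequence[Rule],
) -> Judgment:
    """Let the first applicable rule decide; otherwise defer to the learned model."""
    for rule in rules:
        verdict = rule(situation)
        if verdict is not None:
            return Judgment(verdict, "top-down")
    return Judgment(bottom_up(situation), "bottom-up")


def toy_model(situation: str) -> int:
    """Stand-in for a learned predictor (hypothetical, keyword-based)."""
    return BAD if "steal" in situation.lower() else GOOD


def harm_rule(situation: str) -> Optional[int]:
    """Hypothetical principle: deliberately hurting someone is judged bad."""
    return BAD if "hurting someone" in situation.lower() else None


print(hybrid_judge("helping a friend move", toy_model, [harm_rule]))
print(hybrid_judge("hurting someone to win a game", toy_model, [harm_rule]))
```

In the second call the stand-in model alone would answer "good", and the rule overrides it. Letting the first applicable rule win is only one possible design; a softer combination that reweights the model's scores would be an equally valid sketch.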

Delphi shows high predictive accuracy on moral judgments, reaching 92.8% when tested against a dataset of 1.7 million moral scenarios. The authors note that Delphi outperforms contemporary language models, which suggests that AI systems can learn moral reasoning from extensive, contextually rich data. They also identify critical limitations, including biases against marginalized groups and cultural insensitivity, which reflect the socio-demographic composition of the crowdworkers who contributed to the dataset. The authors call for future research to strengthen the cultural awareness and ethical robustness of AI systems, and suggest that combining bottom-up and top-down methodologies could yield more equitable and representative frameworks for moral reasoning in AI.

Limitations

Bottom-up AI systems such as Delphi need thorough evaluation so that both their achievements and their shortcomings are understood. A critical examination reveals significant concerns, including social biases and cultural insensitivity within the model. These issues affect the reliability of the system’s outputs and raise ethical questions about deploying it in diverse societal contexts. Addressing them is essential for improving the efficacy and fairness of AI systems.