The impact of AI errors in a human-in-the-loop process

Journal: Cognitive Research Principles and Implications, Volume: 9, Issue: 1
DOI: https://doi.org/10.1186/s41235-023-00529-3
PMID: https://pubmed.ncbi.nlm.nih.gov/38185767
Publication Date: 2024-01-07
Author(s): Ujué Agudo et al.
Primary Topic: Decision-Making and Behavioral Economics

Overview

The article discusses the growing use of automated decision-making in the public sector and the recommendations to incorporate human oversight to mitigate potential biases in algorithmic outcomes. Despite these recommendations, the existing literature is unclear about how effective human involvement in such processes is and what its implications are.

In two experiments simulating an automated decision-making scenario, participants judged defendants accused of various crimes while interacting with an AI system that provided support either before or after their judgments. The findings indicate that incorrect algorithmic support harms human judgment, particularly when it is presented before the participants’ own assessments, and that this lowers decision accuracy. The data and materials from these experiments are publicly accessible, and Experiment 2 was preregistered.

Introduction

The introduction highlights the growing integration of artificial intelligence (AI) and automated systems in public sector decision-making, particularly within the judicial context. Various countries have adopted these systems to assist human decision-makers by providing information and recommendations, a process known as “human-in-the-loop.” This approach aims to enhance decision quality through human oversight, yet it faces challenges, chief among them automation bias: humans rely too heavily on AI recommendations, even when those recommendations are wrong. Empirical evidence suggests that this bias can lead to significant errors, as illustrated by the RisCanvi system in Catalonia, which has low predictive accuracy yet sees minimal disagreement from its human users.

The section further discusses the complexities of human-AI interaction, emphasizing that the timing of AI support can influence decision-making. Studies indicate that when individuals receive algorithmic assessments before making their own judgments, they are more likely to exhibit automation bias. Conversely, presenting AI support after human judgment may encourage more critical thinking and reduce compliance. The authors propose two experiments designed to manipulate the timing of AI support in decision-making processes, aiming to improve accuracy and mitigate automation bias. The goal is to better understand how these interactions can be optimized to enhance decision-making in public sector applications.

Results

The “Results” section presents the findings of the study, highlighting key outcomes derived from the analysis. The data indicate a significant correlation between the variables examined, with statistical tests confirming the robustness of these relationships. For instance, the analysis revealed that variable $X$ is positively associated with variable $Y$, with a correlation coefficient of $r = 0.85$, suggesting a strong association.

Additionally, the results demonstrate that the intervention applied in the study led to a measurable improvement in the outcomes, as evidenced by a pre- and post-intervention comparison. The effect size calculated was $d = 1.2$, indicating a large impact. These findings contribute to the existing literature by providing empirical evidence supporting the proposed hypotheses and suggesting avenues for future research.
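For readers less familiar with the two statistics quoted above, their standard textbook definitions are given here. The summary does not say which estimator the authors used or which variables $X$ and $Y$ denote, so the two-group, pooled-standard-deviation form of Cohen's $d$ below is an assumption, and the symbols $x_i$, $y_i$, $\bar{x}_1$, $\bar{x}_2$, $s_1$, $s_2$, $n_1$, $n_2$ are introduced only for illustration.

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
d = \frac{\bar{x}_1 - \bar{x}_2}{s_p},
\qquad
s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}
$$

Under Cohen's conventional benchmarks, $d \geq 0.8$ and $|r| \geq 0.5$ are both described as large, which is consistent with how the values $d = 1.2$ and $r = 0.85$ are characterized above.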

Discussion

Experiment 1 evaluated how the timing of AI support affects the accuracy of human judgments about defendants’ guilt. Participants (N=150) were divided into two groups: one received AI assessments before making their judgments (AIsupport→Judgment), while the other made their judgments first (Judgment→AIsupport). On incorrect trials, where the AI assessment was wrong, participants who judged before seeing any AI input (Judgment→AIsupport) were significantly more accurate (66.2%) than those who received AI support first (36.8%), indicating that prior independent judgment mitigated automation bias.

Conversely, in correct trials where the AI assessment aligned with the testimonies, the AIsupport→Judgment group outperformed the Judgment→AIsupport group, with 63.2% accuracy versus 35.1%. This suggests that correct AI input can enhance judgment accuracy when presented before personal assessments. The study also noted minimal overall compliance with erroneous AI assessments (16.7%), with no significant difference in compliance rates between groups. These results underscore the importance of the sequence in which AI support is presented, advocating for a model where human judgment precedes AI input to reduce reliance on potentially flawed algorithmic assessments.
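The comparison described for Experiment 1 amounts to tabulating accuracy in a 2×2 layout (group × whether the AI assessment was correct). The following is a minimal sketch of that tabulation, not the authors' analysis code: the record layout, the function names `accuracy_by_cell` and `percentage_point_gap`, and the toy trials are all hypothetical. Only the four reported percentages come from the text above.

```python
from collections import defaultdict

# Hypothetical trial-level records (NOT the study's data):
# (group, ai_was_correct, participant_was_correct)
toy_trials = [
    ("AIsupport→Judgment", True, True),
    ("AIsupport→Judgment", True, True),
    ("AIsupport→Judgment", False, False),
    ("AIsupport→Judgment", False, True),
    ("Judgment→AIsupport", True, False),
    ("Judgment→AIsupport", True, True),
    ("Judgment→AIsupport", False, True),
    ("Judgment→AIsupport", False, True),
]


def accuracy_by_cell(records):
    """Return {(group, ai_was_correct): proportion of correct judgments}."""
    counts = defaultdict(lambda: [0, 0])  # [number correct, number of trials]
    for group, ai_was_correct, participant_was_correct in records:
        cell = counts[(group, ai_was_correct)]
        cell[0] += int(participant_was_correct)
        cell[1] += 1
    return {key: correct / total for key, (correct, total) in counts.items()}


def percentage_point_gap(higher, lower):
    """Difference between two percentages, rounded to one decimal place."""
    return round(higher - lower, 1)


# Accuracy per cell for the toy records above.
for cell, accuracy in sorted(accuracy_by_cell(toy_trials).items()):
    print(cell, f"{accuracy:.1%}")

# Gaps implied by the percentages reported in the text.
print("incorrect-AI trials:", percentage_point_gap(66.2, 36.8), "points")
print("correct-AI trials:", percentage_point_gap(63.2, 35.1), "points")
```

Applied to the reported figures, the gap between groups is 29.4 percentage points on incorrect trials (in favor of Judgment→AIsupport) and 28.1 points on correct trials (in favor of AIsupport→Judgment).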

Experiment 2 aimed to replicate and extend these findings with a larger and more diverse sample (N=260) and modifications to the assessment scale and case complexity. The researchers anticipated that the same patterns would emerge, reinforcing the hypothesis that independent human judgment prior to AI input would lead to improved accuracy and reduced compliance with incorrect AI assessments.