Understanding the influence of design-related factors on human–AI teaming in a face matching task

Journal: Cognitive Research: Principles and Implications, Volume: 11, Issue: 1
DOI: https://doi.org/10.1186/s41235-025-00701-x
PMID: https://pubmed.ncbi.nlm.nih.gov/41501590
Publication Date: 2026-01-07
Author(s): Zhenyun Du
Primary Topic: Face Recognition and Perception

Overview

The study investigates how design factors affect the effectiveness of AI-enabled decision aids for facial identity verification. Across three pre-registered experiments, the authors assessed how stated AI accuracy, mismatch frequency, and advice type (binary predictions versus binary predictions with similarity ratings) influenced participants' performance in a one-to-one face matching task. AI assistance generally improved performance relative to an unaided baseline, with the largest improvement when AI accuracy information was withheld. Mismatch frequency did not affect performance but introduced response biases, while similarity ratings slightly improved performance and increased user confidence without effectively reducing reliance on inaccurate predictions.
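The overview distinguishes performance from response bias. In face matching research, a common way to draw that line is signal detection theory. Sensitivity (d′) measures how well observers tell matches from mismatches. The criterion (c) measures their tendency to favour one response over the other. The summary does not say which measures the authors used, so the sketch below illustrates the general technique only. The function name, the choice to treat "mismatch" as the signal response, the log-linear correction, and the example counts are all assumptions.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion_c) for one participant.

    Assumption: "mismatch" is treated as the signal response, so a hit is a
    mismatch pair called "mismatch" and a false alarm is a match pair called
    "mismatch". Adding 0.5 to each cell (log-linear correction) keeps the
    rates strictly between 0 and 1 so the z-transform stays finite.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d_prime = z(hit_rate) - z(fa_rate)             # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # bias; > 0 means conservative
    return d_prime, criterion


# Hypothetical counts, not data from the study.
unbiased = sdt_measures(hits=40, misses=10, false_alarms=10, correct_rejections=40)
conservative = sdt_measures(hits=25, misses=25, false_alarms=2, correct_rejections=48)
print(f"unbiased:     d'={unbiased[0]:.2f}, c={unbiased[1]:.2f}")
print(f"conservative: d'={conservative[0]:.2f}, c={conservative[1]:.2f}")
```

A positive c marks an observer who rarely reports a mismatch. Because bias is measured separately from d′, a manipulation such as changing the base rate of mismatches can shift c while leaving sensitivity largely unchanged.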

In conclusion, the findings highlight critical design considerations for human-AI collaborative systems. Notably, revealing AI accuracy does not confer advantages, and a lower base rate of mismatches can bias responses. Additionally, the inclusion of similarity ratings alongside binary predictions led to increased reliance on AI without enhancing the detection of incorrect advice. Overall, the human-AI team did not surpass the performance of the AI alone, although some individuals did exceed AI accuracy. The study underscores the need for further research to refine design elements that could optimize the efficacy of human-AI collaborations.

Introduction

The introduction discusses the growing integration of artificial intelligence (AI) into decision-making across fields such as human resources, finance, and healthcare. AI decision aids, such as CV screening software and clinical decision support systems, are increasingly used alongside human judgment in so-called human-in-the-loop systems. This collaboration raises a central challenge: balancing trust in AI recommendations against reliance on one's own expertise.

The paper contrasts two phenomena. The first is algorithm aversion, where users distrust AI advice, often because they are overconfident in their own abilities. The second is algorithm appreciation, where users prefer AI guidance, especially in complex tasks. The authors stress the risk of overreliance when users fail to critically assess the AI's accuracy. Given how widespread AI-assisted decision-making has become, the study aims to identify and address issues related to algorithm aversion and overreliance. It also seeks to determine which presentation formats for AI predictions best support users and safe implementation. The research uses a one-to-one face matching paradigm to investigate the factors influencing human-AI interaction.

Results

The study used a series of within-subject designs to investigate how AI prediction accuracy and similarity ratings affect decision-making. A mixed-effects logistic regression showed that adding similarity ratings alongside binary predictions significantly improved model fit (χ²(3) = 48.40, p < 0.001). The intercept (0.96) corresponds to the case in which all predictors are at their reference (zero) level. It describes that baseline condition rather than the condition in which correct decisions were most likely. The presence of similarity ratings increased the odds of a correct decision (OR = 1.36, p < 0.001), whereas inaccurate AI predictions sharply decreased them (OR = 0.09, p < 0.001).

Participants were more certain of their decisions when similarity ratings were provided. However, this did not translate into higher self-reported confidence or higher perceived usefulness of the AI predictions. The similarity score reported by the AI was positively correlated with participants' certainty, particularly when the AI's predictions were accurate. Even so, the overall effect of similarity ratings on performance was modest, and no significant improvement was observed when predictions were inaccurate. Taken together, similarity ratings appear to boost decision certainty but do little to help participants detect and override inaccurate AI advice.
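Two pieces of arithmetic sit behind the reported statistics.

* **The χ²(3) value.** This is the kind of statistic produced by a likelihood-ratio comparison of nested models that differ by three parameters. The summary does not name the exact test, so this is an assumption.
* **The odds ratios.** These act multiplicatively on the odds of a correct decision, not on its probability.

The sketch below first computes the upper-tail p-value for χ²(3) = 48.40, using the closed form for 3 degrees of freedom. It then shows what the reported odds ratios would mean for a participant who is 90% correct at baseline. That baseline and the helper names are illustrative assumptions, not values or code from the study.

```python
import math


def chi2_sf_df3(x):
    """P(X > x) for a chi-square variable with 3 degrees of freedom.

    Uses the closed form for 3 df:
        erfc(sqrt(x / 2)) + sqrt(2x / pi) * exp(-x / 2)
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def apply_odds_ratio(p_baseline, odds_ratio):
    """Convert a baseline probability to odds, scale by an odds ratio, convert back."""
    odds = p_baseline / (1 - p_baseline) * odds_ratio
    return odds / (1 + odds)


# Reported model comparison: chi-square(3) = 48.40.
p_value = chi2_sf_df3(48.40)
print(f"p = {p_value:.1e}")  # far below 0.001

# Hypothetical baseline of 90% correct (NOT a value from the study).
baseline = 0.90
print(f"with similarity ratings (OR = 1.36): {apply_odds_ratio(baseline, 1.36):.2f}")  # 0.92
print(f"inaccurate AI prediction (OR = 0.09): {apply_odds_ratio(baseline, 0.09):.2f}")  # 0.45
```

The conversion depends on the starting probability. An odds ratio alone therefore does not say how many percentage points accuracy changes, which is why it is interpreted relative to the model's reference level.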

Discussion

In the discussion, the authors highlight the challenges and limitations of both human and AI performance in face matching tasks, particularly in high-stakes scenarios such as identity verification at border control. Although people match faces routinely in daily life, human error rates can reach 10–20% even under optimal conditions, and performance deteriorates further under adverse conditions. AI systems have outperformed humans on high-quality images, but they still struggle with low-quality inputs. They also require human oversight, as mandated by current European Border Agency guidelines. The authors argue that combining human and AI capabilities could enhance performance, but that this synergy depends on how the AI system is designed and implemented.

The paper also reviews existing literature on human-AI interaction in face matching tasks, noting that trust in AI can influence performance outcomes. Some studies indicate that participants may over-rely on AI predictions, especially after positive experiences, while others suggest that the perceived accuracy of AI can affect user behavior. The authors emphasize the need for further investigation into design factors that can maximize the effectiveness of AI in face matching, as current findings indicate that human performance, even when aided by AI, has not surpassed AI accuracy alone. The study aims to explore various design elements and prediction formats to improve collaborative performance in face matching tasks, underscoring the importance of understanding the dynamics of human-AI interaction in practical applications.

Limitations

The study has several limitations that may affect the generalizability and validity of its findings. First, participants were not selected for expertise in face matching, which raises questions about how well the results apply to professional users of face matching AI systems. Prior research suggests that professionals, such as passport officers, do not necessarily outperform the general population in face matching tasks (White et al., 2014). Even so, the study does not address the potential impact of long-term experience with AI tools, nor the performance of super-recognisers and forensic face examiners (Phillips et al., 2018).

Second, the decision-making task was low-risk, with no negative consequences for poor performance, which may have influenced participant behavior (Stabile et al., 2024; Yuen & Fitzgerald, 2024). Third, stimuli could not be counterbalanced across experimental conditions in Experiment 2, so stimulus difficulty may have contributed to the observed effects. Fourth, female faces were excluded from the stimuli, which limits the applicability of the findings; further research should use more diverse stimuli. Fifth, relying on actual AI predictions enhanced ecological validity but yielded relatively few inaccurate trials, potentially reducing statistical power. Finally, the study's setup mimicked optimal conditions for identity verification. Real-world settings differ: factors such as fatigue and prolonged task duration can impair performance (Alenezi et al., 2015; Fysh & Bindemann, 2017; Megreya et al., 2013), which could in turn increase overreliance on AI aids.
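Counterbalancing, which the limitations note was not possible in Experiment 2, is usually done by rotating stimulus sets across conditions. Each set then appears in each condition equally often across participant groups. Any difference in item difficulty cancels out across conditions instead of being confounded with one of them. The cyclic Latin-square rotation below is a minimal sketch of that general idea. The function name, condition labels, and stimulus-set labels are hypothetical and do not describe the study's materials.

```python
def latin_square_assignment(stimulus_sets, conditions):
    """Assign stimulus sets to conditions with a cyclic Latin-square rotation.

    Returns one dict per participant group, mapping condition -> stimulus set.
    Across the groups, every set appears in every condition exactly once.
    """
    n = len(conditions)
    if len(stimulus_sets) != n:
        raise ValueError("need exactly one stimulus set per condition")
    return [
        {conditions[c]: stimulus_sets[(c + g) % n] for c in range(n)}
        for g in range(n)
    ]


# Hypothetical labels for illustration only.
for group, mapping in enumerate(
    latin_square_assignment(["set A", "set B"], ["binary", "binary + similarity"])
):
    print(f"group {group}: {mapping}")
```

Participants would then be allocated to the groups in equal numbers, so every condition is tested on every stimulus set.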