Beyond PII: How Users Attempt to Estimate and Mitigate Implicit LLM Inference

Journal: Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems
DOI: https://doi.org/10.1145/3772318.3791762
Publication Date: 2026-04-13
Author(s): Synthia Wang et al.
Primary Topic: Topic Modeling

Overview

The study investigates how users perceive and respond to inference-based privacy risks associated with large language models (LLMs) such as ChatGPT. In a survey of 240 U.S. participants, users struggled to estimate inference risks accurately, performing only slightly better than chance. When participants rewrote text to mitigate these risks, their success rate was only 28%: better than the state-of-the-art sanitization tool Rescriber, but significantly lower than that of ChatGPT. Paraphrasing was the most frequently used strategy yet the least effective; abstraction and adding ambiguity yielded better results.
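To make "slightly better than chance" concrete, the following minimal sketch shows one way such a comparison could be checked, using an exact one-sided binomial test. The counts (55 correct out of 100) and the 0.5 chance level are hypothetical placeholders chosen for illustration; they are not figures from the study, and the authors' actual statistical procedure may differ.

```python
from math import comb


def binomial_tail(successes: int, trials: int, p_chance: float) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, p_chance)."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical counts, NOT taken from the paper: 55 correct yes/no
# judgments out of 100, compared against a 50% guessing baseline.
p_value = binomial_tail(successes=55, trials=100, p_chance=0.5)
print(f"one-sided p-value: {p_value:.3f}")
```

With these made-up counts the tail probability is well above 0.05, so 55% accuracy over 100 judgments would not be clearly distinguishable from guessing.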

In conclusion, the findings indicate a disconnect between users’ ability to recognize inference risks and their effectiveness in addressing them. Participants showed reasonable awareness of risks related to location and relationship status but failed to identify risks associated with occupation and other attributes. The study emphasizes the need for design interventions that enhance users’ understanding of inference risks, support more effective rewriting strategies, and implement system-level protections. These insights are crucial for developing inference-aware systems that safeguard privacy while enabling users to interact with LLMs more confidently and effectively.

Introduction

The introduction highlights the pervasive integration of Large Language Models (LLMs), particularly ChatGPT, into daily life, evidenced by its substantial user base of 800 million weekly active users by April 2025. This widespread usage raises significant concerns regarding privacy and trust, as users navigate the balance between the benefits of sharing information and the risks associated with data collection. While existing techniques for Personally Identifiable Information (PII) detection focus on explicit disclosures, they fail to address a critical issue: inference-based privacy risks. These risks arise when LLMs deduce sensitive information from seemingly innocuous conversations, potentially revealing details about a user’s age, location, or occupation without any explicit mention of such data.

The paper underscores that users are often ill-prepared for these inference risks, primarily focusing on data collection and storage concerns rather than the subtler implications of LLM interactions. Many users possess incomplete or inaccurate mental models of LLMs, viewing them as search engines rather than predictive systems, which further complicates their ability to anticipate and mitigate these risks. The conversational nature of LLMs, characterized by fluency and anthropomorphic design, can lead users to disclose more information than intended, exacerbating the privacy challenges. To address these issues, the authors propose to investigate the gap between LLMs’ inference capabilities and users’ expectations, framing their inquiry around specific research questions aimed at enhancing transparency, trust, and user agency in LLM interactions.

Results

The results cover three analyses. The first measures how accurately users estimated which target attributes could be inferred from a text. Accuracy varied by attribute: participants showed reasonable awareness for location and relationship status but failed to identify risks for occupation and other attributes, and overall they performed only slightly better than chance. The second assesses how concerned users were once the inferred attributes were revealed to them; reported concern was moderate.

The third evaluates participants' rewrites, that is, how well the edited texts prevented the target attribute from being inferred. User rewrites succeeded only 28% of the time, and the most common strategy, paraphrasing, was the least effective, while abstraction and adding ambiguity worked better. Overall, the results point to a gap between how users try to protect themselves and what actually blocks attribute inference from text.

Discussion

In this section, the authors examine users' perceptions and capabilities regarding personal attribute inference by large language models (LLMs), drawing on a survey of 240 participants. The study addresses three research questions: (1) how aware users are of the personal information that LLMs can infer, (2) how concern levels vary with the type of inferred information, and (3) how effective user-generated rewrites are at preventing such inferences. Participants were shown text snippets from the SynthPAI dataset and asked to estimate which attributes could be inferred, rate their concern once those attributes were revealed, and rewrite the texts to obscure the inferences. Users generally struggled to predict which attributes could be inferred, performing only slightly above chance, and reported moderate concern on learning about the inferences. Participants used a range of rewriting strategies, such as omission, generalization, and adding ambiguity, but their rewrites were less effective than those generated by advanced tools like ChatGPT.
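The rewrite evaluation described above can be sketched as a simple scoring loop: a rewrite counts as successful if the target attribute can no longer be inferred from it. In this minimal sketch, `toy_infer_location` is a hypothetical keyword-based stand-in for an LLM inferrer, and the example sentences and cue table are invented for illustration; none of them come from SynthPAI or from the study's actual pipeline.

```python
from typing import Callable, Iterable, Optional

Inferrer = Callable[[str], Optional[str]]


def toy_infer_location(text: str) -> Optional[str]:
    """Hypothetical stand-in for an LLM inferrer: maps surface cues to a city."""
    cues = {"tram": "Melbourne", "subway": "New York"}  # invented cue table
    lowered = text.lower()
    for cue, city in cues.items():
        if cue in lowered:
            return city
    return None


def rewrite_succeeds(rewrite: str, true_value: str, infer: Inferrer) -> bool:
    """A rewrite succeeds if the inferrer no longer recovers the true value."""
    guess = infer(rewrite)
    return guess is None or guess.lower() != true_value.lower()


def success_rate(rewrites: Iterable[str], true_value: str, infer: Inferrer) -> float:
    """Fraction of rewrites that block the inference (0.0 if there are none)."""
    results = [rewrite_succeeds(r, true_value, infer) for r in rewrites]
    return sum(results) / len(results) if results else 0.0


# Invented example rewrites of "Caught the tram to work again this morning."
rewrites = [
    "Took the tram to the office again today.",     # paraphrase: cue survives
    "Took public transport to work this morning.",  # abstraction: cue removed
]
rate = success_rate(rewrites, "Melbourne", toy_infer_location)
print(f"rewrite success rate: {rate:.2f}")
```

The toy example mirrors the pattern reported in the summary: a paraphrase that keeps the revealing cue fails, while abstracting the cue away blocks the inference. A real evaluation would replace the keyword stub with an actual model call.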

The implications of this research extend to real-world applications of LLMs, highlighting the need for improved user awareness and protective measures against inference-based privacy risks. The study underscores that users often misestimate the inference capabilities of LLMs and lack effective strategies to mitigate these risks, which can influence their interactions with AI systems across various contexts, including workplace, healthcare, and education. By elucidating user perceptions and behaviors, the authors aim to inform the design of inference-aware safeguards that balance usability with privacy protection in LLM applications.

Limitations

The study has several limitations that may affect the interpretation of its findings. First, it relies on text snippets from the SynthPAI dataset, which provides a structured pipeline and established ground truth for personal attribute inference but may not capture the complexity of real-world information leakage. Second, participants completed the tasks in a survey environment that encouraged overestimation, which could skew their reported concern and their motivation to rewrite, and so may misrepresent how they would behave in high-stakes chatbot interactions.

Moreover, the survey design introduced a privacy priming effect, as participants became aware of the study’s focus on privacy risks after initial questions. This awareness may have influenced their concern ratings, reflecting an understanding of potential risks rather than their unmediated perceptions. The generalizability of the findings is also limited, as the sample consisted solely of 240 U.S. adults, which may not represent diverse cultural and linguistic contexts. Additionally, the use of synthetic data may have diminished participants’ personal connection to the text, possibly leading to a less serious consideration of privacy risks. Future research should aim to explore these issues in more realistic, longitudinal, and cross-cultural settings, as well as investigate how interface design and contextual cues impact users’ awareness and responses to inference-based privacy risks.