Language writ large: LLMs, ChatGPT, meaning, and understanding

Journal: Frontiers in Artificial Intelligence, Volume: 7
DOI: https://doi.org/10.3389/frai.2024.1490698
PMID: https://pubmed.ncbi.nlm.nih.gov/40013231
Publication Date: 2025-02-12
Author(s): Stevan Harnad
Primary Topic: Language and cultural evolution

Overview

This paper explores the capabilities and limitations of Large Language Models (LLMs) such as ChatGPT. It argues that although these models show impressive linguistic performance, they lack genuine understanding and sensorimotor grounding. The author proposes that certain inherent "biases" in language, which become visible at the scale of LLMs, may help explain their unexpected proficiency. These biases include the parasitism of indirect verbal grounding on direct sensorimotor grounding, since a word can be grounded verbally only if the words used to describe it are already grounded. They also include the circularity of verbal definitions and the mirroring of language production and comprehension, among others. The discussion suggests that these biases boost LLM performance but do not provide a pathway to genuine understanding.

The conclusion reiterates that LLMs operate within the confines of indirect verbal grounding (IVG) and lack direct sensorimotor grounding (DSG). While their linguistic capabilities are remarkable, they stem from sophisticated pattern recognition rather than an intrinsic understanding of language or thought. The author references Noam Chomsky’s theories on Universal Grammar (UG) and suggests that the training data for LLMs may be shaped by cognitive constraints that reflect the laws of thought, which could explain their effective language processing. Ultimately, the text calls for a rethinking of AI development to incorporate genuine sensorimotor experiences, highlighting the need for future AI systems to integrate cognitive processing with experiential learning to more closely mimic human understanding.

Introduction

The introduction discusses the capabilities of Large Language Models (LLMs) such as ChatGPT, which appear to exhibit human-like understanding even though they are built by applying statistical methods to very large training datasets. The author examines the strengths and limitations of LLMs with respect to four fundamental concepts: symbol grounding, reference, meaning, and understanding. He stresses that LLMs lack sensorimotor grounding, which prevents them from connecting words to their real-world referents and from forming meaningful propositions. The discussion suggests that the coherence of LLM outputs may stem from inherent constraints of human language.
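The sketch below gives a rough idea of text generation from statistics alone. It uses a bigram model, which is vastly simpler than the transformer networks behind ChatGPT and does not reflect how they work internally. The model predicts each next word from word-pair counts in a tiny, made-up corpus. The corpus and the helper names `train_bigrams` and `generate` are hypothetical. Nothing in the model links any word to a referent.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus: list[str]) -> dict[str, Counter]:
    """Count, for each word, how often each other word immediately follows it."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts: dict[str, Counter], start: str, max_words: int = 5) -> str:
    """Greedily append the most frequent successor of the last word."""
    words = [start]
    for _ in range(max_words):
        successors = counts.get(words[-1])
        if not successors:
            break  # no statistics for this word, so stop
        # most_common breaks ties in first-seen order
        words.append(successors.most_common(1)[0][0])
    return " ".join(words)

# Hypothetical toy corpus
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]
model = train_bigrams(corpus)
print(generate(model, "the"))  # the cat sat on the cat
```

The output is locally fluent but loops back on itself. The model only knows which word tends to follow which; it has no access to cats or mats.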

The introduction also addresses the "Hard Problem" (HP) of consciousness: how and why physical brain processes give rise to subjective experience. The Symbol Grounding Problem (SGP) asks how symbols acquire meaning. The HP, in contrast, concerns the felt, qualitative side of consciousness. The author argues that even if an AI could successfully ground its symbols, this would not settle whether it has subjective experiences or merely executes complex algorithms. The paper stresses that both the SGP and the HP bear on the limitations of current AI systems, particularly regarding consciousness and the nature of subjective experience.

Methods

This section discusses two primary grounding methods relevant to both human cognition and AI systems such as GPT-4: Direct Sensorimotor Grounding (DSG) and Indirect Verbal Grounding (IVG). In DSG, the learner interacts directly with the environment to connect names to their referents. Neural networks learn to recognize members of a category through exposure, trial and error, and corrective feedback. This method yields an experiential grasp of referents and forms a foundational layer of cognition and language.
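DSG's trial-and-error learning with corrective feedback can be sketched with a single perceptron, used here as a stand-in for the neural networks mentioned above. The two binary "sensory" features (has a mane, has hooves), the toy category, and the function names are all hypothetical. The only point is that the detector's weights change when, and only when, feedback says its guess was wrong.

```python
def train_detector(samples: list[tuple[int, ...]], labels: list[int], max_epochs: int = 20):
    """Perceptron learning: guess, receive corrective feedback, adjust."""
    weights = [0] * len(samples[0])
    bias = 0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in zip(samples, labels):
            guess = 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0
            error = target - guess  # feedback: 0 if correct, +1 or -1 if wrong
            if error:
                mistakes += 1
                weights = [w + error * xi for w, xi in zip(weights, x)]
                bias += error
        if mistakes == 0:  # a full pass with no corrections: category learned
            break
    return weights, bias

def is_member(weights: list[int], bias: int, x: tuple[int, ...]) -> bool:
    """Apply the learned detector to a sensory feature vector."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias > 0

# Hypothetical sensory features: (has_mane, has_hooves); toy category "horse".
samples = [(1, 1), (1, 0), (0, 1), (0, 0)]  # horse, lion, cow, fish
labels = [1, 0, 0, 0]
weights, bias = train_detector(samples, labels)
print([is_member(weights, bias, x) for x in samples])  # [True, False, False, False]
```

On this toy data the learner stops making mistakes after a few passes. Real sensorimotor category learning involves far richer input and far more complex networks.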

In contrast, IVG relies on verbal descriptions to convey the features of new referents. This requires the learner to already have a grounded understanding of the words used in the description. IVG works as a relay of grounding: a grounded individual (the teacher) uses language to extend grounding to another individual (the learner). IVG is especially common in human learning for abstract concepts and for entities that cannot be experienced directly through sensorimotor interaction.
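The relay can be sketched in code under a strong simplifying assumption: a verbal definition is treated as a plain conjunction of already-grounded categories, so "zebra" means "horse" and "striped". The detectors below are hypothetical stand-ins for categories learned through DSG. The essential behavior is that a new word can be grounded verbally only if every word in its definition is already grounded.

```python
from typing import Callable

Detector = Callable[[dict], bool]

# Hypothetical detectors standing in for categories already grounded via DSG.
lexicon: dict[str, Detector] = {
    "horse": lambda obj: obj.get("shape") == "horse",
    "striped": lambda obj: obj.get("pattern") == "stripes",
}

def ground_verbally(name: str, definition: list[str], known: dict[str, Detector]) -> None:
    """Add `name` to `known` as the conjunction of its defining words.

    Raises KeyError if any defining word is not already grounded, since a
    description built from ungrounded words conveys nothing to the learner.
    """
    missing = [word for word in definition if word not in known]
    if missing:
        raise KeyError(f"cannot ground {name!r}: ungrounded words {missing}")
    parts = [known[word] for word in definition]
    known[name] = lambda obj: all(part(obj) for part in parts)

ground_verbally("zebra", ["horse", "striped"], lexicon)
print(lexicon["zebra"]({"shape": "horse", "pattern": "stripes"}))  # True
print(lexicon["zebra"]({"shape": "horse", "pattern": "plain"}))    # False
```

Real definitions are rarely simple conjunctions, but the dependency is the same. Without the grounded detectors, the definition has nothing to bottom out in.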

Discussion

In discussing the Hard Problem (HP) of consciousness, the text makes a key point about T3 AI, meaning an AI at the robotic, sensorimotor level of the Turing Test. Even if an omniscient entity confirmed that such an AI is conscious, the fundamental questions of how and why such experiences arise from physical or computational processes would remain unanswered. This points to an ontological mystery surrounding consciousness itself. The implications for AI and cognitive science are significant. A T3 AI may possess sensorimotor skills and embodied cognition. Yet the HP leaves open whether such entities have subjective experiences at all, and whether those experiences play any necessary role.

The text further explores the distinction between the “Easy” Problem (EP) and the HP in relation to language, meaning, and understanding. It references Searle’s Chinese Room Argument, which illustrates that mere symbol manipulation does not equate to true understanding. Current AI models, including GPT-4, excel in processing language but lack the phenomenological aspect of understanding, as they operate within a realm of symbol manipulation without direct grounding in real-world experiences. The Symbol Grounding Problem (SGP) is central to this discussion, as it underscores the necessity for grounding symbols in tangible experiences to achieve meaningful understanding, a capability that current AI systems fundamentally lack.
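The Chinese Room can be caricatured as a program that answers by matching symbol shapes against a rule book. The sketch below uses a hypothetical lookup table, the crudest possible rule book. Searle's argument is meant to apply to any program, however sophisticated. The code produces appropriate-looking replies while never consulting what any symbol refers to. The English glosses in the comments are for the human reader only.

```python
# Hypothetical rule book: input symbol strings mapped to output symbol strings.
# The program compares shapes only; it never uses what the symbols mean.
RULE_BOOK = {
    "你好吗": "我很好",        # gloss: "How are you?" -> "I am fine"
    "你是谁": "我是一个房间",  # gloss: "Who are you?" -> "I am a room"
}
FALLBACK = "请再说一遍"        # gloss: "Please say that again"

def chinese_room(message: str) -> str:
    """Return the rule book's reply, or a stock fallback for unknown input."""
    return RULE_BOOK.get(message.strip(), FALLBACK)

print(chinese_room("你好吗"))  # 我很好
```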

Limitations

Traditional dictionaries are limited: they cannot give definitions that capture the full range of a category's features. This is especially true for empirical categories, whose verbal descriptions are usually approximate. Dictionaries still serve a practical purpose, because their definitions let users tell categories apart. However, they fall short of the nuanced understanding achieved through direct sensorimotor learning (DSL). DSL enables organisms to recognize and categorize objects by distinguishing features learned through real-world interaction. This is why purely verbal or symbolic categorization is insufficient on its own.
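A toy dictionary can illustrate why verbal definitions are circular and why IVG needs some directly grounded words to start from. This is an illustrative model, not the author's method. Each definition is reduced to the set of words it uses, and a word counts as learnable once every word in its definition is known. The grounded set stands for words learned through DSL. The dictionary entries and the function name are hypothetical.

```python
def verbally_learnable(dictionary: dict[str, set[str]], grounded: set[str]) -> set[str]:
    """Return every word learnable from the grounded words via definitions alone.

    A word becomes learnable once all the words in its definition are known;
    the loop repeats until a full pass adds nothing new.
    """
    known = set(grounded)
    changed = True
    while changed:
        changed = False
        for word, definition in dictionary.items():
            if word not in known and definition <= known:
                known.add(word)
                changed = True
    return known

# Hypothetical toy dictionary: each word maps to the words used to define it.
# Note the circle: "big" is defined by "large", and "large" by "big".
toy_dictionary = {
    "big": {"large"},
    "large": {"big"},
    "horse": {"animal", "big"},
    "zebra": {"horse", "striped"},
    "animal": {"living", "moving"},
    "striped": {"lines"},
}

print(sorted(verbally_learnable(toy_dictionary, set())))  # []
print(sorted(verbally_learnable(toy_dictionary, {"living", "moving", "lines"})))
# ['animal', 'lines', 'living', 'moving', 'striped']
```

With no grounded words, nothing can be learned from the dictionary. Grounding "living", "moving", and "lines" unlocks "animal" and "striped". However, "horse" and "zebra" stay out of reach, because "big" and "large" are defined only in terms of each other. Grounding "big" directly breaks the circle.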

In the context of artificial intelligence, and particularly for models such as GPT-4, reliance on indirect verbal grounding (IVG) poses significant challenges. Humans first need direct sensorimotor grounding to establish a foundational, grounded vocabulary before IVG can work. LLMs, by contrast, have no capacity for experiential learning and depend solely on patterns learned from their training data. Attempts to add grounding top-down, for example by adding sensory and motor capabilities to an already trained language model, face conceptual and practical hurdles, because genuine understanding involves more than sensory input alone. Human grounding is complex and rooted in embodied experience and cognitive development. This suggests that genuine understanding in AI systems requires a bottom-up approach, one that starts from basic sensorimotor experience and builds toward complex cognitive capacities.