Reducing hallucination in structured outputs via Retrieval-Augmented Generation

Journal: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track)
DOI: https://doi.org/10.18653/v1/2024.naacl-industry.19
Publication Date: 2024-01-01
Author(s): Orlando Ayala et al.
Primary Topic: Hallucinations in medical conditions

Overview

This paper addresses a significant limitation of Generative AI (GenAI): the tendency of Large Language Models (LLMs) to hallucinate, which hampers user adoption in real-world applications. To mitigate this issue, the authors built a system that uses Retrieval-Augmented Generation (RAG) to improve the quality of structured outputs generated from natural language requirements. Their findings indicate that RAG not only reduces hallucination but also helps LLMs generalize to out-of-domain inputs. The study further shows that pairing the LLM with a small, well-trained retriever allows a smaller LLM to be used without sacrificing performance, making LLM-based systems more resource-efficient.

In short, the proposed approach uses a retrieval-augmented LLM to reduce hallucination and enable generalization in structured output tasks, both of which are crucial for user trust and for the adoption of GenAI systems. The authors emphasize that RAG makes deployment feasible in resource-constrained environments, since a small retriever can be paired with a compact LLM. Future research directions include improving how the retriever and the LLM work together, for example through joint training or model architectures that optimize their combined performance.

Introduction

The introduction of this research paper discusses the transformative potential of Large Language Models (LLMs) in automating structured output tasks, particularly the conversion of natural language requirements into workflows. These workflows, which consist of a series of steps and their logical relationships, can significantly enhance employee productivity by automating repetitive tasks. However, the complexity of building such workflows typically requires specialized knowledge, creating a barrier for novice users. Generative AI (GenAI) offers a solution by enabling users to describe their desired workflows in natural language. Nonetheless, concerns about the reliability of LLM outputs, particularly the phenomenon of “hallucination,” necessitate careful application of these models.

To address these challenges, the authors propose Retrieval-Augmented Generation (RAG) as a way to make LLM-generated workflows more trustworthy by mitigating hallucinations. Workflows are represented as JSON documents, with each step encoded as a JSON object. The paper notes that while fine-tuning an LLM can yield satisfactory results, the fine-tuned model may still hallucinate, especially when the input deviates from its training distribution. For a commercial GenAI application, the authors argue, this mismatch must be reduced without incurring the high costs associated with fine-tuning. Their contributions are to show that RAG is effective for workflow generation, reducing hallucination and improving output quality, and that it allows a smaller LLM to be deployed alongside a small retriever model without sacrificing performance.
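To make the JSON representation concrete, the sketch below shows what a small workflow document might look like and how it could be parsed and checked for basic structure. The field names (`trigger`, `steps`, `id`, `action`, `table`, `depends_on`) are hypothetical illustrations chosen for this example, not the schema used in the paper; `depends_on` stands in for the logical relationships between steps.

```python
import json

# Hypothetical workflow document: the field names are illustrative assumptions,
# not the authors' actual schema.
WORKFLOW_JSON = """
{
  "trigger": {"type": "record_created", "table": "incident"},
  "steps": [
    {"id": 1, "action": "look_up_record", "table": "sys_user"},
    {"id": 2, "action": "send_email", "depends_on": [1]}
  ]
}
"""

REQUIRED_STEP_KEYS = {"id", "action"}


def parse_workflow(text: str) -> dict:
    """Parse a workflow document and check its basic shape."""
    doc = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(doc.get("steps"), list):
        raise ValueError("workflow must contain a list of steps")
    ids = set()
    for step in doc["steps"]:
        missing = REQUIRED_STEP_KEYS - set(step)
        if missing:
            raise ValueError(f"step {step} is missing keys: {sorted(missing)}")
        ids.add(step["id"])
    # Every dependency must point to a step defined in the same workflow.
    for step in doc["steps"]:
        for dep in step.get("depends_on", []):
            if dep not in ids:
                raise ValueError(f"step {step['id']} depends on unknown step {dep}")
    return doc


workflow = parse_workflow(WORKFLOW_JSON)
print([s["action"] for s in workflow["steps"]])  # ['look_up_record', 'send_email']
```

A parser like this only checks that the output is well-formed; whether each step actually exists on the target platform is a separate question.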

Methods

The methodology section outlines the architecture of the proposed RAG system, as illustrated in Figure 2. At initialization, the retriever builds indices over the available steps and tables. When a user request arrives, the retriever suggests relevant steps and tables, which are added to the user query to form the prompt for the Large Language Model (LLM). The LLM then generates the workflow in JSON format using greedy decoding.
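The inference flow described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: a toy bag-of-words vector with cosine similarity stands in for the trained retriever encoder, and the catalog entries, prompt template, and `top_k` value are hypothetical. The `greedy_decode` helper only illustrates greedy decoding (always taking the highest-scoring next token) over a caller-supplied scoring function; it is not an LLM.

```python
import math
from collections import Counter
from typing import Callable

# Hypothetical catalogs of available steps and tables (illustrative only).
STEPS = ["send email", "create task", "look up record", "update record"]
TABLES = ["incident", "sys_user", "change_request"]


def embed(text: str) -> Counter:
    """Toy stand-in for the trained retriever encoder: a bag of words."""
    return Counter(text.lower().replace("_", " ").split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(items: list[str]) -> list[tuple[str, Counter]]:
    # Index initialization: embed every catalog entry once, ahead of time.
    return [(item, embed(item)) for item in items]


def retrieve(query: str, index: list[tuple[str, Counter]], top_k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[1]), reverse=True)
    return [item for item, _ in ranked[:top_k]]


def build_prompt(query: str, steps: list[str], tables: list[str]) -> str:
    # Retrieved suggestions are combined with the user query to form the prompt.
    return (
        f"Steps: {', '.join(steps)}\n"
        f"Tables: {', '.join(tables)}\n"
        f"Request: {query}\n"
        "Workflow JSON:"
    )


def greedy_decode(next_token_scores: Callable[[list[str]], dict[str, float]],
                  max_tokens: int = 50, eos: str = "<eos>") -> list[str]:
    """Greedy decoding: at each position, append the single highest-scoring token."""
    tokens: list[str] = []
    for _ in range(max_tokens):
        scores = next_token_scores(tokens)
        best = max(scores, key=scores.get)
        if best == eos:
            break
        tokens.append(best)
    return tokens


step_index, table_index = build_index(STEPS), build_index(TABLES)
query = "When an incident is created, send an email to the user"
prompt = build_prompt(query, retrieve(query, step_index), retrieve(query, table_index))
print(prompt)
```

Greedy decoding is deterministic, so the same prompt always yields the same workflow, which makes outputs easier to compare across experiments.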

To develop the system, the researchers first trained a retriever encoder to effectively align natural language inputs with corresponding JSON objects. Following this, the LLM was trained in a RAG framework, where the output from the retriever is integrated into the LLM’s prompts, enhancing its ability to generate contextually relevant workflows.
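The two training stages differ mainly in their data. The sketch below assumes that each training workflow comes with a natural-language request. Under that assumption, the retriever learns from (request, JSON object) pairs, while each LLM example pairs a retrieval-augmented prompt with the target workflow JSON. The record layout, the rule that every gold step is a positive match, and the `fake_retrieve` placeholder are hypothetical; in practice, the trained retriever would be plugged in as `retrieve_fn`.

```python
import json
from typing import Callable

# Hypothetical training record: a user request and its gold workflow.
RECORD = {
    "request": "When an incident is created, email the assigned user",
    "workflow": {
        "trigger": {"type": "record_created", "table": "incident"},
        "steps": [{"id": 1, "action": "send_email"}],
    },
}


def retriever_pairs(record: dict) -> list[tuple[str, str]]:
    """Positive (request, JSON object) pairs for training the retriever encoder.

    Assumption: every step in the gold workflow is a positive match for the request.
    """
    return [(record["request"], json.dumps(step, sort_keys=True))
            for step in record["workflow"]["steps"]]


def llm_example(record: dict, retrieve_fn: Callable[[str], list[str]]) -> dict:
    """One RAG-style LLM training example: retrieved suggestions go into the prompt."""
    suggestions = retrieve_fn(record["request"])
    prompt = ("Suggestions:\n" + "\n".join(suggestions)
              + f"\nRequest: {record['request']}\nWorkflow JSON:")
    return {"prompt": prompt, "target": json.dumps(record["workflow"], sort_keys=True)}


def fake_retrieve(query: str) -> list[str]:
    # Hypothetical placeholder; a trained retriever would be used here.
    return ['{"action": "send_email"}', '{"action": "create_task"}']


pairs = retriever_pairs(RECORD)
example = llm_example(RECORD, fake_retrieve)
print(pairs[0])
print(example["prompt"])
```

Because the retriever is trained first and then frozen while LLM examples are built, the two stages can be run and debugged independently.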

Results

The “Results” section presents the key findings of the experiments. In line with the claims summarized in the overview, adding the retriever reduces hallucinations in the generated workflows and improves the quality of the generated JSON, and the retrieval-augmented LLM generalizes better to out-of-domain requests.

The results also indicate that pairing a small, well-trained retriever with a smaller LLM preserves performance, which lowers the resources needed to deploy the system.

Discussion

In this section, the authors discuss how they apply Retrieval-Augmented Generation (RAG) to structured output tasks, specifically generating valid JSON documents from natural language queries. Unlike typical RAG methods, which retrieve unstructured text, their method retrieves structured JSON objects, which increases the likelihood that the generated JSON is executable. The authors stress the importance of a well-trained retriever that maps natural language queries to relevant JSON objects, thereby reducing hallucinations: cases where the model generates non-existent or irrelevant outputs. The retriever is a fine-tuned siamese transformer, trained on domain-specific data to improve the mapping between queries and JSON objects.
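Because both the retrieved and the generated objects are structured, hallucinations of the "non-existent output" kind can be detected mechanically by checking each generated step and table against the catalog of those that actually exist. The sketch below shows one simple way to do this under a hypothetical schema in which each step has an `action` field and an optional `table` field. It is not necessarily the metric reported in the paper, and the catalogs are invented for illustration.

```python
import json

# Hypothetical catalogs of the actions and tables that actually exist.
KNOWN_ACTIONS = {"send_email", "create_task", "look_up_record", "update_record"}
KNOWN_TABLES = {"incident", "sys_user", "change_request"}


def find_hallucinations(generated: str) -> dict:
    """List generated actions and tables that are missing from the catalogs.

    "valid_json" is False when the output is not a parseable JSON object.
    """
    try:
        doc = json.loads(generated)
    except json.JSONDecodeError:
        doc = None
    if not isinstance(doc, dict):
        return {"valid_json": False, "actions": [], "tables": []}
    steps = [s for s in doc.get("steps", []) if isinstance(s, dict)]
    bad_actions = [s.get("action") for s in steps if s.get("action") not in KNOWN_ACTIONS]
    tables = [s["table"] for s in steps if "table" in s]
    trigger = doc.get("trigger")
    if isinstance(trigger, dict) and "table" in trigger:
        tables.append(trigger["table"])
    bad_tables = [t for t in tables if t not in KNOWN_TABLES]
    return {"valid_json": True, "actions": bad_actions, "tables": bad_tables}


output = json.dumps({
    "trigger": {"type": "record_created", "table": "incident"},
    "steps": [
        {"id": 1, "action": "send_email"},
        {"id": 2, "action": "notify_manager", "table": "managers"},
    ],
})
print(find_hallucinations(output))
# {'valid_json': True, 'actions': ['notify_manager'], 'tables': ['managers']}
```

Aggregating these counts over a test set gives a hallucination rate that can be compared with and without retrieval.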

The authors also describe how they train the retriever and the LLM, noting that the two are trained separately to simplify implementation. The retriever is trained with a contrastive loss, and the LLM's training data is augmented with the JSON objects suggested by the retriever. The results indicate that incorporating a retriever significantly reduces hallucinations in the generated outputs, with the reported metrics showing a marked improvement in the accuracy of the generated JSON documents. The authors conclude that their approach mitigates hallucination risk while maintaining high performance on structured output tasks, suggesting potential for deployment in various domains. Future work will focus on improving the interaction between the retriever and the LLM, potentially through joint training.
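The exact contrastive objective is not spelled out here. One common choice for siamese retrievers is an InfoNCE-style loss with in-batch negatives: each request's paired JSON object is the positive, and the other objects in the batch serve as negatives. The sketch below computes that loss for precomputed embeddings. The temperature value and the toy vectors are assumptions, and a real implementation would compute the loss inside a training framework so that gradients flow back into the shared encoder.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def in_batch_contrastive_loss(queries: list[list[float]],
                              objects: list[list[float]],
                              temperature: float = 0.05) -> float:
    """InfoNCE with in-batch negatives.

    queries[i] and objects[i] embed a matching (request, JSON object) pair produced
    by the same (siamese) encoder; every objects[j] with j != i is a negative for
    queries[i].
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, o) / temperature for o in objects]
        m = max(logits)  # subtract the max before exponentiating, for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(queries)


aligned = in_batch_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
swapped = in_batch_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]])
print(aligned < swapped)  # True: correctly matched pairs give a lower loss
```

In-batch negatives come at no extra labeling cost: with a batch of N pairs, each request is contrasted against N - 1 non-matching JSON objects.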