Knowledge sharing in manufacturing using LLM-powered tools: user study and model benchmarking

Journal: Frontiers in Artificial Intelligence, Volume: 7
DOI: https://doi.org/10.3389/frai.2024.1293084
PMID: https://pubmed.ncbi.nlm.nih.gov/38601111
Publication Date: 2024-03-27
Author(s): Samuel Kernan Freire et al.
Primary Topic: Data Quality and Management

Overview

The paper presents a system based on a Large Language Model (LLM) that aims to improve knowledge sharing in manufacturing, where operating complex production lines requires efficient information retrieval. The system extracts relevant information from extensive factory documentation and from the knowledge of expert operators, so that operator queries can be answered more quickly. A user study conducted in a factory showed that the LLM system substantially improved information retrieval speed and issue resolution efficiency, but operators still preferred to learn from human experts when one was available.
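The retrieval step described above can be illustrated as ranking documentation passages by their similarity to an operator's query. The following is a minimal sketch, assuming plain TF-IDF cosine similarity as the ranking method; it is not the retriever used in the paper, and the function names and sample passages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(chunks: list[str]) -> dict[str, float]:
    """Smoothed inverse document frequency for every token in the corpus."""
    n = len(chunks)
    doc_freq = Counter()
    for chunk in chunks:
        doc_freq.update(set(tokenize(chunk)))
    return {tok: math.log((1 + n) / (1 + df)) + 1 for tok, df in doc_freq.items()}


def tfidf(text: str, idf: dict[str, float]) -> dict[str, float]:
    """Term frequency times IDF; tokens absent from the corpus are dropped."""
    counts = Counter(tokenize(text))
    return {tok: tf * idf[tok] for tok, tf in counts.items() if tok in idf}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    idf = build_idf(chunks)
    q_vec = tfidf(query, idf)
    return sorted(chunks, key=lambda c: cosine(q_vec, tfidf(c, idf)), reverse=True)[:k]


# Hypothetical sample passages standing in for factory documentation.
SAMPLE_DOCS = [
    "Line 3 labeler: if labels misalign, recalibrate the optical sensor.",
    "Filler maintenance: replace the nozzle seal every 500 operating hours.",
    "Issue report: conveyor jam at line 3 caused by a worn belt.",
]
```

Calling `retrieve("labels misalign on line 3", SAMPLE_DOCS, k=1)` returns the labeler passage, because it shares the rare terms "labels" and "misalign" with the query. A production system would more likely rank passages with learned embeddings; TF-IDF only keeps the sketch dependency-free.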

In the benchmark, GPT-4 outperformed the other models on factual accuracy and completeness and produced fewer hallucinations. Open-source models such as StableBeluga2 and Mixtral 8x7B also performed competitively, and they offer advantages in data privacy and customization. The user study highlighted the system's user-friendliness and logical functionality, although participants suggested improving the user interface and the specificity of the content. Overall, the findings suggest that LLM-powered tools can modernize factory operations and speed up certain tasks, although concerns remain about safety, efficiency, and the perceived superiority of human expertise.
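To make the three benchmark criteria concrete, the sketch below aggregates per-answer ratings into a mean score per model and criterion, then ranks models on a chosen criterion. It assumes each answer receives a numeric rating per criterion (for example 0 or 1) from the coder; the rating scale, the criterion names, and any model names used with it are hypothetical and do not reproduce the paper's rubric or results. Hallucination is treated as lower-is-better.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical criterion names; "hallucination" is scored so that lower is better.
CRITERIA = ("factuality", "completeness", "hallucination")


def summarize(ratings):
    """Aggregate (model, criterion, score) tuples into {model: {criterion: mean}}."""
    buckets = defaultdict(lambda: defaultdict(list))
    for model, criterion, score in ratings:
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion}")
        buckets[model][criterion].append(score)
    return {
        model: {crit: mean(scores) for crit, scores in per_crit.items()}
        for model, per_crit in buckets.items()
    }


def rank(summary, criterion, higher_is_better=True):
    """Order model names by mean score on one criterion.

    Assumes every model in the summary has ratings for that criterion.
    """
    return sorted(
        summary,
        key=lambda model: summary[model][criterion],
        reverse=higher_is_better,
    )
```

For a lower-is-better criterion such as hallucination, call `rank(summary, "hallucination", higher_is_better=False)` so the best model comes first.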

Introduction

The introduction frames the work within human-centric manufacturing, which seeks to combine the strengths of humans and machines to improve creativity, wellbeing, problem-solving, and productivity in industrial settings. Despite advances, a major challenge remains: manufacturing environments generate extensive knowledge, such as issue reports and machine documentation, that is difficult to manage effectively. This knowledge is often underused because unstructured technical information is hard to process and interpret. Traditional knowledge management methods have proven cumbersome, leading operators to rely on personal devices rather than official procedures, while AI systems struggle with the complexity of the data.

Recent developments in LLMs such as GPT-4 offer a potential solution to these challenges, because they can interpret, summarize, and retrieve information from large text datasets. However, applying LLMs in manufacturing is complicated by the need to customize them to handle the domain's specific and constantly changing knowledge. The paper introduces an LLM-powered tool that draws on factory documents and issue analysis reports to help operators query existing knowledge and report new issues. A user study evaluated the tool's usability and its impact on factory operations, and the paper also addresses the lack of benchmarks for assessing LLMs in manufacturing contexts. The study highlights the importance of adapting LLMs to the specialized terminology and evolving knowledge bases of manufacturing environments.

Discussion

The discussion weighs the capabilities and challenges of using LLMs for knowledge management, particularly in domain-specific applications such as manufacturing. Because they are trained on diverse texts, LLMs show strong reasoning and information-interpretation skills, which makes them valuable for knowledge-intensive decision-making tasks. However, outdated information and inaccuracies, termed “hallucinations,” pose significant challenges. Techniques such as Retrieval-Augmented Generation (RAG) and few-shot prompting have been proposed to improve the factual accuracy and relevance of LLM responses; in RAG, passages relevant to a query are retrieved from an external source and supplied to the model together with the question. Tools such as WikiChat show that LLM-based systems can outperform existing models in factuality while maintaining conversational quality.
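Combining RAG with few-shot prompting can be sketched as plain prompt assembly: retrieved passages go into a context section, worked examples show the expected answer format, and the new question comes last. This is a minimal sketch under assumed conventions; the instruction wording, the hypothetical `Example` type, and the section layout are illustrative and are not the prompt used in the paper. No model API is called.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """A hypothetical worked question/answer pair used as a few-shot demonstration."""
    question: str
    answer: str


def build_rag_prompt(question: str, context_chunks: list[str], examples: list[Example]) -> str:
    """Assemble instructions, retrieved context, few-shot examples, then the query."""
    parts = [
        # Grounding instruction: a common (assumed) way to discourage hallucinated answers.
        "Answer the operator's question using only the context below. "
        "If the context does not contain the answer, say that you do not know."
    ]
    # Number the retrieved chunks so an answer can point back to its source.
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1))
    parts.append(f"Context:\n{context}")
    # Few-shot demonstrations show the expected answer format and tone.
    for ex in examples:
        parts.append(f"Q: {ex.question}\nA: {ex.answer}")
    # The new question goes last, leaving "A:" open for the model to complete.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)
```

The resulting string would be sent to whichever LLM is under test; keeping assembly separate from the model call makes it easy to reuse the same prompt across models.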

Benchmarking OpenAI's GPT-4 against several open-source models showed that GPT-4 excels in factuality and completeness, while models such as StableBeluga2 and Mixtral 8x7B improve substantially over their predecessors. The user study indicates that factory operators value the system's speed and accessibility, but concerns about safety and a preference for human expertise persist. The findings suggest that although LLMs can modernize operations and provide immediate access to information, they should complement rather than replace human judgment, particularly in high-stakes environments. Overall, the research underscores the need to refine LLM applications to improve the user experience and address safety concerns while using their capabilities for efficient knowledge management.

Limitations

Several limitations may affect the generalizability and applicability of the findings. First, the same prompt was used for all LLMs, which may not have been optimal for every model; tailored prompts could yield better results. Second, although hyperparameters were standardized where possible, the specific settings used for Llama 2 were not accessible, which could introduce variability. Third, the benchmark comprised only 20 questions assessed by a single coder, which raises concerns about potential bias and about whether the questions capture the complexity of real-world scenarios.
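The first two limitations concern holding the prompt and decoding settings fixed across models. The sketch below shows one way a benchmark harness could enforce this, assuming each model is wrapped as a function that takes a prompt and a settings object; `GenerationSettings`, `PROMPT_TEMPLATE`, and `run_benchmark` are hypothetical names, and the default values are placeholders rather than the study's settings.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical decoding settings shared by every model in the run."""
    temperature: float = 0.0
    max_tokens: int = 512


# Each model is wrapped as a function: (prompt, settings) -> answer text.
ModelFn = Callable[[str, GenerationSettings], str]

# One template for all models, mirroring the uniform-prompt setup.
PROMPT_TEMPLATE = "Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def run_benchmark(
    models: dict[str, ModelFn],
    questions: list[dict[str, str]],
    settings: GenerationSettings,
) -> dict[str, list[str]]:
    """Send every question, formatted identically, to every model with the same settings."""
    answers: dict[str, list[str]] = {name: [] for name in models}
    for item in questions:
        prompt = PROMPT_TEMPLATE.format(context=item["context"], question=item["question"])
        for name, generate in models.items():
            answers[name].append(generate(prompt, settings))
    return answers
```

Replacing the single `PROMPT_TEMPLATE` with a per-model dictionary of templates would be one way to test the tailored prompts the limitation suggests, at the cost of a less controlled comparison.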

Moreover, the tool was not evaluated with end users in an actual production environment, which limits the study's applicability, since such settings may introduce unique challenges such as time constraints. However, operators and managers generated the benchmark questions from their own experience, which provided some practical grounding. Future research directions include longitudinal studies that assess the tool's impact on production performance, operator wellbeing, and cognitive abilities in real-world settings, as well as a more comprehensive exploration of prompt and model customization strategies.