Training-free active learning framework in materials science with large language models

Journal: npj Computational Materials
DOI: https://doi.org/10.1038/s41524-026-02136-4
Publication Date: 2026-05-20
Author(s): Hongchen Wang et al.
Primary Topic: Machine Learning in Materials Science

Overview

The paper introduces LLM-AL, an active learning framework that uses large language models (LLMs) to make experimentation in materials science more efficient. Traditional machine learning models used for active learning suffer from cold-start limitations and depend on domain-specific feature engineering, both of which restrict their applicability. LLM-AL instead draws on pretrained knowledge and token-based representations to propose experiments directly from textual descriptions, operating in an iterative few-shot setting.

The study benchmarks LLM-AL against conventional machine learning models on four diverse materials science datasets, using two prompting strategies: one for structured numerical inputs and one for descriptive text. LLM-AL reduces the number of experiments required to identify top-performing candidates by over 70% and consistently outperforms the traditional models. It also searches more broadly, yet reaches the optimum in fewer iterations. Its performance is stable across multiple runs, with variability in line with that typically observed for traditional machine learning approaches. Overall, LLM-AL is a promising route to more efficient and interpretable experiment selection, and a step toward autonomous discovery driven by LLMs.

Introduction

The introduction discusses the challenges of materials discovery and process optimization, in particular the impracticality of exhaustive trial and error given time and cost constraints. Active learning (AL) has emerged as a key approach for improving data efficiency and streamlining experimental decisions by prioritizing the most informative experiments. It underpins recent advances in self-driving laboratories (SDLs), where AL, particularly through Bayesian optimization, often requires 10 to 20 times fewer trials than traditional methods.

Despite this progress, traditional machine learning (ML)-based AL is held back by limited scalability, reliance on task-specific feature engineering, and the “cold start” problem, especially in the early phases of experimentation. Large language models (LLMs) offer a promising alternative: they draw on broad scientific knowledge to identify patterns in material descriptors and properties, and can thereby guide AL workflows. Previous studies have shown that LLMs can predict properties effectively even with limited data and can mitigate the cold start problem. The paper proposes a pool-based AL workflow that evaluates LLMs as surrogate models for experimental design and benchmarks them against traditional ML approaches on several materials science datasets. The findings indicate that LLMs achieve competitive or superior results, identifying optimal materials in fewer iterations, which highlights their potential as flexible and generalizable tools in AL pipelines.

Methods

The Methods section describes the benchmarking setup: a pool-based AL workflow in which the LLM acts as the surrogate model, evaluated on four materials science datasets (matbench_steels, P3HT/CNT, Perovskite, and Membrane) with two prompt formats, a concise parameter format and a narrative report format. Each run starts from a randomly selected observation and proceeds until the global optimum of the dataset is found. Runs are repeated to assess stability, and the number of iterations required is compared against traditional ML-based AL baselines.

Results

The Results section first lays out the pool-based active learning (AL) workflow of the LLM-AL framework, as illustrated in Figure 1. The process starts from a randomly selected observation. In each subsequent iteration, the LLM receives the optimization objective, the dataset context, and the previously observed examples, and proposes new experimental conditions. Each proposed condition is matched to the closest existing candidate in the dataset by a similarity comparison; the observed target value of that candidate is then retrieved and added as a new few-shot example.

This loop of proposal, matching, observation, and updating continues until the global optimum of the dataset is identified. The study records the number of iterations and the acquisition trajectory needed to meet this stopping criterion, which enables a comparison with traditional ML-based AL methods. Further details of the workflow are given in the Methods section.
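The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the LLM call is replaced by a hypothetical `propose` callable, similarity is taken to be Euclidean distance, already observed candidates are excluded from matching, and the function names (`nearest_candidate`, `run_pool_al`, `toy_propose`) are made up for this example.

```python
import math
import random


def nearest_candidate(proposal, pool, observed):
    """Index of the unobserved pool entry closest to the proposal (Euclidean, assumed)."""
    best_idx, best_dist = None, math.inf
    for idx, (features, _target) in enumerate(pool):
        if idx in observed:
            continue  # assumption: each candidate is measured at most once
        dist = math.dist(proposal, features)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def run_pool_al(pool, propose, seed=0):
    """Propose -> match -> observe -> update until the pool's best target is found.

    pool:    list of (feature_tuple, target) pairs; a higher target is better.
    propose: callable(examples) -> feature_tuple; stands in for the LLM call.
    Returns the pool indices in acquisition order.
    """
    rng = random.Random(seed)
    best_idx = max(range(len(pool)), key=lambda i: pool[i][1])
    trajectory = [rng.randrange(len(pool))]  # random initial observation
    observed = set(trajectory)
    while best_idx not in observed:  # stopping criterion: global optimum observed
        examples = [pool[i] for i in trajectory]  # few-shot context so far
        proposal = propose(examples)
        idx = nearest_candidate(proposal, pool, observed)
        observed.add(idx)
        trajectory.append(idx)
    return trajectory


def toy_propose(examples):
    """Hypothetical stand-in for an LLM: step past the best example, away from the worst."""
    best = max(examples, key=lambda e: e[1])[0]
    worst = min(examples, key=lambda e: e[1])[0]
    return tuple(b + 0.5 * (b - w) + 0.1 for b, w in zip(best, worst))
```

Calling `run_pool_al(pool, toy_propose)` on a small pool returns the acquisition trajectory; its length is the iteration count that the study compares across methods. Because every iteration observes a new candidate, the loop always terminates within `len(pool)` steps.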

Discussion

The discussion covers the performance and reproducibility of LLM-AL across four diverse datasets: matbench_steels, P3HT/CNT, Perovskite, and Membrane. LLM-AL proposes experiments efficiently and identifies the optimal targets using less than 30% of the data in most cases. Prompt structure has a significant impact on performance. The study compares two strategies: a concise parameter format and a narrative report format. Parameter-format prompts were particularly effective for high-dimensional datasets such as matbench_steels, where verbose descriptions could overwhelm the model. Report-format prompts, by contrast, excelled on datasets with fewer variables, such as Membrane, where detailed descriptions helped the model extract implicit relationships.
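To make the difference between the two prompt styles concrete, the following sketch renders the same observation in both ways. The templates, function names, and variable names are assumptions made for illustration; the wording of the prompts actually used in the study is not given in this summary.

```python
def parameter_prompt(record, target_name, target_value):
    """Concise format: 'name=value' pairs followed by the measured target."""
    fields = ", ".join(f"{k}={v}" for k, v in record.items())
    return f"{fields} -> {target_name}={target_value}"


def report_prompt(record, target_name, target_value):
    """Narrative format: the same observation written as a short prose report."""
    parts = [f"{k.replace('_', ' ')} was set to {v}" for k, v in record.items()]
    return (
        f"In this experiment, {'; '.join(parts)}. "
        f"The measured {target_name} was {target_value}."
    )


# Illustrative record with made-up variable names and values.
example = {"annealing_temperature": 120, "spin_speed": 3000}
print(parameter_prompt(example, "efficiency", 0.5))
print(report_prompt(example, "efficiency", 0.5))
```

The parameter format grows by a few tokens per variable, whereas the report format grows by a full clause per variable, which is one plausible reason why the concise style scales better to high-dimensional datasets.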

The analysis also shows that LLM-AL searches in a highly exploratory pattern, often traversing longer distances in feature space than traditional ML models, which typically converge in a more directed way. Despite this, LLM-AL consistently reaches the optimal targets in fewer iterations, suggesting that its reliance on semantic cues and contextual reasoning may benefit experimental design. The results underscore the potential of LLM-AL as a robust and generalizable tool for guiding experimental workflows across domains, and the importance of tailoring the prompting strategy to the characteristics of each dataset. Overall, LLM-AL is a competitive alternative to conventional AL approaches, particularly where rich procedural context is available.
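One simple way to quantify how far a search "travels" is the distance between consecutive acquisitions in feature space. The sketch below assumes Euclidean distance on features that have already been scaled to comparable ranges; the metric used in the study may differ, and the function names are hypothetical.

```python
import math


def step_distances(trajectory):
    """Euclidean distance between each pair of consecutive acquired points."""
    return [math.dist(a, b) for a, b in zip(trajectory, trajectory[1:])]


def mean_step(trajectory):
    """Average step length; 0.0 when the trajectory has fewer than two points."""
    steps = step_distances(trajectory)
    return sum(steps) / len(steps) if steps else 0.0
```

A larger mean step indicates a more exploratory trajectory, while a directed, converging search produces steps that shrink as it closes in on the optimum.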