DPA-2: a large atomic model as a multi-task learner

Journal: npj Computational Materials, Volume: 10, Issue: 1
DOI: https://doi.org/10.1038/s41524-024-01493-2
PMID: https://pubmed.ncbi.nlm.nih.gov/40851785
Publication Date: 2024-12-19
Author(s): Duo Zhang et al.
Primary Topic: Machine Learning in Materials Science

Methods

The authors consider a system of $ N $ atoms with atomic numbers $ Z = \{Z_1, \ldots, Z_i, \ldots, Z_N\} $ and atomic coordinates $ R = \{r_1, \ldots, r_i, \ldots, r_N\} $. The potential energy surface (PES) of the system, denoted $ E $, is a function of both the atom types and their coordinates: $ E = E(X) $ with $ X := (R, Z) $. The total energy is decomposed into atomic contributions, $ E = \sum_{i} E_i $, where $ E_i $ is the energy associated with atom $ i $.
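The following sketch illustrates $ E = \sum_i E_i $ with a hypothetical model. Each atom gets a local energy that depends only on its neighbors within a cutoff and on the species in $ Z $. The functional form, the `PAIR_STRENGTH` table and `r_cut` are placeholders invented for this example. In DPA-2, each $ E_i $ comes from a learned descriptor and fitting network, not from a fixed formula.

```python
import math

# Hypothetical interaction strengths per unordered species pair (atomic numbers).
PAIR_STRENGTH = {(1, 1): 0.5, (1, 8): 1.0, (8, 8): 0.8}


def atomic_energies(R, Z, r_cut=3.0):
    """Return [E_1, ..., E_N]; E_i depends only on atom i's neighbors within r_cut."""
    energies = [0.0] * len(Z)
    for i in range(len(Z)):
        for j in range(len(Z)):
            if i == j:
                continue
            r = math.dist(R[i], R[j])
            if r >= r_cut:
                continue
            eps = PAIR_STRENGTH[tuple(sorted((Z[i], Z[j])))]
            # Cosine switching function: smoothly takes the term to zero at r_cut.
            f_cut = 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)
            # Each pair term is split evenly between its two atoms (i gets half here,
            # j gets the other half when the loop visits the pair (j, i)).
            energies[i] += 0.5 * eps * math.exp(-r) * f_cut
    return energies


def total_energy(R, Z):
    """E(X) with X = (R, Z), written as a sum of atomic contributions."""
    return sum(atomic_energies(R, Z))


# Usage: a water-like geometry (O, H, H); positions in arbitrary length units.
R = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
Z = [8, 1, 1]
print(atomic_energies(R, Z), total_energy(R, Z))
```

Each $ E_i $ here depends only on interatomic distances and species. The total is therefore unchanged by rigid translations, by rotations and by relabeling the atoms. These are the kinds of symmetry that a learned descriptor is meant to preserve.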

The force on atom $ i $, $ F_i $, is the negative gradient of the total energy with respect to that atom's coordinates, $ F_i = -\nabla_{r_i} E $. For periodic systems, the virial tensor is computed from the derivative of the energy with respect to the cell (with the atoms' fractional coordinates held fixed) as $ \Xi_{\alpha\beta} = -\sum_{\gamma} \frac{\partial E}{\partial h_{\gamma\alpha}} h_{\gamma\beta} $. Here $ \Xi_{\alpha\beta} $ is the $ \alpha\beta $ component of the virial tensor, and $ h_{\alpha\beta} $ is the $ \beta $-th component of the $ \alpha $-th cell vector.
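Both definitions can be checked numerically. The sketch below uses a hypothetical harmonic pair potential in a triclinic periodic cell. It computes $ F_i = -\nabla_{r_i} E $ and $ \Xi_{\alpha\beta} = -\sum_\gamma (\partial E/\partial h_{\gamma\alpha}) h_{\gamma\beta} $ by central finite differences and compares them with closed-form pair expressions. During the cell derivative, the fractional coordinates are held fixed so that the atoms move with the cell. The pair potential and the fractional-coordinate minimum-image convention are assumptions made for illustration only.

```python
import math

def phi(r):
    """Hypothetical pair potential: a harmonic spring with rest length 1.0."""
    return 0.5 * (r - 1.0) ** 2

def dphi(r):
    return r - 1.0  # phi'(r)

def row_times(v, m):
    """Row vector times 3x3 matrix: out_b = sum_g v_g * m[g][b]."""
    return [sum(v[g] * m[g][b] for g in range(3)) for b in range(3)]

def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]

def pair_vectors(S, cell):
    """Yield (i, j, d), d = r_j - r_i; rows of cell are cell vectors, r = s @ cell."""
    for i in range(len(S)):
        for j in range(i + 1, len(S)):
            ds = [S[j][k] - S[i][k] for k in range(3)]
            ds = [x - round(x) for x in ds]  # minimum image in fractional coordinates
            yield i, j, row_times(ds, cell)

def energy(S, cell):
    """Total energy as a function of fractional coordinates S and cell h."""
    return sum(phi(math.hypot(*d)) for _, _, d in pair_vectors(S, cell))

def numerical_forces(R, cell, eps=1e-6):
    """F_i = -dE/dr_i via central differences on Cartesian positions R."""
    h_inv = inv3(cell)
    def e_cart(R_):
        return energy([row_times(r, h_inv) for r in R_], cell)
    forces = []
    for i in range(len(R)):
        f_i = []
        for a in range(3):
            Rp = [list(r) for r in R]
            Rm = [list(r) for r in R]
            Rp[i][a] += eps
            Rm[i][a] -= eps
            f_i.append(-(e_cart(Rp) - e_cart(Rm)) / (2 * eps))
        forces.append(f_i)
    return forces

def numerical_virial(S, cell, eps=1e-6):
    """Xi_ab = -sum_g dE/dh_ga * h_gb, with dE/dh by central differences at fixed S."""
    dE_dh = [[0.0] * 3 for _ in range(3)]
    for g in range(3):
        for a in range(3):
            hp = [list(row) for row in cell]
            hm = [list(row) for row in cell]
            hp[g][a] += eps
            hm[g][a] -= eps
            dE_dh[g][a] = (energy(S, hp) - energy(S, hm)) / (2 * eps)
    return [[-sum(dE_dh[g][a] * cell[g][b] for g in range(3)) for b in range(3)]
            for a in range(3)]

def analytic_forces_and_virial(S, cell):
    """Pair closed forms: F_i = sum_j phi' d/r, Xi_ab = -sum_pairs phi' d_a d_b / r."""
    forces = [[0.0] * 3 for _ in S]
    virial = [[0.0] * 3 for _ in range(3)]
    for i, j, d in pair_vectors(S, cell):
        r = math.hypot(*d)
        for a in range(3):
            forces[i][a] += dphi(r) * d[a] / r
            forces[j][a] -= dphi(r) * d[a] / r
            for b in range(3):
                virial[a][b] -= dphi(r) * d[a] * d[b] / r
    return forces, virial

cell = [[4.0, 0.0, 0.0], [0.5, 4.2, 0.0], [0.3, -0.2, 3.9]]  # triclinic, rows = vectors
S = [[0.10, 0.20, 0.30], [0.32, 0.15, 0.41], [0.55, 0.68, 0.22], [0.83, 0.87, 0.74]]
R = [row_times(s, cell) for s in S]
F_ref, Xi_ref = analytic_forces_and_virial(S, cell)
print(numerical_forces(R, cell)[0], F_ref[0])
```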

Results

The Results section outlines the workflow of the LAM (Large Atomic Model): pre-training, fine-tuning on downstream tasks and knowledge distillation. The LAM uses a unified descriptor that encodes symmetry-preserving representations of atomic systems. This descriptor feeds energy-fitting networks, which predict the energy ($E$) and forces ($F$) labeled in the pre-training data. Pre-training follows a multi-task strategy. The unified descriptor's parameters are optimized on all pre-training datasets, while each dataset's fitting network is updated only with that dataset. This differs from single-task training, which uses one dataset. The multi-task approach is necessary because the DFT-derived labels vary from dataset to dataset and cannot simply be pooled.
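The multi-task idea can be illustrated with a deliberately tiny model. In the sketch below, one shared parameter `w` plays the role of the unified descriptor. Each dataset gets its own scalar head, which plays the role of a fitting network. The two synthetic datasets follow the same trend but carry different label offsets, a hypothetical stand-in for DFT labels that differ between datasets. All names and numbers are illustrative and do not reflect the DPA-2 implementation.

```python
import random


def make_dataset(slope, offset, n, seed):
    """Synthetic labels: a shared trend plus a dataset-specific offset (hypothetical)."""
    rng = random.Random(seed)
    return [(x, slope * x + offset) for x in (rng.uniform(-1.0, 1.0) for _ in range(n))]


def train_multitask(datasets, steps=4000, lr=0.1, seed=0):
    """SGD with one shared parameter and one head per dataset."""
    rng = random.Random(seed)
    w = 0.0                                   # shared "descriptor" parameter
    heads = {name: 0.0 for name in datasets}  # one "fitting network" per dataset
    names = sorted(datasets)
    for step in range(steps):
        name = names[step % len(names)]       # cycle through the tasks
        x, y = rng.choice(datasets[name])
        err = w * x + heads[name] - y         # d(0.5 * err**2) / d(prediction)
        w -= lr * err * x                     # shared: updated by every dataset
        heads[name] -= lr * err               # head: updated only by its own dataset
    return w, heads


def train_single_head(datasets, steps=4000, lr=0.1, seed=0):
    """Baseline: pool all data under one head, ignoring the label offsets."""
    pooled = {"pooled": [p for data in datasets.values() for p in data]}
    return train_multitask(pooled, steps, lr, seed)


def rmse(w, head, dataset):
    return (sum((w * x + head - y) ** 2 for x, y in dataset) / len(dataset)) ** 0.5


data = {
    "dataset_A": make_dataset(slope=2.0, offset=-1.0, n=50, seed=1),
    "dataset_B": make_dataset(slope=2.0, offset=3.0, n=50, seed=2),
}
w, heads = train_multitask(data)
print(w, heads)
```

With separate heads, the shared parameter recovers the common trend and each head absorbs its dataset's offset. Pooling everything under one head (`train_single_head`) cannot fit both datasets at once.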

The pre-trained descriptor and fitting networks can then be fine-tuned for specific potential energy surface (PES) modeling tasks. The descriptor is initialized from the pre-trained model, and the fitting network is initialized either randomly or from one of the pre-training tasks. The study validates the methodology on several downstream datasets and leaves concurrent learning for data generation to future work. To make applications such as molecular dynamics (MD) simulations more efficient, the authors propose a model distillation process. A “teacher” model labels data, and a simpler “student” model is trained on those labels. This loop is repeated until the student's predictions meet the required accuracy.
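The teacher-student loop can be sketched as a cycle: label, train, test, repeat. In the sketch below, a Morse curve stands in for the large teacher model, and piecewise-linear interpolation stands in for the cheap student. On each pass, the candidate configuration with the worst student error goes back to the teacher for labeling. The loop stops once every candidate is within a tolerance. These stand-ins and the worst-error selection rule are assumptions for illustration; the actual teacher and student are machine-learned potentials.

```python
import bisect
import math


def teacher_energy(r):
    """Stand-in for the large teacher model: a Morse pair curve (hypothetical)."""
    d_e, a, r0 = 1.0, 1.5, 1.2
    return d_e * (1.0 - math.exp(-a * (r - r0))) ** 2 - d_e


class Student:
    """Cheap student: piecewise-linear interpolation through teacher-labeled points."""

    def __init__(self, labeled):
        pts = sorted(labeled.items())
        self.xs = [x for x, _ in pts]
        self.ys = [y for _, y in pts]

    def __call__(self, r):
        i = bisect.bisect_left(self.xs, r)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]
        x0, x1, y0, y1 = self.xs[i - 1], self.xs[i], self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (r - x0) / (x1 - x0)


def distill(candidates, tol=1e-3):
    """Iterate until the student is within tol of the teacher on all candidates."""
    candidates = sorted(candidates)
    labeled = {r: teacher_energy(r) for r in (candidates[0], candidates[-1])}
    for iteration in range(len(candidates)):
        student = Student(labeled)  # "retrain" the student on all labels so far
        worst_err, worst_r = max((abs(student(r) - teacher_energy(r)), r) for r in candidates)
        if worst_err < tol:
            return student, labeled, iteration
        labeled[worst_r] = teacher_energy(worst_r)  # teacher labels the worst case
    raise RuntimeError("student did not reach the target accuracy")


grid = [0.8 + 0.01 * k for k in range(221)]
student, labeled, iterations = distill(grid, tol=1e-3)
print(len(labeled), "teacher labels used for", len(grid), "candidate configurations")
```

With this design, the student requests teacher labels only where its own error is large, rather than at every candidate.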

Discussion

In the Discussion, the authors present DPA-2, a novel architecture for Large Atomic Models (LAMs). It is designed to improve molecular and materials simulations through a workflow of multi-task pre-training, fine-tuning and knowledge distillation. DPA-2 is pre-trained on 18 diverse datasets covering 73 chemical elements, which gives it better generalization than conventional single-task pre-training. In particular, multi-task pre-training substantially reduces zero-shot generalization errors. Energy and force RMSEs are 52% and 59% lower, respectively, than those of the MACE model pre-trained on MPtrj. They are also 50% and 62% lower, respectively, than those of the single-task pre-trained DPA-2 model.
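The percentages above are relative reductions in root-mean-square error (RMSE). The helpers below spell out that arithmetic. The numbers in the usage line are made up for illustration and are not results from the paper.

```python
import math


def rmse(pred, ref):
    """Root-mean-square error; for forces, pass all atoms' Cartesian components flattened."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def relative_reduction(rmse_baseline, rmse_new):
    """Fraction by which rmse_new is lower than rmse_baseline (0.52 means 52% lower)."""
    return 1.0 - rmse_new / rmse_baseline


# Hypothetical numbers, not from the paper: a 52% reduction from a 10.0 baseline leaves 4.8.
print(relative_reduction(10.0, 4.8))
```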

The authors emphasize that the DPA-2 architecture is designed to respect physical symmetries. It is also energy-conserving, because atomic forces are computed as the negative gradient of the system's energy. Fine-tuning is markedly data-efficient: reaching accuracy comparable to training from scratch requires one to two orders of magnitude less data. Knowledge distillation produces compressed versions of DPA-2 that retain accuracy while substantially improving computational efficiency. The authors conclude that the current datasets provide a strong foundation, but further effort is needed to expand the training data, particularly for underrepresented materials such as 2-D structures. They also advocate a collaborative approach to developing LAMs through initiatives such as OpenLAM.
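A toy constant-energy (NVE) run shows why it matters that forces are computed as $ -\nabla E $. The sketch uses velocity Verlet to integrate a unit-mass particle under two forces. Force (a) is derived from a hypothetical model energy $ E = \tfrac12 r^2 + 0.1 r^4 $. Force (b) is the same plus a small rotational term that is not the gradient of any energy. The potential, the step size and the rotational term are assumptions chosen for illustration.

```python
def potential(x, y):
    """Hypothetical model energy E = 0.5 r^2 + 0.1 r^4."""
    r2 = x * x + y * y
    return 0.5 * r2 + 0.1 * r2 * r2


def conservative_force(x, y):
    """F = -grad E for the model energy above."""
    k = 1.0 + 0.4 * (x * x + y * y)
    return -k * x, -k * y


def rotational_force(x, y, c=0.01):
    """Conservative force plus a term with nonzero curl (not the gradient of any energy)."""
    fx, fy = conservative_force(x, y)
    return fx - c * y, fy + c * x


def max_energy_drift(force, dt=0.01, steps=10_000):
    """Velocity Verlet for a unit-mass particle; returns max |E(t) - E(0)|."""
    x, y, vx, vy = 1.0, 0.0, 0.0, 1.0
    fx, fy = force(x, y)
    e0 = potential(x, y) + 0.5 * (vx * vx + vy * vy)
    drift = 0.0
    for _ in range(steps):
        vx += 0.5 * dt * fx
        vy += 0.5 * dt * fy
        x += dt * vx
        y += dt * vy
        fx, fy = force(x, y)
        vx += 0.5 * dt * fx
        vy += 0.5 * dt * fy
        e = potential(x, y) + 0.5 * (vx * vx + vy * vy)
        drift = max(drift, abs(e - e0))
    return drift


print(max_energy_drift(conservative_force), max_energy_drift(rotational_force))
```

With the conservative force, the drift stays at the level of the integrator's $ O(\Delta t^2) $ fluctuation. The rotational term, in contrast, does positive work on the orbiting particle throughout the run, so the total energy grows steadily.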