An analysis of optimization problems involving ReLU neural networks

Journal: Optimization and Engineering
DOI: https://doi.org/10.1007/s11081-026-10075-8
Publication Date: 2026-03-10
Author(s): Christoph Plate et al.
Primary Topic: Advanced Neural Network Applications

Overview

This paper addresses the difficulty of solving mixed-integer optimization problems with embedded neural networks that use ReLU activation functions. A central obstacle is that the Big-M coefficients arising from the relaxation of the binary activation decisions can grow exponentially with network depth. The authors survey and propose strategies for improving the runtime of mixed-integer programming solvers in this setting: clipped activation functions, regularization during training, optimization-based bound tightening, and a novel scaling method for trained ReLU networks. They compare these approaches numerically on three benchmark problems, using the number of linear regions, the percentage of stable neurons, and the overall computational effort as indicators. A key finding is a trade-off between the redundancy of neural network models and the computational cost of solving the associated optimization problems.
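One way to see where such growth can come from is the following rough bound, sketched under assumed notation that is not taken from the paper: $W^{(k)}$ and $b^{(k)}$ are the weights and biases of layer $k$, $K$ is the number of layers, and $B^{(k)}$ is an interval-arithmetic bound on the magnitude of the layer-$k$ pre-activations, with $B^{(0)}$ bounding the inputs.

$$
B^{(k)} \le \|W^{(k)}\|_\infty \, B^{(k-1)} + \|b^{(k)}\|_\infty
\quad\Longrightarrow\quad
B^{(K)} \le \Big(\prod_{k=1}^{K} \|W^{(k)}\|_\infty\Big) B^{(0)} + \sum_{k=1}^{K} \Big(\prod_{j=k+1}^{K} \|W^{(j)}\|_\infty\Big) \|b^{(k)}\|_\infty
$$

If the row-sum norms $\|W^{(k)}\|_\infty$ are larger than 1, the right-hand side grows geometrically in $K$, and Big-M constants derived from such bounds can grow with it.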

In their conclusions, the authors highlight the proposed scaling method, which minimizes the $\ell_1$ norm of the weights and biases of a ReLU network and thereby reduces the constant coefficients in Big-M formulations. Combined with bound tightening, this method reduces the computational effort for larger optimization problems. The study also relates different training methods to the resulting number of linear regions and stable neurons, providing empirical support for existing literature. Future research directions include a deeper exploration of how training methods and hyperparameters influence network properties, as well as alternative post-processing techniques applicable to broader classes of activation functions.

Introduction

The introduction discusses the significance of artificial neural networks (ANNs) as powerful tools for function approximation across various applications, including optimization in engineering and healthcare. The authors focus on feedforward ANNs with the ReLU activation function, which is widely used and, being piecewise linear, can be represented exactly in mixed-integer optimization models. The paper highlights that ANNs are universal approximators of continuous functions on compact sets, which makes them valuable when the underlying functions are complex or expensive to model.

The authors emphasize the integration of ANNs into mixed-integer nonlinear optimization problems (MINLPs) and outline the challenges of verifying the reliability of these networks, particularly against adversarial attacks. They discuss how optimization problems with embedded ANNs are formulated, including the use of Big-M formulations, and the ongoing research aimed at making these formulations computationally more efficient. The introduction concludes by outlining the paper’s contributions: a systematic evaluation of methods for improving optimization performance with embedded ANNs, and a novel approach to reducing the magnitude of the Big-M coefficients in the corresponding mixed-integer linear programming (MILP) formulations.
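For reference, a commonly used Big-M encoding of a single ReLU neuron $y = \max(0, w^\top x + b)$ is sketched below. The symbols are notation assumed here and may differ from the paper's: $L < 0 < U$ are lower and upper bounds on the pre-activation $w^\top x + b$, and $z$ is a binary variable indicating whether the neuron is active.

$$
\begin{aligned}
y &\ge w^\top x + b, \\
y &\le w^\top x + b - L\,(1 - z), \\
y &\le U z, \\
y &\ge 0, \qquad z \in \{0, 1\}.
\end{aligned}
$$

With $z = 1$ the constraints force $y = w^\top x + b \ge 0$, and with $z = 0$ they force $y = 0$ and $w^\top x + b \le 0$. The constants $-L$ and $U$ play the role of the Big-M coefficients, so tighter bounds give a tighter relaxation.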

Methods

In the Methods section, the authors explore strategies for improving the performance of mixed-integer nonlinear programming (MINLP) solvers, such as Gurobi, on optimization problems with embedded ReLU networks. The discussion begins in Section 2.1 with two complexity measures: the number of linear regions, that is, regions of the input domain on which the network function $h(x)$ is affine, and the number of stable ReLU neurons in the network.

Subsequent sections cover techniques applicable to trained ANNs. Section 2.2 presents bound tightening approaches, which aim to tighten the bounds that determine the Big-M coefficients. Section 2.3 introduces a novel scaling method that reduces the $\ell_1$ norm of the weights and biases of a pretrained network without altering the function it encodes. This method is applied after the training phase (a posteriori) and before the optimization process (a priori). Finally, Sections 2.4, 2.5, and 2.6 focus on modifications to the ANN training process: regularization of the weights during training, clipped ReLU formulations, and dropout.
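Function-preserving rescaling is possible because ReLU is positively homogeneous: $\max(0, c\,a) = c \max(0, a)$ for $c > 0$. The sketch below illustrates the idea for a network with one hidden layer and is not the authors' algorithm. The function names and the closed-form choice of scaling factor are assumptions made for this example: each hidden neuron's incoming weights and bias are multiplied by $c_j$ and its outgoing weights divided by $c_j$, with $c_j$ chosen to minimize that neuron's contribution to the $\ell_1$ norm.

```python
import math

def relu(v):
    return [max(0.0, t) for t in v]

def affine(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def forward(W1, b1, W2, b2, x):
    """One-hidden-layer ReLU network: W2 * relu(W1 x + b1) + b2."""
    return affine(W2, b2, relu(affine(W1, b1, x)))

def l1_norm(W1, b1, W2, b2):
    return (sum(abs(w) for row in W1 for w in row) + sum(abs(v) for v in b1)
            + sum(abs(w) for row in W2 for w in row) + sum(abs(v) for v in b2))

def rescale(W1, b1, W2):
    """Scale hidden neuron j by c_j > 0 on the way in and by 1/c_j on the way out."""
    W1s, b1s, cs = [], [], []
    for j, (row, bj) in enumerate(zip(W1, b1)):
        incoming = sum(abs(w) for w in row) + abs(bj)
        outgoing = sum(abs(out_row[j]) for out_row in W2)
        # c_j minimizes c * incoming + outgoing / c; degenerate neurons keep c_j = 1.
        c = math.sqrt(outgoing / incoming) if incoming > 0 and outgoing > 0 else 1.0
        cs.append(c)
        W1s.append([c * w for w in row])
        b1s.append(c * bj)
    W2s = [[w / c for w, c in zip(out_row, cs)] for out_row in W2]
    return W1s, b1s, W2s

# Hypothetical example weights, chosen only for illustration.
W1, b1 = [[4.0, -2.0], [0.5, 0.25]], [2.0, -0.25]
W2, b2 = [[0.5, 4.0]], [0.5]
W1s, b1s, W2s = rescale(W1, b1, W2)

x = [1.0, 2.0]
print(forward(W1, b1, W2, b2, x), forward(W1s, b1s, W2s, b2, x))  # same output
print(l1_norm(W1, b1, W2, b2), l1_norm(W1s, b1s, W2s, b2))        # smaller norm after scaling
```

For deeper networks the scaling factors of neighbouring layers interact, so the per-neuron closed form used here no longer applies directly and the factors would have to be determined jointly.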

Results

In this section, the authors present numerical evaluations of various methods for formulating, training, and scaling feedforward neural networks with ReLU activation, as discussed in Section 2. The focus is on assessing the impact of these methods on global optimization performance. The neural networks are trained as surrogates for several non-convex benchmark functions and as classifiers for handwritten digits using the MNIST dataset. The performance of different optimization solvers is compared across these tasks, particularly emphasizing the effects of various post-processing steps.

Key metrics include the expressive power of the ReLU ANNs, quantified by the number of linear regions they define, and the percentage of neurons that can be identified as stable from pre-activation bounds. The findings indicate that the methods introduced in Section 2 improve these indicators, which leads to better solver performance.

Discussion

In this section, the authors discuss complexity measures for ReLU ANNs and their implications for optimization. They highlight two indicators: the number of linear regions and the number of stable ReLU neurons. The number of linear regions, which corresponds to the number of feasible activation patterns of the neurons, strongly affects the difficulty of the optimization problem, particularly in mixed-integer linear programming (MILP) formulations. The authors reference previous work that bounds the number of linear regions in terms of network depth and width. They also introduce stable neurons, which are identified from pre-activation bounds: if the bounds of a neuron do not change sign over the input domain, the neuron is always active or always inactive, so its binary variable can be fixed, which reduces the number of binary variables in the model.
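The link between activation patterns and linear regions can be explored empirically. The following sketch counts the distinct activation patterns observed on a grid of sample inputs, which gives a lower bound on the number of feasible patterns. It is an illustration rather than the counting procedure used in the paper, and the function names and the tiny example network are assumptions.

```python
from itertools import product

def activation_pattern(layers, x):
    """Return the on/off pattern of all hidden ReLU neurons at input x.

    `layers` is a list of (W, b) pairs for the hidden layers only.
    """
    pattern = []
    h = x
    for W, b in layers:
        pre = [sum(w * v for w, v in zip(row, h)) + bi for row, bi in zip(W, b)]
        pattern.extend(p > 0.0 for p in pre)
        h = [max(0.0, p) for p in pre]
    return tuple(pattern)

def count_sampled_patterns(layers, box, steps):
    """Count distinct activation patterns on a regular grid over the input box."""
    axes = [[lo + (hi - lo) * i / (steps - 1) for i in range(steps)] for lo, hi in box]
    return len({activation_pattern(layers, list(x)) for x in product(*axes)})

# Hypothetical network: h1 = relu(x), h2 = relu(x - 1) on the interval [-2, 3].
hidden = [([[1.0], [1.0]], [0.0, -1.0])]
n_patterns = count_sampled_patterns(hidden, [(-2.0, 3.0)], 51)
print(n_patterns)
```

Grid sampling can miss small regions, and its cost grows exponentially with the input dimension, so the count should be read as an estimate from below.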

The authors also elaborate on bound tightening, which is crucial for solver performance. They describe two methods: interval arithmetic and linear programming (LP)-based bound tightening. The latter exploits dependencies between neurons to compute tighter bounds, which substantially reduces the Big-M coefficients of the MILP formulation. The empirical results show that LP-based bound tightening can halve the Big-M coefficients and increase the percentage of stable neurons, improving computational efficiency. The authors further discuss scaling ReLU networks to obtain smaller Big-M coefficients while preserving functional equivalence, and suggest applying this step before the network is embedded in an optimization problem.
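The simpler of the two methods, interval arithmetic, can be sketched in a few lines. The code below propagates box bounds through the hidden layers and reports the fraction of neurons whose pre-activation bounds do not change sign. The function names and the small example network are assumptions made for illustration, and the LP-based variant is not shown.

```python
def interval_bounds(layers, box):
    """Propagate input box bounds through hidden ReLU layers.

    Returns, per layer, a list of (lower, upper) pre-activation bounds.
    """
    lo = [l for l, _ in box]
    hi = [u for _, u in box]
    all_bounds = []
    for W, b in layers:
        pre_lo, pre_hi = [], []
        for row, bi in zip(W, b):
            # A positive weight takes the matching bound, a negative weight the opposite one.
            pre_lo.append(bi + sum(w * (l if w >= 0 else u) for w, l, u in zip(row, lo, hi)))
            pre_hi.append(bi + sum(w * (u if w >= 0 else l) for w, l, u in zip(row, lo, hi)))
        all_bounds.append(list(zip(pre_lo, pre_hi)))
        lo = [max(0.0, v) for v in pre_lo]
        hi = [max(0.0, v) for v in pre_hi]
    return all_bounds

def stable_fraction(all_bounds):
    """Fraction of neurons that are provably always active or always inactive."""
    flat = [bd for layer in all_bounds for bd in layer]
    stable = sum(1 for l, u in flat if l >= 0.0 or u <= 0.0)
    return stable / len(flat)

# Hypothetical two-hidden-layer network on the input box [-1, 1] x [-1, 1].
layers = [([[1.0, -1.0], [1.0, 1.0]], [0.0, 3.0]),
          ([[1.0, -1.0]], [0.0])]
bounds = interval_bounds(layers, [(-1.0, 1.0), (-1.0, 1.0)])
print(bounds)
print(stable_fraction(bounds))
```

In this example the single neuron of the second layer receives the interval bounds $[-5, 1]$ and is therefore classified as unstable, although its pre-activation is in fact never positive on the box. Interval arithmetic treats the two first-layer neurons as independent even though both depend on the same inputs, which is the kind of dependency that LP-based tightening can exploit.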