تحسين عالمي مصمم لوحدات معالجة الرسوميات: بحث كامل ودقيق عن التminimization غير الخطية على نطاق واسع Global optimization tailored for graphics processing units: Complete and rigorous search for large-scale nonlinear minimization

المجلة: PNAS Nexus، المجلد: 5، العدد: 4
DOI: https://doi.org/10.1093/pnasnexus/pgag103
PMID: https://pubmed.ncbi.nlm.nih.gov/41982507
تاريخ النشر: 2026-04-01
المؤلف: Guanglu Zhang وآخرون
الموضوع الرئيسي: خوارزميات تحسين متعددة الأهداف المتقدمة

نظرة عامة

تقدم هذه الورقة طريقة عددية جديدة لتحديد الحد الأدنى العالمي للدوال غير الخطية المقيدة بواسطة حدود متغيرات بسيطة، باستخدام تحليل الفترات بالتزامن مع القدرات الحاسوبية لوحدات معالجة الرسوميات (GPUs). تقوم الطريقة بشكل منهجي بإزالة المناطق في مجال البحث حيث لا يمكن أن يوجد الحد الأدنى العالمي، مما يؤدي في النهاية إلى تحديد مجموعة محدودة من المناطق التي يجب أن يتواجد فيها. ومن الجدير بالذكر أن هذه الطريقة تضمن احتواء الحد الأدنى العالمي على الرغم من الأخطاء المحتملة في التقريب، مما يعزز موثوقيتها.

لتحسين الأداء، يقدم المؤلفون نمط برمجة متوازي يسمى برنامج واحد، بيانات واحدة (SPSD) يعالج اختناقات الأداء الكبيرة لوحدات معالجة الرسوميات، وخاصة نقل البيانات بين وحدة المعالجة المركزية ووحدة معالجة الرسوميات والوصول إلى الذاكرة العالمية. بالإضافة إلى ذلك، تم دمج تقنية تدوير المتغيرات لتقليل التكاليف الحاسوبية عند التعامل مع الدوال غير الخطية على نطاق واسع. تم التحقق من صحة الطريقة من خلال تقليل أحد عشر دالة اختبار معيارية، بما في ذلك دوال أكلي، غريوانك، ليفي، رستريغين، وروزنبرغ، محققة احتواءً ناجحًا للحدود الدنيا العالمية في حالات تتجاوز 80 بعدًا—وهو إنجاز لم يتم توثيقه سابقًا في الأدبيات. تظهر النتائج قدرة الطريقة على التعامل مع الدوال التي تصل أبعادها إلى 10,000 بكفاءة، مما يمثل تقدمًا كبيرًا في تقنيات التحسين العالمي القابلة للتطبيق عبر مختلف التخصصات العلمية والهندسية.

مقدمة

تتناول مقدمة هذه الورقة البحثية تعقيدات مشاكل التقليل غير الخطية، المحددة بهدف تقليل دالة \( f(x) \) مع مراعاة القيود \( l \leq x \leq u \). قد تكون الدالة \( f \) غير مستمرة، وغير قابلة للاشتقاق، وغير محدبة بشكل كبير، مما يطرح تحديات كبيرة في العثور على الحد الأدنى العالمي. غالبًا ما تفشل طرق التحسين العددي التقليدية، بما في ذلك الطرق المعتمدة على التدرج والطرق الاستدلالية، في تحقيق ذلك بسبب الاعتماد على التخمينات الأولية، والضعف أمام الحدود الدنيا المحلية، وإغفال الأخطاء الناتجة عن التقريب في الحسابات العائمة. تم اقتراح طرق التحسين العالمي الحالية التي تستخدم تحليل الفترات لتحديد الحد الأدنى العالمي ولكنها محدودة بتكاليفها الحاسوبية ومتطلباتها لاستمرارية الدالة وقابليتها للاشتقاق.

تقدم الورقة طريقة جديدة تعتمد على وحدات معالجة الرسوميات تستفيد من القوة الحاسوبية لوحدات معالجة الرسوميات لمعالجة هذه القيود. على عكس الطرق الحالية، لا تتطلب هذه الطريقة الجديدة أن تكون الدالة مستمرة أو قابلة للاشتقاق، وتضمن احتواء الحد الأدنى العالمي، وتم تصميمها خصيصًا لهندسة وحدات معالجة الرسوميات لتعزيز الكفاءة. تستخدم نمط برمجة متوازي يسمى برنامج واحد، بيانات واحدة (SPSD) لتخفيف اختناقات الأداء المرتبطة بنقل البيانات بين وحدة المعالجة المركزية ووحدة معالجة الرسوميات والوصول إلى الذاكرة العالمية. بالإضافة إلى ذلك، تم دمج تقنية تدوير المتغيرات لتقليل التكاليف الحاسوبية بشكل أكبر، خاصة بالنسبة للدوال غير المحدبة على نطاق واسع التي تحتوي على العديد من الحدود الدنيا المحلية. تمثل هذه الطريقة تقدمًا كبيرًا في التحسين العالمي، واعدةً بحلول صارمة وفعالة لمشاكل غير خطية معقدة.

النتائج

في هذا القسم، يتم تقديم نتائج تقليل عشر دوال اختبار معيارية باستخدام طريقة عددية تعتمد على وحدات معالجة الرسوميات. استخدمت الدراسة أربعة موارد حاسوبية، مع تفاصيل التكوينات في المواد التكميلية. يتم تلخيص مقاييس الأداء، بما في ذلك عدد التكرارات، والمناطق الناتجة، وأوقات الحساب لكل دالة عبر أبعاد مختلفة، في الجداول S2 إلى S10. ومن الجدير بالذكر أن “NA” تشير إلى الحالات التي لم يتم فيها استخدام مورد حاسوبي بسبب توقع أوقات تنفيذ طويلة.

على سبيل المثال، أظهرت دالة أكلي تباينًا كبيرًا في أوقات الحساب عبر الموارد، حيث حقق الخادم السحابي أسرع وقت قدره 11 ثانية بعدد أبعاد 50، بينما استغرق الخادم المحلي 72,625 ثانية بعدد أبعاد 1,000. لوحظت اتجاهات مماثلة عبر دوال أخرى، مثل دوال بيليغوندو وبريمان، حيث تفوق الخادم السحابي باستمرار على الموارد الأخرى من حيث السرعة. تسلط النتائج الضوء على كفاءة الطريقة المعتمدة على وحدات معالجة الرسوميات، خاصة في السيناريوهات عالية الأبعاد، وتؤكد على أهمية اختيار الموارد في تحسين الأداء الحاسوبي.

المناقشة

تناقش هذه القسم تطبيق تحليل الفترات في التحسين العالمي، مع تسليط الضوء على قدرته على توفير حدود صارمة للحسابات العددية من خلال تمثيل المتغيرات كفترات. تضمن عمليات الحساب بالفترات، كما هو محدد في معيار IEEE 1788، أن تشمل النتائج جميع القيم الممكنة ضمن الحدود المحددة، مما يعالج مشكلات المبالغة في التقدير بسبب مشكلة الاعتماد. على الرغم من مزاياها النظرية، تواجه الطرق الحالية القائمة على الفترات قيودًا في التطبيقات العملية، خاصة لتقليل الدوال غير الخطية غير المحدبة على نطاق واسع، بسبب اعتمادها على الدوال المستمرة والقابلة للاشتقاق وعدم كفاءتها الحاسوبية على هياكل وحدة المعالجة المركزية.

لمعالجة هذه التحديات، يقترح المؤلفون طريقة عددية جديدة تعتمد على وحدات معالجة الرسوميات مصممة خصيصًا لتقليل الدوال غير الخطية باستخدام تحليل الفترات. تستفيد هذه الطريقة من قدرات الحوسبة المتوازية لوحدات معالجة الرسوميات الحديثة لاستكشاف مجال البحث بكفاءة من خلال استبعاد المناطق غير المثلى حيث لا يمكن أن يوجد الحد الأدنى العالمي بشكل تكراري. تتضمن الطريقة تقنية تدوير المتغيرات لإدارة لعنة الأبعاد، مما يسمح لها بالتعامل مع الدوال التي تحتوي على أكثر من 100 بعد. من خلال استخدام هندسة وحدات معالجة الرسوميات، تهدف الطريقة إلى تحقيق إشغال عالٍ وتحسين استخدام الذاكرة، مما يضمن في النهاية احتواء الحد الأدنى العالمي ضمن وقت حساب معقول مع مراعاة الأخطاء الناتجة عن التقريب. تؤكد تفاصيل التنفيذ على أهمية تقسيم مجال البحث وتقييم الدالة غير الخطية عبر المناطق الفرعية لتحسين الحدود بشكل تكراري، مما يضمن تحديد الحد الأدنى العالمي بدقة.

Journal: PNAS Nexus, Volume: 5, Issue: 4
DOI: https://doi.org/10.1093/pnasnexus/pgag103
PMID: https://pubmed.ncbi.nlm.nih.gov/41982507
Publication Date: 2026-04-01
Author(s): Guanglu Zhang et al.
Primary Topic: Advanced Multi-Objective Optimization Algorithms

Overview

This paper presents a novel numerical method for enclosing the global minimum of nonlinear functions constrained by simple variable bounds, utilizing interval analysis in conjunction with the computational capabilities of graphics processing units (GPUs). The method systematically eliminates regions in the search domain where the global minimum cannot exist, ultimately identifying a finite set of regions where it must reside. Notably, the approach guarantees the enclosure of the global minimum despite potential rounding errors, enhancing its reliability.

To optimize performance, the authors introduce a single program, single data (SPSD) parallel programming style that addresses significant GPU performance bottlenecks, specifically CPU-GPU data transfers and global memory access. Additionally, a variable cycling technique is incorporated to lower computational costs when dealing with large-scale nonlinear functions. The method is validated through the minimization of eleven benchmark test functions, including the Ackley, Griewank, Levy, Rastrigin, and Rosenbrock functions, achieving successful enclosures of global minima in cases exceeding 80 dimensions—an accomplishment not previously documented in the literature. The results demonstrate the method’s capability to handle functions with up to 10,000 dimensions efficiently, marking a significant advancement in global optimization techniques applicable across various scientific and engineering disciplines.

Introduction

The introduction of this research paper addresses the complexities of nonlinear minimization problems, defined by the objective of minimizing a function \( f(x) \) subject to constraints \( l \leq x \leq u \). The function \( f \) may be non-continuous, non-differentiable, and highly nonconvex, posing significant challenges in finding the global minimum. Traditional numerical optimization methods, including gradient-based and heuristic approaches, often fail to achieve this due to reliance on initial guesses, susceptibility to local minima, and neglect of rounding errors in floating-point arithmetic. Existing global optimization methods utilizing interval analysis have been proposed to enclose the global minimum but are limited by their computational expense and requirement for function continuity and differentiability.

The paper introduces a novel GPU-based method that leverages the computational power of graphics processing units to address these limitations. Unlike existing methods, this new approach does not necessitate the function to be continuous or differentiable, guarantees the enclosure of the global minimum, and is designed specifically for GPU architecture to enhance efficiency. It employs a single program, single data (SPSD) parallel programming style to mitigate performance bottlenecks associated with CPU-GPU data transfers and global memory access. Additionally, a variable cycling technique is incorporated to further reduce computational costs, particularly for large-scale nonconvex functions with numerous local minima. This method represents a significant advancement in global optimization, promising rigorous and efficient solutions for complex nonlinear problems.

Results

In this section, the results of minimizing ten benchmark test functions using a GPU-based numerical method are presented. The study employed four computational resources, with configurations detailed in the supplementary materials. The performance metrics, including the number of iterations, output regions, and computation times for each function across varying dimensions, are summarized in tables S2 to S10. Notably, “NA” indicates instances where a computational resource was not utilized due to anticipated long execution times.

For example, the Ackley function demonstrated significant variability in computation times across resources, with the cloud server achieving the fastest time of 11 seconds for a dimension of 50, while the local server took 72,625 seconds for a dimension of 1,000. Similar trends were observed across other functions, such as the Belegundu and Breiman functions, where the cloud server consistently outperformed other resources in terms of speed. The results highlight the efficiency of the GPU-based method, particularly in high-dimensional scenarios, and underscore the importance of resource selection in optimizing computational performance.

Discussion

The section discusses the application of interval analysis in global optimization, highlighting its ability to provide rigorous bounds for numerical computations by representing variables as intervals. Interval arithmetic operations, as defined in the IEEE 1788 standard, ensure that results encompass all possible values within specified bounds, addressing issues of overestimation due to the dependence problem. Despite its theoretical advantages, existing interval methods face limitations in practical applications, particularly for minimizing large-scale nonconvex nonlinear functions, due to their reliance on continuous and differentiable functions and their computational inefficiency on CPU architectures.

To address these challenges, the authors propose a novel GPU-based numerical method designed specifically for minimizing nonlinear functions using interval analysis. This method leverages the parallel computing capabilities of modern GPUs to efficiently explore the search domain by iteratively ruling out suboptimal regions where the global minimum cannot exist. The method incorporates a variable cycling technique to manage the curse of dimensionality, allowing it to handle functions with over 100 dimensions. By utilizing GPU architecture, the method aims to achieve high occupancy and optimize memory usage, ultimately guaranteeing the enclosure of the global minimum within a reasonable computation time while accommodating rounding errors. The implementation details emphasize the importance of partitioning the search domain and evaluating the nonlinear function across subregions to refine bounds iteratively, ensuring that the global minimum is accurately identified.