Reinforcement learning for data center energy efficiency optimization: A systematic literature review and research roadmap

Journal: Applied Energy, Volume: 389
DOI: https://doi.org/10.1016/j.apenergy.2025.125734
Publication Date: 2025-03-26
Author(s): Hussain Kahil et al.
Primary Topic: Cloud Computing and Resource Management

Overview

The paper gives a comprehensive overview of how Reinforcement Learning (RL) and Deep Reinforcement Learning (DRL) are applied to optimize energy efficiency in data centers, focusing on cooling systems and information and communication technology (ICT) systems. The authors conducted a systematic literature review following the PRISMA protocol, analyzing 65 studies against eight key research questions covering RL/DRL algorithms, energy metrics, experimental setups, and benchmark comparisons. The findings highlight significant energy savings from various RL strategies; for example, Deep Deterministic Policy Gradient (DDPG) and Soft Actor-Critic (SAC) algorithms achieved energy reductions ranging from 10% to over 32% in different configurations.

The review also identifies critical research gaps and challenges in the field, suggesting future directions for integrating RL techniques to enhance energy optimization in data centers. Notably, the paper emphasizes the importance of joint optimization strategies that consider multiple objectives, including energy efficiency and thermal stability. The detailed analysis of selected studies reveals that RL approaches can effectively balance energy consumption with operational constraints, paving the way for more sustainable data center operations. Overall, this work serves as a valuable resource for guiding future research aimed at improving the energy efficiency and performance of data centers through advanced RL methodologies.
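As an illustration of what a joint objective over energy efficiency and thermal stability can look like in an RL setting, the following minimal sketch folds both terms into one scalar reward. It is not taken from the reviewed studies: the function name `cooling_reward`, the 27 °C inlet limit, and both weights are assumptions chosen only for the example.

```python
def cooling_reward(energy_kwh, inlet_temp_c, temp_limit_c=27.0,
                   energy_weight=1.0, thermal_weight=10.0):
    """Scalar reward trading off energy use against thermal violations.

    All default values are hypothetical; a real deployment would tune them.
    """
    # Only temperatures above the limit are penalized (hinge penalty).
    overheat_c = max(0.0, inlet_temp_c - temp_limit_c)
    # Higher energy use and larger violations both lower the reward.
    return -(energy_weight * energy_kwh + thermal_weight * overheat_c)


# A cheap but overheating step scores worse than a slightly costlier safe one.
safe_step = cooling_reward(energy_kwh=5.0, inlet_temp_c=25.0)
hot_step = cooling_reward(energy_kwh=4.0, inlet_temp_c=29.0)
print(safe_step, hot_step)
```

The relative size of the two weights decides how much extra energy the agent is willing to spend to avoid a one-degree violation, which is the trade-off a multi-objective formulation has to make explicit.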

Introduction

The introduction highlights the increasing demand for computing power driven by digitalization and the proliferation of AI technologies, which has made data centers critical infrastructure in modern society. The International Energy Agency estimates that data centers consumed approximately 460 terawatt-hours (TWh) of electricity in 2022, with projections suggesting this could exceed 1000 TWh by 2026. This level of consumption not only raises operational costs but also has serious environmental implications, including greenhouse gas emissions and strain on power grids. Consequently, improving energy efficiency in data centers has become imperative, and Reinforcement Learning (RL) and Deep Reinforcement Learning (DRL) have emerged as promising methodologies for optimizing energy use through real-time adaptive decision-making.
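To put the two consumption figures in perspective, the short calculation below derives the compound annual growth rate they imply. It assumes steady compounding between 2022 and 2026, which is a simplification introduced here and not a claim made by the paper; the helper name `implied_annual_growth` is likewise only illustrative.

```python
def implied_annual_growth(start_twh, end_twh, years):
    """Compound annual growth rate needed to go from start_twh to end_twh."""
    return (end_twh / start_twh) ** (1.0 / years) - 1.0


# 460 TWh in 2022 growing to 1000 TWh by 2026, i.e. over four years.
rate = implied_annual_growth(460.0, 1000.0, 2026 - 2022)
print(f"{rate:.1%}")
```

Under that assumption the implied growth is a little over 21% per year. Because the projection is stated as a level that consumption could exceed, this figure is a lower bound for that scenario.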

The paper aims to systematically review the applications of RL/DRL in improving energy efficiency within data centers, addressing existing gaps in the literature regarding the holistic integration of these techniques across various subsystems. Specific objectives include evaluating the effectiveness of RL/DRL applications, summarizing implementation details, identifying challenges, and proposing a strategic roadmap for future research. By utilizing the PRISMA framework, the study seeks to consolidate knowledge on RL/DRL implementations in data centers, thereby contributing to the understanding of their potential in optimizing energy efficiency while addressing multi-objective optimization for sustainable operations. The subsequent sections of the paper will provide a comprehensive background on RL/DRL algorithms, research methodology, and detailed literature analysis, culminating in discussions on research gaps and future directions.

Methods

The methodology of this review adhered to the PRISMA framework to ensure transparency and reproducibility. Among the 14 studies on cooling systems, only one implemented the proposed Deep Reinforcement Learning (DRL) strategy in a real-world data center, while the others relied on simulations using real, synthetic, or hybrid datasets. Key simulation tools included the EnergyPlus program for energy consumption modeling and MATLAB, particularly its Simulink and Simscape toolboxes, which were frequently used to evaluate cooling system control strategies. A detailed overview of the experimental setups, including environments, dataset sources, and platforms, is provided in Table 5.
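To make simulation-based evaluation of a cooling controller concrete, here is a deliberately tiny sketch: tabular Q-learning on a toy thermal model. It does not use EnergyPlus or Simulink and does not reproduce any reviewed study. The five temperature levels, the two cooling actions, the energy costs, and the overheat penalty are all assumptions made for illustration.

```python
import random

N_LEVELS = 5              # discretized temperature levels; the top one is "overheated"
ACTIONS = (0, 1)          # 0 = low cooling, 1 = high cooling
ENERGY_COST = (1.0, 2.0)  # hypothetical energy cost per step for each action
OVERHEAT_PENALTY = 10.0   # hypothetical penalty for landing in the overheated level


def step(level, action):
    """Toy dynamics: high cooling lowers the level by one, low cooling raises it."""
    nxt = max(0, level - 1) if action == 1 else min(N_LEVELS - 1, level + 1)
    reward = -ENERGY_COST[action]
    if nxt == N_LEVELS - 1:
        reward -= OVERHEAT_PENALTY
    return nxt, reward


def train(episodes=500, horizon=20, alpha=0.5, gamma=0.5, epsilon=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning; returns the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_LEVELS)]
    for _ in range(episodes):
        level = rng.randrange(N_LEVELS)  # random start so every level is visited
        for _ in range(horizon):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[level][a])
            nxt, reward = step(level, action)
            # Standard Q-learning update toward the bootstrapped target.
            target = reward + gamma * max(q[nxt])
            q[level][action] += alpha * (target - q[level][action])
            level = nxt
    return q


q_table = train()
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_LEVELS)]
print(policy)
```

In this toy setting the learned policy saves energy with low cooling at the cooler levels and switches to high cooling just below the overheat threshold. DRL methods such as DDPG or SAC replace the table with neural networks so that continuous states and actions can be handled.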

Approximately 50% of the studies used CloudSim and its variant WorkflowSim for scheduling optimization and virtual machine management in data center ICT systems. Java and Python were the programming languages most commonly used for simulations, and MATLAB was used in four studies. Several real-world datasets from major data centers, including Google and Alibaba, as well as smaller centers such as the National Supercomputing Centre of Singapore, were used. Synthetic datasets also played a crucial role in creating controlled testing scenarios. A comprehensive summary of the simulation environments, dataset sources, and platforms used across the identified studies is presented in Table 6.
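The kind of energy accounting behind virtual machine consolidation experiments can be sketched with a linear server power model. This model is a common simplification and is used here as an assumption; it is not taken from the reviewed studies, and the 100 W idle and 250 W peak figures, the function names, and the rule that empty hosts are switched off are all hypothetical.

```python
def server_power_w(utilization, idle_w=100.0, peak_w=250.0):
    """Linear power model: idle draw plus a load-proportional dynamic part."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    return idle_w + (peak_w - idle_w) * utilization


def cluster_power_w(utilizations):
    """Total power of a cluster; hosts with zero load are assumed switched off."""
    return sum(server_power_w(u) for u in utilizations if u > 0.0)


spread = [0.3, 0.3, 0.3]  # the load spread over three lightly used hosts
packed = [0.9, 0.0, 0.0]  # the same total load consolidated onto one host
saving = 1.0 - cluster_power_w(packed) / cluster_power_w(spread)
print(f"{saving:.0%}")
```

Consolidation saves energy in this model because each active host pays the idle cost regardless of load. A scheduling agent therefore has to weigh that saving against the performance risk of running hosts close to full utilization.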

Results

In the Results section, the authors present a systematic review of studies related to reinforcement learning (RL) and deep reinforcement learning (DRL) algorithms applied to energy efficiency in data centers. Each identified study is cataloged with essential details such as title, authors, publication venue, and year, facilitating organized data analysis. The authors categorize the studies based on targeted subsystems, providing insights into the algorithms and models used, as well as a synthesis of research problems, objectives, experimental setups, and energy outcomes.
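The cataloging and categorization step described above can be pictured as a small record type plus a grouping function. The sketch below is purely illustrative: the field names, the subsystem labels, and the placeholder entries are assumptions and do not correspond to actual studies in the review.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StudyRecord:
    title: str
    authors: str
    venue: str
    year: int
    subsystem: str  # hypothetical labels, e.g. "cooling", "ict", "joint"
    algorithm: str


def group_by_subsystem(records):
    """Map each targeted subsystem to the list of studies addressing it."""
    groups = defaultdict(list)
    for record in records:
        groups[record.subsystem].append(record)
    return dict(groups)


# Placeholder entries only; these are not real studies.
records = [
    StudyRecord("Placeholder study A", "Author A", "Venue A", 2021, "cooling", "DDPG"),
    StudyRecord("Placeholder study B", "Author B", "Venue B", 2022, "ict", "DQN"),
    StudyRecord("Placeholder study C", "Author C", "Venue C", 2023, "cooling", "SAC"),
]
groups = group_by_subsystem(records)
print({name: len(items) for name, items in groups.items()})
```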

The evaluation of energy efficiency results reveals a trend in reporting metrics primarily as percentage reductions in energy consumption relative to benchmarks. However, the authors argue for the adoption of more comprehensive metrics tailored to data center operations. Key metrics discussed include Power Usage Effectiveness (PUE), Energy Reuse Effectiveness (ERE), Data Center Energy Productivity (DCeP), and Data Center Performance per Energy (DPPE). Each metric offers a unique perspective on energy efficiency, yet the authors emphasize the necessity of combining multiple metrics to achieve a holistic evaluation of the performance of proposed RL/DRL optimization strategies across various operational dimensions.
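For reference, the simpler of these metrics can be written down directly from their standard definitions. The sketch below covers PUE, ERE, DCeP, and the percentage reduction that most studies report. DPPE is left out because it combines several sub-indicators. The function names and the sample figures are assumptions used only for illustration.

```python
def pue(total_facility_kwh, it_kwh):
    """Power Usage Effectiveness: total facility energy per unit of IT energy."""
    return total_facility_kwh / it_kwh


def ere(total_facility_kwh, reused_kwh, it_kwh):
    """Energy Reuse Effectiveness: like PUE, but credits energy reused elsewhere."""
    return (total_facility_kwh - reused_kwh) / it_kwh


def dcep(useful_work_units, total_facility_kwh):
    """Data Center Energy Productivity: useful work per unit of total energy."""
    return useful_work_units / total_facility_kwh


def percent_reduction(baseline_kwh, optimized_kwh):
    """Energy saving relative to a baseline, in percent."""
    return 100.0 * (baseline_kwh - optimized_kwh) / baseline_kwh


# Hypothetical figures: 1500 kWh facility total, 1000 kWh IT load, 200 kWh reused.
print(pue(1500.0, 1000.0), ere(1500.0, 200.0, 1000.0))
print(percent_reduction(1500.0, 1200.0))
```

A percentage reduction alone does not show whether a saving came from the cooling side or the IT side, whereas PUE moves only when the overhead share changes. That difference is one reason for reporting several metrics together.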

Discussion

The discussion section of the research paper highlights the existing literature on energy efficiency in data center cooling systems and ICT systems, emphasizing the role of Reinforcement Learning (RL) and Deep Reinforcement Learning (DRL) algorithms. Several reviews have explored various optimization strategies, including the use of RL for cooling control and DRL for HVAC energy management. Notably, while some studies focus on specific aspects like VM consolidation and resource scheduling, there is a recognized gap in systematic overviews that consolidate RL/DRL applications aimed at enhancing energy efficiency across data center systems. The authors argue that previous reviews inadequately address joint optimization strategies and lack detailed discussions on experimental setups, data sources, and implementation platforms.

To address these gaps, the authors propose a systematic literature review that synthesizes recent advancements in RL/DRL techniques for energy efficiency in data centers. They outline key research questions aimed at understanding the targeted subsystems, the algorithms employed, experimental setups, and the metrics used to evaluate energy efficiency improvements. The review aims to provide a comprehensive analysis of the current state of research, identify existing gaps, and suggest future directions for the application of RL/DRL in optimizing energy efficiency in data centers. The findings will be summarized in a comparative table, illustrating how this study differentiates itself from prior research efforts.