A deep deterministic policy gradient approach for optimizing feeding rates and water quality management in recirculating aquaculture systems

Journal: Aquaculture International, Volume: 33, Issue: 4
DOI: https://doi.org/10.1007/s10499-025-01914-z
Publication Date: 2025-03-31
Author(s): Wael M. Elmessery et al.
Primary Topic: Water Quality Monitoring Technologies

Overview

This study presents a novel Deep Deterministic Policy Gradient (DDPG) reinforcement learning controller for optimizing feeding rates in recirculating aquaculture systems (RAS). The system targets fish growth and health while also managing water quality, which improves operational efficiency and stability. The DDPG controller outperformed conventional control methods, namely Model Predictive Control (MPC), PID, and Bang-Bang control, achieving a 25.1% reduction in root mean square error (RMSE), a 77.9% decrease in feed consumption, and a 17.9% improvement in the stability index. It also kept critical water quality parameters within range (e.g., dissolved oxygen: 6.0-7.2 mg/L, pH: 6.8-7.8, ammonia: < 0.3 mg/L) with over 96% operational stability. Recovery from environmental perturbations was faster, by 25-60% for temperature changes and 31-64% for dissolved oxygen fluctuations. An economic analysis indicated annual operational savings between $12,000 and $466,000, corresponding to a return on investment (ROI) of 85% to 238%. Robustness was validated across operational scales, growth phases, and fault conditions, with accuracy remaining above 94%. Future research directions include refining the control mechanisms, analyzing temporal correlations between feeding and water quality, and assessing scalability to larger facilities. Overall, the DDPG-based approach represents a significant advance in intelligent aquaculture management, with the potential to transform commercial RAS practice and promote sustainability in the industry.
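As a rough illustration of how the headline metrics above can be computed, the following Python sketch evaluates RMSE, a relative reduction against a baseline, and a check against the water quality ranges quoted in the overview. The function names and the sample feeding rates are assumptions made for illustration; they are not taken from the paper.

```python
import math


def rmse(targets, actuals):
    """Root mean square error between target and delivered feeding rates."""
    if len(targets) != len(actuals) or not targets:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - a) ** 2 for t, a in zip(targets, actuals)]
    return math.sqrt(sum(squared) / len(squared))


def percent_reduction(baseline, improved):
    """Relative reduction of `improved` versus `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


def within_bounds(do_mg_l, ph, ammonia_mg_l):
    """True if a reading sits inside the ranges quoted in the overview."""
    return (
        6.0 <= do_mg_l <= 7.2
        and 6.8 <= ph <= 7.8
        and ammonia_mg_l < 0.3
    )


# Made-up feeding rates (illustrative only, not data from the study).
targets = [2.0, 2.0, 2.0, 2.0]
baseline_controller = [2.4, 1.6, 2.4, 1.6]
learned_controller = [2.3, 1.7, 2.3, 1.7]

baseline_rmse = rmse(targets, baseline_controller)
learned_rmse = rmse(targets, learned_controller)
reduction = percent_reduction(baseline_rmse, learned_rmse)
```

With these invented numbers the reduction comes out at about 25%, which only demonstrates the arithmetic and says nothing about the reported 25.1% figure.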

Introduction

The introduction highlights the growing importance of sustainable aquaculture, particularly through recirculating aquaculture systems (RAS), which significantly reduce water use while optimizing fish growth. Despite these advantages, managing feeding rates in RAS is challenging because feeding directly affects both fish growth and water quality. Traditional feeding control methods often fail to adapt to the dynamic nature of fish metabolism and environmental conditions, leading to problems such as rapid water quality deterioration caused by overfeeding. The paper emphasizes the need for advanced control strategies that adjust feeding rates dynamically based on real-time data, taking into account factors such as fish size and water quality parameters.

To address these challenges, the authors propose a novel Deep Deterministic Policy Gradient (DDPG)-based approach that integrates multi-objective optimization of feeding rates and water quality management. The system features a dual-component reward function that balances immediate feeding goals with long-term stability, improving adaptability and performance across varying operational conditions. The study validates this approach through extensive testing and reports significant improvements in system performance, stability, and economic efficiency compared with traditional methods. The research underscores the potential of advanced artificial intelligence techniques, particularly reinforcement learning, to transform aquaculture management.
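The summary does not give the form of the dual-component reward, so the Python sketch below is only one plausible shape: a weighted sum of an immediate tracking term and a stability term that penalizes abrupt changes in feeding rate. The function name, the two terms, and the weights `w_immediate` and `w_stability` are all assumptions, not the authors' formulation.

```python
def dual_component_reward(feed_rate, target_rate, previous_rate,
                          w_immediate=0.7, w_stability=0.3):
    """Hypothetical two-part reward for a feeding controller.

    immediate: penalizes deviation from the current feeding target.
    stability: penalizes large jumps from the previous feeding rate,
               standing in for the long-term stability objective.
    """
    immediate = -abs(feed_rate - target_rate)
    stability = -abs(feed_rate - previous_rate)
    return w_immediate * immediate + w_stability * stability


# On-target and unchanged: both penalties vanish.
steady = dual_component_reward(2.0, 2.0, 2.0)
# Hitting the target with a large jump still costs the stability term.
jumpy = dual_component_reward(2.0, 2.0, 1.0)
# Missing the target costs the immediate term.
off_target = dual_component_reward(2.5, 2.0, 2.5)
```

Under this assumed form, raising `w_stability` makes the controller smoother but slower to reach a new target, which is the trade-off the two components are meant to balance.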

Methods

The study used a rigorous multi-phase validation methodology for a Deep Deterministic Policy Gradient (DDPG)-based feeding control system in aquaculture. Validation was structured around three dimensions: temporal splits, tank-wise partitioning, and variations in environmental conditions. The training phase tracked performance metrics, including episode rewards, and selected the best model weights according to cumulative reward. Action distributions and state transitions were recorded to analyze learning progress. The model's recommendations were then tested against held-out data to evaluate generalization, particularly feeding rate stability and responsiveness to changing water quality conditions. The final phase verified operational constraints, ensuring that recommended feeding rates stayed within practical limits and that the system could transition smoothly between rates while responding effectively to environmental fluctuations.
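Two of the validation dimensions named above, temporal splits and tank-wise partitioning, can be sketched in Python as follows. The record layout (`timestamp`, `tank_id`), the 80/20 split fraction, and the function names are assumptions chosen for illustration; the paper's actual split ratios are not given in this summary.

```python
def temporal_split(records, train_fraction=0.8):
    """Train on the earliest records, hold out the most recent ones."""
    ordered = sorted(records, key=lambda r: r["timestamp"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def tankwise_split(records, held_out_tanks):
    """Hold out every record from the given tanks."""
    held = set(held_out_tanks)
    train = [r for r in records if r["tank_id"] not in held]
    test = [r for r in records if r["tank_id"] in held]
    return train, test


# Tiny synthetic data set: 10 time steps spread over 3 tanks.
records = [{"timestamp": t, "tank_id": t % 3} for t in range(10)]

time_train, time_test = temporal_split(records)
tank_train, tank_test = tankwise_split(records, held_out_tanks=[2])
```

The temporal split checks whether the controller generalizes forward in time, while the tank-wise split checks whether it transfers to tanks it never saw during training.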

Experimental evaluations were conducted under three distinct scenarios to assess the system's robustness. Normal operating conditions were maintained at a temperature of 29.7 ± 0.5 °C, dissolved oxygen between 6.0 and 7.2 mg/L, and pH between 6.8 and 7.8 over a 300-day period, with measurements taken every 15 minutes. Environmental perturbation tests introduced controlled disturbances: temperature fluctuations of ± 2 °C, dissolved oxygen reductions of 20%, and ammonia spikes from 0.1 to 0.3 mg/L, each lasting 48 hours, to evaluate system recovery. Long-term stability testing over 30 days simulated dynamic operational challenges, combining varying feeding demands with fluctuating water quality parameters. Together, these scenarios characterize both the short-term responses and the long-term stability of the feeding control system under diverse operating conditions.
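To make the sampling arithmetic and the notion of recovery time concrete, the Python sketch below counts 15-minute samples in a time window and measures how long a signal takes to return to its normal band. The recovery definition used here (back inside the band for a number of consecutive samples) and the parameter `hold_samples` are assumptions; the summary does not state how the paper defines recovery.

```python
SAMPLE_MINUTES = 15  # measurement interval stated in the text


def samples_in(hours):
    """Number of 15-minute samples in a window of the given length."""
    return int(hours * 60 / SAMPLE_MINUTES)


def recovery_minutes(readings, low, high, hold_samples=4):
    """Minutes until the signal re-enters [low, high] and stays there.

    Returns the offset of the first sample of the first run of
    `hold_samples` consecutive in-band readings, or None if the
    signal never settles.
    """
    run = 0
    for i, value in enumerate(readings):
        if low <= value <= high:
            run += 1
            if run == hold_samples:
                return (i - hold_samples + 1) * SAMPLE_MINUTES
        else:
            run = 0
    return None


# Made-up dissolved oxygen trace (mg/L) after a disturbance ends.
do_trace = [4.8, 5.2, 5.6, 6.1, 5.9, 6.2, 6.3, 6.4, 6.5]
do_recovery = recovery_minutes(do_trace, low=6.0, high=7.2)

perturbation_samples = samples_in(48)        # one 48-hour disturbance
samples_per_stream = samples_in(24 * 300)    # one stream over 300 days
```

In this invented trace the brief return to 6.1 mg/L does not count as recovery because the next reading drops out of the band again, so the recovery time is measured from the later, sustained return.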

Results

The results section reports the study's main findings and their implications. The analysis finds statistically significant relationships among the variables studied (p < 0.05), and the data support the initial hypotheses, showing that the intervention had a measurable effect on the dependent variable.

Beyond the primary findings, the paper discusses possible mechanisms behind these results, drawing on existing literature to place them in a broader context. The authors stress the relevance of the results to future research and practical applications, suggesting that the intervention could be beneficial in similar settings.

Discussion

The paper discusses the implementation and evaluation of a Deep Deterministic Policy Gradient (DDPG) control system in a large-scale recirculating aquaculture system (RAS) comprising 108 tanks. The study established a scalability assessment framework to evaluate the DDPG controller's adaptability across small (1,000 L), medium (10,000 L), and large (50,000 L) operational scales, demonstrating its potential for flexible application in different aquaculture environments. Data collection combined high-resolution sampling with long-term monitoring, generating over 77,000 observations across a 300-day production cycle, which was crucial for tracking local conditions and system performance.

The DDPG architecture uses two networks, an actor and a critic, optimized through systematic hyperparameter tuning. The training protocol incorporated regularization strategies and a comprehensive reward function that balanced fish growth optimization with operational stability. The control system maintained critical water quality parameters and employed a multi-level ammonia management strategy to protect fish health. Performance was evaluated in terms of tracking accuracy, control effort, and system stability, and the results indicate that the DDPG controller adapts robustly and remains effective across different operational contexts. The work offers insight into advanced control strategies for aquaculture systems and underlines the value of integrating machine learning with environmental monitoring for optimal operational performance.
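The actor and critic roles can be illustrated with a deliberately small Python sketch of the standard DDPG building blocks: a deterministic actor, a critic that scores state-action pairs, the temporal-difference target computed from target networks, and the soft (Polyak) target update. Linear function approximators stand in for the neural networks, and the state layout, `max_feed_rate`, `gamma`, and `tau` values are assumptions; the paper's network sizes and tuned hyperparameters are not given in this summary.

```python
import math


def actor(state, weights, max_feed_rate):
    """Deterministic policy: squash a linear score into [0, max_feed_rate]."""
    score = sum(w * s for w, s in zip(weights, state))
    return max_feed_rate * 0.5 * (math.tanh(score) + 1.0)


def critic(state, action, weights):
    """Linear Q-value estimate; weights cover the state plus the action."""
    features = list(state) + [action]
    return sum(w * f for w, f in zip(weights, features))


def td_target(reward, next_state, done, target_actor_w, target_critic_w,
              max_feed_rate, gamma=0.99):
    """y = r + gamma * Q'(s', mu'(s')), with no bootstrap at episode end."""
    if done:
        return reward
    next_action = actor(next_state, target_actor_w, max_feed_rate)
    return reward + gamma * critic(next_state, next_action, target_critic_w)


def soft_update(target_w, online_w, tau=0.005):
    """Move target weights a small step toward the online weights."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target_w, online_w)]


# Illustrative call with a hypothetical two-feature state.
feed = actor([0.0, 0.0], weights=[1.0, 1.0], max_feed_rate=4.0)
y = td_target(1.0, [0.0, 0.0], False, [1.0, 1.0], [0.0, 0.0, 0.5],
              max_feed_rate=4.0, gamma=0.9)
new_target = soft_update([0.0], [1.0], tau=0.1)
```

A full implementation would add a replay buffer, exploration noise on the actor's output, and gradient updates for both networks; those pieces are omitted here to keep the actor and critic relationship visible.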

Limitations

The study demonstrates the scalability and practical feasibility of a Deep Deterministic Policy Gradient (DDPG)-based control system in a large-scale recirculating aquaculture system (RAS) facility. However, it acknowledges several limitations that call for further research. Key areas include improving the system's adaptability to varying operational conditions and strengthening its generalizability across different RAS environments. Addressing these limitations will be crucial for optimizing the control system's performance and ensuring its effectiveness in diverse aquaculture settings.