Multi-Objective Reinforcement Learning-based Approach for Pressurized Water Reactor Optimization

📄 arXiv: 2312.10194v3

Authors: Paul Seurin, Koroush Shirvan

Categories: cs.LG, cs.CE

Published: 2023-12-15 (updated: 2024-03-15)


💡 One-Sentence Takeaway

Proposes PEARL, a multi-objective reinforcement learning method for optimizing Pressurized Water Reactors.

🎯 Matched area: Pillar 2: RL Algorithms & Architecture (RL & Architecture)

Keywords: multi-objective reinforcement learning, pressurized water reactor, optimization methods, curriculum learning, Pareto front, deep learning, constraint management

📋 Key Points

  1. Existing multi-objective reinforcement learning methods typically train multiple neural networks, one per sub-problem, which is computationally complex and inefficient.
  2. PEARL learns a single policy for the whole multi-objective problem, simplifying the model and improving efficiency, and introduces Curriculum Learning to manage constraints.
  3. Experiments show that PEARL outperforms conventional methods on multiple performance metrics, standing out in particular on pressurized water reactor optimization problems.

📝 Abstract (summary)

This paper proposes a new method, the Pareto Envelope Augmented with Reinforcement Learning (PEARL), to tackle multi-objective problems, especially in engineering domains where evaluating candidate solutions is time-consuming. By learning a single policy, PEARL avoids the complexity of traditional multi-objective reinforcement learning methods, which require multiple neural networks to independently solve sub-problems. The method is evaluated on classical multi-objective benchmarks and tested on two practical PWR core loading pattern optimization problems, demonstrating its real-world effectiveness. PEARL surpasses conventional methods on multiple performance metrics; the PEARL-NdS variant in particular excels at uncovering the Pareto front without additional algorithm-design effort.

🔬 Method Details

Problem definition: The paper targets multi-objective optimization of Pressurized Water Reactors (PWRs), where existing methods handle complex constraints inefficiently and require multiple neural networks to solve different sub-problems.

Core idea: PEARL learns a single policy to tackle the multi-objective optimization problem, avoiding the complexity of traditional approaches, and leverages Curriculum Learning to manage the constraints effectively.
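This summary does not include the authors' code; as an illustration of how Pareto-based ranking can turn multiple objectives into a single scalar training signal for one policy (the idea behind the PEARL-NdS variant, which the original abstract says relies on non-dominated sorting), here is a minimal sketch. The function name and reward convention are illustrative, not taken from the paper:

```python
def pareto_ranks(objectives):
    """Assign each candidate a non-domination rank (0 = current Pareto front).

    objectives: list of tuples, one per candidate; lower is better on every axis.
    """
    def dominates(a, b):
        # a dominates b if a is no worse everywhere and strictly better somewhere
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    n = len(objectives)
    ranks = [None] * n
    remaining = set(range(n))
    rank = 0
    while remaining:
        # the current front: candidates not dominated by any other remaining one
        front = [i for i in remaining
                 if not any(dominates(objectives[j], objectives[i])
                            for j in remaining if j != i)]
        for i in front:
            ranks[i] = rank
        remaining -= set(front)  # peel off the front and repeat
        rank += 1
    return ranks
```

A candidate's reward could then be set, for example, to the negative of its rank, so that a single policy is reinforced most for producing solutions on the current Pareto front.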

Technical framework: PEARL's overall architecture comprises a policy learning module, a constraint management module, and a performance evaluation module. The policy is first optimized via reinforcement learning, performance is then evaluated under the constraints, and multi-objective optimization proceeds on top.

Key innovation: PEARL's main innovation is its single-policy learning approach, which greatly reduces the need for multiple neural networks, together with its use of Curriculum Learning to handle constraints, improving optimization efficiency.
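The exact curriculum schedule is not given in this summary. The sketch below illustrates the general idea of Curriculum Learning for constraints: start from a loose limit (e.g., on boron concentration, one of the constraint types the original abstract mentions) and tighten it over training, penalizing violations relative to the current limit. All names and numbers here are hypothetical:

```python
def curriculum_threshold(step, total_steps, loose, tight):
    """Linearly tighten a constraint limit from `loose` to `tight` over training."""
    frac = min(step / total_steps, 1.0)
    return loose + frac * (tight - loose)

def shaped_reward(base_reward, constraint_value, limit, penalty_weight=10.0):
    """Penalize violation of the current curriculum limit (illustrative shaping)."""
    violation = max(0.0, constraint_value - limit)
    return base_reward - penalty_weight * violation
```

Early in training the agent can explore freely because almost nothing violates the loose limit; as the limit approaches its true value, the penalty steers the policy toward the feasible region.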

Key design: In PEARL, the loss function is designed to balance the multiple objectives, the network uses deep learning architectures, and parameter settings were tuned over repeated experiments to ensure stability and effectiveness under different constraint conditions.

📊 Experimental Highlights

Experiments show that PEARL surpasses conventional optimization methods on multiple performance metrics. The PEARL-NdS variant in particular excels at uncovering the Pareto front, with significant gains on metrics such as the Hyper-volume, demonstrating the method's effectiveness in practice.
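Hyper-volume measures the size of the objective-space region dominated by a front relative to a fixed reference point; larger is better. A minimal two-objective (minimization) implementation, not tied to the paper's code, shows how the metric is computed:

```python
def hypervolume_2d(front, ref):
    """Hyper-volume dominated by a 2-objective front (minimization) w.r.t. `ref`.

    front: list of (f1, f2) points; ref: a reference point worse than all of them.
    """
    # keep only points that actually improve on the reference point,
    # sorted by the first objective ascending
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < prev_f2:  # skip points dominated within the sweep
            # rectangle between this point, the reference, and the previous strip
            hv += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv
```

For example, a front that moves closer to the ideal point, or that fills a gap between existing solutions, strictly increases this value, which is why it is a common yardstick for comparing Pareto fronts.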

🎯 Applications

Potential application areas include reactor core optimization in nuclear engineering, energy management systems, and other engineering problems requiring multi-objective optimization. PEARL's effectiveness and efficiency can provide better solutions in these domains, advancing the technology and the allocation of resources.

📄 Abstract (original)

A novel method, the Pareto Envelope Augmented with Reinforcement Learning (PEARL), has been developed to address the challenges posed by multi-objective problems, particularly in the field of engineering where the evaluation of candidate solutions can be time-consuming. PEARL distinguishes itself from traditional policy-based multi-objective Reinforcement Learning methods by learning a single policy, eliminating the need for multiple neural networks to independently solve simpler sub-problems. Several versions inspired from deep learning and evolutionary techniques have been crafted, catering to both unconstrained and constrained problem domains. Curriculum Learning is harnessed to effectively manage constraints in these versions. PEARL's performance is first evaluated on classical multi-objective benchmarks. Additionally, it is tested on two practical PWR core Loading Pattern optimization problems to showcase its real-world applicability. The first problem involves optimizing the Cycle length and the rod-integrated peaking factor as the primary objectives, while the second problem incorporates the mean average enrichment as an additional objective. Furthermore, PEARL addresses three types of constraints related to boron concentration, peak pin burnup, and peak pin power. The results are systematically compared against conventional approaches. Notably, PEARL, specifically the PEARL-NdS variant, efficiently uncovers a Pareto front without necessitating additional efforts from the algorithm designer, as opposed to a single optimization with scaled objectives. It also outperforms the classical approach across multiple performance metrics, including the Hyper-volume.