| 11 |
AutoRule: Reasoning Chain-of-thought Extracted Rule-based Rewards Improve Preference Learning |
Proposes AutoRule, which automatically extracts rules to improve preference learning |
reinforcement learning, preference learning, RLHF |
✅ |
|
| 12 |
Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning |
Proposes stable-gradient methods to address the challenges of scaling deep reinforcement learning |
reinforcement learning, deep reinforcement learning |
|
|
| 13 |
Reward Models in Deep Reinforcement Learning: A Survey |
Surveys reward models in deep reinforcement learning for policy optimization |
reinforcement learning, deep reinforcement learning |
|
|
| 14 |
Minimizing Structural Vibrations via Guided Flow Matching Design Optimization |
Proposes design optimization based on guided flow matching to reduce structural vibrations |
flow matching |
✅ |
|
| 15 |
Heterogeneous Federated Reinforcement Learning Using Wasserstein Barycenters |
Proposes a heterogeneous federated reinforcement learning algorithm based on Wasserstein barycenters |
reinforcement learning |
|
|
| 16 |
CAWR: Corruption-Averse Advantage-Weighted Regression for Robust Policy Optimization |
Proposes CAWR to address data corruption in offline reinforcement learning |
reinforcement learning, offline RL, offline reinforcement learning |
|
|
| 17 |
Zero-Shot Reinforcement Learning Under Partial Observability |
Proposes memory-based zero-shot reinforcement learning to address partial observability |
reinforcement learning |
|
|
| 18 |
When and How Unlabeled Data Provably Improve In-Context Learning |
Proposes a method that uses unlabeled data to improve in-context learning |
linear attention, foundation model |
|
|