| 1 |
MOBODY: Model Based Off-Dynamics Offline Reinforcement Learning |
Proposes MOBODY to address dynamics mismatch in offline reinforcement learning |
reinforcement learning policy learning offline RL |
|
|
| 2 |
SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning |
Proposes a self-aware, weakness-driven problem synthesis framework to improve reinforcement learning performance |
reinforcement learning distillation large language model |
|
|
| 3 |
How to Provably Improve Return Conditioned Supervised Learning? |
Proposes reinforced return-conditioned supervised learning to overcome the performance limitations of existing methods |
reinforcement learning policy learning offline RL |
|
|
| 4 |
Improving LLM Agent Planning with In-Context Learning via Atomic Fact Augmentation and Lookahead Search |
Proposes an LLM agent planning method based on atomic fact augmentation and lookahead search |
world model large language model |
|
|
| 5 |
Time-Aware World Model for Adaptive Prediction and Control |
Proposes a time-aware world model to address dynamics prediction in control tasks |
world model |
|
|