| 11 |
Policy-Driven World Model Adaptation for Robust Offline Model-based Reinforcement Learning |
Proposes dynamically adapting the world model to improve the robustness of offline model-based reinforcement learning |
reinforcement learning policy learning offline reinforcement learning |
|
|
| 12 |
Modular Diffusion Policy Training: Decoupling and Recombining Guidance and Diffusion for Offline RL |
Proposes modular diffusion policy training to improve offline reinforcement learning |
reinforcement learning offline RL diffusion policy |
|
|
| 13 |
Temporal Distance-aware Transition Augmentation for Offline Model-based Reinforcement Learning |
Proposes TempDATA to address the sparse-reward problem in offline reinforcement learning |
reinforcement learning offline reinforcement learning model-based RL |
|
|
| 14 |
HR-VILAGE-3K3M: A Human Respiratory Viral Immunization Longitudinal Gene Expression Dataset for Systems Immunity |
Builds the HR-VILAGE-3K3M dataset to address data scarcity in respiratory viral immunity research |
predictive model foundation model multimodal |
|
|
| 15 |
4Hammer: a board-game reinforcement learning environment for the hour long time frame |
Proposes 4Hammer to address the lack of reinforcement learning environments for long time frames |
reinforcement learning large language model |
|
|
| 16 |
Mean Flows for One-step Generative Modeling |
Proposes the MeanFlow model for one-step generative modeling |
flow matching curriculum learning distillation |
|
|
| 17 |
RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs |
Analyzes the structural assumptions of reinforcement learning post-training for large language models |
reinforcement learning large language model |
|
|
| 18 |
Optimizing Anytime Reasoning via Budget Relative Policy Optimization |
Proposes AnytimeReasoner to optimize the anytime reasoning ability of large language models |
reinforcement learning large language model |
|
|
| 19 |
One-Step Offline Distillation of Diffusion-based Models via Koopman Modeling |
Proposes a one-step offline distillation method based on Koopman modeling to improve the efficiency of diffusion models |
distillation |
|
|