| 1 |
LiFT: Unsupervised Reinforcement Learning with Foundation Models as Teachers |
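LiFT: unsupervised reinforcement learning with foundation models as teachers, improving agents' ability to learn semantically meaningful behaviors |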
reinforcement learning large language model foundation model |
|
|
| 2 |
Global Rewards in Multi-Agent Deep Reinforcement Learning for Autonomous Mobility on Demand Systems |
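Proposes a multi-agent deep reinforcement learning algorithm based on global rewards to optimize vehicle dispatching in mobility-on-demand systems |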
reinforcement learning deep reinforcement learning |
✅ |
|
| 3 |
Less is more -- the Dispatcher/ Executor principle for multi-task Reinforcement Learning |
Proposes the Dispatcher/Executor principle to improve generalization and data efficiency in multi-task reinforcement learning |
reinforcement learning |
|
|
| 4 |
RdimKD: Generic Distillation Paradigm by Dimensionality Reduction |
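Proposes RdimKD, a generic knowledge distillation paradigm based on dimensionality reduction that simplifies the distillation process and improves generalization |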
distillation |
|
|
| 5 |
iOn-Profiler: intelligent Online multi-objective VNF Profiling with Reinforcement Learning |
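Proposes iOn-Profiler, which uses reinforcement learning for intelligent online multi-objective VNF profiling to optimize resource allocation and performance |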
reinforcement learning |
|
|
| 6 |
Vision-Language Models as a Source of Rewards |
Uses vision-language models as a source of rewards for reinforcement learning to improve the capabilities of generalist agents |
reinforcement learning generalist agent |
|
|
| 7 |
Personalized Path Recourse for Reinforcement Learning Agents |
Proposes a personalized path recourse method that generates goal-directed, similar behavior paths for reinforcement learning agents |
reinforcement learning |
|
|
| 8 |
Gradient Informed Proximal Policy Optimization |
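Proposes a gradient-informed proximal policy optimization algorithm to improve reinforcement learning performance in differentiable environments |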
policy learning PPO |
✅ |
|