| 1 |
FedAFD: Multimodal Federated Learning via Adversarial Fusion and Distillation |
Proposes FedAFD, which improves multimodal federated learning through adversarial fusion and distillation |
distillation multimodal |
|
|
| 2 |
Diffusion Policy through Conditional Proximal Policy Optimization |
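Proposes a diffusion policy based on conditional proximal policy optimization, improving multimodal behavior modeling in reinforcement learning. |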
reinforcement learning diffusion policy multimodal |
|
|
| 3 |
BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning |
BandPO: bridges trust regions and ratio clipping via probability-aware bounds, improving the stability of LLM reinforcement learning |
reinforcement learning PPO large language model |
|
|
| 4 |
WavSLM: Single-Stream Speech Language Modeling via WavLM Distillation |
WavSLM: single-stream speech language modeling via WavLM distillation |
distillation large language model |
✅ |
|
| 5 |
Decoupling Task and Behavior: A Two-Stage Reward Curriculum in Reinforcement Learning for Robotics |
Proposes a two-stage reward curriculum to address reward design in reinforcement learning for robotics |
reinforcement learning deep reinforcement learning |
|
|
| 6 |
Competitive Multi-Operator Reinforcement Learning for Joint Pricing and Fleet Rebalancing in AMoD Systems |
Proposes competitive multi-agent reinforcement learning for joint pricing and fleet rebalancing in AMoD systems |
reinforcement learning policy learning |
|
|
| 7 |
Probabilistic Dreaming for World Models |
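Proposes a world model based on probabilistic dreaming, improving the sample efficiency and robustness of reinforcement learning |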
world model dreamer |
|
|
| 8 |
Latent Wasserstein Adversarial Imitation Learning |
Proposes LWAIL, which uses a dynamics-aware latent-space Wasserstein distance for efficient state-based imitation learning |
imitation learning |
|
|
| 9 |
On-Policy Self-Distillation for Reasoning Compression |
Proposes OPSDC, which compresses reasoning models via self-distillation, improving accuracy while reducing token usage. |
distillation |
|
|
| 10 |
Reward-Conditioned Reinforcement Learning |
Proposes reward-conditioned reinforcement learning, enabling a single agent to adapt to multiple reward objectives |
reinforcement learning |
|
|
| 11 |
$\nabla$-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space |
Proposes $\nabla$-Reasoner, which optimizes LLM reasoning via gradient descent in latent space, improving mathematical reasoning. |
reinforcement learning large language model |
|
|
| 12 |
Osmosis Distillation: Model Hijacking with the Fewest Samples |
Proposes the Osmosis Distillation attack, which achieves model hijacking with only a few samples. |
distillation |
|
|
| 13 |
Why Is RLHF Alignment Shallow? A Gradient Analysis |
A gradient analysis reveals why RLHF alignment is shallow and proposes a deep alignment method based on a recovery penalty |
RLHF |
|
|
| 14 |
Distributional Reinforcement Learning with Information Bottleneck for Uncertainty-Aware DRAM Equalization |
Proposes distributional reinforcement learning with an information bottleneck for uncertainty-aware DRAM equalization. |
reinforcement learning |
|
|
| 15 |
MIRACL: A Diverse Meta-Reinforcement Learning for Multi-Objective Multi-Echelon Combinatorial Supply Chain Optimisation |
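Proposes the MIRACL framework to address few-shot generalization in multi-objective, multi-echelon combinatorial supply chain optimisation. |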
reinforcement learning |
|
|
| 16 |
Reinforcement Learning for Power-Flow Network Analysis |
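Proposes a reinforcement-learning-based method for power-flow network analysis that searches for network parameters with multiple equilibria |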
reinforcement learning |
|
|
| 17 |
A Novel Hybrid Heuristic-Reinforcement Learning Optimization Approach for a Class of Railcar Shunting Problems |
Proposes a hybrid heuristic-reinforcement learning algorithm for shunting optimization problems in railway freight yards |
reinforcement learning |
|
|