| 1 |
Pretrained Vision-Language-Action Models are Surprisingly Resistant to Forgetting in Continual Learning |
Proposes using pretrained vision-language-action models to address forgetting in continual learning |
policy learning behavior cloning vision-language-action |
✅ |
|
| 2 |
Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks |
Proposes the DMAST framework, improving the robustness and task efficiency of multimodal web agents under cross-modal attacks. |
reinforcement learning imitation learning multimodal |
|
|
| 3 |
Architectural Proprioception in State Space Models: Thermodynamic Training Induces Anticipatory Halt Detection |
Proposes a probabilistic navigation architecture to improve the self-awareness of state space models |
SSM state space model zero-shot transfer |
|
|
| 4 |
What Does Flow Matching Bring To TD Learning? |
Proposes a flow matching approach to improve temporal-difference learning |
reinforcement learning flow matching |
|
|
| 5 |
Fairness Begins with State: Purifying Latent Preferences for Hierarchical Reinforcement Learning in Interactive Recommendation |
Proposes the DSRM-HRL framework to address fairness in interactive recommendation |
reinforcement learning reward shaping |
|
|
| 6 |
GIPO: Gaussian Importance Sampling Policy Optimization |
GIPO: policy optimization based on Gaussian importance sampling that improves the data efficiency of reinforcement learning |
reinforcement learning multimodal |
|
|
| 7 |
BD-Merging: Bias-Aware Dynamic Model Merging with Evidence-Guided Contrastive Learning |
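Proposes BD-Merging, which uses evidence-guided contrastive learning for bias-aware dynamic model merging, improving robustness under distribution shift. |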
contrastive learning |
|
|
| 8 |
Harmonic Dataset Distillation for Time Series Forecasting |
Proposes HDT, which distills time series datasets via frequency-domain harmonic matching, improving generalization and scalability |
distillation |
|
|
| 9 |
A Constrained RL Approach for Cost-Efficient Delivery of Latency-Sensitive Applications |
Proposes a constrained reinforcement learning scheme for low-cost data delivery in latency-sensitive applications |
reinforcement learning deep reinforcement learning |
|
|
| 10 |
Freezing of Gait Prediction using Proactive Agent that Learns from Selected Experience and DDQN Algorithm |
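Proposes a reinforcement learning framework based on DDQN and experience replay for predicting freezing of gait in Parkinson's patients. |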
reinforcement learning reward shaping |
|
|