| 1 | The Hidden Link Between RLHF and Contrastive Learning | Proposes a mutual-information optimization method to improve reinforcement learning from human feedback | reinforcement learning, RLHF, DPO | | |
| 2 | Hyper-modal Imputation Diffusion Embedding with Dual-Distillation for Federated Multimodal Knowledge Graph Completion | Proposes MMFeD3-HidE for federated multimodal knowledge graph completion | distillation, multimodal | | |
| 3 | Frequency-Aligned Knowledge Distillation for Lightweight Spatiotemporal Forecasting | Proposes frequency-aligned knowledge distillation for lightweight spatiotemporal forecasting | MAE, distillation, spatiotemporal | ✅ | |
| 4 | TROFI: Trajectory-Ranked Offline Inverse Reinforcement Learning | Proposes TROFI to address the missing reward function in offline reinforcement learning | reinforcement learning, offline reinforcement learning, inverse reinforcement learning | | |
| 5 | EFRame: Deeper Reasoning via Exploration-Filter-Replay Reinforcement Learning Framework | Proposes the EFRame framework to address the shortcomings of GRPO on complex reasoning tasks | reinforcement learning, PPO, large language model | ✅ | |
| 6 | Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training | Proposes a layer-importance analysis to improve mathematical reasoning | reinforcement learning, distillation, large language model | | |
| 7 | TOAST: Task-Oriented Adaptive Semantic Transmission over Dynamic Wireless Environments | Proposes the TOAST framework for multi-task optimization in dynamic wireless environments | reinforcement learning, deep reinforcement learning, PULSE | | |
| 8 | Reinforcement Learning with Physics-Informed Symbolic Program Priors for Zero-Shot Wireless Indoor Navigation | Proposes a reinforcement learning framework with physics-informed symbolic program priors for zero-shot indoor navigation | reinforcement learning | | |
| 9 | SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model | Proposes SceneDiffuser++ for city-scale traffic simulation | world model | | |
| 10 | MetaCipher: A Time-Persistent and Universal Multi-Agent Framework for Cipher-Based Jailbreak Attacks for LLMs | Proposes MetaCipher for low-cost multi-agent jailbreak attacks on LLMs | reinforcement learning, large language model | | |
| 11 | Smooth-Distill: A Self-distillation Framework for Multitask Learning with Wearable Sensor Data | Proposes the Smooth-Distill framework for multitask learning with wearable sensor data | distillation | ✅ | |
| 12 | Advancements and Challenges in Continual Reinforcement Learning: A Comprehensive Review | Surveys advances and challenges in continual reinforcement learning, with the aim of improving dynamic learning capabilities | reinforcement learning | | |
| 13 | A Survey of Continual Reinforcement Learning | Proposes continual reinforcement learning methods to address knowledge retention in dynamic environments | reinforcement learning | | |
| 14 | Unfolding Generative Flows with Koopman Operators: Fast and Interpretable Sampling | Proposes a Koopman-operator-based method for unfolding generative flows to accelerate sampling | flow matching, distillation | | |