| # | Title | Summary | Keywords | Status |
|---|---|---|---|---|
| 1 | Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO | Presents a fine-grained theoretical analysis of the performance gap between RLHF and DPO | reinforcement learning, preference learning, RLHF | |
| 2 | Alignment of large language models with constrained learning | Proposes an iterative method based on Lagrangian duality for constrained alignment | RLHF, large language model | |
| 3 | Learning a Pessimistic Reward Model in RLHF | Proposes PET to address reward hacking in offline RLHF | reinforcement learning, offline reinforcement learning, RLHF | |
| 4 | Rotary Masked Autoencoders are Versatile Learners | Proposes the Rotary Masked Autoencoder for time-series learning | representation learning, masked autoencoder, MAE | |
| 5 | JEDI: Latent End-to-end Diffusion Mitigates Agent-Human Performance Asymmetry in Model-Based Reinforcement Learning | Proposes JEDI to mitigate agent-human performance asymmetry in model-based reinforcement learning | reinforcement learning, world model | |
| 6 | The Limits of Preference Data for Post-Training | Studies the limits of preference data for post-training optimization and their implications | reinforcement learning, RLHF, large language model | |
| 7 | An Explainable Diagnostic Framework for Neurodegenerative Dementias via Reinforcement-Optimized LLM Reasoning | Proposes an explainable diagnostic framework for neurodegenerative dementias to improve diagnostic transparency | reinforcement learning, distillation, large language model | |
| 8 | DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning | Proposes DISCOVER to tackle exploration in sparse-reward reinforcement learning | reinforcement learning | |
| 9 | Equivariant Representation Learning for Symmetry-Aware Inference with Guarantees | Proposes an equivariant representation learning framework for symmetry-aware inference | representation learning | |
| 10 | Refining Few-Step Text-to-Multiview Diffusion via Reinforcement Learning | Proposes a reinforcement learning framework to refine few-step text-to-multiview diffusion models | reinforcement learning | |
| 11 | The challenge of hidden gifts in multi-agent reinforcement learning | Proposes a new method for the hidden-gift problem in multi-agent reinforcement learning | reinforcement learning | |
| 12 | Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning | Proposes a Bayes-adaptive reinforcement learning method to strengthen reflective exploration in LLM reasoning | reinforcement learning, large language model | ✅ |
| 13 | Characterizing Pattern Matching and Its Limits on Compositional Task Structures | Proposes a formal framework for pattern matching to address generalization on compositional task structures | Mamba, chain-of-thought | |
| 14 | ESLM: Risk-Averse Selective Language Modeling for Efficient Pretraining | Proposes ESLM to improve the efficiency and robustness of large language model pretraining | distillation, large language model | |