| 11 | RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale | Proposes RADLADS for rapidly converting softmax-attention Transformers into linear-attention decoder models | linear attention, distillation | ✅ |
| 12 | RM-R1: Reward Modeling as Reasoning | Proposes reasoning-based reward modeling to improve reward model performance | reinforcement learning, distillation, large language model | ✅ |
| 13 | A Survey on Progress in LLM Alignment from the Perspective of Reward Design | Surveys progress in LLM alignment through the lens of reward design | reinforcement learning, reward design, large language model | |
| 14 | Generative Sign-description Prompts with Multi-positive Contrastive Learning for Sign Language Recognition | Proposes GSP-MC to address annotation accuracy in sign language recognition | contrastive learning, large language model | |
| 15 | EMORL: Ensemble Multi-Objective Reinforcement Learning for Efficient and Flexible LLM Fine-Tuning | Proposes the EMORL framework to improve the efficiency and flexibility of multi-objective RL fine-tuning | reinforcement learning, large language model | |
| 16 | Sailing by the Stars: A Survey on Reward Models and Learning Strategies for Learning from Rewards | Surveys reward models and learning strategies for training LLMs from reward signals | reinforcement learning, RLHF, DPO | ✅ |
| 17 | JTCSE: Joint Tensor-Modulus Constraints and Cross-Attention for Unsupervised Contrastive Learning of Sentence Embeddings | Proposes the JTCSE framework to strengthen unsupervised contrastive learning of sentence embeddings | contrastive learning, distillation | |
| 18 | SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning | Proposes SIMPLEMIX to mix off- and on-policy data in language model preference learning | preference learning, DPO | |
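
Note: entries 16 and 18 are tagged DPO (Direct Preference Optimization). For quick reference only (this is the textbook DPO objective, not a claim about how either paper implements it), DPO trains a policy $\pi_\theta$ against a frozen reference $\pi_{\text{ref}}$ on preference pairs $(x, y_w, y_l)$, where $y_w$ is the preferred response:

$$
\mathcal{L}_{\text{DPO}}(\pi_\theta;\pi_{\text{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w\mid x)}{\pi_{\text{ref}}(y_w\mid x)} - \beta \log \frac{\pi_\theta(y_l\mid x)}{\pi_{\text{ref}}(y_l\mid x)}\right)\right]
$$

Here $\sigma$ is the sigmoid and $\beta$ controls how far the policy may drift from the reference.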