| 38 | Distillation of Large Language Models via Concrete Score Matching | Proposes Concrete Score Distillation to address logit information loss and restricted solution spaces in LLM distillation. | distillation, large language model, instruction following | |
| 39 | Clip-Low Increases Entropy and Clip-High Decreases Entropy in Reinforcement Learning of Large Language Models | Shows how the clipping mechanism in PPO/GRPO shapes policy entropy in LLM reinforcement learning; proposes clip-low to increase exploration. | reinforcement learning, PPO, large language model | |
| 40 | OPPO: Accelerating PPO-based RLHF via Pipeline Overlap | OPPO accelerates PPO-based RLHF training via pipeline overlap. | reinforcement learning, PPO, RLHF | |
| 41 | Thin Bridges for Drug Text Alignment: Lightweight Contrastive Learning for Target Specific Drug Retrieval | Proposes lightweight contrastive-learning bridges for target-specific drug-text alignment and retrieval. | contrastive learning, foundation model, multimodal | |
| 42 | Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models | Proposes Recursive Self-Aggregation (RSA) to strengthen deep, test-time thinking in large language models. | reinforcement learning, large language model | ✅ |
| 43 | TAP: Two-Stage Adaptive Personalization of Multi-task and Multi-Modal Foundation Models in Federated Learning | Proposes TAP, a two-stage adaptive personalization method for multi-task, multi-modal foundation models in federated learning. | distillation, foundation model | ✅ |
| 44 | CAST: Continuous and Differentiable Semi-Structured Sparsity-Aware Training for Large Language Models | Proposes the CAST framework for semi-structured sparsity-aware training of large language models, improving inference efficiency. | distillation, large language model | |
| 45 | Data-to-Energy Stochastic Dynamics | Proposes data-to-energy stochastic dynamics, solving Schrödinger bridge problems when no data samples are available. | reinforcement learning, flow matching, multimodal | ✅ |
| 46 | Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners | Proposes TFPI, which accelerates RLVR training and improves the efficiency and performance of reasoning models. | reinforcement learning, distillation, chain-of-thought | |
| 47 | Efficient On-Policy Reinforcement Learning via Exploration of Sparse Parameter Space | Proposes ExploRLer, which improves on-policy reinforcement learning efficiency by exploring a sparse parameter space. | reinforcement learning, PPO | |
| 48 | Less is More: Towards Simple Graph Contrastive Learning | Proposes a simplified graph contrastive learning method that effectively handles representation learning on heterogeneous graphs. | representation learning, contrastive learning | |
| 49 | Boundary-to-Region Supervision for Offline Safe Reinforcement Learning | Proposes the B2R framework, which uses asymmetric conditioning to address constraint satisfaction in offline safe reinforcement learning. | reinforcement learning | ✅ |
| 50 | Clarification as Supervision: Reinforcement Learning for Vision-Language Interfaces | Proposes Adaptive Clarification Reinforcement Learning (AC-RL) to improve vision-language models on visual math reasoning. | reinforcement learning | |
| 51 | Reweighted Flow Matching via Unbalanced OT for Label-free Long-tailed Generation | Proposes UOT-RFM, which uses unbalanced optimal transport and reweighting to tackle label-free long-tailed generation. | flow matching | |
| 52 | Debunk the Myth of SFT Generalization | Improves SFT generalization on decision-making tasks via prompt diversity and chain-of-thought. | reinforcement learning, chain-of-thought | ✅ |
| 53 | Directed-MAML: Meta Reinforcement Learning Algorithm with Task-directed Approximation | Proposes Directed-MAML, which uses task-directed approximation to accelerate meta-reinforcement-learning convergence and reduce computational cost. | reinforcement learning | |
| 54 | Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models | AttnRL, an attention-based reinforcement learning framework that improves exploration efficiency for process-supervised RL in reasoning models. | reinforcement learning, large language model | |
| 55 | Linking Process to Outcome: Conditional Reward Modeling for LLM Reasoning | Proposes Conditional Reward Modeling (CRM) to improve LLM reasoning and overcome the limitations of process reward models. | reinforcement learning, large language model | |
| 56 | Extensions of Robbins-Siegmund Theorem with Applications in Reinforcement Learning | Extends the Robbins-Siegmund theorem to handle convergence analysis with non-summable zeroth-order terms in reinforcement learning. | reinforcement learning | |
| 57 | Alignment-Aware Decoding | Proposes Alignment-Aware Decoding (AAD) to improve the alignment of large language models at inference time. | DPO, large language model | |
| 58 | RL-Guided Data Selection for Language Model Finetuning | Proposes a reinforcement-learning-guided data selection method that improves the efficiency and performance of LLM finetuning. | reinforcement learning, large language model | |
| 59 | Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation | Knapsack RL unlocks LLM exploration by optimizing budget allocation. | reinforcement learning, large language model | |
| 60 | Learning to Reason as Action Abstractions with Scalable Mid-Training RL | Proposes the RA3 algorithm, which improves code-generation performance through scalable mid-training reinforcement learning. | reinforcement learning, large language model | |
| 61 | Improving Sampling Efficiency in RLVR through Adaptive Rollout and Response Reuse | AR3PO improves RLVR sampling efficiency via adaptive rollout and response reuse. | reinforcement learning, large language model | |
| 62 | Differentiable Autoencoding Neural Operator for Interpretable and Integrable Latent Space Modeling | Proposes DIANO, a differentiable autoencoding neural operator for interpretable and integrable latent-space modeling. | latent dynamics, spatiotemporal | |
| 63 | Accelerating Transformers in Online RL | Proposes training the Transformer with an accelerator policy, addressing the instability of Transformer training in online reinforcement learning. | reinforcement learning, behavior cloning | |
| 64 | Informed Asymmetric Actor-Critic: Leveraging Privileged Signals Beyond Full-State Access | Proposes Informed Asymmetric Actor-Critic, which leverages privileged signals to improve reinforcement learning in partially observable environments. | reinforcement learning, privileged information | |