cs.LG (2026-04-06)
📊 45 papers in total
🎯 Interest-Area Navigation
Pillar 2: RL Algorithms & Architecture (19)
Pillar 9: Embodied Foundation Models (18)
Pillar 1: Robot Control (7)
Pillar 8: Physics-based Animation (1)
🔬 Pillar 2: RL Algorithms & Architecture (19 papers)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Understanding the Role of Hallucination in Reinforcement Post-Training of Multimodal Reasoning Models | Proposes a Hallucination-as-Cue framework, revealing how hallucination affects multimodal reasoning models during RL post-training. | reinforcement learning, large language model, multimodal | | |
| 2 | Random Is Hard to Beat: Active Selection in online DPO with Modern LLMs | In online DPO, random sampling is hard to beat: the limits of active selection with modern LLMs. | preference learning, direct preference optimization (DPO) | | |
| 3 | Reinforcement Learning from Human Feedback: A Statistical Perspective | A statistical perspective on reinforcement learning from human feedback (RLHF) for LLM alignment. | reinforcement learning, RLHF, direct preference optimization | | |
| 4 | Generalization Limits of Reinforcement Learning Alignment | Proposes a compound jailbreak attack targeting the generalization limits of RLHF alignment. | reinforcement learning, RLHF, large language model | | |
| 5 | Accelerated Learning with Linear Temporal Logic using Differentiable Simulation | Proposes an accelerated learning framework for linear temporal logic based on differentiable simulation, addressing safety constraints in reinforcement learning. | reinforcement learning, differentiable simulation | | |
| 6 | Contextual Intelligence The Next Leap for Reinforcement Learning | Proposes contextual intelligence to improve the generalization of reinforcement learning in real-world environments. | reinforcement learning, zero-shot transfer | | |
| 7 | Prism: Policy Reuse via Interpretable Strategy Mapping in Reinforcement Learning | PRISM: policy reuse in reinforcement learning via interpretable strategy mapping. | reinforcement learning, zero-shot transfer | | |
| 8 | DSBD: Dual-Aligned Structural Basis Distillation for Graph Domain Adaptation | Proposes DSBD, tackling graph domain adaptation via dual-aligned structural basis distillation. | distillation, geometric consistency | | |
| 9 | Diffusion Models as Dataset Distillation Priors | Proposes DAP: leverages diffusion-model priors to improve the representativeness of dataset distillation without extra training. | distillation, foundation model | | |
| 10 | Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning | Seer: online context learning for fast synchronous LLM reinforcement learning. | reinforcement learning, large language model | | |
| 11 | UI-Oceanus: Scaling GUI Agents with Synthetic Environmental Dynamics | UI-Oceanus: scaling GUI agents with synthetic environmental dynamics. | world model, predictive model | | |
| 12 | Mitigating Reward Hacking in RLHF via Advantage Sign Robustness | Proposes the SignCert-PO algorithm, mitigating reward hacking in RLHF via advantage-sign robustness. | reinforcement learning, RLHF | | |
| 13 | LLM Reasoning with Process Rewards for Outcome-Guided Steps | Proposes the PROGRS framework to optimize process rewards for mathematical reasoning. | reinforcement learning, large language model | | |
| 14 | Self-Distilled RLVR | Proposes RLSD, combining self-distillation with RLVR to improve training stability and raise the convergence ceiling of reinforcement learning. | reinforcement learning, distillation, privileged information | | |
| 15 | Local Reinforcement Learning with Action-Conditioned Root Mean Squared Q-Functions | Proposes a local reinforcement learning method based on action-conditioned root-mean-squared Q-functions, improving backpropagation-free RL algorithms. | reinforcement learning | | |
| 16 | Goal-Driven Reward by Video Diffusion Models for Reinforcement Learning | Proposes rewards derived from video diffusion models to address the difficulty of reward design in reinforcement learning. | reinforcement learning | | |
| 17 | Pushing the Limits of Distillation-Based Continual Learning via Classifier-Proximal Lightweight Plugins | Proposes DLC, a plugin-style continual learning framework that addresses the stability-plasticity dilemma in distillation-based methods. | distillation | | |
| 18 | CRISP: Compressed Reasoning via Iterative Self-Policy Distillation | CRISP: compresses the reasoning process via iterative self-policy distillation, improving model efficiency and accuracy. | distillation | | |
| 19 | ERPO: Token-Level Entropy-Regulated Policy Optimization for Large Reasoning Models | ERPO: token-level entropy-regulated policy optimization for large reasoning models. | reinforcement learning, large language model | | |
🔬 Pillar 9: Embodied Foundation Models (18 papers)
🔬 Pillar 1: Robot Control (7 papers)
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 45 | Privacy-Accuracy Trade-offs in High-Dimensional LASSO under Perturbation Mechanisms | Studies privacy-accuracy trade-offs under perturbation mechanisms for high-dimensional LASSO. | AMP | | |