cs.LG (2026-04-06)

📊 45 papers in total

🎯 Interest Area Navigation

- Pillar 2: RL Algorithms & Architecture (19)
- Pillar 9: Embodied Foundation Models (18)
- Pillar 1: Robot Control (7)
- Pillar 8: Physics-based Animation (1)

🔬 Pillar 2: RL Algorithms & Architecture (19 papers)

1. Understanding the Role of Hallucination in Reinforcement Post-Training of Multimodal Reasoning Models — Proposes a Hallucination-as-Cue framework that reveals how hallucination influences multimodal reasoning models during RL post-training. [reinforcement learning, large language model, multimodal]
2. Random Is Hard to Beat: Active Selection in online DPO with Modern LLMs — In online DPO, random sampling remains hard to beat, exposing the limits of active selection with modern LLMs. [preference learning, DPO, direct preference optimization]
3. Reinforcement Learning from Human Feedback: A Statistical Perspective — A statistical perspective on RLHF as applied to LLM alignment. [reinforcement learning, RLHF, direct preference optimization]
4. Generalization Limits of Reinforcement Learning Alignment — Probes the generalization limits of RLHF alignment and proposes a compositional jailbreak attack. [reinforcement learning, RLHF, large language model]
5. Accelerated Learning with Linear Temporal Logic using Differentiable Simulation — Proposes a differentiable-simulation framework that accelerates learning with linear temporal logic, addressing safety constraints in RL. [reinforcement learning, differentiable simulation]
6. Contextual Intelligence: The Next Leap for Reinforcement Learning — Proposes contextual intelligence to improve RL generalization in real-world environments. [reinforcement learning, zero-shot transfer]
7. Prism: Policy Reuse via Interpretable Strategy Mapping in Reinforcement Learning — PRISM enables policy reuse in RL through interpretable strategy mapping. [reinforcement learning, zero-shot transfer]
8. DSBD: Dual-Aligned Structural Basis Distillation for Graph Domain Adaptation — Proposes DSBD, tackling graph domain adaptation via dual-aligned structural basis distillation. [distillation, geometric consistency]
9. Diffusion Models as Dataset Distillation Priors — Proposes DAP, which uses diffusion-model priors to improve the representativeness of dataset distillation without extra training. [distillation, foundation model]
10. Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning — Online context learning that speeds up synchronous LLM reinforcement learning. [reinforcement learning, large language model]
11. UI-Oceanus: Scaling GUI Agents with Synthetic Environmental Dynamics — Scales GUI agents using synthetic environmental dynamics. [world model, world models, predictive model]
12. Mitigating Reward Hacking in RLHF via Advantage Sign Robustness — Proposes SignCert-PO, which mitigates reward hacking in RLHF through advantage-sign robustness. [reinforcement learning, RLHF]
13. LLM Reasoning with Process Rewards for Outcome-Guided Steps — Proposes the PROGRS framework to optimize process rewards in mathematical reasoning. [reinforcement learning, large language model]
14. Self-Distilled RLVR — Proposes RLSD, combining self-distillation with RLVR to improve training stability and raise the convergence ceiling. [reinforcement learning, distillation, privileged information]
15. Local Reinforcement Learning with Action-Conditioned Root Mean Squared Q-Functions — Proposes a local RL method based on action-conditioned root-mean-squared Q-functions, improving backpropagation-free RL algorithms. [reinforcement learning]
16. Goal-Driven Reward by Video Diffusion Models for Reinforcement Learning — Derives reward functions from video diffusion models, easing reward design in RL. [reinforcement learning]
17. Pushing the Limits of Distillation-Based Continual Learning via Classifier-Proximal Lightweight Plugins — Proposes DLC, a plugin-style continual-learning framework addressing the stability-plasticity dilemma of distillation methods. [distillation]
18. CRISP: Compressed Reasoning via Iterative Self-Policy Distillation — Compresses reasoning traces through iterative self-policy distillation, improving both efficiency and accuracy. [distillation]
19. ERPO: Token-Level Entropy-Regulated Policy Optimization for Large Reasoning Models — Token-level entropy-regulated policy optimization for large reasoning models. [reinforcement learning, large language model]
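Several entries above (e.g. 2 and 3) revolve around DPO. As a generic reference point, the standard DPO objective (Rafailov et al., 2023) can be sketched in a few lines; the function name and example numbers below are purely illustrative and are not taken from any of the listed papers:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    logp_w / logp_l: summed token log-probs of the chosen / rejected
    response under the policy; ref_logp_*: the same quantities under
    the frozen reference model. Returns -log sigmoid(beta * margin).
    """
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy prefers the chosen response more strongly than the
# reference model does, the margin is positive and the loss drops
# below log(2); at zero margin the loss is exactly log(2).
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

Active selection in online DPO (entry 2) then amounts to choosing which prompt/response pairs to query preferences for before applying this loss.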

🔬 Pillar 9: Embodied Foundation Models (18 papers)

20. LiME: Lightweight Mixture of Experts for Efficient Multimodal Multi-task Learning — Proposes LiME, a lightweight mixture-of-experts model for efficient multimodal multi-task learning. [multimodal]
21. DrugPlayGround: Benchmarking Large Language Models and Embeddings for Drug Discovery — A benchmarking framework for LLMs and embeddings in drug discovery. [large language model]
22. Fast NF4 Dequantization Kernels for Large Language Model Inference — Proposes fast NF4 dequantization kernels that speed up LLM inference. [large language model]
23. Efficient Causal Graph Discovery Using Large Language Models — Proposes an efficient causal-graph discovery framework built on LLMs and breadth-first search, addressing the query inefficiency of traditional methods. [large language model]
24. Resting Neurons, Active Insights: Robustify Activation Sparsity for Large Language Models — SPON improves LLM inference efficiency via activation sparsity while preserving accuracy. [large language model]
25. SafeSci: Safety Evaluation of Large Language Models in Science Domains and Beyond — A comprehensive framework for evaluating and improving LLM safety in science domains. [large language model]
26. JointFM-0.1: A Foundation Model for Multi-Target Joint Distributional Prediction — Proposes JointFM, a general-purpose foundation model for multi-target joint distributional prediction. [foundation model]
27. AdaHOP: Fast and Accurate Low-Precision Training via Outlier-Pattern-Aware Rotation — Fast and accurate low-precision training through outlier-pattern-aware rotation. [large language model]
28. Jump Start or False Start? A Theoretical and Empirical Evaluation of LLM-initialized Bandits — A theoretical framework for LLM-initialized bandit algorithms, evaluating how noise and bias affect performance. [large language model]
29. Steerable but Not Decodable: Function Vectors Operate Beyond the Logit Lens — Shows that function vectors can steer LLMs yet operate beyond what the logit lens can decode. [large language model]
30. Finding Belief Geometries with Sparse Autoencoders — Uses sparse autoencoders to uncover belief geometries in Transformers. [large language model]
31. FluxMoE: Decoupling Expert Residency for High-Performance MoE Serving — Decouples expert residency to deliver high-performance MoE serving. [large language model]
32. Structure-Aware Commitment Reduction for Network-Constrained Unit Commitment with Solver-Preserving Guarantees — Proposes a structure-aware commitment-reduction framework that accelerates network-constrained unit commitment. [large language model]
33. Co-Evolution of Policy and Internal Reward for Language Agents — Proposes Self-Guide, co-evolving the policy and an internal reward to improve language agents on long-horizon tasks. [large language model]
34. Backdoor Attacks on Decentralised Post-Training — Presents backdoor attacks on decentralised post-training of language models that can effectively degrade alignment. [large language model]
35. ARMOR: High-Performance Semi-Structured Pruning via Adaptive Matrix Factorization — High-performance semi-structured pruning via adaptive matrix factorization. [large language model]
36. Fast and Robust Simulation-Based Inference With Optimization Monte Carlo — A fast, robust simulation-based inference method built on Optimization Monte Carlo, improving the efficiency of Bayesian parameter inference for complex stochastic simulators. [multimodal]
37. Textual Equilibrium Propagation for Deep Compound AI Systems — Proposes Textual Equilibrium Propagation (TEP) to address vanishing/exploding gradients in long-range textual feedback within deep compound AI systems. [large language model]
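Entry 22 above concerns NF4 dequantization kernels. The general pattern behind such 4-bit codebook formats can be sketched as follows; the codebook here is a uniform placeholder (the real NF4 table uses quantiles of a standard normal distribution), and nothing in this sketch is taken from the paper:

```python
import numpy as np

# Illustrative 4-bit codebook dequantization: each 4-bit index selects
# one of 16 float values, and a per-block scale restores magnitude.
# Placeholder codebook; the actual NF4 table differs.
CODEBOOK = np.linspace(-1.0, 1.0, 16, dtype=np.float32)

def dequantize_block(packed, scale):
    """Unpack two 4-bit indices per byte and map them through the codebook."""
    packed = np.asarray(packed, dtype=np.uint8)
    hi = packed >> 4          # upper nibble
    lo = packed & 0x0F        # lower nibble
    idx = np.stack([hi, lo], axis=-1).reshape(-1)
    return CODEBOOK[idx] * scale

# One byte 0xF0 holds indices 15 and 0, i.e. the codebook endpoints.
print(dequantize_block([0xF0], scale=2.0))   # -> [ 2. -2.]
```

Fast kernels for this format mostly amortize the table lookup and unpacking across many weights at once; the arithmetic itself is as simple as shown.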

🔬 Pillar 1: Robot Control (7 papers)

38. Hierarchical Planning with Latent World Models — Proposes planning with hierarchical latent world models, improving long-horizon robot control. [manipulation, MPC, model predictive control]
39. Mitigating Data Scarcity in Spaceflight Applications for Offline Reinforcement Learning Using Physics-Informed Deep Generative Models — Proposes MI-VAE, a physics-informed deep generative model that mitigates data scarcity for offline RL in spaceflight applications. [sim-to-real, reinforcement learning, offline RL]
40. VoxelCodeBench: Benchmarking 3D World Modeling Through Code Generation — Evaluates 3D world-modeling capability through code generation. [manipulation, world model, world models]
41. OPRIDE: Offline Preference-based Reinforcement Learning via In-Dataset Exploration — Proposes OPRIDE to address low query efficiency in offline preference-based RL. [locomotion, manipulation, reinforcement learning]
42. Beyond Semantic Manipulation: Token-Space Attacks on Reward Models — Proposes the TOMPA framework, which bypasses semantic space to attack reward models directly in token space, exposing security vulnerabilities in RLHF. [manipulation, reinforcement learning, RLHF]
43. DRtool: An Interactive Tool for Analyzing High-Dimensional Clusterings — An interactive tool for analyzing high-dimensional clusterings, helping identify spurious clusters. [manipulation]
44. Koopman-Based Nonlinear Identification and Adaptive Control of a Turbofan Engine — Proposes a Koopman-operator-based adaptive control method for robust multivariable control of a turbofan engine. [model predictive control]
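Entry 44 above builds on Koopman operator theory, which represents a nonlinear system as a linear one in a lifted space of observables. A minimal Extended DMD sketch with a hand-picked dictionary, purely as background and unconnected to the paper's actual identification pipeline:

```python
import numpy as np

def edmd_koopman(X, Y, lift):
    """Extended DMD: fit a linear operator K on lifted states so that
    lift(x_{t+1}) ≈ K @ lift(x_t). X, Y are (T, n) arrays of state
    snapshots with Y[t] the successor of X[t]."""
    Phi_x = np.array([lift(x) for x in X])   # (T, d) lifted states
    Phi_y = np.array([lift(y) for y in Y])
    # Least-squares solve Phi_x @ K^T ≈ Phi_y, one column at a time.
    K_T, *_ = np.linalg.lstsq(Phi_x, Phi_y, rcond=None)
    return K_T.T

# Toy system x_{t+1} = x_t^2: in the dictionary [x, x^2, x^4] the first
# two lifted coordinates evolve exactly linearly (x -> x^2 -> x^4).
lift = lambda x: np.array([x[0], x[0]**2, x[0]**4])
X = np.array([[v] for v in (0.1, 0.2, 0.3, 0.4, 0.5)])
Y = X**2
K = edmd_koopman(X, Y, lift)
pred = K @ lift(np.array([0.3]))   # first coordinate predicts 0.3^2
```

Once the dynamics are linear in the lifted coordinates, standard linear tools (pole placement, LQR, MPC, adaptive updates of K) become available, which is the appeal for control applications like the one in entry 44.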

🔬 Pillar 8: Physics-based Animation (1 paper)

45. Privacy-Accuracy Trade-offs in High-Dimensional LASSO under Perturbation Mechanisms — Studies privacy-accuracy trade-offs for high-dimensional LASSO under perturbation mechanisms. [AMP]
