cs.LG (2026-04-06)
📊 45 papers in total
🎯 Interest-Area Navigation
Pillar 2: RL Algorithms & Architecture (19)
Pillar 9: Embodied Foundation Models (18)
Pillar 1: Robot Control (7)
Pillar 8: Physics-based Animation (1)
🔬 Pillar 2: RL Algorithms & Architecture (19 papers)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Understanding the Role of Hallucination in Reinforcement Post-Training of Multimodal Reasoning Models | Proposes a Hallucination-as-Cue framework, revealing how hallucination affects multimodal reasoning models during RL post-training. | reinforcement learning, large language model, multimodal | | |
| 2 | Random Is Hard to Beat: Active Selection in online DPO with Modern LLMs | In online DPO, random sampling is hard to beat: the limits of active selection with modern LLMs. | preference learning, direct preference optimization (DPO) | | |
| 3 | Reinforcement Learning from Human Feedback: A Statistical Perspective | A statistical perspective on reinforcement learning from human feedback (RLHF) for LLM alignment. | reinforcement learning, RLHF, direct preference optimization | | |
| 4 | Generalization Limits of Reinforcement Learning Alignment | Proposes a compound jailbreak attack targeting the generalization limits of RLHF alignment. | reinforcement learning, RLHF, large language model | | |
| 5 | Accelerated Learning with Linear Temporal Logic using Differentiable Simulation | Proposes an accelerated learning framework for linear temporal logic based on differentiable simulation, addressing safety constraints in reinforcement learning. | reinforcement learning, differentiable simulation | | |
| 6 | Contextual Intelligence The Next Leap for Reinforcement Learning | Proposes contextual intelligence to improve the generalization of reinforcement learning in real-world environments. | reinforcement learning, zero-shot transfer | | |
| 7 | Prism: Policy Reuse via Interpretable Strategy Mapping in Reinforcement Learning | PRISM: policy reuse in reinforcement learning via interpretable strategy mapping. | reinforcement learning, zero-shot transfer | | |
| 8 | DSBD: Dual-Aligned Structural Basis Distillation for Graph Domain Adaptation | Proposes DSBD, tackling graph domain adaptation via dual-aligned structural basis distillation. | distillation, geometric consistency | | |
| 9 | Diffusion Models as Dataset Distillation Priors | Proposes DAP: leverages diffusion-model priors to improve the representativeness of dataset distillation without extra training. | distillation, foundation model | | |
| 10 | Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning | Seer: online context learning for fast synchronous LLM reinforcement learning. | reinforcement learning, large language model | | |
| 11 | UI-Oceanus: Scaling GUI Agents with Synthetic Environmental Dynamics | UI-Oceanus: scaling GUI agents with synthetic environmental dynamics. | world model, predictive model | | |
| 12 | Mitigating Reward Hacking in RLHF via Advantage Sign Robustness | Proposes the SignCert-PO algorithm, mitigating reward hacking in RLHF via advantage-sign robustness. | reinforcement learning, RLHF | | |
| 13 | LLM Reasoning with Process Rewards for Outcome-Guided Steps | Proposes the PROGRS framework to optimize process rewards for mathematical reasoning. | reinforcement learning, large language model | | |
| 14 | Self-Distilled RLVR | Proposes RLSD, combining self-distillation with RLVR to improve training stability and raise the convergence ceiling of reinforcement learning. | reinforcement learning, distillation, privileged information | | |
| 15 | Local Reinforcement Learning with Action-Conditioned Root Mean Squared Q-Functions | Proposes a local reinforcement learning method based on action-conditioned root-mean-squared Q-functions, improving backpropagation-free RL algorithms. | reinforcement learning | | |
| 16 | Goal-Driven Reward by Video Diffusion Models for Reinforcement Learning | Proposes rewards derived from video diffusion models to address the difficulty of reward design in reinforcement learning. | reinforcement learning | | |
| 17 | Pushing the Limits of Distillation-Based Continual Learning via Classifier-Proximal Lightweight Plugins | Proposes DLC, a plugin-style continual learning framework that addresses the stability-plasticity dilemma in distillation-based methods. | distillation | | |
| 18 | CRISP: Compressed Reasoning via Iterative Self-Policy Distillation | CRISP: compresses the reasoning process via iterative self-policy distillation, improving model efficiency and accuracy. | distillation | | |
| 19 | ERPO: Token-Level Entropy-Regulated Policy Optimization for Large Reasoning Models | ERPO: token-level entropy-regulated policy optimization for large reasoning models. | reinforcement learning, large language model | | |
🔬 Pillar 9: Embodied Foundation Models (18 papers)
🔬 Pillar 1: Robot Control (7 papers)
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 45 | Privacy-Accuracy Trade-offs in High-Dimensional LASSO under Perturbation Mechanisms | Studies privacy-accuracy trade-offs under perturbation mechanisms for high-dimensional LASSO. | AMP | | |