cs.LG (2025-09-26)

📊 39 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (22, 🔗 1) · Pillar 9: Embodied Foundation Models (11, 🔗 2) · Pillar 4: Generative Motion (3) · Pillar 1: Robot Control (3, 🔗 1)

🔬 Pillar 2: RL Algorithms & Architecture (22 papers)

| # | Title | One-Line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Rethinking Large Language Model Distillation: A Constrained Markov Decision Process Perspective | Proposes an LLM distillation method based on a constrained Markov decision process | reinforcement learning, distillation, large language model | |
| 2 | Aurora: Towards Universal Generative Multimodal Time Series Forecasting | Aurora: a foundation model for universal generative multimodal time series forecasting | flow matching, distillation, foundation model | |
| 3 | Learning the Neighborhood: Contrast-Free Multimodal Self-Supervised Molecular Graph Pretraining | C-FREE: a contrast-free multimodal self-supervised molecular graph pretraining method that fuses 2D topological and 3D structural information | representation learning, multimodal | |
| 4 | SpinGPT: A Large-Language-Model Approach to Playing Poker Correctly | SpinGPT: a large-language-model approach to Texas Hold'em poker | reinforcement learning, large language model | |
| 5 | Enriching Knowledge Distillation with Intra-Class Contrastive Learning | Proposes a knowledge distillation method based on intra-class contrastive learning that enriches the information carried by soft labels | contrastive learning, distillation | |
| 6 | Reinforcement Learning with Discrete Diffusion Policies for Combinatorial Action Spaces | Proposes a reinforcement learning method with discrete diffusion policies to tackle combinatorial action spaces | reinforcement learning, diffusion policy | |
| 7 | Adaptive Margin RLHF via Preference over Preferences | Proposes DPO-PoP, which uses preference-over-preference information to adaptively adjust margins and improve RLHF performance | reinforcement learning, RLHF, DPO | |
| 8 | Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning | SPEAR: an agentic reinforcement learning method based on self-imitation learning and progressive exploration | reinforcement learning, imitation learning, reward shaping | |
| 9 | Linear Causal Representation Learning by Topological Ordering, Pruning, and Disentanglement | Proposes a linear causal representation learning method based on topological ordering, pruning, and disentanglement | representation learning, large language model | |
| 10 | Context and Diversity Matter: The Emergence of In-Context Learning in World Models | Proposes the In-Context Environment Learning (ICEL) framework to improve world-model adaptation to unseen environments | world model, embodied AI | |
| 11 | Universal Inverse Distillation for Matching Models with Real-Data Supervision (No GANs) | Proposes RealUID: a universal GAN-free inverse distillation framework for matching models that leverages real data to accelerate generation | flow matching, distillation | |
| 12 | In-Context Learning can Perform Continual Learning Like Humans | Proposes In-Context Continual Learning (ICCL), achieving human-like long-term memory and cross-task knowledge accumulation | Mamba, linear attention, large language model | |
| 13 | Adaptive Dual-Mode Distillation with Incentive Schemes for Scalable, Heterogeneous Federated Learning on Non-IID Data | Proposes adaptive dual-mode distillation with incentive schemes to make heterogeneous federated learning scalable on non-IID data | distillation | |
| 14 | RLP: Reinforcement as a Pretraining Objective | Proposes RLP: a method that uses reinforcement learning as a pretraining objective to improve model reasoning | reinforcement learning, chain-of-thought | |
| 15 | A Theoretical Analysis of Discrete Flow Matching Generative Models | Provides a theoretical analysis of discrete flow matching generative models and proves their convergence | flow matching | |
| 16 | Effective Policy Learning for Multi-Agent Online Coordination Beyond Submodular Objectives | Proposes the MA-SPL and MA-MPL algorithms for multi-agent online coordination | policy learning | |
| 17 | EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning | Proposes the EPO algorithm to address exploration-exploitation collapse in multi-turn sparse-reward RL for LLM agents | reinforcement learning | |
| 18 | From Parameters to Behavior: Unsupervised Compression of the Policy Space | Proposes an unsupervised method for compressing the policy space to improve deep reinforcement learning efficiency | reinforcement learning, deep reinforcement learning, DRL | |
| 19 | Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning | Proposes MASA self-alignment reinforcement learning, improving reasoning models' metacognition and generalization | reinforcement learning | |
| 20 | Fairness-Aware Reinforcement Learning (FAReL): A Framework for Transparent and Balanced Sequential Decision-Making | Proposes the FAReL framework to balance performance and fairness in reinforcement learning, applied to hiring and fraud detection | reinforcement learning | |
| 21 | Overclocking Electrostatic Generative Models | Proposes inverse Poisson flow matching to accelerate electrostatic generative models | flow matching, distillation | |
| 22 | Triple-BERT: Do We Really Need MARL for Order Dispatch on Ride-Sharing Platforms? | Triple-BERT: a single-agent RL method for ride-sharing order dispatch that outperforms multi-agent RL | reinforcement learning, TD3 | |

🔬 Pillar 9: Embodied Foundation Models (11 papers)

| # | Title | One-Line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 23 | Fine-Grained Uncertainty Decomposition in Large Language Models: A Spectral Approach | Proposes Spectral Uncertainty for fine-grained uncertainty decomposition in large language models | large language model | |
| 24 | Erase or Hide? Suppressing Spurious Unlearning Neurons for Robust Unlearning | Proposes the Ssiuu method, which suppresses spurious unlearning neurons for robust unlearning in language models | large language model, instruction following | |
| 25 | OptiMind: Teaching LLMs to Think Like Optimization Experts | OptiMind: teaches LLMs to think like optimization experts, improving mixed-integer linear programming modeling accuracy | large language model | |
| 26 | SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights | SINQ: Sinkhorn-normalized quantization for calibration-free low-precision LLM weights | large language model | |
| 27 | Boundary on the Table: Efficient Black-Box Decision-Based Attacks for Structured Data | A black-box decision-based adversarial attack method for tabular data that efficiently attacks structured-data models | large language model | |
| 28 | What Do They Fix? LLM-Aided Categorization of Security Patches for Critical Memory Bugs | DUALLM: uses LLMs to help identify security patches for critical memory bugs in the Linux kernel | large language model | |
| 29 | OFMU: Optimization-Driven Framework for Machine Unlearning | Proposes OFMU: an optimization-driven machine unlearning framework that improves both forgetting quality and model utility | large language model | |
| 30 | Investigating Faithfulness in Large Audio Language Models | Shows that chain-of-thought (CoT) reasoning in large audio language models (LALMs) is faithful to some extent | chain-of-thought | |
| 31 | Stochastic activations | Proposes stochastic activation functions, speeding up LLM inference and increasing the diversity of generated text | large language model | |
| 32 | HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space | HEAPr: a Hessian-based method for efficient atomic expert pruning in output space | large language model | |
| 33 | Lightweight error mitigation strategies for post-training N:M activation sparsity in LLMs | Proposes lightweight error mitigation strategies for post-training N:M activation sparsity in LLMs, improving inference efficiency | large language model | |

🔬 Pillar 4: Generative Motion (3 papers)

| # | Title | One-Line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 34 | MoveFM-R: Advancing Mobility Foundation Models via Language-driven Semantic Reasoning | MoveFM-R: improves mobility foundation models via language-driven semantic reasoning | physically plausible, large language model, foundation model | |
| 35 | Physically Plausible Multi-System Trajectory Generation and Symmetry Discovery | Proposes SPS-GAN for multi-system trajectory generation and symmetry discovery, requiring no prior knowledge and generalizing to unseen parameters | physically plausible | |
| 36 | Reversible GNS for Dissipative Fluids with Consistent Bidirectional Dynamics | Proposes a reversible graph network simulator for consistent bidirectional dynamics of dissipative fluids | physically plausible | |

🔬 Pillar 1: Robot Control (3 papers)

| # | Title | One-Line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 37 | ReLAM: Learning Anticipation Model for Rewarding Visual Robotic Manipulation | Proposes ReLAM, which learns an anticipation model to generate rewards for visual robotic manipulation | manipulation, reinforcement learning, reward design | |
| 38 | A Framework for Scalable Heterogeneous Multi-Agent Adversarial Reinforcement Learning in IsaacLab | Extends the IsaacLab framework to enable scalable training for heterogeneous multi-agent adversarial reinforcement learning | manipulation, reinforcement learning | |
| 39 | Observation-Free Attacks on Online Learning to Rank | Proposes an observation-free attack framework against online learning to rank that promotes target items and induces linear regret | manipulation | |
