cs.LG (2026-03-05)

📊 32 papers in total | 🔗 1 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (17 🔗1) · Pillar 9: Embodied Foundation Models (8) · Pillar 1: Robot Control (4) · Pillar 8: Physics-based Animation (2) · Pillar 5: Interaction & Reaction (1)

🔬 Pillar 2: RL & Architecture (17 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | FedAFD: Multimodal Federated Learning via Adversarial Fusion and Distillation | Proposes FedAFD, improving multimodal federated learning through adversarial fusion and distillation | distillation, multimodal | |
| 2 | Diffusion Policy through Conditional Proximal Policy Optimization | Proposes a diffusion policy based on conditional proximal policy optimization, improving multimodal behavior modeling in reinforcement learning | reinforcement learning, diffusion policy, multimodal | |
| 3 | BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning | BandPO bridges trust regions and ratio clipping via probability-aware bounds, improving the stability of LLM reinforcement learning | reinforcement learning, PPO, large language model | |
| 4 | WavSLM: Single-Stream Speech Language Modeling via WavLM Distillation | WavSLM: single-stream speech language modeling via WavLM distillation | distillation, large language model | |
| 5 | Decoupling Task and Behavior: A Two-Stage Reward Curriculum in Reinforcement Learning for Robotics | Proposes a two-stage reward curriculum to address reward design in robotic reinforcement learning | reinforcement learning, deep reinforcement learning | |
| 6 | Competitive Multi-Operator Reinforcement Learning for Joint Pricing and Fleet Rebalancing in AMoD Systems | Proposes competitive multi-agent reinforcement learning for joint pricing and fleet rebalancing in AMoD systems | reinforcement learning, policy learning | |
| 7 | Probabilistic Dreaming for World Models | Proposes a probabilistic-dreaming world model, improving sample efficiency and robustness in reinforcement learning | world model, dreamer | |
| 8 | Latent Wasserstein Adversarial Imitation Learning | Proposes LWAIL, using a dynamics-aware latent-space Wasserstein distance for efficient state-based imitation learning | imitation learning | |
| 9 | On-Policy Self-Distillation for Reasoning Compression | Proposes OPSDC, compressing reasoning models via self-distillation to improve accuracy while reducing token usage | distillation | |
| 10 | Reward-Conditioned Reinforcement Learning | Proposes reward-conditioned reinforcement learning, enabling a single agent to adapt to multiple reward objectives | reinforcement learning | |
| 11 | $\nabla$-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space | Proposes $\nabla$-Reasoner, optimizing LLM reasoning via latent-space gradient descent to improve mathematical reasoning | reinforcement learning, large language model | |
| 12 | Osmosis Distillation: Model Hijacking with the Fewest Samples | Proposes the Osmosis Distillation attack, hijacking models with very few samples | distillation | |
| 13 | Why Is RLHF Alignment Shallow? A Gradient Analysis | A gradient analysis reveals the shallowness of RLHF alignment and proposes a recovery-penalty-based deep-alignment method | RLHF | |
| 14 | Distributional Reinforcement Learning with Information Bottleneck for Uncertainty-Aware DRAM Equalization | Proposes distributional reinforcement learning with an information bottleneck for uncertainty-aware DRAM equalization | reinforcement learning | |
| 15 | MIRACL: A Diverse Meta-Reinforcement Learning for Multi-Objective Multi-Echelon Combinatorial Supply Chain Optimisation | Proposes the MIRACL framework for few-shot generalization in multi-objective, multi-echelon combinatorial supply chain optimization | reinforcement learning | |
| 16 | Reinforcement Learning for Power-Flow Network Analysis | Proposes a reinforcement-learning method for power-flow network analysis that finds network parameters admitting multiple equilibria | reinforcement learning | |
| 17 | A Novel Hybrid Heuristic-Reinforcement Learning Optimization Approach for a Class of Railcar Shunting Problems | Proposes a hybrid heuristic-reinforcement learning algorithm for railcar shunting-yard optimization problems | reinforcement learning | |

🔬 Pillar 9: Embodied Foundation Models (8 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 18 | POET-X: Memory-efficient LLM Training by Scaling Orthogonal Transformation | POET-X: memory-efficient large language model training by scaling orthogonal transformations | large language model | |
| 19 | Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation | Uses censored LLMs as a natural testbed for exploring secret-knowledge elicitation methods | large language model | |
| 20 | Aura: Universal Multi-dimensional Exogenous Integration for Aviation Time Series | Aura: a universal framework for integrating multi-dimensional exogenous information into aviation time series | multimodal | |
| 21 | RepoLaunch: Automating Build&Test Pipeline of Code Repositories on ANY Language and ANY Platform | RepoLaunch: automates build-and-test pipelines for code repositories in any language on any platform | large language model | |
| 22 | U-Parking: Distributed UWB-Assisted Autonomous Parking System with Robust Localization and Intelligent Planning | U-Parking: a distributed UWB-assisted autonomous parking system with robust localization and intelligent planning | large language model | |
| 23 | SLO-Aware Compute Resource Allocation for Prefill-Decode Disaggregated LLM Inference | Proposes an SLO-aware compute resource allocation method for prefill-decode disaggregated LLM inference | large language model | |
| 24 | Parallelization Strategies for Dense LLM Deployment: Navigating Through Application-Specific Tradeoffs and Bottlenecks | Studies parallelization strategies for dense LLM deployment, navigating latency-throughput tradeoffs and bottlenecks | large language model | |
| 25 | Aligning the True Semantics: Constrained Decoupling and Distribution Sampling for Cross-Modal Alignment | Proposes CDDS, achieving more precise cross-modal semantic alignment via constrained decoupling and distribution sampling | multimodal | |
🔬 Pillar 1: Robot Control (4 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 26 | Good-Enough LLM Obfuscation (GELO) | Proposes GELO, protecting LLM inference privacy on shared accelerators via dynamic obfuscation of hidden states | MPC, large language model | |
| 27 | Good-Enough LLM Obfuscation (GELO) | GELO: a lightweight LLM obfuscation method protecting prompt privacy on shared accelerators | MPC, large language model | |
| 28 | Bias In, Bias Out? Finding Unbiased Subnetworks in Vanilla Models | Proposes the BISE method, extracting unbiased subnetworks from vanilla-trained models to improve fairness | manipulation | |
| 29 | Identifying Adversary Characteristics from an Observed Attack | Proposes a domain-agnostic framework that identifies adversary characteristics from an observed attack to improve defense | manipulation | |

🔬 Pillar 8: Physics-based Animation (2 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 30 | Spatiotemporal Heterogeneity of AI-Driven Traffic Flow Patterns and Land Use Interaction: A GeoAI-Based Analysis of Multimodal Urban Mobility | Proposes a hybrid GeoAI framework for modeling the spatiotemporal heterogeneity of urban traffic flow and its interaction with land use | spatiotemporal, multimodal | |
| 31 | On the Value of Tokeniser Pretraining in Physics Foundation Models | Proposes a tokeniser pretraining method for physics foundation models, improving simulation accuracy and efficiency | spatiotemporal, foundation model | |

🔬 Pillar 5: Interaction & Reaction (1 paper)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 32 | Balancing Privacy-Quality-Efficiency in Federated Learning through Round-Based Interleaving of Protection Techniques | Proposes the Alt-FL framework, balancing privacy, quality, and efficiency in federated learning through round-based interleaving of protection techniques | OMOMO | |
