cs.LG (2025-09-27)

📊 39 papers in total | 🔗 10 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (20 🔗4) · Pillar 9: Embodied Foundation Models (16 🔗6) · Pillar 1: Robot Control (1) · Pillar 4: Generative Motion (1) · Pillar 5: Interaction & Reaction (1)

🔬 Pillar 2: RL & Architecture (20 papers)

| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Decipher the Modality Gap in Multimodal Contrastive Learning: From Convergent Representations to Pairwise Alignment | Theoretical analysis of the modality gap in multimodal contrastive learning, showing that dimensional collapse is the root cause and proposing an alignment scheme. | contrastive learning, multimodal | |
| 2 | SPEC-RL: Accelerating On-Policy Reinforcement Learning via Speculative Rollouts | SPEC-RL accelerates on-policy reinforcement learning with speculative rollouts, improving the efficiency of LLM reasoning (a generic PPO objective sketch follows this table). | reinforcement learning, PPO, large language model | |
| 3 | General Exploratory Bonus for Optimistic Exploration in RLHF | Proposes a General Exploratory Bonus (GEB) to correct the bias in optimistic exploration for RLHF. | reinforcement learning, RLHF, large language model | |
| 4 | Causally-Enhanced Reinforcement Policy Optimization | Proposes Causally-Enhanced Policy Optimization (CE-PO) to improve the causal consistency and robustness of LLM reasoning. | PPO, reward shaping, large language model | |
| 5 | Knowledge distillation through geometry-aware representational alignment | Proposes knowledge distillation via geometry-aware representational alignment, improving language model performance. | distillation, instruction following | |
| 6 | CrystalGym: A New Benchmark for Materials Discovery Using Reinforcement Learning | Introduces CrystalGym, a new benchmark environment for materials discovery with reinforcement learning. | reinforcement learning, large language model | |
| 7 | Solve Smart, Not Often: Policy Learning for Costly MILP Re-solving | Proposes the POC framework, using policy learning to decide when to re-solve costly MILPs. | reinforcement learning, policy learning | |
| 8 | Unleashing Flow Policies with Distributional Critics | Proposes a Distributional Flow Critic (DFC) to strengthen flow policies in offline reinforcement learning. | reinforcement learning, flow matching, multimodal | |
| 9 | Factor Decorrelation Enhanced Data Removal from Deep Predictive Models | Proposes factor-decorrelation-enhanced data removal, improving deep predictive models under distribution shift. | predictive model | |
| 10 | LOTFormer: Doubly-Stochastic Linear Attention via Low-Rank Optimal Transport | LOTFormer achieves doubly-stochastic linear attention via low-rank optimal transport, improving long-sequence modeling efficiency. | linear attention | |
| 11 | Flow Matching for Robust Simulation-Based Inference under Model Misspecification | Proposes the FMCPE framework, using flow matching to make simulation-based inference more robust under model misspecification. | flow matching | |
| 12 | LLM Interpretability with Identifiable Temporal-Instantaneous Representation | Proposes an LLM interpretability framework with identifiable temporal-instantaneous representations, improving concept-relation discovery. | representation learning, large language model | |
| 13 | Two-Scale Latent Dynamics for Recurrent-Depth Transformers | Proposes recurrent-depth Transformers with two-scale latent dynamics, improving compute efficiency and performance. | latent dynamics | |
| 14 | Towards Monotonic Improvement in In-Context Reinforcement Learning | Proposes context-value-guided in-context RL (ICRL), tackling the problem of monotonic improvement in ICRL. | reinforcement learning | |
| 15 | Learning without Global Backpropagation via Synergistic Information Distillation | Proposes the Synergistic Information Distillation (SID) framework, addressing the scalability bottleneck of global backpropagation in deep learning. | distillation | |
| 16 | C$^2$GSPG: Confidence-calibrated Group Sequence Policy Gradient towards Self-aware Reasoning | Proposes C$^2$GSPG, countering overconfidence in RL-trained reasoning models and improving self-aware reasoning. | reinforcement learning, large language model | |
| 17 | Impute-MACFM: Imputation based on Mask-Aware Flow Matching | Proposes Impute-MACFM, mask-aware flow matching for more robust and efficient tabular data imputation, especially for longitudinal data. | flow matching | |
| 18 | Tracing the Representation Geometry of Language Models from Pretraining to Post-training | Traces representation geometry from pretraining to post-training to reveal how complex capabilities emerge in language models. | DPO, large language model | |
| 19 | From Noise to Laws: Regularized Time-Series Forecasting via Denoised Dynamic Graphs | PRISM regularizes time-series forecasting with denoised dynamic graphs for stable long-horizon prediction. | MAE, physically plausible | |
| 20 | Trust Region Reward Optimization and Proximal Inverse Reward Optimization Algorithm | Proposes the Trust Region Reward Optimization (TRRO) framework, stabilizing the joint learning of reward and policy in inverse reinforcement learning. | reinforcement learning, inverse reinforcement learning | |
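
Several entries above (#2, #4, #16, #20) build on PPO-style policy optimization. As generic background only, here is a minimal sketch of the standard clipped PPO surrogate loss in PyTorch; the function name and tensor shapes are illustrative assumptions, and nothing here is taken from the listed papers.

```python
import torch

def ppo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard clipped PPO surrogate objective, returned as a loss to minimize.

    logp_new / logp_old: per-action log-probabilities under the current and
    rollout (behaviour) policies; advantages: advantage estimates for the
    same actions. All are 1-D tensors of equal length.
    """
    ratio = torch.exp(logp_new - logp_old)                  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()            # negate to maximize the surrogate
```

Minimizing this loss with the advantages held fixed gives the trust-region-like update that many of the RLHF and reasoning papers above take as their starting point.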

🔬 Pillar 9: Embodied Foundation Models (16 papers)

| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 21 | Multifractal features of multimodal cardiac signals: Nonlinear dynamics of exercise recovery | Uses multifractal features of multimodal cardiac signals to study the nonlinear dynamics of post-exercise cardiac recovery. | multimodal | |
| 22 | Quant-dLLM: Post-Training Extreme Low-Bit Quantization for Diffusion Large Language Models | Quant-dLLM is a post-training extreme low-bit quantization framework for diffusion large language models (a generic quantization sketch follows this table). | large language model | |
| 23 | Demystifying Network Foundation Models | Uses representation analysis to reveal the intrinsic knowledge and limitations of network foundation models. | foundation model | |
| 24 | PT$^2$-LLM: Post-Training Ternarization for Large Language Models | PT$^2$-LLM is a post-training ternarization framework for large language models, enabling efficient compression and acceleration. | large language model | |
| 25 | GuardNet: Graph-Attention Filtering for Jailbreak Defense in Large Language Models | Proposes GuardNet, defending large language models against jailbreak attacks via graph-attention filtering. | large language model | |
| 26 | SDQ-LLM: Sigma-Delta Quantization for 1-bit LLMs of any size | SDQ-LLM applies Sigma-Delta quantization to LLMs of any size for efficient 1-bit quantization. | large language model | |
| 27 | Memory-Efficient Fine-Tuning via Low-Rank Activation Compression | Proposes LoRAct, efficient fine-tuning via low-rank activation compression that significantly reduces memory usage. | foundation model | |
| 28 | PATCH: Learnable Tile-level Hybrid Sparsity for LLMs | PATCH is a learnable tile-level hybrid-sparsity framework for LLMs, balancing accuracy and speedup. | large language model | |
| 29 | Emergence of Superposition: Unveiling the Training Dynamics of Chain of Continuous Thought | Reveals how superposition emerges in chains of continuous thought by analyzing Transformer training dynamics. | large language model | |
| 30 | Adaptive Token-Weighted Differential Privacy for LLMs: Not All Tokens Require Equal Protection | Proposes Adaptive Token-Weighted Differential Privacy (ATDP), speeding up differentially private LLM training while better protecting sensitive information. | large language model | |
| 31 | Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization | Proposes MR-GPTQ, adapting the GPTQ algorithm to the characteristics of FP4 quantization to improve LLM inference performance. | large language model | |
| 32 | ZeroSiam: An Efficient Siamese for Test-Time Entropy Optimization without Collapse | Proposes ZeroSiam, a Siamese architecture for test-time entropy optimization that avoids model collapse. | large language model | |
| 33 | Decision Potential Surface: A Theoretical and Practical Approximation of LLM's Decision Boundary | Proposes the Decision Potential Surface (DPS) as an approximation of a large language model's decision boundary. | large language model | |
| 34 | Critique to Verify: Accurate and Honest Test-Time Scaling with RL-Trained Verifiers | Proposes the Mirror-Critique framework, training verifiers with reinforcement learning to make LLM test-time reasoning more accurate and reliable. | large language model | |
| 35 | TimeExpert: Boosting Long Time Series Forecasting with Temporal Mix of Experts | Proposes the Temporal Mix of Experts (TMOE) mechanism, improving long-horizon time-series forecasting accuracy. | TAMP | |
| 36 | RHYTHM: Reasoning with Hierarchical Temporal Tokenization for Human Mobility | Proposes the RHYTHM framework to handle complex dependencies in human mobility prediction. | large language model | |
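
Entries #22, #24, #26, and #31 all concern post-training low-bit quantization of LLM weights. As a baseline reference only (not any listed paper's method), the sketch below implements plain per-channel round-to-nearest symmetric quantization in PyTorch; GPTQ-style methods refine this by compensating rounding error with activation statistics, and ternary or FP4 schemes change the codebook.

```python
import torch

def quantize_rtn(weight, n_bits=4):
    """Per-output-channel symmetric round-to-nearest (RTN) weight quantization.

    weight: (out_features, in_features) matrix. Returns the dequantized weights
    and the per-channel scales. Function name and API are illustrative.
    """
    qmax = 2 ** (n_bits - 1) - 1                        # e.g. 7 for signed 4-bit
    scale = weight.abs().amax(dim=1, keepdim=True) / qmax
    scale = scale.clamp(min=1e-8)                       # guard against all-zero rows
    q = torch.round(weight / scale).clamp(-qmax - 1, qmax)
    return q * scale, scale
```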

🔬 Pillar 1: Robot Control (1 paper)

| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 37 | WirelessMathLM: Teaching Mathematical Reasoning for LLMs in Wireless Communications with Reinforcement Learning | WirelessMathLM uses reinforcement learning to improve LLM mathematical reasoning for wireless communications. | manipulation, reinforcement learning, large language model | |

🔬 Pillar 4: Generative Motion (1 paper)

| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 38 | CREPE: Controlling Diffusion with Replica Exchange | CREPE controls diffusion models with replica exchange, enabling flexible inference-time guidance (a standard CFG snippet follows this table). | classifier-free guidance | |
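
Entry #38 is tagged classifier-free guidance; for context on that tag only (not CREPE's replica-exchange scheme), the snippet below shows the standard CFG combination of conditional and unconditional noise predictions at inference time. Names and shapes are illustrative.

```python
import torch

def cfg_noise_estimate(eps_cond, eps_uncond, guidance_scale=7.5):
    """Classifier-free guidance: combine conditional and unconditional noise
    predictions from the same diffusion model at a given timestep."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Example with dummy predictions for a 4-sample batch of 8-dim latents.
eps_c, eps_u = torch.randn(4, 8), torch.randn(4, 8)
guided = cfg_noise_estimate(eps_c, eps_u, guidance_scale=3.0)
```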

🔬 Pillar 5: Interaction & Reaction (1 paper)

| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 39 | FedBit: Accelerating Privacy-Preserving Federated Learning via Bit-Interleaved Packing and Cross-Layer Co-Design | FedBit accelerates privacy-preserving federated learning via bit-interleaved packing and cross-layer co-design. | OMOMO | |
