cs.LG (2025-09-30)

📊 69 papers in total | 🔗 16 with code

🎯 Navigate by Interest Area

Pillar 9: Embodied Foundation Models (37 🔗10) · Pillar 2: RL & Architecture (27 🔗5) · Pillar 1: Robot Control (2 🔗1) · Pillar 8: Physics-based Animation (2) · Pillar 4: Generative Motion (1)

🔬 Pillar 9: Embodied Foundation Models (37 papers)

# | Title | One-line takeaway | Tags | 🔗
1 | DecepChain: Inducing Deceptive Reasoning in Large Language Models | DecepChain: a backdoor attack that induces deceptive chain-of-thought reasoning in large language models | large language model, chain-of-thought
2 | MultiFair: Multimodal Balanced Fairness-Aware Medical Classification with Dual-Level Gradient Modulation | Proposes MultiFair, addressing imbalance and fairness in multimodal medical classification via dual-level gradient modulation | multimodal
3 | Free Draft-and-Verification: Toward Lossless Parallel Decoding for Diffusion Large Language Models | Proposes FreeDave, a lossless parallel decoding algorithm that substantially speeds up inference for diffusion large language models | large language model
4 | Large Language Models Inference Engines based on Spiking Neural Networks | Proposes NeurTransformer, accelerating Transformer inference with spiking neural networks | large language model
5 | AccidentBench: Benchmarking Multimodal Understanding and Reasoning in Vehicle Accidents and Beyond | AccidentBench: a large-scale multimodal benchmark for understanding and reasoning in vehicle accidents and other safety-critical scenarios | multimodal
6 | Memory-Driven Self-Improvement for Decision Making with Large Language Models | Proposes a memory-driven self-improvement framework that boosts LLM performance on sequential decision-making tasks | large language model
7 | NeuroTTT: Bridging Pretraining-Downstream Task Misalignment in EEG Foundation Models via Test-Time Training | Proposes NeuroTTT, using test-time training to bridge the misalignment between EEG foundation-model pretraining and downstream tasks | foundation model
8 | MIDAS: Misalignment-based Data Augmentation Strategy for Imbalanced Multimodal Learning | Proposes MIDAS, a misalignment-based data augmentation strategy for imbalanced multimodal learning | multimodal
9 | Kairos: Towards Adaptive and Generalizable Time Series Foundation Models | Kairos: a framework toward adaptive, generalizable time-series foundation models | foundation model
10 | Guiding Mixture-of-Experts with Temporal Multimodal Interactions | Proposes an MoE architecture guided by temporal multimodal interactions, improving multimodal model performance and interpretability | multimodal
11 | Layer-wise dynamic rank for compressing large language models | Proposes D-Rank, a layer-wise dynamic rank allocation framework for compressing large language models | large language model
12 | Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training | Demystifies LLM visual priors: visual perception and reasoning abilities acquired from language-only pretraining | large language model, multimodal
13 | ACT: Agentic Classification Tree | Proposes the Agentic Classification Tree to make agentic AI decision-making transparent | large language model, chain-of-thought
14 | Attribution-Guided Decoding | Proposes Attribution-Guided Decoding (AGD), improving LLM instruction following and factual accuracy | large language model, instruction following
15 | Expert Merging: Model Merging with Unsupervised Expert Alignment and Importance-Guided Layer Chunking | Proposes Expert Merging, fusing models via unsupervised expert alignment and importance-guided layer chunking | large language model, multimodal
16 | Adaptive and Resource-efficient Agentic AI Systems for Mobile and Embedded Devices: A Survey | A survey of adaptive, resource-efficient agentic AI systems for mobile and embedded devices | foundation model, multimodal
17 | LLM-Generated Samples for Android Malware Detection | Uses LLM-generated samples to augment Android malware detection, improving performance in data-scarce settings | large language model
18 | In-Context Curiosity: Distilling Exploration for Decision-Pretrained Transformers on Bandit Tasks | Proposes prediction-error-driven curiosity for offline pretraining, improving the generalization of decision-pretrained Transformers on bandit tasks | large language model
19 | Which Programming Language and Model Work Best With LLM-as-a-Judge For Code Retrieval? | Uses large language models to improve code retrieval and comment generation | large language model
20 | From Trace to Line: LLM Agent for Real-World OSS Vulnerability Localization | Proposes T2L-Agent for precise line-level localization and repair of open-source software vulnerabilities | large language model
21 | DiSC-AMC: Token- and Parameter-Efficient Discretized Statistics In-Context Automatic Modulation Classification | DiSC-AMC: token- and parameter-efficient in-context learning over discretized statistics for automatic modulation classification | large language model
22 | Beyond Token Probes: Hallucination Detection via Activation Tensors with ACT-ViT | Proposes ACT-ViT, detecting hallucinations in large language models from activation tensors | large language model
23 | Predicting Effects, Missing Distributions: Evaluating LLMs as Human Behavior Simulators in Operations Management | Evaluates LLMs as human-behavior simulators in operations management: they predict effects but miss distributions | chain-of-thought
24 | The Pitfalls of KV Cache Compression | Exposes how KV cache compression fails in multi-instruction settings and proposes mitigations (a toy eviction sketch follows this table) | instruction following
25 | Thoughtbubbles: an Unsupervised Method for Parallel Thinking in Latent Space | Proposes Thoughtbubbles, an unsupervised Transformer method for parallel adaptive computation in latent space | chain-of-thought
26 | LoRAFusion: Efficient LoRA Fine-Tuning for LLMs | LoRAFusion: an efficient LoRA fine-tuning system for LLMs, optimizing performance via kernel fusion and adaptive batching (the LoRA adapter it builds on is sketched after this table) | large language model
27 | GRPO-$λ$: Credit Assignment improves LLM Reasoning | GRPO-λ: improving LLM reasoning through better credit assignment (the vanilla GRPO advantage it modifies is sketched after this table) | large language model
28 | PrunedLoRA: Robust Gradient-Based structured pruning for Low-rank Adaptation in Fine-tuning | PrunedLoRA: robust gradient-based structured pruning of low-rank adapters during fine-tuning | large language model
29 | Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls | Why Transformers fail to learn multiplication: reverse-engineering reveals long-range dependency pitfalls | chain-of-thought
30 | Estimating Dimensionality of Neural Representations from Finite Samples | Proposes a bias-corrected estimator that removes the sample-size dependence of neural-representation dimensionality estimates | large language model
31 | TASP: Topology-aware Sequence Parallelism | TASP: topology-aware sequence parallelism, improving communication efficiency for long-context LLMs on modern accelerators | large language model
32 | AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size | AdaBlock-dLLM: semantic-aware diffusion LLM inference via adaptive block sizes | large language model
33 | Are neural scaling laws leading quantum chemistry astray? | Questions neural scaling laws in quantum chemistry: simply scaling up models and data does not guarantee reliability | foundation model
34 | Beyond Linear Probes: Dynamic Safety Monitoring for Language Models | Proposes truncated polynomial classifiers for dynamic safety monitoring of language models, improving both efficiency and safety | large language model
35 | Muon Outperforms Adam in Tail-End Associative Memory Learning | Muon outperforms Adam on tail-end associative memory learning, with the advantage explained via singular-spectrum analysis (Muon's orthogonalized update is sketched after this table) | large language model
36 | Better Privilege Separation for Agents by Restricting Data Types | Proposes privilege separation via restricted data types, a systematic defense against prompt injection in LLM agents | large language model
37 | Rotation Control Unlearning: Quantifying and Controlling Continuous Unlearning for LLM with The Cognitive Rotation Space | Proposes Rotation Control Unlearning (RCU), addressing catastrophic forgetting in continual unlearning for LLMs | large language model
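
For #24, a minimal sketch of what "KV cache compression" typically means in practice: a sliding-window eviction policy that keeps a few attention-sink tokens plus the most recent entries. This is a generic illustration of the mechanism whose pitfalls the paper studies, not the paper's own method; the function name and the sink/recent split are assumptions for illustration.

```python
import torch

def evict_kv(keys: torch.Tensor, values: torch.Tensor, budget: int, sink: int = 4):
    """Toy KV-cache eviction (hypothetical helper): keep the first `sink`
    tokens plus the most recent entries, dropping everything in between.
    Exactly this kind of eviction can silently discard an early instruction
    that a later query still depends on -- the failure mode the paper examines."""
    seq_len = keys.shape[0]
    if seq_len <= budget:
        return keys, values
    recent = budget - sink
    idx = torch.cat([torch.arange(sink), torch.arange(seq_len - recent, seq_len)])
    return keys[idx], values[idx]

# A (seq_len, num_heads, head_dim) cache compressed from 100 to 32 entries.
k, v = torch.randn(100, 8, 64), torch.randn(100, 8, 64)
k2, v2 = evict_kv(k, v, budget=32)
print(k2.shape)  # torch.Size([32, 8, 64])
```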
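#26 (LoRAFusion) and #28 (PrunedLoRA) both operate on LoRA adapters. As context, a minimal sketch of the standard LoRA layer (Hu et al., 2021) they start from: a frozen base weight plus a trainable low-rank update. Neither paper's actual contribution (kernel fusion, structured pruning) is reproduced here.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Standard LoRA adapter: y = W x + (alpha / r) * B (A x), with the
    pretrained weight W frozen and only the low-rank factors A, B trained."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained layer
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: adapter starts as a no-op
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(nn.Linear(512, 512))
print(layer(torch.randn(2, 512)).shape)  # torch.Size([2, 512])
```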
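#27 modifies credit assignment in GRPO. For reference, a sketch of the vanilla group-relative advantage that GRPO assigns uniformly to every token of a sampled completion; GRPO-λ's redistribution of this credit is not reproduced here.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Vanilla GRPO advantage: normalize each completion's scalar reward
    against the other completions sampled for the same prompt.
    rewards: (num_prompts, group_size)."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# 2 prompts, 4 sampled answers each; 1.0 = correct, 0.0 = wrong.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 0.0, 1.0]])
print(group_relative_advantages(rewards))
```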
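#35 compares Muon with Adam. Muon's defining step replaces Adam's per-coordinate scaling with a matrix-level move: the momentum matrix is approximately orthogonalized via a quintic Newton-Schulz iteration before being applied. The sketch below follows the public Muon reference implementation; the coefficients are quoted from it and should be treated as an assumption, and the paper's singular-spectrum analysis is not reproduced.

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Push a 2-D momentum/update matrix toward the nearest orthogonal
    matrix, as Muon does before the parameter update. Quintic iteration
    with coefficients from the public Muon reference implementation."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + 1e-7)  # scale so all singular values are <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T  # iterate on the wide orientation
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

update = newton_schulz_orthogonalize(torch.randn(256, 512))
print(update.shape)  # torch.Size([256, 512])
```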

🔬 Pillar 2: RL & Architecture (27 papers)

# | Title | One-line takeaway | Tags | 🔗
38 | Distillation of Large Language Models via Concrete Score Matching | Proposes Concrete Score Distillation, addressing logit information loss and restricted solution spaces in LLM distillation | distillation, large language model, instruction following
39 | Clip-Low Increases Entropy and Clip-High Decreases Entropy in Reinforcement Learning of Large Language Models | Reveals how the clipping bounds in PPO/GRPO shape entropy in LLM reinforcement learning; proposes using clip-low to increase exploration (the decoupled clipped objective is sketched after this table) | reinforcement learning, PPO, large language model
40 | OPPO: Accelerating PPO-based RLHF via Pipeline Overlap | OPPO: accelerating PPO-based RLHF training via pipeline overlap | reinforcement learning, PPO, RLHF
41 | Thin Bridges for Drug Text Alignment: Lightweight Contrastive Learning for Target Specific Drug Retrieval | Proposes a lightweight contrastive-learning bridge for target-specific drug-text alignment and retrieval | contrastive learning, foundation model, multimodal
42 | Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models | Proposes Recursive Self-Aggregation (RSA), deepening test-time reasoning in large language models | reinforcement learning, large language model
43 | TAP: Two-Stage Adaptive Personalization of Multi-task and Multi-Modal Foundation Models in Federated Learning | Proposes TAP, two-stage adaptive personalization of multi-task, multimodal foundation models in federated learning | distillation, foundation model
44 | CAST: Continuous and Differentiable Semi-Structured Sparsity-Aware Training for Large Language Models | Proposes the CAST framework for semi-structured sparsity-aware LLM training, improving inference efficiency | distillation, large language model
45 | Data-to-Energy Stochastic Dynamics | Proposes data-to-energy stochastic dynamics, solving the Schrödinger bridge problem when no data samples are available | reinforcement learning, flow matching, multimodal
46 | Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners | Proposes TFPI, accelerating RLVR training and improving the efficiency and performance of distilled reasoning models | reinforcement learning, distillation, chain-of-thought
47 | Efficient On-Policy Reinforcement Learning via Exploration of Sparse Parameter Space | Proposes ExploRLer, improving on-policy RL efficiency by exploring a sparse parameter space | reinforcement learning, PPO
48 | Less is More: Towards Simple Graph Contrastive Learning | Proposes a simplified graph contrastive learning method, effective for representation learning on heterogeneous graphs (the InfoNCE objective such methods build on is sketched after this table) | representation learning, contrastive learning
49 | Boundary-to-Region Supervision for Offline Safe Reinforcement Learning | Proposes the B2R framework, using asymmetric conditioning to satisfy constraints in offline safe RL | reinforcement learning
50 | Clarification as Supervision: Reinforcement Learning for Vision-Language Interfaces | Proposes Adaptive Clarification RL (AC-RL), improving vision-language models on visual math reasoning | reinforcement learning
51 | Reweighted Flow Matching via Unbalanced OT for Label-free Long-tailed Generation | Proposes UOT-RFM, tackling label-free long-tailed generation via unbalanced optimal transport and reweighting (the vanilla flow-matching objective it reweights is sketched after this table) | flow matching
52 | Debunk the Myth of SFT Generalization | Improves SFT generalization on decision-making tasks through prompt diversity and chain-of-thought | reinforcement learning, chain-of-thought
53 | Directed-MAML: Meta Reinforcement Learning Algorithm with Task-directed Approximation | Proposes Directed-MAML, accelerating meta-RL convergence and cutting compute via task-directed approximation | reinforcement learning
54 | Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models | AttnRL: an attention-guided RL framework that improves exploration efficiency for process-supervised reasoning models | reinforcement learning, large language model
55 | Linking Process to Outcome: Conditional Reward Modeling for LLM Reasoning | Proposes Conditional Reward Modeling (CRM), improving LLM reasoning and addressing the limitations of process reward models | reinforcement learning, large language model
56 | Extensions of Robbins-Siegmund Theorem with Applications in Reinforcement Learning | Extends the Robbins-Siegmund theorem, enabling convergence analysis for RL with non-summable zeroth-order terms | reinforcement learning
57 | Alignment-Aware Decoding | Proposes Alignment-Aware Decoding (AAD), improving LLM alignment at inference time | DPO, large language model
58 | RL-Guided Data Selection for Language Model Finetuning | Proposes RL-guided data selection, improving the efficiency and performance of LLM fine-tuning | reinforcement learning, large language model
59 | Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation | Knapsack RL: unlocking LLM exploration by optimizing rollout budget allocation | reinforcement learning, large language model
60 | Learning to Reason as Action Abstractions with Scalable Mid-Training RL | Proposes the RA3 algorithm, improving code-generation performance via scalable mid-training RL over action abstractions | reinforcement learning, large language model
61 | Improving Sampling Efficiency in RLVR through Adaptive Rollout and Response Reuse | AR3PO: improving RLVR sampling efficiency via adaptive rollout and response reuse | reinforcement learning, large language model
62 | Differentiable Autoencoding Neural Operator for Interpretable and Integrable Latent Space Modeling | Proposes DIANO, a differentiable autoencoding neural operator for interpretable, integrable latent-space modeling | latent dynamics, spatiotemporal
63 | Accelerating Transformers in Online RL | Proposes an accelerated scheme for training Transformer policies, addressing the instability of Transformer training in online RL | reinforcement learning, behavior cloning
64 | Informed Asymmetric Actor-Critic: Leveraging Privileged Signals Beyond Full-State Access | Proposes the Informed Asymmetric Actor-Critic, exploiting privileged signals beyond full-state access for RL in partially observable environments | reinforcement learning, privileged information
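
For #39, a sketch of the clipped surrogate with decoupled clipping bounds, the knob the paper analyzes: per the title, the clip-low side is associated with rising entropy (more exploration) and the clip-high side with falling entropy. The epsilon values below are illustrative defaults, not the paper's settings.

```python
import torch

def clipped_surrogate(logp_new, logp_old, adv, eps_low=0.2, eps_high=0.28):
    """PPO/GRPO clipped objective with decoupled bounds: the importance
    ratio is clipped to [1 - eps_low, 1 + eps_high]. Tuning eps_low vs.
    eps_high independently is the asymmetry the paper studies."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high) * adv
    return torch.min(unclipped, clipped).mean()  # maximized by the policy update

logp_old = torch.log(torch.rand(8))
logp_new = logp_old + 0.1 * torch.randn(8)
adv = torch.randn(8)
print(clipped_surrogate(logp_new, logp_old, adv))
```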
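#41 and #48 both build on the standard InfoNCE contrastive objective; their contributions lie in what gets contrasted (drug-text pairs, graph views), not in the loss itself. A minimal sketch:

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.2) -> torch.Tensor:
    """Minimal InfoNCE: matched rows of z1/z2 are positive pairs,
    all other rows in the batch serve as negatives."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.T / tau            # (N, N) temperature-scaled cosine similarities
    labels = torch.arange(z1.shape[0])  # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

loss = info_nce(torch.randn(16, 128), torch.randn(16, 128))
print(loss)
```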
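For #51, the vanilla conditional flow-matching objective that UOT-RFM reweights: regress a velocity field toward the constant target x1 - x0 along linear interpolation paths. The unbalanced-OT pairing and reweighting that constitute the paper's contribution are not reproduced; the toy network below is an assumption for illustration.

```python
import torch
import torch.nn as nn

def flow_matching_loss(v_theta: nn.Module, x0: torch.Tensor, x1: torch.Tensor) -> torch.Tensor:
    """Vanilla conditional flow matching: x0 ~ noise, x1 ~ data, and the
    network regresses the velocity (x1 - x0) along x_t = (1 - t) x0 + t x1."""
    t = torch.rand(x0.shape[0], 1)
    xt = (1 - t) * x0 + t * x1
    target = x1 - x0
    pred = v_theta(torch.cat([xt, t], dim=1))  # condition on (x_t, t)
    return ((pred - target) ** 2).mean()

# Toy velocity field for 2-D data (input: 2 dims + time).
v = nn.Sequential(nn.Linear(3, 64), nn.SiLU(), nn.Linear(64, 2))
loss = flow_matching_loss(v, torch.randn(32, 2), torch.randn(32, 2))
print(loss)
```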

🔬 Pillar 1: Robot Control (2 papers)

# | Title | One-line takeaway | Tags | 🔗
65 | Revoking Amnesia: RL-based Trajectory Optimization to Resurrect Erased Concepts in Diffusion Models | Proposes RevAm, using RL to optimize diffusion-model sampling trajectories and resurrect erased concepts | manipulation, trajectory optimization
66 | Noise-Guided Transport for Imitation Learning | Proposes Noise-Guided Transport (NGT), learning expert policies in low-data imitation learning | humanoid, imitation learning

🔬 Pillar 8: Physics-based Animation (2 papers)

# | Title | One-line takeaway | Tags | 🔗
67 | Parametric Neural Amp Modeling with Active Learning | Proposes Panama, training a parametric guitar-amp model with active learning to approach the quality of non-parametric models | AMP
68 | Unsupervised Detection of Spatiotemporal Anomalies in PMU Data Using Transformer-Based BiGAN | Proposes T-BiGAN for unsupervised detection of spatiotemporal anomalies in power-system PMU data | spatiotemporal

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | One-line takeaway | Tags | 🔗
69 | DiVeQ: Differentiable Vector Quantization Using the Reparameterization Trick | Proposes DiVeQ to overcome gradient blocking in vector quantization (the straight-through baseline it improves on is sketched below) | VQ-VAE
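
For #69, context on the problem DiVeQ addresses: in a standard VQ-VAE the nearest-neighbor codebook lookup has zero gradient, so training relies on the straight-through estimator sketched below. DiVeQ replaces this with a reparameterization-style differentiable quantization; the paper's own formulation is not reproduced here.

```python
import torch
import torch.nn as nn

class STVectorQuantizer(nn.Module):
    """Standard VQ-VAE quantizer with the straight-through estimator:
    the forward pass emits the nearest code vector, while the backward
    pass copies gradients from z_q straight to z. This is the
    gradient-blocking baseline that DiVeQ aims to improve on."""
    def __init__(self, num_codes: int = 512, dim: int = 64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        d = torch.cdist(z, self.codebook.weight)  # (N, num_codes) distances
        z_q = self.codebook(d.argmin(dim=1))      # nearest code vectors
        return z + (z_q - z).detach()             # straight-through trick

vq = STVectorQuantizer()
print(vq(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```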
