| 38 | Distillation of Large Language Models via Concrete Score Matching | Proposes Concrete Score Distillation to address logit information loss and restricted solution spaces in LLM distillation. | distillation, large language model, instruction following | |
| 39 | Clip-Low Increases Entropy and Clip-High Decreases Entropy in Reinforcement Learning of Large Language Models | Shows how the clipping mechanism in PPO/GRPO shapes policy entropy in LLM reinforcement learning; proposes clip-low to increase exploration. | reinforcement learning, PPO, large language model | |
| 40 | OPPO: Accelerating PPO-based RLHF via Pipeline Overlap | OPPO accelerates PPO-based RLHF training via pipeline overlap. | reinforcement learning, PPO, RLHF | |
| 41 | Thin Bridges for Drug Text Alignment: Lightweight Contrastive Learning for Target Specific Drug Retrieval | Proposes lightweight contrastive-learning bridges for target-specific drug-text alignment and retrieval. | contrastive learning, foundation model, multimodal | |
| 42 | Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models | Proposes Recursive Self-Aggregation (RSA) to strengthen deep, test-time thinking in large language models. | reinforcement learning, large language model | ✅ |
| 43 | TAP: Two-Stage Adaptive Personalization of Multi-task and Multi-Modal Foundation Models in Federated Learning | Proposes TAP, a two-stage adaptive personalization method for multi-task, multi-modal foundation models in federated learning. | distillation, foundation model | ✅ |
| 44 | CAST: Continuous and Differentiable Semi-Structured Sparsity-Aware Training for Large Language Models | Proposes the CAST framework for semi-structured sparsity-aware training of large language models, improving inference efficiency. | distillation, large language model | |
| 45 | Data-to-Energy Stochastic Dynamics | Proposes data-to-energy stochastic dynamics, solving Schrödinger bridge problems when no data samples are available. | reinforcement learning, flow matching, multimodal | ✅ |
| 46 | Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners | Proposes TFPI, which accelerates RLVR training and improves the efficiency and performance of reasoning models. | reinforcement learning, distillation, chain-of-thought | |
| 47 | Efficient On-Policy Reinforcement Learning via Exploration of Sparse Parameter Space | Proposes ExploRLer, which improves on-policy reinforcement learning efficiency by exploring a sparse parameter space. | reinforcement learning, PPO | |
| 48 | Less is More: Towards Simple Graph Contrastive Learning | Proposes a simplified graph contrastive learning method that effectively handles representation learning on heterogeneous graphs. | representation learning, contrastive learning | |
| 49 | Boundary-to-Region Supervision for Offline Safe Reinforcement Learning | Proposes the B2R framework, which uses asymmetric conditioning to address constraint satisfaction in offline safe reinforcement learning. | reinforcement learning | ✅ |
| 50 | Clarification as Supervision: Reinforcement Learning for Vision-Language Interfaces | Proposes Adaptive Clarification Reinforcement Learning (AC-RL) to improve vision-language models on visual math reasoning. | reinforcement learning | |
| 51 | Reweighted Flow Matching via Unbalanced OT for Label-free Long-tailed Generation | Proposes UOT-RFM, which uses unbalanced optimal transport and reweighting to tackle label-free long-tailed generation. | flow matching | |
| 52 | Debunk the Myth of SFT Generalization | Improves SFT generalization on decision-making tasks via prompt diversity and chain-of-thought. | reinforcement learning, chain-of-thought | ✅ |
| 53 | Directed-MAML: Meta Reinforcement Learning Algorithm with Task-directed Approximation | Proposes Directed-MAML, which uses task-directed approximation to accelerate meta-reinforcement-learning convergence and reduce computational cost. | reinforcement learning | |
| 54 | Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models | AttnRL, an attention-based reinforcement learning framework that improves exploration efficiency for process-supervised RL in reasoning models. | reinforcement learning, large language model | |
| 55 | Linking Process to Outcome: Conditional Reward Modeling for LLM Reasoning | Proposes Conditional Reward Modeling (CRM) to improve LLM reasoning and overcome the limitations of process reward models. | reinforcement learning, large language model | |
| 56 | Extensions of Robbins-Siegmund Theorem with Applications in Reinforcement Learning | Extends the Robbins-Siegmund theorem to handle convergence analysis with non-summable zeroth-order terms in reinforcement learning. | reinforcement learning | |
| 57 | Alignment-Aware Decoding | Proposes Alignment-Aware Decoding (AAD) to improve the alignment of large language models at inference time. | DPO, large language model | |
| 58 | RL-Guided Data Selection for Language Model Finetuning | Proposes a reinforcement-learning-guided data selection method that improves the efficiency and performance of LLM finetuning. | reinforcement learning, large language model | |
| 59 | Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation | Knapsack RL unlocks LLM exploration by optimizing budget allocation. | reinforcement learning, large language model | |
| 60 | Learning to Reason as Action Abstractions with Scalable Mid-Training RL | Proposes the RA3 algorithm, which improves code-generation performance through scalable mid-training reinforcement learning. | reinforcement learning, large language model | |
| 61 | Improving Sampling Efficiency in RLVR through Adaptive Rollout and Response Reuse | AR3PO improves RLVR sampling efficiency via adaptive rollout and response reuse. | reinforcement learning, large language model | |
| 62 | Differentiable Autoencoding Neural Operator for Interpretable and Integrable Latent Space Modeling | Proposes DIANO, a differentiable autoencoding neural operator for interpretable and integrable latent-space modeling. | latent dynamics, spatiotemporal | |
| 63 | Accelerating Transformers in Online RL | Proposes training the Transformer with an accelerator policy, addressing the instability of Transformer training in online reinforcement learning. | reinforcement learning, behavior cloning | |
| 64 | Informed Asymmetric Actor-Critic: Leveraging Privileged Signals Beyond Full-State Access | Proposes Informed Asymmetric Actor-Critic, which leverages privileged signals to improve reinforcement learning in partially observable environments. | reinforcement learning, privileged information | |