cs.LG (2025-05-19)

📊 19 papers in total | 🔗 2 with code

🎯 Navigation by interest area

Pillar 9: Embodied Foundation Models (10 🔗2) · Pillar 2: RL & Architecture (9)

🔬 Pillar 9: Embodied Foundation Models (10 papers)

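| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 1 | Fractured Chain-of-Thought Reasoning | Proposes Fractured Sampling to improve the reasoning efficiency of large language models | large language model, chain-of-thought | |
| 2 | Walking the Tightrope: Disentangling Beneficial and Detrimental Drifts in Non-Stationary Custom-Tuning | Proposes counterfactual preference optimization to address detrimental concept drift in non-stationary environments | large language model, chain-of-thought | |
| 3 | Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space | Proposes LatentSeek to improve the reasoning ability of large language models | large language model, chain-of-thought | |
| 4 | Fine-tuning Quantized Neural Networks with Zeroth-order Optimization | Proposes a zeroth-order optimization method for quantized neural networks to address the memory bottleneck | large language model | |
| 5 | Breaking the Compression Ceiling: Data-Free Pipeline for Ultra-Efficient Delta Compression | Proposes UltraDelta to address data dependence in ultra-efficient delta compression | large language model | |
| 6 | TinyAlign: Boosting Lightweight Vision-Language Models by Mitigating Modal Alignment Bottlenecks | Proposes TinyAlign to address the alignment bottleneck in lightweight vision-language models | multimodal | |
| 7 | Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens | Uses semantics-free intermediate tokens to challenge the conventional understanding of reasoning models | chain-of-thought | |
| 8 | Panda: A pretrained forecast model for chaotic dynamics | Proposes the Panda model for forecasting chaotic dynamics | foundation model | |
| 9 | Incentivizing Truthful Language Models via Peer Elicitation Games | Proposes Peer Elicitation Games to address truthful reporting by language models | large language model | |
| 10 | FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference | Proposes FreeKV to address the efficiency of KV cache retrieval over long contexts | large language model | |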

🔬 Pillar 2: RL & Architecture (9 papers)

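| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 11 | Policy-Driven World Model Adaptation for Robust Offline Model-based Reinforcement Learning | Proposes dynamically adapting the world model to make offline model-based reinforcement learning more robust | reinforcement learning, policy learning, offline reinforcement learning | |
| 12 | Modular Diffusion Policy Training: Decoupling and Recombining Guidance and Diffusion for Offline RL | Proposes modular diffusion policy training to improve offline reinforcement learning | reinforcement learning, offline RL, diffusion policy | |
| 13 | Temporal Distance-aware Transition Augmentation for Offline Model-based Reinforcement Learning | Proposes TempDATA to address sparse rewards in offline reinforcement learning | reinforcement learning, offline reinforcement learning, model-based RL | |
| 14 | HR-VILAGE-3K3M: A Human Respiratory Viral Immunization Longitudinal Gene Expression Dataset for Systems Immunity | Builds the HR-VILAGE-3K3M dataset to address data scarcity in respiratory viral immunization research | predictive model, foundation model, multimodal | |
| 15 | 4Hammer: a board-game reinforcement learning environment for the hour long time frame | Proposes 4Hammer to address the lack of reinforcement learning environments with long time horizons | reinforcement learning, large language model | |
| 16 | Mean Flows for One-step Generative Modeling | Proposes the MeanFlow model for one-step generative modeling | flow matching, curriculum learning, distillation | |
| 17 | RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs | Analyzes the structural assumptions behind reinforcement learning post-training for large language models | reinforcement learning, large language model | |
| 18 | Optimizing Anytime Reasoning via Budget Relative Policy Optimization | Proposes AnytimeReasoner to optimize the anytime reasoning ability of large language models | reinforcement learning, large language model | |
| 19 | One-Step Offline Distillation of Diffusion-based Models via Koopman Modeling | Proposes a one-step offline distillation method based on Koopman modeling to make diffusion models more efficient | distillation | |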
