cs.LG(2025-12-19)

📊 9 papers in total

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (4) · Pillar 8: Physics-based Animation (3) · Pillar 9: Embodied Foundation Models (2)

🔬 Pillar 2: RL & Architecture (4 papers)

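| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 1 | Trust-Region Adaptive Policy Optimization | Proposes TRAPO, which interleaves SFT and RL to optimize LLM reasoning, significantly improving mathematical reasoning performance. | reinforcement learning, large language model |
| 2 | Assessing Long-Term Electricity Market Design for Ambitious Decarbonization Targets using Multi-Agent Reinforcement Learning | Proposes a multi-agent reinforcement learning framework for evaluating long-term electricity market design in support of deep decarbonization targets. | reinforcement learning |
| 3 | AdvJudge-Zero: Binary Decision Flips in LLM-as-a-Judge via Adversarial Control Tokens | AdvJudge-Zero flips the binary decisions of LLM judges via adversarial control tokens. | RLHF, DPO |
| 4 | A Theoretical Analysis of State Similarity Between Markov Decision Processes | Proposes the generalized bisimulation metric (GBSM) for assessing state similarity between Markov decision processes. | reinforcement learning, representation learning |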

🔬 Pillar 8: Physics-based Animation (3 papers)

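| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 5 | MINPO: Memory-Informed Neural Pseudo-Operator to Resolve Nonlocal Spatiotemporal Dynamics | Proposes MINPO, a memory-informed neural pseudo-operator for resolving nonlocal spatiotemporal dynamics. | spatiotemporal |
| 6 | Perfect reconstruction of sparse signals using nonconvexity control and one-step RSB message passing | Proposes a method for perfect reconstruction of sparse signals based on nonconvexity control and one-step RSB message passing. | AMP |
| 7 | Learning solution operator of dynamical systems with diffusion maps kernel ridge regression | Proposes diffusion maps kernel ridge regression (DM-KRR) for learning the solution operator of dynamical systems, improving long-term prediction accuracy. | spatiotemporal |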

🔬 Pillar 9: Embodied Foundation Models (2 papers)

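| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 8 | Enabling Disaggregated Multi-Stage MLLM Inference via GPU-Internal Scheduling and Resource Sharing | Proposes FlashCodec and UnifiedServe, which accelerate multi-stage MLLM inference through GPU-internal scheduling and resource sharing. | large language model, multimodal |
| 9 | Weighted Stochastic Differential Equation to Implement Wasserstein-Fisher-Rao Gradient Flow | Proposes a weighted stochastic differential equation method for Wasserstein-Fisher-Rao gradient flow, improving the sampling efficiency of generative models. | multimodal |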
