cs.LG (2025-09-18)

📊 25 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (13, 🔗 2) · Pillar 9: Embodied Foundation Models (9) · Pillar 8: Physics-based Animation (2) · Pillar 1: Robot Control (1)

🔬 Pillar 2: RL Algorithms & Architecture (13 papers)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Self-Improving Embodied Foundation Models | Proposes self-improving embodied foundation models that acquire robot skills autonomously via two-stage training. | reinforcement learning, imitation learning, large language model | |
| 2 | Exploring multimodal implicit behavior learning for vehicle navigation in simulated cities | Proposes data-augmented implicit behavior cloning to handle multimodal decision-making in urban vehicle navigation. | behavior cloning, multimodal | |
| 3 | Fleming-R1: Toward Expert-Level Medical Reasoning via Reinforcement Learning | Fleming-R1: expert-level medical reasoning via reinforcement learning. | reinforcement learning, large language model, chain-of-thought | |
| 4 | The Energy-Efficient Hierarchical Neural Network with Fast FPGA-Based Incremental Learning | Proposes an energy-efficient hierarchical neural network with FPGA acceleration for fast incremental learning. | representation learning, large language model, foundation model | |
| 5 | FlowRL: Matching Reward Distributions for LLM Reasoning | FlowRL: improves LLM reasoning by matching reward distributions. | reinforcement learning, PPO, large language model | |
| 6 | Reinforcement Learning Agent for a 2D Shooter Game | Proposes a hybrid training method combining imitation learning and reinforcement learning to improve an AI agent for a 2D shooter game. | reinforcement learning, imitation learning | |
| 7 | Structure-Aware Contrastive Learning with Fine-Grained Binding Representations for Drug Discovery | Proposes a structure-aware contrastive learning framework with fine-grained binding representations to improve DTI (drug-target interaction) prediction in drug discovery. | linear attention, contrastive learning | |
| 8 | ToolSample: Dual Dynamic Sampling Methods with Curriculum Learning for RL-based Tool Learning | Proposes the DSCL framework, improving RL-based tool-learning efficiency via dual dynamic sampling and curriculum learning. | reinforcement learning, curriculum learning | |
| 9 | Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation | EVOL-RL: a label-free framework for evolving language models, achieving self-improvement through majority-driven selection and novelty-promoted variation. | reinforcement learning, large language model | |
| 10 | Mind the Gap: Data Rewriting for Stable Off-Policy Supervised Fine-Tuning | Proposes a data-rewriting framework to address the distribution shift of off-policy learning in SFT. | policy learning, large language model | |
| 11 | Stochastic Bilevel Optimization with Heavy-Tailed Noise | Proposes the N²SBA method for bilevel optimization under heavy-tailed noise. | reinforcement learning, large language model | |
| 12 | Self-Explaining Reinforcement Learning for Mobile Network Resource Allocation | Proposes a reinforcement learning method built on self-explaining neural networks for mobile network resource allocation. | reinforcement learning | |
| 13 | Leveraging Reinforcement Learning, Genetic Algorithms and Transformers for background determination in particle physics | Uses reinforcement learning, genetic algorithms, and Transformers for background determination in particle physics. | reinforcement learning | |
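Several entries in this pillar combine imitation learning with RL (e.g. #1 and #6). A minimal sketch of that generic two-stage recipe, a behavior-cloning warm start followed by REINFORCE-style fine-tuning on a toy tabular softmax policy; this is a standard illustration under assumed toy rewards, not any listed paper's actual method:

```python
# Generic two-stage recipe: behavior-cloning warm start, then RL fine-tuning.
# Toy tabular setting with hypothetical actions/rewards -- illustration only.
import math, random

ACTIONS = ["left", "right", "shoot"]

def softmax(logits):
    m = max(logits.values())
    exps = {a: math.exp(v - m) for a, v in logits.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}

# Stage 1: behavior cloning -- push logits toward demonstrated actions
# (gradient of the log-likelihood of a softmax policy).
def bc_update(logits, demo_actions, lr=0.5):
    for a_demo in demo_actions:
        probs = softmax(logits)
        for a in ACTIONS:
            logits[a] += lr * ((1.0 if a == a_demo else 0.0) - probs[a])
    return logits

# Stage 2: REINFORCE-style fine-tuning against an environment reward
# (policy-gradient step: reward * grad log pi(a_t)).
def rl_update(logits, reward_fn, steps=200, lr=0.2, seed=0):
    rng = random.Random(seed)
    for _ in range(steps):
        probs = softmax(logits)
        a_t = rng.choices(ACTIONS, weights=[probs[a] for a in ACTIONS])[0]
        r = reward_fn(a_t)
        for a in ACTIONS:
            logits[a] += lr * r * ((1.0 if a == a_t else 0.0) - probs[a])
    return logits

logits = {a: 0.0 for a in ACTIONS}
logits = bc_update(logits, demo_actions=["shoot", "shoot", "left"])
logits = rl_update(logits, reward_fn=lambda a: 1.0 if a == "shoot" else -0.1)
print(max(softmax(logits), key=softmax(logits).get))
```

The BC stage gives the RL stage a policy that already samples useful actions, which is the usual motivation for the hybrid: pure RL from scratch explores poorly, pure BC cannot exceed the demonstrator.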

🔬 Pillar 9: Embodied Foundation Models (9 papers)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 14 | Temporal Reasoning with Large Language Models Augmented by Evolving Knowledge Graphs | Proposes EvoReasoner for reasoning over dynamic knowledge. | large language model | |
| 15 | CARGO: A Framework for Confidence-Aware Routing of Large Language Models | CARGO: a confidence-aware LLM routing framework that balances performance and cost. | large language model | |
| 16 | CoopQ: Cooperative Game Inspired Layerwise Mixed Precision Quantization for LLMs | Proposes CoopQ, using cooperative game theory to optimize layerwise mixed-precision quantization for LLMs, markedly improving low-bit performance. | large language model | |
| 17 | Predicting Language Models' Success at Zero-Shot Probabilistic Prediction | Proposes evaluation metrics that predict how well large language models perform at zero-shot probabilistic prediction. | large language model | |
| 18 | Adaptive LoRA Experts Allocation and Selection for Federated Fine-Tuning | FedLEASE: adaptive LoRA expert allocation and selection in federated learning, improving fine-tuning on heterogeneous data. | large language model | |
| 19 | BabyHuBERT: Multilingual Self-Supervised Learning for Segmenting Speakers in Child-Centered Long-Form Recordings | BabyHuBERT: multilingual self-supervised learning for speaker segmentation in child-centered long-form recordings. | foundation model | |
| 20 | A Comparative Analysis of Transformer Models in Social Bot Detection | Compares Transformer models for social bot detection, showing the advantage of encoder models. | large language model | |
| 21 | Copycat vs. Original: Multi-modal Pretraining and Variable Importance in Box-office Prediction | Proposes a multimodal pretrained model that fuses visual information from movie posters to improve box-office prediction. | multimodal | |
| 22 | Modeling Transformers as complex networks to analyze learning dynamics | Models Transformers as complex networks to analyze LLM learning dynamics. | large language model | |

🔬 Pillar 8: Physics-based Animation (2 papers)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 23 | Solar Forecasting with Causality: A Graph-Transformer Approach to Spatiotemporal Dependencies | SolarCAST: forecasts solar irradiance with a causal graph-Transformer, without specialized hardware. | spatiotemporal, multimodal | |
| 24 | Accurate typhoon intensity forecasts using a non-iterative spatiotemporal transformer model | Proposes TIFNet, a non-iterative spatiotemporal Transformer model for accurate typhoon intensity forecasting. | spatiotemporal | |

🔬 Pillar 1: Robot Control (1 paper)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 25 | Diffusion-Based Scenario Tree Generation for Multivariate Time Series Prediction and Multistage Stochastic Optimization | Proposes DST, a diffusion-based scenario-tree generation framework for multivariate time-series prediction and multistage stochastic optimization. | MPC, reinforcement learning | |
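Entry #25 builds scenario trees for multistage stochastic optimization. A minimal sketch of the data structure itself, where a plain Gaussian random walk stands in for the paper's diffusion-model sampler; all names and parameters here are illustrative assumptions:

```python
# Minimal scenario-tree structure for multistage stochastic optimization.
# A Gaussian sampler stands in for a learned (e.g. diffusion) scenario generator.
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    value: float                 # realized value of the time series at this stage
    prob: float                  # probability of reaching this node from the root
    children: list = field(default_factory=list)

def build_tree(root_value, depth, branching, sampler, prob=1.0):
    """Recursively expand `branching` child scenarios per node for `depth` stages."""
    node = Node(root_value, prob)
    if depth > 0:
        child_prob = prob / branching          # uniform branch probabilities
        for _ in range(branching):
            child_value = sampler(root_value)  # next-stage scenario, conditioned on parent
            node.children.append(
                build_tree(child_value, depth - 1, branching, sampler, child_prob))
    return node

def leaf_probs(node):
    """Collect the probability mass at the leaves (one full scenario path each)."""
    if not node.children:
        return [node.prob]
    return [p for c in node.children for p in leaf_probs(c)]

rng = random.Random(0)
tree = build_tree(100.0, depth=3, branching=2,
                  sampler=lambda v: v + rng.gauss(0, 1))
print(len(leaf_probs(tree)), round(sum(leaf_probs(tree)), 6))
```

Each root-to-leaf path is one scenario; a multistage stochastic program then optimizes decisions at every node against the probability-weighted subtree below it.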

⬅️ Back to cs.LG index · 🏠 Back to home