cs.LG(2026-04-01)
📊 22 papers in total | 🔗 2 with code
🎯 Interest Area Navigation
Pillar 2: RL Algorithms & Architecture (8 🔗1)
Pillar 9: Embodied Foundation Models (8 🔗1)
Pillar 1: Robot Control (3)
Pillar 3: Perception & Semantics (2)
Pillar 6: Video Extraction (1)
🔬 Pillar 2: RL Algorithms & Architecture (8 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
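| 1 | MOON3.0: Reasoning-aware Multimodal Representation Learning for E-commerce Product Understanding | Proposes MOON3.0, a reasoning-aware multimodal representation learning method for e-commerce product understanding. | reinforcement learning representation learning large language model | | |
| 2 | A Survey of On-Policy Distillation for Large Language Models | Surveys on-policy distillation methods for large language models, which address the exposure bias problem. | imitation learning distillation large language model | | |
| 3 | Policy Improvement Reinforcement Learning | Proposes the PIRL framework, which improves LLM reasoning by explicitly optimizing the cumulative improvement across policy iterations. | reinforcement learning large language model | | |
| 4 | GUIDE: Reinforcement Learning for Behavioral Action Support in Type 1 Diabetes | Proposes the GUIDE framework, which uses reinforcement learning to provide behavioral intervention decision support for people with type 1 diabetes. | reinforcement learning offline RL CQL | | |
| 5 | Focal plane wavefront control with model-based reinforcement learning | Proposes PO4NCPA, a model-based reinforcement learning method for focal-plane wavefront control that corrects dynamic and static aberrations in high-contrast imaging. | reinforcement learning model-based RL | | |
| 6 | NeuroDDAF: Neural Dynamic Diffusion-Advection Fields with Evidential Fusion for Air Quality Forecasting | NeuroDDAF: neural dynamic diffusion-advection fields with evidential fusion for air quality forecasting. | representation learning MAE spatiotemporal | | |
| 7 | Deconfounding Scores and Representation Learning for Causal Effect Estimation with Weak Overlap | Proposes deconfounding scores to address the overlap problem in causal effect estimation. | representation learning | | |
| 8 | Learning to Hint for Reinforcement Learning | Proposes the HiLL framework, which improves reinforcement learning performance on complex tasks through adaptive hint learning. | reinforcement learning | ✅ | |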
🔬 Pillar 9: Embodied Foundation Models (8 papers)
🔬 Pillar 1: Robot Control (3 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
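| 17 | Flow-based Policy With Distributional Reinforcement Learning in Trajectory Optimization | Proposes FP-DRL, an algorithm combining a flow-based policy with distributional reinforcement learning, to improve the expressiveness of multimodal policies in trajectory optimization. | trajectory optimization reinforcement learning DRL | | |
| 18 | Gradient-Based Data Valuation Improves Curriculum Learning for Game-Theoretic Motion Planning | Uses gradient-based data valuation to improve curriculum learning for game-theoretic motion planning. | motion planning curriculum learning | | |
| 19 | Convergence of Byzantine-Resilient Gradient Tracking via Probabilistic Edge Dropout | Proposes a Byzantine-resilient gradient tracking method based on probabilistic edge dropout to counter malicious attacks in distributed optimization. | manipulation | | |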
🔬 Pillar 3: Perception & Semantics (2 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 20 | ActivityNarrated: An Open-Ended Narrative Paradigm for Wearable Human Activity Understanding | Proposes the ActivityNarrated framework, which uses an open-ended narrative paradigm to improve human activity understanding on wearable devices. | open-vocabulary open vocabulary language conditioned | | |
| 21 | Property-Level Flood Risk Assessment Using AI-Enabled Street-View Lowest Floor Elevation Extraction and ML Imputation Across Texas | Uses AI-processed street-view imagery and machine-learning imputation for property-level flood risk assessment in Texas. | elevation map | | |
🔬 Pillar 6: Video Extraction (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 22 | LAtent Phase Inference from Short time sequences using SHallow REcurrent Decoders (LAPIS-SHRED) | LAPIS-SHRED: uses shallow recurrent decoders to infer latent phase from short time sequences and reconstruct spatiotemporal dynamics. | sparse sensors spatiotemporal | | |