cs.CV (2025-08-12)
📊 29 papers in total | 🔗 5 with code
🎯 Navigation by Area of Interest
Pillar 9: Embodied Foundation Models (9 🔗2)
Pillar 3: Perception & Semantics (8)
Pillar 2: RL & Architecture (4 🔗1)
Pillar 8: Physics-based Animation (3 🔗2)
Pillar 4: Generative Motion (3)
Pillar 5: Interaction & Reaction (1)
Pillar 1: Robot Control (1)
🔬 Pillar 9: Embodied Foundation Models (9 papers)
🔬 Pillar 3: Perception & Semantics (8 papers)
🔬 Pillar 2: RL & Architecture (4 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding | Proposes DocThinker to improve the explainability and adaptability of multimodal large language models | reinforcement learning, policy learning, large language model | ✅ | |
| 19 | UltraLight Med-Vision Mamba for Classification of Neoplastic Progression in Tubular Adenomas | Proposes UltraLight Med-Vision Mamba for classifying neoplastic progression in tubular adenomas | Mamba, SSM | | |
| 20 | Addressing Bias in VLMs for Glaucoma Detection Without Protected Attribute Supervision | Proposes a debiasing method that needs no protected-attribute supervision to improve glaucoma detection | contrastive learning, multimodal | | |
| 21 | AME: Aligned Manifold Entropy for Robust Vision-Language Distillation | Proposes AME to address uncertainty in vision-language distillation | distillation | | |
🔬 Pillar 8: Physics-based Animation (3 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 22 | FusionEnsemble-Net: An Attention-Based Ensemble of Spatiotemporal Networks for Multimodal Sign Language Recognition | Proposes FusionEnsemble-Net for multimodal sign language recognition | spatiotemporal, multimodal | ✅ | |
| 23 | KFFocus: Highlighting Keyframes for Enhanced Video Understanding | Proposes KFFocus to address the problem of keyframe compression in video understanding | spatiotemporal, large language model, multimodal | | |
| 24 | UniConvNet: Expanding Effective Receptive Field while Maintaining Asymptotically Gaussian Distribution for ConvNets of Any Scale | Proposes UniConvNet to expand the effective receptive field while maintaining an asymptotically Gaussian distribution | UniCon | ✅ | |
🔬 Pillar 4: Generative Motion (3 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 25 | X-UniMotion: Animating Human Images with Expressive, Unified and Identity-Agnostic Motion Latents | Proposes X-UniMotion for high-fidelity, identity-agnostic human image animation | motion latent | | |
| 26 | Spatial-Temporal Multi-Scale Quantization for Flexible Motion Generation | Proposes a spatial-temporal multi-scale quantization method to make human motion generation more flexible | motion generation | | |
| 27 | RealisMotion: Decomposed Human Motion Control and Video Generation in the World Space | Proposes RealisMotion to address the challenges of human motion control and video generation in world space | text-to-motion | | |
🔬 Pillar 5: Interaction & Reaction (1 paper)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 28 | QueryCraft: Transformer-Guided Query Initialization for Enhanced Human-Object Interaction Detection | Proposes QueryCraft to address inadequate query initialization in HOI detection | human-object interaction, HOI | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 29 | SegDAC: Improving Visual Reinforcement Learning by Extracting Dynamic Object-Centric Representations from Pretrained Vision Models | Proposes SegDAC to address dynamic object representation in visual reinforcement learning | manipulation, reinforcement learning | | |