cs.CV (2025-08-26)
📊 20 papers in total | 🔗 6 with code
🎯 Navigation by area of interest
Pillar 9: Embodied Foundation Models (6 papers, 🔗 1 with code)
Pillar 3: Perception & Semantics (5 papers, 🔗 1 with code)
Pillar 2: RL & Architecture (4 papers, 🔗 1 with code)
Pillar 5: Interaction & Reaction (2 papers, 🔗 1 with code)
Pillar 4: Generative Motion (1 paper, 🔗 1 with code)
Pillar 1: Robot Control (1 paper, 🔗 1 with code)
Pillar 8: Physics-based Animation (1 paper)

🔬 Pillar 9: Embodied Foundation Models (6 papers)

| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Dual Enhancement on 3D Vision-Language Perception for Monocular 3D Visual Grounding | Proposes a dual-enhancement method for monocular 3D visual grounding | visual grounding | | |
| 2 | Decouple, Reorganize, and Fuse: A Multimodal Framework for Cancer Survival Prediction | Proposes the DeReF framework to address information fusion in cancer survival prediction | multimodal | | |
| 3 | Beyond the Textual: Generating Coherent Visual Options for MCQs | Proposes a cross-modal option synthesis framework that generates multiple-choice questions with visual options | multimodal chain-of-thought | | |
| 4 | Autoregressive Universal Video Segmentation Model | Proposes an autoregressive universal video segmentation model for prompt-free segmentation | foundation model | | |
| 5 | Event-Enriched Image Analysis Grand Challenge at ACM Multimedia 2025 | Introduces the EVENTA challenge for event-level multimodal understanding | multimodal | ✅ | |
| 6 | OwlCap: Harmonizing Motion-Detail for Video Captioning via HMD-270K and Caption Set Equivalence Reward | Proposes OwlCap to address the imbalance between motion and detail in video captioning | large language model | | |

🔬 Pillar 3: Perception & Semantics (5 papers)

| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 7 | ColorGS: High-fidelity Surgical Scene Reconstruction with Colored Gaussian Splatting | Proposes ColorGS to model color and deformation when reconstructing tissue from endoscopic video | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 8 | Can we make NeRF-based visual localization privacy-preserving? | Proposes ppNeSF to address privacy concerns in NeRF-based visual localization | NeRF | | |
| 9 | PseudoMapTrainer: Learning Online Mapping without HD Maps | Proposes PseudoMapTrainer to remove the dependence of online-mapping training on HD maps | gaussian splatting, splatting | | |
| 10 | SoccerNet 2025 Challenges Results | Reports the SoccerNet 2025 challenge results, advancing research on soccer video understanding | depth estimation, monocular depth | | |
| 11 | Robust and Label-Efficient Deep Waste Detection | Proposes an ensemble-based semi-supervised learning framework to make waste detection more efficient | open-vocabulary, open vocabulary | ✅ | |

🔬 Pillar 2: RL & Architecture (4 papers)

| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 12 | MIDAS: Multimodal Interactive Digital-humAn Synthesis via Real-time Autoregressive Video Generation | Proposes the MIDAS framework for real-time multimodal interactive digital-human synthesis | world model, large language model, multimodal | | |
| 13 | Geo2Vec: Shape- and Distance-Aware Neural Representation of Geospatial Entities | Proposes Geo2Vec to reduce the high computational cost of representation learning for geospatial entities | representation learning, spatial relationship | ✅ | |
| 14 | Flatness-aware Curriculum Learning via Adversarial Difficulty | Proposes an adversarial difficulty measure to combine curriculum learning with flat-minima optimization | curriculum learning | | |
| 15 | Clustering-based Feature Representation Learning for Oracle Bone Inscriptions Detection | Proposes a clustering-based feature representation learning method for oracle bone inscription detection | representation learning | | |

🔬 Pillar 5: Interaction & Reaction (2 papers)

| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | Rethinking Human-Object Interaction Evaluation for both Vision-Language Models and HOI-Specific Methods | Proposes a new benchmark dataset to evaluate how well human-object interaction (HOI) detection methods work | human-object interaction, HOI | | |
| 17 | DQEN: Dual Query Enhancement Network for DETR-based HOI Detection | Proposes a dual query enhancement network for DETR-based HOI detection | human-object interaction, HOI | ✅ | |

🔬 Pillar 4: Generative Motion (1 paper)

| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | OmniHuman-1.5: Instilling an Active Mind in Avatars via Cognitive Simulation | Proposes OmniHuman-1.5 to improve emotional expressiveness in video avatar animation | physically plausible, character animation, large language model | ✅ | |

🔬 Pillar 1: Robot Control (1 paper)

| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | All-in-One Slider for Attribute Manipulation in Diffusion Models | Proposes an all-in-one slider for manipulating attributes of generated images | manipulation | ✅ | |

🔬 Pillar 8: Physics-based Animation (1 paper)

| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 20 | Wan-S2V: Audio-Driven Cinematic Video Generation | Proposes Wan-S2V for generating complex cinematic animation | character animation | | |