cs.CV (2025-05-03)
📊 17 papers in total | 🔗 3 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception & Semantics (Perception & Semantics) (4 🔗1)
Pillar 9: Embodied Foundation Models (4)
Pillar 2: RL Algorithms & Architecture (RL & Architecture) (2)
Pillar 6: Video Extraction & Matching (Video Extraction) (2 🔗1)
Pillar 7: Motion Retargeting (1)
Pillar 8: Physics-based Animation (1)
Pillar 4: Generative Motion (1)
Pillar 1: Robot Control (1)
Pillar 5: Interaction & Reaction (1 🔗1)
🔬 Pillar 3: Spatial Perception & Semantics (Perception & Semantics) (4 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | AquaGS: Fast Underwater Scene Reconstruction with SfM-Free Gaussian Splatting | Proposes AquaGS to address the slow speed and low accuracy of underwater scene reconstruction | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 2 | GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting | Proposes the GenSync framework to address lip-sync across multiple identities | 3D gaussian splatting, gaussian splatting, splatting | | |
| 3 | HybridGS: High-Efficiency Gaussian Splatting Data Compression using Dual-Channel Sparse Representation and Point Cloud Encoder | Proposes HybridGS to address the low efficiency of compressing 3D Gaussian point clouds | 3D gaussian splatting, 3DGS, gaussian splatting | ✅ | |
| 4 | RESAnything: Attribute Prompting for Arbitrary Referring Segmentation | Proposes RESAnything to address arbitrary referring segmentation | open-vocabulary, open vocabulary, large language model | | |
🔬 Pillar 9: Embodied Foundation Models (4 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 5 | Knowledge-Augmented Language Models Interpreting Structured Chest X-Ray Findings | Proposes CXR-TextInter to address the interpretation of chest X-ray findings | large language model, foundation model, multimodal | | |
| 6 | Automated ARAT Scoring Using Multimodal Video Analysis, Multi-View Fusion, and Hierarchical Bayesian Models: A Clinician Study | Proposes an automated ARAT scoring system to reduce the time cost and improve the accuracy of stroke rehabilitation assessment | multimodal | | |
| 7 | Mitigating Group-Level Fairness Disparities in Federated Visual Language Models | Proposes the FVL-FP framework to address group-level fairness in federated vision-language models | multimodal | | |
| 8 | Enhancing the Learning Experience: Using Vision-Language Models to Generate Questions for Educational Videos | Uses vision-language models to generate questions for educational videos to enhance the learning experience | multimodal | | |
🔬 Pillar 2: RL Algorithms & Architecture (RL & Architecture) (2 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | Multimodal Graph Representation Learning for Robust Surgical Workflow Recognition with Adversarial Feature Disentanglement | Proposes multimodal graph representation learning to improve the robustness of surgical workflow recognition | representation learning, multimodal | | |
| 10 | PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth | Proposes PosePilot to address camera pose control | world model, depth estimation, geometric consistency | | |
🔬 Pillar 6: Video Extraction & Matching (Video Extraction) (2 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | Vision and Intention Boost Large Language Model in Long-Term Action Anticipation | Proposes an intention-conditioned vision-language model to address long-term action anticipation | Ego4D, large language model | | |
| 12 | MVHumanNet++: A Large-scale Dataset of Multi-view Daily Dressing Human Captures with Richer Annotations for 3D Human Digitization | Proposes MVHumanNet++ to address the shortage of data for 3D human digitization | SMPL, SMPL-X, large language model | ✅ | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | 3DWG: 3D Weakly Supervised Visual Grounding via Category and Instance-Level Alignment | Proposes 3DWG to address category- and instance-level complexity in 3D weakly supervised visual grounding | spatial relationship, visual grounding | | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 14 | VideoLLM Benchmarks and Evaluation: A Survey | A survey of benchmarks and evaluation methodology for video large language models | spatiotemporal, large language model, multimodal | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | Efficient 3D Full-Body Motion Generation from Sparse Tracking Inputs with Temporal Windows | Proposes an efficient MLP-based method for generating 3D full-body motion from sparse tracking inputs | motion generation | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models | Proposes a method for evaluating visual perspective taking in vision-language models | humanoid, scene understanding | | |
🔬 Pillar 5: Interaction & Reaction (1 paper)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 17 | Co$^{3}$Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion | Proposes Co$^{3}$Gesture to address co-speech gesture generation for two-person interaction | mutual attention | ✅ | |